To boot the HPC VMs, the system administrator must run the following commands, one VM at a time:

vm start thin-01 # start the vm thin-01
ping thin-01 # make sure that you get a response back before starting a different VM

vm start thin-02
ping thin-02

#optionally, if you are not scared

vm start thick-01
ping thick-01

After all the HPC VMs have booted, you may boot the login nodes:

vm start ssh-01
ping ssh-01

vm start ssh-02
ping ssh-02

There may be other VMs for other users that you also need to start:

vm start rshiny0
ping rshiny-0

vm start rshiny1
ping rshiny-1
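
The start-then-ping pattern used for each VM above could be wrapped in a small helper. This is only a sketch: the function names `wait_for_host` and `start_and_wait` and the `WAIT_TRIES` and `WAIT_INTERVAL` variables are hypothetical and not part of the existing setup. The optional second argument covers VMs whose hostname differs from the VM name, as with rshiny0, which is pinged as rshiny-0.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the existing setup.

# Ping a host once per attempt until it answers or the attempts run out.
# WAIT_TRIES and WAIT_INTERVAL are assumed tunables (defaults: 60 tries, 5 s apart).
wait_for_host() {
    host=$1
    tries=${WAIT_TRIES:-60}
    i=0
    while [ "$i" -lt "$tries" ]; do
        if ping -c 1 "$host" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "${WAIT_INTERVAL:-5}"
    done
    echo "no ping response from $host" >&2
    return 1
}

# Start a VM, then block until its hostname answers ping.
# The exit status of `vm start` is deliberately not relied on.
start_and_wait() {
    vm_name=$1
    host=${2:-$1}   # hostname defaults to the VM name
    vm start "$vm_name"
    wait_for_host "$host"
}
```

With these defined, `start_and_wait thin-01 && start_and_wait thin-02` would start the second VM only after the first one answers, and `start_and_wait rshiny0 rshiny-0` handles the differing hostname.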

Here is what the output of vm list looks like:

root@geonosis:~ # vm list
NAME          DATASTORE  LOADER  CPU  MEMORY  VNC             AUTO  STATE
comp0         default    uefi    32   32G     -               No    Stopped
dl-01         default    uefi    2    8G      -               No    Stopped
dl-02         default    uefi    2    8G      -               No    Stopped
dna0          default    uefi    8    32G     -               No    Stopped
mitte-dev-01  default    uefi    2    4G      -               No    Stopped
rshiny0       default    uefi    4    16G     127.0.0.1:5900  No    Running (90735)
rshiny1       default    uefi    4    16G     127.0.0.1:5901  No    Running (6542)
ssh-01        default    uefi    2    4G      -               No    Running (65113)
ssh-02        default    uefi    4    8G      -               No    Running (29385)
thick-01      default    uefi    64   768G    -               No    Locked (geonosis.local.abi.am)
thin-01       default    uefi    64   384G    -               No    Running (93403)
thin-02       default    uefi    64   384G    -               No    Running (69252)
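
To spot VMs that still need attention, the listing above could be filtered on its STATE column. This is a minimal sketch: `not_running` is a hypothetical helper name, and it assumes the eight-column layout shown above, where the state is the eighth whitespace-separated field.

```shell
#!/bin/sh
# Hypothetical helper: reads `vm list` output on stdin and prints the
# names of VMs whose STATE is anything other than Running
# (for example Stopped or Locked). The header line is skipped.
not_running() {
    awk 'NR > 1 && $8 != "Running" { print $1 }'
}
```

Usage would be `vm list | not_running`; for the listing above this would print comp0, dl-01, dl-02, dna0, mitte-dev-01 and thick-01.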

Finally, you will need to update the node states in SLURM.

ssh scontrol-01 -l ops
# become root
su -

Attach to the tmux session that is running SLURM.

The following command lists the nodes that are down, drained or failing, together with the reason:

sinfo -R

Once a node has booted, you can set it back to the idle state:

scontrol update State=idle NodeName=thin-01

Note that this command is idempotent, and it only needs to be run once between reboots (unless the state gets corrupted somehow).
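
When several nodes have been rebooted, the same command can be repeated per node. The loop below is a sketch only: `idle_nodes` is a hypothetical helper name, and it simply issues the command shown above once for each node name passed to it.

```shell
#!/bin/sh
# Hypothetical helper: set every node given as an argument back to idle,
# using the same scontrol command as above, one node at a time.
idle_nodes() {
    for node in "$@"; do
        scontrol update State=idle NodeName="$node"
    done
}
```

For example, `idle_nodes thin-01 thin-02` after both thin nodes answer ping. Since the command is idempotent, running it for a node that is already idle should be harmless.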

Finally, run the fancy command to see what is happening:

clear; (sinfo -R; sinfo -N -o "%15N %9P %8a %5c %10m %20C %10e" ; squeue -o "%.6i %.10P %.10j %.15u %.10t %.10M %.10D %.20R %.3C %.10m") | column -t
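
For readability, the one-liner above can be unpacked into a commented function. This is a sketch: `slurm_overview` is a hypothetical name, the `clear` is left out, and the comments reflect the meaning of the format fields as described in the sinfo and squeue manual pages.

```shell
#!/bin/sh
# Hypothetical wrapper around the one-liner above (without the `clear`).
slurm_overview() {
    {
        # Nodes that are down, drained or failing, with the reason.
        sinfo -R
        # One line per node: name, partition, availability, CPU count,
        # memory, CPUs as allocated/idle/other/total, free memory.
        sinfo -N -o "%15N %9P %8a %5c %10m %20C %10e"
        # One line per job: job ID, partition, job name, user, state,
        # time used, node count, reason or node list, CPUs, minimum memory.
        squeue -o "%.6i %.10P %.10j %.15u %.10t %.10M %.10D %.20R %.3C %.10m"
    } | column -t
}
```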

To detach from a tmux session that is running inside another tmux session, send the prefix twice before the detach key; in this case: ctrl-b, ctrl-b, d.

Never, ever shut down the tmux session on scontrol-01: SLURM is running in the foreground inside it. Detach instead; after that you may exit.