To boot the HPC VMs, the system administrator must run the following commands on the VM host, starting one VM at a time:
vm start thin-01   # start the VM thin-01
ping thin-01       # make sure you get replies before starting another VM; stop ping with ctrl-c
vm start thin-02
ping thin-02
# optionally, if you are not scared:
vm start thick-01
ping thick-01
Once all the HPC VMs have booted, you may boot the login nodes:
vm start ssh-01
ping ssh-01
vm start ssh-02
ping ssh-02
There may also be VMs belonging to other users that you need to start:
vm start rshiny0
ping rshiny-0
vm start rshiny1
ping rshiny-1
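The start-then-ping routine above can also be scripted so that each VM has to answer ping before the next one is started. This is a minimal POSIX sh sketch, not an existing tool: the function names boot_vm, wait_for_ping and boot_all and the MAX_TRIES and PING_INTERVAL settings are all hypothetical. The optional second argument to boot_vm is the name to ping, for cases where it differs from the VM name (as with rshiny0 and rshiny-0 above).

```sh
#!/bin/sh
# Sketch only: boot vm-bhyve guests one at a time and wait for each to answer
# ping before moving on. All function and variable names are hypothetical.

MAX_TRIES=${MAX_TRIES:-60}         # give up after this many failed pings
PING_INTERVAL=${PING_INTERVAL:-5}  # seconds to sleep between failed pings

# wait_for_ping HOST: succeed as soon as HOST answers a single ping
wait_for_ping() {
    _host=$1
    _tries=0
    while [ "$_tries" -lt "$MAX_TRIES" ]; do
        if ping -c 1 "$_host" >/dev/null 2>&1; then
            echo "$_host is up"
            return 0
        fi
        _tries=$((_tries + 1))
        sleep "$PING_INTERVAL"
    done
    echo "$_host did not answer after $MAX_TRIES pings" >&2
    return 1
}

# boot_vm VM [HOST]: start VM, then wait until HOST (defaults to VM) answers
boot_vm() {
    vm start "$1" || return 1
    wait_for_ping "${2:-$1}"
}

# boot_all: HPC VMs first, then the login nodes; stops at the first failure.
# thick-01 is left out because it is optional.
boot_all() {
    boot_vm thin-01 || return 1
    boot_vm thin-02 || return 1
    boot_vm ssh-01 || return 1
    boot_vm ssh-02 || return 1
}
```

For example, save it as boot-vms.sh (any name works), then as root on the VM host run ". ./boot-vms.sh; boot_all", followed by "boot_vm rshiny0 rshiny-0" and "boot_vm rshiny1 rshiny-1" if those VMs are needed.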
Here is what the output of vm list looks like:
root@geonosis:~ # vm list
NAME          DATASTORE  LOADER  CPU  MEMORY  VNC             AUTO  STATE
comp0         default    uefi    32   32G     -               No    Stopped
dl-01         default    uefi    2    8G      -               No    Stopped
dl-02         default    uefi    2    8G      -               No    Stopped
dna0          default    uefi    8    32G     -               No    Stopped
mitte-dev-01  default    uefi    2    4G      -               No    Stopped
rshiny0       default    uefi    4    16G     127.0.0.1:5900  No    Running (90735)
rshiny1       default    uefi    4    16G     127.0.0.1:5901  No    Running (6542)
ssh-01        default    uefi    2    4G      -               No    Running (65113)
ssh-02        default    uefi    4    8G      -               No    Running (29385)
thick-01      default    uefi    64   768G    -               No    Locked (geonosis.local.abi.am)
thin-01       default    uefi    64   384G    -               No    Running (93403)
thin-02       default    uefi    64   384G    -               No    Running (69252)
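To spot VMs that have not come up, you can filter the vm list output for rows whose STATE is not Running. This is a small sketch with a hypothetical helper name, not_running; it looks for the literal text " Running (" rather than a column number, because STATE values such as "Running (93403)" contain a space.

```sh
#!/bin/sh
# Sketch: print the names of VMs in `vm list` output that are not running.
# not_running is a hypothetical helper name, not a vm-bhyve command.
not_running() {
    # NR > 1 skips the header row; running VMs show "Running (...)" in STATE
    awk 'NR > 1 && index($0, " Running (") == 0 { print $1 }'
}
```

Usage: vm list | not_running. Against the listing above it would print comp0, dl-01, dl-02, dna0, mitte-dev-01 and thick-01 (which shows Locked rather than Running).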
Finally, you need to bring the nodes back into service in SLURM:
ssh scontrol-01 -l ops
# become root
su -
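Attach to the tmux session that is running SLURM:
tmux attach   # attaches to the most recently used session; use tmux ls and tmux attach -t <name> if there are several
The following command shows why nodes are down, drained, or failing: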
sinfo -R
Once the nodes have booted, mark them as "idle" so that SLURM can schedule jobs on them again:
scontrol update State=idle NodeName=thin-01
This command is idempotent, so running it more than once does no harm; normally you only need to run it once per node after each reboot (unless the node state somehow gets corrupted).
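If several nodes came back at once, you can idle them in one go. SLURM's NodeName also accepts node range expressions, so scontrol update NodeName='thin-[01-02]' State=idle should cover both thin nodes (quote the range so the shell does not try to glob it). Alternatively, a small loop like the following sketch idles each named node and stops at the first failure; idle_nodes is a hypothetical helper name.

```sh
#!/bin/sh
# Sketch: mark each node named on the command line as idle in SLURM.
# idle_nodes is a hypothetical helper, not a SLURM command.
idle_nodes() {
    for _node in "$@"; do
        echo "idling $_node"
        scontrol update NodeName="$_node" State=idle || return 1
    done
}
```

Usage, as root on scontrol-01: idle_nodes thin-01 thin-02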
Then run this fancy command to see what is happening on the cluster:
clear; (sinfo -R; sinfo -N -o "%15N %9P %8a %5c %10m %20C %10e" ; squeue -o "%.6i %.10P %.10j %.15u %.10t %.10M %.10D %.20R %.3C %.10m") | column -t
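The one-liner above packs three commands into a single aligned table. The following sketch wraps it in a hypothetical cluster_status function and annotates what each format field asks for, based on the sinfo and squeue format options (check man sinfo and man squeue for your SLURM version). The commands and format strings are unchanged; the { ...; } group replaces the subshell and pipes the same output into column -t.

```sh
#!/bin/sh
# Sketch: the "fancy command" above, wrapped in a hypothetical function and
# annotated. Numbers in the format strings are column widths; a leading "."
# right-justifies the field.
cluster_status() {
    clear
    {
        # reasons nodes are down, drained, failed or failing
        sinfo -R
        # one row per node (-N): name, partition, partition availability,
        # CPUs per node, memory (MB), CPUs allocated/idle/other/total,
        # free memory
        sinfo -N -o "%15N %9P %8a %5c %10m %20C %10e"
        # queue: job id, partition, job name, user, compact state, time used,
        # node count, pending reason or node list, CPUs, requested memory
        squeue -o "%.6i %.10P %.10j %.15u %.10t %.10M %.10D %.20R %.3C %.10m"
    } | column -t
}
```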
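To detach from the remote tmux session while you are inside a local tmux session, press the prefix twice and then d: ctrl-b, ctrl-b, d (your local tmux passes the second ctrl-b through to the remote one). If you are not running tmux locally, ctrl-b, d is enough.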
Never, ever shut down the tmux session on scontrol-01: SLURM is running in the foreground inside it. Once you have detached, you may exit.