This page describes ABI's computing infrastructure at a level suitable for researchers. For detailed system administration documentation, see Infrastructure.
A High-Performance Computing (HPC) cluster is a collection of interconnected computers (called nodes) that work together to run computationally intensive tasks. Instead of running everything on your laptop, you submit jobs to the cluster, which distributes them across available resources.
Key concepts:
| Term | Meaning |
|---|---|
| Node | A single server/computer in the cluster |
| Login node | The server you SSH into. Used for file management and job submission – not for heavy computation |
| Compute node | Servers dedicated to running jobs. Jobs are dispatched here by Slurm |
| Partition | A group of nodes with shared properties (e.g., memory size, GPU availability). Also called a “queue” |
| Job | A task you submit to run on a compute node |
| Slurm | The job scheduler that manages the queue and assigns resources |
The cluster's main components:

| Component | Details |
|---|---|
| Login node(s) | ssh.abi.am (resolves to VMs ssh-01 and ssh-02) |
| Compute nodes | thin-01 (64C/384G), thin-02 (64C/384G), thick-01 (64C/768G) |
| Download nodes | dl-01 (2C/8G), dl-02 (2C/8G) |
| Total compute vCPUs | 192 |
| Total compute RAM | 1536G |
| Scheduler | Slurm (controller runs on a separate VM) |
| Virtualization | All nodes are bhyve VMs running on a FreeBSD physical host |
Partitions define groups of compute resources. When you submit a job, you can specify which partition to use.
| Partition | Nodes | CPUs | Total Memory | Default? | Purpose |
|---|---|---|---|---|---|
| compute | thin-01, thin-02, thick-01 | 64 per node | 384G-768G | Yes | General-purpose computation (default partition) |
| thin | thin-01, thin-02 | 64 per node | ~384G each | No | Jobs that fit in standard memory |
| thick | thick-01 | 64 | ~768G | No | Memory-intensive jobs (e.g., large genome assembly, pilon) |
| download | dl-01, dl-02 | 2 per node | ~8G each | No | Data download tasks only (not for computation) |
Notes:
- The `compute` partition is the default: if you do not specify `--partition`, your job goes there.
- Request `thick` explicitly when you need more than ~384G of RAM (e.g., `--partition=thick --mem=512G`).
- Use `download` only for downloading data (e.g., SRA downloads); these nodes have minimal CPU and memory.
- A node can belong to several partitions (e.g., thick-01 is in both `compute` and `thick`).

To see current partition and node status:
```shell
sinfo
```

For a detailed view including memory and CPU allocation:

```shell
sinfo -N -o "%.10N %.10P %.5a %.4c %.20m %.20F %.10e"
```
Current cluster state (for reference):
| NODELIST | PARTITION | CPUS | MEMORY | PURPOSE |
|---|---|---|---|---|
| dl-01 | download | 2 | ~8G | Data downloads only |
| dl-02 | download | 2 | ~8G | Data downloads only |
| thick-01 | compute/thick | 64 | ~768G | High-memory computation |
| thin-01 | compute/thin | 64 | ~384G | General computation |
| thin-02 | compute/thin | 64 | ~384G | General computation |
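As a sketch of how partition selection looks in practice, the following writes a hypothetical high-memory job script. The partition and memory flags follow the table above; the job name, CPU count, and the suggested pilon step are placeholders, not site conventions:

```shell
# Sketch of a high-memory submission; the job name, CPU count, and the
# commented-out work step are placeholders, not ABI policy.
cat > pilon_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=pilon
#SBATCH --partition=thick
#SBATCH --mem=512G
#SBATCH --cpus-per-task=16

# the real work (e.g., a pilon polishing run) would go here
echo "running on $(hostname)"
EOF

echo "submit with: sbatch pilon_job.sh"
```

Because `thick-01` is the only node in `thick`, such jobs may wait in the queue until it is free.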
ABI has several storage areas. Understanding them is important for organizing your work and avoiding issues.
Storage is served from two ZFS-based NAS servers over NFS. ZFS provides transparent compression, so you do not need to manually compress old files – the filesystem handles it automatically. Home directories and selected projects are backed up to a separate server using ZFS send/recv.
| Path | Purpose | Served from | Quota | Notes |
|---|---|---|---|---|
| `/mnt/home/<user>` | Home directory – configs, scripts | mustafar (nas1) | ~12G per user | Keep this small; use project/user dirs for data |
| `/mnt/nas0/user/<user>` | Personal user workspace | geonosis (nas0) | ~100G per user | For personal datasets, experiments, conda envs |
| `/mnt/nas0/proj/<project>` | Project data (some projects) | geonosis (nas0) | Per-project | *TODO: clarify which projects are on nas0 vs nas1* |
| `/mnt/nas1/proj/<project>` | Project data (most projects) | mustafar (nas1) | Per-project (typically 14-25 TB) | Shared with all project members |
| `/mnt/nas1/db/` | Shared databases and reference genomes | mustafar (nas1) | ~32 TB total | Read-only for users. See Databases |
Example current usage:
```
/mnt/home/<user>         ~12G quota   (personal configs, scripts)
/mnt/nas0/user/<user>    ~100G quota  (personal workspace)
/mnt/nas1/proj/armwgs    ~25 TB       (Armenian WGS project)
/mnt/nas1/proj/cfdna     ~14 TB       (cfDNA project)
/mnt/nas1/db/            ~32 TB       (reference genomes, indexes)
```
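To check your own usage against these quotas, plain `du` and `df` work over NFS. A sketch, assuming the mount paths from the table above (they exist only on ABI machines, so errors are silenced):

```shell
# Check usage against the quotas above; the /mnt paths come from the
# storage table and exist only on ABI machines, hence the 2>/dev/null.
du -sh "/mnt/nas0/user/$USER" 2>/dev/null   # workspace vs ~100G quota
df -h /mnt/nas1/proj 2>/dev/null            # free space on the project share
du -sh "$HOME"                              # home directory vs ~12G quota
```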
Use `/mnt/nas0/user/<user>` for personal data or `/mnt/nas1/proj/<project>` for project data.

The path a job takes through the system:

```
You (laptop) --SSH--> Login Node --sbatch--> Slurm Scheduler --> Compute Node(s)
```

Jobs are submitted with `sbatch`. Important rules:

- Do not run heavy computation on the login node; submit it as a job.
- For interactive work, use `srun` or `salloc` (see Interactive Sessions).

| Command | Purpose |
|---|---|
| `sbatch script.sh` | Submit a batch job |
| `squeue` | View all jobs in the queue (see recommended format below) |
| `squeue --me` | View only your jobs |
| `scancel <jobid>` | Cancel a job |
| `sinfo` | View partition and node status |
| `sacct -j <jobid>` | View job accounting info after completion |
| `srun --pty bash` | Start an interactive session |
The default squeue output is hard to read. We recommend this format:
```shell
squeue -o "%.6i %.10P %.10j %.15u %.10t %.10M %.10D %.20R %.3C %.10m"
```
Example output:
```
 JOBID  PARTITION        NAME     USER  ST     TIME  NODES  NODELIST(REASON)  CPU  MIN_MEMORY
  2313    compute    computel   anahit   R  1:53:18      1  thin-01            20         35G
  2293    compute   kneaddata    nelli   R 11:12:15      1  thin-01            20         30G
  2299    compute   glasso_j1   davith   R 11:12:15      1  thin-01             8         60G
  2282    compute  run_som.sh   melina   R 11:12:16      1  thin-01             8         50G
  2309    compute  plot_cover    mherk  PD     0:00      1  (Resources)         1          0
  2121      thick       pilon     nate  PD     0:00      1  (Nodes requi..      4        512G
```
You can add this as an alias in your ~/.bashrc for convenience:
```shell
alias sq='squeue -o "%.6i %.10P %.10j %.15u %.10t %.10M %.10D %.20R %.3C %.10m"'
```
For a full guide, see Using Slurm.
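Putting the commands together, a typical submit-and-monitor loop might look like this. The script name, job name, and resource values are placeholders; only the overall sequence follows the command table above:

```shell
# Typical batch workflow; script name, job name, and resources are
# placeholders, not site defaults.
cat > my_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=demo
#SBATCH --cpus-per-task=1
#SBATCH --mem=1G
echo "hello from $(hostname)"
EOF

# On the cluster you would then run (shown commented out here):
# sbatch my_job.sh        # prints "Submitted batch job <jobid>"
# squeue --me             # watch the job in the queue
# sacct -j <jobid>        # accounting details after it finishes
echo "job script ready: my_job.sh"
```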
All commonly used bioinformatics tools are installed globally on the cluster. There is no module system – tools are available directly by name:
```shell
# Check if a tool is available
which bwa
bwa --version
which samtools
samtools --version
```
See Software for a list of available tools.
If you need software that is not installed globally, you can install it locally using Conda.
Important: When using Conda, do not let it add itself to your `~/.bashrc`. This slows down every login and can cause issues on login nodes. Instead, activate Conda manually when you need it. See the Conda Guide for details.
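Manual activation can look like the following sketch; the Miniconda prefix `~/miniconda3` and the environment name `myenv` are assumptions, so adjust them to your own install:

```shell
# Activate Conda by hand only when needed, instead of from ~/.bashrc.
# The prefix ~/miniconda3 and env name 'myenv' are placeholders.
CONDA_SH="$HOME/miniconda3/etc/profile.d/conda.sh"
if [ -f "$CONDA_SH" ]; then
    . "$CONDA_SH"
    conda activate myenv
else
    echo "no conda install found at $CONDA_SH (adjust the prefix)"
fi
```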
You can SSH to `ssh.abi.am` from anywhere on the internet.