Slurm (Simple Linux Utility for Resource Management) is the job scheduler on ABI's cluster. All compute jobs must be submitted through Slurm; do not run heavy computation on the login nodes (ssh-01, ssh-02).
| Partition | Nodes | CPUs | Memory | Purpose |
|---|---|---|---|---|
| compute (default) | thin-01, thin-02, thick-01 | 64/node | 384G-768G | General purpose; use this for most jobs |
| thin | thin-01, thin-02 | 64/node | ~384G each | Explicit thin-node targeting |
| thick | thick-01 | 64 | ~768G | Memory-intensive jobs (e.g., pilon, large assemblies) |
| download | dl-01, dl-02 | 2/node | ~8G each | Data downloads only (not for computation) |
- The `compute` partition is the default. If you omit `--partition`, your job goes here.
- Pass `--partition=thick` explicitly when you need >384G of RAM.
- Use `--partition=download` only for data download tasks.

| Command | Purpose | Example |
|---|---|---|
| sbatch | Submit a batch job | sbatch my_job.sh |
| squeue | View the job queue | squeue --me |
| scancel | Cancel a job | scancel 12345 |
| sinfo | View partitions & node status | sinfo |
| sacct | View completed job info | sacct -j 12345 |
| srun | Run an interactive command | srun --pty bash |
| salloc | Allocate resources interactively | salloc --mem=4G |
A batch job is a shell script with special #SBATCH directives that tell Slurm what resources you need.
Create a file my_job.sh:
```bash
#!/bin/bash
#SBATCH --mem=10gb              # Memory required
#SBATCH --cpus-per-task=4       # Number of CPU cores
#SBATCH --output=slurm-%j.log   # Log file (%j = job ID)

echo "Job started at $(date)"
echo "Running on node: $(hostname)"
echo "Using $SLURM_CPUS_PER_TASK CPUs"

# Your commands here
your_command --threads $SLURM_CPUS_PER_TASK input.fastq -o output/

echo "Job finished at $(date)"
```
Submit it:
```bash
sbatch my_job.sh
```
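Because `#SBATCH` lines are ordinary comments to bash, you can dry-run a script's logic locally before submitting. Slurm-provided variables such as `$SLURM_CPUS_PER_TASK` are unset outside a job, so give them a fallback while testing. A minimal sketch (`test_job.sh` is a hypothetical example name):

```shell
# Write a tiny job script; the #SBATCH lines are plain comments to bash
cat > test_job.sh <<'EOF'
#!/bin/bash
#SBATCH --mem=1gb
#SBATCH --cpus-per-task=2
# Fall back to 2 threads when not running under Slurm
THREADS=${SLURM_CPUS_PER_TASK:-2}
echo "Would run with $THREADS threads"
EOF

# Dry-run locally: bash ignores the #SBATCH comments entirely
bash test_job.sh
```

Under Slurm, `$SLURM_CPUS_PER_TASK` is set and the fallback is never used, so the same script works in both contexts.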
| Directive | Purpose | Example |
|---|---|---|
| --mem | Total memory for the job | --mem=10gb |
| --cpus-per-task | Number of CPU cores | --cpus-per-task=4 |
| --output | Standard output log file | --output=slurm-%j.log |
| --error | Standard error log file | --error=slurm-%j.err |
| --job-name | Name shown in squeue | --job-name=alignment |
| --time | Maximum wall time | --time=24:00:00 |
| --partition | Which partition to use | --partition=thick |
| --mail-type | Email notifications | --mail-type=BEGIN,END,FAIL |
| --mail-user | Email address | --mail-user=you@abi.am |
| --array | Submit a job array | --array=1-10 |
```bash
#!/bin/bash
#SBATCH --job-name=align_sample01
#SBATCH --mem=40gb
#SBATCH --cpus-per-task=8
#SBATCH --time=48:00:00
#SBATCH --output=log/align_sample01_%j.log
#SBATCH --error=log/align_sample01_%j.err
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=your_email@abi.am

# Print job info for debugging
echo "Job ID: $SLURM_JOB_ID"
echo "Node: $(hostname)"
echo "Start: $(date)"
echo "Directory: $(pwd)"

# Use the SLURM variable for thread count (keeps it consistent)
THREADS=$SLURM_CPUS_PER_TASK

# Create output directory
mkdir -p bam/

# Run alignment
bwa mem -t $THREADS \
    /mnt/nas1/db/genomes/homo_sapiens/GRCh38.p14/bwa_mem_0.7.17-r1188/GCF_000001405.40_GRCh38.p14_genomic.fna \
    fastq/sample01_1.fq.gz \
    fastq/sample01_2.fq.gz \
    | samtools sort -@ $THREADS -o bam/sample01.sorted.bam -

samtools index bam/sample01.sorted.bam

echo "Finished: $(date)"
```
```bash
# View all jobs
squeue

# View only your jobs
squeue --me

# Detailed formatting (recommended -- add this as an alias)
squeue -o "%.6i %.10P %.10j %.15u %.10t %.10M %.10D %.20R %.3C %.10m"
```
Job state codes:
| Code | Meaning |
|---|---|
| PD | Pending (waiting for resources) |
| R | Running |
| CG | Completing |
| CD | Completed |
| F | Failed |
| CA | Cancelled |
| TO | Timed out |
```bash
# Basic accounting
sacct -j <jobid>

# Detailed resource usage
sacct -j <jobid> --format=JobID,JobName,Partition,State,Elapsed,MaxRSS,MaxVMSize,NCPUS
```
```bash
# Cancel a specific job
scancel <jobid>

# Cancel all your jobs
scancel -u $USER

# Cancel all your pending jobs
scancel -u $USER --state=PENDING
```
Sometimes you need to work interactively on a compute node (e.g., for testing, debugging, or running tools that require interaction).
```bash
# Start an interactive bash session on a compute node
srun --pty --mem=4gb --cpus-per-task=2 bash

# With a specific partition and time limit
srun --pty --mem=8gb --cpus-per-task=4 --time=2:00:00 --partition=thin bash
```
Once the session starts, you will be on a compute node and can run commands directly. Type exit to end the session.
Important: Interactive sessions consume resources just like batch jobs. End them when you are done.
Job arrays let you submit many similar jobs with a single command. This is useful for processing multiple samples with the same script.
```bash
#!/bin/bash
#SBATCH --job-name=qc_array
#SBATCH --mem=4gb
#SBATCH --cpus-per-task=2
#SBATCH --output=log/qc_%A_%a.log
#SBATCH --array=1-10

# %A = array master job ID, %a = array task ID

# Read the sample name from a file (one sample per line)
SAMPLE=$(sed -n "${SLURM_ARRAY_TASK_ID}p" samples.txt)

echo "Processing sample: $SAMPLE (task $SLURM_ARRAY_TASK_ID)"
fastqc -o fastqc/ fastq/${SAMPLE}_1.fq.gz fastq/${SAMPLE}_2.fq.gz
```
Where samples.txt contains:
```
sample01
sample02
sample03
...
sample10
```
Submit:
```bash
sbatch qc_array.sh
```
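If your read files follow a consistent naming scheme, you can generate samples.txt instead of writing it by hand. A sketch, assuming reads are named `<sample>_1.fq.gz` as in the script above (the `mkdir`/`touch` lines only create demo files for illustration):

```shell
# Demo input files (your real fastq/ directory would already exist)
mkdir -p fastq
touch fastq/sample01_1.fq.gz fastq/sample01_2.fq.gz
touch fastq/sample02_1.fq.gz fastq/sample02_2.fq.gz

# Derive sample names from the read-1 files (assumes <sample>_1.fq.gz naming)
ls fastq/*_1.fq.gz | sed 's|.*/||; s/_1\.fq\.gz$//' > samples.txt
cat samples.txt
```

If your files use a different suffix (e.g., `_R1.fastq.gz`), adjust the second `sed` expression accordingly.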
Limit the number of simultaneous tasks with %N:
```bash
#SBATCH --array=1-100%10    # Run 100 tasks, but only 10 at a time
```
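The `sed -n "${SLURM_ARRAY_TASK_ID}p"` line in the array script simply prints line N of samples.txt. You can check the task-to-sample mapping locally, without Slurm, by setting the variable yourself:

```shell
# Simulate the array-task-to-sample mapping outside Slurm
printf 'sample01\nsample02\nsample03\n' > samples.txt

SLURM_ARRAY_TASK_ID=2
SAMPLE=$(sed -n "${SLURM_ARRAY_TASK_ID}p" samples.txt)
echo "$SAMPLE"    # sample02
```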
Requesting the right amount of resources is important: request too little and your job fails (out of memory or timed out); request too much and it waits longer in the queue while idle resources are blocked for other users.
Guidelines for common bioinformatics tasks:
| Task | Typical Memory |
|---|---|
| FastQC | 2-4 GB |
| fastp trimming | 4-8 GB |
| BWA mem alignment | 10-40 GB (depends on genome size) |
| GATK HaplotypeCaller | 8-16 GB |
| samtools sort | 4-10 GB |
| *TODO: add more based on your workloads* | |
If you are unsure, start with a moderate amount and check the actual usage after the job completes:
sacct -j <jobid> --format=JobID,MaxRSS,Elapsed
MaxRSS shows the peak memory usage. Adjust your next job accordingly.
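sacct typically reports MaxRSS in kilobytes with a `K` suffix. A small awk sketch (assuming that `K` suffix; the value shown is made up for illustration) converts it to gigabytes so you can pick a sensible `--mem` for the next run:

```shell
# Convert a MaxRSS value such as "35123456K" to gigabytes (assumes K suffix)
maxrss="35123456K"
echo "$maxrss" | awk '{ sub(/K$/, ""); printf "%.1f GB\n", $1 / 1024 / 1024 }'
# prints "33.5 GB"
```

Add a safety margin (roughly 20-30%) on top of the converted value rather than requesting the peak exactly.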
- Most tools accept a `--threads` or `-t` parameter. Set `--cpus-per-task` to match, and use the `$SLURM_CPUS_PER_TASK` variable in your script instead of hardcoding thread counts, so the two stay consistent.
- If you omit `--time`, Slurm uses the partition's default limit. Check limits with `sinfo -l`.
- Create the `log/` directory before submitting jobs that write to `log/`.
- Give jobs a descriptive `--job-name` so you can identify them in `squeue`.
- For multiple samples, either write one script per sample (`align_sample01.sh`, `align_sample02.sh`) or use job arrays.
Add this to your ~/.bashrc for nicer job listing:
```bash
alias sq='squeue -o "%.6i %.10P %.10j %.15u %.10t %.10M %.10D %.20R %.3C %.10m"'
```
Then just type sq to see formatted output:
```
 JOBID  PARTITION        NAME    USER  ST      TIME  NODES  NODELIST(REASON)  CPU  MIN_MEMORY
  2313    compute    computel  anahit   R   1:53:18      1           thin-01   20         35G
  2293    compute   kneaddata   nelli   R  11:12:15      1           thin-01   20         30G
  2299    compute   glasso_j1  davith   R  11:12:15      1           thin-01    8         60G
  2282    compute  run_som.sh  melina   R  11:12:16      1           thin-01    8         50G
  2309    compute  plot_cover   mherk  PD      0:00      1       (Resources)    1           0
  2121      thick       pilon    nate  PD      0:00      1    (Nodes requi..    4        512G
```
| Problem | Likely Cause | Solution |
|---|---|---|
| Job stays in `PD` state | Not enough free resources | Wait, or reduce resource request |
| Job immediately fails | Script error or bad path | Check the log file for error messages |
| `slurmstepd: error: Exceeded job memory limit` | Requested too little memory | Increase `--mem` |
| `CANCELLED AT … DUE TO TIME LIMIT` | Job took longer than `--time` | Increase the time limit |
| `error: Batch job submission failed: Invalid partition` | Wrong partition name | Valid partitions: compute, thin, thick, download. Check with `sinfo` |