# Bioinformatics Pipelines
This section documents end-to-end bioinformatics workflows used at ABI. Each pipeline page describes a complete workflow from raw data to results, including the tools, scripts, and parameters used.
For individual reusable scripts, see the Scripts section.
## Common Pipelines
The typical NGS analysis workflow follows these steps:
```
Raw FASTQ --> QC --> Trimming --> Alignment --> Post-processing --> Variant Calling / Analysis
```
Each step is documented in detail:
### Pipeline Steps
| Step | Description | Tools | Guide |
|---|---|---|---|
| 1. Download data | Download FASTQ files from sequencing facility or public databases | wget, sra-tools | Download FASTQ |
| 2. Quality control | Assess read quality before and after trimming | FastQC, MultiQC, fastp | Quality Control |
| 3. Adapter & quality trimming | Remove adapters and low-quality bases | fastp, cutadapt | Trimming |
| 4. Alignment | Map reads to a reference genome | BWA mem, Bowtie2, STAR | Alignment |
| 5. Post-alignment processing | Sort, index, mark duplicates | samtools, Picard | *TODO: create page* |
| 6. Variant calling | Call SNPs and indels | GATK HaplotypeCaller, bcftools | *TODO: create page* |
| 7. Variant filtering & annotation | Filter and annotate variants | GATK, SnpEff, VEP | *TODO: create page* |
| 8. Downstream analysis | Statistical analysis, visualization | R, Python | *TODO: project-specific* |
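As a sketch of step 1, data can be pulled from public repositories with sra-tools (the accession below is a placeholder; paired-end layout and tool availability are assumptions about your environment):

```shell
#!/usr/bin/env bash
# Sketch: fetch reads for a public accession with sra-tools, then run a quick QC pass.
# SRR0000000 is a placeholder accession -- replace with your run accession.
set -euo pipefail

ACC=SRR0000000
OUTDIR=fastq

mkdir -p "$OUTDIR" qc
prefetch "$ACC"                           # download the .sra archive
fasterq-dump "$ACC" -O "$OUTDIR"          # convert to FASTQ (ACC_1/ACC_2 for paired-end)
fastqc "$OUTDIR/${ACC}"_*.fastq -o qc     # pre-trimming QC (assumes paired-end naming)
```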
## Workflow by Data Type
Different types of sequencing data require different pipelines:
### Whole Genome Sequencing (WGS) / Whole Exome Sequencing (WES)
```
FASTQ --> FastQC --> fastp --> BWA mem --> samtools sort --> Mark Duplicates --> GATK HaplotypeCaller --> Filter --> Annotate
```
Relevant guides:
- *TODO: Add variant calling and annotation guides*
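A minimal sketch of this flow is shown below. Sample name, reference path, thread counts, and directory layout are all placeholders, and the reference is assumed to be pre-indexed:

```shell
#!/usr/bin/env bash
# Sketch of the WGS/WES flow above -- placeholder paths, not a validated pipeline.
set -euo pipefail

REF=ref/ref.fa        # assumed indexed with `bwa index` and `samtools faidx`
S=sample01            # placeholder sample name
mkdir -p trim bam vcf

# Adapter/quality trimming
fastp -i fastq/${S}_R1.fastq.gz -I fastq/${S}_R2.fastq.gz \
      -o trim/${S}_R1.fastq.gz -O trim/${S}_R2.fastq.gz

# Align and coordinate-sort in one stream
bwa mem -t 8 "$REF" trim/${S}_R1.fastq.gz trim/${S}_R2.fastq.gz \
  | samtools sort -@ 4 -o bam/${S}.sorted.bam -
samtools index bam/${S}.sorted.bam

# Mark PCR/optical duplicates
gatk MarkDuplicates -I bam/${S}.sorted.bam -O bam/${S}.dedup.bam \
     -M bam/${S}.dup_metrics.txt
samtools index bam/${S}.dedup.bam

# Call variants (GVCF mode for later joint genotyping)
gatk HaplotypeCaller -R "$REF" -I bam/${S}.dedup.bam \
     -O vcf/${S}.g.vcf.gz -ERC GVCF
```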
### RNA-seq
```
FASTQ --> FastQC --> fastp --> STAR --> featureCounts/HTSeq --> DESeq2/edgeR
```
*TODO: Create RNA-seq specific pipeline pages when needed.*
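Until a dedicated page exists, the alignment and counting steps might look like this sketch (`star_index/` and `genes.gtf` are placeholder paths; `--countReadPairs` assumes featureCounts from subread ≥ 2.0.2):

```shell
#!/usr/bin/env bash
# Sketch of the RNA-seq flow above -- placeholder paths and sample name.
set -euo pipefail

S=sample01
mkdir -p trim bam counts

fastp -i fastq/${S}_R1.fastq.gz -I fastq/${S}_R2.fastq.gz \
      -o trim/${S}_R1.fastq.gz -O trim/${S}_R2.fastq.gz

# Splice-aware alignment with STAR (index built beforehand from the reference + GTF)
STAR --runThreadN 8 --genomeDir star_index \
     --readFilesIn trim/${S}_R1.fastq.gz trim/${S}_R2.fastq.gz \
     --readFilesCommand zcat \
     --outSAMtype BAM SortedByCoordinate \
     --outFileNamePrefix bam/${S}.

# Gene-level counts for DESeq2/edgeR
featureCounts -p --countReadPairs -T 8 -a genes.gtf \
  -o counts/${S}.counts.txt bam/${S}.Aligned.sortedByCoord.out.bam
```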
### Metagenomics / Microbiome
```
FASTQ --> FastQC --> fastp --> Kraken2/MetaPhlAn --> Diversity analysis
```
*TODO: Create metagenomics pipeline pages when needed.*
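A hedged sketch of the taxonomic classification step with Kraken2 (`kraken2_db/` is a placeholder database path; downstream diversity analysis is project-specific):

```shell
#!/usr/bin/env bash
# Sketch of the metagenomics flow above -- placeholder paths and sample name.
set -euo pipefail

S=sample01
mkdir -p trim kraken

fastp -i fastq/${S}_R1.fastq.gz -I fastq/${S}_R2.fastq.gz \
      -o trim/${S}_R1.fastq.gz -O trim/${S}_R2.fastq.gz

# Taxonomic classification against a pre-built Kraken2 database
kraken2 --db kraken2_db --threads 8 --paired \
        --report kraken/${S}.report --output kraken/${S}.out \
        trim/${S}_R1.fastq.gz trim/${S}_R2.fastq.gz
```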
## Script Organization
ABI uses a parent/daughter script pattern for Slurm jobs:
- **Daughter script** – a reusable function or tool wrapper (e.g., `src/fastqc.sh`, `src/align.sh`) that takes parameters such as input/output directories.
- **Parent script** – a Slurm job script that sets parameters and calls the daughter script. Contains the `#SBATCH` directives.
Example:
```
project/
  src/
    fastqc.sh       # Daughter: runs FastQC
    align.sh        # Daughter: runs BWA mem
  fastqc_00.sh      # Parent: Slurm job calling src/fastqc.sh
  align_00.sh       # Parent: Slurm job calling src/align.sh
  log/              # Job output logs
  fastq/            # Input FASTQ files
  bam/              # Output BAM files
```
This approach allows you to:
- Reuse daughter scripts across projects
- Keep Slurm parameters separate from tool logic
- Track each run via its parent script and log file
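A minimal parent/daughter pair might look like the following sketch (resource requests, module loading, and paths will differ per cluster and project):

```shell
# --- src/fastqc.sh (daughter: reusable FastQC wrapper) ---
#!/usr/bin/env bash
set -euo pipefail
INDIR=$1    # directory of input FASTQ files
OUTDIR=$2   # directory for FastQC reports
mkdir -p "$OUTDIR"
fastqc "$INDIR"/*.fastq.gz -o "$OUTDIR"

# --- fastqc_00.sh (parent: Slurm job that sets parameters) ---
#!/usr/bin/env bash
#SBATCH --job-name=fastqc_00
#SBATCH --cpus-per-task=2
#SBATCH --mem=4G
#SBATCH --output=log/%x_%j.out   # requires log/ to exist before submission
bash src/fastqc.sh fastq qc
```

Submitted with `sbatch fastqc_00.sh`, the parent records exactly which parameters were used for this run while the daughter stays reusable across projects.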
See Running Jobs on Slurm for more on this pattern.
## Tips
- Create a `log/` directory before submitting jobs.
- Use one parent script per run – name them descriptively (e.g., `align_sample01.sh`, `align_sample02.sh`) or use job arrays.
- Document your parameters – add comments in parent scripts noting why you chose specific settings.
- Check QC at every step – run FastQC/MultiQC after trimming and after alignment.
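For the last tip, MultiQC can aggregate FastQC, fastp, and aligner reports into a single HTML page (a sketch; `qc/` is assumed to hold the per-tool reports):

```shell
# Scan qc/ for FastQC/fastp/aligner logs and write one combined report
multiqc qc/ -o qc/multiqc
```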
## See Also
- Scripts – Individual reusable scripts
- Using Slurm – Job submission and management
- Databases & Reference Data – Reference genomes and indexes
- Projects – Project-specific documentation
