This section documents end-to-end bioinformatics workflows used at ABI. Each pipeline page describes a complete workflow from raw data to results, including the tools, scripts, and parameters used.
For individual reusable scripts, see the Scripts section.
The typical NGS analysis workflow follows these steps:
Raw FASTQ --> QC --> Trimming --> Alignment --> Post-processing --> Variant Calling / Analysis
Each step is documented in detail:
| Step | Description | Tools | Guide |
|---|---|---|---|
| 1. Download data | Download FASTQ files from sequencing facility or public databases | wget, sra-tools | Download FASTQ |
| 2. Quality control | Assess read quality before and after trimming | FastQC, MultiQC, fastp | Quality Control |
| 3. Adapter & quality trimming | Remove adapters and low-quality bases | fastp, cutadapt | Trimming |
| 4. Alignment | Map reads to a reference genome | BWA mem, Bowtie2, STAR | Alignment |
| 5. Post-alignment processing | Sort, index, mark duplicates | samtools, Picard | *TODO: create page* |
| 6. Variant calling | Call SNPs and indels | GATK HaplotypeCaller, bcftools | *TODO: create page* |
| 7. Variant filtering & annotation | Filter and annotate variants | GATK, SnpEff, VEP | *TODO: create page* |
| 8. Downstream analysis | Statistical analysis, visualization | R, Python | *TODO: project-specific* |
Different types of sequencing data require different pipelines:
**WGS/WES variant calling:**

FASTQ --> FastQC --> fastp --> BWA mem --> samtools sort --> Mark Duplicates --> GATK HaplotypeCaller --> Filter --> Annotate

Relevant guides: Download FASTQ, Quality Control, Trimming, Alignment.
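The variant-calling chain above can be sketched as a single daughter-style script. All paths, the sample name, thread counts, and directory layout are hypothetical placeholders; adjust them to your project and confirm tool versions on your cluster.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sketch of the WGS/WES chain above. Paths (fastq/, trimmed/, bam/, vcf/)
# and the sample name are placeholders -- adapt to your project layout.
call_variants() {
    local sample=$1 ref=$2
    # Adapter/quality trimming (fastp also writes QC reports)
    fastp -i fastq/${sample}_R1.fastq.gz -I fastq/${sample}_R2.fastq.gz \
          -o trimmed/${sample}_R1.fastq.gz -O trimmed/${sample}_R2.fastq.gz
    # Alignment and coordinate sort
    bwa mem -t 8 "$ref" trimmed/${sample}_R1.fastq.gz trimmed/${sample}_R2.fastq.gz \
        | samtools sort -@ 4 -o bam/${sample}.sorted.bam -
    samtools index bam/${sample}.sorted.bam
    # Mark duplicates, then call SNPs and indels
    gatk MarkDuplicates -I bam/${sample}.sorted.bam \
        -O bam/${sample}.dedup.bam -M bam/${sample}.dup_metrics.txt
    samtools index bam/${sample}.dedup.bam
    gatk HaplotypeCaller -R "$ref" -I bam/${sample}.dedup.bam \
        -O vcf/${sample}.vcf.gz
}

# Example invocation (uncomment once the directories and reference exist):
# call_variants sample01 ref/genome.fa
```

Filtering and annotation (GATK, SnpEff, VEP) would follow as separate steps; those pages are still TODO above.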
**RNA-seq:**

FASTQ --> FastQC --> fastp --> STAR --> featureCounts/HTSeq --> DESeq2/edgeR
*TODO: Create RNA-seq specific pipeline pages when needed.*
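Until those pages exist, a minimal sketch of the RNA-seq chain might look like this. It assumes a pre-built STAR index and GTF annotation at hypothetical paths; note that the `--countReadPairs` flag requires featureCounts (subread) >= 2.0.2.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sketch of the RNA-seq chain above; all paths are placeholders.
quantify_sample() {
    local sample=$1
    fastp -i fastq/${sample}_R1.fastq.gz -I fastq/${sample}_R2.fastq.gz \
          -o trimmed/${sample}_R1.fastq.gz -O trimmed/${sample}_R2.fastq.gz
    # Splice-aware alignment against a pre-built STAR index
    STAR --runThreadN 8 --genomeDir ref/star_index \
         --readFilesIn trimmed/${sample}_R1.fastq.gz trimmed/${sample}_R2.fastq.gz \
         --readFilesCommand zcat \
         --outSAMtype BAM SortedByCoordinate \
         --outFileNamePrefix bam/${sample}_
    # Gene-level counts for DESeq2/edgeR
    featureCounts -T 8 -p --countReadPairs -a ref/annotation.gtf \
        -o counts/${sample}.counts.txt \
        bam/${sample}_Aligned.sortedByCoord.out.bam
}

# quantify_sample sample01
```

Differential expression with DESeq2/edgeR is then run in R on the combined count matrix.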
**Metagenomics:**

FASTQ --> FastQC --> fastp --> Kraken2/MetaPhlAn --> Diversity analysis
*TODO: Create metagenomics pipeline pages when needed.*
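Until those pages exist, the metagenomics chain might be sketched as follows; the Kraken2 database path and output directories are hypothetical placeholders.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sketch of the metagenomics chain above; paths are placeholders.
profile_sample() {
    local sample=$1
    fastp -i fastq/${sample}_R1.fastq.gz -I fastq/${sample}_R2.fastq.gz \
          -o trimmed/${sample}_R1.fastq.gz -O trimmed/${sample}_R2.fastq.gz
    # Taxonomic classification against a pre-built Kraken2 database
    kraken2 --db ref/kraken2_db --threads 8 --paired \
        --report kraken/${sample}.report \
        --output kraken/${sample}.kraken \
        trimmed/${sample}_R1.fastq.gz trimmed/${sample}_R2.fastq.gz
    # Diversity analysis is then typically done in R or Python
    # from the per-sample report files.
}

# profile_sample sample01
```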
ABI uses a parent/daughter script pattern for Slurm jobs:
- Daughter scripts (e.g., `src/fastqc.sh`, `src/align.sh`): contain the actual analysis commands and take parameters like input/output directories.
- Parent scripts (e.g., `align_00.sh`): contain the `#SBATCH` directives and call the corresponding daughter script.

Example:
```
project/
├── src/
│   ├── fastqc.sh      # Daughter: runs FastQC
│   └── align.sh       # Daughter: runs BWA mem
├── fastqc_00.sh       # Parent: Slurm job calling src/fastqc.sh
├── align_00.sh        # Parent: Slurm job calling src/align.sh
├── log/               # Job output logs
├── fastq/             # Input FASTQ files
└── bam/               # Output BAM files
```
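Under this layout, a parent script might look like the following sketch. The resource requests and the arguments passed to the daughter are placeholders; only the pattern (directives in the parent, analysis in the daughter) is the point.

```shell
#!/bin/bash
#SBATCH --job-name=align
#SBATCH --output=log/align_%j.out
#SBATCH --error=log/align_%j.err
#SBATCH --cpus-per-task=8
#SBATCH --mem=16G
#SBATCH --time=12:00:00

# align_00.sh (parent): holds the Slurm directives and nothing else;
# the analysis itself lives in the daughter script.
bash src/align.sh fastq/ bam/ ref/genome.fa
```

Submit with `sbatch align_00.sh`; changing resource requests only ever touches the parent.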
This approach allows you to:

- keep Slurm directives and resource requests separate from the analysis logic;
- reuse the same daughter script across samples and projects by writing new parent scripts;
- test daughter scripts interactively before submitting them as jobs.
See Running Jobs on Slurm for more on this pattern.
Create the `log/` directory before submitting jobs; Slurm will not create it for you. To run the same step on multiple samples, either create one parent script per sample (e.g., `align_sample01.sh`, `align_sample02.sh`) or use job arrays.
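For many samples, a job-array parent avoids per-sample copies. This sketch assumes a hypothetical `samples.txt` with one sample name per line; array size and resources are placeholders.

```shell
#!/bin/bash
#SBATCH --job-name=align
#SBATCH --output=log/align_%A_%a.out
#SBATCH --array=1-10
#SBATCH --cpus-per-task=8

# One array task per sample: pick the Nth line of samples.txt,
# where N is this task's index within the array.
SAMPLE=$(sed -n "${SLURM_ARRAY_TASK_ID}p" samples.txt)
bash src/align.sh "$SAMPLE"
```

`%A`/`%a` in the output path expand to the array job ID and task ID, so each task gets its own log file.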