Bioinformatics Pipelines

This section documents end-to-end bioinformatics workflows used at ABI. Each pipeline page describes a complete workflow from raw data to results, including the tools, scripts, and parameters used.

For individual reusable scripts, see the Scripts section.

Common Pipelines

The typical NGS analysis workflow follows these steps:

Raw FASTQ  -->  QC  -->  Trimming  -->  Alignment  -->  Post-processing  -->  Variant Calling / Analysis

Each step is documented in detail:

Pipeline Steps

| Step | Description | Tools | Guide |
|------|-------------|-------|-------|
| 1. Download data | Download FASTQ files from sequencing facility or public databases | wget, sra-tools | Download FASTQ |
| 2. Quality control | Assess read quality before and after trimming | FastQC, MultiQC, fastp | Quality Control |
| 3. Adapter & quality trimming | Remove adapters and low-quality bases | fastp, cutadapt | Trimming |
| 4. Alignment | Map reads to a reference genome | BWA mem, Bowtie2, STAR | Alignment |
| 5. Post-alignment processing | Sort, index, mark duplicates | samtools, Picard | *TODO: create page* |
| 6. Variant calling | Call SNPs and indels | GATK HaplotypeCaller, bcftools | *TODO: create page* |
| 7. Variant filtering & annotation | Filter and annotate variants | GATK, SnpEff, VEP | *TODO: create page* |
| 8. Downstream analysis | Statistical analysis, visualization | R, Python | *TODO: project-specific* |
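
Before starting a pipeline, it can help to confirm that the tools listed above are available on PATH. The following is a minimal sketch, not an ABI-provided script: the function name `missing_tools` is hypothetical, and the executable names in the commented example (for instance `fasterq-dump` for sra-tools) are assumptions about how the tools are installed on your system.

```shell
#!/usr/bin/env bash
# Print the name of every tool that is NOT found on PATH, one per line.
# Returns non-zero if at least one tool is missing.
missing_tools() {
    local rc=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return "$rc"
}

# Example call for steps 1-6 (executable names are assumptions):
# missing_tools wget fasterq-dump fastqc multiqc fastp bwa samtools gatk bcftools
```

If the function prints nothing and returns 0, every listed executable was found.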

Workflow by Data Type

Different types of sequencing data require different pipelines:

Whole Genome Sequencing (WGS) / Whole Exome Sequencing (WES)

FASTQ --> FastQC --> fastp --> BWA mem --> samtools sort --> Mark Duplicates --> GATK HaplotypeCaller --> Filter --> Annotate
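
The chain above can be sketched as a function that prints the commands for one paired-end sample, so they can be reviewed before anything runs. This is an illustration under stated assumptions, not the pipeline ABI actually uses: the function name `wgs_commands`, the `<sample>_R1.fastq.gz` naming, the `qc/`, `trimmed/` and `vcf/` directories, and the `PL:ILLUMINA` read-group value are all hypothetical. Duplicate marking is shown through `gatk MarkDuplicates` (the Picard tool bundled with GATK4). Filtering and annotation are left out because thresholds and annotation databases are project-specific.

```shell
#!/usr/bin/env bash
# Print (do not execute) the WGS/WES commands for one paired-end sample.
# Assumptions: reads are fastq/<sample>_R1.fastq.gz and _R2.fastq.gz, the
# reference is already indexed (bwa index, samtools faidx, sequence
# dictionary), and the qc/, trimmed/, bam/ and vcf/ directories exist.
wgs_commands() {
    local sample=$1 ref=$2 threads=${3:-4}
    local r1=fastq/${sample}_R1.fastq.gz r2=fastq/${sample}_R2.fastq.gz
    local t1=trimmed/${sample}_R1.fastq.gz t2=trimmed/${sample}_R2.fastq.gz
    # Unquoted heredoc: variables expand, the \t in the read group stays literal.
    cat <<EOF
fastqc -t $threads -o qc $r1 $r2
fastp -i $r1 -I $r2 -o $t1 -O $t2 --thread $threads --html qc/$sample.fastp.html --json qc/$sample.fastp.json
bwa mem -t $threads -R '@RG\tID:$sample\tSM:$sample\tPL:ILLUMINA' $ref $t1 $t2 | samtools sort -@ $threads -o bam/$sample.sorted.bam -
gatk MarkDuplicates -I bam/$sample.sorted.bam -O bam/$sample.dedup.bam -M bam/$sample.dup_metrics.txt
samtools index bam/$sample.dedup.bam
gatk HaplotypeCaller -R $ref -I bam/$sample.dedup.bam -O vcf/$sample.vcf.gz
EOF
}
```

One possible use is `wgs_commands sample01 ref/genome.fa 8 > wgs_sample01.sh`, followed by a review of the generated file before running it.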

Relevant guides: Download FASTQ, Quality Control, Trimming, Alignment.

RNA-seq

FASTQ --> FastQC --> fastp --> STAR --> featureCounts/HTSeq --> DESeq2/edgeR
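
As a rough sketch of the alignment and counting steps, the function below prints STAR and featureCounts commands for one paired-end sample, starting from fastp-trimmed reads. The function name `rnaseq_commands`, the file naming, and the `trimmed/`, `bam/` and `counts/` directories are hypothetical. It assumes a STAR genome index has already been built and that featureCounts comes from Subread 2.0.2 or newer, where `--countReadPairs` is needed alongside `-p` to count fragments. Strandedness (`-s`) depends on the library prep and is left at its default here. The DESeq2/edgeR step runs in R and is not shown.

```shell
#!/usr/bin/env bash
# Print (do not execute) the STAR and featureCounts commands for one sample.
# Assumptions: trimmed reads are trimmed/<sample>_R1.fastq.gz and _R2.fastq.gz,
# and the bam/ and counts/ directories exist.
rnaseq_commands() {
    local sample=$1 star_index=$2 gtf=$3 threads=${4:-4}
    local t1=trimmed/${sample}_R1.fastq.gz t2=trimmed/${sample}_R2.fastq.gz
    # STAR appends "Aligned.sortedByCoord.out.bam" to the output prefix.
    local bam=bam/${sample}_Aligned.sortedByCoord.out.bam
    cat <<EOF
STAR --runThreadN $threads --genomeDir $star_index --readFilesIn $t1 $t2 --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --outFileNamePrefix bam/${sample}_
featureCounts -T $threads -p --countReadPairs -a $gtf -o counts/$sample.counts.txt $bam
EOF
}
```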

*TODO: Create RNA-seq-specific pipeline pages when needed.*

Metagenomics / Microbiome

FASTQ --> FastQC --> fastp --> Kraken2/MetaPhlAn --> Diversity analysis
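
For the classification step, one way to handle many samples is to derive sample names from the trimmed read files and print one Kraken2 command per sample. The sketch below covers Kraken2 only, not MetaPhlAn. The function name `kraken2_commands`, the `_R1.fastq.gz`/`_R2.fastq.gz` naming, and the `kraken/` output directory are hypothetical, and the database path is whatever Kraken2 database you have available.

```shell
#!/usr/bin/env bash
# Print (do not execute) one kraken2 command per paired-end sample found
# in a directory of trimmed reads.
# Assumptions: files are named <sample>_R1.fastq.gz / <sample>_R2.fastq.gz
# and the kraken/ output directory exists.
kraken2_commands() {
    local db=$1 indir=${2:-trimmed} threads=${3:-4} r1 r2 sample
    for r1 in "$indir"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue                 # glob matched nothing
        r2=${r1%_R1.fastq.gz}_R2.fastq.gz        # mate file next to R1
        sample=$(basename "$r1" _R1.fastq.gz)    # strip directory and suffix
        echo "kraken2 --db $db --threads $threads --paired --gzip-compressed --report kraken/$sample.report.txt --output kraken/$sample.kraken.txt $r1 $r2"
    done
}
```

The per-sample report files are a common starting point for downstream abundance and diversity analysis.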

*TODO: Create metagenomics pipeline pages when needed.*


Script Organization

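ABI uses a parent/daughter script pattern for Slurm jobs: a parent script in the project root is the Slurm job, and it calls a daughter script in src/ that runs the actual tool.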

Example:

project/
  src/
    fastqc.sh          # Daughter: runs FastQC
    align.sh            # Daughter: runs BWA mem
  fastqc_00.sh          # Parent: Slurm job calling src/fastqc.sh
  align_00.sh           # Parent: Slurm job calling src/align.sh
  log/                  # Job output logs
  fastq/                # Input FASTQ files
  bam/                  # Output BAM files
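
To make the layout concrete, the sketch below creates the directories and one parent/daughter pair for FastQC. It is an illustration only: the function name `scaffold_project`, the `#SBATCH` values, the `qc/` output directory, the `*.fastq.gz` extension, and the way the parent passes the thread count to the daughter are all assumptions, not ABI's actual scripts.

```shell
#!/usr/bin/env bash
# Create the project layout with one parent/daughter pair for FastQC.
# All #SBATCH values and the qc/ directory are placeholders.
scaffold_project() {
    local root=$1
    mkdir -p "$root/src" "$root/log" "$root/fastq" "$root/bam" "$root/qc"

    # Daughter: runs the tool; takes the thread count as its first argument.
    cat > "$root/src/fastqc.sh" <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
threads=${1:-1}
fastqc -t "$threads" -o qc fastq/*.fastq.gz
EOF

    # Parent: holds the Slurm settings and calls the daughter.
    cat > "$root/fastqc_00.sh" <<'EOF'
#!/usr/bin/env bash
#SBATCH --job-name=fastqc_00
#SBATCH --output=log/%x_%j.out
#SBATCH --cpus-per-task=4
#SBATCH --time=01:00:00
bash src/fastqc.sh "${SLURM_CPUS_PER_TASK:-1}"
EOF

    chmod +x "$root/src/fastqc.sh" "$root/fastqc_00.sh"
}
```

After `scaffold_project project`, the job would be submitted from inside `project/` with `sbatch fastqc_00.sh`, so that the relative `log/`, `src/` and `fastq/` paths resolve against the submission directory.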

This approach allows you to:

See Running Jobs on Slurm for more on this pattern.


Tips


See Also