====== Bioinformatics Pipelines ======

This section documents end-to-end bioinformatics workflows used at ABI. Each pipeline page describes a complete workflow from raw data to results, including the tools, scripts, and parameters used.

For individual reusable scripts, see the [[scripts:start|Scripts]] section.

===== Common Pipelines =====

The typical NGS analysis workflow follows these steps:

<code>
Raw FASTQ --> QC --> Trimming --> Alignment --> Post-processing --> Variant Calling / Analysis
</code>

Each step is documented in detail:

===== Pipeline Steps =====

^ Step ^ Description ^ Tools ^ Guide ^
| 1. Download data | Download FASTQ files from sequencing facility or public databases | wget, sra-tools | [[scripts:download_fastq|Download FASTQ]] |
| 2. Quality control | Assess read quality before and after trimming | FastQC, MultiQC, fastp | [[scripts:qc|Quality Control]] |
| 3. Adapter & quality trimming | Remove adapters and low-quality bases | fastp, cutadapt | [[scripts:adapter_and_quality_trimming|Trimming]] |
| 4. Alignment | Map reads to a reference genome | BWA mem, Bowtie2, STAR | [[scripts:alignment|Alignment]] |
| 5. Post-alignment processing | Sort, index, mark duplicates | samtools, Picard | *TODO: create page* |
| 6. Variant calling | Call SNPs and indels | GATK HaplotypeCaller, bcftools | *TODO: create page* |
| 7. Variant filtering & annotation | Filter and annotate variants | GATK, SnpEff, VEP | *TODO: create page* |
| 8. Downstream analysis | Statistical analysis, visualization | R, Python | *TODO: project-specific* |

----

===== Workflow by Data Type =====

Different types of sequencing data require different pipelines:

==== Whole Genome Sequencing (WGS) / Whole Exome Sequencing (WES) ====

<code>
FASTQ --> FastQC --> fastp --> BWA mem --> samtools sort --> Mark Duplicates --> GATK HaplotypeCaller --> Filter --> Annotate
</code>

Relevant guides:

  * [[scripts:qc|QC]] --> [[scripts:adapter_and_quality_trimming|Trimming]] --> [[scripts:alignment|Alignment (BWA)]]
  * *TODO: Add variant calling and annotation guides*

==== RNA-seq ====

<code>
FASTQ --> FastQC --> fastp --> STAR --> featureCounts/HTSeq --> DESeq2/edgeR
</code>

*TODO: Create RNA-seq specific pipeline pages when needed.*

==== Metagenomics / Microbiome ====

<code>
FASTQ --> FastQC --> fastp --> Kraken2/MetaPhlAn --> Diversity analysis
</code>

*TODO: Create metagenomics pipeline pages when needed.*

----

===== Script Organization =====

ABI uses a **parent/daughter script pattern** for Slurm jobs:

  * **Daughter script** -- A reusable function or tool wrapper (e.g., ''src/fastqc.sh'', ''src/align.sh''). Takes parameters like input/output directories.
  * **Parent script** -- A Slurm job script that sets parameters and calls the daughter script. Contains ''#SBATCH'' directives.

Example:

<code>
project/
  src/
    fastqc.sh        # Daughter: runs FastQC
    align.sh         # Daughter: runs BWA mem
  fastqc_00.sh       # Parent: Slurm job calling src/fastqc.sh
  align_00.sh        # Parent: Slurm job calling src/align.sh
  log/               # Job output logs
  fastq/             # Input FASTQ files
  bam/               # Output BAM files
</code>

This approach allows you to:

  * Reuse daughter scripts across projects
  * Keep Slurm parameters separate from tool logic
  * Track each run via its parent script and log file

See [[scripts:run_job_on_slurm|Running Jobs on Slurm]] for more on this pattern.

----

===== Tips =====

  * **Create a ''log/'' directory** before submitting jobs.
  * **Use one parent script per run** -- name them descriptively (e.g., ''align_sample01.sh'', ''align_sample02.sh'') or use [[software:slurm#job_arrays|job arrays]].
  * **Document your parameters** -- add comments in parent scripts noting why you chose specific settings.
  * **Check QC at every step** -- run FastQC/MultiQC after trimming and after alignment.

----

===== See Also =====

  * [[scripts:start|Scripts]] -- Individual reusable scripts
  * [[software:slurm|Using Slurm]] -- Job submission and management
  * [[databases:start|Databases & Reference Data]] -- Reference genomes and indexes
  * [[projects:start|Projects]] -- Project-specific documentation
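
===== Example: Parent/Daughter Sketch =====

The parent/daughter pattern described under Script Organization can be sketched in a single file. This is a minimal illustration, not ABI's actual scripts: the function name ''run_fastqc'', the variables ''IN_DIR'', ''OUT_DIR'', ''THREADS'' and ''DRY_RUN'', and all ''#SBATCH'' values are assumptions made for the example. The daughter logic is inlined as a function so the sketch is self-contained; in the layout shown earlier it would live in ''src/fastqc.sh'' and be called from the parent.

```shell
#!/bin/bash
# Parent part: Slurm settings and run parameters (all values are hypothetical).
# Slurm does not create the log/ directory; make it before submitting.
#SBATCH --job-name=fastqc_00
#SBATCH --output=log/fastqc_00_%j.out
#SBATCH --cpus-per-task=2

IN_DIR="fastq"            # input FASTQ files
OUT_DIR="fastqc_out"      # hypothetical output directory
THREADS=2
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the command only, 0 = run it

# Daughter part: in the project layout this would live in src/fastqc.sh.
run_fastqc() {
    local in_dir="$1"
    local out_dir="$2"
    local threads="$3"
    local n=0
    local fq
    # Build the command in the positional parameters so paths with spaces survive.
    set -- fastqc --outdir "$out_dir" --threads "$threads"
    for fq in "$in_dir"/*.fastq.gz; do
        [ -e "$fq" ] || continue   # the glob matched nothing
        set -- "$@" "$fq"
        n=$((n + 1))
    done
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"                  # show what would be run
    elif [ "$n" -eq 0 ]; then
        echo "no *.fastq.gz files in $in_dir" >&2
        return 1
    else
        mkdir -p "$out_dir"
        "$@"
    fi
}

run_fastqc "$IN_DIR" "$OUT_DIR" "$THREADS"
```

With the default ''DRY_RUN=1'' the script only prints the FastQC command it would run, which is a cheap way to check paths and parameters. To run it for real, set ''DRY_RUN'' to 0 in the parent part and submit the file with ''sbatch''.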