==== Quality Control with FastQC, Fastp and MultiQC ====
The following scripts run quality checks with FastQC and summarize the results with MultiQC. Fastp is covered in [[scripts:adapter_and_quality_trimming|How to perform quality control and adapter trimming of your reads with cutadapt and fastp]]. Fastp performs quality trimming and also generates reports, which can be used as input to MultiQC.
=== FastQC ===
FastQC takes FASTQ files as input and generates a quality report for each file. The daughter and parent scripts are provided below.
== Daughter script ==
fastqc.sh
# SCRIPT FOR PERFORMING FASTQC
# NOTE: Run this script from the directory where the "log" directory is located,
# Example: /mnt/proj/ibd/ds-06_cd-fecal/common
#
# PURPOSE:
# This script performs FastQC.
#
# PARAMETERS:
# 1: input_dir - Directory where FASTQ files are located.
# 2: output_dir - Directory where FastQC reports will be located
# SAMPLE USAGE:
# In a parent script: src/fastqc.sh <input_dir> <output_dir>
#
# IMPORTANT:
# - Run from a parent script.
# Check that the correct number of arguments is provided
if [ "$#" -ne 2 ]; then
echo "Usage: $0 <input_dir> <output_dir>"
exit 1
fi
# Input parameters
input_dir="$1"
output_dir="$2"
echo "Running FastQC on reads..."
mkdir -p "$output_dir"
fastqc -o "$output_dir" "$input_dir/"*.fq.gz
== Parent script ==
fastqc_00.sh
#!/bin/bash
#SBATCH --mem=10gb
#SBATCH --cpus-per-task=10
#SBATCH --job-name=fastqc_00
#SBATCH --output=log/fastqc_00%j.log # %j will be replaced with the job ID
#parameters
input=fq_renamed
output=fastqc
src/fastqc.sh $input $output
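The parent script above reserves 10 CPUs, but the daughter script calls fastqc without a thread option, so the extra CPUs would likely stay idle. Below is a minimal sketch of how the Slurm allocation could be passed to FastQC's -t (threads) option, which sets how many files are processed at the same time. The helper names qc_threads and run_fastqc are hypothetical and not part of the scripts above. The sketch assumes that Slurm sets SLURM_CPUS_PER_TASK inside the job, and it falls back to 1 when the variable is missing.

```shell
#!/bin/bash
# Sketch only: qc_threads and run_fastqc are hypothetical helper names.

# Print the number of CPUs Slurm allocated to this task, or 1 outside a job.
qc_threads() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Run FastQC on every *.fq.gz file in input_dir, several files in parallel.
run_fastqc() {
    local input_dir="$1" output_dir="$2"
    mkdir -p "$output_dir"
    # -t sets how many files FastQC processes at the same time.
    fastqc -t "$(qc_threads)" -o "$output_dir" "$input_dir"/*.fq.gz
}
```

Inside a job you would then call, for example, run_fastqc fq_renamed fastqc. FastQC allocates memory per thread, so the --mem request may need to grow with the thread count.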
=== MultiQC ===
MultiQC combines multiple FastQC reports into a single report so that you can view them side by side. MultiQC also accepts Fastp reports as input and shows them in a separate section.
== Daughter script ==
multiqc.sh
# SCRIPT FOR PERFORMING MultiQC
# NOTE: Run this script from the directory where the "log" directory is located,
# Example: /mnt/proj/ibd/ds-06_cd-fecal/common
#
# PURPOSE:
# This script performs MultiQC.
#
# PARAMETERS:
# 1: input_dir - Directory where FastQC reports are located.
# 2: output_dir - Directory where MultiQC report will be located
# SAMPLE USAGE:
# In a parent script: src/multiqc.sh <input_dir> <output_dir>
#
# IMPORTANT:
# - Run from a parent script.
# Check that the correct number of arguments is provided
if [ "$#" -ne 2 ]; then
echo "Usage: $0 <input_dir> <output_dir>"
exit 1
fi
# Input parameters
input_dir="$1"
output_dir="$2"
echo "Running MultiQC..."
mkdir -p "$output_dir"
multiqc -o "$output_dir" "$input_dir"
== Parent script ==
multiqc_00.sh
#!/bin/bash
#SBATCH --mem=10gb
#SBATCH --cpus-per-task=10
#SBATCH --job-name=multiqc_00
#SBATCH --output=log/multiqc_00%j.log # %j will be replaced with the job ID
#parameters
input=fastqc
output=multiqc
src/multiqc.sh $input $output
You can also run both steps, one after the other, from a single parent script:
== Combined parent script ==
qc_00.sh
#!/bin/bash
#SBATCH --mem=10gb
#SBATCH --cpus-per-task=10
#SBATCH --job-name=fastqc_multiqc_00
#SBATCH --output=log/fastqc_multiqc_00%j.log # %j will be replaced with the job ID
#parameters
input_fqc=fq
output_fqc=fastqc
input_mqc=fastqc
output_mqc=multiqc
src/fastqc.sh $input_fqc $output_fqc
src/multiqc.sh $input_mqc $output_mqc
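In the combined script above, src/multiqc.sh still runs if src/fastqc.sh fails, because a plain bash script continues after a failed command. Below is a minimal sketch of one way to stop at the first failing step. The helpers run_step and run_qc are hypothetical names introduced for illustration only.

```shell
#!/bin/bash
# Sketch only: run_step and run_qc are hypothetical helper names.

# Run one pipeline step and report if it fails.
run_step() {
    local name="$1"
    shift
    echo "Starting $name..."
    if ! "$@"; then
        echo "$name failed; skipping the remaining steps." >&2
        return 1
    fi
}

# Run FastQC, then MultiQC only if FastQC succeeded.
# Arguments: FASTQ directory, FastQC directory, MultiQC directory.
run_qc() {
    run_step "FastQC" src/fastqc.sh "$1" "$2" &&
    run_step "MultiQC" src/multiqc.sh "$2" "$3"
}
```

In the parent script, the two src/ calls could then be replaced by run_qc fq fastqc multiqc.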
You can also use the same scripts on trimmed files. Just add "_trimmed" to the input and output names, as shown below, so that the results are saved in separate folders (fastqc_trimmed, multiqc_trimmed).
fastqc_post_00.sh
#!/bin/bash
#SBATCH --mem=10gb
#SBATCH --cpus-per-task=10
#SBATCH --job-name=fastqc_post_00
#SBATCH --output=log/fastqc_post_00%j.log # %j will be replaced with the job ID
#parameters
input=fq_trimmed
output=fastqc_trimmed
src/fastqc.sh $input $output
multiqc_post_00.sh
#!/bin/bash
#SBATCH --mem=10gb
#SBATCH --cpus-per-task=10
#SBATCH --job-name=multiqc_post_00
#SBATCH --output=log/multiqc_post_00%j.log # %j will be replaced with the job ID
#parameters
input=fastqc_trimmed
output=multiqc_trimmed
src/multiqc.sh $input $output
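The "_trimmed" naming rule above can also be captured in a small helper, so that one parent script serves both raw and trimmed reads. This is only a sketch: qc_dirs is a hypothetical function, and it assumes the raw reads are in a directory named fq, as in the combined parent script.

```shell
#!/bin/bash
# Sketch only: qc_dirs is a hypothetical helper name.

# Print the FASTQ, FastQC and MultiQC directory names, one per line,
# for a given suffix ("" for raw reads, "_trimmed" for trimmed reads).
qc_dirs() {
    local suffix="${1:-}"
    printf '%s\n' "fq${suffix}" "fastqc${suffix}" "multiqc${suffix}"
}

# Show the directory layout for trimmed reads.
qc_dirs "_trimmed"
```

Called without an argument, the helper prints the directory names used for the raw reads.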
=== Fastp ===
To include Fastp reports in the MultiQC report, pass the "fq_trimmed" directory to MultiQC as an additional input (multiqc_fastp.sh).
== Daughter script ==
multiqc_fastp.sh
# SCRIPT FOR PERFORMING MultiQC
# NOTE: Run this script from the directory where the "log" directory is located,
# Example: /mnt/proj/ibd/ds-06_cd-fecal/common
#
# PURPOSE:
# This script performs MultiQC on FastQC and Fastp reports.
#
# PARAMETERS:
# 1: input_dir - Directory where FastQC reports are located.
# 2: input_fastp - Directory where Fastp reports are located.
# 3: output_dir - Directory where MultiQC report will be located
# SAMPLE USAGE:
# In a parent script: src/multiqc_fastp.sh <input_dir> <input_fastp> <output_dir>
#
# IMPORTANT:
# - Run from a parent script.
# Check that the correct number of arguments is provided
if [ "$#" -ne 3 ]; then
echo "Usage: $0 <input_dir> <input_fastp> <output_dir>"
exit 1
fi
# Input parameters
input_dir="$1"
input_fastp="$2"
output_dir="$3"
echo "Running MultiQC..."
mkdir -p "$output_dir"
multiqc -o "$output_dir" "$input_dir" "$input_fastp"
== Parent script ==
multiqc_fastp_00.sh
#!/bin/bash
#SBATCH --mem=10gb
#SBATCH --cpus-per-task=10
#SBATCH --job-name=multiqc_fastp_00
#SBATCH --output=log/multiqc_fastp_00%j.log # %j will be replaced with the job ID
#parameters
input=fastqc_trimmed
input_fastp=fq_trimmed
output=multiqc_trimmed
src/multiqc_fastp.sh $input $input_fastp $output
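MultiQC picks up Fastp results from the JSON reports that Fastp writes, not from the trimmed FASTQ files themselves. As far as we know, its default search pattern expects file names ending in "fastp.json"; treat this as an assumption and check the MultiQC documentation for your version. Below is a minimal sketch that warns when no such report is found before running MultiQC. The helpers count_fastp_reports and run_multiqc_with_fastp are hypothetical names.

```shell
#!/bin/bash
# Sketch only: both function names are hypothetical.

# Count fastp JSON reports below a directory.
# Assumption: the reports have names ending in "fastp.json".
count_fastp_reports() {
    find "$1" -type f -name '*fastp.json' | wc -l | tr -d ' '
}

# Run MultiQC on FastQC and fastp results; warn if no fastp report exists.
run_multiqc_with_fastp() {
    local input_dir="$1" input_fastp="$2" output_dir="$3"
    if [ "$(count_fastp_reports "$input_fastp")" -eq 0 ]; then
        echo "Warning: no *fastp.json files found in $input_fastp" >&2
    fi
    mkdir -p "$output_dir"
    multiqc -o "$output_dir" "$input_dir" "$input_fastp"
}
```

If the warning appears, check how the JSON report was named when Fastp was run (its -j option sets the JSON file name).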