The following scripts perform quality checks with FastQC and MultiQC. Fastp is covered in "How to perform quality control and adapter trimming of your reads with cutadapt and fastp". Fastp performs quality trimming and also generates reports, which can be used as input to MultiQC.
FastQC takes FASTQ files as input and generates a report for each file. The daughter and parent scripts are provided below.
fastqc.sh
# SCRIPT FOR PERFORMING FASTQC
# NOTE: Run this script from the directory where the "log" directory is located,
# Example: /mnt/proj/ibd/ds-06_cd-fecal/common
#
# PURPOSE:
# This script performs FastQC.
#
# PARAMETERS:
# 1: input_dir - Directory where FASTQ files are located.
# 2: output_dir - Directory where FastQC reports will be located
# SAMPLE USAGE:
# In a parent script: src/fastqc.sh <input_dir> <output_dir>
#
# IMPORTANT:
# - Run from a parent script.
# Check if correct number of arguments are provided
if [ "$#" -ne 2 ]; then
    echo "Usage: $0 <input_dir> <output_dir>"
    exit 1
fi
# Input parameters
input_dir="$1"
output_dir="$2"
echo "Running FastQC on reads..."
mkdir -p "$output_dir"
fastqc -o "$output_dir" "$input_dir/"*.fq.gz
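If no file in the input directory matches `*.fq.gz`, bash by default passes the unexpanded pattern to `fastqc`, which then fails on a file that does not exist. The following is a minimal sketch of a guard that could run before the FastQC call. The helper name `count_fastq` and the demo directory are hypothetical and not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper, not part of the original fastqc.sh:
# print how many *.fq.gz files a directory contains.
count_fastq() {
    local dir="$1" n=0 f
    for f in "$dir"/*.fq.gz; do
        [ -e "$f" ] || continue   # skips the literal pattern when nothing matches
        n=$((n + 1))
    done
    echo "$n"
}

# Demo on a throwaway directory standing in for input_dir.
demo_dir=$(mktemp -d)
touch "$demo_dir/sample1_R1.fq.gz" "$demo_dir/sample1_R2.fq.gz"

if [ "$(count_fastq "$demo_dir")" -eq 0 ]; then
    echo "No .fq.gz files found in $demo_dir" >&2
else
    echo "Found $(count_fastq "$demo_dir") FASTQ files in $demo_dir"
fi
```

In fastqc.sh, the same check could be placed right before the `fastqc` line, exiting with a non-zero status when the count is 0.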
fastqc_00.sh
#!/bin/bash
#SBATCH --mem=10gb
#SBATCH --cpus-per-task=10
#SBATCH --job-name=fastqc_00
# %j will be replaced with the job ID
#SBATCH --output=log/fastqc_00%j.log

#parameters
input=fq_renamed
output=fastqc

src/fastqc.sh "$input" "$output"
MultiQC aggregates multiple FastQC reports into a single report so that you can view them side by side. MultiQC also accepts Fastp reports as input and presents them in a separate section.
multiqc.sh
# SCRIPT FOR PERFORMING MultiQC
# NOTE: Run this script from the directory where the "log" directory is located,
# Example: /mnt/proj/ibd/ds-06_cd-fecal/common
#
# PURPOSE:
# This script performs MultiQC.
#
# PARAMETERS:
# 1: input_dir - Directory where FastQC reports are located.
# 2: output_dir - Directory where MultiQC report will be located
# SAMPLE USAGE:
# In a parent script: src/multiqc.sh <input_dir> <output_dir>
#
# IMPORTANT:
# - Run from a parent script.
# Check if correct number of arguments are provided
if [ "$#" -ne 2 ]; then
    echo "Usage: $0 <input_dir> <output_dir>"
    exit 1
fi
# Input parameters
input_dir="$1"
output_dir="$2"
echo "Running MultiQC..."
mkdir -p "$output_dir"
multiqc -o "$output_dir" "$input_dir"
multiqc_00.sh
#!/bin/bash
#SBATCH --mem=10gb
#SBATCH --cpus-per-task=10
#SBATCH --job-name=multiqc_00
# %j will be replaced with the job ID
#SBATCH --output=log/multiqc_00%j.log

#parameters
input=fastqc
output=multiqc

src/multiqc.sh "$input" "$output"
You can also run both steps one after the other from a single parent script:
qc_00.sh
#!/bin/bash
#SBATCH --mem=10gb
#SBATCH --cpus-per-task=10
#SBATCH --job-name=fastqc_multiqc_00
# %j will be replaced with the job ID
#SBATCH --output=log/fastqc_multiqc_00%j.log

#parameters
input_fqc=fq
output_fqc=fastqc
input_mqc=fastqc
output_mqc=multiqc

src/fastqc.sh "$input_fqc" "$output_fqc"
src/multiqc.sh "$input_mqc" "$output_mqc"
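In a combined parent script, the MultiQC step runs even if the FastQC step failed. One possible way to avoid that is to chain the two calls with `&&`. The sketch below assumes stub functions (`run_fastqc`, `run_multiqc`, both hypothetical) in place of src/fastqc.sh and src/multiqc.sh so that it runs without the tools installed.

```shell
#!/bin/bash
# Stubs standing in for src/fastqc.sh and src/multiqc.sh (hypothetical,
# so the sketch runs without FastQC or MultiQC installed).
run_fastqc()  { echo "fastqc: $1 -> $2"; [ -d "$1" ]; }   # fails if the input dir is missing
run_multiqc() { echo "multiqc: $1 -> $2"; }

# Run the MultiQC step only if the FastQC step exited with status 0.
run_qc() {
    run_fastqc "$1" "$2" && run_multiqc "$2" "$3"
}

demo_in=$(mktemp -d)
run_qc "$demo_in" fastqc multiqc                 # both steps run
run_qc /nonexistent/dir fastqc multiqc \
    || echo "FastQC step failed, MultiQC step skipped"
```

In qc_00.sh the equivalent would be to join the two `src/...` calls with `&&`.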
You can also use the same scripts on trimmed files. Just add “_trimmed” to the input and output names, as in the parent scripts below. Remember to save the reports in separate folders (fastqc_trimmed, multiqc_trimmed).
fastqc_post_00.sh
#!/bin/bash
#SBATCH --mem=10gb
#SBATCH --cpus-per-task=10
#SBATCH --job-name=fastqc_post_00
# %j will be replaced with the job ID
#SBATCH --output=log/fastqc_post_00%j.log

#parameters
input=fq_trimmed
output=fastqc_trimmed

src/fastqc.sh "$input" "$output"
multiqc_post_00.sh
#!/bin/bash
#SBATCH --mem=10gb
#SBATCH --cpus-per-task=10
#SBATCH --job-name=multiqc_post_00
# %j will be replaced with the job ID
#SBATCH --output=log/multiqc_post_00%j.log

#parameters
input=fastqc_trimmed
output=multiqc_trimmed

src/multiqc.sh "$input" "$output"
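Since the pre- and post-trimming parent scripts differ only in the “_trimmed” suffix, the directory names could also be derived from a single variable. This is a minimal sketch under the assumption that the raw reads are in a directory called `fq`; the helper name `qc_dirs` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the FASTQ, FastQC and MultiQC directory
# names for a given suffix ("" for raw reads, "_trimmed" for trimmed reads).
# Assumes the raw reads are in "fq"; adjust if your directory is named differently.
qc_dirs() {
    local suffix="$1"
    echo "fq${suffix} fastqc${suffix} multiqc${suffix}"
}

# Split the three names into variables and show the calls that would be made.
read -r input_fq output_fqc output_mqc <<EOF
$(qc_dirs "_trimmed")
EOF
echo "src/fastqc.sh $input_fq $output_fqc"
echo "src/multiqc.sh $output_fqc $output_mqc"
```

The `echo` lines only print the commands; in a real parent script they would be replaced by the calls themselves.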
If you want to add Fastp reports to MultiQC, pass the “fq_trimmed” directory as an additional input to MultiQC. The script multiqc_fastp.sh does this by taking three arguments instead of two.
multiqc_fastp.sh
# SCRIPT FOR PERFORMING MultiQC ON FastQC AND Fastp REPORTS
# NOTE: Run this script from the directory where the "log" directory is located,
# Example: /mnt/proj/ibd/ds-06_cd-fecal/common
#
# PURPOSE:
# This script performs MultiQC on FastQC and Fastp reports.
#
# PARAMETERS:
# 1: input_dir - Directory where FastQC reports are located.
# 2: input_fastp - Directory where Fastp reports are located.
# 3: output_dir - Directory where MultiQC report will be located
# SAMPLE USAGE:
# In a parent script: src/multiqc_fastp.sh <input_dir> <input_fastp> <output_dir>
#
# IMPORTANT:
# - Run from a parent script.
# Check if correct number of arguments are provided
if [ "$#" -ne 3 ]; then
    echo "Usage: $0 <input_dir> <input_fastp> <output_dir>"
    exit 1
fi
# Input parameters
input_dir="$1"
input_fastp="$2"
output_dir="$3"
echo "Running MultiQC..."
mkdir -p "$output_dir"
multiqc -o "$output_dir" "$input_dir" "$input_fastp"
multiqc_fastp_00.sh
#!/bin/bash
#SBATCH --mem=10gb
#SBATCH --cpus-per-task=10
#SBATCH --job-name=multiqc_fastp_00
# %j will be replaced with the job ID
#SBATCH --output=log/multiqc_fastp_00%j.log

#parameters
input=fastqc_trimmed
input_fastp=fq_trimmed
output=multiqc_trimmed

src/multiqc_fastp.sh "$input" "$input_fastp" "$output"
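Because this variant takes two input directories, a mistyped name is easy to miss. The sketch below shows one way the daughter script could check that every input directory exists before calling MultiQC. The helper name `require_dirs` and the demo directories are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if every argument is an existing directory.
require_dirs() {
    local d
    for d in "$@"; do
        if [ ! -d "$d" ]; then
            echo "Missing input directory: $d" >&2
            return 1
        fi
    done
    return 0
}

# Demo with throwaway directories standing in for fastqc_trimmed and fq_trimmed.
demo_fastqc=$(mktemp -d)
demo_fastp=$(mktemp -d)

if require_dirs "$demo_fastqc" "$demo_fastp"; then
    echo "Both input directories exist"
    # In multiqc_fastp.sh the real call would follow here:
    # multiqc -o "$output_dir" "$input_dir" "$input_fastp"
fi
```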