==== Quality Control with FastQC, Fastp and MultiQC ====

The following scripts perform quality checks with FastQC and MultiQC. Fastp is covered in [[scripts:adapter_and_quality_trimming|How to perform quality control and adapter trimming of your reads with cutadapt and fastp]]. Besides trimming reads, Fastp also generates quality reports, which MultiQC can use as input.

=== FastQC ===

FastQC takes FASTQ files as input and generates a quality report for each file. A daughter script and a parent script are provided below. The daughter script (saved in ''src/'') runs the tool. The parent script requests the SLURM resources, sets the input and output directories and calls the daughter script.

== Daughter script ==

<code bash fastqc.sh>
#!/bin/bash
# SCRIPT FOR PERFORMING FASTQC
# NOTE: Run this script from the directory where the "log" directory is located,
# Example: /mnt/proj/ibd/ds-06_cd-fecal/common
#
# PURPOSE:
#   This script performs FastQC.
#
# PARAMETERS:
#   1: input_dir  - Directory where the FASTQ files (*.fq.gz) are located.
#   2: output_dir - Directory where the FastQC reports will be written.
#
# SAMPLE USAGE:
#   In a parent script: src/fastqc.sh <input_dir> <output_dir>
#
# IMPORTANT:
#   - Run from a parent script.

# Check if the correct number of arguments is provided
if [ "$#" -ne 2 ]; then
    echo "Usage: $0 <input_dir> <output_dir>" >&2
    exit 1
fi

# Input parameters
input_dir="$1"
output_dir="$2"

echo "Running FastQC on reads..."
mkdir -p "$output_dir"
# --threads lets FastQC process several files in parallel; use the CPUs
# allocated by SLURM (--cpus-per-task), or 1 if the variable is not set
fastqc --threads "${SLURM_CPUS_PER_TASK:-1}" -o "$output_dir" "$input_dir/"*.fq.gz
</code>

== Parent script ==

<code bash fastqc_00.sh>
#!/bin/bash
#SBATCH --mem=10gb
#SBATCH --cpus-per-task=10
#SBATCH --job-name=fastqc_00
# %j will be replaced with the job ID
#SBATCH --output=log/fastqc_00%j.log

# Parameters
input=fq_renamed
output=fastqc

src/fastqc.sh "$input" "$output"
</code>

=== MultiQC ===

MultiQC combines multiple FastQC reports into a single report, so you can view them side by side. MultiQC also accepts Fastp reports as input and shows them in a separate section.

== Daughter script ==

<code bash multiqc.sh>
#!/bin/bash
# SCRIPT FOR PERFORMING MultiQC
# NOTE: Run this script from the directory where the "log" directory is located,
# Example: /mnt/proj/ibd/ds-06_cd-fecal/common
#
# PURPOSE:
#   This script performs MultiQC.
#
# PARAMETERS:
#   1: input_dir  - Directory where the FastQC reports are located.
#   2: output_dir - Directory where the MultiQC report will be written.
#
# SAMPLE USAGE:
#   In a parent script: src/multiqc.sh <input_dir> <output_dir>
#
# IMPORTANT:
#   - Run from a parent script.

# Check if the correct number of arguments is provided
if [ "$#" -ne 2 ]; then
    echo "Usage: $0 <input_dir> <output_dir>" >&2
    exit 1
fi

# Input parameters
input_dir="$1"
output_dir="$2"

echo "Running MultiQC..."
mkdir -p "$output_dir"
multiqc -o "$output_dir" "$input_dir"
</code>

== Parent script ==

<code bash multiqc_00.sh>
#!/bin/bash
#SBATCH --mem=10gb
#SBATCH --cpus-per-task=10
#SBATCH --job-name=multiqc_00
# %j will be replaced with the job ID
#SBATCH --output=log/multiqc_00%j.log

# Parameters
input=fastqc
output=multiqc

src/multiqc.sh "$input" "$output"
</code>

You can also run both steps with a single parent script:

== Combined parent script ==

<code bash qc_00.sh>
#!/bin/bash
#SBATCH --mem=10gb
#SBATCH --cpus-per-task=10
#SBATCH --job-name=fastqc_multiqc_00
# %j will be replaced with the job ID
#SBATCH --output=log/fastqc_multiqc_00%j.log

# Parameters
input_fqc=fq
output_fqc=fastqc
input_mqc=fastqc
output_mqc=multiqc

# Run MultiQC only if FastQC finished successfully
src/fastqc.sh "$input_fqc" "$output_fqc" &&
    src/multiqc.sh "$input_mqc" "$output_mqc"
</code>

You can use the same scripts on trimmed files: just add "_trimmed" to the input and output directory names, as shown below. Remember to save the results in separate folders (fastqc_trimmed, multiqc_trimmed) so that the reports for the raw and the trimmed reads stay apart.

<code bash fastqc_post_00.sh>
#!/bin/bash
#SBATCH --mem=10gb
#SBATCH --cpus-per-task=10
#SBATCH --job-name=fastqc_post_00
# %j will be replaced with the job ID
#SBATCH --output=log/fastqc_post_00%j.log

# Parameters
input=fq_trimmed
output=fastqc_trimmed

src/fastqc.sh "$input" "$output"
</code>

<code bash multiqc_post_00.sh>
#!/bin/bash
#SBATCH --mem=10gb
#SBATCH --cpus-per-task=10
#SBATCH --job-name=multiqc_post_00
# %j will be replaced with the job ID
#SBATCH --output=log/multiqc_post_00%j.log

# Parameters
input=fastqc_trimmed
output=multiqc_trimmed

src/multiqc.sh "$input" "$output"
</code>

=== Fastp ===

To add the Fastp reports to the MultiQC report, pass the "fq_trimmed" directory, where the Fastp reports are saved, to MultiQC as an additional input (multiqc_fastp.sh).
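MultiQC builds its Fastp section from the JSON reports that Fastp writes (option ''-j''/''--json''), and it typically skips a tool silently when it finds no matching files. An empty or mistyped "fq_trimmed" directory therefore tends to produce a report without Fastp data rather than an error. Below is a minimal sketch of a pre-flight check; ''check_fastp_reports'' is a hypothetical helper, not part of the original pipeline, and it assumes the Fastp reports end in ''.json''.

```shell
#!/bin/bash
# Hypothetical pre-flight check (not one of the original scripts):
# confirm that a Fastp output directory holds at least one JSON report
# before it is passed to MultiQC. Assumes Fastp was run with --json/-j
# and that its reports end in ".json".
# Return codes: 0 = reports found (count printed), 1 = none found,
#               2 = directory missing.
check_fastp_reports() {
    local fastp_dir="$1"
    local n
    if [ ! -d "$fastp_dir" ]; then
        echo "ERROR: directory '$fastp_dir' does not exist" >&2
        return 2
    fi
    # Count *.json files, searching subdirectories too (MultiQC also
    # searches recursively); tr strips the padding some wc versions add
    n=$(find "$fastp_dir" -type f -name '*.json' | wc -l | tr -d ' ')
    if [ "$n" -eq 0 ]; then
        echo "WARNING: no Fastp JSON reports found in '$fastp_dir'" >&2
        return 1
    fi
    echo "$n"
}
```

For example, ''check_fastp_reports fq_trimmed > /dev/null && sbatch multiqc_fastp_00.sh'' would submit the MultiQC job only when Fastp reports are present.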
== Daughter script ==

<code bash multiqc_fastp.sh>
#!/bin/bash
# SCRIPT FOR PERFORMING MultiQC ON FastQC AND Fastp REPORTS
# NOTE: Run this script from the directory where the "log" directory is located,
# Example: /mnt/proj/ibd/ds-06_cd-fecal/common
#
# PURPOSE:
#   This script performs MultiQC on FastQC and Fastp reports.
#
# PARAMETERS:
#   1: input_dir   - Directory where the FastQC reports are located.
#   2: input_fastp - Directory where the Fastp reports are located.
#   3: output_dir  - Directory where the MultiQC report will be written.
#
# SAMPLE USAGE:
#   In a parent script: src/multiqc_fastp.sh <input_dir> <input_fastp> <output_dir>
#
# IMPORTANT:
#   - Run from a parent script.

# Check if the correct number of arguments is provided (three for this script)
if [ "$#" -ne 3 ]; then
    echo "Usage: $0 <input_dir> <input_fastp> <output_dir>" >&2
    exit 1
fi

# Input parameters
input_dir="$1"
input_fastp="$2"
output_dir="$3"

echo "Running MultiQC..."
mkdir -p "$output_dir"
multiqc -o "$output_dir" "$input_dir" "$input_fastp"
</code>

== Parent script ==

<code bash multiqc_fastp_00.sh>
#!/bin/bash
#SBATCH --mem=10gb
#SBATCH --cpus-per-task=10
#SBATCH --job-name=multiqc_fastp_00
# %j will be replaced with the job ID
#SBATCH --output=log/multiqc_fastp_00%j.log

# Parameters
input=fastqc_trimmed
input_fastp=fq_trimmed
output=multiqc_trimmed

# Call the three-argument Fastp variant (src/multiqc.sh accepts only two arguments)
src/multiqc_fastp.sh "$input" "$input_fastp" "$output"
</code>
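The parent scripts for raw and trimmed reads differ only in the "_trimmed" suffix of their directory names. Below is a minimal sketch of one combined parent step that derives all three names from that suffix. ''qc_dirs'' and ''run_qc_stage'' are hypothetical helpers, not scripts from this page. The sketch assumes raw reads live in ''fq'', as in qc_00.sh; fastqc_00.sh uses ''fq_renamed'' instead, so adjust the prefix to your layout.

```shell
#!/bin/bash
# Hypothetical helper (not one of the original scripts): print the reads,
# FastQC and MultiQC directory names for one stage, following this page's
# naming: fq/fastqc/multiqc for raw reads, *_trimmed after trimming.
qc_dirs() {
    local suffix="${1:-}"   # "" for raw reads, "_trimmed" for Fastp output
    echo "fq${suffix} fastqc${suffix} multiqc${suffix}"
}

# Hypothetical parent step: run FastQC, then MultiQC on its reports.
# Assumes src/fastqc.sh and src/multiqc.sh exist as described above.
run_qc_stage() {
    # Split the three names into $1 $2 $3 (the names contain no spaces)
    set -- $(qc_dirs "${1:-}")
    # MultiQC runs only if FastQC exits successfully
    src/fastqc.sh "$1" "$2" && src/multiqc.sh "$2" "$3"
}
```

In a parent script with the same #SBATCH header as above, ''run_qc_stage'' would check the raw reads and ''run_qc_stage _trimmed'' the Fastp output, so the two stages cannot accidentally share output folders.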