| Next revision | Previous revision | ||
| scripts:adapter_and_quality_trimming [2025/03/09 07:35] – created 37.26.174.181 | scripts:adapter_and_quality_trimming [2025/03/25 10:37] (current) – 37.26.174.181 | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| - | ====== How to perform quality and adapter trimming of your reads ====== | + | ====== How to perform quality |
| ===== What to trim ===== | ===== What to trim ===== | ||
| Line 161: | Line 161: | ||
| src/ | src/ | ||
| + | </ | ||
| + | |||
| + | ===== Example script with Fastp ===== | ||
| + | |||
| + | ==== Daughter script ==== | ||
| + | |||
| + | This script takes the adapter sequence, the number of bases to trim, and the number of threads for parallel processing as input, and generates fastp reports (HTML and JSON). It works in both single-end and paired-end mode. Note that paired-end files must be processed together as a pair; otherwise some reads may be removed from one file but remain in the other, which causes problems during alignment. | ||
| + | |||
| + | The script searches for the **Illumina Universal Adapter** by default. You can modify the script to use one of the other two adapters defined in it (just set the one you choose as the '' | ||
| + | |||
| + | **fastp.sh** | ||
| + | |||
| + | < | ||
| + | |||
| + | |||
| + | # ------------------------------------------ | ||
| + | # SCRIPT TO CUT ADAPTERS AND TRIM LOW-QUALITY READS | ||
| + | # Processes FASTQ files using `fastp` to: | ||
| + | # 1. Remove adapter sequences. | ||
| + | # 2. Filter reads on base quality (bases with Q < 20 count as unqualified). | ||
| + | # 3. Discard reads shorter than 20 bp after trimming. | ||
| + | # 4. Works for both single-end and paired-end reads. | ||
| + | # ------------------------------------------ | ||
| + | |||
| + | # Input parameters | ||
| + | input_dir=" | ||
| + | output_dir=" | ||
| + | trim_5p=" | ||
| + | trim_3p=" | ||
| + | threads=" | ||
| + | paired=" | ||
| + | |||
| + | # ------------------------------------------ | ||
| + | # DEFAULT ADAPTER SETS | ||
| + | # ------------------------------------------ | ||
| + | # Illumina universal adapter (most common) | ||
| + | illumina_universal_adapter=" | ||
| + | # Illumina Nextera adapter | ||
| + | nextera_adapter=" | ||
| + | # TruSeq adapter | ||
| + | truseq_adapter=" | ||
| + | |||
| + | # Use the Illumina universal adapter by default | ||
| + | adapter=" | ||
| + | |||
| + | # ------------------------------------------ | ||
| + | # VALIDATION | ||
| + | # ------------------------------------------ | ||
| + | if [[ -z " | ||
| + | echo " | ||
| + | exit 1 | ||
| + | fi | ||
| + | |||
| + | # Create output directory if it doesn' | ||
| + | mkdir -p " | ||
| + | |||
| + | # Log the command execution | ||
| + | echo " | ||
| + | echo "Using default adapter: $adapter" | ||
| + | echo " | ||
| + | |||
| + | # ------------------------------------------ | ||
| + | # PROCESS FILES WITH FASTP | ||
| + | # ------------------------------------------ | ||
| + | |||
| + | if [[ " | ||
| + | # ---------- PAIRED-END MODE ---------- | ||
| + | echo " | ||
| + | |||
| + | for file1 in " | ||
| + | if [[ -f " | ||
| + | file2=" | ||
| + | file2=" | ||
| + | |||
| + | if [[ -f " | ||
| + | filename=$(basename " | ||
| + | |||
| + | echo " | ||
| + | |||
| + | # Run fastp for paired-end | ||
| + | fastp --in1 " | ||
| + | --out1 " | ||
| + | --out2 " | ||
| + | --trim_poly_g --trim_poly_x \ | ||
| + | --low_complexity_filter \ | ||
| + | --qualified_quality_phred 20 \ | ||
| + | --length_required 20 \ | ||
| + | --thread " | ||
| + | --trim_front1 " | ||
| + | --trim_front2 " | ||
| + | --adapter_sequence " | ||
| + | # | ||
| + | --html " | ||
| + | --json " | ||
| + | |||
| + | echo " | ||
| + | else | ||
| + | echo " | ||
| + | fi | ||
| + | fi | ||
| + | done | ||
| + | else | ||
| + | # ---------- SINGLE-END MODE ---------- | ||
| + | echo " | ||
| + | |||
| + | for file in " | ||
| + | if [[ -f " | ||
| + | filename=$(basename " | ||
| + | |||
| + | echo " | ||
| + | |||
| + | # Run fastp for single-end | ||
| + | fastp --in1 " | ||
| + | --out1 " | ||
| + | --trim_poly_g --trim_poly_x \ | ||
| + | --low_complexity_filter \ | ||
| + | --qualified_quality_phred 20 \ | ||
| + | --length_required 20 \ | ||
| + | --thread " | ||
| + | --trim_front1 " | ||
| + | --adapter_sequence " | ||
| + | --html " | ||
| + | --json " | ||
| + | |||
| + | echo " | ||
| + | fi | ||
| + | done | ||
| + | fi | ||
| + | |||
| + | # Completion message | ||
| + | echo "All files processed. Output saved in $output_dir." | ||
| + | |||
| + | |||
| + | </ | ||
| + | |||
| + | ==== Parent script ==== | ||
| + | |||
| + | **fastp_00.sh** | ||
| + | |||
| + | < | ||
| + | #!/bin/bash | ||
| + | #SBATCH --mem=32gb | ||
| + | #SBATCH --cpus-per-task=10 | ||
| + | #SBATCH --job-name=fastp_00 | ||
| + | #SBATCH --output=log/ | ||
| + | |||
| + | # Parameters (example, modify as needed) | ||
| + | input_dir=fq_renamed | ||
| + | output_dir=fq_trimmed | ||
| + | trim_5p=10 # | ||
| + | trim_3p=0 # Number of bases to trim from the 3' end | ||
| + | threads=10 # | ||
| + | paired=true # | ||
| + | |||
| + | src/ | ||
| </ | </ | ||
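
The note about processing paired-end files together can be checked after trimming: both output files of a pair should contain the same number of reads. The following is a minimal sketch of such a check, not part of the scripts above. The function names (''mate_path'', ''count_reads'', ''pairs_in_sync'') are hypothetical, the ''_R1''/''_R2'' file naming is an assumption (the naming pattern used in fastp.sh is not shown in full here), and the read count assumes standard four-line FASTQ records.

```shell
#!/bin/bash
# Hypothetical helpers for checking paired-end output; not part of fastp.sh.
# Assumption: mates are named <sample>_R1<suffix> and <sample>_R2<suffix>.

# Print the R2 path that corresponds to an R1 path.
mate_path() {
    local r1="$1" dir base
    dir=$(dirname "$r1")
    base=$(basename "$r1")
    case "$base" in
        *_R1*) ;;                      # name contains the assumed R1 tag
        *) return 1 ;;                 # not an R1 file under this convention
    esac
    # Text before the last "_R1" + "_R2" + text after the last "_R1"
    echo "$dir/${base%_R1*}_R2${base##*_R1}"
}

# Print the number of reads in a FASTQ file (plain or gzip-compressed).
# Assumption: every record occupies exactly four lines.
count_reads() {
    local fq="$1" lines
    case "$fq" in
        *.gz) lines=$(gzip -dc "$fq" | wc -l) ;;
        *)    lines=$(wc -l < "$fq") ;;
    esac
    echo $(( lines / 4 ))
}

# Succeed only if both files of a pair hold the same number of reads.
pairs_in_sync() {
    local n1 n2
    n1=$(count_reads "$1")
    n2=$(count_reads "$2")
    [ "$n1" -eq "$n2" ]
}
```

A possible use, assuming trimmed files are in ''fq_trimmed'' and follow the naming convention above: loop over the ''*_R1*'' files, derive each mate with ''mate_path'', and print a warning whenever ''pairs_in_sync'' fails. Equal read counts are a necessary condition for a synchronized pair, not proof that the read order matches.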
