Download FASTQ from SRA

An easy intro

The easiest way to download FASTQ files from SRA is the fastq-dump command from the SRA Toolkit:

fastq-dump --gzip --split-files SRR[accession ID]

Options:

 ''--gzip'' compresses the downloaded FASTQ files (strongly recommended to save space)
 ''--split-files'' splits paired-end runs into separate _1 and _2 files; omit this option for single-end data

In practice, however, you will usually want to put this command inside a script so you can download multiple files via Slurm. It is also worth making the script a bit more robust so it can recover from connection issues during download. Below is an example script that downloads a set of files for the provided accessions, retrying each download several times in case of failure.
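The script below reads run IDs from a plain-text accession file, one per line, and skips comment lines starting with ''#''. A minimal, self-contained sketch of creating and previewing such a file (the SRR IDs are placeholders, not real runs):

```shell
# Create an accession list; lines starting with '#' are treated as comments
cat > accessions.txt <<'EOF'
# placeholder run IDs, one per line
SRR0000001
SRR0000002
EOF

# Preview what the script will actually process (comment lines removed)
grep -v '^#' accessions.txt
```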

Script with download retry attempts

#!/bin/bash
#SBATCH --mem=10gb
#SBATCH --cpus-per-task=10
#SBATCH --job-name=dwnld_fq
#SBATCH --output=log/download_fq%j.log  # %j will be replaced with the job ID

# SCRIPT FOR DOWNLOADING SRA FILES USING THE NCBI SRA TOOLKIT
# NOTE: Run this script from the directory where the "log" directory is located
#
# PURPOSE:
#   This script reads SRA accession IDs from a given file (one per line)
#   and downloads each corresponding SRA file using fastq-dump.
#
# PARAMETERS:
#   1: OUTPUT DIRECTORY   - Directory where downloaded files will be stored.
#   2: ACCESSION FILE     - Text file containing SRA accession IDs (one per line).
#
# SAMPLE USAGE:
#   sbatch src/download_sra.sh <output_directory> <accession_file>
#
# IMPORTANT:
#   - This script downloads files using fastq-dump, gzipping the output and splitting paired-end reads (single-end reads are left as one file).
#   - Ensure that the SRA Toolkit is installed and available.

# Check for required parameters
if [ "$#" -ne 2 ]; then
    echo "Usage: $0 <output_directory> <accession_file>"
    exit 1
fi

# Parameters
outdir="$1"
accession_file="$2"

# Ensure the accession file exists
if [ ! -f "$accession_file" ]; then
    echo "Error: Accession file '$accession_file' does not exist."
    exit 1
fi

# Create the output directory if it doesn't exist
mkdir -p "$outdir"

# Define the log file with a fallback if SLURM_JOB_ID is not set
log_file="log/download_sra_retry${SLURM_JOB_ID:-manual}.log"
echo "Command: $0 $*" > "$log_file"
echo "Job started on: $(date)" >> "$log_file"

# Function to download an SRA accession using fastq-dump (gzipped, with paired-end splitting)
download_sra() {
    local acc="$1"
    echo "Downloading accession: $acc" >> "$log_file"
    if fastq-dump --gzip --split-files "$acc" -O "$outdir"; then
        echo "Successfully downloaded: $acc" >> "$log_file"
        return 0
    else
        echo "Error: fastq-dump failed for accession: $acc" >> "$log_file"
        return 1
    fi
}

# Function to download with retries
download_with_retry() {
    local acc="$1"
    local max_retries=10
    local attempt=1
    while [ $attempt -le $max_retries ]; do
        echo "Attempt $attempt for $acc" >> "$log_file"
        if download_sra "$acc"; then
            return 0  # Success
        fi
        ((attempt++))
    done
    echo "Failed all $max_retries attempts for $acc" >> "$log_file"
    return 1
}

# Export the functions and variables for use in GNU Parallel
export -f download_sra download_with_retry
export outdir
export log_file

# Process all accessions in parallel
accessions=$(grep -v '^#' "$accession_file")
if [ -z "$accessions" ]; then
    echo "Error: No valid accessions found in '$accession_file'." >> "$log_file"
    exit 1
fi

echo "Processing accessions in parallel..." >> "$log_file"
# Match the number of simultaneous downloads to the CPUs requested above
parallel -j "${SLURM_CPUS_PER_TASK:-10}" download_with_retry ::: $accessions

# Check overall exit status and log the result
if [ "$?" -eq 0 ]; then
    echo "All accessions processed successfully." >> "$log_file"
else
    echo "One or more accessions encountered errors." >> "$log_file"
fi

echo "Job completed on: $(date)" >> "$log_file"
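After the job finishes, a quick sanity check is that every FASTQ file has a line count divisible by four, since one FASTQ record is exactly four lines. The sketch below fabricates a tiny gzipped FASTQ so it is self-contained; in practice, point it at the files in your output directory:

```shell
# Self-contained demo: write one fake FASTQ record, then verify the line count
mkdir -p demo_out
printf '@read1\nACGT\n+\nIIII\n' | gzip > demo_out/SRR0000001_1.fastq.gz

# A well-formed FASTQ file has a line count that is a multiple of 4
lines=$(gzip -dc demo_out/SRR0000001_1.fastq.gz | wc -l)
echo "lines: $lines"
```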
 