scripts:download_fastq
Differences
This shows you the differences between two versions of the page.
| scripts:download_fastq [2025/05/27 10:21] – 37.26.174.181 | scripts:download_fastq [2025/08/06 08:38] (current) – 37.26.174.181 | ||
|---|---|---|---|
| Line 7: | Line 7: | ||
| The easiest way to download FASTQ files from SRA is with the fastq-dump command from the SRA Toolkit: | The easiest way to download FASTQ files from SRA is with the fastq-dump command from the SRA Toolkit: | ||
| - | '' | + | '' |
| Options: | Options: | ||
| Line 67: | Line 67: | ||
| local acc=" | local acc=" | ||
| echo " | echo " | ||
| - | if fastq-dump --gzip --split3 | + | if fastq-dump --gzip --split-3 |
| echo " | echo " | ||
| return 0 | return 0 | ||
| Line 152: | Line 152: | ||
| In this way, each download attempt gets its own named parent script and a log file with the same name. You also don't have to copy and paste the whole code into each parent script; just call the daughter script and that's all! | In this way, each download attempt gets its own named parent script and a log file with the same name. You also don't have to copy and paste the whole code into each parent script; just call the daughter script and that's all! | ||
| + | |||
| + | ====== Download files from EGA ====== | ||
| + | |||
| + | With this script you can download files from the EGA database using pyega3 and place them directly in the output directory. | ||
| + | |||
| + | You need to set some inputs: | ||
| + | |||
| + | * Credentials JSON file | ||
| + | * Number of connections | ||
| + | * List of IDs of the files to download | ||
| + | * Output directory | ||
| + | * Format of the files to download | ||
| + | |||
| + | Make sure you set cpus-per-task to the number of files to be downloaded multiplied by the number of connections. | ||
| + | |||
| + | Here is an example of a credentials JSON file: | ||
| + | |||
| + | < | ||
| + | { | ||
| + | " | ||
| + | " | ||
| + | } | ||
| + | </ | ||
| + | |||
| + | == The script == | ||
| + | |||
| + | < | ||
| + | #!/bin/bash | ||
| + | #SBATCH --mem=10gb | ||
| + | #SBATCH --cpus-per-task=1 | ||
| + | #SBATCH --job-name=dwnld_ega | ||
| + | #SBATCH --output=log/ | ||
| + | |||
| + | # Set cpus-per-task to the number of files to be downloaded multiplied by CONNECTIONS | ||
| + | # Set common variables | ||
| + | CREDENTIALS_FILE=" | ||
| + | CONNECTIONS=1 | ||
| + | |||
| + | # Define the paths to the text files containing the file IDs | ||
| + | FILE_ID_LIST=" | ||
| + | |||
| + | # Define output directories | ||
| + | FILE_OUTPUT_DIR=" | ||
| + | |||
| + | # Define file format | ||
| + | file_format=" | ||
| + | |||
| + | # --- Step 1: Create directories if they don't exist --- | ||
| + | echo " | ||
| + | mkdir -p "$FILE_OUTPUT_DIR" meta/md5sum log | ||
| + | |||
| + | # --- Step 3: Download files, move, and clean up temporary folders --- | ||
| + | echo " | ||
| + | # Check if the file ID list exists | ||
| + | if [ ! -f " | ||
| + | echo " | ||
| + | exit 1 | ||
| + | fi | ||
| + | |||
| + | while IFS= read -r file_id; do | ||
| + | if [ -z " | ||
| + | continue | ||
| + | fi | ||
| + | echo " | ||
| + | pyega3 -c " | ||
| + | done < " | ||
| + | wait | ||
| + | |||
| + | # Move files to the final location and remove temporary folders | ||
| + | echo " | ||
| + | while IFS= read -r file_id; do | ||
| + | if [ -z " | ||
| + | continue | ||
| + | fi | ||
| + | mv " | ||
| + | rm -r " | ||
| + | done < " | ||
| + | |||
| + | # --- Step 4: Perform md5sum on all final files --- | ||
| + | echo " | ||
| + | # Loop through all files in the directories | ||
| + | for file in $FILE_OUTPUT_DIR/ | ||
| + | if [ -f " | ||
| + | filename=$(basename " | ||
| + | md5sum " | ||
| + | echo " | ||
| + | fi | ||
| + | done | ||
| + | |||
| + | # --- Step 5: Remove unnecessary log files created by pyega3 --- | ||
| + | if [ -f pyega3_output.log ]; then | ||
| + | rm pyega3_output.log | ||
| + | echo " | ||
| + | fi | ||
| + | |||
| + | echo " | ||
| + | </ | ||
| + | |||
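
The cpus-per-task rule described above (number of files multiplied by the number of connections) can be worked out before submitting the job. The following is only a minimal sketch: the function name ''required_cpus'' and the file names in the usage note are hypothetical and are not part of the script on this page.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script):
# prints the cpus-per-task value for a given file ID list and connection count.
required_cpus() {
    local id_list="$1"       # text file with one file ID per line
    local connections="$2"   # connections used per file
    local n_files
    # Count only lines that contain at least one non-whitespace character,
    # so blank lines in the list are not counted as files.
    n_files=$(grep -c '[^[:space:]]' "$id_list")
    echo $(( n_files * connections ))
}
```

Assuming the ID list is saved as ''file_ids.txt'' and the download script as ''download_ega.sh'' (both names are placeholders), the value could be passed on the command line, where it takes precedence over the ''#SBATCH --cpus-per-task'' line in the script: ''sbatch --cpus-per-task="$(required_cpus file_ids.txt 2)" download_ega.sh''.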
scripts/download_fastq.1748341299.txt.gz · Last modified: by 37.26.174.181
