====== Databases & Reference Data ======

ABI maintains a collection of reference genomes, indexes, and shared databases for use in bioinformatics analyses. These are stored on shared storage and are **read-only** for regular users.

===== Location =====

All shared databases are located under:

  /mnt/nas1/db/

This volume is served from the NAS (''nas1:/znas1/abi/collections/db'') and has approximately 32 TB of total space (~1.8 TB currently used).

===== Reference Genomes =====

Pre-built reference genomes and their indexes are stored at:

  /mnt/nas1/db/genomes/

=== Available Genomes ===

> TODO: Fill in the table below with the actual contents of /mnt/nas1/db/genomes/. Run ''ls /mnt/nas1/db/genomes/'' to get the full list.

^ Organism ^ Assembly ^ Path ^ Indexes Available ^
| Human | GRCh38.p14 | ''/mnt/nas1/db/genomes/homo_sapiens/GRCh38.p14/'' | BWA, *TODO: others?* |
| *TODO* | *TODO* | *TODO* | *TODO* |

=== BWA Indexes ===

BWA indexes for the human reference genome are located at:

  /mnt/nas1/db/genomes/homo_sapiens/GRCh38.p14/bwa_mem_0.7.17-r1188/

This directory contains:

  * ''GCF_000001405.40_GRCh38.p14_genomic.fna'' -- the reference FASTA
  * ''.amb'', ''.ann'', ''.bwt'', ''.pac'', ''.sa'' -- BWA index files

**Important:** BWA locates its index by appending these extensions to the path you give it, so pass the full path to the FASTA file (.fna/.fa) and keep the FASTA and the index files together in the same directory.

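Before starting a long alignment job, it can help to confirm that the FASTA and all five index files listed above are actually present. The following is a minimal sketch, not an ABI-provided tool: ''check_bwa_index'' is a hypothetical helper name, and it assumes the index was built with the default prefix (the FASTA path itself), which matches the file listing above.

```shell
#!/usr/bin/env bash
# check_bwa_index: hypothetical helper (not part of BWA or the ABI setup).
# Assumes index files are named <fasta>.amb, <fasta>.ann, and so on,
# i.e. the index was built with the FASTA path as its prefix.
check_bwa_index() {
    local ref="$1"
    local missing=0
    local ext

    # The reference FASTA itself
    if [ ! -f "$ref" ]; then
        echo "missing: $ref" >&2
        missing=1
    fi

    # The five BWA index files listed on this page
    for ext in amb ann bwt pac sa; do
        if [ ! -f "$ref.$ext" ]; then
            echo "missing: $ref.$ext" >&2
            missing=1
        fi
    done

    # Exit status 0 when everything is present, 1 otherwise
    return "$missing"
}

# Example call with the shared human reference from this page
REF="/mnt/nas1/db/genomes/homo_sapiens/GRCh38.p14/bwa_mem_0.7.17-r1188/GCF_000001405.40_GRCh38.p14_genomic.fna"
if check_bwa_index "$REF"; then
    echo "BWA index looks complete: $REF"
else
    echo "BWA index incomplete, see messages above" >&2
fi
```

The function only tests that the files exist; it does not verify that they were built from the same FASTA or with a compatible BWA version.
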
Usage example:

  REF="/mnt/nas1/db/genomes/homo_sapiens/GRCh38.p14/bwa_mem_0.7.17-r1188/GCF_000001405.40_GRCh38.p14_genomic.fna"
  bwa mem -t 8 "$REF" reads_1.fq.gz reads_2.fq.gz | samtools sort -o aligned.sorted.bam -

=== Building Your Own Index ===

If you need an index for a genome not listed above, you can build it yourself in a directory you can write to (the shared collection is read-only):

  # BWA index
  bwa index reference.fasta
  # FASTA index (.fai) for random access to the reference
  samtools faidx reference.fasta
  # STAR index (for RNA-seq)
  STAR --runMode genomeGenerate --genomeDir star_index/ --genomeFastaFiles reference.fasta --sjdbGTFfile annotations.gtf

**Request addition:** If you think a genome or index should be added to the shared collection, email **[[mailto:it-support@abi.am|it-support@abi.am]]** with:

  * Organism and assembly version
  * Download source (e.g., NCBI, Ensembl, UCSC)
  * Which indexes you need (BWA, Bowtie2, STAR, etc.)

----

===== Other Shared Databases =====

> TODO: List any other shared databases available at ABI. Examples might include:

^ Database ^ Description ^ Path ^
| *TODO: e.g., BLAST NT/NR* | *NCBI nucleotide/protein databases* | *TODO: /mnt/nas1/db/blast/* |
| *TODO: e.g., Kraken2 DB* | *Taxonomic classification database* | *TODO: /mnt/nas1/db/kraken2/* |
| *TODO: e.g., dbSNP* | *Known human variants* | *TODO* |
| *TODO* | *Add more as needed* | *TODO* |

----

===== Directory Structure =====

> TODO: Run ''ls -la /mnt/nas1/db/'' on the server and paste the top-level structure here. Example:

  /mnt/nas1/db/
    genomes/
      homo_sapiens/
        GRCh38.p14/
          bwa_mem_0.7.17-r1188/
      TODO: other organisms
    TODO: other database directories

----

===== Best Practices =====

  * **Do not copy reference data to your home or project directory.** Use the shared paths directly to save disk space.
  * **Always use absolute paths** to reference data in your scripts, so they work from any directory.
  * **Check the version** of the reference genome and index before starting an analysis. Mixing assemblies or index versions can cause errors or silently inconsistent results.
  * **Document which reference you used** in your project notes for reproducibility.

----

===== See Also =====

  * [[pipelines:start|Pipelines]] -- workflows that use these reference data
  * [[scripts:alignment|Alignment Scripts]] -- examples using BWA with the shared references
  * [[software:start|Software]] -- tools for working with genomic data