====== Databases & Reference Data ======
ABI maintains a collection of reference genomes, indexes, and shared databases for use in bioinformatics analyses. These are stored on shared storage and are **read-only** for regular users.
===== Location =====
All shared databases are located under:
  /mnt/nas1/db/
This volume is served from the NAS (''nas1:/znas1/abi/collections/db'') and has approximately 32 TB of total space (~1.8 TB currently used).
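The usage figures above change over time. A minimal sketch for checking the current state of the volume yourself follows; the helper name ''describe_volume'' is hypothetical, and it relies only on the standard ''df'' command and the shell's write-permission test:

```shell
# Report disk usage for a path and whether the current user can write to it.
# The default path is the shared database root named on this page.
describe_volume() {
    path=${1:-/mnt/nas1/db/}
    # df fails with a non-zero status if the path does not exist or is not mounted
    df -h "$path" || return 1
    if [ -w "$path" ]; then
        echo "writable: yes"
    else
        echo "writable: no (expected for regular users on the shared databases)"
    fi
}
```

Calling ''describe_volume'' with no argument inspects ''/mnt/nas1/db/''; pass another directory to check your own project space.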
===== Reference Genomes =====
Pre-built reference genomes and their indexes are stored at:
  /mnt/nas1/db/genomes/
=== Available Genomes ===
> TODO: Fill in the table below with the actual contents of /mnt/nas1/db/genomes/. Run ''ls /mnt/nas1/db/genomes/'' to get the full list.
^ Organism ^ Assembly ^ Path ^ Indexes Available ^
| Human | GRCh38.p14 | ''/mnt/nas1/db/genomes/homo_sapiens/GRCh38.p14/'' | BWA, //TODO: others?// |
| //TODO// | //TODO// | //TODO// | //TODO// |
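Until the table is filled in, rows can be generated from the directory tree. The sketch below assumes every genome follows the ''<organism>/<assembly>/<index_dir>/'' layout of the one entry listed above, which has not been verified for the rest of the collection. ''genome_table_rows'' is a hypothetical helper, and it prints directory names (e.g. ''homo_sapiens'') rather than common names:

```shell
# Print one DokuWiki table row per <organism>/<assembly> directory under a root.
# Assumed layout (not verified for every genome): <root>/<organism>/<assembly>/<index_dir>/
genome_table_rows() {
    root=${1%/}
    for assembly_dir in "$root"/*/*/; do
        # If the glob matches nothing it stays literal; skip it
        [ -d "$assembly_dir" ] || continue
        assembly_dir=${assembly_dir%/}
        assembly=$(basename "$assembly_dir")
        organism=$(basename "$(dirname "$assembly_dir")")
        # Collect the names of all subdirectories as the available indexes
        indexes=""
        for index_dir in "$assembly_dir"/*/; do
            [ -d "$index_dir" ] || continue
            name=$(basename "$index_dir")
            indexes="${indexes:+$indexes, }$name"
        done
        printf "| %s | %s | ''%s/'' | %s |\n" \
            "$organism" "$assembly" "$assembly_dir" "${indexes:-none found}"
    done
}
```

Running ''genome_table_rows /mnt/nas1/db/genomes'' should print one row per assembly, ready to paste into the table and tidy by hand.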
=== BWA Indexes ===
BWA indexes for the human reference genome are located at:
  /mnt/nas1/db/genomes/homo_sapiens/GRCh38.p14/bwa_mem_0.7.17-r1188/
This directory contains:
* ''GCF_000001405.40_GRCh38.p14_genomic.fna'' -- the reference FASTA
* ''.amb'', ''.ann'', ''.bwt'', ''.pac'', ''.sa'' -- BWA index files
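**Important:** Pass the full FASTA path as the reference argument. BWA treats it as a prefix and looks for ''<fasta>.amb'', ''<fasta>.bwt'', and so on next to it, so the index files must keep the FASTA's file name and stay in the same directory as the FASTA (.fna/.fa).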
Usage example:
  REF="/mnt/nas1/db/genomes/homo_sapiens/GRCh38.p14/bwa_mem_0.7.17-r1188/GCF_000001405.40_GRCh38.p14_genomic.fna"
  bwa mem -t 8 "$REF" reads_1.fq.gz reads_2.fq.gz | samtools sort -o aligned.sorted.bam -
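Before launching a long alignment it can help to confirm that the index is complete. A small pre-flight sketch, assuming the five index files listed above sit next to the reference path; ''check_bwa_index'' is a hypothetical helper name:

```shell
# Verify that all five BWA index files exist and are non-empty for a reference prefix.
# Prints each problem file to stderr and returns non-zero if any is missing.
check_bwa_index() {
    prefix=$1
    missing=0
    for ext in amb ann bwt pac sa; do
        if [ ! -s "${prefix}.${ext}" ]; then
            echo "missing or empty: ${prefix}.${ext}" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example (commented out so the block has no side effects when sourced):
# check_bwa_index "$REF" && bwa mem -t 8 "$REF" reads_1.fq.gz reads_2.fq.gz > aligned.sam
```

The check only looks at file presence and size; it does not validate that the index was built from the same FASTA or with a compatible BWA version.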
=== Building Your Own Index ===
If you need an index for a genome not listed above, you can build it yourself:
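  # BWA index (writes .amb, .ann, .bwt, .pac, .sa next to the FASTA)
  bwa index reference.fasta
  # FASTA index (.fai) for random access to the reference with samtools
  samtools faidx reference.fasta
  # STAR index (for RNA-seq); create the output directory first,
  # since some STAR versions require it to exist
  mkdir -p star_index
  STAR --runMode genomeGenerate --genomeDir star_index/ \
       --genomeFastaFiles reference.fasta --sjdbGTFfile annotations.gtf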
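By default ''bwa index'' and ''samtools faidx'' write their output next to the input FASTA, which fails on the read-only shared volume. If you need an extra index for a FASTA that already lives in the shared collection, one option is to symlink it into a directory you own and index the link. The sketch below assumes a writable destination of your choice; ''stage_reference'' is a hypothetical helper name:

```shell
# Symlink a read-only FASTA into a writable directory so that index files
# are created next to the link. Prints the path to pass to the indexer.
stage_reference() {
    src=$1       # absolute path to the shared FASTA
    dest_dir=$2  # a writable directory of your own (hypothetical location)
    case $src in
        /*) ;;
        *) echo "source must be an absolute path: $src" >&2; return 1 ;;
    esac
    [ -f "$src" ] || { echo "not found: $src" >&2; return 1; }
    mkdir -p "$dest_dir" || return 1
    ln -sf "$src" "$dest_dir/$(basename "$src")" || return 1
    printf '%s\n' "$dest_dir/$(basename "$src")"
}

# Example (commented out so the block has no side effects when sourced):
# REF_LINK=$(stage_reference /path/to/shared/reference.fasta "$HOME/my_index")
# samtools faidx "$REF_LINK"
```

The link costs no extra disk space, which keeps this consistent with the advice not to copy reference data.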
**Request addition:** If you think a genome or index should be added to the shared collection, email **[[mailto:it-support@abi.am|it-support@abi.am]]** with:
* Organism and assembly version
* Download source (e.g., NCBI, Ensembl, UCSC)
* Which indexes you need (BWA, Bowtie2, STAR, etc.)
----
===== Other Shared Databases =====
> TODO: List any other shared databases available at ABI. Examples might include:
^ Database ^ Description ^ Path ^
| //TODO: e.g., BLAST NT/NR// | //NCBI nucleotide/protein databases// | //TODO:// ''/mnt/nas1/db/blast/'' |
| //TODO: e.g., Kraken2 DB// | //Taxonomic classification database// | //TODO:// ''/mnt/nas1/db/kraken2/'' |
| //TODO: e.g., dbSNP// | //Known human variants// | //TODO// |
| //TODO// | //Add more as needed// | //TODO// |
----
===== Directory Structure =====
> TODO: Run ''ls -la /mnt/nas1/db/'' on the server and paste the top-level structure here. Example:
  /mnt/nas1/db/
    genomes/
      homo_sapiens/
        GRCh38.p14/
          bwa_mem_0.7.17-r1188/
      TODO: other organisms
    TODO: other database directories
----
===== Best Practices =====
* **Do not copy reference data to your home or project directory.** Use the shared paths directly to save disk space.
* **Always use absolute paths** to reference data in your scripts, so they work from any directory.
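  * **Check the version** of the reference genome and index before starting an analysis. Mixing assemblies (for example, GRCh37 coordinates with a GRCh38 reference) or using an index built by an incompatible tool version can fail outright or, worse, silently produce wrong results.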
* **Document which reference you used** in your project notes for reproducibility.
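The last two points can be partly automated. A minimal sketch that appends the reference path, the date, and a checksum to a notes file follows; ''record_reference'' and the notes file name are hypothetical, and ''sha256sum'' from GNU coreutils is assumed to be available on the server:

```shell
# Append the date, absolute reference path, and SHA-256 checksum to a notes file.
# Rejects relative paths so the record stays valid from any working directory.
record_reference() {
    ref=$1
    notes=$2
    case $ref in
        /*) ;;
        *) echo "use an absolute path: $ref" >&2; return 1 ;;
    esac
    [ -f "$ref" ] || { echo "not found: $ref" >&2; return 1; }
    {
        printf 'date\t%s\n' "$(date +%Y-%m-%d)"
        printf 'reference\t%s\n' "$ref"
        # Checksumming a multi-gigabyte FASTA over the NAS can take a while
        printf 'sha256\t%s\n' "$(sha256sum "$ref" | cut -d' ' -f1)"
    } >> "$notes"
}

# Example (commented out so the block has no side effects when sourced):
# record_reference "$REF" project_notes.tsv
```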
----
===== See Also =====
* [[pipelines:start|Pipelines]] -- workflows that use these reference data
* [[scripts:alignment|Alignment Scripts]] -- examples using BWA with the shared references
* [[software:start|Software]] -- tools for working with genomic data