Databases & Reference Data
ABI maintains a collection of reference genomes, indexes, and shared databases for use in bioinformatics analyses. These are stored on shared storage and are read-only for regular users.
Location
All shared databases are located under:
/mnt/nas1/db/
This volume is served from the NAS (nas1:/znas1/abi/collections/db) and has approximately 32 TB of total space (~1.8 TB currently used).
Reference Genomes
Pre-built reference genomes and their indexes are stored at:
/mnt/nas1/db/genomes/
Available Genomes
TODO: Fill in the table below with the actual contents of /mnt/nas1/db/genomes/. Run ls /mnt/nas1/db/genomes/ to get the full list.
| Organism | Assembly | Path | Indexes Available |
|---|---|---|---|
| Human | GRCh38.p14 | /mnt/nas1/db/genomes/homo_sapiens/GRCh38.p14/ | BWA, *TODO: others?* |
| *TODO* | *TODO* | *TODO* | *TODO* |
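To help fill in the table, the rows can be generated from the directory tree. The sketch below assumes the genomes/<organism>/<assembly>/ layout described on this page; list_genomes is a hypothetical helper, not an installed tool, and the Indexes column still has to be filled in by hand.

```shell
# Hypothetical helper: print one markdown table row per <organism>/<assembly> directory.
# Assumes the layout <root>/<organism>/<assembly>/ used by the shared collection.
list_genomes() {
    root="${1:-/mnt/nas1/db/genomes}"
    for assembly_dir in "$root"/*/*/; do
        # Skip the unexpanded glob pattern when nothing matches.
        [ -d "$assembly_dir" ] || continue
        assembly=$(basename "$assembly_dir")
        organism=$(basename "$(dirname "$assembly_dir")")
        printf '| %s | %s | %s |\n' "$organism" "$assembly" "$assembly_dir"
    done
}

# Usage:
#   list_genomes                       # scans /mnt/nas1/db/genomes
#   list_genomes /path/to/other/root   # scans a different root
```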
BWA Indexes
BWA indexes for the human reference genome are located at:
/mnt/nas1/db/genomes/homo_sapiens/GRCh38.p14/bwa_mem_0.7.17-r1188/
This directory contains:
- GCF_000001405.40_GRCh38.p14_genomic.fna – the reference FASTA
- .amb, .ann, .bwt, .pac, .sa – BWA index files
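Important: Pass the FASTA path as the reference argument. BWA locates its index by appending .amb, .ann, .bwt, .pac, and .sa to that path, so the index files must sit in the same directory as the FASTA file (.fna/.fa) and share its file name.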
Usage example:
REF="/mnt/nas1/db/genomes/homo_sapiens/GRCh38.p14/bwa_mem_0.7.17-r1188/GCF_000001405.40_GRCh38.p14_genomic.fna"
bwa mem -t 8 "$REF" reads_1.fq.gz reads_2.fq.gz | samtools sort -o aligned.sorted.bam -
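Because BWA expects the index files next to the reference path, a quick pre-flight check can catch a wrong or incomplete path before a long alignment job starts. The following is a minimal sketch; check_bwa_index is a hypothetical helper name, and the extension list is the one given above.

```shell
# Hypothetical helper: verify that all five BWA index files exist (and are
# non-empty) for the given reference path. Returns 0 if complete, 1 otherwise.
check_bwa_index() {
    prefix="$1"
    missing=0
    for ext in amb ann bwt pac sa; do
        if [ ! -s "${prefix}.${ext}" ]; then
            echo "missing or empty: ${prefix}.${ext}" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage (REF as defined in the example above):
#   check_bwa_index "$REF" && bwa mem -t 8 "$REF" reads_1.fq.gz reads_2.fq.gz > aligned.sam
```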
Building Your Own Index
If you need an index for a genome not listed above, you can build it yourself:
# BWA index
bwa index reference.fasta

# FASTA index (.fai) for random access to the reference (used by samtools and variant callers)
samtools faidx reference.fasta

# STAR index (for RNA-seq)
STAR --runMode genomeGenerate --genomeDir star_index/ --genomeFastaFiles reference.fasta --sjdbGTFfile annotations.gtf
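The shared collection names its index directory after the tool and version (bwa_mem_0.7.17-r1188). A private index can follow the same convention so the version stays visible in the path. The sketch below is one possible way to do that; build_bwa_index is a hypothetical helper, and it assumes that bwa prints a "Version:" line in its usage text when run without arguments, as the 0.7.x releases do.

```shell
# Hypothetical helper: build a BWA index in <out_root>/bwa_mem_<version>/,
# mirroring the naming used in the shared collection. Prints the index directory.
build_bwa_index() {
    fasta="$1"      # path to the reference FASTA
    out_root="$2"   # directory under which the index directory is created

    if ! command -v bwa >/dev/null 2>&1; then
        echo "bwa not found on PATH" >&2
        return 127
    fi

    # Assumption: the usage text contains a line like "Version: 0.7.17-r1188".
    # bwa exits non-zero when run without arguments, hence the "|| true".
    version=$( { bwa 2>&1 || true; } | awk '/^Version:/ {print $2}')
    if [ -z "$version" ]; then
        echo "could not determine bwa version" >&2
        return 1
    fi

    out_dir="$out_root/bwa_mem_$version"
    mkdir -p "$out_dir" || return 1

    # Link the FASTA into the index directory so FASTA and index share one path.
    fasta_abs="$(cd "$(dirname "$fasta")" && pwd)/$(basename "$fasta")"
    ln -sf "$fasta_abs" "$out_dir/$(basename "$fasta")" || return 1

    bwa index "$out_dir/$(basename "$fasta")" || return 1
    echo "$out_dir"
}

# Usage:
#   build_bwa_index reference.fasta "$HOME/indexes"
```

Symlinking the FASTA avoids a second copy of the sequence while keeping the reference path and index files together.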
Request addition: If you think a genome or index should be added to the shared collection, email it-support@abi.am with:
- Organism and assembly version
- Download source (e.g., NCBI, Ensembl, UCSC)
- Which indexes you need (BWA, Bowtie2, STAR, etc.)
Other Shared Databases
TODO: List any other shared databases available at ABI. Examples might include:
| Database | Description | Path |
|---|---|---|
| *TODO: e.g., BLAST NT/NR* | *NCBI nucleotide/protein databases* | *TODO: /mnt/nas1/db/blast/* |
| *TODO: e.g., Kraken2 DB* | *Taxonomic classification database* | *TODO: /mnt/nas1/db/kraken2/* |
| *TODO: e.g., dbSNP* | *Known human variants* | *TODO* |
| *TODO* | *Add more as needed* | *TODO* |
Directory Structure
TODO: Run ls -la /mnt/nas1/db/ on the server and paste the top-level structure here. Example:
/mnt/nas1/db/
    genomes/
        homo_sapiens/
            GRCh38.p14/
                bwa_mem_0.7.17-r1188/
        TODO: other organisms
    TODO: other database directories
Best Practices
- Do not copy reference data to your home or project directory. Use the shared paths directly to save disk space.
- Always use absolute paths to reference data in your scripts, so they work from any directory.
- Check the version of the reference genome and index before starting an analysis. Mixing versions (for example, aligning against one assembly and annotating with another) can cause coordinate mismatches or tool errors.
- Document which reference you used in your project notes for reproducibility.
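Two of the practices above, using absolute paths and documenting the reference, can be combined in a small helper that a script calls before it starts. This is a sketch only; record_reference and the notes file format are assumptions made for illustration, not an existing ABI convention.

```shell
# Hypothetical helper: refuse relative or unreadable reference paths, then
# append the date and reference path to a project notes file (tab-separated).
record_reference() {
    ref="$1"     # absolute path to the reference FASTA
    notes="$2"   # notes file to append to

    case "$ref" in
        /*) ;;   # absolute path, accepted
        *)
            echo "error: reference path must be absolute: $ref" >&2
            return 1
            ;;
    esac

    if [ ! -r "$ref" ]; then
        echo "error: reference not readable: $ref" >&2
        return 1
    fi

    printf '%s\treference\t%s\n' "$(date +%Y-%m-%d)" "$ref" >> "$notes"
}

# Usage:
#   record_reference "$REF" project_notes.tsv || exit 1
```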
See Also
- Pipelines – workflows that use these reference data
- Alignment Scripts – examples using BWA with the shared references
- Software – tools for working with genomic data
