Databases & Reference Data

ABI maintains a collection of reference genomes, indexes, and shared databases for use in bioinformatics analyses. These are stored on shared storage and are read-only for regular users.

Location

All shared databases are located under:

/mnt/nas1/db/

This volume is served from the NAS (nas1:/znas1/abi/collections/db) and has approximately 32 TB of total space (~1.8 TB currently used).
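To confirm from a given node that the volume is mounted and behaves as read-only for your account, a check along the following lines may help. This is a minimal sketch: `db_access_report` is a hypothetical helper name, not an existing ABI tool, and it assumes only the mount point given above.

```shell
# Hypothetical helper: report whether a database directory is reachable
# and whether the current user can write to it (regular users should not).
db_access_report() {
    local dir="${1:-/mnt/nas1/db}"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    if [ -w "$dir" ]; then
        echo "writable: $dir"
    else
        echo "read-only: $dir"
    fi
}

# Example calls (run on a node where the NAS is mounted):
#   db_access_report /mnt/nas1/db
#   df -h /mnt/nas1/db    # total, used, and available space on the volume
```

A "missing" result usually means the NAS is not mounted on that node, which is worth reporting before starting a long job.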

Reference Genomes

Pre-built reference genomes and their indexes are stored at:

/mnt/nas1/db/genomes/

Available Genomes

TODO: Fill in the table below with the actual contents of /mnt/nas1/db/genomes/. Run ls /mnt/nas1/db/genomes/ to get the full list.
| Organism | Assembly | Path | Indexes Available |
|----------|----------|------|-------------------|
| Human | GRCh38.p14 | /mnt/nas1/db/genomes/homo_sapiens/GRCh38.p14/ | BWA, *TODO: others?* |
| *TODO* | *TODO* | *TODO* | *TODO* |

BWA Indexes

BWA indexes for the human reference genome are located at:

/mnt/nas1/db/genomes/homo_sapiens/GRCh38.p14/bwa_mem_0.7.17-r1188/

This directory contains:

  • GCF_000001405.40_GRCh38.p14_genomic.fna – the reference FASTA
  • .amb, .ann, .bwt, .pac, .sa – BWA index files

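Important: bwa mem locates the index by the path you pass as the reference, so the index files must sit in the same directory as the FASTA file (.fna/.fa) and carry its full file name as their prefix (for example, GCF_000001405.40_GRCh38.p14_genomic.fna.bwt).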
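Before launching a long alignment, it can be worth confirming that all five index files are present for the prefix you intend to use. The following is a minimal sketch for classic BWA (not bwa-mem2, which uses different file extensions); `check_bwa_index` is a hypothetical helper name.

```shell
# Hypothetical helper: verify that every classic BWA index file exists
# and is non-empty for the given prefix (normally the FASTA path).
check_bwa_index() {
    local prefix="$1"
    local missing=0
    local ext
    for ext in amb ann bwt pac sa; do
        if [ ! -s "${prefix}.${ext}" ]; then
            echo "missing or empty: ${prefix}.${ext}" >&2
            missing=1
        fi
    done
    # Exit status 0 if all files were found, 1 otherwise
    return "$missing"
}

# Example call:
#   check_bwa_index /mnt/nas1/db/genomes/homo_sapiens/GRCh38.p14/bwa_mem_0.7.17-r1188/GCF_000001405.40_GRCh38.p14_genomic.fna
```

The function prints nothing on success, so it can be placed at the top of a job script as `check_bwa_index "$REF" || exit 1`.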

Usage example:

REF="/mnt/nas1/db/genomes/homo_sapiens/GRCh38.p14/bwa_mem_0.7.17-r1188/GCF_000001405.40_GRCh38.p14_genomic.fna"
# Align paired-end reads with 8 threads, then coordinate-sort the output into a BAM file
bwa mem -t 8 "$REF" reads_1.fq.gz reads_2.fq.gz | samtools sort -o aligned.sorted.bam -

Building Your Own Index

If you need an index for a genome not listed above, you can build it yourself:

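# BWA index (writes .amb, .ann, .bwt, .pac, .sa next to the FASTA)
bwa index reference.fasta

# FASTA index (.fai) for random access to the reference; used by samtools and most variant callers
samtools faidx reference.fasta

# STAR index (for RNA-seq); create the output directory first, as older STAR releases require it
mkdir -p star_index
STAR --runMode genomeGenerate --genomeDir star_index/ --genomeFastaFiles reference.fasta --sjdbGTFfile annotations.gtf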

Request addition: If you think a genome or index should be added to the shared collection, email it-support@abi.am with:

  • Organism and assembly version
  • Download source (e.g., NCBI, Ensembl, UCSC)
  • Which indexes you need (BWA, Bowtie2, STAR, etc.)

Other Shared Databases

TODO: List any other shared databases available at ABI. Examples might include:
| Database | Description | Path |
|----------|-------------|------|
| *TODO: e.g., BLAST NT/NR* | *NCBI nucleotide/protein databases* | *TODO: /mnt/nas1/db/blast/* |
| *TODO: e.g., Kraken2 DB* | *Taxonomic classification database* | *TODO: /mnt/nas1/db/kraken2/* |
| *TODO: e.g., dbSNP* | *Known human variants* | *TODO* |
| *TODO* | *Add more as needed* | *TODO* |

Directory Structure

TODO: Run ls -la /mnt/nas1/db/ on the server and paste the top-level structure here. Example:
/mnt/nas1/db/
  genomes/
    homo_sapiens/
      GRCh38.p14/
        bwa_mem_0.7.17-r1188/
    TODO: other organisms
  TODO: other database directories
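
To see which organisms and assemblies are actually present, the tree can be listed two levels deep. The sketch below assumes the genomes/<organism>/<assembly>/ layout shown in the example above holds for every entry; `list_assemblies` is a hypothetical helper name.

```shell
# Hypothetical helper: list organism/assembly pairs under a genomes root,
# assuming the <organism>/<assembly>/ layout.
list_assemblies() {
    local root="${1:-/mnt/nas1/db/genomes}"
    root="${root%/}"   # drop a trailing slash so prefix stripping works
    find "$root" -mindepth 2 -maxdepth 2 -type d | sort | while IFS= read -r dir; do
        # Strip the root prefix so only organism/assembly remains
        echo "${dir#"$root"/}"
    done
}

# Example call:
#   list_assemblies /mnt/nas1/db/genomes
```

Each output line has the form organism/assembly, which maps directly onto the first two columns of the Available Genomes table.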

Best Practices

  • Do not copy reference data to your home or project directory. Use the shared paths directly to save disk space.
  • Always use absolute paths to reference data in your scripts, so they work from any directory.
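  • Check the assembly version of the reference genome and the tool version of the index before starting an analysis. Mixing assemblies (for example, GRCh37 and GRCh38) changes coordinates and contig names, and an index built by an incompatible tool version may fail to load.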
  • Document which reference you used in your project notes for reproducibility.
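
The absolute-path and documentation practices above can be combined in a small step at the start of a job script. This is a sketch only: `record_reference` and the default file name `reference_used.txt` are hypothetical, and the fields written are one possible choice rather than an ABI convention.

```shell
# Hypothetical helper: append details of the reference used to a notes file.
record_reference() {
    local ref="$1"
    local notes="${2:-reference_used.txt}"
    # Refuse relative paths so the note stays valid from any directory
    case "$ref" in
        /*) ;;
        *) echo "error: use an absolute path: $ref" >&2; return 1 ;;
    esac
    [ -r "$ref" ] || { echo "error: cannot read $ref" >&2; return 1; }
    {
        echo "date: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
        echo "reference: $ref"
        echo "size_bytes: $(wc -c < "$ref" | tr -d ' ')"
        echo "first_header: $(head -n 1 "$ref")"
    } >> "$notes"
}

# Example call:
#   record_reference "$REF" project_notes/reference_used.txt
```

Recording the first FASTA header alongside the path makes it easier to notice later if a path was repointed to a different assembly.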

See Also

  • Pipelines – workflows that use these reference data
  • Alignment Scripts – examples using BWA with the shared references
  • Software – tools for working with genomic data