Databases & Reference Data

ABI maintains a collection of reference genomes, indexes, and shared databases for use in bioinformatics analyses. These are stored on shared storage and are read-only for regular users.

Location

All shared databases are located under:

/mnt/nas1/db/

This volume is served from the NAS (nas1:/znas1/abi/collections/db) and has approximately 32 TB of total space (~1.8 TB currently used).
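The usage figures above change over time. As a minimal sketch, the current numbers can be read from any machine that mounts the volume; the function name db_usage is a hypothetical helper, not an existing ABI tool.

```shell
# Hypothetical helper: report size, used, and available space for the
# filesystem that holds the given path (defaults to the shared db mount).
db_usage() {
    local path="${1:-/mnt/nas1/db}"
    df -h -- "$path"
}
```

Calling db_usage with no arguments reports on /mnt/nas1/db; pass another path to check a different mount.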

Reference Genomes

Pre-built reference genomes and their indexes are stored at:

/mnt/nas1/db/genomes/

Available Genomes

TODO: Fill in the table below with the actual contents of /mnt/nas1/db/genomes/. Run ls /mnt/nas1/db/genomes/ to get the full list.
Organism | Assembly | Path | Indexes Available
Human | GRCh38.p14 | /mnt/nas1/db/genomes/homo_sapiens/GRCh38.p14/ | BWA, *TODO: others?*
*TODO* | *TODO* | *TODO* | *TODO*
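Until the table is filled in, the available genomes can be listed directly. The sketch below assumes the organism/assembly layout implied by the human GRCh38.p14 path; other directories may be organised differently, and list_assemblies is a hypothetical helper name.

```shell
# Hypothetical helper: print every <organism>/<assembly> directory found
# under the genomes root, one per line.
# Assumes a two-level organism/assembly layout (an inference from the
# human GRCh38.p14 path, not a documented guarantee).
list_assemblies() {
    local root="${1:-/mnt/nas1/db/genomes}"
    local dir
    root="${root%/}"                      # tolerate a trailing slash
    for dir in "$root"/*/*/; do
        [ -d "$dir" ] || continue         # skip the unexpanded glob if nothing matches
        dir="${dir%/}"
        printf '%s\n' "${dir#"$root"/}"   # strip the root prefix
    done
}
```

With no argument it scans /mnt/nas1/db/genomes; for the entry documented above it would print homo_sapiens/GRCh38.p14.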

BWA Indexes

BWA indexes for the human reference genome are located at:

/mnt/nas1/db/genomes/homo_sapiens/GRCh38.p14/bwa_mem_0.7.17-r1188/

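This directory contains the reference FASTA together with the BWA index files built from it. bwa index writes these files next to the FASTA, using the suffixes .amb, .ann, .bwt, .pac, and .sa.

Important: Pass the path of the FASTA file (.fna/.fa) as the reference argument. BWA uses that path as the prefix for locating its index files, so the index files must stay in the same directory as the FASTA and share its base name.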

Usage example:

REF="/mnt/nas1/db/genomes/homo_sapiens/GRCh38.p14/bwa_mem_0.7.17-r1188/GCF_000001405.40_GRCh38.p14_genomic.fna"
bwa mem -t 8 "$REF" reads_1.fq.gz reads_2.fq.gz | samtools sort -o aligned.sorted.bam -
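Because BWA finds its index by appending suffixes to the reference path, a quick pre-flight check can catch a wrong path before a long alignment job starts. The following is a minimal sketch; check_bwa_index is a hypothetical helper name, and the suffix list is the set of files that bwa index produces.

```shell
# Hypothetical helper: verify that all five BWA index files exist and are
# non-empty next to the given reference path.
# Returns 0 if the index looks complete, 1 otherwise.
check_bwa_index() {
    local ref="$1" ext missing=0
    for ext in amb ann bwt pac sa; do
        if [ ! -s "$ref.$ext" ]; then
            echo "missing BWA index file: $ref.$ext" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

A typical use is to guard the alignment step, for example: check_bwa_index "$REF" && bwa mem -t 8 "$REF" reads_1.fq.gz reads_2.fq.gz > aligned.sam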

Building Your Own Index

If you need an index for a genome not listed above, you can build it yourself:

# BWA index
bwa index reference.fasta
 
# FASTA index (.fai), used by samtools and many downstream tools for random access to the reference
samtools faidx reference.fasta
 
# STAR index (for RNA-seq)
STAR --runMode genomeGenerate --genomeDir star_index/ --genomeFastaFiles reference.fasta --sjdbGTFfile annotations.gtf
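The shared collection is read-only for regular users, and both bwa index and samtools faidx write their output next to the input FASTA, so the commands above need a writable location. One possible approach, sketched below, is to symlink the FASTA into a directory you own and index the link. The helper name stage_reference and the example paths are hypothetical.

```shell
# Hypothetical helper: symlink a FASTA into a writable working directory and
# print the path of the link, so index files are written there rather than
# next to the original file.
stage_reference() {
    local fasta="$1" outdir="$2" target
    if [ ! -f "$fasta" ]; then
        echo "FASTA not found: $fasta" >&2
        return 1
    fi
    mkdir -p "$outdir" || return 1
    target="$outdir/$(basename "$fasta")"
    ln -sf "$(realpath "$fasta")" "$target" || return 1
    printf '%s\n' "$target"
}
```

For example, REF="$(stage_reference /path/to/reference.fasta "$HOME/refs/my_genome")" followed by bwa index "$REF" and samtools faidx "$REF" leaves all index files in the working directory.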

Request addition: If you think a genome or index should be added to the shared collection, email it-support@abi.am with:


Other Shared Databases

TODO: List any other shared databases available at ABI. Examples might include:
Database | Description | Path
*TODO: e.g., BLAST NT/NR* | *NCBI nucleotide/protein databases* | *TODO: /mnt/nas1/db/blast/*
*TODO: e.g., Kraken2 DB* | *Taxonomic classification database* | *TODO: /mnt/nas1/db/kraken2/*
*TODO: e.g., dbSNP* | *Known human variants* | *TODO*
*TODO* | *Add more as needed* | *TODO*

Directory Structure

TODO: Run ls -la /mnt/nas1/db/ on the server and paste the top-level structure here. Example:
/mnt/nas1/db/
  genomes/
    homo_sapiens/
      GRCh38.p14/
        bwa_mem_0.7.17-r1188/
    TODO: other organisms
  TODO: other database directories

Best Practices


See Also