SraMetaDB#

Add to module run order:
#BioModule biolockj.module.getData.sra.SraMetaDB

Description#

Makes sure that the SRAmetadb database exists locally, and downloads it if it does not.

Properties#

Properties are the name=value pairs in the configuration file.

SraMetaDB properties:#

Property Description
exe.gunzip executable
Path for the "gunzip" executable; if not supplied, any script that needs the gunzip command will assume it is on the PATH.
default: null
exe.wget executable
Path for the "wget" executable; if not supplied, any script that needs the wget command will assume it is on the PATH.
default: null
sra.forceDbUpdate boolean
Y/N: download a newer version if available.
default: N
sra.metaDataDir file path
Path to the directory where the SRAmetadb.sqlite database is stored.
default: null
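
As an illustration, the properties above could be set in a configuration file as sketched below. The file name sra_example.properties and the /data/sra_meta path are hypothetical placeholders, not values taken from BioLockJ; only the property names and the #BioModule line come from this page.

```shell
# Write a hypothetical BioLockJ config that lists only the SraMetaDB module.
# The file name and the metadata directory below are placeholders.
CONFIG_FILE="sra_example.properties"

cat > "$CONFIG_FILE" <<'EOF'
#BioModule biolockj.module.getData.sra.SraMetaDB

# Directory that holds (or will receive) SRAmetadb.sqlite; it must already exist.
sra.metaDataDir=/data/sra_meta
# N is the documented default; Y asks for a newer version if one is available.
sra.forceDbUpdate=N
# exe.wget and exe.gunzip are left unset, so both commands are taken from the PATH.
EOF
```

Leaving exe.wget and exe.gunzip unset relies on the documented fallback to the PATH; set them only if the executables live somewhere else.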

General properties applicable to this module:#

Property Description
cluster.batchCommand string
Terminal command used to submit jobs on the cluster
default: null
cluster.jobHeader string
Header written at top of worker scripts
default: null
cluster.modules list
List of cluster modules to load at start of worker scripts
default: null
cluster.prologue string
To run at the start of every script after loading cluster modules (if any)
default: null
cluster.statusCommand string
Terminal command used to check the status of jobs on the cluster
default: null
docker.saveContainerOnExit boolean
If Y, docker run command will NOT include the --rm flag
default: null
docker.verifyImage boolean
In check dependencies, run a test to verify the docker image.
default: null
script.defaultHeader string
Store default script header for MAIN script and locally run WORKER scripts.
default: #!/bin/bash
script.numThreads integer
Used to reserve cluster resources and passed to any external application call that accepts a numThreads parameter.
default: 8
script.numWorkers integer
Set number of samples to process per script (if parallel processing)
default: 1
script.permissions string
Used as chmod permission parameter (ex: 774)
default: 770
script.timeout integer
Sets the number of minutes before worker scripts time out.
default: null

Details#

version: 0.0.0

If sra.forceDbUpdate is set to Y, the zipped form of the database is downloaded and kept, and is used to compare the local version to the server version; the server version is downloaded if it is newer.

Server version location: https://starbuck1.s3.amazonaws.com/sradb/SRAmetadb.sqlite.gz

The directory given by sra.metaDataDir must already exist. If the database is not found at that location, it will be downloaded.
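
Because the directory has to exist before the module runs, it can be prepared ahead of time. The following is a minimal sketch; ./sra_meta is a placeholder and should be replaced with the value used for sra.metaDataDir.

```shell
# META_DIR is a placeholder; use the same path as sra.metaDataDir in the config.
META_DIR="./sra_meta"

# Create the directory (and any missing parents) if it is not there yet.
mkdir -p "$META_DIR"

# Report whether the database file is already in place.
if [ -f "$META_DIR/SRAmetadb.sqlite" ]; then
  echo "database found in $META_DIR"
else
  echo "no database in $META_DIR; the module would attempt a download"
fi
```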

The download process is somewhat error-prone, especially in docker. The download is about 4GB and the unzipped database is up to 30GB. It is generally recommended to download and unzip the database manually:

wget https://starbuck1.s3.amazonaws.com/sradb/SRAmetadb.sqlite.gz
gunzip SRAmetadb.sqlite.gz
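
Given the size of the download, a slightly more defensive version of the manual steps may be useful. The sketch below is one possible approach, not part of BioLockJ: the function names has_free_space and unpack_db are hypothetical, and the 35 GB figure is only a rough allowance for the 4GB archive plus the unzipped database. The download itself is left commented out.

```shell
# Sketch of a more defensive manual download. Function names are hypothetical.
DB_URL="https://starbuck1.s3.amazonaws.com/sradb/SRAmetadb.sqlite.gz"

# Succeeds only if the file system holding directory $1 has at least $2 GB free.
has_free_space() {
  avail_kb=$(df -Pk "$1" | awk 'NR==2 {print $4}')
  [ "$avail_kb" -ge $(( $2 * 1024 * 1024 )) ]
}

# Check the archive's integrity, then decompress it while keeping the .gz file.
unpack_db() {
  gzip -t "$1" || { echo "archive is corrupt or incomplete: $1" >&2; return 1; }
  gunzip -c "$1" > "${1%.gz}"
}

# Example usage (commented out because it transfers about 4GB):
# has_free_space . 35 || echo "not enough disk space" >&2
# wget -c "$DB_URL"              # -c resumes a partially completed download
# unpack_db SRAmetadb.sqlite.gz
```

Running gzip -t before unpacking catches a truncated download early, and writing the output with gunzip -c leaves the zipped file in place next to the unzipped database.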

Adds modules#

pre-requisite modules
none found
post-requisite modules
none found

Docker#

If running in docker, this module will run in a docker container from this image:

biolockjdevteam/blj_basic:v1.3.18

This can be modified using the following properties:
SraMetaDB.imageOwner
SraMetaDB.imageName
SraMetaDB.imageTag
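
As a sketch of how these overrides might look in a configuration file, the snippet below appends the three properties to a hypothetical file named docker_example.properties. It assumes the image reference is assembled as owner/name:tag, which is inferred from the property names rather than stated on this page; under that assumption the values shown simply spell out the default image.

```shell
# Append hypothetical Docker image overrides to a BioLockJ config file.
# The file name is a placeholder.
CONFIG_FILE="docker_example.properties"

cat >> "$CONFIG_FILE" <<'EOF'
# Assumed mapping: imageOwner/imageName:imageTag
# These values correspond to biolockjdevteam/blj_basic:v1.3.18.
SraMetaDB.imageOwner=biolockjdevteam
SraMetaDB.imageName=blj_basic
SraMetaDB.imageTag=v1.3.18
EOF
```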

Citation#

Module developed by Malcolm Zapatas and Ivory Blakley
BioLockJ v1.4.2