SraMetaDB#
Add to module run order:
#BioModule biolockj.module.getData.sra.SraMetaDB
Description#
Makes sure that the SRAmetadb database exists; downloads it if it does not already exist.
Properties#
Properties are the name=value
pairs in the configuration file.
SraMetaDB properties:#
Property | Description |
---|---|
exe.gunzip | executable Path for the "gunzip" executable; if not supplied, any script that needs the gunzip command will assume it is on the PATH. default: null |
exe.wget | executable Path for the "wget" executable; if not supplied, any script that needs the wget command will assume it is on the PATH. default: null |
sra.forceDbUpdate | boolean Y/N: download a newer version if available. default: N |
sra.metaDataDir | file path path to the directory where the SRAmetadb.sqlite database is stored. default: null |
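As a rough illustration, a configuration file that uses this module might contain lines like the following. The directory and executable paths are hypothetical placeholders; only the property names and the module line come from this page.

```properties
#BioModule biolockj.module.getData.sra.SraMetaDB

# Hypothetical path: an existing directory that holds (or will hold) SRAmetadb.sqlite
sra.metaDataDir=/path/to/sradb
# N is the default; set to Y to download a newer version if one is available
sra.forceDbUpdate=N
# Optional; if omitted, wget and gunzip are assumed to be on the PATH (paths below are hypothetical)
exe.wget=/usr/bin/wget
exe.gunzip=/usr/bin/gunzip
```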
General properties applicable to this module:#
Property | Description |
---|---|
cluster.batchCommand | string Terminal command used to submit jobs on the cluster default: null |
cluster.jobHeader | string Header written at top of worker scripts default: null |
cluster.modules | list List of cluster modules to load at start of worker scripts default: null |
cluster.prologue | string To run at the start of every script after loading cluster modules (if any) default: null |
cluster.statusCommand | string Terminal command used to check the status of jobs on the cluster default: null |
docker.saveContainerOnExit | boolean If Y, docker run command will NOT include the --rm flag default: null |
docker.verifyImage | boolean In check dependencies, run a test to verify the docker image. default: null |
script.defaultHeader | string Store default script header for MAIN script and locally run WORKER scripts. default: #!/bin/bash |
script.numThreads | integer Used to reserve cluster resources and passed to any external application call that accepts a numThreads parameter. default: 8 |
script.numWorkers | integer Set number of samples to process per script (if parallel processing) default: 1 |
script.permissions | string Used as chmod permission parameter (ex: 774) default: 770 |
script.timeout | integer Sets the number of minutes before worker scripts time out. default: null |
Details#
version: 0.0.0
If sra.forceDbUpdate is set to Y, the zipped form of the database is downloaded and kept, and it is used to compare the local version to the server version; the server version is downloaded if it is newer.
Server version location: https://starbuck1.s3.amazonaws.com/sradb/SRAmetadb.sqlite.gz
The sra.metaDataDir directory must exist. If the database does not exist at that location, it will be downloaded.
The download process is somewhat error-prone, especially in docker. The download is about 4GB and the unzipped database is up to 30GB. It is generally recommended to download and unzip the database manually:
wget https://starbuck1.s3.amazonaws.com/sradb/SRAmetadb.sqlite.gz
gunzip SRAmetadb.sqlite.gz
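The manual steps can also be wrapped in a small script that skips the download when the database is already in place. This is only a sketch, not the module's own logic: the function names `sra_db_missing` and `fetch_sra_db` are hypothetical, and using `wget -c` to resume an interrupted transfer is an assumption about what helps with a flaky download.

```shell
#!/bin/bash
# Sketch: fetch SRAmetadb.sqlite into an existing directory.
# SRA_DB_URL is the server location given on this page; function names are hypothetical.

SRA_DB_URL="https://starbuck1.s3.amazonaws.com/sradb/SRAmetadb.sqlite.gz"

# Succeeds (exit 0) when the unzipped database is missing from directory $1.
sra_db_missing() {
    [ ! -f "$1/SRAmetadb.sqlite" ]
}

# Download and unzip the database into directory $1, which must already exist.
fetch_sra_db() {
    local dir="$1"
    if [ ! -d "$dir" ]; then
        echo "Directory does not exist: $dir" >&2
        return 1
    fi
    if ! sra_db_missing "$dir"; then
        echo "SRAmetadb.sqlite already present in $dir"
        return 0
    fi
    # -c resumes a partial download; -P sets the directory the file is saved in.
    wget -c -P "$dir" "$SRA_DB_URL" || return 1
    # gunzip replaces SRAmetadb.sqlite.gz with SRAmetadb.sqlite.
    gunzip "$dir/SRAmetadb.sqlite.gz"
}
```

After sourcing the script, running `fetch_sra_db /path/to/sradb` (with a directory of your choosing) would populate the directory that sra.metaDataDir points to. Make sure the target file system has room for both the roughly 4GB archive and the unzipped database.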
Adds modules#
pre-requisite modules
none found
post-requisite modules
none found
Docker#
If running in docker, this module will run in a docker container from this image:
biolockjdevteam/blj_basic:v1.3.18
This can be modified using the following properties:
SraMetaDB.imageOwner
SraMetaDB.imageName
SraMetaDB.imageTag
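For example, the default image named above would presumably correspond to the settings below, assuming the three properties map onto the owner, name, and tag parts of the image reference. Change any of the values to point the module at a different image.

```properties
# Values taken from the default image biolockjdevteam/blj_basic:v1.3.18
# Assumption: owner/name:tag map onto these three properties in that order
SraMetaDB.imageOwner=biolockjdevteam
SraMetaDB.imageName=blj_basic
SraMetaDB.imageTag=v1.3.18
```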
Citation#
Module developed by Malcolm Zapatas and Ivory Blakley
BioLockJ v1.4.2