SraDownload#
Add to module run order:
#BioModule biolockj.module.getData.sra.SraDownload
Description#
SraDownload downloads Sequence Read Archive (SRA) files and compresses them to fastq.gz.
Properties#
Properties are the name=value pairs in the configuration file.
SraDownload properties:#
Property | Description |
---|---|
exe.fasterq-dump | executable Path for the "fasterq-dump" executable; if not supplied, any script that needs the fasterq-dump command will assume it is on the PATH. default: null |
sra.accessionIdColumn | string Specifies the metadata file column name containing SRA run ids default: null |
sra.destinationDir | file path Path to directory where downloaded files should be saved. If specified, it must exist. default: null |
sra.sraAccList | file path A file that has one SRA accession per line and nothing else. default: null |
sra.sraProjectId | list The project id(s) referencing a project in the NCBI SRA. example: SRP009633, ERP016051 default: null |
General properties applicable to this module:#
Property | Description |
---|---|
cluster.batchCommand | string Terminal command used to submit jobs on the cluster default: null |
cluster.jobHeader | string Header written at top of worker scripts default: null |
cluster.modules | list List of cluster modules to load at start of worker scripts default: null |
cluster.prologue | string To run at the start of every script after loading cluster modules (if any) default: null |
cluster.statusCommand | string Terminal command used to check the status of jobs on the cluster default: null |
docker.saveContainerOnExit | boolean If Y, docker run command will NOT include the --rm flag default: null |
docker.verifyImage | boolean In check dependencies, run a test to verify the docker image. default: null |
exe.gzip | executable Path for the "gzip" executable; if not supplied, any script that needs the gzip command will assume it is on the PATH. default: null |
metadata.filePath | file path If absolute file path, use file as metadata. If directory path, must find exactly 1 file within, to use as metadata. default: null |
script.defaultHeader | string Store default script header for MAIN script and locally run WORKER scripts. default: #!/bin/bash |
script.numThreads | integer Used to reserve cluster resources and passed to any external application call that accepts a numThreads parameter. default: 8 |
script.numWorkers | integer Set number of samples to process per script (if parallel processing) default: 1 |
script.permissions | string Used as chmod permission parameter (ex: 774) default: 770 |
script.timeout | integer Sets the number of minutes before worker scripts time out. default: null |
Details#
version: 1.0.0
Downloading and compressing files requires fasterq-dump and gzip.
The accessions to download can be specified using any ONE of the following:
1. A metadata file (given by metadata.filePath) that has the column named by sra.accessionIdColumn, OR
2. sra.sraProjectId, OR
3. sra.sraAccList
sra.destinationDir gives an external directory that can be shared across pipelines. This is recommended. If it is not specified, the files will be downloaded to this module's output directory.
Suggested: input.dirPaths = ${sra.destinationDir}
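As a rough illustration of the options above, a configuration file might look like the following sketch. The file paths and the column name `Run` are hypothetical placeholders, not values defined by BioLockJ; the project id is the example value from the property table. Only one of the three accession sources should be active at a time, so the alternatives are left commented out.

```
#BioModule biolockj.module.getData.sra.SraDownload

# Option 1: accessions come from a metadata column
# (hypothetical path and column name)
metadata.filePath=/path/to/metadata.tsv
sra.accessionIdColumn=Run

# Option 2: accessions come from one or more SRA project ids
# sra.sraProjectId=SRP009633

# Option 3: accessions come from a file with one accession per line
# (hypothetical path)
# sra.sraAccList=/path/to/accessions.txt

# Shared download location (hypothetical path; the directory must already exist)
sra.destinationDir=/path/to/shared/sra
input.dirPaths=${sra.destinationDir}
```

Pointing input.dirPaths at the shared directory follows the suggestion above, so later pipelines can reuse files that have already been downloaded.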
Typically, BioLockJ will automatically determine modules to add to the pipeline to process sequence data. If the files are not present on the system when the pipeline starts, then it is up to the user to configure any and all sequence processing modules.
Adds modules#
pre-requisite modules
none found
post-requisite modules
none found
Docker#
If running in docker, this module will run in a docker container from this image:
biolockjdevteam/sratoolkit:v1.3.18
This can be modified using the following properties:
SraDownload.imageOwner
SraDownload.imageName
SraDownload.imageTag
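For example, assuming the three properties map onto the owner/name:tag parts of the image listed above, the default image would correspond to a sketch like this:

```
# Assumed mapping: imageOwner/imageName:imageTag
SraDownload.imageOwner=biolockjdevteam
SraDownload.imageName=sratoolkit
SraDownload.imageTag=v1.3.18
```

Changing any one of these values would select a different image for this module only.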
Citation#
sra-tools
Module developed by Philip Badzuh.