RarefySeqs#
Add to module run order:
#BioModule biolockj.module.seq.RarefySeqs
Description#
Randomly sub-sample sequences to reduce all samples to the configured maximum.
Properties#
Properties are the name=value pairs in the configuration file.
RarefySeqs properties:#
Property | Type | Description | Default |
---|---|---|---|
rarefySeqs.max | numeric | Randomly select this number of sequences to keep in each sample | null |
rarefySeqs.min | numeric | Discard samples with fewer than this minimum number of sequences | 1 |
General properties applicable to this module:#
Property | Type | Description | Default |
---|---|---|---|
cluster.batchCommand | string | Terminal command used to submit jobs on the cluster | null |
cluster.jobHeader | string | Header written at top of worker scripts | null |
cluster.modules | list | List of cluster modules to load at start of worker scripts | null |
cluster.prologue | string | To run at the start of every script after loading cluster modules (if any) | null |
cluster.statusCommand | string | Terminal command used to check the status of jobs on the cluster | null |
docker.saveContainerOnExit | boolean | If Y, the docker run command will NOT include the --rm flag | null |
docker.verifyImage | boolean | In check dependencies, run a test to verify the docker image | null |
pipeline.defaultSeqMerger | string | Java class name of the default module used to combine paired read files | biolockj.module.seq.PearMergeReads |
script.defaultHeader | string | Default script header for the MAIN script and locally run WORKER scripts | #!/bin/bash |
script.numThreads | integer | Used to reserve cluster resources and passed to any external application call that accepts a numThreads parameter | 8 |
script.numWorkers | integer | Number of samples to process per script (if parallel processing) | 1 |
script.permissions | string | Used as the chmod permission parameter (ex: 774) | 770 |
script.timeout | integer | Number of minutes before worker scripts time out | null |
Details#
version: 0.0.0
Randomly sub-sample sequences to reduce all samples to the configured maximum (rarefySeqs.max). Samples with fewer than the minimum number of reads (rarefySeqs.min) are discarded.
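The sub-sampling described above can be illustrated with a minimal Python sketch. This is not the module's actual implementation (BioLockJ modules are Java classes). The function name `rarefy`, its parameters, and the in-memory dictionary of samples are hypothetical. The sketch also makes two assumptions: sampling is done without replacement, and samples whose read count falls between the minimum and the maximum are kept unchanged.

```python
import random


def rarefy(samples, max_reads, min_reads=1, seed=None):
    """Hypothetical illustration of rarefaction; not BioLockJ code.

    samples:   dict mapping sample ID -> list of sequences
    max_reads: stands in for rarefySeqs.max (None means no sub-sampling)
    min_reads: stands in for rarefySeqs.min
    """
    rng = random.Random(seed)  # a seed makes the random selection repeatable
    kept = {}
    for sample_id, seqs in samples.items():
        # Discard samples with fewer reads than the minimum.
        if len(seqs) < min_reads:
            continue
        if max_reads is not None and len(seqs) > max_reads:
            # Randomly select max_reads sequences without replacement.
            kept[sample_id] = rng.sample(seqs, max_reads)
        else:
            # Assumption: samples already at or below the maximum are kept as-is.
            kept[sample_id] = list(seqs)
    return kept


example = {
    "A": ["seq%d" % i for i in range(10)],  # above the maximum: sub-sampled
    "B": ["s1", "s2", "s3"],                # between min and max: kept unchanged
    "C": ["s1"],                            # below the minimum: discarded
}
result = rarefy(example, max_reads=5, min_reads=2, seed=42)
print({sample_id: len(seqs) for sample_id, seqs in result.items()})
```

In this sketch, sample A is reduced to 5 sequences, B keeps its 3 sequences, and C is dropped, so every retained sample ends up with at most `max_reads` sequences.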
This module will add biolockj.module.implicit.RegisterNumReads if there is not already a module to count starting reads per sample.
If the input data are paired reads, this module will add a sequence merger, based on property pipeline.defaultSeqMerger (currently: biolockj.module.seq.PearMergeReads).
Adds modules#
pre-requisite modules: pipeline-dependent
post-requisite modules: none found
Docker#
If running in docker, this module will run in a docker container from this image:
biolockjdevteam/biolockj_controller:v1.4.2
This can be modified using the following properties:
RarefySeqs.imageOwner
RarefySeqs.imageName
RarefySeqs.imageTag
Citation#
Module developed by Mike Sioda
BioLockJ v1.4.2