RarefySeqs#
Add to module run order:                  
#BioModule biolockj.module.seq.RarefySeqs
Description#
Randomly sub-sample sequences to reduce all samples to the configured maximum.
Properties#
Properties are the name=value pairs in the configuration file.                   
RarefySeqs properties:#
| Property | Type | Description | Default |
|---|---|---|---|
| rarefySeqs.max | numeric | Randomly select this number of sequences to keep in each sample | null |
| rarefySeqs.min | numeric | Discard samples with fewer than this minimum number of sequences | 1 |
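As a minimal sketch of how these two properties might appear in a configuration file, the fragment below adds the module to the run order and sets both limits. The values 10000 and 1000 are illustrative assumptions, not recommended settings.

```
#BioModule biolockj.module.seq.RarefySeqs

# Illustrative values only (assumed for this example)
rarefySeqs.max=10000
rarefySeqs.min=1000
```

With settings like these, samples holding fewer than 1000 sequences would be discarded and larger samples would be sub-sampled to 10000 sequences.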
General properties applicable to this module:#
| Property | Type | Description | Default |
|---|---|---|---|
| cluster.batchCommand | string | Terminal command used to submit jobs on the cluster | null |
| cluster.jobHeader | string | Header written at the top of worker scripts | null |
| cluster.modules | list | List of cluster modules to load at the start of worker scripts | null |
| cluster.prologue | string | Run at the start of every script after loading cluster modules (if any) | null |
| cluster.statusCommand | string | Terminal command used to check the status of jobs on the cluster | null |
| docker.saveContainerOnExit | boolean | If Y, the docker run command will NOT include the --rm flag | null |
| docker.verifyImage | boolean | During the dependency check, run a test to verify the docker image | null |
| pipeline.defaultSeqMerger | string | Java class name of the default module used to combine paired read files | biolockj.module.seq.PearMergeReads |
| script.defaultHeader | string | Default script header for the MAIN script and locally run WORKER scripts | #!/bin/bash |
| script.numThreads | integer | Used to reserve cluster resources and passed to any external application call that accepts a numThreads parameter | 8 |
| script.numWorkers | integer | Number of samples to process per script (if parallel processing) | 1 |
| script.permissions | string | Used as the chmod permission parameter (ex: 774) | 770 |
| script.timeout | integer | Number of minutes before worker scripts time out | null |
Details#
version: 0.0.0 
Randomly sub-sample sequences to reduce all samples to the configured maximum, rarefySeqs.max. Samples with fewer than the minimum number of reads, rarefySeqs.min, are discarded.
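The behavior described above can be illustrated with a short Python sketch. This is not the module's implementation. The function name `rarefy`, its parameters, and the in-memory dict of samples are hypothetical. The sketch also assumes that sampling is done without replacement and that samples holding between the minimum and the maximum number of sequences are kept unchanged, which the description does not state.

```python
import random


def rarefy(samples, max_seqs, min_seqs=1, seed=None):
    """Return a new dict of sample ID -> sequences after rarefaction.

    samples  : dict mapping a sample ID to a list of sequences
    max_seqs : stand-in for rarefySeqs.max (None means no upper limit)
    min_seqs : stand-in for rarefySeqs.min
    seed     : optional seed so the random selection is repeatable
    """
    rng = random.Random(seed)
    rarefied = {}
    for sample_id, seqs in samples.items():
        # Discard samples with fewer than the minimum number of sequences.
        if len(seqs) < min_seqs:
            continue
        # Sub-sample without replacement (an assumption) when over the maximum.
        if max_seqs is not None and len(seqs) > max_seqs:
            kept = rng.sample(seqs, max_seqs)
        else:
            # Assumed: samples already at or below the maximum are kept as-is.
            kept = list(seqs)
        rarefied[sample_id] = kept
    return rarefied


if __name__ == "__main__":
    demo = {
        "A": ["seq%d" % i for i in range(10)],
        "B": ["x", "y", "z"],
        "C": ["q"],
    }
    result = rarefy(demo, max_seqs=5, min_seqs=2, seed=0)
    for sample_id, seqs in result.items():
        print(sample_id, len(seqs))
```

In the demo, sample C falls below the minimum and is dropped, sample A is reduced to five sequences, and sample B passes through untouched under the stated assumption.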
This module will add biolockj.module.implicit.RegisterNumReads if there is not already a module to count starting reads per sample.
If the input data are paired reads, this module will add a sequence merger, based on property pipeline.defaultSeqMerger (currently: biolockj.module.seq.PearMergeReads).
Adds modules#
pre-requisite modules: pipeline-dependent
post-requisite modules: none found
Docker#
If running in docker, this module will run in a docker container from this image:
biolockjdevteam/biolockj_controller:v1.4.2
This can be modified using the following properties:
RarefySeqs.imageOwner
RarefySeqs.imageName
RarefySeqs.imageTag
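As a hedged illustration, the fragment below sets the three properties to values that would reproduce the default image named above. It assumes the usual Docker naming scheme of owner/name:tag, so the mapping of each value to each property is an assumption rather than something stated here.

```
# Assumed mapping: owner/name:tag = biolockjdevteam/biolockj_controller:v1.4.2
RarefySeqs.imageOwner=biolockjdevteam
RarefySeqs.imageName=biolockj_controller
RarefySeqs.imageTag=v1.4.2
```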
Citation#
Module developed by Mike Sioda                 
BioLockJ v1.4.2