Demultiplexer#
Add to module run order:
#BioModule biolockj.module.implicit.Demultiplexer
Description#
Demultiplex samples into separate files for each sample.
Properties#
Properties are the name=value
pairs in the configuration file.
Demultiplexer properties:#
Property | Description |
---|---|
demultiplexer.barcodeCutoff | numeric Options: (0.0 - 1.0); if defined, pipeline will fail if the percentage of reads with a barcode is less than this cutoff. -> (DeuxUtil) default: 0.05 |
demultiplexer.barcodeRevComp | boolean Options: Y/N. Use reverse compliment of metadata.barcodeColumn if demultimplexer.strategy = barcode_in_header or barcode_in_seq. -> (DeuxUtil) default: null |
demultiplexer.strategy | string Options: barcode_in_header, barcode_in_seq, id_in_header, do_not_demux.If using barcodes, they must be provided in the metadata file within column defined by metadata.barcodeColumn. -> (DeuxUtil) default: null |
metadata.barcodeColumn | string metadata column with identifying barcodes -> Values must be unique. default: BarcodeSequence |
metadata.filePath | file path If absolute file path, use file as metadata. If directory path, must find exactly 1 file within, to use as metadata. -> Used for matching sample id to barcodes. default: null |
General properties applicable to this module:#
Property | Description |
---|---|
cluster.batchCommand | string Terminal command used to submit jobs on the cluster default: null |
cluster.jobHeader | string Header written at top of worker scripts default: null |
cluster.modules | list List of cluster modules to load at start of worker scripts default: null |
cluster.prologue | string To run at the start of every script after loading cluster modules (if any) default: null |
cluster.statusCommand | string Terminal command used to check the status of jobs on the cluster default: null |
demultiplexer.barcodeCutoff | numeric Options: (0.0 - 1.0); if defined, pipeline will fail if the percentage of reads with a barcode is less than this cutoff. -> (DeuxUtil) default: 0.05 |
demultiplexer.barcodeRevComp | boolean Options: Y/N. Use reverse compliment of metadata.barcodeColumn if demultimplexer.strategy = barcode_in_header or barcode_in_seq. -> (DeuxUtil) default: null |
demultiplexer.strategy | string Options: barcode_in_header, barcode_in_seq, id_in_header, do_not_demux.If using barcodes, they must be provided in the metadata file within column defined by metadata.barcodeColumn. -> (DeuxUtil) default: null |
docker.saveContainerOnExit | boolean If Y, docker run command will NOT include the --rm flag default: null |
docker.verifyImage | boolean In check dependencies, run a test to verify the docker image. default: null |
metadata.barcodeColumn | string metadata column with identifying barcodes -> Values must be unique. default: BarcodeSequence |
metadata.filePath | file path If absolute file path, use file as metadata. If directory path, must find exactly 1 file within, to use as metadata. -> Used for matching sample id to barcodes. default: null |
pipeline.defaultDemultiplexer | string Java class name for default module used to demultiplex data default: biolockj.module.implicit.Demultiplexer |
script.defaultHeader | string Store default script header for MAIN script and locally run WORKER scripts. default: #!/bin/bash |
script.numThreads | integer Used to reserve cluster resources and passed to any external application call that accepts a numThreads parameter. default: 8 |
script.numWorkers | integer Set number of samples to process per script (if parallel processing) default: 1 |
script.permissions | string Used as chmod permission parameter (ex: 774) default: 770 |
script.timeout | integer Sets # of minutes before worker scripts times out. default: null |
Details#
version: 0.0.0
When BioLockJ detects that the input is multiplexed data, BioLockJ automatically adds a Demultiplexer as the 2nd module, using the class path supplied via the pipeline.defaultDemultiplexer
property. (ImportMetadata is added as the first module.)
This Demultiplexer requires that the sequence headers contain either the Sample ID or an identifying barcode. Optionally, the barcode can be contained in the sequence itself. If your data does not conform to one of the following scenarios you will need to pre-process your sequence data to conform to a valid format.
If samples are not identified by sample ID in the sequence headers:#
- Set demux.strategy=id_in_header
- Set input.trimPrefix to a character string that precedes the sample ID for all samples.
- Set input.trimSuffix to a character string that comes after the sample ID for all samples.
Sample IDs = mbs1, mbs2, mbs3, mbs4
Scenario 1: Your multiplexed files include Sample IDs in the fastq sequence headers
@mbs1_134_M01825:384:000000000-BCYPK:1:2106:23543:1336 1:N:0
@mbs2_12_M02825:384:000000000-BCYPK:1:1322:23543:1336 1:N:0
@mbs3_551_M03825:384:000000000-BCYPK:1:1123:23543:1336 1:N:0
@mbs4_1234_M04825:384:000000000-BCYPK:1:9872:23543:1336 1:N:0
Required Config
+ input.trimPrefix=@
+ input.trimSuffix=_
All characters before (and including) the 1st "@" in the sequence header are trimmed
All characters after (and including) the 1st "_" in the sequence header are trimmed
If samples are identified by barcode (in the header or sequence):#
- Set demux.strategy=barcode_in_header or demux.strategy=barcode_in_seq
- Set metadata.filePath to metadata file path.
- Set metadata.barcodeColumn to the barcode column name.
- If the metadata barcodes are listed as reverse compliments, set demultiplexer.barcodeRevComp=Y.
The metadata file must be prepared by adding a unique sequence barcode in the metadata.barcodeColumn column. This information is often available in a mapping file provided by the sequencing center that produced the raw data.
Metadata file
ID | BarcodeColumn |
---|---|
mbs1 | GAGGCATGACTGGATA |
mbs2 | NAGGCATATTTGCACA |
mbs3 | GACCCATGACTGCATA |
mbs4 | TACCCAGCACCGCTTA |
Scenario 2: Your multiplexed files include a barcode in the headers
@M01825:384:000000000-BCYPK:1:2106:23543:1336 1:N:0:GAGGCATGACTGGATA
@M01825:384:000000000-BCYPK:1:1322:23543:1336 1:N:0:NAGGCATATTTGCACA
@M01825:384:000000000-BCYPK:1:1123:23543:1336 1:N:0:GACCCATGACTGCATA
@M01825:384:000000000-BCYPK:1:9872:23543:1336 1:N:0:TACCCAGCACCGCTTA
Required Config
+ demux.strategy=barcode_in_header
+ metadata.barcodeColumn=BarcodeColumn
+ metadata.filePath=
Scenario 3: Your multiplexed files include a barcode in the sequences
>M01825:384:000000000-BCYPK:1:2106:23543:1336 1:N:0:
GAGGCATGACTGGATATATACATACTGAGGCATGACTACTTACTATAAGGCTTACTGACTGGTTACTGACTGGGAGGCATGACTACTTACTATAA
>M01825:384:000000000-BCYPK:1:1322:23543:1336 1:N:0:
CAGGCATATTTGCACACTAGAGGCAAGTTACTGACTGGATATACTGAGGCATGGGAGGCATGACTCTATAAGGCTTACTGACTGGTTACTGACTG
>M01825:384:000000000-BCYPK:1:1123:23543:1336 1:N:0: CCATGAGACCTGCATA
CCATGAGACCTGCATACACTGTGGGAGGCATGACTCACTATAAACTACTACTGACTGGATATACTGAGGCATACTGACTGGTTACTTATAAGGCT
>M01825:384:000000000-BCYPK:1:9872:23543:1336 1:N:0:TACCCAGCACCGCTTA
TACCCAGCACCGCTTCCTTGACTTGGGAGGCATGACTCACTATAAACTACTACTGACTGGATATACTGAGGCATACTGACTGGTTACTTATAAGG
Adds modules#
pre-requisite modules
none found
post-requisite modules
none found
Docker#
If running in docker, this module will run in a docker container from this image:
biolockjdevteam/biolockj_controller:v1.4.2
This can be modified using the following properties:
Demultiplexer.imageOwner
Demultiplexer.imageName
Demultiplexer.imageTag
Citation#
Module developed by Mike Sioda
BioLockJ v1.4.2