NormalizeTaxaTables#

Add to module run order:
#BioModule biolockj.module.report.taxa.NormalizeTaxaTables

Description#

Normalize taxa tables for sequencing depth.

Properties#

Properties are the name=value pairs in the configuration file.

NormalizeTaxaTables properties:#

none

General properties applicable to this module:#

Property	Type	Description	Default
cluster.batchCommand	string	Terminal command used to submit jobs on the cluster	null
cluster.jobHeader	string	Header written at top of worker scripts	null
cluster.modules	list	List of cluster modules to load at start of worker scripts	null
cluster.prologue	string	To run at the start of every script after loading cluster modules (if any)	null
cluster.statusCommand	string	Terminal command used to check the status of jobs on the cluster	null
docker.saveContainerOnExit	boolean	If Y, docker run command will NOT include the --rm flag	null
docker.verifyImage	boolean	In check dependencies, run a test to verify the docker image.	null
report.logBase	string	Options: 10, e, null. If e, use natural log (base e); if 10, use log base 10; if not set, counts will not be converted to a log scale.	10
script.defaultHeader	string	Store default script header for MAIN script and locally run WORKER scripts.	#!/bin/bash
script.numThreads	integer	Used to reserve cluster resources and passed to any external application call that accepts a numThreads parameter.	8
script.numWorkers	integer	Set number of samples to process per script (if parallel processing)	1
script.permissions	string	Used as chmod permission parameter (ex: 774)	770
script.timeout	integer	Sets # of minutes before worker scripts time out.	null

Details#

version: 1.0.0

Normalize taxa tables based on formula:

$counts_{normalized} = \frac{counts_{raw}}{n} \frac{\sum (x)}{N} +1$

Where:

$counts_{raw}$ = raw count; the cell value before normalizing
$n$ = number of sequences in the sample (total within a sample)
$\sum (x)$ = total number of counts in the table (total across samples)
$N$ = total number of samples
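
The formula and definitions above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the module's actual (Java) implementation; the function name `normalize_taxa_table`, the nested-dict table layout, and the sample data are all assumptions made for the example. It also assumes every sample has at least one sequence, so that $n > 0$.

```python
def normalize_taxa_table(table):
    """Normalize a taxa table given as {sample_id: {taxon: raw_count}}.

    Hypothetical helper: applies counts_raw / n * (sum(x) / N) + 1 to each cell.
    """
    num_samples = len(table)  # N: total number of samples
    # n: number of sequences in each sample (total within a sample)
    sample_totals = {s: sum(counts.values()) for s, counts in table.items()}
    table_total = sum(sample_totals.values())  # sum(x): total counts in the table
    mean_depth = table_total / num_samples     # sum(x) / N

    normalized = {}
    for sample, counts in table.items():
        n = sample_totals[sample]
        normalized[sample] = {
            taxon: (raw / n) * mean_depth + 1 for taxon, raw in counts.items()
        }
    return normalized


# Made-up example data: two samples with different sequencing depths.
raw_table = {
    "sampleA": {"Bacteroides": 10, "Prevotella": 90},   # n = 100
    "sampleB": {"Bacteroides": 150, "Prevotella": 50},  # n = 200
}
normalized = normalize_taxa_table(raw_table)
# sampleA / Bacteroides: (10 / 100) * (300 / 2) + 1 = 16.0
```

Dividing by $n$ converts each cell to a within-sample proportion, and multiplying by $\frac{\sum (x)}{N}$ rescales that proportion to the average sequencing depth, so samples sequenced at different depths become comparable.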

Typically the data is put on a $Log_{10}$ scale, so the full formula is:

$counts_{final} = Log_{10} \biggl( \frac{counts_{raw}}{n} \frac{\sum (x)}{N} +1 \biggr)$
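
As a worked example with made-up numbers (chosen for illustration, not taken from the paper or from the module's tests), consider a taxon with $counts_{raw} = 10$ in a sample with $n = 100$ sequences, in a table with $\sum (x) = 300$ total counts across $N = 2$ samples:

$counts_{normalized} = \frac{10}{100} \cdot \frac{300}{2} + 1 = 16$

$counts_{final} = Log_{10} (16) \approx 1.204$

The $+1$ term means a raw count of 0 normalizes to 1, which maps to 0 on the log scale.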

The $counts_{final}$ values are written to the output directory of the LogTransformTaxaTables module; the $counts_{normalized}$ values are written to the output directory of the NormalizeTaxaTables module.

For further explanation of the normalization scheme, see the 2013 paper in The ISME Journal by Dr. Anthony Fodor: "Stochastic changes over time and not founder effects drive cage effects in microbial community assembly in a mouse model".

If report.logBase is not null, the LogTransformTaxaTables module is added as a post-requisite module.
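
The effect of the three report.logBase options (10, e, or not set) on a single normalized value can be sketched as follows. This is a minimal Python illustration of the option semantics described in the properties table, not the LogTransformTaxaTables implementation; the function name `log_transform` and the use of `None` to represent an unset property are assumptions.

```python
import math


def log_transform(value, log_base):
    """Apply a report.logBase-style option to one normalized count.

    Hypothetical helper: log_base is "10", "e", or None (property not set).
    """
    if log_base is None:
        return value  # not set: counts are not converted to a log scale
    if log_base == "10":
        return math.log10(value)
    if log_base == "e":
        return math.log(value)  # natural log (base e)
    raise ValueError(f"unsupported report.logBase value: {log_base!r}")


final_count = log_transform(100.0, "10")  # base-10 log of a normalized count
unchanged = log_transform(16.0, None)     # no log transform applied
```

Because normalized counts are always at least 1 (from the $+1$ term), the logarithm is defined for every cell and is never negative.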

Adds modules#

pre-requisite modules: pipeline-dependent
post-requisite modules: biolockj.module.report.taxa.LogTransformTaxaTables

Docker#

If running in docker, this module will run in a docker container from this image:

biolockjdevteam/biolockj_controller:v1.4.2

This can be modified using the following properties:
NormalizeTaxaTables.imageOwner
NormalizeTaxaTables.imageName
NormalizeTaxaTables.imageTag

Citation#

"Stochastic changes over time and not founder effects drive cage effects in microbial community assembly in a mouse model"
Module developed by Mike Sioda.