Input

Specify the input data for the pipeline by setting input.dirPaths to the path of one or more directories. Separate multiple paths with commas.

Property	Type	Description	Default
input.allowDuplicateNames	boolean	Whether files with the same name are permitted in the inputs. File names are used to link data to metadata, so duplicated names create ambiguity; in some pipelines, however, duplicates are appropriate.	N
input.dirPaths	list of file paths	One or more directories containing the pipeline input data.	null
input.ignoreFiles	list	File names to ignore if found in the input directories.	null
input.requireCompletePairs	boolean	Require that all sequence input files have matching paired reads.	Y
input.suffixFw	regex	File suffix used to identify forward reads in input.dirPaths.	_R1
input.suffixRv	regex	File suffix used to identify reverse reads in input.dirPaths.	_R2
input.trimPrefix	string	Prefix to trim from sequence file names or headers to obtain the Sample ID; this string can appear anywhere in the file name, and all text before it will be removed.	null
input.trimSuffix	string	Suffix to trim from sequence file names or headers to obtain the Sample ID; this string can appear anywhere in the file name, and all text after it will be removed.	null

By default, BioLockJ assumes that the sample name for a given file is the file name with the file suffix removed. This is often not quite enough. Use input.trimPrefix and input.trimSuffix to indicate additional text to remove from the file name to get the sample name. If using paired-end sequences, use input.suffixFw and input.suffixRv to identify the forward and reverse reads for a given sample; these suffixes are also removed when deriving the sample name.
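The pairing behavior described above can be pictured with a short Python sketch. This is not BioLockJ's implementation: the function names `group_pairs` and `incomplete_samples` and the example file names are hypothetical, and the sketch assumes the sample name is simply the text that precedes the forward or reverse suffix.

```python
import re
from collections import defaultdict


def group_pairs(file_names, suffix_fw="_R1", suffix_rv="_R2"):
    """Map each sample name to its forward ('fw') and reverse ('rv') read files."""
    pairs = defaultdict(dict)
    for file_name in file_names:
        for direction, pattern in (("fw", suffix_fw), ("rv", suffix_rv)):
            # The suffixes are regular expressions, as in input.suffixFw/Rv.
            match = re.search(pattern, file_name)
            if match:
                # Assumption: the sample name is the text before the suffix.
                pairs[file_name[:match.start()]][direction] = file_name
                break
    return dict(pairs)


def incomplete_samples(pairs):
    """Return samples that lack a forward or a reverse read file."""
    return sorted(sample for sample, reads in pairs.items() if len(reads) < 2)


# Hypothetical file names: mbs2 has no reverse read.
files = ["mbs1_R1.fastq", "mbs1_R2.fastq", "mbs2_R1.fastq"]
pairs = group_pairs(files)
print(pairs)
print(incomplete_samples(pairs))
```

Files that match neither suffix are skipped in this sketch. A check along the lines of `incomplete_samples` is the kind of condition that input.requireCompletePairs=Y is described as enforcing.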

Example

Sample IDs = mbs1, mbs2, mbs3, mbs4

Example File names
+ gut_mbs1.fq.gz
+ gut_mbs2.fq.gz
+ oral_mbs3.fq
+ oral_mbs4.fq

Config Properties
+ input.trimPrefix=_
+ input.trimSuffix=.fq

All characters before (and including) the first "_" in the file name are trimmed.

All characters after (and including) the first ".fq" in the file name are trimmed.

BioLockJ automatically trims extensions ".fasta" and ".fastq" as if configured in input.trimSuffix.
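As a rough illustration of the trimming rules in this example, the Python sketch below applies them to the four file names listed above. It is a sketch of the documented behavior only, not BioLockJ's code: the function `derive_sample_id` is hypothetical, and applying the prefix rule before the suffix rule is an assumption.

```python
# Illustrative only: mirrors the trimming rules described in this section,
# not BioLockJ's actual implementation.
DEFAULT_EXTENSIONS = (".fasta", ".fastq")  # trimmed automatically, per the text


def derive_sample_id(file_name, trim_prefix=None, trim_suffix=None):
    """Return the sample ID implied by the documented trimming rules."""
    name = file_name
    # Drop everything up to and including the first occurrence of trim_prefix.
    if trim_prefix and trim_prefix in name:
        name = name.split(trim_prefix, 1)[1]
    # Drop everything from the first occurrence of trim_suffix
    # (or of a default extension) onward.
    for suffix in (trim_suffix, *DEFAULT_EXTENSIONS):
        if suffix and suffix in name:
            name = name.split(suffix, 1)[0]
    return name


files = ["gut_mbs1.fq.gz", "gut_mbs2.fq.gz", "oral_mbs3.fq", "oral_mbs4.fq"]
sample_ids = [derive_sample_id(f, trim_prefix="_", trim_suffix=".fq") for f in files]
print(sample_ids)  # ['mbs1', 'mbs2', 'mbs3', 'mbs4']
```

Because everything after the first ".fq" is dropped, the ".gz" ending on the first two files needs no separate handling in this sketch.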

Sometimes there is no way to derive the sample name from the file name, or it is simply inconvenient to do so. An alternative way to link files to sample names is to list the file names in the metadata in one or more columns (one file name per cell) and list the names of these columns in metadata.fileNameColumn; see Metadata.

If you want to process only a subset of the files in your input directories, specifying the file names in the metadata is much more efficient than listing all the files to ignore in input.ignoreFiles.

Note that BioLockJ infers some information from the type of data in the input directories. This is very helpful in determining appropriate sequence pre-processing steps, but it can be problematic when using an unusual input type. To avoid this automatic determination, manually set pipeline.inputTypes. Setting it to "other" avoids all assumptions that BioLockJ might make based on the input types.