public class SeqFileValidator extends JavaModuleImpl implements SeqModule, ApiModule, ReadCounter
Modifier and Type | Field and Description |
---|---|
protected static String |
INPUT_SEQ_MAX
Config Integer property "seqFileValidator.seqMaxLen" defines the maximum number of bases per read |
protected static String |
INPUT_SEQ_MIN
Config Integer property "seqFileValidator.seqMinLen" defines the minimum number of bases per read |
static String |
NUM_VALID_READS
Column name that holds number of valid reads per sample: "Num_Valid_Reads"
|
protected static String |
REQUIRE_EUQL_NUM_PAIRS
Config Boolean property "seqFileValidator.requireEqualNumPairs" determines if module requires equal
number of forward and reverse reads (simple check). |
BLJ_OPTIONS
GZIP_EXT, LOG_EXT, PDF_EXT, RETURN, SH_EXT, TAB_DELIM, TSV_EXT, TXT_EXT
LOG_DIR, MAIN_SCRIPT_PREFIX, NO_VERSION, OUTPUT_DIR, RES_DIR, TEMP_DIR
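The three config properties named above can be illustrated with a short sketch. It assumes the configuration is a plain Java properties file and reads it with `java.util.Properties` rather than the BioLockJ `Config` class; the class `SeqValidatorProps`, its helper methods, and the sample values are hypothetical and not part of BioLockJ.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper (not part of BioLockJ): reads the documented property
// names with java.util.Properties instead of the BioLockJ Config class.
class SeqValidatorProps {
    static final String INPUT_SEQ_MAX = "seqFileValidator.seqMaxLen";
    static final String INPUT_SEQ_MIN = "seqFileValidator.seqMinLen";
    static final String REQUIRE_EUQL_NUM_PAIRS = "seqFileValidator.requireEqualNumPairs";

    // Load properties from text in the standard key=value format.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Return the property as a positive integer, or null if it is not set.
    static Integer positiveInt(Properties props, String name) {
        String val = props.getProperty(name);
        if (val == null || val.trim().isEmpty()) {
            return null;
        }
        int n = Integer.parseInt(val.trim());
        if (n <= 0) {
            throw new IllegalArgumentException(name + " must be a positive integer, found: " + val);
        }
        return n;
    }

    // Boolean properties are written as "Y" when enabled.
    static boolean isYes(Properties props, String name) {
        return "Y".equalsIgnoreCase(props.getProperty(name, "").trim());
    }

    public static void main(String[] args) {
        // Illustrative values only.
        Properties props = parse(String.join("\n",
                "seqFileValidator.seqMinLen=50",
                "seqFileValidator.seqMaxLen=250",
                "seqFileValidator.requireEqualNumPairs=Y"));
        System.out.println("min=" + positiveInt(props, INPUT_SEQ_MIN));
        System.out.println("max=" + positiveInt(props, INPUT_SEQ_MAX));
        System.out.println("requireEqualNumPairs=" + isYes(props, REQUIRE_EUQL_NUM_PAIRS));
    }
}
```

Returning null for an unset property mirrors the "is positive integer if set" wording used for optional checks; a required property would need an extra null check by the caller.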
Constructor and Description |
---|
SeqFileValidator() |
Modifier and Type | Method and Description |
---|---|
void |
checkDependencies()
Validate module dependencies:
Require Config . exists
Require Config . is positive integer
Require Config . is positive integer
Verify Config . is positive integer if set
Start the AWS DB sync to S3 if a novel DB has been configured and
"aws.copyDbToS3" is enabled
|
void |
cleanUp()
Set "Num_Valid_Reads" as the number of reads field.
|
String |
getCitationString()
At a minimum, this should return the name and/or url for the wrapped tool.
|
String |
getDescription()
Briefly describe what this module does.
|
List<File> |
getSeqFiles(Collection<File> files)
Return only sequence files for sample IDs found in the metadata file.
If Config ."metadata.required" = "Y", an
error is thrown to list the files that cannot be matched to a metadata row. |
String |
getSummary()
Produce a summary message with counts on total number of reads and number of valid reads containing a barcode
defined in the metadata file.
|
Boolean |
isValidProp(String property)
Tests whether the configured value is valid for the given property; primarily tests format.
|
protected void |
removeBadFiles()
Remove sequence files in which all reads failed validation checks, leaving only an empty file.
|
void |
runModule()
Cache sampleIds to compare to validated sampleIds post-processing.
|
protected void |
validateFile(File file,
Integer fileCount)
Validate sequence files:
Validate that the first character of each sequence header is the expected character
Validate fastq files have same number of bases and quality scores per read
Remove reads below minimum threshold: "seqFileValidator.seqMinLen"
Trim reads if above the maximum threshold: "seqFileValidator.seqMaxLen"
Invalid reads are saved to a file in the module temp directory for analysis/review.
|
protected void |
verifyPairedSeqs()
Verify equal number of forward and reverse read files.
If "seqFileValidator.requireEqualNumPairs"="Y", verify forward and reverse read files have an equal number of reads. |
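The read-count comparison described for verifyPairedSeqs() can be sketched as follows. This is a minimal illustration under stated assumptions, not the BioLockJ implementation: the class `PairedReadCheck` is hypothetical, and it assumes read counts per sample ID have already been collected for the forward and reverse files.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch (not part of BioLockJ): compares forward and reverse
// read counts that were gathered per sample ID elsewhere.
class PairedReadCheck {

    // Return the sample IDs that lack a forward or reverse count,
    // or whose forward and reverse counts differ, in sorted order.
    static List<String> mismatchedSamples(Map<String, Integer> forward, Map<String, Integer> reverse) {
        Set<String> ids = new TreeSet<>(forward.keySet());
        ids.addAll(reverse.keySet());
        List<String> mismatched = new ArrayList<>();
        for (String id : ids) {
            Integer fw = forward.get(id);
            Integer rv = reverse.get(id);
            if (fw == null || rv == null || !fw.equals(rv)) {
                mismatched.add(id);
            }
        }
        return mismatched;
    }

    public static void main(String[] args) {
        // Illustrative counts only.
        Map<String, Integer> forward = new HashMap<>();
        Map<String, Integer> reverse = new HashMap<>();
        forward.put("sampleA", 1000);
        reverse.put("sampleA", 1000);
        forward.put("sampleB", 800);
        reverse.put("sampleB", 790);
        System.out.println("Mismatched samples: " + mismatchedSamples(forward, reverse));
    }
}
```

A caller that enforces "seqFileValidator.requireEqualNumPairs" would presumably fail the module when the returned list is not empty.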
buildScript, executeTask, getDockerImageName, getDockerImageOwner, getDockerImageTag, getWorkerScriptFunctions, isValidInputModule, markStatus, moduleComplete, moduleFailed, runBioLockJ_CMD
buildScriptForPairedReads, getJobParams, getMainScript, getRuntimeParams, getScriptDir, getScriptErrors, getTimeout, hasScripts
addGeneralProperty, addGeneralProperty, addGeneralProperty, addNewProperty, addNewProperty, cacheInputFiles, compareTo, equals, findModuleInputFiles, getAlias, getDescription, getDetails, getFileCache, getID, getInputFiles, getLogDir, getMenuPlacement, getMetadata, getModuleDir, getOutputDir, getPostRequisiteModules, getPreRequisiteModules, getPropDefault, getPropDescMap, getPropType, getPropTypeMap, getResourceDir, getTempDir, getTitle, hashCode, init, listProps, setAlias, toString
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
buildScript, buildScriptForPairedReads, getJobParams, getMainScript, getScriptDir, getScriptErrors, getTimeout, getWorkerScriptFunctions
executeTask, getAlias, getDockerImageName, getDockerImageOwner, getDockerImageTag, getID, getInputFiles, getLogDir, getMetadata, getModuleDir, getOutputDir, getPostRequisiteModules, getPreRequisiteModules, getPropDefault, getResourceDir, getTempDir, init, isValidInputModule, setAlias, version
getDescription, getDetails, getMenuPlacement, getPropType, getTitle, listProps
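The per-read rules listed for validateFile(File, Integer) can be sketched on in-memory FASTQ records. This is an illustration only, not the BioLockJ code: the class `FastqReadCheck` is hypothetical, the expected header character is assumed to be `@` (the standard FASTQ header marker), and trimming is assumed to keep the first maxLen bases.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch (not part of BioLockJ): applies the documented
// validateFile rules to FASTQ lines held in memory, 4 lines per read.
class FastqReadCheck {
    final List<String[]> valid = new ArrayList<>();
    final List<String[]> invalid = new ArrayList<>();

    static FastqReadCheck validate(List<String> lines, int minLen, int maxLen) {
        FastqReadCheck result = new FastqReadCheck();
        // A trailing incomplete record (fewer than 4 lines) is ignored here.
        for (int i = 0; i + 3 < lines.size(); i += 4) {
            String header = lines.get(i);
            String bases = lines.get(i + 1);
            String plus = lines.get(i + 2);
            String quals = lines.get(i + 3);

            boolean headerOk = header.startsWith("@");
            boolean lengthsMatch = bases.length() == quals.length();
            boolean longEnough = bases.length() >= minLen;

            if (!headerOk || !lengthsMatch || !longEnough) {
                // Invalid reads are kept separately for later review.
                result.invalid.add(new String[] { header, bases, plus, quals });
                continue;
            }
            if (bases.length() > maxLen) {
                // Assumption: trim from the end, keeping the first maxLen bases.
                bases = bases.substring(0, maxLen);
                quals = quals.substring(0, maxLen);
            }
            result.valid.add(new String[] { header, bases, plus, quals });
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "@read1", "ACGTAC", "+", "IIIIII",
                "@read2", "AC", "+", "II");
        FastqReadCheck result = validate(lines, 3, 4);
        System.out.println("valid=" + result.valid.size() + " invalid=" + result.invalid.size());
    }
}
```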
public static final String NUM_VALID_READS
protected static final String INPUT_SEQ_MAX
Config Integer property "seqFileValidator.seqMaxLen" defines the maximum number of bases per read
protected static final String INPUT_SEQ_MIN
Config Integer property "seqFileValidator.seqMinLen" defines the minimum number of bases per read
protected static final String REQUIRE_EUQL_NUM_PAIRS
Config Boolean property "seqFileValidator.requireEqualNumPairs" determines if module requires equal
number of forward and reverse reads (simple check).
public void checkDependencies() throws Exception
Description copied from class: ScriptModuleImpl
Validate module dependencies:
Require Config . exists
Require Config . is positive integer
Require Config . is positive integer
Verify Config . is positive integer if set
Specified by:
checkDependencies in interface BioModule
Overrides:
checkDependencies in class ScriptModuleImpl
Throws:
Exception - thrown if missing or invalid dependencies are found
public Boolean isValidProp(String property) throws Exception
Description copied from interface: ApiModule
See also BioModule.checkDependencies(). Using switch/case or a stack of if/else is recommended.
Within each case, call every method that this module uses to access the value from the config file,
leveraging the checks in the Config.get* methods.
This method should never actually return false. If the value is not valid, it should throw an exception that
includes a helpful message about what is not valid. As part of a throwable, that message is passed along to
wherever the call started. Whenever a plain "false" is the desired result, the call to this method should be
wrapped in a try/catch.
Specified by:
isValidProp in interface ApiModule
Overrides:
isValidProp in class ScriptModuleImpl
Throws:
Exception
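The switch/case pattern recommended for isValidProp can be sketched as below. This is a hedged illustration, not the BioLockJ implementation: the class `PropCheckSketch`, the map-backed config, and the `getPositiveInteger` helper are hypothetical stand-ins for the `Config.get*` methods, and returning null for an unrecognized property is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not part of BioLockJ): a map stands in for the config
// file, and getPositiveInteger stands in for the Config.get* checks.
class PropCheckSketch {
    private final Map<String, String> config;

    PropCheckSketch(Map<String, String> config) {
        this.config = config;
    }

    // Read a property the way the module would, throwing a descriptive error if it is bad.
    int getPositiveInteger(String property) throws Exception {
        String val = config.get(property);
        if (val == null) {
            throw new Exception("Missing required property: " + property);
        }
        int n;
        try {
            n = Integer.parseInt(val.trim());
        } catch (NumberFormatException e) {
            throw new Exception(property + " must be an integer, found: " + val, e);
        }
        if (n < 1) {
            throw new Exception(property + " must be a positive integer, found: " + val);
        }
        return n;
    }

    // Never returns false: invalid values surface as exceptions with a message.
    Boolean isValidProp(String property) throws Exception {
        switch (property) {
            case "seqFileValidator.seqMinLen":
            case "seqFileValidator.seqMaxLen":
                getPositiveInteger(property);
                return true;
            default:
                // Assumption: null signals a property this module does not recognize.
                return null;
        }
    }

    // Wrapper for callers that want a plain boolean instead of an exception.
    static boolean isValidOrFalse(PropCheckSketch module, String property) {
        try {
            return Boolean.TRUE.equals(module.isValidProp(property));
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, String> cfg = new HashMap<>();
        cfg.put("seqFileValidator.seqMinLen", "50");
        cfg.put("seqFileValidator.seqMaxLen", "-3");
        PropCheckSketch module = new PropCheckSketch(cfg);
        System.out.println(isValidOrFalse(module, "seqFileValidator.seqMinLen"));
        System.out.println(isValidOrFalse(module, "seqFileValidator.seqMaxLen"));
    }
}
```

The wrapper shows the try/catch usage mentioned above: the exception message is available to callers that want it, while callers that only need a yes/no answer can discard it.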
public void cleanUp() throws Exception
Specified by:
cleanUp in interface BioModule
Overrides:
cleanUp in class BioModuleImpl
Throws:
Exception - thrown if any runtime error occurs
public List<File> getSeqFiles(Collection<File> files) throws SequnceFormatException
Description copied from interface: SeqModule
Return only sequence files for sample IDs found in the metadata file.
If Config."metadata.required" = "Y", an
error is thrown to list the files that cannot be matched to a metadata row.
Specified by:
getSeqFiles in interface SeqModule
Parameters:
files - Module input files
Throws:
SequnceFormatException - If Config."metadata.required" = "Y" but sequence files are found that do not have a
corresponding record in the metadata file, or if invalid metadata prevents parsing SEQ files.
public String getSummary() throws Exception
Specified by:
getSummary in interface BioModule
Overrides:
getSummary in class ScriptModuleImpl
Throws:
Exception - if any error occurs
public void runModule() throws Exception
Call validateFile(File, Integer) for each input file.
Call removeBadFiles() to remove empty files (cases where all reads fail validation).
Call verifyPairedSeqs() if module input files are paired read files.
Call MetaUtil.addColumn(String, Map, File, boolean)
Specified by:
runModule in interface JavaModule
Overrides:
runModule in class JavaModuleImpl
Throws:
Exception - thrown if any runtime error occurs
protected void removeBadFiles()
protected void validateFile(File file, Integer fileCount) throws Exception
Parameters:
file - Sequence file
fileCount - Integer count
Throws:
Exception - if I/O errors occur while processing sequence files
protected void verifyPairedSeqs() throws Exception
Throws:
Exception - if validations fail or errors occur
public String getDescription()
Description copied from interface: ApiModule
Briefly describe what this module does. See also getDetails.
Specified by:
getDescription in interface ApiModule
public String getCitationString()
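Description copied from interface: ApiModule
At a minimum, this should return the name and/or URL for the wrapped tool.
Specified by:
getCitationString in interface ApiModule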