public class SeqFileValidator extends JavaModuleImpl implements SeqModule, ApiModule, ReadCounter
| Modifier and Type | Field and Description |
|---|---|
| protected static String | INPUT_SEQ_MAX - Config Integer property "seqFileValidator.seqMaxLen" defines the maximum number of bases per read |
| protected static String | INPUT_SEQ_MIN - Config Integer property "seqFileValidator.seqMinLen" defines the minimum number of bases per read |
| static String | NUM_VALID_READS - Column name that holds number of valid reads per sample: "Num_Valid_Reads" |
| protected static String | REQUIRE_EUQL_NUM_PAIRS - Config Boolean property "seqFileValidator.requireEqualNumPairs" determines if module requires equal number of forward and reverse reads (simple check). |
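
The two length properties above are described as Integer properties, and checkDependencies() requires positive integers. A minimal sketch of that kind of check follows. It assumes a plain java.util.Properties object instead of BioLockJ's Config class, and the class SeqLenProps and the method positiveIntOrNull are hypothetical names introduced only for illustration.

```java
import java.util.Properties;

/** Hypothetical stand-alone helper; not part of BioLockJ. */
final class SeqLenProps {
    static final String SEQ_MAX = "seqFileValidator.seqMaxLen";
    static final String SEQ_MIN = "seqFileValidator.seqMinLen";

    /**
     * Returns the named property as a positive int, or null if it is unset or blank.
     * Throws IllegalArgumentException if the value is set but is not a positive integer.
     */
    static Integer positiveIntOrNull(Properties props, String name) {
        String raw = props.getProperty(name);
        if (raw == null || raw.trim().isEmpty()) {
            return null; // assumption of this sketch: an unset property is reported as null
        }
        int value;
        try {
            value = Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(name + " must be an integer, found: " + raw, e);
        }
        if (value <= 0) {
            throw new IllegalArgumentException(name + " must be positive, found: " + value);
        }
        return value;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(SEQ_MIN, "50");
        props.setProperty(SEQ_MAX, "250");
        System.out.println(SEQ_MIN + " = " + positiveIntOrNull(props, SEQ_MIN));
        System.out.println(SEQ_MAX + " = " + positiveIntOrNull(props, SEQ_MAX));
    }
}
```

Throwing with the property name in the message mirrors the documented preference for a helpful error over a silent failure.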

Inherited fields:
BLJ_OPTIONS
GZIP_EXT, LOG_EXT, PDF_EXT, RETURN, SH_EXT, TAB_DELIM, TSV_EXT, TXT_EXT
LOG_DIR, MAIN_SCRIPT_PREFIX, NO_VERSION, OUTPUT_DIR, RES_DIR, TEMP_DIR

| Constructor and Description |
|---|
| SeqFileValidator() |

| Modifier and Type | Method and Description |
|---|---|
| void | checkDependencies() - Validate module dependencies: Require Config. exists; Require Config. is positive integer; Require Config. is positive integer; Verify Config. is positive integer if set; Start the AWS DB sync to S3 if a novel DB has been configured and "aws.copyDbToS3" is enabled |
| void | cleanUp() - Set "Num_Valid_Reads" as the number of reads field. |
| String | getCitationString() - At a minimum, this should return the name and/or url for the wrapped tool. |
| String | getDescription() - Briefly describe what this module does. |
| List<File> | getSeqFiles(Collection<File> files) - Return only sequence files for sample IDs found in the metadata file. If Config."metadata.required" = "Y", an error is thrown to list the files that cannot be matched to a metadata row. |
| String | getSummary() - Produce a summary message with counts on total number of reads and number of valid reads containing a barcode defined in the metadata file. |
| Boolean | isValidProp(String property) - Tests to see if the value val is valid for property prop; primarily tests format. |
| protected void | removeBadFiles() - Remove sequence files in which all reads failed validation checks, leaving only an empty file. |
| void | runModule() - Cache sampleIds to compare to validated sampleIds post-processing. |
| protected void | validateFile(File file, Integer fileCount) - Validate sequence files: validate that the first character of the sequence header is the expected character; validate that fastq files have the same number of bases and quality scores per read; remove reads below the minimum threshold "seqFileValidator.seqMinLen"; trim reads above the maximum threshold "seqFileValidator.seqMaxLen". Invalid reads are saved to a file in the module temp directory for analysis/review. |
| protected void | verifyPairedSeqs() - Verify equal number of forward and reverse read files. If "seqFileValidator.requireEqualNumPairs"="Y", verify forward and reverse read files have an equal number of reads. |
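
The checks listed for validateFile(File, Integer) can be sketched outside BioLockJ as follows. This is a minimal illustration, not the module's code: the class FastqReadCheckSketch and the method check are hypothetical, it assumes 4-line FASTQ records whose header starts with '@', and it assumes over-long reads are trimmed by keeping the first maxLen bases (the source does not say which end is trimmed).

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-alone sketch of the per-read checks; not BioLockJ code. */
final class FastqReadCheckSketch {

    /**
     * Applies the listed checks to one 4-line FASTQ record.
     * Returns the (possibly trimmed) record, or null if the read is invalid.
     */
    static List<String> check(List<String> rec, int minLen, int maxLen) {
        if (rec.size() != 4) {
            return null; // this sketch only handles 4-line records
        }
        String header = rec.get(0);
        String seq = rec.get(1);
        String plus = rec.get(2);
        String qual = rec.get(3);
        if (!header.startsWith("@")) {
            return null; // first header character must be the expected one
        }
        if (seq.length() != qual.length()) {
            return null; // one quality score is required per base
        }
        if (seq.length() < minLen) {
            return null; // below the minimum length: drop the read
        }
        if (seq.length() > maxLen) {
            // above the maximum length: trim bases and scores together
            seq = seq.substring(0, maxLen);
            qual = qual.substring(0, maxLen);
        }
        return Arrays.asList(header, seq, plus, qual);
    }

    public static void main(String[] args) {
        List<String> longRead = Arrays.asList("@read1", "ACGTACGT", "+", "IIIIIIII");
        List<String> shortRead = Arrays.asList("@read2", "AC", "+", "II");
        System.out.println(check(longRead, 4, 6));  // trimmed to 6 bases
        System.out.println(check(shortRead, 4, 6)); // null: below minimum length
    }
}
```

Returning null for a rejected read keeps the sketch short; the module itself is documented as writing invalid reads to a file in its temp directory.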

Inherited methods:
buildScript, executeTask, getDockerImageName, getDockerImageOwner, getDockerImageTag, getWorkerScriptFunctions, isValidInputModule, markStatus, moduleComplete, moduleFailed, runBioLockJ_CMD
buildScriptForPairedReads, getJobParams, getMainScript, getRuntimeParams, getScriptDir, getScriptErrors, getTimeout, hasScripts
addGeneralProperty, addGeneralProperty, addGeneralProperty, addNewProperty, addNewProperty, cacheInputFiles, compareTo, equals, findModuleInputFiles, getAlias, getDescription, getDetails, getFileCache, getID, getInputFiles, getLogDir, getMenuPlacement, getMetadata, getModuleDir, getOutputDir, getPostRequisiteModules, getPreRequisiteModules, getPropDefault, getPropDescMap, getPropType, getPropTypeMap, getResourceDir, getTempDir, getTitle, hashCode, init, listProps, setAlias, toString
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
buildScript, buildScriptForPairedReads, getJobParams, getMainScript, getScriptDir, getScriptErrors, getTimeout, getWorkerScriptFunctions
executeTask, getAlias, getDockerImageName, getDockerImageOwner, getDockerImageTag, getID, getInputFiles, getLogDir, getMetadata, getModuleDir, getOutputDir, getPostRequisiteModules, getPreRequisiteModules, getPropDefault, getResourceDir, getTempDir, init, isValidInputModule, setAlias, version
getDescription, getDetails, getMenuPlacement, getPropType, getTitle, listProps

public static final String NUM_VALID_READS

protected static final String INPUT_SEQ_MAX
Config Integer property "seqFileValidator.seqMaxLen" defines the maximum number of bases per read

protected static final String INPUT_SEQ_MIN
Config Integer property "seqFileValidator.seqMinLen" defines the minimum number of bases per read

protected static final String REQUIRE_EUQL_NUM_PAIRS
Config Boolean property "seqFileValidator.requireEqualNumPairs" determines if module requires equal number of forward and reverse reads (simple check).

public void checkDependencies()
throws Exception
Description copied from class: ScriptModuleImpl
Config. exists
Config. is positive integer
Config. is positive integer
Config. is positive integer if set
Specified by: checkDependencies in interface BioModule
Overrides: checkDependencies in class ScriptModuleImpl
Throws: Exception - thrown if missing or invalid dependencies are found

public Boolean isValidProp(String property) throws Exception
Description copied from interface: ApiModule
BioModule.checkDependencies(). Using switch/case or a stack of if/else is recommended.
Within each case, call any/all methods that this module uses to access the value from the config file, leveraging the checks in the Config.get* methods.
This method should never actually return false. If the value is not valid, it should throw an exception that includes a helpful message about what's not valid. As part of a throwable, that message is passed along to wherever the call started. Any time that "false" is actually the desired form, this method should be wrapped in a try/catch.
Specified by: isValidProp in interface ApiModule
Overrides: isValidProp in class ScriptModuleImpl
Throws: Exception

public void cleanUp()
throws Exception
Specified by: cleanUp in interface BioModule
Overrides: cleanUp in class BioModuleImpl
Throws: Exception - thrown if any runtime error occurs

public List<File> getSeqFiles(Collection<File> files) throws SequnceFormatException
Description copied from interface: SeqModule
Return only sequence files for sample IDs found in the metadata file. If Config."metadata.required" = "Y", an error is thrown to list the files that cannot be matched to a metadata row.
Specified by: getSeqFiles in interface SeqModule
Parameters: files - Module input files
Throws: SequnceFormatException - If Config."metadata.required" = "Y" but sequence files are found that do not have a corresponding record in the metadata file, or if invalid metadata prevents parsing SEQ files.

public String getSummary() throws Exception
Specified by: getSummary in interface BioModule
Overrides: getSummary in class ScriptModuleImpl
Throws: Exception - if any error occurs

public void runModule()
throws Exception
validateFile(File, Integer) for each input file.
removeBadFiles() to remove empty files (cases where all reads fail validation).
verifyPairedSeqs() if module input files are paired read files.
MetaUtil.addColumn(String, Map, File, boolean)
Specified by: runModule in interface JavaModule
Overrides: runModule in class JavaModuleImpl
Throws: Exception - thrown if any runtime error occurs

protected void removeBadFiles()

protected void validateFile(File file, Integer fileCount) throws Exception
Parameters: file - Sequence file; fileCount - Integer count
Throws: Exception - if I/O errors occur while processing sequence files

protected void verifyPairedSeqs()
throws Exception
Throws: Exception - if validations fail or errors occur

public String getDescription()
Description copied from interface: ApiModule
getDetails.
Specified by: getDescription in interface ApiModule

public String getCitationString()
Description copied from interface: ApiModule
Specified by: getCitationString in interface ApiModule
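
The isValidProp contract described earlier (use switch/case, throw with a helpful message instead of returning false, and wrap the call in try/catch when a plain false is wanted) can be sketched as follows. This is an illustration under stated assumptions, not the BioLockJ implementation: PropCheckSketch, isValidOrFalse, and the use of a Map in place of the Config class are hypothetical, "N" as the negative Boolean value is assumed from the "Y" shown in the source, and throwing on an unrecognized property name is a choice made only for this sketch.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-alone illustration of the contract; not BioLockJ code. */
final class PropCheckSketch {
    static final String SEQ_MIN = "seqFileValidator.seqMinLen";
    static final String REQUIRE_EQUAL = "seqFileValidator.requireEqualNumPairs";

    /** Returns true when the value is valid; never returns false, throws instead. */
    static boolean isValidProp(Map<String, String> config, String property) throws Exception {
        String value = config.get(property);
        switch (property) {
            case SEQ_MIN:
                // format check only: digits with no leading zero
                if (value == null || !value.matches("[1-9][0-9]*")) {
                    throw new Exception(property + " must be a positive integer, found: " + value);
                }
                return true;
            case REQUIRE_EQUAL:
                if (!"Y".equals(value) && !"N".equals(value)) {
                    throw new Exception(property + " must be Y or N, found: " + value);
                }
                return true;
            default:
                // sketch-only choice for names this class does not know
                throw new Exception("Unrecognized property: " + property);
        }
    }

    /** Wrapper for callers that want a plain false instead of an exception. */
    static boolean isValidOrFalse(Map<String, String> config, String property) {
        try {
            return isValidProp(config, property);
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put(SEQ_MIN, "50");
        config.put(REQUIRE_EQUAL, "maybe");
        System.out.println(isValidOrFalse(config, SEQ_MIN));       // true
        System.out.println(isValidOrFalse(config, REQUIRE_EQUAL)); // false
    }
}
```

Keeping the throwing method and the boolean wrapper separate lets callers that need the error message use the first and callers that only need a yes/no answer use the second.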