EstimateError

Calculates annotation set error rates

usage: EstimateError [-h] -s SEQ_FILES [SEQ_FILES ...] [--log LOG_FILE] [--delim DELIMITER DELIMITER DELIMITER] [--nproc NPROC] [--outdir OUT_DIR] [--outname OUT_NAME] [--version] [-f SET_FIELD] [-n MIN_COUNT] [--mode {freq,qual}] [-q MIN_QUAL] [--freq MIN_FREQ] [--maxdiv MAX_DIVERSITY]

-h, --help: show this help message and exit

-s <seq_files>: A list of FASTA/FASTQ files containing sequences to process.

--log <log_file>: Specify to write verbose logging to a file. May not be specified with multiple input files.

--delim <delimiter>: A list of the three delimiters that separate annotation blocks, field names and values, and values within a field, respectively.
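To illustrate how the three delimiters divide a sequence description, here is a minimal sketch. It is not the tool's own parser: the function name `parse_header`, the delimiter values `|`, `=` and `,`, and the example header are all assumptions chosen for illustration.

```python
def parse_header(header, delimiter=("|", "=", ",")):
    """Split a sequence description into an ID and its annotation fields.

    The delimiter tuple is (block, field, value); the values used here
    are assumed for illustration, not taken from the tool.
    """
    block_delim, field_delim, value_delim = delimiter
    blocks = header.split(block_delim)
    # The first block is treated as the sequence identifier.
    annotations = {"ID": blocks[0]}
    for block in blocks[1:]:
        # Separate the field name from its value string.
        name, _, value = block.partition(field_delim)
        # A field may hold several values.
        annotations[name] = value.split(value_delim)
    return annotations


# Hypothetical header with a barcode field and a two-valued primer field.
example = parse_header("SEQ1|BARCODE=ACGTACGT|PRIMER=V1,V2")
```

Under these assumptions, `-f BARCODE` would group sequences by the value stored in the `BARCODE` field.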

--nproc <nproc>: The number of simultaneous computational processes to execute (CPU cores to utilize).

--outdir <out_dir>: Specify to change the output directory to the given location. The input file directory is used if this is not specified.

--outname <out_name>: Changes the prefix of the successfully processed output file to the specified string. May not be specified with multiple input files.

--version: show program’s version number and exit

-f <set_field>: The name of the annotation field to group sequences by.

-n <min_count>: The minimum number of sequences needed to consider a set.

--mode {freq,qual}: Specifies which method to use to determine the consensus sequence. The “freq” method will determine the consensus by nucleotide frequency at each position and assign the most common value. The “qual” method will weight values by their quality scores to determine the consensus nucleotide at each position.

-q <min_qual>: Consensus quality score cut-off under which an ambiguous character is assigned.

--freq <min_freq>: Fraction of character occurrences under which an ambiguous character is assigned.
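The “freq” mode and the --freq cut-off can be sketched as follows. This is an illustration under stated assumptions rather than the tool's implementation: the function name `frequency_consensus`, the use of `N` as the ambiguous character, the `min_freq` value of 0.6, and the requirement that all reads in a group have equal length are assumptions.

```python
from collections import Counter


def frequency_consensus(seqs, min_freq=0.6, ambiguous="N"):
    """Build a consensus from equal-length reads by per-position frequency.

    min_freq=0.6 and ambiguous="N" are illustrative values only.
    """
    consensus = []
    # zip(*seqs) yields one tuple of characters per read position.
    for column in zip(*seqs):
        base, count = Counter(column).most_common(1)[0]
        # Keep the most common character only if it is frequent enough.
        if count / len(column) >= min_freq:
            consensus.append(base)
        else:
            consensus.append(ambiguous)
    return "".join(consensus)


# Hypothetical read group sharing one barcode.
reads = ["ACGT", "ACGA", "ACCT", "TCGT"]
consensus = frequency_consensus(reads, min_freq=0.6)
strict = frequency_consensus(reads, min_freq=0.8)
```

In this toy group the most common character at each position occurs in at least three of four reads, so a 0.6 cut-off keeps every position, while a 0.8 cut-off masks the three positions where one read disagrees.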

--maxdiv <max_diversity>: Specify to calculate the nucleotide diversity of each read group (average pairwise error rate) and exclude groups that exceed the given diversity threshold.
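One way to read "average pairwise error rate" is as the mean fraction of mismatched positions over all pairs of reads in a group. The sketch below follows that reading. The function names, the equal-length assumption, and the threshold value are hypothetical, and the tool's exact calculation may differ (for example in how gaps or ambiguous characters are scored).

```python
from itertools import combinations


def pairwise_error_rate(a, b):
    """Fraction of positions at which two equal-length reads differ."""
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches / len(a)


def nucleotide_diversity(seqs):
    """Mean pairwise error rate over all unordered pairs in a read group."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        # A single read has no pairs; treat its diversity as zero.
        return 0.0
    return sum(pairwise_error_rate(a, b) for a, b in pairs) / len(pairs)


# Hypothetical read group and threshold.
group = ["ACGT", "ACGT", "ACGA"]
diversity = nucleotide_diversity(group)
max_diversity = 0.1
keep_group = diversity <= max_diversity
```

Here two of the three pairs differ at one of four positions, giving a diversity of 1/6, so this group would be excluded at a threshold of 0.1.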

output files:

error-position: estimated error by read position.
error-quality: estimated error by the quality score assigned within the input file.
error-nucleotide: estimated error by nucleotide.
error-set: estimated error by barcode read group size.

output fields:

POSITION: read position, using zero-based indexing.
Q: Phred quality score.
OBSERVED: observed nucleotide value.
REFERENCE: consensus nucleotide for the barcode read group.
SET_COUNT: barcode read group size.
REPORTED_Q: mean Phred quality score reported within the input file for the given position, quality score, nucleotide or read group.
MISMATCHES: count of observed mismatches from consensus for the given position, quality score, nucleotide or read group.
OBSERVATIONS: total count of observed values for each position, quality score, nucleotide or read group size.
ERROR: estimated error rate.
EMPIRICAL_Q: estimated error rate converted to a Phred quality score.
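The relationship between the MISMATCHES, OBSERVATIONS, ERROR and EMPIRICAL_Q fields can be sketched with the standard Phred definition, Q = -10 log10(P). The function names below are hypothetical, and the handling of a zero error rate is an assumption, since the source does not say how the tool reports that case.

```python
import math


def error_rate(mismatches, observations):
    """Estimated error rate as mismatches divided by total observations."""
    return mismatches / observations


def phred(error):
    """Convert an error probability to a Phred quality score."""
    if error <= 0:
        # Assumption: a zero error rate has no finite Phred score.
        return math.inf
    return -10 * math.log10(error)


# Hypothetical row: 5 mismatches in 5000 observations.
rate = error_rate(5, 5000)
empirical_q = phred(rate)
```

With these numbers the error rate is 0.001, which corresponds to a Phred score of approximately 30. Comparing such a value against REPORTED_Q shows whether the sequencer's reported qualities overstate or understate the observed error.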