presto.Sequence
Sequence processing functions
-
presto.Sequence.calculateDiversity(seq_list, score_dict=getDNAScoreDict())
Determines the average pairwise error rate for a list of sequences
Parameters: - seq_list – List of SeqRecord objects to score
- score_dict – Optional dictionary of alignment scores as {(char1, char2): score}
Returns: Average pairwise error rate for the list of sequences
Return type:
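The idea of an average pairwise error rate can be illustrated on plain strings. This is a standalone sketch, not presto's implementation: the helper names `pairwise_error` and `average_pairwise_error` are hypothetical, and a simple match/mismatch count stands in for the score dictionary.

```python
from itertools import combinations

def pairwise_error(s1, s2):
    """Fraction of mismatched positions between two equal-length strings."""
    mismatches = sum(a != b for a, b in zip(s1, s2))
    return mismatches / len(s1)

def average_pairwise_error(seqs):
    """Mean pairwise error over all unordered pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        # Assumption: a set with fewer than two sequences has no diversity
        return 0.0
    return sum(pairwise_error(a, b) for a, b in pairs) / len(pairs)

seqs = ["ACGT", "ACGA", "ACGT"]
diversity = average_pairwise_error(seqs)
print(diversity)
```

Two of the three pairs differ at one of four positions, so the average here is (0.25 + 0 + 0.25) / 3.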
-
presto.Sequence.calculateSetError(seq_list, ref_seq, ignore_chars=['n', 'N'], score_dict=getDNAScoreDict())
Counts the occurrence of nucleotide mismatches from a reference in a set of sequences
Parameters: - seq_list – List of SeqRecord objects with aligned sequences
- ref_seq – SeqRecord object containing the reference sequence to match against
- ignore_chars – List of characters to exclude from mismatch counts
- score_dict – Optional dictionary of alignment scores as {(char1, char2): score}
Returns: Error rate for the set
Return type:
-
presto.Sequence.checkSeqEqual(seq1, seq2, ignore_chars={'.', '-', 'n', 'N'})
Determines whether two sequences are equal, excluding missing positions
Parameters: - seq1 – SeqRecord object
- seq2 – SeqRecord object
- ignore_chars – Set of characters to ignore
Returns: True if the sequences are equal
Return type: bool
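Equality that excludes missing positions can be sketched on plain strings as follows. The function name `seqs_equal` is hypothetical, and the handling of unequal lengths (positions beyond the shorter string are simply not compared) is a property of this sketch rather than documented presto behavior.

```python
def seqs_equal(s1, s2, ignore_chars=frozenset({'.', '-', 'n', 'N'})):
    """Return True if s1 and s2 agree at every position where neither
    string holds an ignored (missing) character."""
    for a, b in zip(s1, s2):
        if a != b and a not in ignore_chars and b not in ignore_chars:
            return False
    return True

print(seqs_equal("ACGT", "ACNT"))
print(seqs_equal("ACGT", "ACCT"))
```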
-
presto.Sequence.compilePrimers(primers)
Translates IUPAC Ambiguous Nucleotide characters to regular expressions and compiles them
Parameters: primers – Dictionary of sequences to translate
Returns: Dictionary of compiled regular expressions
Return type: dict
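Translating IUPAC ambiguity codes into regular expressions can be sketched with the standard `re` module. The table `IUPAC_REGEX` and the function `compile_primers` are hypothetical names for illustration; the exact character classes presto emits may differ.

```python
import re

# Each IUPAC nucleotide code mapped to a regex fragment matching the bases it denotes
IUPAC_REGEX = {
    'A': 'A', 'C': 'C', 'G': 'G', 'T': 'T',
    'R': '[AG]', 'Y': '[CT]', 'S': '[CG]', 'W': '[AT]',
    'K': '[GT]', 'M': '[AC]', 'B': '[CGT]', 'D': '[AGT]',
    'H': '[ACT]', 'V': '[ACG]', 'N': '[ACGT]',
}

def compile_primers(primers):
    """Map {name: primer sequence} to {name: compiled regular expression}."""
    return {
        name: re.compile(''.join(IUPAC_REGEX[c] for c in seq.upper()))
        for name, seq in primers.items()
    }

compiled = compile_primers({'P1': 'ACRYN'})
print(compiled['P1'].pattern)
print(bool(compiled['P1'].match('ACGTA')))
```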
-
presto.Sequence.deleteSeqPositions(seq, positions)
Deletes a list of positions from a SeqRecord
Parameters: - seq – SeqRecord object
- positions – Set of positions (indices) to delete
Returns: Modified SeqRecord with the specified positions removed
Return type: SeqRecord
-
presto.Sequence.findGapPositions(seq_list, max_gap, gap_chars={'.', '-'})
Finds positions in a set of aligned sequences with a high number of gap characters.
Parameters: - seq_list – List of SeqRecord objects with aligned sequences
- max_gap – Float of the maximum gap frequency to consider a position as non-gapped
- gap_chars – Set of characters to consider as gaps
Returns: Positions (indices) with gap frequency greater than max_gap
Return type:
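The gap-frequency test can be sketched column by column on plain aligned strings. This is an illustrative sketch only: `find_gap_positions` and `drop_positions` are hypothetical helpers, with the second one showing how the returned indices could then be removed from a sequence.

```python
def find_gap_positions(seqs, max_gap, gap_chars=frozenset({'.', '-'})):
    """Return the column indices whose gap frequency exceeds max_gap."""
    n = len(seqs)
    positions = []
    # zip(*seqs) iterates over alignment columns
    for i, column in enumerate(zip(*seqs)):
        gap_freq = sum(c in gap_chars for c in column) / n
        if gap_freq > max_gap:
            positions.append(i)
    return positions

def drop_positions(seq, positions):
    """Return seq with the given indices removed."""
    skip = set(positions)
    return ''.join(c for i, c in enumerate(seq) if i not in skip)

aligned = ["AC-T", "A--T", "ACGT", "A-.T"]
gapped = find_gap_positions(aligned, max_gap=0.5)
print(gapped)
print([drop_positions(s, gapped) for s in aligned])
```

In this alignment the second column has a gap frequency of exactly 0.5, so it is kept when `max_gap=0.5` because the comparison is strictly greater than.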
-
presto.Sequence.frequencyConsensus(seq_list, min_freq=0.6, ignore_chars={'.', '-', 'n', 'N'})
Builds a consensus sequence from a set of sequences
Parameters: - seq_list – List of SeqRecord objects
- min_freq – Frequency cutoff to assign a base
- ignore_chars – Set of characters to exclude when building a consensus sequence
Returns: Consensus SeqRecord object
Return type: SeqRecord
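A frequency-based consensus can be sketched on plain strings as below. The function name `frequency_consensus` is hypothetical, and two choices are assumptions of this sketch rather than documented behavior: the frequency denominator is the total number of sequences, and positions that fail the cutoff are written as `N`.

```python
from collections import Counter

def frequency_consensus(seqs, min_freq=0.6,
                        ignore_chars=frozenset({'.', '-', 'n', 'N'})):
    """Build a consensus string from equal-length aligned strings."""
    consensus = []
    for column in zip(*seqs):
        counts = Counter(c for c in column if c not in ignore_chars)
        if not counts:
            # Only ignored characters at this position
            consensus.append('N')
            continue
        base, count = counts.most_common(1)[0]
        # Assumption: frequency is measured against all sequences
        freq = count / len(column)
        consensus.append(base if freq >= min_freq else 'N')
    return ''.join(consensus)

seqs = ["ACGT", "ACGA", "ACTA", "AC-A"]
print(frequency_consensus(seqs))
```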
-
presto.Sequence.getAAScoreDict(mask_score=None, gap_score=None)
Generates a score dictionary for pairs of IUPAC Extended Protein characters
Parameters: - mask_score – Tuple of length two defining scores for all matches against an X character for (a, b), with the score for character (a) taking precedence; if None score symmetrically according to IUPAC character identity
- gap_score – Tuple of length two defining score for all matches against a [-, .] character for (a, b), with the score for character (a) taking precedence; if None score symmetrically according to IUPAC character identity
Returns: Score dictionary with keys (char1, char2) mapping to scores
Return type: dict
-
presto.Sequence.getDNAScoreDict(mask_score=None, gap_score=None)
Generates a score dictionary for pairs of IUPAC Ambiguous Nucleotide characters
Parameters: - mask_score – Tuple of length two defining scores for all matches against an N character for (a, b), with the score for character (a) taking precedence; if None score symmetrically according to IUPAC character identity
- gap_score – Tuple of length two defining score for all matches against a [-, .] character for (a, b), with the score for character (a) taking precedence; if None score symmetrically according to IUPAC character identity
Returns: Score dictionary with keys (char1, char2) mapping to scores
Return type: dict
-
presto.Sequence.indexSeqSets(seq_dict, field='BARCODE', delimiter=('|', '=', ', '))
Identifies sets of sequences with the same ID field
Parameters: - seq_dict – Dictionary index of sequences returned from SeqIO.index()
- field – Annotation field containing set IDs
- delimiter – Tuple of delimiters for (fields, values, value lists)
Returns: Dictionary mapping set name to a list of record names
Return type: dict
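Grouping records by an annotation field can be sketched on plain header strings. The header layout (`ID|FIELD=value|...`), the `,` value-list delimiter, and the helper names `parse_annotation` and `index_sets` are assumptions made for this illustration, which works on a list of headers rather than a SeqIO index.

```python
from collections import defaultdict

def parse_annotation(header, delimiter=('|', '=', ',')):
    """Split 'ID|KEY=value|...' into a dictionary of annotations."""
    fields = header.split(delimiter[0])
    annotation = {'ID': fields[0]}
    for item in fields[1:]:
        key, _, value = item.partition(delimiter[1])
        annotation[key] = value
    return annotation

def index_sets(headers, field='BARCODE', delimiter=('|', '=', ',')):
    """Map each value of `field` to the list of headers carrying it."""
    sets = defaultdict(list)
    for header in headers:
        annotation = parse_annotation(header, delimiter)
        if field in annotation:
            sets[annotation[field]].append(header)
    return dict(sets)

headers = ["read1|BARCODE=AAGT", "read2|BARCODE=CCTA", "read3|BARCODE=AAGT"]
groups = index_sets(headers)
print(groups)
```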
-
presto.Sequence.qualityConsensus(seq_list, min_qual=20, min_freq=0.6, dependent=False, ignore_chars={'.', '-', 'n', 'N'})
Builds a consensus sequence from a set of sequences
Parameters: - seq_list – List of SeqRecord objects
- min_qual – Quality cutoff to assign a base
- min_freq – Frequency cutoff to assign a base
- dependent – If False assume sequences are independent for quality calculation
- ignore_chars – Set of characters to exclude when building a consensus sequence
Returns: Consensus SeqRecord object
Return type: SeqRecord
-
presto.Sequence.reverseComplement(seq)
Takes the reverse complement of a sequence
Parameters: seq – SeqRecord object, Seq object or string to reverse complement
Returns: Object of the same type as the input with the reverse complement sequence
Return type: Seq
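For the string case, reverse complementation can be sketched with a translation table. The names `COMPLEMENT` and `reverse_complement` are hypothetical, and this sketch handles plain strings only, not SeqRecord or Seq objects.

```python
# Complement table for IUPAC nucleotide codes; S, W and N are
# self-complementary and therefore left unchanged by translate()
COMPLEMENT = str.maketrans('ACGTRYKMBDHVacgtrykmbdhv',
                           'TGCAYRMKVHDBtgcayrmkvhdb')

def reverse_complement(seq):
    """Complement every character, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]

print(reverse_complement("AACGTN"))
```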
-
presto.Sequence.scoreAA(a, b, mask_score=None, gap_score=None)
Returns the score for a pair of IUPAC Extended Protein characters
Parameters: - a – First character
- b – Second character
- mask_score – Tuple of length two defining scores for all matches against an X character for (a, b), with the score for character (a) taking precedence; if None score symmetrically according to IUPAC character identity
- gap_score – Tuple of length two defining score for all matches against a gap (-, .) character for (a, b), with the score for character (a) taking precedence; if None score symmetrically according to IUPAC character identity
Returns: Score for the character pair
Return type:
-
presto.Sequence.scoreDNA(a, b, mask_score=None, gap_score=None)
Returns the score for a pair of IUPAC Ambiguous Nucleotide characters
Parameters: - a – First character
- b – Second character
- mask_score – Tuple of length two defining scores for all matches against an N character for (a, b), with the score for character (a) taking precedence; if None score symmetrically according to IUPAC character identity
- gap_score – Tuple of length two defining score for all matches against a gap (-, .) character for (a, b), with the score for character (a) taking precedence; if None score symmetrically according to IUPAC character identity
Returns: Score for the character pair
Return type:
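The interplay of the asymmetric `mask_score` and `gap_score` tuples with symmetric IUPAC scoring can be sketched as follows. Everything here is an assumption for illustration: the names `score_dna` and `dna_score_dict`, the 1/0 score values, the rule that two codes match when they share at least one base, and the order in which the overrides are checked.

```python
from itertools import product

# Bases denoted by each IUPAC nucleotide code
IUPAC_DNA = {
    'A': 'A', 'C': 'C', 'G': 'G', 'T': 'T',
    'R': 'AG', 'Y': 'CT', 'S': 'CG', 'W': 'AT', 'K': 'GT', 'M': 'AC',
    'B': 'CGT', 'D': 'AGT', 'H': 'ACT', 'V': 'ACG', 'N': 'ACGT',
}
GAPS = '-.'

def score_dna(a, b, mask_score=None, gap_score=None):
    """Score one character pair; overrides for a are checked before b."""
    a, b = a.upper(), b.upper()
    if gap_score is not None and a in GAPS:
        return gap_score[0]
    if mask_score is not None and a == 'N':
        return mask_score[0]
    if gap_score is not None and b in GAPS:
        return gap_score[1]
    if mask_score is not None and b == 'N':
        return mask_score[1]
    if a == b:
        return 1
    # Symmetric case: match if the two codes share at least one base
    shared = set(IUPAC_DNA.get(a, '')) & set(IUPAC_DNA.get(b, ''))
    return 1 if shared else 0

def dna_score_dict(mask_score=None, gap_score=None):
    """Precompute scores for every ordered pair as {(char1, char2): score}."""
    chars = list(IUPAC_DNA) + list(GAPS)
    return {(a, b): score_dna(a, b, mask_score, gap_score)
            for a, b in product(chars, repeat=2)}

scores = dna_score_dict(mask_score=(0, 1))
print(scores[('N', 'A')], scores[('A', 'N')])
```

With `mask_score=(0, 1)` an N in the first position scores 0 while an N in the second position scores 1, which is the asymmetry the parameter description refers to.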
-
presto.Sequence.scoreSeqPair(seq1, seq2, ignore_chars=set(), score_dict=getDNAScoreDict())
Determines the error rate for a pair of sequences
Parameters: - seq1 – SeqRecord object
- seq2 – SeqRecord object
- ignore_chars – Set of characters to ignore when scoring and counting the weight
- score_dict – Optional dictionary of alignment scores
Returns: Tuple of the (score, minimum weight, error rate) for the pair of sequences
Return type: Tuple
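One way to read the (score, minimum weight, error rate) tuple is sketched below on plain strings. The definitions are assumptions of this sketch, not confirmed presto behavior: the weight of a sequence is its count of non-ignored characters, the minimum weight is the smaller of the two, and the error rate is 1 minus score divided by minimum weight. The name `score_pair` is hypothetical.

```python
def score_pair(s1, s2, ignore_chars=frozenset(), score_dict=None):
    """Return (score, minimum weight, error rate) for two aligned strings."""
    def pair_score(a, b):
        if score_dict is not None:
            return score_dict.get((a, b), 0)
        # Fallback when no score dictionary is given: exact identity
        return 1 if a == b else 0

    # Score only positions where neither character is ignored
    score = sum(pair_score(a, b) for a, b in zip(s1, s2)
                if a not in ignore_chars and b not in ignore_chars)
    # Assumption: weight counts the non-ignored characters of each sequence
    weight_1 = sum(c not in ignore_chars for c in s1)
    weight_2 = sum(c not in ignore_chars for c in s2)
    min_weight = min(weight_1, weight_2)
    error = 1.0 - score / min_weight if min_weight > 0 else 1.0
    return score, min_weight, error

result = score_pair("ACGTN", "ACCTA", ignore_chars={'N'})
print(result)
```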
-
presto.Sequence.subsetSeqIndex(seq_dict, field, values, delimiter=('|', '=', ', '))
Subsets a sequence index by annotation value
Parameters: - seq_dict – Dictionary index of sequences returned from SeqIO.index()
- field – Annotation field to select keys by
- values – List of annotation values that define the retained keys
- delimiter – Tuple of delimiters for (annotations, field/values, value lists)
Returns: List of keys
Return type: list
-
presto.Sequence.subsetSeqSet(seq_iter, field, values, delimiter=('|', '=', ', '))
Subsets a sequence set by annotation value
Parameters: - seq_iter – Iterator or list of SeqRecord objects
- field – Annotation field to select by
- values – List of annotation values that define the retained sequences
- delimiter – Tuple of delimiters for (annotations, field/values, value lists)
Returns: Modified list of SeqRecord objects
Return type: list
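Subsetting by annotation value can be sketched on plain header strings. As with any standalone illustration, the header layout (`ID|FIELD=value1,value2`), the `,` value-list delimiter, and the function name `subset_by_annotation` are assumptions; this sketch keeps a record when any value in its value list is among the requested values.

```python
def subset_by_annotation(headers, field, values, delimiter=('|', '=', ',')):
    """Keep headers whose `field` annotation contains any of `values`."""
    wanted = set(values)
    kept = []
    for header in headers:
        # Skip the leading ID and scan the KEY=value annotations
        for item in header.split(delimiter[0])[1:]:
            key, _, value = item.partition(delimiter[1])
            if key == field and wanted & set(value.split(delimiter[2])):
                kept.append(header)
                break
    return kept

headers = ["r1|PRIMER=V1", "r2|PRIMER=V2,V3", "r3|PRIMER=V4"]
selected = subset_by_annotation(headers, 'PRIMER', ['V1', 'V3'])
print(selected)
```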