presto.Annotation
Annotation functions
-
presto.Annotation.annotationConsensus(seq_iter, field, delimiter=('|', '=', ', '))
Calculates a consensus annotation for a set of sequences
Parameters: - seq_iter – Iterator or list of SeqRecord objects
- field – Annotation field to take a consensus of
- delimiter – Tuple of delimiters for (fields, values, value lists)
Returns: Dictionary with the keys set (list of unique annotation values), count (annotation counts), cons (consensus annotation), and freq (frequency of the majority annotation)
Return type: dict
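The shape of the returned dictionary can be pictured with a small standalone sketch. This is not pRESTO's implementation: `consensus_summary` is a hypothetical helper that works on a plain list of already extracted values instead of SeqRecord objects, and the sorting and tie-breaking choices are assumptions.

```python
from collections import Counter

def consensus_summary(values):
    """Hypothetical helper: summarize annotation values with set/count/cons/freq keys."""
    counts = Counter(values)
    unique = sorted(counts)                 # assumption: unique values are sorted
    cons, top = counts.most_common(1)[0]    # ties resolve to the value seen first
    return {
        'set': unique,
        'count': [counts[v] for v in unique],
        'cons': cons,
        'freq': top / len(values),          # fraction of values equal to the consensus
    }

result = consensus_summary(['IGHG', 'IGHG', 'IGHM', 'IGHG'])
print(result)
```

Here three of the four values agree, so the consensus is the majority value and its frequency is 3/4.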
-
presto.Annotation.collapseAnnotation(ann_dict, action, fields=None, delimiter=('|', '=', ', '))
Collapses multiple annotations into new single annotations for each field
Parameters: - ann_dict – Dictionary of field/value pairs
- action – Collapse action to take; one of {min, max, sum, first, last, set, cat}
- fields – Subset of ann_dict to collapse; if None, collapse all but the ID field
- delimiter – Tuple of delimiters for (fields, values, value lists)
Returns: Modified field dictionary
Return type: OrderedDict
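To make the collapse actions concrete, the following sketch applies each of them to a single delimited value string. It is an illustration only and not pRESTO's code: `collapse_value` is a hypothetical helper, and the details of `set` (first-seen order), `cat` (no separator), and number formatting are assumptions.

```python
def collapse_value(value, action, sep=','):
    """Hypothetical helper: collapse one delimited value string into a single value."""
    items = value.split(sep)
    if action == 'first':
        return items[0]
    if action == 'last':
        return items[-1]
    if action == 'set':
        # Assumption: keep unique items in first-seen order
        return sep.join(dict.fromkeys(items))
    if action == 'cat':
        # Assumption: concatenate the items with no separator
        return ''.join(items)
    if action in ('min', 'max', 'sum'):
        numbers = [float(x) for x in items]
        result = {'min': min, 'max': max, 'sum': sum}[action](numbers)
        return '%g' % result    # drop a trailing ".0" for whole numbers
    raise ValueError('unknown action: %s' % action)

for action in ('min', 'max', 'sum', 'first', 'last', 'set', 'cat'):
    print(action, collapse_value('3,1,3', action))
```

The numeric actions convert the values to floats first, so they would raise a ValueError on non-numeric annotations in this sketch.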
-
presto.Annotation.flattenAnnotation(ann_dict, delimiter=('|', '=', ', '))
Converts annotations from a dictionary to a FASTA/FASTQ sequence description
Parameters: - ann_dict – Dictionary of field/value pairs
- delimiter – Tuple of delimiters for (fields, values, value lists)
Returns: Formatted sequence description string
Return type: str
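As a rough picture of the conversion, the sketch below joins a field dictionary into a description string. It is a standalone approximation and not the library's code: `flatten_fields` is a hypothetical helper, the `ID` key name and its leading position are assumptions, and the delimiter tuple is passed explicitly for the example.

```python
from collections import OrderedDict

def flatten_fields(ann, delimiter=('|', '=', ',')):
    """Hypothetical helper: join field/value pairs into one description string."""
    parts = []
    for name, value in ann.items():
        if isinstance(value, (list, tuple)):
            # Value lists are joined with the third delimiter
            value = delimiter[2].join(str(v) for v in value)
        if name == 'ID':
            # Assumption: the ID is written first, without a field name
            parts.insert(0, str(value))
        else:
            parts.append('%s%s%s' % (name, delimiter[1], value))
    return delimiter[0].join(parts)

ann = OrderedDict([('ID', 'SEQ1'), ('BARCODE', 'ACGT'), ('PRIMER', ['IGHG', 'IGHM'])])
description = flatten_fields(ann)
print(description)
```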
-
presto.Annotation.getAnnotationValues(seq_iter, field, unique=False, delimiter=('|', '=', ', '))
Gets the annotation values for a field in a sequence set
Parameters: - seq_iter – Iterator or list of SeqRecord objects
- field – Annotation field to retrieve values for
- unique – If True return a list of only the unique values; if False return a list of all values
- delimiter – Tuple of delimiters for (fields, values, value lists)
Returns: List of values for the field
Return type: list
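The effect of the unique flag can be sketched without Biopython by working on plain header strings. This is an assumption-laden illustration, not pRESTO's implementation: `field_values` is a hypothetical helper, and returning unique values in first-seen order is a choice made for the example.

```python
def field_values(headers, field, unique=False, delimiter=('|', '=', ',')):
    """Hypothetical helper: collect the values of one field from header strings."""
    values = []
    for header in headers:
        # Skip the first token, which is assumed to be the sequence ID
        for part in header.split(delimiter[0])[1:]:
            name, _, value = part.partition(delimiter[1])
            if name == field:
                values.append(value)
    # Assumption: unique values keep their first-seen order
    return list(dict.fromkeys(values)) if unique else values

headers = ['SEQ1|PRIMER=IGHG', 'SEQ2|PRIMER=IGHM', 'SEQ3|PRIMER=IGHG', 'SEQ4|BARCODE=ACGT']
all_values = field_values(headers, 'PRIMER')
unique_values = field_values(headers, 'PRIMER', unique=True)
print(all_values, unique_values)
```

Headers that lack the field, such as the last one above, contribute nothing to the result in this sketch.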
-
presto.Annotation.getCoordKey(header, coord_type='presto', delimiter=('|', '=', ', '))
Returns the coordinate identifier for a sequence description
Parameters: - header – Sequence header string
- coord_type – Sequence header format; one of ['illumina', 'solexa', 'sra', '454', 'presto']; if the type is unrecognized or None, the sequence ID is returned.
- delimiter – Tuple of delimiters for (fields, values, value lists)
Returns: Coordinate identifier as a string
Return type: str
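One way to picture coordinate extraction is the sketch below. The splitting rules are assumptions based on common header layouts (the first whitespace-delimited token, with any `#` suffix dropped for Illumina-style headers) and may differ from what pRESTO actually does; `coord_key` is a hypothetical helper.

```python
def coord_key(header, coord_type='presto', field_sep='|'):
    """Hypothetical helper: strip read-specific suffixes from a sequence header."""
    if coord_type in ('illumina', 'solexa'):
        # Assumption: coordinates end at the first space or '#'
        return header.split()[0].split('#')[0]
    if coord_type in ('sra', '454'):
        # Assumption: coordinates are the first whitespace-delimited token
        return header.split()[0]
    if coord_type == 'presto':
        # Assumption: the ID is everything before the first field delimiter
        return header.split(field_sep)[0]
    # Unrecognized type or None: the header is treated here as the sequence ID
    return header

print(coord_key('SEQ1|BARCODE=ACGT'))
print(coord_key('M00123:45:000000000-A1B2C:1:1101:15589:1332 1:N:0:1', 'illumina'))
print(coord_key('HWUSI-EAS100R:6:73:941:1973#0/1', 'solexa'))
```

Under these assumptions, both reads of a pair map to the same key, which is what makes the identifier useful for matching mates across files.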
-
presto.Annotation.mergeAnnotation(ann_dict_1, ann_dict_2, prepend=False, delimiter=('|', '=', ', '))
Merges non-ID field annotations from one field dictionary into another
Parameters: - ann_dict_1 – Dictionary of field/value pairs to append to
- ann_dict_2 – Dictionary of field/value pairs to merge with ann_dict_1
- prepend – If True then add ann_dict_2 values to the front of any ann_dict_1 values that are already present, rather than the default behavior of appending ann_dict_2 values.
- delimiter – Tuple of delimiters for (fields, values, value lists)
Returns: Modified ann_dict_1 dictionary of field/value pairs
Return type: OrderedDict
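The difference between appending and prepending can be sketched as follows. This is an illustration and not pRESTO's code: `merge_fields` is a hypothetical helper, it returns a copy instead of modifying its first argument, and the `ID` key name and the `,` value separator are assumptions.

```python
from collections import OrderedDict

def merge_fields(ann_1, ann_2, prepend=False, sep=','):
    """Hypothetical helper: merge the non-ID fields of ann_2 into a copy of ann_1."""
    merged = OrderedDict(ann_1)
    for name, value in ann_2.items():
        if name == 'ID':
            continue    # the ID of ann_1 is kept
        if name in merged:
            pair = (value, merged[name]) if prepend else (merged[name], value)
            merged[name] = sep.join(pair)
        else:
            merged[name] = value
    return merged

a = OrderedDict([('ID', 'SEQ1'), ('PRIMER', 'IGHG')])
b = OrderedDict([('ID', 'SEQ2'), ('PRIMER', 'IGHM'), ('BARCODE', 'ACGT')])
appended = merge_fields(a, b)
prepended = merge_fields(a, b, prepend=True)
print(dict(appended), dict(prepended))
```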
-
presto.Annotation.parseAnnotation(record, fields=None, delimiter=('|', '=', ', '))
Extracts annotations from a FASTA/FASTQ sequence description
Parameters: - record – Description string to extract annotations from
- fields – List of fields to subset the return dictionary to; if None return all fields
- delimiter – Tuple of delimiters for (fields, values, value lists)
Returns: An OrderedDict of field/value pairs
Return type: OrderedDict
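The role of the three delimiters is easiest to see on a concrete header. The sketch below is a simplified stand-in and not the library's parser: `parse_fields` is a hypothetical helper, the `ID` key for the leading token is an assumption, and value lists are left as unsplit strings.

```python
from collections import OrderedDict

def parse_fields(description, fields=None, delimiter=('|', '=', ',')):
    """Hypothetical helper: split a description string into field/value pairs."""
    parts = description.split(delimiter[0])
    # Assumption: the first token is the sequence ID
    ann = OrderedDict([('ID', parts[0])])
    for part in parts[1:]:
        name, _, value = part.partition(delimiter[1])
        ann[name] = value
    if fields is not None:
        # Keep only the requested fields, in their original order
        ann = OrderedDict((k, v) for k, v in ann.items() if k in fields)
    return ann

header = 'SEQ1|BARCODE=ACGT|PRIMER=IGHG,IGHM'
parsed = parse_fields(header)
subset = parse_fields(header, fields=['PRIMER'])
print(dict(parsed), dict(subset))
```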
-
presto.Annotation.renameAnnotation(ann_dict, old_field, new_field, delimiter=('|', '=', ', '))
Renames an annotation and merges annotations if the new name already exists
Parameters: - ann_dict – Dictionary of field/value pairs
- old_field – Old field name
- new_field – New field name
- delimiter – Tuple of delimiters for (fields, values, value lists)
Returns: Modified field dictionary
Return type: OrderedDict
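The merge-on-collision behavior can be sketched in a few lines. This is an illustration under stated assumptions, not pRESTO's implementation: `rename_field` is a hypothetical helper, colliding values are joined with `,` in dictionary order, and a new dictionary is returned instead of modifying the input.

```python
from collections import OrderedDict

def rename_field(ann, old_field, new_field, sep=','):
    """Hypothetical helper: rename a field, joining values if the new name exists."""
    renamed = OrderedDict()
    for name, value in ann.items():
        target = new_field if name == old_field else name
        if target in renamed:
            # Assumption: colliding values are joined in dictionary order
            renamed[target] = sep.join((renamed[target], value))
        else:
            renamed[target] = value
    return renamed

ann = OrderedDict([('ID', 'SEQ1'), ('PRIMER', 'IGHG'), ('VPRIMER', 'V1')])
simple = rename_field(ann, 'PRIMER', 'CPRIMER')
merged = rename_field(ann, 'VPRIMER', 'PRIMER')
print(dict(simple), dict(merged))
```

The first call is a plain rename that keeps the field's position; the second targets a name that already exists, so the two values end up in one field.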