presto.Annotation

Annotation functions

presto.Annotation.annotationConsensus(seq_iter, field, delimiter=('|', '=', ', '))

Calculate a consensus annotation for a set of sequences

Parameters:

seq_iter – an iterator or list of SeqRecord objects
field – the annotation field to take a consensus of
delimiter – a tuple of delimiters for (annotations, field/values, value lists)

Returns:

Dictionary with keys: set containing a list of unique annotation values, count containing annotation counts, cons containing the consensus annotation, freq containing the majority annotation frequency

Return type:

dict
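
The returned keys can be illustrated with a minimal sketch. It works on a plain list of annotation values rather than SeqRecord objects, and `consensus_sketch` is a hypothetical helper, not presto's implementation; the exact shape of `count` and the tie-breaking rule are assumptions.

```python
from collections import Counter

def consensus_sketch(values):
    """Summarize a non-empty list of annotation values (hypothetical helper)."""
    counts = Counter(values)
    # most_common(1) yields the value with the highest count;
    # ties resolve to the value seen first (an assumption of this sketch)
    cons, top = counts.most_common(1)[0]
    return {
        'set': sorted(counts),       # unique annotation values
        'count': dict(counts),       # value -> number of occurrences
        'cons': cons,                # consensus (majority) annotation
        'freq': top / len(values),   # fraction of values equal to the consensus
    }

result = consensus_sketch(['IGHM', 'IGHM', 'IGHG', 'IGHM'])
print(result['cons'], result['freq'])
```

Here three of the four values agree, so `cons` is `IGHM` and `freq` is 0.75.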

presto.Annotation.collapseAnnotation(ann_dict, action, fields=None, delimiter=('|', '=', ', '))

Collapses multiple annotations into new single annotations for each field

Parameters:

ann_dict – Dictionary of field/value pairs
action – Collapse action to take; one of {min, max, sum, first, last, set, cat}
fields – Subset of ann_dict to collapse; if None, collapse all but the ID field
delimiter – Tuple of delimiters for (fields, values, value lists)

Returns:

Modified field dictionary

Return type:

OrderedDict
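
The collapse actions can be sketched for a single delimited value. `collapse_values_sketch` is a hypothetical helper, and the comma list delimiter, the float conversion for the numeric actions, and the first-seen ordering for `set` are all assumptions rather than presto's documented behavior.

```python
def collapse_values_sketch(value, action, list_delim=','):
    """Collapse one delimited annotation value (hypothetical helper)."""
    items = value.split(list_delim)
    if action == 'first':
        return items[0]
    if action == 'last':
        return items[-1]
    if action == 'set':
        # unique values, keeping first-seen order (sketch's choice)
        return list_delim.join(dict.fromkeys(items))
    if action == 'cat':
        # concatenate the values without a delimiter
        return ''.join(items)
    if action in ('min', 'max', 'sum'):
        # numeric actions assume every item parses as a number
        numbers = [float(x) for x in items]
        return {'min': min, 'max': max, 'sum': sum}[action](numbers)
    raise ValueError('unknown action: %s' % action)

print(collapse_values_sketch('3,1,2', 'min'))
print(collapse_values_sketch('A,B,A', 'set'))
```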

presto.Annotation.flattenAnnotation(ann_dict, delimiter=('|', '=', ', '))

Converts annotations from a dictionary to a FASTA/FASTQ sequence description

Parameters:

ann_dict – Dictionary of field/value pairs
delimiter – Tuple of delimiters for (fields, values, value lists)

Returns:

Formatted sequence description string

Return type:

str
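
A rough sketch of the dictionary-to-header direction follows. `flatten_sketch` is a hypothetical helper; writing the ID field bare and using `('|', '=', ',')` as the delimiter tuple are assumptions made for the illustration.

```python
from collections import OrderedDict

def flatten_sketch(ann_dict, delimiter=('|', '=', ',')):
    """Join a field dictionary into a header string (hypothetical helper)."""
    field_delim, value_delim, list_delim = delimiter
    parts = []
    for key, value in ann_dict.items():
        # lists and tuples become a delimited value list
        if isinstance(value, (list, tuple)):
            value = list_delim.join(str(v) for v in value)
        # assumption: the ID field is written bare, other fields as NAME=value
        if key == 'ID':
            parts.append(str(value))
        else:
            parts.append('%s%s%s' % (key, value_delim, value))
    return field_delim.join(parts)

header = flatten_sketch(OrderedDict([
    ('ID', 'SEQ1'), ('BARCODE', 'ACGT'), ('PRIMER', ['C1', 'C2'])]))
print(header)
```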

presto.Annotation.getAnnotationValues(seq_iter, field, unique=False, delimiter=('|', '=', ', '))

Gets the annotation values for a field in a sequence set, optionally restricted to the unique values

Parameters:

seq_iter – Iterator or list of SeqRecord objects
field – Annotation field to retrieve values for
unique – If True, return a list of only the unique values; if False, return a list of all values
delimiter – Tuple of delimiters for (fields, values, value lists)

Returns:

List of values for the field

Return type:

list
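
The effect of the `unique` flag can be sketched on plain header strings instead of SeqRecord objects. `annotation_values_sketch` is a hypothetical helper; the header layout and the first-seen ordering of the unique values are assumptions.

```python
def annotation_values_sketch(headers, field, unique=False,
                             delimiter=('|', '=', ',')):
    """Collect the values of one field across headers (hypothetical helper)."""
    field_delim, value_delim, _ = delimiter
    values = []
    for header in headers:
        # skip the first part, assumed to be the bare sequence ID
        for part in header.split(field_delim)[1:]:
            name, _sep, value = part.partition(value_delim)
            if name == field:
                values.append(value)
    if unique:
        # drop duplicates, keeping first-seen order (sketch's choice)
        values = list(dict.fromkeys(values))
    return values

headers = ['S1|SAMPLE=A', 'S2|SAMPLE=B', 'S3|SAMPLE=A']
print(annotation_values_sketch(headers, 'SAMPLE'))
print(annotation_values_sketch(headers, 'SAMPLE', unique=True))
```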

presto.Annotation.getCoordKey(header, coord_type='presto', delimiter=('|', '=', ', '))

Return the coordinate identifier for a sequence description

Parameters:

header – Sequence header string
coord_type – Sequence header format; one of [‘illumina’, ‘solexa’, ‘sra’, ‘454’, ‘presto’]; if the type is unrecognized or None, return the sequence ID
delimiter – Tuple of delimiters for (fields, values, value lists)

Returns:

Coordinate identifier as a string

Return type:

str
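
How a coordinate key might be derived per header format is sketched below. `coord_key_sketch` is a hypothetical helper, and the splitting rules (whitespace and `#` for Illumina-style headers, whitespace for SRA/454, the first `|`-delimited field for presto headers) are assumptions about the header layouts, not a statement of presto's code.

```python
def coord_key_sketch(header, coord_type='presto', field_delim='|'):
    """Extract a coordinate key from a header (hypothetical helper)."""
    if coord_type in ('illumina', 'solexa'):
        # assumed layout: coordinates end at the first space or '#'
        return header.split()[0].split('#')[0]
    if coord_type in ('sra', '454'):
        # assumed layout: coordinates end at the first space
        return header.split()[0]
    if coord_type == 'presto':
        # assumed layout: the ID is the first delimited field
        return header.split(field_delim)[0]
    # unrecognized type or None: return the header unchanged
    return header

print(coord_key_sketch('M01:1:000-A:1:1101:15:13 1:N:0:1', 'illumina'))
print(coord_key_sketch('SEQ1|BARCODE=ACGT'))
```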

presto.Annotation.mergeAnnotation(ann_dict_1, ann_dict_2, prepend=False, delimiter=('|', '=', ', '))

Merges non-ID field annotations from one field dictionary into another

Parameters:

ann_dict_1 – Dictionary of field/value pairs to append to
ann_dict_2 – Dictionary of field/value pairs to merge into ann_dict_1
prepend – If True, add ann_dict_2 values to the front of any ann_dict_1 values that are already present, rather than the default behavior of appending ann_dict_2 values
delimiter – Tuple of delimiters for (fields, values, value lists)

Returns:

Modified ann_dict_1 dictionary of field/value pairs

Return type:

OrderedDict
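
The append and prepend behaviors can be sketched as follows. `merge_sketch` is a hypothetical helper that returns a new dictionary instead of modifying its first argument; the comma list delimiter is an assumption.

```python
from collections import OrderedDict

def merge_sketch(ann_1, ann_2, prepend=False, list_delim=','):
    """Merge non-ID fields of ann_2 into a copy of ann_1 (hypothetical helper)."""
    merged = OrderedDict(ann_1)
    for key, value in ann_2.items():
        if key == 'ID':
            continue  # the ID of ann_1 is kept
        if key in merged:
            # shared field: join both values, ann_2 first when prepending
            pair = (value, merged[key]) if prepend else (merged[key], value)
            merged[key] = list_delim.join(str(v) for v in pair)
        else:
            merged[key] = value
    return merged

a = OrderedDict([('ID', 'SEQ1'), ('PRIMER', 'C1')])
b = OrderedDict([('ID', 'SEQ2'), ('PRIMER', 'C2'), ('SAMPLE', 'S1')])
print(merge_sketch(a, b))
print(merge_sketch(a, b, prepend=True))
```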

presto.Annotation.parseAnnotation(record, fields=None, delimiter=('|', '=', ', '))

Extracts annotations from a FASTA/FASTQ sequence description

Parameters:

record – Description string to extract annotations from
fields – List of fields to subset the return dictionary to; if None, return all fields
delimiter – Tuple of delimiters for (fields, values, value lists)

Returns:

An OrderedDict of field/value pairs

Return type:

OrderedDict
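
A minimal sketch of the header-to-dictionary direction, assuming a description of the form `ID|FIELD=value|...`. `parse_sketch` is a hypothetical helper; storing the first field under the key `ID` and the `('|', '=', ',')` delimiter tuple are assumptions.

```python
from collections import OrderedDict

def parse_sketch(description, fields=None, delimiter=('|', '=', ',')):
    """Split a description string into field/value pairs (hypothetical helper)."""
    field_delim, value_delim, _ = delimiter
    parts = description.split(field_delim)
    # assumption: the first part is the bare sequence ID
    ann = OrderedDict([('ID', parts[0])])
    for part in parts[1:]:
        name, _sep, value = part.partition(value_delim)
        ann[name] = value
    if fields is not None:
        # keep only the requested fields, in their original order
        ann = OrderedDict((k, v) for k, v in ann.items() if k in fields)
    return ann

ann = parse_sketch('SEQ1|BARCODE=ACGT|PRIMER=C1,C2')
print(ann)
print(parse_sketch('SEQ1|BARCODE=ACGT|PRIMER=C1,C2', fields=['PRIMER']))
```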

presto.Annotation.renameAnnotation(ann_dict, old_field, new_field, delimiter=('|', '=', ', '))

Renames an annotation and merges annotations if the new name already exists

Parameters:

ann_dict – Dictionary of field/value pairs
old_field – Old field name
new_field – New field name
delimiter – Tuple of delimiters for (fields, values, value lists)

Returns:

Modified fields dictionary

Return type:

OrderedDict
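
Renaming with a merge on collision can be sketched as below. `rename_sketch` is a hypothetical helper that returns a new dictionary; the comma list delimiter and the order in which colliding values are joined are assumptions.

```python
from collections import OrderedDict

def rename_sketch(ann_dict, old_field, new_field, list_delim=','):
    """Rename a field, merging values if the new name exists (hypothetical helper)."""
    renamed = OrderedDict()
    for key, value in ann_dict.items():
        target = new_field if key == old_field else key
        if target in renamed:
            # name collision: join the values in dictionary order (sketch's choice)
            renamed[target] = list_delim.join([str(renamed[target]), str(value)])
        else:
            renamed[target] = value
    return renamed

ann = OrderedDict([('ID', 'SEQ1'), ('CREGION', 'IGHM'), ('PRIMER', 'C1')])
print(rename_sketch(ann, 'PRIMER', 'VPRIMER'))
print(rename_sketch(ann, 'PRIMER', 'CREGION'))
```

The first call is a plain rename; the second collides with the existing `CREGION` field, so the two values are joined into one value list.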