ClusterSets

Cluster sequences by group

usage: ClusterSets [-h] -s SEQ_FILES [SEQ_FILES ...] [--fasta] [--failed] [--log LOG_FILE] [--delim DELIMITER DELIMITER DELIMITER] [--nproc NPROC] [--outdir OUT_DIR] [--outname OUT_NAME] [--version] [-f BARCODE_FIELD] [-k CLUSTER_FIELD] [--id IDENT] [--start SEQ_START] [--end SEQ_END] [--exec USEARCH_EXEC]

-h, --help: show this help message and exit

-s <seq_files>: A list of FASTA/FASTQ files containing sequences to process.

--fasta: Specify to force output as FASTA rather than FASTQ.

--failed: If specified, create files containing records that fail processing.

--log <log_file>: Specify to write verbose logging to a file. May not be specified with multiple input files.

--delim <delimiter>: A list of the three delimiters that separate annotation blocks, field names and values, and values within a field, respectively.
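The role of the three delimiters can be sketched with a small header parser. This is an illustration only, not the tool's own code: the function name `parse_header`, the example header, and the delimiter values `|`, `=` and `,` are assumptions chosen for the example.

```python
def parse_header(header, delimiters=("|", "=", ",")):
    """Split an annotated header into a sequence ID and a dict of fields.

    delimiters is assumed to hold, in order: the separator between
    annotation blocks, the separator between a field name and its value,
    and the separator between multiple values within one field.
    """
    block_delim, field_delim, value_delim = delimiters
    blocks = header.split(block_delim)
    seq_id = blocks[0]  # the first block is treated as the sequence ID
    fields = {}
    for block in blocks[1:]:
        name, _, value = block.partition(field_delim)
        fields[name] = value.split(value_delim)
    return seq_id, fields


# Hypothetical header with two annotation fields
seq_id, fields = parse_header("SEQ1|BARCODE=ACGT|PRIMER=C1,C2")
print(seq_id, fields)
```

Passing a different tuple, for example `(";", ":", "+")`, would parse headers written with those separators instead.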

--nproc <nproc>: The number of simultaneous computational processes to execute (CPU cores to utilize).

--outdir <out_dir>: Specify to change the output directory to the location specified. The input file directory is used if this is not specified.

--outname <out_name>: Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.

--version: show program’s version number and exit

-f <barcode_field>: The annotation field containing the values, such as UID barcodes, used to group sequences.

-k <cluster_field>: The name of the output annotation field to add with the cluster information for each sequence.

--id <ident>: The sequence identity threshold for the USEARCH algorithm.

--start <seq_start>: The start of the region to be used for clustering. Together with --end, this parameter can be used to specify a subsequence of each read to use in the clustering algorithm.

--end <seq_end>: The end of the region to be used for clustering.
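How --start and --end restrict the clustered region can be sketched as a simple slice. The indexing convention used here (zero-based start, exclusive end, as in a Python slice) is an assumption and is not stated in this reference; `clustering_region` is a hypothetical helper.

```python
def clustering_region(sequence, seq_start=0, seq_end=None):
    """Return the part of a read that would be passed to clustering.

    Assumption: seq_start is zero-based and seq_end is exclusive.
    With seq_end=None the region runs to the end of the read.
    """
    return sequence[seq_start:seq_end]


read = "ACGTACGTAC"
print(clustering_region(read, 2, 6))  # positions 2..5 under the assumed convention
print(clustering_region(read))        # whole read when no bounds are given
```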

--exec <usearch_exec>: The location of the USEARCH executable.
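As a rough sketch of how the options above combine, the following assembles an invocation from Python. Only flags listed in the usage line are used; the helper name `build_clustersets_command`, the file name, the field names, the identity threshold, and the USEARCH path are all hypothetical values chosen for illustration.

```python
import shlex


def build_clustersets_command(seq_file, barcode_field, cluster_field,
                              ident, nproc=1, usearch_exec=None):
    """Build an argument list for ClusterSets from the documented flags."""
    cmd = [
        "ClusterSets",
        "-s", seq_file,
        "-f", barcode_field,
        "-k", cluster_field,
        "--id", str(ident),
        "--nproc", str(nproc),
    ]
    if usearch_exec is not None:
        # Only pass --exec when an explicit USEARCH location is given
        cmd += ["--exec", usearch_exec]
    return cmd


# All argument values below are illustrative assumptions
cmd = build_clustersets_command(
    "reads.fastq", "BARCODE", "CLUSTER", 0.9,
    nproc=4, usearch_exec="/usr/local/bin/usearch",
)
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` on a system where ClusterSets and USEARCH are installed.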

output files:

cluster-pass: clustered reads.
cluster-fail: raw reads failing clustering.

output annotation fields:

CLUSTER: a numeric cluster identifier defining the within-group cluster.
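Because the identifier is defined within each group, the same CLUSTER value can appear under different barcodes without referring to the same cluster. A minimal sketch of this, assuming hypothetical headers, the field names BARCODE and CLUSTER, and `|` and `=` as delimiters:

```python
from collections import defaultdict

# Hypothetical annotated headers; the IDs and values are invented for illustration
headers = [
    "READ1|BARCODE=AAAA|CLUSTER=1",
    "READ2|BARCODE=AAAA|CLUSTER=2",
    "READ3|BARCODE=CCCC|CLUSTER=1",
    "READ4|BARCODE=AAAA|CLUSTER=1",
]


def field_values(header):
    """Return the annotation fields of a header as a name -> value dict."""
    blocks = header.split("|")[1:]
    return dict(block.split("=", 1) for block in blocks)


# A cluster is only unique as a (barcode, cluster) pair
groups = defaultdict(list)
for header in headers:
    fields = field_values(header)
    key = (fields["BARCODE"], fields["CLUSTER"])
    groups[key].append(header.split("|")[0])

for key, reads in sorted(groups.items()):
    print(key, reads)
```

Here READ1 and READ3 both carry CLUSTER=1 but fall into separate groups because their barcodes differ.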