ClusterSets
Cluster sequences by group
usage: ClusterSets [-h] -s SEQ_FILES [SEQ_FILES ...] [--fasta] [--failed] [--log LOG_FILE] [--delim DELIMITER DELIMITER DELIMITER] [--nproc NPROC] [--outdir OUT_DIR] [--outname OUT_NAME] [--version] [-f BARCODE_FIELD] [-k CLUSTER_FIELD] [--id IDENT] [--start SEQ_START] [--end SEQ_END] [--exec USEARCH_EXEC]
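As an illustration of the usage line above, the following Python sketch assembles a ClusterSets argument list from a handful of the documented flags. The helper name `build_clustersets_command`, the file name `reads.fastq`, the field name `BARCODE`, and the numeric values are all hypothetical; only the flags themselves come from this page. The sketch builds and prints the command and does not run it.

```python
import shlex


def build_clustersets_command(seq_files, barcode_field=None, cluster_field=None,
                              ident=None, nproc=None, outdir=None, failed=False):
    """Assemble an argument list for ClusterSets (hypothetical helper)."""
    if not seq_files:
        raise ValueError("at least one input file is required for -s")
    cmd = ["ClusterSets", "-s", *seq_files]
    # Value-taking flags are added only when a value is supplied,
    # so the tool's own defaults apply otherwise.
    optional = [
        ("-f", barcode_field),
        ("-k", cluster_field),
        ("--id", ident),
        ("--nproc", nproc),
        ("--outdir", outdir),
    ]
    for flag, value in optional:
        if value is not None:
            cmd += [flag, str(value)]
    # --failed is a switch with no value.
    if failed:
        cmd.append("--failed")
    return cmd


# Illustrative values only; none of them are defaults taken from the tool.
cmd = build_clustersets_command(["reads.fastq"], barcode_field="BARCODE",
                                ident=0.9, nproc=4, failed=True)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` as-is, which avoids shell quoting problems with file names that contain spaces.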
- -h, --help: show this help message and exit
- -s <seq_files>: A list of FASTA/FASTQ files containing sequences to process.
- --fasta: Specify to force output as FASTA rather than FASTQ.
- --failed: If specified, create files containing records that fail processing.
- --log <log_file>: Specify to write verbose logging to a file. May not be specified with multiple input files.
- --delim <delimiter>: A list of the three delimiters that separate annotation blocks, field names and values, and values within a field, respectively.
- --nproc <nproc>: The number of simultaneous computational processes to execute (CPU cores to utilize).
- --outdir <out_dir>: Changes the output directory to the location specified. The input file directory is used if this is not specified.
- --outname <out_name>: Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.
- --version: show program's version number and exit
- -f <barcode_field>: The annotation field containing annotations, such as a UID barcode, for sequence grouping.
- -k <cluster_field>: The name of the output annotation field to add with the cluster information for each sequence.
- --id <ident>: The sequence identity threshold for the usearch algorithm.
- --start <seq_start>: The start of the region to be used for clustering. Together with --end, this parameter can be used to specify a subsequence of each read to use in the clustering algorithm.
- --end <seq_end>: The end of the region to be used for clustering.
- --exec <usearch_exec>: The location of the USEARCH executable.
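The --start and --end options described above restrict clustering to a subsequence of each read. The sketch below shows one way to picture that region. It assumes Python slice semantics (0-based start, exclusive end, no end meaning "through the end of the read"); this page does not state the indexing convention, so treat that as an assumption. The function name `clustering_region` and the example read are hypothetical.

```python
def clustering_region(sequence, seq_start=0, seq_end=None):
    """Return the part of a read that would be compared during clustering.

    Assumption: Python slice semantics, i.e. a 0-based start, an exclusive
    end, and seq_end=None meaning "through the end of the read".
    """
    return sequence[seq_start:seq_end]


# A made-up read, used only to show which bases fall inside the region.
read = "ACGTACGTTTGGCCAA"
region = clustering_region(read, seq_start=4, seq_end=12)
print(region)
```

Restricting the region this way leaves the read itself unchanged; only the portion used for comparison differs.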
- output files:
  - cluster-pass: clustered reads.
  - cluster-fail: raw reads failing clustering.
- output annotation fields:
  - CLUSTER: a numeric cluster identifier defining the within-group cluster.
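Because the CLUSTER identifier is defined within each group, it is only meaningful in combination with the grouping field. The following sketch parses annotated headers and collects read IDs by (barcode, cluster) pair. The header layout, the default delimiters `|`, `=` and `,`, the field name `BARCODE`, and the example headers are assumptions made for illustration; the three delimiter roles follow the --delim description above.

```python
from collections import defaultdict


def parse_header(header, delimiters=("|", "=", ",")):
    """Split a sequence description into its ID and a dict of annotations.

    The three delimiters separate annotation blocks, field names from
    values, and multiple values within a field, in that order.
    """
    block_delim, field_delim, value_delim = delimiters
    blocks = header.lstrip(">@").split(block_delim)
    annotations = {}
    for block in blocks[1:]:
        name, _, value = block.partition(field_delim)
        annotations[name] = value.split(value_delim)
    return blocks[0], annotations


# Hypothetical headers as they might look in a cluster-pass file.
headers = [
    ">seq1|BARCODE=AAAC|CLUSTER=1",
    ">seq2|BARCODE=AAAC|CLUSTER=2",
    ">seq3|BARCODE=AAAC|CLUSTER=1",
    ">seq4|BARCODE=GGTT|CLUSTER=1",
]

# Key on both fields: cluster 1 of AAAC and cluster 1 of GGTT are unrelated.
groups = defaultdict(list)
for header in headers:
    seq_id, ann = parse_header(header)
    key = (ann["BARCODE"][0], ann["CLUSTER"][0])
    groups[key].append(seq_id)

for key, members in sorted(groups.items()):
    print(key, members)
```

Keying on the pair rather than on CLUSTER alone keeps reads from different barcode groups apart even when their cluster numbers coincide.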