SplitSeq
Sorts, samples and splits FASTA/FASTQ sequence files
usage: SplitSeq [-h] [--version] ...
- -h, --help: show this help message and exit
- --version: show program’s version number and exit
output files:
- part<part>: reads partitioned by count, where <part> is the partition number.
- <field>-<value>: reads partitioned by annotation <field> and <value>.
- under-<number>: reads partitioned by numeric threshold where the annotation value is strictly less than the threshold <number>.
- atleast-<number>: reads partitioned by numeric threshold where the annotation value is greater than or equal to the threshold <number>.
- sorted: reads sorted by annotation value.
- sorted-part<part>: reads sorted by annotation value and partitioned by count, where <part> is the partition number.
- sample<i>-n<count>: randomly sampled reads, where <i> is a number specifying the sampling instance and <count> is the number of sampled reads.
output annotation fields:
- None
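The output labels listed above can be sketched in Python as follows. This is only an illustration of the naming rules, not SplitSeq's own code: the function names are hypothetical, and how the tool formats the threshold number (for example `2` versus `2.0`) is an assumption.

```python
def threshold_label(value: float, threshold: float) -> str:
    """Label for a numeric split: strictly below the threshold is 'under'."""
    side = "under" if value < threshold else "atleast"
    # Assumption: the threshold is printed in compact form (2 rather than 2.0).
    return f"{side}-{threshold:g}"


def part_label(part: int, is_sorted: bool = False) -> str:
    """Label for a count-based partition, optionally taken from sorted output."""
    prefix = "sorted-" if is_sorted else ""
    return f"{prefix}part{part}"


def sample_label(instance: int, count: int) -> str:
    """Label for a random sample: sampling instance and number of reads."""
    return f"sample{instance}-n{count}"
```

A value equal to the threshold falls on the `atleast` side, matching the "greater than or equal to" wording above.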
SplitSeq count
Splits sequence files by number of records.
usage: SplitSeq [-h] [--version] ...
- -h, --help: show this help message and exit
- -s <seq_files>: A list of FASTA/FASTQ files containing sequences to process.
- --fasta: Specify to force output as FASTA rather than FASTQ.
- --outdir <out_dir>: Changes the output directory to the location specified. The input file directory is used if this is not specified.
- --outname <out_name>: Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.
- -n <max_count>: Maximum number of sequences in each new file.
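Splitting by record count can be pictured with the minimal Python sketch below. It is an assumption-laden illustration, not the tool's implementation: `split_by_count` is a hypothetical helper, and records are treated as opaque items rather than parsed FASTA/FASTQ entries.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def split_by_count(records: Iterable[T], max_count: int) -> Iterator[List[T]]:
    """Yield consecutive partitions holding at most max_count records each."""
    if max_count < 1:
        raise ValueError("max_count must be at least 1")
    iterator = iter(records)
    while True:
        # Take the next max_count records; the final partition may be shorter.
        chunk = list(islice(iterator, max_count))
        if not chunk:
            return
        yield chunk
```

Each yielded partition would correspond to one `part<part>` output file.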
SplitSeq group
Splits sequence files by annotation.
usage: SplitSeq [-h] [--version] ...
- -h, --help: show this help message and exit
- -s <seq_files>: A list of FASTA/FASTQ files containing sequences to process.
- --fasta: Specify to force output as FASTA rather than FASTQ.
- --delim <delimiter>: A list of the three delimiters that separate annotation blocks, field names and values, and values within a field, respectively.
- --outdir <out_dir>: Changes the output directory to the location specified. The input file directory is used if this is not specified.
- --outname <out_name>: Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.
- -f <field>: Annotation field to split sequence files by.
- --num <threshold>: Specify to define the split field as numeric and group sequences by whether the value is under, or at least, <threshold>.
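The grouping behaviour can be sketched in Python as below. Several things here are assumptions rather than documented facts: the delimiters `|`, `=` and `,` are assumed values for the three-delimiter list, the header strings and field names in the usage are invented, and skipping headers that lack the field is a choice made for this sketch only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

DELIMITER = ("|", "=", ",")  # assumed: block, field/value, and value delimiters


def parse_annotations(header: str, delimiter: Tuple[str, str, str] = DELIMITER) -> Dict[str, str]:
    """Parse 'ID|FIELD=VALUE|...' into a dictionary of annotation fields."""
    block_delim, field_delim, _ = delimiter
    fields = {}
    for block in header.split(block_delim)[1:]:  # the first block is the sequence ID
        name, _, value = block.partition(field_delim)
        fields[name] = value
    return fields


def group_headers(headers: Iterable[str], field: str,
                  threshold: Optional[float] = None) -> Dict[str, List[str]]:
    """Group headers by annotation value, or by a numeric threshold if given."""
    groups = defaultdict(list)
    for header in headers:
        value = parse_annotations(header).get(field)
        if value is None:
            continue  # assumption of this sketch: records without the field are skipped
        if threshold is None:
            label = f"{field}-{value}"
        else:
            side = "under" if float(value) < threshold else "atleast"
            label = f"{side}-{threshold:g}"
        groups[label].append(header)
    return dict(groups)
```

Without a threshold, each distinct value yields its own `<field>-<value>` group; with one, every record lands in either the `under-<number>` or the `atleast-<number>` group.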
SplitSeq sample
Randomly samples from unpaired sequence files.
usage: SplitSeq [-h] [--version] ...
- -h, --help: show this help message and exit
- -s <seq_files>: A list of FASTA/FASTQ files containing sequences to process.
- --fasta: Specify to force output as FASTA rather than FASTQ.
- --delim <delimiter>: A list of the three delimiters that separate annotation blocks, field names and values, and values within a field, respectively.
- --outdir <out_dir>: Changes the output directory to the location specified. The input file directory is used if this is not specified.
- --outname <out_name>: Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.
- -n <max_count>: Maximum number of sequences to sample from each file.
- -f <field>: The annotation field for sampling criteria.
- -u <values>: A list of annotation values, one of which each sequence must contain; requires the -f argument.
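A rough Python sketch of sampling restricted by annotation value follows. It covers only two cases, no field at all and a field combined with a value list; what the tool does when -f is given without -u is not described here and is left out. The helper names, the `seed` parameter and the delimiters `|`, `=`, `,` are assumptions of the sketch.

```python
import random
from typing import List, Optional, Sequence


def field_values(header: str, field: str, delimiter=("|", "=", ",")) -> List[str]:
    """Return the values stored under field in an 'ID|FIELD=V1,V2' style header."""
    block_delim, field_delim, value_delim = delimiter
    for block in header.split(block_delim)[1:]:
        name, _, value = block.partition(field_delim)
        if name == field:
            return value.split(value_delim)
    return []


def sample_headers(headers: Sequence[str], max_count: int,
                   field: Optional[str] = None,
                   values: Optional[Sequence[str]] = None,
                   seed: Optional[int] = None) -> List[str]:
    """Randomly pick up to max_count headers, optionally filtered by annotation."""
    if values is not None and field is None:
        raise ValueError("values requires field")  # mirrors '-u requires -f'
    pool = list(headers)
    if values is not None:
        wanted = set(values)
        # Keep a record if at least one of its values is in the requested list.
        pool = [h for h in pool if wanted & set(field_values(h, field))]
    rng = random.Random(seed)
    return rng.sample(pool, min(max_count, len(pool)))
```

When fewer than `max_count` records qualify, the sketch returns all of them rather than failing.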
SplitSeq samplepair
Randomly samples from paired-end sequence files.
usage: SplitSeq [-h] [--version] ...
- -h, --help: show this help message and exit
- -1 <seq_files_1>: An ordered list of FASTA/FASTQ files containing head/primary sequences.
- -2 <seq_files_2>: An ordered list of FASTA/FASTQ files containing tail/secondary sequences.
- --fasta: Specify to force output as FASTA rather than FASTQ.
- --delim <delimiter>: A list of the three delimiters that separate annotation blocks, field names and values, and values within a field, respectively.
- --outdir <out_dir>: Changes the output directory to the location specified. The input file directory is used if this is not specified.
- --outname <out_name>: Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.
- -n <max_count>: A list of the number of sequences to sample from each file.
- -f <field>: The annotation field for sampling criteria.
- -u <values>: A list of annotation values, one of which both paired sequences must contain; requires the -f argument.
- --coord {illumina,solexa,sra,454,presto}: The format of the sequence identifier, which defines shared coordinate information across paired ends.
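Paired sampling depends on matching the two ends of a read by a shared coordinate. The Python sketch below illustrates that idea under a strong simplification: the coordinate is assumed to be the text before the first space in the identifier, which is not claimed to match any of the --coord formats exactly. `coord_key`, `sample_pairs` and the `seed` parameter are hypothetical names.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple


def coord_key(header: str) -> str:
    """Assumed coordinate: everything before the first space of the identifier."""
    return header.split(" ", 1)[0]


def sample_pairs(headers_1: Sequence[str], headers_2: Sequence[str], max_count: int,
                 key: Callable[[str], str] = coord_key,
                 seed: Optional[int] = None) -> List[Tuple[str, str]]:
    """Sample up to max_count read pairs whose coordinates occur in both files."""
    index_1 = {key(h): h for h in headers_1}
    index_2 = {key(h): h for h in headers_2}
    # Only coordinates present in both files can form a pair.
    shared = sorted(index_1.keys() & index_2.keys())
    rng = random.Random(seed)
    chosen = rng.sample(shared, min(max_count, len(shared)))
    return [(index_1[k], index_2[k]) for k in chosen]
```

Passing a different `key` function is how another identifier format would be supported in this sketch.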
SplitSeq sort
Sorts sequence files by annotation.
usage: SplitSeq [-h] [--version] ...
- -h, --help: show this help message and exit
- -s <seq_files>: A list of FASTA/FASTQ files containing sequences to process.
- --fasta: Specify to force output as FASTA rather than FASTQ.
- --delim <delimiter>: A list of the three delimiters that separate annotation blocks, field names and values, and values within a field, respectively.
- --outdir <out_dir>: Changes the output directory to the location specified. The input file directory is used if this is not specified.
- --outname <out_name>: Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.
- -f <field>: The annotation field to sort sequences by.
- -n <max_count>: Maximum number of sequences in each new file.
- --num: Specify to define the sort field as numeric rather than textual.
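The difference between textual and numeric sorting can be seen in the short Python sketch below. It is an illustration only: the helper names and the example headers are invented, the delimiters `|` and `=` are assumed, and raising an error for a missing field is a choice of this sketch.

```python
from typing import List, Sequence


def annotation_value(header: str, field: str, delimiter=("|", "=", ",")) -> str:
    """Return the raw value of field from an 'ID|FIELD=VALUE' style header."""
    block_delim, field_delim, _ = delimiter
    for block in header.split(block_delim)[1:]:
        name, _, value = block.partition(field_delim)
        if name == field:
            return value
    raise KeyError(f"field {field!r} not found in {header!r}")


def sort_headers(headers: Sequence[str], field: str, numeric: bool = False) -> List[str]:
    """Sort headers by annotation value, as text by default or as numbers."""
    def sort_key(header: str):
        value = annotation_value(header, field)
        return float(value) if numeric else value
    return sorted(headers, key=sort_key)
```

With values 10, 9 and 100, a textual sort orders them as 10, 100, 9, while a numeric sort orders them as 9, 10, 100. The sorted list could then be cut into partitions of at most `<max_count>` records to produce the `sorted-part<part>` files.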