SplitSeq

Sorts, samples and splits FASTA/FASTQ sequence files

usage: SplitSeq [-h] [--version] ...

-h, --help: show this help message and exit

--version: show program’s version number and exit

output files:

part<part>: reads partitioned by count, where <part> is the partition number.
<field>-<value>: reads partitioned by annotation <field> and <value>.
under-<number>: reads partitioned by numeric threshold where the annotation value is strictly less than the threshold <number>.
atleast-<number>: reads partitioned by numeric threshold where the annotation value is greater than or equal to the threshold <number>.
sorted: reads sorted by annotation value.
sorted-part<part>: reads sorted by annotation value and partitioned by count, where <part> is the partition number.
sample<i>-n<count>: randomly sampled reads where <i> is a number specifying the sampling instance and <count> is the number of sampled reads.

output annotation fields:

None

SplitSeq count

Splits sequence files by number of records.

usage: SplitSeq [-h] [--version] ...

-h, --help: show this help message and exit

-s <seq_files>: A list of FASTA/FASTQ files containing sequences to process.

--fasta: Specify to force output as FASTA rather than FASTQ.

--outdir <out_dir>: Changes the output directory to the location specified. The input file directory is used if this is not specified.

--outname <out_name>: Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.

-n <max_count>: Maximum number of sequences in each new file.
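The partitioning that -n describes can be pictured with a minimal sketch. This is not the tool's implementation: the function name split_by_count and the read identifiers are hypothetical, and numbering the parts from 1 is an assumption.

```python
from itertools import islice


def split_by_count(records, max_count):
    """Yield (part_number, chunk) pairs holding at most max_count records each."""
    if max_count < 1:
        raise ValueError("max_count must be a positive integer")
    iterator = iter(records)
    part = 1  # assumption: partitions are numbered starting at 1
    while True:
        chunk = list(islice(iterator, max_count))
        if not chunk:
            break
        yield part, chunk
        part += 1


# Hypothetical read identifiers standing in for FASTA/FASTQ records.
reads = ["read%d" % i for i in range(1, 8)]
parts = {"part%d" % n: chunk for n, chunk in split_by_count(reads, 3)}
```

With seven records and a maximum of three per file, the sketch produces three partitions, and only the last one is smaller than the maximum.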

SplitSeq group

Splits sequence files by annotation.

usage: SplitSeq [-h] [--version] ...

-h, --help: show this help message and exit

-s <seq_files>: A list of FASTA/FASTQ files containing sequences to process.

--fasta: Specify to force output as FASTA rather than FASTQ.

--delim <delimiter>: A list of the three delimiters that separate annotation blocks, field names and values, and values within a field, respectively.

--outdir <out_dir>: Changes the output directory to the location specified. The input file directory is used if this is not specified.

--outname <out_name>: Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.

-f <field>: Annotation field to split sequence files by.

--num <threshold>: Specify to define the split field as numeric and group sequences by whether the value is under (strictly less than) or at least (greater than or equal to) <threshold>.
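As a rough illustration of grouping by annotation, the sketch below parses headers and assigns each one to an output label of the form <field>-<value>, under-<number> or atleast-<number>. It is not the tool's code: the helper names, the example headers and the delimiter triple ("|", "=", ",") are assumptions made for the example.

```python
from collections import defaultdict


def parse_header(header, delimiter=("|", "=", ",")):
    """Parse 'ID|FIELD=value|...' into a dict; delimiters are assumed, not confirmed."""
    block_delim, field_delim, _value_delim = delimiter
    blocks = header.split(block_delim)
    annotation = {"ID": blocks[0]}
    for block in blocks[1:]:
        field, _, value = block.partition(field_delim)
        annotation[field] = value
    return annotation


def group_by_field(headers, field, threshold=None):
    """Group headers by field value, or by a numeric threshold when one is given."""
    groups = defaultdict(list)
    for header in headers:
        value = parse_header(header)[field]
        if threshold is None:
            label = "%s-%s" % (field, value)
        elif float(value) < threshold:
            label = "under-%s" % threshold    # strictly less than the threshold
        else:
            label = "atleast-%s" % threshold  # greater than or equal to it
        groups[label].append(header)
    return dict(groups)


# Hypothetical annotated headers.
headers = ["r1|SAMPLE=A|DUPCOUNT=1", "r2|SAMPLE=B|DUPCOUNT=5", "r3|SAMPLE=A|DUPCOUNT=2"]
by_sample = group_by_field(headers, "SAMPLE")
by_count = group_by_field(headers, "DUPCOUNT", threshold=2)
```

In this sketch a value equal to the threshold lands in the atleast group, matching the output file descriptions.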

SplitSeq sample

Randomly samples from unpaired sequence files.

usage: SplitSeq [-h] [--version] ...

-h, --help: show this help message and exit

-s <seq_files>: A list of FASTA/FASTQ files containing sequences to process.

--fasta: Specify to force output as FASTA rather than FASTQ.

--delim <delimiter>: A list of the three delimiters that separate annotation blocks, field names and values, and values within a field, respectively.

--outdir <out_dir>: Changes the output directory to the location specified. The input file directory is used if this is not specified.

--outname <out_name>: Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.

-n <max_count>: Maximum number of sequences to sample from each file.

-f <field>: The annotation field for sampling criteria.

-u <values>: A list of annotation values, at least one of which each sequence must contain; requires the -f argument.
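The interplay of -n, -f and -u can be sketched as filtering followed by sampling without replacement. This is only an illustration under stated assumptions: the function names, the seed parameter and the delimiter triple ("|", "=", ",") are hypothetical, and the case of -f given without -u is not modeled.

```python
import random


def annotations(header, delimiter=("|", "=", ",")):
    """Map each field to its list of values; delimiters are assumed, not confirmed."""
    block_delim, field_delim, value_delim = delimiter
    result = {}
    for block in header.split(block_delim)[1:]:
        field, _, value = block.partition(field_delim)
        result[field] = value.split(value_delim)
    return result


def sample_headers(headers, max_count, field=None, values=None, seed=None):
    """Sample up to max_count headers, optionally restricted to given field values."""
    if values is not None and field is None:
        raise ValueError("values requires field, as -u requires -f")
    pool = list(headers)
    if field is not None and values is not None:
        wanted = set(values)
        # Keep headers whose field contains at least one of the wanted values.
        pool = [h for h in pool if wanted & set(annotations(h).get(field, []))]
    rng = random.Random(seed)
    return rng.sample(pool, min(max_count, len(pool)))


# Hypothetical annotated headers.
headers = ["r1|SAMPLE=A", "r2|SAMPLE=B", "r3|SAMPLE=C", "r4|SAMPLE=A", "r5|SAMPLE=B"]
picked = sample_headers(headers, 2, field="SAMPLE", values=["A", "B"], seed=1)
label = "sample%d-n%d" % (1, len(picked))  # mirrors the sample<i>-n<count> label
```

Capping the sample size at the pool size means that a file with fewer matching records than requested simply yields all of them.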

SplitSeq samplepair

Randomly samples from paired-end sequence files.

usage: SplitSeq [-h] [--version] ...

-h, --help: show this help message and exit

-1 <seq_files_1>: An ordered list of FASTA/FASTQ files containing head/primary sequences.

-2 <seq_files_2>: An ordered list of FASTA/FASTQ files containing tail/secondary sequences.

--fasta: Specify to force output as FASTA rather than FASTQ.

--delim <delimiter>: A list of the three delimiters that separate annotation blocks, field names and values, and values within a field, respectively.

--outdir <out_dir>: Changes the output directory to the location specified. The input file directory is used if this is not specified.

--outname <out_name>: Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.

-n <max_count>: A list of the number of sequences to sample from each file.

-f <field>: The annotation field for sampling criteria.

-u <values>: A list of annotation values, at least one of which both paired sequences must contain; requires the -f argument.

--coord {illumina,solexa,sra,454,presto}: The format of the sequence identifier which defines shared coordinate information across paired ends.
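Paired sampling depends on matching the two files by a shared coordinate before drawing the sample. The sketch below shows that idea only. The coord_key function, which takes the text before the first space, is a stand-in assumption and does not reproduce any of the --coord formats; the headers are likewise hypothetical.

```python
import random


def coord_key(header):
    """Assumed coordinate key: the identifier text before the first space."""
    return header.split(" ")[0]


def sample_pairs(headers_1, headers_2, max_count, seed=None):
    """Sample up to max_count pairs whose coordinate key occurs in both files."""
    index_1 = {coord_key(h): h for h in headers_1}
    index_2 = {coord_key(h): h for h in headers_2}
    # Only coordinates present in both files can form a pair.
    shared = sorted(set(index_1) & set(index_2))
    rng = random.Random(seed)
    chosen = rng.sample(shared, min(max_count, len(shared)))
    return [(index_1[key], index_2[key]) for key in chosen]


# Hypothetical paired-end headers; the read "c" has no mate in the second file.
reads_1 = ["a 1:N:0", "b 1:N:0", "c 1:N:0", "d 1:N:0"]
reads_2 = ["a 2:N:0", "b 2:N:0", "d 2:N:0"]
pairs = sample_pairs(reads_1, reads_2, 2, seed=7)
```

Sampling the shared keys rather than each file separately keeps every sampled read together with its mate.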

SplitSeq sort

Sorts sequence files by annotation.

usage: SplitSeq [-h] [--version] ...

-h, --help: show this help message and exit

-s <seq_files>: A list of FASTA/FASTQ files containing sequences to process.

--fasta: Specify to force output as FASTA rather than FASTQ.

--delim <delimiter>: A list of the three delimiters that separate annotation blocks, field names and values, and values within a field, respectively.

--outdir <out_dir>: Changes the output directory to the location specified. The input file directory is used if this is not specified.

--outname <out_name>: Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.

-f <field>: The annotation field to sort sequences by.

-n <max_count>: Maximum number of sequences in each new file.

--num: Specify to define the sort field as numeric rather than textual.
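The difference between textual and numeric sorting is easiest to see on a small example. The sketch below is illustrative only: the function names, the headers, the delimiter pair ("|", "=") and the ascending order are assumptions.

```python
def field_value(header, field, block_delim="|", field_delim="="):
    """Return the value of one annotation field; delimiters are assumed."""
    for block in header.split(block_delim)[1:]:
        name, _, value = block.partition(field_delim)
        if name == field:
            return value
    raise KeyError(field)


def sort_headers(headers, field, numeric=False):
    """Sort headers by a field, comparing values as numbers or as text."""
    def key(header):
        value = field_value(header, field)
        return float(value) if numeric else value
    return sorted(headers, key=key)  # assumption: ascending order


# Hypothetical annotated headers.
headers = ["r1|DUPCOUNT=10", "r2|DUPCOUNT=9", "r3|DUPCOUNT=100"]
textual = sort_headers(headers, "DUPCOUNT")
numeric = sort_headers(headers, "DUPCOUNT", numeric=True)
```

Compared as text, "10" and "100" both precede "9", so the textual order is r1, r3, r2. Compared as numbers, the order is r2, r1, r3.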