SplitSeq

Sorts, samples and splits FASTA/FASTQ sequence files

usage: SplitSeq [-h] [--version] ...

-h, --help

show this help message and exit

--version

show program’s version number and exit

output files:
part<part>
reads partitioned by count, where <part> is the partition number.
<field>-<value>
reads partitioned by annotation <field> and <value>.
under-<number>
reads partitioned by numeric threshold where the annotation value is strictly less than the threshold <number>.
atleast-<number>
reads partitioned by numeric threshold where the annotation value is greater than or equal to the threshold <number>.
sorted
reads sorted by annotation value.
sorted-part<part>
reads sorted by annotation value and partitioned by count, where <part> is the partition number.
sample<i>-n<count>
randomly sampled reads where <i> is a number specifying the sampling instance and <count> is the number of sampled reads.
output annotation fields:
None
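
The naming patterns above can be summarized as a small Python function. This is a minimal sketch, not SplitSeq's own code. `output_label` and its mode names are hypothetical. It builds only the label, not the full file name, because how SplitSeq combines the label with a prefix and extension is not described here.

```python
def output_label(mode, **kw):
    """Build an output-file label for one split mode (hypothetical helper).

    Mirrors the naming patterns listed above; joining the label with a
    prefix and file extension is deliberately left out.
    """
    if mode == "count":
        return f"part{kw['part']}"
    if mode == "group":
        return f"{kw['field']}-{kw['value']}"
    if mode == "threshold":
        # Values strictly below the threshold go to under-, the rest to atleast-.
        if kw["value"] < kw["threshold"]:
            return f"under-{kw['threshold']}"
        return f"atleast-{kw['threshold']}"
    if mode == "sort":
        part = kw.get("part")
        return "sorted" if part is None else f"sorted-part{part}"
    if mode == "sample":
        return f"sample{kw['instance']}-n{kw['count']}"
    raise ValueError(f"unknown mode: {mode}")


print(output_label("group", field="PRIMER", value="IGHM"))  # PRIMER-IGHM
print(output_label("threshold", value=3, threshold=2))       # atleast-2
```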

SplitSeq count

Splits sequence files by number of records.

usage: SplitSeq count [-h] ...

-h, --help

show this help message and exit

-s <seq_files>

A list of FASTA/FASTQ files containing sequences to process.

--fasta

Specify to force output as FASTA rather than FASTQ.

--outdir <out_dir>

Specify to change the output directory to the location specified. The input file directory is used if this is not specified.

--outname <out_name>

Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.

-n <max_count>

Maximum number of sequences in each new file.
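
Splitting by count amounts to cutting the record stream into consecutive chunks of at most -n records, numbered to match the part<part> output files. A minimal Python sketch, assuming records are processed in file order; `partition_by_count` is a hypothetical helper, not part of SplitSeq.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def partition_by_count(records: Iterable[T], max_count: int) -> Iterator[Tuple[int, List[T]]]:
    """Yield (part_number, chunk) pairs with at most max_count records per chunk.

    Hypothetical helper: part numbers start at 1 to match the part<part> labels.
    """
    if max_count < 1:
        raise ValueError("max_count must be at least 1")
    it = iter(records)
    part = 1
    while True:
        chunk = list(islice(it, max_count))
        if not chunk:
            return
        yield part, chunk
        part += 1


reads = [f"read{i}" for i in range(1, 6)]
for part, chunk in partition_by_count(reads, 2):
    print(f"part{part}", chunk)
# part1 ['read1', 'read2']
# part2 ['read3', 'read4']
# part3 ['read5']
```

Using `islice` on a single iterator means the whole file never has to be held in memory; only one chunk is materialized at a time.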

SplitSeq group

Splits sequence files by annotation.

usage: SplitSeq group [-h] ...

-h, --help

show this help message and exit

-s <seq_files>

A list of FASTA/FASTQ files containing sequences to process.

--fasta

Specify to force output as FASTA rather than FASTQ.

--delim <delimiter>

A list of the three delimiters that separate annotation blocks, field names and values, and values within a field, respectively.

--outdir <out_dir>

Specify to change the output directory to the location specified. The input file directory is used if this is not specified.

--outname <out_name>

Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.

-f <field>

Annotation field by which to split sequence files.

--num <threshold>

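Specify to define the split field as numeric and group sequences by threshold: sequences whose value is strictly less than <threshold> go to the under-<threshold> file, and all others go to the atleast-<threshold> file.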
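
The grouping behavior, including the --num threshold split, can be sketched in Python as follows. This sketch assumes headers of the form ID|FIELD=VALUE|..., uses |, = and , as the three --delim delimiters (an assumption about the defaults), and assumes every record carries the requested field. `parse_header` and `group_headers` are hypothetical names; the third delimiter (values within a field) is accepted but not used here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Delims = Tuple[str, str, str]
DEFAULT_DELIMS: Delims = ("|", "=", ",")  # assumed: block, field/value, within-value


def parse_header(header: str, delim: Delims = DEFAULT_DELIMS) -> Dict[str, str]:
    """Split 'ID|FIELD=VALUE|...' into a dict; the first block is stored as ID."""
    block_sep, field_sep, _value_sep = delim  # third delimiter unused in this sketch
    blocks = header.split(block_sep)
    annotations = {"ID": blocks[0]}
    for block in blocks[1:]:
        if field_sep in block:
            field, value = block.split(field_sep, 1)
            annotations[field] = value
    return annotations


def group_headers(headers: Iterable[str], field: str,
                  threshold: Optional[float] = None,
                  delim: Delims = DEFAULT_DELIMS) -> Dict[str, List[str]]:
    """Group headers by the value of `field`, or by a numeric threshold if given."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for header in headers:
        value = parse_header(header, delim)[field]  # KeyError if the field is missing
        if threshold is None:
            label = f"{field}-{value}"
        elif float(value) < threshold:
            label = f"under-{threshold}"
        else:
            label = f"atleast-{threshold}"
        groups[label].append(header)
    return dict(groups)


headers = ["r1|CONSCOUNT=1|PRIMER=A", "r2|CONSCOUNT=3|PRIMER=B", "r3|CONSCOUNT=2|PRIMER=A"]
print(group_headers(headers, "PRIMER"))
print(group_headers(headers, "CONSCOUNT", threshold=2))
```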

SplitSeq sample

Randomly samples from unpaired sequence files.

usage: SplitSeq sample [-h] ...

-h, --help

show this help message and exit

-s <seq_files>

A list of FASTA/FASTQ files containing sequences to process.

--fasta

Specify to force output as FASTA rather than FASTQ.

--delim <delimiter>

A list of the three delimiters that separate annotation blocks, field names and values, and values within a field, respectively.

--outdir <out_dir>

Specify to change the output directory to the location specified. The input file directory is used if this is not specified.

--outname <out_name>

Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.

-n <max_count>

Maximum number of sequences to sample from each file.

-f <field>

The annotation field used as the sampling criterion.

-u <values>

A list of annotation values, one of which each sampled sequence must contain; requires the -f argument.
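
Random sampling with the -f/-u filter can be sketched in Python like this. Assumptions: records are already parsed into dictionaries of annotations, a record matches when its -f value equals one of the -u values, and asking for more records than are available returns every matching record. `sample_records` is a hypothetical helper, and SplitSeq's own handling of these cases may differ.

```python
import random
from typing import Collection, Dict, List, Optional, Sequence


def sample_records(records: Sequence[Dict[str, str]], n: int,
                   field: Optional[str] = None,
                   values: Optional[Collection[str]] = None,
                   seed: Optional[int] = None) -> List[Dict[str, str]]:
    """Draw up to n records at random, without replacement (hypothetical helper)."""
    if values is not None and field is None:
        # Mirrors the documented rule that -u requires -f.
        raise ValueError("values require a field")
    # Keep only records whose field value is one of the allowed values (assumed semantics).
    pool = [r for r in records if values is None or r.get(field) in values]
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    return rng.sample(pool, min(n, len(pool)))


records = [{"ID": f"r{i}", "PRIMER": "A" if i % 2 else "B"} for i in range(10)]
picked = sample_records(records, 3, field="PRIMER", values={"A"}, seed=1)
print([r["ID"] for r in picked])
```

Filtering before sampling means the -n count applies to matching records only, rather than to the whole file.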

SplitSeq samplepair

Randomly samples from paired-end sequence files.

usage: SplitSeq samplepair [-h] ...

-h, --help

show this help message and exit

-1 <seq_files_1>

An ordered list of FASTA/FASTQ files containing head/primary sequences.

-2 <seq_files_2>

An ordered list of FASTA/FASTQ files containing tail/secondary sequences.

--fasta

Specify to force output as FASTA rather than FASTQ.

--delim <delimiter>

A list of the three delimiters that separate annotation blocks, field names and values, and values within a field, respectively.

--outdir <out_dir>

Specify to change the output directory to the location specified. The input file directory is used if this is not specified.

--outname <out_name>

Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.

-n <max_count>

A list of the number of sequences to sample from each file.

-f <field>

The annotation field used as the sampling criterion.

-u <values>

A list of annotation values, one of which both paired sequences must contain; requires the -f argument.

--coord {illumina,solexa,sra,454,presto}

The format of the sequence identifier, which defines the coordinate information shared across paired ends.
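
Paired sampling has to keep mates together, which is what --coord is for: it tells the tool how to extract the shared coordinate from each identifier. The Python sketch below covers only two of the listed formats and assumes that the illumina coordinate is everything before the first space and the presto coordinate is everything before the first |. `coord_key` and `sample_pairs` are hypothetical helpers, and -u filtering is omitted.

```python
import random
from typing import Iterable, List, Optional, Tuple


def coord_key(header: str, coord: str) -> str:
    """Extract the coordinate shared by both mates (assumed rules, two formats only)."""
    if coord == "illumina":
        # e.g. '@M01:1:FC:1:1101:15589:1331 1:N:0:1' -> text before the first space
        return header.split()[0]
    if coord == "presto":
        # e.g. 'READ1|PRIMER=A' -> text before the first '|'
        return header.split("|")[0]
    raise NotImplementedError(f"coordinate format not covered by this sketch: {coord}")


def sample_pairs(headers_1: Iterable[str], headers_2: Iterable[str], n: int,
                 coord: str = "presto",
                 seed: Optional[int] = None) -> List[Tuple[str, str]]:
    """Sample up to n mate pairs whose coordinates appear in both inputs."""
    index_1 = {coord_key(h, coord): h for h in headers_1}
    index_2 = {coord_key(h, coord): h for h in headers_2}
    # Sorting the shared keys keeps the result reproducible for a given seed.
    shared = sorted(index_1.keys() & index_2.keys())
    rng = random.Random(seed)
    return [(index_1[k], index_2[k]) for k in rng.sample(shared, min(n, len(shared)))]


mates_1 = ["r1|PRIMER=A", "r2|PRIMER=B", "r3|PRIMER=A"]
mates_2 = ["r2|PRIMER=B", "r3|PRIMER=A", "r4|PRIMER=A"]
print(sample_pairs(mates_1, mates_2, n=5, seed=0))
```

Sampling from the intersection of coordinates, rather than from each file independently, is what guarantees that every sampled read still has its mate.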

SplitSeq sort

Sorts sequence files by annotation.

usage: SplitSeq sort [-h] ...

-h, --help

show this help message and exit

-s <seq_files>

A list of FASTA/FASTQ files containing sequences to process.

--fasta

Specify to force output as FASTA rather than FASTQ.

--delim <delimiter>

A list of the three delimiters that separate annotation blocks, field names and values, and values within a field, respectively.

--outdir <out_dir>

Specify to change the output directory to the location specified. The input file directory is used if this is not specified.

--outname <out_name>

Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.

-f <field>

The annotation field by which to sort sequences.

-n <max_count>

Maximum number of sequences in each new file.

--num

Specify to define the sort field as numeric rather than textual.
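
The effect of --num is easiest to see on values of different lengths: a textual sort places "100" before "9", while a numeric sort does not. A minimal Python sketch, assuming records are dictionaries of annotations and that every record has the sort field; `sort_records` and `sorted_parts` are hypothetical helpers that also illustrate how -n would yield sorted-part<part> chunks.

```python
from typing import Dict, List, Optional, Sequence


def sort_records(records: Sequence[Dict[str, str]], field: str,
                 numeric: bool = False) -> List[Dict[str, str]]:
    """Sort by an annotation field, numerically if requested (hypothetical helper)."""
    if numeric:
        return sorted(records, key=lambda r: float(r[field]))
    return sorted(records, key=lambda r: r[field])


def sorted_parts(records: Sequence[Dict[str, str]], field: str,
                 max_count: Optional[int] = None,
                 numeric: bool = False) -> Dict[str, List[Dict[str, str]]]:
    """Sort, then optionally cut into sorted-part<i> chunks of at most max_count."""
    ordered = sort_records(records, field, numeric)
    if max_count is None:
        return {"sorted": ordered}
    return {f"sorted-part{i // max_count + 1}": ordered[i:i + max_count]
            for i in range(0, len(ordered), max_count)}


recs = [{"ID": "a", "CONSCOUNT": "10"}, {"ID": "b", "CONSCOUNT": "9"},
        {"ID": "c", "CONSCOUNT": "100"}]
print([r["ID"] for r in sort_records(recs, "CONSCOUNT")])                # ['a', 'c', 'b']
print([r["ID"] for r in sort_records(recs, "CONSCOUNT", numeric=True)])  # ['b', 'a', 'c']
```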