AssemblePairs

Assembles paired-end reads into a single sequence

usage: AssemblePairs [-h] [--version] ...

-h, --help

show this help message and exit

--version

show program’s version number and exit

output files:
    assemble-pass
        successfully assembled reads.
    assemble-fail
        raw reads failing paired-end assembly.

output annotation fields:
    <user defined>
        annotation fields specified by the --1f or --2f arguments.
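
The field copying done by --1f and --2f can be pictured with a small sketch. It assumes headers of the form `ID|FIELD=value|FIELD=value` with the delimiters `|`, `=` and `,` (the three roles described under --delim). The header strings, field names and helper functions are hypothetical, and this is not the tool's own code.

```python
def parse_header(header, delims=("|", "=", ",")):
    """Split a header into its sequence ID and a dict of annotation fields.

    The delimiter tuple is an assumption: block, field/value, and value-list
    separators, in that order.
    """
    block_sep, field_sep, _value_sep = delims
    seq_id, *blocks = header.split(block_sep)
    fields = dict(block.split(field_sep, 1) for block in blocks)
    return seq_id, fields


def copy_fields(target, source, names):
    """Return a copy of target with the named fields copied in from source."""
    merged = dict(target)
    for name in names:
        if name in source:
            merged[name] = source[name]
    return merged


# Hypothetical head and tail headers for one read pair.
head_id, head_fields = parse_header("SEQ1|PRIMER=IGHC|BARCODE=ACGT")
tail_id, tail_fields = parse_header("SEQ1|PRIMER=IGHV")

# Analogous to requesting BARCODE from the head and PRIMER from the tail.
assembled = copy_fields({}, head_fields, ["BARCODE"])
assembled = copy_fields(assembled, tail_fields, ["PRIMER"])
print(head_id, assembled)
```

The sketch does not model what happens when the same field is requested from both reads; that behavior is not described here.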

AssemblePairs align

Assemble pairs by aligning ends

usage: AssemblePairs [-h] [--version] ...

-h, --help

show this help message and exit

-1 <seq_files_1>

An ordered list of FASTA/FASTQ files containing head/primary sequences.

-2 <seq_files_2>

An ordered list of FASTA/FASTQ files containing tail/secondary sequences.

--fasta

Specify to force output as FASTA rather than FASTQ.

--failed

If specified, create files containing records that fail processing.

--log <log_file>

Specify to write verbose logging to a file. May not be specified with multiple input files.

--delim <delimiter>

A list of the three delimiters that separate annotation blocks, field names and values, and values within a field, respectively.

--nproc <nproc>

The number of simultaneous computational processes to execute (CPU cores to utilize).

--outdir <out_dir>

Changes the output directory to the location specified. The input file directory is used if this is not specified.

--outname <out_name>

Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.

--coord {illumina,solexa,sra,454,presto}

The format of the sequence identifier, which defines shared coordinate information across paired ends.

--rc {head,tail,both}

Specify which sequences to reverse complement before stitching.

--1f <head_fields>

Specify annotation fields to copy from head records into the assembled record.

--2f <tail_fields>

Specify annotation fields to copy from tail records into the assembled record.

--alpha <alpha>

Significance threshold for sequence assembly.

--maxerror <max_error>

Maximum allowable error rate.

--minlen <min_len>

Minimum sequence length to scan for overlap.

--maxlen <max_len>

Maximum sequence length to scan for overlap.

--scanrev

If specified, scan past the end of the tail sequence to allow the head sequence to overhang the end of the tail sequence.
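
To make the roles of --minlen, --maxlen and --maxerror concrete, here is a simplified sketch of an overlap scan. It is an illustration under stated assumptions, not the tool's algorithm: it assumes the tail has already been reverse complemented, ignores quality scores, omits the --alpha significance test and --scanrev, and the default values shown are arbitrary.

```python
def best_overlap(head, tail, min_len=8, max_len=None, max_error=0.3):
    """Find the best overlap between the end of head and the start of tail.

    Returns (overlap_length, error_rate), or None if no overlap length in
    [min_len, max_len] has an error rate at or below max_error.
    """
    limit = min(len(head), len(tail))
    if max_len is not None:
        limit = min(limit, max_len)
    best = None
    for n in range(min_len, limit + 1):
        # Compare the last n bases of head with the first n bases of tail.
        mismatches = sum(a != b for a, b in zip(head[-n:], tail[:n]))
        error = mismatches / n
        if error > max_error:
            continue
        # Prefer the lowest error rate; break ties with the longer overlap.
        if best is None or error < best[1] or (error == best[1] and n > best[0]):
            best = (n, error)
    return best


def assemble(head, tail, **kwargs):
    """Stitch head and tail at the best overlap, or return None on failure."""
    hit = best_overlap(head, tail, **kwargs)
    if hit is None:
        return None
    overlap_length, _error = hit
    return head + tail[overlap_length:]


# Hypothetical reads sharing the 8-base overlap GGGGTTTT.
print(assemble("AAAACCCCGGGGTTTT", "GGGGTTTTACGTACGT"))
```

A pair with no acceptable overlap returns None here, which corresponds loosely to a record that would be written to the assemble-fail file.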

AssemblePairs join

Assemble pairs by concatenating ends

usage: AssemblePairs [-h] [--version] ...

-h, --help

show this help message and exit

-1 <seq_files_1>

An ordered list of FASTA/FASTQ files containing head/primary sequences.

-2 <seq_files_2>

An ordered list of FASTA/FASTQ files containing tail/secondary sequences.

--fasta

Specify to force output as FASTA rather than FASTQ.

--failed

If specified, create files containing records that fail processing.

--log <log_file>

Specify to write verbose logging to a file. May not be specified with multiple input files.

--delim <delimiter>

A list of the three delimiters that separate annotation blocks, field names and values, and values within a field, respectively.

--nproc <nproc>

The number of simultaneous computational processes to execute (CPU cores to utilize).

--outdir <out_dir>

Changes the output directory to the location specified. The input file directory is used if this is not specified.

--outname <out_name>

Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.

--coord {illumina,solexa,sra,454,presto}

The format of the sequence identifier, which defines shared coordinate information across paired ends.

--rc {head,tail,both}

Specify which sequences to reverse complement before stitching.

--1f <head_fields>

Specify annotation fields to copy from head records into the assembled record.

--2f <tail_fields>

Specify annotation fields to copy from tail records into the assembled record.

--gap <gap>

Number of N characters to place between ends.
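
The join operation is simple enough to sketch directly: optionally reverse complement one or both reads, then concatenate them with a run of N characters in between. The function names and the default of reverse complementing the tail are assumptions made for this illustration; quality scores are ignored.

```python
# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def join_pair(head, tail, gap=0, rc="tail"):
    """Concatenate head and tail with `gap` N characters between them.

    rc selects which reads are reverse complemented first: "head", "tail",
    "both", or None for neither.
    """
    if rc in ("head", "both"):
        head = reverse_complement(head)
    if rc in ("tail", "both"):
        tail = reverse_complement(tail)
    return head + "N" * gap + tail


# Hypothetical reads: the tail AAGG becomes CCTT before joining.
print(join_pair("ACGT", "AAGG", gap=3))
```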

AssemblePairs reference

Assemble pairs by aligning reads against a reference database

usage: AssemblePairs [-h] [--version] ...

-h, --help

show this help message and exit

-1 <seq_files_1>

An ordered list of FASTA/FASTQ files containing head/primary sequences.

-2 <seq_files_2>

An ordered list of FASTA/FASTQ files containing tail/secondary sequences.

--fasta

Specify to force output as FASTA rather than FASTQ.

--failed

If specified, create files containing records that fail processing.

--log <log_file>

Specify to write verbose logging to a file. May not be specified with multiple input files.

--delim <delimiter>

A list of the three delimiters that separate annotation blocks, field names and values, and values within a field, respectively.

--nproc <nproc>

The number of simultaneous computational processes to execute (CPU cores to utilize).

--outdir <out_dir>

Changes the output directory to the location specified. The input file directory is used if this is not specified.

--outname <out_name>

Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.

--coord {illumina,solexa,sra,454,presto}

The format of the sequence identifier, which defines shared coordinate information across paired ends.

--rc {head,tail,both}

Specify which sequences to reverse complement before stitching.

--1f <head_fields>

Specify annotation fields to copy from head records into the assembled record.

--2f <tail_fields>

Specify annotation fields to copy from tail records into the assembled record.

-r <ref_file>

A FASTA file containing the reference sequence database.

--minident <min_ident>

Minimum identity of the assembled sequence required to call a valid assembly (between 0 and 1).

--evalue <evalue>

E-value threshold for the ublast reference alignments of both the head and tail sequences.

--maxhits <max_hits>

Maximum number of hits from ublast to check for matching head and tail sequence reference alignments.

--fill

Specify to change the behavior of inserted characters when the head and tail sequences do not overlap. If specified, the V region reference sequence is inserted in the non-overlapping region instead of a sequence of Ns. Warning: this option can produce chimeric sequences.

--exec <usearch_exec>

The path to the usearch executable file.
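
As a usage illustration, the options above can be combined into a command line from a script. The sketch below only builds and prints the argument list, using flags documented in this section. The file names and the helper function are hypothetical, and nothing is executed.

```python
import shlex


def reference_command(head_file, tail_file, ref_file,
                      min_ident=None, fill=False, usearch_exec=None):
    """Build an argument list for the reference subcommand.

    Only flags listed in this section are used; optional flags are added
    only when a value is supplied.
    """
    cmd = ["AssemblePairs", "reference",
           "-1", head_file, "-2", tail_file, "-r", ref_file]
    if min_ident is not None:
        # --minident is documented as a value between 0 and 1.
        if not 0 <= min_ident <= 1:
            raise ValueError("min_ident must be between 0 and 1")
        cmd += ["--minident", str(min_ident)]
    if fill:
        cmd.append("--fill")
    if usearch_exec:
        cmd += ["--exec", usearch_exec]
    return cmd


# Hypothetical input files.
cmd = reference_command("reads_1.fastq", "reads_2.fastq", "refs.fasta",
                        min_ident=0.5)
print(shlex.join(cmd))
```

The resulting list could be handed to `subprocess.run(cmd, check=True)` on a system where the tool is installed.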