AssemblePairs¶

Assembles paired-end reads into a single sequence

usage: AssemblePairs [-h] [--version] ...

-h, --help¶: show this help message and exit

--version¶: show program’s version number and exit

output files:

assemble-pass: successfully assembled reads.
assemble-fail: raw reads failing paired-end assembly.

output annotation fields:

<user defined>: annotation fields specified by the --1f or --2f arguments.
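The field-copying behavior behind --1f and --2f can be pictured with a small sketch. It is only an illustration, not the tool's implementation: the function names are hypothetical, the delimiters `|`, `=` and `,` are assumed values for the three --delim delimiters, and joining colliding field values with the value delimiter is an assumption of this sketch.

```python
def parse_header(header, delims=("|", "=", ",")):
    """Split a header into its sequence ID and a dict of annotation fields."""
    block_delim, field_delim, _value_delim = delims
    blocks = header.split(block_delim)
    fields = {}
    for block in blocks[1:]:
        name, _, value = block.partition(field_delim)
        fields[name] = value
    return blocks[0], fields


def copy_fields(head_header, tail_header, head_fields=(), tail_fields=(),
                delims=("|", "=", ",")):
    """Build an assembled-record header from the requested head/tail fields."""
    block_delim, field_delim, value_delim = delims
    seq_id, head_ann = parse_header(head_header, delims)
    _, tail_ann = parse_header(tail_header, delims)

    merged = {}
    for name in head_fields:
        if name in head_ann:
            merged[name] = head_ann[name]
    for name in tail_fields:
        if name in tail_ann:
            if name in merged:
                # Assumption: colliding values are joined, not overwritten.
                merged[name] = merged[name] + value_delim + tail_ann[name]
            else:
                merged[name] = tail_ann[name]

    blocks = [seq_id] + [f"{k}{field_delim}{v}" for k, v in merged.items()]
    return block_delim.join(blocks)


if __name__ == "__main__":
    print(copy_fields("SEQ1|PRIMER=IgG|BARCODE=ACGT", "SEQ1|PRIMER=VH3",
                      head_fields=["BARCODE"], tail_fields=["PRIMER"]))
```
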

AssemblePairs align¶

Assemble pairs by aligning ends

usage: AssemblePairs [-h] [--version] ...

-h, --help¶: show this help message and exit

-1 <seq_files_1>¶: An ordered list of FASTA/FASTQ files containing head/primary sequences.

-2 <seq_files_2>¶: An ordered list of FASTA/FASTQ files containing tail/secondary sequences.

--fasta¶: Specify to force output as FASTA rather than FASTQ.

--failed¶: If specified, create files containing records that fail processing.

--log <log_file>¶: Specify to write verbose logging to a file. May not be specified with multiple input files.

--delim <delimiter>¶: A list of the three delimiters that separate annotation blocks, field names and values, and values within a field, respectively.

--nproc <nproc>¶: The number of simultaneous computational processes to execute (CPU cores to utilize).

--outdir <out_dir>¶: Specify to change the output directory to the location specified. The input file directory is used if this is not specified.

--outname <out_name>¶: Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.

--coord {illumina,solexa,sra,454,presto}¶: The format of the sequence identifier, which defines shared coordinate information across paired ends.

--rc {head,tail,both}¶: Specify to reverse complement sequences before stitching.

--1f <head_fields>¶: Specify annotation fields to copy from head records into the assembled record.

--2f <tail_fields>¶: Specify annotation fields to copy from tail records into the assembled record.

--alpha <alpha>¶: Significance threshold for sequence assembly.

--maxerror <max_error>¶: Maximum allowable error rate.

--minlen <min_len>¶: Minimum sequence length to scan for overlap.

--maxlen <max_len>¶: Maximum sequence length to scan for overlap.

--scanrev¶: If specified, scan past the end of the tail sequence to allow the head sequence to overhang the end of the tail sequence.
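To make the roles of --minlen, --maxlen and --maxerror concrete, the following is a simplified sketch of an overlap scan. It is not the tool's algorithm: the significance test controlled by --alpha is left out, quality scores are ignored, the tail is assumed to already be in the same orientation as the head, and the function names and default values are illustrative.

```python
def best_overlap(head, tail, min_len=8, max_len=None, max_error=0.3):
    """Return (overlap_length, error_rate) for the best overlap, or None.

    Compares the last n bases of head with the first n bases of tail
    for every n between min_len and max_len.
    """
    limit = min(len(head), len(tail))
    if max_len is not None:
        limit = min(limit, max_len)

    best = None
    for n in range(min_len, limit + 1):
        mismatches = sum(a != b for a, b in zip(head[-n:], tail[:n]))
        error = mismatches / n
        if error > max_error:
            continue
        # Prefer the lowest error rate; break ties with the longer overlap.
        if best is None or error < best[1] or (error == best[1] and n > best[0]):
            best = (n, error)
    return best


def assemble(head, tail, **kwargs):
    """Stitch head and tail at the best overlap, or return None on failure."""
    hit = best_overlap(head, tail, **kwargs)
    if hit is None:
        return None
    overlap_len, _ = hit
    return head + tail[overlap_len:]


if __name__ == "__main__":
    print(assemble("ACGTACGTTTGACCA", "TTGACCAGGGTTT", min_len=4))
```

Without a significance test, a very small minimum length lets short chance matches win, which is why a sketch like this needs a sensible lower bound on the overlap.
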

AssemblePairs join¶

Assemble pairs by concatenating ends

usage: AssemblePairs [-h] [--version] ...

-h, --help¶: show this help message and exit

-1 <seq_files_1>¶: An ordered list of FASTA/FASTQ files containing head/primary sequences.

-2 <seq_files_2>¶: An ordered list of FASTA/FASTQ files containing tail/secondary sequences.

--fasta¶: Specify to force output as FASTA rather than FASTQ.

--failed¶: If specified, create files containing records that fail processing.

--log <log_file>¶: Specify to write verbose logging to a file. May not be specified with multiple input files.

--delim <delimiter>¶: A list of the three delimiters that separate annotation blocks, field names and values, and values within a field, respectively.

--nproc <nproc>¶: The number of simultaneous computational processes to execute (CPU cores to utilize).

--outdir <out_dir>¶: Specify to change the output directory to the location specified. The input file directory is used if this is not specified.

--outname <out_name>¶: Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.

--coord {illumina,solexa,sra,454,presto}¶: The format of the sequence identifier, which defines shared coordinate information across paired ends.

--rc {head,tail,both}¶: Specify to reverse complement sequences before stitching.

--1f <head_fields>¶: Specify annotation fields to copy from head records into the assembled record.

--2f <tail_fields>¶: Specify annotation fields to copy from tail records into the assembled record.

--gap <gap>¶: Number of N characters to place between ends.
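The join operation, combined with --rc and --gap, amounts to concatenation with an optional run of N characters. A minimal sketch follows, assuming plain sequence strings with no quality scores; the function names are hypothetical and this is not the tool's own code.

```python
# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def join_pair(head, tail, gap=0, rc=None):
    """Concatenate head and tail with `gap` N characters in between.

    rc mirrors the --rc choices: "head", "tail", "both", or None.
    """
    if rc in ("head", "both"):
        head = reverse_complement(head)
    if rc in ("tail", "both"):
        tail = reverse_complement(tail)
    return head + "N" * gap + tail


if __name__ == "__main__":
    print(join_pair("ACGT", "AACC", gap=3, rc="tail"))
```
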

AssemblePairs reference¶

Assemble pairs by aligning reads against a reference database

usage: AssemblePairs [-h] [--version] ...

-h, --help¶: show this help message and exit

-1 <seq_files_1>¶: An ordered list of FASTA/FASTQ files containing head/primary sequences.

-2 <seq_files_2>¶: An ordered list of FASTA/FASTQ files containing tail/secondary sequences.

--fasta¶: Specify to force output as FASTA rather than FASTQ.

--failed¶: If specified, create files containing records that fail processing.

--log <log_file>¶: Specify to write verbose logging to a file. May not be specified with multiple input files.

--delim <delimiter>¶: A list of the three delimiters that separate annotation blocks, field names and values, and values within a field, respectively.

--nproc <nproc>¶: The number of simultaneous computational processes to execute (CPU cores to utilize).

--outdir <out_dir>¶: Specify to change the output directory to the location specified. The input file directory is used if this is not specified.

--outname <out_name>¶: Changes the prefix of the successfully processed output file to the string specified. May not be specified with multiple input files.

--coord {illumina,solexa,sra,454,presto}¶: The format of the sequence identifier, which defines shared coordinate information across paired ends.

--rc {head,tail,both}¶: Specify to reverse complement sequences before stitching.

--1f <head_fields>¶: Specify annotation fields to copy from head records into the assembled record.

--2f <tail_fields>¶: Specify annotation fields to copy from tail records into the assembled record.

-r <ref_file>¶: A FASTA file containing the reference sequence database.

--minident <min_ident>¶: Minimum identity of the assembled sequence required to call a valid assembly (between 0 and 1).

--evalue <evalue>¶: Minimum E-value for the ublast reference alignment for both the head and tail sequence.

--maxhits <max_hits>¶: Maximum number of hits from ublast to check for matching head and tail sequence reference alignments.

--fill¶: Specify to change the behavior of inserted characters when the head and tail sequences do not overlap. If specified, this will result in insertion of the V region reference sequence instead of a sequence of Ns in the non-overlapping region. Warning: you could end up making chimeric sequences by using this option.
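The difference between the default behavior and --fill can be sketched as follows. This is a simplified illustration under stated assumptions, not the tool's implementation: the function name is hypothetical, and `head_end` and `tail_start` stand for reference coordinates that would come from the reference alignment of each read.

```python
def bridge_gap(head, tail, reference, head_end, tail_start, fill=False):
    """Join non-overlapping reads across the gap between their alignments.

    head_end:   reference position just past the end of the head alignment.
    tail_start: reference position where the tail alignment begins.
    fill:       if True, insert reference bases; otherwise insert Ns.
    """
    gap_len = tail_start - head_end
    if gap_len < 0:
        raise ValueError("reads overlap on the reference; no gap to bridge")
    spacer = reference[head_end:tail_start] if fill else "N" * gap_len
    return head + spacer + tail


if __name__ == "__main__":
    ref = "AAAACCCCGGGGTTTT"
    print(bridge_gap("AAAACC", "GGTTTT", ref, 6, 10))
    print(bridge_gap("AAAACC", "GGTTTT", ref, 6, 10, fill=True))
```

With fill enabled, the bridged bases come from the reference rather than from the sequenced molecule, which is the source of the chimera risk noted in the warning.
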

--exec <usearch_exec>¶: The path to the usearch executable file.
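As a usage illustration, the options above could be combined into an argument list for the reference subcommand. The sketch below only builds and prints the command and does not run it; the helper name, file names, and values are hypothetical, and only flags documented in this section are used.

```python
import shlex


def build_reference_command(head_file, tail_file, ref_file,
                            usearch_exec=None, min_ident=None,
                            fill=False, failed=False, nproc=None):
    """Build an argument list for `AssemblePairs reference`."""
    cmd = ["AssemblePairs", "reference",
           "-1", head_file, "-2", tail_file, "-r", ref_file]
    if min_ident is not None:
        cmd += ["--minident", str(min_ident)]
    if usearch_exec is not None:
        cmd += ["--exec", usearch_exec]
    if nproc is not None:
        cmd += ["--nproc", str(nproc)]
    if fill:
        cmd.append("--fill")
    if failed:
        cmd.append("--failed")
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = build_reference_command("reads_1.fastq", "reads_2.fastq",
                                  "v_segments.fasta", min_ident=0.5,
                                  failed=True, nproc=4)
    print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` once the tool and the usearch executable are installed.
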