Output Formats

Amplirust produces two output formats: FASTA sequences and an optional TSV statistics file.

FASTA Output

Products are written with descriptive headers:

```text
>reference_id:primer_name:1	pos=0-123	strand=+	len=124
ACGTACGTACGT...
>reference_id:primer_name_rc:2	pos=200-324	strand=-	len=125
TGCATGCATGCA...
```

Header Format

```text
{reference_id}:{primer_name}[_rc][_wrap]:{case_number}\tpos={start}-{end}\tstrand={+|-}\tlen={length}
```

| Field | Description |
| --- | --- |
| `reference_id` | First whitespace-delimited token of the FASTA header, or the LOCUS name for GenBank records (falling back to accession, definition, or `unknown_N`) |
| `_rc` suffix | Product found on the reverse complement strand |
| `_wrap` suffix | Product wraps around a circular genome |
| `case_number` | Product counter that increments within each reference and resets for each new reference |
| `strand` | `+` for forward, `-` for reverse complement match orientation |

Output sequences retain their strand orientation; the strand field indicates match orientation.
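The header layout above can be split back into its fields with a few lines of Python. This is a minimal sketch, not part of Amplirust: the function name `parse_header` and the dictionary keys it returns are hypothetical. It assumes the ID is split from the right (so colons inside `reference_id` survive) and that a primer name does not itself end in `_rc` or `_wrap`.

```python
def parse_header(line):
    """Parse one product header line into a dict (illustrative sketch)."""
    # Split on any whitespace so tab- and space-separated renderings both work.
    ident, *fields = line.lstrip(">").split()
    # Split from the right: the primer tag and case number are the last two
    # colon-separated parts, so colons inside reference_id are preserved.
    reference_id, primer_tag, case_number = ident.rsplit(":", 2)

    # The format lists the suffixes as [_rc][_wrap], so strip _wrap first.
    is_wrap = primer_tag.endswith("_wrap")
    if is_wrap:
        primer_tag = primer_tag[: -len("_wrap")]
    is_rc = primer_tag.endswith("_rc")
    if is_rc:
        primer_tag = primer_tag[: -len("_rc")]

    # The remaining fields are key=value pairs: pos, strand, len.
    info = dict(field.split("=", 1) for field in fields)
    start, _, end = info["pos"].partition("-")
    return {
        "reference_id": reference_id,
        "primer_name": primer_tag,
        "is_rc": is_rc,
        "is_wrap": is_wrap,
        "case_number": int(case_number),
        "start": int(start),
        "end": int(end),
        "strand": info["strand"],
        "length": int(info["len"]),
    }


record = parse_header(">reference_id:primer_name_rc:2\tpos=200-324\tstrand=-\tlen=125")
print(record["primer_name"], record["is_rc"], record["length"])
```

In the example headers on this page, `len` equals `end - start + 1` (for instance 324 - 200 + 1 = 125), so `record["end"] - record["start"] + 1 == record["length"]` holds for them. Whether the same relation holds for wrapped products is not stated here.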

TSV Statistics

The optional TSV output (`--tsv`) contains detailed information for each product:

| Column | Description |
| --- | --- |
| `amplicon_id` | FASTA header ID built from `reference_id`, primer name, and case number |
| `reference_id` | Original sequence header |
| `source_file` | Input file path |
| `primer_name` | Primer pair name |
| `product_len` | Output sequence length |
| `full_len` | Full product length (before trimming) |
| `fwd_start` | Forward primer match start (0-based) |
| `fwd_end` | Forward primer match end |
| `fwd_mismatches` | Edit distance for forward primer |
| `fwd_identity` | Forward primer identity (shown as `1.000` for a perfect match in the worked example) |
| `fwd_cigar` | CIGAR string for forward alignment |
| `rev_start` | Reverse primer match start |
| `rev_end` | Reverse primer match end |
| `rev_mismatches` | Edit distance for reverse primer |
| `rev_identity` | Reverse primer identity (shown as `1.000` for a perfect match in the worked example) |
| `rev_cigar` | CIGAR string for reverse alignment |
| `strand` | `+` (forward) or `-` (reverse complement) |
| `is_circular_wrap` | `true` if product wraps around |
| `product_seq` | The extracted sequence |
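A file with these columns can be read with Python's standard `csv` module. The sketch below is an illustration under stated assumptions rather than a documented interface: it assumes the file begins with a header row naming the columns exactly as in the table, that numeric columns parse as integers or floats, and that `is_circular_wrap` is written as `true` or `false`. The helper name `load_products` and the sample row are made up for the example.

```python
import csv
import io

# Columns assumed to hold integers and floats, respectively.
INT_COLUMNS = (
    "product_len", "full_len",
    "fwd_start", "fwd_end", "fwd_mismatches",
    "rev_start", "rev_end", "rev_mismatches",
)
FLOAT_COLUMNS = ("fwd_identity", "rev_identity")


def load_products(handle):
    """Read TSV rows into dicts, converting numeric and boolean columns."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        for name in INT_COLUMNS:
            if row.get(name):  # skip columns that are absent or empty
                row[name] = int(row[name])
        for name in FLOAT_COLUMNS:
            if row.get(name):
                row[name] = float(row[name])
        if row.get("is_circular_wrap") is not None:
            row["is_circular_wrap"] = row["is_circular_wrap"] == "true"
        rows.append(row)
    return rows


# Made-up sample with a subset of the columns, for demonstration only.
sample = "\n".join([
    "\t".join(["amplicon_id", "primer_name", "product_len", "fwd_mismatches",
               "fwd_identity", "strand", "is_circular_wrap"]),
    "\t".join(["genome_1:16S:1", "16S", "1525", "0", "1.000", "+", "false"]),
]) + "\n"

products = load_products(io.StringIO(sample))
# Keep only products on the forward strand that do not wrap.
linear_forward = [p for p in products
                  if p["strand"] == "+" and not p["is_circular_wrap"]]
print(len(linear_forward))
```

For a real file, pass an open handle instead of the `io.StringIO` sample, for example `open("stats.tsv", newline="")`.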

Worked Example

Given a genome with a 16S rRNA region and primers that each contain one degenerate base (`M` in the forward primer, `Y` in the reverse primer):

```bash
amplirust -i genome.fasta \
  -p "16S:AGAGTTTGATCMTGGCTCAG:TACGGYTACCTTGTTACGACTT" \
  -k 2 -o products.fasta --tsv stats.tsv
```

FASTA output:

```text
>genome_1:16S:1	pos=1024-2548	strand=+	len=1525
AGAGTTTGATCATGGCTCAG...AAGTCGTAACAAGGTAGCCGTA
```

TSV row (selected columns):

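| Column | Value |
| --- | --- |
| `amplicon_id` | genome_1:16S:1 |
| `primer_name` | 16S |
| `product_len` | 1525 |
| `fwd_start` | 1024 |
| `fwd_end` | 1044 |
| `fwd_mismatches` | 0 |
| `fwd_identity` | 1.000 |
| `rev_start` | 2526 |
| `rev_end` | 2548 |
| `rev_mismatches` | 0 |
| `rev_identity` | 1.000 |
| `strand` | + |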
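The zero mismatch counts in this row are consistent with IUPAC degenerate matching: `M` stands for A or C and `Y` for C or T, so the product ends shown above match both primers exactly. The sketch below checks this by hand. It is not Amplirust's algorithm: the function names are hypothetical, and it counts substitutions only, whereas the TSV columns report an edit distance, which may also cover insertions and deletions.

```python
# IUPAC nucleotide codes mapped to the bases each one can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse-complement a plain A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def count_mismatches(primer, target):
    """Count positions where the target base is not allowed by the primer code."""
    if len(primer) != len(target):
        raise ValueError("primer and target must have the same length")
    return sum(base not in IUPAC[code] for code, base in zip(primer, target))


fwd_primer = "AGAGTTTGATCMTGGCTCAG"
rev_primer = "TACGGYTACCTTGTTACGACTT"

# The two ends of the product shown in the FASTA output above.
product_start = "AGAGTTTGATCATGGCTCAG"
product_end = "AAGTCGTAACAAGGTAGCCGTA"

# The reverse primer anneals to the opposite strand, so compare it with the
# reverse complement of the product's 3' end.
print(count_mismatches(fwd_primer, product_start))                    # 0
print(count_mismatches(rev_primer, reverse_complement(product_end)))  # 0
```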