- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5" - Update version.txt from 4.29 → .30 (automated by Makefile)
9.2 KiB
NAME
obicomplement — reverse complement of sequences
SYNOPSIS
obicomplement [--batch-mem <string>] [--batch-size <int>]
[--batch-size-max <int>] [--compress|-Z] [--csv] [--debug]
[--ecopcr] [--embl] [--fail-on-taxonomy] [--fasta]
[--fasta-output] [--fastq] [--fastq-output] [--genbank]
[--help|-h|-?] [--input-OBI-header] [--input-json-header]
[--json-output] [--max-cpu <int>] [--no-order]
[--no-progressbar] [--out|-o <FILENAME>]
[--output-OBI-header|-O] [--output-json-header]
[--paired-with <FILENAME>] [--raw-taxid] [--silent-warning]
[--skip-empty] [--solexa] [--taxonomy|-t <string>] [--u-to-t]
[--update-taxid] [--with-leaves] [<args>]
DESCRIPTION
obicomplement computes the reverse complement of every sequence in the
input. For each input sequence, the nucleotides are first reversed, then
each base is replaced by its Watson–Crick complement (A↔T, C↔G), yielding
the strand that would pair with the original sequence read in the opposite
direction.
When quality scores are present (FASTQ data), they are reversed in the same
order as the sequence so that each quality value remains associated with its
corresponding base. Ambiguous IUPAC characters (e.g. N, R, Y) are
handled correctly and preserved in the output.
This operation is commonly needed when sequences have been sequenced on the wrong strand, when a primer is designed on the reverse strand, or when preparing sequences for strand-aware downstream analyses.
The command reads from standard input or from one or more files, processes
sequences in parallel, and writes the result to standard output or to the
file specified with --out.
INPUT
obicomplement accepts biological sequence data in FASTA, FASTQ, EMBL,
GenBank, ecoPCR output, and CSV formats. When no format flag is given, the
format is inferred automatically from the file contents or extension.
Input is read from standard input when no filename argument is provided, or from one or more files passed as positional arguments. Gzip-compressed files are handled transparently.
Paired-end data can be provided with --paired-with, which specifies the
file containing the second mate. Both mates are reverse-complemented and
written to separate output files.
OUTPUT
The output is a sequence file in which every sequence is the reverse
complement of the corresponding input sequence. The output format matches
the input by default (FASTA if no quality data, FASTQ if quality data are
present), and can be overridden with --fasta-output, --fastq-output, or
--json-output.
All annotations (attributes stored in the sequence header) are preserved unchanged. Quality scores, when present, are reversed to stay aligned with their bases.
Observed output example
>seq001 {"definition":"basic DNA sequence"}
cgatcgatcgatcgatcgat
>seq002 {"definition":"GC-rich sequence"}
gcgcgcgcgcgcgcgcgcgc
>seq003 {"definition":"AT-rich sequence"}
atatatatatatatatatat
>seq004 {"definition":"palindromic sequence"}
aattccggaattccggaatt
>seq005 {"definition":"mixed sequence"}
agctagcatgcatagccgat
OPTIONS
Input format
--fasta- Default: false. Force parsing of input as FASTA format.
--fastq- Default: false. Force parsing of input as FASTQ format.
--embl- Default: false. Force parsing of input as EMBL flatfile format.
--genbank- Default: false. Force parsing of input as GenBank flatfile format.
--ecopcr- Default: false. Force parsing of input as ecoPCR output format.
--csv- Default: false. Force parsing of input as CSV format.
--solexa- Default: false. Decode quality scores using the Solexa/Illumina pre-1.3 convention instead of the standard Phred+33 encoding.
--input-OBI-header- Default: false. Interpret FASTA/FASTQ header annotations using the OBI key=value format.
--input-json-header- Default: false. Interpret FASTA/FASTQ header annotations using JSON format.
--no-order- Default: false. When several input files are given, declare that no ordering relationship exists among them, allowing the reader to interleave records freely.
--paired-with <FILENAME>- Default: none. File containing the paired (R2) reads. When set,
obicomplementprocesses both mates and writes them to separate output files.
Sequence preprocessing
--u-to-t- Default: false. Convert Uracil (U) to Thymine (T) before computing the reverse complement. Useful when processing RNA sequences that must be treated as DNA.
--skip-empty- Default: false. Discard sequences of length zero from the output.
Output format
--fasta-output- Default: false. Write output in FASTA format regardless of whether quality scores are present.
--fastq-output- Default: false. Write output in FASTQ format (requires quality data).
--json-output- Default: false. Write output in JSON format.
--out|-o <FILENAME>- Default:
-(standard output). File used to save the output. --output-OBI-header|-O- Default: false. Write FASTA/FASTQ header annotations in OBI key=value format.
--output-json-header- Default: false. Write FASTA/FASTQ header annotations in JSON format.
--compress|-Z- Default: false. Compress the output with gzip.
Taxonomy
--taxonomy|-t <string>- Default: none. Path to a taxonomy database. Required only when the input sequences carry taxid annotations that need to be validated or updated.
--fail-on-taxonomy- Default: false. Cause
obicomplementto exit with an error if a taxid referenced in the data is not a currently valid node in the loaded taxonomy. --update-taxid- Default: false. Automatically replace taxids that have been declared merged into a newer node by the taxonomy database.
--raw-taxid- Default: false. Print taxids without appending the taxon name and rank.
--with-leaves- Default: false. When the taxonomy is extracted from the sequence file, attach sequences as leaves of their taxid node.
Performance and diagnostics
--max-cpu <int>- Default: 16 (env:
OBIMAXCPU). Number of parallel threads used to process sequences. --batch-size <int>- Default: 1 (env:
OBIBATCHSIZE). Minimum number of sequences per processing batch. --batch-size-max <int>- Default: 2000 (env:
OBIBATCHSIZEMAX). Maximum number of sequences per processing batch. --batch-mem <string>- Default:
128M(env:OBIBATCHMEM). Maximum memory allocated per batch (e.g.128K,64M,1G). Set to0to disable the memory limit. --no-progressbar- Default: false. Disable the progress bar printed to stderr.
--silent-warning- Default: false (env:
OBIWARNING). Suppress warning messages. --debug- Default: false (env:
OBIDEBUG). Enable debug logging.
EXAMPLES
# Reverse complement all sequences in a FASTA file
obicomplement sequences.fasta > out_default.fasta
Expected output: 5 sequences written to out_default.fasta.
# Reverse complement a FASTQ file, preserving quality scores
obicomplement reads.fastq --fastq-output --out out_fastq.fastq
Expected output: 5 sequences written to out_fastq.fastq.
# Convert RNA sequences to their reverse complement DNA strand
obicomplement --u-to-t rna_sequences.fasta > out_rna_rc.fasta
Expected output: 3 sequences written to out_rna_rc.fasta.
# Reverse complement paired-end reads into two separate output files
obicomplement R1.fastq --paired-with R2.fastq --out out_paired.fastq
Expected output: 3 sequences written to out_paired_R1.fastq and 3 sequences to out_paired_R2.fastq.
# Reverse complement and compress output, skipping any empty sequences
obicomplement --skip-empty --compress sequences.fasta --out out_compressed.fasta.gz
Expected output: 5 sequences written to out_compressed.fasta.gz (gzip-compressed FASTA).
# Reverse complement with OBI-format header output
obicomplement --output-OBI-header sequences.fasta --out out_obi.fasta
Expected output: 5 sequences written to out_obi.fasta.
# Reverse complement with explicit JSON-format header output
obicomplement --output-json-header sequences.fasta --out out_jsonheader.fasta
Expected output: 5 sequences written to out_jsonheader.fasta.
# Reverse complement and write full JSON output format
obicomplement --json-output sequences.fasta --out out_json.json
Expected output: 5 sequences written to out_json.json.
SEE ALSO
obiconvert— format conversion and sequence filtering pipelineobipairing— paired-end read merging (uses reverse complement internally)obigrep— sequence filtering and selection
NOTES
Quality scores (Phred-scaled) are reversed in lock-step with the sequence so that positional quality information remains valid after the reverse complement operation. This is essential for downstream tools that rely on per-base quality for alignment or variant calling.
Ambiguous IUPAC characters and gap symbols (-) are handled gracefully:
standard ambiguous bases are complemented according to IUPAC rules, while
gap and missing-data symbols are preserved unchanged.