Files
obitools4/autodoc/cmd/obicomplement.md
T
Eric Coissac 8c7017a99d ⬆️ version bump to v4.5
- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5"
- Update version.txt from 4.29 → .30
(automated by Makefile)
2026-04-13 13:34:53 +02:00

9.2 KiB
Raw Blame History

NAME

obicomplement — reverse complement of sequences


SYNOPSIS

obicomplement [--batch-mem <string>] [--batch-size <int>]
              [--batch-size-max <int>] [--compress|-Z] [--csv] [--debug]
              [--ecopcr] [--embl] [--fail-on-taxonomy] [--fasta]
              [--fasta-output] [--fastq] [--fastq-output] [--genbank]
              [--help|-h|-?] [--input-OBI-header] [--input-json-header]
              [--json-output] [--max-cpu <int>] [--no-order]
              [--no-progressbar] [--out|-o <FILENAME>]
              [--output-OBI-header|-O] [--output-json-header]
              [--paired-with <FILENAME>] [--raw-taxid] [--silent-warning]
              [--skip-empty] [--solexa] [--taxonomy|-t <string>] [--u-to-t]
              [--update-taxid] [--with-leaves] [<args>]

DESCRIPTION

obicomplement computes the reverse complement of every sequence in the input. For each input sequence, the nucleotides are first reversed, then each base is replaced by its WatsonCrick complement (A↔T, C↔G), yielding the strand that would pair with the original sequence read in the opposite direction.

When quality scores are present (FASTQ data), they are reversed in the same order as the sequence so that each quality value remains associated with its corresponding base. Ambiguous IUPAC characters (e.g. N, R, Y) are handled correctly and preserved in the output.

This operation is commonly needed when sequences have been sequenced on the wrong strand, when a primer is designed on the reverse strand, or when preparing sequences for strand-aware downstream analyses.

The command reads from standard input or from one or more files, processes sequences in parallel, and writes the result to standard output or to the file specified with --out.


INPUT

obicomplement accepts biological sequence data in FASTA, FASTQ, EMBL, GenBank, ecoPCR output, and CSV formats. When no format flag is given, the format is inferred automatically from the file contents or extension.

Input is read from standard input when no filename argument is provided, or from one or more files passed as positional arguments. Gzip-compressed files are handled transparently.

Paired-end data can be provided with --paired-with, which specifies the file containing the second mate. Both mates are reverse-complemented and written to separate output files.


OUTPUT

The output is a sequence file in which every sequence is the reverse complement of the corresponding input sequence. The output format matches the input by default (FASTA if no quality data, FASTQ if quality data are present), and can be overridden with --fasta-output, --fastq-output, or --json-output.

All annotations (attributes stored in the sequence header) are preserved unchanged. Quality scores, when present, are reversed to stay aligned with their bases.

Observed output example

>seq001 {"definition":"basic DNA sequence"}
cgatcgatcgatcgatcgat
>seq002 {"definition":"GC-rich sequence"}
gcgcgcgcgcgcgcgcgcgc
>seq003 {"definition":"AT-rich sequence"}
atatatatatatatatatat
>seq004 {"definition":"palindromic sequence"}
aattccggaattccggaatt
>seq005 {"definition":"mixed sequence"}
agctagcatgcatagccgat

OPTIONS

Input format

--fasta
Default: false. Force parsing of input as FASTA format.
--fastq
Default: false. Force parsing of input as FASTQ format.
--embl
Default: false. Force parsing of input as EMBL flatfile format.
--genbank
Default: false. Force parsing of input as GenBank flatfile format.
--ecopcr
Default: false. Force parsing of input as ecoPCR output format.
--csv
Default: false. Force parsing of input as CSV format.
--solexa
Default: false. Decode quality scores using the Solexa/Illumina pre-1.3 convention instead of the standard Phred+33 encoding.
--input-OBI-header
Default: false. Interpret FASTA/FASTQ header annotations using the OBI key=value format.
--input-json-header
Default: false. Interpret FASTA/FASTQ header annotations using JSON format.
--no-order
Default: false. When several input files are given, declare that no ordering relationship exists among them, allowing the reader to interleave records freely.
--paired-with <FILENAME>
Default: none. File containing the paired (R2) reads. When set, obicomplement processes both mates and writes them to separate output files.

Sequence preprocessing

--u-to-t
Default: false. Convert Uracil (U) to Thymine (T) before computing the reverse complement. Useful when processing RNA sequences that must be treated as DNA.
--skip-empty
Default: false. Discard sequences of length zero from the output.

Output format

--fasta-output
Default: false. Write output in FASTA format regardless of whether quality scores are present.
--fastq-output
Default: false. Write output in FASTQ format (requires quality data).
--json-output
Default: false. Write output in JSON format.
--out|-o <FILENAME>
Default: - (standard output). File used to save the output.
--output-OBI-header|-O
Default: false. Write FASTA/FASTQ header annotations in OBI key=value format.
--output-json-header
Default: false. Write FASTA/FASTQ header annotations in JSON format.
--compress|-Z
Default: false. Compress the output with gzip.

Taxonomy

--taxonomy|-t <string>
Default: none. Path to a taxonomy database. Required only when the input sequences carry taxid annotations that need to be validated or updated.
--fail-on-taxonomy
Default: false. Cause obicomplement to exit with an error if a taxid referenced in the data is not a currently valid node in the loaded taxonomy.
--update-taxid
Default: false. Automatically replace taxids that have been declared merged into a newer node by the taxonomy database.
--raw-taxid
Default: false. Print taxids without appending the taxon name and rank.
--with-leaves
Default: false. When the taxonomy is extracted from the sequence file, attach sequences as leaves of their taxid node.

Performance and diagnostics

--max-cpu <int>
Default: 16 (env: OBIMAXCPU). Number of parallel threads used to process sequences.
--batch-size <int>
Default: 1 (env: OBIBATCHSIZE). Minimum number of sequences per processing batch.
--batch-size-max <int>
Default: 2000 (env: OBIBATCHSIZEMAX). Maximum number of sequences per processing batch.
--batch-mem <string>
Default: 128M (env: OBIBATCHMEM). Maximum memory allocated per batch (e.g. 128K, 64M, 1G). Set to 0 to disable the memory limit.
--no-progressbar
Default: false. Disable the progress bar printed to stderr.
--silent-warning
Default: false (env: OBIWARNING). Suppress warning messages.
--debug
Default: false (env: OBIDEBUG). Enable debug logging.

EXAMPLES

# Reverse complement all sequences in a FASTA file
obicomplement sequences.fasta > out_default.fasta

Expected output: 5 sequences written to out_default.fasta.

# Reverse complement a FASTQ file, preserving quality scores
obicomplement reads.fastq --fastq-output --out out_fastq.fastq

Expected output: 5 sequences written to out_fastq.fastq.

# Convert RNA sequences to their reverse complement DNA strand
obicomplement --u-to-t rna_sequences.fasta > out_rna_rc.fasta

Expected output: 3 sequences written to out_rna_rc.fasta.

# Reverse complement paired-end reads into two separate output files
obicomplement R1.fastq --paired-with R2.fastq --out out_paired.fastq

Expected output: 3 sequences written to out_paired_R1.fastq and 3 sequences to out_paired_R2.fastq.

# Reverse complement and compress output, skipping any empty sequences
obicomplement --skip-empty --compress sequences.fasta --out out_compressed.fasta.gz

Expected output: 5 sequences written to out_compressed.fasta.gz (gzip-compressed FASTA).

# Reverse complement with OBI-format header output
obicomplement --output-OBI-header sequences.fasta --out out_obi.fasta

Expected output: 5 sequences written to out_obi.fasta.

# Reverse complement with explicit JSON-format header output
obicomplement --output-json-header sequences.fasta --out out_jsonheader.fasta

Expected output: 5 sequences written to out_jsonheader.fasta.

# Reverse complement and write full JSON output format
obicomplement --json-output sequences.fasta --out out_json.json

Expected output: 5 sequences written to out_json.json.


SEE ALSO

  • obiconvert — format conversion and sequence filtering pipeline
  • obipairing — paired-end read merging (uses reverse complement internally)
  • obigrep — sequence filtering and selection

NOTES

Quality scores (Phred-scaled) are reversed in lock-step with the sequence so that positional quality information remains valid after the reverse complement operation. This is essential for downstream tools that rely on per-base quality for alignment or variant calling.

Ambiguous IUPAC characters and gap symbols (-) are handled gracefully: standard ambiguous bases are complemented according to IUPAC rules, while gap and missing-data symbols are preserved unchanged.