⬆️ version bump to v4.5

- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5" - Update version.txt from 4.29 → .30 (automated by Makefile)
2026-04-30 03:50:39 +00:00 · 2026-04-07 08:36:50 +02:00
parent 670edc1958
commit 8c7017a99d
392 changed files with 18875 additions and 141 deletions
@@ -0,0 +1,300 @@
+# NAME
+
+obicomplement — reverse complement of sequences
+
+---
+
+# SYNOPSIS
+
+```
+obicomplement [--batch-mem <string>] [--batch-size <int>]
+              [--batch-size-max <int>] [--compress|-Z] [--csv] [--debug]
+              [--ecopcr] [--embl] [--fail-on-taxonomy] [--fasta]
+              [--fasta-output] [--fastq] [--fastq-output] [--genbank]
+              [--help|-h|-?] [--input-OBI-header] [--input-json-header]
+              [--json-output] [--max-cpu <int>] [--no-order]
+              [--no-progressbar] [--out|-o <FILENAME>]
+              [--output-OBI-header|-O] [--output-json-header]
+              [--paired-with <FILENAME>] [--raw-taxid] [--silent-warning]
+              [--skip-empty] [--solexa] [--taxonomy|-t <string>] [--u-to-t]
+              [--update-taxid] [--with-leaves] [<args>]
+```
+
+---
+
+# DESCRIPTION
+
+`obicomplement` computes the reverse complement of every sequence in the
+input. For each input sequence, the nucleotides are first reversed, then
+each base is replaced by its Watson–Crick complement (A↔T, C↔G), yielding
+the strand that would pair with the original sequence read in the opposite
+direction.
+
+When quality scores are present (FASTQ data), they are reversed in the same
+order as the sequence so that each quality value remains associated with its
+corresponding base. Ambiguous IUPAC characters (e.g. `N`, `R`, `Y`) are
+handled correctly and preserved in the output.
+
+This operation is commonly needed when sequences have been sequenced on the
+wrong strand, when a primer is designed on the reverse strand, or when
+preparing sequences for strand-aware downstream analyses.
+
+The command reads from standard input or from one or more files, processes
+sequences in parallel, and writes the result to standard output or to the
+file specified with `--out`.
+
+---
+
+# INPUT
+
+`obicomplement` accepts biological sequence data in FASTA, FASTQ, EMBL,
+GenBank, ecoPCR output, and CSV formats. When no format flag is given, the
+format is inferred automatically from the file contents or extension.
+
+Input is read from standard input when no filename argument is provided, or
+from one or more files passed as positional arguments. Gzip-compressed files
+are handled transparently.
+
+Paired-end data can be provided with `--paired-with`, which specifies the
+file containing the second mate. Both mates are reverse-complemented and
+written to separate output files.
+
+---
+
+# OUTPUT
+
+The output is a sequence file in which every sequence is the reverse
+complement of the corresponding input sequence. The output format matches
+the input by default (FASTA if no quality data, FASTQ if quality data are
+present), and can be overridden with `--fasta-output`, `--fastq-output`, or
+`--json-output`.
+
+All annotations (attributes stored in the sequence header) are preserved
+unchanged. Quality scores, when present, are reversed to stay aligned with
+their bases.
+
+## Observed output example
+
+```
+>seq001 {"definition":"basic DNA sequence"}
+cgatcgatcgatcgatcgat
+>seq002 {"definition":"GC-rich sequence"}
+gcgcgcgcgcgcgcgcgcgc
+>seq003 {"definition":"AT-rich sequence"}
+atatatatatatatatatat
+>seq004 {"definition":"palindromic sequence"}
+aattccggaattccggaatt
+>seq005 {"definition":"mixed sequence"}
+agctagcatgcatagccgat
+```
+
+---
+
+# OPTIONS
+
+## Input format
+
+**`--fasta`**
+: Default: false. Force parsing of input as FASTA format.
+
+**`--fastq`**
+: Default: false. Force parsing of input as FASTQ format.
+
+**`--embl`**
+: Default: false. Force parsing of input as EMBL flatfile format.
+
+**`--genbank`**
+: Default: false. Force parsing of input as GenBank flatfile format.
+
+**`--ecopcr`**
+: Default: false. Force parsing of input as ecoPCR output format.
+
+**`--csv`**
+: Default: false. Force parsing of input as CSV format.
+
+**`--solexa`**
+: Default: false. Decode quality scores using the Solexa/Illumina pre-1.3
+  convention instead of the standard Phred+33 encoding.
+
+**`--input-OBI-header`**
+: Default: false. Interpret FASTA/FASTQ header annotations using the OBI
+  key=value format.
+
+**`--input-json-header`**
+: Default: false. Interpret FASTA/FASTQ header annotations using JSON
+  format.
+
+**`--no-order`**
+: Default: false. When several input files are given, declare that no
+  ordering relationship exists among them, allowing the reader to interleave
+  records freely.
+
+**`--paired-with <FILENAME>`**
+: Default: none. File containing the paired (R2) reads. When set,
+  `obicomplement` processes both mates and writes them to separate output
+  files.
+
+## Sequence preprocessing
+
+**`--u-to-t`**
+: Default: false. Convert Uracil (U) to Thymine (T) before computing the
+  reverse complement. Useful when processing RNA sequences that must be
+  treated as DNA.
+
+**`--skip-empty`**
+: Default: false. Discard sequences of length zero from the output.
+
+## Output format
+
+**`--fasta-output`**
+: Default: false. Write output in FASTA format regardless of whether quality
+  scores are present.
+
+**`--fastq-output`**
+: Default: false. Write output in FASTQ format (requires quality data).
+
+**`--json-output`**
+: Default: false. Write output in JSON format.
+
+**`--out|-o <FILENAME>`**
+: Default: `-` (standard output). File used to save the output.
+
+**`--output-OBI-header|-O`**
+: Default: false. Write FASTA/FASTQ header annotations in OBI key=value
+  format.
+
+**`--output-json-header`**
+: Default: false. Write FASTA/FASTQ header annotations in JSON format.
+
+**`--compress|-Z`**
+: Default: false. Compress the output with gzip.
+
+## Taxonomy
+
+**`--taxonomy|-t <string>`**
+: Default: none. Path to a taxonomy database. Required only when the input
+  sequences carry taxid annotations that need to be validated or updated.
+
+**`--fail-on-taxonomy`**
+: Default: false. Cause `obicomplement` to exit with an error if a taxid
+  referenced in the data is not a currently valid node in the loaded
+  taxonomy.
+
+**`--update-taxid`**
+: Default: false. Automatically replace taxids that have been declared
+  merged into a newer node by the taxonomy database.
+
+**`--raw-taxid`**
+: Default: false. Print taxids without appending the taxon name and rank.
+
+**`--with-leaves`**
+: Default: false. When the taxonomy is extracted from the sequence file,
+  attach sequences as leaves of their taxid node.
+
+## Performance and diagnostics
+
+**`--max-cpu <int>`**
+: Default: 16 (env: `OBIMAXCPU`). Number of parallel threads used to
+  process sequences.
+
+**`--batch-size <int>`**
+: Default: 1 (env: `OBIBATCHSIZE`). Minimum number of sequences per
+  processing batch.
+
+**`--batch-size-max <int>`**
+: Default: 2000 (env: `OBIBATCHSIZEMAX`). Maximum number of sequences per
+  processing batch.
+
+**`--batch-mem <string>`**
+: Default: `128M` (env: `OBIBATCHMEM`). Maximum memory allocated per batch
+  (e.g. `128K`, `64M`, `1G`). Set to `0` to disable the memory limit.
+
+**`--no-progressbar`**
+: Default: false. Disable the progress bar printed to stderr.
+
+**`--silent-warning`**
+: Default: false (env: `OBIWARNING`). Suppress warning messages.
+
+**`--debug`**
+: Default: false (env: `OBIDEBUG`). Enable debug logging.
+
+---
+
+# EXAMPLES
+
+```bash
+# Reverse complement all sequences in a FASTA file
+obicomplement sequences.fasta > out_default.fasta
+```
+
+**Expected output:** 5 sequences written to `out_default.fasta`.
+
+```bash
+# Reverse complement a FASTQ file, preserving quality scores
+obicomplement reads.fastq --fastq-output --out out_fastq.fastq
+```
+
+**Expected output:** 5 sequences written to `out_fastq.fastq`.
+
+```bash
+# Convert RNA sequences to their reverse complement DNA strand
+obicomplement --u-to-t rna_sequences.fasta > out_rna_rc.fasta
+```
+
+**Expected output:** 3 sequences written to `out_rna_rc.fasta`.
+
+```bash
+# Reverse complement paired-end reads into two separate output files
+obicomplement R1.fastq --paired-with R2.fastq --out out_paired.fastq
+```
+
+**Expected output:** 3 sequences written to `out_paired_R1.fastq` and 3 sequences to `out_paired_R2.fastq`.
+
+```bash
+# Reverse complement and compress output, skipping any empty sequences
+obicomplement --skip-empty --compress sequences.fasta --out out_compressed.fasta.gz
+```
+
+**Expected output:** 5 sequences written to `out_compressed.fasta.gz` (gzip-compressed FASTA).
+
+```bash
+# Reverse complement with OBI-format header output
+obicomplement --output-OBI-header sequences.fasta --out out_obi.fasta
+```
+
+**Expected output:** 5 sequences written to `out_obi.fasta`.
+
+```bash
+# Reverse complement with explicit JSON-format header output
+obicomplement --output-json-header sequences.fasta --out out_jsonheader.fasta
+```
+
+**Expected output:** 5 sequences written to `out_jsonheader.fasta`.
+
+```bash
+# Reverse complement and write full JSON output format
+obicomplement --json-output sequences.fasta --out out_json.json
+```
+
+**Expected output:** 5 sequences written to `out_json.json`.
+
+---
+
+# SEE ALSO
+
+- `obiconvert` — format conversion and sequence filtering pipeline
+- `obipairing` — paired-end read merging (uses reverse complement internally)
+- `obigrep` — sequence filtering and selection
+
+---
+
+# NOTES
+
+Quality scores (Phred-scaled) are reversed in lock-step with the sequence
+so that positional quality information remains valid after the reverse
+complement operation. This is essential for downstream tools that rely on
+per-base quality for alignment or variant calling.
+
+Ambiguous IUPAC characters and gap symbols (`-`) are handled gracefully:
+standard ambiguous bases are complemented according to IUPAC rules, while
+gap and missing-data symbols are preserved unchanged.