# NAME

obicomplement — reverse complement of sequences

---

# SYNOPSIS

```
obicomplement [--batch-mem <string>] [--batch-size <int>]
              [--batch-size-max <int>] [--compress|-Z] [--csv] [--debug]
              [--ecopcr] [--embl] [--fail-on-taxonomy] [--fasta]
              [--fasta-output] [--fastq] [--fastq-output] [--genbank]
              [--help|-h|-?] [--input-OBI-header] [--input-json-header]
              [--json-output] [--max-cpu <int>] [--no-order]
              [--no-progressbar] [--out|-o <FILENAME>]
              [--output-OBI-header|-O] [--output-json-header]
              [--paired-with <FILENAME>] [--raw-taxid] [--silent-warning]
              [--skip-empty] [--solexa] [--taxonomy|-t <string>] [--u-to-t]
              [--update-taxid] [--with-leaves] [<args>]
```

---

# DESCRIPTION

`obicomplement` computes the reverse complement of every sequence in the
input. For each input sequence, the nucleotides are first reversed, then
each base is replaced by its Watson–Crick complement (A↔T, C↔G). The
result is the sequence of the complementary strand, read 5′→3′ like the
input.

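The two steps can be illustrated with a minimal POSIX shell sketch. It is not how `obicomplement` is implemented. The helper names `reverse` and `revcomp` are hypothetical, reversal is done with `awk` so no extra tools are needed, and only the four unambiguous bases are complemented.

```bash
# Hypothetical helpers illustrating the reverse complement; not part of obitools4.

# Step 1: reverse a string character by character (POSIX awk).
reverse() {
    printf '%s\n' "$1" |
        awk '{ out = ""; for (i = length($0); i > 0; i--) out = out substr($0, i, 1); print out }'
}

# Step 2: complement each base of the reversed string (A<->T, C<->G, either case).
revcomp() {
    reverse "$1" | tr 'ACGTacgt' 'TGCAtgca'
}

revcomp ATGCCG    # reversing gives GCCGTA, complementing gives CGGCAT
```

The order of the two steps does not matter: complementing first and reversing second gives the same result, because one step only moves positions and the other only changes letters.
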
When quality scores are present (FASTQ data), they are reversed together
with the sequence so that each quality value stays attached to its base.
Ambiguous IUPAC characters (e.g. `N`, `R`, `Y`) are complemented according
to IUPAC rules (for example `R`↔`Y`, while `N` remains `N`) and kept in the
output.

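Keeping qualities aligned amounts to applying the same reversal to the quality string, without complementing it. The sketch below is hypothetical and written for POSIX `sh`: it assumes plain four-line FASTQ records (no line wrapping, input ending with a newline) and copies headers verbatim, so it ignores the header annotations that `obicomplement` parses and preserves. The `tr` sets encode the standard IUPAC complement pairs.

```bash
# Hypothetical sketch: reverse-complement 4-line FASTQ records read on stdin.
# Not the obicomplement implementation; headers are copied unchanged.

reverse() {
    printf '%s\n' "$1" |
        awk '{ out = ""; for (i = length($0); i > 0; i--) out = out substr($0, i, 1); print out }'
}

# IUPAC complement: R<->Y, K<->M, B<->V, D<->H; N, S, W and '-' map to themselves.
complement() {
    tr 'ACGTRYKMBDHVNacgtrykmbdhvn' 'TGCAYRMKVHDBNtgcayrmkvhdbn'
}

revcomp_fastq() {
    while IFS= read -r header && IFS= read -r seq &&
          IFS= read -r plus && IFS= read -r qual; do
        printf '%s\n' "$header"
        reverse "$seq" | complement    # sequence: reversed and complemented
        printf '%s\n' "$plus"
        reverse "$qual"                # qualities: reversed only
    done
}

# Prints the record @r1 / NACGT / + / EDCBA
printf '@r1\nACGTN\n+\nABCDE\n' | revcomp_fastq
```
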
This operation is commonly needed when reads come from the opposite strand
to the one required, when a primer is designed on the reverse strand, or
when preparing sequences for strand-aware downstream analyses.

The command reads from standard input or from one or more files, processes
sequences in parallel, and writes the result to standard output or to the
file specified with `--out`.

---

# INPUT

`obicomplement` accepts biological sequence data in FASTA, FASTQ, EMBL,
GenBank, ecoPCR output, and CSV formats. When no format flag is given, the
format is inferred automatically from the file contents or extension.

Input is read from standard input when no filename argument is provided, or
from one or more files passed as positional arguments. Gzip-compressed files
are handled transparently.

Paired-end data can be provided with `--paired-with`, which specifies the
file containing the second mate. Both mates are reverse-complemented and
written to separate output files.

---

# OUTPUT

The output is a sequence file in which every sequence is the reverse
complement of the corresponding input sequence. By default, the output
format follows the data: FASTQ when quality scores are present, FASTA
otherwise. This can be overridden with `--fasta-output`, `--fastq-output`,
or `--json-output`.

All annotations (attributes stored in the sequence header) are preserved
unchanged. Quality scores, when present, are reversed to stay aligned with
their bases.

## Observed output example

```
>seq001 {"definition":"basic DNA sequence"}
cgatcgatcgatcgatcgat
>seq002 {"definition":"GC-rich sequence"}
gcgcgcgcgcgcgcgcgcgc
>seq003 {"definition":"AT-rich sequence"}
atatatatatatatatatat
>seq004 {"definition":"palindromic sequence"}
aattccggaattccggaatt
>seq005 {"definition":"mixed sequence"}
agctagcatgcatagccgat
```

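Three of these records (seq002, seq003 and seq004) are reverse palindromes: each is its own reverse complement. Because applying the reverse complement twice returns the original sequence, such records are identical before and after `obicomplement`. A quick check with hypothetical shell helpers (a plain A/C/G/T sketch, not part of obitools4):

```bash
# Hypothetical helpers: test whether a sequence equals its reverse complement.
reverse() {
    printf '%s\n' "$1" |
        awk '{ out = ""; for (i = length($0); i > 0; i--) out = out substr($0, i, 1); print out }'
}

revcomp() { reverse "$1" | tr 'ACGTacgt' 'TGCAtgca'; }

is_own_revcomp() { [ "$(revcomp "$1")" = "$1" ]; }

# The five sequences from the example output, in order (seq001 to seq005).
for s in cgatcgatcgatcgatcgat gcgcgcgcgcgcgcgcgcgc atatatatatatatatatat \
         aattccggaattccggaatt agctagcatgcatagccgat; do
    if is_own_revcomp "$s"; then
        echo "$s  own reverse complement"
    else
        echo "$s  differs from its reverse complement"
    fi
done
```
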
---

# OPTIONS

## Input format

**`--fasta`**
: Default: false. Force parsing of input as FASTA format.

**`--fastq`**
: Default: false. Force parsing of input as FASTQ format.

**`--embl`**
: Default: false. Force parsing of input as EMBL flatfile format.

**`--genbank`**
: Default: false. Force parsing of input as GenBank flatfile format.

**`--ecopcr`**
: Default: false. Force parsing of input as ecoPCR output format.

**`--csv`**
: Default: false. Force parsing of input as CSV format.

**`--solexa`**
: Default: false. Decode quality scores using the Solexa/Illumina pre-1.3
convention instead of the standard Phred+33 encoding.

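For reference, the two quality conventions behind `--solexa` are defined from the base-call error probability $p$ as follows. These are the standard definitions, not taken from the obitools4 source: Phred scores are stored with an ASCII offset of 33, Solexa scores with an offset of 64, and the last relation converts a Solexa score to the Phred scale.

$$
\begin{aligned}
Q_{\text{Phred}} &= -10 \log_{10} p \\
Q_{\text{Solexa}} &= -10 \log_{10} \frac{p}{1-p} \\
Q_{\text{Phred}} &= 10 \log_{10}\left(10^{Q_{\text{Solexa}}/10} + 1\right)
\end{aligned}
$$

The two scales nearly coincide at high quality: $Q_{\text{Solexa}} = 10$ gives $Q_{\text{Phred}} = 10 \log_{10} 11 \approx 10.41$. Unlike Phred scores, Solexa scores become negative when $p > 0.5$.
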
**`--input-OBI-header`**
: Default: false. Interpret FASTA/FASTQ header annotations using the OBI
key=value format.

**`--input-json-header`**
: Default: false. Interpret FASTA/FASTQ header annotations using JSON
format.

**`--no-order`**
: Default: false. When several input files are given, declare that no
ordering relationship exists among them, allowing the reader to interleave
records freely.

**`--paired-with <FILENAME>`**
: Default: none. File containing the paired (R2) reads. When set,
`obicomplement` processes both mates and writes them to separate output
files.

## Sequence preprocessing

**`--u-to-t`**
: Default: false. Convert uracil (U) to thymine (T) before computing the
reverse complement. Useful when processing RNA sequences that must be
treated as DNA.

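The effect of `--u-to-t` can be pictured as an extra substitution placed before the reverse complement. In this hypothetical POSIX shell sketch (not the obitools4 code), `U`/`u` are rewritten to `T`/`t` with `tr`, then the sequence is reversed and complemented as DNA. In this sketch, dropping the first `tr` would leave every `U` uncomplemented, because `U` is not in the complement set.

```bash
# Hypothetical sketch: U->T conversion followed by a DNA reverse complement.

# Reverse each input line character by character (POSIX awk).
reverse_lines() {
    awk '{ out = ""; for (i = length($0); i > 0; i--) out = out substr($0, i, 1); print out }'
}

rna_revcomp() {
    printf '%s\n' "$1" |
        tr 'Uu' 'Tt' |              # uracil -> thymine, as with --u-to-t
        reverse_lines |             # reverse the sequence
        tr 'ACGTacgt' 'TGCAtgca'    # complement A<->T, C<->G
}

rna_revcomp AUGGCU    # AUGGCU -> ATGGCT -> TCGGTA -> AGCCAT
```
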
**`--skip-empty`**
: Default: false. Discard sequences of length zero from the output.

## Output format

**`--fasta-output`**
: Default: false. Write output in FASTA format regardless of whether quality
scores are present.

**`--fastq-output`**
: Default: false. Write output in FASTQ format (requires quality data).

**`--json-output`**
: Default: false. Write output in JSON format.

**`--out|-o <FILENAME>`**
: Default: `-` (standard output). File used to save the output.

**`--output-OBI-header|-O`**
: Default: false. Write FASTA/FASTQ header annotations in OBI key=value
format.

**`--output-json-header`**
: Default: false. Write FASTA/FASTQ header annotations in JSON format.

**`--compress|-Z`**
: Default: false. Compress the output with gzip.

## Taxonomy

**`--taxonomy|-t <string>`**
: Default: none. Path to a taxonomy database. Required only when the input
sequences carry taxid annotations that need to be validated or updated.

**`--fail-on-taxonomy`**
: Default: false. Make `obicomplement` exit with an error if a taxid
referenced in the data is not a currently valid node in the loaded
taxonomy.

**`--update-taxid`**
: Default: false. Automatically replace taxids that the taxonomy database
declares as merged into a newer node.

**`--raw-taxid`**
: Default: false. Print taxids without appending the taxon name and rank.

**`--with-leaves`**
: Default: false. When the taxonomy is extracted from the sequence file,
attach sequences as leaves of their taxid node.

## Performance and diagnostics

**`--max-cpu <int>`**
: Default: 16 (env: `OBIMAXCPU`). Number of parallel threads used to
process sequences.

**`--batch-size <int>`**
: Default: 1 (env: `OBIBATCHSIZE`). Minimum number of sequences per
processing batch.

**`--batch-size-max <int>`**
: Default: 2000 (env: `OBIBATCHSIZEMAX`). Maximum number of sequences per
processing batch.

**`--batch-mem <string>`**
: Default: `128M` (env: `OBIBATCHMEM`). Maximum memory allocated per batch
(e.g. `128K`, `64M`, `1G`). Set to `0` to disable the memory limit.

**`--no-progressbar`**
: Default: false. Disable the progress bar printed to stderr.

**`--silent-warning`**
: Default: false (env: `OBIWARNING`). Suppress warning messages.

**`--debug`**
: Default: false (env: `OBIDEBUG`). Enable debug logging.

---

# EXAMPLES

```bash
# Reverse complement all sequences in a FASTA file
obicomplement sequences.fasta > out_default.fasta
```

**Expected output:** 5 sequences written to `out_default.fasta`.

```bash
# Reverse complement a FASTQ file, preserving quality scores
obicomplement reads.fastq --fastq-output --out out_fastq.fastq
```

**Expected output:** 5 sequences written to `out_fastq.fastq`.

```bash
# Convert RNA sequences to their reverse complement DNA strand
obicomplement --u-to-t rna_sequences.fasta > out_rna_rc.fasta
```

**Expected output:** 3 sequences written to `out_rna_rc.fasta`.

```bash
# Reverse complement paired-end reads into two separate output files
obicomplement R1.fastq --paired-with R2.fastq --out out_paired.fastq
```

**Expected output:** 3 sequences written to `out_paired_R1.fastq` and 3 sequences to `out_paired_R2.fastq`.

```bash
# Reverse complement and compress output, skipping any empty sequences
obicomplement --skip-empty --compress sequences.fasta --out out_compressed.fasta.gz
```

**Expected output:** 5 sequences written to `out_compressed.fasta.gz` (gzip-compressed FASTA).

```bash
# Reverse complement with OBI-format header output
obicomplement --output-OBI-header sequences.fasta --out out_obi.fasta
```

**Expected output:** 5 sequences written to `out_obi.fasta`.

```bash
# Reverse complement with explicit JSON-format header output
obicomplement --output-json-header sequences.fasta --out out_jsonheader.fasta
```

**Expected output:** 5 sequences written to `out_jsonheader.fasta`.

```bash
# Reverse complement and write full JSON output format
obicomplement --json-output sequences.fasta --out out_json.json
```

**Expected output:** 5 sequences written to `out_json.json`.

---

# SEE ALSO

- `obiconvert` — format conversion and sequence filtering pipeline
- `obipairing` — paired-end read merging (uses reverse complement internally)
- `obigrep` — sequence filtering and selection

---

# NOTES

Quality scores (Phred-scaled) are reversed in lock-step with the sequence
so that positional quality information remains valid after the reverse
complement operation. This is essential for downstream tools that rely on
per-base quality for alignment or variant calling.

Ambiguous IUPAC characters and gap symbols (`-`) are handled as follows:
standard ambiguous bases are complemented according to IUPAC rules, while
gap and missing-data symbols are preserved unchanged.
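
The IUPAC pairing can be summarised as a lookup table. The hypothetical shell sketch below (not the obitools4 implementation) prints each symbol with its complement using `tr`. `R`↔`Y`, `K`↔`M`, `B`↔`V` and `D`↔`H` swap; `S`, `W` and `N` are their own complements; and `-` appears in neither `tr` set, so gaps pass through unchanged. It then reverse-complements a short gapped, ambiguous sequence.

```bash
# Hypothetical sketch of IUPAC-aware complementing; not part of obitools4.
iupac_complement() {
    tr 'ACGTRYKMBDHVNacgtrykmbdhvn' 'TGCAYRMKVHDBNtgcayrmkvhdbn'
}

iupac_revcomp() {
    printf '%s\n' "$1" |
        awk '{ out = ""; for (i = length($0); i > 0; i--) out = out substr($0, i, 1); print out }' |
        iupac_complement
}

# Reference table: each symbol followed by its complement.
for b in A C G T R Y K M S W B D H V N -; do
    printf '%s -> %s\n' "$b" "$(printf '%s\n' "$b" | iupac_complement)"
done

iupac_revcomp 'ACGT-NRY'    # reversed: YRN-TGCA, complemented: RYN-ACGT
```
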