mirror of
https://github.com/metabarcoding/obitools4.git
synced 2026-04-30 03:50:39 +00:00
⬆️ version bump to v4.5
- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5" - Update version.txt from 4.29 → .30 (automated by Makefile)
This commit is contained in:
@@ -0,0 +1,300 @@
|
||||
# NAME
|
||||
|
||||
obicomplement — reverse complement of sequences
|
||||
|
||||
---
|
||||
|
||||
# SYNOPSIS
|
||||
|
||||
```
|
||||
obicomplement [--batch-mem <string>] [--batch-size <int>]
|
||||
[--batch-size-max <int>] [--compress|-Z] [--csv] [--debug]
|
||||
[--ecopcr] [--embl] [--fail-on-taxonomy] [--fasta]
|
||||
[--fasta-output] [--fastq] [--fastq-output] [--genbank]
|
||||
[--help|-h|-?] [--input-OBI-header] [--input-json-header]
|
||||
[--json-output] [--max-cpu <int>] [--no-order]
|
||||
[--no-progressbar] [--out|-o <FILENAME>]
|
||||
[--output-OBI-header|-O] [--output-json-header]
|
||||
[--paired-with <FILENAME>] [--raw-taxid] [--silent-warning]
|
||||
[--skip-empty] [--solexa] [--taxonomy|-t <string>] [--u-to-t]
|
||||
[--update-taxid] [--with-leaves] [<args>]
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
# DESCRIPTION
|
||||
|
||||
`obicomplement` computes the reverse complement of every sequence in the
|
||||
input. For each input sequence, the nucleotides are first reversed, then
|
||||
each base is replaced by its Watson–Crick complement (A↔T, C↔G), yielding
|
||||
the strand that would pair with the original sequence read in the opposite
|
||||
direction.
|
||||
|
||||
When quality scores are present (FASTQ data), they are reversed in the same
|
||||
order as the sequence so that each quality value remains associated with its
|
||||
corresponding base. Ambiguous IUPAC characters (e.g. `N`, `R`, `Y`) are
|
||||
handled correctly and preserved in the output.
|
||||
|
||||
This operation is commonly needed when sequences have been sequenced on the
|
||||
wrong strand, when a primer is designed on the reverse strand, or when
|
||||
preparing sequences for strand-aware downstream analyses.
|
||||
|
||||
The command reads from standard input or from one or more files, processes
|
||||
sequences in parallel, and writes the result to standard output or to the
|
||||
file specified with `--out`.
|
||||
|
||||
---
|
||||
|
||||
# INPUT
|
||||
|
||||
`obicomplement` accepts biological sequence data in FASTA, FASTQ, EMBL,
|
||||
GenBank, ecoPCR output, and CSV formats. When no format flag is given, the
|
||||
format is inferred automatically from the file contents or extension.
|
||||
|
||||
Input is read from standard input when no filename argument is provided, or
|
||||
from one or more files passed as positional arguments. Gzip-compressed files
|
||||
are handled transparently.
|
||||
|
||||
Paired-end data can be provided with `--paired-with`, which specifies the
|
||||
file containing the second mate. Both mates are reverse-complemented and
|
||||
written to separate output files.
|
||||
|
||||
---
|
||||
|
||||
# OUTPUT
|
||||
|
||||
The output is a sequence file in which every sequence is the reverse
|
||||
complement of the corresponding input sequence. The output format matches
|
||||
the input by default (FASTA if no quality data, FASTQ if quality data are
|
||||
present), and can be overridden with `--fasta-output`, `--fastq-output`, or
|
||||
`--json-output`.
|
||||
|
||||
All annotations (attributes stored in the sequence header) are preserved
|
||||
unchanged. Quality scores, when present, are reversed to stay aligned with
|
||||
their bases.
|
||||
|
||||
## Observed output example
|
||||
|
||||
```
|
||||
>seq001 {"definition":"basic DNA sequence"}
|
||||
cgatcgatcgatcgatcgat
|
||||
>seq002 {"definition":"GC-rich sequence"}
|
||||
gcgcgcgcgcgcgcgcgcgc
|
||||
>seq003 {"definition":"AT-rich sequence"}
|
||||
atatatatatatatatatat
|
||||
>seq004 {"definition":"palindromic sequence"}
|
||||
aattccggaattccggaatt
|
||||
>seq005 {"definition":"mixed sequence"}
|
||||
agctagcatgcatagccgat
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
# OPTIONS
|
||||
|
||||
## Input format
|
||||
|
||||
**`--fasta`**
|
||||
: Default: false. Force parsing of input as FASTA format.
|
||||
|
||||
**`--fastq`**
|
||||
: Default: false. Force parsing of input as FASTQ format.
|
||||
|
||||
**`--embl`**
|
||||
: Default: false. Force parsing of input as EMBL flatfile format.
|
||||
|
||||
**`--genbank`**
|
||||
: Default: false. Force parsing of input as GenBank flatfile format.
|
||||
|
||||
**`--ecopcr`**
|
||||
: Default: false. Force parsing of input as ecoPCR output format.
|
||||
|
||||
**`--csv`**
|
||||
: Default: false. Force parsing of input as CSV format.
|
||||
|
||||
**`--solexa`**
|
||||
: Default: false. Decode quality scores using the Solexa/Illumina pre-1.3
|
||||
convention instead of the standard Phred+33 encoding.
|
||||
|
||||
**`--input-OBI-header`**
|
||||
: Default: false. Interpret FASTA/FASTQ header annotations using the OBI
|
||||
key=value format.
|
||||
|
||||
**`--input-json-header`**
|
||||
: Default: false. Interpret FASTA/FASTQ header annotations using JSON
|
||||
format.
|
||||
|
||||
**`--no-order`**
|
||||
: Default: false. When several input files are given, declare that no
|
||||
ordering relationship exists among them, allowing the reader to interleave
|
||||
records freely.
|
||||
|
||||
**`--paired-with <FILENAME>`**
|
||||
: Default: none. File containing the paired (R2) reads. When set,
|
||||
`obicomplement` processes both mates and writes them to separate output
|
||||
files.
|
||||
|
||||
## Sequence preprocessing
|
||||
|
||||
**`--u-to-t`**
|
||||
: Default: false. Convert Uracil (U) to Thymine (T) before computing the
|
||||
reverse complement. Useful when processing RNA sequences that must be
|
||||
treated as DNA.
|
||||
|
||||
**`--skip-empty`**
|
||||
: Default: false. Discard sequences of length zero from the output.
|
||||
|
||||
## Output format
|
||||
|
||||
**`--fasta-output`**
|
||||
: Default: false. Write output in FASTA format regardless of whether quality
|
||||
scores are present.
|
||||
|
||||
**`--fastq-output`**
|
||||
: Default: false. Write output in FASTQ format (requires quality data).
|
||||
|
||||
**`--json-output`**
|
||||
: Default: false. Write output in JSON format.
|
||||
|
||||
**`--out|-o <FILENAME>`**
|
||||
: Default: `-` (standard output). File used to save the output.
|
||||
|
||||
**`--output-OBI-header|-O`**
|
||||
: Default: false. Write FASTA/FASTQ header annotations in OBI key=value
|
||||
format.
|
||||
|
||||
**`--output-json-header`**
|
||||
: Default: false. Write FASTA/FASTQ header annotations in JSON format.
|
||||
|
||||
**`--compress|-Z`**
|
||||
: Default: false. Compress the output with gzip.
|
||||
|
||||
## Taxonomy
|
||||
|
||||
**`--taxonomy|-t <string>`**
|
||||
: Default: none. Path to a taxonomy database. Required only when the input
|
||||
sequences carry taxid annotations that need to be validated or updated.
|
||||
|
||||
**`--fail-on-taxonomy`**
|
||||
: Default: false. Cause `obicomplement` to exit with an error if a taxid
|
||||
referenced in the data is not a currently valid node in the loaded
|
||||
taxonomy.
|
||||
|
||||
**`--update-taxid`**
|
||||
: Default: false. Automatically replace taxids that have been declared
|
||||
merged into a newer node by the taxonomy database.
|
||||
|
||||
**`--raw-taxid`**
|
||||
: Default: false. Print taxids without appending the taxon name and rank.
|
||||
|
||||
**`--with-leaves`**
|
||||
: Default: false. When the taxonomy is extracted from the sequence file,
|
||||
attach sequences as leaves of their taxid node.
|
||||
|
||||
## Performance and diagnostics
|
||||
|
||||
**`--max-cpu <int>`**
|
||||
: Default: 16 (env: `OBIMAXCPU`). Number of parallel threads used to
|
||||
process sequences.
|
||||
|
||||
**`--batch-size <int>`**
|
||||
: Default: 1 (env: `OBIBATCHSIZE`). Minimum number of sequences per
|
||||
processing batch.
|
||||
|
||||
**`--batch-size-max <int>`**
|
||||
: Default: 2000 (env: `OBIBATCHSIZEMAX`). Maximum number of sequences per
|
||||
processing batch.
|
||||
|
||||
**`--batch-mem <string>`**
|
||||
: Default: `128M` (env: `OBIBATCHMEM`). Maximum memory allocated per batch
|
||||
(e.g. `128K`, `64M`, `1G`). Set to `0` to disable the memory limit.
|
||||
|
||||
**`--no-progressbar`**
|
||||
: Default: false. Disable the progress bar printed to stderr.
|
||||
|
||||
**`--silent-warning`**
|
||||
: Default: false (env: `OBIWARNING`). Suppress warning messages.
|
||||
|
||||
**`--debug`**
|
||||
: Default: false (env: `OBIDEBUG`). Enable debug logging.
|
||||
|
||||
---
|
||||
|
||||
# EXAMPLES
|
||||
|
||||
```bash
|
||||
# Reverse complement all sequences in a FASTA file
|
||||
obicomplement sequences.fasta > out_default.fasta
|
||||
```
|
||||
|
||||
**Expected output:** 5 sequences written to `out_default.fasta`.
|
||||
|
||||
```bash
|
||||
# Reverse complement a FASTQ file, preserving quality scores
|
||||
obicomplement reads.fastq --fastq-output --out out_fastq.fastq
|
||||
```
|
||||
|
||||
**Expected output:** 5 sequences written to `out_fastq.fastq`.
|
||||
|
||||
```bash
|
||||
# Convert RNA sequences to their reverse complement DNA strand
|
||||
obicomplement --u-to-t rna_sequences.fasta > out_rna_rc.fasta
|
||||
```
|
||||
|
||||
**Expected output:** 3 sequences written to `out_rna_rc.fasta`.
|
||||
|
||||
```bash
|
||||
# Reverse complement paired-end reads into two separate output files
|
||||
obicomplement R1.fastq --paired-with R2.fastq --out out_paired.fastq
|
||||
```
|
||||
|
||||
**Expected output:** 3 sequences written to `out_paired_R1.fastq` and 3 sequences to `out_paired_R2.fastq`.
|
||||
|
||||
```bash
|
||||
# Reverse complement and compress output, skipping any empty sequences
|
||||
obicomplement --skip-empty --compress sequences.fasta --out out_compressed.fasta.gz
|
||||
```
|
||||
|
||||
**Expected output:** 5 sequences written to `out_compressed.fasta.gz` (gzip-compressed FASTA).
|
||||
|
||||
```bash
|
||||
# Reverse complement with OBI-format header output
|
||||
obicomplement --output-OBI-header sequences.fasta --out out_obi.fasta
|
||||
```
|
||||
|
||||
**Expected output:** 5 sequences written to `out_obi.fasta`.
|
||||
|
||||
```bash
|
||||
# Reverse complement with explicit JSON-format header output
|
||||
obicomplement --output-json-header sequences.fasta --out out_jsonheader.fasta
|
||||
```
|
||||
|
||||
**Expected output:** 5 sequences written to `out_jsonheader.fasta`.
|
||||
|
||||
```bash
|
||||
# Reverse complement and write full JSON output format
|
||||
obicomplement --json-output sequences.fasta --out out_json.json
|
||||
```
|
||||
|
||||
**Expected output:** 5 sequences written to `out_json.json`.
|
||||
|
||||
---
|
||||
|
||||
# SEE ALSO
|
||||
|
||||
- `obiconvert` — format conversion and sequence filtering pipeline
|
||||
- `obipairing` — paired-end read merging (uses reverse complement internally)
|
||||
- `obigrep` — sequence filtering and selection
|
||||
|
||||
---
|
||||
|
||||
# NOTES
|
||||
|
||||
Quality scores (Phred-scaled) are reversed in lock-step with the sequence
|
||||
so that positional quality information remains valid after the reverse
|
||||
complement operation. This is essential for downstream tools that rely on
|
||||
per-base quality for alignment or variant calling.
|
||||
|
||||
Ambiguous IUPAC characters and gap symbols (`-`) are handled gracefully:
|
||||
standard ambiguous bases are complemented according to IUPAC rules, while
|
||||
gap and missing-data symbols are preserved unchanged.
|
||||
Reference in New Issue
Block a user