obitools4/doc/man/obigrep.qmd
Eric Coissac d88de15cdc Refactoring codes for removing buffer size options. An some other changes...
Former-commit-id: 10b57cc1a27446ade3c444217341e9651e89cdce
2023-03-07 11:12:13 +07:00


---
title: "obigrep"
section: 1
author: Eric Coissac <eric.coissac@metabarcoding.org>
format:
  html: default
  man: default
---
# NAME
obigrep -- filters sequence files according to numerous conditions
# SYNOPSIS
**obigrep** \[**\--attribute** | **-a** _KEY=VALUE_]...
\[**\--compress** | **-Z**]
\[**\--debug**]
\[**\--definition** | **-D** _PATTERN_]...
\[**\--ecopcr**]
\[**\--embl**]
\[**\--fasta-output**]
\[**\--fastq-output**]
\[**\--genbank**]
\[**\--has-attribute** | **-A** _KEY_]...
\[**\--help** | **-h** | **-?**]
\[**\--id-list** _FILENAME_]
\[**\--identifier** | **-I** _PATTERN_]...
\[**\--ignore-taxon** | **-i** _TAXID_]...
\[**\--input-OBI-header**]
\[**\--input-json-header**]
\[**\--inverse-match** | **-v**]
\[**\--max-count** | **-C** _COUNT_]
\[**\--max-cpu** _INT_]
\[**\--max-length** | **-L** _LENGTH_]
\[**\--min-count** | **-c** _COUNT_]
\[**\--min-length** | **-l** _LENGTH_]
\[**\--no-order**]
\[**\--no-progressbar**]
\[**\--out** | **-o** _FILENAME_]
\[**\--output-OBI-header** | **-O**]
\[**\--output-json-header**]
\[**\--paired-mode** _forward|reverse|and|or|andnot|xor_]
\[**\--paired-with** _FILENAME_]
\[**\--predicate** | **-p** _EXPRESSION_]...
\[**\--require-rank** _RANK_NAME_]...
\[**\--restrict-to-taxon** | **-r** _TAXID_]...
\[**\--save-discarded** _FILENAME_]
\[**\--sequence** | **-s** _PATTERN_]...
\[**\--solexa**]
\[**\--taxdump** | **-t** _DIRECTORY_]
\[**\--workers** | **-w** _INT_] [_FILENAMES_]
# DESCRIPTION
{{< include ../lib/descriptions/_obigrep.qmd >}}
# OPTIONS
## General options
{{< include ../lib/options/_system.qmd >}}
## Input format options
The OBITools are centered around the [FASTA](https://en.wikipedia.org/wiki/FASTA_format) and [FASTQ](https://en.wikipedia.org/wiki/FASTQ_format) formats. These formats are automatically recognized whether data are read from files or from standard input (`stdin`). Other formats (GenBank, EMBL, ecoPCR) are also identified automatically when data are read from files, but when data come from `stdin` the input format must be indicated using one of the following options.
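
As an illustration of the `stdin` case, the sketch below streams a compressed GenBank flat file into **obigrep** and declares the format explicitly with **\--genbank**. The helper name `filter_genbank_stream` and the 100 bp threshold are hypothetical, chosen only for this example; the flags themselves are the ones listed in the SYNOPSIS.

```bash
# Hypothetical helper, not part of OBITools: filter a gzipped GenBank
# flat file streamed on stdin.
# --genbank declares the input format, since it cannot be guessed from
# a file when the data arrive on stdin.
filter_genbank_stream() {
  gunzip -c "$1" | obigrep --genbank --min-length 100 --fasta-output
}
```

It could then be called as `filter_genbank_stream sequences.gb.gz > long_sequences.fasta`, where both file names are placeholders.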
## Output format options
{{< include ../lib/options/_output.qmd >}}
## Paired reads options
**\--paired-with** _FILENAME_
**\--paired-mode** _forward|reverse|and|or|andnot|xor_
## Taxonomy related options
**\--taxdump** | **-t** _DIRECTORY_
**\--ignore-taxon** | **-i** _TAXID_
**\--require-rank** _RANK_NAME_
**\--restrict-to-taxon** | **-r** _TAXID_
## Filtering options
**\--has-attribute** | **-A** _KEY_...
**\--id-list** _FILENAME_
**\--identifier** | **-I** _PATTERN_
{{< include ../lib/options/selection/_max-count.qmd >}}
{{< include ../lib/options/selection/_min-count.qmd >}}
{{< include ../lib/options/selection/_max-length.qmd >}}
{{< include ../lib/options/selection/_min-length.qmd >}}
**\--predicate** | **-p** _EXPRESSION_
{{< include ../lib/options/selection/_sequence.qmd >}}
**\--inverse-match** | **-v**
**\--save-discarded** _FILENAME_
# ENVIRONMENT
**OBICPUMAX**
# EXAMPLES
- Filtering a sequence file to keep only barcodes between 8 and 130 bp long.
```bash
obigrep -l 8 -L 130 data_SPER01.fasta > data_goodLength_SPER01.fasta
```
- Keeping only reads whose sequence contains no ambiguous base code.
```bash
obigrep -s '^[acgt]+$' data_SPER01.fasta > data_onlyACGT_SPER01.fasta
```
- Filtering paired files to keep only read pairs without ambiguous bases.
```bash
obigrep -s '^[acgt]+$' \
--paired-mode and --paired-with wolf_R.fastq.gz \
--out wolf_good.fastq \
wolf_F.fastq.gz
```
That command produces two files, `wolf_good_R1.fastq` and `wolf_good_R2.fastq`,
containing the filtered forward and reverse reads respectively.
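
To see which records the pattern `^[acgt]+$` from the examples above selects, the following sketch applies the same regular expression with `awk` to a tiny made-up FASTA file. It is only an illustration of the pattern and not how **obigrep** is implemented. It assumes lowercase sequences written on a single line each, and the file names `toy.fasta` and `toy_onlyACGT.fasta` are hypothetical.

```bash
# Toy FASTA with made-up records; each sequence sits on a single line.
cat > toy.fasta <<'EOF'
>seq1
acgtacgt
>seq2
acgtnacgt
>seq3
ttgacryt
EOF

# Remember each header line, then print the record only when the whole
# sequence line is made of a, c, g and t (no n, r, y, ...).
awk '/^>/ { header = $0; next }
     /^[acgt]+$/ { print header; print $0 }' toy.fasta > toy_onlyACGT.fasta

cat toy_onlyACGT.fasta
```

Only `seq1` is kept: `seq2` contains `n`, and `seq3` contains `r` and `y`.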
# SEE ALSO
`obiannotate`
# HISTORY
# BUGS
Submit bug reports online at: https://git.metabarcoding.org/obitools/obitools4/obitools4/-/issues