mirror of
https://github.com/metabarcoding/obitools4.git
synced 2025-06-29 16:20:46 +00:00
51 lines
1.5 KiB
Plaintext
# Sequence sampling and filtering

## `obigrep` -- filters sequence files according to numerous conditions

{{< include ../lib/descriptions/_obigrep.qmd >}}

### The options usable with `obigrep`

#### Selecting sequences based on their characteristics

Sequences can be selected according to several of their characteristics: their length, their id, or their sequence. Options specify the selection condition.

**Selection based on the sequence**

Sequence records can be selected according to whether or not they match a pattern. The simplest pattern is a short sequence (*e.g.* `AACCTT`), but regular expressions allow searching for more complex patterns. For example, `A[TG]C+G` matches an `A`, followed by a `T` or a `G`, then one or more `C`, and finally a `G`.

{{< include ../lib/options/selection/_sequence.qmd >}}

*Examples:*

: Selects only the sequence records that contain an *EcoRI* restriction site.

```bash
obigrep -s 'GAATTC' seq1.fasta > seq2.fasta
```

: Selects only the sequence records that contain a stretch of at least 10 ``A``.

```bash
obigrep -s 'A{10,}' seq1.fasta > seq2.fasta
```

: Selects only the sequence records that do not contain ambiguous nucleotides.

```bash
obigrep -s '^[ACGT]+$' seq1.fasta > seq2.fasta
```
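
Before running a filter on a large file, it can help to check what a pattern actually matches. The sketch below tries the patterns discussed above on a few made-up sequences using `grep -E`. It assumes the constructs involved (`[...]`, `+`, `{n,}`, `^`, `$`) behave the same way in `grep -E` as in the pattern engine used by `obigrep`, which should hold for these simple cases but is not guaranteed for every pattern. The `matches_pattern` helper and the test sequences are hypothetical and are not part of OBITools.

```bash
# Hypothetical helper (not part of OBITools): reports whether a sequence
# contains a match for an extended regular expression, using grep -E.
matches_pattern() {
    local pattern="$1" sequence="$2"
    if printf '%s\n' "$sequence" | grep -Eq -- "$pattern"; then
        echo "match"
    else
        echo "no match"
    fi
}

# Made-up sequences, chosen only to illustrate each pattern.
matches_pattern 'A[TG]C+G'  'TTAGCCCGTT'      # contains AGCCCG
matches_pattern 'A[TG]C+G'  'TTAGGTT'         # no C before the final G
matches_pattern 'A{10,}'    'CGAAAAAAAAAAGT'  # contains ten consecutive A
matches_pattern '^[ACGT]+$' 'ACGTNACGT'       # N is an ambiguous nucleotide
```

Note that this checks one sequence string at a time, whereas `obigrep` applies the pattern to the sequence of each record in the file.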

{{< include ../lib/options/selection/_min-count.qmd >}}

{{< include ../lib/options/selection/_max-count.qmd >}}

*Examples:*

: Selects only the sequence records representing at least five reads in the dataset.

```bash
obigrep -c 5 data_SPER01.fasta > data_norare_SPER01.fasta
```
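
A quick way to see how much a filter removed is to compare the number of records in the input and output files. The sketch below assumes plain FASTA files, in which every record starts with a header line beginning with `>`. The `count_records` helper and the demonstration file are hypothetical and are not part of OBITools. It counts sequence records, not reads: a single record may represent many reads.

```bash
# Hypothetical helper (not part of OBITools): counts the records of a FASTA
# file by counting its header lines, i.e. the lines starting with ">".
count_records() {
    grep -c '^>' "$1"
}

# Small made-up FASTA file, used here only to demonstrate the helper.
demo_fasta="$(mktemp)"
trap 'rm -f "$demo_fasta"' EXIT
printf '>seq1\nACGT\n>seq2\nGGCC\nTTAA\n>seq3\nAACC\n' > "$demo_fasta"

# Prints the number of records in the demonstration file.
count_records "$demo_fasta"
```

With the files from the example above, running `count_records data_SPER01.fasta` and `count_records data_norare_SPER01.fasta` would give the number of records before and after filtering.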