mirror of
https://github.com/metabarcoding/obitools4.git
synced 2025-06-29 16:20:46 +00:00
Start writing of the man pages
Former-commit-id: 533a8a6380a5a0b66314081a5ec2aeb973ebadec
239
doc/build/_man/man1/obigrep.man
vendored
Normal file
@@ -0,0 +1,239 @@
.\" Automatically generated by Pandoc 2.19.2
.\"
.\" Define V font for inline verbatim, using C font in formats
.\" that render this, and otherwise B font.
.ie "\f[CB]x\f[]"x" \{\
. ftr V B
. ftr VI BI
. ftr VB B
. ftr VBI BI
.\}
.el \{\
. ftr V CR
. ftr VI CI
. ftr VB CB
. ftr VBI CBI
.\}
.TH "obigrep" "1" "" "" ""
.hy
.SH NAME
.PP
obigrep \[en] filters sequence files according to numerous conditions
.SH SYNOPSIS
.PP
\f[B]obigrep\f[R] [\f[B]--attribute\f[R] | \f[B]-a\f[R]
\f[I]KEY=VALUE\f[R]]\&...
[\f[B]--compress\f[R] | \f[B]-Z\f[R]] [\f[B]--debug\f[R]]
[\f[B]--definition\f[R]|\f[B]-D\f[R] \f[I]PATTERN\f[R]]\&...
.PD 0
.P
.PD
[\f[B]--ecopcr\f[R]] [\f[B]--embl\f[R]] [\f[B]--fasta-output\f[R]]
[\f[B]--fastq-output\f[R]] [\f[B]--genbank\f[R]]
[\f[B]--has-attribute\f[R] | \f[B]-A\f[R] \f[I]KEY\f[R]]\&...
[\f[B]--help\f[R] | \f[B]-h\f[R] | \f[B]-?\f[R]] [\f[B]--id-list\f[R]
\f[I]FILENAME\f[R]] [\f[B]--identifier\f[R] | \f[B]-I\f[R]
\f[I]PATTERN\f[R]]\&...
[\f[B]--ignore-taxon\f[R] | \f[B]-i\f[R] \f[I]TAXID\f[R]]\&...
[\f[B]--input-OBI-header\f[R]] [\f[B]--input-json-header\f[R]]
[\f[B]--inverse-match\f[R] | \f[B]-v\f[R]]
[\f[B]--max-count\f[R]|\f[B]-C\f[R] \f[I]COUNT\f[R]]
[\f[B]--max-cpu\f[R] \f[I]INT\f[R]] [\f[B]--max-length\f[R] |
\f[B]-L\f[R] \f[I]LENGTH\f[R]] [\f[B]--min-count\f[R] | \f[B]-c\f[R]
\f[I]COUNT\f[R]] [\f[B]--min-length\f[R] | \f[B]-l\f[R]
\f[I]LENGTH\f[R]] [\f[B]--no-order\f[R]] [\f[B]--no-progressbar\f[R]]
[\f[B]--out\f[R] | \f[B]-o\f[R] \f[I]FILENAME\f[R]]
[\f[B]--output-OBI-header\f[R] | \f[B]-O\f[R]]
[\f[B]--output-json-header\f[R]] [\f[B]--paired-mode\f[R]
\f[I]forward|reverse|and|or|andnot|xor\f[R]] [\f[B]--paired-with\f[R]
\f[I]FILENAME\f[R]] [\f[B]--predicate\f[R]|\f[B]-p\f[R]
\f[I]EXPRESSION\f[R]]\&...
[\f[B]--require-rank\f[R] \f[I]RANK_NAME\f[R]]\&...
[\f[B]--restrict-to-taxon\f[R] | \f[B]-r\f[R] \f[I]TAXID\f[R]]\&...
[\f[B]--save-discarded\f[R] \f[I]FILENAME\f[R]]
[\f[B]--sequence\f[R]|\f[B]-s\f[R] \f[I]PATTERN\f[R]]\&...
[\f[B]--solexa\f[R]] [\f[B]--taxdump\f[R] | \f[B]-t\f[R]
\f[I]DIRECTORY\f[R]] [\f[B]--workers\f[R] | \f[B]-w\f[R] \f[I]INT\f[R]]
[\f[I]FILENAMES\f[R]]
.SH DESCRIPTION
.PP
The \f[V]obigrep\f[R] command is somewhat analogous to the standard Unix
\f[V]grep\f[R] command.
It selects a subset of sequence records from a sequence file.
A sequence record is a complex object consisting of an identifier, a set
of attributes (a key, defined by its name, associated with a value), a
definition, and the sequence itself.
Instead of working text line by text line like the standard Unix tool,
\f[V]obigrep\f[R] performs its selection sequence record by sequence record.
A large number of options allow you to refine the selection on any
element of the sequence record.
\f[V]obigrep\f[R] allows you to specify multiple conditions
simultaneously (each of which evaluates to \f[V]TRUE\f[R] or
\f[V]FALSE\f[R]) and only those sequence records which meet all
conditions (all conditions are \f[V]TRUE\f[R]) are selected.
\f[V]obigrep\f[R] is able to work on two paired read files.
The selection criteria apply to one or the other of the reads in each
pair depending on the mode chosen (\f[B]--paired-mode\f[R] option).
In all cases the selection is applied in the same way to both files,
thus maintaining their consistency.
.SH OPTIONS
.SS General options
.PP
\f[B]Helpful options\f[R]
.TP
\f[B]--help\f[R], \f[B]-h\f[R]
Display a friendly help message.
.PP
\f[B]--no-progressbar\f[R]
.PP
\f[B]Managing parallel execution\f[R]
.TP
\f[B]--max-cpu\f[R]
OBITools V4 are able to run in parallel on all the CPU cores available
on the computer.
It is sometimes necessary to limit the computation to a smaller number of
cores.
This option specifies the maximum number of cores that the OBITools
command can use.
This behaviour can also be set up using the \f[V]OBIMAXCPU\f[R]
environment variable.
.PP
\f[B]--workers\f[R], \f[B]-w\f[R]
.PP
\f[B]OBITools debugging related options\f[R]
.PP
\f[B]--debug\f[R]
.SS Input format options
.PP
The OBITools are centered around the FASTA
(https://en.wikipedia.org/wiki/FASTA_format) and FASTQ
(https://en.wikipedia.org/wiki/FASTQ_format) formats.
These formats are automatically recognized when data are read both from
files and from standard input (\f[V]stdin\f[R]).
Other formats (genbank, EMBL, ecopcr) are also automatically identified
when data are read from files, but for stdin input, the input format must be
indicated using one of the following options.
.SS Output format options
.PP
\f[B]--fasta-output\f[R]
.PP
\f[B]--fastq-output\f[R]
.PP
\f[B]--output-OBI-header\f[R], \f[B]-O\f[R]
.PP
\f[B]--output-json-header\f[R]
.TP
\f[B]--out\f[R] \f[I]FILENAME\f[R], \f[B]-o\f[R]
OBITools, like all standard UNIX tools, print their results to the
standard output (\f[V]stdout\f[R]).
To save them, stdout must be redirected to a file.
This option lets you explicitly specify an output file for the command.
This is especially useful when OBITools are processing paired files.
In that latter case, the indicated output file name is modified by
adding to it the \f[I]_R1\f[R] (forward file) and \f[I]_R2\f[R] (reverse
file) suffixes just before the extensions (\f[I]e.g.\f[R] sequence.fasta
becomes sequence_R1.fasta and sequence_R2.fasta).
If this option is not specified and paired files are processed, only the
forward data are written to \f[I]stdout\f[R].
.TP
\f[B]--compress\f[R], \f[B]-Z\f[R]
The output is compressed following the
gzip (https://en.wikipedia.org/wiki/Gzip) format.
.SS Paired reads options
.PP
\f[B]--paired-with\f[R] \f[I]FILENAME\f[R]
.PP
\f[B]--paired-mode\f[R] \f[I]forward|reverse|and|or|andnot|xor\f[R]
.SS Taxonomy related options
.PP
\f[B]--taxdump\f[R] | \f[B]-t\f[R] \f[I]DIRECTORY\f[R]
.PP
\f[B]--ignore-taxon\f[R] | \f[B]-i\f[R] \f[I]TAXID\f[R]
.PP
\f[B]--require-rank\f[R] \f[I]RANK_NAME\f[R]
.PP
\f[B]--restrict-to-taxon\f[R] | \f[B]-r\f[R] \f[I]TAXID\f[R]
.SS Filtering options
.PP
\f[B]--has-attribute\f[R] | \f[B]-A\f[R] \f[I]KEY\f[R]\&...
.PP
\f[B]--id-list\f[R] \f[I]FILENAME\f[R]
.PP
\f[B]--identifier\f[R] | \f[B]-I\f[R] \f[I]PATTERN\f[R]
.TP
\f[B]--max-count\f[R] | \f[B]-C\f[R] \f[I]COUNT\f[R]
Only sequences representing no more than \f[I]COUNT\f[R] reads will be
selected.
This option relies on the \f[V]count\f[R] attribute.
If the \f[V]count\f[R] attribute is not defined for a sequence record,
it is assumed equal to 1.
.TP
\f[B]--min-count\f[R] | \f[B]-c\f[R] \f[I]COUNT\f[R]
Only sequences representing at least \f[I]COUNT\f[R] reads will be
selected.
This option relies on the \f[V]count\f[R] attribute.
If the \f[V]count\f[R] attribute is not defined for a sequence record,
it is assumed equal to 1.
.PP
\f[B]--max-length\f[R] | \f[B]-L\f[R] \f[I]LENGTH\f[R]
.PP
\f[B]--min-length\f[R] | \f[B]-l\f[R] \f[I]LENGTH\f[R]
.PP
\f[B]--predicate\f[R]|\f[B]-p\f[R] \f[I]EXPRESSION\f[R]
.PP
\f[B]--sequence\f[R]|\f[B]-s\f[R] \f[I]PATTERN\f[R]
.PP
\f[B]--inverse-match\f[R] | \f[B]-v\f[R]
.PP
\f[B]--save-discarded\f[R] \f[I]FILENAME\f[R]
.SH ENVIRONMENT
.PP
\f[B]OBIMAXCPU\f[R]
.SH EXAMPLES
.IP \[bu] 2
Filtering a sequence file to keep only barcodes between 8 and 130 bp.
.RS 2
.IP
.nf
\f[C]
obigrep -l 8 -L 130 data_SPER01.fasta > data_goodLength_SPER01.fasta
\f[R]
.fi
.RE
.IP \[bu] 2
Keeping only reads without ambiguous base codes in their sequence.
.RS 2
.IP
.nf
\f[C]
obigrep -s \[aq]\[ha][acgt]+$\[aq] data_SPER01.fasta > data_onlyACGT_SPER01.fasta
\f[R]
.fi
.RE
.IP \[bu] 2
Filtering paired files to keep only pairs of reads without ambiguity.
.RS 2
.IP
.nf
\f[C]
obigrep -s \[aq]\[ha][acgt]+$\[aq] \[rs]
  --paired-mode and --paired-with wolf_R.fastq.gz \[rs]
  --out wolf_good.fastq \[rs]
  wolf_F.fastq.gz
\f[R]
.fi
.PP
That command produces two files, \f[V]wolf_good_R1.fastq\f[R] and
\f[V]wolf_good_R2.fastq\f[R], containing respectively the filtered
forward and reverse reads.
.RE
.SH SEE ALSO
.PP
\f[V]obiannotate\f[R]
.SH HISTORY
.SH BUGS
.PP
Submit bug reports online at:
https://git.metabarcoding.org/obitools/obitools4/obitools4/-/issues
.SH AUTHORS
Eric Coissac <eric.coissac@metabarcoding.org>.
41
doc/man/Makefile
Normal file
@@ -0,0 +1,41 @@
MANPAGES= obigrep

BUILDDIR=../build
MANDIR=$(BUILDDIR)/_man
MANDEST=$(MANDIR)/man1
HTMLDEST=$(MANDIR)/html

MANSRC=$(MANPAGES:=.qmd)
DEPS=$(patsubst %,depends/%,$(MANPAGES:=.d))
MAN=$(patsubst %,$(MANDEST)/%,$(MANSRC:.qmd=.man))



all: $(MAN)

clean:
	rm -f $(MAN)
	rm -rf depends

.PHONY: all clean

$(MANDEST):
	@echo Creating $@ directory
	@mkdir -p $@

$(MAN) : $(MANDEST)/%.man : %.qmd $(MANDEST)
	@echo "Rendering the man page for " $(notdir $(@:.man=))
	@quarto render $< --to man
	@mv $(notdir $@) $@
	@echo =====================================================
	@echo

depends/%.d: %.qmd
	@mkdir -p depends
	@echo Generating depends file for $(notdir $(@:.d=))
	@awk -v src=$< 'BEGIN {printf("%s: ",src)} \
		/\{\{< *include *[^>]+>\}\}/ {sub(/^ *\{\{< *include */,"",$$0); \
		sub(/ *> *\}\} */,"",$$0); \
		printf("%s ",$$0)}' $< > $@

-include $(DEPS)
153
doc/man/obigrep.qmd
Normal file
@@ -0,0 +1,153 @@
---
title: "obigrep"
section: 1
author: Eric Coissac <eric.coissac@metabarcoding.org>
format:
  html: default
  man: default
---

# NAME

obigrep -- filters sequence files according to numerous conditions

# SYNOPSIS


**obigrep** \[**\--attribute** | **-a** _KEY=VALUE_]...
\[**\--compress** | **-Z**]
\[**\--debug**]
\[**\--definition**|**-D** _PATTERN_]...
\[**\--ecopcr**]
\[**\--embl**]
\[**\--fasta-output**]
\[**\--fastq-output**]
\[**\--genbank**]
\[**\--has-attribute** | **-A** _KEY_]...
\[**\--help** | **-h** | **-?**]
\[**\--id-list** _FILENAME_]
\[**\--identifier** | **-I** _PATTERN_]...
\[**\--ignore-taxon** | **-i** _TAXID_]...
\[**\--input-OBI-header**]
\[**\--input-json-header**]
\[**\--inverse-match** | **-v**]
\[**\--max-count**|**-C** _COUNT_]
\[**\--max-cpu** _INT_]
\[**\--max-length** | **-L** _LENGTH_]
\[**\--min-count** | **-c** _COUNT_]
\[**\--min-length** | **-l** _LENGTH_]
\[**\--no-order**]
\[**\--no-progressbar**]
\[**\--out** | **-o** _FILENAME_]
\[**\--output-OBI-header** | **-O**]
\[**\--output-json-header**]
\[**\--paired-mode** _forward|reverse|and|or|andnot|xor_]
\[**\--paired-with** _FILENAME_]
\[**\--predicate**|**-p** _EXPRESSION_]...
\[**\--require-rank** _RANK_NAME_]...
\[**\--restrict-to-taxon** | **-r** _TAXID_]...
\[**\--save-discarded** _FILENAME_]
\[**\--sequence**|**-s** _PATTERN_]...
\[**\--solexa**]
\[**\--taxdump** | **-t** _DIRECTORY_]
\[**\--workers** | **-w** _INT_] [_FILENAMES_]

# DESCRIPTION

{{< include ../lib/descriptions/_obigrep.qmd >}}

# OPTIONS

## General options

{{< include ../lib/options/_system.qmd >}}

## Input format options

The OBITools are centered around the [FASTA](https://en.wikipedia.org/wiki/FASTA_format) and [FASTQ](https://en.wikipedia.org/wiki/FASTQ_format) formats. These formats are automatically recognized when data are read both from files and from standard input (`stdin`). Other formats (genbank, EMBL, ecopcr) are also automatically identified when data are read from files, but for stdin input, the input format must be indicated using one of the following options.
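
As a rough illustration of why FASTA and FASTQ can be told apart even on a stream: a FASTA record starts with `>` and a FASTQ record starts with `@`. The sketch below uses a hypothetical helper, `guess_format`, that looks only at the first character of the first line. It is an assumption made for illustration and is not how OBITools implements its detection.

```bash
#!/usr/bin/env bash
# Hypothetical helper, NOT part of OBITools: guess the format of a sequence
# stream from the first character of its first line.
guess_format() {
  local first_line
  IFS= read -r first_line || true   # tolerate an empty stream
  case "$first_line" in
    '>'*) echo fasta ;;
    '@'*) echo fastq ;;
    *)    echo unknown ;;
  esac
}

# Toy records, made up for the example.
printf '>seq1\nacgt\n' | guess_format
printf '@seq1\nacgt\n+\nIIII\n' | guess_format
```

For the other formats read from `stdin`, the format is stated explicitly with the corresponding flag from the synopsis (`--genbank`, `--embl` or `--ecopcr`).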


## Output format options

{{< include ../lib/options/_output.qmd >}}

## Paired reads options

**\--paired-with** _FILENAME_

**\--paired-mode** _forward|reverse|and|or|andnot|xor_

## Taxonomy related options

**\--taxdump** | **-t** _DIRECTORY_

**\--ignore-taxon** | **-i** _TAXID_

**\--require-rank** _RANK_NAME_

**\--restrict-to-taxon** | **-r** _TAXID_

## Filtering options

**\--has-attribute** | **-A** _KEY_...

**\--id-list** _FILENAME_

**\--identifier** | **-I** _PATTERN_

{{< include ../lib/options/selection/_max-count.qmd >}}

{{< include ../lib/options/selection/_min-count.qmd >}}

**\--max-length** | **-L** _LENGTH_

**\--min-length** | **-l** _LENGTH_

**\--predicate**|**-p** _EXPRESSION_

**\--sequence**|**-s** _PATTERN_

**\--inverse-match** | **-v**

**\--save-discarded** _FILENAME_

# ENVIRONMENT

**OBIMAXCPU**

# EXAMPLES

- Filtering a sequence file to keep only barcodes between 8 and 130 bp.

```bash
obigrep -l 8 -L 130 data_SPER01.fasta > data_goodLength_SPER01.fasta
```

- Keeping only reads without ambiguous base codes in their sequence.

```bash
obigrep -s '^[acgt]+$' data_SPER01.fasta > data_onlyACGT_SPER01.fasta
```
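
To get a feel for what the pattern `^[acgt]+$` accepts, it can be tried with the standard `grep -E` on bare sequence strings. This only illustrates the regular expression itself, not the record-by-record matching done by `obigrep`. The helper name `only_acgt` and the toy sequences are made up for the example.

```bash
#!/usr/bin/env bash
# Hypothetical helper: keep only the lines made exclusively of a, c, g and t.
only_acgt() {
  grep -E '^[acgt]+$'
}

# Four toy sequences, one per line; the second and third contain
# ambiguity codes (n, y, r) and are therefore dropped.
printf 'acgtacgt\nacgtnacgt\nacgyrt\nttgaca\n' | only_acgt
```

Because the pattern is anchored at both ends, a single ambiguous symbol anywhere in the sequence is enough to reject it.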

- Filtering paired files to keep only pairs of reads without ambiguity.

```bash
obigrep -s '^[acgt]+$' \
  --paired-mode and --paired-with wolf_R.fastq.gz \
  --out wolf_good.fastq \
  wolf_F.fastq.gz
```

That command produces two files, `wolf_good_R1.fastq` and `wolf_good_R2.fastq`,
containing respectively the filtered forward and reverse reads.
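
The naming rule for paired output (the `_R1` or `_R2` suffix is inserted just before the extension of the name given to `--out`) can be sketched in plain shell. The helper `paired_name` below is hypothetical and is not taken from the OBITools sources; it assumes the file name carries a single extension.

```bash
#!/usr/bin/env bash
# Hypothetical helper, NOT obigrep's code: insert a suffix just before the
# last extension of a file name (assumes the name contains one dot-extension).
paired_name() {
  local name=$1 suffix=$2
  local stem=${name%.*}    # everything before the last dot
  local ext=${name##*.}    # everything after the last dot
  echo "${stem}${suffix}.${ext}"
}

paired_name wolf_good.fastq _R1
paired_name wolf_good.fastq _R2
```

With `wolf_good.fastq` as the requested name, the two calls print the forward and reverse file names mentioned above.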

# SEE ALSO

`obiannotate`

# HISTORY

# BUGS

Submit bug reports online at: https://git.metabarcoding.org/obitools/obitools4/obitools4/-/issues