Start writing of the man pages

Former-commit-id: 533a8a6380a5a0b66314081a5ec2aeb973ebadec
2023-02-23 23:43:54 +01:00
parent c94b2974fb
commit 05323f960c
3 changed files with 433 additions and 0 deletions

239
doc/build/_man/man1/obigrep.man vendored Normal file

@ -0,0 +1,239 @@
.\" Automatically generated by Pandoc 2.19.2
.\"
.\" Define V font for inline verbatim, using C font in formats
.\" that render this, and otherwise B font.
.ie "\f[CB]x\f[]"x" \{\
. ftr V B
. ftr VI BI
. ftr VB B
. ftr VBI BI
.\}
.el \{\
. ftr V CR
. ftr VI CI
. ftr VB CB
. ftr VBI CBI
.\}
.TH "obigrep" "1" "" "" ""
.hy
.SH NAME
.PP
obigrep \[en] filters sequence files according to numerous conditions
.SH SYNOPSIS
.PP
\f[B]obigrep\f[R] [\f[B]--attribute\f[R] | \f[B]-a\f[R]
\f[I]KEY=VALUE\f[R]]\&...
[\f[B]--compress\f[R] | \f[B]-Z\f[R]] [\f[B]--debug\f[R]]
[\f[B]--definition\f[R]|\f[B]-D\f[R] \f[I]PATTERN\f[R]]\&...
.PD 0
.P
.PD
[\f[B]--ecopcr\f[R]] [\f[B]--embl\f[R]] [\f[B]--fasta-output\f[R]]
[\f[B]--fastq-output\f[R]] [\f[B]--genbank\f[R]]
[\f[B]--has-attribute\f[R] | \f[B]-A\f[R] \f[I]KEY\f[R]]\&...
[\f[B]--help\f[R] | \f[B]-h\f[R] | \f[B]-?\f[R]] [\f[B]--id-list\f[R]
\f[I]FILENAME\f[R]] [\f[B]--identifier\f[R] | \f[B]-I\f[R]
\f[I]PATTERN\f[R]]\&...
[\f[B]--ignore-taxon\f[R] | \f[B]-i\f[R] \f[I]TAXID\f[R]]\&...
[\f[B]--input-OBI-header\f[R]] [\f[B]--input-json-header\f[R]]
[\f[B]--inverse-match\f[R] | \f[B]-v\f[R]]
[\f[B]--max-count\f[R]|\f[B]-C\f[R] \f[I]COUNT\f[R]]
[\f[B]--max-cpu\f[R] \f[I]INT\f[R]] [\f[B]--max-length\f[R] |
\f[B]-L\f[R] \f[I]LENGTH\f[R]] [\f[B]--min-count\f[R] | \f[B]-c\f[R]
\f[I]COUNT\f[R]] [\f[B]--min-length\f[R] | \f[B]-l\f[R]
\f[I]LENGTH\f[R]] [\f[B]--no-order\f[R]] [\f[B]--no-progressbar\f[R]]
[\f[B]--out\f[R] | \f[B]-o\f[R] \f[I]FILENAME\f[R]]
[\f[B]--output-OBI-header\f[R] | \f[B]-O\f[R]]
[\f[B]--output-json-header\f[R]] [\f[B]--paired-mode\f[R]
\f[I]forward|reverse|and|or|andnot|xor\f[R]] [\f[B]--paired-with\f[R]
\f[I]FILENAME\f[R]] [\f[B]--predicate\f[R]|\f[B]-p\f[R]
\f[I]EXPRESSION\f[R]]\&...
[\f[B]--require-rank\f[R] \f[I]RANK_NAME\f[R]]\&...
[\f[B]--restrict-to-taxon\f[R] | \f[B]-r\f[R] \f[I]TAXID\f[R]]\&...
[\f[B]--save-discarded\f[R] \f[I]FILENAME\f[R]]
[\f[B]--sequence\f[R]|\f[B]-s\f[R] \f[I]PATTERN\f[R]]\&...
[\f[B]--solexa\f[R]] [\f[B]--taxdump\f[R] | \f[B]-t\f[R]
\f[I]DIRECTORY\f[R]] [\f[B]--workers\f[R] | \f[B]-w\f[R] \f[I]INT\f[R]]
[\f[I]FILENAMES\f[R]]
.SH DESCRIPTION
.PP
The \f[V]obigrep\f[R] command is somewhat analogous to the standard Unix
\f[V]grep\f[R] command.
It selects a subset of sequence records from a sequence file.
A sequence record is a complex object consisting of an identifier, a set
of attributes (a key, defined by its name, associated with a value), a
definition, and the sequence itself.
Instead of working text line by text line like the standard Unix tool,
\f[V]obigrep\f[R] selection is done sequence record by sequence record.
A large number of options allow you to refine the selection on any
element of the sequence.
\f[V]obigrep\f[R] allows you to specify multiple conditions
simultaneously (which take on the value \f[V]TRUE\f[R] or
\f[V]FALSE\f[R]) and only those sequence records which meet all
conditions (all conditions are \f[V]TRUE\f[R]) are selected.
\f[V]obigrep\f[R] is able to work on two paired read files.
The selection criteria apply to one or the other of the reads in each
pair depending on the mode chosen (\f[B]--paired-mode\f[R] option).
In all cases the selection is applied in the same way to both files,
thus maintaining their consistency.
.SH OPTIONS
.SS General options
.PP
\f[B]Helpful options\f[R]
.TP
\f[B]--help\f[R], \f[B]-h\f[R]
Display a friendly help message.
.PP
\f[B]--no-progressbar\f[R]
.PP
\f[B]Managing parallel execution\f[R]
.TP
\f[B]--max-cpu\f[R]
OBITools V4 can run in parallel on all the CPU cores available
on the computer.
It is sometimes necessary to limit the computation to a smaller number of
cores.
This option specifies the maximum number of cores that the OBITools
command can use.
This behaviour can also be set up using the \f[V]OBIMAXCPU\f[R]
environment variable.
.PP
\f[B]--workers\f[R], \f[B]-w\f[R]
.PP
\f[B]OBITools debugging related options\f[R]
.PP
\f[B]--debug\f[R]
.SS Input format options
.PP
The OBITools are centered around the
FASTA (https://en.wikipedia.org/wiki/FASTA_format) and
FASTQ (https://en.wikipedia.org/wiki/FASTQ_format) formats.
These formats are automatically recognized whether data are read from
files or from standard input (\f[V]stdin\f[R]).
Other formats (genbank, EMBL, ecopcr) are also automatically identified
when data are read from files, but when data are read from stdin, the
input format must be indicated using one of the following options.
.SS Output format options
.PP
\f[B]--fasta-output\f[R]
.PP
\f[B]--fastq-output\f[R]
.PP
\f[B]--output-OBI-header\f[R], \f[B]-O\f[R]
.PP
\f[B]--output-json-header\f[R]
.TP
\f[B]--out\f[R] \f[I]FILENAME\f[R], \f[B]-o\f[R]
Like all standard UNIX tools, OBITools print their results to the
standard output (\f[V]stdout\f[R]).
To save them, stdout must be redirected to a file.
This option lets you explicitly specify an output file for the command.
This is especially useful when OBITools are processing paired files.
In that latter case, the indicated output file name is modified by
adding to it the \f[I]_R1\f[R] (forward file) and \f[I]_R2\f[R] (reverse
file) suffix just before the extension (\f[I]e.g.\f[R] sequence.fasta
becomes sequence_R1.fasta and sequence_R2.fasta).
If this option is not specified and paired files are processed, only the
forward data are written to \f[I]stdout\f[R].
.TP
\f[B]--compress\f[R], \f[B]-Z\f[R]
The output is compressed following the
gzip (https://en.wikipedia.org/wiki/Gzip) format.
.SS Paired reads options
.PP
\f[B]--paired-with\f[R] \f[I]FILENAME\f[R]
.PP
\f[B]--paired-mode\f[R] \f[I]forward|reverse|and|or|andnot|xor\f[R]
.SS Taxonomy related options
.PP
\f[B]--taxdump\f[R] | \f[B]-t\f[R] \f[I]DIRECTORY\f[R]
.PP
\f[B]--ignore-taxon\f[R] | \f[B]-i\f[R] \f[I]TAXID\f[R]
.PP
\f[B]--require-rank\f[R] \f[I]RANK_NAME\f[R]
.PP
\f[B]--restrict-to-taxon\f[R] | \f[B]-r\f[R] \f[I]TAXID\f[R]
.SS Filtering options
.PP
\f[B]--has-attribute\f[R] | \f[B]-A\f[R] \f[I]KEY\f[R]\&...
.PP
\f[B]--id-list\f[R] \f[I]FILENAME\f[R]
.PP
\f[B]--identifier\f[R] | \f[B]-I\f[R] \f[I]PATTERN\f[R]
.TP
\f[B]--max-count\f[R] | \f[B]-C\f[R] \f[I]COUNT\f[R]
Only sequences representing no more than \f[I]COUNT\f[R] reads will be
selected.
This option relies on the \f[V]count\f[R] attribute.
If the \f[V]count\f[R] attribute is not defined for a sequence record,
it is assumed to be equal to 1.
.TP
\f[B]--min-count\f[R] | \f[B]-c\f[R] \f[I]COUNT\f[R]
Only sequences representing at least \f[I]COUNT\f[R] reads will be
selected.
This option relies on the \f[V]count\f[R] attribute.
If the \f[V]count\f[R] attribute is not defined for a sequence record,
it is assumed to be equal to 1.
.PP
\f[B]--max-length\f[R] | \f[B]-L\f[R] \f[I]LENGTH\f[R]
.PP
\f[B]--min-length\f[R] | \f[B]-l\f[R] \f[I]LENGTH\f[R]
.PP
\f[B]--predicate\f[R]|\f[B]-p\f[R] \f[I]EXPRESSION\f[R]
.PP
\f[B]--sequence\f[R]|\f[B]-s\f[R] \f[I]PATTERN\f[R]
.PP
\f[B]--inverse-match\f[R] | \f[B]-v\f[R]
.PP
\f[B]--save-discarded\f[R] \f[I]FILENAME\f[R]
.SH ENVIRONMENT
.PP
\f[B]OBIMAXCPU\f[R]
.SH EXAMPLES
.IP \[bu] 2
Filtering sequence file to keep only barcodes between 8 and 130 bp.
.RS 2
.IP
.nf
\f[C]
obigrep -l 8 -L 130 data_SPER01.fasta > data_goodLength_SPER01.fasta
\f[R]
.fi
.RE
.IP \[bu] 2
Keeping only the reads whose sequence contains no ambiguity codes.
.RS 2
.IP
.nf
\f[C]
obigrep -s \[aq]\[ha][acgt]+$\[aq] data_SPER01.fasta > data_onlyACGT_SPER01.fasta
\f[R]
.fi
.RE
.IP \[bu] 2
Filtering paired files to keep only the pairs of reads without ambiguity codes.
.RS 2
.IP
.nf
\f[C]
obigrep -s \[aq]\[ha][acgt]+$\[aq] \[rs]
--paired-mode and --paired-with wolf_R.fastq.gz \[rs]
--out wolf_good.fastq \[rs]
wolf_F.fastq.gz
\f[R]
.fi
.PP
That command produces two files, \f[V]wolf_good_R1.fastq\f[R] and
\f[V]wolf_good_R2.fastq\f[R], containing the filtered forward and
reverse reads, respectively.
.RE
.SH SEE ALSO
.PP
\f[V]obiannotate\f[R]
.SH HISTORY
.SH BUGS
.PP
Submit bug reports online at:
https://git.metabarcoding.org/obitools/obitools4/obitools4/-/issues
.SH AUTHORS
Eric Coissac <eric.coissac@metabarcoding.org>.

41
doc/man/Makefile Normal file

@ -0,0 +1,41 @@
MANPAGES= obigrep
BUILDDIR=../build
MANDIR=$(BUILDDIR)/_man
MANDEST=$(MANDIR)/man1
HTMLDEST=$(MANDIR)/html
MANSRC=$(MANPAGES:=.qmd)
DEPS=$(patsubst %,depends/%,$(MANPAGES:=.d))
MAN=$(patsubst %,$(MANDEST)/%,$(MANSRC:.qmd=.man))
all: $(MAN)
clean:
	rm -f $(MAN)
	rm -rf depends
.PHONY: all clean
$(MANDEST):
	@echo Creating $@ directory
	@mkdir -p $@
# The output directory is an order-only prerequisite, so its changing
# timestamp does not force a re-render of every man page.
$(MAN) : $(MANDEST)/%.man : %.qmd | $(MANDEST)
	@echo "Rendering the man page for " $(notdir $(@:.man=))
	@quarto render $< --to man
	@mv $(notdir $@) $@
	@echo =====================================================
	@echo
depends/%.d: %.qmd
	@mkdir -p depends
	@echo Generating depends file for $(notdir $(@:.d=))
	@awk -v src=$< 'BEGIN {printf("%s: ",src)} \
	/\{\{< *include *[^>]+>\}\}/ {sub(/^ *\{\{< *include */,"",$$0); \
	sub(/ *> *\}\} */,"",$$0); \
	printf("%s ",$$0)}' $< > $@
-include $(DEPS)

153
doc/man/obigrep.qmd Normal file

@ -0,0 +1,153 @@
---
title: "obigrep"
section: 1
author: Eric Coissac <eric.coissac@metabarcoding.org>
format:
  html: default
  man: default
---
# NAME
obigrep -- filters sequence files according to numerous conditions
# SYNOPSIS
**obigrep** \[**\--attribute** | **-a** _KEY=VALUE_]...
\[**\--compress** | **-Z**]
\[**\--debug**]
\[**\--definition**|**-D** _PATTERN_]...
\[**\--ecopcr**]
\[**\--embl**]
\[**\--fasta-output**]
\[**\--fastq-output**]
\[**\--genbank**]
\[**\--has-attribute** | **-A** _KEY_]...
\[**\--help** | **-h** | **-?**]
\[**\--id-list** _FILENAME_]
\[**\--identifier** | **-I** _PATTERN_]...
\[**\--ignore-taxon** | **-i** _TAXID_]...
\[**\--input-OBI-header**]
\[**\--input-json-header**]
\[**\--inverse-match** | **-v**]
\[**\--max-count**|**-C** _COUNT_]
\[**\--max-cpu** _INT_]
\[**\--max-length** | **-L** _LENGTH_]
\[**\--min-count** | **-c** _COUNT_]
\[**\--min-length** | **-l** _LENGTH_]
\[**\--no-order**]
\[**\--no-progressbar**]
\[**\--out** | **-o** _FILENAME_]
\[**\--output-OBI-header** | **-O**]
\[**\--output-json-header**]
\[**\--paired-mode** _forward|reverse|and|or|andnot|xor_]
\[**\--paired-with** _FILENAME_]
\[**\--predicate**|**-p** _EXPRESSION_]...
\[**\--require-rank** _RANK_NAME_]...
\[**\--restrict-to-taxon** | **-r** _TAXID_]...
\[**\--save-discarded** _FILENAME_]
\[**\--sequence**|**-s** _PATTERN_]...
\[**\--solexa**]
\[**\--taxdump** | **-t** _DIRECTORY_]
\[**\--workers** | **-w** _INT_] [_FILENAMES_]
# DESCRIPTION
{{< include ../lib/descriptions/_obigrep.qmd >}}
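
The description states that a record is selected only when every condition is `TRUE`. As a rough sketch of that rule, and not of how `obigrep` is implemented, the hypothetical `all_conditions_filter` function below uses `awk` to mimic what `obigrep -l 8 -L 130 -s '^[acgt]+$'` is documented to select. It assumes a FASTA stream in which each sequence sits on a single line.

```bash
# Sketch only: keep a record when ALL conditions are TRUE.
# Conditions: 8 <= length <= 130 AND the sequence matches ^[acgt]+$.
# Assumes single-line FASTA (one header line, then one sequence line).
all_conditions_filter() {
  awk '
    /^>/ { header = $0; next }          # remember the current header
    {
      len_ok     = (length($0) >= 8 && length($0) <= 130)
      pattern_ok = ($0 ~ /^[acgt]+$/)
      if (len_ok && pattern_ok) {       # logical AND of the conditions
        print header
        print $0
      }
    }
  '
}
```

A record that fails any single condition, for example one that is long enough but contains an `n`, is dropped.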
# OPTIONS
## General options
{{< include ../lib/options/_system.qmd >}}
## Input format options
The OBITools are centered around the [FASTA](https://en.wikipedia.org/wiki/FASTA_format) and [FASTQ](https://en.wikipedia.org/wiki/FASTQ_format) formats. These formats are automatically recognized whether data are read from files or from standard input (`stdin`). Other formats (genbank, EMBL, ecopcr) are also automatically identified when data are read from files, but when data are read from stdin, the input format must be indicated using one of the following options.
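
As a minimal sketch of the stdin rule above, the hypothetical helper `format_flag` picks one of the format flags listed in the SYNOPSIS (`--genbank`, `--embl`, `--ecopcr`) from a file name, for use when the data reach `obigrep` through a pipe. The mapping from file extensions to formats is an assumption made for this illustration; OBITools does not define it.

```bash
# Hypothetical helper: map a file name to an obigrep input-format flag.
# The extension list below is an assumption made for this sketch.
format_flag() {
  name="${1%.gz}"                      # ignore a trailing .gz, if any
  case "$name" in
    *.gb|*.gbk|*.genbank) echo "--genbank" ;;
    *.embl|*.dat)         echo "--embl" ;;
    *.ecopcr)             echo "--ecopcr" ;;
    *)                    echo "" ;;   # FASTA/FASTQ need no flag
  esac
}
```

It could then be used as `gunzip -c refs.embl.gz | obigrep $(format_flag refs.embl.gz) -l 100`, where `refs.embl.gz` is a made-up file name.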
## Output format options
{{< include ../lib/options/_output.qmd >}}
## Paired reads options
**\--paired-with** _FILENAME_
**\--paired-mode** _forward|reverse|and|or|andnot|xor_
## Taxonomy related options
**\--taxdump** | **-t** _DIRECTORY_
**\--ignore-taxon** | **-i** _TAXID_
**\--require-rank** _RANK_NAME_
**\--restrict-to-taxon** | **-r** _TAXID_
## Filtering options
**\--has-attribute** | **-A** _KEY_...
**\--id-list** _FILENAME_
**\--identifier** | **-I** _PATTERN_
{{< include ../lib/options/selection/_max-count.qmd >}}
{{< include ../lib/options/selection/_min-count.qmd >}}
**\--max-length** | **-L** _LENGTH_
**\--min-length** | **-l** _LENGTH_
**\--predicate**|**-p** _EXPRESSION_
**\--sequence**|**-s** _PATTERN_
**\--inverse-match** | **-v**
**\--save-discarded** _FILENAME_
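
The count-based options are documented as treating a record without a `count` attribute as if its count were 1. The sketch below illustrates that rule for `--min-count` with a hypothetical `min_count_filter` function written in `awk`. It assumes single-line FASTA records whose header carries a JSON-style annotation such as `{"count":3}`; both the function name and the sample header layout are assumptions for illustration, and this is not how `obigrep` itself is implemented.

```bash
# Sketch only: emulate --min-count on a single-line FASTA stream.
# Usage: min_count_filter MIN < input.fasta
min_count_filter() {
  awk -v min="$1" '
    /^>/ {
      count = 1                                   # default: no count attribute
      if (match($0, /"count": *[0-9]+/)) {
        value = substr($0, RSTART, RLENGTH)
        sub(/^[^0-9]*/, "", value)                # keep only the digits
        count = value + 0
      }
      keep = (count >= min)                       # decision for this record
    }
    keep { print }                                # header and sequence lines
  '
}
```

With a minimum of 2, a record lacking the attribute is discarded because its assumed count is 1; with a minimum of 1, it is kept.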
# ENVIRONMENT
**OBIMAXCPU**
# EXAMPLES
- Filtering sequence file to keep only barcodes between 8 and 130 bp.
```bash
obigrep -l 8 -L 130 data_SPER01.fasta > data_goodLength_SPER01.fasta
```
- Keeping only the reads whose sequence contains no ambiguity codes.
```bash
obigrep -s '^[acgt]+$' data_SPER01.fasta > data_onlyACGT_SPER01.fasta
```
- Filtering paired files to keep only the pairs of reads without ambiguity codes.
```bash
obigrep -s '^[acgt]+$' \
--paired-mode and --paired-with wolf_R.fastq.gz \
--out wolf_good.fastq \
wolf_F.fastq.gz
```
That command produces two files, `wolf_good_R1.fastq` and `wolf_good_R2.fastq`,
containing the filtered forward and reverse reads, respectively.
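
The file names in the paired example follow from the documented `--out` rule: the `_R1` and `_R2` suffixes are added just before the extension. The hypothetical `paired_names` function below sketches that derivation in plain shell. It assumes a name with a single extension and no dots in the directory part; appending the suffix when there is no extension at all is a guess, not documented behaviour.

```bash
# Sketch of the naming rule: insert _R1 / _R2 just before the extension.
paired_names() {
  out="$1"
  case "$out" in
    *.*) base="${out%.*}"; ext=".${out##*.}" ;;
    *)   base="$out";      ext="" ;;   # assumption: no extension, append suffix
  esac
  printf '%s\n' "${base}_R1${ext}" "${base}_R2${ext}"
}

paired_names wolf_good.fastq
# prints:
# wolf_good_R1.fastq
# wolf_good_R2.fastq
```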
# SEE ALSO
`obiannotate`
# HISTORY
# BUGS
Submit bug reports online at: https://git.metabarcoding.org/obitools/obitools4/obitools4/-/issues