Start writing of the man pages

Former-commit-id: 533a8a6380a5a0b66314081a5ec2aeb973ebadec
2025-06-29 16:20:46 +00:00 · 2023-02-23 23:43:54 +01:00
parent c94b2974fb
commit 05323f960c
3 changed files with 433 additions and 0 deletions
--- a/doc/build/_man/man1/obigrep.man
+++ b/doc/build/_man/man1/obigrep.man
@@ -0,0 +1,239 @@
+.\" Automatically generated by Pandoc 2.19.2
+.\"
+.\" Define V font for inline verbatim, using C font in formats
+.\" that render this, and otherwise B font.
+.ie "\f[CB]x\f[]"x" \{\
+. ftr V B
+. ftr VI BI
+. ftr VB B
+. ftr VBI BI
+.\}
+.el \{\
+. ftr V CR
+. ftr VI CI
+. ftr VB CB
+. ftr VBI CBI
+.\}
+.TH "obigrep" "1" "" "" ""
+.hy
+.SH NAME
+.PP
+obigrep \[en] filters sequence files according to numerous conditions
+.SH SYNOPSIS
+.PP
+\f[B]obigrep\f[R] [\f[B]--attribute\f[R] | \f[B]-a\f[R]
+\f[I]KEY=VALUE\f[R]]\&...
+[\f[B]--compress\f[R] | \f[B]-Z\f[R]] [\f[B]--debug\f[R]]
+[\f[B]--definition\f[R]|\f[B]-D\f[R] \f[I]PATTERN\f[R]]\&...
+.PD 0
+.P
+.PD
+[\f[B]--ecopcr\f[R]] [\f[B]--embl\f[R]] [\f[B]--fasta-output\f[R]]
+[\f[B]--fastq-output\f[R]] [\f[B]--genbank\f[R]]
+[\f[B]--has-attribute\f[R] | \f[B]-A\f[R] \f[I]KEY\f[R]]\&...
+[\f[B]--help\f[R] | \f[B]-h\f[R] | \f[B]-?\f[R]] [\f[B]--id-list\f[R]
+\f[I]FILENAME\f[R]] [\f[B]--identifier\f[R] | \f[B]-I\f[R]
+\f[I]PATTERN\f[R]]\&...
+[\f[B]--ignore-taxon\f[R] | \f[B]-i\f[R] \f[I]TAXID\f[R]]\&...
+[\f[B]--input-OBI-header\f[R]] [\f[B]--input-json-header\f[R]]
+[\f[B]--inverse-match\f[R] | \f[B]-v\f[R]]
+[\f[B]--max-count\f[R]|\f[B]-C\f[R] \f[I]COUNT\f[R]]
+[\f[B]--max-cpu\f[R] \f[I]INT\f[R]] [\f[B]--max-length\f[R] |
+\f[B]-L\f[R] \f[I]LENGTH\f[R]] [\f[B]--min-count\f[R] | \f[B]-c\f[R]
+\f[I]COUNT\f[R]] [\f[B]--min-length\f[R] | \f[B]-l\f[R]
+\f[I]LENGTH\f[R]] [\f[B]--no-order\f[R]] [\f[B]--no-progressbar\f[R]]
+[\f[B]--out\f[R] | \f[B]-o\f[R] \f[I]FILENAME\f[R]]
+[\f[B]--output-OBI-header\f[R] | \f[B]-O\f[R]]
+[\f[B]--output-json-header\f[R]] [\f[B]--paired-mode\f[R]
+\f[I]forward|reverse|and|or|andnot|xor\f[R]] [\f[B]--paired-with\f[R]
+\f[I]FILENAME\f[R]] [\f[B]--predicate\f[R]|\f[B]-p\f[R]
+\f[I]EXPRESSION\f[R]]\&...
+[\f[B]--require-rank\f[R] \f[I]RANK_NAME\f[R]]\&...
+[\f[B]--restrict-to-taxon\f[R] | \f[B]-r\f[R] \f[I]TAXID\f[R]]\&...
+[\f[B]--save-discarded\f[R] \f[I]FILENAME\f[R]]
+[\f[B]--sequence\f[R]|\f[B]-s\f[R] \f[I]PATTERN\f[R]]\&...
+[\f[B]--solexa\f[R]] [\f[B]--taxdump\f[R] | \f[B]-t\f[R]
+\f[I]DIRECTORY\f[R]] [\f[B]--workers\f[R] | \f[B]-w\f[R] \f[I]INT\f[R]]
+[\f[I]FILENAMES\f[R]]
+.SH DESCRIPTION
+.PP
+The \f[V]obigrep\f[R] command is somewhat analogous to the standard Unix
+\f[V]grep\f[R] command.
+It selects a subset of sequence records from a sequence file.
+A sequence record is a complex object consisting of an identifier, a set
+of attributes (a key, defined by its name, associated with a value), a
+definition, and the sequence itself.
+Instead of working text line by text line like the standard Unix tool,
+\f[V]obigrep\f[R] selection is done sequence record by sequence record.
+A large number of options allow you to refine the selection on any
+element of the sequence.
+\f[V]obigrep\f[R] allows you to specify multiple conditions
+simultaneously (which take on the value \f[V]TRUE\f[R] or
+\f[V]FALSE\f[R]) and only those sequence records which meet all
+conditions (all conditions are \f[V]TRUE\f[R]) are selected.
+\f[V]obigrep\f[R] is able to work on two paired read files.
+The selection criteria apply to one or the other of the reads in each
+pair depending on the mode chosen (\f[B]--paired-mode\f[R] option).
+In all cases the selection is applied in the same way to both files,
+thus maintaining their consistency.
+.SH OPTIONS
+.SS General options
+.PP
+\f[B]Helpful options\f[R]
+.TP
+\f[B]--help\f[R], \f[B]-h\f[R]
+Display a friendly help message.
+.PP
+\f[B]--no-progressbar\f[R]
+.PP
+\f[B]Managing parallel execution\f[R]
+.TP
+\f[B]--max-cpu\f[R]
+The OBITools V4 can run in parallel on all the CPU cores available
+on the computer.
+It is sometimes necessary to limit the computation to a smaller number of
+cores.
+This option specifies the maximum number of cores that an OBITools
+command can use.
+This behaviour can also be set up using the \f[V]OBIMAXCPU\f[R]
+environment variable.
+.PP
+\f[B]--workers\f[R], \f[B]-w\f[R]
+.PP
+\f[B]OBITools debugging related options\f[R]
+.PP
+\f[B]--debug\f[R]
+.SS Input format options
+.PP
+The OBITools are centered around the
+FASTA (https://en.wikipedia.org/wiki/FASTA_format) and
+FASTQ (https://en.wikipedia.org/wiki/FASTQ_format) formats.
+These formats are automatically recognized when data are read both from
+files and from standard input (\f[V]stdin\f[R]).
+Other formats (GenBank, EMBL, ecoPCR) are also automatically identified
+when data are read from files, but for stdin input the input format must be
+indicated using one of the following options.
+.SS Output format options
+.PP
+\f[B]--fasta-output\f[R]
+.PP
+\f[B]--fastq-output\f[R]
+.PP
+\f[B]--output-OBI-header\f[R], \f[B]-O\f[R]
+.PP
+\f[B]--output-json-header\f[R]
+.TP
+\f[B]--out\f[R] \f[I]FILENAME\f[R], \f[B]-o\f[R]
+OBITools, like all standard UNIX tools, print their results to the
+standard output (\f[V]stdout\f[R]).
+To save them, stdout must be redirected to a file.
+This option explicitly specifies an output file for the command.
+This is especially useful when OBITools are processing paired files.
+In that latter case, the indicated output file name is modified by
+adding the \f[I]_R1\f[R] (forward file) and \f[I]_R2\f[R] (reverse
+file) suffixes just before the extensions (\f[I]e.g.\f[R] sequence.fasta
+becomes sequence_R1.fasta and sequence_R2.fasta).
+If this option is not specified and paired files are processed, only the
+forward data are written to \f[I]stdout\f[R].
+.TP
+\f[B]--compress\f[R], \f[B]-Z\f[R]
+The output is compressed following the
+gzip (https://en.wikipedia.org/wiki/Gzip) format.
+.SS Paired reads options
+.PP
+\f[B]--paired-with\f[R] \f[I]FILENAME\f[R]
+.PP
+\f[B]--paired-mode\f[R] \f[I]forward|reverse|and|or|andnot|xor\f[R]
+.SS Taxonomy related options
+.PP
+\f[B]--taxdump\f[R] | \f[B]-t\f[R] \f[I]DIRECTORY\f[R]
+.PP
+\f[B]--ignore-taxon\f[R] | \f[B]-i\f[R] \f[I]TAXID\f[R]
+.PP
+\f[B]--require-rank\f[R] \f[I]RANK_NAME\f[R]
+.PP
+\f[B]--restrict-to-taxon\f[R] | \f[B]-r\f[R] \f[I]TAXID\f[R]
+.SS Filtering options
+.PP
+\f[B]--has-attribute\f[R] | \f[B]-A\f[R] \f[I]KEY\f[R]\&...
+.PP
+\f[B]--id-list\f[R] \f[I]FILENAME\f[R]
+.PP
+\f[B]--identifier\f[R] | \f[B]-I\f[R] \f[I]PATTERN\f[R]
+.TP
+\f[B]--max-count\f[R] | \f[B]-C\f[R] \f[I]COUNT\f[R]
+Only sequences representing no more than \f[I]COUNT\f[R] reads will be
+selected.
+This option relies on the \f[V]count\f[R] attribute.
+If the \f[V]count\f[R] attribute is not defined for a sequence record,
+it is assumed equal to 1.
+.TP
+\f[B]--min-count\f[R] | \f[B]-c\f[R] \f[I]COUNT\f[R]
+Only sequences representing at least \f[I]COUNT\f[R] reads will be
+selected.
+This option relies on the \f[V]count\f[R] attribute.
+If the \f[V]count\f[R] attribute is not defined for a sequence record,
+it is assumed equal to 1.
+.PP
+\f[B]--max-length\f[R] | \f[B]-L\f[R] \f[I]LENGTH\f[R]
+.PP
+\f[B]--min-length\f[R] | \f[B]-l\f[R] \f[I]LENGTH\f[R]
+.PP
+\f[B]--predicate\f[R]|\f[B]-p\f[R] \f[I]EXPRESSION\f[R]
+.PP
+\f[B]--sequence\f[R]|\f[B]-s\f[R] \f[I]PATTERN\f[R]
+.PP
+\f[B]--inverse-match\f[R] | \f[B]-v\f[R]
+.PP
+\f[B]--save-discarded\f[R] \f[I]FILENAME\f[R]
+.SH ENVIRONMENT
+.PP
+\f[B]OBIMAXCPU\f[R]
+.SH EXAMPLES
+.IP \[bu] 2
+Filtering a sequence file to keep only barcodes between 8 and 130 bp long.
+.RS 2
+.IP
+.nf
+\f[C]
+obigrep -l 8 -L 130 data_SPER01.fasta > data_goodLength_SPER01.fasta
+\f[R]
+.fi
+.RE
+.IP \[bu] 2
+Keeping only reads without any ambiguous base code in their sequence.
+.RS 2
+.IP
+.nf
+\f[C]
+obigrep -s \[aq]\[ha][acgt]+$\[aq] data_SPER01.fasta > data_onlyACGT_SPER01.fasta
+\f[R]
+.fi
+.RE
+.IP \[bu] 2
+Filtering paired files to keep only pairs of reads without ambiguity.
+.RS 2
+.IP
+.nf
+\f[C]
+obigrep  -s \[aq]\[ha][acgt]+$\[aq] \[rs]
+         --paired-mode and --paired-with wolf_R.fastq.gz \[rs]
+         --out wolf_good.fastq \[rs]
+         wolf_F.fastq.gz
+\f[R]
+.fi
+.PP
+That command produces two files, \f[V]wolf_good_R1.fastq\f[R] and
+\f[V]wolf_good_R2.fastq\f[R], containing respectively the filtered
+forward and reverse reads.
+.RE
+.SH SEE ALSO
+.PP
+\f[V]obiannotate\f[R]
+.SH HISTORY
+.SH BUGS
+.PP
+Submit bug reports online at:
+https://git.metabarcoding.org/obitools/obitools4/obitools4/-/issues
+.SH AUTHORS
+Eric Coissac <eric.coissac@metabarcoding.org>.
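The output naming rule documented for `--out` in the man page above (the `_R1` and `_R2` suffixes are inserted just before the extensions) can be sketched in Python. This is only an illustration, not the OBITools implementation: the helper name `paired_output_names` is hypothetical, and treating a multi-part extension such as `.fastq.gz` as a single block is an assumption.

```python
from pathlib import Path


def paired_output_names(filename: str) -> tuple[str, str]:
    """Hypothetical helper: derive forward/reverse file names from an --out value."""
    path = Path(filename)
    # Assumption: every extension (e.g. ".fastq.gz") is kept together after the suffix.
    extensions = "".join(path.suffixes)
    stem = path.name[: len(path.name) - len(extensions)] if extensions else path.name
    forward = path.with_name(f"{stem}_R1{extensions}")
    reverse = path.with_name(f"{stem}_R2{extensions}")
    return str(forward), str(reverse)


if __name__ == "__main__":
    print(paired_output_names("sequence.fasta"))
    print(paired_output_names("wolf_good.fastq"))
```

With the man page's own example, `sequence.fasta` maps to `sequence_R1.fasta` and `sequence_R2.fasta`.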
--- a/doc/man/Makefile
+++ b/doc/man/Makefile
@@ -0,0 +1,41 @@
+MANPAGES= obigrep 
+
+BUILDDIR=../build
+MANDIR=$(BUILDDIR)/_man
+MANDEST=$(MANDIR)/man1
+HTMLDEST=$(MANDIR)/html
+
+MANSRC=$(MANPAGES:=.qmd)
+DEPS=$(patsubst %,depends/%,$(MANPAGES:=.d))
+MAN=$(patsubst %,$(MANDEST)/%,$(MANSRC:.qmd=.man))
+
+
+
+all: $(MAN) 
+
+clean:
+	rm -f $(MAN) 
+	rm -rf depends
+
+.PHONY: all clean
+
+$(MANDEST):
+	@echo Creating $@ directory
+	@mkdir -p $@
+
+$(MAN) : $(MANDEST)/%.man : %.qmd | $(MANDEST)
+	@echo "Rendering the man page for " $(notdir $(@:.man=))
+	@quarto render  $< --to man 
+	@mv $(notdir $@) $@
+	@echo =====================================================
+	@echo
+
+depends/%.d: %.qmd
+	@mkdir -p depends
+	@echo Generating depends file for $(notdir $(@:.d=))
+	@awk -v src=$< 'BEGIN {printf("%s: ",src)} \
+	                /\{\{< *include *[^>]+>\}\}/ {sub(/^ *\{\{< *include */,"",$$0); \
+										sub(/ *> *\}\} */,"",$$0); \
+										printf("%s ",$$0)}' $< > $@
+
+-include $(DEPS)										
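As a reading aid for the `depends/%.d` rule above, here is a rough Python sketch of what its awk script appears to do: collect the paths named in Quarto `include` shortcodes and emit them as a single Make dependency line. The function name `depends_line` and the regular expression are illustrative assumptions; unlike the awk script, this version does not leave a trailing space after the last path.

```python
import re

# Matches shortcodes such as: {{< include ../lib/options/_system.qmd >}}
INCLUDE = re.compile(r"\{\{<\s*include\s+([^>\s]+)\s*>\}\}")


def depends_line(source_name: str, text: str) -> str:
    """Hypothetical helper: build 'source: dep1 dep2 ...' from the include shortcodes."""
    included = INCLUDE.findall(text)
    return f"{source_name}: " + " ".join(included)


if __name__ == "__main__":
    sample = (
        "# DESCRIPTION\n\n"
        "{{< include ../lib/descriptions/_obigrep.qmd >}}\n\n"
        "## General options\n\n"
        "{{< include ../lib/options/_system.qmd >}}\n"
    )
    print(depends_line("obigrep.qmd", sample))
```

The resulting line makes the `.qmd` source depend on its included fragments, so editing a fragment triggers a new render of the man page.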
--- a/doc/man/obigrep.qmd
+++ b/doc/man/obigrep.qmd
@@ -0,0 +1,153 @@
+---
+title: "obigrep"
+section: 1
+author: Eric Coissac <eric.coissac@metabarcoding.org>
+format:  
+  html: default
+  man: default
+---
+
+# NAME
+
+obigrep -- filters sequence files according to numerous conditions
+
+# SYNOPSIS
+
+
+**obigrep**     \[**\--attribute** | **-a** _KEY=VALUE_]... 
+                \[**\--compress** | **-Z**] 
+                \[**\--debug**] 
+                \[**\--definition**|**-D** _PATTERN_]...  
+                \[**\--ecopcr**] 
+                \[**\--embl**] 
+                \[**\--fasta-output**]
+                \[**\--fastq-output**] 
+                \[**\--genbank**] 
+                \[**\--has-attribute** | **-A** _KEY_]...
+                \[**\--help** | **-h** | **-?**] 
+                \[**\--id-list** _FILENAME_] 
+                \[**\--identifier** | **-I** _PATTERN_]...
+                \[**\--ignore-taxon** | **-i** _TAXID_]... 
+                \[**\--input-OBI-header**]
+                \[**\--input-json-header**] 
+                \[**\--inverse-match** | **-v**] 
+                \[**\--max-count**|**-C** _COUNT_]
+                \[**\--max-cpu** _INT_] 
+                \[**\--max-length** | **-L** _LENGTH_] 
+                \[**\--min-count** | **-c** _COUNT_]
+                \[**\--min-length** | **-l** _LENGTH_] 
+                \[**\--no-order**] 
+                \[**\--no-progressbar**]
+                \[**\--out** | **-o** _FILENAME_] 
+                \[**\--output-OBI-header** | **-O**] 
+                \[**\--output-json-header**]
+                \[**\--paired-mode** _forward|reverse|and|or|andnot|xor_]
+                \[**\--paired-with** _FILENAME_] 
+                \[**\--predicate**|**-p** _EXPRESSION_]...
+                \[**\--require-rank** _RANK_NAME_]... 
+                \[**\--restrict-to-taxon** | **-r** _TAXID_]...
+                \[**\--save-discarded** _FILENAME_] 
+                \[**\--sequence**|**-s** _PATTERN_]... 
+                \[**\--solexa**]
+                \[**\--taxdump** | **-t** _DIRECTORY_] 
+                \[**\--workers** | **-w** _INT_] [_FILENAMES_]
+
+# DESCRIPTION
+
+{{< include ../lib/descriptions/_obigrep.qmd >}}
+
+# OPTIONS
+
+## General options
+
+{{< include ../lib/options/_system.qmd >}}
+
+## Input format options
+
+The OBITools are centered around the [FASTA](https://en.wikipedia.org/wiki/FASTA_format) and [FASTQ](https://en.wikipedia.org/wiki/FASTQ_format) formats. These formats are automatically recognized when data are read both from files and from standard input (`stdin`). Other formats (GenBank, EMBL, ecoPCR) are also automatically identified when data are read from files, but for stdin input the input format must be indicated using one of the following options.
+
+
+## Output format options
+
+{{< include ../lib/options/_output.qmd >}}
+
+## Paired reads options
+
+**\--paired-with** _FILENAME_
+
+**\--paired-mode** _forward|reverse|and|or|andnot|xor_
+
+## Taxonomy related options
+
+**\--taxdump** | **-t** _DIRECTORY_
+
+**\--ignore-taxon** | **-i** _TAXID_
+
+**\--require-rank** _RANK_NAME_
+
+**\--restrict-to-taxon** | **-r** _TAXID_
+
+## Filtering options
+
+**\--has-attribute** | **-A** _KEY_...
+
+**\--id-list** _FILENAME_ 
+
+**\--identifier** | **-I** _PATTERN_
+
+{{< include ../lib/options/selection/_max-count.qmd >}}
+
+{{< include ../lib/options/selection/_min-count.qmd >}}
+
+**\--max-length** | **-L** _LENGTH_
+
+**\--min-length** | **-l** _LENGTH_
+
+**\--predicate**|**-p** _EXPRESSION_
+
+**\--sequence**|**-s** _PATTERN_
+
+**\--inverse-match** | **-v**
+
+**\--save-discarded** _FILENAME_
+
+# ENVIRONMENT
+
+**OBIMAXCPU**
+
+# EXAMPLES
+
+- Filtering a sequence file to keep only barcodes between 8 and 130 bp long.
+
+  ```bash
+  obigrep -l 8 -L 130 data_SPER01.fasta > data_goodLength_SPER01.fasta
+  ```
+
+- Keeping only reads without any ambiguous base code in their sequence.
+
+  ```bash
+  obigrep -s '^[acgt]+$' data_SPER01.fasta > data_onlyACGT_SPER01.fasta
+  ```  
+- Filtering paired files to keep only pairs of reads without ambiguity.
+
+  ```bash
+  obigrep  -s '^[acgt]+$' \
+           --paired-mode and --paired-with wolf_R.fastq.gz \
+           --out wolf_good.fastq \
+           wolf_F.fastq.gz
+  ```
+
+  That command produces two files, `wolf_good_R1.fastq` and `wolf_good_R2.fastq`,
+  containing respectively the filtered forward and reverse reads.
+
+# SEE ALSO
+
+`obiannotate`
+
+# HISTORY
+
+# BUGS
+
+Submit bug reports online at: https://git.metabarcoding.org/obitools/obitools4/obitools4/-/issues
+
+
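The `--min-count` and `--max-count` filters pulled in by the `_min-count.qmd` and `_max-count.qmd` includes are documented as relying on the `count` attribute, with a missing attribute treated as 1, and obigrep selects a record only when all conditions hold. A minimal Python sketch of that rule follows; the function `keep_record` and the dictionary representation of attributes are assumptions made for illustration, not the OBITools code.

```python
from typing import Optional


def keep_record(
    attributes: dict,
    min_count: Optional[int] = None,
    max_count: Optional[int] = None,
) -> bool:
    """Hypothetical predicate mirroring the documented --min-count/--max-count rule."""
    # A record without a "count" attribute is assumed to represent a single read.
    count = attributes.get("count", 1)
    conditions = []
    if min_count is not None:
        conditions.append(count >= min_count)
    if max_count is not None:
        conditions.append(count <= max_count)
    # Every requested condition must be TRUE for the record to be selected.
    return all(conditions)


if __name__ == "__main__":
    print(keep_record({"count": 5}, min_count=2))
    print(keep_record({}, min_count=2))
```

A record with no `count` attribute therefore fails `--min-count 2` but passes `--max-count 1`.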