Start writing of the man pages

Former-commit-id: 533a8a6380a5a0b66314081a5ec2aeb973ebadec
2025-06-29 16:20:46 +00:00 · 2023-02-23 23:43:54 +01:00
parent c94b2974fb
commit 05323f960c
3 changed files with 433 additions and 0 deletions
--- a/doc/build/_man/man1/obigrep.man
+++ b/doc/build/_man/man1/obigrep.man
@ -0,0 +1,239 @@
+.\" Automatically generated by Pandoc 2.19.2
+.\"
+.\" Define V font for inline verbatim, using C font in formats
+.\" that render this, and otherwise B font.
+.ie "\f[CB]x\f[]"x" \{\
+. ftr V B
+. ftr VI BI
+. ftr VB B
+. ftr VBI BI
+.\}
+.el \{\
+. ftr V CR
+. ftr VI CI
+. ftr VB CB
+. ftr VBI CBI
+.\}
+.TH "obigrep" "1" "" "" ""
+.hy
+.SH NAME
+.PP
+obigrep \[en] filters sequence files according to numerous conditions
+.SH SYNOPSIS
+.PP
+\f[B]obigrep\f[R] [\f[B]--attribute\f[R] | \f[B]-a\f[R]
+\f[I]KEY=VALUE\f[R]]\&...
+[\f[B]--compress\f[R] | \f[B]-Z\f[R]] [\f[B]--debug\f[R]]
+[\f[B]--definition\f[R]|\f[B]-D\f[R] \f[I]PATTERN\f[R]]\&...
+.PD 0
+.P
+.PD
+[\f[B]--ecopcr\f[R]] [\f[B]--embl\f[R]] [\f[B]--fasta-output\f[R]]
+[\f[B]--fastq-output\f[R]] [\f[B]--genbank\f[R]]
+[\f[B]--has-attribute\f[R] | \f[B]-A\f[R] \f[I]KEY\f[R]]\&...
+[\f[B]--help\f[R] | \f[B]-h\f[R] | \f[B]-?\f[R]] [\f[B]--id-list\f[R]
+\f[I]FILENAME\f[R]] [\f[B]--identifier\f[R] | \f[B]-I\f[R]
+\f[I]PATTERN\f[R]]\&...
+[\f[B]--ignore-taxon\f[R] | \f[B]-i\f[R] \f[I]TAXID\f[R]]\&...
+[\f[B]--input-OBI-header\f[R]] [\f[B]--input-json-header\f[R]]
+[\f[B]--inverse-match\f[R] | \f[B]-v\f[R]]
+[\f[B]--max-count\f[R]|\f[B]-C\f[R] \f[I]COUNT\f[R]]
+[\f[B]--max-cpu\f[R] \f[I]INT\f[R]] [\f[B]--max-length\f[R] |
+\f[B]-L\f[R] \f[I]LENGTH\f[R]] [\f[B]--min-count\f[R] | \f[B]-c\f[R]
+\f[I]COUNT\f[R]] [\f[B]--min-length\f[R] | \f[B]-l\f[R]
+\f[I]LENGTH\f[R]] [\f[B]--no-order\f[R]] [\f[B]--no-progressbar\f[R]]
+[\f[B]--out\f[R] | \f[B]-o\f[R] \f[I]FILENAME\f[R]]
+[\f[B]--output-OBI-header\f[R] | \f[B]-O\f[R]]
+[\f[B]--output-json-header\f[R]] [\f[B]--paired-mode\f[R]
+\f[I]forward|reverse|and|or|andnot|xor\f[R]] [\f[B]--paired-with\f[R]
+\f[I]FILENAME\f[R]] [\f[B]--predicate\f[R]|\f[B]-p\f[R]
+\f[I]EXPRESSION\f[R]]\&...
+[\f[B]--require-rank\f[R] \f[I]RANK_NAME\f[R]]\&...
+[\f[B]--restrict-to-taxon\f[R] | \f[B]-r\f[R] \f[I]TAXID\f[R]]\&...
+[\f[B]--save-discarded\f[R] \f[I]FILENAME\f[R]]
+[\f[B]--sequence\f[R]|\f[B]-s\f[R] \f[I]PATTERN\f[R]]\&...
+[\f[B]--solexa\f[R]] [\f[B]--taxdump\f[R] | \f[B]-t\f[R]
+\f[I]DIRECTORY\f[R]] [\f[B]--workers\f[R] | \f[B]-w\f[R] \f[I]INT\f[R]]
+[\f[I]FILENAMES\f[R]]
+.SH DESCRIPTION
+.PP
+The \f[V]obigrep\f[R] command is somewhat analogous to the standard Unix
+\f[V]grep\f[R] command.
+It selects a subset of sequence records from a sequence file.
+A sequence record is a complex object consisting of an identifier, a set
+of attributes (a key, defined by its name, associated with a value), a
+definition, and the sequence itself.
+Instead of working text line by text line like the standard Unix tool,
+\f[V]obigrep\f[R] selection is done sequence record by sequence record.
+A large number of options allow you to refine the selection on any
+element of the sequence.
+\f[V]obigrep\f[R] allows you to specify multiple conditions
+simultaneously (which take on the value \f[V]TRUE\f[R] or
+\f[V]FALSE\f[R]) and only those sequence records which meet all
+conditions (all conditions are \f[V]TRUE\f[R]) are selected.
+\f[V]obigrep\f[R] is able to work on two paired read files.
+The selection criteria apply to one or the other of the readings in each
+pair depending on the mode chosen (\f[B]--paired-mode\f[R] option).
+In all cases the selection is applied in the same way to both files,
+thus maintaining their consistency.
+.SH OPTIONS
+.SS General options
+.PP
+\f[B]Helpful options\f[R]
+.TP
+\f[B]--help\f[R], \f[B]-h\f[R]
+Display a friendly help message.
+.PP
+\f[B]--no-progressbar\f[R]
+.PP
+\f[B]Managing parallel execution\f[R]
+.TP
+\f[B]--max-cpu\f[R]
+OBITools V4 are able to run in parallel on all the CPU cores available
+on the computer.
+It is sometime required to limit the computation to a smaller number of
+cores.
+That option specify the maximum number of cores that the OBITools
+command can use.
+This behaviour can also be set up using the \f[V]OBIMAXCPU\f[R]
+environment variable.
+.PP
+\f[B]--workers\f[R], \f[B]-w\f[R]
+.PP
+\f[B]OBITools debuging related options\f[R]
+.PP
+\f[B]--debug\f[R]
+.SS Input format options
+.PP
+The OBITools are centered around the [FASTA]
+(https://en.wikipedia.org/wiki/FASTA_format) and [FASTQ]
+(https://en.wikipedia.org/wiki/FASTQ_format) formats.
+These formats are automaticaly recognized when data are read both from
+files, and from standard input (\f[V]stdin\f[R]).
+Other formats (genbank, EMBL, ecopcr) are also automatically identified
+when data are read from files, but for stdin input, input format must be
+indicated using one of the following options.
+.SS Output format options
+.PP
+\f[B]--fasta-output\f[R]
+.PP
+\f[B]--fastq-output\f[R]
+.PP
+\f[B]--output-OBI-header\f[R], \f[B]-O\f[R]
+.PP
+\f[B]--output-json-header\f[R]
+.TP
+\f[B]--out\f[R] \f[I]FILENAME\f[R], \f[B]-o\f[R]
+OBITools, as all standard UNIX tools, print their results to the
+standard output (\f[V]stdout\f[R]).
+To save them, stdout must be redirected to a file.
+That option allows to specify explicitely an output file to the command.
+This is especially useful when OBITools are processing paired files.
+In that later case, the indicated output file names is modified by
+adding to it the \f[I]_R1\f[R] (forward file) and \f[I]_R2\f[R] (reverse
+file) suffix just before the extensions (\f[I]e.g.\f[R] sequence.fasta
+becomes sequence_R1.fasta and sequence_R2.fasta).
+If that option is not specified and paired files are processed only the
+forward data are ouputed to the \f[I]stdout\f[R].
+.TP
+\f[B]--compress\f[R], \f[B]-Z\f[R]
+The ouput is compressed following the
+gzip (https://en.wikipedia.org/wiki/Gzip) format.
+.SS Paired reads options
+.PP
+\f[B]--paired-with\f[R] \f[I]FILENAME\f[R]
+.PP
+\f[B]--paired-mode\f[R] \f[I]forward|reverse|and|or|andnot|xor\f[R]
+.SS Taxonomy related options
+.PP
+\f[B]--taxdump\f[R] | \f[B]-t\f[R] \f[I]DIRECTORY\f[R]
+.PP
+\f[B]--ignore-taxon\f[R] | \f[B]-i\f[R] \f[I]TAXID\f[R]
+.PP
+\f[B]--require-rank\f[R] \f[I]RANK_NAME\f[R]
+.PP
+\f[B]--restrict-to-taxon\f[R] | \f[B]-r\f[R] \f[I]TAXID\f[R]
+.SS Filtering options
+.PP
+\f[B]--has-attribute\f[R] | \f[B]-A\f[R] \f[I]KEY\f[R]\&...
+.PP
+\f[B]--id-list\f[R] \f[I]FILENAME\f[R]
+.PP
+\f[B]--identifier\f[R] | \f[B]-I\f[R] \f[I]PATTERN\f[R]
+.TP
+\f[B]--max-count\f[R] | \f[B]-C\f[R] \f[I]COUNT\f[R]
+only sequences reprensenting no more than \f[I]COUNT\f[R] reads will be
+selected.
+That option rely on the \f[V]count\f[R] attribute.
+If the \f[V]count\f[R] attribute is not defined for a sequence record,
+it is assumed equal to 1.
+.TP
+\f[B]--min-count\f[R] | \f[B]-c\f[R] \f[I]COUNT\f[R]
+only sequences reprensenting at least \f[I]COUNT\f[R] reads will be
+selected.
+That option rely on the \f[V]count\f[R] attribute.
+If the \f[V]count\f[R] attribute is not defined for a sequence record,
+it is assumed equal to 1.
+.PP
+\f[B]--max-length\f[R] | \f[B]-L\f[R] \f[I]LENGTH\f[R]
+.PP
+\f[B]--min-length\f[R] | \f[B]-l\f[R] \f[I]LENGTH\f[R]
+.PP
+\f[B]--predicate\f[R]|\f[B]-p\f[R] \f[I]EXPRESSION\f[R]
+.PP
+\f[B]--sequence\f[R]|\f[B]-s\f[R] \f[I]PATTERN\f[R]
+.PP
+\f[B]--inverse-match\f[R] | \f[B]-v\f[R]
+.PP
+\f[B]--save-discarded\f[R] \f[I]FILENAME\f[R]
+.SH ENVIRONMENT
+.PP
+\f[B]OBICPUMAX\f[R]
+.SH EXAMPLES
+.IP \[bu] 2
+Filtering sequence file to keep only barcodes between 8 and 130 bp.
+.RS 2
+.IP
+.nf
+\f[C]
+obigrep -l 8 -L 130 data_SPER01.fasta > data_goodLength_SPER01.fasta
+\f[R]
+.fi
+.RE
+.IP \[bu] 2
+Filtering reads without anbiguity base code in its sequence.
+.RS 2
+.IP
+.nf
+\f[C]
+obigrep -s \[aq]\[ha][acgt]+$\[aq] data_SPER01.fasta > data_onlyACGT_SPER01.fasta
+\f[R]
+.fi
+.RE
+.IP \[bu] 2
+Filtering paired files for keeping only pairs of read without ambiguity.
+.RS 2
+.IP
+.nf
+\f[C]
+obigrep  -s \[aq]\[ha][acgt]+$\[aq] \[rs]
+         --paired-mode and --paired-with wolf_R.fastq.gz \[rs]
+         --out wolf_good.fastq \[rs]
+         wolf_F.fastq.gz
+\f[R]
+.fi
+.PP
+That command produces two files \f[V]wolf_good_R1.fastq\f[R] and
+\f[V]wolf_good_R1.fastq\f[R] containing respectively the filtered
+forward and reverse reads.
+.RE
+.SH SEE ALSO
+.PP
+\f[V]obiannotate\f[R]
+.SH HISTORY
+.SH BUGS
+.PP
+Submit bug reports online at:
+https://git.metabarcoding.org/obitools/obitools4/obitools4/-/issues
+.SH AUTHORS
+Eric Coissac <eric.coissac@metabarcoding.org>.