mirror of
https://github.com/metabarcoding/obitools4.git
synced 2025-06-29 16:20:46 +00:00
Start writing of the man pages
Former-commit-id: 533a8a6380a5a0b66314081a5ec2aeb973ebadec
239
doc/build/_man/man1/obigrep.man
vendored
Normal file
@ -0,0 +1,239 @@
.\" Automatically generated by Pandoc 2.19.2
.\"
.\" Define V font for inline verbatim, using C font in formats
.\" that render this, and otherwise B font.
.ie "\f[CB]x\f[]"x" \{\
. ftr V B
. ftr VI BI
. ftr VB B
. ftr VBI BI
.\}
.el \{\
. ftr V CR
. ftr VI CI
. ftr VB CB
. ftr VBI CBI
.\}
.TH "obigrep" "1" "" "" ""
.hy
.SH NAME
.PP
obigrep \[en] filters sequence files according to numerous conditions
.SH SYNOPSIS
.PP
\f[B]obigrep\f[R] [\f[B]--attribute\f[R] | \f[B]-a\f[R]
\f[I]KEY=VALUE\f[R]]\&...
[\f[B]--compress\f[R] | \f[B]-Z\f[R]] [\f[B]--debug\f[R]]
[\f[B]--definition\f[R] | \f[B]-D\f[R] \f[I]PATTERN\f[R]]\&...
.PD 0
.P
.PD
[\f[B]--ecopcr\f[R]] [\f[B]--embl\f[R]] [\f[B]--fasta-output\f[R]]
[\f[B]--fastq-output\f[R]] [\f[B]--genbank\f[R]]
[\f[B]--has-attribute\f[R] | \f[B]-A\f[R] \f[I]KEY\f[R]]\&...
[\f[B]--help\f[R] | \f[B]-h\f[R] | \f[B]-?\f[R]] [\f[B]--id-list\f[R]
\f[I]FILENAME\f[R]] [\f[B]--identifier\f[R] | \f[B]-I\f[R]
\f[I]PATTERN\f[R]]\&...
[\f[B]--ignore-taxon\f[R] | \f[B]-i\f[R] \f[I]TAXID\f[R]]\&...
[\f[B]--input-OBI-header\f[R]] [\f[B]--input-json-header\f[R]]
[\f[B]--inverse-match\f[R] | \f[B]-v\f[R]]
[\f[B]--max-count\f[R] | \f[B]-C\f[R] \f[I]COUNT\f[R]]
[\f[B]--max-cpu\f[R] \f[I]INT\f[R]] [\f[B]--max-length\f[R] |
\f[B]-L\f[R] \f[I]LENGTH\f[R]] [\f[B]--min-count\f[R] | \f[B]-c\f[R]
\f[I]COUNT\f[R]] [\f[B]--min-length\f[R] | \f[B]-l\f[R]
\f[I]LENGTH\f[R]] [\f[B]--no-order\f[R]] [\f[B]--no-progressbar\f[R]]
[\f[B]--out\f[R] | \f[B]-o\f[R] \f[I]FILENAME\f[R]]
[\f[B]--output-OBI-header\f[R] | \f[B]-O\f[R]]
[\f[B]--output-json-header\f[R]] [\f[B]--paired-mode\f[R]
\f[I]forward|reverse|and|or|andnot|xor\f[R]] [\f[B]--paired-with\f[R]
\f[I]FILENAME\f[R]] [\f[B]--predicate\f[R] | \f[B]-p\f[R]
\f[I]EXPRESSION\f[R]]\&...
[\f[B]--require-rank\f[R] \f[I]RANK_NAME\f[R]]\&...
[\f[B]--restrict-to-taxon\f[R] | \f[B]-r\f[R] \f[I]TAXID\f[R]]\&...
[\f[B]--save-discarded\f[R] \f[I]FILENAME\f[R]]
[\f[B]--sequence\f[R] | \f[B]-s\f[R] \f[I]PATTERN\f[R]]\&...
[\f[B]--solexa\f[R]] [\f[B]--taxdump\f[R] | \f[B]-t\f[R]
\f[I]DIRECTORY\f[R]] [\f[B]--workers\f[R] | \f[B]-w\f[R] \f[I]INT\f[R]]
[\f[I]FILENAMES\f[R]]
.SH DESCRIPTION
.PP
The \f[V]obigrep\f[R] command is somewhat analogous to the standard Unix
\f[V]grep\f[R] command.
It selects a subset of sequence records from a sequence file.
A sequence record is a complex object consisting of an identifier, a set
of attributes (each a named key associated with a value), a
definition, and the sequence itself.
Whereas \f[V]grep\f[R] works on text line by line,
\f[V]obigrep\f[R] makes its selection sequence record by sequence record.
A large number of options allow you to refine the selection on any
element of the sequence record.
\f[V]obigrep\f[R] allows you to specify multiple conditions
simultaneously, each of which evaluates to \f[V]TRUE\f[R] or
\f[V]FALSE\f[R], and only those sequence records that meet all
conditions (all conditions are \f[V]TRUE\f[R]) are selected.
\f[V]obigrep\f[R] can also work on two paired read files.
The selection criteria apply to one or the other of the reads in each
pair depending on the mode chosen (\f[B]--paired-mode\f[R] option).
In all cases the selection is applied in the same way to both files,
thus maintaining their consistency.
.SH OPTIONS
.SS General options
.PP
\f[B]Helpful options\f[R]
.TP
\f[B]--help\f[R], \f[B]-h\f[R]
Display a friendly help message.
.PP
\f[B]--no-progressbar\f[R]
.PP
\f[B]Managing parallel execution\f[R]
.TP
\f[B]--max-cpu\f[R] \f[I]INT\f[R]
OBITools V4 can run in parallel on all the CPU cores available
on the computer.
It is sometimes necessary to limit the computation to a smaller number of
cores.
This option specifies the maximum number of cores that the OBITools
command can use.
This behaviour can also be set up using the \f[V]OBIMAXCPU\f[R]
environment variable.
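.PP
A minimal sketch of the environment-variable form, assuming
\f[V]OBIMAXCPU\f[R] takes the same integer number of cores as
\f[B]--max-cpu\f[R]; the value 4 and the file names in the comment are
placeholders chosen for illustration:
.IP
.nf
\f[C]
```shell
# Cap OBITools at 4 cores for every command started from this shell.
# Assumption: the variable holds a plain integer, like --max-cpu INT.
export OBIMAXCPU=4

# Per-command form of the same limit (file names are hypothetical):
#   obigrep --max-cpu 4 -l 8 data.fasta > filtered.fasta
echo "OBIMAXCPU is set to $OBIMAXCPU"
```
\f[R]
.fi
.PP
Exporting the variable suits a batch script that runs several OBITools
commands, whereas the option suits a single invocation.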
.PP
\f[B]--workers\f[R], \f[B]-w\f[R]
.PP
\f[B]OBITools debugging related options\f[R]
.PP
\f[B]--debug\f[R]
.SS Input format options
.PP
The OBITools are centered around the
FASTA (https://en.wikipedia.org/wiki/FASTA_format) and
FASTQ (https://en.wikipedia.org/wiki/FASTQ_format) formats.
These formats are automatically recognized when data are read both from
files and from standard input (\f[V]stdin\f[R]).
Other formats (GenBank, EMBL, ecoPCR) are also automatically identified
when data are read from files, but for \f[V]stdin\f[R] input the format must be
indicated using one of the following options.
.SS Output format options
.PP
\f[B]--fasta-output\f[R]
.PP
\f[B]--fastq-output\f[R]
.PP
\f[B]--output-OBI-header\f[R], \f[B]-O\f[R]
.PP
\f[B]--output-json-header\f[R]
.TP
\f[B]--out\f[R], \f[B]-o\f[R] \f[I]FILENAME\f[R]
OBITools, like all standard Unix tools, print their results to the
standard output (\f[V]stdout\f[R]).
To save them, \f[V]stdout\f[R] must be redirected to a file.
This option lets you name the output file explicitly on the command line.
This is especially useful when OBITools are processing paired files.
In that case, the given output file name is modified by
adding the \f[I]_R1\f[R] (forward file) or \f[I]_R2\f[R] (reverse
file) suffix just before the extensions (\f[I]e.g.\f[R] sequence.fasta
becomes sequence_R1.fasta and sequence_R2.fasta).
If this option is not specified and paired files are processed, only the
forward data are written to \f[V]stdout\f[R].
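.PP
The naming rule for paired output can be sketched in plain shell.
This is an illustration only, not the code OBITools uses: the helper
name \f[V]paired_names\f[R] is hypothetical, and it assumes that
everything after the first dot of the file name counts as the
extensions.
.IP
.nf
\f[C]
```shell
# Hypothetical helper: prints the two names derived from one --out value.
# Assumption: everything after the first dot of the file name is
# treated as "the extensions"; the directory part is left untouched.
paired_names() {
  out=$1
  dir=
  case "$out" in
    */*) dir=${out%/*}/; out=${out##*/} ;;
  esac
  case "$out" in
    *.*) stem=${out%%.*}; ext=.${out#*.} ;;
    *)   stem=$out; ext= ;;
  esac
  echo "${dir}${stem}_R1${ext}"
  echo "${dir}${stem}_R2${ext}"
}

paired_names wolf_good.fastq
```
\f[R]
.fi
.PP
The call prints wolf_good_R1.fastq and wolf_good_R2.fastq, one per line,
matching the sequence.fasta example given for this option.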
.TP
\f[B]--compress\f[R], \f[B]-Z\f[R]
The output is compressed in the
gzip (https://en.wikipedia.org/wiki/Gzip) format.
.SS Paired reads options
.PP
\f[B]--paired-with\f[R] \f[I]FILENAME\f[R]
.PP
\f[B]--paired-mode\f[R] \f[I]forward|reverse|and|or|andnot|xor\f[R]
.SS Taxonomy related options
.PP
\f[B]--taxdump\f[R] | \f[B]-t\f[R] \f[I]DIRECTORY\f[R]
.PP
\f[B]--ignore-taxon\f[R] | \f[B]-i\f[R] \f[I]TAXID\f[R]
.PP
\f[B]--require-rank\f[R] \f[I]RANK_NAME\f[R]
.PP
\f[B]--restrict-to-taxon\f[R] | \f[B]-r\f[R] \f[I]TAXID\f[R]
.SS Filtering options
.PP
\f[B]--has-attribute\f[R] | \f[B]-A\f[R] \f[I]KEY\f[R]\&...
.PP
\f[B]--id-list\f[R] \f[I]FILENAME\f[R]
.PP
\f[B]--identifier\f[R] | \f[B]-I\f[R] \f[I]PATTERN\f[R]
.TP
\f[B]--max-count\f[R] | \f[B]-C\f[R] \f[I]COUNT\f[R]
Only sequences representing no more than \f[I]COUNT\f[R] reads are
selected.
This option relies on the \f[V]count\f[R] attribute.
If the \f[V]count\f[R] attribute is not defined for a sequence record,
it is assumed to be equal to 1.
.TP
\f[B]--min-count\f[R] | \f[B]-c\f[R] \f[I]COUNT\f[R]
Only sequences representing at least \f[I]COUNT\f[R] reads are
selected.
This option relies on the \f[V]count\f[R] attribute.
If the \f[V]count\f[R] attribute is not defined for a sequence record,
it is assumed to be equal to 1.
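.PP
The default-to-1 rule can be illustrated with a rough \f[V]awk\f[R]
stand-in for \f[B]--min-count\f[R].
This is a sketch, not how \f[V]obigrep\f[R] is implemented: the
function name \f[V]min_count_filter\f[R] is hypothetical, and it
assumes FASTA input whose title lines carry JSON annotations such as
{\[dq]count\[dq]:3}.
.IP
.nf
\f[C]
```shell
# Hypothetical stand-in for --min-count, for illustration only.
# Assumption: FASTA title lines carry JSON annotations with a "count" key.
min_count_filter() {
  awk -v min="$1" '
    /^>/ {
      count = 1                      # missing attribute: assume 1
      if (match($0, /"count": *[0-9]+/)) {
        field = substr($0, RSTART, RLENGTH)
        sub(/^[^0-9]*/, "", field)   # keep only the digits
        count = field + 0
      }
      keep = (count >= min)
    }
    keep { print }
  '
}

min_count_filter 2 <<EOF
>seq1 {"count":3}
acgt
>seq2 {"count":1}
ttga
>seq3
ggca
EOF
```
\f[R]
.fi
.PP
Only the seq1 record is printed: seq2 has an explicit count of 1, and
seq3 has no \f[V]count\f[R] attribute, so it is also treated as 1.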
.PP
\f[B]--max-length\f[R] | \f[B]-L\f[R] \f[I]LENGTH\f[R]
.PP
\f[B]--min-length\f[R] | \f[B]-l\f[R] \f[I]LENGTH\f[R]
.PP
\f[B]--predicate\f[R] | \f[B]-p\f[R] \f[I]EXPRESSION\f[R]
.PP
\f[B]--sequence\f[R] | \f[B]-s\f[R] \f[I]PATTERN\f[R]
.PP
\f[B]--inverse-match\f[R] | \f[B]-v\f[R]
.PP
\f[B]--save-discarded\f[R] \f[I]FILENAME\f[R]
.SH ENVIRONMENT
.PP
\f[B]OBIMAXCPU\f[R]
.SH EXAMPLES
.IP \[bu] 2
Filter a sequence file to keep only barcodes between 8 and 130 bp long.
.RS 2
.IP
.nf
\f[C]
obigrep -l 8 -L 130 data_SPER01.fasta > data_goodLength_SPER01.fasta
\f[R]
.fi
.RE
.IP \[bu] 2
Keep only the reads whose sequence contains no ambiguity codes.
.RS 2
.IP
.nf
\f[C]
obigrep -s \[aq]\[ha][acgt]+$\[aq] data_SPER01.fasta > data_onlyACGT_SPER01.fasta
\f[R]
.fi
.RE
.IP \[bu] 2
Filter paired files to keep only the read pairs without ambiguity codes.
.RS 2
.IP
.nf
\f[C]
obigrep -s \[aq]\[ha][acgt]+$\[aq] \[rs]
--paired-mode and --paired-with wolf_R.fastq.gz \[rs]
--out wolf_good.fastq \[rs]
wolf_F.fastq.gz
\f[R]
.fi
.PP
This command produces two files, \f[V]wolf_good_R1.fastq\f[R] and
\f[V]wolf_good_R2.fastq\f[R], containing the filtered
forward and reverse reads respectively.
.RE
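.PP
The pattern used in the last two examples can be tried outside
\f[V]obigrep\f[R].
A minimal sketch, assuming the pattern behaves like the extended regular
expressions understood by \f[V]grep -E\f[R] and that sequences are
written in lower case; the helper name \f[V]only_acgt\f[R] and the
sample sequences are hypothetical:
.IP
.nf
\f[C]
```shell
# Hypothetical helper: keep only lines made exclusively of a, c, g and t.
# The two anchors force the whole line, not just a part of it, to match.
only_acgt() {
  grep -E '^[acgt]+$'
}

only_acgt <<EOF
acgtacgt
acgnacgt
acgtrcgt
EOF
```
\f[R]
.fi
.PP
Only acgtacgt is printed: the other two lines each contain one
ambiguity code (n and r), so the anchored pattern cannot match them.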
.SH SEE ALSO
.PP
\f[V]obiannotate\f[R]
.SH HISTORY
.SH BUGS
.PP
Submit bug reports online at:
https://git.metabarcoding.org/obitools/obitools4/obitools4/-/issues
.SH AUTHORS
Eric Coissac <eric.coissac@metabarcoding.org>.