diff --git a/doc/book/annexes.qmd b/doc/book/annexes.qmd
index 80d7e27..7eb0467 100644
--- a/doc/book/annexes.qmd
+++ b/doc/book/annexes.qmd
@@ -1,82 +1,107 @@
# Annexes
-### Sequence attributes
+## Sequence attributes
-#### Reserved sequence attributes
+**ali_dir (`string`)**
-##### `ali_dir`
+ - Set by the *obipairing* tool
+ - The attribute takes one of two string values: `left` or `right`.
-###### Type : `string`
+ The alignment generated by *obipairing* uses a 3'-end gap free algorithm.
+ Two cases can occur when aligning the forward and reverse reads. If the
+ barcode is long enough, the two reads overlap only at their 3' ends. In
+ that case, the alignment direction `ali_dir` is set to *left*. If the
+ barcode is shorter than the read length, the paired reads overlap at
+ their 5' ends, and the complete barcode is sequenced by both reads.
+ In that latter case, `ali_dir` is set to *right*.
-The attribute can contain 2 string values `"left"` or `"right".`
+**ali_length (`int`)**
-###### Set by the *obipairing* tool
+ - Set by the *obipairing* tool
-The alignment generated by *obipairing* is a 3'-end gap free algorithm.
-Two cases can occur when aligning the forward and reverse reads. If the
-barcode is long enough, both the reads overlap only on their 3' ends. In
-such case, the alignment direction `ali_dir` is set to *left*. If the
-barcode is shorter than the read length, the paired reads overlap by
-their 5' ends, and the complete barcode is sequenced by both the reads.
-In that later case, `ali_dir` is set to *right*.
+ Length of the aligned parts when merging forward and reverse reads
-##### `ali_length`
-###### Set by the *obipairing* tool
+**count (`int`)**
-Length of the aligned parts when merging forward and reverse reads
+ - Set by the *obiuniq* tool
+ - Getter : method `Count()`
+ - Setter : method `SetCount(int)`
-##### `count` : the number of sequence occurrences
+ The `count` attribute indicates how many strictly identical reads
+ have been merged into a single record. It contains an integer value. If it
+ is absent, the sequence record represents a single
+ occurrence of the sequence.
-###### Set by the *obiuniq* tool
+ The `Count()` method gives access to the `count` attribute as an
+ integer value. If the `count` attribute is not defined for the given
+ sequence, the value *1* is returned.
-The `count` attribute indicates how-many strictly identical sequences
-have been merged in a single record. It contains an integer value. If it
-is absent this means that the sequence record represents a single
-occurrence of the sequence.
+**merged_* (`map[string]int`)**
-###### Getter : method `Count()`
+ - Set by the *obiuniq* tool
-The `Count()` method allows to access to the count attribute as an
-integer value. If the `count` attribute is not defined for the given
-sequence, the value *1* is returned
+ The `-m` option of the *obiuniq* tool keeps track of the
+ distribution of the values stored in a given attribute of interest. Often
+ this option is used to summarise the distribution of a sequence variant
+ across samples when *obiuniq* is run after *obimultiplex*. The
+ actual name of the attribute depends on the name of the monitored
+ attribute. If the `-m` option is used with the attribute *sample*, then this
+ attribute is named *merged_sample*.
-##### `merged_*`
+**mode (`string`)**
-###### Type : `map[string]int`
+ - Set by the *obipairing* tool
+ - The attribute takes one of two string values: `join` or `alignment`.
-###### Set by the *obiuniq* tool
-The `-m` option of the *obiuniq* tools allows for keeping track of the
-distribution of the values stored in given attribute of interest. Often
-this option is used to summarise distribution of a sequence variant
-accross samples when *obiuniq* is run after running *obimultiplex*. The
-actual name of the attribute depends on the name of the monitored
-attribute. If `-m` option is used with the attribute *sample*, then this
-attribute names *merged_sample*.
+**obitag_ref_index (`map[string]string`)**
-##### `mode`
+ - Set by the *obirefidx* tool.
-###### Set by the *obipairing* tool
+ It summarises which taxonomic annotation a match to that sequence must
+ lead to, depending on the number of differences between the query
+ sequence and the reference sequence carrying that tag.
-**`obitag_ref_index`**
+```json
+ {"0":"9606@Homo sapiens@species",
+ "2":"207598@Homininae@subfamily",
+ "3":"9604@Hominidae@family",
+ "8":"314295@Hominoidea@superfamily",
+ "10":"9526@Catarrhini@parvorder",
+ "12":"1437010@Boreoeutheria@clade",
+ "16":"9347@Eutheria@clade",
+ "17":"40674@Mammalia@class",
+ "22":"117571@Euteleostomi@clade",
+ "25":"7776@Gnathostomata@clade",
+ "29":"33213@Bilateria@clade",
+ "30":"6072@Eumetazoa@clade"}
+```
-###### Set by the *obirefidx* tool.
+**pairing_mismatches (`map[string]string`)**
-It resumes to which taxonomic annotation a match to that sequence must
-lead according to the number of differences existing between the query
-sequence and the reference sequence having that tag.
+ - Set by the *obipairing* tool
-###### Getter : method `Count()`
+**seq_a_single (`int`)**
-##### `pairing_mismatches`
+ - Set by the *obipairing* tool
-###### Set by the *obipairing* tool
+**seq_ab_match (`int`)**
-##### `score`
+ - Set by the *obipairing* tool
-###### Set by the *obipairing* tool
+**seq_b_single (`int`)**
-##### `score_norm`
+ - Set by the *obipairing* tool
-###### Set by the *obipairing* tool
+**score (`int`)**
+
+ - Set by the *obipairing* tool
+
+**score_norm (`float`)**
+
+ - Set by the *obipairing* tool
+ - The value ranges between 0 and 1.
+
+ Score of the alignment between forward and reverse reads expressed as a fraction of identity.
+
diff --git a/doc/book/comm_sampling.qmd b/doc/book/comm_sampling.qmd
index 0a3ba45..a8ca2a6 100644
--- a/doc/book/comm_sampling.qmd
+++ b/doc/book/comm_sampling.qmd
@@ -10,13 +10,39 @@
Sequences can be selected on several of their characteristics: their length, their id, their sequence. Options allow for specifying the selection condition.
+**Selection based on the sequence**
+
+
+Sequence records can be selected according to whether or not they match a pattern. The simplest pattern is a short sequence (*e.g.* `AACCTT`), but regular expressions allow searching for more complex patterns. For example, `A[TG]C+G` matches an `A`, followed by a `T` or a `G`, then one or several `C`, and finally a `G`.
+
+{{< include ../lib/options/selection/_sequence.qmd >}}
+
+*Examples:*
+
+: Selects only the sequence records that contain an *EcoRI* restriction site.
+
+```bash
+obigrep -s 'GAATTC' seq1.fasta > seq2.fasta
+```
+
+: Selects only the sequence records that contain a stretch of at least 10 ``A``.
+
+```bash
+obigrep -s 'A{10,}' seq1.fasta > seq2.fasta
+```
+
+: Selects only the sequence records that do not contain ambiguous nucleotides.
+
+```bash
+obigrep -s '^[ACGT]+$' seq1.fasta > seq2.fasta
+```
{{< include ../lib/options/selection/_min-count.qmd >}}
{{< include ../lib/options/selection/_max-count.qmd >}}
-Example
+*Examples*
: Selecting sequence records representing at least five reads in the dataset.
diff --git a/doc/book/expressions.qmd b/doc/book/expressions.qmd
index 71261c4..4a1a883 100644
--- a/doc/book/expressions.qmd
+++ b/doc/book/expressions.qmd
@@ -11,26 +11,64 @@ Several OBITools (*e.g.* obigrep, obiannotate) allow the user to specify some si
### Introspection functions {.unnumbered}
-- `len(x)`is a generic function allowing to retreive the size of a object. It returns
+**`len(x)`**
+
+: It is a generic function that retrieves the size of an object. It returns
the length of a sequence, the number of elements in a map like `annotations`, or the number
of elements in an array. The returned value is an `int`.
### Cast functions {.unnumbered}
-- `int(x)` converts if possible the `x` value to an integer value. The function
+**`int(x)`**
+
+: Converts the `x` value to an integer value, if possible. The function
returns an `int`.
-- `numeric(x)` converts if possible the `x` value to a float value. The function
+
+**`numeric(x)`**
+
+: Converts the `x` value to a float value, if possible. The function
returns a `float`.
-- `bool(x)` converts if possible the `x` value to a boolean value. The function
+
+**`bool(x)`**
+
+: Converts the `x` value to a boolean value, if possible. The function
returns a `bool`.
### String related functions {.unnumbered}
-- `printf(format,...)` allows to combine several values to build a string. `format` follows the
+**`printf(format,...)`**
+
+: Combines several values to build a string. `format` follows the
classical C `printf` syntax. The function returns a `string`.
-- `subspc(x)` substitutes every space in the `x` string by the underscore (`_`) character. The function
+
+**`subspc(x)`**
+
+: Substitutes every space in the `x` string with the underscore (`_`) character. The function
returns a `string`.
+### Condition function {.unnumbered}
+
+**`ifelse(condition,val1,val2)`**
+
+: The `condition` value has to be a `bool` value. If it is `true`, the function returns `val1`;
+ otherwise, it returns `val2`.
+
+### Sequence analysis related function
+
+**`composition(sequence)`**
+
+: The nucleotide composition of the sequence is returned as a map indexed by `a`, `c`, `g`, and `t`, where
+ each value is the number of occurrences of that nucleotide. A fifth key, `others`, accounts for
+ all other symbols.
+
+**`gcskew(sequence)`**
+
+: Computes the excess of `g` compared to `c` in the sequence, known as the GC skew.
+
+ $$
+ Skew_{GC}=\frac{G-C}{G+C}
+ $$
+
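+ As a worked illustration of the formula above, using made-up counts rather than a real dataset, a sequence containing $G=6$ occurrences of `g` and $C=4$ occurrences of `c` gives:
+
+ $$
+ Skew_{GC}=\frac{6-4}{6+4}=0.2
+ $$
+
+ The value is positive when `g` is in excess and negative when `c` is in excess. Whenever the sequence contains at least one `g` or `c`, it lies between $-1$ and $1$.
+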
## Accessing to the sequence annotations
The `annotations` variable is a map object containing all the annotations associated with the currently processed sequence. The keys of the map are the attribute names. There are two possibilities to retrieve
@@ -53,4 +91,7 @@ Special attributes of the sequence are accessible only by dedicated methods of t
- The sequence identifier : `Id()`
- The sequence definition : `Definition()`
+```go
+sequence.Id()
+```
diff --git a/doc/build/_book/OBITools-V4.epub b/doc/build/_book/OBITools-V4.epub
index 934b2b4..8bffd15 100644
Binary files a/doc/build/_book/OBITools-V4.epub and b/doc/build/_book/OBITools-V4.epub differ
diff --git a/doc/build/_book/annexes.html b/doc/build/_book/annexes.html
index bbdf381..1185fc3 100644
--- a/doc/build/_book/annexes.html
+++ b/doc/build/_book/annexes.html
@@ -20,6 +20,69 @@ ul.task-list li input[type="checkbox"] {
margin: 0 0.8em 0.2em -1.6em;
vertical-align: middle;
}
+pre > code.sourceCode { white-space: pre; position: relative; }
+pre > code.sourceCode > span { display: inline-block; line-height: 1.25; }
+pre > code.sourceCode > span:empty { height: 1.2em; }
+.sourceCode { overflow: visible; }
+code.sourceCode > span { color: inherit; text-decoration: inherit; }
+div.sourceCode { margin: 1em 0; }
+pre.sourceCode { margin: 0; }
+@media screen {
+div.sourceCode { overflow: auto; }
+}
+@media print {
+pre > code.sourceCode { white-space: pre-wrap; }
+pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; }
+}
+pre.numberSource code
+ { counter-reset: source-line 0; }
+pre.numberSource code > span
+ { position: relative; left: -4em; counter-increment: source-line; }
+pre.numberSource code > span > a:first-child::before
+ { content: counter(source-line);
+ position: relative; left: -1em; text-align: right; vertical-align: baseline;
+ border: none; display: inline-block;
+ -webkit-touch-callout: none; -webkit-user-select: none;
+ -khtml-user-select: none; -moz-user-select: none;
+ -ms-user-select: none; user-select: none;
+ padding: 0 4px; width: 4em;
+ color: #aaaaaa;
+ }
+pre.numberSource { margin-left: 3em; border-left: 1px solid #aaaaaa; padding-left: 4px; }
+div.sourceCode
+ { }
+@media screen {
+pre > code.sourceCode > span > a:first-child::before { text-decoration: underline; }
+}
+code span.al { color: #ff0000; font-weight: bold; } /* Alert */
+code span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */
+code span.at { color: #7d9029; } /* Attribute */
+code span.bn { color: #40a070; } /* BaseN */
+code span.bu { color: #008000; } /* BuiltIn */
+code span.cf { color: #007020; font-weight: bold; } /* ControlFlow */
+code span.ch { color: #4070a0; } /* Char */
+code span.cn { color: #880000; } /* Constant */
+code span.co { color: #60a0b0; font-style: italic; } /* Comment */
+code span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */
+code span.do { color: #ba2121; font-style: italic; } /* Documentation */
+code span.dt { color: #902000; } /* DataType */
+code span.dv { color: #40a070; } /* DecVal */
+code span.er { color: #ff0000; font-weight: bold; } /* Error */
+code span.ex { } /* Extension */
+code span.fl { color: #40a070; } /* Float */
+code span.fu { color: #06287e; } /* Function */
+code span.im { color: #008000; font-weight: bold; } /* Import */
+code span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */
+code span.kw { color: #007020; font-weight: bold; } /* Keyword */
+code span.op { color: #666666; } /* Operator */
+code span.ot { color: #007020; } /* Other */
+code span.pp { color: #bc7a00; } /* Preprocessor */
+code span.sc { color: #4070a0; } /* SpecialChar */
+code span.ss { color: #bb6688; } /* SpecialString */
+code span.st { color: #4070a0; } /* String */
+code span.va { color: #19177c; } /* Variable */
+code span.vs { color: #4070a0; } /* VerbatimString */
+code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */
@@ -215,7 +278,7 @@ ul.task-list li input[type="checkbox"] {
@@ -239,84 +302,82 @@ ul.task-list li input[type="checkbox"] {
-
-
A.0.1 Sequence attributes
-
-
A.0.1.1 Reserved sequence attributes
-
-
A.0.1.1.1ali_dir
-
-
A.0.1.1.1.1 Type : string
-
The attribute can contain 2 string values "left" or "right".
-
-
-
A.0.1.1.1.2 Set by the obipairing tool
+
+
A.1 Sequence attributes
+
ali_dir (string)
+
+
Set by the obipairing tool
+
The attribute takes one of two string values: left or right.
+
The alignment generated by obipairing uses a 3’-end gap free algorithm. Two cases can occur when aligning the forward and reverse reads. If the barcode is long enough, the two reads overlap only at their 3’ ends. In that case, the alignment direction ali_dir is set to left. If the barcode is shorter than the read length, the paired reads overlap at their 5’ ends, and the complete barcode is sequenced by both reads. In that latter case, ali_dir is set to right.
-
-
-
-
A.0.1.1.2ali_length
-
-
A.0.1.1.2.1 Set by the obipairing tool
+
ali_length (int)
+
+
Set by the obipairing tool
+
Length of the aligned parts when merging forward and reverse reads
-
-
-
-
A.0.1.1.3count : the number of sequence occurrences
-
-
A.0.1.1.3.1 Set by the obiuniq tool
-
The count attribute indicates how-many strictly identical sequences have been merged in a single record. It contains an integer value. If it is absent this means that the sequence record represents a single occurrence of the sequence.
-
-
-
A.0.1.1.3.2 Getter : method Count()
+
count (int)
+
+
Set by the obiuniq tool
+
Getter : method Count()
+
Setter : method SetCount(int)
+
+
The count attribute indicates how many strictly identical reads have been merged into a single record. It contains an integer value. If it is absent, the sequence record represents a single occurrence of the sequence.
The Count() method gives access to the count attribute as an integer value. If the count attribute is not defined for the given sequence, the value 1 is returned.
-
-
-
-
A.0.1.1.4merged_*
-
-
A.0.1.1.4.1 Type : map[string]int
-
-
-
A.0.1.1.4.2 Set by the obiuniq tool
+
merged_* (map[string]int)
+
+
Set by the obiuniq tool
+
The -m option of the obiuniq tool keeps track of the distribution of the values stored in a given attribute of interest. Often this option is used to summarise the distribution of a sequence variant across samples when obiuniq is run after obimultiplex. The actual name of the attribute depends on the name of the monitored attribute. If the -m option is used with the attribute sample, then this attribute is named merged_sample.
-
-
-
-
A.0.1.1.5mode
-
-
A.0.1.1.5.1 Set by the obipairing tool
-
obitag_ref_index
-
-
-
A.0.1.1.5.2 Set by the obirefidx tool.
+
mode (string)
+
+
Set by the obipairing tool
+
The attribute takes one of two string values: join or alignment.
+
+
obitag_ref_index (map[string]string)
+
+
Set by the obirefidx tool.
+
It summarises which taxonomic annotation a match to that sequence must lead to, depending on the number of differences between the query sequence and the reference sequence carrying that tag.
12.1.1.1 Selecting sequences based on their characteristics
Sequences can be selected on several of their characteristics: their length, their id, their sequence. Options allow for specifying the selection condition.
+
Selection based on the sequence
+
Sequence records can be selected according to whether or not they match a pattern. The simplest pattern is a short sequence (e.g. AACCTT), but regular expressions allow searching for more complex patterns. For example, A[TG]C+G matches an A, followed by a T or a G, then one or several C, and finally a G.
+
+
--sequence|-s PATTERN
+
+
Regular expression pattern to be tested against the sequence itself. The pattern is case insensitive. A complete description of the regular pattern grammar is available here.
+
+
Examples:
+
+
Selects only the sequence records that contain an EcoRI restriction site.
+
+
+
obigrep -s 'GAATTC' seq1.fasta > seq2.fasta
+
: Selects only the sequence records that contain a stretch of at least 10 A.
+
obigrep -s 'A{10,}' seq1.fasta > seq2.fasta
+
: Selects only the sequence records that do not contain ambiguous nucleotides.
only sequences representing no more than COUNT reads will be selected. That option relies on the count attribute. If the count attribute is not defined for a sequence record, it is assumed equal to \(1\).
-
Example
+
Examples
Selecting sequence records representing at least five reads in the dataset.
len(x)is a generic function allowing to retreive the size of a object. It returns the length of a sequences, the number of element in a map like annotations, the number of elements in an array. The reurned value is an int.
-
+
+
len(x)
+
+
It is a generic function that retrieves the size of an object. It returns the length of a sequence, the number of elements in a map like annotations, or the number of elements in an array. The returned value is an int.
+
+
Cast functions
-
-
int(x) converts if possible the x value to an integer value. The function returns an int.
-
numeric(x) converts if possible the x value to a float value. The function returns a float.
-
bool(x) converts if possible the x value to a boolean value. The function returns a bool.
-
+
+
int(x)
+
+
Converts the x value to an integer value, if possible. The function returns an int.
+
+
numeric(x)
+
+
Converts the x value to a float value, if possible. The function returns a float.
+
+
bool(x)
+
+
Converts the x value to a boolean value, if possible. The function returns a bool.
+
+
String related functions
-
-
printf(format,...) allows to combine several values to build a string. format follows the classical C printf syntax. The function returns a string.
-
subspc(x) substitutes every space in the x string by the underscore (_) character. The function returns a string.
-
+
+
printf(format,...)
+
+
Combines several values to build a string. format follows the classical C printf syntax. The function returns a string.
+
+
subspc(x)
+
+
Substitutes every space in the x string with the underscore (_) character. The function returns a string.
+
+
+
+
+
Condition function
+
+
ifelse(condition,val1,val2)
+
+
The condition value has to be a bool value. If it is true, the function returns val1; otherwise, it returns val2.
+
+
+
+
+
7.2.1 Sequence analysis related function
+
+
composition(sequence)
+
+
The nucleotide composition of the sequence is returned as a map indexed by a, c, g, and t, where each value is the number of occurrences of that nucleotide. A fifth key, others, accounts for all other symbols.
+
+
gcskew(sequence)
+
+
Computes the excess of g compared to c in the sequence, known as the GC skew.