mirror of
https://github.com/metabarcoding/obitools4.git
synced 2025-06-29 16:20:46 +00:00
Refactoring codes for removing buffer size options. An some other changes...
Former-commit-id: 10b57cc1a27446ade3c444217341e9651e89cdce
This commit is contained in:
@ -1,82 +1,107 @@
|
||||
# Annexes
|
||||
|
||||
### Sequence attributes
|
||||
## Sequence attributes
|
||||
|
||||
#### Reserved sequence attributes
|
||||
**ali_dir (`string`)**
|
||||
|
||||
##### `ali_dir`
|
||||
- Set by the *obipairing* tool
|
||||
- The attribute can contain 2 string values `left` or `right`.
|
||||
|
||||
###### Type : `string`
|
||||
The alignment generated by *obipairing* is computed by a 3'-end gap-free algorithm.
Two cases can occur when aligning the forward and reverse reads. If the
barcode is long enough, the reads overlap only at their 3' ends. In
that case, the alignment direction `ali_dir` is set to *left*. If the
barcode is shorter than the read length, the paired reads overlap at
their 5' ends, and the complete barcode is sequenced by both reads.
In that latter case, `ali_dir` is set to *right*.
|
||||
|
||||
The attribute can contain one of two string values: `"left"` or `"right"`.
|
||||
**ali_length (`int`)**
|
||||
|
||||
###### Set by the *obipairing* tool
|
||||
- Set by the *obipairing* tool
|
||||
|
||||
The alignment generated by *obipairing* is computed by a 3'-end gap-free algorithm.
Two cases can occur when aligning the forward and reverse reads. If the
barcode is long enough, the reads overlap only at their 3' ends. In
that case, the alignment direction `ali_dir` is set to *left*. If the
barcode is shorter than the read length, the paired reads overlap at
their 5' ends, and the complete barcode is sequenced by both reads.
In that latter case, `ali_dir` is set to *right*.
|
||||
Length of the aligned parts when merging forward and reverse reads
|
||||
|
||||
##### `ali_length`
|
||||
|
||||
###### Set by the *obipairing* tool
|
||||
**count (`int`)**
|
||||
|
||||
Length of the aligned parts when merging forward and reverse reads
|
||||
- Set by the *obiuniq* tool
|
||||
- Getter : method `Count()`
|
||||
- Setter : method `SetCount(int)`
|
||||
|
||||
##### `count` : the number of sequence occurrences
|
||||
The `count` attribute indicates how many strictly identical reads
|
||||
have been merged in a single record. It contains an integer value. If it
|
||||
is absent this means that the sequence record represents a single
|
||||
occurrence of the sequence.
|
||||
|
||||
###### Set by the *obiuniq* tool
|
||||
The `Count()` method gives access to the count attribute as an
integer value. If the `count` attribute is not defined for the given
sequence, the value *1* is returned.
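
The default described above can be sketched in Go as follows. This is only an illustration under stated assumptions: the `Record` type and its `Annotations` field are hypothetical stand-ins, not the actual OBITools4 types. Only the method names `Count()` and `SetCount(int)` come from the documentation.

```go
package main

import "fmt"

// Record is a hypothetical, simplified sequence record (an assumption made
// for this sketch; it is not the OBITools4 type).
type Record struct {
	Annotations map[string]int
}

// Count returns the count attribute, or 1 when the attribute is absent,
// meaning the record stands for a single occurrence of the sequence.
func (r Record) Count() int {
	if c, ok := r.Annotations["count"]; ok {
		return c
	}
	return 1
}

// SetCount stores the count attribute, creating the map if needed.
func (r *Record) SetCount(n int) {
	if r.Annotations == nil {
		r.Annotations = make(map[string]int)
	}
	r.Annotations["count"] = n
}

func main() {
	var r Record
	fmt.Println(r.Count()) // attribute absent: prints 1
	r.SetCount(12)
	fmt.Println(r.Count()) // prints 12
}
```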
|
||||
|
||||
The `count` attribute indicates how many strictly identical sequences
|
||||
have been merged in a single record. It contains an integer value. If it
|
||||
is absent this means that the sequence record represents a single
|
||||
occurrence of the sequence.
|
||||
**merged_* (`map[string]int`)**
|
||||
|
||||
###### Getter : method `Count()`
|
||||
- Set by the *obiuniq* tool
|
||||
|
||||
The `Count()` method gives access to the count attribute as an
integer value. If the `count` attribute is not defined for the given
sequence, the value *1* is returned.
|
||||
The `-m` option of the *obiuniq* tool keeps track of the
distribution of the values stored in a given attribute of interest. This
option is often used to summarise the distribution of a sequence variant
across samples when *obiuniq* is run after *obimultiplex*. The
actual name of the attribute depends on the name of the monitored
attribute. If the `-m` option is used with the attribute *sample*, then this
attribute is named *merged_sample*.
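
To make the shape of a `merged_*` attribute concrete, here is a minimal Go sketch that tallies, for each distinct sequence, how many reads come from each sample. The `read` type and the `mergedSample` function are hypothetical names introduced for this illustration; they do not reproduce how *obiuniq* is implemented.

```go
package main

import "fmt"

// read is a hypothetical, simplified record: a sequence and the sample it
// was assigned to (for instance by obimultiplex).
type read struct {
	sequence string
	sample   string
}

// mergedSample builds, for every distinct sequence, a map[string]int giving
// the number of reads observed in each sample. This mirrors the documented
// type of the merged_sample attribute.
func mergedSample(reads []read) map[string]map[string]int {
	out := make(map[string]map[string]int)
	for _, r := range reads {
		if out[r.sequence] == nil {
			out[r.sequence] = make(map[string]int)
		}
		out[r.sequence][r.sample]++
	}
	return out
}

func main() {
	reads := []read{
		{"acgt", "s1"}, {"acgt", "s1"}, {"acgt", "s2"}, {"ttga", "s2"},
	}
	m := mergedSample(reads)
	fmt.Println(m["acgt"]) // map[s1:2 s2:1]
	fmt.Println(m["ttga"]) // map[s2:1]
}
```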
|
||||
|
||||
##### `merged_*`
|
||||
**mode (`string`)**
|
||||
|
||||
###### Type : `map[string]int`
|
||||
- Set by the *obipairing* tool
|
||||
- The attribute can contain 2 string values `join` or `alignment`.
|
||||
|
||||
###### Set by the *obiuniq* tool
|
||||
|
||||
The `-m` option of the *obiuniq* tool keeps track of the
distribution of the values stored in a given attribute of interest. This
option is often used to summarise the distribution of a sequence variant
across samples when *obiuniq* is run after *obimultiplex*. The
actual name of the attribute depends on the name of the monitored
attribute. If the `-m` option is used with the attribute *sample*, then this
attribute is named *merged_sample*.
|
||||
**obitag_ref_index (`map[string]string`)**
|
||||
|
||||
##### `mode`
|
||||
- Set by the *obirefidx* tool.
|
||||
|
||||
###### Set by the *obipairing* tool
|
||||
It summarises which taxonomic annotation a match to that sequence must
lead to, according to the number of differences between the query
sequence and the reference sequence carrying that tag.
|
||||
|
||||
**`obitag_ref_index`**
|
||||
```json
{"0":"9606@Homo sapiens@species",
 "2":"207598@Homininae@subfamily",
 "3":"9604@Hominidae@family",
 "8":"314295@Hominoidea@superfamily",
 "10":"9526@Catarrhini@parvorder",
 "12":"1437010@Boreoeutheria@clade",
 "16":"9347@Eutheria@clade",
 "17":"40674@Mammalia@class",
 "22":"117571@Euteleostomi@clade",
 "25":"7776@Gnathostomata@clade",
 "29":"33213@Bilateria@clade",
 "30":"6072@Eumetazoa@clade"}
```
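
One plausible way to read such an index is sketched below in Go. The lookup rule is an assumption made for this illustration, not something stated in the documentation: for a query with `d` differences, the sketch returns the annotation stored under the largest key that does not exceed `d`. The function name `taxonFor` is hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// taxonFor returns the annotation stored under the largest numeric key that
// is less than or equal to d. This rule is an assumption of the sketch.
func taxonFor(index map[string]string, d int) (string, bool) {
	keys := make([]int, 0, len(index))
	for k := range index {
		n, err := strconv.Atoi(k)
		if err != nil {
			continue // ignore keys that are not integers
		}
		keys = append(keys, n)
	}
	sort.Ints(keys)

	best := -1
	for _, k := range keys {
		if k > d {
			break
		}
		best = k
	}
	if best < 0 {
		return "", false
	}
	return index[strconv.Itoa(best)], true
}

func main() {
	// A subset of the example index shown above.
	index := map[string]string{
		"0": "9606@Homo sapiens@species",
		"2": "207598@Homininae@subfamily",
		"3": "9604@Hominidae@family",
		"8": "314295@Hominoidea@superfamily",
	}
	for _, d := range []int{0, 1, 2, 5} {
		taxon, _ := taxonFor(index, d)
		fmt.Println(d, taxon)
	}
}
```

Under that assumption, a query with one difference still maps to the species level, while five differences map to the family level.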
|
||||
|
||||
###### Set by the *obirefidx* tool.
|
||||
**pairing_mismatches (`map[string]string`)**
|
||||
|
||||
It summarises which taxonomic annotation a match to that sequence must
lead to, according to the number of differences between the query
sequence and the reference sequence carrying that tag.
|
||||
- Set by the *obipairing* tool
|
||||
|
||||
###### Getter : method `Count()`
|
||||
**seq_a_single (`int`)**
|
||||
|
||||
##### `pairing_mismatches`
|
||||
- Set by the *obipairing* tool
|
||||
|
||||
###### Set by the *obipairing* tool
|
||||
**seq_ab_match (`int`)**
|
||||
|
||||
##### `score`
|
||||
- Set by the *obipairing* tool
|
||||
|
||||
###### Set by the *obipairing* tool
|
||||
**seq_b_single (`int`)**
|
||||
|
||||
##### `score_norm`
|
||||
- Set by the *obipairing* tool
|
||||
|
||||
###### Set by the *obipairing* tool
|
||||
**score (`int`)**
|
||||
|
||||
- Set by the *obipairing* tool
|
||||
|
||||
**score_norm (`float`)**
|
||||
|
||||
- Set by the *obipairing* tool
|
||||
- The value ranges between 0 and 1.
|
||||
|
||||
Score of the alignment between forward and reverse reads expressed as a fraction of identity.
|
||||
|
||||
|
@ -10,13 +10,39 @@
|
||||
|
||||
Sequences can be selected on several of their characteristics: their length, their id, their sequence. Options allow for specifying the selection conditions.
|
||||
|
||||
**Selection based on the sequence**
|
||||
|
||||
|
||||
Sequence records can be selected according to whether or not they match a pattern. The simplest pattern is a short sequence (*e.g.* `AACCTT`), but regular expressions allow searching for more complex patterns. For example, `A[TG]C+G` matches an `A`, followed by a `T` or a `G`, then one or more `C`, and finally a `G`.
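
The behaviour of the pattern `A[TG]C+G` can be checked with a short Go sketch using the standard `regexp` package. That *obigrep* relies on the same engine and the same case handling is an assumption here. The helper `matches` is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
)

// pattern is the example expression from the text: an A, then a T or a G,
// then one or more C, and finally a G.
var pattern = regexp.MustCompile(`A[TG]C+G`)

// matches reports whether the pattern occurs anywhere in the sequence.
func matches(seq string) bool {
	return pattern.MatchString(seq)
}

func main() {
	for _, seq := range []string{"AACCTT", "TTAGCCCGAA", "ATCG", "ATG"} {
		fmt.Println(seq, matches(seq))
	}
	// Prints:
	// AACCTT false
	// TTAGCCCGAA true
	// ATCG true
	// ATG false
}
```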
|
||||
|
||||
{{< include ../lib/options/selection/_sequence.qmd >}}
|
||||
|
||||
*Examples:*
|
||||
|
||||
: Selects only the sequence records that contain an *EcoRI* restriction site.
|
||||
|
||||
```bash
obigrep -s 'GAATTC' seq1.fasta > seq2.fasta
```
|
||||
|
||||
: Selects only the sequence records that contain a stretch of at least 10 ``A``.
|
||||
|
||||
```bash
obigrep -s 'A{10,}' seq1.fasta > seq2.fasta
```
|
||||
|
||||
: Selects only the sequence records that do not contain ambiguous nucleotides.
|
||||
|
||||
```bash
obigrep -s '^[ACGT]+$' seq1.fasta > seq2.fasta
```
|
||||
|
||||
|
||||
{{< include ../lib/options/selection/_min-count.qmd >}}
|
||||
|
||||
{{< include ../lib/options/selection/_max-count.qmd >}}
|
||||
|
||||
Example
|
||||
*Examples*
|
||||
|
||||
: Selecting sequence records representing at least five reads in the dataset.
|
||||
|
||||
|
@ -11,26 +11,64 @@ Several OBITools (*e.g.* obigrep, obiannotate) allow the user to specify some si
|
||||
|
||||
### Introspection functions {.unnumbered}
|
||||
|
||||
- `len(x)` is a generic function that retrieves the size of an object. It returns
|
||||
**`len(x)`**
|
||||
|
||||
: It is a generic function that retrieves the size of an object. It returns
|
||||
the length of a sequence, the number of elements in a map like `annotations`, or the number
|
||||
of elements in an array. The returned value is an `int`.
|
||||
|
||||
### Cast functions {.unnumbered}
|
||||
|
||||
- `int(x)` converts if possible the `x` value to an integer value. The function
|
||||
**`int(x)`**
|
||||
|
||||
: Converts if possible the `x` value to an integer value. The function
|
||||
returns an `int`.
|
||||
- `numeric(x)` converts if possible the `x` value to a float value. The function
|
||||
|
||||
**`numeric(x)`**
|
||||
|
||||
: Converts if possible the `x` value to a float value. The function
|
||||
returns a `float`.
|
||||
- `bool(x)` converts if possible the `x` value to a boolean value. The function
|
||||
|
||||
**`bool(x)`**
|
||||
|
||||
: Converts if possible the `x` value to a boolean value. The function
|
||||
returns a `bool`.
|
||||
|
||||
### String related functions {.unnumbered}
|
||||
|
||||
- `printf(format,...)` combines several values to build a string. `format` follows the
|
||||
**`printf(format,...)`**
|
||||
|
||||
: Combines several values to build a string. `format` follows the
|
||||
classical C `printf` syntax. The function returns a `string`.
|
||||
- `subspc(x)` substitutes every space in the `x` string with the underscore (`_`) character. The function
|
||||
|
||||
**`subspc(x)`**
|
||||
|
||||
: Substitutes every space in the `x` string with the underscore (`_`) character. The function
|
||||
returns a `string`.
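
The two string functions above behave much like their Go counterparts. The following sketch is an illustration only: the Go function `subspc` is a hypothetical re-implementation of the documented behaviour, and `fmt.Sprintf` stands in for `printf(format,...)`.

```go
package main

import (
	"fmt"
	"strings"
)

// subspc mimics the documented behaviour: every space in x is replaced by
// an underscore.
func subspc(x string) string {
	return strings.ReplaceAll(x, " ", "_")
}

func main() {
	// Sprintf follows the classical C printf syntax and returns a string.
	label := fmt.Sprintf("%s_%03d", subspc("Homo sapiens"), 7)
	fmt.Println(label) // Homo_sapiens_007
}
```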
|
||||
|
||||
### Condition function {.unnumbered}
|
||||
|
||||
**`ifelse(condition,val1,val2)`**
|
||||
|
||||
: The `condition` value has to be a `bool` value. If it is `true` the function returns `val1`,
|
||||
otherwise, it returns `val2`.
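
The semantics of `ifelse` can be sketched with a small generic Go function. This is an illustration, not the OBITools4 implementation. Note that in this sketch both `val1` and `val2` are evaluated before the call, whichever one is returned.

```go
package main

import "fmt"

// ifelse returns val1 when condition is true and val2 otherwise.
func ifelse[T any](condition bool, val1, val2 T) T {
	if condition {
		return val1
	}
	return val2
}

func main() {
	count := 7 // hypothetical value of a count attribute
	fmt.Println(ifelse(count >= 5, "frequent", "rare")) // frequent
}
```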
|
||||
|
||||
### Sequence analysis related functions
|
||||
|
||||
**`composition(sequence)`**
|
||||
|
||||
: The nucleotide composition of the sequence is returned as a map indexed by `a`, `c`, `g`, or `t`, where
each value is the number of occurrences of that nucleotide. A fifth key, `others`, accounts for
all other symbols.
|
||||
|
||||
**`gcskew(sequence)`**
|
||||
|
||||
: Computes the excess of G compared to C in the sequence, known as the GC skew.
|
||||
|
||||
$$
Skew_{GC}=\frac{G-C}{G+C}
$$
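
The two functions above can be sketched in Go as follows. This is a minimal re-implementation based on the descriptions and the formula, not the OBITools4 code. Lower-casing the input and returning 0 when the sequence contains neither G nor C are assumptions of the sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// composition counts the occurrences of a, c, g and t; every other symbol
// is counted under the key "others".
func composition(sequence string) map[string]int {
	comp := map[string]int{"a": 0, "c": 0, "g": 0, "t": 0, "others": 0}
	for _, r := range strings.ToLower(sequence) {
		switch r {
		case 'a', 'c', 'g', 't':
			comp[string(r)]++
		default:
			comp["others"]++
		}
	}
	return comp
}

// gcskew computes (G - C) / (G + C). Returning 0 when G + C is zero is an
// assumption made to avoid a division by zero.
func gcskew(sequence string) float64 {
	comp := composition(sequence)
	g, c := float64(comp["g"]), float64(comp["c"])
	if g+c == 0 {
		return 0
	}
	return (g - c) / (g + c)
}

func main() {
	fmt.Println(composition("aacgggtn")) // map[a:2 c:1 g:3 others:1 t:1]
	fmt.Println(gcskew("gggc"))          // 0.5
}
```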
|
||||
|
||||
## Accessing the sequence annotations
|
||||
|
||||
The `annotations` variable is a map object containing all the annotations associated with the currently processed sequence. The keys of the map are the attribute names. There are two possibilities to retrieve
|
||||
@ -53,4 +91,7 @@ Special attributes of the sequence are accessible only by dedicated methods of t
|
||||
- The sequence identifier : `Id()`
|
||||
- The sequence definition : `Definition()`
|
||||
|
||||
```go
sequence.Id()
```
|
||||
|
||||
|