Files
obitools4/doc/book/expressions.qmd
Eric Coissac d88de15cdc Refactoring codes for removing buffer size options. An some other changes...
Former-commit-id: 10b57cc1a27446ade3c444217341e9651e89cdce
2023-03-07 11:12:13 +07:00

98 lines
3.0 KiB
Plaintext

# OBITools expression language
Several OBITools (*e.g.* obigrep, obiannotate) allow the user to specify some simple expressions to compute values or define predicates. This expressions are parsed and evaluated using the [gval](https://pkg.go.dev/github.com/PaesslerAG/gval "Gval (Go eVALuate) for evaluating arbitrary expressions Go-like expressions.") go package, which allows for evaluating go-Like expression.
## Variables usable in the expression
- `sequence` is the sequence object on which the expression is evaluated.
- `annotations`is a map object containing every annotations associated to the currently processed sequence.
## Function defined in the language
### Instrospection functions {.unnumbered}
**`len(x)`**
: It is a generic function allowing to retreive the size of a object. It returns
the length of a sequences, the number of element in a map like `annotations`, the number
of elements in an array. The reurned value is an `int`.
### Cast functions {.unnumbered}
**`int(x)`**
: Converts if possible the `x` value to an integer value. The function
returns an `int`.
**`numeric(x)`**
: Converts if possible the `x` value to a float value. The function
returns a `float`.
**`bool(x)`**
: Converts if possible the `x` value to a boolean value. The function
returns a `bool`.
### String related functions {.unnumbered}
**`printf(format,...)`**
: Allows to combine several values to build a string. `format` follows the
classical C `printf` syntax. The function returns a `string`.
**`subspc(x)`**
: substitutes every space in the `x` string by the underscore (`_`) character. The function
returns a `string`.
### Condition function {.unnumbered}
**`ifelse(condition,val1,val2)`**
: The `condition` value has to be a `bool` value. If it is `true` the function returns `val1`,
otherwise, it is returning `val2`.
### Sequence analysis related function
**`composition(sequence)`**
: The nucleotide composition of the sequence is returned as as map indexed by `a`, `c`, `g`, or `t` and
each value is the number of occurrences of that nucleotide. A fifth key `others` accounts for
all others symboles.
**`gcskew(sequence)`**
: Computes the excess of g compare to c of the sequence, known as the GC skew.
$$
Skew_{GC}=\frac{G-C}{G+C}
$$
## Accessing to the sequence annotations
The `annotations` variable is a map object containing all the annotations associated to the currently processed sequence. Index of the map are the attribute names. It exists to possibillities to retreive
an annotation. It is possible to use the classical `[]` indexing operator, putting the attribute name
quoted by double quotes between them.
```go
annotations["direction"]
```
The above code retreives the `direction` annotation. A second notation using the dot (`.`) is often
more convenient.
```go
annotations.direction
```
Special attributes of the sequence are accessible only by dedicated methods of the `sequence` object.
- The sequence identifier : `Id()`
- THe sequence definition : `Definition()`
```go
sequence.Id()
```