
# The *OBITools V4* commands

## Specifying the input files to *OBITools* commands

## Options common to most of the *OBITools* commands

### Specifying input format

Five sequence formats are accepted for input files. *Fasta* (@sec-fasta) and *Fastq* (@sec-fastq) are the main ones; EMBL and Genbank allow the use of the flat files produced by these two international databases. The last one, ecoPCR, is maintained for compatibility with the previous *OBITools* and allows *ecoPCR* outputs to be read as sequence files.

-   `--ecopcr` Read data following the *ecoPCR* output format.
-   `--embl` Read data following the *EMBL* flatfile format.
-   `--genbank` Read data following the *Genbank* flatfile format.

Several encoding schemes have been proposed for quality scores in *Fastq* format. Currently, *OBITools* considers Sanger encoding as the standard. For reasons of compatibility with older datasets produced with *Solexa* sequencers, it is possible, by using the following option, to force the use of the corresponding quality encoding scheme when reading these older files.

-   `--solexa` Decodes the quality string according to the Solexa specification. (default: false)
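The two encodings differ in how a quality character maps to a score. As an illustration only (this is not the OBITools code), the Go sketch below decodes one character of each kind: Sanger stores the Phred score with an ASCII offset of 33, while Solexa stores its own score with an offset of 64, which converts to Phred as 10·log10(10^(Q/10) + 1). The function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// sangerToPhred decodes one Sanger-encoded (Phred+33) quality character.
func sangerToPhred(c byte) int {
	return int(c) - 33
}

// solexaToPhred decodes one Solexa-encoded quality character (offset 64)
// and converts the Solexa score to the Phred scale.
func solexaToPhred(c byte) int {
	qSolexa := float64(int(c) - 64)
	// Phred = 10 * log10(10^(Qsolexa/10) + 1)
	phred := 10 * math.Log10(math.Pow(10, qSolexa/10)+1)
	return int(math.Round(phred))
}

func main() {
	fmt.Println(sangerToPhred('I')) // 'I' = 73, prints 40
	fmt.Println(solexaToPhred('h')) // 'h' = 104, Solexa 40, prints 40
	fmt.Println(solexaToPhred(';')) // ';' = 59, Solexa -5, prints 1
}
```

The two scales agree for high qualities and diverge for low ones, which is why reading old Solexa files with the Sanger decoder gives wrong scores rather than an error.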

### Specifying output format

Only two output sequence formats are supported by OBITools: Fasta and Fastq. Fastq is used when output sequences carry quality information; otherwise, Fasta is the default format. However, the output format can be forced with one of the following two options. Forcing Fasta discards the quality information. Conversely, when Fastq is forced for sequences without quality data, a dummy quality of 40 is assigned to each nucleotide.

-   `--fasta-output` Write the output sequences in Fasta format.
-   `--fastq-output` Write the output sequences in Fastq format.
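To make the dummy-quality rule concrete, here is a minimal Go sketch of the two record layouts, assuming Sanger (Phred+33) encoding on output, so that a quality of 40 is written as the character `I`. It ignores line wrapping and annotations, and `formatFastq`/`formatFasta` are hypothetical helpers, not OBITools functions.

```go
package main

import (
	"fmt"
	"strings"
)

// formatFastq renders one Fastq record. When qualities is nil, every base
// receives the dummy Phred quality 40, encoded with offset 33 (40+33 = 'I').
func formatFastq(id, seq string, qualities []int) string {
	var qual strings.Builder
	for i := 0; i < len(seq); i++ {
		q := 40
		if qualities != nil {
			q = qualities[i]
		}
		qual.WriteByte(byte(q + 33))
	}
	return fmt.Sprintf("@%s\n%s\n+\n%s\n", id, seq, qual.String())
}

// formatFasta renders one Fasta record; quality information is dropped.
func formatFasta(id, seq string) string {
	return fmt.Sprintf(">%s\n%s\n", id, seq)
}

func main() {
	fmt.Print(formatFastq("seq1", "ACGT", nil)) // quality line: IIII
	fmt.Print(formatFasta("seq1", "ACGT"))
}
```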

OBITools allows multiple input files to be specified for a single command.

-   `--no-order` When several input files are provided, indicates that there is no order among them. Using this option can substantially speed up data processing. (default: false)

### The Fasta and Fastq annotations format

OBITools extends the [Fasta](#the-fasta-sequence-format) and [Fastq](#the-fastq-sequence-format) formats with a format for their title lines that allows every sequence to be annotated. While the previous version of OBITools used an *ad-hoc* format for these annotations, this new version uses the standard JSON format to store them.

On input, OBITools automatically recognizes the format of the annotations, but two options allow you to force parsing with one of them. You should normally not need these options.

-   `--input-OBI-header` FASTA/FASTQ title line annotations follow the OBI format. (default: false)

-   `--input-json-header` FASTA/FASTQ title line annotations follow the JSON format. (default: false)

On output, annotations are formatted by default using the new JSON format. For compatibility with the previous version of OBITools and with external scripts and software, the previous OBITools format can be forced.

-   `--output-OBI-header|-O` Output FASTA/FASTQ title line annotations follow the OBI format. (default: false)

-   `--output-json-header` Output FASTA/FASTQ title line annotations follow the JSON format. (default: false)
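The following Go sketch contrasts the two annotation styles for the same sequence. It assumes a title line laid out as identifier, annotations, then definition, with the JSON style storing the annotations as one JSON object and the older OBI style as `key=value;` pairs; the exact layout written by OBITools may differ, and both helper names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// jsonHeader builds a title line whose annotations are a JSON object.
// Assumed layout: ">id {json} definition".
func jsonHeader(id string, annotations map[string]interface{}, definition string) (string, error) {
	js, err := json.Marshal(annotations) // map keys are emitted in sorted order
	if err != nil {
		return "", err
	}
	return fmt.Sprintf(">%s %s %s", id, js, definition), nil
}

// obiHeader builds a title line in the older "key=value;" style.
// Assumed layout: ">id key1=value1; key2=value2; definition".
func obiHeader(id string, annotations map[string]interface{}, definition string) string {
	keys := make([]string, 0, len(annotations))
	for k := range annotations {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic order for the example
	var b strings.Builder
	fmt.Fprintf(&b, ">%s", id)
	for _, k := range keys {
		fmt.Fprintf(&b, " %s=%v;", k, annotations[k])
	}
	fmt.Fprintf(&b, " %s", definition)
	return b.String()
}

func main() {
	ann := map[string]interface{}{"count": 2, "direction": "forward"}
	h, _ := jsonHeader("seq1", ann, "a read")
	fmt.Println(h)                                // >seq1 {"count":2,"direction":"forward"} a read
	fmt.Println(obiHeader("seq1", ann, "a read")) // >seq1 count=2; direction=forward; a read
}
```

One advantage of JSON is visible here: values keep their type (the string `"forward"` is quoted, the number is not), whereas the `key=value;` style leaves type recovery to the parser.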

### System related options

-   `--debug` (default: false)
-   `--help|-h|-?` Displays the help message. (default: false)
-   `--max-cpu <int>` Number of parallel threads computing the result (default: 10)
-   `--workers|-w <int>` Number of parallel threads computing the result (default: 9)

## OBITools expression language

Several OBITools commands (*e.g.* obigrep, obiannotate) allow the user to specify simple expressions to compute values or define predicates. These expressions are parsed and evaluated using the [gval](https://pkg.go.dev/github.com/PaesslerAG/gval "Gval (Go eVALuate) for evaluating arbitrary expressions Go-like expressions.") Go package, which evaluates Go-like expressions.

### Variables usable in the expression

- `sequence` is the sequence object on which the expression is evaluated.
- `annotations` is a map object containing all the annotations associated with the currently processed sequence.

### Functions defined in the language

#### Introspection functions {.unnumbered}

- `len(x)` is a generic function that retrieves the size of an object. It returns
  the length of a sequence, the number of elements in a map such as `annotations`, or the number
  of elements in an array. The returned value is an `int`.
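The generic behaviour of `len(x)` can be sketched in plain Go with reflection. In this illustration a Go string stands in for a sequence (OBITools uses a dedicated sequence object), and `genericLen` is a hypothetical name, not the function registered in the expression language.

```go
package main

import (
	"fmt"
	"reflect"
)

// genericLen returns the length of a string, the number of entries in a
// map, or the number of elements in a slice or array; other types fail.
func genericLen(x interface{}) (int, error) {
	v := reflect.ValueOf(x)
	switch v.Kind() {
	case reflect.String, reflect.Map, reflect.Slice, reflect.Array:
		return v.Len(), nil
	default:
		return 0, fmt.Errorf("len: unsupported type %T", x)
	}
}

func main() {
	annotations := map[string]interface{}{"count": 2, "direction": "forward"}
	n1, _ := genericLen("ACGTACGT")
	n2, _ := genericLen(annotations)
	n3, _ := genericLen([]int{1, 2, 3})
	fmt.Println(n1, n2, n3) // 8 2 3
}
```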

#### Cast functions {.unnumbered}

- `int(x)` converts the `x` value to an integer value, if possible. The function
  returns an `int`.
- `numeric(x)` converts the `x` value to a float value, if possible. The function
  returns a `float`.
- `bool(x)` converts the `x` value to a boolean value, if possible. The function
  returns a `bool`.
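What "if possible" can mean is illustrated by the Go sketch below, which converts values arriving either as numbers or as strings using the standard `strconv` package. The set of accepted input types, the truncation of floats by `int`, and the error returned on failure are all assumptions of this sketch, not documented OBITools behaviour; the helper names are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
)

// toInt sketches int(x): integers pass through, floats are truncated
// (an assumption), strings are parsed.
func toInt(x interface{}) (int, error) {
	switch v := x.(type) {
	case int:
		return v, nil
	case float64:
		return int(v), nil
	case string:
		return strconv.Atoi(v)
	}
	return 0, fmt.Errorf("int: cannot convert %T", x)
}

// toNumeric sketches numeric(x): the result is always a float64.
func toNumeric(x interface{}) (float64, error) {
	switch v := x.(type) {
	case int:
		return float64(v), nil
	case float64:
		return v, nil
	case string:
		return strconv.ParseFloat(v, 64)
	}
	return 0, fmt.Errorf("numeric: cannot convert %T", x)
}

// toBool sketches bool(x): strings such as "true", "1" or "F" are parsed.
func toBool(x interface{}) (bool, error) {
	switch v := x.(type) {
	case bool:
		return v, nil
	case string:
		return strconv.ParseBool(v)
	}
	return false, fmt.Errorf("bool: cannot convert %T", x)
}

func main() {
	i, _ := toInt("42")
	f, _ := toNumeric("3.5")
	b, _ := toBool("true")
	fmt.Println(i, f, b) // 42 3.5 true
}
```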

#### String related functions {.unnumbered}

- `printf(format,...)` combines several values to build a string. `format` follows the
  classical C `printf` syntax. The function returns a `string`.
- `subspc(x)` substitutes every space in the `x` string with the underscore (`_`) character. The function
  returns a `string`.

### Accessing to the sequence annotations

The `annotations` variable is a map object containing all the annotations associated with the currently processed sequence. The keys of the map are the attribute names. There are two ways to retrieve
an annotation. The first is the classical `[]` indexing operator, with the attribute name
enclosed in double quotes between the brackets.

```go
annotations["direction"]
```

The above code retrieves the `direction` annotation. A second notation, using the dot (`.`), is often
more convenient.

```go
annotations.direction
```
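For readers who work on the Go side, the map semantics can be sketched as follows, assuming the annotations of a JSON title line are decoded into a `map[string]interface{}`; this is an illustration with made-up values, not the internal OBITools type. Two details of plain Go maps are worth noting: JSON numbers decode as `float64`, and a missing attribute is best detected with the comma-ok idiom. The helper `decodeAnnotations` is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeAnnotations turns a JSON annotation object into a Go map whose
// keys are the attribute names.
func decodeAnnotations(raw string) (map[string]interface{}, error) {
	var annotations map[string]interface{}
	err := json.Unmarshal([]byte(raw), &annotations)
	return annotations, err
}

func main() {
	annotations, err := decodeAnnotations(`{"count":2,"direction":"forward"}`)
	if err != nil {
		panic(err)
	}
	// Go counterpart of the expression annotations["direction"].
	fmt.Println(annotations["direction"]) // forward
	// JSON numbers are decoded as float64.
	fmt.Printf("%T\n", annotations["count"]) // float64
	// The comma-ok idiom distinguishes a missing attribute from a zero value.
	if _, ok := annotations["taxid"]; !ok {
		fmt.Println("no taxid annotation")
	}
}
```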

Special attributes of the sequence are accessible only through dedicated methods of the `sequence` object.

- The sequence identifier: `Id()`
- The sequence definition: `Definition()`


## Metabarcode design and quality assessment

### `obipcr`

> Replaces the `ecoPCR` program used with the original *OBITools*

## File format conversions

### `obiconvert`

## Sequence annotations

### `obiannotate` 

### `obitag` 

## Computations on sequences

{{< include _obipairing.qmd >}}

### `obimultiplex`

> Replaces the `ngsfilter` command of the original *OBITools*

### `obicomplement`

### `obiclean`

### `obiuniq`

## Sequence sampling and filtering 

### `obigrep` 

{{< include _utilities.qmd >}}