# The Go *OBITools* library

## BioSequence

The `BioSequence` type is used to represent biological sequences. It
stores:

- the sequence itself as a `[]byte`
- the sequencing quality scores as a `[]byte`, if needed
- an identifier as a `string`
- a definition as a `string`
- a set of *(key, value)* pairs in a `map[string]interface{}`

`BioSequence` is defined in the `obiseq` package, which is imported
with the following code:

``` go
import (
	"git.metabarcoding.org/lecasofts/go/obitools/pkg/obiseq"
)
```

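
To make the list of stored elements more concrete, here is a minimal sketch of a struct holding the same five elements. The type `bioSequenceSketch` and its field names are hypothetical and are not the actual `obiseq` definition; only the element types come from the description above.

``` go
package main

import "fmt"

// bioSequenceSketch is a hypothetical stand-in, NOT the obiseq definition.
// It only mirrors the five elements listed in the text.
type bioSequenceSketch struct {
	id          string
	definition  string
	sequence    []byte
	qualities   []byte                 // optional: nil when there are no quality scores
	annotations map[string]interface{} // free-form (key, value) pairs
}

func main() {
	s := bioSequenceSketch{
		id:          "seq_GH0001",
		definition:  "A short test sequence",
		sequence:    []byte("ACGTGTCAGTCG"),
		annotations: map[string]interface{}{"count": 1},
	}

	// Values stored under interface{} keep their dynamic type.
	fmt.Println(s.id, len(s.sequence), s.qualities == nil, s.annotations["count"])
}
```

Because the annotation map stores `interface{}` values, code reading an annotation has to use a type assertion (for example `.(int)`) to get a typed value back.
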
### Creating new instances

To create a new instance, use one of:

- `MakeBioSequence(id string, sequence []byte, definition string) obiseq.BioSequence`
- `NewBioSequence(id string, sequence []byte, definition string) *obiseq.BioSequence`

Both create a `BioSequence` instance: the first returns the instance
itself, whereas the second returns a pointer to the new instance. Two
other functions, `MakeEmptyBioSequence` and `NewEmptyBioSequence`, do
the same job but provide an uninitialized object.

- The `id` parameter corresponds to the unique identifier of the
  sequence. It must be a string consisting of a single word (not
  containing any space).
- `sequence` is the DNA sequence itself, provided as a byte slice
  (`[]byte`).
- `definition` is a `string`, potentially empty, but usually containing
  a sentence explaining what the sequence is.

``` go
package main

import (
	"git.metabarcoding.org/lecasofts/go/obitools/pkg/obiseq"
)

func main() {
	// NewBioSequence returns a pointer to the new instance.
	myseq := obiseq.NewBioSequence(
		"seq_GH0001",
		[]byte("ACGTGTCAGTCG"), // convert the string literal to a []byte
		"A short test sequence",
	)

	_ = myseq // Go rejects unused variables; real code would use myseq here
}
```

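
The difference between a `Make...` function returning the instance and a `New...` function returning a pointer can be illustrated with a small stand-in type. This is only a sketch: `record`, `makeRecord`, `newRecord` and `rename` are hypothetical names used for illustration and are not part of `obiseq`.

``` go
package main

import "fmt"

// record is a hypothetical stand-in type; it is not obiseq.BioSequence.
type record struct {
	id string
}

// makeRecord returns the instance itself (a value), like a Make* function.
func makeRecord(id string) record { return record{id: id} }

// newRecord returns a pointer to the new instance, like a New* function.
func newRecord(id string) *record { return &record{id: id} }

// rename changes the id through a pointer, so the caller sees the change.
func rename(r *record, id string) { r.id = id }

func main() {
	v := makeRecord("seq_A")
	c := v // assigning a value copies the struct
	c.id = "seq_B"
	fmt.Println(v.id, c.id) // the original value is unchanged

	p := newRecord("seq_A")
	q := p // assigning a pointer shares the same instance
	rename(q, "seq_B")
	fmt.Println(p.id, q.id) // both names see the new id
}
```

Note that in Go, copying a struct value copies slice and map headers but not the data they refer to, so a copied value holding a `[]byte` or a map still shares that underlying storage with the original.
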
When formatted as FASTA, the parameters correspond to the following
schema:

    >id definition containing potentially several words
    sequence

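
As an illustration of this schema, the following sketch builds a FASTA record from the three elements using only the standard library. The helper `formatFasta` is a hypothetical name introduced here; it is not an `obiseq` function and it does not wrap long sequences over several lines.

``` go
package main

import (
	"fmt"
	"strings"
)

// formatFasta is a hypothetical helper (not part of obiseq) that lays out
// an identifier, a definition and a sequence following the schema above.
func formatFasta(id, definition string, sequence []byte) string {
	var b strings.Builder
	b.WriteString(">")
	b.WriteString(id)
	if definition != "" {
		// The definition is separated from the id by a single space,
		// which is why the id itself must not contain any space.
		b.WriteString(" ")
		b.WriteString(definition)
	}
	b.WriteString("\n")
	b.Write(sequence)
	b.WriteString("\n")
	return b.String()
}

func main() {
	fmt.Print(formatFasta(
		"seq_GH0001",
		"A short test sequence",
		[]byte("ACGTGTCAGTCG"),
	))
}
```
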
### End of life of a `BioSequence` instance

When a `BioSequence` instance is no longer used, it is normally
reclaimed by the Go garbage collector. Optionally, you can call the
`Recycle` method on the instance to store its allocated memory in a
`pool`, which limits allocation effort when many sequences are
manipulated.

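
The general idea of recycling memory through a pool can be sketched with the standard library's `sync.Pool`. This is an assumption made for illustration: the text does not say how the `obiseq` pool is implemented, and `bufferPool`, `getBuffer` and `recycleBuffer` are hypothetical names.

``` go
package main

import (
	"fmt"
	"sync"
)

// bufferPool is a hypothetical pool of byte slices. How obiseq implements
// its own pool is not described in this document.
var bufferPool = sync.Pool{
	New: func() interface{} {
		b := make([]byte, 0, 64)
		return &b
	},
}

// getBuffer returns an empty buffer, reusing a recycled one when available.
func getBuffer() *[]byte {
	b := bufferPool.Get().(*[]byte)
	*b = (*b)[:0] // keep the capacity, drop the old content
	return b
}

// recycleBuffer hands the buffer back to the pool.
// The caller must not use the buffer after this call.
func recycleBuffer(b *[]byte) {
	bufferPool.Put(b)
}

func main() {
	buf := getBuffer()
	*buf = append(*buf, "ACGTGTCAGTCG"...)
	fmt.Println(string(*buf))
	recycleBuffer(buf)

	// The pool may or may not return the same memory; either way the
	// buffer handed out is empty.
	again := getBuffer()
	fmt.Println(len(*again))
}
```

The same rule applies to any recycling scheme: once an object has been handed back, the code that recycled it should drop every reference to it.
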
### Accessing the elements of a sequence

The different elements of an `obiseq.BioSequence` must be accessed using
a set of methods. For the three main elements provided when a new
instance is created, the methods are:

- `Id() string`
- `Sequence() []byte`
- `Definition() string`

Matching setter methods change the value of these elements:

- `SetId(id string)`
- `SetSequence(sequence []byte)`
- `SetDefinition(definition string)`

``` go
package main

import (
	"fmt"

	"git.metabarcoding.org/lecasofts/go/obitools/pkg/obiseq"
)

func main() {
	myseq := obiseq.NewBioSequence(
		"seq_GH0001",
		[]byte("ACGTGTCAGTCG"),
		"A short test sequence",
	)

	fmt.Println(myseq.Id()) // identifier given at creation
	myseq.SetId("SPE01_0001")
	fmt.Println(myseq.Id()) // identifier after the update
}
```

#### Different ways of accessing and editing the sequence

While `Sequence()` and `SetSequence(sequence []byte)` are the basic
methods, several others exist.

- `String() string` returns the sequence directly converted to a
  `string`.
- The `Write` method family allows for extending an existing sequence
  following the buffer protocol.
    - `Write(data []byte) (int, error)` appends a byte slice to the
      3' end of the sequence.
    - `WriteString(data string) (int, error)` appends a `string`.
    - `WriteByte(data byte) error` appends a single `byte`.

The `Clear` method empties the sequence buffer.

``` go
package main

import (
	"fmt"

	"git.metabarcoding.org/lecasofts/go/obitools/pkg/obiseq"
)

func main() {
	// Start from an empty sequence and extend it step by step.
	myseq := obiseq.NewEmptyBioSequence()

	myseq.WriteString("accc") // append a string
	myseq.WriteByte('c')      // append a single byte
	fmt.Println(myseq.String())
}
```

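
The method signatures listed above are the same as those of the standard `io.Writer`, `io.StringWriter` and `io.ByteWriter` interfaces. The sketch below shows what that makes possible, using a hypothetical stand-in type `seqBuffer` rather than the real `obiseq` type: any value with these methods can be passed to standard functions such as `fmt.Fprintf`. Whether `BioSequence` itself is meant to be used this way is an assumption.

``` go
package main

import (
	"fmt"
	"io"
)

// seqBuffer is a hypothetical stand-in for a sequence buffer;
// it is not the obiseq type.
type seqBuffer struct {
	data []byte
}

// Write appends a byte slice at the end of the buffer.
func (s *seqBuffer) Write(p []byte) (int, error) {
	s.data = append(s.data, p...)
	return len(p), nil
}

// WriteString appends a string at the end of the buffer.
func (s *seqBuffer) WriteString(str string) (int, error) {
	s.data = append(s.data, str...)
	return len(str), nil
}

// WriteByte appends a single byte at the end of the buffer.
func (s *seqBuffer) WriteByte(c byte) error {
	s.data = append(s.data, c)
	return nil
}

// Clear empties the buffer while keeping its allocated memory.
func (s *seqBuffer) Clear() { s.data = s.data[:0] }

// String returns the buffer content as a string.
func (s *seqBuffer) String() string { return string(s.data) }

// Compile-time checks that seqBuffer satisfies the standard interfaces.
var (
	_ io.Writer       = (*seqBuffer)(nil)
	_ io.StringWriter = (*seqBuffer)(nil)
	_ io.ByteWriter   = (*seqBuffer)(nil)
)

func main() {
	s := &seqBuffer{}
	fmt.Fprintf(s, "%s", "accc") // works because s is an io.Writer
	io.WriteString(s, "gt")      // uses WriteString when available
	s.WriteByte('a')
	fmt.Println(s.String())

	s.Clear()
	fmt.Println(len(s.String()))
}
```
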
#### Sequence quality scores

Sequence quality scores cannot be initialized at the time of instance
creation. You must use dedicated methods to add quality scores to a
sequence.

To be coherent, the DNA sequence and the quality score sequence must
have the same length. However, no check of this constraint is
performed: it is the programmer's responsibility to maintain that
invariant.

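
Since the length constraint is left to the programmer, a small check such as the following can be run before writing or processing a sequence. This is a minimal sketch: `checkQualityLength` is a hypothetical helper, not an `obiseq` function, and treating an empty quality slice as "no quality scores" is an assumption.

``` go
package main

import "fmt"

// checkQualityLength is a hypothetical helper, not part of obiseq.
// It reports an error when quality scores are present but their number
// differs from the number of bases.
func checkQualityLength(sequence, qualities []byte) error {
	if len(qualities) == 0 {
		return nil // assumption: no quality scores attached is valid
	}
	if len(sequence) != len(qualities) {
		return fmt.Errorf(
			"sequence has %d bases but %d quality scores",
			len(sequence), len(qualities),
		)
	}
	return nil
}

func main() {
	seq := []byte("ACGT")

	fmt.Println(checkQualityLength(seq, []byte{30, 30, 30, 30})) // lengths match
	fmt.Println(checkQualityLength(seq, []byte{30, 30}))         // lengths differ
}
```

With a real instance, the two arguments would come from the `Sequence()` and `Quality()` accessors described in this document.
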
Quality scores are read with the `Quality() []byte` method. To set
them, call one of the following methods, which behave like their
sequence counterparts:

- `SetQualities(qualities Quality)`
- `WriteQualities(data []byte) (int, error)`
- `WriteByteQualities(data byte) error`

Analogously to the `Clear` method, `ClearQualities()` empties the
sequence of quality scores.