obitools4/autodoc/docmd/pkg/obiseq/language.md
Eric Coissac 8c7017a99d ⬆️ version bump to v4.5
- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5"
- Update version.txt from 4.29 → .30
(automated by Makefile)
2026-04-13 13:34:53 +02:00

# Semantic Description of `obiseq` Language Extensions
The `obiseq` package extends the [Gval](https://github.com/PaesslerAG/gval) expression language with domain-specific functions for bioinformatics and data processing. It relies on helpers from `obiutils` so that these functions accept flexible input types when operating on sequences and collections.
## Core Functionalities
- **Data Inspection**:
`len`, `ismap`, `isvector` — retrieve size and type information.
- **Aggregation & Comparison**:
`min`, `max` — compute extremal values in slices/maps (via `obiutils.Min/Max`).
*(Note: commented-out helper functions suggest prior attempts at manual implementations.)*
- **Type Conversion**:
`int`, `numeric` (→ float64), `bool`, `string` — safely coerce arbitrary inputs to target types; fail with fatal logs on invalid data.
- **String Manipulation**:
`sprintf`, `subspc` (replace spaces with underscores), `replace` (regex-based substitution), and `substr` — support formatting, normalization, and slicing.
- **Sequence Analysis (Bioinformatics)**:
`gc`, `gcskew`, and `composition` — compute nucleotide composition metrics for DNA/RNA sequences (`BioSequence`).
  - `gc`: GC content ratio, excluding ambiguous bases (those tallied under `'o'`)
  - `gcskew`: `(G - C)/(G + C)` asymmetry measure
  - `composition`: returns a map of base counts (e.g., `"a":20.0`, `"g":15.0`)
- **Element Access**:
`elementof(seq, idx)` — retrieves item at index/key for slices (`[]interface{}`), maps (`map[string]interface{}`), or strings (by byte position).
- **Control Flow**:
`ifelse(cond, then_val, else_val)` — conditional branching within expressions.
- **Quality Support**:
`qualities(seq)` — extracts per-base quality scores as a float slice from sequencing reads.
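
The sequence metrics listed above can be illustrated with a minimal, standard-library-only sketch. It is not the package's implementation: it works on a plain `string` rather than a `BioSequence`, and the function names, the lowercase keys, and the choice to lump every symbol other than `a`, `c`, `g`, `t` (including `u`) under `'o'` are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// composition counts bases case-insensitively.
// Assumption of this sketch: any symbol other than a, c, g, t is counted under 'o'.
func composition(seq string) map[byte]int {
	counts := map[byte]int{'a': 0, 'c': 0, 'g': 0, 't': 0, 'o': 0}
	for _, b := range []byte(strings.ToLower(seq)) {
		switch b {
		case 'a', 'c', 'g', 't':
			counts[b]++
		default:
			counts['o']++
		}
	}
	return counts
}

// gc returns (G + C) / (length - other), so ambiguous symbols do not
// dilute the ratio. A sequence with no unambiguous base yields NaN.
func gc(seq string) float64 {
	c := composition(seq)
	return float64(c['g']+c['c']) / float64(len(seq)-c['o'])
}

// gcskew returns (G - C) / (G + C). A sequence with no G or C yields NaN.
func gcskew(seq string) float64 {
	c := composition(seq)
	return float64(c['g']-c['c']) / float64(c['g']+c['c'])
}

func main() {
	seq := "GGGCATNN" // 3 G, 1 C, 1 A, 1 T, 2 ambiguous
	fmt.Printf("gc=%.2f gcskew=%.2f\n", gc(seq), gcskew(seq))
	// prints: gc=0.67 gcskew=0.50
}
```

Subtracting the `'o'` count from the denominator is what keeps the two `N` symbols from lowering the GC ratio: 4 of the 6 unambiguous bases are G or C.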
## Design Principles
- **Dynamic Typing**: Accepts `...interface{}` arguments for flexibility.
- **Error Handling**: Uses fatal logging (`log.Fatalf`) on conversion failures; returns typed errors for runtime issues.
- **Extensibility**: Built atop `gval.Language`, enabling custom expression evaluation in pipelines (e.g., filtering reads via GC thresholds).
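
The dynamic-typing and error-handling principles can be sketched with two hypothetical stand-ins for `elementof` and `ifelse`. The names `exprFunc`, `elementOf`, `ifElse`, and `toIndex` are invented for this example, and the behaviour on bad input (returning an error rather than logging fatally) is an assumption, not a description of the package. The sketch uses only the standard library and shows the variadic `...interface{}` shape with a type switch over the supported containers.

```go
package main

import (
	"errors"
	"fmt"
)

// exprFunc is the variadic shape described above: any arguments in,
// one value or an error out.
type exprFunc func(args ...interface{}) (interface{}, error)

// toIndex accepts an int or an integral float64, since an expression
// engine may hand numeric literals over as floats (assumption).
func toIndex(v interface{}) (int, bool) {
	switch n := v.(type) {
	case int:
		return n, true
	case float64:
		if n == float64(int(n)) {
			return int(n), true
		}
	}
	return 0, false
}

// elementOf is a hypothetical stand-in for `elementof(seq, idx)`.
func elementOf(args ...interface{}) (interface{}, error) {
	if len(args) != 2 {
		return nil, errors.New("elementof: expected 2 arguments")
	}
	switch container := args[0].(type) {
	case []interface{}:
		i, ok := toIndex(args[1])
		if !ok || i < 0 || i >= len(container) {
			return nil, fmt.Errorf("elementof: bad slice index %v", args[1])
		}
		return container[i], nil
	case map[string]interface{}:
		key, ok := args[1].(string)
		if !ok {
			return nil, fmt.Errorf("elementof: map key must be a string, got %T", args[1])
		}
		value, found := container[key]
		if !found {
			return nil, fmt.Errorf("elementof: no key %q", key)
		}
		return value, nil
	case string:
		i, ok := toIndex(args[1])
		if !ok || i < 0 || i >= len(container) {
			return nil, fmt.Errorf("elementof: bad string index %v", args[1])
		}
		return string(container[i]), nil // byte position, not rune position
	default:
		return nil, fmt.Errorf("elementof: unsupported container type %T", args[0])
	}
}

// ifElse is a hypothetical stand-in for `ifelse(cond, then_val, else_val)`.
func ifElse(args ...interface{}) (interface{}, error) {
	if len(args) != 3 {
		return nil, errors.New("ifelse: expected 3 arguments")
	}
	cond, ok := args[0].(bool)
	if !ok {
		return nil, fmt.Errorf("ifelse: condition must be a bool, got %T", args[0])
	}
	if cond {
		return args[1], nil
	}
	return args[2], nil
}

func main() {
	funcs := map[string]exprFunc{"elementof": elementOf, "ifelse": ifElse}
	v, err := funcs["elementof"]([]interface{}{"a", "b", "c"}, 1)
	fmt.Println(v, err) // prints: b <nil>
	v, err = funcs["ifelse"](false, "high", "low")
	fmt.Println(v, err) // prints: low <nil>
}
```

Because `ifElse` is an ordinary function, both branch values are evaluated before the call; it selects a value rather than skipping evaluation of the unused branch.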
This package serves as a bridge between high-level scripting and low-level biosequence computation, ideal for rule-based filtering or annotation in NGS workflows.