# Semantic Description of `obiseq` Language Extensions
The `obiseq` package extends the [Gval](https://github.com/PaesslerAG/gval) expression language with domain-specific functions for bioinformatics and data processing. It relies on utility helpers from `obiutils` so that these functions accept loosely typed inputs when operating on sequences and collections.
## Core Functionalities
- **Data Inspection**:
`len`, `ismap`, `isvector` — retrieve size and type information.
- **Aggregation & Comparison**:
`min`, `max` — compute extremal values in slices/maps (via `obiutils.Min/Max`).
*(Note: commented-out helper functions suggest prior attempts at manual implementations.)*
- **Type Conversion**:
`int`, `numeric` (→ float64), `bool`, `string` — safely coerce arbitrary inputs to target types; fail with fatal logs on invalid data.
- **String Manipulation**:
`sprintf`, `subspc` (replace spaces with underscores), `replace` (regex-based substitution), and `substr` — support formatting, normalization, and slicing.
- **Sequence Analysis (Bioinformatics)**:
`gc`, `gcskew`, and `composition` — compute nucleotide composition metrics for DNA/RNA sequences (`BioSequence`).
  - `gc`: GC content ratio, excluding ambiguous bases (the `'o'` count)
  - `gcskew`: `(G - C)/(G + C)` asymmetry measure
  - `composition`: returns a map of base counts (e.g., `"a":20.0`, `"g":15.0`)
- **Element Access**:
`elementof(seq, idx)` — retrieves item at index/key for slices (`[]interface{}`), maps (`map[string]interface{}`), or strings (by byte position).
- **Control Flow**:
`ifelse(cond, then_val, else_val)` — conditional branching within expressions.
- **Quality Support**:
`qualities(seq)` — extracts per-base quality scores as a float slice from sequencing reads.
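
The sequence metrics listed above can be illustrated with a minimal, standalone sketch. It is not the `obiseq` implementation: it works on plain strings rather than `BioSequence`, handles DNA letters only, and assumes that every symbol other than `a`, `c`, `g`, `t` is counted under an `"o"` key. The function names `composition`, `gcContent`, and `gcSkew` are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// composition counts a, c, g and t (case-insensitive).
// Assumption: any other symbol is tallied under the "o" key.
func composition(seq string) map[string]float64 {
	counts := map[string]float64{"a": 0, "c": 0, "g": 0, "t": 0, "o": 0}
	for _, r := range strings.ToLower(seq) {
		switch r {
		case 'a', 'c', 'g', 't':
			counts[string(r)]++
		default:
			counts["o"]++
		}
	}
	return counts
}

// gcContent returns (G + C) divided by the number of unambiguous bases.
// The result is NaN when the sequence has no unambiguous base.
func gcContent(seq string) float64 {
	c := composition(seq)
	return (c["g"] + c["c"]) / (c["a"] + c["c"] + c["g"] + c["t"])
}

// gcSkew returns (G - C) / (G + C).
// The result is NaN when the sequence contains neither G nor C.
func gcSkew(seq string) float64 {
	c := composition(seq)
	return (c["g"] - c["c"]) / (c["g"] + c["c"])
}

func main() {
	seq := "ggcatn"
	fmt.Println(composition(seq))
	fmt.Printf("gc=%.3f gcskew=%.3f\n", gcContent(seq), gcSkew(seq))
}
```

Because the counts are held as `float64`, the divisions never panic on empty input; callers that need a defined value for degenerate sequences would have to check for NaN themselves.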
## Design Principles
- **Dynamic Typing**: Accepts `...interface{}` arguments for flexibility.
- **Error Handling**: Uses fatal logging (`log.Fatalf`) on conversion failures; returns typed errors for runtime issues.
- **Extensibility**: Built atop `gval.Language`, enabling custom expression evaluation in pipelines (e.g., filtering reads via GC thresholds).

This package bridges high-level expression scripting and low-level biosequence computation, which makes it suited to rule-based filtering or annotation in NGS workflows.
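
The dynamic-typing convention described under Design Principles can be sketched with the standard library alone. The example below is an illustration, not the package's code: `elementOf` and `ifElse` are hypothetical stand-ins for `elementof` and `ifelse`, written with the variadic `func(args ...interface{}) (interface{}, error)` shape. It assumes indices arrive as `int` and that a missing key or out-of-range index is reported as an error; the real functions may behave differently.

```go
package main

import (
	"errors"
	"fmt"
)

// elementOf dispatches on the dynamic type of its first argument.
// Assumption: indices are int and invalid lookups return an error.
func elementOf(args ...interface{}) (interface{}, error) {
	if len(args) != 2 {
		return nil, errors.New("elementOf expects exactly two arguments")
	}
	switch c := args[0].(type) {
	case []interface{}:
		i, ok := args[1].(int)
		if !ok || i < 0 || i >= len(c) {
			return nil, fmt.Errorf("invalid slice index %v", args[1])
		}
		return c[i], nil
	case map[string]interface{}:
		k, ok := args[1].(string)
		if !ok {
			return nil, fmt.Errorf("map key must be a string, got %T", args[1])
		}
		v, found := c[k]
		if !found {
			return nil, fmt.Errorf("key %q not found", k)
		}
		return v, nil
	case string:
		i, ok := args[1].(int)
		if !ok || i < 0 || i >= len(c) {
			return nil, fmt.Errorf("invalid string index %v", args[1])
		}
		return string(c[i]), nil // byte position, returned as a one-byte string
	default:
		return nil, fmt.Errorf("unsupported container type %T", args[0])
	}
}

// ifElse returns its second argument when the condition is true,
// and its third argument otherwise.
func ifElse(args ...interface{}) (interface{}, error) {
	if len(args) != 3 {
		return nil, errors.New("ifElse expects exactly three arguments")
	}
	cond, ok := args[0].(bool)
	if !ok {
		return nil, fmt.Errorf("condition must be a bool, got %T", args[0])
	}
	if cond {
		return args[1], nil
	}
	return args[2], nil
}

func main() {
	v, err := elementOf(map[string]interface{}{"count": 3}, "count")
	fmt.Println(v, err)
	label, err := ifElse(true, "high GC", "low GC")
	fmt.Println(label, err)
}
```

Returning an error instead of calling `log.Fatalf` keeps this sketch testable; the package itself is described above as logging fatally on conversion failures.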