Files
obitools4/autodoc/docmd/pkg/obiseq/language.md
T
Eric Coissac 8c7017a99d ⬆️ version bump to v4.5
- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5"
- Update version.txt from 4.29 → .30
(automated by Makefile)
2026-04-13 13:34:53 +02:00

2.3 KiB
Raw Blame History

Semantic Description of obiseq Language Extensions

The package obiseq extends the Gval expression language with domain-specific functions tailored for bioinformatics and data processing. It integrates utility helpers from obiutils to provide type-flexible, robust operations over sequences and collections.

Core Functionalities

  • Data Inspection:
    len, ismap, isvector — retrieve size and type information.

  • Aggregation & Comparison:
    min, max — compute extremal values in slices/maps (via obiutils.Min/Max).
    (Note: commented-out helper functions suggest prior attempts at manual implementations.)

  • Type Conversion:
    int, numeric (→ float64), bool, string — safely coerce arbitrary inputs to target types; fail with fatal logs on invalid data.

  • String Manipulation:
    sprintf, subspc (replace spaces with underscores), replace (regex-based substitution), and substr — support formatting, normalization, and slicing.

  • Sequence Analysis (Bioinformatics):
    gc, gcskew, and composition — compute nucleotide composition metrics for DNA/RNA sequences (BioSequence).

    • gc: GC content ratio (excluding ambiguous bases 'o')
    • gcskew: (GC)/(G+C) asymmetry measure
    • composition: returns a map of base counts (e.g., "a":20.0, "g":15.0)
  • Element Access:
    elementof(seq, idx) — retrieves item at index/key for slices ([]interface{}), maps (map[string]interface{}), or strings (by byte position).

  • Control Flow:
    ifelse(cond, then_val, else_val) — conditional branching within expressions.

  • Quality Support:
    qualities(seq) — extracts per-base quality scores as a float slice from sequencing reads.

Design Principles

  • Dynamic Typing: Accepts ...interface{} arguments for flexibility.
  • Error Handling: Uses fatal logging (log.Fatalf) on conversion failures; returns typed errors for runtime issues.
  • Extensibility: Built atop gval.Language, enabling custom expression evaluation in pipelines (e.g., filtering reads via GC thresholds).

This package serves as a bridge between high-level scripting and low-level biosequence computation, ideal for rule-based filtering or annotation in NGS workflows.