Semantic Description of obiseq Language Extensions
The package obiseq extends the Gval expression language with domain-specific functions tailored for bioinformatics and data processing. It integrates utility helpers from obiutils to provide type-flexible, robust operations over sequences and collections.
Core Functionalities
- Data Inspection: `len`, `ismap`, `isvector` retrieve size and type information.
- Aggregation & Comparison: `min`, `max` compute extremal values in slices/maps (via `obiutils.Min`/`obiutils.Max`). (Note: commented-out helper functions suggest prior attempts at manual implementations.)
- Type Conversion: `int`, `numeric` (→ float64), `bool`, `string` safely coerce arbitrary inputs to target types; they fail with fatal logs on invalid data.
- String Manipulation: `sprintf`, `subspc` (replace spaces with underscores), `replace` (regex-based substitution), and `substr` support formatting, normalization, and slicing.
- Sequence Analysis (Bioinformatics): `gc`, `gcskew`, and `composition` compute nucleotide composition metrics for DNA/RNA sequences (`BioSequence`).
  - `gc`: GC content ratio (excluding ambiguous bases, `'o'`)
  - `gcskew`: `(G−C)/(G+C)` asymmetry measure
  - `composition`: returns a map of base counts (e.g., `"a": 20.0, "g": 15.0`)
- Element Access: `elementof(seq, idx)` retrieves the item at an index/key for slices (`[]interface{}`), maps (`map[string]interface{}`), or strings (by byte position).
- Control Flow: `ifelse(cond, then_val, else_val)` provides conditional branching within expressions.
- Quality Support: `qualities(seq)` extracts per-base quality scores as a float slice from sequencing reads.
Design Principles
- Dynamic Typing: Accepts `...interface{}` arguments for flexibility.
- Error Handling: Uses fatal logging (`log.Fatalf`) on conversion failures; returns typed errors for runtime issues.
- Extensibility: Built atop `gval.Language`, enabling custom expression evaluation in pipelines (e.g., filtering reads via GC thresholds).
This package serves as a bridge between high-level scripting and low-level biosequence computation, ideal for rule-based filtering or annotation in NGS workflows.