Files
obitools4/autodoc/docmd/pkg/obiseq/predicate.md
T
Eric Coissac 8c7017a99d ⬆️ version bump to v4.5
- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5"
- Update version.txt from 4.29 → .30
(automated by Makefile)
2026-04-13 13:34:53 +02:00

2.1 KiB

Sequence Predicate Framework in obiseq

This Go package provides a flexible and composable predicate system for filtering biological sequences (BioSequence) based on diverse criteria.

Core Concepts

  • SequencePredicate: A function type func(*BioSequence) bool, enabling conditional logic on sequences.
  • Predicate Composition: Supports logical operations (And, Or, Xor, Not) and chaining.
  • Paired-end Support: Predicates can be adapted to consider read pairs via PredicateOnPaired and PairedPredicat, with modes:
    • ForwardOnly: Only the forward read is evaluated.
    • ReverseOnly, And, Or, AndNot, Xor: Combine forward and reverse evaluations.

Built-in Predicates

Predicate Description
HasAttribute(name) Checks if a sequence has an annotation with the given name.
IsAttributeMatch(name, pattern) Tests if a named annotation matches the provided regex (case-sensitive).
IsMoreAbundantOrEqualTo(count) / IsLessAbundantOrEqualTo(count) Filters by sequence abundance (count field).
IsLongerOrEqualTo(length) / IsShorterOrEqualTo(length) Filters by sequence length.
OccurInAtleast(sample, n) Checks if the sequence appears in at least n samples (via description stats).
IsSequenceMatch(pattern) Matches the raw sequence against a regex (case-insensitive).
IsDefinitionMatch(pattern) Matches the definition/description line against a regex.
IsIdMatch(pattern) / IsIdIn(ids...) Filters by sequence ID using regex or explicit set.
ExpressionPredicat(expression) Evaluates a custom boolean expression (via OBILang) using annotations and sequence metadata.

Design Highlights

  • Null-safe: nil predicates are handled gracefully in compositions.
  • Extensible: Custom predicates can be defined and combined seamlessly.
  • Logging & Safety: Invalid regex patterns or expression syntax trigger fatal errors; runtime evaluation issues emit warnings.

This framework enables powerful, declarative filtering pipelines for high-throughput sequencing data analysis.