# Sequence Predicate Framework in `obiseq`
This Go package provides a flexible and composable predicate system for filtering biological sequences (`BioSequence`) based on diverse criteria.
## Core Concepts
- **`SequencePredicate`**: A function type `func(*BioSequence) bool`, enabling conditional logic on sequences.
- **Predicate Composition**: Supports logical operations (`And`, `Or`, `Xor`, `Not`) and chaining.
- **Paired-end Support**: Predicates can be adapted to consider read pairs via `PredicateOnPaired` and `PairedPredicat`, with modes:
  - `ForwardOnly`: only the forward read is evaluated.
  - `ReverseOnly`: only the reverse read is evaluated.
  - `And`, `Or`, `AndNot`, `Xor`: combine the forward and reverse evaluations with the corresponding logical operation.
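
To make the composition and paired-end ideas concrete, here is a minimal, self-contained sketch. It does not import `obiseq`: `BioSequence` is a reduced stand-in, and the `Paired` field, the `PairMode` type with its `Pair*` constants, the `OnPair` method, and the `minLength` helper are all hypothetical names chosen for illustration. The treatment of unpaired reads is also an assumption and may differ from what `PredicateOnPaired` and `PairedPredicat` actually do.

```go
package main

// BioSequence is a simplified stand-in for the real obiseq type (assumption).
type BioSequence struct {
	ID     string
	Seq    string
	Paired *BioSequence // reverse read; nil when unpaired (hypothetical field)
}

// SequencePredicate mirrors the function type described above.
type SequencePredicate func(*BioSequence) bool

// And returns a predicate that holds only when both p and q hold.
func (p SequencePredicate) And(q SequencePredicate) SequencePredicate {
	return func(s *BioSequence) bool { return p(s) && q(s) }
}

// Not returns the logical negation of p.
func (p SequencePredicate) Not() SequencePredicate {
	return func(s *BioSequence) bool { return !p(s) }
}

// PairMode selects how forward and reverse results are combined (hypothetical).
type PairMode int

const (
	ForwardOnly PairMode = iota
	ReverseOnly
	PairAnd
	PairOr
	PairAndNot
	PairXor
)

// OnPair adapts p to a read pair. In this sketch an unpaired sequence is
// simply evaluated on its forward read (an assumption, not the library's rule).
func (p SequencePredicate) OnPair(mode PairMode) SequencePredicate {
	return func(s *BioSequence) bool {
		fwd := p(s)
		if s.Paired == nil || mode == ForwardOnly {
			return fwd
		}
		rev := p(s.Paired)
		switch mode {
		case ReverseOnly:
			return rev
		case PairAnd:
			return fwd && rev
		case PairOr:
			return fwd || rev
		case PairAndNot:
			return fwd && !rev
		case PairXor:
			return fwd != rev
		}
		return fwd
	}
}

// minLength is a hypothetical helper predicate used for the demonstration.
func minLength(n int) SequencePredicate {
	return func(s *BioSequence) bool { return len(s.Seq) >= n }
}
```

With a forward read of 8 bases and a reverse read of 2 bases, `minLength(4).OnPair(PairAnd)` rejects the pair while `minLength(4).OnPair(PairOr)` accepts it, which is the kind of distinction the modes are meant to express.
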
## Built-in Predicates
| Predicate | Description |
|-----------|-------------|
| `HasAttribute(name)` | Checks if a sequence has an annotation with the given name. |
| `IsAttributeMatch(name, pattern)` | Tests if a named annotation matches the provided regex (case-sensitive). |
| `IsMoreAbundantOrEqualTo(count)` / `IsLessAbundantOrEqualTo(count)` | Filters by sequence abundance (count field). |
| `IsLongerOrEqualTo(length)` / `IsShorterOrEqualTo(length)` | Filters by sequence length. |
| `OccurInAtleast(sample, n)` | Checks if the sequence appears in at least *n* samples (via description stats). |
| `IsSequenceMatch(pattern)` | Matches the raw sequence against a regex (case-insensitive). |
| `IsDefinitionMatch(pattern)` | Matches the definition/description line against a regex. |
| `IsIdMatch(pattern)` / `IsIdIn(ids...)` | Filters by sequence ID using regex or explicit set. |
| `ExpressionPredicat(expression)` | Evaluates a custom boolean expression (via OBILang) using annotations and sequence metadata. |
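
As an illustration of how predicates like those in the table could be built, the sketch below re-implements three of them against a reduced stand-in type. The lower-case function names, the `Annotations` field, and the `filter` helper are assumptions made for this example; the real `obiseq` signatures and internals may differ.

```go
package main

import "regexp"

// BioSequence is a reduced stand-in for the obiseq type (assumption):
// only the fields needed by this sketch are modelled.
type BioSequence struct {
	ID          string
	Seq         string
	Annotations map[string]string
}

// SequencePredicate mirrors the function type described in Core Concepts.
type SequencePredicate func(*BioSequence) bool

// hasAttribute holds when the annotation map contains the key name.
func hasAttribute(name string) SequencePredicate {
	return func(s *BioSequence) bool {
		_, ok := s.Annotations[name] // reading a nil map is safe in Go
		return ok
	}
}

// isLongerOrEqualTo holds when the sequence has at least n symbols.
func isLongerOrEqualTo(n int) SequencePredicate {
	return func(s *BioSequence) bool { return len(s.Seq) >= n }
}

// isSequenceMatch compiles the pattern once, case-insensitively, and
// reuses the compiled regex for every sequence. MustCompile panics on
// an invalid pattern, so the error surfaces at construction time.
func isSequenceMatch(pattern string) SequencePredicate {
	re := regexp.MustCompile("(?i)" + pattern)
	return func(s *BioSequence) bool { return re.MatchString(s.Seq) }
}

// filter keeps the sequences for which keep returns true (hypothetical helper).
func filter(seqs []*BioSequence, keep SequencePredicate) []*BioSequence {
	var kept []*BioSequence
	for _, s := range seqs {
		if keep(s) {
			kept = append(kept, s)
		}
	}
	return kept
}
```

Compiling the regex in the constructor rather than inside the returned closure keeps the per-sequence cost to a single match call, which matters when the predicate runs over millions of reads.
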
## Design Highlights
- **Null-safe**: `nil` predicates are handled gracefully in compositions.
- **Extensible**: Custom predicates can be defined and combined seamlessly.
- **Logging & Safety**: Invalid regex patterns or expression syntax trigger fatal errors; runtime evaluation issues emit warnings.
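
One plausible way to get null-safe composition is to treat a `nil` predicate as "no constraint", so that combining it with another predicate returns the other one unchanged. The sketch below shows that idea for `And` only; it uses a stand-in `BioSequence` type, and both the identity rule and the `allOf` helper are assumptions rather than a description of the actual `obiseq` code.

```go
package main

// BioSequence is a minimal stand-in for the obiseq type (assumption).
type BioSequence struct {
	Seq string
}

// SequencePredicate mirrors the function type described in Core Concepts.
type SequencePredicate func(*BioSequence) bool

// And combines two predicates. A nil operand is treated as "no constraint":
// the other operand is returned as-is, and nil.And(nil) stays nil.
func (p SequencePredicate) And(q SequencePredicate) SequencePredicate {
	switch {
	case p == nil:
		return q
	case q == nil:
		return p
	}
	return func(s *BioSequence) bool { return p(s) && q(s) }
}

// allOf folds any number of optional predicates into one (hypothetical helper).
// It returns nil when every input is nil, meaning "nothing to filter on".
func allOf(preds ...SequencePredicate) SequencePredicate {
	var combined SequencePredicate
	for _, p := range preds {
		combined = combined.And(p)
	}
	return combined
}
```

This pattern lets calling code build a filter from optional settings without checking each one: unset options contribute `nil`, and the caller only has to test the final result against `nil` before applying it.
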

This framework enables powerful, declarative filtering pipelines for high-throughput sequencing data analysis.