BioSequence Classifier Module Overview
This Go package (obiseq) provides a flexible and thread-safe framework for classifying biological sequences using different strategies. Each classifier implements four core methods:
- Code(sequence) int: assigns an integer class to a sequence.
- Value(k) string: retrieves the original value (or representation) for class index k.
- Reset(): clears internal state.
- Clone() *BioSequenceClassifier: creates a fresh copy of the classifier.
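The four methods can be pictured as a small struct of function fields. The following is a minimal sketch, not the package's actual definition: the `Classifier` type and `NewExactSequenceClassifier` constructor are hypothetical names, and the sketch works on plain strings rather than on the package's sequence type.

```go
package main

import (
	"fmt"
	"sync"
)

// Classifier is a hypothetical, simplified stand-in for the package's
// classifier type: four function fields operating on plain strings.
type Classifier struct {
	Code  func(seq string) int
	Value func(k int) string
	Reset func()
	Clone func() *Classifier
}

// NewExactSequenceClassifier (hypothetical) assigns one class per distinct,
// case-sensitive sequence string, using a lock-protected map.
func NewExactSequenceClassifier() *Classifier {
	var mu sync.Mutex
	codes := make(map[string]int)
	values := make([]string, 0)

	return &Classifier{
		Code: func(seq string) int {
			mu.Lock()
			defer mu.Unlock()
			if k, ok := codes[seq]; ok {
				return k // already seen: reuse its class
			}
			k := len(values)
			codes[seq] = k
			values = append(values, seq)
			return k
		},
		Value: func(k int) string {
			mu.Lock()
			defer mu.Unlock()
			if k < 0 || k >= len(values) {
				return "" // unknown class index
			}
			return values[k]
		},
		Reset: func() {
			mu.Lock()
			defer mu.Unlock()
			codes = make(map[string]int)
			values = values[:0]
		},
		// A clone starts with empty state and shares nothing with the original.
		Clone: NewExactSequenceClassifier,
	}
}

func main() {
	c := NewExactSequenceClassifier()
	fmt.Println(c.Code("ACGT"), c.Code("acgt"), c.Code("ACGT")) // 0 1 0
	fmt.Println(c.Value(1))                                    // acgt
	c.Reset()
	fmt.Println(c.Code("acgt")) // 0
}
```

Closures over a shared mutex keep the state private to each instance, which is one way to make Clone() return a copy that is safe to hand to another pipeline stage.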
Supported Classifier Types
- AnnotationClassifier(key, na)
  Classifies sequences based on a single annotation field. Missing annotations default to na. Internally maps string values to integer codes via a thread-safe dictionary.
- DualAnnotationClassifier(key1, key2, na)
  Uses two annotation fields. Combines them (as a JSON array) to form unique class identifiers, enabling multi-dimensional classification.
- PredicateClassifier(predicate)
  Binary classifier: returns 1 if the provided predicate function evaluates to true, else 0. Useful for rule-based grouping (e.g., length > 200).
- HashClassifier(size)
  Assigns sequences to one of size buckets via a CRC32 hash of the raw sequence. Deterministic and memory-efficient, but distinct sequences may collide in the same bucket.
- SequenceClassifier()
  Assigns a unique class to each exact sequence string (case-sensitive). Uses a lock-protected map to deduplicate and index sequences.
- RotateClassifier(size)
  Cyclic assignment: the i-th sequence goes to class i mod size. No memoization; the state is reset only by an explicit Reset() call.
- CompositeClassifier(...)
  Combines multiple classifiers: joins their integer outputs into a composite class key (e.g., "3:17:0"). Enables layered or hierarchical classification.
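The key-building steps described for HashClassifier, DualAnnotationClassifier and CompositeClassifier can be sketched with the Go standard library alone. The helper names below (`hashBucket`, `dualKey`, `compositeKey`) are hypothetical and not part of the package; the IEEE CRC32 polynomial is an assumption, since the description only says "CRC32".

```go
package main

import (
	"encoding/json"
	"fmt"
	"hash/crc32"
	"strconv"
	"strings"
)

// hashBucket maps a raw sequence to one of size buckets (size must be > 0).
// Assumption: IEEE CRC32; the package may use a different polynomial.
func hashBucket(seq string, size int) int {
	return int(crc32.ChecksumIEEE([]byte(seq)) % uint32(size))
}

// dualKey encodes two annotation values as a JSON array, so that
// ("a,b", "c") and ("a", "b,c") cannot produce the same key.
func dualKey(v1, v2 string) string {
	b, err := json.Marshal([]string{v1, v2})
	if err != nil {
		return ""
	}
	return string(b)
}

// compositeKey joins the integer codes of several classifiers with ':'.
func compositeKey(codes ...int) string {
	parts := make([]string, len(codes))
	for i, c := range codes {
		parts[i] = strconv.Itoa(c)
	}
	return strings.Join(parts, ":")
}

func main() {
	// Same input always lands in the same bucket.
	fmt.Println(hashBucket("ACGT", 16) == hashBucket("ACGT", 16)) // true
	fmt.Println(dualKey("sampleA", "Chordata"))                  // ["sampleA","Chordata"]
	fmt.Println(compositeKey(3, 17, 0))                          // 3:17:0
}
```

Encoding the pair as JSON rather than joining with a plain separator avoids ambiguity when an annotation value itself contains the separator character.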
A classifier's configuration is fixed at creation; any mutable state is internal and synchronized, so classifiers can be used concurrently in pipelines.
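As an illustration of concurrent use, the sketch below shares one lock-protected dictionary between several goroutines and checks that every worker sees the same label-to-code mapping. `labelCoder` and `countClasses` are hypothetical names for this example, not the package's API.

```go
package main

import (
	"fmt"
	"sync"
)

// labelCoder is a hypothetical thread-safe string-to-int dictionary,
// similar in spirit to the one described for AnnotationClassifier.
type labelCoder struct {
	mu    sync.Mutex
	codes map[string]int
}

func newLabelCoder() *labelCoder {
	return &labelCoder{codes: make(map[string]int)}
}

// Code returns the class of label, allocating the next free code if needed.
func (d *labelCoder) Code(label string) int {
	d.mu.Lock()
	defer d.mu.Unlock()
	k, ok := d.codes[label]
	if !ok {
		k = len(d.codes)
		d.codes[label] = k
	}
	return k
}

// countClasses codes every label from each of `workers` goroutines sharing
// one coder, and returns the number of distinct codes observed overall.
func countClasses(labels []string, workers int) int {
	coder := newLabelCoder()
	seen := make([]map[int]bool, workers) // one private map per worker
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		seen[w] = make(map[int]bool)
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for _, l := range labels {
				seen[w][coder.Code(l)] = true
			}
		}(w)
	}
	wg.Wait()

	all := make(map[int]bool)
	for _, s := range seen {
		for k := range s {
			all[k] = true
		}
	}
	return len(all)
}

func main() {
	labels := []string{"sampleA", "sampleB", "sampleA", "sampleC"}
	fmt.Println(countClasses(labels, 8)) // 3
}
```

Which label receives which code depends on goroutine scheduling, but the number of classes and the consistency of the mapping within one run do not.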