mirror of
https://github.com/metabarcoding/obitools4.git
synced 2026-04-30 12:00:39 +00:00
K-mer Spectrum Analysis Package (obikmer)
This Go package provides tools for analyzing k-mer frequency distributions in biological sequences.
Core Data Structures
- SpectrumEntry: Represents a bin in the k-mer frequency spectrum:
  - Frequency: how often a k-mer was observed;
  - Count: number of distinct k-mers with that frequency.
- KmerSpectrum: A sorted list of non-zero SpectrumEntry values (ascending by frequency), enabling efficient statistics and serialization.
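The two structures described above can be sketched as follows. This is a minimal illustration, not the package's source: the field types (int and uint64) and the helper spectrumFromCounts are assumptions introduced here to show how per-k-mer occurrence counts collapse into sorted, non-zero bins.

```go
package main

import (
	"fmt"
	"sort"
)

// SpectrumEntry is one bin of the spectrum. Field types are assumptions.
type SpectrumEntry struct {
	Frequency int    // how often a k-mer was observed
	Count     uint64 // number of distinct k-mers observed that often
}

// KmerSpectrum holds non-zero entries sorted by ascending frequency.
type KmerSpectrum []SpectrumEntry

// spectrumFromCounts is a hypothetical helper: it bins per-k-mer
// occurrence counts into a spectrum and sorts the bins by frequency.
func spectrumFromCounts(occurrences map[uint64]int) KmerSpectrum {
	bins := make(map[int]uint64)
	for _, freq := range occurrences {
		if freq > 0 { // zero-frequency k-mers never produce a bin
			bins[freq]++
		}
	}
	spectrum := make(KmerSpectrum, 0, len(bins))
	for freq, count := range bins {
		spectrum = append(spectrum, SpectrumEntry{Frequency: freq, Count: count})
	}
	sort.Slice(spectrum, func(i, j int) bool {
		return spectrum[i].Frequency < spectrum[j].Frequency
	})
	return spectrum
}

func main() {
	// Four distinct k-mers: two seen once, one seen twice, one seen five times.
	occurrences := map[uint64]int{0x1B: 1, 0x2C: 1, 0x3D: 2, 0x4E: 5}
	for _, e := range spectrumFromCounts(occurrences) {
		fmt.Printf("frequency=%d count=%d\n", e.Frequency, e.Count)
	}
}
```

Keeping the slice sorted means the highest frequency is always the last element, and serialization can walk the entries in order.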
Key Functionalities
Spectrum Management
- MapToSpectrum() / ToMap(): Convert between map and structured spectrum representations.
- MergeSpectraMaps() / MergeTopN(): Combine spectra or top-N k-mer data from multiple sources.
- MaxFrequency(): Returns the highest observed k-mer frequency.
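As a rough illustration of merging in map form, the sketch below adds two frequency-to-count maps bin by bin and then looks up the highest frequency. The names mergeSpectra and maxFrequency and the map[int]uint64 type are assumptions made for this example, not the package's signatures. Bin-wise addition is only meaningful when the sources cover disjoint sets of k-mers (for example, separate partitions); that, too, is an assumption of the sketch.

```go
package main

import "fmt"

// mergeSpectra sums the distinct-k-mer counts of two frequency->count maps.
// Hypothetical helper: assumes the two sources hold disjoint k-mer sets.
func mergeSpectra(a, b map[int]uint64) map[int]uint64 {
	merged := make(map[int]uint64, len(a)+len(b))
	for freq, count := range a {
		merged[freq] += count
	}
	for freq, count := range b {
		merged[freq] += count
	}
	return merged
}

// maxFrequency returns the largest frequency with a non-zero count,
// or 0 for an empty spectrum.
func maxFrequency(spectrum map[int]uint64) int {
	highest := 0
	for freq, count := range spectrum {
		if count > 0 && freq > highest {
			highest = freq
		}
	}
	return highest
}

func main() {
	partA := map[int]uint64{1: 10, 2: 3}
	partB := map[int]uint64{1: 4, 7: 1}
	merged := mergeSpectra(partA, partB)
	fmt.Println(merged[1], merged[2], merged[7], maxFrequency(merged))
}
```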
I/O & Persistence
- Binary format (KSP\x01 magic header) with varint encoding for compact storage:
  - WriteSpectrum() / ReadSpectrum(): Save/load full spectra to disk.
- CSV export:
  - WriteTopKmersCSV(): Outputs the top-N k-mers with their sequences (decoded from uint64) and frequencies.
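To make the binary format more concrete, here is a sketch of a writer and reader built on the standard encoding/binary varint functions. Only the KSP\x01 magic header and the use of varints come from the description above; the record layout (an entry count followed by frequency/count pairs), the entry type, and the function names are assumptions for illustration and may not match the actual file format.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

var magic = []byte("KSP\x01")

// entry mirrors a spectrum bin; the uint64 field types are an assumption.
type entry struct {
	Frequency uint64
	Count     uint64
}

// writeSpectrum emits: magic, entry count, then (frequency, count) pairs,
// all integers as unsigned varints. The layout is assumed, not documented.
func writeSpectrum(w io.Writer, entries []entry) error {
	if _, err := w.Write(magic); err != nil {
		return err
	}
	buf := make([]byte, binary.MaxVarintLen64)
	put := func(v uint64) error {
		n := binary.PutUvarint(buf, v)
		_, err := w.Write(buf[:n])
		return err
	}
	if err := put(uint64(len(entries))); err != nil {
		return err
	}
	for _, e := range entries {
		if err := put(e.Frequency); err != nil {
			return err
		}
		if err := put(e.Count); err != nil {
			return err
		}
	}
	return nil
}

// readSpectrum checks the magic header, then decodes the varint stream.
func readSpectrum(r io.Reader) ([]entry, error) {
	br := bufio.NewReader(r)
	header := make([]byte, len(magic))
	if _, err := io.ReadFull(br, header); err != nil {
		return nil, err
	}
	if !bytes.Equal(header, magic) {
		return nil, fmt.Errorf("bad magic header %q", header)
	}
	n, err := binary.ReadUvarint(br)
	if err != nil {
		return nil, err
	}
	var entries []entry
	for i := uint64(0); i < n; i++ {
		freq, err := binary.ReadUvarint(br)
		if err != nil {
			return nil, err
		}
		count, err := binary.ReadUvarint(br)
		if err != nil {
			return nil, err
		}
		entries = append(entries, entry{Frequency: freq, Count: count})
	}
	return entries, nil
}

// roundTrip writes entries to an in-memory buffer and reads them back.
func roundTrip(entries []entry) ([]entry, error) {
	var file bytes.Buffer
	if err := writeSpectrum(&file, entries); err != nil {
		return nil, err
	}
	return readSpectrum(&file)
}

func main() {
	out, err := roundTrip([]entry{{1, 1200}, {2, 300}, {150, 1}})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Varints store small values in a single byte, which suits spectra where most frequencies and many counts are small.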
Top-N K-mer Tracking
- Uses a min-heap to efficiently maintain the N most frequent k-mers in streaming scenarios:
  - NewTopNKmers(n): Initialize collector.
  - Add(kmer, freq): Insert/update while respecting capacity n.
  - Results(): Return the top k-mers sorted descending by frequency.
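The min-heap approach described above can be sketched with the standard container/heap package. The type and method names below (topN, add, results) are illustrative stand-ins rather than the package's API, and the sketch assumes each k-mer is added once with its final frequency, so it does not handle updates to a k-mer already in the heap.

```go
package main

import (
	"container/heap"
	"fmt"
	"sort"
)

type kmerFreq struct {
	Kmer uint64
	Freq int
}

// minHeap orders items by ascending frequency, so the root is always
// the least frequent k-mer currently kept.
type minHeap []kmerFreq

func (h minHeap) Len() int           { return len(h) }
func (h minHeap) Less(i, j int) bool { return h[i].Freq < h[j].Freq }
func (h minHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x any)        { *h = append(*h, x.(kmerFreq)) }
func (h *minHeap) Pop() any {
	old := *h
	last := old[len(old)-1]
	*h = old[:len(old)-1]
	return last
}

// topN keeps at most n items (hypothetical collector type).
type topN struct {
	n     int
	items minHeap
}

func newTopN(n int) *topN { return &topN{n: n} }

// add inserts while below capacity; once full, it replaces the root
// only when the new frequency beats the current minimum.
func (t *topN) add(kmer uint64, freq int) {
	if t.n <= 0 {
		return
	}
	if len(t.items) < t.n {
		heap.Push(&t.items, kmerFreq{Kmer: kmer, Freq: freq})
		return
	}
	if freq > t.items[0].Freq {
		t.items[0] = kmerFreq{Kmer: kmer, Freq: freq}
		heap.Fix(&t.items, 0)
	}
}

// results returns a copy of the kept items, most frequent first.
func (t *topN) results() []kmerFreq {
	out := append([]kmerFreq(nil), t.items...)
	sort.Slice(out, func(i, j int) bool { return out[i].Freq > out[j].Freq })
	return out
}

func main() {
	top := newTopN(2)
	top.add(0xA, 3)
	top.add(0xB, 9)
	top.add(0xC, 1)
	top.add(0xD, 5)
	fmt.Println(top.results())
}
```

Each add costs O(log n) at worst, and memory stays bounded by n regardless of how many k-mers stream through.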
Design Highlights
- Memory-efficient: Uses uint64 for k-mers (suitable for k ≤ 32).
- Streaming-friendly: Top-N collector supports incremental updates.
- Thread-safety note: External synchronization required for concurrent access.
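The k ≤ 32 limit follows from packing each nucleotide into 2 bits of a 64-bit word. The sketch below shows one such packing and its inverse. The base-to-code mapping (A=0, C=1, G=2, T=3), the bit order (first base in the most significant position), and the function names are assumptions for illustration; the package's actual encoding may differ.

```go
package main

import "fmt"

const bases = "ACGT" // index = assumed 2-bit code

// encodeKmer packs a DNA string of length <= 32 into a uint64,
// two bits per base, first base in the highest-order position.
func encodeKmer(seq string) (uint64, error) {
	if len(seq) > 32 {
		return 0, fmt.Errorf("k=%d does not fit in 64 bits", len(seq))
	}
	var code uint64
	for i := 0; i < len(seq); i++ {
		var b uint64
		switch seq[i] {
		case 'A', 'a':
			b = 0
		case 'C', 'c':
			b = 1
		case 'G', 'g':
			b = 2
		case 'T', 't':
			b = 3
		default:
			return 0, fmt.Errorf("unsupported base %q", seq[i])
		}
		code = code<<2 | b
	}
	return code, nil
}

// decodeKmer reverses encodeKmer; k must be supplied because leading
// A bases are encoded as zero bits and cannot be inferred from the value.
func decodeKmer(code uint64, k int) string {
	out := make([]byte, k)
	for i := k - 1; i >= 0; i-- {
		out[i] = bases[code&3]
		code >>= 2
	}
	return string(out)
}

func main() {
	code, err := encodeKmer("GATTACA")
	if err != nil {
		panic(err)
	}
	fmt.Println(code, decodeKmer(code, 7))
}
```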