mirror of
https://github.com/metabarcoding/obitools4.git
Semantic Description of the obikmer Package
This Go package implements a De Bruijn graph for efficient k-mer manipulation and sequence assembly, primarily used in bioinformatics (e.g., metagenomic read error correction or consensus building).
Core Functionalities
- K-mer Encoding: K-mers are encoded as `uint64` using 2 bits per nucleotide (A=0, C=1, G=2, T=3), supporting IUPAC ambiguity codes via the `iupac` map.
- Reverse Complement Handling: The `revcompnuc` table enables nucleotide-wise reverse complementation.
- Graph Construction: The `DeBruijnGraph` struct maintains a map from k-mer hashes to integer weights (e.g., observed counts), with helper masks for bit manipulation (`kmermask`, `prevc/g/t`).
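
The 2-bit encoding and nucleotide-wise reverse complement described above can be illustrated with a minimal sketch. The function names `EncodeKmer` and `ReverseComplement` are hypothetical and are not taken from the package; placing the first base in the highest-order bits is also an assumption, and the sketch handles only unambiguous A/C/G/T.

```go
package main

import "fmt"

// nucCode maps a nucleotide to its 2-bit code (A=0, C=1, G=2, T=3).
func nucCode(b byte) (uint64, bool) {
	switch b {
	case 'A', 'a':
		return 0, true
	case 'C', 'c':
		return 1, true
	case 'G', 'g':
		return 2, true
	case 'T', 't':
		return 3, true
	}
	return 0, false
}

// EncodeKmer (hypothetical name) packs a k-mer of length 1..32 into a uint64,
// with the first base ending up in the highest-order used bits.
func EncodeKmer(s string) (uint64, error) {
	if len(s) == 0 || len(s) > 32 {
		return 0, fmt.Errorf("k-mer length %d out of range 1..32", len(s))
	}
	var code uint64
	for i := 0; i < len(s); i++ {
		c, ok := nucCode(s[i])
		if !ok {
			return 0, fmt.Errorf("unsupported nucleotide %q", s[i])
		}
		code = code<<2 | c
	}
	return code, nil
}

// ReverseComplement (hypothetical name) returns the encoded reverse complement
// of a k-mer of length k. With this encoding the complement of a base is
// 3 - code (A<->T, C<->G).
func ReverseComplement(code uint64, k int) uint64 {
	var rc uint64
	for i := 0; i < k; i++ {
		rc = rc<<2 | (3 - code&3)
		code >>= 2
	}
	return rc
}

func main() {
	code, _ := EncodeKmer("ACGT")
	fmt.Println(code)                       // 27 (binary 00 01 10 11)
	fmt.Println(ReverseComplement(code, 4)) // 27: ACGT is its own reverse complement
}
```

Because each base takes 2 bits, a `uint64` holds k-mers up to k = 32, which is why the sketch rejects longer inputs.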
Graph Operations
- Node Queries:
  - `Previouses()` / `Nexts()`: Return predecessor/successor k-mers in the graph.
  - `MaxNext()` / `MaxHead()`: Find neighbors or heads (sources) with maximum weight.
- Path Exploration:
  - `MaxPath()`: Greedily traces the highest-weight path from a head.
  - `LongestPath()`: Explores all heads to find the path with maximum cumulative weight (optionally bounded in length).
  - `HaviestPath()`: Uses a Dijkstra-like priority queue to find the heaviest (sum-weight) path, with cycle detection via DFS (`HasCycle()`).
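
To make the successor query and the greedy path tracing concrete, here is a minimal sketch on a toy graph. The type `toyGraph` and its methods are hypothetical stand-ins, not the package's API; the sketch assumes nodes are stored as 2-bit encoded k-mers in a `map[uint64]int` and that a successor is obtained by shifting out the first base and appending one of A, C, G, T.

```go
package main

import (
	"fmt"
	"strings"
)

// toyGraph is a hypothetical stand-in for a de Bruijn graph of k-mers.
type toyGraph struct {
	k      int
	mask   uint64         // keeps the low 2*k bits
	weight map[uint64]int // encoded k-mer -> weight
}

func newToyGraph(k int) *toyGraph {
	return &toyGraph{k: k, mask: (uint64(1) << (2 * uint(k))) - 1, weight: map[uint64]int{}}
}

// encode packs an A/C/G/T string into 2 bits per base (valid input assumed).
func encode(s string) uint64 {
	var code uint64
	for i := 0; i < len(s); i++ {
		code = code<<2 | uint64(strings.IndexByte("ACGT", s[i]))
	}
	return code
}

// nexts returns the successors of node that exist in the graph:
// drop the first base, then try appending each of A, C, G, T.
func (g *toyGraph) nexts(node uint64) []uint64 {
	var out []uint64
	base := (node << 2) & g.mask
	for c := uint64(0); c < 4; c++ {
		if _, ok := g.weight[base|c]; ok {
			out = append(out, base|c)
		}
	}
	return out
}

// maxPath greedily follows the heaviest unvisited successor and stops at a
// dead end; the seen set is a simple guard against looping on cycles.
func (g *toyGraph) maxPath(head uint64) []uint64 {
	path := []uint64{head}
	seen := map[uint64]bool{head: true}
	cur := head
	for {
		best, bestW, found := uint64(0), -1, false
		for _, n := range g.nexts(cur) {
			if w := g.weight[n]; !seen[n] && w > bestW {
				best, bestW, found = n, w, true
			}
		}
		if !found {
			return path
		}
		path = append(path, best)
		seen[best] = true
		cur = best
	}
}

func main() {
	g := newToyGraph(3)
	g.weight[encode("ACG")] = 5
	g.weight[encode("CGT")] = 4
	g.weight[encode("CGA")] = 1
	g.weight[encode("GTA")] = 3
	fmt.Println(g.maxPath(encode("ACG"))) // [6 27 44], i.e. ACG -> CGT -> GTA
}
```

A greedy walk like this only looks one step ahead, which is why a separate search over cumulative weight is needed when the locally heaviest edge does not lead to the heaviest overall path.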
Consensus & Filtering
- Consensus Generation:
  - `BestConsensus()` returns a sequence from the greedy max-weight path.
  - `LongestConsensus(id, min_cov)` trims low-coverage ends using a coverage threshold (mode-based).
- Weight Statistics:
  - `MaxWeight()`, `WeightMean()`, `WeightMode()` provide distribution summaries.
  - `FilterMinWeight(min)` removes low-count nodes.
- Decoding:
  - `DecodeNode()` converts a k-mer index to its DNA string.
  - `DecodePath()` reconstructs the full consensus from a path.
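
Decoding and weight filtering can be sketched as follows. This is an illustration under assumptions rather than the package's code: the lowercase helpers `decodeNode`, `decodePath`, and `filterMinWeight` are hypothetical, k-mers are assumed to be 2-bit encoded with the first base in the high bits, and consecutive nodes of a path are assumed to overlap by k-1 bases, so each node after the first contributes only its last base.

```go
package main

import (
	"fmt"
	"strings"
)

const alphabet = "ACGT" // index = 2-bit code

// decodeNode turns an encoded k-mer back into its DNA string.
func decodeNode(code uint64, k int) string {
	buf := make([]byte, k)
	for i := k - 1; i >= 0; i-- {
		buf[i] = alphabet[code&3]
		code >>= 2
	}
	return string(buf)
}

// decodePath rebuilds a sequence from a path of overlapping k-mers:
// the first k-mer in full, then the last base of every following node.
func decodePath(path []uint64, k int) string {
	if len(path) == 0 {
		return ""
	}
	var sb strings.Builder
	sb.WriteString(decodeNode(path[0], k))
	for _, node := range path[1:] {
		sb.WriteByte(alphabet[node&3])
	}
	return sb.String()
}

// filterMinWeight removes every node whose weight is below min.
// Deleting map entries while ranging over the map is safe in Go.
func filterMinWeight(weight map[uint64]int, min int) {
	for node, w := range weight {
		if w < min {
			delete(weight, node)
		}
	}
}

func main() {
	fmt.Println(decodeNode(27, 4))                 // ACGT
	fmt.Println(decodePath([]uint64{6, 27, 44}, 3)) // ACGTA (ACG, CGT, GTA)

	weight := map[uint64]int{6: 5, 27: 4, 24: 1, 44: 3}
	filterMinWeight(weight, 2)
	fmt.Println(len(weight)) // 3: the node with weight 1 was removed
}
```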
I/O & Diagnostics
- GML Export: `WriteGml()` outputs a directed graph in Graph Modelling Language (for visualization), with edge thickness and labels reflecting weights.
- Hamming Distance: `HammingDistance()` computes the number of mismatching positions between two encoded k-mers using bit operations.
- Sequence Insertion: `Push()` adds a biosequence (with count weight) to the graph, expanding all IUPAC variants recursively.
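
Two of the operations above lend themselves to a short sketch: a bitwise Hamming distance on 2-bit encoded k-mers, and recursive expansion of IUPAC ambiguity codes. Both functions and the contents of the `iupacBases` table are illustrative assumptions, not the package's implementation; the table follows the standard IUPAC nucleotide codes.

```go
package main

import (
	"fmt"
	"math/bits"
)

// hammingDistance counts positions at which two 2-bit encoded k-mers differ.
// XOR leaves a non-zero 2-bit group at every mismatch; OR-ing each group's
// high bit onto its low bit and masking the low bits yields one set bit per
// mismatching position, which a popcount then sums.
func hammingDistance(a, b uint64) int {
	x := a ^ b
	x = (x | x>>1) & 0x5555555555555555
	return bits.OnesCount64(x)
}

// iupacBases lists the concrete bases behind each IUPAC nucleotide code.
var iupacBases = map[byte]string{
	'A': "A", 'C': "C", 'G': "G", 'T': "T",
	'R': "AG", 'Y': "CT", 'S': "CG", 'W': "AT", 'K': "GT", 'M': "AC",
	'B': "CGT", 'D': "AGT", 'H': "ACT", 'V': "ACG", 'N': "ACGT",
}

// expandIUPAC recursively enumerates every unambiguous sequence matching seq.
// A symbol missing from the table yields no expansion at all.
func expandIUPAC(seq string) []string {
	if seq == "" {
		return []string{""}
	}
	tails := expandIUPAC(seq[1:])
	var out []string
	for _, b := range []byte(iupacBases[seq[0]]) {
		for _, t := range tails {
			out = append(out, string(b)+t)
		}
	}
	return out
}

func main() {
	fmt.Println(hammingDistance(27, 24))  // 1: ACGT vs ACGA
	fmt.Println(hammingDistance(27, 228)) // 4: ACGT vs TGCA
	fmt.Println(expandIUPAC("AR"))        // [AA AG]
	fmt.Println(len(expandIUPAC("NY")))   // 8
}
```

The number of variants grows multiplicatively with each ambiguous position, so a sequence with many `N` symbols can expand into a very large number of k-mers.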
Dependencies & Design
- Leverages `obiseq` for sequence representation, the third-party `logrus` library for logging, and `slices` and `heap` from Go's standard library.
- Designed for scalability and speed, using bit-level operations to minimize memory footprint.
Overall: a robust k-mer graph engine for de novo assembly, error correction, and consensus recovery in high-throughput sequencing data.