mirror of https://github.com/metabarcoding/obitools4.git synced 2026-04-30 12:00:39 +00:00

Files

T

Eric Coissac 8c7017a99d ⬆️ version bump to v4.5

- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5"
- Update version.txt from 4.29 → .30
(automated by Makefile)

2026-04-13 13:34:53 +02:00

1.8 KiB

Raw Blame History

Semantic Description of `obikmer` Package Functionalities

The obikmer package provides disk-backed operations on k-mer sets derived from biological sequences. It supports scalable set algebra and similarity computations via the KmerSetGroup type.

Core Features

Sequence-to-k-mer Indexing: Sequences are converted into k-mers (of length k) and stored in a group of sets (KmerSetGroup), with one set per sequence. Minimizer-based sampling (parameter m) reduces redundancy.
Set Operations on Disk: Efficient disk-resident implementations of standard set operations:
- Union: Merges all k-mers from selected sets.
- Intersect: Retains only k-mers present in all input sets.
- Difference (A \ B): Keeps k-mers present in set A but not in B.
- QuorumAtLeast(r): Returns k-mers appearing in ≥r sets (generalizes union (r=1) and intersection (r=n)).
Consistency Guarantees: Operations obey mathematical identities (e.g., |A ∪ B| = |A| + |B| − |A ∩ B|), validated via unit tests.
Similarity & Distance Metrics:
- JaccardDistanceMatrix(): Computes pairwise Jaccard distances (1 − similarity) between all sets.
- JaccardSimilarityMatrix(): Computes pairwise Jaccard similarities (|A ∩ B| / |A ∪ B|).
- Identical sets yield distance = 0.0, disjoint ones give 1.0; similarity is complementary.

Design Principles

Temporary Directory Usage: All operations use OS temp dirs for isolation and cleanup.
Testing-Focused API: Helper functions (buildGroupFromSeqs, collectKmers) simplify test setup.
Scalability: Disk-backed design avoids memory overflow for large sequence collections.

This package enables robust, reproducible k-mer set analysis in bioinformatics pipelines—especially useful for metagenomic binning, error correction, or read clustering.

1.8 KiB Raw Blame History Unescape Escape

Semantic Description of obikmer Package Functionalities

Core Features

Design Principles

1.8 KiB

Raw Blame History

Semantic Description of `obikmer` Package Functionalities