Eric Coissac
05de9ca58e
Add SuperKmer extraction functionality
...
This commit introduces the ExtractSuperKmers function which identifies maximal subsequences where all consecutive k-mers share the same minimizer. It includes:
- SuperKmer struct to represent the maximal subsequences
- dequeItem struct for tracking minimizers in a sliding window
- Efficient algorithm using monotone deque for O(1) amortized minimizer tracking
- Comprehensive parameter validation
- Support for buffer reuse for performance optimization
- Extensive test cases covering basic functionality, edge cases, and performance benchmarks
The implementation uses simultaneous forward/reverse m-mer encoding for O(1) canonical m-mer computation and maintains a monotone deque to track minimizers efficiently.
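The approach described above can be illustrated with a minimal sketch. It is not the code from this commit: the function name `superKmersSketch`, the field layout of `SuperKmer` and `dequeItem`, the 2-bit base coding (A=0, C=1, G=2, T=3), and the half-open `[Start, End)` convention are all assumptions made for illustration. The input is assumed to contain only A, C, G and T.

```go
package main

import "fmt"

// SuperKmer is a hypothetical layout: the shared minimizer value and the
// half-open interval [Start, End) covered on the input sequence.
type SuperKmer struct {
	Minimizer  uint64
	Start, End int
}

// dequeItem is a hypothetical layout: an m-mer start position and its
// canonical (strand-independent) 2-bit encoded value.
type dequeItem struct {
	pos   int
	value uint64
}

// encodeBase maps a nucleotide to 2 bits (assumed coding: A=0, C=1, G=2, T=3).
// Any other byte is treated as A, which is a simplification.
func encodeBase(b byte) uint64 {
	switch b {
	case 'C', 'c':
		return 1
	case 'G', 'g':
		return 2
	case 'T', 't':
		return 3
	}
	return 0
}

// superKmersSketch splits seq into maximal runs of consecutive k-mers that
// share the same canonical m-mer minimizer value.
func superKmersSketch(seq []byte, k, m int) []SuperKmer {
	if m < 1 || m > 31 || k < m || len(seq) < k {
		return nil
	}
	mask := uint64(1)<<(2*uint(m)) - 1
	shift := 2 * uint(m-1)
	var fwd, rev uint64
	var deque []dequeItem
	var out []SuperKmer
	start, current := 0, uint64(0)
	for i, b := range seq {
		c := encodeBase(b)
		// Update the forward m-mer and its reverse complement together.
		fwd = (fwd<<2 | c) & mask
		rev = rev>>2 | (3-c)<<shift
		if i < m-1 {
			continue // first m-mer not complete yet
		}
		canon := fwd
		if rev < canon {
			canon = rev
		}
		// Monotone deque: drop candidates that can no longer be the minimum.
		for len(deque) > 0 && deque[len(deque)-1].value >= canon {
			deque = deque[:len(deque)-1]
		}
		deque = append(deque, dequeItem{pos: i - m + 1, value: canon})
		if i < k-1 {
			continue // first k-mer not complete yet
		}
		kstart := i - k + 1
		// Evict m-mers that start before the current k-mer.
		for deque[0].pos < kstart {
			deque = deque[1:]
		}
		minimizer := deque[0].value
		if kstart == 0 {
			current = minimizer
		} else if minimizer != current {
			// The previous k-mer ended at i-1, so the run ends at i (exclusive).
			out = append(out, SuperKmer{Minimizer: current, Start: start, End: i})
			start, current = kstart, minimizer
		}
	}
	return append(out, SuperKmer{Minimizer: current, Start: start, End: len(seq)})
}

func main() {
	seq := []byte("AAAACCCC")
	for _, sk := range superKmersSketch(seq, 4, 2) {
		fmt.Printf("%s minimizer=%d\n", seq[sk.Start:sk.End], sk.Minimizer)
	}
}
```

With `k=4` and `m=2`, the sequence `AAAACCCC` splits into three runs in this sketch: `AAAACC` (minimizer 0), `ACCC` (minimizer 1) and `CCCC` (minimizer 5). Each m-mer enters and leaves the deque at most once, which is where the amortized O(1) cost per position comes from.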
2026-02-04 16:04:06 +01:00
Eric Coissac
500144051a
Add jj Makefile targets and k-mer encoding utilities
...
Add new Makefile targets for jj operations (jjnew, jjpush, jjfetch) to streamline commit workflow.
Introduce k-mer encoding utilities in pkg/obikmer:
- EncodeKmers: converts DNA sequences to encoded k-mers
- ReverseComplement: computes reverse complement of k-mers
- NormalizeKmer: returns canonical form of k-mers
- EncodeNormalizedKmers: encodes sequences with normalized k-mers
Add comprehensive tests for k-mer encoding functions including edge cases, buffer reuse, and performance benchmarks.
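As an illustration of what such utilities typically do, here is a minimal sketch of 2-bit k-mer encoding, reverse complement and canonical form. The function names carry a `Sketch` suffix because they are hypothetical: the actual signatures in pkg/obikmer are not shown in this log. The base coding (A=0, C=1, G=2, T=3) and the limit k <= 31 are also assumptions.

```go
package main

import "fmt"

// encodeBase maps a nucleotide to 2 bits (assumed coding: A=0, C=1, G=2, T=3).
// Any other byte is treated as A, which is a simplification.
func encodeBase(b byte) uint64 {
	switch b {
	case 'C', 'c':
		return 1
	case 'G', 'g':
		return 2
	case 'T', 't':
		return 3
	}
	return 0
}

// encodeKmersSketch appends the encoding of every k-mer of seq to buf[:0],
// so a caller can pass the same buffer again to avoid reallocations.
func encodeKmersSketch(seq []byte, k int, buf []uint64) []uint64 {
	buf = buf[:0]
	if k < 1 || k > 31 || len(seq) < k {
		return buf
	}
	mask := uint64(1)<<(2*uint(k)) - 1
	var cur uint64
	for i, b := range seq {
		cur = (cur<<2 | encodeBase(b)) & mask
		if i >= k-1 {
			buf = append(buf, cur)
		}
	}
	return buf
}

// reverseComplementSketch reverses the k bases of kmer and complements each
// one; with the coding above the complement of code c is 3-c.
func reverseComplementSketch(kmer uint64, k int) uint64 {
	var rc uint64
	for i := 0; i < k; i++ {
		rc = rc<<2 | (3 - kmer&3)
		kmer >>= 2
	}
	return rc
}

// normalizeKmerSketch returns the smaller of a k-mer and its reverse
// complement, so both strands map to the same value.
func normalizeKmerSketch(kmer uint64, k int) uint64 {
	if rc := reverseComplementSketch(kmer, k); rc < kmer {
		return rc
	}
	return kmer
}

func main() {
	kmers := encodeKmersSketch([]byte("ACGT"), 2, nil)
	for _, km := range kmers {
		fmt.Println(km, normalizeKmerSketch(km, 2))
	}
}
```

For `ACGT` with `k=2`, the three k-mers AC, CG and GT encode to 1, 6 and 11 under this coding. GT normalizes to 1 because its reverse complement is AC, while CG is its own reverse complement.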
Document k-mer index design for large genomes, covering:
- Use cases and objectives
- Volume estimations
- Distance metrics (Jaccard, Sørensen-Dice, Bray-Curtis)
- Indexing options (Bloom filters, sorted sets, MPHF)
- Optimization techniques (k-2-mer indexing)
- MinHash for distance acceleration
- Recommended architecture for presence/absence and counting queries
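To make the presence/absence distances concrete, here is a minimal sketch of the Jaccard and Sørensen-Dice distances computed on k-mer sets stored as sorted slices of unique encoded values. The function names and the sorted-slice representation are assumptions for illustration, not the design recorded in the document mentioned above.

```go
package main

import "fmt"

// intersectionSize counts the values common to two sorted slices of unique
// values with a single linear merge pass.
func intersectionSize(a, b []uint64) int {
	i, j, n := 0, 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] == b[j]:
			n++
			i++
			j++
		case a[i] < b[j]:
			i++
		default:
			j++
		}
	}
	return n
}

// jaccardDistance returns 1 - |A∩B| / |A∪B| (0 for two empty sets).
func jaccardDistance(a, b []uint64) float64 {
	inter := intersectionSize(a, b)
	union := len(a) + len(b) - inter
	if union == 0 {
		return 0
	}
	return 1 - float64(inter)/float64(union)
}

// diceDistance returns 1 - 2|A∩B| / (|A| + |B|) (0 for two empty sets).
func diceDistance(a, b []uint64) float64 {
	total := len(a) + len(b)
	if total == 0 {
		return 0
	}
	return 1 - 2*float64(intersectionSize(a, b))/float64(total)
}

func main() {
	a := []uint64{1, 6}
	b := []uint64{6, 11, 12}
	fmt.Println(jaccardDistance(a, b), diceDistance(a, b))
}
```

The two example sets share one value out of four distinct values, giving a Jaccard distance of 0.75. Bray-Curtis would additionally need per-k-mer counts, so it is left out of this presence/absence sketch.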
2026-02-04 14:27:10 +01:00
Eric Coissac
4603d7973e
Implementation of obilowmask
2025-11-18 15:30:20 +01:00
Eric Coissac
5a3705b6bb
Adds the --silent-warning option to the obitools commands and removes the --pared-with option from some of the obitools commands.
2025-03-25 16:44:46 +01:00
Eric Coissac
ef05d4975f
Update the scoring scheme of obipairing
2025-02-21 22:41:34 +01:00
Eric Coissac
65ae82622e
correction of several small bugs
2024-09-03 06:08:07 -03:00
Eric Coissac
373464cb06
Work in progress on genome skim tools
2024-08-30 11:17:33 +02:00
Eric Coissac
31bfc88eb9
Patch a bug in writing to stdout, and add a clearer error on opening data files
2024-08-13 09:45:28 +02:00
Eric Coissac
bdb96dda94
Adds the obimicrosat command
2024-08-05 15:31:20 +02:00
Eric Coissac
67665a6b40
Xprize update
...
Former-commit-id: d38919a897961e4d40da3b844057c3fb94fdb6d7
2024-07-25 18:09:03 -04:00
Eric Coissac
e6b87ecd02
Reduce memory allocation events
...
Former-commit-id: fbdb2afc857b02adc2593e2278d3bd838e99b0b2
2024-06-22 21:01:53 +02:00
Eric Coissac
dd9307a4cd
Switch to the system min and max functions and remove the versions from obiutils
...
Former-commit-id: 8c4558921b0d0c266b070f16e83813de6e6d4a0f
2024-05-30 08:27:24 +02:00
Eric Coissac
61be8a55b1
Merge obiminion and obiconsensus
...
Former-commit-id: 49d65d671e9fe4454de60c20507c3d8df6e9c51c
2024-05-14 17:53:32 +02:00
Eric Coissac
7fcb0538a3
Cleaning of obiminion
...
Former-commit-id: 75148afd70e5006cc6855bcddc86506b099761a1
2024-05-14 11:45:46 +02:00
Eric Coissac
017030bcce
Add obiminion first version
...
Former-commit-id: aa5ace7bd4d2266333715fca7094d1c3cbbb5e6d
2024-05-14 08:16:12 +02:00
Eric Coissac
3d1d9f32df
Make obiconsensus use the sequence counts
...
Former-commit-id: 7fc5292aeb225843a86cd85591a5405e35125e3d
2024-04-03 12:58:32 +02:00
8d77cc4133
Change path of the obitools pkg
...
Former-commit-id: 311cbf8df3b990b393c6f4885d62e74564423b65
2023-11-29 12:14:37 +01:00
b556e045e5
Adds an option to tune the pairing of the sequences in obipairing and adds some stats to the results
...
Former-commit-id: a6cf9cb4d4ab20a433a2534fd7d11cd3ca8ebbaa
2023-11-24 12:29:37 +01:00
3f8c0d6a2f
Replace MakeBioSequence call by NewBioSequence call,
...
Implements a new file format guesser
Adds some more API doc
Former-commit-id: 9837bf1c28beca6ddb599b367f93548950ba83c1
2023-08-30 19:59:46 +02:00
8895381c92
Correct a bug when sequence length is shorter than kmer size in debruijngraph
...
Former-commit-id: 56cd670d24065d8774abdcbf685ff4a13e7c4132
2023-06-07 17:43:32 +02:00
e9dcacbf24
Correction of one base deletion in the consensus
...
Former-commit-id: 6aabd8bfdb5263a2285718cb2eca27628c0e717c
2023-04-19 09:12:12 +02:00
21819cd41e
Add some geometry data to the GML edges
...
Former-commit-id: 81022c8b6916819e5351026ebabe9641856cc06d
2023-03-28 11:43:46 +07:00
e8a8f746d3
Fix a few bugs in the graph algorithm
...
Former-commit-id: b61bdd9f671e2f5e90d32c1beac1ed84efa5e05c
2023-03-27 22:43:45 +07:00
a33e471b39
First attempt at obiconsensus... The graph traversal algorithm is too simple
...
Former-commit-id: 0456e6c7fd55d6d0fcf9856c40386b976b912cba
2023-03-27 19:51:10 +07:00
d5e84ec676
rename goutils to obiutils
...
Former-commit-id: 2147f53db972bba571dfdae30c51b62d3e69cec5
2023-03-24 10:25:12 +07:00
4117cbdd08
Stabilize the obipairing output: when run twice on the same dataset, the results are identical
2023-02-05 11:06:31 +01:00
29563aa94e
Rename the Length methods to Len to follow the Go standard
2022-11-17 11:09:58 +01:00
8aa323dad5
Add a first version of obitag, the successor of ecotag
2022-10-26 13:16:56 +02:00
2e7c1834b0
Big change in the data model, and a first version of obiuniq
2022-02-21 19:00:23 +01:00
b9b9c0f179
Patch module name from oa2 to obitools
2022-01-13 23:43:01 +01:00
f53bf1b804
First commit
2022-01-13 23:27:39 +01:00