obitools4

mirror of https://github.com/metabarcoding/obitools4.git synced 2026-06-24 09:41:00 +00:00

Author	SHA1	Message	Date
Eric Coissac	42910c7db9	🔧 Rename mismatch fields to error in pcrtag.go - Renamed `obimultiplex_forward_mismatches` to "error" for consistency - Similarly renamed `obimultiplex_reverse_mismatches` to "error" - Applied changes in both annotation dictionaries (aanot, banot)	2026-04-29 15:29:25 +02:00
Eric Coissac	8b4cf677c6	[obitools] Add validation for paired files and config template support - Enforce requirement of both forward (-F) and reverse files in obipairing/main.go - Add config template support to obitagpcr via CLIAskConfigTemplate() - Remove redundant Required() constraints in options.go - Introduce new helper CLIHasPairedFiles()	2026-04-29 15:01:37 +02:00
Eric Coissac	434d2e5930	+feat: add support for map_summaries aggregation in obisummary - Implement merging logic of `map_summaries` across datasets - Ensure proper initialization and population in multi-threaded context - Add `map_summaries` to final output dictionary when non-empty	2026-04-16 14:58:18 +02:00
Eric Coissac	7cb02ded69	Refactor: Extract utility function for string reversal - Introduce `inverser_chaine()` helper to centralize logic - Replace inline reverse implementations across modules	2026-04-16 13:42:51 +02:00
Eric Coissac	a2b26712b2	refactor: replace fixed batch size with dynamic flushing based on count and memory Replace the old fixed batch-size mechanism in Distribute with a dynamic strategy that flushes batches when either BatchSizeMax() sequences or BatchMem() bytes are reached per key. This aligns with the RebatchBySize strategy and removes the optional sizes parameter. Also update related code: simplify Lua wrapper to accept optional capacity, and fix buffer growth logic in worker.go using slices.Grow correctly. Remove unused BatchSize() usage from obidistribute.	2026-03-16 22:06:44 +01:00
Eric Coissac	c188580aac	Replace Rebatch with RebatchBySize using default batch parameters Replace calls to Rebatch(size) with RebatchBySize(obidefault.BatchMem(), obidefault.BatchSizeMax()) in batchiterator.go, fragment.go, and obirefidx.go to ensure consistent use of default memory and size limits for batch rebatching.	2026-03-13 15:16:33 +01:00
Eric Coissac	1e1f575d1c	refactor: replace single batch size with min/max bounds and memory limits Introduce separate _BatchSize (min) and _BatchSizeMax (max) constants to replace the single _BatchSize variable. Update RebatchBySize to accept both maxBytes and maxCount parameters, flushing when either limit is exceeded. Set default batch size min to 1, max to 2000, and memory limit to 128 MB. Update CLI options and sequence_reader.go accordingly.	2026-03-13 15:07:35 +01:00
Eric Coissac	40769bf827	Add memory-based batching support Implement memory-aware batch sizing with --batch-mem CLI option, enabling adaptive batching based on estimated sequence memory footprint. Key changes: - Added _BatchMem and related getters/setters in pkg/obidefault - Implemented RebatchBySize() in pkg/obiter for memory-constrained batching - Added BioSequence.MemorySize() for conservative memory estimation - Integrated batch-mem option in pkg/obioptions with human-readable size parsing (e.g., 128K, 64M, 1G) - Added obiutils.ParseMemSize/FormatMemSize for unit conversion - Enhanced pool GC in pkg/obiseq/pool.go to trigger explicit GC for large slice discards - Updated sequence_reader.go to apply memory-based rebatching when enabled	2026-03-13 14:54:21 +01:00
Eric Coissac	8dd32dc1bf	Fix CompressStream call to use compressed variable Replace hardcoded boolean with the `compressed` variable in CompressStream call to ensure correct compression behavior.	2026-03-12 18:48:22 +01:00
Eric Coissac	1ce5da9bee	Support new sequence file formats and improve error handling Add support for .gbff and .gbff.gz file extensions in sequence reader. Update the logic to return an error instead of using NilIBioSequence when no sequence files are found, improving the error handling and user feedback.	2026-02-11 06:31:10 +01:00
Eric Coissac	ac41dd8a22	Refactor k-mer matching pipeline with improved concurrency and memory management Refactor k-mer matching to use a pipeline architecture with improved concurrency and memory management: - Replace sort.Slice with slices.SortFunc and cmp.Compare for better performance - Introduce PreparedQueries struct to encapsulate query buckets with metadata - Implement MergeQueries function to merge query buckets from multiple batches - Rewrite MatchBatch to use pre-allocated results and mutexes instead of map-based accumulation - Add seek optimization in matchPartition to reduce linear scanning - Refactor match command to use a multi-stage pipeline with proper batching and merging - Add index directory option for match command - Improve parallel processing of sequence batches This refactoring improves performance by reducing memory allocations, optimizing k-mer lookup, and implementing a more efficient pipeline for large-scale k-mer matching operations.	2026-02-10 22:10:36 +01:00
Eric Coissac	bebbbbfe7d	Add entropy-based filtering for k-mers This commit introduces entropy-based filtering for k-mers to remove low-complexity sequences. It adds: - New KmerEntropy and KmerEntropyFilter functions in pkg/obikmer/entropy.go for computing and filtering k-mer entropy - Integration of entropy filtering in the k-mer set builder (pkg/obikmer/kmer_set_builder.go) - A new 'filter' command in obik tool (pkg/obitools/obik/filter.go) to apply entropy filtering on existing indices - CLI options for configuring entropy filtering during index building and filtering The entropy filter helps improve the quality of k-mer sets by removing repetitive sequences that may interfere with downstream analyses.	2026-02-10 18:20:35 +01:00
Eric Coissac	c6e04265f1	Add sparse index support for KDI files with fast seeking This commit introduces sparse index support for KDI files to enable fast random access during k-mer matching. It adds a new .kdx index file format and updates the KDI reader and writer to handle index creation and seeking. The changes include: - New KdxIndex struct and related functions for loading, searching, and writing .kdx files - Modified KdiReader to support seeking with the new index - Updated KdiWriter to create .kdx index files during writing - Enhanced KmerSetGroup.Contains to use the new index for faster lookups - Added a new 'match' command to annotate sequences with k-mer match positions The index is created automatically during KDI file creation and allows for O(log N / stride) binary search followed by at most stride linear scan steps, significantly improving performance for large datasets.	2026-02-10 13:24:24 +01:00
Eric Coissac	9babcc0fae	Refactor lowmask options and shared kmer options Refactor lowmask options to use shared kmer options and CLI getters This commit refactors the lowmask subcommand to use shared kmer options and CLI getters instead of local variables. It also moves the kmer size and minimizer size options to a shared location and adds new CLI getters for the lowmask options. - Move kmer size and minimizer size options to shared location - Add CLI getters for lowmask options - Refactor lowmask to use CLI getters - Remove unused strings import - Add MaskingMode type and related functions	2026-02-10 09:52:38 +01:00
Eric Coissac	e775f7e256	Add option to keep shorter fragments in lowmask Add a new boolean option 'keep-shorter' to preserve fragments shorter than kmer-size during split/extract mode. This change introduces a new flag _lowmaskKeepShorter that controls whether fragments shorter than the kmer size should be kept during split/extract operations. The implementation: 1. Adds the new boolean variable _lowmaskKeepShorter 2. Registers the command-line option "keep-shorter" 3. Updates the lowMaskWorker function signature to accept the keepShorter parameter 4. Modifies the fragment selection logic to check the keepShorter flag 5. Updates the worker creation to pass the global flag value This allows users to control the behavior when dealing with short sequences in split/extract modes, providing more flexibility in low-complexity masking.	2026-02-10 09:36:42 +01:00
Eric Coissac	f2937af1ad	Add max frequency filtering and top-kmer saving capabilities This commit introduces max frequency filtering to limit k-mer occurrences and adds functionality to save the N most frequent k-mers per set to CSV files. It also includes the ability to output k-mer frequency spectra as CSV and updates the CLI options accordingly.	2026-02-10 09:27:04 +01:00
Eric Coissac	56c1f4180c	Refactor k-mer index management with subcommands and enhanced metadata support This commit refactors the k-mer index management tools to use a unified subcommand structure with obik, adds support for per-set metadata and ID management, enhances the k-mer set group builder to support appending to existing groups, and improves command-line option handling with a new global options registration system. Key changes: - Introduce obik command with subcommands (index, ls, summary, cp, mv, rm, super, lowmask) - Add support for per-set metadata and ID management in kmer set groups - Implement ability to append to existing kmer index groups - Refactor option parsing to use a global options registration system - Add new commands for listing, copying, moving, and removing sets - Enhance low-complexity masking with new options and output formats - Improve kmer index summary with Jaccard distance matrix support - Remove deprecated obikindex and obisuperkmer commands - Update build process to use the new subcommand structure	2026-02-10 06:49:31 +01:00
Eric Coissac	f78543ee75	Refactor k-mer index building to use disk-based KmerSetGroupBuilder Refactor k-mer index building to use the new disk-based KmerSetGroupBuilder instead of the old KmerSet and FrequencyFilter approaches. This change introduces a more efficient and scalable approach to building k-mer indices by using partitioned disk storage with streaming operations. - Replace BuildKmerIndex and BuildFrequencyFilterIndex with KmerSetGroupBuilder - Add support for frequency filtering via WithMinFrequency option - Remove deprecated k-mer set persistence methods - Update CLI to use new builder approach - Add new disk-based k-mer operations (union, intersect, difference, quorum) - Introduce KDI (K-mer Delta Index) file format for efficient storage - Add K-way merge operations for combining sorted k-mer streams - Update documentation and examples to reflect new API This refactoring provides better memory usage, faster operations on large datasets, and more flexible k-mer set operations.	2026-02-10 06:49:31 +01:00
Eric Coissac	a016ad5b8a	Refactor kmer index to disk-based partitioning with minimizer Refactor kmer index package to use disk-based partitioning with minimizer - Replace roaring64 bitmaps with disk-based kmer index - Implement partitioned kmer sets with delta-varint encoding - Add support for frequency filtering during construction - Introduce new builder pattern for index construction - Add streaming operations for set operations (union, intersect, etc.) - Add support for super-kmer encoding during construction - Update command line tool to use new index format - Remove dependency on roaring bitmap library This change introduces a new architecture for kmer indexing that is more memory efficient and scalable for large datasets.	2026-02-09 17:52:37 +01:00
Eric Coissac	99a8e69d10	Optimize low-complexity masking algorithm This commit optimizes the low-complexity masking algorithm by: 1. Precomputing logarithm values and normalization tables to avoid repeated calculations 2. Replacing the MinMultiset-based sliding minimum with a more efficient deque-based implementation 3. Improving entropy calculation by using precomputed n*log(n) values 4. Simplifying the circular normalization process with precomputed tables 5. Removing unused imports and log statements The changes significantly improve performance while maintaining the same masking behavior.	2026-02-09 09:05:46 +01:00
Eric Coissac	1a28d5ed64	Add progress bar configuration and conditional display This commit introduces a new configuration module `obidefault` to manage progress bar settings, allowing users to disable progress bars via a `--no-progressbar` option. It updates various packages to conditionally display progress bars based on this new configuration, improving user experience by providing control over progress bar output. The changes also include improvements to progress bar handling in several packages, ensuring they are only displayed when appropriate (e.g., when stderr is a terminal and stdout is not piped).	2026-02-08 16:14:02 +01:00
Eric Coissac	7c12b1ee83	Disable progress bar when output is piped Modify CLIProgressBar function to check if stdout is a named pipe and disable the progress bar accordingly. This prevents the progress bar from being displayed when the output is redirected or piped to another command.	2026-02-08 14:48:13 +01:00
Eric Coissac	7a979ba77f	Add obisuperkmer command implementation and tests This commit adds the implementation of the obisuperkmer command, including: - The main command in cmd/obitools/obisuperkmer/ - The package implementation in pkg/obitools/obisuperkmer/ - Automated tests in obitests/obitools/obisuperkmer/ - Documentation for the implementation and tests The obisuperkmer command extracts super k-mers from DNA sequences, following the standard OBITools architecture. It includes proper CLI option handling, validation of parameters, and integration with the OBITools pipeline system. Tests cover basic functionality, parameter validation, output format, metadata preservation, and file I/O operations.	2026-02-07 13:54:02 +01:00
Eric Coissac	16f72e6305	refactoring of obikmer	2026-02-05 16:05:48 +01:00
Eric Coissac	371e702423	obiannotate --cut bug	2025-12-18 14:11:11 +01:00
Eric Coissac	547135c747	End of obilowmask	2025-12-03 11:49:07 +01:00
Eric Coissac	e65b2a5efe	obimatrix bugs	2025-11-21 13:24:06 +01:00
Eric Coissac	ccc827afd3	finalise obilowmask	2025-11-18 15:33:08 +01:00
Eric Coissac	cef29005a5	debug url reading	2025-11-18 15:30:20 +01:00
Eric Coissac	4603d7973e	implementation of obilowmask	2025-11-18 15:30:20 +01:00
Eric Coissac	07cdd6f758	debug obimultiplex bug option obimultiplex	2025-11-06 15:43:13 +01:00
Eric Coissac	0844dcc607	bug obimatrix	2025-10-28 13:57:31 +01:00
Eric Coissac	d17a9520b9	work on obiclean chimera detection	2025-10-20 17:29:47 +02:00
Eric Coissac	29bf4ce871	add a feature to obimatrix adding obicsv option to obimatrix	2025-10-20 16:34:58 +02:00
Eric Coissac	82b6bb1ab6	correct a bug in func (worker SeqWorker) ChainWorkers(next SeqWorker) SeqWorker	2025-08-11 15:09:49 +02:00
Eric Coissac	412b54822c	Patch a bug in obiclean for d>1 leading to some instability in the result	2025-08-07 17:01:38 -04:00
Eric Coissac	04f3af3e60	some renaming of functions	2025-08-06 15:54:50 -04:00
Eric Coissac	f239e8da92	Rename ISequenceChunk	2025-08-05 08:49:45 -04:00
Eric Coissac	ed28d3fb5b	Adds a --u-to-t option	2025-07-07 15:35:26 +02:00
Eric Coissac	235a7e202a	Update obisummary to account new obiseq.StatsOnValues type	2025-06-19 17:21:30 +02:00
Eric Coissac	27fa984a63	Patch obimatrix according to the new type obiseq.StatsOnValues	2025-06-19 16:51:53 +02:00
Eric Coissac	9965370d85	Manage a lock on StatsOnValues	2025-06-17 16:46:11 +02:00
Eric Coissac	d31e677304	Patch a bug in obitag	2025-06-04 14:47:28 +02:00
Eric Coissac	6cb7a5a352	Changes to be committed: modified: cmd/obitools/obitag/main.go modified: cmd/obitools/obitaxonomy/main.go modified: pkg/obiformats/csvtaxdump_read.go modified: pkg/obiformats/ecopcr_read.go modified: pkg/obiformats/ncbitaxdump_read.go modified: pkg/obiformats/ncbitaxdump_readtar.go modified: pkg/obiformats/newick_write.go modified: pkg/obiformats/options.go modified: pkg/obiformats/taxonomy_read.go modified: pkg/obiformats/universal_read.go modified: pkg/obiiter/extract_taxonomy.go modified: pkg/obioptions/options.go modified: pkg/obioptions/version.go new file: pkg/obiphylo/tree.go modified: pkg/obiseq/biosequenceslice.go modified: pkg/obiseq/taxonomy_methods.go modified: pkg/obitax/taxonomy.go modified: pkg/obitax/taxonset.go modified: pkg/obitools/obiconvert/sequence_reader.go modified: pkg/obitools/obitag/obitag.go modified: pkg/obitools/obitaxonomy/obitaxonomy.go modified: pkg/obitools/obitaxonomy/options.go deleted: sample/.DS_Store	2025-06-04 09:48:10 +02:00
Eric Coissac	c0ecaf90ab	Add the --number option to obiannotate	2025-04-22 18:35:51 +02:00
Eric Coissac	7542e33010	Fix several bugs discovered while writing the documentation	2025-04-04 16:59:27 +02:00
Eric Coissac	03b5ce9397	Patch a bug in obitag when some reference sequences have taxid absent from the taxonomy	2025-03-27 16:45:02 +01:00
Eric Coissac	fd80249b85	Patch a bug in obitag when a taxon from the reference library is unknown in the taxonomy	2025-03-27 14:28:15 +01:00
Eric Coissac	5a3705b6bb	Adds the --silent-warning option to the obitools commands and removes the --paired-with option from some of the obitools commands.	2025-03-25 16:44:46 +01:00
Eric Coissac	2ab6f67d58	Add a progress bar to chimera detection	2025-03-25 08:37:27 +01:00

282 Commits
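Commits 1e1f575d1c and a2b26712b2 describe batches that are flushed as soon as either a maximum sequence count (BatchSizeMax()) or a memory budget (BatchMem()) is reached. The Go sketch below illustrates only that dual-limit rule on a plain slice. The Item type, its Size field (standing in for BioSequence.MemorySize()) and the rebatchBySize function are hypothetical and are not the obiiter API, which works on streaming iterators.

```go
package main

import "fmt"

// Item is a hypothetical stand-in for a sequence; Size plays the role of
// an estimated memory footprint in bytes.
type Item struct {
	ID   string
	Size int
}

// rebatchBySize groups items into batches, closing the current batch as
// soon as it reaches maxCount items or maxBytes bytes (whichever comes first).
func rebatchBySize(items []Item, maxBytes, maxCount int) [][]Item {
	var batches [][]Item
	var current []Item
	bytes := 0
	for _, it := range items {
		current = append(current, it)
		bytes += it.Size
		// Flush when either the count limit or the memory limit is reached.
		if len(current) >= maxCount || bytes >= maxBytes {
			batches = append(batches, current)
			current = nil
			bytes = 0
		}
	}
	// Flush whatever is left once the input is exhausted.
	if len(current) > 0 {
		batches = append(batches, current)
	}
	return batches
}

func main() {
	items := []Item{{"a", 60}, {"b", 50}, {"c", 10}, {"d", 10}, {"e", 10}}
	for _, b := range rebatchBySize(items, 100, 3) {
		fmt.Println(len(b)) // prints 2, then 3
	}
}
```

With maxBytes = 100 and maxCount = 3, the first batch is closed by the memory limit (60 + 50 bytes) and the second by the count limit.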
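Commit c6e04265f1 describes the .kdx sparse index as a binary search over sampled entries followed by at most stride linear scan steps. The in-memory sketch below shows that lookup. It assumes k-mers are encoded as sorted uint64 values and that the index keeps every stride-th value. The names sparseIndex, buildSparseIndex and contains are illustrative and are not the KdxIndex API or the on-disk format.

```go
package main

import (
	"fmt"
	"sort"
)

// sparseIndex samples every stride-th k-mer of a sorted slice
// (a hypothetical in-memory analogue of a .kdx file).
type sparseIndex struct {
	stride  int
	samples []uint64 // samples[i] == kmers[i*stride]
}

func buildSparseIndex(kmers []uint64, stride int) sparseIndex {
	idx := sparseIndex{stride: stride}
	for i := 0; i < len(kmers); i += stride {
		idx.samples = append(idx.samples, kmers[i])
	}
	return idx
}

// contains binary-searches the samples, then scans at most stride entries.
func (idx sparseIndex) contains(kmers []uint64, q uint64) bool {
	// j is the first sample strictly greater than q; q can only lie in block j-1.
	j := sort.Search(len(idx.samples), func(i int) bool { return idx.samples[i] > q })
	if j == 0 {
		return false // q is smaller than every stored k-mer
	}
	start := (j - 1) * idx.stride
	end := start + idx.stride
	if end > len(kmers) {
		end = len(kmers)
	}
	for _, k := range kmers[start:end] {
		if k == q {
			return true
		}
		if k > q {
			return false // sorted order: q cannot appear later
		}
	}
	return false
}

func main() {
	kmers := []uint64{2, 5, 9, 14, 20, 33, 41}
	idx := buildSparseIndex(kmers, 3) // samples: 2, 14, 41
	fmt.Println(idx.contains(kmers, 20), idx.contains(kmers, 10))
}
```

A larger stride makes the index smaller but lengthens the linear scan after each binary search, which is the trade-off the commit message summarizes.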
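Commit 99a8e69d10 replaces a MinMultiset with a deque-based sliding minimum. The sketch below applies the standard monotonic-deque technique to integers. The function name slidingMin is hypothetical. The deque holds window indices whose values increase from front to back, so each position is pushed and popped at most once, giving amortized O(1) work per window.

```go
package main

import "fmt"

// slidingMin returns the minimum of every window of width w in xs,
// using a deque of indices whose values increase from front to back.
func slidingMin(xs []int, w int) []int {
	if w <= 0 || w > len(xs) {
		return nil
	}
	out := make([]int, 0, len(xs)-w+1)
	dq := make([]int, 0, w)
	for i, x := range xs {
		// Drop the front index once it has left the window.
		if len(dq) > 0 && dq[0] <= i-w {
			dq = dq[1:]
		}
		// Drop back entries that can never be the minimum again.
		for len(dq) > 0 && xs[dq[len(dq)-1]] >= x {
			dq = dq[:len(dq)-1]
		}
		dq = append(dq, i)
		// The front of the deque is the minimum of the current full window.
		if i >= w-1 {
			out = append(out, xs[dq[0]])
		}
	}
	return out
}

func main() {
	fmt.Println(slidingMin([]int{4, 2, 12, 3, 8, 1, 7}, 3)) // [2 2 3 1 1]
}
```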
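Commit a016ad5b8a mentions delta-varint encoding of partitioned k-mer sets. The sketch below assumes each partition stores its k-mers as an ascending sequence of uint64 values. Under that assumption, the idea can be shown with the standard library's encoding/binary varint helpers: store the gap from the previous value rather than the value itself, so that densely packed k-mers take one or two bytes each. The names encodeDeltas and decodeDeltas are illustrative, and the actual KDI layout may differ.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeDeltas writes the gaps between consecutive values as unsigned
// varints; sorted must be in ascending order or the subtraction wraps.
func encodeDeltas(sorted []uint64) []byte {
	var out []byte
	var tmp [binary.MaxVarintLen64]byte
	prev := uint64(0)
	for _, v := range sorted {
		n := binary.PutUvarint(tmp[:], v-prev)
		out = append(out, tmp[:n]...)
		prev = v
	}
	return out
}

// decodeDeltas reverses encodeDeltas by accumulating the decoded gaps.
func decodeDeltas(buf []byte) ([]uint64, error) {
	var out []uint64
	prev := uint64(0)
	for len(buf) > 0 {
		d, n := binary.Uvarint(buf)
		if n <= 0 {
			return nil, fmt.Errorf("corrupt varint at offset %d", len(out))
		}
		prev += d
		out = append(out, prev)
		buf = buf[n:]
	}
	return out, nil
}

func main() {
	enc := encodeDeltas([]uint64{3, 130, 131, 1000}) // gaps: 3, 127, 1, 869
	dec, err := decodeDeltas(enc)
	fmt.Println(len(enc), dec, err) // 5 [3 130 131 1000] <nil>
}
```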