Commit Graph

42 Commits

Author SHA1 Message Date
Eric Coissac
1e1f575d1c refactor: replace single batch size with min/max bounds and memory limits
Introduce separate _BatchSize (min) and _BatchSizeMax (max) constants to replace the single _BatchSize variable. Update RebatchBySize to accept both maxBytes and maxCount parameters, flushing when either limit is exceeded. Set default batch size min to 1, max to 2000, and memory limit to 128 MB. Update CLI options and sequence_reader.go accordingly.
2026-03-13 15:07:35 +01:00
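The dual-limit flushing described in this commit can be sketched as follows. This is a minimal illustration, not the code from pkg/obiiter: the `Seq` type, its `MemorySize` estimate, and the slice-based signature are assumptions made for the example, and the sketch flushes just before a limit would be exceeded.

```go
package main

import "fmt"

// Seq is a hypothetical stand-in for a sequence record.
type Seq struct {
	ID   string
	Data []byte
}

// MemorySize returns a rough estimate of the bytes held by the record.
func (s Seq) MemorySize() int {
	return len(s.ID) + len(s.Data)
}

// RebatchBySize regroups seqs into batches. The current batch is flushed
// when adding the next record would push it past maxCount records or
// maxBytes bytes. A limit of 0 disables that check. A single record larger
// than maxBytes still gets a batch of its own.
func RebatchBySize(seqs []Seq, maxBytes, maxCount int) [][]Seq {
	var batches [][]Seq
	var current []Seq
	usedBytes := 0

	for _, s := range seqs {
		size := s.MemorySize()
		overCount := maxCount > 0 && len(current)+1 > maxCount
		overBytes := maxBytes > 0 && usedBytes+size > maxBytes

		if len(current) > 0 && (overCount || overBytes) {
			batches = append(batches, current)
			current = nil
			usedBytes = 0
		}
		current = append(current, s)
		usedBytes += size
	}
	if len(current) > 0 {
		batches = append(batches, current)
	}
	return batches
}

func main() {
	seqs := make([]Seq, 5)
	for i := range seqs {
		// Each record is estimated at 10 bytes (1 for the ID, 9 for the data).
		seqs[i] = Seq{ID: "x", Data: make([]byte, 9)}
	}
	for i, b := range RebatchBySize(seqs, 25, 3) {
		fmt.Printf("batch %d: %d records\n", i, len(b))
	}
}
```

With a 25-byte limit and a 3-record limit, the byte limit is reached first, so the five 10-byte records are grouped as 2, 2 and 1.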
Eric Coissac
40769bf827 Add memory-based batching support
Implement memory-aware batch sizing with --batch-mem CLI option, enabling adaptive batching based on estimated sequence memory footprint. Key changes:
- Added _BatchMem and related getters/setters in pkg/obidefault
- Implemented RebatchBySize() in pkg/obiiter for memory-constrained batching
- Added BioSequence.MemorySize() for conservative memory estimation
- Integrated batch-mem option in pkg/obioptions with human-readable size parsing (e.g., 128K, 64M, 1G)
- Added obiutils.ParseMemSize/FormatMemSize for unit conversion
- Enhanced pool GC in pkg/obiseq/pool.go to trigger explicit GC for large slice discards
- Updated sequence_reader.go to apply memory-based rebatching when enabled
2026-03-13 14:54:21 +01:00
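The human-readable size parsing mentioned in this commit (128K, 64M, 1G) could look roughly like the sketch below. The function names mirror those in the commit message, but the bodies, the signatures, and the choice of binary units (1K = 1024 bytes) are assumptions made for illustration; the actual obiutils code may differ.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ParseMemSize converts strings such as "128K", "64M" or "1G" to bytes.
// Assumption: units are binary (K = 1024) and case-insensitive.
// The sketch does not guard against integer overflow.
func ParseMemSize(input string) (int64, error) {
	s := strings.ToUpper(strings.TrimSpace(input))
	mult := int64(1)
	if len(s) > 0 {
		switch s[len(s)-1] {
		case 'K':
			mult = 1 << 10
		case 'M':
			mult = 1 << 20
		case 'G':
			mult = 1 << 30
		}
		if mult != 1 {
			s = s[:len(s)-1] // drop the unit suffix
		}
	}
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil || n < 0 {
		return 0, fmt.Errorf("invalid memory size %q", input)
	}
	return n * mult, nil
}

// FormatMemSize renders a byte count with the largest unit that divides it
// exactly, so that ParseMemSize(FormatMemSize(n)) returns n.
func FormatMemSize(n int64) string {
	switch {
	case n >= 1<<30 && n%(1<<30) == 0:
		return fmt.Sprintf("%dG", n>>30)
	case n >= 1<<20 && n%(1<<20) == 0:
		return fmt.Sprintf("%dM", n>>20)
	case n >= 1<<10 && n%(1<<10) == 0:
		return fmt.Sprintf("%dK", n>>10)
	default:
		return strconv.FormatInt(n, 10)
	}
}

func main() {
	for _, s := range []string{"128K", "64M", "1G"} {
		n, err := ParseMemSize(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s = %d bytes = %s\n", s, n, FormatMemSize(n))
	}
}
```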
Eric Coissac
56c1f4180c Refactor k-mer index management with subcommands and enhanced metadata support
This commit refactors the k-mer index management tools to use a unified subcommand structure with obik, adds support for per-set metadata and ID management, enhances the k-mer set group builder to support appending to existing groups, and improves command-line option handling with a new global options registration system.

Key changes:
- Introduce obik command with subcommands (index, ls, summary, cp, mv, rm, super, lowmask)
- Add support for per-set metadata and ID management in kmer set groups
- Implement ability to append to existing kmer index groups
- Refactor option parsing to use a global options registration system
- Add new commands for listing, copying, moving, and removing sets
- Enhance low-complexity masking with new options and output formats
- Improve kmer index summary with Jaccard distance matrix support
- Remove deprecated obikindex and obisuperkmer commands
- Update build process to use the new subcommand structure
2026-02-10 06:49:31 +01:00
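For readers unfamiliar with the Jaccard distance matrix mentioned in the summary improvement, the computation can be sketched as below. The `KmerSet` type (a set of 64-bit encoded k-mers) and both function names are hypothetical and chosen for the example; they are not taken from the obik code.

```go
package main

import "fmt"

// KmerSet is a hypothetical set of k-mers encoded as 64-bit integers.
type KmerSet map[uint64]struct{}

// JaccardDistance returns 1 - |A ∩ B| / |A ∪ B|.
// Two empty sets are treated as identical (distance 0).
func JaccardDistance(a, b KmerSet) float64 {
	if len(a) == 0 && len(b) == 0 {
		return 0
	}
	if len(a) > len(b) {
		a, b = b, a // iterate over the smaller set
	}
	inter := 0
	for k := range a {
		if _, ok := b[k]; ok {
			inter++
		}
	}
	union := len(a) + len(b) - inter
	return 1 - float64(inter)/float64(union)
}

// DistanceMatrix computes the symmetric pairwise distance matrix.
// The diagonal stays at 0.
func DistanceMatrix(sets []KmerSet) [][]float64 {
	n := len(sets)
	m := make([][]float64, n)
	for i := range m {
		m[i] = make([]float64, n)
	}
	for i := 0; i < n; i++ {
		for j := i + 1; j < n; j++ {
			d := JaccardDistance(sets[i], sets[j])
			m[i][j], m[j][i] = d, d
		}
	}
	return m
}

func main() {
	a := KmerSet{1: {}, 2: {}, 3: {}}
	b := KmerSet{2: {}, 3: {}, 4: {}}
	c := KmerSet{7: {}, 8: {}}
	for _, row := range DistanceMatrix([]KmerSet{a, b, c}) {
		fmt.Println(row)
	}
}
```

Sets `a` and `b` share two of four distinct k-mers, giving a distance of 0.5; `c` shares none with either, giving a distance of 1.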
Eric Coissac
730d448fc3 Allows for only one cpu and it should work 2025-08-06 16:09:25 -04:00
Eric Coissac
6cb7a5a352 Changes to be committed:
modified:   cmd/obitools/obitag/main.go
	modified:   cmd/obitools/obitaxonomy/main.go
	modified:   pkg/obiformats/csvtaxdump_read.go
	modified:   pkg/obiformats/ecopcr_read.go
	modified:   pkg/obiformats/ncbitaxdump_read.go
	modified:   pkg/obiformats/ncbitaxdump_readtar.go
	modified:   pkg/obiformats/newick_write.go
	modified:   pkg/obiformats/options.go
	modified:   pkg/obiformats/taxonomy_read.go
	modified:   pkg/obiformats/universal_read.go
	modified:   pkg/obiiter/extract_taxonomy.go
	modified:   pkg/obioptions/options.go
	modified:   pkg/obioptions/version.go
	new file:   pkg/obiphylo/tree.go
	modified:   pkg/obiseq/biosequenceslice.go
	modified:   pkg/obiseq/taxonomy_methods.go
	modified:   pkg/obitax/taxonomy.go
	modified:   pkg/obitax/taxonset.go
	modified:   pkg/obitools/obiconvert/sequence_reader.go
	modified:   pkg/obitools/obitag/obitag.go
	modified:   pkg/obitools/obitaxonomy/obitaxonomy.go
	modified:   pkg/obitools/obitaxonomy/options.go
	deleted:    sample/.DS_Store
2025-06-04 09:48:10 +02:00
Eric Coissac
5a3705b6bb Adds the --silent-warning option to the obitools commands and removes the --paired-with option from some of the obitools commands. 2025-03-25 16:44:46 +01:00
Eric Coissac
8448783499 Make sequence files recognized as a taxonomy 2025-03-14 14:22:22 +01:00
Eric Coissac
d1c31c54de add a first version of the inline documentation 2025-03-12 14:40:42 +01:00
Eric Coissac
78caabd2fd Add basic test on -h for all the commands 2025-03-08 16:28:06 +01:00
Eric Coissac
b18c9b7ac6 add the --raw-taxid option 2025-03-08 09:40:06 +01:00
Eric Coissac
6a8061cc4f Add management of the taxonomy alias policy 2025-02-10 14:05:47 +01:00
Eric Coissac
2452aef7a9 patch multiple -Z options 2025-01-29 21:35:28 +01:00
Eric Coissac
7c4042df6b introduce obidefault 2025-01-27 17:12:45 +01:00
Eric Coissac
9acb4a85a8 Refactoring of the default values 2025-01-24 18:09:59 +01:00
Eric Coissac
3137c1f841 Adds the ability to read gzip-tar file for the taxonomy dump 2025-01-24 11:47:59 +01:00
Eric Coissac
ccd3b06532 Merge branch 'master' into taxonomy 2024-12-20 20:06:57 +01:00
Eric Coissac
40fb4e9767 reduce the memory impact of obiuniq. 2024-11-27 13:30:16 +01:00
Eric Coissac
f3d8707c08 Add default taxonomy 2024-11-16 10:01:07 +01:00
Eric Coissac
373464cb06 Genome skim tools under development 2024-08-30 11:17:33 +02:00
Eric Coissac
e40d0bfbe7 Debug fasta and fastq writer when the first sequence is huge
Former-commit-id: d208ff838abb7e19e117067f6243298492d60f14
2024-06-26 18:39:42 +02:00
Eric Coissac
e6b87ecd02 Reduce memory allocation events
Former-commit-id: fbdb2afc857b02adc2593e2278d3bd838e99b0b2
2024-06-22 21:01:53 +02:00
Eric Coissac
411124d1b3 Add automatic rules to manage version number
Former-commit-id: f4fcc1927f4169025c1d8cc88c5f3abcdc76037c
2024-06-01 17:26:16 +02:00
Eric Coissac
4487723d14 Adds possibility to provide the ngsfilter configuration as a CSV file
Former-commit-id: f0fd2cb1a7b149ae2a330edc5087b21be2c4585b
2024-05-31 11:08:20 +02:00
Eric Coissac
55ce36f329 Update of obipcr and homogenization of logging
Former-commit-id: 46abf47c19ace5248042c02cf1f81d9f6c12eb10
2024-05-16 15:18:30 +02:00
Eric Coissac
5b98393a68 Refactor sequence file reading
Former-commit-id: 3dcb96e68da648d72bb585da047e3496427d7851
2024-05-01 00:50:23 +02:00
Eric Coissac
33d4d63acd Adds a warning on max-cpu = 1
Former-commit-id: ed02328d7d0e36b7c6b77ea776c0bde7c0eb64f3
2024-04-30 13:27:19 +02:00
Eric Coissac
cb35612a6c new --version option and qualities method in obiscript
Former-commit-id: 7b0ce2901785d5c7494dec3a7a95d1fc5dc4a52b
2024-04-13 12:40:43 +02:00
61c30f9b6a Patch rev complement and first implementation of --auto in obicsv
Former-commit-id: f3020e81283b1073c4d1c2d2ff0887e3998e6764
2023-11-07 09:37:07 +02:00
d46f6b06c5 several small changes
Former-commit-id: c1cdb95885e44fd6ee7d1c963860d7ab41230c96
2023-06-07 17:50:10 +02:00
d70bb45f3f Small change in parallelisation tuning
Former-commit-id: 3fe2495b7fd86a0ba47dd87907323a457bae481a
2023-04-05 14:15:47 +02:00
84b3e4d097 Reduce memory footprint of obipcr
Former-commit-id: bd25be2d454f083c729346a828e27f07ad1a216e
2023-03-31 10:53:53 +02:00
e863dc456a Add an option --pprof
Former-commit-id: 3ca1280e8daddbf1075e3189f9851211ce8882ae
2023-03-28 20:07:26 +07:00
7ed567fbad Make the --help and -h options work when mandatory options are declared
Former-commit-id: db502ff81dcf20449d126978fcebf890edb814ae
2023-03-21 22:01:20 +07:00
d88de15cdc Refactor code to remove buffer size options, and some other changes...
Former-commit-id: 10b57cc1a27446ade3c444217341e9651e89cdce
2023-03-07 11:12:13 +07:00
072b85e155 change the model for representing paired reads and extend its usage to other commands 2023-02-23 23:35:58 +01:00
abcf02e488 Start to use leveled log 2022-02-24 12:14:52 +01:00
38e4655f38 Correct for a strange bug... 2022-02-07 11:51:35 +01:00
30d80db02d Add an option to limit the number of CPUs 2022-02-06 18:52:53 +01:00
8dbda68746 Adds the command obimultiplex 2022-02-01 17:31:28 +01:00
e8fff6477b Work on iterators and recycling of biosequences 2022-01-14 23:11:36 +01:00
ff40222902 Code refactoring 2022-01-14 16:10:19 +01:00
f53bf1b804 First commit 2022-01-13 23:27:39 +01:00