Commit Graph

66 Commits

Author SHA1 Message Date
Eric Coissac
1e1f575d1c refactor: replace single batch size with min/max bounds and memory limits
Introduce separate _BatchSize (min) and _BatchSizeMax (max) constants to replace the single _BatchSize variable. Update RebatchBySize to accept both maxBytes and maxCount parameters, flushing when either limit is exceeded. Set default batch size min to 1, max to 2000, and memory limit to 128 MB. Update CLI options and sequence_reader.go accordingly.
2026-03-13 15:07:35 +01:00
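The dual-limit flushing described in this commit can be sketched as follows. This is a minimal illustration, not the project's code: `rebatch` is a hypothetical stand-in for RebatchBySize, sequences are reduced to their estimated byte sizes, and the sketch assumes the batch is flushed just before a limit would be crossed, with a limit of 0 disabling that check.

```go
package main

import "fmt"

// rebatch is a hypothetical stand-in for RebatchBySize. It regroups items
// (represented only by their estimated byte sizes) into batches and starts
// a new batch when adding the next item would exceed maxBytes or maxCount.
// A limit of 0 disables the corresponding check (assumption of this sketch).
func rebatch(sizes []int, maxBytes, maxCount int) [][]int {
	var batches [][]int
	var current []int
	currentBytes := 0

	for _, s := range sizes {
		overBytes := maxBytes > 0 && currentBytes+s > maxBytes
		overCount := maxCount > 0 && len(current)+1 > maxCount

		// Flush before adding, but never emit an empty batch: an item
		// larger than maxBytes still ends up in a batch of its own.
		if len(current) > 0 && (overBytes || overCount) {
			batches = append(batches, current)
			current = nil
			currentBytes = 0
		}
		current = append(current, s)
		currentBytes += s
	}

	// Emit whatever is left once the input is exhausted.
	if len(current) > 0 {
		batches = append(batches, current)
	}
	return batches
}

func main() {
	sizes := []int{40, 40, 40, 200, 10, 10, 10}
	for i, batch := range rebatch(sizes, 100, 3) {
		fmt.Println(i, batch)
	}
}
```

With a minimum batch size of 1, an oversized sequence can always travel alone, which is why the sketch never holds back a single item.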
Eric Coissac
40769bf827 Add memory-based batching support
Implement memory-aware batch sizing with --batch-mem CLI option, enabling adaptive batching based on estimated sequence memory footprint. Key changes:
- Added _BatchMem and related getters/setters in pkg/obidefault
- Implemented RebatchBySize() in pkg/obiiter for memory-constrained batching
- Added BioSequence.MemorySize() for conservative memory estimation
- Integrated batch-mem option in pkg/obioptions with human-readable size parsing (e.g., 128K, 64M, 1G)
- Added obiutils.ParseMemSize/FormatMemSize for unit conversion
- Enhanced pool GC in pkg/obiseq/pool.go to trigger explicit GC for large slice discards
- Updated sequence_reader.go to apply memory-based rebatching when enabled
2026-03-13 14:54:21 +01:00
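The human-readable size parsing mentioned in this commit (128K, 64M, 1G) might look roughly like the sketch below. `parseMemSize` is a hypothetical stand-in for obiutils.ParseMemSize; its signature, the case-insensitive suffixes, and the use of binary multiples (1K = 1024 bytes) are assumptions, not taken from the repository.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMemSize is a hypothetical stand-in for obiutils.ParseMemSize.
// It accepts a non-negative integer optionally followed by K, M or G and
// returns a byte count. Binary multiples are an assumption of this sketch.
func parseMemSize(text string) (int64, error) {
	s := strings.ToUpper(strings.TrimSpace(text))
	if s == "" {
		return 0, fmt.Errorf("empty memory size")
	}

	// Pick the multiplier from the last character, if it is a known unit.
	multiplier := int64(1)
	switch s[len(s)-1] {
	case 'K':
		multiplier = 1 << 10
	case 'M':
		multiplier = 1 << 20
	case 'G':
		multiplier = 1 << 30
	}
	if multiplier != 1 {
		s = s[:len(s)-1] // drop the unit suffix
	}

	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil || n < 0 {
		return 0, fmt.Errorf("invalid memory size %q", text)
	}
	return n * multiplier, nil
}

func main() {
	for _, arg := range []string{"128K", "64M", "1G", "oops"} {
		n, err := parseMemSize(arg)
		fmt.Println(arg, n, err)
	}
}
```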
Eric Coissac
1a28d5ed64 Add progress bar configuration and conditional display
This commit introduces a new configuration module `obidefault` to manage progress bar settings, allowing users to disable progress bars via a `--no-progressbar` option. It updates various packages to conditionally display progress bars based on this new configuration, improving user experience by providing control over progress bar output. The changes also include improvements to progress bar handling in several packages, ensuring they are only displayed when appropriate (e.g., when stderr is a terminal and stdout is not piped).
2026-02-08 16:14:02 +01:00
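The display condition described in this commit (option not disabled, stderr is a terminal, stdout is not piped) can be sketched with the standard library alone. The function names below are hypothetical and the detection through file mode bits is an assumption about how such a check could be done, not the project's implementation.

```go
package main

import (
	"fmt"
	"os"
)

// hasMode reports whether the file's mode has the given bit set.
// A Stat failure is treated as "no".
func hasMode(f *os.File, bit os.FileMode) bool {
	info, err := f.Stat()
	if err != nil {
		return false
	}
	return info.Mode()&bit != 0
}

// shouldShowProgressBar is a hypothetical decision function: the bar is
// shown only if it was not disabled (e.g. by --no-progressbar), stderr is
// a terminal, and stdout is not a pipe.
func shouldShowProgressBar(disabled, stderrIsTerminal, stdoutIsPipe bool) bool {
	return !disabled && stderrIsTerminal && !stdoutIsPipe
}

func main() {
	// ModeCharDevice is also set for devices such as /dev/null, so this is
	// only an approximation of "is a terminal".
	stderrIsTerminal := hasMode(os.Stderr, os.ModeCharDevice)
	stdoutIsPipe := hasMode(os.Stdout, os.ModeNamedPipe)

	fmt.Println(shouldShowProgressBar(false, stderrIsTerminal, stdoutIsPipe))
}
```

Keeping the decision in a pure function separates the policy from the platform-specific detection, which makes the policy easy to test.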
Eric Coissac
43b285587e Debug on taxonomy extraction and CSV conversion 2025-07-07 15:29:40 +02:00
Eric Coissac
6cb7a5a352 Changes to be committed:
	modified:   cmd/obitools/obitag/main.go
	modified:   cmd/obitools/obitaxonomy/main.go
	modified:   pkg/obiformats/csvtaxdump_read.go
	modified:   pkg/obiformats/ecopcr_read.go
	modified:   pkg/obiformats/ncbitaxdump_read.go
	modified:   pkg/obiformats/ncbitaxdump_readtar.go
	modified:   pkg/obiformats/newick_write.go
	modified:   pkg/obiformats/options.go
	modified:   pkg/obiformats/taxonomy_read.go
	modified:   pkg/obiformats/universal_read.go
	modified:   pkg/obiiter/extract_taxonomy.go
	modified:   pkg/obioptions/options.go
	modified:   pkg/obioptions/version.go
	new file:   pkg/obiphylo/tree.go
	modified:   pkg/obiseq/biosequenceslice.go
	modified:   pkg/obiseq/taxonomy_methods.go
	modified:   pkg/obitax/taxonomy.go
	modified:   pkg/obitax/taxonset.go
	modified:   pkg/obitools/obiconvert/sequence_reader.go
	modified:   pkg/obitools/obitag/obitag.go
	modified:   pkg/obitools/obitaxonomy/obitaxonomy.go
	modified:   pkg/obitools/obitaxonomy/options.go
	deleted:    sample/.DS_Store
2025-06-04 09:48:10 +02:00
Eric Coissac
c0ecaf90ab Add the --number option to obiannotate 2025-04-22 18:35:51 +02:00
Eric Coissac
5a3705b6bb Adds the --silent-warning option to the obitools commands and removes the --paired-with option from some of the obitools commands. 2025-03-25 16:44:46 +01:00
Eric Coissac
3b5d4ba455 patch a bug in obiannotate 2025-03-11 16:35:38 +01:00
Eric Coissac
51b3e83d32 some cleaning 2025-02-24 11:31:49 +01:00
Eric Coissac
0df082da06 Adds possibility to extract a taxonomy from taxonomic path included in sequence files 2025-01-30 11:18:21 +01:00
Eric Coissac
c50a0f409d break the import cycle 2025-01-27 17:23:07 +01:00
Eric Coissac
7c4042df6b introduce obidefault 2025-01-27 17:12:45 +01:00
Eric Coissac
9acb4a85a8 Refactoring of the default values 2025-01-24 18:09:59 +01:00
Eric Coissac
40fb4e9767 reduce the memory impact of obiuniq. 2024-11-27 13:30:16 +01:00
Eric Coissac
241f2286f2 remove the slice pool management 2024-09-24 16:31:30 +02:00
Eric Coissac
31bfc88eb9 Patch a bug on writing to stdout, and add a clearer error on opening data files 2024-08-13 09:45:28 +02:00
Eric Coissac
886b5d9a96 Optimize memory for readers and writers 2024-08-05 10:48:28 +02:00
Eric Coissac
1b1cd41fd3 Add some code refactoring from the blackboard branch 2024-08-02 12:35:46 +02:00
Eric Coissac
4e4fac491f First version of the two-level indexing
Former-commit-id: 4d86483bc120e27cb6f5d2c216596d410274fc69
2024-07-12 15:17:48 +02:00
Eric Coissac
e40d0bfbe7 Debug fasta and fastq writer when the first sequence is huge
Former-commit-id: d208ff838abb7e19e117067f6243298492d60f14
2024-06-26 18:39:42 +02:00
Eric Coissac
54a138196c Patch a bug in fasta and fastq reading
Former-commit-id: bcaa264b4c4a7c67617eb909b199176bf09913db
2024-06-21 14:28:57 +02:00
Eric Coissac
65f5109957 Plenty of small bugs
Former-commit-id: 42c7fab7d65906c80ab4cd32da6867ff21842ea8
2024-06-04 16:49:12 +02:00
Eric Coissac
dd9307a4cd Switch to the system min and max functions and remove the version from obiutils
Former-commit-id: 8c4558921b0d0c266b070f16e83813de6e6d4a0f
2024-05-30 08:27:24 +02:00
Eric Coissac
55ce36f329 Update of obipcr and homogenization of logging
Former-commit-id: 46abf47c19ace5248042c02cf1f81d9f6c12eb10
2024-05-16 15:18:30 +02:00
Eric Coissac
017030bcce Add obiminion first version
Former-commit-id: aa5ace7bd4d2266333715fca7094d1c3cbbb5e6d
2024-05-14 08:16:12 +02:00
Eric Coissac
5b98393a68 Refactor sequence file reading
Former-commit-id: 3dcb96e68da648d72bb585da047e3496427d7851
2024-05-01 00:50:23 +02:00
Eric Coissac
d30d736e48 Simplify the workers code by removing duplicates
Former-commit-id: 638fcf8d88dd93755d1ec89c8fe92f6ed3f733df
2024-04-30 12:22:22 +02:00
b4afd784dc A few small corrections
Former-commit-id: 9319387ef5379b66e008233dbd7b6ea60b5d3b1e
2024-03-06 12:52:22 -03:00
7bd073ccd4 First version of obisplit and patch a bug in the new workers API
Former-commit-id: f28af9f104c08d68e29fd866739d8dd58241da63
2024-03-03 11:16:24 -04:00
0f3871d203 Change the API of workers
Former-commit-id: 9b07306edd8cf28266f86f95823948fa99d39ea9
2024-03-02 16:03:46 -04:00
23758b00f6 Patch a bug in the embl reader and adds some doc
Former-commit-id: 9b5f75fb14bcc3043da1647055279987a295d271
2024-01-31 15:43:02 +01:00
eb351a7530 patch bug in worker
Former-commit-id: f83cc62fc7a85f732e871f8866f80f738f494f9e
2023-12-03 22:44:13 +01:00
8d77cc4133 Change path of the obitools pkg
Former-commit-id: 311cbf8df3b990b393c6f4885d62e74564423b65
2023-11-29 12:14:37 +01:00
2e0c1bd801 Correct the number of workers
Former-commit-id: febbccfb853263e0761ecfccb0f09c8c1bf88475
2023-11-22 09:46:30 +01:00
62b57f4ede A go implementation of the fasta reader
Former-commit-id: 603592c4761fb0722e9e0501d78de1bd3ba238fa
2023-09-01 09:30:12 +02:00
2a11adb346 Add some doc and switch to the parallel gzip library
Former-commit-id: 2c1187001f989ba3de5895f516d4c8b54d52a4c4
2023-08-25 14:36:38 +02:00
3f69fa41d6 Patch a bug for multiple amplicons per sequence.
Former-commit-id: b252d2de8e1a85d65c2951aa1958ee038e35741d
2023-03-31 15:10:25 +02:00
84b3e4d097 Reduce memory footprint of obipcr
Former-commit-id: bd25be2d454f083c729346a828e27f07ad1a216e
2023-03-31 10:53:53 +02:00
988ae79989 Optimize memory allocation of the apat algorithms
Former-commit-id: 5010c5a666b322715b3b81c1078d325e1f647ede
2023-03-28 19:37:05 +07:00
a33e471b39 First attempt for obiconsensus... The graph traversal algorithm is too simple
Former-commit-id: 0456e6c7fd55d6d0fcf9856c40386b976b912cba
2023-03-27 19:51:10 +07:00
d5e84ec676 rename goutils to obiutils
Former-commit-id: 2147f53db972bba571dfdae30c51b62d3e69cec5
2023-03-24 10:25:12 +07:00
5fbe52368c Patch the empty batch bug
Former-commit-id: fcee04b58f2c4a0bf2c27792f991391c0b6ce78e
2023-03-07 20:16:06 +07:00
d88de15cdc Refactoring code to remove buffer size options. And some other changes...
Former-commit-id: 10b57cc1a27446ade3c444217341e9651e89cdce
2023-03-07 11:12:13 +07:00
072b85e155 change the model for representing paired reads and extend its usage to other commands 2023-02-23 23:35:58 +01:00
85349668d0 Add some options to obiannotate 2023-02-16 13:32:27 +01:00
526bf79c7f Patch for some loss of data during sequence writing 2023-02-08 13:14:26 +01:00
2d375df94f move the worker class to the obiseq package 2023-01-22 22:39:13 +01:00
f97f92df72 rename the iterator class 2023-01-22 22:04:17 +01:00
29563aa94e Rename the Length methods to Len to follow the Go standard 2022-11-17 11:09:58 +01:00
09fc426b67 Refactoring related to iterators 2022-11-16 17:13:03 +01:00