Eric Coissac
a2b26712b2
refactor: replace fixed batch size with dynamic flushing based on count and memory
...
Replace the old fixed batch-size mechanism in Distribute with a dynamic strategy that flushes batches when either BatchSizeMax() sequences or BatchMem() bytes are reached per key. This aligns with the RebatchBySize strategy and removes the optional sizes parameter. Also update related code: simplify Lua wrapper to accept optional capacity, and fix buffer growth logic in worker.go using slices.Grow correctly. Remove unused BatchSize() usage from obidistribute.
2026-03-16 22:06:44 +01:00
Eric Coissac
40769bf827
Add memory-based batching support
...
Implement memory-aware batch sizing with --batch-mem CLI option, enabling adaptive batching based on estimated sequence memory footprint. Key changes:
- Added _BatchMem and related getters/setters in pkg/obidefault
- Implemented RebatchBySize() in pkg/obiter for memory-constrained batching
- Added BioSequence.MemorySize() for conservative memory estimation
- Integrated batch-mem option in pkg/obioptions with human-readable size parsing (e.g., 128K, 64M, 1G)
- Added obiutils.ParseMemSize/FormatMemSize for unit conversion
- Enhanced pool GC in pkg/obiseq/pool.go to trigger explicit GC for large slice discards
- Updated sequence_reader.go to apply memory-based rebatching when enabled
2026-03-13 14:54:21 +01:00
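As an illustration of the human-readable sizes mentioned above (128K, 64M, 1G), here is a minimal sketch of such a parser. The function `parseMemSize` is hypothetical and is not the signature of obiutils.ParseMemSize; it also assumes binary multiples (1K = 1024 bytes), which the commit message does not state.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMemSize converts strings such as "128K", "64M" or "1G" to a byte count.
// Assumption: suffixes are binary multiples; a bare number is taken as bytes.
func parseMemSize(s string) (int64, error) {
	s = strings.ToUpper(strings.TrimSpace(s))
	if s == "" {
		return 0, fmt.Errorf("empty size")
	}
	mult := int64(1)
	switch s[len(s)-1] {
	case 'K':
		mult = 1 << 10
	case 'M':
		mult = 1 << 20
	case 'G':
		mult = 1 << 30
	}
	if mult != 1 {
		s = s[:len(s)-1] // drop the unit suffix before parsing the number
	}
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil || n < 0 {
		return 0, fmt.Errorf("invalid size %q", s)
	}
	return n * mult, nil
}

func main() {
	for _, s := range []string{"128K", "64M", "1G", "512"} {
		n, err := parseMemSize(s)
		fmt.Println(s, n, err)
	}
}
```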
Eric Coissac
6ee8750635
Replace SplitInTwo with LeftSplitInTwo/RightSplitInTwo for precise splitting
...
Replace SplitInTwo calls with LeftSplitInTwo or RightSplitInTwo depending on the intended split direction. In fastseq_json_header.go, extract rank from suffix without splitting; in biosequenceslice.go and taxid.go, use LeftSplitInTwo to split from the left; add RightSplitInTwo utility function for splitting from the right.
2026-03-12 18:41:28 +01:00
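The difference between the two split directions can be sketched as below. The functions `leftSplitInTwo` and `rightSplitInTwo` are hypothetical stand-ins, not the project's own signatures, and returning the whole string with an empty second part when the separator is absent is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// leftSplitInTwo splits s at the first occurrence of sep.
func leftSplitInTwo(s, sep string) (string, string) {
	if i := strings.Index(s, sep); i >= 0 {
		return s[:i], s[i+len(sep):]
	}
	return s, ""
}

// rightSplitInTwo splits s at the last occurrence of sep.
func rightSplitInTwo(s, sep string) (string, string) {
	if i := strings.LastIndex(s, sep); i >= 0 {
		return s[:i], s[i+len(sep):]
	}
	return s, ""
}

func main() {
	fmt.Println(leftSplitInTwo("a:b:c", ":"))
	fmt.Println(rightSplitInTwo("a:b:c", ":"))
}
```

The two functions only differ when the separator occurs more than once, which is why each call site has to choose a direction explicitly.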
Eric Coissac
3d2e205722
Refactor rope scanner and add FASTQ rope parser
...
This commit refactors the rope scanner implementation by renaming gbRopeScanner to ropeScanner and extracting the common functionality into a new file. It also introduces a new FastqChunkParserRope function that parses FASTQ chunks directly from a rope without Pack(), enabling more efficient memory usage. The existing parsers are updated to use the new rope-based parser when available. The BioSequence type is enhanced with a TakeQualities method for more efficient quality data handling.
2026-03-10 16:47:03 +01:00
Eric Coissac
1342c83db6
Use NewBioSequenceOwning to avoid unnecessary sequence copying
...
Replace NewBioSequence with NewBioSequenceOwning in genbank_read.go to take ownership of sequence slices without copying, improving performance. Update biosequence.go to add the new TakeSequence method and NewBioSequenceOwning constructor.
2026-03-10 15:51:35 +01:00
Eric Coissac
ac0d3f3fe4
Update obiuniq for very large datasets
2025-12-18 14:11:11 +01:00
Eric Coissac
86e60aedd0
obicsv bug with stat on value map fields
2025-11-21 14:03:31 +01:00
Eric Coissac
4603d7973e
Implementation of obilowmask
2025-11-18 15:30:20 +01:00
Eric Coissac
d17a9520b9
work on obiclean chimera detection
2025-10-20 17:29:47 +02:00
Eric Coissac
add9d89ccc
Patch the Min and Max values of the expression language
2025-06-19 16:43:26 +02:00
Eric Coissac
9965370d85
Manage a lock on StatsOnValues
2025-06-17 16:46:11 +02:00
Eric Coissac
8a2bb1fe82
Changes to be committed:
...
modified: pkg/obioptions/version.go
modified: pkg/obiseq/merge.go
2025-06-17 12:11:35 +02:00
Eric Coissac
efc3f3af29
Patch a concurrent access problem
2025-06-17 12:05:42 +02:00
Eric Coissac
6cb7a5a352
Changes to be committed:
...
modified: cmd/obitools/obitag/main.go
modified: cmd/obitools/obitaxonomy/main.go
modified: pkg/obiformats/csvtaxdump_read.go
modified: pkg/obiformats/ecopcr_read.go
modified: pkg/obiformats/ncbitaxdump_read.go
modified: pkg/obiformats/ncbitaxdump_readtar.go
modified: pkg/obiformats/newick_write.go
modified: pkg/obiformats/options.go
modified: pkg/obiformats/taxonomy_read.go
modified: pkg/obiformats/universal_read.go
modified: pkg/obiiter/extract_taxonomy.go
modified: pkg/obioptions/options.go
modified: pkg/obioptions/version.go
new file: pkg/obiphylo/tree.go
modified: pkg/obiseq/biosequenceslice.go
modified: pkg/obiseq/taxonomy_methods.go
modified: pkg/obitax/taxonomy.go
modified: pkg/obitax/taxonset.go
modified: pkg/obitools/obiconvert/sequence_reader.go
modified: pkg/obitools/obitag/obitag.go
modified: pkg/obitools/obitaxonomy/obitaxonomy.go
modified: pkg/obitools/obitaxonomy/options.go
deleted: sample/.DS_Store
2025-06-04 09:48:10 +02:00
Eric Coissac
f9324dd8f4
add min and max to the obitools expression language
2025-05-13 16:03:03 +02:00
Eric Coissac
f1b9ac4a13
Update the expression language
2025-05-07 20:45:05 +02:00
Eric Coissac
c0ecaf90ab
Add the --number option to obiannotate
2025-04-22 18:35:51 +02:00
Eric Coissac
a57cfda675
Make the replace function of the eval language accept regex
2025-04-10 15:17:15 +02:00
Eric Coissac
5a3705b6bb
Adds the --silent-warning option to the obitools commands and removes the --paired-with option from some of the obitools commands.
2025-03-25 16:44:46 +01:00
Eric Coissac
f21f51ae62
Correct the logic of --update-taxid and --fail-on-taxonomy
2025-03-11 16:56:02 +01:00
Eric Coissac
3b5d4ba455
patch a bug in obiannotate
2025-03-11 16:35:38 +01:00
Eric Coissac
286e27d6ba
patch the scienctific_name tag name to "scientific_name"
2025-03-05 14:22:12 +01:00
Eric Coissac
51b3e83d32
some cleaning
2025-02-24 11:31:49 +01:00
Eric Coissac
8671285d02
add the --min-sample-count option to obiclean.
2025-02-24 08:48:31 +01:00
Eric Coissac
4774438644
Changes to be committed:
...
modified: pkg/obiformats/universal_read.go
modified: pkg/obioptions/version.go
modified: pkg/obiseq/taxonomy_methods.go
2025-02-12 08:40:38 +01:00
Eric Coissac
6a8061cc4f
Add management of the taxonomy alias policy
2025-02-10 14:05:47 +01:00
Eric Coissac
0df082da06
Adds the possibility to extract a taxonomy from taxonomic paths included in sequence files
2025-01-30 11:18:21 +01:00
Eric Coissac
9acb4a85a8
Refactoring of the default values
2025-01-24 18:09:59 +01:00
Eric Coissac
ccd3b06532
Merge branch 'master' into taxonomy
2024-12-20 20:06:57 +01:00
Eric Coissac
5d0f996625
Patch a small bug on json write
2024-12-20 19:42:03 +01:00
Eric Coissac
795df34d1a
Changes to be committed:
...
modified: cmd/obitools/obitag/main.go
modified: cmd/obitools/obitag2/main.go
modified: go.mod
modified: go.sum
modified: pkg/obiformats/ncbitaxdump/read.go
modified: pkg/obioptions/version.go
modified: pkg/obiseq/attributes.go
modified: pkg/obiseq/taxonomy_lca.go
modified: pkg/obiseq/taxonomy_methods.go
modified: pkg/obiseq/taxonomy_predicate.go
modified: pkg/obitax/inner.go
modified: pkg/obitax/lca.go
new file: pkg/obitax/taxid.go
modified: pkg/obitax/taxon.go
modified: pkg/obitax/taxonomy.go
modified: pkg/obitax/taxonslice.go
modified: pkg/obitools/obicleandb/obicleandb.go
modified: pkg/obitools/obigrep/options.go
modified: pkg/obitools/obilandmark/obilandmark.go
modified: pkg/obitools/obilandmark/options.go
modified: pkg/obitools/obirefidx/famlilyindexing.go
modified: pkg/obitools/obirefidx/geomindexing.go
modified: pkg/obitools/obirefidx/obirefidx.go
modified: pkg/obitools/obirefidx/options.go
modified: pkg/obitools/obitag/obigeomtag.go
modified: pkg/obitools/obitag/obitag.go
modified: pkg/obitools/obitag/options.go
modified: pkg/obiutils/strings.go
2024-12-19 13:36:59 +01:00
Eric Coissac
00b0edc15a
Refactoring of the file chunk writing
2024-11-29 18:15:03 +01:00
Eric Coissac
d29a56dcbf
Changes to be committed:
...
modified: Release-notes.md
modified: pkg/obialign/pairedendalign.go
modified: pkg/obilua/obiseq.go
modified: pkg/obioptions/version.go
modified: pkg/obiseq/biosequence.go
modified: pkg/obitools/obipairing/pairing.go
2024-11-27 09:56:22 +01:00
Eric Coissac
7884a74f9c
Patch a bug in obitagpcr
2024-11-18 21:10:47 +01:00
Eric Coissac
03f4e88a17
First functional version
2024-11-14 19:10:23 +01:00
Eric Coissac
241f2286f2
remove the slice pool management
2024-09-24 16:31:30 +02:00
Eric Coissac
05bf2bfd6c
Add option related to agrep match on obigrep and obiannotate
2024-09-09 16:52:13 +02:00
Eric Coissac
65ae82622e
correction of several small bugs
2024-09-03 06:08:07 -03:00
Eric Coissac
bdb96dda94
Adds the obimicrosat command
2024-08-05 15:31:20 +02:00
Eric Coissac
67665a6b40
Xprize update
...
Former-commit-id: d38919a897961e4d40da3b844057c3fb94fdb6d7
2024-07-25 18:09:03 -04:00
Eric Coissac
4e4fac491f
First version of the two-level indexing
...
Former-commit-id: 4d86483bc120e27cb6f5d2c216596d410274fc69
2024-07-12 15:17:48 +02:00
Eric Coissac
c7ed47e110
First version of obidemerge, obijoin, and a new filter for obicleandb, still to be finished
...
Former-commit-id: 8a1ed26e5548c30db75644c294d478ec4d753f19
2024-07-10 15:21:42 +02:00
Eric Coissac
bd855c4965
Adds CSV as an input format
...
Former-commit-id: a365bb6947064adc2709d66df05fa54c6fe47fad
2024-07-03 21:04:27 +02:00
Eric Coissac
fd663357b5
First version of obicleandb...
...
Former-commit-id: e60b61d015abbf029a555b51de99b4252c50ab59
2024-07-01 17:12:42 +02:00
Eric Coissac
93f9dcb95f
Reducing memory allocation events
...
Former-commit-id: c94e79ba116464504580fc397270ead154063971
2024-06-22 22:32:31 +02:00
Eric Coissac
e6b87ecd02
Reduce memory allocation events
...
Former-commit-id: fbdb2afc857b02adc2593e2278d3bd838e99b0b2
2024-06-22 21:01:53 +02:00
Eric Coissac
54a138196c
Patch a bug in fasta and fastq reading
...
Former-commit-id: bcaa264b4c4a7c67617eb909b199176bf09913db
2024-06-21 14:28:57 +02:00
Eric Coissac
818ce87bab
Patch some bugs in writing files
...
Former-commit-id: 612868a281dc0ecf4e6c5776973735e5c71bd517
2024-06-19 13:15:30 +02:00
Eric Coissac
65f5109957
Plenty of small bugs
...
Former-commit-id: 42c7fab7d65906c80ab4cd32da6867ff21842ea8
2024-06-04 16:49:12 +02:00
Eric Coissac
aa42df326a
Correct a bug in the fastq reader affecting the quality of the last record of each chunk
...
Former-commit-id: b842d60af9c2f1f971946d99999d13cfc15793b3
2024-06-04 11:57:16 +02:00