Commit Graph

134 Commits

Author SHA1 Message Date
Eric Coissac
40769bf827 Add memory-based batching support
Implement memory-aware batch sizing with --batch-mem CLI option, enabling adaptive batching based on estimated sequence memory footprint. Key changes:
- Added _BatchMem and related getters/setters in pkg/obidefault
- Implemented RebatchBySize() in pkg/obiter for memory-constrained batching
- Added BioSequence.MemorySize() for conservative memory estimation
- Integrated batch-mem option in pkg/obioptions with human-readable size parsing (e.g., 128K, 64M, 1G)
- Added obiutils.ParseMemSize/FormatMemSize for unit conversion
- Enhanced pool GC in pkg/obiseq/pool.go to trigger explicit GC for large slice discards
- Updated sequence_reader.go to apply memory-based rebatching when enabled
2026-03-13 14:54:21 +01:00
Eric Coissac
6ee8750635 Replace SplitInTwo with LeftSplitInTwo/RightSplitInTwo for precise splitting
Replace SplitInTwo calls with LeftSplitInTwo or RightSplitInTwo depending on the intended split direction. In fastseq_json_header.go, extract rank from suffix without splitting; in biosequenceslice.go and taxid.go, use LeftSplitInTwo to split from the left; add RightSplitInTwo utility function for splitting from the right.
2026-03-12 18:41:28 +01:00
Eric Coissac
3d2e205722 Refactor rope scanner and add FASTQ rope parser
This commit refactors the rope scanner implementation by renaming gbRopeScanner to ropeScanner and extracting the common functionality into a new file. It also introduces a new FastqChunkParserRope function that parses FASTQ chunks directly from a rope without Pack(), enabling more efficient memory usage. The existing parsers are updated to use the new rope-based parser when available. The BioSequence type is enhanced with a TakeQualities method for more efficient quality data handling.
2026-03-10 16:47:03 +01:00
Eric Coissac
1342c83db6 Use NewBioSequenceOwning to avoid unnecessary sequence copying
Replace NewBioSequence with NewBioSequenceOwning in genbank_read.go to take ownership of sequence slices without copying, improving performance. Update biosequence.go to add the new TakeSequence method and NewBioSequenceOwning constructor.
2026-03-10 15:51:35 +01:00
Eric Coissac
ac0d3f3fe4 Update obiuniq for very large dataset 2025-12-18 14:11:11 +01:00
Eric Coissac
86e60aedd0 obicsv bug with stat on value map fields 2025-11-21 14:03:31 +01:00
Eric Coissac
4603d7973e implementation de obilowmask 2025-11-18 15:30:20 +01:00
Eric Coissac
d17a9520b9 work on obiclean chimera detection 2025-10-20 17:29:47 +02:00
Eric Coissac
add9d89ccc Patch the Min and Max values of the expression language 2025-06-19 16:43:26 +02:00
Eric Coissac
9965370d85 Manage a lock on StatsOnValues 2025-06-17 16:46:11 +02:00
Eric Coissac
8a2bb1fe82 Changes to be committed:
modified:   pkg/obioptions/version.go
	modified:   pkg/obiseq/merge.go
2025-06-17 12:11:35 +02:00
Eric Coissac
efc3f3af29 Patch a concurrent access problem 2025-06-17 12:05:42 +02:00
Eric Coissac
6cb7a5a352 Changes to be committed:
modified:   cmd/obitools/obitag/main.go
	modified:   cmd/obitools/obitaxonomy/main.go
	modified:   pkg/obiformats/csvtaxdump_read.go
	modified:   pkg/obiformats/ecopcr_read.go
	modified:   pkg/obiformats/ncbitaxdump_read.go
	modified:   pkg/obiformats/ncbitaxdump_readtar.go
	modified:   pkg/obiformats/newick_write.go
	modified:   pkg/obiformats/options.go
	modified:   pkg/obiformats/taxonomy_read.go
	modified:   pkg/obiformats/universal_read.go
	modified:   pkg/obiiter/extract_taxonomy.go
	modified:   pkg/obioptions/options.go
	modified:   pkg/obioptions/version.go
	new file:   pkg/obiphylo/tree.go
	modified:   pkg/obiseq/biosequenceslice.go
	modified:   pkg/obiseq/taxonomy_methods.go
	modified:   pkg/obitax/taxonomy.go
	modified:   pkg/obitax/taxonset.go
	modified:   pkg/obitools/obiconvert/sequence_reader.go
	modified:   pkg/obitools/obitag/obitag.go
	modified:   pkg/obitools/obitaxonomy/obitaxonomy.go
	modified:   pkg/obitools/obitaxonomy/options.go
	deleted:    sample/.DS_Store
2025-06-04 09:48:10 +02:00
Eric Coissac
f9324dd8f4 add min and max to the obitools expression language 2025-05-13 16:03:03 +02:00
Eric Coissac
f1b9ac4a13 Update the expression language 2025-05-07 20:45:05 +02:00
Eric Coissac
c0ecaf90ab Add the --number option to obiannotate 2025-04-22 18:35:51 +02:00
Eric Coissac
a57cfda675 Make the replace function of the eval language accepting regex 2025-04-10 15:17:15 +02:00
Eric Coissac
5a3705b6bb Adds the --silent-warning options to the obitools commands and removes the --pared-with option from some of the obitols commands. 2025-03-25 16:44:46 +01:00
Eric Coissac
f21f51ae62 Correct the logic of --update-taxid and --fail-on-taxonomy 2025-03-11 16:56:02 +01:00
Eric Coissac
3b5d4ba455 patch a bug in obiannotate 2025-03-11 16:35:38 +01:00
Eric Coissac
286e27d6ba patch the scienctific_name tag name to "scientific_name" 2025-03-05 14:22:12 +01:00
Eric Coissac
51b3e83d32 some cleaning 2025-02-24 11:31:49 +01:00
Eric Coissac
8671285d02 add the --min-sample-count option to obiclean. 2025-02-24 08:48:31 +01:00
Eric Coissac
4774438644 Changes to be committed:
modified:   pkg/obiformats/universal_read.go
	modified:   pkg/obioptions/version.go
	modified:   pkg/obiseq/taxonomy_methods.go
2025-02-12 08:40:38 +01:00
Eric Coissac
6a8061cc4f Add managment of the taxonomy alias politic 2025-02-10 14:05:47 +01:00
Eric Coissac
0df082da06 Adds possibility to extract a taxonomy from taxonomic path included in sequence files 2025-01-30 11:18:21 +01:00
Eric Coissac
9acb4a85a8 Refactoring of the default values 2025-01-24 18:09:59 +01:00
Eric Coissac
ccd3b06532 Merge branch 'master' into taxonomy 2024-12-20 20:06:57 +01:00
Eric Coissac
5d0f996625 Patch a small bug on json write 2024-12-20 19:42:03 +01:00
Eric Coissac
795df34d1a Changes to be committed:
modified:   cmd/obitools/obitag/main.go
	modified:   cmd/obitools/obitag2/main.go
	modified:   go.mod
	modified:   go.sum
	modified:   pkg/obiformats/ncbitaxdump/read.go
	modified:   pkg/obioptions/version.go
	modified:   pkg/obiseq/attributes.go
	modified:   pkg/obiseq/taxonomy_lca.go
	modified:   pkg/obiseq/taxonomy_methods.go
	modified:   pkg/obiseq/taxonomy_predicate.go
	modified:   pkg/obitax/inner.go
	modified:   pkg/obitax/lca.go
	new file:   pkg/obitax/taxid.go
	modified:   pkg/obitax/taxon.go
	modified:   pkg/obitax/taxonomy.go
	modified:   pkg/obitax/taxonslice.go
	modified:   pkg/obitools/obicleandb/obicleandb.go
	modified:   pkg/obitools/obigrep/options.go
	modified:   pkg/obitools/obilandmark/obilandmark.go
	modified:   pkg/obitools/obilandmark/options.go
	modified:   pkg/obitools/obirefidx/famlilyindexing.go
	modified:   pkg/obitools/obirefidx/geomindexing.go
	modified:   pkg/obitools/obirefidx/obirefidx.go
	modified:   pkg/obitools/obirefidx/options.go
	modified:   pkg/obitools/obitag/obigeomtag.go
	modified:   pkg/obitools/obitag/obitag.go
	modified:   pkg/obitools/obitag/options.go
	modified:   pkg/obiutils/strings.go
2024-12-19 13:36:59 +01:00
Eric Coissac
00b0edc15a refactoring of the file chunck writing 2024-11-29 18:15:03 +01:00
Eric Coissac
d29a56dcbf Changes to be committed:
modified:   Release-notes.md
	modified:   pkg/obialign/pairedendalign.go
	modified:   pkg/obilua/obiseq.go
	modified:   pkg/obioptions/version.go
	modified:   pkg/obiseq/biosequence.go
	modified:   pkg/obitools/obipairing/pairing.go
2024-11-27 09:56:22 +01:00
Eric Coissac
7884a74f9c Patch a bug in obitagpcr 2024-11-18 21:10:47 +01:00
Eric Coissac
03f4e88a17 Fisrt functional version 2024-11-14 19:10:23 +01:00
Eric Coissac
241f2286f2 remove the slice pool management 2024-09-24 16:31:30 +02:00
Eric Coissac
05bf2bfd6c Add option related to agrep match on obigrep and obiannotate 2024-09-09 16:52:13 +02:00
Eric Coissac
65ae82622e correction of several small bugs 2024-09-03 06:08:07 -03:00
Eric Coissac
bdb96dda94 Adds the obimicrosat command 2024-08-05 15:31:20 +02:00
Eric Coissac
67665a6b40 Xprize update
Former-commit-id: d38919a897961e4d40da3b844057c3fb94fdb6d7
2024-07-25 18:09:03 -04:00
Eric Coissac
4e4fac491f Fisrt versin of the two levels indexing
Former-commit-id: 4d86483bc120e27cb6f5d2c216596d410274fc69
2024-07-12 15:17:48 +02:00
Eric Coissac
c7ed47e110 first version of obidemerge, obijoin and a new filter for obicleandb but to be finnished
Former-commit-id: 8a1ed26e5548c30db75644c294d478ec4d753f19
2024-07-10 15:21:42 +02:00
Eric Coissac
bd855c4965 Adds CSV as an input format
Former-commit-id: a365bb6947064adc2709d66df05fa54c6fe47fad
2024-07-03 21:04:27 +02:00
Eric Coissac
fd663357b5 First version of obicleandb...
Former-commit-id: e60b61d015abbf029a555b51de99b4252c50ab59
2024-07-01 17:12:42 +02:00
Eric Coissac
93f9dcb95f Reducing memory allocation events
Former-commit-id: c94e79ba116464504580fc397270ead154063971
2024-06-22 22:32:31 +02:00
Eric Coissac
e6b87ecd02 Reduce memory allocation events
Former-commit-id: fbdb2afc857b02adc2593e2278d3bd838e99b0b2
2024-06-22 21:01:53 +02:00
Eric Coissac
54a138196c Patch a bug in fasta and fastq reading
Former-commit-id: bcaa264b4c4a7c67617eb909b199176bf09913db
2024-06-21 14:28:57 +02:00
Eric Coissac
818ce87bab Patch some bugs in writing files
Former-commit-id: 612868a281dc0ecf4e6c5776973735e5c71bd517
2024-06-19 13:15:30 +02:00
Eric Coissac
65f5109957 Plenty of small bugs
Former-commit-id: 42c7fab7d65906c80ab4cd32da6867ff21842ea8
2024-06-04 16:49:12 +02:00
Eric Coissac
aa42df326a Correct a bug in the fastq reader affecting the quality of the last record of each chunk
Former-commit-id: b842d60af9c2f1f971946d99999d13cfc15793b3
2024-06-04 11:57:16 +02:00
Eric Coissac
dbeb44bc79 Patch a bug in the quality string writing
Former-commit-id: 1a76d58b8648378d10e8b59d05208263e96238c9
2024-05-31 11:07:25 +02:00