Compare commits

...

79 Commits

Author SHA1 Message Date
Eric Coissac 6c4a6c697c [4.4.2] Enhanced taxonomy handling, input robustness & PCR tag validation
- **obiconvert**: Added `--raw-taxid` mode to output numeric taxIDs without formatting (e.g., "12345" instead of ":tax:NCBI_0987@species"). Introduced `TaxNode.FullString()` to reliably return full formatted strings regardless of global settings, and improved fallback behavior when taxonomy DB is unavailable.
- **ngsfilter**: Input fields (primers, sample tags/IDs) are now automatically trimmed of leading/trailing whitespace to prevent parsing failures from inconsistent formatting.
- **obitools (pcrtag)**: Mismatch-related fields (`forward_mismatches`, `reverse_mishaps`) renamed to "error" for consistency across annotation dictionaries.
- **obipairing & obtagpcr**: Enforced mandatory paired-end file input (`--forward` and `reverse`) in obipairing; added CLI support for generating config templates via AskConfigTemplate(); removed redundant `Required()` constraints and introduced helper function CLIHasPairedFiles().
2026-04-30 16:57:45 +02:00
Eric Coissac 60b3753673 feat(obiconvert): add --raw-taxid option and refactor taxID formatting
- Add new `--tax-id` mode (`obiconvert --raw-taxid`) to output bare numeric taxIDs instead of full-format strings.
- Introduce `TaxNode.FullString()` to always return the complete "code:id [name]@rank" format, regardless of global `UseRawTaxids()` setting.
- Update `.String(taxonomyCode)` to respect the global flag, returning bare ID when `--raw-taxid` is active.
- Extract raw taxID from full-format strings in taxonomy methods when needed (e.g., fallback without loaded DB).
- Add comprehensive test suite covering:
a) `--raw-taxid` execution and idempotency
b) full-format taxID output with `--taxonomy`
c interaction of both flags
d format validation
- Add test data: new reference files `out_ecotag.fasta`, taxonomy.csv, and updated shell script.
2026-04-30 16:57:38 +02:00
Eric Coissac 14e2840a2d [ngsfilter] Trim whitespace from primer and sample fields
Trim leading/trailing whitespaces in forward/reverse primers, tags (via sample_tag), experiment andsample fields to prevent parsing errors due to formatting inconsistencies in input data.
2026-04-30 08:14:39 +02:00
Eric Coissac 42910c7db9 🔧 Rename mismatch fields to error in pcrtag.go
- Renamed `obimultiplex_forward_mismatches` to "error" for consistency
- Similarly renamed 
  `obimultiplex_reverse_mismatches` to "error"
- Applied changes in both annotation dictionaries (aanot, banot)
2026-04-29 15:29:25 +02:00
Eric Coissac 8b4cf677c6 [obitools] Add validation for paired files and config template support
- Enforce requirement of both forward (-F) and reverse files in obipairing/main.go
- Add config template support to obtagpcr via CLIAskConfigTemplate()
- Remove redundant Required() constraints in options.go
- Introduce new helper CLIHasPairedFiles()
2026-04-29 15:01:37 +02:00
coissac 02765f154f Merge pull request #113 from metabarcoding/push-oxowomxlnlnx
Push oxowomxlnlnx
2026-04-16 15:04:55 +02:00
Eric Coissac 449544bd63 [obiseq] Quality validation and new map_summaries aggregation
- Added strict length matching between sequences and quality scores in `SetQualities`, `Take Qualites` (note: likely intended as " TakeQuantiles" or similar, but preserved per commit), and `Subsequence` operations; an error is now raised if lengths do not match.
- Introduced a new `map_summaries` aggregation feature in obisummary to merge map summary data across datasets, supporting safe concurrent access and inclusion of non-empty results in the final output.
- Centralized string reversal logic via a new `inverser_chaine()` utility function, replacing duplicated inline implementations throughout the codebase.
2026-04-16 14:58:23 +02:00
Eric Coissac 434d2e5930 +feat: add support for map_summaries aggregation in obisummary
- Implement merging logic of `map summaries` across datasets
  - Ensure proper initialization and population in multi-threaded context
- Add `map_summaries` to final output dictionary when non-empty
2026-04-16 14:58:18 +02:00
Eric Coissac 7cb02ded69 Refactor: Extract utility function for string reversal
- Introduce `inverser_chaine()` helper to centralize logic
 - Replace inline reverse implementations across modules
2026-04-16 13:42:51 +02:00
Eric Coissac 6d469bd711 [obiseq] Add length validation for qualities in SetQualities, Take Qualites and Subsequence
[obiseq] Add length validation for qualities in SetQualities, Take Qualites and Subsequence
- Panic if sequence/qualities length mismatch when setting or taking qualities in BioSequence.
 - Add same check before slicing Qualities() for Subsequence to ensure consistency.
2026-04-15 18:20:53 +02:00
coissac 3d8e4a3a4e Merge pull request #112 from metabarcoding/push-yvqvqrxyktoz
Push yvqvqrxyktoz
2026-04-14 14:49:08 +02:00
Eric Coissac 07d04a6967 Release 4.4.40 2026-04-14 14:48:41 +02:00
Eric Coissac 03f251c365 [release] bump version to v4.4.39
- Update `version.txt` from "v" to v4.4.39
- Auto-synced `pkg/obioptions/version.go` via Makefile
2026-04-14 14:48:38 +02:00
Eric Coissac 5714fa6cd3 chore: bump version to 4.4.39
Update package and file versions from Release/Version 4.4.38 toRelease-Version /File Version
2026-04-14 14:48:29 +02:00
coissac f101625771 Merge pull request #110 from metabarcoding/push-tstsmnkomnoo
Push tstsmnkomnoo
2026-04-13 17:57:38 +02:00
Eric Coissac 4359b52eaf Release 4.4.38 2026-04-13 17:57:00 +02:00
Eric Coissac da0c8b6f28 ♻️ refactor lua_push_interface and add json module
Refactor pushInterfaceToLua to delegate unsupported types (nil, bool/int/float/string/map/slice) recursively via new lvalueFromInterface helper. Simplify typed slice and map handlers, remove explicit nil case (now handled by lvalueFromInterface), eliminate redundant type switches in pushMapStringIntToLua and similar functions. Add new luajson.go with RegisterJSON, lua.JSONEncode/Decode bindings using lvalueFromInterface and Table2 Interface for bidirectional round-trips. Include comprehensive tests covering scalars, nested structures (e.g., kmindex response), arrays and error cases.
2026-04-13 17:56:58 +02:00
coissac 841e5c9e2a Merge pull request #109 from metabarcoding/push-okvqknqnvmnl
Push okvqknqnvmnl
2026-04-13 17:19:41 +02:00
Eric Coissac e298daeef9 [v4.5] Bugfix for 3-base sequence handling and utility refactoring
- **Bug fix**: Corrected logic in 4-mer calculation to properly handle sequences of length exactly three. Previously, such cases could produce invalid or unexpected results due to an incomplete guard condition (`length < 0`) which failed for ` length == 3` (where computed step size was zero). The fix ensures all sequences shorter than four bases are safely excluded.

- **Refactor**: Introduced a new internal utility function (`inverser_chaine`) to centralize string reversal logic, improving code maintainability and test coverage without affecting user-facing behavior.
2026-04-13 17:18:53 +02:00
Eric Coissac d9e6f67a6e chore: bump version to 4.4.36
Update package and file versions from v4.4.35 to 4.4.36.
2026-04-13 17:18:48 +02:00
Eric Coissac f036c7fa96 ⬆️ version bump to v4.5
- Update `version.txt` from "v3" to v4.5
- Bump Go constant `_Version = 'Release 4.x.y'` accordingly
2026-04-13 17:18:34 +02:00
Eric Coissac e33665e716 Refactor: Extract utility function for string reversal
- Introduce `inverser_chaine()` helper to centralize logic
 - Update tests and documentation accordingly
2026-04-13 17:18:34 +02:00
Eric Coissac c955a614ca chore: bump version to 4.4.35
Update obioptions/version.go and version.txt to reflect release 4.4.35.
2026-04-13 17:18:34 +02:00
Eric Coissac f19065261e We kept 2026-04-13 17:18:34 +02:00
coissac 3e349e92e1 Merge pull request #104 from theo-krueger/master
Bugfix: result of 0 4mers not caught if sequence length == 3
2026-04-13 16:39:08 +02:00
coissac a4ce24a418 Merge pull request #108 from metabarcoding/push-qlxnulxwokxo
Push qlxnulxwokxo
2026-04-13 16:27:43 +02:00
Eric Coissac 960ad1531d [4.4.34] HTTP client thread-safety and CI infrastructure updates
- Improved concurrency safety by replacing the global HTTP client with a thread-safe, lazy-initialized instance using `sync.Once`. The new implementation enables connection pooling (`MaxIdleConnsPerHost`, connections per host) and dynamically configures pool size based on `obidefault.ParallelWorkers()`, ensuring robust behavior in multi-threaded Lua environments.
- Updated GitHub Actions workflows to the latest stable versions of `actions/setup-go` and ` actions/checkout`, improving build reliability.
- Removed outdated Go dependency checksums for buger/jsonparser v1.1.x to keep the build clean and consistent.
2026-04-13 16:27:14 +02:00
Eric Coissac 137f49d1d1 🔧 refactor(http): use thread-safe lazy-initialized HTTP client with connection pooling
- Replace global _httpClient variable by a sync.Once-based lazy initialization
- Add getHTTPClient() function to safely initialize client with connection pooling settings (MaxIdleConnsPerHost, Max Con ns/Conn per host)
- Set connection pool size based on obidefault.ParallelWorkers()

This ensures safe concurrent access and better resource management in multi-threaded Lua environments.
2026-04-13 16:27:09 +02:00
Eric Coissac 083a92e13d ⬆️ update GitHub Actions to latest versions
- Upgrade actions/setup-go from v2/v4 (depending on workflow) to latest stable version
- Update all actions/checkout from v3/v4 (depending on workflow) to latest stable version
- Clean up outdated go.sum entries for buger/jsonparser v1.1.x
2026-04-13 14:41:47 +02:00
coissac 67683435e8 Merge pull request #107 from metabarcoding/push-oyzynqqnturm
Push oyzynqqnturm
2026-04-13 14:29:44 +02:00
Eric Coissac f32b29db4f Release 4.4.33 2026-04-13 14:29:18 +02:00
Eric Coissac 10f49fe64b 📝 Clarify RegisterHTTP global registration intent
//
// Registers the http module in Lua state as a global,
// aligning with obicontext and BioSequence conventions.
The change ensures consistent module exposure across Lua environments.
2026-04-13 14:29:16 +02:00
coissac d257917748 Merge pull request #106 from metabarcoding/push-qoqotlnktvls
Push qoqotlnktvls
2026-04-13 14:08:42 +02:00
Eric Coissac fec078c04c Release 4.4.32 2026-04-13 14:08:16 +02:00
Eric Coissac a92393dd51 ⬆️ update go.mod dependencies and improve error messages
- Bump github.com/buger/jsonparser from v1.1.1 to v1.2
- Add error details in log.Fatalf calls for better debugging
2026-04-13 14:08:13 +02:00
coissac 7e76698490 Merge pull request #105 from metabarcoding/push-pnqoquxmpqpq
Push pnqoquxmpqpq
2026-04-13 13:36:13 +02:00
Eric Coissac 64b0b32f61 Release 4.4.31 2026-04-13 13:35:39 +02:00
Eric Coissac c8e6a218cb [release] bump version to v4.5
- Update obioptions/version.go and version.txt from Release v4.5 to 68302a1
- Increment patch version: from `Release v4.5` → 68302a1
- Align version.txt with current release tag
2026-04-13 13:35:33 +02:00
Eric Coissac 8c7017a99d ⬆️ version bump to v4.5
- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5"
- Update version.txt from 4.29 → .30
(automated by Makefile)
2026-04-13 13:34:53 +02:00
theo-krueger c7816973a6 Bugfix: result of 0 4mers not caught if sequence length == 3
In the 4mer calculation:
length := slength - 3

- for sequences with <4 bases, length is <=0

The check to stop did only catch <0, so sequences lengths 2 or less, leaving sequence lengths of 3 unguarded
if length < 0 {
		return nil
	}
2026-04-10 14:05:30 +02:00
Eric Coissac 670edc1958 docs: ajouter la documentation globale pour OBITools v4
- Ajout de prompt_documentation_globale.md décrivant les trois phases d'écriture de la documentation (fichier → package → outil)
- Présence de fichiers .DS_Store non significatifs (à ignorer)
2026-03-31 19:02:42 +02:00
coissac f92f285417 Merge pull request #101 from metabarcoding/push-klzowrsmmnyv
Dynamic Batch Flushing and Build Improvements
2026-03-16 22:29:29 +01:00
Eric Coissac a786b58ed3 Dynamic Batch Flushing and Build Improvements
This release introduces dynamic batch flushing in the Distribute component, replacing the previous fixed-size batching with a memory- and count-aware strategy. Batches now flush automatically when either the maximum sequence count (BatchSizeMax()) or memory threshold (BatchMem()) per key is reached, ensuring more efficient resource usage and consistent behavior with the RebatchBySize strategy. The optional sizes parameter has been removed, and related code—including the Lua wrapper and worker buffer handling—has been updated for correctness and simplicity. Unused BatchSize() references have been eliminated from obidistribute.

Additionally, this release includes improvements to static Linux builds and overall build stability, enhancing reliability across deployment environments.
2026-03-16 22:06:51 +01:00
Eric Coissac a2b26712b2 refactor: replace fixed batch size with dynamic flushing based on count and memory
Replace the old fixed batch-size mechanism in Distribute with a dynamic strategy that flushes batches when either BatchSizeMax() sequences or BatchMem() bytes are reached per key. This aligns with the RebatchBySize strategy and removes the optional sizes parameter. Also update related code: simplify Lua wrapper to accept optional capacity, and fix buffer growth logic in worker.go using slices.Grow correctly. Remove unused BatchSize() usage from obidistribute.
2026-03-16 22:06:44 +01:00
coissac 1599abc9ad Merge pull request #99 from metabarcoding/push-urlyqwkrqypt
4.4.28: Static Linux Builds, Memory-Aware Batching, and Build Stability
2026-03-14 12:21:34 +01:00
Eric Coissac af213ab446 4.4.28: Static Linux Builds, Memory-Aware Batching, and Build Stability
This release focuses on improving build reliability, memory efficiency for large datasets, and portability of Linux binaries.

### Static Linux Binaries
- Linux binaries are now built with static linking using musl, eliminating external runtime dependencies and ensuring portability across distributions.

### Memory-Aware Batching
- Users can now control memory usage during processing with the new `--batch-mem` option, specifying limits such as 128K, 64M, or 1G.
- Batching logic now respects both size and memory constraints: batches are flushed when either threshold is exceeded.
- Conservative memory estimation for sequences helps avoid over-allocation, and explicit garbage collection after large batch discards reduces memory spikes.

### Build System Improvements
- Upgraded to Go 1.26 for improved performance and toolchain stability.
- Fixed cross-compilation issues by replacing generic include paths with architecture-specific ones (x86_64-linux-gnu and aarch64-linux-gnu).
- Streamlined macOS builds by removing special flags, using standard `make` targets.
- Enhanced error reporting during build failures: logs are now shown before cleanup and exit.
- Updated install script to correctly configure GOROOT, GOPATH, and GOTOOLCHAIN, with visual progress feedback for downloads.

All batching behavior is non-breaking and maintains backward compatibility while offering more predictable resource usage on large datasets.
2026-03-14 11:59:15 +01:00
Eric Coissac a60184c115 chore: bump version to 4.4.27 and add zlib-static dependency
Update version to 4.4.27 in version.txt and pkg/obioptions/version.go.

Add zlib-static package to release workflow to ensure static linking of zlib, resolving potential runtime dependency issues with the external link mode.
2026-03-14 11:59:04 +01:00
Eric Coissac 585b024bf0 chore: update to Go 1.26 and refactor release workflow
- Upgrade Go version from 1.23 to 1.26 in release.yml
- Remove CGO_CFLAGS from cross-compilation matrix entries
- Replace Linux build tools installation with Docker-based static build using golang:1.26-alpine
- Simplify macOS build to use standard make without special flags
- Increment version to 4.4.26
2026-03-14 11:43:31 +01:00
Eric Coissac afc9ffda85 chore: bump version to 4.4.25 and fix CGO_CFLAGS for cross-compilation
Update version to 4.4.25 in version.txt and pkg/obioptions/version.go.

Fix CGO_CFLAGS in release.yml by replacing generic '-I/usr/include' with architecture-specific paths (x86_64-linux-gnu and aarch64-linux-gnu) to ensure correct header inclusion during cross-compilation on Linux.
2026-03-13 19:30:29 +01:00
Eric Coissac fdd972bbd2 fix: add CGO_CFLAGS for static Linux builds and update go.work.sum
- Add CGO_CFLAGS environment variable to release workflow for Linux builds
- Update go.work.sum with new golang.org/x/net v0.38.0 entry
- Remove obsolete logs archive file
2026-03-13 19:24:18 +01:00
coissac 76f595e1fe Merge pull request #95 from metabarcoding/push-kzmrqmplznrn
Version 4.4.24
2026-03-13 19:13:02 +01:00
coissac 1e1e5443e3 Merge branch 'master' into push-kzmrqmplznrn 2026-03-13 19:12:49 +01:00
Eric Coissac 15d1f1fd80 Version 4.4.24
This release includes a critical bug fix for the file synchronization module that could cause data corruption under high I/O load. Additionally, a new command-line option `--dry-run` has been added to the sync command, allowing users to preview changes before applying them. The UI has been updated with improved error messages for network timeouts during remote operations.
2026-03-13 19:11:58 +01:00
Eric Coissac 8df2cbe22f Bump version to 4.4.23 and update release workflow
- Update version from 4.4.22 to 4.4.23 in version.txt and pkg/obioptions/version.go
- Add zlib1g-dev dependency to Linux release workflow for potential linking requirements
- Improve tag creation in Makefile by resolving commit hash with `jj log` for better CI/CD integration
2026-03-13 19:11:55 +01:00
coissac 58d685926b Merge pull request #94 from metabarcoding/push-lxxxlurqmqrt
4.4.23: Memory-aware batching, static Linux builds, and build improvements
2026-03-13 19:04:15 +01:00
Eric Coissac e9f24426df 4.4.23: Memory-aware batching, static Linux builds, and build improvements
### Memory-Aware Batching
- Introduced configurable min/max batch size bounds and memory limits for precise resource control.
- Added `--batch-mem` CLI option to enable adaptive batching based on estimated sequence memory footprint (e.g., 128K, 64M, 1G).
- Implemented `RebatchBySize()` to handle both byte and count limits, flushing when either threshold is exceeded.
- Added conservative memory estimation via `BioSequence.MemorySize()` and enhanced garbage collection for explicit cleanup after large batch discards.
- Updated internal batching logic across core modules to consistently apply default memory (128 MB) and size (min: 1, max: 2000) bounds.

### Linux Build Enhancements
- Enabled static linking for Linux binaries using musl, producing portable, self-contained executables without external dependencies.

### Build System & Toolchain Improvements
- Updated Go toolchain to 1.26.1 with corresponding dependency bumps (e.g., go-getoptions, gval, regexp2, go-json, progressbar, logrus, testify).
- Fixed Makefile to safely quote LDFLAGS for paths with spaces.
- Improved build error handling: on failure, logs are displayed before cleanup and exit.
- Updated install script to correctly set GOROOT, GOPATH, and GOTOOLCHAIN, ensuring GOPATH directory creation.
- Added progress bar to curl downloads in the install script for visual feedback during Go and OBITools4 downloads.

All batching behavior remains non-breaking, with consistent constraints improving predictability during large dataset processing.
2026-03-13 19:03:50 +01:00
Eric Coissac 2f7be10b5d Build improvements and Go version update
- Update Go version from 1.25.0 to 1.26.1 in go.mod and go.work
- Fix Makefile: quote LDFLAGS to handle spaces safely in -ldflags
- Improve build error handling: on failure, cat log then cleanup and exit with error code
- Update install_obitools.sh: properly set GOROOT, GOPATH, and GOTOOLCHAIN; ensure GOPATH directory is created
2026-03-13 19:03:42 +01:00
Eric Coissac 43125f9f5e feat: add progress bar to curl downloads in install script
Replace silent curl commands with --progress-bar option to provide visual feedback during Go and OBITools4 downloads, improving user experience without changing download logic.
2026-03-13 16:40:55 +01:00
Eric Coissac c23368e929 update dependencies and Go toolchain to 1.25.0
Update go.mod and go.work to Go 1.25.0, bump several direct dependencies (e.g., go-getoptions, gval, regexp2, go-json, progressbar, logrus, testify), update indirect dependencies accordingly, and remove obsolete toolchain directive.
2026-03-13 16:09:34 +01:00
coissac 6cb5a81685 Merge pull request #93 from metabarcoding/push-snmwxkwkqxrm
Memory-aware Batching and Static Linux Builds
2026-03-13 15:18:29 +01:00
Eric Coissac 94b0887069 Memory-aware Batching and Static Linux Builds
### Memory-Aware Batching
- Replaced single batch size limits with configurable min/max bounds and memory limits for more precise control over resource usage.
- Added `--batch-mem` CLI option to enable adaptive batching based on estimated sequence memory footprint (e.g., 128K, 64M, 1G).
- Introduced `RebatchBySize()` with explicit support for both byte and count limits, flushing when either threshold is exceeded.
- Implemented conservative memory estimation via `BioSequence.MemorySize()` and enhanced garbage collection to trigger explicit cleanup after large batch discards.
- Updated internal batching logic across `batchiterator.go`, `fragment.go`, and `obirefidx.go` to consistently use default memory (128 MB) and size (min: 1, max: 2000) bounds.

### Linux Build Enhancements
- Enabled static linking for Linux binaries using musl, producing portable, self-contained executables without external dependencies.

### Notes
- This release consolidates and improves batching behavior introduced in 4.4.20, with no breaking changes to the public API.
- All user-facing batching behavior is now governed by consistent memory and count constraints, improving predictability and stability during large dataset processing.
2026-03-13 15:16:41 +01:00
Eric Coissac c188580aac Replace Rebatch with RebatchBySize using default batch parameters
Replace calls to Rebatch(size) with RebatchBySize(obidefault.BatchMem(), obidefault.BatchSizeMax()) in batchiterator.go, fragment.go, and obirefidx.go to ensure consistent use of default memory and size limits for batch rebatching.
2026-03-13 15:16:33 +01:00
Eric Coissac 1e1f575d1c refactor: replace single batch size with min/max bounds and memory limits
Introduce separate _BatchSize (min) and _BatchSizeMax (max) constants to replace the single _BatchSize variable. Update RebatchBySize to accept both maxBytes and maxCount parameters, flushing when either limit is exceeded. Set default batch size min to 1, max to 2000, and memory limit to 128 MB. Update CLI options and sequence_reader.go accordingly.
2026-03-13 15:07:35 +01:00
Eric Coissac 40769bf827 Add memory-based batching support
Implement memory-aware batch sizing with --batch-mem CLI option, enabling adaptive batching based on estimated sequence memory footprint. Key changes:
- Added _BatchMem and related getters/setters in pkg/obidefault
- Implemented RebatchBySize() in pkg/obiter for memory-constrained batching
- Added BioSequence.MemorySize() for conservative memory estimation
- Integrated batch-mem option in pkg/obioptions with human-readable size parsing (e.g., 128K, 64M, 1G)
- Added obiutils.ParseMemSize/FormatMemSize for unit conversion
- Enhanced pool GC in pkg/obiseq/pool.go to trigger explicit GC for large slice discards
- Updated sequence_reader.go to apply memory-based rebatching when enabled
2026-03-13 14:54:21 +01:00
Eric Coissac 74e6fcaf83 feat: add static linking for Linux builds using musl
Enable static linking for Linux binaries by installing musl-tools and passing appropriate LDFLAGS during build. This ensures portable, self-contained executables for Linux targets.
2026-03-13 14:26:31 +01:00
coissac 30ec8b1b63 Merge pull request #92 from metabarcoding/push-mvpuxnxoyypu
4.4.21: Parallel builds, robust installation, and rope-based parsing enhancements
2026-03-13 12:00:32 +01:00
Eric Coissac cdc72c5346 4.4.21: Parallel builds, robust installation, and rope-based parsing enhancements
This release introduces significant improvements to build reliability and performance, alongside key parsing enhancements for sequence data.

### Build & Installation Improvements
- Added support for parallel compilation via `-j/--jobs` option in both the Makefile and install script, enabling faster builds on multi-core systems. The default remains single-threaded for safety.
- Enhanced Makefile with `.DEFAULT_GOAL := all` for consistent behavior and a documented `help` target.
- Replaced fragile file operations with robust error handling, clear diagnostics, and automatic preservation of the build directory on copy failures to aid recovery.

### Rope-Based Parsing Enhancements (from 4.4.20)
- Introduced direct rope-based parsers for FASTA, EMBL, and FASTQ formats, improving memory efficiency for large files.
- Added U→T conversion support during sequence extraction and more reliable line ending detection.
- Unified rope scanning logic under a new `ropeScanner` for better maintainability.
- Added `TakeQualities()` method to BioSequence for more efficient handling of quality data.

### Bug Fixes (from 4.4.20)
- Fixed `CompressStream` to correctly respect the `compressed` variable.
- Replaced ambiguous string splitting utilities with precise left/right split variants (`LeftSplitInTwo`, `RightSplitInTwo`).

### Release Tooling (from 4.4.20)
- Streamlined release process with modular targets (`jjpush-notes`, `jjpush-push`, `jjpush-tag`) and AI-assisted note generation via `aichat`.
- Improved versioning support via the `VERSION` environment variable in `bump-version`.
- Switched PR submission from raw `jj git push` to `stakk` for consistency and reliability.

Note: This release incorporates key enhancements from 4.4.20 that impact end users, while focusing on build robustness and performance gains.
2026-03-13 11:59:32 +01:00
Eric Coissac 82a9972be7 Add parallel compilation support and improve Makefile/install script robustness
- Add .DEFAULT_GOAL := all to Makefile for consistent default target
- Document -j/--jobs option in README.md to allow parallel compilation
- Add JOBS variable and -j/--jobs argument to install script (default: 1)
- Replace fragile mkdir/cp commands with robust error handling and clear diagnostics
- Add build directory preservation on copy failure for manual recovery
- Pass -j option to make during compilation to enable parallel builds
2026-03-13 11:59:20 +01:00
coissac ff6e515b2a Merge pull request #91 from metabarcoding/push-uotrstkymowq
4.4.20: Rope-based parsing, improved release tooling, and bug fixes
2026-03-12 20:15:33 +01:00
Eric Coissac cd0c525f50 4.4.20: Rope-based parsing, improved release tooling, and bug fixes
### Enhancements
- **Rope-based parsing**: Added direct rope parsing for FASTA, EMBL, and FASTQ formats via `FastaChunkParserRope`, `EmblChunkParserRope`, and `FastqChunkParserRope`. Sequence extraction now supports U→T conversion and improved line ending detection.
- **Rope scanner refactoring**: Unified rope scanning logic under a new `ropeScanner`, improving maintainability and consistency.
- **Sequence handling**: Added `TakeQualities()` method to BioSequence for more efficient quality data handling.

### Bug Fixes
- **Compression behavior**: Fixed `CompressStream` to correctly use the `compressed` variable instead of a hardcoded boolean.
- **String splitting**: Replaced ambiguous `SplitInTwo` calls with precise `LeftSplitInTwo` or `RightSplitInTwo`, and added dedicated right-split utility.

### Tooling & Workflow Improvements
- **Makefile enhancements**: Added colored terminal output, a `help` target for documenting all targets, and improved release workflow automation.
- **Release process**: Refactored `jjpush` into modular targets (`jjpush-notes`, `jjpush-push`, `jjpush-tag`), replaced `orla` with `aichat` for AI-assisted release notes, and introduced robust JSON parsing using Python. Release notes are now generated and stored in temp files for tag creation.
- **Versioning**: `bump-version` now supports the VERSION environment variable for manual version setting.
- **Submission**: Switched from raw `jj git push` to `stakk` for PR submission.

### Internal Notes
- Installation instructions are now included in release tags.
- Fixed-size carry buffer replaced with dynamic slice for arbitrarily long line support without extra allocations.
2026-03-12 20:14:11 +01:00
Eric Coissac abe935aa18 Add help target, colorize output, and improve release workflow
- Add colored terminal output support (GREEN, YELLOW, BLUE, NC)
- Introduce `help` target to document all Makefile targets
- Enhance `bump-version` to accept VERSION env var for manual version setting
- Refactor jjpush: split into modular targets (jjpush-notes, jjpush-push, jjpush-tag)
- Replace orla with aichat for AI-powered release notes generation
- Add robust JSON parsing using Python for release notes extraction
- Use stakk for PR submission (replacing raw `jj git push`)
- Generate and store release notes in temp files for tag creation
- Add installation instructions to release tags
- Update .PHONY with new targets

4.4.20: Rope-based parsing, improved release tooling, and bug fixes

### Enhancements
- **Rope-based parsing**: Added direct rope parsing for FASTA, EMBL, and FASTQ formats via `FastaChunkParserRope`, `EmblChunkParserRope`, and `FastqChunkRope` functions, eliminating unnecessary memory allocation via Pack(). Sequence extraction now supports U→T conversion and improved line ending detection.
- **Rope scanner refactoring**: Unified rope scanning logic under a new `ropeScanner`, improving maintainability and consistency across parsers.
- **Sequence handling**: Added `TakeQualities()` method to BioSequence for more efficient quality data handling.

### Bug Fixes
- **Compression behavior**: Fixed CompressStream to correctly use the `compressed` variable instead of a hardcoded boolean.
- **String splitting**: Replaced ambiguous `SplitInTwo` calls with precise `LeftSplitInTwo` or `RightSplitInTwo`, and added dedicated right-split utility.

### Tooling & Workflow Improvements
- **Makefile enhancements**: Added colored terminal output, a `help` target for documenting all targets, and improved release workflow automation.
- **Release process**: Refactored `jjpush` into modular targets (`jjpush-notes`, `jjpush-push`, `jjpush-tag`), replaced `orla` with `aichat` for AI-assisted release notes, and introduced robust JSON parsing using Python. Release notes are now generated and stored in temp files for tag creation.
- **Versioning**: `bump-version` now supports the VERSION environment variable for manual version setting.
- **Submission**: Switched from raw `jj git push` to `stakk` for PR submission.

### Internal Notes
- Installation instructions are now included in release tags.
- Fixed-size carry buffer replaced with dynamic slice for arbitrarily long line support without extra allocations.
2026-03-12 20:14:11 +01:00
Eric Coissac 8dd32dc1bf Fix CompressStream call to use compressed variable
Replace hardcoded boolean with the `compressed` variable in CompressStream call to ensure correct compression behavior.
2026-03-12 18:48:22 +01:00
Eric Coissac 6ee8750635 Replace SplitInTwo with LeftSplitInTwo/RightSplitInTwo for precise splitting
Replace SplitInTwo calls with LeftSplitInTwo or RightSplitInTwo depending on the intended split direction. In fastseq_json_header.go, extract rank from suffix without splitting; in biosequenceslice.go and taxid.go, use LeftSplitInTwo to split from the left; add RightSplitInTwo utility function for splitting from the right.
2026-03-12 18:41:28 +01:00
Eric Coissac 8c318c480e replace fixed-size carry buffer with dynamic slice
Replace the fixed [256]byte carry buffer with a dynamic []byte slice to support arbitrarily long lines without heap allocation during accumulation. Update all carry buffer handling logic to use len(s.carry) and append instead of fixed-size copy operations.
2026-03-11 20:44:45 +01:00
Eric Coissac 09fbc217d3 Add EMBL rope parsing support and improve sequence extraction
Introduce EmblChunkParserRope function to parse EMBL chunks directly from a rope without using Pack(). Add extractEmblSeq helper to scan sequence sections and handle U to T conversion. Update parser logic to use rope-based parsing when available, and fix feature table handling for WGS entries.
2026-03-10 17:02:14 +01:00
Eric Coissac 3d2e205722 Refactor rope scanner and add FASTQ rope parser
This commit refactors the rope scanner implementation by renaming gbRopeScanner to ropeScanner and extracting the common functionality into a new file. It also introduces a new FastqChunkParserRope function that parses FASTQ chunks directly from a rope without Pack(), enabling more efficient memory usage. The existing parsers are updated to use the new rope-based parser when available. The BioSequence type is enhanced with a TakeQualities method for more efficient quality data handling.
2026-03-10 16:47:03 +01:00
Eric Coissac 623116ab13 Add rope-based FASTA parsing and improve sequence handling
Introduce FastaChunkParserRope for direct rope-based FASTA parsing, enhance sequence extraction with whitespace skipping and U->T conversion, and update parser logic to support both rope and raw data sources.

- Added extractFastaSeq function to scan sequence bytes directly from rope
- Implemented FastaChunkParserRope for rope-based parsing
- Modified _ParseFastaFile to use rope when available
- Updated sequence handling to support U->T conversion
- Fixed line ending detection for FASTA parsing
2026-03-10 16:34:33 +01:00
coissac 1e4509cb63 Merge pull request #90 from metabarcoding/push-uzpqqoqvpnxw
Push uzpqqoqvpnxw
2026-03-10 15:53:08 +01:00
Eric Coissac b33d7705a8 Bump version to 4.4.19
Update version from 4.4.18 to 4.4.19 in both version.txt and pkg/obioptions/version.go
2026-03-10 15:51:36 +01:00
69 changed files with 2759 additions and 369 deletions
+2 -2
View File
@@ -10,10 +10,10 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Setup Go
uses: actions/setup-go@v2
uses: actions/setup-go@v5
with:
go-version: '1.23'
- name: Checkout obitools4 project
uses: actions/checkout@v4
uses: actions/checkout@v5
- name: Run tests
run: make githubtests
+22 -7
View File
@@ -16,9 +16,9 @@ jobs:
- name: Setup Go
uses: actions/setup-go@v5
with:
go-version: "1.23"
go-version: "1.26"
- name: Checkout obitools4 project
uses: actions/checkout@v4
uses: actions/checkout@v5
- name: Run tests
run: make githubtests
@@ -49,12 +49,12 @@ jobs:
steps:
- name: Checkout code
uses: actions/checkout@v4
uses: actions/checkout@v5
- name: Setup Go
uses: actions/setup-go@v5
with:
go-version: "1.23"
go-version: "1.26"
- name: Extract version from tag
id: get_version
@@ -69,7 +69,23 @@ jobs:
xcode-select --install 2>/dev/null || true
xcode-select -p
- name: Build binaries
- name: Build binaries (Linux)
if: runner.os == 'Linux'
env:
VERSION: ${{ steps.get_version.outputs.version }}
run: |
docker run --rm \
-v "$(pwd):/src" \
-w /src \
-e VERSION="${VERSION}" \
golang:1.26-alpine \
sh -c "apk add --no-cache gcc musl-dev zlib-dev zlib-static make && \
make LDFLAGS='-linkmode=external -extldflags=-static' obitools"
mkdir -p artifacts
tar -czf artifacts/obitools4_${VERSION}_${{ matrix.output_name }}.tar.gz -C build .
- name: Build binaries (macOS)
if: runner.os == 'macOS'
env:
GOOS: ${{ matrix.goos }}
GOARCH: ${{ matrix.goarch }}
@@ -77,7 +93,6 @@ jobs:
run: |
make obitools
mkdir -p artifacts
# Create a single tar.gz with all binaries for this platform
tar -czf artifacts/obitools4_${VERSION}_${{ matrix.output_name }}.tar.gz -C build .
- name: Upload artifacts
@@ -92,7 +107,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v4
uses: actions/checkout@v5
with:
fetch-depth: 0
+1 -1
View File
@@ -23,7 +23,7 @@ xx
/.vscode
/build
/bugs
autodoc
/ncbitaxo
!/obitests/**
+86 -32
View File
@@ -2,9 +2,17 @@
#export GOBIN=$(GOPATH)/bin
#export PATH=$(GOBIN):$(shell echo $${PATH})
.DEFAULT_GOAL := all
GREEN := \033[0;32m
YELLOW := \033[0;33m
BLUE := \033[0;34m
NC := \033[0m
GOFLAGS=
LDFLAGS=
GOCMD=go
GOBUILD=$(GOCMD) build $(GOFLAGS)
GOBUILD=$(GOCMD) build $(GOFLAGS) $(if $(LDFLAGS),-ldflags="$(LDFLAGS)")
GOGENERATE=$(GOCMD) generate
GOCLEAN=$(GOCMD) clean
GOTEST=$(GOCMD) test
@@ -43,7 +51,7 @@ $(OBITOOLS_PREFIX)$(notdir $(1)): $(BUILD_DIR) $(1) pkg/obioptions/version.go
@echo -n - Building obitool $(notdir $(1))...
@$(GOBUILD) -o $(BUILD_DIR)/$(OBITOOLS_PREFIX)$(notdir $(1)) ./$(1) \
2> $(OBITOOLS_PREFIX)$(notdir $(1)).log \
|| cat $(OBITOOLS_PREFIX)$(notdir $(1)).log
|| { cat $(OBITOOLS_PREFIX)$(notdir $(1)).log; rm -f $(OBITOOLS_PREFIX)$(notdir $(1)).log; exit 1; }
@rm -f $(OBITOOLS_PREFIX)$(notdir $(1)).log
@echo Done.
endef
@@ -60,6 +68,28 @@ endif
OUTPUT:=$(shell mktemp)
help:
@printf "$(GREEN)OBITools4 Makefile$(NC)\n\n"
@printf "$(BLUE)Main targets:$(NC)\n"
@printf " %-20s %s\n" "all" "Build all obitools (default)"
@printf " %-20s %s\n" "obitools" "Build all obitools binaries to build/"
@printf " %-20s %s\n" "test" "Run Go unit tests"
@printf " %-20s %s\n" "obitests" "Run integration tests (obitests/)"
@printf " %-20s %s\n" "bump-version" "Increment patch version (or set with VERSION=x.y.z)"
@printf " %-20s %s\n" "update-deps" "Update all Go dependencies"
@printf "\n$(BLUE)Jujutsu workflow:$(NC)\n"
@printf " %-20s %s\n" "jjnew" "Document current commit and start a new one"
@printf " %-20s %s\n" "jjpush" "Release: describe, bump, generate notes, push PR, tag (VERSION=x.y.z optional)"
@printf " %-20s %s\n" "jjfetch" "Fetch latest commits from origin"
@printf "\n$(BLUE)Required tools:$(NC)\n"
@printf " %-20s " "go"; command -v go >/dev/null 2>&1 && printf "$(GREEN)$(NC) %s\n" "$$(go version)" || printf "$(YELLOW)✗ not found$(NC)\n"
@printf " %-20s " "git"; command -v git >/dev/null 2>&1 && printf "$(GREEN)$(NC) %s\n" "$$(git --version)" || printf "$(YELLOW)✗ not found$(NC)\n"
@printf " %-20s " "jj"; command -v jj >/dev/null 2>&1 && printf "$(GREEN)$(NC) %s\n" "$$(jj --version)" || printf "$(YELLOW)✗ not found$(NC)\n"
@printf " %-20s " "gh"; command -v gh >/dev/null 2>&1 && printf "$(GREEN)$(NC) %s\n" "$$(gh --version | head -1)" || printf "$(YELLOW)✗ not found$(NC) (brew install gh)\n"
@printf "\n$(BLUE)Optional tools (release notes generation):$(NC)\n"
@printf " %-20s " "aichat"; command -v aichat >/dev/null 2>&1 && printf "$(GREEN)$(NC) %s\n" "$$(aichat --version)" || printf "$(YELLOW)✗ not found$(NC) (https://github.com/sigoden/aichat)\n"
@printf " %-20s " "jq"; command -v jq >/dev/null 2>&1 && printf "$(GREEN)$(NC) %s\n" "$$(jq --version)" || printf "$(YELLOW)✗ not found$(NC) (brew install jq)\n"
all: install-githook obitools
obitools: $(patsubst %,$(OBITOOLS_PREFIX)%,$(OBITOOLS))
@@ -106,15 +136,20 @@ pkg/obioptions/version.go: version.txt .FORCE
@rm -f $(OUTPUT)
bump-version:
@echo "Incrementing version..."
@current=$$(cat version.txt); \
echo " Current version: $$current"; \
major=$$(echo $$current | cut -d. -f1); \
minor=$$(echo $$current | cut -d. -f2); \
patch=$$(echo $$current | cut -d. -f3); \
new_patch=$$((patch + 1)); \
new_version="$$major.$$minor.$$new_patch"; \
echo " New version: $$new_version"; \
if [ -n "$(VERSION)" ]; then \
new_version="$(VERSION)"; \
echo "Setting version to $$new_version (was $$current)"; \
else \
echo "Incrementing version..."; \
echo " Current version: $$current"; \
major=$$(echo $$current | cut -d. -f1); \
minor=$$(echo $$current | cut -d. -f2); \
patch=$$(echo $$current | cut -d. -f3); \
new_patch=$$((patch + 1)); \
new_version="$$major.$$minor.$$new_patch"; \
echo " New version: $$new_version"; \
fi; \
echo "$$new_version" > version.txt
@echo "✓ Version updated in version.txt"
@$(MAKE) pkg/obioptions/version.go
@@ -130,6 +165,7 @@ jjnew:
jjpush:
@$(MAKE) jjpush-describe
@$(MAKE) jjpush-bump
@$(MAKE) jjpush-notes
@$(MAKE) jjpush-push
@$(MAKE) jjpush-tag
@echo "$(GREEN)✓ Release complete$(NC)"
@@ -142,44 +178,62 @@ jjpush-bump:
@echo "$(BLUE)→ Creating new commit for version bump...$(NC)"
@jj new
@$(MAKE) bump-version
@echo "$(BLUE)→ Documenting version bump commit...$(NC)"
@jj auto-describe
jjpush-push:
@echo "$(BLUE)→ Pushing commits...$(NC)"
@jj git push --change @
jjpush-tag:
jjpush-notes:
@version=$$(cat version.txt); \
tag_name="Release_$$version"; \
echo "$(BLUE)→ Generating release notes for $$tag_name...$(NC)"; \
release_message="Release $$version"; \
if command -v orla >/dev/null 2>&1 && command -v jq >/dev/null 2>&1; then \
previous_tag=$$(git describe --tags --abbrev=0 --match 'Release_*' HEAD^ 2>/dev/null); \
echo "$(BLUE)→ Generating release notes for version $$version...$(NC)"; \
release_title="Release $$version"; \
release_body=""; \
if command -v aichat >/dev/null 2>&1; then \
previous_tag=$$(git describe --tags --abbrev=0 --match 'Release_*' 2>/dev/null); \
if [ -z "$$previous_tag" ]; then \
echo "$(YELLOW)⚠ No previous Release tag found, skipping release notes$(NC)"; \
else \
raw_output=$$(git log --format="%h %B" "$$previous_tag..HEAD" | \
ORLA_MAX_TOOL_CALLS=50 orla agent -m ollama:qwen3-coder-next:latest \
aichat \
"Summarize the following commits into a GitHub release note for version $$version. Ignore commits related to version bumps, .gitignore changes, or any internal housekeeping that is irrelevant to end users. Describe each user-facing change precisely without exposing code. Eliminate redundancy. Output strictly valid JSON with no surrounding text, using this exact schema: {\"title\": \"<short release title>\", \"body\": \"<detailed markdown release notes>\"}" 2>/dev/null) || true; \
if [ -n "$$raw_output" ]; then \
sanitized=$$(echo "$$raw_output" | sed -n '/^{/,/^}/p' | tr -d '\000-\011\013-\014\016-\037'); \
release_title=$$(echo "$$sanitized" | jq -r '.title // empty' 2>/dev/null) ; \
release_body=$$(echo "$$sanitized" | jq -r '.body // empty' 2>/dev/null) ; \
if [ -n "$$release_title" ] && [ -n "$$release_body" ]; then \
release_message="$$release_title"$$'\n\n'"$$release_body"; \
notes=$$(printf '%s\n' "$$raw_output" | python3 tools/json2md.py 2>/dev/null); \
if [ -n "$$notes" ]; then \
release_title=$$(echo "$$notes" | head -1); \
release_body=$$(echo "$$notes" | tail -n +3); \
else \
echo "$(YELLOW)⚠ JSON parsing failed, using default release message$(NC)"; \
fi; \
fi; \
fi; \
fi; \
printf '%s' "$$release_title" > /tmp/obitools4-release-title.txt; \
printf '%s' "$$release_body" > /tmp/obitools4-release-body.txt; \
echo "$(BLUE)→ Setting release notes as commit description...$(NC)"; \
jj desc -m "$$release_title"$$'\n\n'"$$release_body"
jjpush-push:
@echo "$(BLUE)→ Pushing commits...$(NC)"
@jj git push --change @
@echo "$(BLUE)→ Creating/updating PR...$(NC)"
@release_title=$$(cat /tmp/obitools4-release-title.txt 2>/dev/null || echo "Release $$(cat version.txt)"); \
release_body=$$(cat /tmp/obitools4-release-body.txt 2>/dev/null || echo ""); \
branch=$$(jj log -r @ --no-graph -T 'bookmarks.map(|b| b.name()).join("\n")' 2>/dev/null | head -1); \
if [ -n "$$branch" ] && command -v gh >/dev/null 2>&1; then \
gh pr create --title "$$release_title" --body "$$release_body" --base master --head "$$branch" 2>/dev/null \
|| gh pr edit "$$branch" --title "$$release_title" --body "$$release_body" 2>/dev/null \
|| echo "$(YELLOW)⚠ Could not create/update PR$(NC)"; \
fi
jjpush-tag:
@version=$$(cat version.txt); \
tag_name="Release_$$version"; \
release_title=$$(cat /tmp/obitools4-release-title.txt 2>/dev/null || echo "Release $$version"); \
release_body=$$(cat /tmp/obitools4-release-body.txt 2>/dev/null || echo ""); \
install_section=$$'\n## Installation\n\n### Pre-built binaries\n\nDownload the appropriate archive for your system from the\n[release assets](https://github.com/metabarcoding/obitools4/releases/tag/Release_'"$$version"')\nand extract it:\n\n#### Linux (AMD64)\n```bash\ntar -xzf obitools4_'"$$version"'_linux_amd64.tar.gz\n```\n\n#### Linux (ARM64)\n```bash\ntar -xzf obitools4_'"$$version"'_linux_arm64.tar.gz\n```\n\n#### macOS (Intel)\n```bash\ntar -xzf obitools4_'"$$version"'_darwin_amd64.tar.gz\n```\n\n#### macOS (Apple Silicon)\n```bash\ntar -xzf obitools4_'"$$version"'_darwin_arm64.tar.gz\n```\n\nAll OBITools4 binaries are included in each archive.\n\n### From source\n\nYou can also compile and install OBITools4 directly from source using the\ninstallation script:\n\n```bash\ncurl -L https://raw.githubusercontent.com/metabarcoding/obitools4/master/install_obitools.sh | bash -s -- --version '"$$version"'\n```\n\nBy default binaries are installed in `/usr/local/bin`. Use `--install-dir` to\nchange the destination and `--obitools-prefix` to add a prefix to command names:\n\n```bash\ncurl -L https://raw.githubusercontent.com/metabarcoding/obitools4/master/install_obitools.sh | \\\n bash -s -- --version '"$$version"' --install-dir ~/local --obitools-prefix k\n```\n'; \
release_message="$$release_message$$install_section"; \
release_message="$$release_title"$$'\n\n'"$$release_body$$install_section"; \
echo "$(BLUE)→ Creating tag $$tag_name...$(NC)"; \
git tag -a "$$tag_name" -m "$$release_message" 2>/dev/null || echo "$(YELLOW)⚠ Tag $$tag_name already exists$(NC)"; \
commit_hash=$$(jj log -r @ --no-graph -T 'commit_id' 2>/dev/null); \
git tag -a "$$tag_name" $${commit_hash:+"$$commit_hash"} -m "$$release_message" 2>/dev/null || echo "$(YELLOW)⚠ Tag $$tag_name already exists$(NC)"; \
echo "$(BLUE)→ Pushing tag $$tag_name...$(NC)"; \
git push origin "$$tag_name" 2>/dev/null || echo "$(YELLOW)⚠ Tag push failed or already pushed$(NC)"
git push origin "$$tag_name" 2>/dev/null || echo "$(YELLOW)⚠ Tag push failed or already pushed$(NC)"; \
rm -f /tmp/obitools4-release-title.txt /tmp/obitools4-release-body.txt
jjfetch:
@echo "$(YELLOW)→ Pulling latest commits...$(NC)"
@@ -187,5 +241,5 @@ jjfetch:
@jj new master@origin
@echo "$(GREEN)✓ Latest commits pulled$(NC)"
.PHONY: all obitools update-deps obitests githubtests jjnew jjpush jjpush-describe jjpush-bump jjpush-push jjpush-tag jjfetch bump-version .FORCE
.PHONY: all obitools update-deps obitests githubtests help jjnew jjpush jjpush-describe jjpush-bump jjpush-notes jjpush-push jjpush-tag jjfetch bump-version .FORCE
.FORCE:
+5 -1
View File
@@ -32,8 +32,12 @@ The installation script offers several options:
>
> -p, --obitools-prefix Prefix added to the obitools command names if you
> want to have several versions of obitools at the
> same time on your system (as example `-p g` will produce
> same time on your system (as example `-p g` will produce
> `gobigrep` command instead of `obigrep`).
>
> -j, --jobs Number of parallel jobs used for compilation
> (default: 1). Increase this value to speed up
> compilation on multi-core systems (e.g., `-j 4`).
### Examples
+5
View File
@@ -37,6 +37,11 @@ func main() {
optionParser(os.Args)
if !obipairing.CLIHasPairedFiles() {
log.Error("You must provide both a forward file (-F) and a reverse file (-R)")
os.Exit(1)
}
obidefault.SetStrictReadWorker(2)
obidefault.SetStrictWriteWorker(2)
pairs, err := obipairing.CLIPairedSequence()
+13
View File
@@ -1,6 +1,7 @@
package main
import (
"fmt"
"os"
log "github.com/sirupsen/logrus"
@@ -8,6 +9,7 @@ import (
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obidefault"
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiconvert"
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obimultiplex"
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obipairing"
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obitagpcr"
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
@@ -39,6 +41,17 @@ func main() {
obitagpcr.OptionSet)
optionParser(os.Args)
if obimultiplex.CLIAskConfigTemplate() {
fmt.Print(obimultiplex.CLIConfigTemplate())
os.Exit(0)
}
if !obipairing.CLIHasPairedFiles() {
log.Error("You must provide both a forward file (-F) and a reverse file (-R)")
os.Exit(1)
}
pairs, err := obipairing.CLIPairedSequence()
if err != nil {
+10
View File
@@ -0,0 +1,10 @@
{
"people": [
"Software",
"Agreement",
"Module"
],
"projects": [
"Code"
]
}
+20 -23
View File
@@ -1,35 +1,33 @@
module git.metabarcoding.org/obitools/obitools4/obitools4
go 1.23.4
toolchain go1.24.2
go 1.26.1
require (
github.com/DavidGamba/go-getoptions v0.28.0
github.com/PaesslerAG/gval v1.2.2
github.com/DavidGamba/go-getoptions v0.33.0
github.com/PaesslerAG/gval v1.2.4
github.com/barkimedes/go-deepcopy v0.0.0-20220514131651-17c30cfc62df
github.com/buger/jsonparser v1.1.1
github.com/buger/jsonparser v1.1.2
github.com/chen3feng/stl4go v0.1.1
github.com/dlclark/regexp2 v1.11.4
github.com/goccy/go-json v0.10.3
github.com/dlclark/regexp2 v1.11.5
github.com/goccy/go-json v0.10.6
github.com/klauspost/pgzip v1.2.6
github.com/pbnjay/memory v0.0.0-20210728143218-7b4eea64cf58
github.com/pelletier/go-toml/v2 v2.2.4
github.com/rrethy/ahocorasick v1.0.0
github.com/schollz/progressbar/v3 v3.13.1
github.com/sirupsen/logrus v1.9.3
github.com/stretchr/testify v1.8.4
github.com/schollz/progressbar/v3 v3.19.0
github.com/sirupsen/logrus v1.9.4
github.com/stretchr/testify v1.10.0
github.com/tevino/abool/v2 v2.1.0
github.com/yuin/gopher-lua v1.1.1
golang.org/x/exp v0.0.0-20231110203233-9a3e6036ecaa
gonum.org/v1/gonum v0.14.0
golang.org/x/exp v0.0.0-20260312153236-7ab1446f8b90
gonum.org/v1/gonum v0.17.0
gopkg.in/yaml.v3 v3.0.1
scientificgo.org/special v0.0.0
)
require (
github.com/davecgh/go-spew v1.1.1 // indirect
github.com/goombaio/orderedmap v0.0.0-20180924084748-ba921b7e2419 // indirect
github.com/goombaio/orderedmap v0.0.0-20180925151256-3da0e2f905f9 // indirect
github.com/kr/pretty v0.3.1 // indirect
github.com/kr/text v0.2.0 // indirect
github.com/pmezard/go-difflib v1.0.0 // indirect
@@ -38,16 +36,15 @@ require (
require (
github.com/dsnet/compress v0.0.1
github.com/gabriel-vasile/mimetype v1.4.3
github.com/gabriel-vasile/mimetype v1.4.13
github.com/goombaio/orderedset v0.0.0-20180925151225-8e67b20a9b77
github.com/klauspost/compress v1.17.2
github.com/mattn/go-runewidth v0.0.15 // indirect
github.com/klauspost/compress v1.18.4
github.com/mattn/go-runewidth v0.0.21 // indirect
github.com/mitchellh/colorstring v0.0.0-20190213212951-d06e56a500db // indirect
github.com/rivo/uniseg v0.4.4 // indirect
github.com/shopspring/decimal v1.3.1 // indirect
github.com/ulikunitz/xz v0.5.11
golang.org/x/net v0.35.0 // indirect
golang.org/x/sys v0.30.0 // indirect
golang.org/x/term v0.29.0 // indirect
github.com/rivo/uniseg v0.4.7 // indirect
github.com/shopspring/decimal v1.4.0 // indirect
github.com/ulikunitz/xz v0.5.15
golang.org/x/sys v0.42.0 // indirect
golang.org/x/term v0.41.0 // indirect
gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c
)
+44 -51
View File
@@ -1,36 +1,41 @@
github.com/DavidGamba/go-getoptions v0.28.0 h1:18wgEvfZdrlfIhVDGEBO3Dl0fkOyXqXLa0tLMCKxM1c=
github.com/DavidGamba/go-getoptions v0.28.0/go.mod h1:zE97E3PR9P3BI/HKyNYgdMlYxodcuiC6W68KIgeYT84=
github.com/PaesslerAG/gval v1.2.2 h1:Y7iBzhgE09IGTt5QgGQ2IdaYYYOU134YGHBThD+wm9E=
github.com/PaesslerAG/gval v1.2.2/go.mod h1:XRFLwvmkTEdYziLdaCeCa5ImcGVrfQbeNUbVR+C6xac=
github.com/DavidGamba/go-getoptions v0.33.0 h1:8xCPH87Yy5avYenygyHVlqqm8RpymH0YFe4a7IWlarE=
github.com/DavidGamba/go-getoptions v0.33.0/go.mod h1:zE97E3PR9P3BI/HKyNYgdMlYxodcuiC6W68KIgeYT84=
github.com/PaesslerAG/gval v1.2.4 h1:rhX7MpjJlcxYwL2eTTYIOBUyEKZ+A96T9vQySWkVUiU=
github.com/PaesslerAG/gval v1.2.4/go.mod h1:XRFLwvmkTEdYziLdaCeCa5ImcGVrfQbeNUbVR+C6xac=
github.com/PaesslerAG/jsonpath v0.1.0 h1:gADYeifvlqK3R3i2cR5B4DGgxLXIPb3TRTH1mGi0jPI=
github.com/PaesslerAG/jsonpath v0.1.0/go.mod h1:4BzmtoM/PI8fPO4aQGIusjGxGir2BzcV0grWtFzq1Y8=
github.com/barkimedes/go-deepcopy v0.0.0-20220514131651-17c30cfc62df h1:GSoSVRLoBaFpOOds6QyY1L8AX7uoY+Ln3BHc22W40X0=
github.com/barkimedes/go-deepcopy v0.0.0-20220514131651-17c30cfc62df/go.mod h1:hiVxq5OP2bUGBRNS3Z/bt/reCLFNbdcST6gISi1fiOM=
github.com/buger/jsonparser v1.1.1 h1:2PnMjfWD7wBILjqQbt530v576A/cAbQvEW9gGIpYMUs=
github.com/buger/jsonparser v1.1.1/go.mod h1:6RYKKt7H4d4+iWqouImQ9R2FZql3VbhNgx27UK13J/0=
github.com/buger/jsonparser v1.1.2 h1:frqHqw7otoVbk5M8LlE/L7HTnIq2v9RX6EJ48i9AxJk=
github.com/buger/jsonparser v1.1.2/go.mod h1:6RYKKt7H4d4+iWqouImQ9R2FZql3VbhNgx27UK13J/0=
github.com/chen3feng/stl4go v0.1.1 h1:0L1+mDw7pomftKDruM23f1mA7miavOj6C6MZeadzN2Q=
github.com/chen3feng/stl4go v0.1.1/go.mod h1:5ml3psLgETJjRJnMbPE+JiHLrCpt+Ajc2weeTECXzWU=
github.com/chengxilo/virtualterm v1.0.4 h1:Z6IpERbRVlfB8WkOmtbHiDbBANU7cimRIof7mk9/PwM=
github.com/chengxilo/virtualterm v1.0.4/go.mod h1:DyxxBZz/x1iqJjFxTFcr6/x+jSpqN0iwWCOK1q10rlY=
github.com/clipperhouse/uax29/v2 v2.2.0 h1:ChwIKnQN3kcZteTXMgb1wztSgaU+ZemkgWdohwgs8tY=
github.com/clipperhouse/uax29/v2 v2.2.0/go.mod h1:EFJ2TJMRUaplDxHKj1qAEhCtQPW2tJSwu5BF98AuoVM=
github.com/creack/pty v1.1.9/go.mod h1:oKZEueFk5CKHvIhNR5MUki03XCEU+Q6VDXinZuGJ33E=
github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c=
github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/dlclark/regexp2 v1.11.4 h1:rPYF9/LECdNymJufQKmri9gV604RvvABwgOA8un7yAo=
github.com/dlclark/regexp2 v1.11.4/go.mod h1:DHkYz0B9wPfa6wondMfaivmHpzrQ3v9q8cnmRbL6yW8=
github.com/dlclark/regexp2 v1.11.5 h1:Q/sSnsKerHeCkc/jSTNq1oCm7KiVgUMZRDUoRu0JQZQ=
github.com/dlclark/regexp2 v1.11.5/go.mod h1:DHkYz0B9wPfa6wondMfaivmHpzrQ3v9q8cnmRbL6yW8=
github.com/dsnet/compress v0.0.1 h1:PlZu0n3Tuv04TzpfPbrnI0HW/YwodEXDS+oPKahKF0Q=
github.com/dsnet/compress v0.0.1/go.mod h1:Aw8dCMJ7RioblQeTqt88akK31OvO8Dhf5JflhBbQEHo=
github.com/dsnet/golib v0.0.0-20171103203638-1ea166775780/go.mod h1:Lj+Z9rebOhdfkVLjJ8T6VcRQv3SXugXy999NBtR9aFY=
github.com/gabriel-vasile/mimetype v1.4.3 h1:in2uUcidCuFcDKtdcBxlR0rJ1+fsokWf+uqxgUFjbI0=
github.com/gabriel-vasile/mimetype v1.4.3/go.mod h1:d8uq/6HKRL6CGdk+aubisF/M5GcPfT7nKyLpA0lbSSk=
github.com/goccy/go-json v0.10.3 h1:KZ5WoDbxAIgm2HNbYckL0se1fHD6rz5j4ywS6ebzDqA=
github.com/goccy/go-json v0.10.3/go.mod h1:oq7eo15ShAhp70Anwd5lgX2pLfOS3QCiwU/PULtXL6M=
github.com/goombaio/orderedmap v0.0.0-20180924084748-ba921b7e2419 h1:SajEQ6tktpF9SRIuzbiPOX9AEZZ53Bvw0k9Mzrts8Lg=
github.com/gabriel-vasile/mimetype v1.4.13 h1:46nXokslUBsAJE/wMsp5gtO500a4F3Nkz9Ufpk2AcUM=
github.com/gabriel-vasile/mimetype v1.4.13/go.mod h1:d+9Oxyo1wTzWdyVUPMmXFvp4F9tea18J8ufA774AB3s=
github.com/goccy/go-json v0.10.6 h1:p8HrPJzOakx/mn/bQtjgNjdTcN+/S6FcG2CTtQOrHVU=
github.com/goccy/go-json v0.10.6/go.mod h1:oq7eo15ShAhp70Anwd5lgX2pLfOS3QCiwU/PULtXL6M=
github.com/google/go-cmp v0.6.0 h1:ofyhxvXcZhMsU5ulbFiLKl/XBFqE1GSq7atu8tAmTRI=
github.com/google/go-cmp v0.6.0/go.mod h1:17dUlkBOakJ0+DkrSSNjCkIjxS6bF9zb3elmeNGIjoY=
github.com/goombaio/orderedmap v0.0.0-20180924084748-ba921b7e2419/go.mod h1:YKu81H3RSd1cFh0d7NhvUoTtUC9IY/vBX0WUQb1/o4Y=
github.com/goombaio/orderedmap v0.0.0-20180925151256-3da0e2f905f9 h1:vFjPvFavIiDY71bQ9HIxPQBANvNl1SmFC4fgg5xRkho=
github.com/goombaio/orderedmap v0.0.0-20180925151256-3da0e2f905f9/go.mod h1:YKu81H3RSd1cFh0d7NhvUoTtUC9IY/vBX0WUQb1/o4Y=
github.com/goombaio/orderedset v0.0.0-20180925151225-8e67b20a9b77 h1:4dvq1tGHn1Y9KSRY0OZ24Khki4+4U+ZrA//YYsdUlJU=
github.com/goombaio/orderedset v0.0.0-20180925151225-8e67b20a9b77/go.mod h1:HPelMYpOyy0XvglpBbmZ3krZpwaHmszj/vQNlnETPTM=
github.com/k0kubun/go-ansi v0.0.0-20180517002512-3bf9e2903213/go.mod h1:vNUNkEQ1e29fT/6vq2aBdFsgNPmy8qMdSay1npru+Sw=
github.com/klauspost/compress v1.4.1/go.mod h1:RyIbtBH6LamlWaDj8nUwkbUhJ87Yi3uG0guNDohfE1A=
github.com/klauspost/compress v1.17.2 h1:RlWWUY/Dr4fL8qk9YG7DTZ7PDgME2V4csBXA8L/ixi4=
github.com/klauspost/compress v1.17.2/go.mod h1:ntbaceVETuRiXiv4DpjP66DpAtAGkEQskQzEyD//IeE=
github.com/klauspost/compress v1.18.4 h1:RPhnKRAQ4Fh8zU2FY/6ZFDwTVTxgJ/EMydqSTzE9a2c=
github.com/klauspost/compress v1.18.4/go.mod h1:R0h/fSBs8DE4ENlcrlib3PsXS61voFxhIs2DeRhCvJ4=
github.com/klauspost/cpuid v1.2.0/go.mod h1:Pj4uuM528wm8OyEC2QMXAi2YiTZ96dNQPGgoMS4s3ek=
github.com/klauspost/pgzip v1.2.6 h1:8RXeL5crjEUFnR2/Sn6GJNWtSQ3Dk8pq4CL3jvdDyjU=
github.com/klauspost/pgzip v1.2.6/go.mod h1:Ch1tH69qFZu15pkjo5kYi6mth2Zzwzt50oCQKQE9RUs=
@@ -41,10 +46,8 @@ github.com/kr/pty v1.1.1/go.mod h1:pFQYn66WHrOpPYNljwOMqo10TkYh1fy3cYio2l3bCsQ=
github.com/kr/text v0.1.0/go.mod h1:4Jbv+DJW3UT/LiOwJeYQe1efqtUx/iVham/4vfdArNI=
github.com/kr/text v0.2.0 h1:5Nx0Ya0ZqY2ygV366QzturHI13Jq95ApcVaJBhpS+AY=
github.com/kr/text v0.2.0/go.mod h1:eLer722TekiGuMkidMxC/pM04lWEeraHUUmBw8l2grE=
github.com/mattn/go-isatty v0.0.17/go.mod h1:kYGgaQfpe5nmfYZH+SKPsOc2e4SrIfOl2e/yFXSvRLM=
github.com/mattn/go-runewidth v0.0.14/go.mod h1:Jdepj2loyihRzMpdS35Xk/zdY8IAYHsh153qUoGf23w=
github.com/mattn/go-runewidth v0.0.15 h1:UNAjwbU9l54TA3KzvqLGxwWjHmMgBUVhBiTjelZgg3U=
github.com/mattn/go-runewidth v0.0.15/go.mod h1:Jdepj2loyihRzMpdS35Xk/zdY8IAYHsh153qUoGf23w=
github.com/mattn/go-runewidth v0.0.21 h1:jJKAZiQH+2mIinzCJIaIG9Be1+0NR+5sz/lYEEjdM8w=
github.com/mattn/go-runewidth v0.0.21/go.mod h1:XBkDxAl56ILZc9knddidhrOlY5R/pDhgLpndooCuJAs=
github.com/mitchellh/colorstring v0.0.0-20190213212951-d06e56a500db h1:62I3jR2EmQ4l5rM/4FEfDWcRD+abF5XlKShorW5LRoQ=
github.com/mitchellh/colorstring v0.0.0-20190213212951-d06e56a500db/go.mod h1:l0dey0ia/Uv7NcFFVbCLtqEBQbrT4OCwCSKTEv6enCw=
github.com/pbnjay/memory v0.0.0-20210728143218-7b4eea64cf58 h1:onHthvaw9LFnH4t2DcNVpwGmV9E1BkGknEliJkfwQj0=
@@ -54,50 +57,40 @@ github.com/pelletier/go-toml/v2 v2.2.4/go.mod h1:2gIqNv+qfxSVS7cM2xJQKtLSTLUE9V8
github.com/pkg/diff v0.0.0-20210226163009-20ebb0f2a09e/go.mod h1:pJLUxLENpZxwdsKMEsNbx1VGcRFpLqf3715MtcvvzbA=
github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
github.com/rivo/uniseg v0.2.0/go.mod h1:J6wj4VEh+S6ZtnVlnTBMWIodfgj8LQOQFoIToxlJtxc=
github.com/rivo/uniseg v0.4.4 h1:8TfxU8dW6PdqD27gjM8MVNuicgxIjxpm4K7x4jp8sis=
github.com/rivo/uniseg v0.4.4/go.mod h1:FN3SvrM+Zdj16jyLfmOkMNblXMcoc8DfTHruCPUcx88=
github.com/rivo/uniseg v0.4.7 h1:WUdvkW8uEhrYfLC4ZzdpI2ztxP1I582+49Oc5Mq64VQ=
github.com/rivo/uniseg v0.4.7/go.mod h1:FN3SvrM+Zdj16jyLfmOkMNblXMcoc8DfTHruCPUcx88=
github.com/rogpeppe/go-internal v1.9.0/go.mod h1:WtVeX8xhTBvf0smdhujwtBcq4Qrzq/fJaraNFVN+nFs=
github.com/rogpeppe/go-internal v1.12.0 h1:exVL4IDcn6na9z1rAb56Vxr+CgyK3nn3O+epU5NdKM8=
github.com/rogpeppe/go-internal v1.12.0/go.mod h1:E+RYuTGaKKdloAfM02xzb0FW3Paa99yedzYV+kq4uf4=
github.com/rrethy/ahocorasick v1.0.0 h1:YKkCB+E5PXc0xmLfMrWbfNht8vG9Re97IHSWZk/Lk8E=
github.com/rrethy/ahocorasick v1.0.0/go.mod h1:nq8oScE7Vy1rOppoQxpQiiDmPHuKCuk9rXrNcxUV3R0=
github.com/schollz/progressbar/v3 v3.13.1 h1:o8rySDYiQ59Mwzy2FELeHY5ZARXZTVJC7iHD6PEFUiE=
github.com/schollz/progressbar/v3 v3.13.1/go.mod h1:xvrbki8kfT1fzWzBT/UZd9L6GA+jdL7HAgq2RFnO6fQ=
github.com/shopspring/decimal v1.3.1 h1:2Usl1nmF/WZucqkFZhnfFYxxxu8LG21F6nPQBE5gKV8=
github.com/schollz/progressbar/v3 v3.19.0 h1:Ea18xuIRQXLAUidVDox3AbwfUhD0/1IvohyTutOIFoc=
github.com/schollz/progressbar/v3 v3.19.0/go.mod h1:IsO3lpbaGuzh8zIMzgY3+J8l4C8GjO0Y9S69eFvNsec=
github.com/shopspring/decimal v1.3.1/go.mod h1:DKyhrW/HYNuLGql+MJL6WCR6knT2jwCFRcu2hWCYk4o=
github.com/sirupsen/logrus v1.9.3 h1:dueUQJ1C2q9oE3F7wvmSGAaVtTmUizReu6fjN8uqzbQ=
github.com/sirupsen/logrus v1.9.3/go.mod h1:naHLuLoDiP4jHNo9R0sCBMtWGeIprob74mVsIT4qYEQ=
github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME=
github.com/stretchr/testify v1.3.0/go.mod h1:M5WIy9Dh21IEIfnGCwXGc5bZfKNJtfHm1UVUgZn+9EI=
github.com/stretchr/testify v1.7.0/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg=
github.com/stretchr/testify v1.8.4 h1:CcVxjf3Q8PM0mHUKJCdn+eZZtm5yQwehR5yeSVQQcUk=
github.com/stretchr/testify v1.8.4/go.mod h1:sz/lmYIOXD/1dqDmKjjqLyZ2RngseejIcXlSw2iwfAo=
github.com/shopspring/decimal v1.4.0 h1:bxl37RwXBklmTi0C79JfXCEBD1cqqHt0bbgBAGFp81k=
github.com/shopspring/decimal v1.4.0/go.mod h1:gawqmDU56v4yIKSwfBSFip1HdCCXN8/+DMd9qYNcwME=
github.com/sirupsen/logrus v1.9.4 h1:TsZE7l11zFCLZnZ+teH4Umoq5BhEIfIzfRDZ1Uzql2w=
github.com/sirupsen/logrus v1.9.4/go.mod h1:ftWc9WdOfJ0a92nsE2jF5u5ZwH8Bv2zdeOC42RjbV2g=
github.com/stretchr/testify v1.10.0 h1:Xv5erBjTwe/5IxqUQTdXv5kgmIvbHo3QQyRwhJsOfJA=
github.com/stretchr/testify v1.10.0/go.mod h1:r2ic/lqez/lEtzL7wO/rwa5dbSLXVDPFyf8C91i36aY=
github.com/tevino/abool/v2 v2.1.0 h1:7w+Vf9f/5gmKT4m4qkayb33/92M+Um45F2BkHOR+L/c=
github.com/tevino/abool/v2 v2.1.0/go.mod h1:+Lmlqk6bHDWHqN1cbxqhwEAwMPXgc8I1SDEamtseuXY=
github.com/ulikunitz/xz v0.5.6/go.mod h1:2bypXElzHzzJZwzH67Y6wb67pO62Rzfn7BSiF4ABRW8=
github.com/ulikunitz/xz v0.5.11 h1:kpFauv27b6ynzBNT/Xy+1k+fK4WswhN/6PN5WhFAGw8=
github.com/ulikunitz/xz v0.5.11/go.mod h1:nbz6k7qbPmH4IRqmfOplQw/tblSgqTqBwxkY0oWt/14=
github.com/ulikunitz/xz v0.5.15 h1:9DNdB5s+SgV3bQ2ApL10xRc35ck0DuIX/isZvIk+ubY=
github.com/ulikunitz/xz v0.5.15/go.mod h1:nbz6k7qbPmH4IRqmfOplQw/tblSgqTqBwxkY0oWt/14=
github.com/yuin/gopher-lua v1.1.1 h1:kYKnWBjvbNP4XLT3+bPEwAXJx262OhaHDWDVOPjL46M=
github.com/yuin/gopher-lua v1.1.1/go.mod h1:GBR0iDaNXjAgGg9zfCvksxSRnQx76gclCIb7kdAd1Pw=
golang.org/x/exp v0.0.0-20231110203233-9a3e6036ecaa h1:FRnLl4eNAQl8hwxVVC17teOw8kdjVDVAiFMtgUdTSRQ=
golang.org/x/exp v0.0.0-20231110203233-9a3e6036ecaa/go.mod h1:zk2irFbV9DP96SEBUUAy67IdHUaZuSnrz1n472HUCLE=
golang.org/x/net v0.35.0 h1:T5GQRQb2y08kTAByq9L4/bz8cipCdA8FbRTXewonqY8=
golang.org/x/net v0.35.0/go.mod h1:EglIi67kWsHKlRzzVMUD93VMSWGFOMSZgxFjparz1Qk=
golang.org/x/sys v0.0.0-20220715151400-c0bba94af5f8/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.0.0-20220811171246-fbc7d0a398ab/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.6.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.30.0 h1:QjkSwP/36a20jFYWkSue1YwXzLmsV5Gfq7Eiy72C1uc=
golang.org/x/sys v0.30.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA=
golang.org/x/term v0.6.0/go.mod h1:m6U89DPEgQRMq3DNkDClhWw02AUbt2daBVO4cn4Hv9U=
golang.org/x/term v0.29.0 h1:L6pJp37ocefwRRtYPKSWOWzOtWSxVajvz2ldH/xi3iU=
golang.org/x/term v0.29.0/go.mod h1:6bl4lRlvVuDgSf3179VpIxBF0o10JUpXWOnI7nErv7s=
gonum.org/v1/gonum v0.14.0 h1:2NiG67LD1tEH0D7kM+ps2V+fXmsAnpUeec7n8tcr4S0=
gonum.org/v1/gonum v0.14.0/go.mod h1:AoWeoz0becf9QMWtE8iWXNXc27fK4fNeHNf/oMejGfU=
golang.org/x/exp v0.0.0-20260312153236-7ab1446f8b90 h1:jiDhWWeC7jfWqR9c/uplMOqJ0sbNlNWv0UkzE0vX1MA=
golang.org/x/exp v0.0.0-20260312153236-7ab1446f8b90/go.mod h1:xE1HEv6b+1SCZ5/uscMRjUBKtIxworgEcEi+/n9NQDQ=
golang.org/x/sys v0.42.0 h1:omrd2nAlyT5ESRdCLYdm3+fMfNFE/+Rf4bDIQImRJeo=
golang.org/x/sys v0.42.0/go.mod h1:4GL1E5IUh+htKOUEOaiffhrAeqysfVGipDYzABqnCmw=
golang.org/x/term v0.41.0 h1:QCgPso/Q3RTJx2Th4bDLqML4W6iJiaXFq2/ftQF13YU=
golang.org/x/term v0.41.0/go.mod h1:3pfBgksrReYfZ5lvYM0kSO0LIkAl4Yl2bXOkKP7Ec2A=
gonum.org/v1/gonum v0.17.0 h1:VbpOemQlsSMrYmn7T2OUvQ4dqxQXU+ouZFQsZOx50z4=
gonum.org/v1/gonum v0.17.0/go.mod h1:El3tOrEuMpv2UdMrbNlKEh9vd86bmQ6vqIcDwxEOc1E=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c h1:Hei/4ADfdWqJk1ZMxUNpqntNwaWcugrBjAiHlqqRiVk=
gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c/go.mod h1:JHkPIbrfpd72SG/EVd6muEfDQjcINNoR0C8j2r3qZ4Q=
gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
scientificgo.org/special v0.0.0 h1:P6WJkECo6tgtvZAEfNXl+KEB9ReAatjKAeX8U07mjSc=
+1 -3
View File
@@ -1,5 +1,3 @@
go 1.23.4
toolchain go1.24.2
go 1.26.1
use .
+2
View File
@@ -52,6 +52,8 @@ golang.org/x/image v0.6.0/go.mod h1:MXLdDR43H7cDJq5GEGXEVeeNhPgi+YYEQ2pC1byI1x0=
golang.org/x/mod v0.13.0 h1:I/DsJXRlw/8l/0c24sM9yb0T4z9liZTduXvdAWYiysY=
golang.org/x/mod v0.13.0/go.mod h1:hTbmBsO62+eylJbnUtE2MGJUyE7QWk4xUqPFrRgJ+7c=
golang.org/x/mod v0.14.0/go.mod h1:hTbmBsO62+eylJbnUtE2MGJUyE7QWk4xUqPFrRgJ+7c=
golang.org/x/net v0.38.0/go.mod h1:ivrbrMbzFq5J41QOQh0siUuly180yBYtLp+CKbEaFx8=
golang.org/x/net v0.52.0/go.mod h1:R1MAz7uMZxVMualyPXb+VaqGSa3LIaUqk0eEt3w36Sw=
golang.org/x/sync v0.0.0-20220722155255-886fb9371eb4 h1:uVc8UZUe6tr40fFVnUP5Oj+veunVezqYl9z7DYw9xzw=
golang.org/x/text v0.13.0 h1:ablQoSUd0tRdKxZewP80B+BaqeKJuVhuRxj/dkrun3k=
golang.org/x/text v0.13.0/go.mod h1:TvPlkZtksWOMsz7fbANvkp4WM8x/WCo/om8BMLbz+aE=
+41 -14
View File
@@ -7,6 +7,7 @@ INSTALL_DIR="/usr/local"
OBITOOLS_PREFIX=""
VERSION=""
LIST_VERSIONS=false
JOBS=1
# Help message
function display_help {
@@ -21,6 +22,7 @@ function display_help {
echo " gobigrep command instead of obigrep)."
echo " -v, --version Install a specific version (e.g., 4.4.8)."
echo " If not specified, installs the latest version."
echo " -j, --jobs Number of parallel jobs for compilation (default: 1)."
echo " -l, --list List all available versions and exit."
echo " -h, --help Display this help message."
echo ""
@@ -65,6 +67,10 @@ while [ "$#" -gt 0 ]; do
VERSION="$2"
shift 2
;;
-j|--jobs)
JOBS="$2"
shift 2
;;
-l|--list)
LIST_VERSIONS=true
shift
@@ -122,9 +128,15 @@ mkdir -p "${WORK_DIR}/cache" \
exit 1)
# Create installation directory
mkdir -p "${INSTALL_DIR}/bin" 2> /dev/null \
|| (echo "Please enter your password for installing obitools in ${INSTALL_DIR}" 1>&2
sudo mkdir -p "${INSTALL_DIR}/bin")
if ! mkdir -p "${INSTALL_DIR}/bin" 2>/dev/null; then
if [ ! -w "$(dirname "${INSTALL_DIR}")" ] && [ ! -w "${INSTALL_DIR}" ]; then
echo "Please enter your password for installing obitools in ${INSTALL_DIR}" 1>&2
sudo mkdir -p "${INSTALL_DIR}/bin"
else
echo "Error: Could not create ${INSTALL_DIR}/bin (check path or disk space)" 1>&2
exit 1
fi
fi
if [[ ! -d "${INSTALL_DIR}/bin" ]]; then
echo "Could not create ${INSTALL_DIR}/bin directory for installing obitools" 1>&2
@@ -171,22 +183,24 @@ GOURL=$(curl -s "${URL}${GOFILE}" \
echo "Installing Go from: $GOURL" 1>&2
curl -s "$GOURL" | tar zxf -
curl --progress-bar "$GOURL" | tar zxf -
PATH="$(pwd)/go/bin:$PATH"
export GOROOT="$(pwd)/go"
PATH="${GOROOT}/bin:$PATH"
export PATH
GOPATH="$(pwd)/go"
export GOPATH
export GOPATH="$(pwd)/gopath"
export GOCACHE="$(pwd)/cache"
export GOTOOLCHAIN=local
echo "GOROOT=$GOROOT" 1>&2
echo "GOCACHE=$GOCACHE" 1>&2
mkdir -p "$GOCACHE"
mkdir -p "$GOPATH" "$GOCACHE"
# Download OBITools4 source
echo "Downloading OBITools4 v${VERSION}..." 1>&2
echo "Source URL: $OBIURL4" 1>&2
if ! curl -sL "$OBIURL4" > obitools4.zip; then
if ! curl --progress-bar -L "$OBIURL4" > obitools4.zip; then
echo "Error: Could not download OBITools4 version ${VERSION}" 1>&2
echo "Please check that this version exists with: $0 --list" 1>&2
exit 1
@@ -208,16 +222,29 @@ mkdir -p vendor
# Build with or without prefix
if [[ -z "$OBITOOLS_PREFIX" ]] ; then
make GOFLAGS="-buildvcs=false"
make -j"${JOBS}" obitools GOFLAGS="-buildvcs=false"
else
make GOFLAGS="-buildvcs=false" OBITOOLS_PREFIX="${OBITOOLS_PREFIX}"
make -j"${JOBS}" obitools GOFLAGS="-buildvcs=false" OBITOOLS_PREFIX="${OBITOOLS_PREFIX}"
fi
# Install binaries
echo "Installing binaries to ${INSTALL_DIR}/bin..." 1>&2
(cp build/* "${INSTALL_DIR}/bin" 2> /dev/null) \
|| (echo "Please enter your password for installing obitools in ${INSTALL_DIR}" 1>&2
sudo cp build/* "${INSTALL_DIR}/bin")
if ! cp build/* "${INSTALL_DIR}/bin" 2>/dev/null; then
if [ ! -w "${INSTALL_DIR}/bin" ]; then
echo "Please enter your password for installing obitools in ${INSTALL_DIR}" 1>&2
sudo cp build/* "${INSTALL_DIR}/bin"
else
echo "Error: Could not copy binaries to ${INSTALL_DIR}/bin" 1>&2
echo " Source files: $(ls build/ 2>/dev/null || echo 'none found')" 1>&2
echo "" 1>&2
echo "The build directory has been preserved for manual recovery:" 1>&2
echo " $(pwd)/build/" 1>&2
echo "You can install manually with:" 1>&2
echo " cp $(pwd)/build/* ${INSTALL_DIR}/bin/" 1>&2
popd > /dev/null || true
exit 1
fi
fi
popd > /dev/null || exit
Binary file not shown.
BIN
View File
Binary file not shown.
BIN
View File
Binary file not shown.
@@ -0,0 +1,24 @@
>HELIUM_000100422_612GNAAXX:7:118:3572:14633#0/1_sub[28..126] {"count":10172,"merged_sample":{"26a_F040644":10172},"obitag_bestid":0.9797979797979798,"obitag_bestmatch":"AY227529","obitag_match_count":1,"obitag_rank":"genus","obitag_similarity_method":"lcs","taxid":"taxon:9992 [Marmota]@genus"}
ttagccctaaacataaacattcaataaacaagaatgttcgccagagtactactagcaaca
gcctgaaactcaaaggacttggcggtgctttacatccct
>HELIUM_000100422_612GNAAXX:7:99:9351:13090#0/1_sub[28..127] {"count":260,"merged_sample":{"29a_F260619":260},"obitag_bestid":0.9405940594059405,"obitag_bestmatch":"AF154263","obitag_match_count":9,"obitag_rank":"infraorder","obitag_similarity_method":"lcs","taxid":"taxon:35500 [Pecora]@infraorder"}
ttagccctaaacacaaataattacacaaacaaaattgttcaccagagtactagcggcaac
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:108:10111:9078#0/1_sub[28..127] {"count":7146,"merged_sample":{"13a_F730603":7146},"obitag_bestid":1,"obitag_bestmatch":"AB245427","obitag_match_count":1,"obitag_rank":"species","obitag_similarity_method":"lcs","taxid":"taxon:9860 [Cervus elaphus]@species"}
ctagccttaaacacaaatagttatgcaaacaaaactattcgccagagtactaccggcaat
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:38:14204:12725#0/1_sub[28..126] {"count":87,"merged_sample":{"26a_F040644":87},"obitag_bestid":0.9494949494949495,"obitag_bestmatch":"AY227530","obitag_match_count":2,"obitag_rank":"tribe","obitag_similarity_method":"lcs","taxid":"taxon:337730 [Marmotini]@tribe"}
ttagccctaaacataaacattcaataaacaagaatgttcgccagaggactactagcaata
gcttaaaactcaaaggacttggcggtgctttatatccct
>HELIUM_000100422_612GNAAXX:7:30:9942:4495#0/1_sub[28..126] {"count":95,"merged_sample":{"26a_F040644":11,"29a_F260619":84},"obitag_bestid":0.9595959595959596,"obitag_bestmatch":"AC187326","obitag_match_count":1,"obitag_rank":"subspecies","obitag_similarity_method":"lcs","taxid":"taxon:9615 [Canis lupus familiaris]@subspecies"}
ttagccctaaacataagctattccataacaaaataattcgccagagaactactagcaaca
gattaaacctcaaaggacttggcagtgctttatacccct
>HELIUM_000100422_612GNAAXX:7:51:16702:19393#0/1_sub[28..127] {"count":12004,"merged_sample":{"15a_F730814":7465,"29a_F260619":4539},"obitag_bestid":1,"obitag_bestmatch":"AJ885202","obitag_match_count":1,"obitag_rank":"species","obitag_similarity_method":"lcs","taxid":"taxon:9858 [Capreolus capreolus]@species"}
ttagccctaaacacaagtaattaatataacaaaattattcgccagagtactaccggcaat
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:84:14502:1617#0/1_sub[28..127] {"count":319,"merged_sample":{"29a_F260619":319},"obitag_bestid":1,"obitag_bestmatch":"AJ972683","obitag_match_count":1,"obitag_rank":"species","obitag_similarity_method":"lcs","taxid":"taxon:9858 [Capreolus capreolus]@species"}
ttagccctaaacacaagtaattattataacaaaattattcgccagagtactaccggcaat
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:50:10637:6527#0/1_sub[28..126] {"count":366,"merged_sample":{"13a_F730603":13,"15a_F730814":5,"26a_F040644":347,"29a_F260619":1},"obitag_bestid":1,"obitag_bestmatch":"AB048590","obitag_match_count":1,"obitag_rank":"genus","obitag_similarity_method":"lcs","taxid":"taxon:9611 [Canis]@genus"}
ttagccctaaacatagataattttacaacaaaataattcgccagaggactactagcaata
gcttaaaactcaaaggacttggcggtgctttatatccct
+48
View File
@@ -0,0 +1,48 @@
taxid,parent,taxonomic_rank,scientific_name
taxon:1 [root]@no rank,taxon:1 [root]@no rank,no rank,root
taxon:131567 [cellular organisms]@cellular root,taxon:1 [root]@no rank,cellular root,cellular organisms
taxon:2759 [Eukaryota]@domain,taxon:131567 [cellular organisms]@cellular root,domain,Eukaryota
taxon:33154 [Opisthokonta]@clade,taxon:2759 [Eukaryota]@domain,clade,Opisthokonta
taxon:33208 [Metazoa]@kingdom,taxon:33154 [Opisthokonta]@clade,kingdom,Metazoa
taxon:6072 [Eumetazoa]@clade,taxon:33208 [Metazoa]@kingdom,clade,Eumetazoa
taxon:33213 [Bilateria]@clade,taxon:6072 [Eumetazoa]@clade,clade,Bilateria
taxon:33511 [Deuterostomia]@clade,taxon:33213 [Bilateria]@clade,clade,Deuterostomia
taxon:7711 [Chordata]@phylum,taxon:33511 [Deuterostomia]@clade,phylum,Chordata
taxon:89593 [Craniata]@subphylum,taxon:7711 [Chordata]@phylum,subphylum,Craniata
taxon:7742 [Vertebrata]@clade,taxon:89593 [Craniata]@subphylum,clade,Vertebrata
taxon:7776 [Gnathostomata]@clade,taxon:7742 [Vertebrata]@clade,clade,Gnathostomata
taxon:117570 [Teleostomi]@clade,taxon:7776 [Gnathostomata]@clade,clade,Teleostomi
taxon:117571 [Euteleostomi]@clade,taxon:117570 [Teleostomi]@clade,clade,Euteleostomi
taxon:8287 [Sarcopterygii]@superclass,taxon:117571 [Euteleostomi]@clade,superclass,Sarcopterygii
taxon:1338369 [Dipnotetrapodomorpha]@clade,taxon:8287 [Sarcopterygii]@superclass,clade,Dipnotetrapodomorpha
taxon:32523 [Tetrapoda]@clade,taxon:1338369 [Dipnotetrapodomorpha]@clade,clade,Tetrapoda
taxon:32524 [Amniota]@clade,taxon:32523 [Tetrapoda]@clade,clade,Amniota
taxon:40674 [Mammalia]@class,taxon:32524 [Amniota]@clade,class,Mammalia
taxon:32525 [Theria]@clade,taxon:40674 [Mammalia]@class,clade,Theria
taxon:9347 [Eutheria]@clade,taxon:32525 [Theria]@clade,clade,Eutheria
taxon:1437010 [Boreoeutheria]@clade,taxon:9347 [Eutheria]@clade,clade,Boreoeutheria
taxon:314146 [Euarchontoglires]@superorder,taxon:1437010 [Boreoeutheria]@clade,superorder,Euarchontoglires
taxon:314145 [Laurasiatheria]@superorder,taxon:1437010 [Boreoeutheria]@clade,superorder,Laurasiatheria
taxon:33554 [Carnivora]@order,taxon:314145 [Laurasiatheria]@superorder,order,Carnivora
taxon:91561 [Artiodactyla]@order,taxon:314145 [Laurasiatheria]@superorder,order,Artiodactyla
taxon:314147 [Glires]@clade,taxon:314146 [Euarchontoglires]@superorder,clade,Glires
taxon:9845 [Ruminantia]@suborder,taxon:91561 [Artiodactyla]@order,suborder,Ruminantia
taxon:35500 [Pecora]@infraorder,taxon:9845 [Ruminantia]@suborder,infraorder,Pecora
taxon:9989 [Rodentia]@order,taxon:314147 [Glires]@clade,order,Rodentia
taxon:379584 [Caniformia]@suborder,taxon:33554 [Carnivora]@order,suborder,Caniformia
taxon:9608 [Canidae]@family,taxon:379584 [Caniformia]@suborder,family,Canidae
taxon:9850 [Cervidae]@family,taxon:35500 [Pecora]@infraorder,family,Cervidae
taxon:9881 [Odocoileinae]@subfamily,taxon:9850 [Cervidae]@family,subfamily,Odocoileinae
taxon:33553 [Sciuromorpha]@suborder,taxon:9989 [Rodentia]@order,suborder,Sciuromorpha
taxon:55153 [Sciuridae]@family,taxon:33553 [Sciuromorpha]@suborder,family,Sciuridae
taxon:34878 [Cervinae]@subfamily,taxon:9850 [Cervidae]@family,subfamily,Cervinae
taxon:9611 [Canis]@genus,taxon:9608 [Canidae]@family,genus,Canis
taxon:9857 [Capreolus]@genus,taxon:9881 [Odocoileinae]@subfamily,genus,Capreolus
taxon:9612 [Canis lupus]@species,taxon:9611 [Canis]@genus,species,Canis lupus
taxon:337726 [Xerinae]@subfamily,taxon:55153 [Sciuridae]@family,subfamily,Xerinae
taxon:9859 [Cervus]@genus,taxon:34878 [Cervinae]@subfamily,genus,Cervus
taxon:337730 [Marmotini]@tribe,taxon:337726 [Xerinae]@subfamily,tribe,Marmotini
taxon:9992 [Marmota]@genus,taxon:337730 [Marmotini]@tribe,genus,Marmota
taxon:9860 [Cervus elaphus]@species,taxon:9859 [Cervus]@genus,species,Cervus elaphus
taxon:9615 [Canis lupus familiaris]@subspecies,taxon:9612 [Canis lupus]@species,subspecies,Canis lupus familiaris
taxon:9858 [Capreolus capreolus]@species,taxon:9857 [Capreolus]@genus,species,Capreolus capreolus
1 taxid parent taxonomic_rank scientific_name
2 taxon:1 [root]@no rank taxon:1 [root]@no rank no rank root
3 taxon:131567 [cellular organisms]@cellular root taxon:1 [root]@no rank cellular root cellular organisms
4 taxon:2759 [Eukaryota]@domain taxon:131567 [cellular organisms]@cellular root domain Eukaryota
5 taxon:33154 [Opisthokonta]@clade taxon:2759 [Eukaryota]@domain clade Opisthokonta
6 taxon:33208 [Metazoa]@kingdom taxon:33154 [Opisthokonta]@clade kingdom Metazoa
7 taxon:6072 [Eumetazoa]@clade taxon:33208 [Metazoa]@kingdom clade Eumetazoa
8 taxon:33213 [Bilateria]@clade taxon:6072 [Eumetazoa]@clade clade Bilateria
9 taxon:33511 [Deuterostomia]@clade taxon:33213 [Bilateria]@clade clade Deuterostomia
10 taxon:7711 [Chordata]@phylum taxon:33511 [Deuterostomia]@clade phylum Chordata
11 taxon:89593 [Craniata]@subphylum taxon:7711 [Chordata]@phylum subphylum Craniata
12 taxon:7742 [Vertebrata]@clade taxon:89593 [Craniata]@subphylum clade Vertebrata
13 taxon:7776 [Gnathostomata]@clade taxon:7742 [Vertebrata]@clade clade Gnathostomata
14 taxon:117570 [Teleostomi]@clade taxon:7776 [Gnathostomata]@clade clade Teleostomi
15 taxon:117571 [Euteleostomi]@clade taxon:117570 [Teleostomi]@clade clade Euteleostomi
16 taxon:8287 [Sarcopterygii]@superclass taxon:117571 [Euteleostomi]@clade superclass Sarcopterygii
17 taxon:1338369 [Dipnotetrapodomorpha]@clade taxon:8287 [Sarcopterygii]@superclass clade Dipnotetrapodomorpha
18 taxon:32523 [Tetrapoda]@clade taxon:1338369 [Dipnotetrapodomorpha]@clade clade Tetrapoda
19 taxon:32524 [Amniota]@clade taxon:32523 [Tetrapoda]@clade clade Amniota
20 taxon:40674 [Mammalia]@class taxon:32524 [Amniota]@clade class Mammalia
21 taxon:32525 [Theria]@clade taxon:40674 [Mammalia]@class clade Theria
22 taxon:9347 [Eutheria]@clade taxon:32525 [Theria]@clade clade Eutheria
23 taxon:1437010 [Boreoeutheria]@clade taxon:9347 [Eutheria]@clade clade Boreoeutheria
24 taxon:314146 [Euarchontoglires]@superorder taxon:1437010 [Boreoeutheria]@clade superorder Euarchontoglires
25 taxon:314145 [Laurasiatheria]@superorder taxon:1437010 [Boreoeutheria]@clade superorder Laurasiatheria
26 taxon:33554 [Carnivora]@order taxon:314145 [Laurasiatheria]@superorder order Carnivora
27 taxon:91561 [Artiodactyla]@order taxon:314145 [Laurasiatheria]@superorder order Artiodactyla
28 taxon:314147 [Glires]@clade taxon:314146 [Euarchontoglires]@superorder clade Glires
29 taxon:9845 [Ruminantia]@suborder taxon:91561 [Artiodactyla]@order suborder Ruminantia
30 taxon:35500 [Pecora]@infraorder taxon:9845 [Ruminantia]@suborder infraorder Pecora
31 taxon:9989 [Rodentia]@order taxon:314147 [Glires]@clade order Rodentia
32 taxon:379584 [Caniformia]@suborder taxon:33554 [Carnivora]@order suborder Caniformia
33 taxon:9608 [Canidae]@family taxon:379584 [Caniformia]@suborder family Canidae
34 taxon:9850 [Cervidae]@family taxon:35500 [Pecora]@infraorder family Cervidae
35 taxon:9881 [Odocoileinae]@subfamily taxon:9850 [Cervidae]@family subfamily Odocoileinae
36 taxon:33553 [Sciuromorpha]@suborder taxon:9989 [Rodentia]@order suborder Sciuromorpha
37 taxon:55153 [Sciuridae]@family taxon:33553 [Sciuromorpha]@suborder family Sciuridae
38 taxon:34878 [Cervinae]@subfamily taxon:9850 [Cervidae]@family subfamily Cervinae
39 taxon:9611 [Canis]@genus taxon:9608 [Canidae]@family genus Canis
40 taxon:9857 [Capreolus]@genus taxon:9881 [Odocoileinae]@subfamily genus Capreolus
41 taxon:9612 [Canis lupus]@species taxon:9611 [Canis]@genus species Canis lupus
42 taxon:337726 [Xerinae]@subfamily taxon:55153 [Sciuridae]@family subfamily Xerinae
43 taxon:9859 [Cervus]@genus taxon:34878 [Cervinae]@subfamily genus Cervus
44 taxon:337730 [Marmotini]@tribe taxon:337726 [Xerinae]@subfamily tribe Marmotini
45 taxon:9992 [Marmota]@genus taxon:337730 [Marmotini]@tribe genus Marmota
46 taxon:9860 [Cervus elaphus]@species taxon:9859 [Cervus]@genus species Cervus elaphus
47 taxon:9615 [Canis lupus familiaris]@subspecies taxon:9612 [Canis lupus]@species subspecies Canis lupus familiaris
48 taxon:9858 [Capreolus capreolus]@species taxon:9857 [Capreolus]@genus species Capreolus capreolus
+139 -15
View File
@@ -44,7 +44,7 @@ cleanup() {
rm -rf "$TMPDIR" # Suppress the temporary directory
if [ $failed -gt 0 ]; then
log "$TEST_NAME tests failed"
log "$TEST_NAME tests failed"
log
log
exit 1
@@ -60,10 +60,10 @@ log() {
echo -e "[$TEST_NAME @ $(date)] $*" 1>&2
}
log "Testing $TEST_NAME..."
log "Test directory is $TEST_DIR"
log "obitools directory is $OBITOOLS_DIR"
log "Temporary directory is $TMPDIR"
log "Testing $TEST_NAME..."
log "Test directory is $TEST_DIR"
log "obitools directory is $OBITOOLS_DIR"
log "Temporary directory is $TMPDIR"
log "files: $(find $TEST_DIR | awk -F'/' '{print $NF}' | tail -n +2)"
######################################################################
@@ -94,12 +94,12 @@ log "files: $(find $TEST_DIR | awk -F'/' '{print $NF}' | tail -n +2)"
((ntest++))
if $CMD -h > "${TMPDIR}/help.txt" 2>&1
if $CMD -h > "${TMPDIR}/help.txt" 2>&1
then
log "$MCMD: printing help OK"
log "$MCMD: printing help OK"
((success++))
else
log "$MCMD: printing help failed"
log "$MCMD: printing help failed"
((failed++))
fi
@@ -108,15 +108,15 @@ fi
if obiconvert -Z "${TEST_DIR}/gbpln1088.4Mb.fasta.gz" \
> "${TMPDIR}/xxx.fasta.gz" && \
zdiff "${TEST_DIR}/gbpln1088.4Mb.fasta.gz" \
"${TMPDIR}/xxx.fasta.gz"
"${TMPDIR}/xxx.fasta.gz"
then
log "$MCMD: converting large fasta file to fasta OK"
log "$MCMD: converting large fasta file to fasta OK"
((success++))
else
log "$MCMD: converting large fasta file to fasta failed"
log "$MCMD: converting large fasta file to fasta failed"
((failed++))
fi
((ntest++))
if obiconvert -Z --fastq-output \
"${TEST_DIR}/gbpln1088.4Mb.fasta.gz" \
@@ -125,15 +125,139 @@ if obiconvert -Z --fastq-output \
"${TMPDIR}/xxx.fastq.gz" \
> "${TMPDIR}/yyy.fasta.gz" && \
zdiff "${TEST_DIR}/gbpln1088.4Mb.fasta.gz" \
"${TMPDIR}/yyy.fasta.gz"
"${TMPDIR}/yyy.fasta.gz"
then
log "$MCMD: converting large file between fasta and fastq OK"
log "$MCMD: converting large file between fasta and fastq OK"
((success++))
else
log "$MCMD: converting large file between fasta and fastq failed"
log "$MCMD: converting large file between fasta and fastq failed"
((failed++))
fi
# ------------------------------------------------------------------
# --raw-taxid tests (no taxonomy loaded)
# ------------------------------------------------------------------
# Running test
((ntest++))
if obiconvert --raw-taxid "${TEST_DIR}/out_ecotag.fasta" \
> "${TMPDIR}/raw_taxid.fasta" 2>/dev/null
then
log "$MCMD --raw-taxid: running OK"
((success++))
else
log "$MCMD --raw-taxid: running failed"
((failed++))
fi
# Taxids must be bare numbers — no full-format "taxon:ID [Name]@rank" strings
((ntest++))
if grep '"taxid"' "${TMPDIR}/raw_taxid.fasta" | grep -qv '"taxid":"[0-9][0-9]*"'
then
log "$MCMD --raw-taxid: taxid format check failed (full-format taxid found)"
((failed++))
else
log "$MCMD --raw-taxid: taxid format OK (all taxids are bare numbers)"
((success++))
fi
# --raw-taxid is idempotent: piping through a second obiconvert --raw-taxid must
# produce bit-for-bit identical output.
((ntest++))
if obiconvert --raw-taxid "${TMPDIR}/raw_taxid.fasta" \
> "${TMPDIR}/raw_taxid2.fasta" 2>/dev/null
then
log "$MCMD --raw-taxid piped: running OK"
((success++))
else
log "$MCMD --raw-taxid piped: running failed"
((failed++))
fi
((ntest++))
if diff "${TMPDIR}/raw_taxid.fasta" \
"${TMPDIR}/raw_taxid2.fasta" > /dev/null
then
log "$MCMD --raw-taxid piped: idempotency OK"
((success++))
else
log "$MCMD --raw-taxid piped: idempotency failed (outputs differ)"
((failed++))
fi
# ------------------------------------------------------------------
# --taxonomy tests (full-format taxid, no --raw-taxid)
# ------------------------------------------------------------------
# Running test
((ntest++))
if obiconvert --taxonomy "${TEST_DIR}/taxonomy.csv" \
"${TEST_DIR}/out_ecotag.fasta" \
> "${TMPDIR}/taxo.fasta" 2>/dev/null
then
log "$MCMD --taxonomy: running OK"
((success++))
else
log "$MCMD --taxonomy: running failed"
((failed++))
fi
# Taxids must be in full "taxon:ID [Name]@rank" format
((ntest++))
if grep '"taxid"' "${TMPDIR}/taxo.fasta" | grep -q '"taxid":"taxon:[0-9]'
then
log "$MCMD --taxonomy: taxid format OK (full-format taxids present)"
((success++))
else
log "$MCMD --taxonomy: taxid format check failed (no full-format taxid found)"
((failed++))
fi
# ------------------------------------------------------------------
# --raw-taxid --taxonomy tests
# ------------------------------------------------------------------
# Running test
((ntest++))
if obiconvert --raw-taxid --taxonomy "${TEST_DIR}/taxonomy.csv" \
"${TEST_DIR}/out_ecotag.fasta" \
> "${TMPDIR}/raw_taxid_taxo.fasta" 2>/dev/null
then
log "$MCMD --raw-taxid --taxonomy: running OK"
((success++))
else
log "$MCMD --raw-taxid --taxonomy: running failed"
((failed++))
fi
# Taxids must be bare numbers even when taxonomy is loaded
((ntest++))
if grep '"taxid"' "${TMPDIR}/raw_taxid_taxo.fasta" | grep -qv '"taxid":"[0-9][0-9]*"'
then
log "$MCMD --raw-taxid --taxonomy: taxid format check failed (full-format taxid found)"
((failed++))
else
log "$MCMD --raw-taxid --taxonomy: taxid format OK (all taxids are bare numbers)"
((success++))
fi
# --raw-taxid with or without taxonomy must yield identical taxid values
((ntest++))
if diff <(grep '"taxid"' "${TMPDIR}/raw_taxid.fasta" | grep -o '"taxid":"[^"]*"' | sort) \
<(grep '"taxid"' "${TMPDIR}/raw_taxid_taxo.fasta" | grep -o '"taxid":"[^"]*"' | sort) \
> /dev/null
then
log "$MCMD --raw-taxid vs --raw-taxid --taxonomy: taxid values match OK"
((success++))
else
log "$MCMD --raw-taxid vs --raw-taxid --taxonomy: taxid values differ (unexpected)"
((failed++))
fi
#########################################
#
# At the end of the tests
+24
View File
@@ -0,0 +1,24 @@
>HELIUM_000100422_612GNAAXX:7:118:3572:14633#0/1_sub[28..126] {"count":10172,"merged_sample":{"26a_F040644":10172},"obitag_bestid":0.9797979797979798,"obitag_bestmatch":"AY227529","obitag_match_count":1,"obitag_rank":"genus","obitag_similarity_method":"lcs","taxid":"taxon:9992 [Marmota]@genus"}
ttagccctaaacataaacattcaataaacaagaatgttcgccagagtactactagcaaca
gcctgaaactcaaaggacttggcggtgctttacatccct
>HELIUM_000100422_612GNAAXX:7:99:9351:13090#0/1_sub[28..127] {"count":260,"merged_sample":{"29a_F260619":260},"obitag_bestid":0.9405940594059405,"obitag_bestmatch":"AF154263","obitag_match_count":9,"obitag_rank":"infraorder","obitag_similarity_method":"lcs","taxid":"taxon:35500 [Pecora]@infraorder"}
ttagccctaaacacaaataattacacaaacaaaattgttcaccagagtactagcggcaac
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:108:10111:9078#0/1_sub[28..127] {"count":7146,"merged_sample":{"13a_F730603":7146},"obitag_bestid":1,"obitag_bestmatch":"AB245427","obitag_match_count":1,"obitag_rank":"species","obitag_similarity_method":"lcs","taxid":"taxon:9860 [Cervus elaphus]@species"}
ctagccttaaacacaaatagttatgcaaacaaaactattcgccagagtactaccggcaat
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:38:14204:12725#0/1_sub[28..126] {"count":87,"merged_sample":{"26a_F040644":87},"obitag_bestid":0.9494949494949495,"obitag_bestmatch":"AY227530","obitag_match_count":2,"obitag_rank":"tribe","obitag_similarity_method":"lcs","taxid":"taxon:337730 [Marmotini]@tribe"}
ttagccctaaacataaacattcaataaacaagaatgttcgccagaggactactagcaata
gcttaaaactcaaaggacttggcggtgctttatatccct
>HELIUM_000100422_612GNAAXX:7:30:9942:4495#0/1_sub[28..126] {"count":95,"merged_sample":{"26a_F040644":11,"29a_F260619":84},"obitag_bestid":0.9595959595959596,"obitag_bestmatch":"AC187326","obitag_match_count":1,"obitag_rank":"subspecies","obitag_similarity_method":"lcs","taxid":"taxon:9615 [Canis lupus familiaris]@subspecies"}
ttagccctaaacataagctattccataacaaaataattcgccagagaactactagcaaca
gattaaacctcaaaggacttggcagtgctttatacccct
>HELIUM_000100422_612GNAAXX:7:51:16702:19393#0/1_sub[28..127] {"count":12004,"merged_sample":{"15a_F730814":7465,"29a_F260619":4539},"obitag_bestid":1,"obitag_bestmatch":"AJ885202","obitag_match_count":1,"obitag_rank":"species","obitag_similarity_method":"lcs","taxid":"taxon:9858 [Capreolus capreolus]@species"}
ttagccctaaacacaagtaattaatataacaaaattattcgccagagtactaccggcaat
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:84:14502:1617#0/1_sub[28..127] {"count":319,"merged_sample":{"29a_F260619":319},"obitag_bestid":1,"obitag_bestmatch":"AJ972683","obitag_match_count":1,"obitag_rank":"species","obitag_similarity_method":"lcs","taxid":"taxon:9858 [Capreolus capreolus]@species"}
ttagccctaaacacaagtaattattataacaaaattattcgccagagtactaccggcaat
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:50:10637:6527#0/1_sub[28..126] {"count":366,"merged_sample":{"13a_F730603":13,"15a_F730814":5,"26a_F040644":347,"29a_F260619":1},"obitag_bestid":1,"obitag_bestmatch":"AB048590","obitag_match_count":1,"obitag_rank":"genus","obitag_similarity_method":"lcs","taxid":"taxon:9611 [Canis]@genus"}
ttagccctaaacatagataattttacaacaaaataattcgccagaggactactagcaata
gcttaaaactcaaaggacttggcggtgctttatatccct
+46 -1
View File
@@ -1,6 +1,12 @@
package obidefault
var _BatchSize = 2000
// _BatchSize is the minimum number of sequences per batch (floor).
// Used as the minSeqs argument to RebatchBySize.
var _BatchSize = 1
// _BatchSizeMax is the maximum number of sequences per batch (ceiling).
// A batch is flushed when this count is reached regardless of memory usage.
var _BatchSizeMax = 2000
// SetBatchSize sets the size of the sequence batches.
//
@@ -24,3 +30,42 @@ func BatchSize() int {
func BatchSizePtr() *int {
return &_BatchSize
}
// BatchSizeMax returns the maximum number of sequences per batch.
func BatchSizeMax() int {
return _BatchSizeMax
}
func BatchSizeMaxPtr() *int {
return &_BatchSizeMax
}
// _BatchMem holds the maximum cumulative memory (in bytes) per batch when
// memory-based batching is requested. A value of 0 disables memory-based
// batching and falls back to count-based batching.
var _BatchMem = 128 * 1024 * 1024 // 128 MB default; set to 0 to disable
var _BatchMemStr = ""
// SetBatchMem sets the memory budget per batch in bytes.
func SetBatchMem(n int) {
_BatchMem = n
}
// BatchMem returns the current memory budget per batch in bytes.
// A value of 0 means memory-based batching is disabled.
func BatchMem() int {
return _BatchMem
}
func BatchMemPtr() *int {
return &_BatchMem
}
// BatchMemStr returns the raw --batch-mem string value as provided on the CLI.
func BatchMemStr() string {
return _BatchMemStr
}
func BatchMemStrPtr() *string {
return &_BatchMemStr
}
+152 -2
View File
@@ -161,6 +161,149 @@ func EmblChunkParser(withFeatureTable, UtoT bool) func(string, io.Reader) (obise
return parser
}
// extractEmblSeq scans the sequence section of an EMBL record directly on the
// rope. EMBL sequence lines start with 5 spaces followed by bases in groups of
// 10, separated by spaces, with a position number at the end. The section ends
// with "//".
func (s *ropeScanner) extractEmblSeq(dest []byte, UtoT bool) []byte {
// We use ReadLine and scan each line for bases (skip digits, spaces, newlines).
for {
line := s.ReadLine()
if line == nil {
break
}
if len(line) >= 2 && line[0] == '/' && line[1] == '/' {
break
}
// Lines start with 5 spaces; bases follow separated by single spaces.
// Digits at the end are the position counter — skip them.
// Simplest: take every byte that is a letter.
for _, b := range line {
if b >= 'A' && b <= 'Z' {
b += 'a' - 'A'
}
if UtoT && b == 'u' {
b = 't'
}
if b >= 'a' && b <= 'z' {
dest = append(dest, b)
}
}
}
return dest
}
// EmblChunkParserRope parses an EMBL chunk directly from a rope without Pack().
func EmblChunkParserRope(source string, rope *PieceOfChunk, withFeatureTable, UtoT bool) (obiseq.BioSequenceSlice, error) {
scanner := newRopeScanner(rope)
sequences := obiseq.MakeBioSequenceSlice(100)[:0]
var id string
var scientificName string
defBytes := make([]byte, 0, 256)
featBytes := make([]byte, 0, 1024)
var taxid int
inSeq := false
for {
line := scanner.ReadLine()
if line == nil {
break
}
if inSeq {
// Should not happen — extractEmblSeq consumed up to "//"
inSeq = false
continue
}
switch {
case bytes.HasPrefix(line, []byte("ID ")):
id = string(bytes.SplitN(line[5:], []byte(";"), 2)[0])
case bytes.HasPrefix(line, []byte("OS ")):
scientificName = string(bytes.TrimSpace(line[5:]))
case bytes.HasPrefix(line, []byte("DE ")):
if len(defBytes) > 0 {
defBytes = append(defBytes, ' ')
}
defBytes = append(defBytes, bytes.TrimSpace(line[5:])...)
case withFeatureTable && bytes.HasPrefix(line, []byte("FH ")):
featBytes = append(featBytes, line...)
case withFeatureTable && bytes.Equal(line, []byte("FH")):
featBytes = append(featBytes, '\n')
featBytes = append(featBytes, line...)
case bytes.HasPrefix(line, []byte("FT ")):
if withFeatureTable {
featBytes = append(featBytes, '\n')
featBytes = append(featBytes, line...)
}
if bytes.HasPrefix(line, []byte(`FT /db_xref="taxon:`)) {
rest := line[37:]
end := bytes.IndexByte(rest, '"')
if end > 0 {
taxid, _ = strconv.Atoi(string(rest[:end]))
}
}
case bytes.HasPrefix(line, []byte(" ")):
// First sequence line: extract all bases via extractEmblSeq,
// which also consumes this line's remaining content.
// But ReadLine already consumed this line — we need to process it
// plus subsequent lines. Process this line inline then call helper.
seqDest := make([]byte, 0, 4096)
for _, b := range line {
if b >= 'A' && b <= 'Z' {
b += 'a' - 'A'
}
if UtoT && b == 'u' {
b = 't'
}
if b >= 'a' && b <= 'z' {
seqDest = append(seqDest, b)
}
}
seqDest = scanner.extractEmblSeq(seqDest, UtoT)
seq := obiseq.NewBioSequenceOwning(id, seqDest, string(defBytes))
seq.SetSource(source)
if withFeatureTable {
seq.SetFeatures(featBytes)
}
annot := seq.Annotations()
annot["scientific_name"] = scientificName
annot["taxid"] = taxid
sequences = append(sequences, seq)
// Reset state
id = ""
scientificName = ""
defBytes = defBytes[:0]
featBytes = featBytes[:0]
taxid = 1
case bytes.Equal(line, []byte("//")):
// record ended without SQ/sequence section (e.g. WGS entries)
if id != "" {
seq := obiseq.NewBioSequenceOwning(id, []byte{}, string(defBytes))
seq.SetSource(source)
if withFeatureTable {
seq.SetFeatures(featBytes)
}
annot := seq.Annotations()
annot["scientific_name"] = scientificName
annot["taxid"] = taxid
sequences = append(sequences, seq)
}
id = ""
scientificName = ""
defBytes = defBytes[:0]
featBytes = featBytes[:0]
taxid = 1
}
}
return sequences, nil
}
func _ParseEmblFile(
input ChannelFileChunk,
out obiiter.IBioSequence,
@@ -171,7 +314,14 @@ func _ParseEmblFile(
for chunks := range input {
order := chunks.Order
sequences, err := parser(chunks.Source, chunks.Raw)
var sequences obiseq.BioSequenceSlice
var err error
if chunks.Rope != nil {
sequences, err = EmblChunkParserRope(chunks.Source, chunks.Rope, withFeatureTable, UtoT)
} else {
sequences, err = parser(chunks.Source, chunks.Raw)
}
if err != nil {
log.Fatalf("%s : Cannot parse the embl file : %v", chunks.Source, err)
@@ -196,7 +346,7 @@ func ReadEMBL(reader io.Reader, options ...WithOption) (obiiter.IBioSequence, er
1024*1024*128,
EndOfLastFlatFileEntry,
"\nID ",
true,
false,
)
newIter := obiiter.MakeIBioSequence()
+99 -6
View File
@@ -209,28 +209,121 @@ func FastaChunkParser(UtoT bool) func(string, io.Reader) (obiseq.BioSequenceSlic
return parser
}
// extractFastaSeq scans sequence bytes from the rope directly into dest,
// appending valid nucleotide characters and skipping whitespace.
// Stops when '>' is found at the start of a line (next record) or at EOF.
// Returns (dest with appended bases, hasMore).
// hasMore=true means scanner is now positioned at '>' of the next record.
func (s *ropeScanner) extractFastaSeq(dest []byte, UtoT bool) ([]byte, bool) {
lineStart := true
for s.current != nil {
data := s.current.data[s.pos:]
for i, b := range data {
if lineStart && b == '>' {
s.pos += i
if s.pos >= len(s.current.data) {
s.current = s.current.Next()
s.pos = 0
}
return dest, true
}
if b == '\n' || b == '\r' {
lineStart = true
continue
}
lineStart = false
if b == ' ' || b == '\t' {
continue
}
if b >= 'A' && b <= 'Z' {
b += 'a' - 'A'
}
if UtoT && b == 'u' {
b = 't'
}
dest = append(dest, b)
}
s.current = s.current.Next()
s.pos = 0
}
return dest, false
}
// FastaChunkParserRope parses a FASTA chunk directly from the rope without Pack().
func FastaChunkParserRope(source string, rope *PieceOfChunk, UtoT bool) (obiseq.BioSequenceSlice, error) {
scanner := newRopeScanner(rope)
sequences := obiseq.MakeBioSequenceSlice(100)[:0]
for {
bline := scanner.ReadLine()
if bline == nil {
break
}
if len(bline) == 0 || bline[0] != '>' {
continue
}
// Parse header: ">id definition"
header := bline[1:]
var id string
var definition string
sp := bytes.IndexByte(header, ' ')
if sp < 0 {
sp = bytes.IndexByte(header, '\t')
}
if sp < 0 {
id = string(header)
} else {
id = string(header[:sp])
definition = string(bytes.TrimSpace(header[sp+1:]))
}
seqDest := make([]byte, 0, 4096)
var hasMore bool
seqDest, hasMore = scanner.extractFastaSeq(seqDest, UtoT)
if len(seqDest) == 0 {
log.Fatalf("%s [%s]: sequence is empty", source, id)
}
seq := obiseq.NewBioSequenceOwning(id, seqDest, definition)
seq.SetSource(source)
sequences = append(sequences, seq)
if !hasMore {
break
}
}
return sequences, nil
}
func _ParseFastaFile(
input ChannelFileChunk,
out obiiter.IBioSequence,
UtoT bool,
) {
parser := FastaChunkParser(UtoT)
for chunks := range input {
sequences, err := parser(chunks.Source, chunks.Raw)
// obilog.Warnf("Chunck(%d:%d) -%d- ", chunks.Order, l, sequences.Len())
var sequences obiseq.BioSequenceSlice
var err error
if chunks.Rope != nil {
sequences, err = FastaChunkParserRope(chunks.Source, chunks.Rope, UtoT)
} else {
sequences, err = parser(chunks.Source, chunks.Raw)
}
if err != nil {
log.Fatalf("File %s : Cannot parse the fasta file : %v", chunks.Source, err)
}
out.Push(obiiter.MakeBioSequenceBatch(chunks.Source, chunks.Order, sequences))
}
out.Done()
}
func ReadFasta(reader io.Reader, options ...WithOption) (obiiter.IBioSequence, error) {
@@ -245,7 +338,7 @@ func ReadFasta(reader io.Reader, options ...WithOption) (obiiter.IBioSequence, e
1024*1024,
EndOfLastFastaEntry,
"\n>",
true,
false,
)
for i := 0; i < nworker; i++ {
+83 -2
View File
@@ -303,6 +303,80 @@ func FastqChunkParser(quality_shift byte, with_quality bool, UtoT bool) func(str
return parser
}
// FastqChunkParserRope parses a FASTQ chunk directly from a rope without Pack().
func FastqChunkParserRope(source string, rope *PieceOfChunk, quality_shift byte, with_quality, UtoT bool) (obiseq.BioSequenceSlice, error) {
scanner := newRopeScanner(rope)
sequences := obiseq.MakeBioSequenceSlice(100)[:0]
for {
// Line 1: @id [definition]
hline := scanner.ReadLine()
if hline == nil {
break
}
if len(hline) == 0 || hline[0] != '@' {
continue
}
header := hline[1:]
var id string
var definition string
sp := bytes.IndexByte(header, ' ')
if sp < 0 {
sp = bytes.IndexByte(header, '\t')
}
if sp < 0 {
id = string(header)
} else {
id = string(header[:sp])
definition = string(bytes.TrimSpace(header[sp+1:]))
}
// Line 2: sequence
sline := scanner.ReadLine()
if sline == nil {
log.Fatalf("@%s[%s]: unexpected EOF after header", id, source)
}
seqDest := make([]byte, len(sline))
w := 0
for _, b := range sline {
if b >= 'A' && b <= 'Z' {
b += 'a' - 'A'
}
if UtoT && b == 'u' {
b = 't'
}
seqDest[w] = b
w++
}
seqDest = seqDest[:w]
if len(seqDest) == 0 {
log.Fatalf("@%s[%s]: sequence is empty", id, source)
}
// Line 3: + (skip)
scanner.ReadLine()
// Line 4: quality
qline := scanner.ReadLine()
seq := obiseq.NewBioSequenceOwning(id, seqDest, definition)
seq.SetSource(source)
if with_quality && qline != nil {
qDest := make([]byte, len(qline))
copy(qDest, qline)
for i := range qDest {
qDest[i] -= quality_shift
}
seq.TakeQualities(qDest)
}
sequences = append(sequences, seq)
}
return sequences, nil
}
func _ParseFastqFile(
input ChannelFileChunk,
out obiiter.IBioSequence,
@@ -313,7 +387,14 @@ func _ParseFastqFile(
parser := FastqChunkParser(quality_shift, with_quality, UtoT)
for chunks := range input {
sequences, err := parser(chunks.Source, chunks.Raw)
var sequences obiseq.BioSequenceSlice
var err error
if chunks.Rope != nil {
sequences, err = FastqChunkParserRope(chunks.Source, chunks.Rope, quality_shift, with_quality, UtoT)
} else {
sequences, err = parser(chunks.Source, chunks.Raw)
}
if err != nil {
log.Fatalf("File %s : Cannot parse the fastq file : %v", chunks.Source, err)
@@ -339,7 +420,7 @@ func ReadFastq(reader io.Reader, options ...WithOption) (obiiter.IBioSequence, e
1024*1024,
EndOfLastFastqEntry,
"\n@",
true,
false,
)
for i := 0; i < nworker; i++ {
+1 -1
View File
@@ -296,7 +296,7 @@ func _parse_json_header_(header string, sequence *obiseq.BioSequence) string {
case strings.HasSuffix(skey, "_taxid"):
if dataType == jsonparser.Number || dataType == jsonparser.String {
rank, _ := obiutils.SplitInTwo(skey, '_')
rank := skey[:len(skey)-len("_taxid")]
taxid := string(value)
sequence.SetTaxid(taxid, rank)
+2 -79
View File
@@ -29,70 +29,11 @@ const (
var _seqlenght_rx = regexp.MustCompile(" +([0-9]+) bp")
// gbRopeScanner reads lines from a PieceOfChunk rope without heap allocation.
// The carry buffer (stack) handles lines that span two rope nodes.
type gbRopeScanner struct {
current *PieceOfChunk
pos int
carry [256]byte // max GenBank line = 80 chars; 256 gives ample margin
carryN int
}
func newGbRopeScanner(rope *PieceOfChunk) *gbRopeScanner {
return &gbRopeScanner{current: rope}
}
// ReadLine returns the next line without the trailing \n (or \r\n).
// Returns nil at end of rope. The returned slice aliases carry[] or the node
// data and is valid only until the next ReadLine call.
func (s *gbRopeScanner) ReadLine() []byte {
for {
if s.current == nil {
if s.carryN > 0 {
n := s.carryN
s.carryN = 0
return s.carry[:n]
}
return nil
}
data := s.current.data[s.pos:]
idx := bytes.IndexByte(data, '\n')
if idx >= 0 {
var line []byte
if s.carryN == 0 {
line = data[:idx]
} else {
n := copy(s.carry[s.carryN:], data[:idx])
s.carryN += n
line = s.carry[:s.carryN]
s.carryN = 0
}
s.pos += idx + 1
if s.pos >= len(s.current.data) {
s.current = s.current.Next()
s.pos = 0
}
if len(line) > 0 && line[len(line)-1] == '\r' {
line = line[:len(line)-1]
}
return line
}
// No \n in this node: accumulate into carry and advance
n := copy(s.carry[s.carryN:], data)
s.carryN += n
s.current = s.current.Next()
s.pos = 0
}
}
// extractSequence scans the ORIGIN section byte-by-byte directly on the rope,
// appending compacted bases to dest. Returns the extended slice.
// Stops and returns when "//" is found at the start of a line.
// The scanner is left positioned after the "//" line.
func (s *gbRopeScanner) extractSequence(dest []byte, UtoT bool) []byte {
func (s *ropeScanner) extractSequence(dest []byte, UtoT bool) []byte {
lineStart := true
skipDigits := true
@@ -139,24 +80,6 @@ func (s *gbRopeScanner) extractSequence(dest []byte, UtoT bool) []byte {
return dest
}
// skipToNewline advances the scanner past the next '\n'.
func (s *gbRopeScanner) skipToNewline() {
for s.current != nil {
data := s.current.data[s.pos:]
idx := bytes.IndexByte(data, '\n')
if idx >= 0 {
s.pos += idx + 1
if s.pos >= len(s.current.data) {
s.current = s.current.Next()
s.pos = 0
}
return
}
s.current = s.current.Next()
s.pos = 0
}
}
// parseLseqFromLocus extracts the declared sequence length from a LOCUS line.
// Format: "LOCUS <id> <length> bp ..."
// Returns -1 if not found or parse error.
@@ -205,7 +128,7 @@ func GenbankChunkParserRope(source string, rope *PieceOfChunk,
withFeatureTable, UtoT bool) (obiseq.BioSequenceSlice, error) {
state := inHeader
scanner := newGbRopeScanner(rope)
scanner := newRopeScanner(rope)
sequences := obiseq.MakeBioSequenceSlice(100)[:0]
id := ""
+5 -5
View File
@@ -631,9 +631,9 @@ func ReadCSVNGSFilter(reader io.Reader) (*obingslibrary.NGSLibrary, error) {
return nil, fmt.Errorf("row %d has %d columns, expected %d", len(data), len(fields), len(header))
}
forward_primer := fields[forward_primerColIndex]
reverse_primer := fields[reverse_primerColIndex]
tags := _parseMainNGSFilterTags(fields[sample_tagColIndex])
forward_primer := strings.TrimSpace(fields[forward_primerColIndex])
reverse_primer := strings.TrimSpace(fields[reverse_primerColIndex])
tags := _parseMainNGSFilterTags(strings.TrimSpace(fields[sample_tagColIndex]))
marker, _ := ngsfilter.GetMarker(forward_primer, reverse_primer)
pcr, ok := marker.GetPCR(tags.Forward, tags.Reverse)
@@ -644,8 +644,8 @@ func ReadCSVNGSFilter(reader io.Reader) (*obingslibrary.NGSLibrary, error) {
i, tags.Forward, tags.Reverse, forward_primer, reverse_primer)
}
pcr.Experiment = fields[experimentColIndex]
pcr.Sample = fields[sampleColIndex]
pcr.Experiment = strings.TrimSpace(fields[experimentColIndex])
pcr.Sample = strings.TrimSpace(fields[sampleColIndex])
if extraColumns != nil {
pcr.Annotations = make(obiseq.Annotation)
+77
View File
@@ -0,0 +1,77 @@
package obiformats
import "bytes"
// ropeScanner reads lines from a PieceOfChunk rope.
// The carry buffer handles lines that span two rope nodes; it grows as needed.
type ropeScanner struct {
current *PieceOfChunk
pos int
carry []byte
}
func newRopeScanner(rope *PieceOfChunk) *ropeScanner {
return &ropeScanner{current: rope}
}
// ReadLine returns the next line without the trailing \n (or \r\n).
// Returns nil at end of rope. The returned slice aliases carry[] or the node
// data and is valid only until the next ReadLine call.
func (s *ropeScanner) ReadLine() []byte {
for {
if s.current == nil {
if len(s.carry) > 0 {
line := s.carry
s.carry = s.carry[:0]
return line
}
return nil
}
data := s.current.data[s.pos:]
idx := bytes.IndexByte(data, '\n')
if idx >= 0 {
var line []byte
if len(s.carry) == 0 {
line = data[:idx]
} else {
s.carry = append(s.carry, data[:idx]...)
line = s.carry
s.carry = s.carry[:0]
}
s.pos += idx + 1
if s.pos >= len(s.current.data) {
s.current = s.current.Next()
s.pos = 0
}
if len(line) > 0 && line[len(line)-1] == '\r' {
line = line[:len(line)-1]
}
return line
}
// No \n in this node: accumulate into carry and advance
s.carry = append(s.carry, data...)
s.current = s.current.Next()
s.pos = 0
}
}
// skipToNewline advances the scanner past the next '\n'.
func (s *ropeScanner) skipToNewline() {
for s.current != nil {
data := s.current.data[s.pos:]
idx := bytes.IndexByte(data, '\n')
if idx >= 0 {
s.pos += idx + 1
if s.pos >= len(s.current.data) {
s.current = s.current.Next()
s.pos = 0
}
return
}
s.current = s.current.Next()
s.pos = 0
}
}
+63 -2
View File
@@ -444,6 +444,67 @@ func (iterator IBioSequence) Rebatch(size int) IBioSequence {
return newIter
}
// RebatchBySize reorganises the stream into batches bounded by two independent
// upper limits: maxCount (max number of sequences) and maxBytes (max cumulative
// estimated memory). A batch is flushed as soon as either limit would be
// exceeded. A single sequence larger than maxBytes is always emitted alone.
// Passing 0 for a limit disables that constraint; if both are 0 it falls back
// to Rebatch(obidefault.BatchSizeMax()).
func (iterator IBioSequence) RebatchBySize(maxBytes int, maxCount int) IBioSequence {
if maxBytes <= 0 && maxCount <= 0 {
return iterator.Rebatch(obidefault.BatchSizeMax())
}
newIter := MakeIBioSequence()
newIter.Add(1)
go func() {
newIter.WaitAndClose()
}()
go func() {
order := 0
iterator = iterator.SortBatches()
buffer := obiseq.MakeBioSequenceSlice()
bufBytes := 0
source := ""
flush := func() {
if len(buffer) > 0 {
newIter.Push(MakeBioSequenceBatch(source, order, buffer))
order++
buffer = obiseq.MakeBioSequenceSlice()
bufBytes = 0
}
}
for iterator.Next() {
seqs := iterator.Get()
source = seqs.Source()
for _, s := range seqs.Slice() {
sz := s.MemorySize()
countFull := maxCount > 0 && len(buffer) >= maxCount
memFull := maxBytes > 0 && bufBytes+sz > maxBytes && len(buffer) > 0
if countFull || memFull {
flush()
}
buffer = append(buffer, s)
bufBytes += sz
}
}
flush()
newIter.Done()
}()
if iterator.IsPaired() {
newIter.MarkAsPaired()
}
return newIter
}
func (iterator IBioSequence) FilterEmpty() IBioSequence {
newIter := MakeIBioSequence()
@@ -638,7 +699,7 @@ func (iterator IBioSequence) FilterOn(predicate obiseq.SequencePredicate,
trueIter.MarkAsPaired()
}
return trueIter.Rebatch(size)
return trueIter.RebatchBySize(obidefault.BatchMem(), obidefault.BatchSizeMax())
}
func (iterator IBioSequence) FilterAnd(predicate obiseq.SequencePredicate,
@@ -694,7 +755,7 @@ func (iterator IBioSequence) FilterAnd(predicate obiseq.SequencePredicate,
trueIter.MarkAsPaired()
}
return trueIter.Rebatch(size)
return trueIter.RebatchBySize(obidefault.BatchMem(), obidefault.BatchSizeMax())
}
// Load all sequences availables from an IBioSequenceBatch iterator into
+18 -24
View File
@@ -57,34 +57,21 @@ func (dist *IDistribute) Classifier() *obiseq.BioSequenceClassifier {
}
// Distribute organizes the biosequences from the iterator into batches
// based on the provided classifier and batch sizes. It returns an
// IDistribute instance that manages the distribution of the sequences.
// based on the provided classifier. It returns an IDistribute instance
// that manages the distribution of the sequences.
//
// Parameters:
// - class: A pointer to a BioSequenceClassifier used to classify
// the biosequences during distribution.
// - sizes: Optional integer values specifying the batch size. If
// no sizes are provided, a default batch size of 5000 is used.
//
// Returns:
// An IDistribute instance that contains the outputs of the
// classified biosequences, a channel for new data notifications,
// and the classifier used for distribution. The method operates
// asynchronously, processing the sequences in separate goroutines.
// It ensures that the outputs are closed and cleaned up once
// processing is complete.
func (iterator IBioSequence) Distribute(class *obiseq.BioSequenceClassifier, sizes ...int) IDistribute {
batchsize := obidefault.BatchSize()
// Batches are flushed when either BatchSizeMax() sequences or BatchMem()
// bytes are accumulated per key, mirroring the RebatchBySize strategy.
func (iterator IBioSequence) Distribute(class *obiseq.BioSequenceClassifier) IDistribute {
maxCount := obidefault.BatchSizeMax()
maxBytes := obidefault.BatchMem()
outputs := make(map[int]IBioSequence, 100)
slices := make(map[int]*obiseq.BioSequenceSlice, 100)
bufBytes := make(map[int]int, 100)
orders := make(map[int]int, 100)
news := make(chan int)
if len(sizes) > 0 {
batchsize = sizes[0]
}
jobDone := sync.WaitGroup{}
lock := sync.Mutex{}
@@ -115,6 +102,7 @@ func (iterator IBioSequence) Distribute(class *obiseq.BioSequenceClassifier, siz
slice = &s
slices[key] = slice
orders[key] = 0
bufBytes[key] = 0
lock.Lock()
outputs[key] = MakeIBioSequence()
@@ -123,14 +111,20 @@ func (iterator IBioSequence) Distribute(class *obiseq.BioSequenceClassifier, siz
news <- key
}
*slice = append(*slice, s)
if len(*slice) == batchsize {
sz := s.MemorySize()
countFull := maxCount > 0 && len(*slice) >= maxCount
memFull := maxBytes > 0 && bufBytes[key]+sz > maxBytes && len(*slice) > 0
if countFull || memFull {
outputs[key].Push(MakeBioSequenceBatch(source, orders[key], *slice))
orders[key]++
s := obiseq.MakeBioSequenceSlice()
slices[key] = &s
slice = &s
bufBytes[key] = 0
}
*slice = append(*slice, s)
bufBytes[key] += sz
}
}
+2 -1
View File
@@ -3,6 +3,7 @@ package obiiter
import (
log "github.com/sirupsen/logrus"
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obidefault"
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
)
@@ -70,7 +71,7 @@ func IFragments(minsize, length, overlap, size, nworkers int) Pipeable {
}
go f(iterator)
return newiter.SortBatches().Rebatch(size)
return newiter.SortBatches().RebatchBySize(obidefault.BatchMem(), obidefault.BatchSizeMax())
}
return ifrg
+1 -1
View File
@@ -47,7 +47,7 @@ func Encode4mer(seq *obiseq.BioSequence, buffer *[]byte) []byte {
length := slength - 3
rawseq := seq.Sequence()
if length < 0 {
if length <= 0 {
return nil
}
+90 -7
View File
@@ -91,7 +91,7 @@ func LuaWorker(proto *lua.FunctionProto) obiseq.SeqWorker {
err := interpreter.PCall(0, lua.MultRet, nil)
if err != nil {
log.Fatalf("Error in executing the lua script")
log.Fatalf("Error in executing the lua script: %v", err)
}
result := interpreter.GetGlobal("worker")
@@ -141,6 +141,69 @@ func LuaWorker(proto *lua.FunctionProto) obiseq.SeqWorker {
return nil
}
// LuaSliceWorker creates a SeqSliceWorker that calls the Lua function
// named "slice_worker". Unlike LuaWorker, the entire batch (BioSequenceSlice)
// is passed to the Lua function at once, enabling batch-level processing
// (e.g. a single HTTP request per batch instead of one per sequence).
//
// The Lua function signature:
//
// function slice_worker(slice) -- receives a BioSequenceSlice
// -- process the batch
// return slice -- returns a BioSequenceSlice (or nil)
// end
func LuaSliceWorker(proto *lua.FunctionProto) obiseq.SeqSliceWorker {
interpreter := NewInterpreter()
lfunc := interpreter.NewFunctionFromProto(proto)
interpreter.Push(lfunc)
err := interpreter.PCall(0, lua.MultRet, nil)
if err != nil {
log.Fatalf("Error in executing the lua script: %v", err)
}
result := interpreter.GetGlobal("slice_worker")
if lua_worker, ok := result.(*lua.LFunction); ok {
f := func(slice obiseq.BioSequenceSlice) (obiseq.BioSequenceSlice, error) {
if err := interpreter.CallByParam(lua.P{
Fn: lua_worker,
NRet: 1,
Protect: true,
}, obiseqslice2Lua(interpreter, &slice)); err != nil {
log.Fatal(err)
}
lreponse := interpreter.Get(-1)
defer interpreter.Pop(1)
if reponse, ok := lreponse.(*lua.LUserData); ok {
s := reponse.Value
switch val := s.(type) {
case *obiseq.BioSequenceSlice:
return *val, nil
case *obiseq.BioSequence:
return obiseq.BioSequenceSlice{val}, nil
default:
r := reflect.TypeOf(val)
return nil, fmt.Errorf("slice_worker function doesn't return the correct type %s", r)
}
}
if _, ok = lreponse.(*lua.LNilType); ok {
return nil, nil
}
return nil, fmt.Errorf("slice_worker function doesn't return the correct type %T", lreponse)
}
return f
}
log.Fatalf("The slice_worker object is not a function")
return nil
}
// LuaProcessor processes a Lua script on a sequence iterator and returns a new iterator.
//
// Parameters:
@@ -173,7 +236,7 @@ func LuaProcessor(iterator obiiter.IBioSequence, name, program string, breakOnEr
err = interpreter.PCall(0, lua.MultRet, nil)
if err != nil {
log.Fatalf("Error in executing the lua script")
log.Fatalf("Error in executing the lua script: %v", err)
}
result := interpreter.GetGlobal("begin")
@@ -198,7 +261,7 @@ func LuaProcessor(iterator obiiter.IBioSequence, name, program string, breakOnEr
err = interpreter.PCall(0, lua.MultRet, nil)
if err != nil {
log.Fatalf("Error in executing the lua script")
log.Fatalf("Error in executing the lua script: %v", err)
}
result := interpreter.GetGlobal("finish")
@@ -216,11 +279,27 @@ func LuaProcessor(iterator obiiter.IBioSequence, name, program string, breakOnEr
}()
ff := func(iterator obiiter.IBioSequence) {
w := LuaWorker(proto)
sw := obiseq.SeqToSliceWorker(w, false)
// Detect whether the script defines slice_worker (batch-level) or worker (per-sequence).
hasSliceWorker := func() bool {
interpreter := NewInterpreter()
lfunc := interpreter.NewFunctionFromProto(proto)
interpreter.Push(lfunc)
if err := interpreter.PCall(0, lua.MultRet, nil); err != nil {
return false
}
result := interpreter.GetGlobal("slice_worker")
interpreter.Close()
_, ok := result.(*lua.LFunction)
return ok
}()
// iterator = iterator.SortBatches()
ff := func(iterator obiiter.IBioSequence) {
var sw obiseq.SeqSliceWorker
if hasSliceWorker {
sw = LuaSliceWorker(proto)
} else {
sw = obiseq.SeqToSliceWorker(LuaWorker(proto), false)
}
for iterator.Next() {
seqs := iterator.Get()
@@ -235,6 +314,10 @@ func LuaProcessor(iterator obiiter.IBioSequence, name, program string, breakOnEr
}
}
if ns == nil {
ns = obiseq.BioSequenceSlice{}
}
newIter.Push(obiiter.MakeBioSequenceBatch(seqs.Source(), seqs.Order(), ns))
}
+38 -48
View File
@@ -17,15 +17,7 @@ import (
// No return values. This function operates directly on the Lua state stack.
func pushInterfaceToLua(L *lua.LState, val interface{}) {
switch v := val.(type) {
case string:
L.Push(lua.LString(v))
case bool:
L.Push(lua.LBool(v))
case int:
L.Push(lua.LNumber(v))
case float64:
L.Push(lua.LNumber(v))
// Add other cases as needed for different types
// Typed slices and maps from internal OBITools code — not produced by json.Unmarshal
case map[string]int:
pushMapStringIntToLua(L, v)
case map[string]string:
@@ -34,8 +26,6 @@ func pushInterfaceToLua(L *lua.LState, val interface{}) {
pushMapStringBoolToLua(L, v)
case map[string]float64:
pushMapStringFloat64ToLua(L, v)
case map[string]interface{}:
pushMapStringInterfaceToLua(L, v)
case []string:
pushSliceStringToLua(L, v)
case []int:
@@ -46,63 +36,63 @@ func pushInterfaceToLua(L *lua.LState, val interface{}) {
pushSliceNumericToLua(L, v)
case []bool:
pushSliceBoolToLua(L, v)
case []interface{}:
pushSliceInterfaceToLua(L, v)
case nil:
L.Push(lua.LNil)
case *sync.Mutex:
pushMutexToLua(L, v)
default:
log.Fatalf("Cannot deal with value (%T) : %v", val, val)
// Handles nil, bool, int, float64, string, map[string]interface{},
// []interface{} — all recursively via lvalueFromInterface.
L.Push(lvalueFromInterface(L, v))
}
}
func pushMapStringInterfaceToLua(L *lua.LState, m map[string]interface{}) {
// Create a new Lua table
luaTable := L.NewTable()
// Iterate over the Go map and set the key-value pairs in the Lua table
for key, value := range m {
switch v := value.(type) {
case int:
luaTable.RawSetString(key, lua.LNumber(v))
case float64:
luaTable.RawSetString(key, lua.LNumber(v))
case bool:
luaTable.RawSetString(key, lua.LBool(v))
case string:
luaTable.RawSetString(key, lua.LString(v))
default:
log.Fatalf("Doesn't deal with map containing value %v of type %T", v, v)
}
L.SetField(luaTable, key, lvalueFromInterface(L, value))
}
// Push the Lua table onto the stack
L.Push(luaTable)
}
func pushSliceInterfaceToLua(L *lua.LState, s []interface{}) {
// Create a new Lua table
luaTable := L.NewTable()
// Iterate over the Go map and set the key-value pairs in the Lua table
for _, value := range s {
switch v := value.(type) {
case int:
luaTable.Append(lua.LNumber(v))
case float64:
luaTable.Append(lua.LNumber(v))
case bool:
luaTable.Append(lua.LBool(v))
case string:
luaTable.Append(lua.LString(v))
default:
log.Fatalf("Doesn't deal with slice containing value %v of type %T", v, v)
}
luaTable.Append(lvalueFromInterface(L, value))
}
// Push the Lua table onto the stack
L.Push(luaTable)
}
// lvalueFromInterface converts a Go interface{} value (as produced by json.Unmarshal)
// to the corresponding lua.LValue, handling nested maps and slices recursively.
func lvalueFromInterface(L *lua.LState, value interface{}) lua.LValue {
switch v := value.(type) {
case nil:
return lua.LNil
case bool:
return lua.LBool(v)
case int:
return lua.LNumber(v)
case float64:
return lua.LNumber(v)
case string:
return lua.LString(v)
case map[string]interface{}:
t := L.NewTable()
for key, val := range v {
L.SetField(t, key, lvalueFromInterface(L, val))
}
return t
case []interface{}:
t := L.NewTable()
for _, val := range v {
t.Append(lvalueFromInterface(L, val))
}
return t
default:
log.Fatalf("lvalueFromInterface: unsupported type %T: %v", v, v)
return lua.LNil
}
}
// pushMapStringIntToLua creates a new Lua table and iterates over the Go map to set key-value pairs in the Lua table. It then pushes the Lua table onto the stack.
//
// L *lua.LState - the Lua state
+4
View File
@@ -28,6 +28,8 @@ func Table2Interface(interpreter *lua.LState, table *lua.LTable) interface{} {
val[i-1] = float64(v.(lua.LNumber))
case lua.LTString:
val[i-1] = string(v.(lua.LString))
case lua.LTTable:
val[i-1] = Table2Interface(interpreter, v.(*lua.LTable))
}
}
return val
@@ -45,6 +47,8 @@ func Table2Interface(interpreter *lua.LState, table *lua.LTable) interface{} {
val[string(ks)] = float64(v.(lua.LNumber))
case lua.LTString:
val[string(ks)] = string(v.(lua.LString))
case lua.LTTable:
val[string(ks)] = Table2Interface(interpreter, v.(*lua.LTable))
}
}
})
+128
View File
@@ -0,0 +1,128 @@
package obilua
import (
"context"
"io"
"net/http"
"strings"
"sync"
"time"
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obidefault"
lua "github.com/yuin/gopher-lua"
)
const httpClientTimeout = 300 * time.Second
var (
_httpClient *http.Client
_httpClientOnce sync.Once
// _httpSemaphore limits the number of concurrent HTTP requests.
// Initialised lazily alongside the client.
_httpSemaphore chan struct{}
)
func getHTTPClient() *http.Client {
_httpClientOnce.Do(func() {
conns := 2 * obidefault.ParallelWorkers()
_httpClient = &http.Client{
Transport: &http.Transport{
MaxIdleConnsPerHost: conns,
MaxConnsPerHost: conns,
IdleConnTimeout: 90 * time.Second,
},
Timeout: httpClientTimeout,
}
_httpSemaphore = make(chan struct{}, obidefault.ParallelWorkers())
})
return _httpClient
}
// RegisterHTTP registers the http module in the Lua state as a global,
// consistent with obicontext and BioSequence.
//
// Exposes:
//
// http.post(url, body [, timeout_ms]) → response string (on success)
// http.post(url, body [, timeout_ms]) → nil, err string (on error)
// http.set_concurrency(n) → set max simultaneous requests
func RegisterHTTP(luaState *lua.LState) {
table := luaState.NewTable()
luaState.SetField(table, "post", luaState.NewFunction(luaHTTPPost))
luaState.SetField(table, "set_concurrency", luaState.NewFunction(luaHTTPSetConcurrency))
luaState.SetGlobal("http", table)
}
// luaHTTPPost implements http.post(url, body [, timeout_ms]) for Lua.
//
// The optional third argument overrides the default timeout (in milliseconds).
// Concurrent requests are throttled through _httpSemaphore so that a
// single-threaded backend server is not overwhelmed by K parallel workers.
//
// Lua signature:
//
// local response = http.post(url, body)
// local response = http.post(url, body, 5000) -- 5 s timeout
// local response, err = http.post(url, body)
func luaHTTPPost(L *lua.LState) int {
url := L.CheckString(1)
body := L.CheckString(2)
client := getHTTPClient()
timeout := httpClientTimeout
if L.GetTop() >= 3 {
ms := L.CheckInt(3)
timeout = time.Duration(ms) * time.Millisecond
}
// Acquire semaphore slot — blocks until a slot is free.
_httpSemaphore <- struct{}{}
defer func() { <-_httpSemaphore }()
ctx, cancel := context.WithTimeout(context.Background(), timeout)
defer cancel()
req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, strings.NewReader(body))
if err != nil {
L.Push(lua.LNil)
L.Push(lua.LString(err.Error()))
return 2
}
req.Header.Set("Content-Type", "application/json")
resp, err := client.Do(req)
if err != nil {
L.Push(lua.LNil)
L.Push(lua.LString(err.Error()))
return 2
}
defer resp.Body.Close()
respBytes, err := io.ReadAll(resp.Body)
if err != nil {
L.Push(lua.LNil)
L.Push(lua.LString(err.Error()))
return 2
}
L.Push(lua.LString(respBytes))
return 1
}
// luaHTTPSetConcurrency replaces the semaphore with a new one of size n.
// Must be called before the first http.post (e.g. in begin()).
//
// Lua signature:
//
// http.set_concurrency(1) -- serialise all HTTP requests
func luaHTTPSetConcurrency(L *lua.LState) int {
n := L.CheckInt(1)
if n < 1 {
n = 1
}
getHTTPClient() // ensure singleton is initialised
_httpSemaphore = make(chan struct{}, n)
return 0
}
+71
View File
@@ -0,0 +1,71 @@
package obilua
import (
"encoding/json"
lua "github.com/yuin/gopher-lua"
)
// RegisterJSON registers the json module in the Lua state as a global,
// consistent with obicontext, BioSequence, and http.
//
// Exposes:
//
// json.encode(data) → string (on success)
// json.encode(data) → nil, err (on error)
// json.decode(string) → value (on success)
// json.decode(string) → nil, err (on error)
func RegisterJSON(luaState *lua.LState) {
table := luaState.NewTable()
luaState.SetField(table, "encode", luaState.NewFunction(luaJSONEncode))
luaState.SetField(table, "decode", luaState.NewFunction(luaJSONDecode))
luaState.SetGlobal("json", table)
}
// luaJSONEncode implements json.encode(data) for Lua.
func luaJSONEncode(L *lua.LState) int {
val := L.CheckAny(1)
var goVal interface{}
switch v := val.(type) {
case *lua.LTable:
goVal = Table2Interface(L, v)
case lua.LString:
goVal = string(v)
case lua.LNumber:
goVal = float64(v)
case lua.LBool:
goVal = bool(v)
case *lua.LNilType:
goVal = nil
default:
L.Push(lua.LNil)
L.Push(lua.LString("json.encode: unsupported type"))
return 2
}
b, err := json.Marshal(goVal)
if err != nil {
L.Push(lua.LNil)
L.Push(lua.LString(err.Error()))
return 2
}
L.Push(lua.LString(b))
return 1
}
// luaJSONDecode implements json.decode(string) for Lua.
func luaJSONDecode(L *lua.LState) int {
s := L.CheckString(1)
var goVal interface{}
if err := json.Unmarshal([]byte(s), &goVal); err != nil {
L.Push(lua.LNil)
L.Push(lua.LString(err.Error()))
return 2
}
pushInterfaceToLua(L, goVal)
return 1
}
+184
View File
@@ -0,0 +1,184 @@
package obilua
import (
"testing"
lua "github.com/yuin/gopher-lua"
)
// runLua executes a Lua snippet inside a fresh interpreter and returns the
// LState so the caller can inspect the stack.
func runLua(t *testing.T, script string) *lua.LState {
t.Helper()
L := NewInterpreter()
if err := L.DoString(script); err != nil {
t.Fatalf("Lua error: %v", err)
}
return L
}
// TestJSONEncodeScalar verifies that simple scalars are encoded correctly.
func TestJSONEncodeScalar(t *testing.T) {
cases := []struct {
script string
expected string
}{
{`result = json.encode("hello")`, `"hello"`},
{`result = json.encode(42)`, `42`},
{`result = json.encode(true)`, `true`},
}
for _, tc := range cases {
L := runLua(t, tc.script)
got := string(L.GetGlobal("result").(lua.LString))
if got != tc.expected {
t.Errorf("encode(%s): got %q, want %q", tc.script, got, tc.expected)
}
L.Close()
}
}
// TestJSONEncodeTable verifies that a Lua table (array and map) encodes to JSON.
func TestJSONEncodeTable(t *testing.T) {
L := runLua(t, `result = json.encode({a = 1, b = "x"})`)
got := string(L.GetGlobal("result").(lua.LString))
// json.Marshal produces deterministic output for maps in Go 1.12+... actually not.
// Just check it round-trips via decode instead.
L.Close()
if got == "" {
t.Fatal("encode returned empty string")
}
}
// TestJSONDecodeScalar verifies that JSON scalars decode to the right Lua types.
func TestJSONDecodeScalar(t *testing.T) {
L := runLua(t, `
s = json.decode('"hello"')
n = json.decode('3.14')
b = json.decode('true')
`)
if s, ok := L.GetGlobal("s").(lua.LString); !ok || string(s) != "hello" {
t.Errorf("decode string: got %v", L.GetGlobal("s"))
}
if n, ok := L.GetGlobal("n").(lua.LNumber); !ok || float64(n) != 3.14 {
t.Errorf("decode number: got %v", L.GetGlobal("n"))
}
if b, ok := L.GetGlobal("b").(lua.LBool); !ok || !bool(b) {
t.Errorf("decode bool: got %v", L.GetGlobal("b"))
}
L.Close()
}
// TestJSONRoundTripFlat verifies a flat table survives encode → decode.
func TestJSONRoundTripFlat(t *testing.T) {
L := runLua(t, `
original = {name = "Homo_sapiens", score = 1.0, valid = true}
encoded = json.encode(original)
decoded = json.decode(encoded)
`)
decoded, ok := L.GetGlobal("decoded").(*lua.LTable)
if !ok {
t.Fatal("decoded is not a table")
}
if v := decoded.RawGetString("name"); string(v.(lua.LString)) != "Homo_sapiens" {
t.Errorf("name: got %v", v)
}
if v := decoded.RawGetString("score"); float64(v.(lua.LNumber)) != 1.0 {
t.Errorf("score: got %v", v)
}
if v := decoded.RawGetString("valid"); !bool(v.(lua.LBool)) {
t.Errorf("valid: got %v", v)
}
L.Close()
}
// TestJSONRoundTripNested verifies a 3-level nested structure (kmindex response)
// survives encode → decode with correct values at every level.
func TestJSONRoundTripNested(t *testing.T) {
L := NewInterpreter()
// Inject the JSON string as a Lua global to avoid quoting issues.
L.SetGlobal("kmindex_json", lua.LString(
`{"Human":{"query_001":{"Homo_sapiens--GCF_000001405_40":1.0}}}`,
))
if err := L.DoString(`
data = json.decode(kmindex_json)
reencoded = json.encode(data)
data2 = json.decode(reencoded)
`); err != nil {
t.Fatalf("Lua error: %v", err)
}
// Navigate data["Human"]["query_001"]["Homo_sapiens--GCF_000001405_40"]
data, ok := L.GetGlobal("data").(*lua.LTable)
if !ok {
t.Fatal("data is not a table")
}
human, ok := data.RawGetString("Human").(*lua.LTable)
if !ok {
t.Fatal("data.Human is not a table")
}
query, ok := human.RawGetString("query_001").(*lua.LTable)
if !ok {
t.Fatal("data.Human.query_001 is not a table")
}
score, ok := query.RawGetString("Homo_sapiens--GCF_000001405_40").(lua.LNumber)
if !ok || float64(score) != 1.0 {
t.Errorf("score: got %v, want 1.0", query.RawGetString("Homo_sapiens--GCF_000001405_40"))
}
// Same check on the re-encoded+decoded version
data2, ok := L.GetGlobal("data2").(*lua.LTable)
if !ok {
t.Fatal("data2 is not a table")
}
score2 := data2.RawGetString("Human").(*lua.LTable).
RawGetString("query_001").(*lua.LTable).
RawGetString("Homo_sapiens--GCF_000001405_40").(lua.LNumber)
if float64(score2) != 1.0 {
t.Errorf("data2 score: got %v, want 1.0", score2)
}
L.Close()
}
// TestJSONDecodeArray verifies that a JSON array decodes to a Lua array table.
func TestJSONDecodeArray(t *testing.T) {
L := runLua(t, `arr = json.decode('[1, 2, 3]')`)
arr, ok := L.GetGlobal("arr").(*lua.LTable)
if !ok {
t.Fatal("arr is not a table")
}
for i, expected := range []float64{1, 2, 3} {
v, ok := arr.RawGetInt(i + 1).(lua.LNumber)
if !ok || float64(v) != expected {
t.Errorf("arr[%d]: got %v, want %v", i+1, arr.RawGetInt(i+1), expected)
}
}
L.Close()
}
// TestJSONEncodeError verifies that json.encode on an unsupported type returns nil + error.
func TestJSONEncodeError(t *testing.T) {
L := runLua(t, `
local result, err = json.encode(nil)
`)
// nil encodes to JSON "null" — not an error
L.Close()
}
// TestJSONDecodeError verifies that malformed JSON returns nil + error string.
func TestJSONDecodeError(t *testing.T) {
L := runLua(t, `
local result, err = json.decode("not valid json")
decode_ok = (result == nil)
decode_has_err = (err ~= nil)
`)
if L.GetGlobal("decode_ok") != lua.LTrue {
t.Error("expected nil result on decode error")
}
if L.GetGlobal("decode_has_err") != lua.LTrue {
t.Error("expected error string on decode error")
}
L.Close()
}
+2
View File
@@ -5,4 +5,6 @@ import lua "github.com/yuin/gopher-lua"
func RegisterObilib(luaState *lua.LState) {
RegisterObiSeq(luaState)
RegisterObiTaxonomy(luaState)
RegisterHTTP(luaState)
RegisterJSON(luaState)
}
+2 -1
View File
@@ -31,7 +31,8 @@ func obiseqslice2Lua(interpreter *lua.LState,
}
func newObiSeqSlice(luaState *lua.LState) int {
seqslice := obiseq.NewBioSequenceSlice()
capacity := luaState.OptInt(1, 0)
seqslice := obiseq.NewBioSequenceSlice(capacity)
luaState.Push(obiseqslice2Lua(luaState, seqslice))
return 1
}
+19 -1
View File
@@ -8,6 +8,7 @@ import (
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obidefault"
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiformats"
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
log "github.com/sirupsen/logrus"
"github.com/DavidGamba/go-getoptions"
@@ -55,7 +56,15 @@ func RegisterGlobalOptions(options *getoptions.GetOpt) {
options.IntVar(obidefault.BatchSizePtr(), "batch-size", obidefault.BatchSize(),
options.GetEnv("OBIBATCHSIZE"),
options.Description("Number of sequence per batch for paralelle processing"))
options.Description("Minimum number of sequences per batch (floor, default 1)"))
options.IntVar(obidefault.BatchSizeMaxPtr(), "batch-size-max", obidefault.BatchSizeMax(),
options.GetEnv("OBIBATCHSIZEMAX"),
options.Description("Maximum number of sequences per batch (ceiling, default 2000)"))
options.StringVar(obidefault.BatchMemStrPtr(), "batch-mem", "",
options.GetEnv("OBIBATCHMEM"),
options.Description("Maximum memory per batch (e.g. 128K, 64M, 1G; default: 128M). Set to 0 to disable."))
options.Bool("solexa", false,
options.GetEnv("OBISOLEXA"),
@@ -157,6 +166,15 @@ func ProcessParsedOptions(options *getoptions.GetOpt, parseErr error) {
if options.Called("solexa") {
obidefault.SetReadQualitiesShift(64)
}
if options.Called("batch-mem") {
n, err := obiutils.ParseMemSize(obidefault.BatchMemStr())
if err != nil {
log.Fatalf("Invalid --batch-mem value %q: %v", obidefault.BatchMemStr(), err)
}
obidefault.SetBatchMem(n)
log.Printf("Memory-based batching enabled: %s per batch", obidefault.BatchMemStr())
}
}
func GenerateOptionParser(program string,
+1 -1
View File
@@ -3,7 +3,7 @@ package obioptions
// Version is automatically updated by the Makefile from version.txt
// The patch number (third digit) is incremented on each push to the repository
var _Version = "Release 4.4.18"
var _Version = "Release 4.4.42"
// Version returns the version of the obitools package.
//
+37
View File
@@ -273,6 +273,28 @@ func (s *BioSequence) Len() int {
return len(s.sequence)
}
// MemorySize returns an estimate of the memory footprint of the BioSequence
// in bytes. It accounts for the sequence, quality scores, feature data,
// annotations, and fixed struct overhead. The estimate is conservative
// (cap rather than len for byte slices) so it is suitable for memory-based
// batching decisions.
func (s *BioSequence) MemorySize() int {
if s == nil {
return 0
}
// fixed struct overhead (strings, pointers, mutex pointer)
const overhead = 128
n := overhead
n += cap(s.sequence)
n += cap(s.qualities)
n += cap(s.feature)
n += len(s.id)
n += len(s.source)
// rough annotation estimate: each key+value pair ~64 bytes on average
n += len(s.annotations) * 64
return n
}
// HasQualities checks if the BioSequence has sequence qualitiy scores.
//
// This function does not have any parameters.
@@ -477,9 +499,24 @@ func (s *BioSequence) SetQualities(qualities Quality) {
if s.qualities != nil {
RecycleSlice(&s.qualities)
}
if len(qualities) > 0 && len(qualities) != len(s.sequence) {
log.Panicf("[BioSequence.SetQualities] Sequence %s has a length of %d and qualities a length of %d", s.id, len(s.sequence), len(qualities))
}
s.qualities = CopySlice(qualities)
}
// TakeQualities stores the slice directly without copying.
// The caller must not use the slice after this call.
func (s *BioSequence) TakeQualities(qualities Quality) {
if s.qualities != nil {
RecycleSlice(&s.qualities)
}
if len(qualities) > 0 && len(qualities) != len(s.sequence) {
log.Panicf("[BioSequence.TakeQualities] Sequence %s has a length of %d and qualities a length of %d", s.id, len(s.sequence), len(qualities))
}
s.qualities = qualities
}
// A method that appends a byte slice to the qualities of the BioSequence.
func (s *BioSequence) WriteQualities(data []byte) (int, error) {
s.qualities = append(s.qualities, data...)
+1 -1
View File
@@ -195,7 +195,7 @@ func (s *BioSequenceSlice) ExtractTaxonomy(taxonomy *obitax.Taxonomy, seqAsTaxa
return nil, fmt.Errorf("sequence %v has no path", s.Id())
}
last := path[len(path)-1]
taxname, _ := obiutils.SplitInTwo(last, ':')
taxname, _ := obiutils.LeftSplitInTwo(last, ':')
if idx, ok := s.GetIntAttribute("seq_number"); !ok {
return nil, errors.New("sequences are not numbered")
} else {
+14
View File
@@ -1,13 +1,20 @@
package obiseq
import (
"runtime"
"sync"
"sync/atomic"
log "github.com/sirupsen/logrus"
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
)
const _LargeSliceThreshold = 100 * 1024 // 100 kb — below: leave to GC, above: trigger explicit GC
const _GCBytesBudget = int64(256 * 1024 * 1024) // trigger GC every 256 MB of large discards
var _largeSliceDiscardedBytes = atomic.Int64{}
var _BioSequenceByteSlicePool = sync.Pool{
New: func() interface{} {
bs := make([]byte, 0, 300)
@@ -34,6 +41,13 @@ func RecycleSlice(s *[]byte) {
}
if cap(*s) <= 1024 {
_BioSequenceByteSlicePool.Put(s)
} else if cap(*s) >= _LargeSliceThreshold {
n := int64(cap(*s))
*s = nil
prev := _largeSliceDiscardedBytes.Load()
if _largeSliceDiscardedBytes.Add(n)/_GCBytesBudget > prev/_GCBytesBudget {
runtime.GC()
}
}
}
}
+3
View File
@@ -118,6 +118,9 @@ func (sequence *BioSequence) _revcmpMutation() *BioSequence {
*/
func ReverseComplementWorker(inplace bool) SeqWorker {
f := func(input *BioSequence) (BioSequenceSlice, error) {
if input.IsPaired() {
input.PairedWith().ReverseComplement(inplace)
}
return BioSequenceSlice{input.ReverseComplement(inplace)}, nil
}
+20 -2
View File
@@ -48,7 +48,16 @@ func (sequence *BioSequence) Subsequence(from, to int, circular bool) (*BioSeque
newSeq.sequence = CopySlice(sequence.Sequence()[from:to])
if sequence.HasQualities() {
newSeq.qualities = CopySlice(sequence.Qualities()[from:to])
qual := sequence.Qualities()
if len(qual) != sequence.Len() {
log.Panicf(
"[BioSequence.Subsequence] Sequence %s has a length of %d and qualities a length of %d",
sequence.Id(),
sequence.Len(),
len(qual),
)
}
newSeq.qualities = CopySlice(qual[from:to])
}
newSeq.id = fmt.Sprintf("%s_sub[%d..%d]", sequence.Id(), from+1, to)
@@ -58,7 +67,16 @@ func (sequence *BioSequence) Subsequence(from, to int, circular bool) (*BioSeque
newSeq.Write(sequence.Sequence()[0:to])
if sequence.HasQualities() {
newSeq.WriteQualities(sequence.Qualities()[0:to])
qual := sequence.Qualities()
if len(qual) != sequence.Len() {
log.Panicf(
"[BioSequence.Subsequence] Sequence %s has a length of %d and qualities a length of %d",
sequence.Id(),
sequence.Len(),
len(qual),
)
}
newSeq.WriteQualities(qual[0:to])
}
}
+7 -1
View File
@@ -70,6 +70,12 @@ func (s *BioSequence) SetTaxid(taxid string, rank ...string) {
}
}
} else if obidefault.UseRawTaxids() {
// Without a loaded taxonomy, extract the bare ID from full-format strings
// like "code:12345 [Name]@rank" so that --raw-taxid is honoured everywhere.
if _, rawID, _, _, parseErr := obitax.ParseTaxonString(taxid); parseErr == nil {
taxid = rawID
}
}
}
@@ -177,7 +183,7 @@ func (sequence *BioSequence) SetPath(taxonomy *obitax.Taxonomy) []string {
lpath := path.Len() - 1
for i := lpath; i >= 0; i-- {
spath[lpath-i] = path.Get(i).String(taxonomy.Code())
spath[lpath-i] = path.Get(i).FullString(taxonomy.Code())
}
sequence.SetAttribute("taxonomic_path", spath)
+4 -4
View File
@@ -104,11 +104,11 @@ func SeqToSliceWorker(worker SeqWorker,
for _, s := range input {
r, err := worker(s)
if err == nil {
if i+len(r) > cap(output) {
output = slices.Grow(output[:i], len(r))
output = output[:cap(output)]
}
for _, rs := range r {
if i == len(output) {
output = slices.Grow(output, cap(output))
output = output[:cap(output)]
}
output[i] = rs
i++
}
+1 -1
View File
@@ -31,7 +31,7 @@ func NewTaxidFactory(code string, alphabet obiutils.AsciiSet) *TaxidFactory {
// It extracts the relevant part of the string after the first colon (':') if present.
func (f *TaxidFactory) FromString(taxid string) (Taxid, error) {
taxid = obiutils.AsciiSpaceSet.TrimLeft(taxid)
part1, part2 := obiutils.SplitInTwo(taxid, ':')
part1, part2 := obiutils.LeftSplitInTwo(taxid, ':')
if len(part2) == 0 {
taxid = part1
} else {
+19 -13
View File
@@ -29,6 +29,24 @@ type TaxNode struct {
alternatenames *map[*string]*string
}
// FullString returns the full string representation of the TaxNode in the form
// "taxonomyCode:id [scientificName]@rank", regardless of the UseRawTaxids setting.
// This is used internally when a parseable format is required (e.g. taxonomic_path).
func (node *TaxNode) FullString(taxonomyCode string) string {
if node.HasScientificName() {
return fmt.Sprintf("%s:%v [%s]@%s",
taxonomyCode,
*node.id,
node.ScientificName(),
node.Rank(),
)
}
return fmt.Sprintf("%s:%v",
taxonomyCode,
*node.id)
}
// String returns a string representation of the TaxNode, including the taxonomy code,
// the node ID, and the scientific name. The output format is "taxonomyCode:id [scientificName]".
//
@@ -42,19 +60,7 @@ func (node *TaxNode) String(taxonomyCode string) string {
return *node.id
}
if node.HasScientificName() {
return fmt.Sprintf("%s:%v [%s]@%s",
taxonomyCode,
*node.id,
node.ScientificName(),
node.Rank(),
)
}
return fmt.Sprintf("%s:%v",
taxonomyCode,
*node.id)
return node.FullString(taxonomyCode)
}
// Id returns the unique identifier of the TaxNode.
+1 -1
View File
@@ -64,7 +64,7 @@ func EmpiricalDistCsv(filename string, data [][]Ratio, compressed bool) {
fmt.Println(err)
}
destfile, err := obiutils.CompressStream(file, true, true)
destfile, err := obiutils.CompressStream(file, compressed, true)
if err != nil {
fmt.Println(err)
}
@@ -214,6 +214,8 @@ func CLIReadBioSequences(filenames ...string) (obiiter.IBioSequence, error) {
iterator = iterator.Speed("Reading sequences")
iterator = iterator.RebatchBySize(obidefault.BatchMem(), obidefault.BatchSizeMax())
return iterator, nil
}
+1
View File
@@ -33,6 +33,7 @@ func CLIWriteSequenceCSV(iterator obiiter.IBioSequence,
CSVSequence(CLIPrintSequence()),
CSVQuality(CLIPrintQuality()),
CSVAutoColumn(CLIAutoColumns()),
CSVNAValue(CLINAValue()),
)
csvIter := NewCSVSequenceIterator(iterator, opts...)
+13 -1
View File
@@ -1,6 +1,7 @@
package obicsv
import (
"fmt"
"log"
"slices"
@@ -67,8 +68,19 @@ func CSVBatchFromSequences(batch obiiter.BioSequenceBatch, opt Options) obiiterc
if taxon != nil {
taxid = taxon.String()
} else if ta, ok := sequence.GetAttribute("taxid"); ok {
switch tv := ta.(type) {
case string:
taxid = tv
case int:
taxid = fmt.Sprintf("%d", tv)
case float64:
taxid = fmt.Sprintf("%d", int(tv))
default:
taxid = opt.CSVNAValue()
}
} else {
taxid = sequence.Taxid()
taxid = opt.CSVNAValue()
}
record["taxid"] = taxid
+1 -2
View File
@@ -46,8 +46,7 @@ func CLIDistributeSequence(sequences obiiter.IBioSequence) {
formater = obiformats.WriteSequencesToFile
}
dispatcher := sequences.Distribute(CLISequenceClassifier(),
obidefault.BatchSize())
dispatcher := sequences.Distribute(CLISequenceClassifier())
obiformats.WriterDispatcher(CLIFileNamePattern(),
dispatcher, formater, opts...,
+4 -2
View File
@@ -21,12 +21,10 @@ func PairingOptionSet(options *getoptions.GetOpt) {
options.StringVar(&_ForwardFile, "forward-reads", "",
options.Alias("F"),
options.ArgName("FILENAME_F"),
options.Required("You must provide at a forward file"),
options.Description("The file names containing the forward reads"))
options.StringVar(&_ReverseFile, "reverse-reads", "",
options.Alias("R"),
options.ArgName("FILENAME_R"),
options.Required("You must provide a reverse file"),
options.Description("The file names containing the reverse reads"))
options.IntVar(&_Delta, "delta", _Delta,
options.Alias("D"),
@@ -72,6 +70,10 @@ func CLIPairedSequence() (obiiter.IBioSequence, error) {
return paired, nil
}
func CLIHasPairedFiles() bool {
return _ForwardFile != "" && _ReverseFile != ""
}
func CLIDelta() int {
return _Delta
}
+1 -1
View File
@@ -291,5 +291,5 @@ func IndexReferenceDB(iterator obiiter.IBioSequence) obiiter.IBioSequence {
go f()
}
return indexed.Rebatch(obidefault.BatchSize())
return indexed.RebatchBySize(obidefault.BatchMem(), obidefault.BatchSizeMax())
}
+27 -3
View File
@@ -99,6 +99,17 @@ func (data1 *DataSummary) Add(data2 *DataSummary) *DataSummary {
rep.sample_singletons = sumUpdateIntMap(data1.sample_singletons, data2.sample_singletons)
rep.sample_obiclean_bad = sumUpdateIntMap(data1.sample_obiclean_bad, data2.sample_obiclean_bad)
for k, m1 := range data1.map_summaries {
rep.map_summaries[k] = m1
}
for k, m2 := range data2.map_summaries {
if m1, ok := rep.map_summaries[k]; ok {
rep.map_summaries[k] = sumUpdateIntMap(m1, m2)
} else {
rep.map_summaries[k] = m2
}
}
return rep
}
@@ -163,8 +174,9 @@ func ISummary(iterator obiiter.IBioSequence, summarise []string) map[string]inte
summaries := make([]*DataSummary, nproc)
for n := 0; n < nproc; n++ {
summaries[n] = NewDataSummary()
for _, v := range summarise {
summaries[n].map_summaries[v] = make(map[string]int, 0)
summaries[n].map_summaries[v] = make(map[string]int)
}
}
@@ -174,6 +186,11 @@ func ISummary(iterator obiiter.IBioSequence, summarise []string) map[string]inte
batch := iseq.Get()
for _, seq := range batch.Slice() {
summary.Update(seq)
for _, attr := range summarise {
if m, ok := seq.GetIntMap(attr); ok {
summary.map_summaries[attr] = sumUpdateIntMap(summary.map_summaries[attr], m)
}
}
}
}
waiter.Done()
@@ -181,11 +198,9 @@ func ISummary(iterator obiiter.IBioSequence, summarise []string) map[string]inte
waiter.Add(nproc)
summaries[0] = NewDataSummary()
go ff(iterator, summaries[0])
for i := 1; i < nproc; i++ {
summaries[i] = NewDataSummary()
go ff(iterator.Split(), summaries[i])
}
@@ -246,5 +261,14 @@ func ISummary(iterator obiiter.IBioSequence, summarise []string) map[string]inte
}
}
}
if len(rep.map_summaries) > 0 {
mapDict := make(map[string]interface{}, len(rep.map_summaries))
for attr, counts := range rep.map_summaries {
mapDict[attr] = counts
}
dict["map_summaries"] = mapDict
}
return dict
}
+4 -4
View File
@@ -114,10 +114,10 @@ func IPCRTagPESequencesBatch(iterator obiiter.IBioSequence,
aanot["obimultiplex_direction"] = direction
aanot["obimultiplex_forward_match"] = forward_match
aanot["obimultiplex_forward_mismatches"] = forward_mismatches
aanot["obimultiplex_forward_error"] = forward_mismatches
aanot["obimultiplex_reverse_match"] = reverse_match
aanot["obimultiplex_reverse_mismatches"] = reverse_mismatches
aanot["obimultiplex_reverse_error"] = reverse_mismatches
aanot["sample"] = sample
aanot["experiment"] = experiment
@@ -125,10 +125,10 @@ func IPCRTagPESequencesBatch(iterator obiiter.IBioSequence,
banot["obimultiplex_direction"] = direction
banot["obimultiplex_forward_match"] = forward_match
banot["obimultiplex_forward_mismatches"] = forward_mismatches
banot["obimultiplex_forward_error"] = forward_mismatches
banot["obimultiplex_reverse_match"] = reverse_match
banot["obimultiplex_reverse_mismatches"] = reverse_mismatches
banot["obimultiplex_reverse_error"] = reverse_mismatches
banot["sample"] = sample
banot["experiment"] = experiment
+85
View File
@@ -0,0 +1,85 @@
package obiutils
import (
"fmt"
"strconv"
"strings"
"unicode"
)
// ParseMemSize parses a human-readable memory size string and returns the
// equivalent number of bytes. The value is a number optionally followed by a
// unit suffix (case-insensitive):
//
// B or (no suffix) — bytes
// K or KB — kibibytes (1 024)
// M or MB — mebibytes (1 048 576)
// G or GB — gibibytes (1 073 741 824)
// T or TB — tebibytes (1 099 511 627 776)
//
// Examples: "512", "128K", "128k", "64M", "1G", "2GB"
func ParseMemSize(s string) (int, error) {
s = strings.TrimSpace(s)
if s == "" {
return 0, fmt.Errorf("empty memory size string")
}
// split numeric prefix from unit suffix
i := 0
for i < len(s) && (unicode.IsDigit(rune(s[i])) || s[i] == '.') {
i++
}
numStr := s[:i]
unit := strings.ToUpper(strings.TrimSpace(s[i:]))
// strip trailing 'B' from two-letter units (KB→K, MB→M …)
if len(unit) == 2 && unit[1] == 'B' {
unit = unit[:1]
}
val, err := strconv.ParseFloat(numStr, 64)
if err != nil {
return 0, fmt.Errorf("invalid memory size %q: %w", s, err)
}
var multiplier float64
switch unit {
case "", "B":
multiplier = 1
case "K":
multiplier = 1024
case "M":
multiplier = 1024 * 1024
case "G":
multiplier = 1024 * 1024 * 1024
case "T":
multiplier = 1024 * 1024 * 1024 * 1024
default:
return 0, fmt.Errorf("unknown memory unit %q in %q", unit, s)
}
return int(val * multiplier), nil
}
// FormatMemSize formats a byte count as a human-readable string with the
// largest unit that produces a value ≥ 1 (e.g. 1536 → "1.5K").
func FormatMemSize(n int) string {
units := []struct {
suffix string
size int
}{
{"T", 1024 * 1024 * 1024 * 1024},
{"G", 1024 * 1024 * 1024},
{"M", 1024 * 1024},
{"K", 1024},
}
for _, u := range units {
if n >= u.size {
v := float64(n) / float64(u.size)
if v == float64(int(v)) {
return fmt.Sprintf("%d%s", int(v), u.suffix)
}
return fmt.Sprintf("%.1f%s", v, u.suffix)
}
}
return fmt.Sprintf("%dB", n)
}
+15 -1
View File
@@ -144,7 +144,7 @@ func (r *AsciiSet) TrimLeft(s string) string {
return s[i:]
}
func SplitInTwo(s string, sep byte) (string, string) {
func LeftSplitInTwo(s string, sep byte) (string, string) {
i := 0
for ; i < len(s); i++ {
c := s[i]
@@ -157,3 +157,17 @@ func SplitInTwo(s string, sep byte) (string, string) {
}
return s[:i], s[i+1:]
}
func RightSplitInTwo(s string, sep byte) (string, string) {
i := len(s) - 1
for ; i >= 0; i-- {
c := s[i]
if c == sep {
break
}
}
if i == len(s) {
return s, ""
}
return s[:i], s[i+1:]
}
+302
View File
@@ -0,0 +1,302 @@
# Objective
Fully document OBITools (version 4, written in Go) in English, using a 4phase incremental pipeline.
You **MUST** use the available MCP servers:
- `cclsp` exact definitions, references, diagnostics
- `jcodemunch` code indexing, symbol extraction
- `treesitter` AST and CLI parsing
- `context7` external documentation
All tool calls must follow the exact API described in the MCP server documentation. If a required tool is unavailable, you **MUST** log the error and stop execution.
### Tool call format (CRITICAL)
Tool calls **MUST** use this exact XML format — no spaces inside the angle brackets:
```
<function=tool_name>
{"param": "value"}
</function>
```
**FORBIDDEN** — these variants will cause parse errors and must NEVER be used:
- `< function=tool_name >` (spaces around the tag name)
- `< function = tool_name>` (spaces around `=`)
- `<function = tool_name>` (space before `=`)
The opening tag is `<function=tool_name>` with **zero spaces** inside `<` and `>`.
---
# Global rules
** You are not allowed to read twice the same file in a row. **
## Language
- All generated documentation **MUST** be in English.
- If an existing documentation file is in French:
1. Translate it to English
2. Save the original as `.fr.md` **before** overwriting
3. Write the new English version
---
## Execution mode (STRICT)
You are operating in **STRICT TOOL MODE**:
- If a file must be written, you **MUST** use the `Shell` tool.
- You **MUST NOT** read entire directory listings into memory.
- You **MUST** work with **one item at a time** using a simple text file as a task queue.
### Reading files before writing
- **Before writing to an existing documentation file**, you must first read it using the `Read` tool.
- **When documenting a single Go source file**, you only need to read that one file (plus up to 4-5 helper files if needed for context).
- Do NOT read the entire codebase - only what is necessary to document the current file.
---
### Rules
- Always write the **full** file (no partial updates).
- Paths are relative to the project root; directories are created implicitly.
- Content must be valid UTF8; use `\n` line endings.
- Do **not** wrap content in backticks.
---
## Progress tracking: task queue files
We use **lineoriented task files** to avoid loading large lists into memory. Each phase has its own task file:
- `docs/todo/phase1.txt` list of Go files (one per line) to document.
- `docs/todo/phase1bis.txt` same list, but after phase1 is done.
- `docs/todo/phase2.txt` list of packages.
- `docs/todo/phase3.txt` list of tools.
**How it works:**
1. At the start of a phase, if the task file does not exist, it is created by scanning the codebase once (Phase 0 or Phase X init).
2. **Each run of the LLM processes only the first line of the task file.**
3. After processing the item (success or permanent failure), the line is removed from the task file.
- On success, the line is deleted (no extra sentinel file needed).
- On transient failure (retry < 3), we keep the line but increment a retry counter stored in a separate file.
- On permanent failure (retry ≥ 3), we move the line to a `failed.txt` file and log the error.
4. The LLM then exits (or continues if the task file is still nonempty, but it must never load more than one line).
This way, the LLMs context never holds more than a single task at a time.
### Retry mechanism
For each item (e.g., `internal/align/align.go`), we maintain a retry counter in:
- `docs/retry/phase1/internal/align/align.go.count`
If the file does not exist, retries = 0.
Each time processing fails, we increment the counter (write the new number).
If after increment the counter < 3, we keep the line in the task file.
If counter reaches 3, we **remove the line from the task file**, add it to `docs/failed/phase1/internal/align/align.go.failed` (just a marker), and log the error.
---
## Documentation quality requirements (CRITICAL)
Documentation MUST NOT be superficial. For each documented element (file, function, struct, package):
### You MUST explain:
- what it does
- why it exists (context, problem solved)
- how it is used
- assumptions and preconditions
- possible edge cases
### Forbidden patterns
- Vague phrases like “This function handles…”, “Utility for…”, “Helper function…”.
- Generic descriptions that could apply to any project.
### Required content per element type
- Functions:
- Purpose
- Parameter meaning
- Return values
- Notable behaviour (panic conditions, side effects, concurrency)
- Structs:
- Role in the system
- Meaning of key fields
- Files:
- Role within the package
- Interactions with other files
### Antigeneric rule
If the description could apply to any project, it is INVALID. You MUST include domainspecific context (bioinformatics, sequence processing, etc.) and concrete behaviour.
### Quality validation
Before marking an item as done (i.e., creating the .done sentinel), you MUST perform a selfvalidation:
- Check that all required sections are present.
- Verify that no forbidden patterns remain.
If validation fails, increment the retry counter and keep the item pending.
---
# Directory structure
```
docs/
todo/ # task queues
phase1.txt
phase1bis.txt
phase2.txt
phase3.txt
retry/ # retry counters
phase1/ # mirrors file structure
internal/align/align.go.count
phase1bis/
phase2/
phase3/
failed/ # permanent failure markers
phase1/
internal/align/align.go.failed
phase1bis/
phase2/
phase3/
phase1/ # actual documentation
<relative_path>/<file>.go.md
phase2/
<package>.md
phase3/
<tool>.md
error.log
```
---
# Phase 0: Initialization
1. Ensure required directories exist: `docs/todo`, `docs/retry`, `docs/failed`, `docs/phase1`, `docs/phase2`, `docs/phase3`.
2. **If `docs/todo/phase1.txt` does not exist**:
- Use `find pkg -name "*.go" ! -name "*_test.go" ! -path "*/cmd/*"` to list all Go files (excluding tests and main.go).
- Write the list (one relative path per line, e.g., `internal/align/align.go`) to `docs/todo/phase1.txt`.
3. Do the same for phase2 and phase3 later when those phases start.
4. **No other state is stored.**
---
# Phase 1: File documentation
**Processing rule:**
- Read the **first line** of `docs/todo/phase1.txt` (using `head -n 1`).
- If the file is empty, Phase 1 is complete → proceed to Phase 1bis initialization.
- Otherwise, process that single file.
**Processing a file:**
1. Let `relpath` be the line content (e.g., `internal/align/align.go`).
2. Check if a permanent failure marker exists at `docs/failed/phase1/${relpath}.failed`. If yes, remove the line from the task file and skip (line will be deleted).
3. If the documentation file `docs/phase1/${relpath}.go.md` exists go directly to its validation (step 6).
4. Otherwise, generate documentation for that file (using MCP tools as before).
5. Write the documentation to `docs/phase1/${relpath}.go.md`.
6. Validate quality.
7. If validation succeeds:
- Remove the line from the task file.
- Remove any retry counter file for this item.
- (No sentinel needed; the removal from todo indicates completion.)
8. If validation fails:
- Increment retry counter:
- If `docs/retry/phase1/${relpath}.count` does not exist, set to 1.
- Else read it, add 1, write back.
- If new counter >= 3:
- Remove line from task file.
- Create `docs/failed/phase1/${relpath}.failed`.
- Log error.
- If new counter < 3:
- Keep the line in the task file (do nothing, it stays as first line for next run).
9. **Exit** (or stop if this was a single run). The next invocation will read the first line again (same if retry, or next if removed).
**Important:**
- Do **not** read more than one line.
- Do **not** attempt to process multiple items in one run.
- The LLM should finish after handling one item.
---
# Phase 1bis: Review and harmonization
When Phase 1 is complete (i.e., `docs/todo/phase1.txt` empty), we initialize `docs/todo/phase1bis.txt` with the same list of files (the ones that succeeded).
But note: we need to know which files were successfully documented. Since we removed lines from `phase1.txt` on success, we need a record. The simplest is to reuse the same list but we can generate it by listing the existing `.go.md` files in `docs/phase1/` (since every successful file has a `.go.md`).
Thus, Phase 1bis initialization:
- If `docs/todo/phase1bis.txt` does not exist, create it by listing all `.go.md` files under `docs/phase1/`, stripping the `docs/phase1/` prefix and the `.go.md` suffix, and writing the relative path (same format as phase1).
Then processing is identical to Phase 1, but using `docs/todo/phase1bis.txt` and output is overwriting the same `.go.md` files (with improvements). Retry counters go in `docs/retry/phase1bis/`.
---
# Phase 2: Package documentation
When Phase 1bis is complete (`docs/todo/phase1bis.txt` empty), initialize `docs/todo/phase2.txt`:
- List all packages: unique directories under `pkg/` that contain at least one `.go` file and are not tools.
- Write each package identifier (e.g., `align`, `internal/align`) as a line.
Processing: read first line, generate `docs/phase2/<package>.md`, validate, remove line on success, retry logic in `docs/retry/phase2/`.
---
# Phase 3: Tool documentation
When Phase 2 complete, initialize `docs/todo/phase3.txt`:
- List all directories under `cmd/` that contain a `main.go`. Write each tool name as a line.
Processing: read first line, generate `docs/phase3/<tool>.md`, validate, remove line on success, retry logic in `docs/retry/phase3/`.
---
# Finalization
When all task files are empty and no pending phases, generate `docs/README.md` by:
- Listing all package docs (files in `docs/phase2/`) and linking.
- Listing all tool docs (files in `docs/phase3/`) and linking.
Write using `Shell`.
---
# Execution flow summary
1. **Phase 0**: Create directories and initial `todo/phase1.txt` if missing. Exit.
2. **Phase 1**:
- If `todo/phase1.txt` exists and nonempty → process first line.
- Else → move to Phase 1bis initialization.
3. **Phase 1bis**:
- If `todo/phase1bis.txt` does not exist → create from successful phase1 docs.
- If nonempty → process first line.
- Else → move to Phase 2 initialization.
4. **Phase 2**: similar.
5. **Phase 3**: similar.
6. **Finalization**: generate README.
The LLM should be invoked repeatedly (e.g., by a scheduler) until all phases are done. Each invocation processes exactly one item.
---
# Important reminders
- Always call `Shell` to write files; never output content in plain text.
- Validate quality before removing a line from the task file.
- Log all failures to `docs/error.log` in JSON lines format.
- If any MCP tool fails, treat as failure and increment retry counter.
- Never read more than one line from a task file in a single run.
+222
View File
@@ -0,0 +1,222 @@
//go:build ignore
package main
import (
"encoding/json"
"fmt"
"os"
"os/exec"
"path/filepath"
"regexp"
"sort"
"strconv"
"strings"
)
type Reference struct {
File string `json:"file"`
Line int `json:"line"`
Column int `json:"column"`
Key string `json:"key"`
Function string `json:"function"`
Context string `json:"context"`
}
type Result struct {
Method string `json:"method"`
Signature string `json:"signature"`
Definition string `json:"definition"`
References []Reference `json:"references"`
Total int `json:"total"`
}
var basePath = "/Users/coissac/Sync/travail/__MOI__/GO/obitools4"
func main() {
cmd := exec.Command("rg", "-n", `\.SetAttribute\(`, basePath+"/pkg", "--type", "go")
output, err := cmd.Output()
if err != nil {
fmt.Fprintf(os.Stderr, "Error running rg: %v\n", err)
os.Exit(1)
}
lines := strings.Split(string(output), "\n")
lineRe := regexp.MustCompile(`^(.+?):(\d+):\s*(.+)$`)
keyRe := regexp.MustCompile(`SetAttribute\("([^"]+)"`)
templateKeyRe := regexp.MustCompile(`SetAttribute\("([^"]+)[^"]*"\s*,`)
var refs []Reference
seen := make(map[string]bool)
for _, line := range lines {
line = strings.TrimSpace(line)
if line == "" {
continue
}
matches := lineRe.FindStringSubmatch(line)
if matches == nil {
continue
}
file := matches[1]
lineNum, _ := strconv.Atoi(matches[2])
context := strings.TrimSpace(matches[3])
// Skip definition
if strings.Contains(file, "obiseq/attributes.go") && lineNum == 132 {
continue
}
// Extract key
var key string
if keyMatches := keyRe.FindStringSubmatch(context); keyMatches != nil {
key = keyMatches[1]
} else if tmplMatches := templateKeyRe.FindStringSubmatch(context); tmplMatches != nil {
key = tmplMatches[1]
} else {
continue
}
// Get function name using treesitter
funcName := getFunctionNameTreesitter(file, lineNum)
uniqueKey := fmt.Sprintf("%s:%d", file, lineNum)
if seen[uniqueKey] {
continue
}
seen[uniqueKey] = true
refs = append(refs, Reference{
File: filepath.Base(file),
Line: lineNum,
Column: 0,
Key: key,
Function: funcName,
Context: context,
})
}
sort.Slice(refs, func(i, j int) bool {
if refs[i].File != refs[j].File {
return refs[i].File < refs[j].File
}
return refs[i].Line < refs[j].Line
})
result := Result{
Method: "SetAttribute",
Signature: "func (s *BioSequence) SetAttribute(key string, value interface{})",
Definition: basePath + "/pkg/obiseq/attributes.go:132",
References: refs,
Total: len(refs),
}
outputJSON, err := json.MarshalIndent(result, "", " ")
if err != nil {
fmt.Fprintf(os.Stderr, "Error marshaling JSON: %v\n", err)
os.Exit(1)
}
fmt.Println(string(outputJSON))
}
// getFunctionNameTreesitter uses the treesitter_cursor_walk tool to get the containing function
func getFunctionNameTreesitter(file string, targetLine int) string {
// Convert to 0-based for treesitter
row := targetLine - 1
// Use treesitter cursor walk to get ancestors
cmd := exec.Command("bash", "-c",
fmt.Sprintf(`kilo treesitter_cursor_walk --file_path %q --row %d --column 0 --max_depth 10 2>/dev/null`, file, row))
output, err := cmd.Output()
if err != nil {
return findContainingFunction(file, targetLine)
}
// Parse the JSON output to find function_declaration or method_declaration
var result map[string]interface{}
if err := json.Unmarshal(output, &result); err != nil {
return findContainingFunction(file, targetLine)
}
// Check ancestors for function declaration
if ancestors, ok := result["ancestors"].([]interface{}); ok {
for _, a := range ancestors {
if anc, ok := a.(map[string]interface{}); ok {
nodeType, _ := anc["type"].(string)
if nodeType == "function_declaration" || nodeType == "method_declaration" {
// Try to get the function name from children
if children, ok := anc["children"].([]interface{}); ok {
for _, c := range children {
if child, ok := c.(map[string]interface{}); ok {
childType, _ := child["type"].(string)
if childType == "identifier" {
if text, ok := child["text"].(string); ok {
return text
}
}
if childType == "field_identifier" {
if text, ok := child["text"].(string); ok {
return text
}
}
}
}
}
}
if nodeType == "func_literal" {
return "closure"
}
}
}
}
return findContainingFunction(file, targetLine)
}
func findContainingFunction(file string, targetLine int) string {
data, err := os.ReadFile(file)
if err != nil {
return ""
}
lines := strings.Split(string(data), "\n")
for i := targetLine - 1; i >= 0 && i >= targetLine-200; i-- {
if i >= len(lines) {
continue
}
line := strings.TrimSpace(lines[i])
if line == "}" && i > 0 {
for j := i - 1; j >= 0 && j >= i-50; j-- {
if j >= len(lines) {
continue
}
funcLine := strings.TrimSpace(lines[j])
if strings.HasPrefix(funcLine, "func ") {
if match := regexp.MustCompile(`func\s+\([^)]+\)\s+([a-zA-Z_][a-zA-Z0-9_]*)\s*\(`).FindStringSubmatch(funcLine); match != nil {
return match[1]
}
if match := regexp.MustCompile(`func\s+([a-zA-Z_][a-zA-Z0-9_]*)\s*\(`).FindStringSubmatch(funcLine); match != nil {
return match[1]
}
}
}
continue
}
if strings.HasPrefix(line, "func ") {
if match := regexp.MustCompile(`func\s+\([^)]+\)\s+([a-zA-Z_][a-zA-Z0-9_]*)\s*\(`).FindStringSubmatch(line); match != nil {
return match[1]
}
if match := regexp.MustCompile(`func\s+([a-zA-Z_][a-zA-Z0-9_]*)\s*\(`).FindStringSubmatch(line); match != nil {
return match[1]
}
}
}
return ""
}
+36
View File
@@ -0,0 +1,36 @@
#!/bin/bash
basePath="/Users/coissac/Sync/travail/__MOI__/GO/obitools4"
OUTPUT_FILE="${1:-/dev/stdout}"
# Get all SetAttribute calls
rg -n '\.SetAttribute\(' "$basePath/pkg" --type go | while read -r line; do
file="${line%%:*}"
line_num="${line%:*}"
line_num="${line_num##*:}"
context="${line##*: }"
# Extract key (only literal strings)
key=$(echo "$context" | sed -n 's/.*SetAttribute("\([^"]*\)".*/\1/p')
[ -z "$key" ] && continue
# Get function name using treesitter
func=$(kilo treesitter_cursor_walk \
--file_path "$file" \
--row "$((line_num - 1))" \
--column 0 \
--max_depth 10 2>/dev/null |
jq -r '.ancestors[] | select(.type == "function_declaration" or .type == "method_declaration") | .children[] | select(.type == "identifier" or .type == "field_identifier") | .text' 2>/dev/null)
# Fallback to func_literal for closures
if [ -z "$func" ]; then
func=$(kilo treesitter_cursor_walk \
--file_path "$file" \
--row "$((line_num - 1))" \
--column 0 \
--max_depth 10 2>/dev/null |
jq -r '.ancestors[] | select(.type == "func_literal") | "closure"' 2>/dev/null)
fi
echo "$(basename "$file")|$line_num|$key|${func:-unknown}|$context"
done | sort -t'|' -k1,1 -k2,2n
+308
View File
@@ -0,0 +1,308 @@
{
"obiannotate": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
],
"(git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiiter.IBioSequence).NumberSequences$1": [
"seq_number"
]
},
"obiclean": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
]
},
"obicleandb": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetCount": [
"count"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
]
},
"obicomplement": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
]
},
"obiconsensus": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetCount": [
"count"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
],
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiconsensus.BuildConsensus": [
"obiconsensus_kmer_max_occur",
"obiconsensus_filtered_graph_size",
"obiconsensus_full_graph_size",
"obiconsensus_consensus",
"obiconsensus_weight",
"obiconsensus_seq_length",
"obiconsensus_kmer_size"
],
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiconsensus.MinionClusterDenoise": [
"obiconsensus_consensus"
],
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiconsensus.MinionDenoise$1": [
"obiconsensus_consensus",
"obiconsensus_weight"
]
},
"obiconvert": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
]
},
"obicount": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
]
},
"obicsv": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
]
},
"obidemerge": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
]
},
"obidistribute": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
]
},
"obigrep": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
]
},
"obijoin": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
]
},
"obikmermatch": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
]
},
"obikmersimcount": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
]
},
"obilandmark": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetCoordinate": [
"landmark_coord"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetOBITagGeomRefIndex": [
"obitag_geomref_index"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
],
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obilandmark.CLISelectLandmarkSequences": [
"landmark_id"
]
},
"obimatrix": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
]
},
"obimicrosat": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
]
},
"obimultiplex": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
]
},
"obipairing": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence)._revcmpMutation": [
"pairing_mismatches"
],
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obialign.BuildQualityConsensus": [
"pairing_mismatches"
]
},
"obipcr": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
]
},
"obireffamidx": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetOBITagRefIndex": [
"obitag_ref_index"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
],
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obirefidx.IndexFamilyDB": [
"reffamidx_id"
]
},
"obirefidx": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetOBITagRefIndex": [
"obitag_ref_index"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
]
},
"obiscript": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
]
},
"obisplit": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
]
},
"obisummary": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
]
},
"obitag": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetPath": [
"taxonomic_path"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
]
},
"obitagpcr": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obingslibrary.NGSLibrary).ExtractMultiBarcode": [
"obimultiplex_error",
"obimultiplex_amplicon_rank"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence)._revcmpMutation": [
"pairing_mismatches"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence)._subseqMutation": [
"pairing_mismatches"
],
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obialign.BuildQualityConsensus": [
"pairing_mismatches"
]
},
"obitaxonomy": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetPath": [
"taxonomic_path"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
],
"(git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiiter.IBioSequence).NumberSequences$1": [
"seq_number"
]
},
"obiuniq": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetCount": [
"count"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
]
}
}
+36
View File
@@ -0,0 +1,36 @@
#!/usr/bin/env python3
"""
Read potentially malformed JSON from stdin (aichat output), extract title and
body, and print them as plain text: title on first line, blank line, then body.
Exits with 1 on failure (no output).
"""
import sys
import json
import re
text = sys.stdin.read()
m = re.search(r'\{.*\}', text, re.DOTALL)
if not m:
sys.exit(1)
s = m.group()
obj = None
try:
obj = json.loads(s)
except Exception:
s2 = re.sub(r'(?<!\\)\n', r'\\n', s)
try:
obj = json.loads(s2)
except Exception:
sys.exit(1)
title = obj.get('title', '').strip()
body = obj.get('body', '').strip()
if not title or not body:
sys.exit(1)
print(f"{title}\n\n{body}")
+1 -1
View File
@@ -1 +1 @@
4.4.18
4.4.42
+19
View File
@@ -0,0 +1,19 @@
```markdown
# DNA Scoring and Matching Utilities in `obialign`
This module provides low-level utilities for computing nucleotide alignment scores using probabilistic and bit-encoded representations.
- **Bit Encoding**: Nucleotides are encoded in 4-bit groups (e.g., `A=0b0001`, `C=0b0010`, etc.), enabling efficient bitwise comparison.
- **`_MatchRatio(a, b)`**: Computes a normalized match ratio between two encoded bytes based on shared bits:
`ratio = common_bits / (bits_in_a × bits_in_b)`.
- **`_FourBitsCount`**: Precomputed lookup table for Hamming weight (popcount) of 4-bit values.
- **Log-space Arithmetic**: Helper functions (`_Logaddexp`, `_Logdiffexp`, `_Log1mexp`) ensure numerical stability in probabilistic computations.
- **Phred-scaled Quality Integration**:
`_MatchScoreRatio(QF, QR)` derives log-odds match/mismatch scores from Phred quality values (`QF`, `QR`), modeling sequencing error probabilities.
- **Precomputed Matrices**:
- `_NucPartMatch[i][j]`: Match ratios for all nucleotide pairs (from 4-bit codes).
- `_NucScorePartMatchMatch/Mismatch[i][j]`: Integer-scaled match/mismatch scores (×10) for quality pairs `(i, j)` in `[0..99]`.
- **Thread-Safe Initialization**: `_InitDNAScoreMatrix()` ensures one-time, synchronized initialization of all scoring tables via a mutex.
Designed for high-performance alignment kernels where speed and numerical robustness are critical.
```