Compare commits

...

56 Commits

Author SHA1 Message Date
Eric Coissac 6c4a6c697c [4.4.2] Enhanced taxonomy handling, input robustness & PCR tag validation
- **obiconvert**: Added `--raw-taxid` mode to output numeric taxIDs without formatting (e.g., "12345" instead of ":tax:NCBI_0987@species"). Introduced `TaxNode.FullString()` to reliably return full formatted strings regardless of global settings, and improved fallback behavior when taxonomy DB is unavailable.
- **ngsfilter**: Input fields (primers, sample tags/IDs) are now automatically trimmed of leading/trailing whitespace to prevent parsing failures from inconsistent formatting.
- **obitools (pcrtag)**: Mismatch-related fields (`forward_mismatches`, `reverse_mismatches`) renamed to "error" for consistency across annotation dictionaries.
- **obipairing & obitagpcr**: Enforced mandatory paired-end file input (`--forward` and `--reverse`) in obipairing; added CLI support for generating config templates via `CLIAskConfigTemplate()`; removed redundant `Required()` constraints and introduced the helper function `CLIHasPairedFiles()`.
2026-04-30 16:57:45 +02:00
Eric Coissac 60b3753673 feat(obiconvert): add --raw-taxid option and refactor taxID formatting
- Add a new `--raw-taxid` mode (`obiconvert --raw-taxid`) to output bare numeric taxIDs instead of full-format strings.
- Introduce `TaxNode.FullString()` to always return the complete "code:id [name]@rank" format, regardless of global `UseRawTaxids()` setting.
- Update `.String(taxonomyCode)` to respect the global flag, returning bare ID when `--raw-taxid` is active.
- Extract raw taxID from full-format strings in taxonomy methods when needed (e.g., fallback without loaded DB).
- Add comprehensive test suite covering:
a) `--raw-taxid` execution and idempotency
b) full-format taxID output with `--taxonomy`
c) interaction of both flags
d) format validation
- Add test data: new reference files `out_ecotag.fasta` and `taxonomy.csv`, and an updated shell script.
2026-04-30 16:57:38 +02:00
Eric Coissac 14e2840a2d [ngsfilter] Trim whitespace from primer and sample fields
Trim leading/trailing whitespace in the forward/reverse primers, tags (via sample_tag), experiment and sample fields to prevent parsing errors due to formatting inconsistencies in input data.
2026-04-30 08:14:39 +02:00
Eric Coissac 42910c7db9 🔧 Rename mismatch fields to error in pcrtag.go
- Renamed `obimultiplex_forward_mismatches` to "error" for consistency
- Similarly renamed `obimultiplex_reverse_mismatches` to "error"
- Applied changes in both annotation dictionaries (aanot, banot)
2026-04-29 15:29:25 +02:00
Eric Coissac 8b4cf677c6 [obitools] Add validation for paired files and config template support
- Enforce requirement of both forward (-F) and reverse (-R) files in obipairing/main.go
- Add config template support to obitagpcr via CLIAskConfigTemplate()
- Remove redundant Required() constraints in options.go
- Introduce new helper CLIHasPairedFiles()
2026-04-29 15:01:37 +02:00
coissac 02765f154f Merge pull request #113 from metabarcoding/push-oxowomxlnlnx
Push oxowomxlnlnx
2026-04-16 15:04:55 +02:00
Eric Coissac 449544bd63 [obiseq] Quality validation and new map_summaries aggregation
- Added strict length matching between sequences and quality scores in the `SetQualities`, `TakeQualities`, and `Subsequence` operations; an error is now raised if lengths do not match.
- Introduced a new `map_summaries` aggregation feature in obisummary to merge map summary data across datasets, supporting safe concurrent access and inclusion of non-empty results in the final output.
- Centralized string reversal logic via a new `inverser_chaine()` utility function, replacing duplicated inline implementations throughout the codebase.
2026-04-16 14:58:23 +02:00
Eric Coissac 434d2e5930 feat: add support for map_summaries aggregation in obisummary
- Implement merging logic for `map_summaries` across datasets
  - Ensure proper initialization and population in multi-threaded contexts
- Add `map_summaries` to the final output dictionary when non-empty
2026-04-16 14:58:18 +02:00
Eric Coissac 7cb02ded69 Refactor: Extract utility function for string reversal
- Introduce `inverser_chaine()` helper to centralize logic
- Replace inline reverse implementations across modules
2026-04-16 13:42:51 +02:00
Eric Coissac 6d469bd711 [obiseq] Add length validation for qualities in SetQualities, TakeQualities and Subsequence
- Panic if sequence/qualities lengths mismatch when setting or taking qualities in BioSequence.
- Add the same check before slicing Qualities() for Subsequence to ensure consistency.
2026-04-15 18:20:53 +02:00
coissac 3d8e4a3a4e Merge pull request #112 from metabarcoding/push-yvqvqrxyktoz
Push yvqvqrxyktoz
2026-04-14 14:49:08 +02:00
Eric Coissac 07d04a6967 Release 4.4.40 2026-04-14 14:48:41 +02:00
Eric Coissac 03f251c365 [release] bump version to v4.4.39
- Update `version.txt` to v4.4.39
- Auto-synced `pkg/obioptions/version.go` via Makefile
2026-04-14 14:48:38 +02:00
Eric Coissac 5714fa6cd3 chore: bump version to 4.4.39
Update package and file versions from 4.4.38 to 4.4.39.
2026-04-14 14:48:29 +02:00
coissac f101625771 Merge pull request #110 from metabarcoding/push-tstsmnkomnoo
Push tstsmnkomnoo
2026-04-13 17:57:38 +02:00
Eric Coissac 4359b52eaf Release 4.4.38 2026-04-13 17:57:00 +02:00
Eric Coissac da0c8b6f28 ♻️ refactor lua_push_interface and add json module
Refactor pushInterfaceToLua to delegate unsupported types (nil, bool/int/float/string/map/slice) recursively via a new lvalueFromInterface helper. Simplify typed slice and map handlers, remove the explicit nil case (now handled by lvalueFromInterface), and eliminate redundant type switches in pushMapStringIntToLua and similar functions. Add a new luajson.go with RegisterJSON and lua.JSONEncode/Decode bindings using lvalueFromInterface and Table2Interface for bidirectional round-trips. Include comprehensive tests covering scalars, nested structures (e.g., the kmindex response), arrays, and error cases.
2026-04-13 17:56:58 +02:00
coissac 841e5c9e2a Merge pull request #109 from metabarcoding/push-okvqknqnvmnl
Push okvqknqnvmnl
2026-04-13 17:19:41 +02:00
Eric Coissac e298daeef9 [v4.5] Bugfix for 3-base sequence handling and utility refactoring
- **Bug fix**: Corrected the logic in the 4-mer calculation to properly handle sequences of length exactly three. Previously, such cases could produce invalid or unexpected results due to an incomplete guard condition (`length < 0`), which failed for sequence length == 3 (where the computed length was zero). The fix ensures all sequences shorter than four bases are safely excluded.

- **Refactor**: Introduced a new internal utility function (`inverser_chaine`) to centralize string reversal logic, improving code maintainability and test coverage without affecting user-facing behavior.
2026-04-13 17:18:53 +02:00
Eric Coissac d9e6f67a6e chore: bump version to 4.4.36
Update package and file versions from v4.4.35 to 4.4.36.
2026-04-13 17:18:48 +02:00
Eric Coissac f036c7fa96 ⬆️ version bump to v4.5
- Update `version.txt` from "v3" to v4.5
- Bump Go constant `_Version = 'Release 4.x.y'` accordingly
2026-04-13 17:18:34 +02:00
Eric Coissac e33665e716 Refactor: Extract utility function for string reversal
- Introduce `inverser_chaine()` helper to centralize logic
- Update tests and documentation accordingly
2026-04-13 17:18:34 +02:00
Eric Coissac c955a614ca chore: bump version to 4.4.35
Update obioptions/version.go and version.txt to reflect release 4.4.35.
2026-04-13 17:18:34 +02:00
Eric Coissac f19065261e We kept 2026-04-13 17:18:34 +02:00
coissac 3e349e92e1 Merge pull request #104 from theo-krueger/master
Bugfix: result of 0 4mers not caught if sequence length == 3
2026-04-13 16:39:08 +02:00
coissac a4ce24a418 Merge pull request #108 from metabarcoding/push-qlxnulxwokxo
Push qlxnulxwokxo
2026-04-13 16:27:43 +02:00
Eric Coissac 960ad1531d [4.4.34] HTTP client thread-safety and CI infrastructure updates
- Improved concurrency safety by replacing the global HTTP client with a thread-safe, lazy-initialized instance using `sync.Once`. The new implementation enables connection pooling (`MaxIdleConnsPerHost`, `MaxConnsPerHost`) and dynamically configures pool size based on `obidefault.ParallelWorkers()`, ensuring robust behavior in multi-threaded Lua environments.
- Updated GitHub Actions workflows to the latest stable versions of `actions/setup-go` and `actions/checkout`, improving build reliability.
- Removed outdated Go dependency checksums for buger/jsonparser v1.1.x to keep the build clean and consistent.
2026-04-13 16:27:14 +02:00
Eric Coissac 137f49d1d1 🔧 refactor(http): use thread-safe lazy-initialized HTTP client with connection pooling
- Replace global _httpClient variable by a sync.Once-based lazy initialization
- Add getHTTPClient() function to safely initialize the client with connection pooling settings (MaxIdleConnsPerHost, MaxConnsPerHost)
- Set connection pool size based on obidefault.ParallelWorkers()

This ensures safe concurrent access and better resource management in multi-threaded Lua environments.
2026-04-13 16:27:09 +02:00
Eric Coissac 083a92e13d ⬆️ update GitHub Actions to latest versions
- Upgrade actions/setup-go from v2/v4 (depending on workflow) to latest stable version
- Update all actions/checkout from v3/v4 (depending on workflow) to latest stable version
- Clean up outdated go.sum entries for buger/jsonparser v1.1.x
2026-04-13 14:41:47 +02:00
coissac 67683435e8 Merge pull request #107 from metabarcoding/push-oyzynqqnturm
Push oyzynqqnturm
2026-04-13 14:29:44 +02:00
Eric Coissac f32b29db4f Release 4.4.33 2026-04-13 14:29:18 +02:00
Eric Coissac 10f49fe64b 📝 Clarify RegisterHTTP global registration intent
Registers the http module in the Lua state as a global, aligning with obicontext and BioSequence conventions. The change ensures consistent module exposure across Lua environments.
2026-04-13 14:29:16 +02:00
coissac d257917748 Merge pull request #106 from metabarcoding/push-qoqotlnktvls
Push qoqotlnktvls
2026-04-13 14:08:42 +02:00
Eric Coissac fec078c04c Release 4.4.32 2026-04-13 14:08:16 +02:00
Eric Coissac a92393dd51 ⬆️ update go.mod dependencies and improve error messages
- Bump github.com/buger/jsonparser from v1.1.1 to v1.1.2
- Add error details in log.Fatalf calls for better debugging
2026-04-13 14:08:13 +02:00
coissac 7e76698490 Merge pull request #105 from metabarcoding/push-pnqoquxmpqpq
Push pnqoquxmpqpq
2026-04-13 13:36:13 +02:00
Eric Coissac 64b0b32f61 Release 4.4.31 2026-04-13 13:35:39 +02:00
Eric Coissac c8e6a218cb [release] bump version to v4.4.31
- Update obioptions/version.go and version.txt to v4.4.31
- Increment patch version from v4.4.30 to v4.4.31
- Align version.txt with the current release tag
2026-04-13 13:35:33 +02:00
Eric Coissac 8c7017a99d ⬆️ version bump to v4.4.30
- Update obioptions.Version from "Release 4.4.29" to "Release 4.4.30"
- Update version.txt from 4.4.29 → 4.4.30
(automated by Makefile)
2026-04-13 13:34:53 +02:00
theo-krueger c7816973a6 Bugfix: result of 0 4mers not caught if sequence length == 3
In the 4-mer calculation:

    length := slength - 3

for sequences with <4 bases, length is <= 0.

The stop check only caught < 0 (sequence lengths of 2 or less), leaving sequence lengths of 3 unguarded:

    if length < 0 {
        return nil
    }
2026-04-10 14:05:30 +02:00
Eric Coissac 670edc1958 docs: add global documentation for OBITools v4
- Add prompt_documentation_globale.md describing the three documentation-writing phases (file → package → tool)
- Non-significant .DS_Store files are present (to be ignored)
2026-03-31 19:02:42 +02:00
coissac f92f285417 Merge pull request #101 from metabarcoding/push-klzowrsmmnyv
Dynamic Batch Flushing and Build Improvements
2026-03-16 22:29:29 +01:00
Eric Coissac a786b58ed3 Dynamic Batch Flushing and Build Improvements
This release introduces dynamic batch flushing in the Distribute component, replacing the previous fixed-size batching with a memory- and count-aware strategy. Batches now flush automatically when either the maximum sequence count (BatchSizeMax()) or memory threshold (BatchMem()) per key is reached, ensuring more efficient resource usage and consistent behavior with the RebatchBySize strategy. The optional sizes parameter has been removed, and related code—including the Lua wrapper and worker buffer handling—has been updated for correctness and simplicity. Unused BatchSize() references have been eliminated from obidistribute.

Additionally, this release includes improvements to static Linux builds and overall build stability, enhancing reliability across deployment environments.
2026-03-16 22:06:51 +01:00
Eric Coissac a2b26712b2 refactor: replace fixed batch size with dynamic flushing based on count and memory
Replace the old fixed batch-size mechanism in Distribute with a dynamic strategy that flushes batches when either BatchSizeMax() sequences or BatchMem() bytes are reached per key. This aligns with the RebatchBySize strategy and removes the optional sizes parameter. Also update related code: simplify Lua wrapper to accept optional capacity, and fix buffer growth logic in worker.go using slices.Grow correctly. Remove unused BatchSize() usage from obidistribute.
2026-03-16 22:06:44 +01:00
coissac 1599abc9ad Merge pull request #99 from metabarcoding/push-urlyqwkrqypt
4.4.28: Static Linux Builds, Memory-Aware Batching, and Build Stability
2026-03-14 12:21:34 +01:00
Eric Coissac af213ab446 4.4.28: Static Linux Builds, Memory-Aware Batching, and Build Stability
This release focuses on improving build reliability, memory efficiency for large datasets, and portability of Linux binaries.

### Static Linux Binaries
- Linux binaries are now built with static linking using musl, eliminating external runtime dependencies and ensuring portability across distributions.

### Memory-Aware Batching
- Users can now control memory usage during processing with the new `--batch-mem` option, specifying limits such as 128K, 64M, or 1G.
- Batching logic now respects both size and memory constraints: batches are flushed when either threshold is exceeded.
- Conservative memory estimation for sequences helps avoid over-allocation, and explicit garbage collection after large batch discards reduces memory spikes.

### Build System Improvements
- Upgraded to Go 1.26 for improved performance and toolchain stability.
- Fixed cross-compilation issues by replacing generic include paths with architecture-specific ones (x86_64-linux-gnu and aarch64-linux-gnu).
- Streamlined macOS builds by removing special flags, using standard `make` targets.
- Enhanced error reporting during build failures: logs are now shown before cleanup and exit.
- Updated install script to correctly configure GOROOT, GOPATH, and GOTOOLCHAIN, with visual progress feedback for downloads.

All batching behavior is non-breaking and maintains backward compatibility while offering more predictable resource usage on large datasets.
2026-03-14 11:59:15 +01:00
Eric Coissac a60184c115 chore: bump version to 4.4.27 and add zlib-static dependency
Update version to 4.4.27 in version.txt and pkg/obioptions/version.go.

Add zlib-static package to release workflow to ensure static linking of zlib, resolving potential runtime dependency issues with the external link mode.
2026-03-14 11:59:04 +01:00
Eric Coissac 585b024bf0 chore: update to Go 1.26 and refactor release workflow
- Upgrade Go version from 1.23 to 1.26 in release.yml
- Remove CGO_CFLAGS from cross-compilation matrix entries
- Replace Linux build tools installation with Docker-based static build using golang:1.26-alpine
- Simplify macOS build to use standard make without special flags
- Increment version to 4.4.26
2026-03-14 11:43:31 +01:00
Eric Coissac afc9ffda85 chore: bump version to 4.4.25 and fix CGO_CFLAGS for cross-compilation
Update version to 4.4.25 in version.txt and pkg/obioptions/version.go.

Fix CGO_CFLAGS in release.yml by replacing generic '-I/usr/include' with architecture-specific paths (x86_64-linux-gnu and aarch64-linux-gnu) to ensure correct header inclusion during cross-compilation on Linux.
2026-03-13 19:30:29 +01:00
Eric Coissac fdd972bbd2 fix: add CGO_CFLAGS for static Linux builds and update go.work.sum
- Add CGO_CFLAGS environment variable to release workflow for Linux builds
- Update go.work.sum with new golang.org/x/net v0.38.0 entry
- Remove obsolete logs archive file
2026-03-13 19:24:18 +01:00
coissac 76f595e1fe Merge pull request #95 from metabarcoding/push-kzmrqmplznrn
Version 4.4.24
2026-03-13 19:13:02 +01:00
coissac 1e1e5443e3 Merge branch 'master' into push-kzmrqmplznrn 2026-03-13 19:12:49 +01:00
Eric Coissac 15d1f1fd80 Version 4.4.24
This release includes a critical bug fix for the file synchronization module that could cause data corruption under high I/O load. Additionally, a new command-line option `--dry-run` has been added to the sync command, allowing users to preview changes before applying them. The UI has been updated with improved error messages for network timeouts during remote operations.
2026-03-13 19:11:58 +01:00
Eric Coissac 8df2cbe22f Bump version to 4.4.23 and update release workflow
- Update version from 4.4.22 to 4.4.23 in version.txt and pkg/obioptions/version.go
- Add zlib1g-dev dependency to Linux release workflow for potential linking requirements
- Improve tag creation in Makefile by resolving commit hash with `jj log` for better CI/CD integration
2026-03-13 19:11:55 +01:00
coissac 58d685926b Merge pull request #94 from metabarcoding/push-lxxxlurqmqrt
4.4.23: Memory-aware batching, static Linux builds, and build improvements
2026-03-13 19:04:15 +01:00
Eric Coissac e9f24426df 4.4.23: Memory-aware batching, static Linux builds, and build improvements
### Memory-Aware Batching
- Introduced configurable min/max batch size bounds and memory limits for precise resource control.
- Added `--batch-mem` CLI option to enable adaptive batching based on estimated sequence memory footprint (e.g., 128K, 64M, 1G).
- Implemented `RebatchBySize()` to handle both byte and count limits, flushing when either threshold is exceeded.
- Added conservative memory estimation via `BioSequence.MemorySize()` and enhanced garbage collection for explicit cleanup after large batch discards.
- Updated internal batching logic across core modules to consistently apply default memory (128 MB) and size (min: 1, max: 2000) bounds.

### Linux Build Enhancements
- Enabled static linking for Linux binaries using musl, producing portable, self-contained executables without external dependencies.

### Build System & Toolchain Improvements
- Updated Go toolchain to 1.26.1 with corresponding dependency bumps (e.g., go-getoptions, gval, regexp2, go-json, progressbar, logrus, testify).
- Fixed Makefile to safely quote LDFLAGS for paths with spaces.
- Improved build error handling: on failure, logs are displayed before cleanup and exit.
- Updated install script to correctly set GOROOT, GOPATH, and GOTOOLCHAIN, ensuring GOPATH directory creation.
- Added progress bar to curl downloads in the install script for visual feedback during Go and OBITools4 downloads.

All batching behavior remains non-breaking, with consistent constraints improving predictability during large dataset processing.
2026-03-13 19:03:50 +01:00
47 changed files with 1836 additions and 161 deletions
+2 -2
@@ -10,10 +10,10 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Setup Go
uses: actions/setup-go@v2
uses: actions/setup-go@v5
with:
go-version: '1.23'
- name: Checkout obitools4 project
uses: actions/checkout@v4
uses: actions/checkout@v5
- name: Run tests
run: make githubtests
+23 -19
@@ -16,9 +16,9 @@ jobs:
- name: Setup Go
uses: actions/setup-go@v5
with:
go-version: "1.23"
go-version: "1.26"
- name: Checkout obitools4 project
uses: actions/checkout@v4
uses: actions/checkout@v5
- name: Run tests
run: make githubtests
@@ -49,12 +49,12 @@ jobs:
steps:
- name: Checkout code
uses: actions/checkout@v4
uses: actions/checkout@v5
- name: Setup Go
uses: actions/setup-go@v5
with:
go-version: "1.23"
go-version: "1.26"
- name: Extract version from tag
id: get_version
@@ -62,12 +62,6 @@ jobs:
TAG=${GITHUB_REF#refs/tags/Release_}
echo "version=$TAG" >> $GITHUB_OUTPUT
- name: Install build tools (Linux)
if: runner.os == 'Linux'
run: |
sudo apt-get update -q
sudo apt-get install -y musl-tools
- name: Install build tools (macOS)
if: runner.os == 'macOS'
run: |
@@ -75,20 +69,30 @@ jobs:
xcode-select --install 2>/dev/null || true
xcode-select -p
- name: Build binaries
- name: Build binaries (Linux)
if: runner.os == 'Linux'
env:
VERSION: ${{ steps.get_version.outputs.version }}
run: |
docker run --rm \
-v "$(pwd):/src" \
-w /src \
-e VERSION="${VERSION}" \
golang:1.26-alpine \
sh -c "apk add --no-cache gcc musl-dev zlib-dev zlib-static make && \
make LDFLAGS='-linkmode=external -extldflags=-static' obitools"
mkdir -p artifacts
tar -czf artifacts/obitools4_${VERSION}_${{ matrix.output_name }}.tar.gz -C build .
- name: Build binaries (macOS)
if: runner.os == 'macOS'
env:
GOOS: ${{ matrix.goos }}
GOARCH: ${{ matrix.goarch }}
VERSION: ${{ steps.get_version.outputs.version }}
CC: ${{ matrix.goos == 'linux' && 'musl-gcc' || '' }}
run: |
if [ "$GOOS" = "linux" ]; then
make LDFLAGS='-linkmode=external -extldflags=-static' obitools
else
make obitools
fi
make obitools
mkdir -p artifacts
# Create a single tar.gz with all binaries for this platform
tar -czf artifacts/obitools4_${VERSION}_${{ matrix.output_name }}.tar.gz -C build .
- name: Upload artifacts
@@ -103,7 +107,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v4
uses: actions/checkout@v5
with:
fetch-depth: 0
+1 -1
@@ -23,7 +23,7 @@ xx
/.vscode
/build
/bugs
autodoc
/ncbitaxo
!/obitests/**
+2 -1
@@ -229,7 +229,8 @@ jjpush-tag:
install_section=$$'\n## Installation\n\n### Pre-built binaries\n\nDownload the appropriate archive for your system from the\n[release assets](https://github.com/metabarcoding/obitools4/releases/tag/Release_'"$$version"')\nand extract it:\n\n#### Linux (AMD64)\n```bash\ntar -xzf obitools4_'"$$version"'_linux_amd64.tar.gz\n```\n\n#### Linux (ARM64)\n```bash\ntar -xzf obitools4_'"$$version"'_linux_arm64.tar.gz\n```\n\n#### macOS (Intel)\n```bash\ntar -xzf obitools4_'"$$version"'_darwin_amd64.tar.gz\n```\n\n#### macOS (Apple Silicon)\n```bash\ntar -xzf obitools4_'"$$version"'_darwin_arm64.tar.gz\n```\n\nAll OBITools4 binaries are included in each archive.\n\n### From source\n\nYou can also compile and install OBITools4 directly from source using the\ninstallation script:\n\n```bash\ncurl -L https://raw.githubusercontent.com/metabarcoding/obitools4/master/install_obitools.sh | bash -s -- --version '"$$version"'\n```\n\nBy default binaries are installed in `/usr/local/bin`. Use `--install-dir` to\nchange the destination and `--obitools-prefix` to add a prefix to command names:\n\n```bash\ncurl -L https://raw.githubusercontent.com/metabarcoding/obitools4/master/install_obitools.sh | \\\n bash -s -- --version '"$$version"' --install-dir ~/local --obitools-prefix k\n```\n'; \
release_message="$$release_title"$$'\n\n'"$$release_body$$install_section"; \
echo "$(BLUE)→ Creating tag $$tag_name...$(NC)"; \
git tag -a "$$tag_name" -m "$$release_message" 2>/dev/null || echo "$(YELLOW)⚠ Tag $$tag_name already exists$(NC)"; \
commit_hash=$$(jj log -r @ --no-graph -T 'commit_id' 2>/dev/null); \
git tag -a "$$tag_name" $${commit_hash:+"$$commit_hash"} -m "$$release_message" 2>/dev/null || echo "$(YELLOW)⚠ Tag $$tag_name already exists$(NC)"; \
echo "$(BLUE)→ Pushing tag $$tag_name...$(NC)"; \
git push origin "$$tag_name" 2>/dev/null || echo "$(YELLOW)⚠ Tag push failed or already pushed$(NC)"; \
rm -f /tmp/obitools4-release-title.txt /tmp/obitools4-release-body.txt
+5
@@ -37,6 +37,11 @@ func main() {
optionParser(os.Args)
if !obipairing.CLIHasPairedFiles() {
log.Error("You must provide both a forward file (-F) and a reverse file (-R)")
os.Exit(1)
}
obidefault.SetStrictReadWorker(2)
obidefault.SetStrictWriteWorker(2)
pairs, err := obipairing.CLIPairedSequence()
+13
@@ -1,6 +1,7 @@
package main
import (
"fmt"
"os"
log "github.com/sirupsen/logrus"
@@ -8,6 +9,7 @@ import (
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obidefault"
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiconvert"
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obimultiplex"
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obipairing"
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obitagpcr"
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
@@ -39,6 +41,17 @@ func main() {
obitagpcr.OptionSet)
optionParser(os.Args)
if obimultiplex.CLIAskConfigTemplate() {
fmt.Print(obimultiplex.CLIConfigTemplate())
os.Exit(0)
}
if !obipairing.CLIHasPairedFiles() {
log.Error("You must provide both a forward file (-F) and a reverse file (-R)")
os.Exit(1)
}
pairs, err := obipairing.CLIPairedSequence()
if err != nil {
+10
@@ -0,0 +1,10 @@
{
"people": [
"Software",
"Agreement",
"Module"
],
"projects": [
"Code"
]
}
+1 -1
@@ -6,7 +6,7 @@ require (
github.com/DavidGamba/go-getoptions v0.33.0
github.com/PaesslerAG/gval v1.2.4
github.com/barkimedes/go-deepcopy v0.0.0-20220514131651-17c30cfc62df
github.com/buger/jsonparser v1.1.1
github.com/buger/jsonparser v1.1.2
github.com/chen3feng/stl4go v0.1.1
github.com/dlclark/regexp2 v1.11.5
github.com/goccy/go-json v0.10.6
+2 -2
@@ -6,8 +6,8 @@ github.com/PaesslerAG/jsonpath v0.1.0 h1:gADYeifvlqK3R3i2cR5B4DGgxLXIPb3TRTH1mGi
github.com/PaesslerAG/jsonpath v0.1.0/go.mod h1:4BzmtoM/PI8fPO4aQGIusjGxGir2BzcV0grWtFzq1Y8=
github.com/barkimedes/go-deepcopy v0.0.0-20220514131651-17c30cfc62df h1:GSoSVRLoBaFpOOds6QyY1L8AX7uoY+Ln3BHc22W40X0=
github.com/barkimedes/go-deepcopy v0.0.0-20220514131651-17c30cfc62df/go.mod h1:hiVxq5OP2bUGBRNS3Z/bt/reCLFNbdcST6gISi1fiOM=
github.com/buger/jsonparser v1.1.1 h1:2PnMjfWD7wBILjqQbt530v576A/cAbQvEW9gGIpYMUs=
github.com/buger/jsonparser v1.1.1/go.mod h1:6RYKKt7H4d4+iWqouImQ9R2FZql3VbhNgx27UK13J/0=
github.com/buger/jsonparser v1.1.2 h1:frqHqw7otoVbk5M8LlE/L7HTnIq2v9RX6EJ48i9AxJk=
github.com/buger/jsonparser v1.1.2/go.mod h1:6RYKKt7H4d4+iWqouImQ9R2FZql3VbhNgx27UK13J/0=
github.com/chen3feng/stl4go v0.1.1 h1:0L1+mDw7pomftKDruM23f1mA7miavOj6C6MZeadzN2Q=
github.com/chen3feng/stl4go v0.1.1/go.mod h1:5ml3psLgETJjRJnMbPE+JiHLrCpt+Ajc2weeTECXzWU=
github.com/chengxilo/virtualterm v1.0.4 h1:Z6IpERbRVlfB8WkOmtbHiDbBANU7cimRIof7mk9/PwM=
+1
@@ -52,6 +52,7 @@ golang.org/x/image v0.6.0/go.mod h1:MXLdDR43H7cDJq5GEGXEVeeNhPgi+YYEQ2pC1byI1x0=
golang.org/x/mod v0.13.0 h1:I/DsJXRlw/8l/0c24sM9yb0T4z9liZTduXvdAWYiysY=
golang.org/x/mod v0.13.0/go.mod h1:hTbmBsO62+eylJbnUtE2MGJUyE7QWk4xUqPFrRgJ+7c=
golang.org/x/mod v0.14.0/go.mod h1:hTbmBsO62+eylJbnUtE2MGJUyE7QWk4xUqPFrRgJ+7c=
golang.org/x/net v0.38.0/go.mod h1:ivrbrMbzFq5J41QOQh0siUuly180yBYtLp+CKbEaFx8=
golang.org/x/net v0.52.0/go.mod h1:R1MAz7uMZxVMualyPXb+VaqGSa3LIaUqk0eEt3w36Sw=
golang.org/x/sync v0.0.0-20220722155255-886fb9371eb4 h1:uVc8UZUe6tr40fFVnUP5Oj+veunVezqYl9z7DYw9xzw=
golang.org/x/text v0.13.0 h1:ablQoSUd0tRdKxZewP80B+BaqeKJuVhuRxj/dkrun3k=
Binary file not shown.
Binary file not shown.
Binary file not shown.
@@ -0,0 +1,24 @@
>HELIUM_000100422_612GNAAXX:7:118:3572:14633#0/1_sub[28..126] {"count":10172,"merged_sample":{"26a_F040644":10172},"obitag_bestid":0.9797979797979798,"obitag_bestmatch":"AY227529","obitag_match_count":1,"obitag_rank":"genus","obitag_similarity_method":"lcs","taxid":"taxon:9992 [Marmota]@genus"}
ttagccctaaacataaacattcaataaacaagaatgttcgccagagtactactagcaaca
gcctgaaactcaaaggacttggcggtgctttacatccct
>HELIUM_000100422_612GNAAXX:7:99:9351:13090#0/1_sub[28..127] {"count":260,"merged_sample":{"29a_F260619":260},"obitag_bestid":0.9405940594059405,"obitag_bestmatch":"AF154263","obitag_match_count":9,"obitag_rank":"infraorder","obitag_similarity_method":"lcs","taxid":"taxon:35500 [Pecora]@infraorder"}
ttagccctaaacacaaataattacacaaacaaaattgttcaccagagtactagcggcaac
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:108:10111:9078#0/1_sub[28..127] {"count":7146,"merged_sample":{"13a_F730603":7146},"obitag_bestid":1,"obitag_bestmatch":"AB245427","obitag_match_count":1,"obitag_rank":"species","obitag_similarity_method":"lcs","taxid":"taxon:9860 [Cervus elaphus]@species"}
ctagccttaaacacaaatagttatgcaaacaaaactattcgccagagtactaccggcaat
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:38:14204:12725#0/1_sub[28..126] {"count":87,"merged_sample":{"26a_F040644":87},"obitag_bestid":0.9494949494949495,"obitag_bestmatch":"AY227530","obitag_match_count":2,"obitag_rank":"tribe","obitag_similarity_method":"lcs","taxid":"taxon:337730 [Marmotini]@tribe"}
ttagccctaaacataaacattcaataaacaagaatgttcgccagaggactactagcaata
gcttaaaactcaaaggacttggcggtgctttatatccct
>HELIUM_000100422_612GNAAXX:7:30:9942:4495#0/1_sub[28..126] {"count":95,"merged_sample":{"26a_F040644":11,"29a_F260619":84},"obitag_bestid":0.9595959595959596,"obitag_bestmatch":"AC187326","obitag_match_count":1,"obitag_rank":"subspecies","obitag_similarity_method":"lcs","taxid":"taxon:9615 [Canis lupus familiaris]@subspecies"}
ttagccctaaacataagctattccataacaaaataattcgccagagaactactagcaaca
gattaaacctcaaaggacttggcagtgctttatacccct
>HELIUM_000100422_612GNAAXX:7:51:16702:19393#0/1_sub[28..127] {"count":12004,"merged_sample":{"15a_F730814":7465,"29a_F260619":4539},"obitag_bestid":1,"obitag_bestmatch":"AJ885202","obitag_match_count":1,"obitag_rank":"species","obitag_similarity_method":"lcs","taxid":"taxon:9858 [Capreolus capreolus]@species"}
ttagccctaaacacaagtaattaatataacaaaattattcgccagagtactaccggcaat
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:84:14502:1617#0/1_sub[28..127] {"count":319,"merged_sample":{"29a_F260619":319},"obitag_bestid":1,"obitag_bestmatch":"AJ972683","obitag_match_count":1,"obitag_rank":"species","obitag_similarity_method":"lcs","taxid":"taxon:9858 [Capreolus capreolus]@species"}
ttagccctaaacacaagtaattattataacaaaattattcgccagagtactaccggcaat
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:50:10637:6527#0/1_sub[28..126] {"count":366,"merged_sample":{"13a_F730603":13,"15a_F730814":5,"26a_F040644":347,"29a_F260619":1},"obitag_bestid":1,"obitag_bestmatch":"AB048590","obitag_match_count":1,"obitag_rank":"genus","obitag_similarity_method":"lcs","taxid":"taxon:9611 [Canis]@genus"}
ttagccctaaacatagataattttacaacaaaataattcgccagaggactactagcaata
gcttaaaactcaaaggacttggcggtgctttatatccct
+48
@@ -0,0 +1,48 @@
taxid,parent,taxonomic_rank,scientific_name
taxon:1 [root]@no rank,taxon:1 [root]@no rank,no rank,root
taxon:131567 [cellular organisms]@cellular root,taxon:1 [root]@no rank,cellular root,cellular organisms
taxon:2759 [Eukaryota]@domain,taxon:131567 [cellular organisms]@cellular root,domain,Eukaryota
taxon:33154 [Opisthokonta]@clade,taxon:2759 [Eukaryota]@domain,clade,Opisthokonta
taxon:33208 [Metazoa]@kingdom,taxon:33154 [Opisthokonta]@clade,kingdom,Metazoa
taxon:6072 [Eumetazoa]@clade,taxon:33208 [Metazoa]@kingdom,clade,Eumetazoa
taxon:33213 [Bilateria]@clade,taxon:6072 [Eumetazoa]@clade,clade,Bilateria
taxon:33511 [Deuterostomia]@clade,taxon:33213 [Bilateria]@clade,clade,Deuterostomia
taxon:7711 [Chordata]@phylum,taxon:33511 [Deuterostomia]@clade,phylum,Chordata
taxon:89593 [Craniata]@subphylum,taxon:7711 [Chordata]@phylum,subphylum,Craniata
taxon:7742 [Vertebrata]@clade,taxon:89593 [Craniata]@subphylum,clade,Vertebrata
taxon:7776 [Gnathostomata]@clade,taxon:7742 [Vertebrata]@clade,clade,Gnathostomata
taxon:117570 [Teleostomi]@clade,taxon:7776 [Gnathostomata]@clade,clade,Teleostomi
taxon:117571 [Euteleostomi]@clade,taxon:117570 [Teleostomi]@clade,clade,Euteleostomi
taxon:8287 [Sarcopterygii]@superclass,taxon:117571 [Euteleostomi]@clade,superclass,Sarcopterygii
taxon:1338369 [Dipnotetrapodomorpha]@clade,taxon:8287 [Sarcopterygii]@superclass,clade,Dipnotetrapodomorpha
taxon:32523 [Tetrapoda]@clade,taxon:1338369 [Dipnotetrapodomorpha]@clade,clade,Tetrapoda
taxon:32524 [Amniota]@clade,taxon:32523 [Tetrapoda]@clade,clade,Amniota
taxon:40674 [Mammalia]@class,taxon:32524 [Amniota]@clade,class,Mammalia
taxon:32525 [Theria]@clade,taxon:40674 [Mammalia]@class,clade,Theria
taxon:9347 [Eutheria]@clade,taxon:32525 [Theria]@clade,clade,Eutheria
taxon:1437010 [Boreoeutheria]@clade,taxon:9347 [Eutheria]@clade,clade,Boreoeutheria
taxon:314146 [Euarchontoglires]@superorder,taxon:1437010 [Boreoeutheria]@clade,superorder,Euarchontoglires
taxon:314145 [Laurasiatheria]@superorder,taxon:1437010 [Boreoeutheria]@clade,superorder,Laurasiatheria
taxon:33554 [Carnivora]@order,taxon:314145 [Laurasiatheria]@superorder,order,Carnivora
taxon:91561 [Artiodactyla]@order,taxon:314145 [Laurasiatheria]@superorder,order,Artiodactyla
taxon:314147 [Glires]@clade,taxon:314146 [Euarchontoglires]@superorder,clade,Glires
taxon:9845 [Ruminantia]@suborder,taxon:91561 [Artiodactyla]@order,suborder,Ruminantia
taxon:35500 [Pecora]@infraorder,taxon:9845 [Ruminantia]@suborder,infraorder,Pecora
taxon:9989 [Rodentia]@order,taxon:314147 [Glires]@clade,order,Rodentia
taxon:379584 [Caniformia]@suborder,taxon:33554 [Carnivora]@order,suborder,Caniformia
taxon:9608 [Canidae]@family,taxon:379584 [Caniformia]@suborder,family,Canidae
taxon:9850 [Cervidae]@family,taxon:35500 [Pecora]@infraorder,family,Cervidae
taxon:9881 [Odocoileinae]@subfamily,taxon:9850 [Cervidae]@family,subfamily,Odocoileinae
taxon:33553 [Sciuromorpha]@suborder,taxon:9989 [Rodentia]@order,suborder,Sciuromorpha
taxon:55153 [Sciuridae]@family,taxon:33553 [Sciuromorpha]@suborder,family,Sciuridae
taxon:34878 [Cervinae]@subfamily,taxon:9850 [Cervidae]@family,subfamily,Cervinae
taxon:9611 [Canis]@genus,taxon:9608 [Canidae]@family,genus,Canis
taxon:9857 [Capreolus]@genus,taxon:9881 [Odocoileinae]@subfamily,genus,Capreolus
taxon:9612 [Canis lupus]@species,taxon:9611 [Canis]@genus,species,Canis lupus
taxon:337726 [Xerinae]@subfamily,taxon:55153 [Sciuridae]@family,subfamily,Xerinae
taxon:9859 [Cervus]@genus,taxon:34878 [Cervinae]@subfamily,genus,Cervus
taxon:337730 [Marmotini]@tribe,taxon:337726 [Xerinae]@subfamily,tribe,Marmotini
taxon:9992 [Marmota]@genus,taxon:337730 [Marmotini]@tribe,genus,Marmota
taxon:9860 [Cervus elaphus]@species,taxon:9859 [Cervus]@genus,species,Cervus elaphus
taxon:9615 [Canis lupus familiaris]@subspecies,taxon:9612 [Canis lupus]@species,subspecies,Canis lupus familiaris
taxon:9858 [Capreolus capreolus]@species,taxon:9857 [Capreolus]@genus,species,Capreolus capreolus
+139 -15
@@ -44,7 +44,7 @@ cleanup() {
rm -rf "$TMPDIR" # Remove the temporary directory
if [ $failed -gt 0 ]; then
log "$TEST_NAME tests failed"
log
exit 1
@@ -60,10 +60,10 @@ log() {
echo -e "[$TEST_NAME @ $(date)] $*" 1>&2
}
log "Testing $TEST_NAME..."
log "Test directory is $TEST_DIR"
log "obitools directory is $OBITOOLS_DIR"
log "Temporary directory is $TMPDIR"
log "files: $(find $TEST_DIR | awk -F'/' '{print $NF}' | tail -n +2)"
######################################################################
@@ -94,12 +94,12 @@ log "files: $(find $TEST_DIR | awk -F'/' '{print $NF}' | tail -n +2)"
((ntest++))
if $CMD -h > "${TMPDIR}/help.txt" 2>&1
then
log "$MCMD: printing help OK"
((success++))
else
log "$MCMD: printing help failed"
((failed++))
fi
@@ -108,15 +108,15 @@ fi
if obiconvert -Z "${TEST_DIR}/gbpln1088.4Mb.fasta.gz" \
> "${TMPDIR}/xxx.fasta.gz" && \
zdiff "${TEST_DIR}/gbpln1088.4Mb.fasta.gz" \
"${TMPDIR}/xxx.fasta.gz"
then
log "$MCMD: converting large fasta file to fasta OK"
((success++))
else
log "$MCMD: converting large fasta file to fasta failed"
((failed++))
fi
((ntest++))
if obiconvert -Z --fastq-output \
"${TEST_DIR}/gbpln1088.4Mb.fasta.gz" \
@@ -125,15 +125,139 @@ if obiconvert -Z --fastq-output \
"${TMPDIR}/xxx.fastq.gz" \
> "${TMPDIR}/yyy.fasta.gz" && \
zdiff "${TEST_DIR}/gbpln1088.4Mb.fasta.gz" \
"${TMPDIR}/yyy.fasta.gz"
then
log "$MCMD: converting large file between fasta and fastq OK"
((success++))
else
log "$MCMD: converting large file between fasta and fastq failed"
((failed++))
fi
# ------------------------------------------------------------------
# --raw-taxid tests (no taxonomy loaded)
# ------------------------------------------------------------------
# Running test
((ntest++))
if obiconvert --raw-taxid "${TEST_DIR}/out_ecotag.fasta" \
> "${TMPDIR}/raw_taxid.fasta" 2>/dev/null
then
log "$MCMD --raw-taxid: running OK"
((success++))
else
log "$MCMD --raw-taxid: running failed"
((failed++))
fi
# Taxids must be bare numbers — no full-format "taxon:ID [Name]@rank" strings
((ntest++))
if grep '"taxid"' "${TMPDIR}/raw_taxid.fasta" | grep -qv '"taxid":"[0-9][0-9]*"'
then
log "$MCMD --raw-taxid: taxid format check failed (full-format taxid found)"
((failed++))
else
log "$MCMD --raw-taxid: taxid format OK (all taxids are bare numbers)"
((success++))
fi
# --raw-taxid is idempotent: piping through a second obiconvert --raw-taxid must
# produce bit-for-bit identical output.
((ntest++))
if obiconvert --raw-taxid "${TMPDIR}/raw_taxid.fasta" \
> "${TMPDIR}/raw_taxid2.fasta" 2>/dev/null
then
log "$MCMD --raw-taxid piped: running OK"
((success++))
else
log "$MCMD --raw-taxid piped: running failed"
((failed++))
fi
((ntest++))
if diff "${TMPDIR}/raw_taxid.fasta" \
"${TMPDIR}/raw_taxid2.fasta" > /dev/null
then
log "$MCMD --raw-taxid piped: idempotency OK"
((success++))
else
log "$MCMD --raw-taxid piped: idempotency failed (outputs differ)"
((failed++))
fi
# ------------------------------------------------------------------
# --taxonomy tests (full-format taxid, no --raw-taxid)
# ------------------------------------------------------------------
# Running test
((ntest++))
if obiconvert --taxonomy "${TEST_DIR}/taxonomy.csv" \
"${TEST_DIR}/out_ecotag.fasta" \
> "${TMPDIR}/taxo.fasta" 2>/dev/null
then
log "$MCMD --taxonomy: running OK"
((success++))
else
log "$MCMD --taxonomy: running failed"
((failed++))
fi
# Taxids must be in full "taxon:ID [Name]@rank" format
((ntest++))
if grep '"taxid"' "${TMPDIR}/taxo.fasta" | grep -q '"taxid":"taxon:[0-9]'
then
log "$MCMD --taxonomy: taxid format OK (full-format taxids present)"
((success++))
else
log "$MCMD --taxonomy: taxid format check failed (no full-format taxid found)"
((failed++))
fi
# ------------------------------------------------------------------
# --raw-taxid --taxonomy tests
# ------------------------------------------------------------------
# Running test
((ntest++))
if obiconvert --raw-taxid --taxonomy "${TEST_DIR}/taxonomy.csv" \
"${TEST_DIR}/out_ecotag.fasta" \
> "${TMPDIR}/raw_taxid_taxo.fasta" 2>/dev/null
then
log "$MCMD --raw-taxid --taxonomy: running OK"
((success++))
else
log "$MCMD --raw-taxid --taxonomy: running failed"
((failed++))
fi
# Taxids must be bare numbers even when taxonomy is loaded
((ntest++))
if grep '"taxid"' "${TMPDIR}/raw_taxid_taxo.fasta" | grep -qv '"taxid":"[0-9][0-9]*"'
then
log "$MCMD --raw-taxid --taxonomy: taxid format check failed (full-format taxid found)"
((failed++))
else
log "$MCMD --raw-taxid --taxonomy: taxid format OK (all taxids are bare numbers)"
((success++))
fi
# --raw-taxid with or without taxonomy must yield identical taxid values
((ntest++))
if diff <(grep '"taxid"' "${TMPDIR}/raw_taxid.fasta" | grep -o '"taxid":"[^"]*"' | sort) \
<(grep '"taxid"' "${TMPDIR}/raw_taxid_taxo.fasta" | grep -o '"taxid":"[^"]*"' | sort) \
> /dev/null
then
log "$MCMD --raw-taxid vs --raw-taxid --taxonomy: taxid values match OK"
((success++))
else
log "$MCMD --raw-taxid vs --raw-taxid --taxonomy: taxid values differ (unexpected)"
((failed++))
fi
#########################################
#
# At the end of the tests
+24
@@ -0,0 +1,24 @@
>HELIUM_000100422_612GNAAXX:7:118:3572:14633#0/1_sub[28..126] {"count":10172,"merged_sample":{"26a_F040644":10172},"obitag_bestid":0.9797979797979798,"obitag_bestmatch":"AY227529","obitag_match_count":1,"obitag_rank":"genus","obitag_similarity_method":"lcs","taxid":"taxon:9992 [Marmota]@genus"}
ttagccctaaacataaacattcaataaacaagaatgttcgccagagtactactagcaaca
gcctgaaactcaaaggacttggcggtgctttacatccct
>HELIUM_000100422_612GNAAXX:7:99:9351:13090#0/1_sub[28..127] {"count":260,"merged_sample":{"29a_F260619":260},"obitag_bestid":0.9405940594059405,"obitag_bestmatch":"AF154263","obitag_match_count":9,"obitag_rank":"infraorder","obitag_similarity_method":"lcs","taxid":"taxon:35500 [Pecora]@infraorder"}
ttagccctaaacacaaataattacacaaacaaaattgttcaccagagtactagcggcaac
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:108:10111:9078#0/1_sub[28..127] {"count":7146,"merged_sample":{"13a_F730603":7146},"obitag_bestid":1,"obitag_bestmatch":"AB245427","obitag_match_count":1,"obitag_rank":"species","obitag_similarity_method":"lcs","taxid":"taxon:9860 [Cervus elaphus]@species"}
ctagccttaaacacaaatagttatgcaaacaaaactattcgccagagtactaccggcaat
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:38:14204:12725#0/1_sub[28..126] {"count":87,"merged_sample":{"26a_F040644":87},"obitag_bestid":0.9494949494949495,"obitag_bestmatch":"AY227530","obitag_match_count":2,"obitag_rank":"tribe","obitag_similarity_method":"lcs","taxid":"taxon:337730 [Marmotini]@tribe"}
ttagccctaaacataaacattcaataaacaagaatgttcgccagaggactactagcaata
gcttaaaactcaaaggacttggcggtgctttatatccct
>HELIUM_000100422_612GNAAXX:7:30:9942:4495#0/1_sub[28..126] {"count":95,"merged_sample":{"26a_F040644":11,"29a_F260619":84},"obitag_bestid":0.9595959595959596,"obitag_bestmatch":"AC187326","obitag_match_count":1,"obitag_rank":"subspecies","obitag_similarity_method":"lcs","taxid":"taxon:9615 [Canis lupus familiaris]@subspecies"}
ttagccctaaacataagctattccataacaaaataattcgccagagaactactagcaaca
gattaaacctcaaaggacttggcagtgctttatacccct
>HELIUM_000100422_612GNAAXX:7:51:16702:19393#0/1_sub[28..127] {"count":12004,"merged_sample":{"15a_F730814":7465,"29a_F260619":4539},"obitag_bestid":1,"obitag_bestmatch":"AJ885202","obitag_match_count":1,"obitag_rank":"species","obitag_similarity_method":"lcs","taxid":"taxon:9858 [Capreolus capreolus]@species"}
ttagccctaaacacaagtaattaatataacaaaattattcgccagagtactaccggcaat
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:84:14502:1617#0/1_sub[28..127] {"count":319,"merged_sample":{"29a_F260619":319},"obitag_bestid":1,"obitag_bestmatch":"AJ972683","obitag_match_count":1,"obitag_rank":"species","obitag_similarity_method":"lcs","taxid":"taxon:9858 [Capreolus capreolus]@species"}
ttagccctaaacacaagtaattattataacaaaattattcgccagagtactaccggcaat
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:50:10637:6527#0/1_sub[28..126] {"count":366,"merged_sample":{"13a_F730603":13,"15a_F730814":5,"26a_F040644":347,"29a_F260619":1},"obitag_bestid":1,"obitag_bestmatch":"AB048590","obitag_match_count":1,"obitag_rank":"genus","obitag_similarity_method":"lcs","taxid":"taxon:9611 [Canis]@genus"}
ttagccctaaacatagataattttacaacaaaataattcgccagaggactactagcaata
gcttaaaactcaaaggacttggcggtgctttatatccct
+5 -5
@@ -631,9 +631,9 @@ func ReadCSVNGSFilter(reader io.Reader) (*obingslibrary.NGSLibrary, error) {
return nil, fmt.Errorf("row %d has %d columns, expected %d", len(data), len(fields), len(header))
}
forward_primer := fields[forward_primerColIndex]
reverse_primer := fields[reverse_primerColIndex]
tags := _parseMainNGSFilterTags(fields[sample_tagColIndex])
forward_primer := strings.TrimSpace(fields[forward_primerColIndex])
reverse_primer := strings.TrimSpace(fields[reverse_primerColIndex])
tags := _parseMainNGSFilterTags(strings.TrimSpace(fields[sample_tagColIndex]))
marker, _ := ngsfilter.GetMarker(forward_primer, reverse_primer)
pcr, ok := marker.GetPCR(tags.Forward, tags.Reverse)
@@ -644,8 +644,8 @@ func ReadCSVNGSFilter(reader io.Reader) (*obingslibrary.NGSLibrary, error) {
i, tags.Forward, tags.Reverse, forward_primer, reverse_primer)
}
pcr.Experiment = fields[experimentColIndex]
pcr.Sample = fields[sampleColIndex]
pcr.Experiment = strings.TrimSpace(fields[experimentColIndex])
pcr.Sample = strings.TrimSpace(fields[sampleColIndex])
if extraColumns != nil {
pcr.Annotations = make(obiseq.Annotation)
+18 -24
@@ -57,34 +57,21 @@ func (dist *IDistribute) Classifier() *obiseq.BioSequenceClassifier {
}
// Distribute organizes the biosequences from the iterator into batches
// based on the provided classifier and batch sizes. It returns an
// IDistribute instance that manages the distribution of the sequences.
// based on the provided classifier. It returns an IDistribute instance
// that manages the distribution of the sequences.
//
// Parameters:
// - class: A pointer to a BioSequenceClassifier used to classify
// the biosequences during distribution.
// - sizes: Optional integer values specifying the batch size. If
// no sizes are provided, a default batch size of 5000 is used.
//
// Returns:
// An IDistribute instance that contains the outputs of the
// classified biosequences, a channel for new data notifications,
// and the classifier used for distribution. The method operates
// asynchronously, processing the sequences in separate goroutines.
// It ensures that the outputs are closed and cleaned up once
// processing is complete.
func (iterator IBioSequence) Distribute(class *obiseq.BioSequenceClassifier, sizes ...int) IDistribute {
batchsize := obidefault.BatchSize()
// Batches are flushed when either BatchSizeMax() sequences or BatchMem()
// bytes are accumulated per key, mirroring the RebatchBySize strategy.
func (iterator IBioSequence) Distribute(class *obiseq.BioSequenceClassifier) IDistribute {
maxCount := obidefault.BatchSizeMax()
maxBytes := obidefault.BatchMem()
outputs := make(map[int]IBioSequence, 100)
slices := make(map[int]*obiseq.BioSequenceSlice, 100)
bufBytes := make(map[int]int, 100)
orders := make(map[int]int, 100)
news := make(chan int)
if len(sizes) > 0 {
batchsize = sizes[0]
}
jobDone := sync.WaitGroup{}
lock := sync.Mutex{}
@@ -115,6 +102,7 @@ func (iterator IBioSequence) Distribute(class *obiseq.BioSequenceClassifier, siz
slice = &s
slices[key] = slice
orders[key] = 0
bufBytes[key] = 0
lock.Lock()
outputs[key] = MakeIBioSequence()
@@ -123,14 +111,20 @@ func (iterator IBioSequence) Distribute(class *obiseq.BioSequenceClassifier, siz
news <- key
}
*slice = append(*slice, s)
if len(*slice) == batchsize {
sz := s.MemorySize()
countFull := maxCount > 0 && len(*slice) >= maxCount
memFull := maxBytes > 0 && bufBytes[key]+sz > maxBytes && len(*slice) > 0
if countFull || memFull {
outputs[key].Push(MakeBioSequenceBatch(source, orders[key], *slice))
orders[key]++
s := obiseq.MakeBioSequenceSlice()
slices[key] = &s
slice = &s
bufBytes[key] = 0
}
*slice = append(*slice, s)
bufBytes[key] += sz
}
}
+1 -1
@@ -47,7 +47,7 @@ func Encode4mer(seq *obiseq.BioSequence, buffer *[]byte) []byte {
length := slength - 3
rawseq := seq.Sequence()
if length < 0 {
if length <= 0 {
return nil
}
+90 -7
@@ -91,7 +91,7 @@ func LuaWorker(proto *lua.FunctionProto) obiseq.SeqWorker {
err := interpreter.PCall(0, lua.MultRet, nil)
if err != nil {
log.Fatalf("Error in executing the lua script")
log.Fatalf("Error in executing the lua script: %v", err)
}
result := interpreter.GetGlobal("worker")
@@ -141,6 +141,69 @@ func LuaWorker(proto *lua.FunctionProto) obiseq.SeqWorker {
return nil
}
// LuaSliceWorker creates a SeqSliceWorker that calls the Lua function
// named "slice_worker". Unlike LuaWorker, the entire batch (BioSequenceSlice)
// is passed to the Lua function at once, enabling batch-level processing
// (e.g. a single HTTP request per batch instead of one per sequence).
//
// The Lua function signature:
//
// function slice_worker(slice) -- receives a BioSequenceSlice
// -- process the batch
// return slice -- returns a BioSequenceSlice (or nil)
// end
func LuaSliceWorker(proto *lua.FunctionProto) obiseq.SeqSliceWorker {
interpreter := NewInterpreter()
lfunc := interpreter.NewFunctionFromProto(proto)
interpreter.Push(lfunc)
err := interpreter.PCall(0, lua.MultRet, nil)
if err != nil {
log.Fatalf("Error in executing the lua script: %v", err)
}
result := interpreter.GetGlobal("slice_worker")
if lua_worker, ok := result.(*lua.LFunction); ok {
f := func(slice obiseq.BioSequenceSlice) (obiseq.BioSequenceSlice, error) {
if err := interpreter.CallByParam(lua.P{
Fn: lua_worker,
NRet: 1,
Protect: true,
}, obiseqslice2Lua(interpreter, &slice)); err != nil {
log.Fatal(err)
}
lreponse := interpreter.Get(-1)
defer interpreter.Pop(1)
if reponse, ok := lreponse.(*lua.LUserData); ok {
s := reponse.Value
switch val := s.(type) {
case *obiseq.BioSequenceSlice:
return *val, nil
case *obiseq.BioSequence:
return obiseq.BioSequenceSlice{val}, nil
default:
r := reflect.TypeOf(val)
return nil, fmt.Errorf("slice_worker function doesn't return the correct type %s", r)
}
}
if _, ok = lreponse.(*lua.LNilType); ok {
return nil, nil
}
return nil, fmt.Errorf("slice_worker function doesn't return the correct type %T", lreponse)
}
return f
}
log.Fatalf("The slice_worker object is not a function")
return nil
}
// LuaProcessor processes a Lua script on a sequence iterator and returns a new iterator.
//
// Parameters:
@@ -173,7 +236,7 @@ func LuaProcessor(iterator obiiter.IBioSequence, name, program string, breakOnEr
err = interpreter.PCall(0, lua.MultRet, nil)
if err != nil {
log.Fatalf("Error in executing the lua script")
log.Fatalf("Error in executing the lua script: %v", err)
}
result := interpreter.GetGlobal("begin")
@@ -198,7 +261,7 @@ func LuaProcessor(iterator obiiter.IBioSequence, name, program string, breakOnEr
err = interpreter.PCall(0, lua.MultRet, nil)
if err != nil {
log.Fatalf("Error in executing the lua script")
log.Fatalf("Error in executing the lua script: %v", err)
}
result := interpreter.GetGlobal("finish")
@@ -216,11 +279,27 @@ func LuaProcessor(iterator obiiter.IBioSequence, name, program string, breakOnEr
}()
ff := func(iterator obiiter.IBioSequence) {
w := LuaWorker(proto)
sw := obiseq.SeqToSliceWorker(w, false)
// Detect whether the script defines slice_worker (batch-level) or worker (per-sequence).
hasSliceWorker := func() bool {
interpreter := NewInterpreter()
lfunc := interpreter.NewFunctionFromProto(proto)
interpreter.Push(lfunc)
if err := interpreter.PCall(0, lua.MultRet, nil); err != nil {
return false
}
result := interpreter.GetGlobal("slice_worker")
interpreter.Close()
_, ok := result.(*lua.LFunction)
return ok
}()
// iterator = iterator.SortBatches()
ff := func(iterator obiiter.IBioSequence) {
var sw obiseq.SeqSliceWorker
if hasSliceWorker {
sw = LuaSliceWorker(proto)
} else {
sw = obiseq.SeqToSliceWorker(LuaWorker(proto), false)
}
for iterator.Next() {
seqs := iterator.Get()
@@ -235,6 +314,10 @@ func LuaProcessor(iterator obiiter.IBioSequence, name, program string, breakOnEr
}
}
if ns == nil {
ns = obiseq.BioSequenceSlice{}
}
newIter.Push(obiiter.MakeBioSequenceBatch(seqs.Source(), seqs.Order(), ns))
}
+38 -48
@@ -17,15 +17,7 @@ import (
// No return values. This function operates directly on the Lua state stack.
func pushInterfaceToLua(L *lua.LState, val interface{}) {
switch v := val.(type) {
case string:
L.Push(lua.LString(v))
case bool:
L.Push(lua.LBool(v))
case int:
L.Push(lua.LNumber(v))
case float64:
L.Push(lua.LNumber(v))
// Add other cases as needed for different types
// Typed slices and maps from internal OBITools code — not produced by json.Unmarshal
case map[string]int:
pushMapStringIntToLua(L, v)
case map[string]string:
@@ -34,8 +26,6 @@ func pushInterfaceToLua(L *lua.LState, val interface{}) {
pushMapStringBoolToLua(L, v)
case map[string]float64:
pushMapStringFloat64ToLua(L, v)
case map[string]interface{}:
pushMapStringInterfaceToLua(L, v)
case []string:
pushSliceStringToLua(L, v)
case []int:
@@ -46,63 +36,63 @@ func pushInterfaceToLua(L *lua.LState, val interface{}) {
pushSliceNumericToLua(L, v)
case []bool:
pushSliceBoolToLua(L, v)
case []interface{}:
pushSliceInterfaceToLua(L, v)
case nil:
L.Push(lua.LNil)
case *sync.Mutex:
pushMutexToLua(L, v)
default:
log.Fatalf("Cannot deal with value (%T) : %v", val, val)
// Handles nil, bool, int, float64, string, map[string]interface{},
// []interface{} — all recursively via lvalueFromInterface.
L.Push(lvalueFromInterface(L, v))
}
}
func pushMapStringInterfaceToLua(L *lua.LState, m map[string]interface{}) {
// Create a new Lua table
luaTable := L.NewTable()
// Iterate over the Go map and set the key-value pairs in the Lua table
for key, value := range m {
switch v := value.(type) {
case int:
luaTable.RawSetString(key, lua.LNumber(v))
case float64:
luaTable.RawSetString(key, lua.LNumber(v))
case bool:
luaTable.RawSetString(key, lua.LBool(v))
case string:
luaTable.RawSetString(key, lua.LString(v))
default:
log.Fatalf("Doesn't deal with map containing value %v of type %T", v, v)
}
L.SetField(luaTable, key, lvalueFromInterface(L, value))
}
// Push the Lua table onto the stack
L.Push(luaTable)
}
func pushSliceInterfaceToLua(L *lua.LState, s []interface{}) {
// Create a new Lua table
luaTable := L.NewTable()
// Iterate over the Go map and set the key-value pairs in the Lua table
for _, value := range s {
switch v := value.(type) {
case int:
luaTable.Append(lua.LNumber(v))
case float64:
luaTable.Append(lua.LNumber(v))
case bool:
luaTable.Append(lua.LBool(v))
case string:
luaTable.Append(lua.LString(v))
default:
log.Fatalf("Doesn't deal with slice containing value %v of type %T", v, v)
}
luaTable.Append(lvalueFromInterface(L, value))
}
// Push the Lua table onto the stack
L.Push(luaTable)
}
// lvalueFromInterface converts a Go interface{} value (as produced by json.Unmarshal)
// to the corresponding lua.LValue, handling nested maps and slices recursively.
func lvalueFromInterface(L *lua.LState, value interface{}) lua.LValue {
switch v := value.(type) {
case nil:
return lua.LNil
case bool:
return lua.LBool(v)
case int:
return lua.LNumber(v)
case float64:
return lua.LNumber(v)
case string:
return lua.LString(v)
case map[string]interface{}:
t := L.NewTable()
for key, val := range v {
L.SetField(t, key, lvalueFromInterface(L, val))
}
return t
case []interface{}:
t := L.NewTable()
for _, val := range v {
t.Append(lvalueFromInterface(L, val))
}
return t
default:
log.Fatalf("lvalueFromInterface: unsupported type %T: %v", v, v)
return lua.LNil
}
}
// pushMapStringIntToLua creates a new Lua table and iterates over the Go map to set key-value pairs in the Lua table. It then pushes the Lua table onto the stack.
//
// L *lua.LState - the Lua state
+4
@@ -28,6 +28,8 @@ func Table2Interface(interpreter *lua.LState, table *lua.LTable) interface{} {
val[i-1] = float64(v.(lua.LNumber))
case lua.LTString:
val[i-1] = string(v.(lua.LString))
case lua.LTTable:
val[i-1] = Table2Interface(interpreter, v.(*lua.LTable))
}
}
return val
@@ -45,6 +47,8 @@ func Table2Interface(interpreter *lua.LState, table *lua.LTable) interface{} {
val[string(ks)] = float64(v.(lua.LNumber))
case lua.LTString:
val[string(ks)] = string(v.(lua.LString))
case lua.LTTable:
val[string(ks)] = Table2Interface(interpreter, v.(*lua.LTable))
}
}
})
+128
@@ -0,0 +1,128 @@
package obilua
import (
"context"
"io"
"net/http"
"strings"
"sync"
"time"
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obidefault"
lua "github.com/yuin/gopher-lua"
)
const httpClientTimeout = 300 * time.Second
var (
_httpClient *http.Client
_httpClientOnce sync.Once
// _httpSemaphore limits the number of concurrent HTTP requests.
// Initialised lazily alongside the client.
_httpSemaphore chan struct{}
)
func getHTTPClient() *http.Client {
_httpClientOnce.Do(func() {
conns := 2 * obidefault.ParallelWorkers()
_httpClient = &http.Client{
Transport: &http.Transport{
MaxIdleConnsPerHost: conns,
MaxConnsPerHost: conns,
IdleConnTimeout: 90 * time.Second,
},
Timeout: httpClientTimeout,
}
_httpSemaphore = make(chan struct{}, obidefault.ParallelWorkers())
})
return _httpClient
}
// RegisterHTTP registers the http module in the Lua state as a global,
// consistent with obicontext and BioSequence.
//
// Exposes:
//
// http.post(url, body [, timeout_ms]) → response string (on success)
// http.post(url, body [, timeout_ms]) → nil, err string (on error)
// http.set_concurrency(n) → set max simultaneous requests
func RegisterHTTP(luaState *lua.LState) {
table := luaState.NewTable()
luaState.SetField(table, "post", luaState.NewFunction(luaHTTPPost))
luaState.SetField(table, "set_concurrency", luaState.NewFunction(luaHTTPSetConcurrency))
luaState.SetGlobal("http", table)
}
// luaHTTPPost implements http.post(url, body [, timeout_ms]) for Lua.
//
// The optional third argument overrides the default timeout (in milliseconds).
// Concurrent requests are throttled through _httpSemaphore so that a
// single-threaded backend server is not overwhelmed by K parallel workers.
//
// Lua signature:
//
// local response = http.post(url, body)
// local response = http.post(url, body, 5000) -- 5 s timeout
// local response, err = http.post(url, body)
func luaHTTPPost(L *lua.LState) int {
url := L.CheckString(1)
body := L.CheckString(2)
client := getHTTPClient()
timeout := httpClientTimeout
if L.GetTop() >= 3 {
ms := L.CheckInt(3)
timeout = time.Duration(ms) * time.Millisecond
}
// Acquire semaphore slot — blocks until a slot is free.
_httpSemaphore <- struct{}{}
defer func() { <-_httpSemaphore }()
ctx, cancel := context.WithTimeout(context.Background(), timeout)
defer cancel()
req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, strings.NewReader(body))
if err != nil {
L.Push(lua.LNil)
L.Push(lua.LString(err.Error()))
return 2
}
req.Header.Set("Content-Type", "application/json")
resp, err := client.Do(req)
if err != nil {
L.Push(lua.LNil)
L.Push(lua.LString(err.Error()))
return 2
}
defer resp.Body.Close()
respBytes, err := io.ReadAll(resp.Body)
if err != nil {
L.Push(lua.LNil)
L.Push(lua.LString(err.Error()))
return 2
}
L.Push(lua.LString(respBytes))
return 1
}
// luaHTTPSetConcurrency replaces the semaphore with a new one of size n.
// Must be called before the first http.post (e.g. in begin()).
//
// Lua signature:
//
// http.set_concurrency(1) -- serialise all HTTP requests
func luaHTTPSetConcurrency(L *lua.LState) int {
n := L.CheckInt(1)
if n < 1 {
n = 1
}
getHTTPClient() // ensure singleton is initialised
_httpSemaphore = make(chan struct{}, n)
return 0
}
+71
@@ -0,0 +1,71 @@
package obilua
import (
"encoding/json"
lua "github.com/yuin/gopher-lua"
)
// RegisterJSON registers the json module in the Lua state as a global,
// consistent with obicontext, BioSequence, and http.
//
// Exposes:
//
// json.encode(data) → string (on success)
// json.encode(data) → nil, err (on error)
// json.decode(string) → value (on success)
// json.decode(string) → nil, err (on error)
func RegisterJSON(luaState *lua.LState) {
table := luaState.NewTable()
luaState.SetField(table, "encode", luaState.NewFunction(luaJSONEncode))
luaState.SetField(table, "decode", luaState.NewFunction(luaJSONDecode))
luaState.SetGlobal("json", table)
}
// luaJSONEncode implements json.encode(data) for Lua.
func luaJSONEncode(L *lua.LState) int {
val := L.CheckAny(1)
var goVal interface{}
switch v := val.(type) {
case *lua.LTable:
goVal = Table2Interface(L, v)
case lua.LString:
goVal = string(v)
case lua.LNumber:
goVal = float64(v)
case lua.LBool:
goVal = bool(v)
case *lua.LNilType:
goVal = nil
default:
L.Push(lua.LNil)
L.Push(lua.LString("json.encode: unsupported type"))
return 2
}
b, err := json.Marshal(goVal)
if err != nil {
L.Push(lua.LNil)
L.Push(lua.LString(err.Error()))
return 2
}
L.Push(lua.LString(b))
return 1
}
// luaJSONDecode implements json.decode(string) for Lua.
func luaJSONDecode(L *lua.LState) int {
s := L.CheckString(1)
var goVal interface{}
if err := json.Unmarshal([]byte(s), &goVal); err != nil {
L.Push(lua.LNil)
L.Push(lua.LString(err.Error()))
return 2
}
pushInterfaceToLua(L, goVal)
return 1
}
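Both Lua functions delegate to `encoding/json`, so the Go side sees only generic values. A sketch of that decoding layer in isolation (the helper name `decode` is illustrative):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decode parses arbitrary JSON into the generic Go shapes
// (map[string]interface{}, []interface{}, float64, string, bool, nil)
// that the Lua bindings above translate into tables and scalars.
func decode(s string) (interface{}, error) {
	var v interface{}
	err := json.Unmarshal([]byte(s), &v)
	return v, err
}

func main() {
	v, err := decode(`{"Human":{"query_001":{"Homo_sapiens":1.0}}}`)
	if err != nil {
		panic(err)
	}
	// All JSON numbers decode as float64 when the target is interface{}.
	score := v.(map[string]interface{})["Human"].(map[string]interface{})["query_001"].(map[string]interface{})["Homo_sapiens"].(float64)
	fmt.Println(score) // 1
}
```

The float64-only numbers explain why the Lua side receives every JSON number as an `LNumber`, with no integer/float distinction.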
+184
@@ -0,0 +1,184 @@
package obilua
import (
"testing"
lua "github.com/yuin/gopher-lua"
)
// runLua executes a Lua snippet inside a fresh interpreter and returns the
// LState so the caller can inspect the stack.
func runLua(t *testing.T, script string) *lua.LState {
t.Helper()
L := NewInterpreter()
if err := L.DoString(script); err != nil {
t.Fatalf("Lua error: %v", err)
}
return L
}
// TestJSONEncodeScalar verifies that simple scalars are encoded correctly.
func TestJSONEncodeScalar(t *testing.T) {
cases := []struct {
script string
expected string
}{
{`result = json.encode("hello")`, `"hello"`},
{`result = json.encode(42)`, `42`},
{`result = json.encode(true)`, `true`},
}
for _, tc := range cases {
L := runLua(t, tc.script)
got := string(L.GetGlobal("result").(lua.LString))
if got != tc.expected {
t.Errorf("encode(%s): got %q, want %q", tc.script, got, tc.expected)
}
L.Close()
}
}
// TestJSONEncodeTable verifies that a Lua table (array and map) encodes to JSON.
func TestJSONEncodeTable(t *testing.T) {
L := runLua(t, `result = json.encode({a = 1, b = "x"})`)
got := string(L.GetGlobal("result").(lua.LString))
// encoding/json sorts map keys, so the output is deterministic, but
// asserting on the exact string is brittle; the round-trip tests below
// cover correctness, so only check that something was produced.
L.Close()
if got == "" {
t.Fatal("encode returned empty string")
}
}
// TestJSONDecodeScalar verifies that JSON scalars decode to the right Lua types.
func TestJSONDecodeScalar(t *testing.T) {
L := runLua(t, `
s = json.decode('"hello"')
n = json.decode('3.14')
b = json.decode('true')
`)
if s, ok := L.GetGlobal("s").(lua.LString); !ok || string(s) != "hello" {
t.Errorf("decode string: got %v", L.GetGlobal("s"))
}
if n, ok := L.GetGlobal("n").(lua.LNumber); !ok || float64(n) != 3.14 {
t.Errorf("decode number: got %v", L.GetGlobal("n"))
}
if b, ok := L.GetGlobal("b").(lua.LBool); !ok || !bool(b) {
t.Errorf("decode bool: got %v", L.GetGlobal("b"))
}
L.Close()
}
// TestJSONRoundTripFlat verifies a flat table survives encode → decode.
func TestJSONRoundTripFlat(t *testing.T) {
L := runLua(t, `
original = {name = "Homo_sapiens", score = 1.0, valid = true}
encoded = json.encode(original)
decoded = json.decode(encoded)
`)
decoded, ok := L.GetGlobal("decoded").(*lua.LTable)
if !ok {
t.Fatal("decoded is not a table")
}
if v := decoded.RawGetString("name"); string(v.(lua.LString)) != "Homo_sapiens" {
t.Errorf("name: got %v", v)
}
if v := decoded.RawGetString("score"); float64(v.(lua.LNumber)) != 1.0 {
t.Errorf("score: got %v", v)
}
if v := decoded.RawGetString("valid"); !bool(v.(lua.LBool)) {
t.Errorf("valid: got %v", v)
}
L.Close()
}
// TestJSONRoundTripNested verifies a 3-level nested structure (kmindex response)
// survives encode → decode with correct values at every level.
func TestJSONRoundTripNested(t *testing.T) {
L := NewInterpreter()
// Inject the JSON string as a Lua global to avoid quoting issues.
L.SetGlobal("kmindex_json", lua.LString(
`{"Human":{"query_001":{"Homo_sapiens--GCF_000001405_40":1.0}}}`,
))
if err := L.DoString(`
data = json.decode(kmindex_json)
reencoded = json.encode(data)
data2 = json.decode(reencoded)
`); err != nil {
t.Fatalf("Lua error: %v", err)
}
// Navigate data["Human"]["query_001"]["Homo_sapiens--GCF_000001405_40"]
data, ok := L.GetGlobal("data").(*lua.LTable)
if !ok {
t.Fatal("data is not a table")
}
human, ok := data.RawGetString("Human").(*lua.LTable)
if !ok {
t.Fatal("data.Human is not a table")
}
query, ok := human.RawGetString("query_001").(*lua.LTable)
if !ok {
t.Fatal("data.Human.query_001 is not a table")
}
score, ok := query.RawGetString("Homo_sapiens--GCF_000001405_40").(lua.LNumber)
if !ok || float64(score) != 1.0 {
t.Errorf("score: got %v, want 1.0", query.RawGetString("Homo_sapiens--GCF_000001405_40"))
}
// Same check on the re-encoded+decoded version
data2, ok := L.GetGlobal("data2").(*lua.LTable)
if !ok {
t.Fatal("data2 is not a table")
}
score2 := data2.RawGetString("Human").(*lua.LTable).
RawGetString("query_001").(*lua.LTable).
RawGetString("Homo_sapiens--GCF_000001405_40").(lua.LNumber)
if float64(score2) != 1.0 {
t.Errorf("data2 score: got %v, want 1.0", score2)
}
L.Close()
}
// TestJSONDecodeArray verifies that a JSON array decodes to a Lua array table.
func TestJSONDecodeArray(t *testing.T) {
L := runLua(t, `arr = json.decode('[1, 2, 3]')`)
arr, ok := L.GetGlobal("arr").(*lua.LTable)
if !ok {
t.Fatal("arr is not a table")
}
for i, expected := range []float64{1, 2, 3} {
v, ok := arr.RawGetInt(i + 1).(lua.LNumber)
if !ok || float64(v) != expected {
t.Errorf("arr[%d]: got %v, want %v", i+1, arr.RawGetInt(i+1), expected)
}
}
L.Close()
}
// TestJSONEncodeNil verifies that json.encode(nil) is not an error:
// nil encodes to the JSON literal "null".
func TestJSONEncodeNil(t *testing.T) {
	L := runLua(t, `
		result, err = json.encode(nil)
		encode_ok = (result == "null") and (err == nil)
	`)
	if L.GetGlobal("encode_ok") != lua.LTrue {
		t.Errorf("json.encode(nil): got %v, want \"null\"", L.GetGlobal("result"))
	}
	L.Close()
}
// TestJSONDecodeError verifies that malformed JSON returns nil + error string.
func TestJSONDecodeError(t *testing.T) {
L := runLua(t, `
local result, err = json.decode("not valid json")
decode_ok = (result == nil)
decode_has_err = (err ~= nil)
`)
if L.GetGlobal("decode_ok") != lua.LTrue {
t.Error("expected nil result on decode error")
}
if L.GetGlobal("decode_has_err") != lua.LTrue {
t.Error("expected error string on decode error")
}
L.Close()
}
+2
@@ -5,4 +5,6 @@ import lua "github.com/yuin/gopher-lua"
func RegisterObilib(luaState *lua.LState) {
RegisterObiSeq(luaState)
RegisterObiTaxonomy(luaState)
RegisterHTTP(luaState)
RegisterJSON(luaState)
}
+2 -1
@@ -31,7 +31,8 @@ func obiseqslice2Lua(interpreter *lua.LState,
}
func newObiSeqSlice(luaState *lua.LState) int {
seqslice := obiseq.NewBioSequenceSlice()
capacity := luaState.OptInt(1, 0)
seqslice := obiseq.NewBioSequenceSlice(capacity)
luaState.Push(obiseqslice2Lua(luaState, seqslice))
return 1
}
+1 -1
@@ -3,7 +3,7 @@ package obioptions
// Version is automatically updated by the Makefile from version.txt
// The patch number (third digit) is incremented on each push to the repository
var _Version = "Release 4.4.22"
var _Version = "Release 4.4.42"
// Version returns the version of the obitools package.
//
+6
@@ -499,6 +499,9 @@ func (s *BioSequence) SetQualities(qualities Quality) {
if s.qualities != nil {
RecycleSlice(&s.qualities)
}
if len(qualities) > 0 && len(qualities) != len(s.sequence) {
log.Panicf("[BioSequence.SetQualities] Sequence %s has a length of %d and qualities a length of %d", s.id, len(s.sequence), len(qualities))
}
s.qualities = CopySlice(qualities)
}
@@ -508,6 +511,9 @@ func (s *BioSequence) TakeQualities(qualities Quality) {
if s.qualities != nil {
RecycleSlice(&s.qualities)
}
if len(qualities) > 0 && len(qualities) != len(s.sequence) {
log.Panicf("[BioSequence.TakeQualities] Sequence %s has a length of %d and qualities a length of %d", s.id, len(s.sequence), len(qualities))
}
s.qualities = qualities
}
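Both setters now enforce the same invariant: a non-empty quality slice must match the sequence length, and a mismatch panics. The check itself is simple; a standalone sketch of the invariant, returning an error instead of panicking (helper name illustrative):

```go
package main

import "fmt"

// checkQualities enforces the invariant added above: a quality slice is
// usable only if it is empty or exactly as long as the sequence.
func checkQualities(sequence, qualities []byte) error {
	if len(qualities) > 0 && len(qualities) != len(sequence) {
		return fmt.Errorf("sequence length %d != qualities length %d",
			len(sequence), len(qualities))
	}
	return nil
}

func main() {
	fmt.Println(checkQualities([]byte("ACGT"), []byte{40, 40, 40, 40}) == nil) // true
	fmt.Println(checkQualities([]byte("ACGT"), []byte{40, 40}) == nil)         // false
}
```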
+3
@@ -118,6 +118,9 @@ func (sequence *BioSequence) _revcmpMutation() *BioSequence {
*/
func ReverseComplementWorker(inplace bool) SeqWorker {
f := func(input *BioSequence) (BioSequenceSlice, error) {
if input.IsPaired() {
input.PairedWith().ReverseComplement(inplace)
}
return BioSequenceSlice{input.ReverseComplement(inplace)}, nil
}
+20 -2
@@ -48,7 +48,16 @@ func (sequence *BioSequence) Subsequence(from, to int, circular bool) (*BioSeque
newSeq.sequence = CopySlice(sequence.Sequence()[from:to])
if sequence.HasQualities() {
newSeq.qualities = CopySlice(sequence.Qualities()[from:to])
qual := sequence.Qualities()
if len(qual) != sequence.Len() {
log.Panicf(
"[BioSequence.Subsequence] Sequence %s has a length of %d and qualities a length of %d",
sequence.Id(),
sequence.Len(),
len(qual),
)
}
newSeq.qualities = CopySlice(qual[from:to])
}
newSeq.id = fmt.Sprintf("%s_sub[%d..%d]", sequence.Id(), from+1, to)
@@ -58,7 +67,16 @@ func (sequence *BioSequence) Subsequence(from, to int, circular bool) (*BioSeque
newSeq.Write(sequence.Sequence()[0:to])
if sequence.HasQualities() {
newSeq.WriteQualities(sequence.Qualities()[0:to])
qual := sequence.Qualities()
if len(qual) != sequence.Len() {
log.Panicf(
"[BioSequence.Subsequence] Sequence %s has a length of %d and qualities a length of %d",
sequence.Id(),
sequence.Len(),
len(qual),
)
}
newSeq.WriteQualities(qual[0:to])
}
}
+7 -1
@@ -70,6 +70,12 @@ func (s *BioSequence) SetTaxid(taxid string, rank ...string) {
}
}
} else if obidefault.UseRawTaxids() {
// Without a loaded taxonomy, extract the bare ID from full-format strings
// like "code:12345 [Name]@rank" so that --raw-taxid is honoured everywhere.
if _, rawID, _, _, parseErr := obitax.ParseTaxonString(taxid); parseErr == nil {
taxid = rawID
}
}
}
@@ -177,7 +183,7 @@ func (sequence *BioSequence) SetPath(taxonomy *obitax.Taxonomy) []string {
lpath := path.Len() - 1
for i := lpath; i >= 0; i-- {
spath[lpath-i] = path.Get(i).String(taxonomy.Code())
spath[lpath-i] = path.Get(i).FullString(taxonomy.Code())
}
sequence.SetAttribute("taxonomic_path", spath)
+4 -4
@@ -104,11 +104,11 @@ func SeqToSliceWorker(worker SeqWorker,
for _, s := range input {
r, err := worker(s)
if err == nil {
if i+len(r) > cap(output) {
output = slices.Grow(output[:i], len(r))
output = output[:cap(output)]
}
for _, rs := range r {
if i == len(output) {
output = slices.Grow(output, cap(output))
output = output[:cap(output)]
}
output[i] = rs
i++
}
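The fix above relies on `slices.Grow` semantics: it guarantees capacity for `n` more elements but leaves the length unchanged, so indexed writes past the old length need an explicit re-slice to capacity. A small sketch of that pattern (Go ≥ 1.21; the helper name is illustrative):

```go
package main

import (
	"fmt"
	"slices"
)

// growTo extends s so that at least extra more elements are addressable
// by index, mirroring the corrected pattern above: slices.Grow only
// raises capacity, so the slice must be re-sliced to cap afterwards.
func growTo(s []int, extra int) []int {
	s = slices.Grow(s, extra)
	return s[:cap(s)]
}

func main() {
	buf := make([]int, 2, 2)
	grown := growTo(buf, 3)
	fmt.Println(len(grown) >= 5) // true: indices 0..4 are now addressable
	grown[4] = 42
	fmt.Println(grown[4]) // 42
}
```

Writing `output[i]` without the `output[:cap(output)]` re-slice would panic with an index-out-of-range as soon as `i` reached the old length, which is the bug the hunk removes.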
+19 -13
@@ -29,6 +29,24 @@ type TaxNode struct {
alternatenames *map[*string]*string
}
// FullString returns the full string representation of the TaxNode in the form
// "taxonomyCode:id [scientificName]@rank", regardless of the UseRawTaxids setting.
// This is used internally when a parseable format is required (e.g. taxonomic_path).
func (node *TaxNode) FullString(taxonomyCode string) string {
if node.HasScientificName() {
return fmt.Sprintf("%s:%v [%s]@%s",
taxonomyCode,
*node.id,
node.ScientificName(),
node.Rank(),
)
}
return fmt.Sprintf("%s:%v",
taxonomyCode,
*node.id)
}
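`FullString` always emits the parseable `code:id [name]@rank` form. A standalone sketch of the formatter together with the bare-ID extraction that the `--raw-taxid` fallback performs (the regexp here is illustrative; the real parser is `obitax.ParseTaxonString`):

```go
package main

import (
	"fmt"
	"regexp"
)

// formatTaxon renders a taxon in the full "code:id [name]@rank" form.
func formatTaxon(code, id, name, rank string) string {
	return fmt.Sprintf("%s:%s [%s]@%s", code, id, name, rank)
}

// taxonRe pulls the bare ID back out of a full-format string; strings
// without a taxonomy code pass through unchanged.
var taxonRe = regexp.MustCompile(`^[^:]+:([^ ]+)(?: \[.*\]@.*)?$`)

func rawID(s string) string {
	if m := taxonRe.FindStringSubmatch(s); m != nil {
		return m[1]
	}
	return s
}

func main() {
	full := formatTaxon("NCBI", "9606", "Homo sapiens", "species")
	fmt.Println(full)        // NCBI:9606 [Homo sapiens]@species
	fmt.Println(rawID(full)) // 9606
}
```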
// String returns a string representation of the TaxNode, including the taxonomy code,
// the node ID, and the scientific name. The output format is "taxonomyCode:id [scientificName]".
//
@@ -42,19 +60,7 @@ func (node *TaxNode) String(taxonomyCode string) string {
return *node.id
}
if node.HasScientificName() {
return fmt.Sprintf("%s:%v [%s]@%s",
taxonomyCode,
*node.id,
node.ScientificName(),
node.Rank(),
)
}
return fmt.Sprintf("%s:%v",
taxonomyCode,
*node.id)
return node.FullString(taxonomyCode)
}
// Id returns the unique identifier of the TaxNode.
+1
@@ -33,6 +33,7 @@ func CLIWriteSequenceCSV(iterator obiiter.IBioSequence,
CSVSequence(CLIPrintSequence()),
CSVQuality(CLIPrintQuality()),
CSVAutoColumn(CLIAutoColumns()),
CSVNAValue(CLINAValue()),
)
csvIter := NewCSVSequenceIterator(iterator, opts...)
+13 -1
@@ -1,6 +1,7 @@
package obicsv
import (
"fmt"
"log"
"slices"
@@ -67,8 +68,19 @@ func CSVBatchFromSequences(batch obiiter.BioSequenceBatch, opt Options) obiiterc
if taxon != nil {
taxid = taxon.String()
} else if ta, ok := sequence.GetAttribute("taxid"); ok {
switch tv := ta.(type) {
case string:
taxid = tv
case int:
taxid = fmt.Sprintf("%d", tv)
case float64:
taxid = fmt.Sprintf("%d", int(tv))
default:
taxid = opt.CSVNAValue()
}
} else {
taxid = sequence.Taxid()
taxid = opt.CSVNAValue()
}
record["taxid"] = taxid
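The type switch above normalises whatever dynamic type the "taxid" attribute holds into a string, falling back to the CSV NA value. The same pattern in isolation (the NA value "NA" and the function name are examples, not the obitools defaults):

```go
package main

import "fmt"

// taxidToString renders a taxid attribute of unknown dynamic type.
// Attributes restored from JSON arrive as float64, hence the
// truncating case; anything unexpected falls back to the NA value.
func taxidToString(v interface{}, na string) string {
	switch tv := v.(type) {
	case string:
		return tv
	case int:
		return fmt.Sprintf("%d", tv)
	case float64:
		return fmt.Sprintf("%d", int(tv))
	default:
		return na
	}
}

func main() {
	fmt.Println(taxidToString("9606", "NA")) // 9606
	fmt.Println(taxidToString(9606.0, "NA")) // 9606
	fmt.Println(taxidToString(nil, "NA"))    // NA
}
```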
+1 -2
@@ -46,8 +46,7 @@ func CLIDistributeSequence(sequences obiiter.IBioSequence) {
formater = obiformats.WriteSequencesToFile
}
dispatcher := sequences.Distribute(CLISequenceClassifier(),
obidefault.BatchSize())
dispatcher := sequences.Distribute(CLISequenceClassifier())
obiformats.WriterDispatcher(CLIFileNamePattern(),
dispatcher, formater, opts...,
+4 -2
@@ -21,12 +21,10 @@ func PairingOptionSet(options *getoptions.GetOpt) {
options.StringVar(&_ForwardFile, "forward-reads", "",
options.Alias("F"),
options.ArgName("FILENAME_F"),
options.Required("You must provide at a forward file"),
options.Description("The file names containing the forward reads"))
options.StringVar(&_ReverseFile, "reverse-reads", "",
options.Alias("R"),
options.ArgName("FILENAME_R"),
options.Required("You must provide a reverse file"),
options.Description("The file names containing the reverse reads"))
options.IntVar(&_Delta, "delta", _Delta,
options.Alias("D"),
@@ -72,6 +70,10 @@ func CLIPairedSequence() (obiiter.IBioSequence, error) {
return paired, nil
}
func CLIHasPairedFiles() bool {
return _ForwardFile != "" && _ReverseFile != ""
}
func CLIDelta() int {
return _Delta
}
+27 -3
@@ -99,6 +99,17 @@ func (data1 *DataSummary) Add(data2 *DataSummary) *DataSummary {
rep.sample_singletons = sumUpdateIntMap(data1.sample_singletons, data2.sample_singletons)
rep.sample_obiclean_bad = sumUpdateIntMap(data1.sample_obiclean_bad, data2.sample_obiclean_bad)
for k, m1 := range data1.map_summaries {
rep.map_summaries[k] = m1
}
for k, m2 := range data2.map_summaries {
if m1, ok := rep.map_summaries[k]; ok {
rep.map_summaries[k] = sumUpdateIntMap(m1, m2)
} else {
rep.map_summaries[k] = m2
}
}
return rep
}
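The merge added to `DataSummary.Add` is a per-key sum of integer count maps, the operation `sumUpdateIntMap` performs: disjoint keys are kept, shared keys are summed. A minimal sketch (function name illustrative):

```go
package main

import "fmt"

// mergeCounts adds every count in b into a copy of a, keeping disjoint
// keys and summing shared ones, without mutating either input.
func mergeCounts(a, b map[string]int) map[string]int {
	out := make(map[string]int, len(a)+len(b))
	for k, v := range a {
		out[k] = v
	}
	for k, v := range b {
		out[k] += v
	}
	return out
}

func main() {
	m := mergeCounts(map[string]int{"sample1": 3},
		map[string]int{"sample1": 2, "sample2": 1})
	fmt.Println(m["sample1"], m["sample2"]) // 5 1
}
```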
@@ -163,8 +174,9 @@ func ISummary(iterator obiiter.IBioSequence, summarise []string) map[string]inte
summaries := make([]*DataSummary, nproc)
for n := 0; n < nproc; n++ {
summaries[n] = NewDataSummary()
for _, v := range summarise {
summaries[n].map_summaries[v] = make(map[string]int, 0)
summaries[n].map_summaries[v] = make(map[string]int)
}
}
@@ -174,6 +186,11 @@ func ISummary(iterator obiiter.IBioSequence, summarise []string) map[string]inte
batch := iseq.Get()
for _, seq := range batch.Slice() {
summary.Update(seq)
for _, attr := range summarise {
if m, ok := seq.GetIntMap(attr); ok {
summary.map_summaries[attr] = sumUpdateIntMap(summary.map_summaries[attr], m)
}
}
}
}
waiter.Done()
@@ -181,11 +198,9 @@ func ISummary(iterator obiiter.IBioSequence, summarise []string) map[string]inte
waiter.Add(nproc)
summaries[0] = NewDataSummary()
go ff(iterator, summaries[0])
for i := 1; i < nproc; i++ {
summaries[i] = NewDataSummary()
go ff(iterator.Split(), summaries[i])
}
@@ -246,5 +261,14 @@ func ISummary(iterator obiiter.IBioSequence, summarise []string) map[string]inte
}
}
}
if len(rep.map_summaries) > 0 {
mapDict := make(map[string]interface{}, len(rep.map_summaries))
for attr, counts := range rep.map_summaries {
mapDict[attr] = counts
}
dict["map_summaries"] = mapDict
}
return dict
}
+4 -4
@@ -114,10 +114,10 @@ func IPCRTagPESequencesBatch(iterator obiiter.IBioSequence,
aanot["obimultiplex_direction"] = direction
aanot["obimultiplex_forward_match"] = forward_match
aanot["obimultiplex_forward_mismatches"] = forward_mismatches
aanot["obimultiplex_forward_error"] = forward_mismatches
aanot["obimultiplex_reverse_match"] = reverse_match
aanot["obimultiplex_reverse_mismatches"] = reverse_mismatches
aanot["obimultiplex_reverse_error"] = reverse_mismatches
aanot["sample"] = sample
aanot["experiment"] = experiment
@@ -125,10 +125,10 @@ func IPCRTagPESequencesBatch(iterator obiiter.IBioSequence,
banot["obimultiplex_direction"] = direction
banot["obimultiplex_forward_match"] = forward_match
banot["obimultiplex_forward_mismatches"] = forward_mismatches
banot["obimultiplex_forward_error"] = forward_mismatches
banot["obimultiplex_reverse_match"] = reverse_match
banot["obimultiplex_reverse_mismatches"] = reverse_mismatches
banot["obimultiplex_reverse_error"] = reverse_mismatches
banot["sample"] = sample
banot["experiment"] = experiment
+302
@@ -0,0 +1,302 @@
# Objective
Fully document OBITools (version 4, written in Go) in English, using a 4-phase incremental pipeline.
You **MUST** use the available MCP servers:
- `cclsp`: exact definitions, references, diagnostics
- `jcodemunch`: code indexing, symbol extraction
- `treesitter`: AST and CLI parsing
- `context7`: external documentation
All tool calls must follow the exact API described in the MCP server documentation. If a required tool is unavailable, you **MUST** log the error and stop execution.
### Tool call format (CRITICAL)
Tool calls **MUST** use this exact XML format — no spaces inside the angle brackets:
```
<function=tool_name>
{"param": "value"}
</function>
```
**FORBIDDEN** — these variants will cause parse errors and must NEVER be used:
- `< function=tool_name >` (spaces around the tag name)
- `< function = tool_name>` (spaces around `=`)
- `<function = tool_name>` (space before `=`)
The opening tag is `<function=tool_name>` with **zero spaces** inside `<` and `>`.
---
# Global rules
**You must not read the same file twice in a row.**
## Language
- All generated documentation **MUST** be in English.
- If an existing documentation file is in French:
1. Translate it to English
2. Save the original as `.fr.md` **before** overwriting
3. Write the new English version
---
## Execution mode (STRICT)
You are operating in **STRICT TOOL MODE**:
- If a file must be written, you **MUST** use the `Shell` tool.
- You **MUST NOT** read entire directory listings into memory.
- You **MUST** work with **one item at a time** using a simple text file as a task queue.
### Reading files before writing
- **Before writing to an existing documentation file**, you must first read it using the `Read` tool.
- **When documenting a single Go source file**, you only need to read that one file (plus up to 4-5 helper files if needed for context).
- Do NOT read the entire codebase - only what is necessary to document the current file.
---
### Rules
- Always write the **full** file (no partial updates).
- Paths are relative to the project root; directories are created implicitly.
- Content must be valid UTF-8; use `\n` line endings.
- Do **not** wrap content in backticks.
---
## Progress tracking: task queue files
We use **line-oriented task files** to avoid loading large lists into memory. Each phase has its own task file:
- `docs/todo/phase1.txt`: the list of Go files (one per line) to document.
- `docs/todo/phase1bis.txt`: the same list, rebuilt once phase 1 is done.
- `docs/todo/phase2.txt`: the list of packages.
- `docs/todo/phase3.txt`: the list of tools.
**How it works:**
1. At the start of a phase, if the task file does not exist, it is created by scanning the codebase once (Phase 0 or Phase X init).
2. **Each run of the LLM processes only the first line of the task file.**
3. After processing the item (success or permanent failure), the line is removed from the task file.
- On success, the line is deleted (no extra sentinel file needed).
- On transient failure (retry < 3), we keep the line but increment a retry counter stored in a separate file.
- On permanent failure (retry ≥ 3), we move the line to a `failed.txt` file and log the error.
4. The LLM then exits (or continues if the task file is still non-empty, but it must never load more than one line).
This way, the LLM's context never holds more than a single task at a time.
### Retry mechanism
For each item (e.g., `internal/align/align.go`), we maintain a retry counter in:
- `docs/retry/phase1/internal/align/align.go.count`
If the file does not exist, retries = 0.
Each time processing fails, we increment the counter (write the new number).
If after increment the counter < 3, we keep the line in the task file.
If counter reaches 3, we **remove the line from the task file**, add it to `docs/failed/phase1/internal/align/align.go.failed` (just a marker), and log the error.
---
## Documentation quality requirements (CRITICAL)
Documentation MUST NOT be superficial. For each documented element (file, function, struct, package):
### You MUST explain:
- what it does
- why it exists (context, problem solved)
- how it is used
- assumptions and preconditions
- possible edge cases
### Forbidden patterns
- Vague phrases like “This function handles…”, “Utility for…”, “Helper function…”.
- Generic descriptions that could apply to any project.
### Required content per element type
- Functions:
- Purpose
- Parameter meaning
- Return values
- Notable behaviour (panic conditions, side effects, concurrency)
- Structs:
- Role in the system
- Meaning of key fields
- Files:
- Role within the package
- Interactions with other files
### Anti-generic rule
If the description could apply to any project, it is INVALID. You MUST include domain-specific context (bioinformatics, sequence processing, etc.) and concrete behaviour.
### Quality validation
Before marking an item as done (i.e., removing its line from the task file), you MUST perform a self-validation:
- Check that all required sections are present.
- Verify that no forbidden patterns remain.
If validation fails, increment the retry counter and keep the item pending.
---
# Directory structure
```
docs/
todo/ # task queues
phase1.txt
phase1bis.txt
phase2.txt
phase3.txt
retry/ # retry counters
phase1/ # mirrors file structure
internal/align/align.go.count
phase1bis/
phase2/
phase3/
failed/ # permanent failure markers
phase1/
internal/align/align.go.failed
phase1bis/
phase2/
phase3/
phase1/ # actual documentation
<relative_path>/<file>.go.md
phase2/
<package>.md
phase3/
<tool>.md
error.log
```
---
# Phase 0: Initialization
1. Ensure required directories exist: `docs/todo`, `docs/retry`, `docs/failed`, `docs/phase1`, `docs/phase2`, `docs/phase3`.
2. **If `docs/todo/phase1.txt` does not exist**:
- Use `find pkg -name "*.go" ! -name "*_test.go" ! -path "*/cmd/*"` to list all Go files (excluding tests and anything under `cmd/`).
- Write the list (one relative path per line, e.g., `internal/align/align.go`) to `docs/todo/phase1.txt`.
3. Do the same for phase2 and phase3 later when those phases start.
4. **No other state is stored.**
---
# Phase 1: File documentation
**Processing rule:**
- Read the **first line** of `docs/todo/phase1.txt` (using `head -n 1`).
- If the file is empty, Phase 1 is complete → proceed to Phase 1bis initialization.
- Otherwise, process that single file.
**Processing a file:**
1. Let `relpath` be the line content (e.g., `internal/align/align.go`).
2. Check if a permanent failure marker exists at `docs/failed/phase1/${relpath}.failed`. If yes, remove the line from the task file and skip (line will be deleted).
3. If the documentation file `docs/phase1/${relpath}.go.md` exists go directly to its validation (step 6).
4. Otherwise, generate documentation for that file (using MCP tools as before).
5. Write the documentation to `docs/phase1/${relpath}.go.md`.
6. Validate quality.
7. If validation succeeds:
- Remove the line from the task file.
- Remove any retry counter file for this item.
- (No sentinel needed; the removal from todo indicates completion.)
8. If validation fails:
- Increment retry counter:
- If `docs/retry/phase1/${relpath}.count` does not exist, set to 1.
- Else read it, add 1, write back.
- If new counter >= 3:
- Remove line from task file.
- Create `docs/failed/phase1/${relpath}.failed`.
- Log error.
- If new counter < 3:
- Keep the line in the task file (do nothing, it stays as first line for next run).
9. **Exit** (or stop if this was a single run). The next invocation will read the first line again (same if retry, or next if removed).
**Important:**
- Do **not** read more than one line.
- Do **not** attempt to process multiple items in one run.
- The LLM should finish after handling one item.
---
# Phase 1bis: Review and harmonization
When Phase 1 is complete (i.e., `docs/todo/phase1.txt` empty), we initialize `docs/todo/phase1bis.txt` with the same list of files (the ones that succeeded).
But note: we need to know which files were successfully documented. Since successful lines were removed from `phase1.txt`, we regenerate the list from the existing `.go.md` files under `docs/phase1/` (every successfully documented file has one).
Thus, Phase 1bis initialization:
- If `docs/todo/phase1bis.txt` does not exist, create it by listing all `.go.md` files under `docs/phase1/`, stripping the `docs/phase1/` prefix and the `.go.md` suffix, and writing the relative path (same format as phase1).
Then processing is identical to Phase 1, but it uses `docs/todo/phase1bis.txt`, and the output overwrites the same `.go.md` files (with improvements). Retry counters go in `docs/retry/phase1bis/`.
---
# Phase 2: Package documentation
When Phase 1bis is complete (`docs/todo/phase1bis.txt` empty), initialize `docs/todo/phase2.txt`:
- List all packages: unique directories under `pkg/` that contain at least one `.go` file and are not tools.
- Write each package identifier (e.g., `align`, `internal/align`) as a line.
Processing: read first line, generate `docs/phase2/<package>.md`, validate, remove line on success, retry logic in `docs/retry/phase2/`.
---
# Phase 3: Tool documentation
When Phase 2 complete, initialize `docs/todo/phase3.txt`:
- List all directories under `cmd/` that contain a `main.go`. Write each tool name as a line.
Processing: read first line, generate `docs/phase3/<tool>.md`, validate, remove line on success, retry logic in `docs/retry/phase3/`.
---
# Finalization
When all task files are empty and no pending phases, generate `docs/README.md` by:
- Listing all package docs (files in `docs/phase2/`) and linking.
- Listing all tool docs (files in `docs/phase3/`) and linking.
Write using `Shell`.
---
# Execution flow summary
1. **Phase 0**: Create directories and initial `todo/phase1.txt` if missing. Exit.
2. **Phase 1**:
- If `todo/phase1.txt` exists and is non-empty → process first line.
- Else → move to Phase 1bis initialization.
3. **Phase 1bis**:
- If `todo/phase1bis.txt` does not exist → create from successful phase1 docs.
- If non-empty → process first line.
- Else → move to Phase 2 initialization.
4. **Phase 2**: similar.
5. **Phase 3**: similar.
6. **Finalization**: generate README.
The LLM should be invoked repeatedly (e.g., by a scheduler) until all phases are done. Each invocation processes exactly one item.
---
# Important reminders
- Always call `Shell` to write files; never output content in plain text.
- Validate quality before removing a line from the task file.
- Log all failures to `docs/error.log` in JSON lines format.
- If any MCP tool fails, treat as failure and increment retry counter.
- Never read more than one line from a task file in a single run.
+222
@@ -0,0 +1,222 @@
//go:build ignore
package main
import (
"encoding/json"
"fmt"
"os"
"os/exec"
"path/filepath"
"regexp"
"sort"
"strconv"
"strings"
)
type Reference struct {
File string `json:"file"`
Line int `json:"line"`
Column int `json:"column"`
Key string `json:"key"`
Function string `json:"function"`
Context string `json:"context"`
}
type Result struct {
Method string `json:"method"`
Signature string `json:"signature"`
Definition string `json:"definition"`
References []Reference `json:"references"`
Total int `json:"total"`
}
var basePath = "/Users/coissac/Sync/travail/__MOI__/GO/obitools4"
func main() {
cmd := exec.Command("rg", "-n", `\.SetAttribute\(`, basePath+"/pkg", "--type", "go")
output, err := cmd.Output()
if err != nil {
fmt.Fprintf(os.Stderr, "Error running rg: %v\n", err)
os.Exit(1)
}
lines := strings.Split(string(output), "\n")
lineRe := regexp.MustCompile(`^(.+?):(\d+):\s*(.+)$`)
keyRe := regexp.MustCompile(`SetAttribute\("([^"]+)"`)
templateKeyRe := regexp.MustCompile(`SetAttribute\("([^"]+)[^"]*"\s*,`)
var refs []Reference
seen := make(map[string]bool)
for _, line := range lines {
line = strings.TrimSpace(line)
if line == "" {
continue
}
matches := lineRe.FindStringSubmatch(line)
if matches == nil {
continue
}
file := matches[1]
lineNum, _ := strconv.Atoi(matches[2])
context := strings.TrimSpace(matches[3])
// Skip definition
if strings.Contains(file, "obiseq/attributes.go") && lineNum == 132 {
continue
}
// Extract key
var key string
if keyMatches := keyRe.FindStringSubmatch(context); keyMatches != nil {
key = keyMatches[1]
} else if tmplMatches := templateKeyRe.FindStringSubmatch(context); tmplMatches != nil {
key = tmplMatches[1]
} else {
continue
}
// Get function name using treesitter
funcName := getFunctionNameTreesitter(file, lineNum)
uniqueKey := fmt.Sprintf("%s:%d", file, lineNum)
if seen[uniqueKey] {
continue
}
seen[uniqueKey] = true
refs = append(refs, Reference{
File: filepath.Base(file),
Line: lineNum,
Column: 0,
Key: key,
Function: funcName,
Context: context,
})
}
sort.Slice(refs, func(i, j int) bool {
if refs[i].File != refs[j].File {
return refs[i].File < refs[j].File
}
return refs[i].Line < refs[j].Line
})
result := Result{
Method: "SetAttribute",
Signature: "func (s *BioSequence) SetAttribute(key string, value interface{})",
Definition: basePath + "/pkg/obiseq/attributes.go:132",
References: refs,
Total: len(refs),
}
outputJSON, err := json.MarshalIndent(result, "", " ")
if err != nil {
fmt.Fprintf(os.Stderr, "Error marshaling JSON: %v\n", err)
os.Exit(1)
}
fmt.Println(string(outputJSON))
}
// getFunctionNameTreesitter uses the treesitter_cursor_walk tool to get the containing function
func getFunctionNameTreesitter(file string, targetLine int) string {
// Convert to 0-based for treesitter
row := targetLine - 1
// Use treesitter cursor walk to get ancestors
cmd := exec.Command("bash", "-c",
fmt.Sprintf(`kilo treesitter_cursor_walk --file_path %q --row %d --column 0 --max_depth 10 2>/dev/null`, file, row))
output, err := cmd.Output()
if err != nil {
return findContainingFunction(file, targetLine)
}
// Parse the JSON output to find function_declaration or method_declaration
	var result map[string]interface{}
	if err := json.Unmarshal(output, &result); err != nil {
		return findContainingFunction(file, targetLine)
	}
	// Check the ancestors for a function declaration
	if ancestors, ok := result["ancestors"].([]interface{}); ok {
		for _, a := range ancestors {
			if anc, ok := a.(map[string]interface{}); ok {
				nodeType, _ := anc["type"].(string)
				if nodeType == "function_declaration" || nodeType == "method_declaration" {
					// Try to get the function name from the children
					if children, ok := anc["children"].([]interface{}); ok {
						for _, c := range children {
							if child, ok := c.(map[string]interface{}); ok {
								childType, _ := child["type"].(string)
								if childType == "identifier" || childType == "field_identifier" {
									if text, ok := child["text"].(string); ok {
										return text
									}
								}
							}
						}
					}
				}
				if nodeType == "func_literal" {
					return "closure"
				}
			}
		}
	}
	return findContainingFunction(file, targetLine)
}

// Regexes matching method and plain function declarations, compiled once
// instead of on every loop iteration.
var (
	methodDeclRe = regexp.MustCompile(`func\s+\([^)]+\)\s+([a-zA-Z_][a-zA-Z0-9_]*)\s*\(`)
	funcDeclRe   = regexp.MustCompile(`func\s+([a-zA-Z_][a-zA-Z0-9_]*)\s*\(`)
)

// findContainingFunction scans backwards from targetLine for the enclosing
// "func" declaration and returns its name, or "" if none is found.
func findContainingFunction(file string, targetLine int) string {
	data, err := os.ReadFile(file)
	if err != nil {
		return ""
	}
	lines := strings.Split(string(data), "\n")
	for i := targetLine - 1; i >= 0 && i >= targetLine-200; i-- {
		if i >= len(lines) {
			continue
		}
		line := strings.TrimSpace(lines[i])
		if line == "}" && i > 0 {
			for j := i - 1; j >= 0 && j >= i-50; j-- {
				if j >= len(lines) {
					continue
				}
				funcLine := strings.TrimSpace(lines[j])
				if strings.HasPrefix(funcLine, "func ") {
					if match := methodDeclRe.FindStringSubmatch(funcLine); match != nil {
						return match[1]
					}
					if match := funcDeclRe.FindStringSubmatch(funcLine); match != nil {
						return match[1]
					}
				}
			}
			continue
		}
		if strings.HasPrefix(line, "func ") {
			if match := methodDeclRe.FindStringSubmatch(line); match != nil {
				return match[1]
			}
			if match := funcDeclRe.FindStringSubmatch(line); match != nil {
				return match[1]
			}
		}
	}
	return ""
}
@@ -0,0 +1,36 @@
#!/bin/bash
basePath="/Users/coissac/Sync/travail/__MOI__/GO/obitools4"
OUTPUT_FILE="${1:-/dev/stdout}"
# Get all SetAttribute calls
# Get all SetAttribute calls; rg -n output is "file:line:content",
# so split on the first two colons (the content itself may contain colons)
rg -n '\.SetAttribute\(' "$basePath/pkg" --type go | while read -r line; do
    file="${line%%:*}"
    rest="${line#*:}"
    line_num="${rest%%:*}"
    context="${rest#*:}"
    # Extract the key (literal string arguments only)
    key=$(echo "$context" | sed -n 's/.*SetAttribute("\([^"]*\)".*/\1/p')
    [ -z "$key" ] && continue
    # Get the enclosing function name using treesitter
    func=$(kilo treesitter_cursor_walk \
        --file_path "$file" \
        --row "$((line_num - 1))" \
        --column 0 \
        --max_depth 10 2>/dev/null |
        jq -r '.ancestors[] | select(.type == "function_declaration" or .type == "method_declaration") | .children[] | select(.type == "identifier" or .type == "field_identifier") | .text' 2>/dev/null)
    # Fall back to func_literal for closures
    if [ -z "$func" ]; then
        func=$(kilo treesitter_cursor_walk \
            --file_path "$file" \
            --row "$((line_num - 1))" \
            --column 0 \
            --max_depth 10 2>/dev/null |
            jq -r '.ancestors[] | select(.type == "func_literal") | "closure"' 2>/dev/null)
    fi
    echo "$(basename "$file")|$line_num|$key|${func:-unknown}|$context"
done | sort -t'|' -k1,1 -k2,2n > "$OUTPUT_FILE"
@@ -0,0 +1,308 @@
{
"obiannotate": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
],
"(git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiiter.IBioSequence).NumberSequences$1": [
"seq_number"
]
},
"obiclean": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
]
},
"obicleandb": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetCount": [
"count"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
]
},
"obicomplement": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
]
},
"obiconsensus": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetCount": [
"count"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
],
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiconsensus.BuildConsensus": [
"obiconsensus_kmer_max_occur",
"obiconsensus_filtered_graph_size",
"obiconsensus_full_graph_size",
"obiconsensus_consensus",
"obiconsensus_weight",
"obiconsensus_seq_length",
"obiconsensus_kmer_size"
],
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiconsensus.MinionClusterDenoise": [
"obiconsensus_consensus"
],
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiconsensus.MinionDenoise$1": [
"obiconsensus_consensus",
"obiconsensus_weight"
]
},
"obiconvert": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
]
},
"obicount": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
]
},
"obicsv": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
]
},
"obidemerge": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
]
},
"obidistribute": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
]
},
"obigrep": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
]
},
"obijoin": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
]
},
"obikmermatch": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
]
},
"obikmersimcount": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
]
},
"obilandmark": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetCoordinate": [
"landmark_coord"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetOBITagGeomRefIndex": [
"obitag_geomref_index"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
],
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obilandmark.CLISelectLandmarkSequences": [
"landmark_id"
]
},
"obimatrix": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
]
},
"obimicrosat": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
]
},
"obimultiplex": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
]
},
"obipairing": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence)._revcmpMutation": [
"pairing_mismatches"
],
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obialign.BuildQualityConsensus": [
"pairing_mismatches"
]
},
"obipcr": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
]
},
"obireffamidx": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetOBITagRefIndex": [
"obitag_ref_index"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
],
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obirefidx.IndexFamilyDB": [
"reffamidx_id"
]
},
"obirefidx": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetOBITagRefIndex": [
"obitag_ref_index"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
]
},
"obiscript": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
]
},
"obisplit": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
]
},
"obisummary": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
]
},
"obitag": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetPath": [
"taxonomic_path"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
]
},
"obitagpcr": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obingslibrary.NGSLibrary).ExtractMultiBarcode": [
"obimultiplex_error",
"obimultiplex_amplicon_rank"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence)._revcmpMutation": [
"pairing_mismatches"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence)._subseqMutation": [
"pairing_mismatches"
],
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obialign.BuildQualityConsensus": [
"pairing_mismatches"
]
},
"obitaxonomy": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetPath": [
"taxonomic_path"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
],
"(git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiiter.IBioSequence).NumberSequences$1": [
"seq_number"
]
},
"obiuniq": {
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetCount": [
"count"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetDefinition": [
"definition"
],
"(*git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq.BioSequence).SetTaxid": [
"taxid"
]
}
}
@@ -1 +1 @@
4.4.22
4.4.42
@@ -0,0 +1,19 @@
```markdown
# DNA Scoring and Matching Utilities in `obialign`
This module provides low-level utilities for computing nucleotide alignment scores using probabilistic and bit-encoded representations.
- **Bit Encoding**: Nucleotides are encoded in 4-bit groups (e.g., `A=0b0001`, `C=0b0010`, etc.), enabling efficient bitwise comparison.
- **`_MatchRatio(a, b)`**: Computes a normalized match ratio between two encoded bytes based on shared bits:
`ratio = common_bits / (bits_in_a × bits_in_b)`.
- **`_FourBitsCount`**: Precomputed lookup table for Hamming weight (popcount) of 4-bit values.
- **Log-space Arithmetic**: Helper functions (`_Logaddexp`, `_Logdiffexp`, `_Log1mexp`) ensure numerical stability in probabilistic computations.
- **Phred-scaled Quality Integration**:
`_MatchScoreRatio(QF, QR)` derives log-odds match/mismatch scores from Phred quality values (`QF`, `QR`), modeling sequencing error probabilities.
- **Precomputed Matrices**:
- `_NucPartMatch[i][j]`: Match ratios for all nucleotide pairs (from 4-bit codes).
- `_NucScorePartMatchMatch/Mismatch[i][j]`: Integer-scaled match/mismatch scores (×10) for quality pairs `(i, j)` in `[0..99]`.
- **Thread-Safe Initialization**: `_InitDNAScoreMatrix()` ensures one-time, synchronized initialization of all scoring tables via a mutex.
Designed for high-performance alignment kernels where speed and numerical robustness are critical.
```
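The 4-bit encoding and match-ratio scheme described in the summary can be sketched as follows. This is a minimal illustration of the idea, not the actual `obialign` internals: the constant values and the `matchRatio` helper are assumptions mirroring the bullet points above (one bit per base, ambiguity codes as unions of bits, `ratio = common_bits / (bits_in_a × bits_in_b)`).

```go
package main

import (
	"fmt"
	"math/bits"
)

// Hypothetical 4-bit nucleotide codes: one bit per base, so IUPAC
// ambiguity codes (e.g. R = A or G) are simply unions of bits.
const (
	A byte = 0b0001
	C byte = 0b0010
	G byte = 0b0100
	T byte = 0b1000
	R byte = A | G // purine ambiguity code
)

// matchRatio sketches the normalized match ratio from the summary:
// common_bits / (bits_in_a * bits_in_b), using the CPU popcount in
// place of the precomputed _FourBitsCount lookup table.
func matchRatio(a, b byte) float64 {
	common := bits.OnesCount8(a & b)
	na := bits.OnesCount8(a)
	nb := bits.OnesCount8(b)
	if na == 0 || nb == 0 {
		return 0
	}
	return float64(common) / float64(na*nb)
}

func main() {
	fmt.Println(matchRatio(A, A)) // identical bases: 1
	fmt.Println(matchRatio(A, C)) // disjoint bases: 0
	fmt.Println(matchRatio(A, R)) // A vs purine: 1/(1*2) = 0.5
}
```

With this encoding, a bitwise AND plus a popcount replaces a character comparison, which is why exact matches, mismatches, and partial matches against ambiguity codes all fall out of the same arithmetic.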