mirror of
https://github.com/metabarcoding/obitools4.git
synced 2026-04-30 12:00:39 +00:00
IFragments Functionality Overview
The IFragments() function in the obiiter package implements a parallelized sequence fragmentation pipeline for biological sequences. It is designed to split long nucleotide or protein sequences into smaller, overlapping fragments while preserving metadata and enabling concurrent processing.
Core Parameters
- minsize: Length threshold for fragmentation; only sequences longer than this value are split, shorter ones are not fragmented.
- length: Desired fragment size (in bases or amino acids).
- overlap: Number of residues shared by consecutive fragments.
- size, nworkers: Batch size and number of worker goroutines (currently unused in active logic).
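The relationship between these parameters can be sketched as follows. This is a minimal illustration, not code from obitools4: the fragmentParams type and its validate and step methods are hypothetical names, and whether IFragments() performs any such validation is not stated here. The only fact taken from this document is that the step between fragment starts equals length - overlap, which implies that overlap should be smaller than length for the window to advance.

```go
package main

import (
	"errors"
	"fmt"
)

// fragmentParams is a hypothetical container for the documented parameters;
// it is not a type from obitools4.
type fragmentParams struct {
	MinSize, Length, Overlap int
}

// step returns the distance between the starts of consecutive fragments.
func (p fragmentParams) step() int { return p.Length - p.Overlap }

// validate rejects parameter combinations for which the window would not
// advance. These checks are an assumption made for this sketch.
func (p fragmentParams) validate() error {
	switch {
	case p.Length <= 0:
		return errors.New("length must be positive")
	case p.Overlap < 0:
		return errors.New("overlap must not be negative")
	case p.Overlap >= p.Length:
		return errors.New("overlap must be smaller than length")
	}
	return nil
}

func main() {
	p := fragmentParams{MinSize: 200, Length: 300, Overlap: 50}
	if err := p.validate(); err != nil {
		fmt.Println("invalid parameters:", err)
		return
	}
	fmt.Println("step size:", p.step()) // prints "step size: 250"
}
```

With length = 300 and overlap = 50, each fragment starts 250 residues after the previous one.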
Workflow
- Batch Sorting: Input sequences are batched and sorted for efficient processing.
- Parallel Fragmentation:
  - Each worker processes a subset of batches independently using goroutines.
  - Each sequence longer than minsize is split into overlapping fragments of the requested size (length), with step size = length - overlap.
  - The final fragment is extended to cover the remainder (fusion mode), avoiding tiny trailing pieces.
- Resource Management:
  - Original sequences are recycled (s.Recycle()) to optimize memory usage.
  - Fragments are reassembled into batches, sorted by source and order, then rebatched to respect memory/size limits.
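The fragmentation step can be illustrated with a small coordinate calculation. The sketch below is not the obitools4 implementation: fragmentSpans and span are hypothetical names, and the fusion rule used here (extend the current fragment to the end of the sequence when the remainder is shorter than one step) is an assumption, since the exact threshold is not given in this overview.

```go
package main

import "fmt"

// span is a half-open interval [Start, End) on the source sequence.
type span struct{ Start, End int }

// fragmentSpans is a hypothetical helper that returns fragment coordinates
// for a sequence of seqLen residues.
func fragmentSpans(seqLen, minsize, length, overlap int) []span {
	step := length - overlap
	if step <= 0 {
		panic("overlap must be smaller than length")
	}
	// Sequences not longer than minsize are kept whole.
	if seqLen <= minsize {
		return []span{{0, seqLen}}
	}
	var spans []span
	for start := 0; start < seqLen; start += step {
		end := start + length
		if end > seqLen {
			end = seqLen
		}
		// Assumed fusion rule: a remainder shorter than one step is
		// absorbed into the current fragment, which becomes the last one.
		if seqLen-end < step {
			spans = append(spans, span{start, seqLen})
			break
		}
		spans = append(spans, span{start, end})
	}
	return spans
}

func main() {
	// 1000 residues, length 300, overlap 50, so the step is 250.
	fmt.Println(fragmentSpans(1000, 200, 300, 50))
	// prints "[{0 300} {250 550} {500 1000}]"
}
```

In this example the last fragment spans 500 residues instead of leaving a 200-residue tail, which is the effect described as fusion mode.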
Key Features
- Overlap handling: Ensures contiguous coverage without gaps.
- Memory efficiency: Uses recycling and batched output.
- Scalability: Leverages Go concurrency via nworkers.
- Error safety: Panics on subsequence errors (e.g., invalid indices).
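The combination of concurrent workers and ordered output can be sketched with standard Go primitives. This is a generic worker-pool pattern under stated assumptions, not the obiiter iterator machinery: batch, processBatches, and splitInHalf are hypothetical names, and the final sort stands in for the "sorted by source and order" step described in the workflow.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// batch is a hypothetical stand-in for an ordered batch of sequences.
type batch struct {
	Order int
	Seqs  []string
}

// processBatches fans batches out to nworkers goroutines, applies fn to
// each batch, and returns the results sorted by their original order.
func processBatches(in []batch, nworkers int, fn func(batch) batch) []batch {
	if nworkers < 1 {
		nworkers = 1
	}
	jobs := make(chan batch)
	results := make(chan batch)

	var wg sync.WaitGroup
	for w := 0; w < nworkers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for b := range jobs {
				results <- fn(b)
			}
		}()
	}
	// Feed the workers, then signal that no more batches will arrive.
	go func() {
		for _, b := range in {
			jobs <- b
		}
		close(jobs)
	}()
	// Close the results channel once every worker has finished.
	go func() {
		wg.Wait()
		close(results)
	}()

	var out []batch
	for b := range results {
		out = append(out, b)
	}
	// Workers finish in arbitrary order, so restore the batch order.
	sort.Slice(out, func(i, j int) bool { return out[i].Order < out[j].Order })
	return out
}

// splitInHalf is a toy fragmenter: each sequence becomes two pieces.
func splitInHalf(b batch) batch {
	res := batch{Order: b.Order}
	for _, s := range b.Seqs {
		mid := len(s) / 2
		res.Seqs = append(res.Seqs, s[:mid], s[mid:])
	}
	return res
}

func main() {
	in := []batch{
		{Order: 1, Seqs: []string{"GGCC"}},
		{Order: 0, Seqs: []string{"AAAATTTT"}},
	}
	for _, b := range processBatches(in, 2, splitInHalf) {
		fmt.Println(b.Order, b.Seqs)
	}
}
```

Keeping the order index on each batch is what lets the output be deterministic even though the workers run independently.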
Use Case
Ideal for preparing long-read sequencing data (e.g., PacBio, Nanopore) or assembled contigs for downstream analysis requiring fixed-length inputs (e.g., k-mer indexing, ML inference).