# Chunk reader — implementation

The `obiread` crate provides a streaming iterator that reads FASTA or FASTQ files in fixed-size blocks and yields self-contained chunks, each ending on a complete sequence record boundary. Chunks are consumed in parallel by downstream workers.

## Output type: Rope

Each chunk is a `Rope`, a segmented byte sequence: a `Vec` of blocks, where each block is a `Vec<u8>`. The consumer iterates over the blocks via a forward or backward cursor.

`Rope::split_off(pos)` splits at an absolute byte offset in O(log n) (binary search over the block-start index). If `pos` falls inside a block, that block is split in two via `Vec::split_off`. Only that one block's tail is copied; all other blocks are moved as-is, with no `memcpy` of their contents.

## SeqChunkIter

```rust
// Public signatures only; bodies omitted.
pub struct SeqChunkIter<R: Read> { /* private */ }

impl<R: Read> Iterator for SeqChunkIter<R> {
    type Item = io::Result<Rope>;
}

pub fn fasta_chunks<R: Read>(source: R) -> SeqChunkIter<R>
pub fn fastq_chunks<R: Read>(source: R) -> SeqChunkIter<R>
```

`next()` loop:

```text
1. read one block of block_size bytes → push onto Rope
2. call splitter(rope) → Option<usize>
   if Some(pos):
     tail  = rope.split_off(pos)          ← O(log n), may split one block
     chunk = mem::replace(&mut rope, tail)
     return Some(Ok(chunk))
   if None and not EOF: go back to step 1
3. if EOF and rope non-empty: return Some(Ok(rope)) as final chunk
4. if EOF and rope empty: return None
```

The `Splitter` function signature is `fn(&Rope) -> Option<usize>`. It returns the absolute byte offset of the start of the last complete record, or `None` if no boundary was found in the accumulated rope (more data is needed).

## Boundary detection — FASTA

Backward scan with a 2-state machine. It searches (right to left) for `>` followed by `\n` or `\r`, i.e., a `>` that is preceded by a newline in forward order:

```mermaid
stateDiagram-v2
    direction LR
    [*] --> Scanning
    Scanning --> FoundGt : '>'
    FoundGt --> Scanning : other
    FoundGt --> [*] : '\\n' / '\\r' ✓
```

Returns the byte offset of the `>` that starts the last complete record. Returns `None` if only one `>` is found (it cannot confirm that a prior complete record exists).

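
The backward FASTA scan can be sketched as follows. This is a minimal illustration, not the crate's code: it assumes a contiguous `&[u8]` instead of a `Rope` cursor, and the function name `fasta_boundary` is hypothetical. It also lets a second `>` keep the scanner in the `FoundGt` state, which is a small assumption beyond the diagram.

```rust
/// Hypothetical helper: offset of the last `>` that directly follows a
/// newline, scanning from the end of `buf` toward the start.
pub fn fasta_boundary(buf: &[u8]) -> Option<usize> {
    // `false` is the Scanning state, `true` is the FoundGt state.
    let mut found_gt = false;
    for i in (0..buf.len()).rev() {
        let c = buf[i];
        if found_gt {
            if c == b'\n' || c == b'\r' {
                // The `>` sits at i + 1 and starts a line: accept it.
                return Some(i + 1);
            }
            // Any other byte means the `>` was inside a line.
            // Assumption: another `>` keeps us in FoundGt.
            found_gt = c == b'>';
        } else if c == b'>' {
            found_gt = true;
        }
    }
    // No `>` preceded by a newline: more data is needed.
    None
}
```

With the input `>a\nACGT\n>b\nGG` the sketch returns `Some(8)`, the offset of `>b`. With a single record such as `>a\nACGT` it returns `None`, because the only `>` has no newline before it.
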
## Boundary detection — FASTQ

FASTQ records have a rigid 4-line structure (`@header`, sequence, `+`, quality). The `@` character (ASCII 64, Phred score 31 in Phred+33 encoding) can appear legitimately in quality lines, making any forward heuristic unreliable. The backward scanner verifies the full structural context before accepting a candidate `@`.

7-state machine (states 0–6), scanning from **right to left**. Each time a `+` is found, its position is saved as `restart`; any state mismatch resets the scan to that position.

```mermaid
stateDiagram-v2
    direction LR
    [*] --> Scanning
    Scanning --> FoundPlus : '+' (save restart)
    FoundPlus --> AfterNlPlus : '\\n' / '\\r'
    FoundPlus --> Scanning : other → backtrack
    AfterNlPlus --> AfterNlPlus : separator
    AfterNlPlus --> InSequence : letter / - / . / [ / ]
    AfterNlPlus --> Scanning : other → backtrack
    InSequence --> AfterSequence : '\\n' / '\\r'
    InSequence --> InSequence : letter / - / . / [ / ]
    InSequence --> Scanning : other → backtrack
    AfterSequence --> AfterSequence : '\\n' / '\\r'
    AfterSequence --> InHeader : other
    InHeader --> FoundAt : '@' (save cut)
    InHeader --> Scanning : '\\n' / '\\r' → backtrack
    InHeader --> InHeader : other
    FoundAt --> [*] : '\\n' / '\\r' ✓
    FoundAt --> InHeader : other
```

When any state fails its expected input, the scan jumps back to the most recent `restart` and continues leftward from there in the `Scanning` state. This guarantees that an `@` in a quality line cannot be accepted as a record start: an `@` is accepted only when the line it opens is followed, in forward order, by a sequence line and then a line starting with `+`.

Returns the byte offset of the `@` that starts the last complete record.
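
The state machine above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the crate's implementation: it works on a contiguous `&[u8]` rather than a `Rope` cursor, the names `fastq_boundary`, `is_newline` and `is_seq_char` are hypothetical, and "letter" is taken to mean any ASCII alphabetic byte.

```rust
/// Scanner states, named after the diagram.
#[derive(Clone, Copy)]
enum State { Scanning, FoundPlus, AfterNlPlus, InSequence, AfterSequence, InHeader, FoundAt }

fn is_newline(c: u8) -> bool {
    c == b'\n' || c == b'\r'
}

// Assumption: "letter" means any ASCII alphabetic byte.
fn is_seq_char(c: u8) -> bool {
    c.is_ascii_alphabetic() || matches!(c, b'-' | b'.' | b'[' | b']')
}

/// Hypothetical helper: offset of the `@` that opens the last record whose
/// header, sequence and `+` lines were all seen, scanning right to left.
pub fn fastq_boundary(buf: &[u8]) -> Option<usize> {
    use State::*;
    let mut state = Scanning;
    let mut restart = 0usize; // position of the most recent candidate `+`
    let mut cut = 0usize;     // position of the most recent candidate `@`
    let mut i = buf.len();
    while i > 0 {
        i -= 1;
        let c = buf[i];
        // `None` means a mismatch: backtrack to `restart`.
        let next = match state {
            Scanning if c == b'+' => { restart = i; Some(FoundPlus) }
            Scanning => Some(Scanning),
            FoundPlus if is_newline(c) => Some(AfterNlPlus),
            AfterNlPlus if is_newline(c) => Some(AfterNlPlus),
            AfterNlPlus | InSequence if is_seq_char(c) => Some(InSequence),
            InSequence if is_newline(c) => Some(AfterSequence),
            AfterSequence if is_newline(c) => Some(AfterSequence),
            AfterSequence => Some(InHeader),
            InHeader if c == b'@' => { cut = i; Some(FoundAt) }
            InHeader if is_newline(c) => None,
            InHeader => Some(InHeader),
            FoundAt if is_newline(c) => return Some(cut),
            FoundAt => Some(InHeader),
            _ => None,
        };
        match next {
            Some(s) => state = s,
            None => {
                // Resume just left of the rejected `+`; `restart` only
                // ever decreases, so the scan terminates.
                i = restart;
                state = Scanning;
            }
        }
    }
    None
}
```

For the input `@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\n@II`, where the truncated quality line of the second record starts with `@`, the sketch returns `Some(16)`, the offset of `@r2`, and not 27, the offset of the `@` in the quality line. A buffer holding a single record returns `None`, since the first `@` has no newline before it.
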