refactor: add rolling buffer methods and document label constraints
Added `is_empty()`, `clear()`, and `iter()` methods to the rolling statistics buffer to enable standard traversal and state reset operations. Documented genome label constraints, specifying forbidden characters, empty label rejection, space quoting requirements, and auto-derived label bypass rules. Additionally, updated doc comments and added `#[allow(dead_code)]` attributes for `kmer_offset` and `n_kmers` fields to suppress compiler warnings while reserving them for future `--detail` coverage vector logic.
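
The three accessors named above could look roughly like the following sketch. The commit does not show the buffer's real type, so `RollingBuffer`, its fields, `new`, `push`, and `mean` are hypothetical stand-ins; only the method names `is_empty()`, `clear()`, and `iter()` come from the commit message.

```rust
use std::collections::VecDeque;

/// Hypothetical fixed-capacity rolling buffer (an assumption for illustration,
/// not the type touched by this commit).
pub struct RollingBuffer {
    values: VecDeque<f64>,
    capacity: usize,
    sum: f64,
}

impl RollingBuffer {
    /// Create an empty buffer holding at most `capacity` values.
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self {
            values: VecDeque::with_capacity(capacity),
            capacity,
            sum: 0.0,
        }
    }

    /// Append a value, evicting the oldest one once the window is full.
    pub fn push(&mut self, value: f64) {
        if self.values.len() == self.capacity {
            if let Some(oldest) = self.values.pop_front() {
                self.sum -= oldest;
            }
        }
        self.values.push_back(value);
        self.sum += value;
    }

    /// True when no values are currently stored.
    pub fn is_empty(&self) -> bool {
        self.values.is_empty()
    }

    /// Drop all stored values and reset the running sum.
    pub fn clear(&mut self) {
        self.values.clear();
        self.sum = 0.0;
    }

    /// Iterate over stored values from oldest to newest.
    pub fn iter(&self) -> impl Iterator<Item = f64> + '_ {
        self.values.iter().copied()
    }

    /// Mean of the current window, or `None` when the buffer is empty.
    pub fn mean(&self) -> Option<f64> {
        if self.values.is_empty() {
            None
        } else {
            Some(self.sum / self.values.len() as f64)
        }
    }
}
```

Pairing `clear()` with a reset of any cached statistics (here the running sum) keeps the buffer reusable across inputs without reallocating.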
+15
-1
@@ -17,7 +17,7 @@
 | `unitig` | Dump unitigs from a built index to stdout (debug) |
 | `estimate` | Estimate approximate-index parameters (z, evidence bits, FP rates) before indexing |
 | `reindex` | Convert an index's evidence in-place: exact ↔ approx |
-| `utils` | Miscellaneous index utilities: `--new-label NEW=OLD` renames a genome label in-place |
+| `utils` | Miscellaneous index utilities: `--new-label NEW=OLD` renames a genome label in-place (NEW gets OLD's identity) |
 
 ## Constraints
 
@@ -27,6 +27,20 @@
 - Canonical form: `min(kmer, revcomp(kmer))` reduces strand-symmetric space by half
 - Input formats: FASTA, FASTQ, gzip, streaming stdin; `index` reads from stdin automatically when no input files are provided (`-` can also be passed explicitly among other paths)
 
+## Genome label constraints
+
+Genome labels are arbitrary Unicode strings with the following restrictions:
+
+| Character | Forbidden | Reason |
+|-----------|-----------|--------|
+| `/` | yes | filesystem path separator |
+| `=` | yes | `--new-label` parser separator |
+| `\0` | yes | null byte |
+| `\n` `\r` `\t` | yes | break CSV output |
+| spaces | **allowed** | use shell quoting: `--new-label 'new label=old label'` |
+
+Empty labels are also rejected. Labels derived automatically from the index directory name (when `--label` is omitted) are not validated since they come from the filesystem and are already safe.
+
 ## Priority operations
 
 - Kmer counting (frequencies)
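
The genome label rules documented in this diff can be sketched as a small validator. This is an illustration under stated assumptions, not the project's code: `LabelError`, `validate_label`, and `parse_new_label` are hypothetical names, and validating both sides of `NEW=OLD` is an assumption. Only the forbidden characters, the empty-label rejection, and the `NEW=OLD` argument shape come from the README text.

```rust
/// Hypothetical error type for label validation (an assumption for illustration).
#[derive(Debug, PartialEq, Eq)]
pub enum LabelError {
    /// The label was the empty string.
    Empty,
    /// The label contained one of the forbidden characters.
    ForbiddenChar(char),
    /// The `--new-label` argument had no `=` separator.
    MissingSeparator,
}

/// Characters the README lists as forbidden; spaces are deliberately absent.
const FORBIDDEN: [char; 6] = ['/', '=', '\0', '\n', '\r', '\t'];

/// Reject empty labels and labels containing a forbidden character.
pub fn validate_label(label: &str) -> Result<(), LabelError> {
    if label.is_empty() {
        return Err(LabelError::Empty);
    }
    match label.chars().find(|c| FORBIDDEN.contains(c)) {
        Some(c) => Err(LabelError::ForbiddenChar(c)),
        None => Ok(()),
    }
}

/// Split a `NEW=OLD` argument at the first `=` and validate both halves.
/// Because `=` is forbidden inside labels, a second `=` is reported as an error.
pub fn parse_new_label(arg: &str) -> Result<(&str, &str), LabelError> {
    let (new, old) = arg.split_once('=').ok_or(LabelError::MissingSeparator)?;
    validate_label(new)?;
    validate_label(old)?;
    Ok((new, old))
}
```

Forbidding `=` in labels is what makes splitting at the first `=` unambiguous, and a label with spaces still reaches the parser as a single argument when it is shell-quoted as in the README example.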