# Semantic Description of `IUniqueSequence` Functionality

The `IUniqueSequence` function performs **dereplication** of biological sequence data — i.e., grouping identical or near-identical sequences while preserving metadata and counts. It operates on an `obiiter.IBioSequenceBatch` iterator.

## Core Workflow

1. **Input Processing**  
   Accepts an input sequence iterator and optional configuration via `WithOption`.

2. **Parallelization Strategy**  
   Supports configurable parallel workers (`nworkers`). When `SortOnDisk()` is enabled, it falls back to single-threaded processing for disk-based sorting.

3. **Data Splitting Phase**  
   - Uses `HashClassifier` to partition input into buckets (controlled by `BatchCount`).  
   - Ensures deterministic chunking for reproducibility.

4. **Storage Choice**  
   - *In-memory*: via `ISequenceChunkOnMemory`.  
   - *Disk-based*: via `ISequenceSubChunk` + external sorting (requires single worker).

5. **Uniqueness Classification**  
   - Builds a composite classifier combining:
     - Sequence identity (`SequenceClassifier`)
     - Optional annotation categories (e.g., sample, primer), with NA handling.
   - If no annotations are specified, only raw sequence identity is used.

6. **Singleton Filtering**  
   Optionally excludes singleton reads (count = 1) via `NoSingleton()` option.

7. **Parallel Dereplication**  
   - Spawns worker goroutines to process chunks.
   - Each worker applies `ISequenceSubChunk` + deduplication logic per classifier group.

8. **Output Merging**  
   - Aggregates results using `IMergeSequenceBatch`, preserving:
     - Sequence counts
     - Statistics (if enabled)
     - NA handling and ordering

## Key Features

- **Scalable**: Supports both memory-efficient (disk) and high-speed (RAM) modes.
- **Configurable**: Via functional options (`Options`).
- **Thread-safe**: Uses `sync.Mutex` for deterministic ordering.
- **Metadata-aware**: Incorporates annotation-based grouping (e.g., sample, primer).