feat: add pairwise distance computation and phylogenetic trees
This commit introduces a new `distance` CLI subcommand that computes pairwise genomic distance matrices using a configurable metric (Jaccard, Hamming, Bray-Curtis, Euclidean, or Hellinger) and writes the results as CSV. It can optionally build a phylogenetic tree (NJ or UPGMA) and emit it in Newick format. The distance computation backend selects an optimized implementation based on the index configuration, supports parallel iteration, and handles missing data. The commit also adds a `dump` task for exporting k-mer-to-genome mappings as CSV, introduces an `InvalidInput` error variant, updates dependencies to support numerical operations and tree construction, and performs minor module reorganizations.
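For orientation, the relationship between the Hellinger distance and the unnormalised "Hellinger-Euclidean" variant added in this commit can be sketched on plain slices. This is a minimal illustration, not the code in the commit: the function names `hellinger_euclidean` and `hellinger` are hypothetical, and it assumes each vector is normalised by its own total, whereas the commit works on `Array2<f64>` partials and `col_weights()`.

```rust
/// Euclidean distance between the square roots of the relative frequencies
/// of two count vectors (hypothetical helper, not part of the commit).
/// Returns None if the lengths differ or either vector sums to zero.
fn hellinger_euclidean(a: &[f64], b: &[f64]) -> Option<f64> {
    if a.len() != b.len() {
        return None;
    }
    let sum_a: f64 = a.iter().sum();
    let sum_b: f64 = b.iter().sum();
    if sum_a <= 0.0 || sum_b <= 0.0 {
        return None;
    }
    // Assumption: each vector is normalised by its own total.
    let squared: f64 = a
        .iter()
        .zip(b)
        .map(|(x, y)| {
            let d = (x / sum_a).sqrt() - (y / sum_b).sqrt();
            d * d
        })
        .sum();
    Some(squared.sqrt())
}

/// Hellinger distance: the value above divided by sqrt(2), so it lies in [0, 1].
fn hellinger(a: &[f64], b: &[f64]) -> Option<f64> {
    hellinger_euclidean(a, b).map(|d| d / std::f64::consts::SQRT_2)
}
```

Under these assumptions, two vectors with no shared non-zero entries sit at Hellinger distance 1 and Hellinger-Euclidean distance √2, which matches the "√2 × hellinger_dist" note in the doc comment.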
@@ -80,6 +80,13 @@ pub trait CountPartials: ColumnWeights {
         let sq2 = std::f64::consts::SQRT_2;
         self.partial_hellinger(&global).mapv(|v| v.sqrt() / sq2)
     }
+
+    /// Euclidean distance in the Hellinger (√relative-frequency) space.
+    /// Equal to √2 × hellinger_dist — unnormalised variant.
+    fn hellinger_euclidean_dist_matrix(&self) -> Array2<f64> {
+        let global = self.col_weights();
+        self.partial_hellinger(&global).mapv(|v| v.sqrt())
+    }
 }
 
 /// Partial distance matrices for bit-based data (`PersistentBitMatrix`).