<h1 id="kmers-and-super-kmers">Kmers and super-kmers</h1>
<h2 id="kmers">Kmers</h2>
<p>A <strong>kmer</strong> is a DNA subsequence of fixed length k. Two constraints govern the choice of k:</p>
<ul>
<li><strong>k ∈ [11, 31]</strong>: the lower bound keeps the kmer long enough to be specific; the upper bound keeps it short enough to fit in a single machine word (with the usual 2-bit nucleotide encoding, 31 nucleotides occupy 62 bits of a 64-bit word).</li>
<li><strong>k is odd</strong>: an odd-length sequence cannot equal its own reverse complement, because its middle nucleotide would have to be its own complement. With no palindromes, the two orientations of a kmer are always distinct, so the canonical form <code>min(kmer, revcomp(kmer))</code> always selects exactly one of them, which is required for strand-independent counting.</li>
</ul>
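<p>The two constraints can be illustrated with a minimal Python sketch. The helper names (<code>revcomp</code>, <code>pack</code>, <code>canonical</code>) and the encoding A=0, C=1, G=2, T=3 are assumptions made for this example, not part of any specific implementation.</p>

```python
# Illustrative sketch; the 2-bit encoding A=0, C=1, G=2, T=3 is an assumption.
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def pack(kmer: str) -> int:
    """Pack a kmer into an integer, 2 bits per nucleotide."""
    value = 0
    for base in kmer:
        value = (value << 2) | ENCODE[base]
    return value


def canonical(kmer: str) -> str:
    """Lexicographically smaller of the kmer and its reverse complement."""
    k = len(kmer)
    if not 11 <= k <= 31 or k % 2 == 0:
        raise ValueError("k must be odd and in [11, 31]")
    return min(kmer, revcomp(kmer))


kmer = "ACGTTGCAACG"                       # k = 11
assert kmer != revcomp(kmer)               # odd k: never a palindrome
assert canonical(kmer) == canonical(revcomp(kmer))
assert pack("T" * 31) < 2**64              # the largest 31-mer fits in 64 bits
```

<p>With this encoding, comparing two packed kmers of the same length as integers gives the same order as comparing the strings lexicographically, so either representation can serve for the <code>min</code>.</p>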
<p>A <strong>super-kmer</strong> is a maximal run of consecutive kmers from a DNA read that all share the same <strong>canonical minimizer</strong>. Consecutive kmers in the run overlap by k−1 nucleotides.</p>
<p>The <strong>canonical minimizer</strong> of a kmer is the smallest value of <code>min(m-mer, revcomp(m-mer))</code> over all m-mers within the kmer (m &lt; k, m odd), with the constraint that <strong>non-degenerate m-mers are always preferred</strong> over degenerate ones. A degenerate m-mer consists of a single repeated nucleotide (all-A, all-C, all-G, or all-T); it is selected only if the kmer contains no non-degenerate candidate.</p>
<p>When a read and its reverse-complement are both sequenced, they produce super-kmers that are reverse complements of each other. Both map to the same canonical form: the same genomic region is represented by a single canonical super-kmer regardless of which strand was read.</p>
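<p>The definitions above can be sketched in Python as follows. This is an illustrative, unoptimized version: the function names are hypothetical, m-mers are compared as strings, and runs are grouped by minimizer <em>value</em> (an assumption; an implementation could also track the minimizer position).</p>

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(seq):
    """Smaller of a sequence and its reverse complement."""
    return min(seq, revcomp(seq))


def canonical_minimizer(kmer, m):
    """Smallest canonical m-mer in the kmer, preferring non-degenerate ones."""
    candidates = [canonical(kmer[i:i + m]) for i in range(len(kmer) - m + 1)]
    # A degenerate m-mer consists of a single repeated nucleotide.
    non_degenerate = [c for c in candidates if len(set(c)) > 1]
    return min(non_degenerate or candidates)


def super_kmers(read, k, m):
    """Split a read into maximal runs of kmers sharing a canonical minimizer.

    Returns a list of (canonical super-kmer sequence, minimizer) pairs.
    """
    result = []
    start = 0
    current = None
    for i in range(len(read) - k + 1):
        minimizer = canonical_minimizer(read[i:i + k], m)
        if current is not None and minimizer != current:
            # kmers start..i-1 form a run spanning read[start : i + k - 1]
            result.append((canonical(read[start:i + k - 1]), current))
            start = i
        current = minimizer
    if current is not None:
        result.append((canonical(read[start:]), current))
    return result


read = "ACGTTGCAAAAAAAAAAAAAAAGGCTTACGATCGATTGCA"
forward = super_kmers(read, k=11, m=5)
reverse = super_kmers(revcomp(read), k=11, m=5)
# Both strands yield the same canonical super-kmers, in mirrored order.
assert forward == reverse[::-1]
```

<p>Because the set of canonical m-mers in a kmer is the same as in its reverse complement, both strands produce the same minimizer for every kmer, which is why the run boundaries mirror each other.</p>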
<h3 id="expected-length-of-a-super-kmer">Expected length of a super-kmer</h3>
<p>For a random minimizer of length m over k-mers of length k, the density of minimizer positions is approximately 2/(k−m+2) (Golan &amp; Shur 2025; Zheng <em>et al.</em> 2020)<sup id="fnref:Zheng2020-ji"><a class="footnote-ref" href="#fn:Zheng2020-ji">2</a></sup><sup id="fnref:Golan2025-xf"><a class="footnote-ref" href="#fn:Golan2025-xf">3</a></sup>, so the expected number of consecutive k-mers per super-kmer is (k−m+2)/2. A run of n k-mers spans n + k − 1 nucleotides, giving:</p>
<div class="arithmatex">\[L_{\text{nt}} = \frac{k-m+2}{2} + k - 1\]</div>
<p>For k=31, m=13, this gives (31−13+2)/2 = 10 k-mers per super-kmer and an expected length of 10 + 30 = 40 nt. In practice super-kmers rarely exceed a few dozen nucleotides.<sup id="fnref:superkmer_length"><a class="footnote-ref" href="#fn:superkmer_length">1</a></sup></p>
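<p>The arithmetic can be reproduced with a short Python sketch. The function name is hypothetical, and the result inherits the approximate density 2/(k−m+2) stated above, so it is an estimate rather than an exact value.</p>

```python
from fractions import Fraction


def expected_superkmer_length(k, m):
    """Expected super-kmer length in nucleotides under the random-minimizer
    density approximation 2/(k - m + 2)."""
    if not 0 < m < k:
        raise ValueError("require 0 < m < k")
    density = Fraction(2, k - m + 2)   # minimizer positions per k-mer
    kmers_per_run = 1 / density        # expected k-mers per super-kmer
    return kmers_per_run + k - 1       # n k-mers span n + k - 1 nucleotides


assert expected_superkmer_length(31, 13) == 40
```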
<div class="footnote">
<hr/>
<ol>
<li id="fn:superkmer_length">
<p>The expected length formula and the density approximation 2/(k−m+2) should be verified against the values reported in (Zheng <em>et al.</em> 2020)<sup id="fnref2:Zheng2020-ji"><a class="footnote-ref" href="#fn:Zheng2020-ji">2</a></sup> and (Golan &amp; Shur 2025)<sup id="fnref2:Golan2025-xf"><a class="footnote-ref" href="#fn:Golan2025-xf">3</a></sup>. <a class="footnote-backref" href="#fnref:superkmer_length" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:Zheng2020-ji">
<p>Zheng, H., Kingsford, C. &amp; Marçais, G. (2020). <a href="https://doi.org/10.1093/bioinformatics/btaa472">Improved design and analysis of practical minimizers</a>. <em>Bioinformatics (Oxford, England)</em>, 36, i119–i127. <a class="footnote-backref" href="#fnref:Zheng2020-ji" title="Jump back to footnote 2 in the text">↩</a><a class="footnote-backref" href="#fnref2:Zheng2020-ji" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
<li id="fn:Golan2025-xf">
<p>Golan, S. &amp; Shur, A.M. (2025). <a href="https://doi.org/10.1007/978-3-031-82670-2_25">Expected density of random minimizers</a>. In: <em>Lecture notes in computer science</em>. Springer Nature Switzerland, Cham, pp. 347–360. <a class="footnote-backref" href="#fnref:Golan2025-xf" title="Jump back to footnote 3 in the text">↩</a><a class="footnote-backref" href="#fnref2:Golan2025-xf" title="Jump back to footnote 3 in the text">↩</a></p>
</li>
</ol>
</div>