OBIJupyterHub/jupyterhub_volumes/web/obidoc/docs/cookbook/ecoprimers/index.html
Eric Coissac 30b7175702 Make cleaning
2025-11-17 14:18:13 +01:00


<!DOCTYPE html>
<html lang="en-us" dir="ltr">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="description" content="
Designing new barcodes with ecoPrimers
#
ecoPrimers
(
Citation: Riaz,&#32;Shehzad
&amp; al.,&#32;2011
Riaz,&#32;
T.,&#32;
Shehzad,&#32;
W.,&#32;
Viari,&#32;
A.,&#32;
Pompanon,&#32;
F.,&#32;
Taberlet,&#32;
P.&#32;&amp;&#32;Coissac,&#32;
E.
&#32;
(2011).
&#32;ecoPrimers: inference of new DNA barcode markers from whole genome sequence analysis.
Nucleic acids research,&#32;39(21).&#32;e145.
https://doi.org/10.1093/nar/gkr732
)
is a tool for designing new DNA metabarcodes.
It is capable of working with a collection of mitochondrial genomes, chloroplast genomes or rRNA nuclear gene clusters. It is an alignment free method, which guarantees its efficiency.">
<meta name="theme-color" media="(prefers-color-scheme: light)" content="#ffffff">
<meta name="theme-color" media="(prefers-color-scheme: dark)" content="#343a40">
<meta name="color-scheme" content="light dark"><meta property="og:url" content="http://metabar:8888/obidoc/docs/cookbook/ecoprimers/">
<meta property="og:site_name" content="OBITools4 documentation">
<meta property="og:title" content="Designing new barcodes">
<meta property="og:description" content="Designing new barcodes with ecoPrimers # ecoPrimers ( Citation: Riaz, Shehzad &amp; al., 2011 Riaz, T., Shehzad, W., Viari, A., Pompanon, F., Taberlet, P. &amp; Coissac, E. (2011). ecoPrimers: inference of new DNA barcode markers from whole genome sequence analysis. Nucleic acids research, 39(21). e145. https://doi.org/10.1093/nar/gkr732 ) is a tool for designing new DNA metabarcodes. It is capable of working with a collection of mitochondrial genomes, chloroplast genomes or rRNA nuclear gene clusters. It is an alignment free method, which guarantees its efficiency.">
<meta property="og:locale" content="en_us">
<meta property="og:type" content="website">
<title>Designing new barcodes | OBITools4 documentation</title>
<link rel="icon" href="/obidoc/favicon.png" >
<link rel="manifest" href="/obidoc/manifest.json">
<link rel="canonical" href="http://metabar:8888/obidoc/docs/cookbook/ecoprimers/">
<link rel="stylesheet" href="/obidoc/book.min.5fd7b8e2d1c0ae15da279c52ff32731130386f71b58f011468f20d0056fe6b78.css" integrity="sha256-X9e44tHArhXaJ5xS/zJzETA4b3G1jwEUaPINAFb&#43;a3g=" crossorigin="anonymous">
<script defer src="/obidoc/fuse.min.js"></script>
<script defer src="/obidoc/en.search.min.4da51bdd2d833922fdbc0e19df517221387fc625ffb68ee140d605b3c5b68058.js" integrity="sha256-TaUb3S2DOSL9vA4Z31FyITh/xiX/to7hQNYFs8W2gFg=" crossorigin="anonymous"></script>
<script defer src="/obidoc/sw.min.32af8eafce4180aa1c5dea66d99fb26ba9043ea7c7a4c706138c91d9051b285e.js" integrity="sha256-Mq&#43;Or85BgKocXepm2Z&#43;ya6kEPqfHpMcGE4yR2QUbKF4=" crossorigin="anonymous"></script>
<link rel="alternate" type="application/rss+xml" href="http://metabar:8888/obidoc/docs/cookbook/ecoprimers/index.xml" title="OBITools4 documentation" />
<!--
Made with Book Theme
https://github.com/alex-shpak/hugo-book
-->
<link rel="stylesheet" type="text/css" href="http://metabar:8888/obidoc/hugo-cite.css" />
</head>
<body dir="ltr">
<input type="checkbox" class="hidden toggle" id="menu-control" />
<input type="checkbox" class="hidden toggle" id="toc-control" />
<main class="container flex">
<aside class="book-menu">
<div class="book-menu-content">
<nav>
<h2 class="book-brand">
<a class="flex align-center" href="/obidoc/"><img src="/obidoc/obitools_logo.jpg" alt="Logo" class="book-icon" /><span>OBITools4 documentation</span>
</a>
</h2>
<div class="book-search hidden">
<input type="text" id="book-search-input" placeholder="Search" aria-label="Search" maxlength="64" data-hotkeys="s/" />
<div class="book-search-spinner hidden"></div>
<ul id="book-search-results"></ul>
</div>
<script>document.querySelector(".book-search").classList.remove("hidden")</script>
<ul>
<li>
<span>Docs</span>
<ul>
<li>
<a href="/obidoc/docs/about/" class="">About</a>
</li>
<li>
<a href="/obidoc/docs/installation/" class="">Installation</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/docs/principles/" class="">General operating principles</a>
<ul>
</ul>
</li>
<li>
<input type="checkbox" id="section-08756b4c1f14be6ee584ece005b9f621" class="toggle" />
<label for="section-08756b4c1f14be6ee584ece005b9f621" class="flex justify-between">
<a role="button" class="">File formats</a>
</label>
<ul>
<li>
<input type="checkbox" id="section-933c2e64b905b84e22aa5273cea2d0bd" class="toggle" />
<label for="section-933c2e64b905b84e22aa5273cea2d0bd" class="flex justify-between">
<a role="button" class="">Sequence file formats</a>
</label>
<ul>
<li>
<a href="/obidoc/formats/fasta/" class="">FASTA file format</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/formats/fastq/" class="">FASTQ file format</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/formats/genbank/" class="">GenBank Flat File format</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/formats/embl/" class="">EMBL Flat File format</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/docs/file_format/sequence_files/csv/" class="">CSV format</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/formats/json/" class="">JSON format</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/docs/file_format/sequence_files/annotations/" class="">Annotation of sequences</a>
<ul>
</ul>
</li>
</ul>
</li>
<li>
<input type="checkbox" id="section-0258ae1c222f9a38cc1b75254c93b0f4" class="toggle" />
<label for="section-0258ae1c222f9a38cc1b75254c93b0f4" class="flex justify-between">
<a role="button" class="">Taxonomy file formats</a>
</label>
<ul>
<li>
<a href="/obidoc/docs/file_format/taxonomy_file/csv_taxdump/" class="">CSV formatted taxdump</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/docs/file_format/taxonomy_file/ncbi_taxdump/" class="">NCBI taxdump</a>
<ul>
</ul>
</li>
</ul>
</li>
<li>
<a href="/obidoc/formats/csv/" class="">The CSV format</a>
<ul>
</ul>
</li>
</ul>
</li>
<li>
<input type="checkbox" id="section-70b1e6e5ec7f3ccab643155fa50659b6" class="toggle" />
<label for="section-70b1e6e5ec7f3ccab643155fa50659b6" class="flex justify-between">
<a role="button" class="">Patterns</a>
</label>
<ul>
<li>
<a href="/obidoc/docs/patterns/regular/" class="">Regular Expressions</a>
</li>
<li>
<a href="/obidoc/docs/patterns/dnagrep/" class="">DNA Patterns</a>
</li>
</ul>
</li>
<li>
<input type="checkbox" id="section-8223f464911a1fe6c655972143684e93" class="toggle" />
<label for="section-8223f464911a1fe6c655972143684e93" class="flex justify-between">
<a role="button" class="">The OBITools4 commands</a>
</label>
<ul>
<li>
<a href="/obidoc/docs/commands/options/" class="">Shared command options</a>
<ul>
</ul>
</li>
<li>
<input type="checkbox" id="section-8921ea65523c266b128dd4263232b0fc" class="toggle" />
<label for="section-8921ea65523c266b128dd4263232b0fc" class="flex justify-between">
<a role="button" class="">Basics</a>
</label>
<ul>
<li>
<a href="/obidoc/obitools/obiannotate/" class="">obiannotate</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obicomplement/" class="">obicomplement</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obiconvert/" class="">obiconvert</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obicount/" class="">obicount</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obicsv/" class="">obicsv</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obidemerge/" class="">obidemerge</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obidistribute/" class="">obidistribute</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obigrep/" class="">obigrep</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obijoin/" class="">obijoin</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obimatrix/" class="">obimatrix</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obisplit/" class="">obisplit</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obisummary/" class="">obisummary</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obiuniq/" class="">obiuniq</a>
<ul>
</ul>
</li>
</ul>
</li>
<li>
<input type="checkbox" id="section-dbdf1bb5377572439394e60e08c30f50" class="toggle" />
<label for="section-dbdf1bb5377572439394e60e08c30f50" class="flex justify-between">
<a role="button" class="">Demultiplexing samples</a>
</label>
<ul>
<li>
<a href="/obidoc/obitools/obimultiplex/" class="">obimultiplex</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obitagpcr/" class="">obitagpcr</a>
<ul>
</ul>
</li>
</ul>
</li>
<li>
<input type="checkbox" id="section-aa98fedd067b51150db59691a8ea8edd" class="toggle" />
<label for="section-aa98fedd067b51150db59691a8ea8edd" class="flex justify-between">
<a role="button" class="">Sequence alignments</a>
</label>
<ul>
<li>
<a href="/obidoc/obitools/obiclean/" class="">obiclean</a>
<ul>
</ul>
</li>
<li>
<input type="checkbox" id="section-7433746525d8c2b29b033f765c869acd" class="toggle" />
<label for="section-7433746525d8c2b29b033f765c869acd" class="flex justify-between">
<a href="/obidoc/obitools/obipairing/" class="">obipairing</a>
</label>
<ul>
<li>
<a href="/obidoc/docs/commands/alignments/obipairing/fasta-like/" class="">The FASTA-like alignment</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/docs/commands/alignments/obipairing/exact-alignment/" class="">Exact alignment</a>
<ul>
</ul>
</li>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obipcr/" class="">obipcr</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obirefidx/" class="">obirefidx</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obitag/" class="">obitag</a>
<ul>
</ul>
</li>
</ul>
</li>
<li>
<input type="checkbox" id="section-5746f699d10490780dec8e30ab2dd3ce" class="toggle" />
<label for="section-5746f699d10490780dec8e30ab2dd3ce" class="flex justify-between">
<a role="button" class="">Taxonomy</a>
</label>
<ul>
<li>
<a href="/obidoc/obitools/obitaxonomy/" class="">obitaxonomy</a>
<ul>
</ul>
</li>
</ul>
</li>
<li>
<input type="checkbox" id="section-3f50c4fe7ab436a56ae92897d5444956" class="toggle" />
<label for="section-3f50c4fe7ab436a56ae92897d5444956" class="flex justify-between">
<a role="button" class="">Advanced tools</a>
</label>
<ul>
<li>
<a href="/obidoc/obitools/obiscript/" class="">obiscript</a>
<ul>
</ul>
</li>
</ul>
</li>
<li>
<input type="checkbox" id="section-549be3934679fcb82a232f6bd5435563" class="toggle" />
<label for="section-549be3934679fcb82a232f6bd5435563" class="flex justify-between">
<a role="button" class="">Others</a>
</label>
<ul>
<li>
<a href="/obidoc/obitools/obimicrosat/" class="">obimicrosat</a>
<ul>
</ul>
</li>
</ul>
</li>
<li>
<input type="checkbox" id="section-ceca4455173761e30cbc0a6dc2327167" class="toggle" />
<label for="section-ceca4455173761e30cbc0a6dc2327167" class="flex justify-between">
<a role="button" class="">Experimentals</a>
</label>
<ul>
<li>
<a href="/obidoc/obitools/obicleandb/" class="">obicleandb</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obiconsensus/" class="">obiconsensus</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obilandmark/" class="">obilandmark</a>
<ul>
</ul>
</li>
</ul>
</li>
<li>
<a href="/obidoc/docs/commands/tags/" class="">Glossary of tags</a>
</li>
</ul>
</li>
<li>
<input type="checkbox" id="section-9b1bcd52530c59dc4819b1f61c128f54" class="toggle" checked />
<label for="section-9b1bcd52530c59dc4819b1f61c128f54" class="flex justify-between">
<a role="button" class="">Cookbook</a>
</label>
<ul>
<li>
<a href="/obidoc/docs/cookbook/illumina/" class="">Analysing an Illumina data set</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/docs/cookbook/ecoprimers/" class="active">Designing new barcodes</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/docs/cookbook/local_genbank/" class="">Prepare a local copy of Genbank</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/docs/cookbook/reference_db/" class="">Build a reference database</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/docs/cookbook/minion/" class="">Oxford Nanopore data analysis</a>
<ul>
</ul>
</li>
</ul>
</li>
<li>
<span>Programming OBITools</span>
<ul>
<li>
<a href="/obidoc/docs/programming/expression/" class="">Expression language</a>
<ul>
</ul>
</li>
<li>
<input type="checkbox" id="section-6d580829a667b5cca790b286d99a10fe" class="toggle" />
<label for="section-6d580829a667b5cca790b286d99a10fe" class="flex justify-between">
<a href="/obidoc/docs/programming/lua/" class="">Lua: for scripting OBITools</a>
</label>
<ul>
<li>
<input type="checkbox" id="section-2fb081dac812d624eea5f4268fca9e26" class="toggle" />
<label for="section-2fb081dac812d624eea5f4268fca9e26" class="flex justify-between">
<a role="button" class="">Obitools Classes</a>
</label>
<ul>
<li>
<a href="/obidoc/docs/programming/lua/obitools_classes/biosequence/" class="">BioSequence</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/docs/programming/lua/obitools_classes/biosequenceslice/" class="">BioSequenceSlice</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/docs/programming/lua/obitools_classes/taxonomy/" class="">Taxonomy</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/docs/programming/lua/obitools_classes/taxon/" class="">Taxon</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/docs/programming/lua/obitools_classes/mutex/" class="">Mutex</a>
<ul>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>
</nav>
<script>(function(){var e=document.querySelector("aside .book-menu-content");addEventListener("beforeunload",function(){localStorage.setItem("menu.scrollTop",e.scrollTop)}),e.scrollTop=localStorage.getItem("menu.scrollTop")})()</script>
</div>
</aside>
<div class="book-page">
<header class="book-header">
<div class="flex align-center justify-between">
<label for="menu-control">
<img src="/obidoc/svg/menu.svg" class="book-icon" alt="Menu" />
</label>
<h3>Designing new barcodes</h3>
<label for="toc-control">
<img src="/obidoc/svg/toc.svg" class="book-icon" alt="Table of Contents" />
</label>
</div>
<aside class="hidden clearfix">
<nav id="TableOfContents">
<ul>
<li><a href="#designing-new-barcodes-with-ecoprimers">Designing new barcodes with ecoPrimers</a>
<ul>
<li><a href="#installation-of-ecoprimers">Installation of <code>ecoPrimers</code></a></li>
<li><a href="#preparing-the-data">Preparing the data</a>
<ul>
<li><a href="#what-do-we-need-">What do we need?</a></li>
</ul>
</li>
<li><a href="#preparing-the-set-of-complete-genomes">Preparing the set of complete genomes</a></li>
<li><a href="#preparing-a-database-for-new-barcode-inference">Preparing a database for new barcode inference</a>
<ul>
<li><a href="#searching-for-the-taxid-of-vertebrates">Searching for the taxid of vertebrates.</a></li>
<li><a href="#re-annotation-of-sequences-to-species-level-and-selection-of-genomes">Re-annotation of sequences to species level and selection of genomes</a></li>
<li><a href="#look-at-the-evenness-of-the-species-representation">Look at the evenness of the species representation</a></li>
<li><a href="#selection-of-vertebrate-genomes">Selection of <em>vertebrate</em> genomes</a></li>
<li><a href="#formatting-data-for-ecoprimers">Formatting data for <code>ecoPrimers</code></a></li>
<li><a href="#indexing-the-mitochondrial-learning-database">Indexing the mitochondrial learning database</a></li>
</ul>
</li>
<li><a href="#selecting-the-best-primer-pairs">Selecting the best primer pairs</a>
<ul>
<li><a href="#searching-the-teleostei-taxid">Searching the <em>Teleostei</em> <code>taxid</code></a></li>
<li><a href="#running-the-ecoprimers-program">Running the <code>ecoPrimers</code> program</a></li>
</ul>
</li>
<li><a href="#testing-the-new-primer-pair">Testing the new primer pair</a></li>
<li><a href="#references">References</a></li>
</ul>
</li>
</ul>
</nav>
</aside>
</header>
<article class="markdown book-article"><h1 id="designing-new-barcodes-with-ecoprimers">
Designing new barcodes with ecoPrimers
<a class="anchor" href="#designing-new-barcodes-with-ecoprimers">#</a>
</h1>
<p>
<a href="http://metabarcoding.org/ecoprimers"><code>ecoPrimers</code></a>
<span class="hugo-cite-intext"
itemprop="citation">(<span class="hugo-cite-group">
<a href="#riaz2011-gn"><span class="visually-hidden">Citation: </span><span itemprop="author" itemscope itemtype="https://schema.org/Person"><meta itemprop="givenName" content="Tiayyba"><span itemprop="familyName">Riaz</span></span>,&#32;<span itemprop="author" itemscope itemtype="https://schema.org/Person"><meta itemprop="givenName" content="Wasim"><span itemprop="familyName">Shehzad</span></span>
<em>&amp; al.</em>,&#32;<span itemprop="datePublished">2011</span></a><span class="hugo-cite-citation">
<span itemscope
itemtype="https://schema.org/Article"
data-type="article"><span itemprop="author" itemscope itemtype="https://schema.org/Person"><span itemprop="familyName">Riaz</span>,&#32;
<meta itemprop="givenName" content="Tiayyba" />
T.</span>,&#32;
<span itemprop="author" itemscope itemtype="https://schema.org/Person"><span itemprop="familyName">Shehzad</span>,&#32;
<meta itemprop="givenName" content="Wasim" />
W.</span>,&#32;
<span itemprop="author" itemscope itemtype="https://schema.org/Person"><span itemprop="familyName">Viari</span>,&#32;
<meta itemprop="givenName" content="Alain" />
A.</span>,&#32;
<span itemprop="author" itemscope itemtype="https://schema.org/Person"><span itemprop="familyName">Pompanon</span>,&#32;
<meta itemprop="givenName" content="François" />
F.</span>,&#32;
<span itemprop="author" itemscope itemtype="https://schema.org/Person"><span itemprop="familyName">Taberlet</span>,&#32;
<meta itemprop="givenName" content="Pierre" />
P.</span>&#32;&amp;&#32;<span itemprop="author" itemscope itemtype="https://schema.org/Person"><span itemprop="familyName">Coissac</span>,&#32;
<meta itemprop="givenName" content="Eric" />
E.</span>
&#32;
(<span itemprop="datePublished">2011</span>).
&#32;<span itemprop="name">ecoPrimers: inference of new DNA barcode markers from whole genome sequence analysis</span>.<i>
<span itemprop="about">Nucleic acids research</span>,&#32;39(21)</i>.&#32;<span itemprop="pagination">e145</span>.
<a href="https://doi.org/10.1093/nar/gkr732"
itemprop="identifier"
itemtype="https://schema.org/URL">https://doi.org/10.1093/nar/gkr732</a></span>
</span></span>)</span>
is a tool for designing new DNA metabarcodes.
It can work with a collection of mitochondrial genomes, chloroplast genomes or nuclear rRNA gene clusters. It is an alignment-free method, which makes it efficient.</p>
<p>The
<a href="http://metabarcoding.org/ecoprimers"><code>ecoPrimers</code></a> program was developed to be used in conjunction with the original <em>OBITools</em>. Therefore, using it with the new <em>OBITools4</em> requires some special care in data preparation.</p>
<p>In this recipe we will use
<a href="http://metabarcoding.org/ecoprimers"><code>ecoPrimers</code></a> to design a new bony fish DNA metabarcode.</p>
<h2 id="installation-of-ecoprimers">
Installation of
<a href="http://metabarcoding.org/ecoprimers"><code>ecoPrimers</code></a>
<a class="anchor" href="#installation-of-ecoprimers">#</a>
</h2>
<p>
<a href="http://metabarcoding.org/ecoprimers"><code>ecoPrimers</code></a> is available from the git repository of the
<a href="https://git.metabarcoding.org">metabarcoding</a> site at:</p>
<ul>
<li>
<a href="https://git.metabarcoding.org/obitools/ecoprimers">https://git.metabarcoding.org/obitools/ecoprimers</a></li>
</ul>
<p>Installation can be done by cloning the project:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>git clone https://git.metabarcoding.org/obitools/ecoprimers.git
</span></span></code></pre></div><p>This will create a new <code>ecoprimers</code> directory with a <code>src</code> subdirectory containing the source code.
You will need to change your current working directory to this <code>ecoprimers/src</code> directory.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>cd ecoprimers/src
</span></span></code></pre></div><p>It is now possible to compile the <code>ecoPrimers</code> program using the <code>make</code> command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>make
</span></span></code></pre></div><p>This command will produce a series of messages on your screen similar to the following.
You may get some extra warning messages, but no errors should be reported.
If compilation is successful, an <code>ecoPrimers</code> executable will be created in the current directory.</p>
<pre tabindex="0"><code>gcc -DMAC_OS_X -M -o ecoprimer.d ecoprimer.c
gcc -DMAC_OS_X -W -Wall -m64 -g -c -o ecoprimer.o ecoprimer.c
/Library/Developer/CommandLineTools/usr/bin/make -C libecoPCR
gcc -DMAC_OS_X -M -o econame.d econame.c
gcc -DMAC_OS_X -M -o ecofilter.d ecofilter.c
gcc -DMAC_OS_X -M -o ecotax.d ecotax.c
gcc -DMAC_OS_X -M -o ecoseq.d ecoseq.c
gcc -DMAC_OS_X -M -o ecorank.d ecorank.c
gcc -DMAC_OS_X -M -o ecoMalloc.d ecoMalloc.c
gcc -DMAC_OS_X -M -o ecoIOUtils.d ecoIOUtils.c
gcc -DMAC_OS_X -M -o ecoError.d ecoError.c
gcc -DMAC_OS_X -M -o ecodna.d ecodna.c
gcc -DMAC_OS_X -W -Wall -m64 -g -c -o ecodna.o ecodna.c
gcc -DMAC_OS_X -W -Wall -m64 -g -c -o ecoError.o ecoError.c
gcc -DMAC_OS_X -W -Wall -m64 -g -c -o ecoIOUtils.o ecoIOUtils.c
gcc -DMAC_OS_X -W -Wall -m64 -g -c -o ecoMalloc.o ecoMalloc.c
gcc -DMAC_OS_X -W -Wall -m64 -g -c -o ecorank.o ecorank.c
gcc -DMAC_OS_X -W -Wall -m64 -g -c -o ecoseq.o ecoseq.c
gcc -DMAC_OS_X -W -Wall -m64 -g -c -o ecotax.o ecotax.c
gcc -DMAC_OS_X -W -Wall -m64 -g -c -o ecofilter.o ecofilter.c
gcc -DMAC_OS_X -W -Wall -m64 -g -c -o econame.o econame.c
ar -cr libecoPCR.a ecodna.o ecoError.o ecoIOUtils.o ecoMalloc.o ecorank.o ecoseq.o ecotax.o ecofilter.o econame.o
ranlib libecoPCR.a
/Library/Developer/CommandLineTools/usr/bin/make -C libecoprimer
gcc -DMAC_OS_X -M -o ahocorasick.d ahocorasick.c
gcc -DMAC_OS_X -M -o PrimerSets.d PrimerSets.c
gcc -DMAC_OS_X -M -o filtering.d filtering.c
gcc -DMAC_OS_X -M -o apat_search.d apat_search.c
gcc -DMAC_OS_X -M -o taxstats.d taxstats.c
gcc -DMAC_OS_X -M -o pairs.d pairs.c
gcc -DMAC_OS_X -M -o pairtree.d pairtree.c
gcc -DMAC_OS_X -M -o sortmatch.d sortmatch.c
gcc -DMAC_OS_X -M -o libstki.d libstki.c
gcc -DMAC_OS_X -M -o queue.d queue.c
gcc -DMAC_OS_X -M -o merge.d merge.c
gcc -DMAC_OS_X -M -o aproxpattern.d aproxpattern.c
gcc -DMAC_OS_X -M -o strictprimers.d strictprimers.c
gcc -DMAC_OS_X -M -o hashsequence.d hashsequence.c
gcc -DMAC_OS_X -M -o sortword.d sortword.c
gcc -DMAC_OS_X -M -o smothsort.d smothsort.c
gcc -DMAC_OS_X -M -o readdnadb.d readdnadb.c
gcc -DMAC_OS_X -M -o goodtaxon.d goodtaxon.c
gcc -DMAC_OS_X -W -Wall -m64 -g -c -o goodtaxon.o goodtaxon.c
gcc -DMAC_OS_X -W -Wall -m64 -g -c -o readdnadb.o readdnadb.c
gcc -DMAC_OS_X -W -Wall -m64 -g -c -o smothsort.o smothsort.c
gcc -DMAC_OS_X -W -Wall -m64 -g -c -o sortword.o sortword.c
gcc -DMAC_OS_X -W -Wall -m64 -g -c -o hashsequence.o hashsequence.c
gcc -DMAC_OS_X -W -Wall -m64 -g -c -o strictprimers.o strictprimers.c
gcc -DMAC_OS_X -W -Wall -m64 -g -c -o aproxpattern.o aproxpattern.c
gcc -DMAC_OS_X -W -Wall -m64 -g -c -o merge.o merge.c
gcc -DMAC_OS_X -W -Wall -m64 -g -c -o queue.o queue.c
gcc -DMAC_OS_X -W -Wall -m64 -g -c -o libstki.o libstki.c
gcc -DMAC_OS_X -W -Wall -m64 -g -c -o sortmatch.o sortmatch.c
gcc -DMAC_OS_X -W -Wall -m64 -g -c -o pairtree.o pairtree.c
gcc -DMAC_OS_X -W -Wall -m64 -g -c -o pairs.o pairs.c
gcc -DMAC_OS_X -W -Wall -m64 -g -c -o taxstats.o taxstats.c
gcc -DMAC_OS_X -W -Wall -m64 -g -c -o apat_search.o apat_search.c
gcc -DMAC_OS_X -W -Wall -m64 -g -c -o filtering.o filtering.c
gcc -DMAC_OS_X -W -Wall -m64 -g -c -o PrimerSets.o PrimerSets.c
gcc -DMAC_OS_X -W -Wall -m64 -g -c -o ahocorasick.o ahocorasick.c
ar -cr libecoprimer.a goodtaxon.o readdnadb.o smothsort.o sortword.o hashsequence.o strictprimers.o aproxpattern.o merge.o queue.o libstki.o sortmatch.o pairtree.o pairs.o taxstats.o apat_search.o filtering.o PrimerSets.o ahocorasick.o
ranlib libecoprimer.a
/Library/Developer/CommandLineTools/usr/bin/make -C libthermo
gcc -DMAC_OS_X -M -o thermostats.d thermostats.c
gcc -DMAC_OS_X -M -o nnparams.d nnparams.c
gcc -DMAC_OS_X -W -Wall -m64 -g -c -o nnparams.o nnparams.c
gcc -DMAC_OS_X -W -Wall -m64 -g -c -o thermostats.o thermostats.c
ar -cr libthermo.a nnparams.o thermostats.o
ranlib libthermo.a
gcc -g -O5 -m64 -o ecoPrimers ecoprimer.o -LlibecoPCR -Llibecoprimer -Llibthermo -L/usr/local/lib -lecoprimer -lecoPCR -lthermo -lz -lm
</code></pre><p>You can now copy the <code>ecoPrimers</code> executable to a directory that is part of your <code>PATH</code> environment variable.
You can list all these directories, one per line, with the following command, shown here with an example of its output:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>echo <span style="color:#e6db74">&#34;</span>$PATH<span style="color:#e6db74">&#34;</span> | tr <span style="color:#e6db74">&#39;:&#39;</span> <span style="color:#e6db74">&#39;\n&#39;</span> | sort -u
</span></span></code></pre></div><pre tabindex="0"><code>/Users/coissac/bin
/Users/coissac/go/bin
/bin
/opt/X11/bin
/sbin
/usr/bin
/usr/local/bin
/usr/local/go/bin
/usr/sbin
</code></pre><p>From this list you can choose the directory where you want to install the <code>ecoPrimers</code> executable.
Here we choose the directory <code>/Users/coissac/bin</code>: it is located inside the user's home directory, so copying the <code>ecoPrimers</code> executable into it does not require root privileges. <code>/usr/local/bin</code> is also a good choice, as it is the conventional location for locally installed software on a UNIX system, and software installed there is available to all users of the system. However, copying the <code>ecoPrimers</code> executable to <code>/usr/local/bin</code> requires root privileges.</p>
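<p>Before copying the executable, it can help to check that the chosen directory is really listed in <code>PATH</code> and that you are allowed to write to it. The sketch below is only an illustration: the helper name <code>dir_in_path</code> is hypothetical and is not part of <code>ecoPrimers</code> or <em>OBITools4</em>, and <code>${HOME}/bin</code> stands in for the personal directory discussed above.</p>

```shell
# Hypothetical helper: succeeds if directory $1 is one of the PATH entries.
dir_in_path() {
    case ":${PATH}:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Report, for each candidate directory, whether it is in PATH and writable.
# Both directories are examples; adapt them to your own system.
for dir in "${HOME}/bin" /usr/local/bin; do
    if dir_in_path "${dir}"; then in_path=yes; else in_path=no; fi
    if [ -w "${dir}" ]; then writable=yes; else writable=no; fi
    echo "${dir}: in PATH=${in_path}, writable=${writable}"
done
```

<p>A directory reported as not writable can still be used, but the copy then needs root privileges.</p>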
<p>To install the software without root privileges:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>cp ecoPrimers /Users/coissac/bin
</span></span></code></pre></div><p>To install the software for all users of the system, which requires root privileges:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>sudo cp ecoPrimers /usr/local/bin
</span></span></code></pre></div><h2 id="preparing-the-data">
Preparing the data
<a class="anchor" href="#preparing-the-data">#</a>
</h2>
<h3 id="what-do-we-need-">
What do we need?
<a class="anchor" href="#what-do-we-need-">#</a>
</h3>
<p>To design a new animal DNA metabarcode, we have to download the following data from the NCBI website:</p>
<ul>
<li>The complete set of whole mitochondrial genomes</li>
<li>The NCBI taxonomy</li>
</ul>
<h4 id="downloading-the-mitochondrial-genomes">
Downloading the mitochondrial genomes
<a class="anchor" href="#downloading-the-mitochondrial-genomes">#</a>
</h4>
<p>The file containing the complete set of mitochondrial genomes can be downloaded using your favourite web browser from the
<a href="https://ftp.ncbi.nlm.nih.gov/genomes/refseq/mitochondrion">NCBI FTP website</a>.</p>
<p>
<img src="ncbi-ftp.png" alt="" /></p>
<p>You will need to download the data in GenBank flat file format, with the extension <code>gbff.gz</code>.
This is the only format that contains the link to the NCBI taxonomy for each sequence.</p>
<p>If you need to download the data on a UNIX computer, you may not have access to a web browser on that system.
In this case, use the <code>curl</code> command to download the file:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>curl <span style="color:#e6db74">&#39;https://ftp.ncbi.nlm.nih.gov/genomes/refseq/mitochondrion/mitochondrion.1.genomic.gbff.gz&#39;</span> <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> &gt; mito.all.gb.gz
</span></span></code></pre></div><p>Because the file is compressed, you must use the <code>zless</code> command instead of the classic <code>less</code> command to inspect the file without decompressing it first:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>zless mito.all.gb.gz
</span></span></code></pre></div><pre tabindex="0"><code>LOCUS NW_009243181 45189 bp DNA linear CON 06-OCT-2014
DEFINITION Fonticula alba strain ATCC 38817 mitochondrial scaffold
supercont2.211, whole genome shotgun sequence.
ACCESSION NW_009243181 NZ_AROH01000000
VERSION NW_009243181.1
DBLINK BioProject: PRJNA262900
Assembly: GCF_000388065.1
KEYWORDS WGS; RefSeq.
SOURCE mitochondrion Fonticula alba
ORGANISM Fonticula alba
Eukaryota; Rotosphaerida; Fonticulaceae; Fonticula.
REFERENCE 1 (bases 1 to 45189)
AUTHORS Russ,C., Cuomo,C., Burger,G., Gray,M.W., Holland,P.W.H., King,N.,
Lang,F.B.F., Roger,A.J., Ruiz-Trillo,I., Brown,M., Walker,B.,
Young,S., Zeng,Q., Gargeya,S., Fitzgerald,M., Haas,B.,
Abouelleil,A., Allen,A.W., Alvarado,L., Arachchi,H.M., Berlin,A.M.,
Chapman,S.B., Gainer-Dewar,J., Goldberg,J., Griggs,A., Gujja,S.,
Hansen,M., Howarth,C., Imamovic,A., Ireland,A., Larimer,J.,
McCowan,C., Murphy,C., Pearson,M., Poon,T.W., Priest,M.,
Roberts,A., Saif,S., Shea,T., Sisk,P., Sykes,S., Wortman,J.,
Nusbaum,C. and Birren,B.
CONSRTM The Broad Institute Genomics Platform
TITLE The Genome Sequence of Fonticula alba ATCC 38817
JOURNAL Unpublished
REFERENCE 2 (bases 1 to 45189)
CONSRTM NCBI Genome Project
TITLE Direct Submission
JOURNAL Submitted (06-OCT-2014) National Center for Biotechnology
Information, NIH, Bethesda, MD 20894, USA
REFERENCE 3 (bases 1 to 45189)
AUTHORS Russ,C., Cuomo,C., Burger,G., Gray,M.W., Holland,P.W.H., King,N.,
Lang,F.B.F., Roger,A.J., Ruiz-Trillo,I., Brown,M., Walker,B.,
Young,S., Zeng,Q., Gargeya,S., Fitzgerald,M., Haas,B.,
Abouelleil,A., Allen,A.W., Alvarado,L., Arachchi,H.M., Berlin,A.M.,
Chapman,S.B., Gainer-Dewar,J., Goldberg,J., Griggs,A., Gujja,S.,
Hansen,M., Howarth,C., Imamovic,A., Ireland,A., Larimer,J.,
McCowan,C., Murphy,C., Pearson,M., Poon,T.W., Priest,M.,
Roberts,A., Saif,S., Shea,T., Sisk,P., Sykes,S., Wortman,J.,
Nusbaum,C. and Birren,B.
CONSRTM The Broad Institute Genomics Platform
TITLE Direct Submission
JOURNAL Submitted (26-APR-2013) Broad Institute of MIT and Harvard, 7
Cambridge Center, Cambridge, MA 02142, USA
COMMENT PROVISIONAL REFSEQ: This record has not yet been subject to final
NCBI review. The reference sequence is identical to KB932304.
##Genome-Assembly-Data-START##
Assembly Method :: ALLPATHS v. R44024; Mito ALLPATHS v.
R43919
Assembly Name :: Font_alba_ATCC_38817_V2
Genome Coverage :: 317.0x; Mito 63.0x
Sequencing Technology :: Illumina
##Genome-Assembly-Data-END##
FEATURES Location/Qualifiers
source 1..45189
/organism=&#34;Fonticula alba&#34;
/organelle=&#34;mitochondrion&#34;
/mol_type=&#34;genomic DNA&#34;
/strain=&#34;ATCC 38817&#34;
/isolation_source=&#34;dog dung&#34;
/culture_collection=&#34;ATCC:38817&#34;
/db_xref=&#34;taxon:691883&#34;
/geo_loc_name=&#34;USA: Grainfield, Kansas&#34;
/collection_date=&#34;1960&#34;
</code></pre><p>At the end of the excerpt shown above, the <code>/db_xref=&quot;taxon:691883&quot;</code> qualifier links this first entry of the file to the NCBI taxonomy.</p>
<h4 id="download-the-full-taxonomy">
Download the full taxonomy
<a class="anchor" href="#download-the-full-taxonomy">#</a>
</h4>
<p>The NCBI taxonomy is available as a
<a href="https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz">tarball</a> file. It can be downloaded in the same way as the RefSeq mitochondrial database. You can also download the NCBI taxonomy using the <a href="http://metabar:8888/obidoc/obitools/obitaxonomy/">
<abbr title="obitaxonomy: manage and search in the taxonomic database"><code>obitaxonomy</code></abbr>
</a> command with the <code>--download-ncbi</code> option.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>obitaxonomy --download-ncbi
</span></span></code></pre></div><pre tabindex="0"><code>INFO[0000] Number of workers set 16
INFO[0000] Downloading NCBI Taxdump to ncbitaxo_20250211.tgz
downloading 100% ████████████████████████████████████████| (66/66 MB, 5.1 MB/s)
</code></pre><p>By default, <a href="http://metabar:8888/obidoc/obitools/obitaxonomy/">
<abbr title="obitaxonomy: manage and search in the taxonomic database"><code>obitaxonomy</code></abbr>
</a> downloads the latest version of the NCBI taxonomy available from the NCBI FTP site and saves it to the current directory in a file named <code>ncbitaxo_YYYYMMDD.tgz</code> where <code>YYYY</code> is the year, <code>MM</code> is the month and <code>DD</code> is the day of the download. Here the date is 2025/02/11, so the filename is <code>ncbitaxo_20250211.tgz</code>.</p>
<p>You can also specify the filename of the downloaded file using the <code>--out filename</code> option. For example:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>obitaxonomy --download-ncbi --out ncbitaxo.tgz
</span></span></code></pre></div><h5 id="the-archive-contains-several-files">
The archive contains several files
<a class="anchor" href="#the-archive-contains-several-files">#</a>
</h5>
<p>The NCBI taxonomy dump file contains all the relationships between taxa.
This information is stored in two files: <code>nodes.dmp</code> and <code>names.dmp</code>.</p>
<ul>
<li>
<p>The <strong>nodes.dmp</strong> file:</p>
<p>It contains the taxonomic hierarchy of the NCBI taxonomy. It is a tabular file
where the columns are separated by a <code>|</code> character and some whitespace.</p>
<ul>
<li>The first column is the taxid of the taxon.</li>
<li>The second column is the parent taxid of the taxon.</li>
<li>The third column is the taxonomic rank of the taxon.</li>
</ul>
<p>The remaining columns are not used by the <em>OBITools</em>.</p>
</li>
</ul>
<pre tabindex="0"><code>1 | 1 | no rank | | 8 | 0 | ...
2 | 131567 | superkingdom | | 0 | 0 |
6 | 335928 | genus | | 0 | 1 |
7 | 6 | species | AC | 0 | 1 |
9 | 32199 | species | BA | 0 |
10 | 135621 | genus | | 0 |
11 | 1707 | species | CG | 0 | 1 |
13 | 203488 | genus | | 0 | 1 |
14 | 13 | species | DT | 0 | 1 |
</code></pre><ul>
<li>
<p>The <strong>names.dmp</strong> file:</p>
<p>It contains the scientific names, and a set of alternative names, for all the taxa. It is also a tabular file where the columns are separated by a <code>|</code> character and some whitespace.</p>
<ul>
<li>The first column is the taxid of the taxon.</li>
<li>The second column is the name of the taxon.</li>
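<li>The third column is a unique variant of the name, filled in when the name alone is ambiguous (it is often empty).</li>
<li>The fourth column is the class of the name (<em>e.g.</em> scientific name, or blast name&hellip;).</li>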
</ul>
</li>
</ul>
<pre tabindex="0"><code>1 | root | | scientific name |
2 | Bacteria | Bacteria &lt;prokaryote&gt; | scientific name |
2 | Monera | Monera &lt;Bacteria&gt; | in-part |
2 | Procaryotae | Procaryotae &lt;Bacteria&gt; | in-part |
2 | Prokaryota | Prokaryota &lt;Bacteria&gt; | in-part |
2 | Prokaryotae | Prokaryotae &lt;Bacteria&gt; | in-part |
2 | bacteria | bacteria &lt;blast2&gt; | blast name |
2 | eubacteria | | genbank common name |
2 | prokaryote | prokaryote &lt;Bacteria&gt; | in-part |
...
10 | Cellvibrio | | scientific name |
11 | [Cellvibrio] gilvus | | scientific name |
13 | Dictyoglomus | | scientific name |
14 | Dictyoglomus thermophilum | | scientific name |
</code></pre><p>A <code>readme.txt</code> file is present in the archive for more information about the NCBI taxonomy dump file.</p>
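<p>As an illustration of the layout described above, the following minimal sketch parses a three-line toy excerpt written in the <code>nodes.dmp</code> layout. It assumes that fields are separated by a tab, a <code>|</code> and a tab, and that each line ends with a tab and a <code>|</code>. The file names <code>toy_nodes.dmp</code> and <code>toy_nodes.csv</code> are hypothetical, and the sketch is not part of the <em>OBITools</em>.</p>

```shell
# Build a toy excerpt in the nodes.dmp layout (hypothetical file name).
# Assumption: fields are separated by TAB | TAB and lines end with TAB |
printf '1\t|\t1\t|\tno rank\t|\n'           >  toy_nodes.dmp
printf '2\t|\t131567\t|\tsuperkingdom\t|\n' >> toy_nodes.dmp
printf '7\t|\t6\t|\tspecies\t|\n'           >> toy_nodes.dmp

# Keep the three columns used by the OBITools: taxid, parent taxid and rank.
awk -F '\t[|]\t' '{
    sub(/\t[|]$/, "", $3)      # drop the end-of-line terminator
    print $1 "," $2 "," $3
}' toy_nodes.dmp > toy_nodes.csv

cat toy_nodes.csv
```

<p>Each output line holds the taxid, the parent taxid and the rank, separated by commas.</p>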
<h2 id="preparing-the-set-of-complete-genomes">
Preparing the set of complete genomes
<a class="anchor" href="#preparing-the-set-of-complete-genomes">#</a>
</h2>
<p>With <em>OBITools</em>, the preferred format for storing sequences is the
<a href="http://metabar:8888/obidoc/formats/fasta/">fasta</a>
format.
Therefore, we will use the <a href="http://metabar:8888/obidoc/obitools/obiconvert/">
<abbr title="obiconvert: convert format of a sequence file"><code>obiconvert</code></abbr>
</a> tool to convert the GenBank files into
<a href="http://metabar:8888/obidoc/formats/fasta/">fasta</a>
format.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>obiconvert --skip-empty <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> --update-taxid <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> -t ncbitaxo_20250211.tgz <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> mito.all.gb.gz <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> &gt; mito.all.fasta
</span></span><span style="display:flex;"><span>head -5 mito.all.fasta
</span></span></code></pre></div><p>Downloading the fasta-formatted file directly from the NCBI FTP site is not equivalent to downloading a GenBank file and converting it to
<a href="http://metabar:8888/obidoc/formats/fasta/">fasta</a>
format using <a href="http://metabar:8888/obidoc/obitools/obiconvert/">
<abbr title="obiconvert: convert format of a sequence file"><code>obiconvert</code></abbr>
</a>. By converting from GenBank format, the
<a href="http://metabar:8888/obidoc/formats/fasta/">fasta</a>
formatted file will contain the taxid of the taxon.</p>
<p>Here are the first lines of the <code>mito.all.fasta</code> file:</p>
<pre tabindex="0"><code>&gt;NC_072933 {&#34;definition&#34;:&#34;Echinosophora koreensis mitochondrion, complete genome.&#34;,&#34;scientific_name&#34;:&#34;mitochondrion Echinosophora koreensis&#34;,&#34;taxid&#34;:228658}
ctttcgggtcggaaatagaagatctggattagatcccttctcgatagctttagtcagagc
tcatccctcgaaaaagggagtagtgagatgagaaaagggtgactagaatacggaaattca
actagtgaagtcagatccgggaattccactattgaagttatccgtcttaggcttcaagca
agctatctttcaaggaagtcagtctaagccctaagccaagatctgctttttgccagtcaa
</code></pre><h2 id="preparing-a-database-for-new-barcode-inference">
Preparing a database for new barcode inference
<a class="anchor" href="#preparing-a-database-for-new-barcode-inference">#</a>
</h2>
<p>Preparing a database for new barcode inference involves three steps:</p>
<ul>
<li>Annotate the sequences by their species <code>taxid</code>.</li>
<li>Make sure that no species is represented much more than the others.</li>
<li>Extract only vertebrate genomes.</li>
</ul>
<h3 id="searching-for-the-taxid-of-vertebrates">
Searching for the taxid of vertebrates
<a class="anchor" href="#searching-for-the-taxid-of-vertebrates">#</a>
</h3>
<p>First we will search for the taxid of <em>Vertebrata</em>, as the taxid is the only way to pass taxonomic information to the <em>OBITools</em>. The <code>--fixed</code> option asks for exact matches of the name. The name search is not case-sensitive.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>obitaxonomy -t ncbitaxo_20250211.tgz <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> --fixed <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> <span style="color:#e6db74">&#39;vertebrata&#39;</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-csv" data-lang="csv"><span style="display:flex;"><span><span style="color:#e6db74">taxid</span>,<span style="color:#e6db74">parent</span>,<span style="color:#e6db74">taxonomic_rank</span>,<span style="color:#e6db74">scientific_name</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74">taxon:1261581 [Vertebrata]@genus</span>,<span style="color:#e6db74">taxon:2008651 [Polysiphonioideae]@subfamily</span>,<span style="color:#e6db74">genus</span>,<span style="color:#e6db74">Vertebrata</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74">taxon:7742 [Vertebrata]@clade</span>,<span style="color:#e6db74">taxon:89593 [Craniata]@subphylum</span>,<span style="color:#e6db74">clade</span>,<span style="color:#e6db74">Vertebrata</span>
</span></span></code></pre></div><p>Piping the output to the <code>csvlook</code> command produces a more readable table:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>obitaxonomy -t ncbitaxo_20250211.tgz <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> --fixed <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> <span style="color:#e6db74">&#39;vertebrata&#39;</span> <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> | csvlook
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-csv" data-lang="csv"><span style="display:flex;"><span><span style="color:#e6db74">| taxid | parent | taxonomic_rank | scientific_name |</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74">| -------------------------------- | ------------------------------------------- | -------------- | --------------- |</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74">| taxon:1261581 [Vertebrata]@genus | taxon:2008651 [Polysiphonioideae]@subfamily | genus | Vertebrata |</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74">| taxon:7742 [Vertebrata]@clade | taxon:89593 [Craniata]@subphylum | clade | Vertebrata |</span>
</span></span></code></pre></div><p>Surprisingly, the Latin name <em>Vertebrata</em> is shared by two different taxa. The first is a genus and obviously not the one we are looking for. The second, the clade with taxid 7742, is the one we are looking for.</p>
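<p>To make the idea of an exact, case-insensitive name match concrete, here is a minimal sketch on a toy table written in the <code>names.dmp</code> layout. It only illustrates the matching rule and does not reproduce how <code>obitaxonomy</code> is implemented. The file names <code>toy_names.dmp</code> and <code>toy_hits.txt</code> are hypothetical, and the unique-name column is left empty in this toy data.</p>

```shell
# Toy table in the names.dmp layout (hypothetical file name and content,
# the unique-name column is left empty here).
printf '7742\t|\tVertebrata\t|\t\t|\tscientific name\t|\n'    >  toy_names.dmp
printf '89593\t|\tCraniata\t|\t\t|\tscientific name\t|\n'     >> toy_names.dmp
printf '1261581\t|\tVertebrata\t|\t\t|\tscientific name\t|\n' >> toy_names.dmp

# Exact, case-insensitive comparison of the name column with the query.
awk -F '\t[|]\t' -v query='vertebrata' \
    'tolower($2) == tolower(query) { print $1 }' toy_names.dmp > toy_hits.txt

cat toy_hits.txt
```

<p>Two taxids are reported for the same name, which is why the rank and the parent of each hit have to be checked before choosing one.</p>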
<h4 id="looking-for-the-vertebrata-genus-taxid">
Looking for the <em>Vertebrata</em> genus taxid
<a class="anchor" href="#looking-for-the-vertebrata-genus-taxid">#</a>
</h4>
<p>Just out of curiosity, let us display the taxonomic path of the <em>Vertebrata</em> genus, starting from its parent taxon, the subfamily <em>Polysiphonioideae</em> (taxid 2008651).</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>obitaxonomy -t ncbitaxo_20250211.tgz <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> -p <span style="color:#ae81ff">2008651</span> <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> | csvlook
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-csv" data-lang="csv"><span style="display:flex;"><span><span style="color:#e6db74">| taxid | parent | taxonomic_rank | scientific_name |</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74">| ------------------------------------------- | ------------------------------------------- | -------------- | ------------------ |</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74">| taxon:2008651 [Polysiphonioideae]@subfamily | taxon:2803 [Rhodomelaceae]@family | subfamily | Polysiphonioideae |</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74">| taxon:2803 [Rhodomelaceae]@family | taxon:2802 [Ceramiales]@order | family | Rhodomelaceae |</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74">| taxon:2802 [Ceramiales]@order | taxon:2045261 [Rhodymeniophycidae]@subclass | order | Ceramiales |</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74">| taxon:2045261 [Rhodymeniophycidae]@subclass | taxon:2806 [Florideophyceae]@class | subclass | Rhodymeniophycidae |</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74">| taxon:2806 [Florideophyceae]@class | taxon:2763 [Rhodophyta]@phylum | class | Florideophyceae |</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74">| taxon:2763 [Rhodophyta]@phylum | taxon:2759 [Eukaryota]@superkingdom | phylum | Rhodophyta |</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74">| taxon:2759 [Eukaryota]@superkingdom | taxon:131567 [cellular organisms]@no rank | superkingdom | Eukaryota |</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74">| taxon:131567 [cellular organisms]@no rank | taxon:1 [root]@no rank | no rank | cellular organisms |</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74">| taxon:1 [root]@no rank | taxon:1 [root]@no rank | no rank | root |</span>
</span></span></code></pre></div><p>You can see that the <em>Vertebrata</em> genus belongs to the phylum <em>Rhodophyta</em>, the red algae.</p>
<h3 id="re-annotation-of-sequences-to-species-level-and-selection-of-genomes">
Re-annotation of sequences to species level and selection of genomes
<a class="anchor" href="#re-annotation-of-sequences-to-species-level-and-selection-of-genomes">#</a>
</h3>
<p>In order to know how species are represented in the database, and more specifically how many sequences represent each species, we will annotate the sequences with taxonomic information at the species level. We need to do this because some mitochondrial genomes can be annotated at other taxonomic levels, such as subspecies.</p>
<p><a href="http://metabar:8888/obidoc/obitools/obiannotate/">
<abbr title="obiannotate: edit sequence annotations"><code>obiannotate</code></abbr>
</a> can perform this task using the <code>--with-taxon-at-rank</code> option.
This option requires you to specify the taxonomic rank at which the annotation should be performed.
In this example case, we have to use the rank <code>species</code>.
The species taxid is stored in the <code>species_taxid</code> tag of the sequence.</p>
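<p>The principle of a species-level annotation can be sketched as a walk along the parent links of the taxonomy until a node of rank <code>species</code> is found. The following toy example is only an illustration under stated assumptions and not the <code>obiannotate</code> implementation: the taxids 2756270 and 192382 are those of the NC_050066 example of this section, while 99999 is a made-up placeholder for the genus, and the file names are hypothetical.</p>

```shell
# Toy taxonomy, one node per line: taxid,parent,rank.
# 99999 is a made-up placeholder taxid for the genus.
printf '%s\n' \
    '2756270,192382,subspecies' \
    '192382,99999,species' \
    '99999,1,genus' \
    '1,1,no rank' > toy_taxo.csv

# Climb the parent links from a starting taxid until a species node
# (or the root, which is its own parent) is reached.
awk -F ',' -v start=2756270 '
    { parent[$1] = $2; rank[$1] = $3 }
    END {
        t = start
        while (rank[t] != "species") {
            if (parent[t] == t) break   # root reached without finding a species
            t = parent[t]
        }
        print t "," rank[t]
    }' toy_taxo.csv > toy_species.txt

cat toy_species.txt
```

<p>Starting from the subspecies taxid, the walk stops at its parent, which has the rank <code>species</code>.</p>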
<p>In the following command we combine three <a href="http://metabar:8888/obidoc/obitools/obiannotate/">
<abbr title="obiannotate: edit sequence annotations"><code>obiannotate</code></abbr>
</a> commands with one <a href="http://metabar:8888/obidoc/obitools/obiuniq/">
<abbr title="obiuniq: dereplicate a sequence file"><code>obiuniq</code></abbr>
</a> command using the <code>|</code> pipe operator (see the <em>General operating principles</em> section):</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>obiannotate -t ncbitaxo_20250211.tgz <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> --with-taxon-at-rank species <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> mito.all.fasta | <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> obiannotate -S <span style="color:#e6db74">&#39;ori_taxid=annotations.taxid&#39;</span> | <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> obiannotate -S <span style="color:#e6db74">&#39;taxid=annotations.species_taxid&#39;</span> | <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> obiuniq -c taxid &gt; mito.one.fasta
</span></span></code></pre></div><p>Take the sequence NC_050066 as an example. It is originally annotated with taxon 2756270, which corresponds to the subspecies <em>Monochamus alternatus alternatus</em>:</p>
<pre tabindex="0"><code>&gt;NC_050066 {&#34;definition&#34;:&#34;Monochamus alternatus alternatus mitochondrion, complete genome.&#34;,&#34;scientific_name&#34;:&#34;mitochondrion Monochamus alternatus alternatus&#34;,&#34;taxid&#34;:&#34;taxon:2756270 [Monochamus alternatus alternatus]@subspecies&#34;}
aatgaagtgcctgagcaaagggtaattttgatagaattagtaacgtgaattttcaccttc
attaattatatttaatagaattaaactatttccttagatatcaaaaatctttatacatca
...
</code></pre><p>The first <a href="http://metabar:8888/obidoc/obitools/obiannotate/">
<abbr title="obiannotate: edit sequence annotations"><code>obiannotate</code></abbr>
</a> command adds the <code>species_taxid</code> tag to the sequences.</p>
<pre tabindex="0"><code>&gt;NC_050066 {&#34;definition&#34;:&#34;Monochamus alternatus alternatus mitochondrion, complete genome.&#34;,&#34;scientific_name&#34;:&#34;mitochondrion Monochamus alternatus alternatus&#34;,&#34;species_name&#34;:&#34;Monochamus alternatus&#34;,&#34;species_taxid&#34;:&#34;taxon:192382 [Monochamus alternatus]@species&#34;,&#34;taxid&#34;:&#34;taxon:2756270 [Monochamus alternatus alternatus]@subspecies&#34;}
aatgaagtgcctgagcaaagggtaattttgatagaattagtaacgtgaattttcaccttc
attaattatatttaatagaattaaactatttccttagatatcaaaaatctttatacatca
...
</code></pre><p>The second <a href="http://metabar:8888/obidoc/obitools/obiannotate/">
<abbr title="obiannotate: edit sequence annotations"><code>obiannotate</code></abbr>
</a> copies the original <code>taxid</code> tag into a new tag named <code>ori_taxid</code> to preserve the original taxid for possible future use.</p>
<pre tabindex="0"><code>&gt;NC_050066 {&#34;definition&#34;:&#34;Monochamus alternatus alternatus mitochondrion, complete genome.&#34;,&#34;ori_taxid&#34;:&#34;taxon:2756270 [Monochamus alternatus alternatus]@subspecies&#34;,&#34;scientific_name&#34;:&#34;mitochondrion Monochamus alternatus alternatus&#34;,&#34;species_name&#34;:&#34;Monochamus alternatus&#34;,&#34;species_taxid&#34;:&#34;taxon:192382 [Monochamus alternatus]@species&#34;,&#34;taxid&#34;:&#34;taxon:2756270 [Monochamus alternatus alternatus]@subspecies&#34;}
aatgaagtgcctgagcaaagggtaattttgatagaattagtaacgtgaattttcaccttc
attaattatatttaatagaattaaactatttccttagatatcaaaaatctttatacatca
...
</code></pre><p>The third <a href="http://metabar:8888/obidoc/obitools/obiannotate/">
<abbr title="obiannotate: edit sequence annotations"><code>obiannotate</code></abbr>
</a> then copies the <code>species_taxid</code> tag into the main <code>taxid</code> tag.
From now on, the <em>OBITools</em> will use the species taxid stored in the <code>taxid</code> tag as the taxonomic annotation for the sequence.</p>
<pre tabindex="0"><code>&gt;NC_050066 {&#34;definition&#34;:&#34;Monochamus alternatus alternatus mitochondrion, complete genome.&#34;,&#34;ori_taxid&#34;:&#34;taxon:2756270 [Monochamus alternatus alternatus]@subspecies&#34;,&#34;scientific_name&#34;:&#34;mitochondrion Monochamus alternatus alternatus&#34;,&#34;species_name&#34;:&#34;Monochamus alternatus&#34;,&#34;species_taxid&#34;:&#34;taxon:192382 [Monochamus alternatus]@species&#34;,&#34;taxid&#34;:&#34;taxon:192382 [Monochamus alternatus]@species&#34;}
aatgaagtgcctgagcaaagggtaattttgatagaattagtaacgtgaattttcaccttc
attaattatatttaatagaattaaactatttccttagatatcaaaaatctttatacatca
...
</code></pre><p>Look carefully at this latest version of the sequence. The <code>taxid</code> tag now holds the species taxid, the <code>ori_taxid</code> tag holds the original taxid as provided by GenBank, and the <code>species_taxid</code> tag still holds the species taxid.</p>
<p>Finally, the <a href="http://metabar:8888/obidoc/obitools/obiuniq/">
<abbr title="obiuniq: dereplicate a sequence file"><code>obiuniq</code></abbr>
</a> command merges all strictly identical sequences into a single sequence entry.
Here, the <code>-c taxid</code> option ensures that only sequences with the same taxid are merged. Therefore, two strictly identical sequences annotated with different taxids are kept as two separate entries.</p>
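<p>The grouping rule described above can be sketched with standard UNIX commands on toy data. This is only a conceptual illustration, assuming one hypothetical <code>taxid sequence</code> pair per line; the real <code>obiuniq</code> command works on sequence files and does more than counting.</p>

```shell
# Toy input: one "taxid sequence" pair per line (hypothetical data).
printf '%s\n' \
    '10 acgt' \
    '10 acgt' \
    '20 acgt' \
    '20 ttga' > toy_seqs.txt

# Identical sequences are merged only when they also share the same taxid:
# each distinct (taxid, sequence) pair is kept once, with its count.
sort toy_seqs.txt | uniq -c | awk '{ print $2, $3, "count=" $1 }' > toy_uniq.txt

cat toy_uniq.txt
```

<p>The sequence <code>acgt</code> appears under both toy taxids and is therefore kept as two entries, one per taxid.</p>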
<h3 id="look-at-the-evenness-of-the-species-representation">
Look at the evenness of the species representation
<a class="anchor" href="#look-at-the-evenness-of-the-species-representation">#</a>
</h3>
<p>The goal here is to build, using standard UNIX commands, a histogram of the number of sequences per species.
More specifically, it shows how many species are represented by one, two, three or more sequences.</p>
<p>The complete command we are going to build is the following:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>obicsv -k taxid mito.one.fasta <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> | tail -n +2 <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> | sort <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> | uniq -c <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> | sort -nk1 <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> | cut -w -f <span style="color:#ae81ff">2</span> <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> | uplot count
</span></span></code></pre></div><p>But first, let us build this command step by step to understand what is going on.</p>
<p><a href="http://metabar:8888/obidoc/obitools/obicsv/">
<abbr title="obicsv: convert a sequence file to a CSV file"><code>obicsv</code></abbr>
</a> converts a sequence file into a CSV file. Here, because of the <code>-k taxid</code> option, the CSV file contains only the <code>taxid</code> tag of every sequence. The <code>head</code> command is used to display the first ten lines of the result.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>obicsv -k taxid mito.one.fasta <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> | head
</span></span></code></pre></div><pre tabindex="0"><code>taxid
taxon:2065826 [Sineleotris saccharae]@species
taxon:2219250 [Ocinara albicollis]@species
taxon:8306 [Ambystoma talpoideum]@species
taxon:80600 [Rhizopogon vinicolor]@species
taxon:270463 [Vanessa indica]@species
taxon:1028098 [Hierodula patellifera]@species
taxon:56258 [Sagittarius serpentarius]@species
taxon:457650 [Myadora brevis]@species
taxon:763200 [Arma chinensis]@species
</code></pre><p>The <code>tail</code> command is used to remove the header line from the CSV file and keep only its data part.
The <code>-n +2</code> option makes <code>tail</code> output the file starting from its second line.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>obicsv -k taxid mito.one.fasta <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> | tail -n +2 <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> | head
</span></span></code></pre></div><pre tabindex="0"><code>taxon:2065826 [Sineleotris saccharae]@species
taxon:2219250 [Ocinara albicollis]@species
taxon:8306 [Ambystoma talpoideum]@species
taxon:80600 [Rhizopogon vinicolor]@species
taxon:270463 [Vanessa indica]@species
taxon:1028098 [Hierodula patellifera]@species
taxon:56258 [Sagittarius serpentarius]@species
taxon:457650 [Myadora brevis]@species
taxon:763200 [Arma chinensis]@species
taxon:2060314 [Neotrygon indica]@species
</code></pre><p>As you can see, the first line of the output no longer contains the <code>taxid</code> column header present in the previous output.
In the next command, the <code>sort</code> command sorts the lines so that identical <code>taxid</code> values end up on consecutive lines, which the <code>uniq</code> command used afterwards requires.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>obicsv -k taxid mito.one.fasta <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> | tail -n +2 <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> | sort <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> | head
</span></span></code></pre></div><pre tabindex="0"><code>&#34;taxon:1030158 [Ficus variegata Roding, 1798]@species&#34;
&#34;taxon:244488 [Pillucina pisidium (Dunker, 1860)]@species&#34;
&#34;taxon:352057 [Anopheles albitarsis F Brochero et al., 2007]@species&#34;
&#34;taxon:646521 [Contracaecum rudolphii B Bullini et al., 1986]@species&#34;
&#34;taxon:908352 [Anopheles albitarsis G Krzywinski et al., 2011]@species&#34;
taxon:1000982 [Steindachneridion melanodermatum]@species
taxon:1001283 [Calameuta idolon]@species
taxon:1001291 [Trachelus tabidus]@species
taxon:1001332 [Phylloporia weberiana]@species
taxon:1001553 [Dephomys defua]@species
</code></pre><p>We can then add the <code>uniq -c</code> command to count the number of times each <code>taxid</code> appears in the file.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>obicsv -k taxid mito.one.fasta <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> | tail -n +2 <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> | sort <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> | uniq -c <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> | head
</span></span></code></pre></div><pre tabindex="0"><code> 1 &#34;taxon:1030158 [Ficus variegata Roding, 1798]@species&#34;
1 &#34;taxon:244488 [Pillucina pisidium (Dunker, 1860)]@species&#34;
1 &#34;taxon:352057 [Anopheles albitarsis F Brochero et al., 2007]@species&#34;
1 &#34;taxon:646521 [Contracaecum rudolphii B Bullini et al., 1986]@species&#34;
1 &#34;taxon:908352 [Anopheles albitarsis G Krzywinski et al., 2011]@species&#34;
1 taxon:1000982 [Steindachneridion melanodermatum]@species
1 taxon:1001283 [Calameuta idolon]@species
1 taxon:1001291 [Trachelus tabidus]@species
1 taxon:1001332 [Phylloporia weberiana]@species
1 taxon:1001553 [Dephomys defua]@species
</code></pre><p>The <code>uniq</code> command added the first column to the output, which is the number of times each <code>taxid</code> appears in the original file.</p>
<p>The next step is to remove the <code>taxid</code> column from the output and keep only the first column, the count.
Because the <code>uniq</code> command pads the count with leading spaces, the <code>cut</code> command considers the count to be the second column, even though it looks like the first one to us.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>obicsv -k taxid mito.one.fasta <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> | tail -n +2 <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> | sort <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> | uniq -c <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> | cut -w -f <span style="color:#ae81ff">2</span> <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> | head
</span></span></code></pre></div><pre tabindex="0"><code>1
1
1
1
1
1
1
1
1
1
</code></pre><ul>
<li>The <code>-w</code> option specifies that columns are separated by whitespace.</li>
<li>The <code>-f 2</code> option specifies that the second column is the one to keep.</li>
</ul>
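<p>As far as we know, the <code>-w</code> option is provided by the BSD version of <code>cut</code> (as found on macOS) and may be missing from other implementations such as GNU <code>cut</code>. Assuming <code>awk</code> is available, the following minimal sketch extracts the count column in a way that does not depend on that option. The input values and file names are hypothetical.</p>

```shell
# Toy output of uniq -c: right-aligned counts, hence leading spaces
# (hypothetical values).
printf '   1 taxon:10\n   3 taxon:20\n  12 taxon:30\n' > toy_counts.txt

# awk ignores leading whitespace when splitting fields, so the count is $1
# here, whereas it is field 2 for cut -w.
awk '{ print $1 }' toy_counts.txt > toy_first_column.txt

cat toy_first_column.txt
```

<p>In the pipelines of this section, <code>awk '{ print $1 }'</code> could then stand in for <code>cut -w -f 2</code>.</p>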
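<p>The last step is to send this output to the <code>uplot</code> command to plot the histogram. The complete command also contains a <code>sort -nk1</code> step, which sorts the lines numerically by count before the count column is extracted.</p>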
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>obicsv -k taxid mito.one.fasta <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> | tail -n +2 <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> | sort <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> | uniq -c <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> | sort -nk1 <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> | cut -w -f <span style="color:#ae81ff">2</span> <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> | uplot count
</span></span></code></pre></div><pre tabindex="0"><code> ┌ ┐
1 ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 17769.0
2 ┤ 90.0
3 ┤ 17.0
4 ┤ 5.0
5 ┤ 4.0
6 ┤ 2.0
7 ┤ 1.0
└ ┘
</code></pre><p>Very few species are represented by more than one mitochondrial genome, while 17769 species are represented by a single genome.
We can therefore assume that the set of mitochondrial genomes is not strongly biased in favour of a particular species.</p>
<h3 id="selection-of-vertebrate-genomes">
Selection of <em>vertebrate</em> genomes
<a class="anchor" href="#selection-of-vertebrate-genomes">#</a>
</h3>
<p>The mitochondrial database we have downloaded contains mitochondrial genomes from vertebrates, but also from invertebrates, fungi, plants&hellip; Since <code>ecoPrimers</code> requires that every sequence in the learning database can potentially contain the barcode we are looking for, we will restrict the learning database to vertebrate genomes only.</p>
<p>The <a href="http://metabar:8888/obidoc/obitools/obigrep/">
<abbr title="obigrep: filter a sequence file"><code>obigrep</code></abbr>
</a> command will do this for us. We just need to provide the <code>taxid</code> of the <code>vertebrata</code> taxon (7742) with the <code>-r</code> option, and the taxonomy with the <code>-t</code> option.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>obigrep -t ncbitaxo_20250211.tgz <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> -r <span style="color:#ae81ff">7742</span> <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> mito.one.fasta &gt; mito.vert.fasta
</span></span></code></pre></div><p>Now we can count the number of sequences in the new learning database.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>obicount mito.vert.fasta <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> | csvlook
</span></span></code></pre></div><pre tabindex="0"><code>| entities | n |
| -------- | ----------- |
| variants | 7,822 |
| reads | 7,823 |
| symbols | 131,378,756 |
</code></pre><h3 id="formatting-data-for-ecoprimers">
Formatting data for <code>ecoPrimers</code>
<a class="anchor" href="#formatting-data-for-ecoprimers">#</a>
</h3>
<p>As mentioned in the introduction, <code>ecoPrimers</code> was designed to work with the original version of OBITools.
We now need to perform three more steps to prepare the data for <code>ecoPrimers</code>.</p>
<h4 id="unarchiving-the-taxonomy">
Unarchiving the taxonomy
<a class="anchor" href="#unarchiving-the-taxonomy">#</a>
</h4>
<p>The old OBITools cannot use archived and compressed taxonomies. So we need to</p>
<ul>
<li>Create a new directory to store the unarchived taxonomy using the <code>mkdir</code> command.</li>
<li>Change to the new directory using the <code>cd</code> command.</li>
<li>Extract the taxonomy from the compressed file using the <code>tar</code> command.</li>
<li>Return to the original directory using the <code>cd</code> command.</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>mkdir ncbitaxo_20250211
</span></span><span style="display:flex;"><span>cd ncbitaxo_20250211
</span></span><span style="display:flex;"><span>tar zxvf ../ncbitaxo_20250211.tgz
</span></span><span style="display:flex;"><span>cd ..
</span></span></code></pre></div><h4 id="converting-the-database-to-the-old-obitools-format">
Converting the database to the old obitools format
<a class="anchor" href="#converting-the-database-to-the-old-obitools-format">#</a>
</h4>
<p>Now <em>OBITools4</em> stores the annotations in JSON format.</p>
<pre tabindex="0"><code>&gt;NC_050066 {&#34;definition&#34;:&#34;Monochamus alternatus alternatus mitochondrion, complete genome. &#34;,&#34;ori_taxid&#34;:&#34;taxon:2756270 [Monochamus alternatus alternatus]@subspecies&#34;,&#34;scientific_name&#34;:&#34;mitochondrion Monochamus alternatus alternatus&#34;,&#34;species_name&#34;:&#34;Monochamus alternatus&#34;,&#34;species_taxid&#34;:&#34;taxon:192382 [Monochamus alternatus]@species&#34;,&#34;taxid&#34;:&#34;taxon:192382 [Monochamus alternatus]@species&#34;}
aatgaagtgcctgagcaaagggtaattttgatagaattagtaacgtgaattttcaccttc
attaattatatttaatagaattaaactatttccttagatatcaaaaatctttatacatca
...
</code></pre><p>The original OBITools stored the annotation in a <code>key=value;</code> format.</p>
<pre tabindex="0"><code>&gt;NC_050066 ori_taxid=taxon:2756270 [Monochamus alternatus alternatus]@subspecies; scientific_name=mitochondrion Monochamus alternatus alternatus; species_name=Monochamus alternatus; species_taxid=taxon:192382 [Monochamus alternatus]@species; taxid=taxon:192382 [Monochamus alternatus]@species; count=1; Monochamus alternatus alternatus mitochondrion, complete genome.
aatgaagtgcctgagcaaagggtaattttgatagaattagtaacgtgaattttcaccttc
attaattatatttaatagaattaaactatttccttagatatcaaaaatctttatacatca
...
</code></pre><p>When the <code>-O</code> option is added to an <em>OBITools4</em> command, the output is written in the old OBITools format instead of the new JSON-based format.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>obiconvert -O mito.vert.fasta &gt; mito.vert.old.fasta
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>head -5 mito.vert.old.fasta
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-csv" data-lang="csv"><span style="display:flex;"><span><span style="color:#e6db74">&gt;NC_071784 taxid=taxon:2065826 [Sineleotris saccharae]@species; count=1; ori_taxid=taxon:2065826 [Sineleotris saccharae]@species; scientific_name=mitochondrion Sineleotris saccharae; species_name=Sineleotris saccharae; species_taxid=taxon:2065826 [Sineleotris saccharae]@species; Sineleotris saccharae mitochondrion</span>,<span style="color:#e6db74"> complete genome.</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74">gctagcgtagcttaaccaaagcataacactgaagatgttaagatgggccctagaaagccc</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74">cgcaagcacaaaagcttggtcctggctttactatcagcttaggctaaacttacacatgca</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74">agtatccgcatccccgtgagaatgcccttaagctcccaccgctaacaggagtcaaggagc</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74">cggtatcaggcacaaccctgagttagcccacgacaccttgctcagccacacccccaaggg</span>
</span></span></code></pre></div><h3 id="indexing-the-mitochondrial-learning-database">
Indexing the mitochondrial learning database
<a class="anchor" href="#indexing-the-mitochondrial-learning-database">#</a>
</h3>
<p>The last step in preparing the data for <code>ecoPrimers</code> is to index the learning database.
The original OBITools did this job, but the new <em>OBITools4</em> does not.</p>
<p>Using the <a href="ecoPCRFormat" download="ecoPCRFormat"><code>ecoPCRFormat</code></a> Python script, you can do the indexing without needing the original OBITools.</p>
<p>Once you have downloaded the <code>ecoPCRFormat</code> Python script by clicking <a href="ecoPCRFormat" download="ecoPCRFormat"><code>here</code></a>, you have to make it executable and copy it to the same directory as the <code>ecoPrimers</code> program.</p>
<p>Here is an example of how to do that; adapt the download URL and the destination directory to your own installation:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>curl http://localhost:1313/obitools4-doc/docs/cookbook/ecoprimers/ecoPCRFormat &gt; ecoPCRFormat
</span></span><span style="display:flex;"><span>chmod +x ecoPCRFormat
</span></span><span style="display:flex;"><span>cp ecoPCRFormat /Users/coissac/bin
</span></span></code></pre></div><p>You can now run the <code>ecoPCRFormat</code> script to create the index files.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>ecoPCRFormat -t ncbitaxo_20250211 <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> -f <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> -n vertebrata <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> mito.vert.old.fasta
</span></span></code></pre></div><ul>
<li>The <code>-t</code> option specifies the directory where the taxonomy database is located.</li>
<li>The <code>-f</code> option specifies that the input file is in fasta format.</li>
<li>The <code>-n</code> option specifies the name of the indexed learning database.</li>
<li>The last parameter <code>mito.vert.old.fasta</code> is the name of the input file containing the sequences to be indexed.</li>
</ul>
<p>This command creates the following index files:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>ls -l vertebrata*
</span></span></code></pre></div><pre tabindex="0"><code>-rw-r--r--@ 1 coissac staff 260899785 Feb 11 11:53 vertabrata.ndx
-rw-r--r--@ 1 coissac staff 546 Feb 11 11:53 vertabrata.rdx
-rw-r--r--@ 1 coissac staff 121379751 Feb 11 11:53 vertabrata.tdx
-rw-r--r--@ 1 coissac staff 40446318 Feb 11 11:54 vertabrata_001.sdx
</code></pre><h2 id="selecting-the-best-primer-pairs">
Selecting the best primer pairs
<a class="anchor" href="#selecting-the-best-primer-pairs">#</a>
</h2>
<h3 id="searching-the-teleostei-taxid">
Searching the <em>Teleostei</em> <code>taxid</code>
<a class="anchor" href="#searching-the-teleostei-taxid">#</a>
</h3>
<p>To design a new DNA metabarcode for bony fish, we first have to find the <em>Teleostei</em> taxid.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>obitaxonomy -t ncbitaxo_20250211.tgz <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> --fixed <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> <span style="color:#e6db74">&#39;Teleostei&#39;</span> <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> | csvlook
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-csv" data-lang="csv"><span style="display:flex;"><span><span style="color:#e6db74">| taxid | parent | taxonomic_rank | scientific_name |</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74">| ---------------------------------- | ---------------------------------- | -------------- | --------------- |</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74">| taxon:32443 [Teleostei]@infraclass | taxon:41665 [Neopterygii]@subclass | infraclass | Teleostei |</span>
</span></span></code></pre></div><h3 id="running-the-ecoprimers-program">
Running the <code>ecoPrimers</code> program
<a class="anchor" href="#running-the-ecoprimers-program">#</a>
</h3>
<p>The <code>ecoPrimers</code> command looks for the priming sites. <code>ecoPrimers</code> is alignment-free software able to identify conserved regions among a large set of sequences.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>ecoPrimers -d vertebrata <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> -e <span style="color:#ae81ff">3</span> -3 <span style="color:#ae81ff">2</span> <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> -l <span style="color:#ae81ff">30</span> -L <span style="color:#ae81ff">150</span> <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> -r <span style="color:#ae81ff">32443</span> <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> -c &gt; Teleostei.ecoprimers
</span></span></code></pre></div><ul>
<li>The <code>-d</code> option allows you to specify the learning database, here the vertebrate mitochondrial genome database indexed above.</li>
<li>The <code>-e</code> option specifies the maximum number of mismatches allowed between the primer and the priming site. The number of mismatches is per primer.</li>
<li>The <code>-3</code> option, used here with the <em>2</em> argument (<code>-3 2</code>), indicates that no mismatches are allowed on the last two nucleotides (3&rsquo; end) of the primer.</li>
<li>The <code>-l</code> option specifies the minimum length of the barcode (excluding primers) to search for.</li>
<li>The <code>-L</code> option specifies the maximum length of the barcode (excluding primers) to search for.</li>
<li>The <code>-r</code> option indicates which taxon (here <em>Teleostei</em>, taxid 32443) <code>ecoPrimers</code> will focus on.</li>
<li>The <code>-c</code> option indicates that the learning database consists of circular genomes.</li>
</ul>
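<p>The mismatch rule set by <code>-e 3 -3 2</code> can be illustrated with a small sketch. This is not the <code>ecoPrimers</code> algorithm, only a position-by-position comparison under simplifying assumptions: the primer and the site have the same length, are given in the same orientation, and contain no IUPAC ambiguity codes. The helper name <code>primer_ok</code> is hypothetical. The first two site strings are <code>forward_match</code> and <code>reverse_match</code> values taken from the <code>obipcr</code> output shown later on this page; the third one is invented to show a mismatch at the 3&rsquo; end.</p>

```shell
# primer_ok PRIMER SITE MAX_ERRORS STRICT_3P   (hypothetical helper)
# Prints "<mismatches> ok" when the site has at most MAX_ERRORS mismatches
# and none of them falls within the last STRICT_3P positions of the primer,
# otherwise prints "<mismatches> rejected".
primer_ok() {
  awk -v primer="$1" -v site="$2" -v max="$3" -v strict="$4" 'BEGIN {
    p = toupper(primer); s = toupper(site); n = length(p)
    errors = 0; end_errors = 0
    for (i = 1; i <= n; i++) {
      if (substr(p, i, 1) != substr(s, i, 1)) {
        errors++
        if (i > n - strict) end_errors++   # mismatch in the 3-prime window
      }
    }
    verdict = (errors <= max && end_errors == 0) ? "ok" : "rejected"
    print errors, verdict
  }'
}

primer_ok ACACCGCCCGTCACTCTC acaccgcccgtcaccctc 3 2   # 1 ok
primer_ok CCAAGTGCACCTTCCGGT ccaagcacactttccagt 3 2   # 4 rejected
primer_ok ACACCGCCCGTCACTCTC acaccgcccgtcactctt 3 2   # 1 rejected (3-prime mismatch)
```

<p>The second site is rejected because it exceeds the three mismatches allowed per primer, the third because its single mismatch falls on one of the last two nucleotides.</p>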
<p>After running for a few minutes and writing progress information to the terminal, <code>ecoPrimers</code> finishes and reports that it has identified:</p>
<ul>
<li>Total number of pairs : 9407</li>
<li>Total number of good pairs : 407</li>
</ul>
<p>We can now have a look at the beginning of the result file.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>head -35 Teleostei.ecoprimers
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-csv" data-lang="csv"><span style="display:flex;"><span><span style="color:#e6db74">#</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74"># ecoPrimer version 0.5</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74"># Rank level optimisation : species</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74"># max error count by oligonucleotide : 3</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74">#</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74"># Restricted to taxon:</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74"># 32443 : Teleostei (infraclass)</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74">#</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74"># strict primer quorum : 0.70</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74"># example quorum : 0.90</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74"># counterexample quorum : 0.10</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74">#</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74"># database : vertebrata</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74"># Database is constituted of 3909 examples corresponding to 3876 species</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74"># and 0 counterexamples corresponding to 0 species</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74">#</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74"># amplifiat length between [30</span>,<span style="color:#e6db74">150] bp</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74"># DB sequences are considered as circular</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74"># Pairs having specificity less than 0.60 will be ignored</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74">#</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74"> 0 AGAGTGACGGGCGGTGTG CGTCAGGTCGAGGTGTAG 62.8 42.4 57.5 34.1 12 11 GG 3864 0 0.988 3832 0 0.989 2731 0.713 134 146 138.22</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74"> 1 CGTCAGGTCGAGGTGTAG GAGTGACGGGCGGTGTGT 57.5 34.1 63.1 42.9 11 12 GG 3863 0 0.988 3831 0 0.988 2730 0.713 133 145 137.22</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74"> 2 CGTCAGGTCGAGGTGTAG GGGAGAGTGACGGGCGGT 57.5 34.1 64.5 37.0 11 13 GG 3811 0 0.975 3779 0 0.975 2689 0.712 137 149 141.22</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74"> 3 CGTCAGGTCGAGGTGTAG GGGGAGAGTGACGGGCGG 57.5 34.1 65.5 38.4 11 14 GG 3804 0 0.973 3772 0 0.973 2682 0.711 138 149 142.22</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74"> 4 ACACCGCCCGTCACTCTC ACCTTCCGGTACACTTAC 62.5 36.8 54.0 16.6 12 9 GG 3850 0 0.985 3818 0 0.985 2658 0.696 46 132 66.51</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74"> 5 AACGTCAGGTCGAGGTGT AGAGTGACGGGCGGTGTG 58.8 28.4 62.8 41.7 10 12 GG 3779 0 0.967 3746 0 0.966 2653 0.708 137 148 140.23</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74"> 6 ACACCGCCCGTCACTCTC CACCTTCCGGTACACTTA 62.5 36.8 54.0 16.6 12 9 GG 3846 0 0.984 3814 0 0.984 2654 0.696 47 133 67.51</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74"> 7 AACGTCAGGTCGAGGTGT GAGTGACGGGCGGTGTGT 58.8 28.4 63.1 42.1 10 12 GG 3778 0 0.966 3745 0 0.966 2652 0.708 136 147 139.23</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74"> 8 ACCTTCCGGTACACTTAC CACACCGCCCGTCACTCT 54.0 16.6 62.8 37.3 9 12 GG 3845 0 0.984 3813 0 0.984 2653 0.696 47 133 67.51</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74"> 9 ACACCGCCCGTCACTCTC TCCGGTACACTTACCATG 62.5 36.8 54.1 18.1 12 9 GG 3851 0 0.985 3819 0 0.985 2651 0.694 42 128 62.51</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74"> 10 ACACCGCCCGTCACTCTC CCGGTACACTTACCATGT 62.5 36.8 54.4 18.6 12 9 GG 3851 0 0.985 3819 0 0.985 2651 0.694 41 127 61.51</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74"> 11 ACACCGCCCGTCACTCTC CCAAGTGCACCTTCCGGT 62.5 36.8 60.7 28.9 12 11 GG 3837 0 0.982 3805 0 0.982 2650 0.696 54 140 74.51</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74"> 12 ACACCGCCCGTCACTCTC GCACCTTCCGGTACACTT 62.5 36.8 57.7 22.5 12 10 GG 3842 0 0.983 3810 0 0.983 2650 0.696 48 134 68.51</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74"> 13 ACACCGCCCGTCACTCTC CGGTACACTTACCATGTT 62.5 36.8 52.4 15.7 12 8 GG 3850 0 0.985 3818 0 0.985 2650 0.694 40 126 60.51</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74"> 14 ACACCGCCCGTCACTCTC CACTTACCATGTTACGAC 62.5 36.8 51.1 27.7 12 8 GG 3850 0 0.985 3817 0 0.985 2649 0.694 35 121 55.51</span>
</span></span></code></pre></div><p>The result file consists of two parts. The header, consisting of lines starting with the <code>#</code> character, contains all the parameters used by the <code>ecoPrimer</code> algorithms and some statistics about the database and the
current search.</p>
<p>The second part is a tabular text describing all potential primer pairs identified. Immediately below this is a detailed description of the information contained in each column.</p>
<p>Table result description :</p>
<blockquote>
<ul>
<li><strong>column 1</strong> : serial number</li>
<li><strong>column 2</strong> : primer1</li>
<li><strong>column 3</strong> : primer2</li>
<li><strong>column 4</strong> : primer1 Tm without mismatch</li>
<li><strong>column 5</strong> : primer1 lowest Tm against example sequences</li>
<li><strong>column 6</strong> : primer2 Tm without mismatch</li>
<li><strong>column 7</strong> : primer2 lowest Tm against example sequences</li>
<li><strong>column 8</strong> : primer1 G+C count</li>
<li><strong>column 9</strong> : primer2 G+C count</li>
<li><strong>column 10</strong> : good/bad</li>
<li><strong>column 11</strong> : amplified example sequence count</li>
<li><strong>column 12</strong> : amplified counterexample sequence count</li>
<li><strong>column 13</strong> : yule</li>
<li><strong>column 14</strong> : amplified example taxa count</li>
<li><strong>column 15</strong> : amplified counterexample taxa count</li>
<li><strong>column 16</strong> : ratio of amplified example taxa versus all example taxa (Bc index)</li>
<li><strong>column 17</strong> : unambiguously identified example taxa count</li>
<li><strong>column 18</strong> : specificity, the ratio of unambiguously identified example taxa versus amplified example taxa (Bs index)</li>
<li><strong>column 19</strong> : minimum amplified length</li>
<li><strong>column 20</strong> : maximum amplified length</li>
<li><strong>column 21</strong> : average amplified length</li>
</ul>
</blockquote>
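<p>Because the table is plain text with one pair per line, it can be filtered by column number. The following is a minimal sketch, assuming a standard <code>awk</code>: it skips the <code>#</code> header lines and keeps the pairs whose two mismatch-free Tm values (columns 4 and 6) differ by at most 2. The threshold of 2 is an arbitrary choice for illustration, and the three data rows are copied from the result shown above (serial numbers 0, 4 and 11).</p>

```shell
# One header line plus three rows copied from the result table above.
results='# ecoPrimer version 0.5
0 AGAGTGACGGGCGGTGTG CGTCAGGTCGAGGTGTAG 62.8 42.4 57.5 34.1 12 11 GG 3864 0 0.988 3832 0 0.989 2731 0.713 134 146 138.22
4 ACACCGCCCGTCACTCTC ACCTTCCGGTACACTTAC 62.5 36.8 54.0 16.6 12 9 GG 3850 0 0.985 3818 0 0.985 2658 0.696 46 132 66.51
11 ACACCGCCCGTCACTCTC CCAAGTGCACCTTCCGGT 62.5 36.8 60.7 28.9 12 11 GG 3837 0 0.982 3805 0 0.982 2650 0.696 54 140 74.51'

# Keep pairs with balanced Tm values and print:
# serial number, primer1, primer2, Bc (column 16), Bs (column 18).
balanced=$(printf '%s\n' "$results" | awk '
  /^#/ { next }                       # skip the header lines
  {
    delta = $4 - $6                   # Tm difference without mismatch
    if (delta < 0) delta = -delta
    if (delta <= 2.0) print $1, $2, $3, $16, $18
  }')

printf '%s\n' "$balanced"
```

<p>With these three rows only pair 11 passes the filter. To apply the same filter to the whole result, the <code>Teleostei.ecoprimers</code> file can be given to <code>awk</code> in place of the inline sample.</p>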
<p>Suppose we decide to focus on pair number 11 (the serial number in column 1) because it seems to have relatively good properties and, in particular, relatively balanced melting temperatures between the two primers (62.5 and 60.7).</p>
<ul>
<li>Primer pair ID: 11
<table>
<thead>
<tr>
<th>Primer</th>
<th>sequence</th>
<th>tm max</th>
<th>tm min</th>
<th>GC count</th>
</tr>
</thead>
<tbody>
<tr>
<td>Forward</td>
<td>ACACCGCCCGTCACTCTC</td>
<td>62.5</td>
<td>36.8</td>
<td>12</td>
</tr>
<tr>
<td>Reverse</td>
<td>CCAAGTGCACCTTCCGGT</td>
<td>60.7</td>
<td>28.9</td>
<td>11</td>
</tr>
</tbody>
</table>
</li>
</ul>
<ul>
<li>Amplifies 3837 of the 3909 example sequences</li>
<li>Unambiguously identifies 2650 of the 3876 species</li>
<li>Amplicon size ranging from 54 bp to 140 bp (mean: 74.51 bp)</li>
</ul>
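<p>The coverage and specificity values reported for pair 11 can be recomputed from the counts in its row, as a quick sanity check. This is a sketch based on what the numbers in the table suggest: the reported Bc (0.982) matches the amplified taxa count divided by the 3876 species of the learning set, and the reported Bs (0.696) matches the unambiguously identified taxa count divided by the amplified taxa count. The variable names are ours.</p>

```shell
# Counts read from the file header and from the row of pair 11.
example_taxa=3876      # species in the learning set (file header)
amplified_taxa=3805    # column 14
identified_taxa=2650   # column 17

# Bc = amplified / all example taxa ; Bs = identified / amplified taxa
indices=$(awk -v all="$example_taxa" -v amp="$amplified_taxa" -v id="$identified_taxa" \
  'BEGIN { printf "%.3f %.3f\n", amp / all, id / amp }')

printf 'Bc Bs: %s\n' "$indices"
```

<p>The script prints <code>Bc Bs: 0.982 0.696</code>, the values found in columns 16 and 18 for this pair.</p>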
<h2 id="testing-the-new-primer-pair">
Testing the new primer pair
<a class="anchor" href="#testing-the-new-primer-pair">#</a>
</h2>
<p>To better characterise this pair, we can now use the <code>obipcr</code> tool to extract the barcode sequence corresponding to this pair from the learning database.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>obipcr --forward ACACCGCCCGTCACTCTC <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> --reverse CCAAGTGCACCTTCCGGT <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> -e <span style="color:#ae81ff">5</span> <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> -l <span style="color:#ae81ff">30</span> -L <span style="color:#ae81ff">150</span> <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> -c <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> mito.vert.fasta <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> &gt; Teleostei_11.fasta
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>head Teleostei_11.fasta
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-csv" data-lang="csv"><span style="display:flex;"><span><span style="color:#e6db74">&gt;NC_022183_sub[925..998] {&#34;count&#34;:1</span>,<span style="color:#e6db74">&#34;definition&#34;</span><span style="color:#e6db74">:&#34;Acrossocheilus hemispinus mitochondrion</span>,<span style="color:#e6db74"> complete genome.&#34;</span>,<span style="color:#e6db74">&#34;direction&#34;</span><span style="color:#e6db74">:&#34;forward&#34;</span>,<span style="color:#e6db74">&#34;forward_error&#34;</span><span style="color:#e6db74">:1</span>,<span style="color:#e6db74">&#34;forward_match&#34;</span><span style="color:#e6db74">:&#34;acaccgcccgtcaccctc&#34;</span>,<span style="color:#e6db74">&#34;forward_primer&#34;</span><span style="color:#e6db74">:&#34;ACACCGCCCGTCACTCTC&#34;</span>,<span style="color:#e6db74">&#34;ori_taxid&#34;</span><span style="color:#e6db74">:&#34;taxon:356810 [Acrossocheilus hemispinus]@species&#34;</span>,<span style="color:#e6db74">&#34;reverse_error&#34;</span><span style="color:#e6db74">:0</span>,<span style="color:#e6db74">&#34;reverse_match&#34;</span><span style="color:#e6db74">:&#34;ccaagtgcaccttccggt&#34;</span>,<span style="color:#e6db74">&#34;reverse_primer&#34;</span><span style="color:#e6db74">:&#34;CCAAGTGCACCTTCCGGT&#34;</span>,<span style="color:#e6db74">&#34;scientific_name&#34;</span><span style="color:#e6db74">:&#34;mitochondrion Acrossocheilus hemispinus&#34;</span>,<span style="color:#e6db74">&#34;species_name&#34;</span><span style="color:#e6db74">:&#34;Acrossocheilus hemispinus&#34;</span>,<span style="color:#e6db74">&#34;species_taxid&#34;</span><span style="color:#e6db74">:&#34;taxon:356810 [Acrossocheilus hemispinus]@species&#34;</span>,<span style="color:#e6db74">&#34;taxid&#34;</span><span style="color:#e6db74">:&#34;taxon:356810 [Acrossocheilus hemispinus]@species&#34;}</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74">cccgtcaaaatacaccaaaaatacttaatacaataacactaacaaggggaggcaagtcgt</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74">aacatggtaagtgt</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74">&gt;NC_018560_sub[916..988] {&#34;count&#34;:1</span>,<span style="color:#e6db74">&#34;definition&#34;</span><span style="color:#e6db74">:&#34;Astatotilapia calliptera mitochondrion</span>,<span style="color:#e6db74"> complete genome.&#34;</span>,<span style="color:#e6db74">&#34;direction&#34;</span><span style="color:#e6db74">:&#34;forward&#34;</span>,<span style="color:#e6db74">&#34;forward_error&#34;</span><span style="color:#e6db74">:0</span>,<span style="color:#e6db74">&#34;forward_match&#34;</span><span style="color:#e6db74">:&#34;acaccgcccgtcactctc&#34;</span>,<span style="color:#e6db74">&#34;forward_primer&#34;</span><span style="color:#e6db74">:&#34;ACACCGCCCGTCACTCTC&#34;</span>,<span style="color:#e6db74">&#34;ori_taxid&#34;</span><span style="color:#e6db74">:&#34;taxon:8154 [Astatotilapia calliptera]@species&#34;</span>,<span style="color:#e6db74">&#34;reverse_error&#34;</span><span style="color:#e6db74">:1</span>,<span style="color:#e6db74">&#34;reverse_match&#34;</span><span style="color:#e6db74">:&#34;ccaagtacaccttccggt&#34;</span>,<span style="color:#e6db74">&#34;reverse_primer&#34;</span><span style="color:#e6db74">:&#34;CCAAGTGCACCTTCCGGT&#34;</span>,<span style="color:#e6db74">&#34;scientific_name&#34;</span><span style="color:#e6db74">:&#34;mitochondrion Astatotilapia calliptera (eastern happy)&#34;</span>,<span style="color:#e6db74">&#34;species_name&#34;</span><span style="color:#e6db74">:&#34;Astatotilapia calliptera&#34;</span>,<span style="color:#e6db74">&#34;species_taxid&#34;</span><span style="color:#e6db74">:&#34;taxon:8154 [Astatotilapia calliptera]@species&#34;</span>,<span style="color:#e6db74">&#34;taxid&#34;</span><span style="color:#e6db74">:&#34;taxon:8154 [Astatotilapia calliptera]@species&#34;}</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74">cccaagccaacaacatcctataaataatacattttaccggtaaaggggaggcaagtcgta</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74">acatggtaagtgt</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74">&gt;NC_056117_sub[923..997] {&#34;count&#34;:1</span>,<span style="color:#e6db74">&#34;definition&#34;</span><span style="color:#e6db74">:&#34;Pseudocrossocheilus tridentis mitochondrion</span>,<span style="color:#e6db74"> complete genome.&#34;</span>,<span style="color:#e6db74">&#34;direction&#34;</span><span style="color:#e6db74">:&#34;forward&#34;</span>,<span style="color:#e6db74">&#34;forward_error&#34;</span><span style="color:#e6db74">:0</span>,<span style="color:#e6db74">&#34;forward_match&#34;</span><span style="color:#e6db74">:&#34;acaccgcccgtcactctc&#34;</span>,<span style="color:#e6db74">&#34;forward_primer&#34;</span><span style="color:#e6db74">:&#34;ACACCGCCCGTCACTCTC&#34;</span>,<span style="color:#e6db74">&#34;ori_taxid&#34;</span><span style="color:#e6db74">:&#34;taxon:887881 [Pseudocrossocheilus tridentis]@species&#34;</span>,<span style="color:#e6db74">&#34;reverse_error&#34;</span><span style="color:#e6db74">:0</span>,<span style="color:#e6db74">&#34;reverse_match&#34;</span><span style="color:#e6db74">:&#34;ccaagtgcaccttccggt&#34;</span>,<span style="color:#e6db74">&#34;reverse_primer&#34;</span><span style="color:#e6db74">:&#34;CCAAGTGCACCTTCCGGT&#34;</span>,<span style="color:#e6db74">&#34;scientific_name&#34;</span><span style="color:#e6db74">:&#34;mitochondrion Pseudocrossocheilus tridentis&#34;</span>,<span style="color:#e6db74">&#34;species_name&#34;</span><span style="color:#e6db74">:&#34;Pseudocrossocheilus tridentis&#34;</span>,<span style="color:#e6db74">&#34;species_taxid&#34;</span><span style="color:#e6db74">:&#34;taxon:887881 [Pseudocrossocheilus tridentis]@species&#34;</span>,<span style="color:#e6db74">&#34;taxid&#34;</span><span style="color:#e6db74">:&#34;taxon:887881 [Pseudocrossocheilus tridentis]@species&#34;}</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74">ccctgtcaaaaagcatcaaatatatataataaattagcaatgacaaggggaggcaagtcg</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74">taacacggtaagtgt</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74">&gt;NC_045904_sub[919..997] {&#34;count&#34;:1</span>,<span style="color:#e6db74">&#34;definition&#34;</span><span style="color:#e6db74">:&#34;Eospalax fontanierii mitochondrion</span>,<span style="color:#e6db74"> complete genome.&#34;</span>,<span style="color:#e6db74">&#34;direction&#34;</span><span style="color:#e6db74">:&#34;forward&#34;</span>,<span style="color:#e6db74">&#34;forward_error&#34;</span><span style="color:#e6db74">:1</span>,<span style="color:#e6db74">&#34;forward_match&#34;</span><span style="color:#e6db74">:&#34;acaccgcccgtcgctctc&#34;</span>,<span style="color:#e6db74">&#34;forward_primer&#34;</span><span style="color:#e6db74">:&#34;ACACCGCCCGTCACTCTC&#34;</span>,<span style="color:#e6db74">&#34;ori_taxid&#34;</span><span style="color:#e6db74">:&#34;taxon:146134 [Eospalax fontanierii]@species&#34;</span>,<span style="color:#e6db74">&#34;reverse_error&#34;</span><span style="color:#e6db74">:4</span>,<span style="color:#e6db74">&#34;reverse_match&#34;</span><span style="color:#e6db74">:&#34;ccaagcacactttccagt&#34;</span>,<span style="color:#e6db74">&#34;reverse_primer&#34;</span><span style="color:#e6db74">:&#34;CCAAGTGCACCTTCCGGT&#34;</span>,<span style="color:#e6db74">&#34;scientific_name&#34;</span><span style="color:#e6db74">:&#34;mitochondrion Eospalax fontanierii&#34;</span>,<span style="color:#e6db74">&#34;species_name&#34;</span><span style="color:#e6db74">:&#34;Eospalax fontanierii&#34;</span>,<span style="color:#e6db74">&#34;species_taxid&#34;</span><span style="color:#e6db74">:&#34;taxon:146134 [Eospalax fontanierii]@species&#34;</span>,<span style="color:#e6db74">&#34;taxid&#34;</span><span style="color:#e6db74">:&#34;taxon:146134 [Eospalax fontanierii]@species&#34;}</span>
</span></span></code></pre></div><p>To compute statistics in R describing how well the barcode is conserved across taxa and how well it discriminates between them, the FASTA file must first be converted to CSV format. This can be done with the <a href="http://metabar:8888/obidoc/obitools/obicsv/">
<abbr title="obicsv: convert a sequence file to a CSV file"><code>obicsv</code></abbr>
</a> command. When run with the <code>--auto</code> option, it inspects the annotations of the first few records to identify the tags they contain and creates a CSV file with one column per tag. In the command below, the <code>-i</code> and <code>-s</code> options add the sequence identifier and the sequence itself as columns.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>obicsv --auto -s -i Teleostei_11.fasta &gt; Teleostei_11.csv
</span></span></code></pre></div><p>The first few lines of the generated CSV file can then be displayed as a table by piping the output of <code>head</code> into <code>csvlook</code>.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>head Teleostei_11.csv | csvlook
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-csv" data-lang="csv"><span style="display:flex;"><span><span style="color:#e6db74">| id | count | direction | forward_error | forward_match | forward_primer | ori_taxid | reverse_error | reverse_match | reverse_primer | scientific_name | species_name | species_taxid | taxid | sequence |</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74">| ------------------------- | ----- | --------- | ------------- | ------------------ | ------------------ | ---------------------------------------------------- | ------------- | ------------------ | ------------------ | ------------------------------------------------------ | ----------------------------- | ---------------------------------------------------- | ---------------------------------------------------- | ------------------------------------------------------------------------------- |</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74">| NC_022183_sub[925..998] | True | forward | True | acaccgcccgtcaccctc | ACACCGCCCGTCACTCTC | taxon:356810 [Acrossocheilus hemispinus]@species | 0 | ccaagtgcaccttccggt | CCAAGTGCACCTTCCGGT | mitochondrion Acrossocheilus hemispinus | Acrossocheilus hemispinus | taxon:356810 [Acrossocheilus hemispinus]@species | taxon:356810 [Acrossocheilus hemispinus]@species | cccgtcaaaatacaccaaaaatacttaatacaataacactaacaaggggaggcaagtcgtaacatggtaagtgt |</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74">| NC_018560_sub[916..988] | True | forward | False | acaccgcccgtcactctc | ACACCGCCCGTCACTCTC | taxon:8154 [Astatotilapia calliptera]@species | 1 | ccaagtacaccttccggt | CCAAGTGCACCTTCCGGT | mitochondrion Astatotilapia calliptera (eastern happy) | Astatotilapia calliptera | taxon:8154 [Astatotilapia calliptera]@species | taxon:8154 [Astatotilapia calliptera]@species | cccaagccaacaacatcctataaataatacattttaccggtaaaggggaggcaagtcgtaacatggtaagtgt |</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74">| NC_056117_sub[923..997] | True | forward | False | acaccgcccgtcactctc | ACACCGCCCGTCACTCTC | taxon:887881 [Pseudocrossocheilus tridentis]@species | 0 | ccaagtgcaccttccggt | CCAAGTGCACCTTCCGGT | mitochondrion Pseudocrossocheilus tridentis | Pseudocrossocheilus tridentis | taxon:887881 [Pseudocrossocheilus tridentis]@species | taxon:887881 [Pseudocrossocheilus tridentis]@species | ccctgtcaaaaagcatcaaatatatataataaattagcaatgacaaggggaggcaagtcgtaacacggtaagtgt |</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74">| NC_045904_sub[919..997] | True | forward | True | acaccgcccgtcgctctc | ACACCGCCCGTCACTCTC | taxon:146134 [Eospalax fontanierii]@species | 4 | ccaagcacactttccagt | CCAAGTGCACCTTCCGGT | mitochondrion Eospalax fontanierii | Eospalax fontanierii | taxon:146134 [Eospalax fontanierii]@species | taxon:146134 [Eospalax fontanierii]@species | ctcaagtacataaacttggatatattcttaataacccaacaaaaatattagaggagataagtcgtaacaaggtaagcat |</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74">| NC_018546_sub[916..987] | True | forward | False | acaccgcccgtcactctc | ACACCGCCCGTCACTCTC | taxon:30732 [Oryzias melastigma]@species | 0 | ccaagtgcaccttccggt | CCAAGTGCACCTTCCGGT | mitochondrion Oryzias melastigma (Indian medaka) | Oryzias melastigma | taxon:30732 [Oryzias melastigma]@species | taxon:30732 [Oryzias melastigma]@species | cccgacccattttaaaaattaaataaaagatttcaggaactaaggggaggcaagtcgtaacatggtaagtgt |</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74">| NC_044151_sub[922..993] | True | forward | False | acaccgcccgtcactctc | ACACCGCCCGTCACTCTC | taxon:2597641 [Sicyopterus squamosissimus]@species | 0 | ccaagtgcaccttccggt | CCAAGTGCACCTTCCGGT | mitochondrion Sicyopterus squamosissimus (cling goby) | Sicyopterus squamosissimus | taxon:2597641 [Sicyopterus squamosissimus]@species | taxon:2597641 [Sicyopterus squamosissimus]@species | cccaaaacaaacacacacataaataagaaaaaatgaaaataaaggggaggcaagtcgtaacatggtaagtgt |</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74">| NC_044152_sub[922..994] | True | forward | False | acaccgcccgtcactctc | ACACCGCCCGTCACTCTC | taxon:2597642 [Sicyopterus stiphodonoides]@species | 0 | ccaagtgcaccttccggt | CCAAGTGCACCTTCCGGT | mitochondrion Sicyopterus stiphodonoides (cling goby) | Sicyopterus stiphodonoides | taxon:2597642 [Sicyopterus stiphodonoides]@species | taxon:2597642 [Sicyopterus stiphodonoides]@species | cccaaaacaaacacacacataaataagaaaaaantgaaaataaaggggaggcaagtcgtaacatggtaagtgt |</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74">| NC_026976_sub[1453..1531] | True | forward | True | acaccgcccgtcactccc | ACACCGCCCGTCACTCTC | taxon:9545 [Macaca nemestrina]@species | 1 | ccaagtgcaccttccagt | CCAAGTGCACCTTCCGGT | mitochondrion Macaca nemestrina (pig-tailed macaque) | Macaca nemestrina | taxon:9545 [Macaca nemestrina]@species | taxon:9545 [Macaca nemestrina]@species | ctcaaatatatttaaggaacatcttaactaaacgccctaatatttatatagaggggataagtcgtaacatggtaagtgt |</span>
</span></span><span style="display:flex;"><span><span style="color:#e6db74">| NC_031553_sub[921..995] | True | forward | False | acaccgcccgtcactctc | ACACCGCCCGTCACTCTC | taxon:643337 [Puntioplites proctozystron]@species | 0 | ccaagtgcaccttccggt | CCAAGTGCACCTTCCGGT | mitochondrion Puntioplites proctozystron | Puntioplites proctozystron | taxon:643337 [Puntioplites proctozystron]@species | taxon:643337 [Puntioplites proctozystron]@species | ccctgtcaaaacgcactaaaaatatctaatacaaaagcaccgacaaggggaggcaagtcgtaacacggtaagtgt |</span>
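</span></span></code></pre></div><p>In the table above, the <code>count</code> and <code>forward_error</code> columns are displayed as <code>True</code>/<code>False</code> rather than as numbers. This is most likely a side effect of the type inference performed by <code>csvlook</code>, which interprets columns containing only 0 and 1 as booleans; the CSV file itself still holds the numeric values. As a minimal sketch, assuming the installed version of csvkit provides the <code>--no-inference</code> option, the raw values can be displayed with:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>head Teleostei_11.csv | csvlook --no-inference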
</span></span></code></pre></div><h2 id="references">
References
<a class="anchor" href="#references">#</a>
</h2>
<section class="hugo-cite-bibliography">
<dl>
<div id="riaz2011-gn">
<dt>
Riaz,&#32;
Shehzad,&#32;
Viari,&#32;
Pompanon,&#32;
Taberlet&#32;&amp;&#32;Coissac
(2011)</dt>
<dd>
<span itemscope
itemtype="https://schema.org/Article"
data-type="article"><span itemprop="author" itemscope itemtype="https://schema.org/Person"><span itemprop="familyName">Riaz</span>,&#32;
<meta itemprop="givenName" content="Tiayyba" />
T.</span>,&#32;
<span itemprop="author" itemscope itemtype="https://schema.org/Person"><span itemprop="familyName">Shehzad</span>,&#32;
<meta itemprop="givenName" content="Wasim" />
W.</span>,&#32;
<span itemprop="author" itemscope itemtype="https://schema.org/Person"><span itemprop="familyName">Viari</span>,&#32;
<meta itemprop="givenName" content="Alain" />
A.</span>,&#32;
<span itemprop="author" itemscope itemtype="https://schema.org/Person"><span itemprop="familyName">Pompanon</span>,&#32;
<meta itemprop="givenName" content="François" />
F.</span>,&#32;
<span itemprop="author" itemscope itemtype="https://schema.org/Person"><span itemprop="familyName">Taberlet</span>,&#32;
<meta itemprop="givenName" content="Pierre" />
P.</span>&#32;&amp;&#32;<span itemprop="author" itemscope itemtype="https://schema.org/Person"><span itemprop="familyName">Coissac</span>,&#32;
<meta itemprop="givenName" content="Eric" />
E.</span>
&#32;
(<span itemprop="datePublished">2011</span>).
&#32;<span itemprop="name">ecoPrimers: inference of new DNA barcode markers from whole genome sequence analysis</span>.<i>
<span itemprop="about">Nucleic Acids Research</span>,&#32;39(21)</i>.&#32;<span itemprop="pagination">e145</span>.
<a href="https://doi.org/10.1093/nar/gkr732"
itemprop="identifier"
itemtype="https://schema.org/URL">https://doi.org/10.1093/nar/gkr732</a></span>
</dd>
</div>
</dl>
</section>
</article>
<footer class="book-footer">
<div class="flex flex-wrap justify-between">
</div>
<script>(function(){function e(e){const t=window.getSelection(),n=document.createRange();n.selectNodeContents(e),t.removeAllRanges(),t.addRange(n)}document.querySelectorAll("pre code").forEach(t=>{t.addEventListener("click",function(){if(window.getSelection().toString())return;e(t.parentElement),navigator.clipboard&&navigator.clipboard.writeText(t.parentElement.textContent)})})})()</script>
</footer>
<div class="book-comments">
</div>
<label for="menu-control" class="hidden book-menu-overlay"></label>
</div>
<aside class="book-toc">
<div class="book-toc-content">
<nav id="TableOfContents">
<ul>
<li><a href="#designing-new-barcodes-with-ecoprimers">Designing new barcodes with ecoPrimers</a>
<ul>
<li><a href="#installation-of-ecoprimers">Installation of <code>ecoPrimers</code></a></li>
<li><a href="#preparing-the-data">Preparing the data</a>
<ul>
<li><a href="#what-do-we-need-">What do we need ?</a></li>
</ul>
</li>
<li><a href="#preparing-the-set-of-complete-genomes">Preparing the set of complete genomes</a></li>
<li><a href="#preparing-a-database-for-new-barcode-inference">Preparing a database for new barcode inference</a>
<ul>
<li><a href="#searching-for-the-taxid-of-vertebrates">Searching for the taxid of vertebrates.</a></li>
<li><a href="#re-annotation-of-sequences-to-species-level-and-selection-of-genomes">Re-annotation of sequences to species level and selection of genomes</a></li>
<li><a href="#look-at-the-evenness-of-the-species-representation">Look at the evenness of the species representation</a></li>
<li><a href="#selection-of-vertebrate-genomes">Selection of <em>vertebrate</em> genomes</a></li>
<li><a href="#formatting-data-for-ecoprimers">Formatting data for <code>ecoPrimers</code></a></li>
<li><a href="#indexing-the-mitochondrial-learning-database">Indexing the mitochondrial learning database</a></li>
</ul>
</li>
<li><a href="#selecting-the-best-primer-pairs">Selecting the best primer pairs</a>
<ul>
<li><a href="#searching-the-teleostei-taxid">Searching the <em>Teleostei</em> <code>taxid</code></a></li>
<li><a href="#running-the-ecoprimers-program">Running the <code>ecoPrimers</code> program</a></li>
</ul>
</li>
<li><a href="#testing-the-new-primer-pair">Testing the new primer pair</a></li>
<li><a href="#references">References</a></li>
</ul>
</li>
</ul>
</nav>
</div>
</aside>
</main>
</body>
</html>