OBIJupyterHub/jupyterhub_volumes/web/obidoc/docs/cookbook/reference_db/index.html
Eric Coissac 30b7175702 Make cleaning
2025-11-17 14:18:13 +01:00


<!DOCTYPE html>
<html lang="en-us" dir="ltr">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="description" content="Build a reference database. One of the crucial steps in the analysis of environmental DNA data is the taxonomic assignment of sequences, i.e. assigning a species, genus or other taxonomic rank to the sequences present in the collected samples. Taxonomic assignment requires annotated reference sequences, against which the sequences of interest are compared. These reference sequences form what is known as a reference database, which is a sequence file in FASTA format for a given metabarcoding marker.">
<meta name="theme-color" media="(prefers-color-scheme: light)" content="#ffffff">
<meta name="theme-color" media="(prefers-color-scheme: dark)" content="#343a40">
<meta name="color-scheme" content="light dark"><meta property="og:url" content="http://metabar:8888/obidoc/docs/cookbook/reference_db/">
<meta property="og:site_name" content="OBITools4 documentation">
<meta property="og:title" content="Build a reference database">
<meta property="og:description" content="One of the crucial steps in the analysis of environmental DNA data is the taxonomic assignment of sequences, i.e. assigning a species, genus or other taxonomic rank to the sequences present in the collected samples. Taxonomic assignment requires annotated reference sequences, against which the sequences of interest are compared. These reference sequences form what is known as a reference database, which is a sequence file in FASTA format for a given metabarcoding marker.">
<meta property="og:locale" content="en_us">
<meta property="og:type" content="website">
<title>Build a reference database | OBITools4 documentation</title>
<link rel="icon" href="/obidoc/favicon.png" >
<link rel="manifest" href="/obidoc/manifest.json">
<link rel="canonical" href="http://metabar:8888/obidoc/docs/cookbook/reference_db/">
<link rel="stylesheet" href="/obidoc/book.min.5fd7b8e2d1c0ae15da279c52ff32731130386f71b58f011468f20d0056fe6b78.css" integrity="sha256-X9e44tHArhXaJ5xS/zJzETA4b3G1jwEUaPINAFb&#43;a3g=" crossorigin="anonymous">
<script defer src="/obidoc/fuse.min.js"></script>
<script defer src="/obidoc/en.search.min.4da51bdd2d833922fdbc0e19df517221387fc625ffb68ee140d605b3c5b68058.js" integrity="sha256-TaUb3S2DOSL9vA4Z31FyITh/xiX/to7hQNYFs8W2gFg=" crossorigin="anonymous"></script>
<script defer src="/obidoc/sw.min.32af8eafce4180aa1c5dea66d99fb26ba9043ea7c7a4c706138c91d9051b285e.js" integrity="sha256-Mq&#43;Or85BgKocXepm2Z&#43;ya6kEPqfHpMcGE4yR2QUbKF4=" crossorigin="anonymous"></script>
<link rel="alternate" type="application/rss+xml" href="http://metabar:8888/obidoc/docs/cookbook/reference_db/index.xml" title="OBITools4 documentation" />
<!--
Made with Book Theme
https://github.com/alex-shpak/hugo-book
-->
<link rel="stylesheet" type="text/css" href="http://metabar:8888/obidoc/hugo-cite.css" />
</head>
<body dir="ltr">
<input type="checkbox" class="hidden toggle" id="menu-control" />
<input type="checkbox" class="hidden toggle" id="toc-control" />
<main class="container flex">
<aside class="book-menu">
<div class="book-menu-content">
<nav>
<h2 class="book-brand">
<a class="flex align-center" href="/obidoc/"><img src="/obidoc/obitools_logo.jpg" alt="Logo" class="book-icon" /><span>OBITools4 documentation</span>
</a>
</h2>
<div class="book-search hidden">
<input type="text" id="book-search-input" placeholder="Search" aria-label="Search" maxlength="64" data-hotkeys="s/" />
<div class="book-search-spinner hidden"></div>
<ul id="book-search-results"></ul>
</div>
<script>document.querySelector(".book-search").classList.remove("hidden")</script>
<ul>
<li>
<span>Docs</span>
<ul>
<li>
<a href="/obidoc/docs/about/" class="">About</a>
</li>
<li>
<a href="/obidoc/docs/installation/" class="">Installation</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/docs/principles/" class="">General operating principles</a>
<ul>
</ul>
</li>
<li>
<input type="checkbox" id="section-08756b4c1f14be6ee584ece005b9f621" class="toggle" />
<label for="section-08756b4c1f14be6ee584ece005b9f621" class="flex justify-between">
<a role="button" class="">File formats</a>
</label>
<ul>
<li>
<input type="checkbox" id="section-933c2e64b905b84e22aa5273cea2d0bd" class="toggle" />
<label for="section-933c2e64b905b84e22aa5273cea2d0bd" class="flex justify-between">
<a role="button" class="">Sequence file formats</a>
</label>
<ul>
<li>
<a href="/obidoc/formats/fasta/" class="">FASTA file format</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/formats/fastq/" class="">FASTQ file format</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/formats/genbank/" class="">GenBank Flat File format</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/formats/embl/" class="">EMBL Flat File format</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/docs/file_format/sequence_files/csv/" class="">CSV format</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/formats/json/" class="">JSON format</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/docs/file_format/sequence_files/annotations/" class="">Annotation of sequences</a>
<ul>
</ul>
</li>
</ul>
</li>
<li>
<input type="checkbox" id="section-0258ae1c222f9a38cc1b75254c93b0f4" class="toggle" />
<label for="section-0258ae1c222f9a38cc1b75254c93b0f4" class="flex justify-between">
<a role="button" class="">Taxonomy file formats</a>
</label>
<ul>
<li>
<a href="/obidoc/docs/file_format/taxonomy_file/csv_taxdump/" class="">CSV formatted taxdump</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/docs/file_format/taxonomy_file/ncbi_taxdump/" class="">NCBI taxdump</a>
<ul>
</ul>
</li>
</ul>
</li>
<li>
<a href="/obidoc/formats/csv/" class="">The CSV format</a>
<ul>
</ul>
</li>
</ul>
</li>
<li>
<input type="checkbox" id="section-70b1e6e5ec7f3ccab643155fa50659b6" class="toggle" />
<label for="section-70b1e6e5ec7f3ccab643155fa50659b6" class="flex justify-between">
<a role="button" class="">Patterns</a>
</label>
<ul>
<li>
<a href="/obidoc/docs/patterns/regular/" class="">Regular Expressions</a>
</li>
<li>
<a href="/obidoc/docs/patterns/dnagrep/" class="">DNA Patterns</a>
</li>
</ul>
</li>
<li>
<input type="checkbox" id="section-8223f464911a1fe6c655972143684e93" class="toggle" />
<label for="section-8223f464911a1fe6c655972143684e93" class="flex justify-between">
<a role="button" class="">The OBITools4 commands</a>
</label>
<ul>
<li>
<a href="/obidoc/docs/commands/options/" class="">Shared command options</a>
<ul>
</ul>
</li>
<li>
<input type="checkbox" id="section-8921ea65523c266b128dd4263232b0fc" class="toggle" />
<label for="section-8921ea65523c266b128dd4263232b0fc" class="flex justify-between">
<a role="button" class="">Basics</a>
</label>
<ul>
<li>
<a href="/obidoc/obitools/obiannotate/" class="">obiannotate</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obicomplement/" class="">obicomplement</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obiconvert/" class="">obiconvert</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obicount/" class="">obicount</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obicsv/" class="">obicsv</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obidemerge/" class="">obidemerge</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obidistribute/" class="">obidistribute</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obigrep/" class="">obigrep</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obijoin/" class="">obijoin</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obimatrix/" class="">obimatrix</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obisplit/" class="">obisplit</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obisummary/" class="">obisummary</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obiuniq/" class="">obiuniq</a>
<ul>
</ul>
</li>
</ul>
</li>
<li>
<input type="checkbox" id="section-dbdf1bb5377572439394e60e08c30f50" class="toggle" />
<label for="section-dbdf1bb5377572439394e60e08c30f50" class="flex justify-between">
<a role="button" class="">Demultiplexing samples</a>
</label>
<ul>
<li>
<a href="/obidoc/obitools/obimultiplex/" class="">obimultiplex</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obitagpcr/" class="">obitagpcr</a>
<ul>
</ul>
</li>
</ul>
</li>
<li>
<input type="checkbox" id="section-aa98fedd067b51150db59691a8ea8edd" class="toggle" />
<label for="section-aa98fedd067b51150db59691a8ea8edd" class="flex justify-between">
<a role="button" class="">Sequence alignments</a>
</label>
<ul>
<li>
<a href="/obidoc/obitools/obiclean/" class="">obiclean</a>
<ul>
</ul>
</li>
<li>
<input type="checkbox" id="section-7433746525d8c2b29b033f765c869acd" class="toggle" />
<label for="section-7433746525d8c2b29b033f765c869acd" class="flex justify-between">
<a href="/obidoc/obitools/obipairing/" class="">obipairing</a>
</label>
<ul>
<li>
<a href="/obidoc/docs/commands/alignments/obipairing/fasta-like/" class="">The FASTA-like alignment</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/docs/commands/alignments/obipairing/exact-alignment/" class="">Exact alignment</a>
<ul>
</ul>
</li>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obipcr/" class="">obipcr</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obirefidx/" class="">obirefidx</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obitag/" class="">obitag</a>
<ul>
</ul>
</li>
</ul>
</li>
<li>
<input type="checkbox" id="section-5746f699d10490780dec8e30ab2dd3ce" class="toggle" />
<label for="section-5746f699d10490780dec8e30ab2dd3ce" class="flex justify-between">
<a role="button" class="">Taxonomy</a>
</label>
<ul>
<li>
<a href="/obidoc/obitools/obitaxonomy/" class="">obitaxonomy</a>
<ul>
</ul>
</li>
</ul>
</li>
<li>
<input type="checkbox" id="section-3f50c4fe7ab436a56ae92897d5444956" class="toggle" />
<label for="section-3f50c4fe7ab436a56ae92897d5444956" class="flex justify-between">
<a role="button" class="">Advanced tools</a>
</label>
<ul>
<li>
<a href="/obidoc/obitools/obiscript/" class="">obiscript</a>
<ul>
</ul>
</li>
</ul>
</li>
<li>
<input type="checkbox" id="section-549be3934679fcb82a232f6bd5435563" class="toggle" />
<label for="section-549be3934679fcb82a232f6bd5435563" class="flex justify-between">
<a role="button" class="">Others</a>
</label>
<ul>
<li>
<a href="/obidoc/obitools/obimicrosat/" class="">obimicrosat</a>
<ul>
</ul>
</li>
</ul>
</li>
<li>
<input type="checkbox" id="section-ceca4455173761e30cbc0a6dc2327167" class="toggle" />
<label for="section-ceca4455173761e30cbc0a6dc2327167" class="flex justify-between">
<a role="button" class="">Experimentals</a>
</label>
<ul>
<li>
<a href="/obidoc/obitools/obicleandb/" class="">obicleandb</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obiconsensus/" class="">obiconsensus</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obilandmark/" class="">obilandmark</a>
<ul>
</ul>
</li>
</ul>
</li>
<li>
<a href="/obidoc/docs/commands/tags/" class="">Glossary of tags</a>
</li>
</ul>
</li>
<li>
<input type="checkbox" id="section-9b1bcd52530c59dc4819b1f61c128f54" class="toggle" checked />
<label for="section-9b1bcd52530c59dc4819b1f61c128f54" class="flex justify-between">
<a role="button" class="">Cookbook</a>
</label>
<ul>
<li>
<a href="/obidoc/docs/cookbook/illumina/" class="">Analysing an Illumina data set</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/docs/cookbook/ecoprimers/" class="">Designing new barcodes</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/docs/cookbook/local_genbank/" class="">Prepare a local copy of Genbank</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/docs/cookbook/reference_db/" class="active">Build a reference database</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/docs/cookbook/minion/" class="">Oxford Nanopore data analysis</a>
<ul>
</ul>
</li>
</ul>
</li>
<li>
<span>Programming OBITools</span>
<ul>
<li>
<a href="/obidoc/docs/programming/expression/" class="">Expression language</a>
<ul>
</ul>
</li>
<li>
<input type="checkbox" id="section-6d580829a667b5cca790b286d99a10fe" class="toggle" />
<label for="section-6d580829a667b5cca790b286d99a10fe" class="flex justify-between">
<a href="/obidoc/docs/programming/lua/" class="">Lua: for scripting OBITools</a>
</label>
<ul>
<li>
<input type="checkbox" id="section-2fb081dac812d624eea5f4268fca9e26" class="toggle" />
<label for="section-2fb081dac812d624eea5f4268fca9e26" class="flex justify-between">
<a role="button" class="">Obitools Classes</a>
</label>
<ul>
<li>
<a href="/obidoc/docs/programming/lua/obitools_classes/biosequence/" class="">BioSequence</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/docs/programming/lua/obitools_classes/biosequenceslice/" class="">BioSequenceSlice</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/docs/programming/lua/obitools_classes/taxonomy/" class="">Taxonomy</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/docs/programming/lua/obitools_classes/taxon/" class="">Taxon</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/docs/programming/lua/obitools_classes/mutex/" class="">Mutex</a>
<ul>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>
</nav>
<script>(function(){var e=document.querySelector("aside .book-menu-content");addEventListener("beforeunload",function(){localStorage.setItem("menu.scrollTop",e.scrollTop)}),e.scrollTop=localStorage.getItem("menu.scrollTop")})()</script>
</div>
</aside>
<div class="book-page">
<header class="book-header">
<div class="flex align-center justify-between">
<label for="menu-control">
<img src="/obidoc/svg/menu.svg" class="book-icon" alt="Menu" />
</label>
<h3>Build a reference database</h3>
<label for="toc-control">
<img src="/obidoc/svg/toc.svg" class="book-icon" alt="Table of Contents" />
</label>
</div>
<aside class="hidden clearfix">
<nav id="TableOfContents">
<ul>
<li><a href="#build-a-reference-database">Build a reference database</a>
<ul>
<li><a href="#download-the-sequences">Download the sequences</a></li>
<li><a href="#perform-a-in-silico-pcr-amplification">Perform an <em>in silico</em> PCR amplification</a></li>
<li><a href="#clean-the-database">Clean the database</a>
<ul>
<li></li>
</ul>
</li>
</ul>
</li>
</ul>
</nav>
</aside>
</header>
<article class="markdown book-article"><h1 id="build-a-reference-database">
Build a reference database
<a class="anchor" href="#build-a-reference-database">#</a>
</h1>
<p>One of the crucial steps in the analysis of environmental DNA data is the taxonomic assignment of sequences,
<em>i.e.</em> assigning a species, genus or other taxonomic rank to the sequences present in the collected samples.</p>
<p>Taxonomic assignment requires annotated reference sequences, against which the sequences of interest are compared.
These reference sequences form what is known as a <em>reference database</em>, which is a sequence file in
<a href="http://metabar:8888/obidoc/formats/fasta/">fasta</a>
format,
for a given marker of metabarcoding.</p>
<p>This page is a quick step-by-step guide to building a reference database. The example builds the database used to assign sequences
from wolf fecal samples in a diet study, the dataset used in the
<a href="https://obitools4.metabarcoding.org/docs/cookbook/wolf-tutorial/">metabarcoding analysis tutorial</a>.</p>
<p>One way to build a reference database is to use the <a href="http://metabar:8888/obidoc/obitools/obipcr/">
<abbr title="obipcr: the electronic PCR tool"><code>obipcr</code></abbr>
</a> program to simulate a PCR and extract,
from a general-purpose DNA database such as
<a href="https://www.ncbi.nlm.nih.gov/nucleotide/">GenBank</a> or
<a href="https://www.ebi.ac.uk/ena/browser/home">EMBL</a>,
all sequences that can be amplified <em>in silico</em> by the two primers used for PCR amplification.</p>
<p>The steps to create a reference database are:</p>
<ol>
<li>Download sequences from a public database such as GenBank or EMBL</li>
<li>Perform an <em>in silico</em> PCR amplification of these sequences for a given marker with <a href="http://metabar:8888/obidoc/obitools/obipcr/">
<abbr title="obipcr: the electronic PCR tool"><code>obipcr</code></abbr>
</a></li>
<li>Clean up the database by removing sequences that are redundant or that do not provide sufficient taxonomic information</li>
</ol>
<p>Since GenBank and the taxonomy associated with its sequences are constantly evolving, you may not get exactly the same results when running the following commands.</p>
<h2 id="download-the-sequences">
Download the sequences
<a class="anchor" href="#download-the-sequences">#</a>
</h2>
<p>In this example, the sequences are downloaded from the
<a href="https://ftp.ncbi.nlm.nih.gov/genbank/">GenBank FTP server</a>.
Please note that the download takes more than a day and currently occupies around 1.5 TB,
so make sure you have the necessary storage capacity before launching it.
To set up a local copy of the GenBank sequences, see the
<a href="https://obitools4.metabarcoding.org/docs/cookbook/local_genbank/">Prepare a local copy of GenBank</a> page.</p>
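<p>Before launching the download, it can be worth checking that the target filesystem has enough free space.
The following is a minimal sketch that is not part of <em>OBITools</em>: the <code>has_free_space</code> helper is a hypothetical name,
the threshold simply restates the 1.5 TB estimate given above, and <code>df -Pk</code> is used so that each filesystem is reported
on a single line, in 1024-byte blocks.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"># Hypothetical helper: succeed if DIR has at least REQUIRED_KB kilobytes free.
has_free_space() {
    dir=$1
    required_kb=$2
    # df -P prints one header line, then one line per filesystem;
    # with -k, column 4 is the available space in 1024-byte blocks.
    available_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$available_kb" -ge "$required_kb" ]
}

# Example: require about 1.5 TB (1.5 * 1024^3 KB) in the current directory.
if has_free_space . 1610612736; then
    echo "enough free space for the download"
else
    echo "not enough free space for the download"
fi
</code></pre></div>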
<h2 id="perform-a-in-silico-pcr-amplification">
Perform an <em>in silico</em> PCR amplification
<a class="anchor" href="#perform-a-in-silico-pcr-amplification">#</a>
</h2>
<p>In this example, we amplify the <em>12S-V5</em> region (Riaz <em>et al.</em>, 2011) with the forward primer <strong>TTAGATACCCCACTATGC</strong>
and the reverse primer <strong>TAGAACAGGCTCCTCTAG</strong>, in order to study the wolf diet
(see the
<a href="https://obitools4.metabarcoding.org/docs/cookbook/wolf-tutorial/">tutorial</a>).
The command is given below. Do not forget to update the GenBank release number in the command line.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>obipcr -e <span style="color:#ae81ff">3</span> -l <span style="color:#ae81ff">50</span> -L <span style="color:#ae81ff">150</span> <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> --forward TTAGATACCCCACTATGC <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> --reverse TAGAACAGGCTCCTCTAG <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> --no-order <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> genbank/Release_264/fasta/* <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> &gt; v05_pcr.fasta
</span></span></code></pre></div><p>The <code>-l</code> and <code>-L</code> options set the minimum and maximum lengths of the amplified fragments.
Up to three mismatches with the primer sequences are allowed here (<code>-e 3</code>), and we recommend using the <code>--no-order</code> option
to speed up the program (see the <a href="http://metabar:8888/obidoc/obitools/obipcr/">
<abbr title="obipcr: the electronic PCR tool"><code>obipcr</code></abbr>
</a> documentation).</p>
<p>The <code>obipcr</code> command above produces a
<a href="http://metabar:8888/obidoc/formats/fasta/">fasta</a>
file containing the sequences amplified <em>in silico</em>.</p>
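<p>As a quick sanity check on the resulting file, the range of sequence lengths can be computed with standard tools
and compared with the bounds given to <code>-l</code> and <code>-L</code>. This is only a sketch: it assumes a plain FASTA file in which
every header line starts with <code>&gt;</code>, and <code>fasta_length_range</code> is a hypothetical helper name. Sequences written
over several lines are handled by summing the lengths of their lines.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"># Hypothetical helper: print "min max" sequence lengths for a FASTA file.
fasta_length_range() {
    awk '
        # Record the length of the sequence that has just been read.
        function update() {
            if (min == "" || len &lt; min) min = len
            if (len &gt; max) max = len
        }
        /^&gt;/ {                     # header line: close the previous record
            if (n &gt; 0) update()
            n++; len = 0
            next
        }
        { len += length($0) }      # sequence line: accumulate its length
        END {
            if (n &gt; 0) update()
            print min + 0, max + 0
        }
    ' "$1"
}
</code></pre></div>
<p>For instance, <code>fasta_length_range v05_pcr.fasta</code> prints the shortest and the longest sequence lengths found in the file.</p>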
<h2 id="clean-the-database">
Clean the database
<a class="anchor" href="#clean-the-database">#</a>
</h2>
<p>We apply the following filtering steps to clean up the sequences obtained with <a href="http://metabar:8888/obidoc/obitools/obipcr/">
<abbr title="obipcr: the electronic PCR tool"><code>obipcr</code></abbr>
</a>:</p>
<ol>
<li>Keep only the sequences that have a taxid and a taxonomic description at the family, genus and species ranks (<a href="http://metabar:8888/obidoc/obitools/obigrep/">
<abbr title="obigrep: filter a sequence file"><code>obigrep</code></abbr>
</a>)</li>
<li>Remove redundant sequences (dereplicate)</li>
<li>Ensure that the dereplicated sequences have a taxid (taxon identifier) at the family level</li>
<li>Ensure that each sequence has a unique identifier with <a href="http://metabar:8888/obidoc/obitools/obiannotate/">
<abbr title="obiannotate: edit sequence annotations"><code>obiannotate</code></abbr>
</a></li>
<li>Index the database</li>
</ol>
<h4 id="keep-annotated-sequences">
Keep annotated sequences
<a class="anchor" href="#keep-annotated-sequences">#</a>
</h4>
<p>The <code>-t</code> option of the <em>OBITools</em> commands specifies the taxonomy to use.
You can either give the path to the taxonomy downloaded along with the sequences,
as described on the
<a href="https://obitools4.metabarcoding.org/docs/cookbook/local_genbank/">Prepare a local copy of GenBank</a> page
(a path that looks like <code>Release_264/taxonomy</code>), or download the taxdump archive with <code>curl</code>.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>curl -L -O http://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz
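</span></span></code></pre></div><p>The <a href="http://metabar:8888/obidoc/obitools/obigrep/">
<abbr title="obigrep: filter a sequence file"><code>obigrep</code></abbr>
</a> program filters the sequences, keeping only those with a taxid and a sufficient taxonomic description.
In the command below, <code>-A taxid</code> keeps the sequences that carry a <code>taxid</code> attribute, and each <code>--require-rank</code> option keeps those whose taxonomic description includes the given rank.</p>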
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>obigrep -t taxdump.tar.gz <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> -A taxid <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> --require-rank species <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> --require-rank genus <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> --require-rank family <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> v05_pcr.fasta &gt; v05_clean.fasta
</span></span></code></pre></div><h4 id="dereplicate-sequences">
Dereplicate sequences
<a class="anchor" href="#dereplicate-sequences">#</a>
</h4>
<p>The <a href="http://metabar:8888/obidoc/obitools/obiuniq/">
<abbr title="obiuniq: dereplicate a sequence file"><code>obiuniq</code></abbr>
</a> program dereplicates the sequences. With <code>-c taxid</code>, identical sequences are merged only when they also share the same taxid.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>obiuniq -c taxid v05_clean.fasta &gt; v05_clean_uniq.fasta
</span></span></code></pre></div><h4 id="ensure-that-the-dereplicated-sequences-have-a-taxid-at-the-family-level">
Ensure that the dereplicated sequences have a taxid at the family level
<a class="anchor" href="#ensure-that-the-dereplicated-sequences-have-a-taxid-at-the-family-level">#</a>
</h4>
<p>Some sequences may lose taxonomic information during dereplication if some of the merged copies
lacked this information beforehand. We therefore apply a second taxonomic filter, this time requiring only the family rank.</p>
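<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>obigrep -t taxdump.tar.gz --require-rank<span style="color:#f92672">=</span>family v05_clean_uniq.fasta &gt; v05_clean_uniq.tmp.fasta <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> <span style="color:#f92672">&amp;&amp;</span> mv v05_clean_uniq.tmp.fasta v05_clean_uniq.fasta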
</span></span></code></pre></div><h4 id="ensure-that-sequences-each-have-a-unique-identifier">
Ensure that sequences each have a unique identifier
<a class="anchor" href="#ensure-that-sequences-each-have-a-unique-identifier">#</a>
</h4>
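<p>Whatever method is used to assign identifiers, their uniqueness can be verified with standard tools.
The following is a minimal sketch, assuming that the identifier is the first whitespace-delimited word of each header line;
<code>duplicated_ids</code> is a hypothetical helper name, not an <em>OBITools</em> command.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"># Hypothetical helper: print every identifier that occurs more than once.
duplicated_ids() {
    # Keep the header lines, extract their first word without the leading "&gt;",
    # then report the identifiers that appear at least twice.
    awk '/^&gt;/ { print substr($1, 2) }' "$1" | sort | uniq -d
}
</code></pre></div>
<p>Running <code>duplicated_ids v05_clean_uniq.fasta</code> prints nothing when all identifiers are unique.</p>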
<h4 id="index-the-database">
Index the database
<a class="anchor" href="#index-the-database">#</a>
</h4>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>obirefidx -t taxdump.tar.gz v05_clean_uniq.fasta &gt; v05_clean_uniq_indexed.fasta
</span></span></code></pre></div><p>The database provided in the
<a href="https://obitools4.metabarcoding.org/docs/cookbook/wolf-tutorial/">tutorial</a>
is called <code>wolf_data/db_v05_r117_indexed.fasta</code>.</p>
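<p>To see how many sequences survive each cleaning step, the number of header lines in each intermediate file can be compared.
This sketch assumes the file names used in the commands above and FASTA files in which every record starts with a <code>&gt;</code> header line;
<code>count_sequences</code> is a hypothetical helper name.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"># Hypothetical helper: count the records of a FASTA file (lines starting with "&gt;").
count_sequences() {
    # grep -c exits with status 1 when the count is 0, hence "|| true".
    grep -c '^&gt;' "$1" || true
}

# Report the number of sequences kept at each stage, skipping missing files.
for f in v05_pcr.fasta v05_clean.fasta v05_clean_uniq.fasta \
         v05_clean_uniq_indexed.fasta; do
    if [ -f "$f" ]; then
        printf '%s\t%s\n' "$f" "$(count_sequences "$f")"
    fi
done
</code></pre></div>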
</article>
<footer class="book-footer">
<div class="flex flex-wrap justify-between">
</div>
<script>(function(){function e(e){const t=window.getSelection(),n=document.createRange();n.selectNodeContents(e),t.removeAllRanges(),t.addRange(n)}document.querySelectorAll("pre code").forEach(t=>{t.addEventListener("click",function(){if(window.getSelection().toString())return;e(t.parentElement),navigator.clipboard&&navigator.clipboard.writeText(t.parentElement.textContent)})})})()</script>
</footer>
<div class="book-comments">
</div>
<label for="menu-control" class="hidden book-menu-overlay"></label>
</div>
<aside class="book-toc">
<div class="book-toc-content">
<nav id="TableOfContents">
<ul>
<li><a href="#build-a-reference-database">Build a reference database</a>
<ul>
<li><a href="#download-the-sequences">Download the sequences</a></li>
<li><a href="#perform-a-in-silico-pcr-amplification">Perform an <em>in silico</em> PCR amplification</a></li>
<li><a href="#clean-the-database">Clean the database</a>
<ul>
<li></li>
</ul>
</li>
</ul>
</li>
</ul>
</nav>
</div>
</aside>
</main>
</body>
</html>