<!DOCTYPE html>
|
|
<html lang="en-us" dir="ltr">
|
|
<head>
|
|
<meta charset="UTF-8">
|
|
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
|
<meta name="description" content="
|
|
Build a reference database
|
|
#
|
|
|
|
One of the crucial steps in the analysis of environmental DNA data is the taxonomic assignment of sequences,
|
|
i.e. assigning a species, genus or other taxonomic rank to the sequences present in the collected samples.
|
|
Taxonomic assignment requires annotated reference sequences, against which the sequences of interest are compared.
|
|
These reference sequences form what is known as a reference database, which is a sequence file in
|
|
fasta
|
|
format,
|
|
for a given marker of metabarcoding.">
|
|
<meta name="theme-color" media="(prefers-color-scheme: light)" content="#ffffff">
|
|
<meta name="theme-color" media="(prefers-color-scheme: dark)" content="#343a40">
|
|
<meta name="color-scheme" content="light dark"><meta property="og:url" content="http://metabar:8888/obidoc/docs/cookbook/reference_db/">
|
|
<meta property="og:site_name" content="OBITools4 documentation">
|
|
<meta property="og:title" content="Build a reference database">
|
|
<meta property="og:description" content="Build a reference database # One of the crucial steps in the analysis of environmental DNA data is the taxonomic assignment of sequences, i.e. assigning a species, genus or other taxonomic rank to the sequences present in the collected samples.
|
|
Taxonomic assignment requires annotated reference sequences, against which the sequences of interest are compared. These reference sequences form what is known as a reference database, which is a sequence file in fasta format, for a given marker of metabarcoding.">
|
|
<meta property="og:locale" content="en_us">
|
|
<meta property="og:type" content="website">
|
|
<title>Build a reference database | OBITools4 documentation</title>
|
|
<link rel="icon" href="/obidoc/favicon.png" >
|
|
<link rel="manifest" href="/obidoc/manifest.json">
|
|
<link rel="canonical" href="http://metabar:8888/obidoc/docs/cookbook/reference_db/">
|
|
<link rel="stylesheet" href="/obidoc/book.min.5fd7b8e2d1c0ae15da279c52ff32731130386f71b58f011468f20d0056fe6b78.css" integrity="sha256-X9e44tHArhXaJ5xS/zJzETA4b3G1jwEUaPINAFb+a3g=" crossorigin="anonymous">
|
|
<script defer src="/obidoc/fuse.min.js"></script>
|
|
<script defer src="/obidoc/en.search.min.4da51bdd2d833922fdbc0e19df517221387fc625ffb68ee140d605b3c5b68058.js" integrity="sha256-TaUb3S2DOSL9vA4Z31FyITh/xiX/to7hQNYFs8W2gFg=" crossorigin="anonymous"></script>
|
|
|
|
<script defer src="/obidoc/sw.min.32af8eafce4180aa1c5dea66d99fb26ba9043ea7c7a4c706138c91d9051b285e.js" integrity="sha256-Mq+Or85BgKocXepm2Z+ya6kEPqfHpMcGE4yR2QUbKF4=" crossorigin="anonymous"></script>
|
|
<link rel="alternate" type="application/rss+xml" href="http://metabar:8888/obidoc/docs/cookbook/reference_db/index.xml" title="OBITools4 documentation" />
|
|
<!--
|
|
Made with Book Theme
|
|
https://github.com/alex-shpak/hugo-book
|
|
-->
|
|
<link rel="stylesheet" type="text/css" href="http://metabar:8888/obidoc/hugo-cite.css" />
|
|
</head>
|
|
<body dir="ltr">
|
|
<input type="checkbox" class="hidden toggle" id="menu-control" />
|
|
<input type="checkbox" class="hidden toggle" id="toc-control" />
|
|
<main class="container flex">
|
|
<aside class="book-menu">
|
|
<div class="book-menu-content">
|
|
|
|
<nav>
|
|
<h2 class="book-brand">
|
|
<a class="flex align-center" href="/obidoc/"><img src="/obidoc/obitools_logo.jpg" alt="Logo" class="book-icon" /><span>OBITools4 documentation</span>
|
|
</a>
|
|
</h2>
|
|
|
|
|
|
<div class="book-search hidden">
|
|
<input type="text" id="book-search-input" placeholder="Search" aria-label="Search" maxlength="64" data-hotkeys="s/" />
|
|
<div class="book-search-spinner hidden"></div>
|
|
<ul id="book-search-results"></ul>
|
|
</div>
|
|
<script>document.querySelector(".book-search").classList.remove("hidden")</script>
<ul>
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<span>Docs</span>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/docs/about/" class="">About</a>
|
|
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/docs/installation/" class="">Installation</a>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/docs/principles/" class="">General operating principles</a>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<input type="checkbox" id="section-08756b4c1f14be6ee584ece005b9f621" class="toggle" />
|
|
<label for="section-08756b4c1f14be6ee584ece005b9f621" class="flex justify-between">
|
|
<a role="button" class="">File formats</a>
|
|
</label>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<input type="checkbox" id="section-933c2e64b905b84e22aa5273cea2d0bd" class="toggle" />
|
|
<label for="section-933c2e64b905b84e22aa5273cea2d0bd" class="flex justify-between">
|
|
<a role="button" class="">Sequence file formats</a>
|
|
</label>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/formats/fasta/" class="">FASTA file format</a>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/formats/fastq/" class="">FASTQ file format</a>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/formats/genbank/" class="">GenBank Flat File format</a>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/formats/embl/" class="">EMBL Flat File format</a>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/docs/file_format/sequence_files/csv/" class="">CSV format</a>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/formats/json/" class="">JSON format</a>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/docs/file_format/sequence_files/annotations/" class="">Annotation of sequences</a>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<input type="checkbox" id="section-0258ae1c222f9a38cc1b75254c93b0f4" class="toggle" />
|
|
<label for="section-0258ae1c222f9a38cc1b75254c93b0f4" class="flex justify-between">
|
|
<a role="button" class="">Taxonomy file formats</a>
|
|
</label>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/docs/file_format/taxonomy_file/csv_taxdump/" class="">CSV formatted taxdump</a>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/docs/file_format/taxonomy_file/ncbi_taxdump/" class="">NCBI taxdump</a>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/formats/csv/" class="">The CSV format</a>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<input type="checkbox" id="section-70b1e6e5ec7f3ccab643155fa50659b6" class="toggle" />
|
|
<label for="section-70b1e6e5ec7f3ccab643155fa50659b6" class="flex justify-between">
|
|
<a role="button" class="">Patterns</a>
|
|
</label>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/docs/patterns/regular/" class="">Regular Expressions</a>
|
|
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/docs/patterns/dnagrep/" class="">DNA Patterns</a>
|
|
|
|
|
|
</li>
|
|
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<input type="checkbox" id="section-8223f464911a1fe6c655972143684e93" class="toggle" />
|
|
<label for="section-8223f464911a1fe6c655972143684e93" class="flex justify-between">
|
|
<a role="button" class="">The OBITools4 commands</a>
|
|
</label>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/docs/commands/options/" class="">Shared command options</a>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<input type="checkbox" id="section-8921ea65523c266b128dd4263232b0fc" class="toggle" />
|
|
<label for="section-8921ea65523c266b128dd4263232b0fc" class="flex justify-between">
|
|
<a role="button" class="">Basics</a>
|
|
</label>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/obitools/obiannotate/" class="">obiannotate</a>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/obitools/obicomplement/" class="">obicomplement</a>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/obitools/obiconvert/" class="">obiconvert</a>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/obitools/obicount/" class="">obicount</a>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/obitools/obicsv/" class="">obicsv</a>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/obitools/obidemerge/" class="">obidemerge</a>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/obitools/obidistribute/" class="">obidistribute</a>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/obitools/obigrep/" class="">obigrep</a>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/obitools/obijoin/" class="">obijoin</a>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/obitools/obimatrix/" class="">obimatrix</a>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/obitools/obisplit/" class="">obisplit</a>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/obitools/obisummary/" class="">obisummary</a>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/obitools/obiuniq/" class="">obiuniq</a>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<input type="checkbox" id="section-dbdf1bb5377572439394e60e08c30f50" class="toggle" />
|
|
<label for="section-dbdf1bb5377572439394e60e08c30f50" class="flex justify-between">
|
|
<a role="button" class="">Demultiplexing samples</a>
|
|
</label>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/obitools/obimultiplex/" class="">obimultiplex</a>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/obitools/obitagpcr/" class="">obitagpcr</a>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<input type="checkbox" id="section-aa98fedd067b51150db59691a8ea8edd" class="toggle" />
|
|
<label for="section-aa98fedd067b51150db59691a8ea8edd" class="flex justify-between">
|
|
<a role="button" class="">Sequence alignments</a>
|
|
</label>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/obitools/obiclean/" class="">obiclean</a>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<input type="checkbox" id="section-7433746525d8c2b29b033f765c869acd" class="toggle" />
|
|
<label for="section-7433746525d8c2b29b033f765c869acd" class="flex justify-between">
|
|
<a href="/obidoc/obitools/obipairing/" class="">obipairing</a>
|
|
</label>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/docs/commands/alignments/obipairing/fasta-like/" class="">The FASTA-like alignment</a>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/docs/commands/alignments/obipairing/exact-alignment/" class="">Exact alignment</a>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/obitools/obipcr/" class="">obipcr</a>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/obitools/obirefidx/" class="">obirefidx</a>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/obitools/obitag/" class="">obitag</a>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<input type="checkbox" id="section-5746f699d10490780dec8e30ab2dd3ce" class="toggle" />
|
|
<label for="section-5746f699d10490780dec8e30ab2dd3ce" class="flex justify-between">
|
|
<a role="button" class="">Taxonomy</a>
|
|
</label>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/obitools/obitaxonomy/" class="">obitaxonomy</a>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<input type="checkbox" id="section-3f50c4fe7ab436a56ae92897d5444956" class="toggle" />
|
|
<label for="section-3f50c4fe7ab436a56ae92897d5444956" class="flex justify-between">
|
|
<a role="button" class="">Advanced tools</a>
|
|
</label>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/obitools/obiscript/" class="">obiscript</a>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<input type="checkbox" id="section-549be3934679fcb82a232f6bd5435563" class="toggle" />
|
|
<label for="section-549be3934679fcb82a232f6bd5435563" class="flex justify-between">
|
|
<a role="button" class="">Others</a>
|
|
</label>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/obitools/obimicrosat/" class="">obimicrosat</a>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<input type="checkbox" id="section-ceca4455173761e30cbc0a6dc2327167" class="toggle" />
|
|
<label for="section-ceca4455173761e30cbc0a6dc2327167" class="flex justify-between">
|
|
<a role="button" class="">Experimentals</a>
|
|
</label>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/obitools/obicleandb/" class="">obicleandb</a>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/obitools/obiconsensus/" class="">obiconsensus</a>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/obitools/obilandmark/" class="">obilandmark</a>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/docs/commands/tags/" class="">Glossary of tags</a>
|
|
|
|
|
|
</li>
|
|
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<input type="checkbox" id="section-9b1bcd52530c59dc4819b1f61c128f54" class="toggle" checked />
|
|
<label for="section-9b1bcd52530c59dc4819b1f61c128f54" class="flex justify-between">
|
|
<a role="button" class="">Cookbook</a>
|
|
</label>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/docs/cookbook/illumina/" class="">Analysing an Illumina data set</a>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/docs/cookbook/ecoprimers/" class="">Designing new barcodes</a>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/docs/cookbook/local_genbank/" class="">Prepare a local copy of Genbank</a>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/docs/cookbook/reference_db/" class="active">Build a reference database</a>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/docs/cookbook/minion/" class="">Oxford Nanopore data analysis</a>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<span>Programming OBITools</span>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/docs/programming/expression/" class="">Expression language</a>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<input type="checkbox" id="section-6d580829a667b5cca790b286d99a10fe" class="toggle" />
|
|
<label for="section-6d580829a667b5cca790b286d99a10fe" class="flex justify-between">
|
|
<a href="/obidoc/docs/programming/lua/" class="">Lua: for scripting OBITools</a>
|
|
</label>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<input type="checkbox" id="section-2fb081dac812d624eea5f4268fca9e26" class="toggle" />
|
|
<label for="section-2fb081dac812d624eea5f4268fca9e26" class="flex justify-between">
|
|
<a role="button" class="">Obitools Classes</a>
|
|
</label>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/docs/programming/lua/obitools_classes/biosequence/" class="">BioSequence</a>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/docs/programming/lua/obitools_classes/biosequenceslice/" class="">BioSequenceSlice</a>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/docs/programming/lua/obitools_classes/taxonomy/" class="">Taxonomy</a>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/docs/programming/lua/obitools_classes/taxon/" class="">Taxon</a>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
|
|
<li>
|
|
|
|
|
|
|
|
|
|
|
|
<a href="/obidoc/docs/programming/lua/obitools_classes/mutex/" class="">Mutex</a>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
</ul>
|
|
|
|
</li>
|
|
|
|
|
|
</ul>
</nav>
|
|
|
|
|
|
|
|
|
|
<script>(function(){var e=document.querySelector("aside .book-menu-content");addEventListener("beforeunload",function(){localStorage.setItem("menu.scrollTop",e.scrollTop)}),e.scrollTop=localStorage.getItem("menu.scrollTop")})()</script>
|
|
|
|
|
|
|
|
</div>
|
|
</aside>
|
|
|
|
<div class="book-page">
|
|
<header class="book-header">
|
|
|
|
<div class="flex align-center justify-between">
|
|
<label for="menu-control">
|
|
<img src="/obidoc/svg/menu.svg" class="book-icon" alt="Menu" />
|
|
</label>
|
|
|
|
<h3>Build a reference database</h3>
|
|
|
|
<label for="toc-control">
|
|
|
|
<img src="/obidoc/svg/toc.svg" class="book-icon" alt="Table of Contents" />
|
|
|
|
</label>
|
|
</div>
|
|
|
|
|
|
|
|
<aside class="hidden clearfix">
|
|
|
|
|
|
<nav id="TableOfContents">
|
|
<ul>
|
|
<li><a href="#build-a-reference-database">Build a reference database</a>
|
|
<ul>
|
|
<li><a href="#download-the-sequences">Download the sequences</a></li>
|
|
<li><a href="#perform-a-in-silico-pcr-amplification">Perform an <em>in silico</em> PCR amplification</a></li>
|
|
<li><a href="#clean-the-database">Clean the database</a>
|
|
<ul>
|
|
<li></li>
|
|
</ul>
|
|
</li>
|
|
</ul>
|
|
</li>
|
|
</ul>
|
|
</nav>
|
|
|
|
|
|
|
|
</aside>
|
|
|
|
|
|
</header>
|
|
|
|
|
|
|
|
<article class="markdown book-article"><h1 id="build-a-reference-database">
|
|
Build a reference database
|
|
<a class="anchor" href="#build-a-reference-database">#</a>
|
|
</h1>
|
|
<p>One of the crucial steps in the analysis of environmental DNA data is the taxonomic assignment of sequences,
|
|
<em>i.e.</em> assigning a species, genus or other taxonomic rank to the sequences present in the collected samples.</p>
|
|
<p>Taxonomic assignment requires annotated reference sequences, against which the sequences of interest are compared.
|
|
These reference sequences form what is known as a <em>reference database</em>, which is a sequence file in
|
|
<a href="http://metabar:8888/obidoc/formats/fasta/">fasta</a>
|
|
format,
|
|
for a given marker of metabarcoding.</p>
|
|
<p>This page is a quick step-by-step guide to building a reference database. The example builds one for assigning
sequences from wolf fecal samples in order to study the wolf's diet, the dataset used in the
<a href="https://obitools4.metabarcoding.org/docs/cookbook/wolf-tutorial/">metabarcoding analysis tutorial</a>.</p>
|
|
<p>One way to build a reference database is to use the <a href="http://metabar:8888/obidoc/obitools/obipcr/">
|
|
<abbr title="obipcr: the electronic PCR tool"><code>obipcr</code></abbr>
|
|
</a> program to simulate a PCR and extract all sequences
|
|
from a general purpose DNA database such as
|
|
<a href="https://www.ncbi.nlm.nih.gov/nucleotide/">GenBank</a> or
|
|
<a href="https://www.ebi.ac.uk/ena/browser/home">EMBL</a>
|
|
that can be amplified <em>in silico</em> by the two primers used for PCR amplification.</p>
|
|
<p>The steps to create a reference database are:</p>
|
|
<ol>
|
|
<li>Download sequences from a public database such as GenBank or EMBL</li>
|
|
<li>Perform an <em>in silico</em> PCR amplification of these sequences for a given marker using <a href="http://metabar:8888/obidoc/obitools/obipcr/">
|
|
<abbr title="obipcr: the electronic PCR tool"><code>obipcr</code></abbr>
|
|
</a></li>
|
|
<li>Clean up the database by removing sequences that do not provide sufficient taxonomic information or that are redundant</li>
|
|
</ol>
|
|
<p>Since GenBank and the taxonomy associated with its sequences are constantly evolving, you may not get exactly the same results when running the following commands.</p>
|
|
<h2 id="download-the-sequences">
|
|
Download the sequences
|
|
<a class="anchor" href="#download-the-sequences">#</a>
|
|
</h2>
|
|
<p>In this example, the sequences are downloaded from the
|
|
<a href="https://ftp.ncbi.nlm.nih.gov/genbank/">GenBank FTP server</a>.
|
|
Please note that the download takes more than a day and currently occupies around 1.5 TB,
|
|
so make sure you have the necessary storage capacity before launching it.
|
|
To set up a local copy of the GenBank sequences, follow the instructions on the
<a href="https://obitools4.metabarcoding.org/docs/cookbook/local_genbank/">Prepare a local copy of GenBank</a> page.</p>
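<p>Given the size quoted above, it may be worth checking the free space on the target volume before starting the download. The following is a minimal sketch, not part of <em>OBITools</em>: the helper name <code>has_free_space</code> is hypothetical, and the threshold simply converts the approximate 1.5 TB figure into kilobytes.</p>

```shell
#!/bin/sh
# Hypothetical helper (not an OBITools command): succeed only if the
# filesystem holding DIR has at least REQUIRED_KB kilobytes available.
has_free_space() {
    dir=$1
    required_kb=$2
    # df -Pk prints POSIX-format output in 1024-byte blocks; the fourth
    # column of the second line is the available space.
    available_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$available_kb" -ge "$required_kb" ]
}

# Roughly 1.5 TB expressed in KiB (1.5 * 1024^3); adjust to the current release size.
required_kb=1610612736

if has_free_space . "$required_kb"; then
    echo "Enough free space for the GenBank download"
else
    echo "Not enough free space in $(pwd)"
fi
```

<p>Run it from the directory that will hold the <code>genbank</code> folder; the check is only indicative, since the size of GenBank grows with each release.</p>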
|
|
<h2 id="perform-a-in-silico-pcr-amplification">
|
|
Perform an <em>in silico</em> PCR amplification
|
|
<a class="anchor" href="#perform-a-in-silico-pcr-amplification">#</a>
|
|
</h2>
|
|
<p>In this example, we amplify the <em>12S-V5</em> region (Riaz et al., 2011) with the forward primer <strong>TTAGATACCCCACTATGC</strong>
and the reverse primer <strong>TAGAACAGGCTCCTCTAG</strong> to study the wolf diet
(see the
<a href="https://obitools4.metabarcoding.org/docs/cookbook/wolf-tutorial/">tutorial</a>).
Remember to update the GenBank release number in the command below so that it matches your local copy.</p>
|
|
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>obipcr -e <span style="color:#ae81ff">3</span> -l <span style="color:#ae81ff">50</span> -L <span style="color:#ae81ff">150</span> <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>    --forward TTAGATACCCCACTATGC <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>    --reverse TAGAACAGGCTCCTCTAG <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>    --no-order <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>    genbank/Release_264/fasta/* <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>    > v05_pcr.fasta
</span></span></code></pre></div><p>The <code>-l</code> and <code>-L</code> options define the minimum and maximum sizes of sequence fragments to be amplified.
|
|
Up to three mismatches with the primer sequences are allowed here (<code>-e 3</code>), and we recommend using the <code>--no-order</code> option
|
|
to speed up the program (see <a href="http://metabar:8888/obidoc/obitools/obipcr/">
|
|
<abbr title="obipcr: the electronic PCR tool"><code>obipcr</code></abbr>
|
|
</a> documentation).</p>
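<p>To make the mismatch threshold concrete, here is a small illustrative sketch that counts the positions at which a primer and a candidate binding site differ. It is not how <code>obipcr</code> is implemented: the helper <code>count_mismatches</code> and the example site are made up for illustration, and the comparison is a plain character-by-character one that ignores ambiguity codes.</p>

```shell
#!/bin/sh
# Hypothetical helper (illustration only): print the number of positions
# at which two equal-length sequences differ, ignoring letter case.
count_mismatches() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        # Refuse to compare sequences of different lengths.
        if (length(a) != length(b)) exit 1
        n = 0
        for (i = 1; i <= length(a); i++)
            if (toupper(substr(a, i, 1)) != toupper(substr(b, i, 1)))
                n++
        print n
    }'
}

primer=TTAGATACCCCACTATGC
site=TTAGATACCTCACTATGT    # made-up binding site, for illustration only
max_errors=3               # same threshold as -e 3 in the command above

n=$(count_mismatches "$primer" "$site")
if [ "$n" -le "$max_errors" ]; then
    echo "site accepted with $n mismatches"
else
    echo "site rejected with $n mismatches"
fi
```

<p>With a threshold of three, a site such as the one above would still count as a match, whereas a stricter threshold would reject more divergent sites and typically shrink the resulting database.</p>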
|
|
<p>The command above produces a
|
|
<a href="http://metabar:8888/obidoc/formats/fasta/">fasta</a>
|
|
file, with the computed amplified sequences.</p>
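<p>A quick way to inspect the result is to count the sequences and look at the range of their lengths. The sketch below uses only standard tools; the helper name <code>fasta_length_summary</code> is hypothetical, and it assumes a plain FASTA file in which header lines start with <code>&gt;</code>.</p>

```shell
#!/bin/sh
# Hypothetical helper (not an OBITools command): print
# "<number of sequences> <shortest length> <longest length>" for a FASTA file.
fasta_length_summary() {
    awk '
        # Record the length of the sequence that has just been read.
        function record() {
            n++
            if (n == 1 || len < min) min = len
            if (n == 1 || len > max) max = len
        }
        /^>/ { if (seen) record(); seen = 1; len = 0; next }
        { gsub(/[ \t\r]/, ""); len += length($0) }   # sequences may span several lines
        END {
            if (seen) record()
            if (n) print n, min, max; else print 0, 0, 0
        }
    ' "$1"
}

# Only summarise the file if the obipcr step has already been run.
if [ -f v05_pcr.fasta ]; then
    fasta_length_summary v05_pcr.fasta
fi
```

<p>If the lengths are measured the same way as by the <code>-l</code> and <code>-L</code> options, the shortest and longest values should fall within the bounds given on the command line.</p>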
|
|
<h2 id="clean-the-database">
|
|
Clean the database
|
|
<a class="anchor" href="#clean-the-database">#</a>
|
|
</h2>
|
|
<p>We apply the following filtering steps to clean up the sequences obtained with <a href="http://metabar:8888/obidoc/obitools/obipcr/">
|
|
<abbr title="obipcr: the electronic PCR tool"><code>obipcr</code></abbr>
|
|
</a>:</p>
|
|
<ol>
|
|
<li>Keep only the sequences that have a taxid and a taxonomic description at the family, genus and species ranks (<a href="http://metabar:8888/obidoc/obitools/obigrep/">
|
|
<abbr title="obigrep: filter a sequence file"><code>obigrep</code></abbr>
|
|
</a>)</li>
|
|
<li>Remove redundant sequences (dereplicate)</li>
|
|
<li>Ensure that the dereplicated sequences have a taxid (taxon identifier) at the family level</li>
|
|
<li>Ensure that sequences each have a unique identification ID with <a href="http://metabar:8888/obidoc/obitools/obiannotate/">
|
|
<abbr title="obiannotate: edit sequence annotations"><code>obiannotate</code></abbr>
|
|
</a></li>
|
|
<li>Index the database</li>
|
|
</ol>
|
|
<h4 id="keep-annotated-sequences">
|
|
Keep annotated sequences
|
|
<a class="anchor" href="#keep-annotated-sequences">#</a>
|
|
</h4>
|
|
<p>To use the <code>-t</code> taxonomy option of the <em>OBITools</em> commands, you need a taxonomy.
If you built a local copy of GenBank by following the
<a href="https://obitools4.metabarcoding.org/docs/cookbook/local_genbank/">Prepare a local copy of GenBank</a>
page, you can pass the path to its taxonomy, which looks like <code>Release_264/taxonomy</code>.
Otherwise, download the NCBI taxdump archive with <code>curl</code>.</p>
|
|
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>curl -O http://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz
|
|
</span></span></code></pre></div><p>The <a href="http://metabar:8888/obidoc/obitools/obigrep/">
|
|
<abbr title="obigrep: filter a sequence file"><code>obigrep</code></abbr>
|
|
</a> program filters the sequences, keeping only those with a taxid and a sufficient taxonomic description.</p>
|
|
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>obigrep -t taxdump.tar.gz <span style="color:#ae81ff">\
|
|
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> -A taxid <span style="color:#ae81ff">\
|
|
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> --require-rank species <span style="color:#ae81ff">\
|
|
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> --require-rank genus <span style="color:#ae81ff">\
|
|
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> --require-rank family <span style="color:#ae81ff">\
|
|
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> v05_pcr.fasta > v05_clean.fasta
|
|
</span></span></code></pre></div><h4 id="dereplicate-sequences">
|
|
Dereplicate sequences
|
|
<a class="anchor" href="#dereplicate-sequences">#</a>
|
|
</h4>
|
|
<p>The <a href="http://metabar:8888/obidoc/obitools/obiuniq/">
|
|
<abbr title="obiuniq: dereplicate a sequence file"><code>obiuniq</code></abbr>
|
|
</a> program dereplicates the sequences.</p>
|
|
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>obiuniq -c taxid v05_clean.fasta > v05_clean_uniq.fasta
|
|
</span></span></code></pre></div><h4 id="ensure-that-the-dereplicated-sequences-have-a-taxid-at-the-family-level">
|
|
Ensure that the dereplicated sequences have a taxid at the family level
|
|
<a class="anchor" href="#ensure-that-the-dereplicated-sequences-have-a-taxid-at-the-family-level">#</a>
|
|
</h4>
|
|
<p>Some sequences lose taxonomic information at the dereplication stage if some copies
of the sequence lacked this information beforehand. We therefore apply a second filter, this time requiring only the family rank.</p>
|
|
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>obigrep -t taxdump.tar.gz --require-rank<span style="color:#f92672">=</span>family v05_clean_uniq.fasta > v05_clean_uniq_family.fasta
</span></span><span style="display:flex;"><span>mv v05_clean_uniq_family.fasta v05_clean_uniq.fasta
|
|
</span></span></code></pre></div><h4 id="ensure-that-sequences-each-have-a-unique-identifier">
|
|
Ensure that sequences each have a unique identifier
|
|
<a class="anchor" href="#ensure-that-sequences-each-have-a-unique-identifier">#</a>
|
|
</h4>
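<p>Before indexing, it can be useful to verify that no identifier occurs twice. The following is a minimal sketch using standard tools, assuming FASTA headers in which the identifier is the first word after <code>&gt;</code>; the helper name <code>duplicate_ids</code> is hypothetical. It only reports duplicates and does not rename anything.</p>

```shell
#!/bin/sh
# Hypothetical helper (not an OBITools command): print each sequence
# identifier that occurs more than once in a FASTA file, one per line.
duplicate_ids() {
    # The identifier is the first whitespace-delimited word of a header
    # line, without the leading ">". Each duplicate is printed once,
    # when its second occurrence is met.
    awk '/^>/ { id = substr($1, 2); if (++count[id] == 2) print id }' "$1"
}

# Only run the check if the dereplicated file exists.
if [ -f v05_clean_uniq.fasta ]; then
    dups=$(duplicate_ids v05_clean_uniq.fasta)
    if [ -z "$dups" ]; then
        echo "all identifiers are unique"
    else
        echo "duplicated identifiers:"
        echo "$dups"
    fi
fi
```

<p>An empty result means that every sequence in the file already has its own identifier.</p>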
|
|
<h4 id="index-the-database">
|
|
Index the database
|
|
<a class="anchor" href="#index-the-database">#</a>
|
|
</h4>
|
|
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>obirefidx -t taxdump.tar.gz v05_clean_uniq.fasta > v05_clean_uniq_indexed.fasta
|
|
</span></span></code></pre></div><p>The database provided in the
|
|
<a href="https://obitools4.metabarcoding.org/docs/cookbook/wolf-tutorial/">tutorial</a>
|
|
is called <code>wolf_data/db_v05_r117_indexed.fasta</code>.</p>
|
|
</article>
|
|
|
|
|
|
|
|
<footer class="book-footer">
|
|
|
|
<div class="flex flex-wrap justify-between">
|
|
|
|
|
|
|
|
|
|
|
|
</div>
|
|
|
|
|
|
|
|
<script>(function(){function e(e){const t=window.getSelection(),n=document.createRange();n.selectNodeContents(e),t.removeAllRanges(),t.addRange(n)}document.querySelectorAll("pre code").forEach(t=>{t.addEventListener("click",function(){if(window.getSelection().toString())return;e(t.parentElement),navigator.clipboard&&navigator.clipboard.writeText(t.parentElement.textContent)})})})()</script>
|
|
|
|
|
|
|
|
|
|
</footer>
|
|
|
|
|
|
|
|
<div class="book-comments">
|
|
|
|
</div>
|
|
|
|
|
|
|
|
<label for="menu-control" class="hidden book-menu-overlay"></label>
|
|
</div>
|
|
|
|
|
|
<aside class="book-toc">
|
|
<div class="book-toc-content">
|
|
|
|
|
|
<nav id="TableOfContents">
|
|
<ul>
|
|
<li><a href="#build-a-reference-database">Build a reference database</a>
|
|
<ul>
|
|
<li><a href="#download-the-sequences">Download the sequences</a></li>
|
|
<li><a href="#perform-a-in-silico-pcr-amplification">Perform an <em>in silico</em> PCR amplification</a></li>
|
|
<li><a href="#clean-the-database">Clean the database</a>
|
|
<ul>
|
|
<li></li>
|
|
</ul>
|
|
</li>
|
|
</ul>
|
|
</li>
|
|
</ul>
|
|
</nav>
|
|
|
|
|
|
|
|
</div>
|
|
</aside>
|
|
|
|
</main>
|
|
|
|
|
|
</body>
|
|
</html>