OBIJupyterHub/jupyterhub_volumes/web/obidoc/formats/fastq/index.html
Eric Coissac 30b7175702 Make cleaning
2025-11-17 14:18:13 +01:00

<!DOCTYPE html>
<html lang="en-us" dir="ltr">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="description" content="
The FASTQ sequence file format
#
The
FASTQ sequence file format is widely used for storing biological sequences and their corresponding quality scores. It was originally developed at the
Wellcome Trust Sanger Institute to bundle a
fasta
sequence together with its quality data
(
Citation: Cock,&#32;Fields
&amp; al.,&#32;2010
Cock,&#32;
P.,&#32;
Fields,&#32;
C.,&#32;
Goto,&#32;
N.,&#32;
Heuer,&#32;
M.&#32;&amp;&#32;Rice,&#32;
P.
&#32;
(2010).
&#32;The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants.
Nucleic acids research,&#32;38(6).&#32;1767-1771.
https://doi.org/10.1093/nar/gkp1137
)
. The format has become the de facto standard for storing the output of high-throughput sequencing instruments.">
<meta name="theme-color" media="(prefers-color-scheme: light)" content="#ffffff">
<meta name="theme-color" media="(prefers-color-scheme: dark)" content="#343a40">
<meta name="color-scheme" content="light dark"><meta property="og:url" content="http://metabar:8888/obidoc/formats/fastq/">
<meta property="og:site_name" content="OBITools4 documentation">
<meta property="og:title" content="FASTQ file format">
<meta property="og:description" content="The FASTQ sequence file format # The FASTQ sequence file format is widely used for storing biological sequences and their corresponding quality scores. It was originally developed at the Wellcome Trust Sanger Institute to bundle a fasta sequence together with its quality data ( Citation: Cock, Fields &amp; al., 2010 Cock, P., Fields, C., Goto, N., Heuer, M. &amp; Rice, P. (2010). The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic acids research, 38(6). 1767-1771. https://doi.org/10.1093/nar/gkp1137 ) . The format has become the de facto standard for storing the output of high-throughput sequencing instruments.">
<meta property="og:locale" content="en_us">
<meta property="og:type" content="website">
<title>FASTQ file format | OBITools4 documentation</title>
<link rel="icon" href="/obidoc/favicon.png" >
<link rel="manifest" href="/obidoc/manifest.json">
<link rel="canonical" href="http://metabar:8888/obidoc/formats/fastq/">
<link rel="stylesheet" href="/obidoc/book.min.5fd7b8e2d1c0ae15da279c52ff32731130386f71b58f011468f20d0056fe6b78.css" integrity="sha256-X9e44tHArhXaJ5xS/zJzETA4b3G1jwEUaPINAFb&#43;a3g=" crossorigin="anonymous">
<script defer src="/obidoc/fuse.min.js"></script>
<script defer src="/obidoc/en.search.min.4da51bdd2d833922fdbc0e19df517221387fc625ffb68ee140d605b3c5b68058.js" integrity="sha256-TaUb3S2DOSL9vA4Z31FyITh/xiX/to7hQNYFs8W2gFg=" crossorigin="anonymous"></script>
<script defer src="/obidoc/sw.min.32af8eafce4180aa1c5dea66d99fb26ba9043ea7c7a4c706138c91d9051b285e.js" integrity="sha256-Mq&#43;Or85BgKocXepm2Z&#43;ya6kEPqfHpMcGE4yR2QUbKF4=" crossorigin="anonymous"></script>
<link rel="alternate" type="application/rss+xml" href="http://metabar:8888/obidoc/formats/fastq/index.xml" title="OBITools4 documentation" />
<!--
Made with Book Theme
https://github.com/alex-shpak/hugo-book
-->
<link rel="stylesheet" type="text/css" href="http://metabar:8888/obidoc/hugo-cite.css" />
</head>
<body dir="ltr">
<input type="checkbox" class="hidden toggle" id="menu-control" />
<input type="checkbox" class="hidden toggle" id="toc-control" />
<main class="container flex">
<aside class="book-menu">
<div class="book-menu-content">
<nav>
<h2 class="book-brand">
<a class="flex align-center" href="/obidoc/"><img src="/obidoc/obitools_logo.jpg" alt="Logo" class="book-icon" /><span>OBITools4 documentation</span>
</a>
</h2>
<div class="book-search hidden">
<input type="text" id="book-search-input" placeholder="Search" aria-label="Search" maxlength="64" data-hotkeys="s/" />
<div class="book-search-spinner hidden"></div>
<ul id="book-search-results"></ul>
</div>
<script>document.querySelector(".book-search").classList.remove("hidden")</script>
<ul>
<li>
<span>Docs</span>
<ul>
<li>
<a href="/obidoc/docs/about/" class="">About</a>
</li>
<li>
<a href="/obidoc/docs/installation/" class="">Installation</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/docs/principles/" class="">General operating principles</a>
<ul>
</ul>
</li>
<li>
<input type="checkbox" id="section-08756b4c1f14be6ee584ece005b9f621" class="toggle" checked />
<label for="section-08756b4c1f14be6ee584ece005b9f621" class="flex justify-between">
<a role="button" class="">File formats</a>
</label>
<ul>
<li>
<input type="checkbox" id="section-933c2e64b905b84e22aa5273cea2d0bd" class="toggle" checked />
<label for="section-933c2e64b905b84e22aa5273cea2d0bd" class="flex justify-between">
<a role="button" class="">Sequence file formats</a>
</label>
<ul>
<li>
<a href="/obidoc/formats/fasta/" class="">FASTA file format</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/formats/fastq/" class="active">FASTQ file format</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/formats/genbank/" class="">GenBank Flat File format</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/formats/embl/" class="">EMBL Flat File format</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/docs/file_format/sequence_files/csv/" class="">CSV format</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/formats/json/" class="">JSON format</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/docs/file_format/sequence_files/annotations/" class="">Annotation of sequences</a>
<ul>
</ul>
</li>
</ul>
</li>
<li>
<input type="checkbox" id="section-0258ae1c222f9a38cc1b75254c93b0f4" class="toggle" />
<label for="section-0258ae1c222f9a38cc1b75254c93b0f4" class="flex justify-between">
<a role="button" class="">Taxonomy file formats</a>
</label>
<ul>
<li>
<a href="/obidoc/docs/file_format/taxonomy_file/csv_taxdump/" class="">CSV formatted taxdump</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/docs/file_format/taxonomy_file/ncbi_taxdump/" class="">NCBI taxdump</a>
<ul>
</ul>
</li>
</ul>
</li>
<li>
<a href="/obidoc/formats/csv/" class="">The CSV format</a>
<ul>
</ul>
</li>
</ul>
</li>
<li>
<input type="checkbox" id="section-70b1e6e5ec7f3ccab643155fa50659b6" class="toggle" />
<label for="section-70b1e6e5ec7f3ccab643155fa50659b6" class="flex justify-between">
<a role="button" class="">Patterns</a>
</label>
<ul>
<li>
<a href="/obidoc/docs/patterns/regular/" class="">Regular Expressions</a>
</li>
<li>
<a href="/obidoc/docs/patterns/dnagrep/" class="">DNA Patterns</a>
</li>
</ul>
</li>
<li>
<input type="checkbox" id="section-8223f464911a1fe6c655972143684e93" class="toggle" />
<label for="section-8223f464911a1fe6c655972143684e93" class="flex justify-between">
<a role="button" class="">The OBITools4 commands</a>
</label>
<ul>
<li>
<a href="/obidoc/docs/commands/options/" class="">Shared command options</a>
<ul>
</ul>
</li>
<li>
<input type="checkbox" id="section-8921ea65523c266b128dd4263232b0fc" class="toggle" />
<label for="section-8921ea65523c266b128dd4263232b0fc" class="flex justify-between">
<a role="button" class="">Basics</a>
</label>
<ul>
<li>
<a href="/obidoc/obitools/obiannotate/" class="">obiannotate</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obicomplement/" class="">obicomplement</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obiconvert/" class="">obiconvert</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obicount/" class="">obicount</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obicsv/" class="">obicsv</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obidemerge/" class="">obidemerge</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obidistribute/" class="">obidistribute</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obigrep/" class="">obigrep</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obijoin/" class="">obijoin</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obimatrix/" class="">obimatrix</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obisplit/" class="">obisplit</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obisummary/" class="">obisummary</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obiuniq/" class="">obiuniq</a>
<ul>
</ul>
</li>
</ul>
</li>
<li>
<input type="checkbox" id="section-dbdf1bb5377572439394e60e08c30f50" class="toggle" />
<label for="section-dbdf1bb5377572439394e60e08c30f50" class="flex justify-between">
<a role="button" class="">Demultiplexing samples</a>
</label>
<ul>
<li>
<a href="/obidoc/obitools/obimultiplex/" class="">obimultiplex</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obitagpcr/" class="">obitagpcr</a>
<ul>
</ul>
</li>
</ul>
</li>
<li>
<input type="checkbox" id="section-aa98fedd067b51150db59691a8ea8edd" class="toggle" />
<label for="section-aa98fedd067b51150db59691a8ea8edd" class="flex justify-between">
<a role="button" class="">Sequence alignments</a>
</label>
<ul>
<li>
<a href="/obidoc/obitools/obiclean/" class="">obiclean</a>
<ul>
</ul>
</li>
<li>
<input type="checkbox" id="section-7433746525d8c2b29b033f765c869acd" class="toggle" />
<label for="section-7433746525d8c2b29b033f765c869acd" class="flex justify-between">
<a href="/obidoc/obitools/obipairing/" class="">obipairing</a>
</label>
<ul>
<li>
<a href="/obidoc/docs/commands/alignments/obipairing/fasta-like/" class="">The FASTA-like alignment</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/docs/commands/alignments/obipairing/exact-alignment/" class="">Exact alignment</a>
<ul>
</ul>
</li>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obipcr/" class="">obipcr</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obirefidx/" class="">obirefidx</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obitag/" class="">obitag</a>
<ul>
</ul>
</li>
</ul>
</li>
<li>
<input type="checkbox" id="section-5746f699d10490780dec8e30ab2dd3ce" class="toggle" />
<label for="section-5746f699d10490780dec8e30ab2dd3ce" class="flex justify-between">
<a role="button" class="">Taxonomy</a>
</label>
<ul>
<li>
<a href="/obidoc/obitools/obitaxonomy/" class="">obitaxonomy</a>
<ul>
</ul>
</li>
</ul>
</li>
<li>
<input type="checkbox" id="section-3f50c4fe7ab436a56ae92897d5444956" class="toggle" />
<label for="section-3f50c4fe7ab436a56ae92897d5444956" class="flex justify-between">
<a role="button" class="">Advanced tools</a>
</label>
<ul>
<li>
<a href="/obidoc/obitools/obiscript/" class="">obiscript</a>
<ul>
</ul>
</li>
</ul>
</li>
<li>
<input type="checkbox" id="section-549be3934679fcb82a232f6bd5435563" class="toggle" />
<label for="section-549be3934679fcb82a232f6bd5435563" class="flex justify-between">
<a role="button" class="">Others</a>
</label>
<ul>
<li>
<a href="/obidoc/obitools/obimicrosat/" class="">obimicrosat</a>
<ul>
</ul>
</li>
</ul>
</li>
<li>
<input type="checkbox" id="section-ceca4455173761e30cbc0a6dc2327167" class="toggle" />
<label for="section-ceca4455173761e30cbc0a6dc2327167" class="flex justify-between">
<a role="button" class="">Experimentals</a>
</label>
<ul>
<li>
<a href="/obidoc/obitools/obicleandb/" class="">obicleandb</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obiconsensus/" class="">obiconsensus</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/obitools/obilandmark/" class="">obilandmark</a>
<ul>
</ul>
</li>
</ul>
</li>
<li>
<a href="/obidoc/docs/commands/tags/" class="">Glossary of tags</a>
</li>
</ul>
</li>
<li>
<input type="checkbox" id="section-9b1bcd52530c59dc4819b1f61c128f54" class="toggle" />
<label for="section-9b1bcd52530c59dc4819b1f61c128f54" class="flex justify-between">
<a role="button" class="">Cookbook</a>
</label>
<ul>
<li>
<a href="/obidoc/docs/cookbook/illumina/" class="">Analysing an Illumina data set</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/docs/cookbook/ecoprimers/" class="">Designing new barcodes</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/docs/cookbook/local_genbank/" class="">Prepare a local copy of Genbank</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/docs/cookbook/reference_db/" class="">Build a reference database</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/docs/cookbook/minion/" class="">Oxford Nanopore data analysis</a>
<ul>
</ul>
</li>
</ul>
</li>
<li>
<span>Programming OBITools</span>
<ul>
<li>
<a href="/obidoc/docs/programming/expression/" class="">Expression language</a>
<ul>
</ul>
</li>
<li>
<input type="checkbox" id="section-6d580829a667b5cca790b286d99a10fe" class="toggle" />
<label for="section-6d580829a667b5cca790b286d99a10fe" class="flex justify-between">
<a href="/obidoc/docs/programming/lua/" class="">Lua: for scripting OBITools</a>
</label>
<ul>
<li>
<input type="checkbox" id="section-2fb081dac812d624eea5f4268fca9e26" class="toggle" />
<label for="section-2fb081dac812d624eea5f4268fca9e26" class="flex justify-between">
<a role="button" class="">Obitools Classes</a>
</label>
<ul>
<li>
<a href="/obidoc/docs/programming/lua/obitools_classes/biosequence/" class="">BioSequence</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/docs/programming/lua/obitools_classes/biosequenceslice/" class="">BioSequenceSlice</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/docs/programming/lua/obitools_classes/taxonomy/" class="">Taxonomy</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/docs/programming/lua/obitools_classes/taxon/" class="">Taxon</a>
<ul>
</ul>
</li>
<li>
<a href="/obidoc/docs/programming/lua/obitools_classes/mutex/" class="">Mutex</a>
<ul>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>
</nav>
<script>(function(){var e=document.querySelector("aside .book-menu-content");addEventListener("beforeunload",function(){localStorage.setItem("menu.scrollTop",e.scrollTop)}),e.scrollTop=localStorage.getItem("menu.scrollTop")})()</script>
</div>
</aside>
<div class="book-page">
<header class="book-header">
<div class="flex align-center justify-between">
<label for="menu-control">
<img src="/obidoc/svg/menu.svg" class="book-icon" alt="Menu" />
</label>
<h3>FASTQ file format</h3>
<label for="toc-control">
<img src="/obidoc/svg/toc.svg" class="book-icon" alt="Table of Contents" />
</label>
</div>
<aside class="hidden clearfix">
<nav id="TableOfContents">
<ul>
<li><a href="#the-fastq-sequence-file-format">The <em>FASTQ</em> sequence file format</a>
<ul>
<li><a href="#references">References</a></li>
</ul>
</li>
</ul>
</nav>
</aside>
</header>
<article class="markdown book-article"><h1 id="the-fastq-sequence-file-format">
The <em>FASTQ</em> sequence file format
<a class="anchor" href="#the-fastq-sequence-file-format">#</a>
</h1>
<p>The
<a href="https://en.wikipedia.org/wiki/FASTQ_format">FASTQ</a> sequence file format is widely used for storing biological sequences and their corresponding quality scores. It was originally developed at the
<a href="https://www.sanger.ac.uk/">Wellcome Trust Sanger Institute</a> to bundle a
<a href="http://metabar:8888/obidoc/formats/fasta/">fasta</a>
sequence together with its quality data
<span class="hugo-cite-intext"
itemprop="citation">(<span class="hugo-cite-group">
<a href="#cock2010-wl"><span class="visually-hidden">Citation: </span><span itemprop="author" itemscope itemtype="https://schema.org/Person"><meta itemprop="givenName" content="Peter J A"><span itemprop="familyName">Cock</span></span>,&#32;<span itemprop="author" itemscope itemtype="https://schema.org/Person"><meta itemprop="givenName" content="Christopher J"><span itemprop="familyName">Fields</span></span>
<em>&amp; al.</em>,&#32;<span itemprop="datePublished">2010</span></a><span class="hugo-cite-citation">
<span itemscope
itemtype="https://schema.org/Article"
data-type="article"><span itemprop="author" itemscope itemtype="https://schema.org/Person"><span itemprop="familyName">Cock</span>,&#32;
<meta itemprop="givenName" content="Peter J A" />
P.</span>,&#32;
<span itemprop="author" itemscope itemtype="https://schema.org/Person"><span itemprop="familyName">Fields</span>,&#32;
<meta itemprop="givenName" content="Christopher J" />
C.</span>,&#32;
<span itemprop="author" itemscope itemtype="https://schema.org/Person"><span itemprop="familyName">Goto</span>,&#32;
<meta itemprop="givenName" content="Naohisa" />
N.</span>,&#32;
<span itemprop="author" itemscope itemtype="https://schema.org/Person"><span itemprop="familyName">Heuer</span>,&#32;
<meta itemprop="givenName" content="Michael L" />
M.</span>&#32;&amp;&#32;<span itemprop="author" itemscope itemtype="https://schema.org/Person"><span itemprop="familyName">Rice</span>,&#32;
<meta itemprop="givenName" content="Peter M" />
P.</span>
&#32;
(<span itemprop="datePublished">2010</span>).
&#32;<span itemprop="name">The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants</span>.<i>
<span itemprop="about">Nucleic acids research</span>,&#32;38(6)</i>.&#32;<span itemprop="pagination">1767-1771</span>.
<a href="https://doi.org/10.1093/nar/gkp1137"
itemprop="identifier"
itemtype="https://schema.org/URL">https://doi.org/10.1093/nar/gkp1137</a></span>
</span></span>)</span>
. The format has become the <em>de facto</em> standard for storing the output of high-throughput sequencing instruments.</p>
<p>In <em>FASTQ</em> format, each sequence entry consists of four lines:</p>
<ol>
<li>A sequence identifier line beginning with an <strong>@</strong> character</li>
<li>The raw sequence letters using the
<a href="http://metabar:8888/obidoc/docs/patterns/dnagrep/#iupac-codes-for-ambiguous-bases"><code>iupac</code></a> code</li>
<li>A separator line beginning with a <strong>+</strong> character (optionally followed by the same sequence identifier)</li>
<li>The quality scores encoded in ASCII format</li>
</ol>
<pre tabindex="0"><code>@my_sequence this is my pretty sequence
ACGTTGCAGTACGTTGCAGTACGTTGCAGTACGTTGCAGT
+
CCCCCCC&lt;CcCccbe[`F`accXV&lt;TA\RYU\\ee_e[XZ
</code></pre><p>The first word after the &lsquo;@&rsquo; symbol in the identifier line is the sequence identifier. The rest of the line is a description of the sequence.</p>
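<p>The four-line layout described above can be read with a few lines of Python. This is only a minimal sketch, not the parser used by <em>OBITools4</em>: the names <code>FastqRecord</code> and <code>read_fastq</code> are hypothetical, and the code assumes that every record occupies exactly four lines (no wrapped sequences).</p>

```python
from typing import Iterator, NamedTuple


class FastqRecord(NamedTuple):
    identifier: str   # first word after the '@'
    description: str  # rest of the identifier line, possibly empty
    sequence: str
    quality: str


def read_fastq(lines) -> Iterator[FastqRecord]:
    """Yield one FastqRecord per group of four non-empty lines."""
    rows = [line.rstrip("\r\n") for line in lines if line.strip()]
    if len(rows) % 4 != 0:
        raise ValueError("a four-line FASTQ file must hold a multiple of 4 lines")
    for start in range(0, len(rows), 4):
        header, sequence, separator, quality = rows[start:start + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed record number {start // 4 + 1}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence and quality lengths differ in {header!r}")
        # The identifier is the first word; everything after it is the description.
        identifier, _, description = header[1:].partition(" ")
        yield FastqRecord(identifier, description, sequence, quality)
```

<p>Any iterable of text lines can be passed to <code>read_fastq</code>, for instance an open file handle. The length check reflects the fact that each base must have exactly one quality character.</p>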
<p>The quality line holds the quality score assigned to each base by the sequencing machine during the sequencing process. A quality score <em>Q</em> expresses the probability that the corresponding base was called incorrectly:</p>
<link rel="stylesheet" href="/obidoc/katex/katex.min.css" />
<script defer src="/obidoc/katex/katex.min.js"></script>
<script defer src="/obidoc/katex/auto-render.min.js" onload="renderMathInElement(document.body);"></script><span>
\[
P(Error) = 10^{-\frac{Q}{10}}
\]
</span>
<p>Sequencers typically provide quality scores in the range of <span>
\(0\)
</span>
to <span>
\(40\)
</span>
, which corresponds to a probability of error <span>
\(P(Error)\)
</span>
in the range of <span>
\(10^{0} = 1\)
</span>
to <span>
\(10^{-4}\)
</span>
. The higher the score, the lower the probability of error.</p>
<!--
quality <- ggplot() + geom_function(fun = function(x) 10^(-x/10)) + xlim(0,40) + xlab("Quality") + ylab(expression(P(Error) == 10^{-Q/10})) + theme_minimal() + scale_y_log10()
ggsave("quality.png", quality)
-->
<figure id="quality-score" style="border: solid; border-radius: 30px; box-shadow: 0 0 0 10px #f3f5f6 inset; padding: 1em;"
><img src="/obidoc/formats/fastq/quality.png"
style="display: block; margin: 0 auto"
alt="Quality scores to error probability relationship"><figcaption style="width: 90%; display: block; margin: 0 auto">
<h4>Quality scores and chance of sequencing error</h4><p>Figure showing the relationship between FASTQ quality scores and error probability</p>
</figcaption>
</figure>
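<p>As a small worked example of the relationship between quality score and error probability, the following Python sketch converts in both directions. The function names <code>error_probability</code> and <code>phred_score</code> are chosen here for illustration only.</p>

```python
import math


def error_probability(q: float) -> float:
    """Probability that a base with quality score q is wrong: 10^(-q/10)."""
    return 10 ** (-q / 10)


def phred_score(p_error: float) -> float:
    """Inverse relation: quality score for a given error probability."""
    return -10 * math.log10(p_error)


# One error expected every 1000 bases corresponds to a score of about 30.
for q in (0, 10, 20, 30, 40):
    print(q, error_probability(q))
```

<p>Each increase of 10 in the quality score divides the error probability by 10.</p>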
<p>In <em>FASTQ</em> format, the sequence of quality scores is encoded as an ASCII string where each score is mapped to one ASCII character. The quality score <span>
\(0\)
</span>
is encoded as the character <code>!</code>. The quality score <span>
\(40\)
</span>
is encoded as the character <code>I</code> (uppercase <code>i</code>).</p>
<span>
\[ASCII\,CODE = Q + 33 \]
</span>
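<p>The offset of 33 makes decoding a quality string straightforward. The sketch below, with hypothetical helper names <code>decode_quality</code> and <code>encode_quality</code>, assumes the Sanger encoding described here (ASCII code = Q + 33).</p>

```python
PHRED_OFFSET = 33  # ASCII code of '!', which encodes the score 0


def decode_quality(quality: str) -> list:
    """Convert a FASTQ quality string into a list of integer scores."""
    return [ord(char) - PHRED_OFFSET for char in quality]


def encode_quality(scores) -> str:
    """Convert integer quality scores back into a FASTQ quality string."""
    return "".join(chr(score + PHRED_OFFSET) for score in scores)


print(decode_quality("!I"))     # '!' is score 0, 'I' is score 40
print(encode_quality([0, 40]))
```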
<p>The <em>OBITools</em> extend this format by adding structured data to the identifier line. In the previous version of the <em>OBITools</em>, the structured data was stored after the sequence identifier in a <code>key=value;</code> format, as shown below. The sequence definition was stored as free text after the last <code>key=value;</code> pair.</p>
<a style="padding: 10px 20px; background-color: #cacaca; border: 1px solid #8e8080; border-bottom: none; border-radius: 5px 5px 0 0; box-shadow: 0 2px 5px rgba(0, 0, 0, 0.1)"
href="two_sequences_obi2.fastq" download="two_sequences_obi2.fastq">📄 two_sequences_obi2.fastq</a>
<DIV style="border: 2px solid #8e8080; border-radius: 0 0 5px 5px; padding: 20px; background-color: white; ">
<pre tabindex="0"><code class="language-fastq" data-lang="fastq">@HELIUM_000100422_612GNAAXX:7:108:5640:3823#0/1 ali_length=62; mode=alignment; pairing_mismatches={&#39;(T:26)-&gt;(G:13)&#39;:62,&#39;(T:34)-&gt;(G:18)&#39;:48}; score=484; seq_b_single=46; ali_dir=left; score_norm=0.968; seq_a_single=46; seq_ab_match=60; sequence definition here
ccgcctcctttagataccccactatgcttagccctaaacacaagtaattaatataacaaaattgttcgccagagtactaccggcaatagcttaaaactcaaaggacttggcggtgctttatacccttctagaggagcctgttctaaggaggcgg
+
CCCCCCCBCCCCCCCCCCCCCCCCCCCCCCBCCCCCBCCCCCCC&lt;CcCccbe[`F`accXV&lt;TA\RYU\\ee_e[XZ[XEEEEEEEEEE?EEEEEEEEEEDEEEEEEECCCCCCCCCCCCCCCCCCCCCCCACCCCCACCCCCCCCCCCCCCCC
@HELIUM_000100422_612GNAAXX:7:97:14311:19299#0/1 mode=alignment; seq_a_single=46; seq_ab_match=52; score=283; score_norm=0.839; seq_b_single=46; ali_dir=left; ali_length=62; pairing_mismatches={&#39;(A:02)-&gt;(G:30)&#39;:104,&#39;(A:34)-&gt;(G:14)&#39;:64,&#39;(C:02)-&gt;(A:30)&#39;:86,&#39;(C:02)-&gt;(T:20)&#39;:108,&#39;(C:27)-&gt;(G:32)&#39;:83,&#39;(C:34)-&gt;(G:18)&#39;:57,&#39;(T:02)-&gt;(G:26)&#39;:87,&#39;(T:22)-&gt;(G:14)&#39;:66,&#39;(T:29)-&gt;(G:11)&#39;:62,&#39;(T:32)-&gt;(G:30)&#39;:48}; sequence definition here
ccgcctcctttagataccccactatgcttagccctaaacacaagtaattaatataacaaaattattcgccagagtactaccggcaagagcttaaaactcaaaggacttggcggtgctttatacccttctagaggagcctgttctaaggaggcgg
+
CCCCCCCCCCCCCCCCCCCCCCCBBCCC?BCCCCCBC?CCCC@@;AVA`cWeb_TYC\UIN?IDP8QJMKRPVGLQAFPPc`AbAFB5A4&gt;AAA56A&gt;&lt;&gt;8&gt;&gt;F@A&gt;&lt;8??@BB+&lt;?;?C@9CCCCCC&lt;CC=CCCCCCCCCBC?CBCCCCC@CC
</code></pre>
</DIV>
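<p>To illustrate how such a <code>key=value;</code> header can be split, here is a rough Python sketch. It is not the parser used by the <em>OBITools</em>: the function name <code>parse_obi2_header</code> is hypothetical, values are assumed not to contain a semicolon, and Python's <code>ast.literal_eval</code> is used as a convenient stand-in to interpret numbers and the quoted map syntax.</p>

```python
import ast
import re

# One "key=value;" pair; the value is assumed to contain no semicolon.
PAIR = re.compile(r"\s*(\w+)=([^;]*);")


def parse_obi2_header(header_line: str):
    """Return (identifier, annotations) for an old-style identifier line."""
    identifier, _, rest = header_line[1:].partition(" ")
    annotations = {}
    position = 0
    while True:
        match = PAIR.match(rest, position)
        if match is None:
            break
        key, raw = match.group(1), match.group(2).strip()
        try:
            value = ast.literal_eval(raw)  # numbers, quoted strings, maps
        except (ValueError, SyntaxError):
            value = raw                    # bare words such as 'alignment'
        annotations[key] = value
        position = match.end()
    # Whatever follows the last pair is the free-text definition.
    definition = rest[position:].strip()
    if definition:
        annotations["definition"] = definition
    return identifier, annotations
```

<p>Storing the trailing free text under the key <code>definition</code> is a choice made for this sketch so that the result has a single, uniform dictionary shape.</p>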
<p>With <em>OBITools4</em>, a new format has been introduced to store structured data in the identifier line. The <em>key</em>/<em>value</em> annotation pairs are now formatted as a
<a href="https://en.wikipedia.org/wiki/JSON">JSON</a> map object. The definition is stored as an additional <em>key</em>/<em>value</em> pair using the <em>key</em> <code>definition</code>.</p>
<a style="padding: 10px 20px; background-color: #cacaca; border: 1px solid #8e8080; border-bottom: none; border-radius: 5px 5px 0 0; box-shadow: 0 2px 5px rgba(0, 0, 0, 0.1)"
href="two_sequences_obi4.fastq" download="two_sequences_obi4.fastq">📄 two_sequences_obi4.fastq</a>
<DIV style="border: 2px solid #8e8080; border-radius: 0 0 5px 5px; padding: 20px; background-color: white; ">
<pre tabindex="0"><code class="language-fastq" data-lang="fastq">@HELIUM_000100422_612GNAAXX:7:108:5640:3823#0/1 {&#34;ali_dir&#34;:&#34;left&#34;,&#34;ali_length&#34;:62,&#34;mode&#34;:&#34;alignment&#34;,&#34;pairing_mismatches&#34;:{&#34;(T:26)-&gt;(G:13)&#34;:62,&#34;(T:34)-&gt;(G:18)&#34;:48},&#34;score&#34;:484,&#34;score_norm&#34;:0.968,&#34;seq_a_single&#34;:46,&#34;seq_ab_match&#34;:60,&#34;seq_b_single&#34;:46,&#34;definition&#34;:&#34;sequence definition here&#34;}
ccgcctcctttagataccccactatgcttagccctaaacacaagtaattaatataacaaaattgttcgccagagtactaccggcaatagcttaaaactcaaaggacttggcggtgctttatacccttctagaggagcctgttctaaggaggcgg
+
CCCCCCCBCCCCCCCCCCCCCCCCCCCCCCBCCCCCBCCCCCCC&lt;CcCccbe[`F`accXV&lt;TA\RYU\\ee_e[XZ[XEEEEEEEEEE?EEEEEEEEEEDEEEEEEECCCCCCCCCCCCCCCCCCCCCCCACCCCCACCCCCCCCCCCCCCCC
@HELIUM_000100422_612GNAAXX:7:97:14311:19299#0/1 {&#34;ali_dir&#34;:&#34;left&#34;,&#34;ali_length&#34;:62,&#34;mode&#34;:&#34;alignment&#34;,&#34;pairing_mismatches&#34;:{&#34;(A:02)-&gt;(G:30)&#34;:104,&#34;(A:34)-&gt;(G:14)&#34;:64,&#34;(C:02)-&gt;(A:30)&#34;:86,&#34;(C:02)-&gt;(T:20)&#34;:108,&#34;(C:27)-&gt;(G:32)&#34;:83,&#34;(C:34)-&gt;(G:18)&#34;:57,&#34;(T:02)-&gt;(G:26)&#34;:87,&#34;(T:22)-&gt;(G:14)&#34;:66,&#34;(T:29)-&gt;(G:11)&#34;:62,&#34;(T:32)-&gt;(G:30)&#34;:48},&#34;score&#34;:283,&#34;score_norm&#34;:0.839,&#34;seq_a_single&#34;:46,&#34;seq_ab_match&#34;:52,&#34;seq_b_single&#34;:46,&#34;definition&#34;:&#34;sequence definition here&#34;}
ccgcctcctttagataccccactatgcttagccctaaacacaagtaattaatataacaaaattattcgccagagtactaccggcaagagcttaaaactcaaaggacttggcggtgctttatacccttctagaggagcctgttctaaggaggcgg
+
CCCCCCCCCCCCCCCCCCCCCCCBBCCC?BCCCCCBC?CCCC@@;AVA`cWeb_TYC\UIN?IDP8QJMKRPVGLQAFPPc`AbAFB5A4&gt;AAA56A&gt;&lt;&gt;8&gt;&gt;F@A&gt;&lt;8??@BB+&lt;?;?C@9CCCCCC&lt;CC=CCCCCCCCCBC?CBCCCCC@CC
</code></pre>
</DIV>
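<p>Because the annotations form a regular JSON map, any JSON library can read them. The following Python sketch (the function name <code>parse_obi4_header</code> is hypothetical) assumes that the identifier contains no space and that everything after the first space is the JSON object.</p>

```python
import json


def parse_obi4_header(header_line: str):
    """Return (identifier, annotations) for a JSON-annotated identifier line."""
    if not header_line.startswith("@"):
        raise ValueError("a FASTQ identifier line must start with '@'")
    identifier, _, rest = header_line[1:].partition(" ")
    # An identifier line without annotations yields an empty map.
    annotations = json.loads(rest) if rest.strip() else {}
    return identifier, annotations


identifier, annotations = parse_obi4_header(
    '@seq1 {"ali_length":62,"mode":"alignment",'
    '"definition":"sequence definition here"}'
)
print(identifier, annotations["ali_length"], annotations["definition"])
```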
<p>The <a href="http://metabar:8888/obidoc/obitools/obiconvert/">
<abbr title="obiconvert: convert format of a sequence file"><code>obiconvert</code></abbr>
</a> command, like all other <em>OBITools4</em> commands, provides two options, <code>--output-json-header</code> and <code>--output-OBI-header</code>, to choose between the new
<a href="https://en.wikipedia.org/wiki/JSON">JSON</a> format and the old <em>OBITools</em> format. The <code>--output-OBI-header</code> option can be abbreviated to <code>-O</code>. By default, the new
<a href="https://en.wikipedia.org/wiki/JSON">JSON</a> <em>OBITools4</em> format is used, so in practice only the <code>-O</code> option is needed, when the old format is required for compatibility with other software.</p>
<p>Converting from the new
<a href="https://en.wikipedia.org/wiki/JSON">JSON</a> format to the old <em>OBITools</em> format:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>obiconvert -O two_sequences_obi4.fastq
</span></span></code></pre></div>
<pre tabindex="0"><code class="language-fastq" data-lang="fastq">@HELIUM_000100422_612GNAAXX:7:108:5640:3823#0/1 ali_length=62; mode=alignment; pairing_mismatches={&#39;(T:26)-&gt;(G:13)&#39;:62,&#39;(T:34)-&gt;(G:18)&#39;:48}; score=484; seq_b_single=46; ali_dir=left; score_norm=0.968; seq_a_single=46; seq_ab_match=60; sequence definition here
ccgcctcctttagataccccactatgcttagccctaaacacaagtaattaatataacaaaattgttcgccagagtactaccggcaatagcttaaaactcaaaggacttggcggtgctttatacccttctagaggagcctgttctaaggaggcgg
+
CCCCCCCBCCCCCCCCCCCCCCCCCCCCCCBCCCCCBCCCCCCC&lt;CcCccbe[`F`accXV&lt;TA\RYU\\ee_e[XZ[XEEEEEEEEEE?EEEEEEEEEEDEEEEEEECCCCCCCCCCCCCCCCCCCCCCCACCCCCACCCCCCCCCCCCCCCC
@HELIUM_000100422_612GNAAXX:7:97:14311:19299#0/1 mode=alignment; seq_a_single=46; seq_ab_match=52; score=283; score_norm=0.839; seq_b_single=46; ali_dir=left; ali_length=62; pairing_mismatches={&#39;(A:02)-&gt;(G:30)&#39;:104,&#39;(A:34)-&gt;(G:14)&#39;:64,&#39;(C:02)-&gt;(A:30)&#39;:86,&#39;(C:02)-&gt;(T:20)&#39;:108,&#39;(C:27)-&gt;(G:32)&#39;:83,&#39;(C:34)-&gt;(G:18)&#39;:57,&#39;(T:02)-&gt;(G:26)&#39;:87,&#39;(T:22)-&gt;(G:14)&#39;:66,&#39;(T:29)-&gt;(G:11)&#39;:62,&#39;(T:32)-&gt;(G:30)&#39;:48}; sequence definition here
ccgcctcctttagataccccactatgcttagccctaaacacaagtaattaatataacaaaattattcgccagagtactaccggcaagagcttaaaactcaaaggacttggcggtgctttatacccttctagaggagcctgttctaaggaggcgg
+
CCCCCCCCCCCCCCCCCCCCCCCBBCCC?BCCCCCBC?CCCC@@;AVA`cWeb_TYC\UIN?IDP8QJMKRPVGLQAFPPc`AbAFB5A4&gt;AAA56A&gt;&lt;&gt;8&gt;&gt;F@A&gt;&lt;8??@BB+&lt;?;?C@9CCCCCC&lt;CC=CCCCCCCCCBC?CBCCCCC@CC
</code></pre>
<p>Converting from the old <em>OBITools</em> format to the new
<a href="https://en.wikipedia.org/wiki/JSON">JSON</a> format:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>obiconvert two_sequences_obi2.fastq
</span></span></code></pre></div>
<pre tabindex="0"><code class="language-fastq" data-lang="fastq">@HELIUM_000100422_612GNAAXX:7:108:5640:3823#0/1 {&#34;ali_dir&#34;:&#34;left&#34;,&#34;ali_length&#34;:62,&#34;mode&#34;:&#34;alignment&#34;,&#34;pairing_mismatches&#34;:{&#34;(T:26)-&gt;(G:13)&#34;:62,&#34;(T:34)-&gt;(G:18)&#34;:48},&#34;score&#34;:484,&#34;score_norm&#34;:0.968,&#34;seq_a_single&#34;:46,&#34;seq_ab_match&#34;:60,&#34;seq_b_single&#34;:46,&#34;definition&#34;:&#34;sequence definition here&#34;}
ccgcctcctttagataccccactatgcttagccctaaacacaagtaattaatataacaaaattgttcgccagagtactaccggcaatagcttaaaactcaaaggacttggcggtgctttatacccttctagaggagcctgttctaaggaggcgg
+
CCCCCCCBCCCCCCCCCCCCCCCCCCCCCCBCCCCCBCCCCCCC&lt;CcCccbe[`F`accXV&lt;TA\RYU\\ee_e[XZ[XEEEEEEEEEE?EEEEEEEEEEDEEEEEEECCCCCCCCCCCCCCCCCCCCCCCACCCCCACCCCCCCCCCCCCCCC
@HELIUM_000100422_612GNAAXX:7:97:14311:19299#0/1 {&#34;ali_dir&#34;:&#34;left&#34;,&#34;ali_length&#34;:62,&#34;mode&#34;:&#34;alignment&#34;,&#34;pairing_mismatches&#34;:{&#34;(A:02)-&gt;(G:30)&#34;:104,&#34;(A:34)-&gt;(G:14)&#34;:64,&#34;(C:02)-&gt;(A:30)&#34;:86,&#34;(C:02)-&gt;(T:20)&#34;:108,&#34;(C:27)-&gt;(G:32)&#34;:83,&#34;(C:34)-&gt;(G:18)&#34;:57,&#34;(T:02)-&gt;(G:26)&#34;:87,&#34;(T:22)-&gt;(G:14)&#34;:66,&#34;(T:29)-&gt;(G:11)&#34;:62,&#34;(T:32)-&gt;(G:30)&#34;:48},&#34;score&#34;:283,&#34;score_norm&#34;:0.839,&#34;seq_a_single&#34;:46,&#34;seq_ab_match&#34;:52,&#34;seq_b_single&#34;:46,&#34;definition&#34;:&#34;sequence definition here&#34;}
ccgcctcctttagataccccactatgcttagccctaaacacaagtaattaatataacaaaattattcgccagagtactaccggcaagagcttaaaactcaaaggacttggcggtgctttatacccttctagaggagcctgttctaaggaggcgg
+
CCCCCCCCCCCCCCCCCCCCCCCBBCCC?BCCCCCBC?CCCC@@;AVA`cWeb_TYC\UIN?IDP8QJMKRPVGLQAFPPc`AbAFB5A4&gt;AAA56A&gt;&lt;&gt;8&gt;&gt;F@A&gt;&lt;8??@BB+&lt;?;?C@9CCCCCC&lt;CC=CCCCCCCCCBC?CBCCCCC@CC
</code></pre>
<p>The actual format of the header is automatically detected when <em>OBITools4</em> commands read a FASTQ file.</p>
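<p>How <em>OBITools4</em> performs this detection internally is not described here. Purely as an illustration, a simple heuristic could look at the text that follows the identifier, as in the Python sketch below; the function name <code>guess_header_style</code> and the rules it applies are assumptions, not the actual <em>OBITools4</em> logic.</p>

```python
import re

# Hypothetical pattern for an old-style "key=value;" pair.
OBI2_PAIR = re.compile(r"\w+=[^;]*;")


def guess_header_style(header_line: str) -> str:
    """Return 'json', 'obi2' or 'plain' for a FASTQ identifier line."""
    _, _, rest = header_line.partition(" ")
    rest = rest.lstrip()
    if rest.startswith("{"):
        return "json"   # annotations look like a JSON map
    if OBI2_PAIR.match(rest):
        return "obi2"   # annotations look like key=value; pairs
    return "plain"      # no structured annotations recognised


print(guess_header_style('@seq1 {"score":484}'))
print(guess_header_style("@seq1 score=484; sequence definition here"))
print(guess_header_style("@my_sequence this is my pretty sequence"))
```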
<h2 id="references">
References
<a class="anchor" href="#references">#</a>
</h2>
<section class="hugo-cite-bibliography">
<dl>
<div id="cock2010-wl">
<dt>
Cock,&#32;
Fields,&#32;
Goto,&#32;
Heuer&#32;&amp;&#32;Rice
(2010)</dt>
<dd>
<span itemscope
itemtype="https://schema.org/Article"
data-type="article"><span itemprop="author" itemscope itemtype="https://schema.org/Person"><span itemprop="familyName">Cock</span>,&#32;
<meta itemprop="givenName" content="Peter J A" />
P.</span>,&#32;
<span itemprop="author" itemscope itemtype="https://schema.org/Person"><span itemprop="familyName">Fields</span>,&#32;
<meta itemprop="givenName" content="Christopher J" />
C.</span>,&#32;
<span itemprop="author" itemscope itemtype="https://schema.org/Person"><span itemprop="familyName">Goto</span>,&#32;
<meta itemprop="givenName" content="Naohisa" />
N.</span>,&#32;
<span itemprop="author" itemscope itemtype="https://schema.org/Person"><span itemprop="familyName">Heuer</span>,&#32;
<meta itemprop="givenName" content="Michael L" />
M.</span>&#32;&amp;&#32;<span itemprop="author" itemscope itemtype="https://schema.org/Person"><span itemprop="familyName">Rice</span>,&#32;
<meta itemprop="givenName" content="Peter M" />
P.</span>
&#32;
(<span itemprop="datePublished">2010</span>).
&#32;<span itemprop="name">The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants</span>.<i>
<span itemprop="about">Nucleic acids research</span>,&#32;38(6)</i>.&#32;<span itemprop="pagination">1767-1771</span>.
<a href="https://doi.org/10.1093/nar/gkp1137"
itemprop="identifier"
itemtype="https://schema.org/URL">https://doi.org/10.1093/nar/gkp1137</a></span>
</dd>
</div>
</dl>
</section>
</article>
<footer class="book-footer">
<div class="flex flex-wrap justify-between">
</div>
<script>(function(){function e(e){const t=window.getSelection(),n=document.createRange();n.selectNodeContents(e),t.removeAllRanges(),t.addRange(n)}document.querySelectorAll("pre code").forEach(t=>{t.addEventListener("click",function(){if(window.getSelection().toString())return;e(t.parentElement),navigator.clipboard&&navigator.clipboard.writeText(t.parentElement.textContent)})})})()</script>
</footer>
<div class="book-comments">
</div>
<label for="menu-control" class="hidden book-menu-overlay"></label>
</div>
<aside class="book-toc">
<div class="book-toc-content">
<nav id="TableOfContents">
<ul>
<li><a href="#the-fastq-sequence-file-format">The <em>FASTQ</em> sequence file format</a>
<ul>
<li><a href="#references">References</a></li>
</ul>
</li>
</ul>
</nav>
</div>
</aside>
</main>
</body>
</html>