Refactoring code to remove buffer size options, and some other changes...

Former-commit-id: 10b57cc1a27446ade3c444217341e9651e89cdce
2023-03-07 11:12:13 +07:00
parent 9811e440b8
commit d88de15cdc
52 changed files with 1172 additions and 421 deletions

View File

@ -1,16 +1,11 @@
# Annexes
### Sequence attributes
## Sequence attributes
#### Reserved sequence attributes
**ali_dir (`string`)**
##### `ali_dir`
###### Type : `string`
The attribute can contain 2 string values `"left"` or `"right".`
###### Set by the *obipairing* tool
- Set by the *obipairing* tool
- The attribute can contain 2 string values `left` or `right`.
The alignment algorithm used by *obipairing* is 3'-end gap free.
Two cases can occur when aligning the forward and reverse reads. If the
@ -20,32 +15,31 @@ barcode is shorter than the read length, the paired reads overlap by
their 5' ends, and the complete barcode is sequenced by both the reads.
In the latter case, `ali_dir` is set to *right*.
##### `ali_length`
**ali_length (`int`)**
###### Set by the *obipairing* tool
- Set by the *obipairing* tool
Length of the aligned parts when merging forward and reverse reads
##### `count` : the number of sequence occurrences
###### Set by the *obiuniq* tool
**count (`int`)**
The `count` attribute indicates how many strictly identical sequences
- Set by the *obiuniq* tool
- Getter : method `Count()`
- Setter : method `SetCount(int)`
The `count` attribute indicates how many strictly identical reads
have been merged in a single record. It contains an integer value. If it
is absent this means that the sequence record represents a single
occurrence of the sequence.
###### Getter : method `Count()`
The `Count()` method gives access to the count attribute as an
integer value. If the `count` attribute is not defined for the given
sequence, the value *1* is returned.
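The default behaviour described above can be sketched in Go as follows. This is only an illustration: the `Record` type and its `Annotations` field are hypothetical stand-ins invented for the example, not the actual OBITools types; only the method names `Count()` and `SetCount(int)` come from the text.

```go
package main

import "fmt"

// Record is a hypothetical stand-in for a sequence record (assumption,
// not the OBITools type). Annotations maps attribute names to values.
type Record struct {
	Annotations map[string]any
}

// Count returns the `count` attribute as an int.
// When the attribute is absent (or not an int), it returns 1.
func (r *Record) Count() int {
	if v, ok := r.Annotations["count"]; ok {
		if n, ok := v.(int); ok {
			return n
		}
	}
	return 1
}

// SetCount stores n in the `count` attribute, creating the map if needed.
func (r *Record) SetCount(n int) {
	if r.Annotations == nil {
		r.Annotations = make(map[string]any)
	}
	r.Annotations["count"] = n
}

func main() {
	r := &Record{}
	fmt.Println(r.Count()) // `count` is not defined yet
	r.SetCount(5)
	fmt.Println(r.Count())
}
```

Returning 1 for a missing attribute lets a record that was never dereplicated be treated as a single occurrence without special-casing it.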
##### `merged_*`
**merged_* (`map[string]int`)**
###### Type : `map[string]int`
###### Set by the *obiuniq* tool
- Set by the *obiuniq* tool
The `-m` option of the *obiuniq* tool keeps track of the
distribution of the values stored in a given attribute of interest. Often
@ -55,28 +49,59 @@ actual name of the attribute depends on the name of the monitored
attribute. If the `-m` option is used with the attribute *sample*, then this
attribute is named *merged_sample*.
##### `mode`
**mode (`string`)**
###### Set by the *obipairing* tool
- Set by the *obipairing* tool
- The attribute can contain 2 string values `join` or `alignment`.
**`obitag_ref_index`**
###### Set by the *obirefidx* tool.
**obitag_ref_index (`map[string]string`)**
- Set by the *obirefidx* tool.
It summarises which taxonomic annotation a match to that sequence must
lead to, according to the number of differences between the query
sequence and the reference sequence carrying that tag.
###### Getter : method `Count()`
```json
{"0":"9606@Homo sapiens@species",
"2":"207598@Homininae@subfamily",
"3":"9604@Hominidae@family",
"8":"314295@Hominoidea@superfamily",
"10":"9526@Catarrhini@parvorder",
"12":"1437010@Boreoeutheria@clade",
"16":"9347@Eutheria@clade",
"17":"40674@Mammalia@class",
"22":"117571@Euteleostomi@clade",
"25":"7776@Gnathostomata@clade",
"29":"33213@Bilateria@clade",
"30":"6072@Eumetazoa@clade"}
```
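Each value in the example above appears to follow a `taxid@name@rank` layout. A minimal Go sketch for splitting one such value is given below; the layout is inferred from the example only, and the `Taxon` type and `parseTaxon` helper are hypothetical names introduced for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Taxon holds the three fields of one index value (hypothetical type).
type Taxon struct {
	TaxID string
	Name  string
	Rank  string
}

// parseTaxon splits a value such as "9606@Homo sapiens@species" on '@'.
// The three-field layout is an assumption based on the example above.
func parseTaxon(entry string) (Taxon, error) {
	parts := strings.Split(entry, "@")
	if len(parts) != 3 {
		return Taxon{}, fmt.Errorf("expected 3 '@'-separated fields, got %d in %q", len(parts), entry)
	}
	return Taxon{TaxID: parts[0], Name: parts[1], Rank: parts[2]}, nil
}

func main() {
	tx, err := parseTaxon("9606@Homo sapiens@species")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("taxid=%s name=%s rank=%s\n", tx.TaxID, tx.Name, tx.Rank)
}
```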
##### `pairing_mismatches`
**pairing_mismatches (`map[string]string`)**
###### Set by the *obipairing* tool
- Set by the *obipairing* tool
##### `score`
**seq_a_single (`int`)**
###### Set by the *obipairing* tool
- Set by the *obipairing* tool
##### `score_norm`
**seq_ab_match (`int`)**
- Set by the *obipairing* tool
**seq_b_single (`int`)**
- Set by the *obipairing* tool
**score (`int`)**
- Set by the *obipairing* tool
**score_norm (`float`)**
- Set by the *obipairing* tool
- The value ranges between 0 and 1.
Score of the alignment between forward and reverse reads expressed as a fraction of identity.
###### Set by the *obipairing* tool

View File

@ -10,13 +10,39 @@
Sequences can be selected on several of their characteristics: their length, their id, or their sequence. Options allow for specifying the selection condition.
**Selection based on the sequence**
Sequence records can be selected according to whether or not they match a pattern. The simplest pattern is a short sequence (*e.g.* `AACCTT`), but regular patterns allow for more complex searches. For example, `A[TG]C+G` matches an `A`, followed by a `T` or a `G`, then one or several `C`, and finally a `G`.
{{< include ../lib/options/selection/_sequence.qmd >}}
*Examples:*
: Selects only the sequence records that contain an *EcoRI* restriction site.
```bash
obigrep -s 'GAATTC' seq1.fasta > seq2.fasta
```
: Selects only the sequence records that contain a stretch of at least 10 ``A``.
```bash
obigrep -s 'A{10,}' seq1.fasta > seq2.fasta
```
: Selects only the sequence records that do not contain ambiguous nucleotides.
```bash
obigrep -s '^[ACGT]+$' seq1.fasta > seq2.fasta
```
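The patterns used above can be tried outside of *obigrep* with Go's standard `regexp` package. The sketch below is only an illustration: `matches` is a hypothetical helper, and prefixing the pattern with `(?i)` is an assumption made here to mimic a case-insensitive test; it is not taken from the *obigrep* source.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches reports whether seq contains a match for pattern.
// The (?i) flag makes the test case insensitive (assumption, see above).
func matches(pattern, seq string) bool {
	re := regexp.MustCompile("(?i)" + pattern)
	return re.MatchString(seq)
}

func main() {
	pattern := `A[TG]C+G`
	for _, seq := range []string{"ttagccgaa", "ATCG", "AACG"} {
		fmt.Printf("%-10s %v\n", seq, matches(pattern, seq))
	}
}
```

Anchoring a pattern with `^` and `$`, as in `^[ACGT]+$`, forces the whole sequence to match rather than a substring.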
{{< include ../lib/options/selection/_min-count.qmd >}}
{{< include ../lib/options/selection/_max-count.qmd >}}
Example
*Examples*
: Selecting sequence records representing at least five reads in the dataset.

View File

@ -11,26 +11,64 @@ Several OBITools (*e.g.* obigrep, obiannotate) allow the user to specify some si
### Instrospection functions {.unnumbered}
- `len(x)` is a generic function that retrieves the size of an object. It returns
**`len(x)`**
: It is a generic function that retrieves the size of an object. It returns
the length of a sequence, the number of elements in a map like `annotations`, or the number
of elements in an array. The returned value is an `int`.
### Cast functions {.unnumbered}
- `int(x)` converts the `x` value to an integer value, if possible. The function
**`int(x)`**
: Converts the `x` value to an integer value, if possible. The function
returns an `int`.
- `numeric(x)` converts the `x` value to a float value, if possible. The function
**`numeric(x)`**
: Converts the `x` value to a float value, if possible. The function
returns a `float`.
- `bool(x)` converts the `x` value to a boolean value, if possible. The function
**`bool(x)`**
: Converts the `x` value to a boolean value, if possible. The function
returns a `bool`.
### String related functions {.unnumbered}
- `printf(format,...)` combines several values to build a string. `format` follows the
**`printf(format,...)`**
: Combines several values to build a string. `format` follows the
classical C `printf` syntax. The function returns a `string`.
- `subspc(x)` replaces every space in the `x` string with the underscore (`_`) character. The function
**`subspc(x)`**
: Replaces every space in the `x` string with the underscore (`_`) character. The function
returns a `string`.
### Condition function {.unnumbered}
**`ifelse(condition,val1,val2)`**
: The `condition` value has to be a `bool` value. If it is `true` the function returns `val1`,
otherwise it returns `val2`.
### Sequence analysis related function
**`composition(sequence)`**
: The nucleotide composition of the sequence is returned as a map indexed by `a`, `c`, `g`, or `t`, where
each value is the number of occurrences of that nucleotide. A fifth key `others` accounts for
all other symbols.
**`gcskew(sequence)`**
: Computes the excess of `g` compared to `c` in the sequence, known as the GC skew.
$$
Skew_{GC}=\frac{G-C}{G+C}
$$
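The two functions above can be illustrated with a short Go sketch. It is a re-implementation for explanation only, not the code behind the expression language; the names `composition` and `gcSkew` are chosen here, and returning 0 when the sequence contains neither `g` nor `c` is a convention assumed for the example.

```go
package main

import "fmt"

// composition counts a, c, g and t (case insensitive) and puts every
// other symbol under the "others" key.
func composition(seq string) map[string]int {
	comp := map[string]int{"a": 0, "c": 0, "g": 0, "t": 0, "others": 0}
	for _, b := range seq {
		switch b {
		case 'a', 'A':
			comp["a"]++
		case 'c', 'C':
			comp["c"]++
		case 'g', 'G':
			comp["g"]++
		case 't', 'T':
			comp["t"]++
		default:
			comp["others"]++
		}
	}
	return comp
}

// gcSkew returns (G - C) / (G + C).
// It returns 0 when the sequence has no G and no C (assumed convention).
func gcSkew(seq string) float64 {
	comp := composition(seq)
	g, c := float64(comp["g"]), float64(comp["c"])
	if g+c == 0 {
		return 0
	}
	return (g - c) / (g + c)
}

func main() {
	seq := "GGGCATN"
	fmt.Println(composition(seq))
	fmt.Println(gcSkew(seq))
}
```

With three `G` and one `C`, the skew of `GGGCATN` is $(3-1)/(3+1) = 0.5$; a negative value would indicate an excess of `C`.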
## Accessing to the sequence annotations
The `annotations` variable is a map object containing all the annotations associated with the currently processed sequence. The keys of the map are the attribute names. There are two possibilities to retrieve
@ -53,4 +91,7 @@ Special attributes of the sequence are accessible only by dedicated methods of t
- The sequence identifier : `Id()`
- The sequence definition : `Definition()`
```go
sequence.Id()
```

Binary file not shown.

View File

@ -20,6 +20,69 @@ ul.task-list li input[type="checkbox"] {
margin: 0 0.8em 0.2em -1.6em;
vertical-align: middle;
}
pre > code.sourceCode { white-space: pre; position: relative; }
pre > code.sourceCode > span { display: inline-block; line-height: 1.25; }
pre > code.sourceCode > span:empty { height: 1.2em; }
.sourceCode { overflow: visible; }
code.sourceCode > span { color: inherit; text-decoration: inherit; }
div.sourceCode { margin: 1em 0; }
pre.sourceCode { margin: 0; }
@media screen {
div.sourceCode { overflow: auto; }
}
@media print {
pre > code.sourceCode { white-space: pre-wrap; }
pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; }
}
pre.numberSource code
{ counter-reset: source-line 0; }
pre.numberSource code > span
{ position: relative; left: -4em; counter-increment: source-line; }
pre.numberSource code > span > a:first-child::before
{ content: counter(source-line);
position: relative; left: -1em; text-align: right; vertical-align: baseline;
border: none; display: inline-block;
-webkit-touch-callout: none; -webkit-user-select: none;
-khtml-user-select: none; -moz-user-select: none;
-ms-user-select: none; user-select: none;
padding: 0 4px; width: 4em;
color: #aaaaaa;
}
pre.numberSource { margin-left: 3em; border-left: 1px solid #aaaaaa; padding-left: 4px; }
div.sourceCode
{ }
@media screen {
pre > code.sourceCode > span > a:first-child::before { text-decoration: underline; }
}
code span.al { color: #ff0000; font-weight: bold; } /* Alert */
code span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */
code span.at { color: #7d9029; } /* Attribute */
code span.bn { color: #40a070; } /* BaseN */
code span.bu { color: #008000; } /* BuiltIn */
code span.cf { color: #007020; font-weight: bold; } /* ControlFlow */
code span.ch { color: #4070a0; } /* Char */
code span.cn { color: #880000; } /* Constant */
code span.co { color: #60a0b0; font-style: italic; } /* Comment */
code span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */
code span.do { color: #ba2121; font-style: italic; } /* Documentation */
code span.dt { color: #902000; } /* DataType */
code span.dv { color: #40a070; } /* DecVal */
code span.er { color: #ff0000; font-weight: bold; } /* Error */
code span.ex { } /* Extension */
code span.fl { color: #40a070; } /* Float */
code span.fu { color: #06287e; } /* Function */
code span.im { color: #008000; font-weight: bold; } /* Import */
code span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */
code span.kw { color: #007020; font-weight: bold; } /* Keyword */
code span.op { color: #666666; } /* Operator */
code span.ot { color: #007020; } /* Other */
code span.pp { color: #bc7a00; } /* Preprocessor */
code span.sc { color: #4070a0; } /* SpecialChar */
code span.ss { color: #bb6688; } /* SpecialString */
code span.st { color: #4070a0; } /* String */
code span.va { color: #19177c; } /* Variable */
code span.vs { color: #4070a0; } /* VerbatimString */
code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */
</style>
@ -215,7 +278,7 @@ ul.task-list li input[type="checkbox"] {
<h2 id="toc-title">Table of contents</h2>
<ul>
<li><a href="#sequence-attributes" id="toc-sequence-attributes" class="nav-link active" data-scroll-target="#sequence-attributes"><span class="toc-section-number">A.0.1</span> Sequence attributes</a></li>
<li><a href="#sequence-attributes" id="toc-sequence-attributes" class="nav-link active" data-scroll-target="#sequence-attributes"><span class="toc-section-number">A.1</span> Sequence attributes</a></li>
</ul>
</nav>
</div>
@ -239,84 +302,82 @@ ul.task-list li input[type="checkbox"] {
</header>
<section id="sequence-attributes" class="level3" data-number="A.0.1">
<h3 data-number="A.0.1" class="anchored" data-anchor-id="sequence-attributes"><span class="header-section-number">A.0.1</span> Sequence attributes</h3>
<section id="reserved-sequence-attributes" class="level4" data-number="A.0.1.1">
<h4 data-number="A.0.1.1" class="anchored" data-anchor-id="reserved-sequence-attributes"><span class="header-section-number">A.0.1.1</span> Reserved sequence attributes</h4>
<section id="ali_dir" class="level5" data-number="A.0.1.1.1">
<h5 data-number="A.0.1.1.1" class="anchored" data-anchor-id="ali_dir"><span class="header-section-number">A.0.1.1.1</span> <code>ali_dir</code></h5>
<section id="type-string" class="level6" data-number="A.0.1.1.1.1">
<h6 data-number="A.0.1.1.1.1" class="anchored" data-anchor-id="type-string"><span class="header-section-number">A.0.1.1.1.1</span> Type : <code>string</code></h6>
<p>The attribute can contain 2 string values <code>"left"</code> or <code>"right".</code></p>
</section>
<section id="set-by-the-obipairing-tool" class="level6" data-number="A.0.1.1.1.2">
<h6 data-number="A.0.1.1.1.2" class="anchored" data-anchor-id="set-by-the-obipairing-tool"><span class="header-section-number">A.0.1.1.1.2</span> Set by the <em>obipairing</em> tool</h6>
<section id="sequence-attributes" class="level2" data-number="A.1">
<h2 data-number="A.1" class="anchored" data-anchor-id="sequence-attributes"><span class="header-section-number">A.1</span> Sequence attributes</h2>
<p><strong>ali_dir (<code>string</code>)</strong></p>
<ul>
<li>Set by the <em>obipairing</em> tool</li>
<li>The attribute can contain 2 string values <code>left</code> or <code>right</code>.</li>
</ul>
<p>The alignment algorithm used by <em>obipairing</em> is 3'-end gap free. Two cases can occur when aligning the forward and reverse reads. If the barcode is long enough, the two reads overlap only at their 3' ends. In that case, the alignment direction <code>ali_dir</code> is set to <em>left</em>. If the barcode is shorter than the read length, the paired reads overlap by their 5' ends, and the complete barcode is sequenced by both reads. In the latter case, <code>ali_dir</code> is set to <em>right</em>.</p>
</section>
</section>
<section id="ali_length" class="level5" data-number="A.0.1.1.2">
<h5 data-number="A.0.1.1.2" class="anchored" data-anchor-id="ali_length"><span class="header-section-number">A.0.1.1.2</span> <code>ali_length</code></h5>
<section id="set-by-the-obipairing-tool-1" class="level6" data-number="A.0.1.1.2.1">
<h6 data-number="A.0.1.1.2.1" class="anchored" data-anchor-id="set-by-the-obipairing-tool-1"><span class="header-section-number">A.0.1.1.2.1</span> Set by the <em>obipairing</em> tool</h6>
<p><strong>ali_length (<code>int</code>)</strong></p>
<ul>
<li>Set by the <em>obipairing</em> tool</li>
</ul>
<p>Length of the aligned parts when merging forward and reverse reads</p>
</section>
</section>
<section id="count-the-number-of-sequence-occurrences" class="level5" data-number="A.0.1.1.3">
<h5 data-number="A.0.1.1.3" class="anchored" data-anchor-id="count-the-number-of-sequence-occurrences"><span class="header-section-number">A.0.1.1.3</span> <code>count</code> : the number of sequence occurrences</h5>
<section id="set-by-the-obiuniq-tool" class="level6" data-number="A.0.1.1.3.1">
<h6 data-number="A.0.1.1.3.1" class="anchored" data-anchor-id="set-by-the-obiuniq-tool"><span class="header-section-number">A.0.1.1.3.1</span> Set by the <em>obiuniq</em> tool</h6>
<p>The <code>count</code> attribute indicates how many strictly identical sequences have been merged in a single record. It contains an integer value. If it is absent this means that the sequence record represents a single occurrence of the sequence.</p>
</section>
<section id="getter-method-count" class="level6" data-number="A.0.1.1.3.2">
<h6 data-number="A.0.1.1.3.2" class="anchored" data-anchor-id="getter-method-count"><span class="header-section-number">A.0.1.1.3.2</span> Getter : method <code>Count()</code></h6>
<p><strong>count (<code>int</code>)</strong></p>
<ul>
<li>Set by the <em>obiuniq</em> tool</li>
<li>Getter : method <code>Count()</code></li>
<li>Setter : method <code>SetCount(int)</code></li>
</ul>
<p>The <code>count</code> attribute indicates how many strictly identical reads have been merged in a single record. It contains an integer value. If it is absent this means that the sequence record represents a single occurrence of the sequence.</p>
<p>The <code>Count()</code> method gives access to the count attribute as an integer value. If the <code>count</code> attribute is not defined for the given sequence, the value <em>1</em> is returned.</p>
</section>
</section>
<section id="merged_" class="level5" data-number="A.0.1.1.4">
<h5 data-number="A.0.1.1.4" class="anchored" data-anchor-id="merged_"><span class="header-section-number">A.0.1.1.4</span> <code>merged_*</code></h5>
<section id="type-mapstringint" class="level6" data-number="A.0.1.1.4.1">
<h6 data-number="A.0.1.1.4.1" class="anchored" data-anchor-id="type-mapstringint"><span class="header-section-number">A.0.1.1.4.1</span> Type : <code>map[string]int</code></h6>
</section>
<section id="set-by-the-obiuniq-tool-1" class="level6" data-number="A.0.1.1.4.2">
<h6 data-number="A.0.1.1.4.2" class="anchored" data-anchor-id="set-by-the-obiuniq-tool-1"><span class="header-section-number">A.0.1.1.4.2</span> Set by the <em>obiuniq</em> tool</h6>
<p><strong>merged_* (<code>map[string]int</code>)</strong></p>
<ul>
<li>Set by the <em>obiuniq</em> tool</li>
</ul>
<p>The <code>-m</code> option of the <em>obiuniq</em> tool keeps track of the distribution of the values stored in a given attribute of interest. Often this option is used to summarise the distribution of a sequence variant across samples when <em>obiuniq</em> is run after running <em>obimultiplex</em>. The actual name of the attribute depends on the name of the monitored attribute. If the <code>-m</code> option is used with the attribute <em>sample</em>, then this attribute is named <em>merged_sample</em>.</p>
</section>
</section>
<section id="mode" class="level5" data-number="A.0.1.1.5">
<h5 data-number="A.0.1.1.5" class="anchored" data-anchor-id="mode"><span class="header-section-number">A.0.1.1.5</span> <code>mode</code></h5>
<section id="set-by-the-obipairing-tool-2" class="level6" data-number="A.0.1.1.5.1">
<h6 data-number="A.0.1.1.5.1" class="anchored" data-anchor-id="set-by-the-obipairing-tool-2"><span class="header-section-number">A.0.1.1.5.1</span> Set by the <em>obipairing</em> tool</h6>
<p><strong><code>obitag_ref_index</code></strong></p>
</section>
<section id="set-by-the-obirefidx-tool." class="level6" data-number="A.0.1.1.5.2">
<h6 data-number="A.0.1.1.5.2" class="anchored" data-anchor-id="set-by-the-obirefidx-tool."><span class="header-section-number">A.0.1.1.5.2</span> Set by the <em>obirefidx</em> tool.</h6>
<p><strong>mode (<code>string</code>)</strong></p>
<ul>
<li>Set by the <em>obipairing</em> tool</li>
<li>The attribute can contain 2 string values <code>join</code> or <code>alignment</code>.</li>
</ul>
<p><strong>obitag_ref_index (<code>map[string]string</code>)</strong></p>
<ul>
<li>Set by the <em>obirefidx</em> tool.</li>
</ul>
<p>It summarises which taxonomic annotation a match to that sequence must lead to, according to the number of differences between the query sequence and the reference sequence carrying that tag.</p>
</section>
<section id="getter-method-count-1" class="level6" data-number="A.0.1.1.5.3">
<h6 data-number="A.0.1.1.5.3" class="anchored" data-anchor-id="getter-method-count-1"><span class="header-section-number">A.0.1.1.5.3</span> Getter : method <code>Count()</code></h6>
</section>
</section>
<section id="pairing_mismatches" class="level5" data-number="A.0.1.1.6">
<h5 data-number="A.0.1.1.6" class="anchored" data-anchor-id="pairing_mismatches"><span class="header-section-number">A.0.1.1.6</span> <code>pairing_mismatches</code></h5>
<section id="set-by-the-obipairing-tool-3" class="level6" data-number="A.0.1.1.6.1">
<h6 data-number="A.0.1.1.6.1" class="anchored" data-anchor-id="set-by-the-obipairing-tool-3"><span class="header-section-number">A.0.1.1.6.1</span> Set by the <em>obipairing</em> tool</h6>
</section>
</section>
<section id="score" class="level5" data-number="A.0.1.1.7">
<h5 data-number="A.0.1.1.7" class="anchored" data-anchor-id="score"><span class="header-section-number">A.0.1.1.7</span> <code>score</code></h5>
<section id="set-by-the-obipairing-tool-4" class="level6" data-number="A.0.1.1.7.1">
<h6 data-number="A.0.1.1.7.1" class="anchored" data-anchor-id="set-by-the-obipairing-tool-4"><span class="header-section-number">A.0.1.1.7.1</span> Set by the <em>obipairing</em> tool</h6>
</section>
</section>
<section id="score_norm" class="level5" data-number="A.0.1.1.8">
<h5 data-number="A.0.1.1.8" class="anchored" data-anchor-id="score_norm"><span class="header-section-number">A.0.1.1.8</span> <code>score_norm</code></h5>
<section id="set-by-the-obipairing-tool-5" class="level6" data-number="A.0.1.1.8.1">
<h6 data-number="A.0.1.1.8.1" class="anchored" data-anchor-id="set-by-the-obipairing-tool-5"><span class="header-section-number">A.0.1.1.8.1</span> Set by the <em>obipairing</em> tool</h6>
<div class="sourceCode" id="cb1"><pre class="sourceCode json code-with-copy"><code class="sourceCode json"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a> <span class="fu">{</span><span class="dt">"0"</span><span class="fu">:</span><span class="st">"9606@Homo sapiens@species"</span><span class="fu">,</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a> <span class="dt">"2"</span><span class="fu">:</span><span class="st">"207598@Homininae@subfamily"</span><span class="fu">,</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a> <span class="dt">"3"</span><span class="fu">:</span><span class="st">"9604@Hominidae@family"</span><span class="fu">,</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a> <span class="dt">"8"</span><span class="fu">:</span><span class="st">"314295@Hominoidea@superfamily"</span><span class="fu">,</span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a> <span class="dt">"10"</span><span class="fu">:</span><span class="st">"9526@Catarrhini@parvorder"</span><span class="fu">,</span></span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a> <span class="dt">"12"</span><span class="fu">:</span><span class="st">"1437010@Boreoeutheria@clade"</span><span class="fu">,</span></span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a> <span class="dt">"16"</span><span class="fu">:</span><span class="st">"9347@Eutheria@clade"</span><span class="fu">,</span></span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a> <span class="dt">"17"</span><span class="fu">:</span><span class="st">"40674@Mammalia@class"</span><span class="fu">,</span></span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a> <span class="dt">"22"</span><span class="fu">:</span><span class="st">"117571@Euteleostomi@clade"</span><span class="fu">,</span></span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a> <span class="dt">"25"</span><span class="fu">:</span><span class="st">"7776@Gnathostomata@clade"</span><span class="fu">,</span></span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a> <span class="dt">"29"</span><span class="fu">:</span><span class="st">"33213@Bilateria@clade"</span><span class="fu">,</span></span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a> <span class="dt">"30"</span><span class="fu">:</span><span class="st">"6072@Eumetazoa@clade"</span><span class="fu">}</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<p><strong>pairing_mismatches (<code>map[string]string</code>)</strong></p>
<ul>
<li>Set by the <em>obipairing</em> tool</li>
</ul>
<p><strong>seq_a_single (<code>int</code>)</strong></p>
<ul>
<li>Set by the <em>obipairing</em> tool</li>
</ul>
<p><strong>seq_ab_match (<code>int</code>)</strong></p>
<ul>
<li>Set by the <em>obipairing</em> tool</li>
</ul>
<p><strong>seq_b_single (<code>int</code>)</strong></p>
<ul>
<li>Set by the <em>obipairing</em> tool</li>
</ul>
<p><strong>score (<code>int</code>)</strong></p>
<ul>
<li>Set by the <em>obipairing</em> tool</li>
</ul>
<p><strong>score_norm (<code>float</code>)</strong></p>
<ul>
<li>Set by the <em>obipairing</em> tool</li>
<li>The value ranges between 0 and 1.</li>
</ul>
<p>Score of the alignment between forward and reverse reads expressed as a fraction of identity.</p>
</section>
</section>
</section>
</section>
</main> <!-- /main -->

View File

@ -314,6 +314,23 @@ code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warni
<section id="selecting-sequences-based-on-their-caracteristics" class="level4" data-number="12.1.1.1">
<h4 data-number="12.1.1.1" class="anchored" data-anchor-id="selecting-sequences-based-on-their-caracteristics"><span class="header-section-number">12.1.1.1</span> Selecting sequences based on their caracteristics</h4>
<p>Sequences can be selected on several of their characteristics: their length, their id, or their sequence. Options allow for specifying the selection condition.</p>
<p><strong>Selection based on the sequence</strong></p>
<p>Sequence records can be selected according to whether or not they match a pattern. The simplest pattern is a short sequence (<em>e.g.</em> <code>AACCTT</code>), but regular patterns allow for more complex searches. For example, <code>A[TG]C+G</code> matches an <code>A</code>, followed by a <code>T</code> or a <code>G</code>, then one or several <code>C</code>, and finally a <code>G</code>.</p>
<dl>
<dt><strong>--sequence</strong>|<strong>-s</strong> <em>PATTERN</em></dt>
<dd>
<p>Regular expression pattern to be tested against the sequence itself. The pattern is case insensitive. A complete description of the regular pattern grammar is available <a href="https://yourbasic.org/golang/regexp-cheat-sheet/#cheat-sheet">here</a>.</p>
</dd>
<dt><em>Examples:</em></dt>
<dd>
<p>Selects only the sequence records that contain an <em>EcoRI</em> restriction site.</p>
</dd>
</dl>
<div class="sourceCode" id="cb1"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="ex">obigrep</span> <span class="at">-s</span> <span class="st">'GAATTC'</span> seq1.fasta <span class="op">&gt;</span> seq2.fasta</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<p>: Selects only the sequence records that contain a stretch of at least 10 <code>A</code>.</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="ex">obigrep</span> <span class="at">-s</span> <span class="st">'A{10,}'</span> seq1.fasta <span class="op">&gt;</span> seq2.fasta</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<p>: Selects only the sequence records that do not contain ambiguous nucleotides.</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="ex">obigrep</span> <span class="at">-s</span> <span class="st">'^[ACGT]+$'</span> seq1.fasta <span class="op">&gt;</span> seq2.fasta</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<dl>
<dt><strong>--min-count</strong> | <strong>-c</strong> <em>COUNT</em></dt>
<dd>
@ -323,12 +340,12 @@ code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warni
<dd>
<p>Only sequences representing no more than <em>COUNT</em> reads will be selected. That option relies on the <code>count</code> attribute. If the <code>count</code> attribute is not defined for a sequence record, it is assumed equal to <span class="math inline">\(1\)</span>.</p>
</dd>
<dt>Example</dt>
<dt><em>Examples</em></dt>
<dd>
<p>Selecting sequence records representing at least five reads in the dataset.</p>
</dd>
</dl>
<div class="sourceCode" id="cb1"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="ex">obigrep</span> <span class="at">-c</span> 5 data_SPER01.fasta <span class="op">&gt;</span> data_norare_SPER01.fasta</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="sourceCode" id="cb4"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="ex">obigrep</span> <span class="at">-c</span> 5 data_SPER01.fasta <span class="op">&gt;</span> data_norare_SPER01.fasta</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
</section>

View File

@ -124,6 +124,7 @@ code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warni
}
}</script>
<script src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-chtml-full.js" type="text/javascript"></script>
</head>
@ -284,6 +285,8 @@ code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warni
<li><a href="#instrospection-functions" id="toc-instrospection-functions" class="nav-link" data-scroll-target="#instrospection-functions">Instrospection functions</a></li>
<li><a href="#cast-functions" id="toc-cast-functions" class="nav-link" data-scroll-target="#cast-functions">Cast functions</a></li>
<li><a href="#string-related-functions" id="toc-string-related-functions" class="nav-link" data-scroll-target="#string-related-functions">String related functions</a></li>
<li><a href="#condition-function" id="toc-condition-function" class="nav-link" data-scroll-target="#condition-function">Condition function</a></li>
<li><a href="#sequence-analysis-related-function" id="toc-sequence-analysis-related-function" class="nav-link" data-scroll-target="#sequence-analysis-related-function"><span class="toc-section-number">7.2.1</span> Sequence analysis related function</a></li>
</ul></li>
<li><a href="#accessing-to-the-sequence-annotations" id="toc-accessing-to-the-sequence-annotations" class="nav-link" data-scroll-target="#accessing-to-the-sequence-annotations"><span class="toc-section-number">7.3</span> Accessing to the sequence annotations</a></li>
</ul>
@ -321,24 +324,67 @@ code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warni
<h2 data-number="7.2" class="anchored" data-anchor-id="function-defined-in-the-language"><span class="header-section-number">7.2</span> Function defined in the language</h2>
<section id="instrospection-functions" class="level3 unnumbered">
<h3 class="unnumbered anchored" data-anchor-id="instrospection-functions">Instrospection functions</h3>
<ul>
<li><code>len(x)</code>is a generic function allowing to retreive the size of a object. It returns the length of a sequences, the number of element in a map like <code>annotations</code>, the number of elements in an array. The reurned value is an <code>int</code>.</li>
</ul>
<dl>
<dt><strong><code>len(x)</code></strong></dt>
<dd>
<p>It is a generic function that retrieves the size of an object. It returns the length of a sequence, the number of elements in a map like <code>annotations</code>, or the number of elements in an array. The returned value is an <code>int</code>.</p>
</dd>
</dl>
</section>
<section id="cast-functions" class="level3 unnumbered">
<h3 class="unnumbered anchored" data-anchor-id="cast-functions">Cast functions</h3>
<ul>
<li><code>int(x)</code> converts the <code>x</code> value to an integer value, if possible. The function returns an <code>int</code>.</li>
<li><code>numeric(x)</code> converts the <code>x</code> value to a float value, if possible. The function returns a <code>float</code>.</li>
<li><code>bool(x)</code> converts the <code>x</code> value to a boolean value, if possible. The function returns a <code>bool</code>.</li>
</ul>
<dl>
<dt><strong><code>int(x)</code></strong></dt>
<dd>
<p>Converts the <code>x</code> value to an integer value, if possible. The function returns an <code>int</code>.</p>
</dd>
<dt><strong><code>numeric(x)</code></strong></dt>
<dd>
<p>Converts the <code>x</code> value to a float value, if possible. The function returns a <code>float</code>.</p>
</dd>
<dt><strong><code>bool(x)</code></strong></dt>
<dd>
<p>Converts the <code>x</code> value to a boolean value, if possible. The function returns a <code>bool</code>.</p>
</dd>
</dl>
</section>
<section id="string-related-functions" class="level3 unnumbered">
<h3 class="unnumbered anchored" data-anchor-id="string-related-functions">String related functions</h3>
<ul>
<li><code>printf(format,...)</code> combines several values into a string. <code>format</code> follows the classic C <code>printf</code> syntax. The function returns a <code>string</code>.</li>
<li><code>subspc(x)</code> replaces every space in the <code>x</code> string with the underscore (<code>_</code>) character. The function returns a <code>string</code>.</li>
</ul>
<dl>
<dt><strong><code>printf(format,...)</code></strong></dt>
<dd>
<p>Combines several values into a string. <code>format</code> follows the classic C <code>printf</code> syntax. The function returns a <code>string</code>.</p>
</dd>
<dt><strong><code>subspc(x)</code></strong></dt>
<dd>
<p>Replaces every space in the <code>x</code> string with the underscore (<code>_</code>) character. The function returns a <code>string</code>.</p>
</dd>
</dl>
</section>
<section id="condition-function" class="level3 unnumbered">
<h3 class="unnumbered anchored" data-anchor-id="condition-function">Condition function</h3>
<dl>
<dt><strong><code>ifelse(condition,val1,val2)</code></strong></dt>
<dd>
<p>The <code>condition</code> value has to be a <code>bool</code> value. If it is <code>true</code>, the function returns <code>val1</code>; otherwise it returns <code>val2</code>.</p>
</dd>
</dl>
</section>
<section id="sequence-analysis-related-function" class="level3" data-number="7.2.1">
<h3 data-number="7.2.1" class="anchored" data-anchor-id="sequence-analysis-related-function"><span class="header-section-number">7.2.1</span> Sequence analysis related function</h3>
<dl>
<dt><strong><code>composition(sequence)</code></strong></dt>
<dd>
<p>The nucleotide composition of the sequence is returned as a map indexed by <code>a</code>, <code>c</code>, <code>g</code>, or <code>t</code>; each value is the number of occurrences of that nucleotide. A fifth key <code>others</code> accounts for all other symbols.</p>
</dd>
<dt><strong><code>gcskew(sequence)</code></strong></dt>
<dd>
<p>Computes the excess of g compared to c in the sequence, known as the GC skew.</p>
<p><span class="math display">\[
Skew_{GC}=\frac{G-C}{G+C}
\]</span></p>
</dd>
</dl>
</section>
</section>
<section id="accessing-to-the-sequence-annotations" class="level2" data-number="7.3">
@ -352,6 +398,7 @@ code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warni
<li>The sequence identifier : <code>Id()</code></li>
<li>The sequence definition : <code>Definition()</code></li>
</ul>
<div class="sourceCode" id="cb3"><pre class="sourceCode go code-with-copy"><code class="sourceCode go"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>sequence<span class="op">.</span>Id<span class="op">()</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
</section>

View File

@ -174,14 +174,22 @@ selected.
That option relies on the \f[V]count\f[R] attribute.
If the \f[V]count\f[R] attribute is not defined for a sequence record,
it is assumed equal to 1.
.PP
.TP
\f[B]--max-length\f[R] | \f[B]-L\f[R] \f[I]LENGTH\f[R]
.PP
Keeps sequence records whose sequence length is equal to or shorter than
\f[I]LENGTH\f[R].
.TP
\f[B]--min-length\f[R] | \f[B]-l\f[R] \f[I]LENGTH\f[R]
Keeps sequence records whose sequence length is equal to or longer than
\f[I]LENGTH\f[R].
.PP
\f[B]--predicate\f[R]|\f[B]-p\f[R] \f[I]EXPRESSION\f[R]
.PP
.TP
\f[B]--sequence\f[R]|\f[B]-s\f[R] \f[I]PATTERN\f[R]
Regular expression pattern to be tested against the sequence itself.
The pattern is case insensitive.
A complete description of the regular expression grammar is available
here (https://yourbasic.org/golang/regexp-cheat-sheet/#cheat-sheet).
.PP
\f[B]--inverse-match\f[R] | \f[B]-v\f[R]
.PP

View File

@ -0,0 +1,3 @@
**\--max-length** | **-L** _LENGTH_
: Keeps sequence records whose sequence length is equal to or shorter than _LENGTH_.

View File

@ -0,0 +1,3 @@
**\--min-length** | **-l** _LENGTH_
: Keeps sequence records whose sequence length is equal to or longer than _LENGTH_.

View File

@ -0,0 +1,7 @@
**\--sequence**|**-s** _PATTERN_
: Regular expression pattern to be tested against the
sequence itself. The pattern is case insensitive. A
complete description of the regular expression grammar
is available [here](https://yourbasic.org/golang/regexp-cheat-sheet/#cheat-sheet).
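
How `--sequence` compiles its pattern is not shown in this diff. As a minimal sketch, assuming the tool relies on Go's standard `regexp` package (the linked cheat sheet suggests so), case-insensitive matching can be obtained by prefixing the pattern with the `(?i)` flag. The helper name `sequenceMatcher` is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// sequenceMatcher is a hypothetical helper: it compiles pattern with the
// (?i) flag so that matching ignores case, and returns a predicate
// usable on raw sequence strings.
func sequenceMatcher(pattern string) (func(string) bool, error) {
	re, err := regexp.Compile("(?i)" + pattern)
	if err != nil {
		return nil, err
	}
	return re.MatchString, nil
}

func main() {
	match, err := sequenceMatcher("^gaattc[acgt]{3}")
	if err != nil {
		panic(err)
	}
	fmt.Println(match("GAATTCAAGT")) // true: the upper-case sequence still matches
	fmt.Println(match("ttgaattc"))   // false: the pattern is anchored at the start
}
```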

View File

@ -99,13 +99,13 @@ The OBITools are centered around the [FASTA] (https://en.wikipedia.org/wiki/FAST
{{< include ../lib/options/selection/_min-count.qmd >}}
**\--max-length** | **-L** _LENGTH_
{{< include ../lib/options/selection/_max-length.qmd >}}
**\--min-length** | **-l** _LENGTH_
{{< include ../lib/options/selection/_min-length.qmd >}}
**\--predicate**|**-p** _EXPRESSION_
**\--sequence**|**-s** _PATTERN_
{{< include ../lib/options/selection/_sequence.qmd >}}
**\--inverse-match** | **-v**

View File

@ -13,6 +13,7 @@ import (
"github.com/barkimedes/go-deepcopy"
)
// InterfaceToInt converts an interface{} to an integer value if possible.
// If not, a "NotAnInteger" error is returned via the err
// return value and val is set to 0.
@ -302,15 +303,6 @@ func ReadLines(path string) (lines []string, err error) {
return
}
func Contains[T comparable](arr []T, x T) bool {
for _, v := range arr {
if v == x {
return true
}
}
return false
}
func AtomicCounter(initial ...int) func() int {
counterMutex := sync.Mutex{}
counter := 0

24
pkg/goutils/slices.go Normal file
View File

@ -0,0 +1,24 @@
package goutils
func Contains[T comparable](arr []T, x T) bool {
for _, v := range arr {
if v == x {
return true
}
}
return false
}
func LookFor[T comparable](arr []T, x T) int {
for i, v := range arr {
if v == x {
return i
}
}
return -1
}
func RemoveIndex[T comparable](s []T, index int) []T {
return append(s[:index], s[index+1:]...)
}
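
A note on `RemoveIndex`: because it is built on `append(s[:index], s[index+1:]...)`, it reuses the backing array of `s`, so the caller's original slice is shifted in place. The sketch below reproduces two of the helpers from this hunk and adds a hypothetical `RemoveIndexCopy` (not part of the commit) that leaves its input untouched, for callers that still need the original slice.

```go
package main

import "fmt"

// LookFor returns the index of the first occurrence of x in arr, or -1.
func LookFor[T comparable](arr []T, x T) int {
	for i, v := range arr {
		if v == x {
			return i
		}
	}
	return -1
}

// RemoveIndex mirrors the helper from the diff: it shifts the tail of s
// one position to the left inside the same backing array.
func RemoveIndex[T comparable](s []T, index int) []T {
	return append(s[:index], s[index+1:]...)
}

// RemoveIndexCopy is a hypothetical variant that allocates a new slice
// and therefore does not modify s.
func RemoveIndexCopy[T any](s []T, index int) []T {
	out := make([]T, 0, len(s)-1)
	out = append(out, s[:index]...)
	return append(out, s[index+1:]...)
}

func main() {
	a := []string{"a", "c", "g", "t"}
	b := RemoveIndex(a, LookFor(a, "c"))
	fmt.Println(b) // [a g t]
	fmt.Println(a) // [a g t t]: the input was overwritten

	c := []string{"a", "c", "g", "t"}
	d := RemoveIndexCopy(c, 1)
	fmt.Println(d, c) // [a g t] [a c g t]
}
```

Both variants panic when `index` is out of range, so a `-1` returned by `LookFor` has to be checked before it is passed on.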

View File

@ -13,7 +13,6 @@ type _Options struct {
circular bool
forwardError int
reverseError int
bufferSize int
batchSize int
parallelWorkers int
forward ApatPattern
@ -66,12 +65,6 @@ func (options Options) Circular() bool {
return options.pointer.circular
}
// BufferSize returns the size of the channel
// buffer specified by the options
func (options Options) BufferSize() int {
return options.pointer.bufferSize
}
// BatchSize returns the size of the
// sequence batch used by the PCR algorithm
func (options Options) BatchSize() int {
@ -95,7 +88,6 @@ func MakeOptions(setters []WithOption) Options {
circular: false,
parallelWorkers: 4,
batchSize: 100,
bufferSize: 100,
forward: NilApatPattern,
cfwd: NilApatPattern,
reverse: NilApatPattern,
@ -188,16 +180,6 @@ func OptionCircular(circular bool) WithOption {
return f
}
// OptionBufferSize sets the requested channel
// buffer size.
func OptionBufferSize(size int) WithOption {
f := WithOption(func(opt Options) {
opt.pointer.bufferSize = size
})
return f
}
// OptionParallelWorkers sets how many search
// jobs will be run in parallel.
func OptionParallelWorkers(nworkers int) WithOption {

View File

@ -36,20 +36,14 @@ func find(root, ext string) []string {
}
func ISequenceChunkOnDisk(iterator obiiter.IBioSequence,
classifier *obiseq.BioSequenceClassifier,
sizes ...int) (obiiter.IBioSequence, error) {
classifier *obiseq.BioSequenceClassifier) (obiiter.IBioSequence, error) {
dir, err := tempDir()
if err != nil {
return obiiter.NilIBioSequence, err
}
bufferSize := iterator.BufferSize()
if len(sizes) > 0 {
bufferSize = sizes[0]
}
newIter := obiiter.MakeIBioSequence(bufferSize)
newIter := obiiter.MakeIBioSequence()
newIter.Add(1)

View File

@ -10,16 +10,9 @@ import (
)
func ISequenceChunk(iterator obiiter.IBioSequence,
classifier *obiseq.BioSequenceClassifier,
sizes ...int) (obiiter.IBioSequence, error) {
classifier *obiseq.BioSequenceClassifier) (obiiter.IBioSequence, error) {
bufferSize := iterator.BufferSize()
if len(sizes) > 0 {
bufferSize = sizes[0]
}
newIter := obiiter.MakeIBioSequence(bufferSize)
newIter := obiiter.MakeIBioSequence()
newIter.Add(1)

View File

@ -6,7 +6,6 @@ type __options__ struct {
navalue string
cacheOnDisk bool
batchCount int
bufferSize int
batchSize int
parallelWorkers int
noSingleton bool
@ -25,7 +24,6 @@ func MakeOptions(setters []WithOption) Options {
navalue: "NA",
cacheOnDisk: false,
batchCount: 100,
bufferSize: 2,
batchSize: 5000,
parallelWorkers: 4,
noSingleton: false,
@ -65,10 +63,6 @@ func (opt Options) BatchCount() int {
return opt.pointer.batchCount
}
func (opt Options) BufferSize() int {
return opt.pointer.bufferSize
}
func (opt Options) BatchSize() int {
return opt.pointer.batchSize
}
@ -148,14 +142,6 @@ func OptionsBatchSize(size int) WithOption {
return f
}
func OptionsBufferSize(size int) WithOption {
f := WithOption(func(opt Options) {
opt.pointer.bufferSize = size
})
return f
}
func OptionsNoSingleton() WithOption {
f := WithOption(func(opt Options) {
opt.pointer.noSingleton = true

View File

@ -58,20 +58,13 @@ func (by _By) Sort(seqs []sSS) {
func ISequenceSubChunk(iterator obiiter.IBioSequence,
classifier *obiseq.BioSequenceClassifier,
sizes ...int) (obiiter.IBioSequence, error) {
nworkers int) (obiiter.IBioSequence, error) {
bufferSize := iterator.BufferSize()
nworkers := 4
if len(sizes) > 0 {
nworkers = sizes[0]
if nworkers <=0 {
nworkers = 4
}
if len(sizes) > 1 {
bufferSize = sizes[1]
}
newIter := obiiter.MakeIBioSequence(bufferSize)
newIter := obiiter.MakeIBioSequence()
newIter.Add(nworkers)

View File

@ -19,7 +19,7 @@ func IUniqueSequence(iterator obiiter.IBioSequence,
opts := MakeOptions(options)
nworkers := opts.ParallelWorkers()
iUnique := obiiter.MakeIBioSequence(opts.BufferSize())
iUnique := obiiter.MakeIBioSequence()
iterator = iterator.Speed("Splitting data set")
@ -28,8 +28,7 @@ func IUniqueSequence(iterator obiiter.IBioSequence,
if opts.SortOnDisk() {
nworkers = 1
iterator, err = ISequenceChunkOnDisk(iterator,
obiseq.HashClassifier(opts.BatchCount()),
0)
obiseq.HashClassifier(opts.BatchCount()))
if err != nil {
return obiiter.NilIBioSequence, err
@ -37,8 +36,7 @@ func IUniqueSequence(iterator obiiter.IBioSequence,
} else {
iterator, err = ISequenceChunk(iterator,
obiseq.HashClassifier(opts.BatchCount()),
opts.BufferSize())
obiseq.HashClassifier(opts.BatchCount()))
if err != nil {
return obiiter.NilIBioSequence, err
@ -78,12 +76,11 @@ func IUniqueSequence(iterator obiiter.IBioSequence,
icat--
input, err = ISequenceSubChunk(input,
classifier,
1,
opts.BufferSize())
1)
var next obiiter.IBioSequence
if icat >= 0 {
next = obiiter.MakeIBioSequence(opts.BufferSize())
next = obiiter.MakeIBioSequence()
iUnique.Add(1)
@ -130,7 +127,6 @@ func IUniqueSequence(iterator obiiter.IBioSequence,
iMerged := iUnique.IMergeSequenceBatch(opts.NAValue(),
opts.StatsOn(),
opts.BufferSize(),
)
return iMerged, nil

View File

@ -0,0 +1,248 @@
package obiformats
import (
"bytes"
"encoding/csv"
"fmt"
"io"
"os"
"sync"
"time"
"git.metabarcoding.org/lecasofts/go/obitools/pkg/goutils"
"git.metabarcoding.org/lecasofts/go/obitools/pkg/obiiter"
"git.metabarcoding.org/lecasofts/go/obitools/pkg/obiseq"
log "github.com/sirupsen/logrus"
)
func CSVRecord(sequence *obiseq.BioSequence, opt Options) []string {
keys := opt.CSVKeys()
record := make([]string, 0, len(keys)+4)
if opt.CSVId() {
record = append(record, sequence.Id())
}
if opt.CSVCount() {
record = append(record, fmt.Sprint(sequence.Count()))
}
if opt.CSVTaxon() {
taxid := sequence.Taxid()
sn, ok := sequence.GetAttribute("scientific_name")
if !ok {
if taxid == 1 {
sn = "root"
} else {
sn = opt.CSVNAValue()
}
}
record = append(record, fmt.Sprint(taxid), fmt.Sprint(sn))
}
if opt.CSVDefinition() {
record = append(record, sequence.Definition())
}
for _, key := range opt.CSVKeys() {
value, ok := sequence.GetAttribute(key)
if !ok {
value = opt.CSVNAValue()
}
svalue, _ := goutils.InterfaceToString(value)
record = append(record, svalue)
}
if opt.CSVSequence() {
record = append(record, string(sequence.Sequence()))
}
if opt.CSVQuality() {
if sequence.HasQualities() {
l := sequence.Len()
q := sequence.Qualities()
ascii := make([]byte, l)
quality_shift := opt.QualityShift()
for j := 0; j < l; j++ {
ascii[j] = uint8(q[j]) + uint8(quality_shift)
}
record = append(record, string(ascii))
} else {
record = append(record, opt.CSVNAValue())
}
}
return record
}
func CSVHeader(opt Options) []string {
keys := opt.CSVKeys()
record := make([]string, 0, len(keys)+4)
if opt.CSVId() {
record = append(record, "id")
}
if opt.CSVCount() {
record = append(record, "count")
}
if opt.CSVTaxon() {
record = append(record, "taxid", "scientific_name")
}
if opt.CSVDefinition() {
record = append(record, "definition")
}
record = append(record, opt.CSVKeys()...)
if opt.CSVSequence() {
record = append(record, "sequence")
}
if opt.CSVQuality() {
record = append(record, "quality")
}
return record
}
func FormatCVSBatch(batch obiiter.BioSequenceBatch, opt Options) []byte {
buff := new(bytes.Buffer)
csv := csv.NewWriter(buff)
if batch.Order() == 0 {
csv.Write(CSVHeader(opt))
}
for _, s := range batch.Slice() {
csv.Write(CSVRecord(s, opt))
}
csv.Flush()
return buff.Bytes()
}
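
The three functions above follow a common pattern: build a header row, build one row per record with a placeholder for missing attributes, and let `encoding/csv` handle quoting. The following is a minimal, self-contained sketch of that pattern. The `record` type and `formatCSV` function are hypothetical stand-ins, not the obitools types, and the sketch additionally applies a separator through `csv.Writer.Comma`, which the hunk above does not show being done.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
)

// record is a hypothetical stand-in for an annotated sequence.
type record struct {
	id    string
	attrs map[string]string
	seq   string
}

// formatCSV writes a header followed by one row per record.
// Attributes missing from a record are replaced by na.
func formatCSV(records []record, keys []string, na string, sep rune) (string, error) {
	buff := new(bytes.Buffer)
	w := csv.NewWriter(buff)
	w.Comma = sep

	header := append([]string{"id"}, keys...)
	header = append(header, "sequence")
	if err := w.Write(header); err != nil {
		return "", err
	}

	for _, r := range records {
		row := make([]string, 0, len(keys)+2)
		row = append(row, r.id)
		for _, k := range keys {
			v, ok := r.attrs[k]
			if !ok {
				v = na
			}
			row = append(row, v)
		}
		row = append(row, r.seq)
		if err := w.Write(row); err != nil {
			return "", err
		}
	}

	// Flush pushes buffered rows to buff; Error reports any write failure.
	w.Flush()
	return buff.String(), w.Error()
}

func main() {
	records := []record{
		{"seq1", map[string]string{"count": "3"}, "acgt"},
		{"seq2", nil, "ttga"},
	}
	out, err := formatCSV(records, []string{"count", "sample"}, "NA", ',')
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```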
func WriteCSV(iterator obiiter.IBioSequence,
file io.WriteCloser,
options ...WithOption) (obiiter.IBioSequence, error) {
opt := MakeOptions(options)
file, _ = goutils.CompressStream(file, opt.CompressedFile(), opt.CloseFile())
newIter := obiiter.MakeIBioSequence()
nwriters := opt.ParallelWorkers()
obiiter.RegisterAPipe()
chunkchan := make(chan FileChunck)
newIter.Add(nwriters)
var waitWriter sync.WaitGroup
go func() {
newIter.WaitAndClose()
for len(chunkchan) > 0 {
time.Sleep(time.Millisecond)
}
close(chunkchan)
waitWriter.Wait()
}()
ff := func(iterator obiiter.IBioSequence) {
for iterator.Next() {
batch := iterator.Get()
chunkchan <- FileChunck{
FormatCVSBatch(batch, opt),
batch.Order(),
}
newIter.Push(batch)
}
newIter.Done()
}
log.Debugln("Start of the CSV file writing")
go ff(iterator)
for i := 0; i < nwriters-1; i++ {
go ff(iterator.Split())
}
next_to_send := 0
received := make(map[int]FileChunck, 100)
waitWriter.Add(1)
go func() {
for chunk := range chunkchan {
if chunk.order == next_to_send {
file.Write(chunk.text)
next_to_send++
chunk, ok := received[next_to_send]
for ok {
file.Write(chunk.text)
delete(received, next_to_send)
next_to_send++
chunk, ok = received[next_to_send]
}
} else {
received[chunk.order] = chunk
}
}
file.Close()
log.Debugln("End of the CSV file writing")
obiiter.UnregisterPipe()
waitWriter.Done()
}()
return newIter, nil
}
func WriteCSVToStdout(iterator obiiter.IBioSequence,
options ...WithOption) (obiiter.IBioSequence, error) {
options = append(options, OptionDontCloseFile())
return WriteCSV(iterator, os.Stdout, options...)
}
func WriteCSVToFile(iterator obiiter.IBioSequence,
filename string,
options ...WithOption) (obiiter.IBioSequence, error) {
opt := MakeOptions(options)
flags := os.O_WRONLY | os.O_CREATE
if opt.AppendFile() {
flags |= os.O_APPEND
}
file, err := os.OpenFile(filename, flags, 0660)
if err != nil {
log.Fatalf("open file error: %v", err)
return obiiter.NilIBioSequence, err
}
options = append(options, OptionCloseFile())
iterator, err = WriteCSV(iterator, file, options...)
if opt.HaveToSavePaired() {
var revfile *os.File
revfile, err = os.OpenFile(opt.PairedFileName(), flags, 0660)
if err != nil {
log.Fatalf("open file error: %v", err)
return obiiter.NilIBioSequence, err
}
iterator, err = WriteCSV(iterator.PairedWith(), revfile, options...)
}
return iterator, err
}

View File

@ -166,7 +166,7 @@ func ReadEcoPCR(reader io.Reader, options ...WithOption) obiiter.IBioSequence {
opt := MakeOptions(options)
newIter := obiiter.MakeIBioSequence(opt.BufferSize())
newIter := obiiter.MakeIBioSequence()
newIter.Add(1)
go func() {

View File

@ -244,9 +244,9 @@ func _ReadFlatFileChunk(reader io.Reader, readers chan _FileChunk) {
// <CR>?<LF>//<CR>?<LF>
func ReadEMBL(reader io.Reader, options ...WithOption) obiiter.IBioSequence {
opt := MakeOptions(options)
entry_channel := make(chan _FileChunk, opt.BufferSize())
entry_channel := make(chan _FileChunk)
newIter := obiiter.MakeIBioSequence(opt.BufferSize())
newIter := obiiter.MakeIBioSequence()
nworkers := opt.ParallelWorkers()
newIter.Add(nworkers)

View File

@ -19,6 +19,5 @@ func IParseFastSeqHeaderBatch(iterator obiiter.IBioSequence,
options ...WithOption) obiiter.IBioSequence {
opt := MakeOptions(options)
return iterator.MakeIWorker(obiseq.AnnotatorToSeqWorker(opt.ParseFastSeqHeader()),
opt.ParallelWorkers(),
opt.BufferSize())
opt.ParallelWorkers())
}

View File

@ -105,7 +105,7 @@ func ReadFastSeqFromFile(filename string, options ...WithOption) (obiiter.IBioSe
size = -1
}
newIter := obiiter.MakeIBioSequence(opt.BufferSize())
newIter := obiiter.MakeIBioSequence()
newIter.Add(1)
go func() {
@ -127,7 +127,7 @@ func ReadFastSeqFromFile(filename string, options ...WithOption) (obiiter.IBioSe
func ReadFastSeqFromStdin(options ...WithOption) obiiter.IBioSequence {
opt := MakeOptions(options)
newIter := obiiter.MakeIBioSequence(opt.BufferSize())
newIter := obiiter.MakeIBioSequence()
newIter.Add(1)

View File

@ -71,8 +71,7 @@ func WriteFasta(iterator obiiter.IBioSequence,
file, _ = goutils.CompressStream(file, opt.CompressedFile(), opt.CloseFile())
buffsize := iterator.BufferSize()
newIter := obiiter.MakeIBioSequence(buffsize)
newIter := obiiter.MakeIBioSequence()
nwriters := opt.ParallelWorkers()

View File

@ -60,8 +60,7 @@ func WriteFastq(iterator obiiter.IBioSequence,
file, _ = goutils.CompressStream(file, opt.CompressedFile(), opt.CloseFile())
buffsize := iterator.BufferSize()
newIter := obiiter.MakeIBioSequence(buffsize)
newIter := obiiter.MakeIBioSequence()
nwriters := opt.ParallelWorkers()

View File

@ -113,9 +113,9 @@ func _ParseGenbankFile(input <-chan _FileChunk, out obiiter.IBioSequence) {
func ReadGenbank(reader io.Reader, options ...WithOption) obiiter.IBioSequence {
opt := MakeOptions(options)
entry_channel := make(chan _FileChunk, opt.BufferSize())
entry_channel := make(chan _FileChunk)
newIter := obiiter.MakeIBioSequence(opt.BufferSize())
newIter := obiiter.MakeIBioSequence()
nworkers := opt.ParallelWorkers()
newIter.Add(nworkers)

View File

@ -15,10 +15,15 @@ type __options__ struct {
closefile bool
appendfile bool
compressed bool
csv_ids bool
cvs_sequence bool
csv_id bool
csv_sequence bool
csv_quality bool
csv_definition bool
csv_count bool
csv_taxon bool
csv_keys []string
csv_separator string
csv_navalue string
paired_filename string
}
@ -40,10 +45,15 @@ func MakeOptions(setters []WithOption) Options {
closefile: false,
appendfile: false,
compressed: false,
csv_ids: true,
csv_id: true,
csv_definition: false,
cvs_sequence: true,
csv_count: false,
csv_taxon: false,
csv_sequence: true,
csv_quality: false,
csv_separator: ",",
csv_navalue: "NA",
csv_keys: make([]string, 0),
paired_filename: "",
}
@ -60,10 +70,6 @@ func (opt Options) QualityShift() int {
return opt.pointer.quality_shift
}
func (opt Options) BufferSize() int {
return opt.pointer.buffer_size
}
func (opt Options) BatchSize() int {
return opt.pointer.batch_size
}
@ -96,8 +102,40 @@ func (opt Options) CompressedFile() bool {
return opt.pointer.compressed
}
func (opt Options) CSVIds() bool {
return opt.pointer.csv_ids
func (opt Options) CSVId() bool {
return opt.pointer.csv_id
}
func (opt Options) CSVDefinition() bool {
return opt.pointer.csv_definition
}
func (opt Options) CSVCount() bool {
return opt.pointer.csv_count
}
func (opt Options) CSVTaxon() bool {
return opt.pointer.csv_taxon
}
func (opt Options) CSVSequence() bool {
return opt.pointer.csv_sequence
}
func (opt Options) CSVQuality() bool {
return opt.pointer.csv_quality
}
func (opt Options) CSVKeys() []string {
return opt.pointer.csv_keys
}
func (opt Options) CSVSeparator() string {
return opt.pointer.csv_separator
}
func (opt Options) CSVNAValue() string {
return opt.pointer.csv_navalue
}
func (opt Options) HaveToSavePaired() bool {
@ -108,14 +146,6 @@ func (opt Options) PairedFileName() string {
return opt.pointer.paired_filename
}
func OptionsBufferSize(size int) WithOption {
f := WithOption(func(opt Options) {
opt.pointer.buffer_size = size
})
return f
}
func OptionCloseFile() WithOption {
f := WithOption(func(opt Options) {
opt.pointer.closefile = true
@ -247,3 +277,82 @@ func WritePairedReadsTo(filename string) WithOption {
return f
}
func CSVId(include bool) WithOption {
f := WithOption(func(opt Options) {
opt.pointer.csv_id = include
})
return f
}
func CSVSequence(include bool) WithOption {
f := WithOption(func(opt Options) {
opt.pointer.csv_sequence = include
})
return f
}
func CSVQuality(include bool) WithOption {
f := WithOption(func(opt Options) {
opt.pointer.csv_quality = include
})
return f
}
func CSVDefinition(include bool) WithOption {
f := WithOption(func(opt Options) {
opt.pointer.csv_definition = include
})
return f
}
func CSVCount(include bool) WithOption {
f := WithOption(func(opt Options) {
opt.pointer.csv_count = include
})
return f
}
func CSVTaxon(include bool) WithOption {
f := WithOption(func(opt Options) {
opt.pointer.csv_taxon = include
})
return f
}
func CSVKey(key string) WithOption {
f := WithOption(func(opt Options) {
opt.pointer.csv_keys = append(opt.pointer.csv_keys, key)
})
return f
}
func CSVKeys(keys []string) WithOption {
f := WithOption(func(opt Options) {
opt.pointer.csv_keys = append(opt.pointer.csv_keys, keys...)
})
return f
}
func CSVSeparator(separator string) WithOption {
f := WithOption(func(opt Options) {
opt.pointer.csv_separator = separator
})
return f
}
func CSVNAValue(navalue string) WithOption {
f := WithOption(func(opt Options) {
opt.pointer.csv_navalue = navalue
})
return f
}
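
The `CSV*` setters above all follow the functional-options pattern: each returns a closure that mutates a shared options struct, and `MakeOptions` applies the closures over a set of defaults. The following is a reduced sketch of that pattern. The type and function names are simplified, hypothetical versions of those in the diff, and `MakeOptions` is made variadic here for brevity.

```go
package main

import "fmt"

// options holds the settings; only three fields are kept for the sketch.
type options struct {
	separator string
	naValue   string
	keys      []string
}

// Options wraps a pointer so that setters mutate shared state even
// though Options itself is passed by value.
type Options struct{ pointer *options }

// WithOption is a setter applied by MakeOptions.
type WithOption func(Options)

// Separator overrides the default field separator.
func Separator(sep string) WithOption {
	return func(opt Options) { opt.pointer.separator = sep }
}

// Key appends one attribute key; repeated calls accumulate.
func Key(key string) WithOption {
	return func(opt Options) { opt.pointer.keys = append(opt.pointer.keys, key) }
}

// MakeOptions starts from the defaults and applies the setters in order.
func MakeOptions(setters ...WithOption) Options {
	opt := Options{&options{separator: ",", naValue: "NA"}}
	for _, set := range setters {
		set(opt)
	}
	return opt
}

func main() {
	opt := MakeOptions(Separator(";"), Key("count"), Key("sample"))
	fmt.Println(opt.pointer.separator, opt.pointer.naValue, opt.pointer.keys)
	// ; NA [count sample]
}
```

Because setters are applied in order, a later setter for the same field wins, while accumulating setters such as `Key` preserve the order of the calls.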

View File

@ -60,17 +60,11 @@ type IBioSequence struct {
var NilIBioSequence = IBioSequence{pointer: nil}
func MakeIBioSequence(sizes ...int) IBioSequence {
buffsize := int32(0)
if len(sizes) > 0 {
buffsize = int32(sizes[0])
}
i := _IBioSequence{
channel: make(chan BioSequenceBatch, buffsize),
channel: make(chan BioSequenceBatch),
current: NilBioSequenceBatch,
pushBack: abool.New(),
buffer_size: buffsize,
batch_size: -1,
sequence_format: "",
finished: abool.New(),
@ -160,14 +154,6 @@ func (iterator IBioSequence) IsNil() bool {
return iterator.pointer == nil
}
func (iterator IBioSequence) BufferSize() int {
if iterator.pointer == nil {
log.Panic("call of IBioSequenceBatch.BufferSize method on NilIBioSequenceBatch")
}
return int(atomic.LoadInt32(&iterator.pointer.buffer_size))
}
func (iterator IBioSequence) BatchSize() int {
if iterator.pointer == nil {
log.Panic("call of IBioSequenceBatch.BatchSize method on NilIBioSequenceBatch")
@ -279,13 +265,8 @@ func (iterator IBioSequence) Finished() bool {
// Sorting the batches of sequences.
func (iterator IBioSequence) SortBatches(sizes ...int) IBioSequence {
buffsize := iterator.BufferSize()
if len(sizes) > 0 {
buffsize = sizes[0]
}
newIter := MakeIBioSequence(buffsize)
newIter := MakeIBioSequence()
newIter.Add(1)
@ -338,8 +319,7 @@ func (iterator IBioSequence) Concat(iterators ...IBioSequence) IBioSequence {
allPaired = allPaired && i.IsPaired()
}
buffsize := iterator.BufferSize()
newIter := MakeIBioSequence(buffsize)
newIter := MakeIBioSequence()
newIter.Add(1)
@ -396,8 +376,7 @@ func (iterator IBioSequence) Pool(iterators ...IBioSequence) IBioSequence {
}
nextCounter := goutils.AtomicCounter()
buffsize := iterator.BufferSize()
newIter := MakeIBioSequence(buffsize)
newIter := MakeIBioSequence()
newIter.Add(niterator)
@ -431,13 +410,8 @@ func (iterator IBioSequence) Pool(iterators ...IBioSequence) IBioSequence {
// indicated in parameter. Rebatching implies sorting the
// source IBioSequenceBatch.
func (iterator IBioSequence) Rebatch(size int, sizes ...int) IBioSequence {
buffsize := iterator.BufferSize()
if len(sizes) > 0 {
buffsize = sizes[0]
}
newIter := MakeIBioSequence(buffsize)
newIter := MakeIBioSequence()
newIter.Add(1)
@ -532,14 +506,9 @@ func (iterator IBioSequence) Count(recycle bool) (int, int, int) {
// iterator following the predicate value.
func (iterator IBioSequence) DivideOn(predicate obiseq.SequencePredicate,
size int, sizes ...int) (IBioSequence, IBioSequence) {
buffsize := iterator.BufferSize()
if len(sizes) > 0 {
buffsize = sizes[0]
}
trueIter := MakeIBioSequence(buffsize)
falseIter := MakeIBioSequence(buffsize)
trueIter := MakeIBioSequence()
falseIter := MakeIBioSequence()
trueIter.Add(1)
falseIter.Add(1)
@ -604,18 +573,13 @@ func (iterator IBioSequence) DivideOn(predicate obiseq.SequencePredicate,
// A function that takes a predicate and a batch of sequences and returns a filtered batch of sequences.
func (iterator IBioSequence) FilterOn(predicate obiseq.SequencePredicate,
size int, sizes ...int) IBioSequence {
buffsize := iterator.BufferSize()
nworkers := 4
if len(sizes) > 0 {
nworkers = sizes[0]
}
if len(sizes) > 1 {
buffsize = sizes[1]
}
trueIter := MakeIBioSequence(buffsize)
trueIter := MakeIBioSequence()
trueIter.Add(nworkers)
@ -661,18 +625,13 @@ func (iterator IBioSequence) FilterOn(predicate obiseq.SequencePredicate,
func (iterator IBioSequence) FilterAnd(predicate obiseq.SequencePredicate,
size int, sizes ...int) IBioSequence {
buffsize := iterator.BufferSize()
nworkers := 4
if len(sizes) > 0 {
nworkers = sizes[0]
}
if len(sizes) > 1 {
buffsize = sizes[1]
}
trueIter := MakeIBioSequence(buffsize)
trueIter := MakeIBioSequence()
trueIter.Add(nworkers)
@ -740,13 +699,7 @@ func (iterator IBioSequence) Load() obiseq.BioSequenceSlice {
func IBatchOver(data obiseq.BioSequenceSlice,
size int, sizes ...int) IBioSequence {
buffsize := 0
if len(sizes) > 0 {
buffsize = sizes[0]
}
newIter := MakeIBioSequence(buffsize)
newIter := MakeIBioSequence()
newIter.Add(1)

View File

@ -36,7 +36,6 @@ func (dist *IDistribute) Classifier() *obiseq.BioSequenceClassifier {
func (iterator IBioSequence) Distribute(class *obiseq.BioSequenceClassifier, sizes ...int) IDistribute {
batchsize := 5000
buffsize := 2
outputs := make(map[int]IBioSequence, 100)
slices := make(map[int]*obiseq.BioSequenceSlice, 100)
@ -47,9 +46,7 @@ func (iterator IBioSequence) Distribute(class *obiseq.BioSequenceClassifier, siz
batchsize = sizes[0]
}
if len(sizes) > 1 {
buffsize = sizes[1]
}
jobDone := sync.WaitGroup{}
lock := sync.Mutex{}
@ -80,7 +77,7 @@ func (iterator IBioSequence) Distribute(class *obiseq.BioSequenceClassifier, siz
orders[key] = 0
lock.Lock()
outputs[key] = MakeIBioSequence(buffsize)
outputs[key] = MakeIBioSequence()
lock.Unlock()
news <- key

View File

@ -4,16 +4,12 @@ import "git.metabarcoding.org/lecasofts/go/obitools/pkg/obiseq"
func (iterator IBioSequence) IMergeSequenceBatch(na string, statsOn []string, sizes ...int) IBioSequence {
batchsize := 100
buffsize := iterator.BufferSize()
if len(sizes) > 0 {
batchsize = sizes[0]
}
if len(sizes) > 1 {
buffsize = sizes[1]
}
newIter := MakeIBioSequence(buffsize)
newIter := MakeIBioSequence()
newIter.Add(1)

View File

@ -6,7 +6,6 @@ import (
"git.metabarcoding.org/lecasofts/go/obitools/pkg/obiseq"
)
// That method allows for applying a SeqWorker function to every sequence.
//
// Sequences are provided by the iterator and modified sequences are pushed
@ -17,17 +16,12 @@ import (
// - The second is the size of the channel buffer. By default it is set to the same value as the input buffer.
func (iterator IBioSequence) MakeIWorker(worker obiseq.SeqWorker, sizes ...int) IBioSequence {
nworkers := 4
buffsize := iterator.BufferSize()
if len(sizes) > 0 {
nworkers = sizes[0]
}
if len(sizes) > 1 {
buffsize = sizes[1]
}
newIter := MakeIBioSequence(buffsize)
newIter := MakeIBioSequence()
newIter.Add(nworkers)
@ -64,17 +58,12 @@ func (iterator IBioSequence) MakeIWorker(worker obiseq.SeqWorker, sizes ...int)
func (iterator IBioSequence) MakeIConditionalWorker(predicate obiseq.SequencePredicate,
worker obiseq.SeqWorker, sizes ...int) IBioSequence {
nworkers := 4
buffsize := iterator.BufferSize()
if len(sizes) > 0 {
nworkers = sizes[0]
}
if len(sizes) > 1 {
buffsize = sizes[1]
}
newIter := MakeIBioSequence(buffsize)
newIter := MakeIBioSequence()
newIter.Add(nworkers)
@ -112,17 +101,12 @@ func (iterator IBioSequence) MakeIConditionalWorker(predicate obiseq.SequencePre
func (iterator IBioSequence) MakeISliceWorker(worker obiseq.SeqSliceWorker, sizes ...int) IBioSequence {
nworkers := 4
buffsize := iterator.BufferSize()
if len(sizes) > 0 {
nworkers = sizes[0]
}
if len(sizes) > 1 {
buffsize = sizes[1]
}
newIter := MakeIBioSequence(buffsize)
newIter := MakeIBioSequence()
newIter.Add(nworkers)
@ -140,7 +124,7 @@ func (iterator IBioSequence) MakeISliceWorker(worker obiseq.SeqSliceWorker, size
newIter.Done()
}
log.Printf("Start of the batch slice workers on %d workers (buffer : %d)\n", nworkers, buffsize)
log.Printf("Start of the batch slice workers on %d workers\n", nworkers)
for i := 0; i < nworkers-1; i++ {
go f(iterator.Split())
}
@ -168,4 +152,3 @@ func SliceWorkerPipe(worker obiseq.SeqSliceWorker, sizes ...int) Pipeable {
return f
}

View File

@ -11,7 +11,6 @@ type _Options struct {
withProgressBar bool
parallelWorkers int
batchSize int
bufferSize int
}
// Options stores a set of option usable by the
@ -56,16 +55,6 @@ func OptionAllowedMismatches(count int) WithOption {
return f
}
// OptionBufferSize sets the requested channel
// buffer size.
func OptionBufferSize(size int) WithOption {
f := WithOption(func(opt Options) {
opt.pointer.bufferSize = size
})
return f
}
// OptionParallelWorkers sets how many search
// jobs will be run in parallel.
func OptionParallelWorkers(nworkers int) WithOption {
@ -102,12 +91,6 @@ func (options Options) WithProgressBar() bool {
return options.pointer.withProgressBar
}
// BufferSize returns the size of the channel
// buffer specified by the options
func (options Options) BufferSize() int {
return options.pointer.bufferSize
}
// BatchSize returns the size of the
// sequence batch used by the PCR algorithm
func (options Options) BatchSize() int {
@ -130,7 +113,6 @@ func MakeOptions(setters []WithOption) Options {
withProgressBar: false,
parallelWorkers: 4,
batchSize: 1000,
bufferSize: 100,
}
opt := Options{&o}

View File

@ -13,10 +13,9 @@ import (
var _Debug = false
var _ParallelWorkers = runtime.NumCPU()*2 - 1
var _MaxAllowedCPU = runtime.NumCPU()
var _BufferSize = 1
var _BatchSize = 5000
type ArgumentParser func([]string) (*getoptions.GetOpt, []string, error)
type ArgumentParser func([]string) (*getoptions.GetOpt, []string)
func GenerateOptionParser(optionset ...func(*getoptions.GetOpt)) ArgumentParser {
@ -38,10 +37,14 @@ func GenerateOptionParser(optionset ...func(*getoptions.GetOpt)) ArgumentParser
o(options)
}
return func(args []string) (*getoptions.GetOpt, []string, error) {
return func(args []string) (*getoptions.GetOpt, []string) {
remaining, err := options.Parse(args[1:])
if err != nil {
log.Fatalf("Error on the command line: %v", err)
}
// Setup the maximum number of CPU usable by the program
runtime.GOMAXPROCS(_MaxAllowedCPU)
if options.Called("max-cpu") {
@ -67,7 +70,7 @@ func GenerateOptionParser(optionset ...func(*getoptions.GetOpt)) ArgumentParser
log.Debugln("Switch to debug level logging")
}
return options, remaining, err
return options, remaining
}
}
@ -88,11 +91,6 @@ func CLIMaxCPU() int {
return _MaxAllowedCPU
}
// CLIBufferSize returns the expected channel buffer size for obitools
func CLIBufferSize() int {
return _BufferSize
}
// CLIBatchSize returns the expected size of the sequence batches
func CLIBatchSize() int {
return _BatchSize

View File

@ -8,6 +8,15 @@ import (
log "github.com/sirupsen/logrus"
)
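// HasAttribute reports whether the key is present in the annotation map.
// It returns false when the sequence has no annotations at all.
func (s *BioSequence) HasAttribute(key string) bool {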
ok := s.annotations != nil
if ok {
_, ok = s.annotations[key]
}
return ok
}
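
A minimal sketch of the same presence test, using a hypothetical `record` type as a stand-in for `BioSequence` (only the annotation map is modelled). It relies on the fact that indexing a nil map is legal in Go and reports `ok == false`, so the explicit nil check above is a matter of style rather than necessity.

```go
package main

import "fmt"

// record is a hypothetical stand-in for BioSequence,
// reduced to its annotation map.
type record struct {
	annotations map[string]interface{}
}

// hasAttribute reports whether key is present in the annotation map.
// Indexing a nil map is safe in Go: it yields the zero value and ok == false.
func (r *record) hasAttribute(key string) bool {
	_, ok := r.annotations[key]
	return ok
}

func main() {
	empty := &record{} // annotations is nil
	tagged := &record{annotations: map[string]interface{}{"count": 3}}

	fmt.Println(empty.hasAttribute("count"))  // nil map, key absent
	fmt.Println(tagged.hasAttribute("count")) // key present
	fmt.Println(tagged.hasAttribute("taxid")) // key absent
}
```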
// GetAttribute returns the value associated with key in the annotation map.
func (s *BioSequence) GetAttribute(key string) (interface{}, bool) {
var val interface{}

View File

@ -278,3 +278,28 @@ func (s *BioSequence) Clear() {
s.sequence = s.sequence[0:0]
}
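// Composition counts the occurrences of 'a', 'c', 'g' and 't' in the sequence.
// Every other byte, uppercase letters included, is counted under the key 'o'.
func (s *BioSequence) Composition() map[byte]int {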
a := 0
c := 0
g := 0
t := 0
other := 0
for _, char := range s.sequence {
switch char {
case 'a':
a++
case 'c':
c++
case 'g':
g++
case 't':
t++
default:
other++
}
}
return map[byte]int{'a': a, 'c': c, 'g': g, 't': t, 'o': other}
}
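
The counting logic above can be tried in isolation. The sketch below assumes a free function `composition` over a raw byte slice (a hypothetical name, not part of the package) and mirrors the switch in `Composition()`, including the fact that only lowercase nucleotides are recognised.

```go
package main

import "fmt"

// composition mirrors the counting done by BioSequence.Composition():
// lowercase a, c, g and t are counted individually, anything else
// (ambiguity codes, uppercase letters, gaps) goes under 'o'.
func composition(seq []byte) map[byte]int {
	counts := map[byte]int{'a': 0, 'c': 0, 'g': 0, 't': 0, 'o': 0}
	for _, b := range seq {
		switch b {
		case 'a', 'c', 'g', 't':
			counts[b]++
		default:
			counts['o']++
		}
	}
	return counts
}

func main() {
	c := composition([]byte("aacgtNA"))
	fmt.Println(c['a'], c['c'], c['g'], c['t'], c['o'])
}
```

If sequences may reach this function in uppercase, they would need to be lowercased first; whether that can happen depends on how `BioSequence` stores its data, which this diff does not show.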

View File

@ -316,3 +316,4 @@ func RotateClassifier(size int) *BioSequenceClassifier {
c := BioSequenceClassifier{code, value, reset, clone,"RotateClassifier"}
return &c
}

View File

@ -4,13 +4,12 @@ import (
"context"
"fmt"
"git.metabarcoding.org/lecasofts/go/obitools/pkg/obieval"
log "github.com/sirupsen/logrus"
)
func Expression(expression string) func(*BioSequence) (interface{}, error) {
exp, err := obieval.OBILang.NewEvaluable(expression)
exp, err := OBILang.NewEvaluable(expression)
if err != nil {
log.Fatalf("Error in the expression : %s", expression)
}

View File

@ -1,4 +1,4 @@
package obieval
package obiseq
import (
"fmt"
@ -174,8 +174,19 @@ var OBILang = gval.NewLanguage(
log.Fatalf("%v cannot be converted to a boolean value", args[0])
}
return val, nil
}),
gval.Function("ifelse", func(args ...interface{}) (interface{}, error) {
if args[0].(bool) {
return args[1], nil
} else {
return args[2], nil
}
}),
gval.Function("gcskew", func(args ...interface{}) (interface{}, error) {
composition := (args[0].(*BioSequence)).Composition()
return float64(composition['g']-composition['c']) / float64(composition['g']+composition['c']), nil
}),
gval.Function("composition", func(args ...interface{}) (interface{}, error) {
return (args[0].(*BioSequence)).Composition(), nil
}))
func Expression(expression string) (gval.Evaluable, error) {
return OBILang.NewEvaluable(expression)
}
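
The `gcskew` function registered above computes (G - C) / (G + C) from the composition map. Because the division is done on `float64` values, a sequence containing no `g` and no `c` yields NaN rather than a panic. The sketch below is one possible way to make that case explicit; `gcSkew` and its boolean result are assumptions for illustration, not part of the commit.

```go
package main

import "fmt"

// gcSkew returns (g - c) / (g + c) and true when the ratio is defined.
// When the sequence contains neither g nor c, the ratio is undefined and
// the function returns 0 and false instead of NaN.
func gcSkew(g, c int) (float64, bool) {
	if g+c == 0 {
		return 0, false
	}
	return float64(g-c) / float64(g+c), true
}

func main() {
	skew, ok := gcSkew(3, 1)
	fmt.Println(skew, ok)

	_, ok = gcSkew(0, 0)
	fmt.Println(ok)
}
```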

View File

@ -5,7 +5,6 @@ import (
"fmt"
"regexp"
"git.metabarcoding.org/lecasofts/go/obitools/pkg/obieval"
log "github.com/sirupsen/logrus"
)
@ -256,7 +255,7 @@ func IsIdIn(ids ...string) SequencePredicate {
func ExpressionPredicat(expression string) SequencePredicate {
exp, err := obieval.OBILang.NewEvaluable(expression)
exp, err := OBILang.NewEvaluable(expression)
if err != nil {
log.Fatalf("Error in the expression : %s", expression)
}

View File

@ -0,0 +1,63 @@
package obicleandb
import (
log "github.com/sirupsen/logrus"
"git.metabarcoding.org/lecasofts/go/obitools/pkg/obichunk"
"git.metabarcoding.org/lecasofts/go/obitools/pkg/obiiter"
"git.metabarcoding.org/lecasofts/go/obitools/pkg/obioptions"
"git.metabarcoding.org/lecasofts/go/obitools/pkg/obiseq"
"git.metabarcoding.org/lecasofts/go/obitools/pkg/obitools/obigrep"
)
func ICleanDB(itertator obiiter.IBioSequence) obiiter.IBioSequence {
var rankPredicate obiseq.SequencePredicate
options := make([]obichunk.WithOption, 0, 30)
// Make sequence dereplication with a constraint on the taxid.
// To be merged, both sequences must have the same taxid.
options = append(options,
obichunk.OptionBatchCount(100),
obichunk.OptionSortOnMemory(),
obichunk.OptionSubCategory("taxid"),
obichunk.OptionsParallelWorkers(
obioptions.CLIParallelWorkers()),
obichunk.OptionsBatchSize(
obioptions.CLIBatchSize()),
obichunk.OptionNAValue("NA"),
)
unique, err := obichunk.IUniqueSequence(itertator, options...)
if err != nil {
log.Fatal(err)
}
taxonomy := obigrep.CLILoadSelectedTaxonomy()
if len(obigrep.CLIRequiredRanks()) > 0 {
rankPredicate = obigrep.CLIHasRankDefinedPredicate()
} else {
rankPredicate = taxonomy.HasRequiredRank("species").And(taxonomy.HasRequiredRank("genus")).And(taxonomy.HasRequiredRank("family"))
}
goodTaxa := taxonomy.IsAValidTaxon(CLIUpdateTaxids()).And(rankPredicate)
usable := unique.FilterOn(goodTaxa,
obioptions.CLIBatchSize(),
obioptions.CLIParallelWorkers())
annotated := usable.MakeIWorker(taxonomy.MakeSetSpeciesWorker(),
obioptions.CLIParallelWorkers(),
).MakeIWorker(taxonomy.MakeSetGenusWorker(),
obioptions.CLIParallelWorkers(),
).MakeIWorker(taxonomy.MakeSetFamilyWorker(),
obioptions.CLIParallelWorkers(),
)
// annotated.MakeIConditionalWorker(obiseq.IsMoreAbundantOrEqualTo(3),1000)
return annotated
}

View File

@ -60,6 +60,21 @@ func InputOptionSet(options *getoptions.GetOpt) {
}
func OutputModeOptionSet(options *getoptions.GetOpt) {
options.BoolVar(&__no_progress_bar__, "no-progressbar", false,
options.Description("Disable the progress bar printing"))
options.BoolVar(&__compressed__, "compress", false,
options.Alias("Z"),
options.Description("Output is compressed"))
options.StringVar(&__output_file_name__, "out", __output_file_name__,
options.Alias("o"),
options.ArgName("FILENAME"),
options.Description("Filename used for saving the output"),
)
}
func OutputOptionSet(options *getoptions.GetOpt) {
options.BoolVar(&__output_in_fasta__, "fasta-output", false,
options.Description("Read data following the ecoPCR output format."))
@ -73,19 +88,7 @@ func OutputOptionSet(options *getoptions.GetOpt) {
options.Alias("O"),
options.Description("output FASTA/FASTQ title line annotations follow OBI format."))
options.BoolVar(&__no_progress_bar__, "no-progressbar", false,
options.Description("Disable the progress bar printing"))
options.BoolVar(&__compressed__, "compress", false,
options.Alias("Z"),
options.Description("Output is compressed"))
options.StringVar(&__output_file_name__, "out", __output_file_name__,
options.Alias("o"),
options.ArgName("FILENAME"),
options.Description("Filename used for saving the output"),
)
OutputModeOptionSet(options)
}
func PairedFilesOptionSet(options *getoptions.GetOpt) {

View File

@ -48,6 +48,10 @@ func _ExpandListOfFiles(check_ext bool, filenames ...string) ([]string, error) {
strings.HasSuffix(path, "fasta.gz") ||
strings.HasSuffix(path, "fastq") ||
strings.HasSuffix(path, "fastq.gz") ||
strings.HasSuffix(path, "seq") ||
strings.HasSuffix(path, "seq.gz") ||
strings.HasSuffix(path, "gb") ||
strings.HasSuffix(path, "gb.gz") ||
strings.HasSuffix(path, "dat") ||
strings.HasSuffix(path, "dat.gz") ||
strings.HasSuffix(path, "ecopcr") ||
@ -82,13 +86,12 @@ func CLIReadBioSequences(filenames ...string) (obiiter.IBioSequence, error) {
opts = append(opts, obiformats.OptionsFastSeqHeaderParser(obiformats.ParseGuessedFastSeqHeader))
}
nworkers := obioptions.CLIParallelWorkers() // / 4
nworkers := obioptions.CLIParallelWorkers()
if nworkers < 2 {
nworkers = 2
}
opts = append(opts, obiformats.OptionsParallelWorkers(nworkers))
opts = append(opts, obiformats.OptionsBufferSize(obioptions.CLIBufferSize()))
opts = append(opts, obiformats.OptionsBatchSize(obioptions.CLIBatchSize()))
opts = append(opts, obiformats.OptionsQualityShift(CLIInputQualityShift()))

View File

@ -60,7 +60,6 @@ func CLIWriteBioSequences(iterator obiiter.IBioSequence,
}
opts = append(opts, obiformats.OptionsParallelWorkers(nworkers))
opts = append(opts, obiformats.OptionsBufferSize(obioptions.CLIBufferSize()))
opts = append(opts, obiformats.OptionsBatchSize(obioptions.CLIBatchSize()))
opts = append(opts, obiformats.OptionsQualityShift(CLIOutputQualityShift()))

View File

@ -0,0 +1,61 @@
package obicsv
import (
"log"
"git.metabarcoding.org/lecasofts/go/obitools/pkg/obiformats"
"git.metabarcoding.org/lecasofts/go/obitools/pkg/obiiter"
"git.metabarcoding.org/lecasofts/go/obitools/pkg/obioptions"
"git.metabarcoding.org/lecasofts/go/obitools/pkg/obitools/obiconvert"
)
func CLIWriteCSV(iterator obiiter.IBioSequence,
terminalAction bool, filenames ...string) (obiiter.IBioSequence, error) {
if obiconvert.CLIProgressBar() {
iterator = iterator.Speed()
}
var newIter obiiter.IBioSequence
opts := make([]obiformats.WithOption, 0, 10)
nworkers := obioptions.CLIParallelWorkers() / 4
if nworkers < 2 {
nworkers = 2
}
opts = append(opts, obiformats.OptionsParallelWorkers(nworkers))
opts = append(opts, obiformats.OptionsBatchSize(obioptions.CLIBatchSize()))
opts = append(opts, obiformats.OptionsQualityShift(obiconvert.CLIOutputQualityShift()))
opts = append(opts, obiformats.OptionsCompressed(obiconvert.CLICompressed()))
opts = append(opts, obiformats.CSVId(CLIPrintId()),
obiformats.CSVCount(CLIPrintCount()),
obiformats.CSVTaxon(CLIPrintTaxon()),
obiformats.CSVDefinition(CLIPrintDefinition()),
obiformats.CSVKeys(CLIToBeKeptAttributes()),
)
var err error
if len(filenames) == 0 {
newIter, err = obiformats.WriteCSVToStdout(iterator, opts...)
} else {
newIter, err = obiformats.WriteCSVToFile(iterator, filenames[0], opts...)
}
if err != nil {
log.Fatalf("Write file error: %v", err)
return obiiter.NilIBioSequence, err
}
if terminalAction {
newIter.Recycle()
return obiiter.NilIBioSequence, nil
}
return newIter, nil
}

View File

@ -0,0 +1,126 @@
package obicsv
import (
"git.metabarcoding.org/lecasofts/go/obitools/pkg/goutils"
"git.metabarcoding.org/lecasofts/go/obitools/pkg/obitools/obiconvert"
"github.com/DavidGamba/go-getoptions"
)
var _outputIds = true
var _outputCount = false
var _outputTaxon = false
var _outputSequence = true
var _outputQuality = true
var _outputDefinition = false
var _obipairing = false
var _autoColumns = false
var _keepOnly = make([]string, 0)
var _naValue = "NA"
var _softAttributes = map[string][]string{
"obipairing": {"mode", "seq_a_single", "seq_b_single",
"ali_dir", "score", "score_norm",
"seq_ab_match", "pairing_mismatches",
},
}
func CSVOptionSet(options *getoptions.GetOpt) {
options.BoolVar(&_outputIds, "ids", _outputIds,
options.Alias("i"),
options.Description("Prints sequence ids in the output."))
options.BoolVar(&_outputSequence, "sequence", _outputSequence,
options.Alias("s"),
options.Description("Prints sequence itself in the output."))
options.BoolVar(&_outputQuality, "quality", _outputQuality,
options.Alias("q"),
options.Description("Prints sequence quality in the output."))
options.BoolVar(&_outputDefinition, "definition", _outputDefinition,
options.Alias("d"),
options.Description("Prints sequence definition in the output."))
options.BoolVar(&_autoColumns, "auto", _autoColumns,
options.Description("Based on the first sequences, propose a list of attributes to print"))
options.BoolVar(&_outputCount, "count", _outputCount,
options.Description("Prints the count attribute in the output"))
options.BoolVar(&_outputTaxon, "taxon", _outputTaxon,
options.Description("Prints the NCBI taxid and its related scientific name"))
options.BoolVar(&_obipairing, "obipairing", _obipairing,
options.Description("Prints the attributes added by obipairing"))
options.StringSliceVar(&_keepOnly, "keep", 1, 1,
options.Alias("k"),
options.ArgName("KEY"),
options.Description("Keeps only attribute with key <KEY>. Several -k options can be combined."))
options.StringVar(&_naValue, "na-value", _naValue,
options.ArgName("NAVALUE"),
options.Description("A string representing non available values in the CSV file."))
}
func OptionSet(options *getoptions.GetOpt) {
obiconvert.OutputModeOptionSet(options)
CSVOptionSet(options)
}
func CLIPrintId() bool {
return _outputIds
}
func CLIPrintSequence() bool {
return _outputSequence
}
func CLIPrintCount() bool {
return _outputCount
}
func CLIPrintTaxon() bool {
return _outputTaxon
}
func CLIPrintQuality() bool {
return _outputQuality
}
func CLIPrintDefinition() bool {
return _outputDefinition
}
func CLIAutoColumns() bool {
return _autoColumns
}
func CLIHasToBeKeptAttributes() bool {
return len(_keepOnly) > 0
}
func CLIToBeKeptAttributes() []string {
if _obipairing {
_keepOnly = append(_keepOnly, _softAttributes["obipairing"]...)
}
if i := goutils.LookFor(_keepOnly, "count"); i >= 0 {
_keepOnly = goutils.RemoveIndex(_keepOnly, i)
_outputCount = true
}
if i := goutils.LookFor(_keepOnly, "taxid"); i >= 0 {
_keepOnly = goutils.RemoveIndex(_keepOnly, i)
_outputTaxon = true
}
if i := goutils.LookFor(_keepOnly, "scientific_name"); i >= 0 {
_keepOnly = goutils.RemoveIndex(_keepOnly, i)
_outputTaxon = true
}
return _keepOnly
}
func CLINAValue() string {
return _naValue
}
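
`CLIToBeKeptAttributes` treats `count`, `taxid` and `scientific_name` as special keys: each is removed from the `--keep` list and turned into the corresponding output flag. The sketch below illustrates that pattern with a hypothetical helper `extractKey`; it stands in for the `goutils.LookFor` and `goutils.RemoveIndex` calls, whose exact signatures are not shown in this diff.

```go
package main

import "fmt"

// extractKey removes the first occurrence of name from keys.
// It returns the remaining keys and whether name was found.
// The input slice is left untouched.
func extractKey(keys []string, name string) ([]string, bool) {
	for i, k := range keys {
		if k == name {
			rest := make([]string, 0, len(keys)-1)
			rest = append(rest, keys[:i]...)
			rest = append(rest, keys[i+1:]...)
			return rest, true
		}
	}
	return keys, false
}

func main() {
	keep := []string{"sample", "count", "taxid"}

	// "count" and "taxid" become output flags instead of plain columns.
	keep, outputCount := extractKey(keep, "count")
	keep, outputTaxon := extractKey(keep, "taxid")

	fmt.Println(keep, outputCount, outputTaxon)
}
```

Returning a fresh slice keeps the helper free of side effects. The function in the diff updates the package-level `_keepOnly` in place instead, so with `--obipairing` set each call appends the obipairing attributes again.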

View File

@ -31,7 +31,6 @@ func DistributeSequence(sequences obiiter.IBioSequence) {
}
opts = append(opts, obiformats.OptionsParallelWorkers(nworkers),
obiformats.OptionsBufferSize(obioptions.CLIBufferSize()),
obiformats.OptionsBatchSize(obioptions.CLIBatchSize()),
obiformats.OptionsQualityShift(obiconvert.CLIOutputQualityShift()),
obiformats.OptionsAppendFile(CLIAppendSequences()),

View File

@ -39,7 +39,6 @@ func CLIFilterSequence(iterator obiiter.IBioSequence) obiiter.IBioSequence {
newIter = iterator.FilterOn(predicate,
obioptions.CLIBatchSize(),
obioptions.CLIParallelWorkers(),
obioptions.CLIBufferSize(),
)
}
} else {

View File

@ -20,7 +20,6 @@ func IExtractBarcode(iterator obiiter.IBioSequence) (obiiter.IBioSequence, error
obingslibrary.OptionDiscardErrors(!CLIConservedErrors()),
obingslibrary.OptionParallelWorkers(obioptions.CLIParallelWorkers()),
obingslibrary.OptionBatchSize(obioptions.CLIBatchSize()),
obingslibrary.OptionBufferSize(obioptions.CLIBufferSize()),
)
ngsfilter, err := CLINGSFIlter()

View File

@ -211,17 +211,13 @@ func IAssemblePESequencesBatch(iterator obiiter.IBioSequence,
}
nworkers := obioptions.CLIMaxCPU() * 3 / 2
buffsize := iterator.BufferSize()
if len(sizes) > 0 {
nworkers = sizes[0]
}
if len(sizes) > 1 {
buffsize = sizes[1]
}
newIter := obiiter.MakeIBioSequence(buffsize)
newIter := obiiter.MakeIBioSequence()
newIter.Add(nworkers)

View File

@ -51,8 +51,6 @@ func Unique(sequences obiiter.IBioSequence) obiiter.IBioSequence {
options = append(options,
obichunk.OptionsParallelWorkers(
obioptions.CLIParallelWorkers()),
obichunk.OptionsBufferSize(
obioptions.CLIBufferSize()),
obichunk.OptionsBatchSize(
obioptions.CLIBatchSize()),
obichunk.OptionNAValue(CLINAValue()),