mirror of
https://github.com/metabarcoding/obitools4.git
synced 2026-03-25 21:40:52 +00:00
Compare commits: blackboard ... taxonomy (59 commits)
| SHA1 |
|---|
| 00f2dc2697 |
| c50a0f409d |
| 7c4042df6b |
| 0a567f621c |
| 9acb4a85a8 |
| 3137c1f841 |
| ffd67252c3 |
| 757448cb1e |
| 4ae3336135 |
| d066bb6878 |
| becb995e3d |
| c58d9772ac |
| 67c0d00a4d |
| 4fe0db63ff |
| ccd3b06532 |
| 5d0f996625 |
| abfa8f357a |
| 795df34d1a |
| f2525d7b07 |
| 39dd3e3ce8 |
| f41a6fbb60 |
| 00b0edc15a |
| ad2461a656 |
| 40fb4e9767 |
| d29a56dcbf |
| 69ef1758a2 |
| 3d06978808 |
| 7884a74f9c |
| 36327c79c8 |
| f3d8707c08 |
| 7633fc4d23 |
| f5d79d0bc4 |
| 03f4e88a17 |
| 9471fedfa1 |
| 4b65bfce84 |
| fc75974c68 |
| 422f11cceb |
| fefc360f80 |
| 3e00d39d47 |
| 9e8a7fd9be |
| 74280e4704 |
| 7255c71576 |
| 241f2286f2 |
| b37fc39ead |
| 2b4a633c30 |
| 05bf2bfd6c |
| 65ae82622e |
| 373464cb06 |
| cd330db672 |
| 31bfc88eb9 |
| bdb96dda94 |
| 3f57935328 |
| 886b5d9a96 |
| f83032e643 |
| c0c18030c8 |
| 242f4d8f56 |
| 1b1cd41fd3 |
| bc1aaaf7d9 |
| 2247c3bc0a |
14
.gitignore
vendored
@@ -118,3 +118,17 @@ doc/book/wolf_data/Release-253/ncbitaxo/readme.txt
|
|||||||
doc/book/results/toto.tasta
|
doc/book/results/toto.tasta
|
||||||
sample/.DS_Store
|
sample/.DS_Store
|
||||||
GO
|
GO
|
||||||
|
ncbitaxo/citations.dmp
|
||||||
|
ncbitaxo/delnodes.dmp
|
||||||
|
ncbitaxo/division.dmp
|
||||||
|
ncbitaxo/gc.prt
|
||||||
|
ncbitaxo/gencode.dmp
|
||||||
|
ncbitaxo/merged.dmp
|
||||||
|
ncbitaxo/names.dmp
|
||||||
|
ncbitaxo/nodes.dmp
|
||||||
|
ncbitaxo/readme.txt
|
||||||
|
template.16S
|
||||||
|
xxx.gz
|
||||||
|
*.sav
|
||||||
|
*.old
|
||||||
|
ncbitaxo.tgz
|
||||||
|
|||||||
@@ -5,7 +5,7 @@ They are implemented in *GO* and are tens of times faster than OBITools2.
|
|||||||
|
|
||||||
The git for *OBITools4* is available at :
|
The git for *OBITools4* is available at :
|
||||||
|
|
||||||
> https://metabarcoding.org/obitools4
|
> https://github.com/metabarcoding/obitools4
|
||||||
|
|
||||||
## Installing *OBITools V4*
|
## Installing *OBITools V4*
|
||||||
|
|
||||||
@@ -13,7 +13,7 @@ An installation script that compiles the new *OBITools* on your Unix-like system
|
|||||||
The easiest way to run it is to copy and paste the following command into your terminal
|
The easiest way to run it is to copy and paste the following command into your terminal
|
||||||
|
|
||||||
```{bash}
|
```{bash}
|
||||||
curl -L https://metabarcoding.org/obitools4/install.sh | bash
|
curl -L https://raw.githubusercontent.com/metabarcoding/obitools4/master/install_obitools.sh | bash
|
||||||
```
|
```
|
||||||
|
|
||||||
By default, the script installs the *OBITools* commands and other associated files into the `/usr/local` directory.
|
By default, the script installs the *OBITools* commands and other associated files into the `/usr/local` directory.
|
||||||
@@ -33,7 +33,7 @@ available on your system, the installation script offers two options:
|
|||||||
You can use these options by following the installation command:
|
You can use these options by following the installation command:
|
||||||
|
|
||||||
```{bash}
|
```{bash}
|
||||||
curl -L https://metabarcoding.org/obitools4/install.sh | \
|
curl -L https://raw.githubusercontent.com/metabarcoding/obitools4/master/install_obitools.sh | \
|
||||||
bash -s -- --install-dir test_install --obitools-prefix k
|
bash -s -- --install-dir test_install --obitools-prefix k
|
||||||
```
|
```
|
||||||
|
|
||||||
|
|||||||
@@ -2,6 +2,65 @@
|
|||||||
|
|
||||||
## Latest changes
|
## Latest changes
|
||||||
|
|
||||||
|
### Breaking changes
|
||||||
|
|
||||||
|
- In `obimultiplex`, the short version of the **--tag-list** option used to specify the list
|
||||||
|
of tags and primers to be used for the demultiplexing has been changed from `-t` to `-s`.
|
||||||
|
|
||||||
|
- The command `obifind` is now renamed `obitaxonomy`.
|
||||||
|
|
||||||
|
- The **--taxdump** option used to specify the path to the taxdump containing the NCBI taxonomy
|
||||||
|
has been renamed to **--taxonomy**.
|
||||||
|
|
||||||
|
### Bug fixes
|
||||||
|
|
||||||
|
- In `obipairing`, correct the stats `seq_a_single` and `seq_b_single` when
|
||||||
|
in right-alignment mode.
|
||||||
|
|
||||||
|
- Not strictly a bug, but the memory footprint of `obiuniq` has been reduced by lowering
the batch size and by no longer reading the qualities from fastq files, since `obiuniq`
produces only fasta output without qualities.
|
||||||
|
|
||||||
|
### New features
|
||||||
|
|
||||||
|
- In `obitaxonomy`, a new **--dump|D** option allows dumping a sub-taxonomy.
|
||||||
|
|
||||||
|
- A taxonomy dump can now be provided as a four-column CSV file to the **--taxonomy**
option.
|
||||||
|
|
||||||
|
- NCBI Taxonomy dump does not need to be uncompressed and unarchived anymore. The
|
||||||
|
path of the tar and gzipped dump file can be directly specified using the
|
||||||
|
**--taxonomy** option.
|
||||||
|
|
||||||
|
- Most of the time, the obitools automatically identify the sequence file format, but
detection sometimes fails. Two new options, **--fasta** and **--fastq**, have been added to
allow processing of the rare fasta and fastq files that are not recognized.
|
||||||
|
|
||||||
|
- In `obiscript`, adds new methods to the Lua sequence object:
|
||||||
|
- `md5_string()`: returning the MD5 checksum as a hexadecimal string,
|
||||||
|
- `subsequence(from,to)`: extracts a subsequence using a 0-based
coordinate system, upper bound excluded, as in Go.
|
||||||
|
- `reverse_complement`: returning a sequence object corresponding to the reverse complement
|
||||||
|
of the current sequence.
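The half-open, 0-based convention described for `subsequence(from,to)` matches Go slice expressions. The following is a minimal Go sketch of that convention only; the helper name `subsequence` and its error handling are assumptions made for illustration, not the Lua method or any OBITools4 API.

```go
package main

import (
	"errors"
	"fmt"
)

// subsequence returns seq[from:to] using 0-based coordinates with the
// upper bound excluded, so the result has length to-from.
// Hypothetical helper for illustration only; not part of OBITools4.
func subsequence(seq string, from, to int) (string, error) {
	if from < 0 || to > len(seq) || from > to {
		return "", errors.New("subsequence: coordinates out of range")
	}
	return seq[from:to], nil
}

func main() {
	sub, err := subsequence("ACGTACGT", 2, 5)
	if err != nil {
		fmt.Println(err)
		return
	}
	// Positions 2, 3 and 4 are kept; position 5 is excluded.
	fmt.Println(sub) // prints "GTA"
}
```

With this convention, `subsequence(s, 0, len(s))` returns the whole sequence and adjacent ranges such as `(0, 3)` and `(3, 6)` never overlap.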
|
||||||
|
|
||||||
|
### Change of git repository
|
||||||
|
|
||||||
|
- The OBITools4 git repository has been moved to GitHub.
|
||||||
|
The new address is: https://github.com/metabarcoding/obitools4.
|
||||||
|
Make sure to use the new install script to retrieve the new version.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
curl -L https://raw.githubusercontent.com/metabarcoding/obitools4/master/install_obitools.sh \
|
||||||
|
| bash
|
||||||
|
```
|
||||||
|
|
||||||
|
or with options:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
curl -L https://raw.githubusercontent.com/metabarcoding/obitools4/master/install_obitools.sh \
|
||||||
|
| bash -s -- --install-dir test_install --obitools-prefix k
|
||||||
|
```
|
||||||
|
|
||||||
### CPU limitation
|
### CPU limitation
|
||||||
|
|
||||||
- By default, *OBITools4* tries to use all the computing power available on
|
- By default, *OBITools4* tries to use all the computing power available on
|
||||||
@@ -20,6 +79,37 @@
|
|||||||
|
|
||||||
### New features
|
### New features
|
||||||
|
|
||||||
|
- The output of the obitools will evolve to produce results only in standard
|
||||||
|
formats such as fasta and fastq. For non-sequential data, the output will be
|
||||||
|
in CSV format, with the separator `,`, the decimal separator `.`, and a
|
||||||
|
header line with the column names. This makes the output easier to use
in other programs. For example, you can use the `csvtomd` command to
|
||||||
|
reformat the csv output into a markdown table. The first command to initiate
|
||||||
|
this change is `obicount`, which now produces a 3-line CSV output.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
obicount data.csv | csvtomd
|
||||||
|
```
|
||||||
|
|
||||||
|
- Adds the new experimental `obicleandb` utility to clean up reference
|
||||||
|
database files created with `obipcr`. An easy way to create a reference
|
||||||
|
database for `obitag` is to use `obipcr` on a local copy of Genbank or EMBL.
|
||||||
|
However, these sequence databases are known to contain many taxonomic
|
||||||
|
errors, such as bacterial sequences annotated with the taxid of their host
|
||||||
|
species. `obicleandb` tries to detect these errors. To do this, it first keeps
|
||||||
|
only sequences annotated with a taxid to which a species, genus, and
|
||||||
|
family taxid can be assigned. Then, for each sequence, it compares the
|
||||||
|
distance of the sequence to the other sequences belonging to the same genus
|
||||||
|
to the same number of distances between the considered sequence and a
|
||||||
|
randomly selected set of sequences belonging to another family using a
|
||||||
|
Mann-Whitney U test. The alternative hypothesis is that out-of-family
|
||||||
|
distances are greater than intrageneric distances. Sequences are annotated
|
||||||
|
with the p-value of the Mann-Whitney U test in the **obicleandb_trusted**
|
||||||
|
slot. Later, the distribution of this p-value can be analyzed to determine a
|
||||||
|
threshold. Empirically, a threshold of 0.05 is a good compromise: it filters out
less than 1‰ of the sequences. These sequences can then be
|
||||||
|
removed using `obigrep`.
|
||||||
|
|
||||||
- Adds a new `obijoin` utility to join information contained in a sequence
|
- Adds a new `obijoin` utility to join information contained in a sequence
|
||||||
file with that contained in another sequence or CSV file. The command allows
|
file with that contained in another sequence or CSV file. The command allows
|
||||||
you to specify the names of the keys in the main sequence file and in the
|
you to specify the names of the keys in the main sequence file and in the
|
||||||
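The one-sided Mann-Whitney U test described in the `obicleandb` release note above can be sketched in Go as follows. This is an illustration under stated assumptions, not the `obicleandb` implementation: it uses the normal approximation without tie or continuity correction, the function name `mannWhitneyGreater` is hypothetical, and the distances in `main` are made up.

```go
package main

import (
	"fmt"
	"math"
)

// mannWhitneyGreater returns the one-sided p-value for the alternative
// "values in outer tend to be greater than values in inner", using the
// normal approximation without tie or continuity correction.
// Hypothetical helper; the real obicleandb code may compute this differently.
func mannWhitneyGreater(inner, outer []float64) float64 {
	n1, n2 := float64(len(outer)), float64(len(inner))
	if n1 == 0 || n2 == 0 {
		return math.NaN()
	}
	// U counts the (outer, inner) pairs where the outer distance is larger;
	// ties contribute one half.
	u := 0.0
	for _, o := range outer {
		for _, i := range inner {
			switch {
			case o > i:
				u++
			case o == i:
				u += 0.5
			}
		}
	}
	mean := n1 * n2 / 2
	sd := math.Sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
	z := (u - mean) / sd
	// Upper-tail probability of the standard normal distribution.
	return 0.5 * math.Erfc(z/math.Sqrt2)
}

func main() {
	// Made-up distances: same-genus sequences are close, other families are far.
	intrageneric := []float64{0.01, 0.02, 0.03, 0.02, 0.04}
	outOfFamily := []float64{0.20, 0.25, 0.30, 0.22, 0.28}
	p := mannWhitneyGreater(intrageneric, outOfFamily)
	fmt.Printf("p-value = %.4f, below 0.05: %v\n", p, p < 0.05)
}
```

A small p-value supports the alternative hypothesis stated in the note, namely that out-of-family distances exceed intrageneric ones. Comparing the p-value against a threshold such as 0.05 then mirrors the filtering step that the note assigns to `obigrep`.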
|
|||||||
@@ -3,13 +3,11 @@ package main
|
|||||||
import (
|
import (
|
||||||
"os"
|
"os"
|
||||||
|
|
||||||
log "github.com/sirupsen/logrus"
|
|
||||||
|
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiiter"
|
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiannotate"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiannotate"
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiconvert"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiconvert"
|
||||||
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
|
||||||
)
|
)
|
||||||
|
|
||||||
func main() {
|
func main() {
|
||||||
@@ -37,15 +35,11 @@ func main() {
|
|||||||
_, args := optionParser(os.Args)
|
_, args := optionParser(os.Args)
|
||||||
|
|
||||||
sequences, err := obiconvert.CLIReadBioSequences(args...)
|
sequences, err := obiconvert.CLIReadBioSequences(args...)
|
||||||
|
obiconvert.OpenSequenceDataErrorMessage(args, err)
|
||||||
if err != nil {
|
|
||||||
log.Errorf("Cannot open file (%v)", err)
|
|
||||||
os.Exit(1)
|
|
||||||
}
|
|
||||||
|
|
||||||
annotator := obiannotate.CLIAnnotationPipeline()
|
annotator := obiannotate.CLIAnnotationPipeline()
|
||||||
obiconvert.CLIWriteBioSequences(sequences.Pipe(annotator), true)
|
obiconvert.CLIWriteBioSequences(sequences.Pipe(annotator), true)
|
||||||
|
|
||||||
obiiter.WaitForLastPipe()
|
obiutils.WaitForLastPipe()
|
||||||
|
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -3,11 +3,9 @@ package main
|
|||||||
import (
|
import (
|
||||||
"os"
|
"os"
|
||||||
|
|
||||||
log "github.com/sirupsen/logrus"
|
|
||||||
|
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiiter"
|
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiclean"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiclean"
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiconvert"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiconvert"
|
||||||
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
|
||||||
|
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
|
||||||
)
|
)
|
||||||
@@ -18,16 +16,12 @@ func main() {
|
|||||||
_, args := optionParser(os.Args)
|
_, args := optionParser(os.Args)
|
||||||
|
|
||||||
fs, err := obiconvert.CLIReadBioSequences(args...)
|
fs, err := obiconvert.CLIReadBioSequences(args...)
|
||||||
|
obiconvert.OpenSequenceDataErrorMessage(args, err)
|
||||||
if err != nil {
|
|
||||||
log.Errorf("Cannot open file (%v)", err)
|
|
||||||
os.Exit(1)
|
|
||||||
}
|
|
||||||
|
|
||||||
cleaned := obiclean.CLIOBIClean(fs)
|
cleaned := obiclean.CLIOBIClean(fs)
|
||||||
|
|
||||||
obiconvert.CLIWriteBioSequences(cleaned, true)
|
obiconvert.CLIWriteBioSequences(cleaned, true)
|
||||||
|
|
||||||
obiiter.WaitForLastPipe()
|
obiutils.WaitForLastPipe()
|
||||||
|
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -3,33 +3,28 @@ package main
|
|||||||
import (
|
import (
|
||||||
"os"
|
"os"
|
||||||
|
|
||||||
log "github.com/sirupsen/logrus"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obidefault"
|
||||||
|
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiiter"
|
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obicleandb"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obicleandb"
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiconvert"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiconvert"
|
||||||
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
|
||||||
|
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
|
||||||
)
|
)
|
||||||
|
|
||||||
func main() {
|
func main() {
|
||||||
obioptions.SetBatchSize(10)
|
obidefault.SetBatchSize(10)
|
||||||
|
|
||||||
optionParser := obioptions.GenerateOptionParser(obicleandb.OptionSet)
|
optionParser := obioptions.GenerateOptionParser(obicleandb.OptionSet)
|
||||||
|
|
||||||
_, args := optionParser(os.Args)
|
_, args := optionParser(os.Args)
|
||||||
|
|
||||||
fs, err := obiconvert.CLIReadBioSequences(args...)
|
fs, err := obiconvert.CLIReadBioSequences(args...)
|
||||||
|
obiconvert.OpenSequenceDataErrorMessage(args, err)
|
||||||
if err != nil {
|
|
||||||
log.Errorf("Cannot open file (%v)", err)
|
|
||||||
os.Exit(1)
|
|
||||||
}
|
|
||||||
|
|
||||||
cleaned := obicleandb.ICleanDB(fs)
|
cleaned := obicleandb.ICleanDB(fs)
|
||||||
|
|
||||||
toconsume, _ := obiconvert.CLIWriteBioSequences(cleaned, false)
|
toconsume, _ := obiconvert.CLIWriteBioSequences(cleaned, false)
|
||||||
toconsume.Consume()
|
toconsume.Consume()
|
||||||
|
|
||||||
obiiter.WaitForLastPipe()
|
obiutils.WaitForLastPipe()
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -3,11 +3,9 @@ package main
|
|||||||
import (
|
import (
|
||||||
"os"
|
"os"
|
||||||
|
|
||||||
log "github.com/sirupsen/logrus"
|
|
||||||
|
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiiter"
|
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiconvert"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiconvert"
|
||||||
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
|
||||||
|
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
|
||||||
)
|
)
|
||||||
@@ -18,15 +16,11 @@ func main() {
|
|||||||
_, args := optionParser(os.Args)
|
_, args := optionParser(os.Args)
|
||||||
|
|
||||||
fs, err := obiconvert.CLIReadBioSequences(args...)
|
fs, err := obiconvert.CLIReadBioSequences(args...)
|
||||||
|
obiconvert.OpenSequenceDataErrorMessage(args, err)
|
||||||
if err != nil {
|
|
||||||
log.Errorf("Cannot open file (%v)", err)
|
|
||||||
os.Exit(1)
|
|
||||||
}
|
|
||||||
|
|
||||||
comp := fs.MakeIWorker(obiseq.ReverseComplementWorker(true), true)
|
comp := fs.MakeIWorker(obiseq.ReverseComplementWorker(true), true)
|
||||||
obiconvert.CLIWriteBioSequences(comp, true)
|
obiconvert.CLIWriteBioSequences(comp, true)
|
||||||
|
|
||||||
obiiter.WaitForLastPipe()
|
obiutils.WaitForLastPipe()
|
||||||
|
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -3,11 +3,9 @@ package main
|
|||||||
import (
|
import (
|
||||||
"os"
|
"os"
|
||||||
|
|
||||||
log "github.com/sirupsen/logrus"
|
|
||||||
|
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiiter"
|
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiconsensus"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiconsensus"
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiconvert"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiconvert"
|
||||||
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
|
||||||
|
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
|
||||||
)
|
)
|
||||||
@@ -18,16 +16,12 @@ func main() {
|
|||||||
_, args := optionParser(os.Args)
|
_, args := optionParser(os.Args)
|
||||||
|
|
||||||
fs, err := obiconvert.CLIReadBioSequences(args...)
|
fs, err := obiconvert.CLIReadBioSequences(args...)
|
||||||
|
obiconvert.OpenSequenceDataErrorMessage(args, err)
|
||||||
if err != nil {
|
|
||||||
log.Errorf("Cannot open file (%v)", err)
|
|
||||||
os.Exit(1)
|
|
||||||
}
|
|
||||||
|
|
||||||
cleaned := obiconsensus.CLIOBIMinion(fs)
|
cleaned := obiconsensus.CLIOBIMinion(fs)
|
||||||
|
|
||||||
obiconvert.CLIWriteBioSequences(cleaned, true)
|
obiconvert.CLIWriteBioSequences(cleaned, true)
|
||||||
|
|
||||||
obiiter.WaitForLastPipe()
|
obiutils.WaitForLastPipe()
|
||||||
|
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -3,31 +3,26 @@ package main
|
|||||||
import (
|
import (
|
||||||
"os"
|
"os"
|
||||||
|
|
||||||
log "github.com/sirupsen/logrus"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obidefault"
|
||||||
|
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiiter"
|
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiconvert"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiconvert"
|
||||||
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
|
||||||
|
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
|
||||||
)
|
)
|
||||||
|
|
||||||
func main() {
|
func main() {
|
||||||
obioptions.SetStrictReadWorker(2)
|
obidefault.SetStrictReadWorker(2)
|
||||||
obioptions.SetStrictWriteWorker(2)
|
obidefault.SetStrictWriteWorker(2)
|
||||||
|
|
||||||
optionParser := obioptions.GenerateOptionParser(obiconvert.OptionSet)
|
optionParser := obioptions.GenerateOptionParser(obiconvert.OptionSet)
|
||||||
|
|
||||||
_, args := optionParser(os.Args)
|
_, args := optionParser(os.Args)
|
||||||
|
|
||||||
fs, err := obiconvert.CLIReadBioSequences(args...)
|
fs, err := obiconvert.CLIReadBioSequences(args...)
|
||||||
|
obiconvert.OpenSequenceDataErrorMessage(args, err)
|
||||||
if err != nil {
|
|
||||||
log.Errorf("Cannot open file (%v)", err)
|
|
||||||
os.Exit(1)
|
|
||||||
}
|
|
||||||
|
|
||||||
obiconvert.CLIWriteBioSequences(fs, true)
|
obiconvert.CLIWriteBioSequences(fs, true)
|
||||||
|
|
||||||
obiiter.WaitForLastPipe()
|
obiutils.WaitForLastPipe()
|
||||||
|
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -4,8 +4,7 @@ import (
|
|||||||
"fmt"
|
"fmt"
|
||||||
"os"
|
"os"
|
||||||
|
|
||||||
log "github.com/sirupsen/logrus"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obidefault"
|
||||||
|
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiconvert"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiconvert"
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obicount"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obicount"
|
||||||
|
|
||||||
@@ -35,27 +34,24 @@ func main() {
|
|||||||
|
|
||||||
_, args := optionParser(os.Args)
|
_, args := optionParser(os.Args)
|
||||||
|
|
||||||
obioptions.SetStrictReadWorker(min(4, obioptions.CLIParallelWorkers()))
|
obidefault.SetStrictReadWorker(min(4, obidefault.ParallelWorkers()))
|
||||||
fs, err := obiconvert.CLIReadBioSequences(args...)
|
fs, err := obiconvert.CLIReadBioSequences(args...)
|
||||||
|
obiconvert.OpenSequenceDataErrorMessage(args, err)
|
||||||
if err != nil {
|
|
||||||
log.Errorf("Cannot open file (%v)", err)
|
|
||||||
os.Exit(1)
|
|
||||||
}
|
|
||||||
|
|
||||||
nvariant, nread, nsymbol := fs.Count(true)
|
nvariant, nread, nsymbol := fs.Count(true)
|
||||||
|
|
||||||
|
fmt.Print("entities,n\n")
|
||||||
|
|
||||||
if obicount.CLIIsPrintingVariantCount() {
|
if obicount.CLIIsPrintingVariantCount() {
|
||||||
fmt.Printf(" %d", nvariant)
|
fmt.Printf("variants,%d\n", nvariant)
|
||||||
}
|
}
|
||||||
|
|
||||||
if obicount.CLIIsPrintingReadCount() {
|
if obicount.CLIIsPrintingReadCount() {
|
||||||
fmt.Printf(" %d", nread)
|
fmt.Printf("reads,%d\n", nread)
|
||||||
}
|
}
|
||||||
|
|
||||||
if obicount.CLIIsPrintingSymbolCount() {
|
if obicount.CLIIsPrintingSymbolCount() {
|
||||||
fmt.Printf(" %d", nsymbol)
|
fmt.Printf("symbols,%d\n", nsymbol)
|
||||||
}
|
}
|
||||||
|
|
||||||
fmt.Printf("\n")
|
|
||||||
}
|
}
|
||||||
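The new `obicount` code above prints a header line `entities,n` followed by one `name,count` row per requested count. A minimal Go sketch of how a downstream program might read that layout is given below; the function name `parseCounts` and the counts in `main` are made up for illustration.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

// parseCounts reads the two-column CSV (header "entities,n") and returns
// the counts keyed by entity name. Hypothetical helper, not part of OBITools4.
func parseCounts(text string) (map[string]int, error) {
	records, err := csv.NewReader(strings.NewReader(text)).ReadAll()
	if err != nil {
		return nil, err
	}
	counts := make(map[string]int)
	for i, rec := range records {
		if i == 0 {
			continue // skip the header line
		}
		n, err := strconv.Atoi(rec[1])
		if err != nil {
			return nil, err
		}
		counts[rec[0]] = n
	}
	return counts, nil
}

func main() {
	// Made-up values in the layout printed by the new code.
	output := "entities,n\nvariants,3\nreads,10\nsymbols,300\n"
	counts, err := parseCounts(output)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(counts["variants"], counts["reads"], counts["symbols"]) // prints "3 10 300"
}
```

Because each count is on its own labelled row, a consumer does not need to know which of the three counts were requested on the command line.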
|
|||||||
@@ -3,12 +3,10 @@ package main
|
|||||||
import (
|
import (
|
||||||
"os"
|
"os"
|
||||||
|
|
||||||
log "github.com/sirupsen/logrus"
|
|
||||||
|
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiiter"
|
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiconvert"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiconvert"
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obicsv"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obicsv"
|
||||||
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
|
||||||
)
|
)
|
||||||
|
|
||||||
func main() {
|
func main() {
|
||||||
@@ -17,14 +15,10 @@ func main() {
|
|||||||
_, args := optionParser(os.Args)
|
_, args := optionParser(os.Args)
|
||||||
|
|
||||||
fs, err := obiconvert.CLIReadBioSequences(args...)
|
fs, err := obiconvert.CLIReadBioSequences(args...)
|
||||||
|
obiconvert.OpenSequenceDataErrorMessage(args, err)
|
||||||
|
|
||||||
if err != nil {
|
obicsv.CLIWriteSequenceCSV(fs, true)
|
||||||
log.Errorf("Cannot open file (%v)", err)
|
|
||||||
os.Exit(1)
|
|
||||||
}
|
|
||||||
|
|
||||||
obicsv.CLIWriteCSV(fs, true)
|
obiutils.WaitForLastPipe()
|
||||||
|
|
||||||
obiiter.WaitForLastPipe()
|
|
||||||
|
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -3,34 +3,29 @@ package main
|
|||||||
import (
|
import (
|
||||||
"os"
|
"os"
|
||||||
|
|
||||||
log "github.com/sirupsen/logrus"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obidefault"
|
||||||
|
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiiter"
|
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiconvert"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiconvert"
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obidemerge"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obidemerge"
|
||||||
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
|
||||||
|
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
|
||||||
)
|
)
|
||||||
|
|
||||||
func main() {
|
func main() {
|
||||||
obioptions.SetStrictReadWorker(2)
|
obidefault.SetStrictReadWorker(2)
|
||||||
obioptions.SetStrictWriteWorker(2)
|
obidefault.SetStrictWriteWorker(2)
|
||||||
|
|
||||||
optionParser := obioptions.GenerateOptionParser(obidemerge.OptionSet)
|
optionParser := obioptions.GenerateOptionParser(obidemerge.OptionSet)
|
||||||
|
|
||||||
_, args := optionParser(os.Args)
|
_, args := optionParser(os.Args)
|
||||||
|
|
||||||
fs, err := obiconvert.CLIReadBioSequences(args...)
|
fs, err := obiconvert.CLIReadBioSequences(args...)
|
||||||
|
obiconvert.OpenSequenceDataErrorMessage(args, err)
|
||||||
if err != nil {
|
|
||||||
log.Errorf("Cannot open file (%v)", err)
|
|
||||||
os.Exit(1)
|
|
||||||
}
|
|
||||||
|
|
||||||
demerged := obidemerge.CLIDemergeSequences(fs)
|
demerged := obidemerge.CLIDemergeSequences(fs)
|
||||||
|
|
||||||
obiconvert.CLIWriteBioSequences(demerged, true)
|
obiconvert.CLIWriteBioSequences(demerged, true)
|
||||||
|
|
||||||
obiiter.WaitForLastPipe()
|
obiutils.WaitForLastPipe()
|
||||||
|
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -3,11 +3,9 @@ package main
|
|||||||
import (
|
import (
|
||||||
"os"
|
"os"
|
||||||
|
|
||||||
log "github.com/sirupsen/logrus"
|
|
||||||
|
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiiter"
|
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiconvert"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiconvert"
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obidistribute"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obidistribute"
|
||||||
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
|
||||||
|
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
|
||||||
)
|
)
|
||||||
@@ -18,14 +16,10 @@ func main() {
|
|||||||
_, args := optionParser(os.Args)
|
_, args := optionParser(os.Args)
|
||||||
|
|
||||||
fs, err := obiconvert.CLIReadBioSequences(args...)
|
fs, err := obiconvert.CLIReadBioSequences(args...)
|
||||||
|
obiconvert.OpenSequenceDataErrorMessage(args, err)
|
||||||
if err != nil {
|
|
||||||
log.Errorf("Cannot open file (%v)", err)
|
|
||||||
os.Exit(1)
|
|
||||||
}
|
|
||||||
|
|
||||||
obidistribute.CLIDistributeSequence(fs)
|
obidistribute.CLIDistributeSequence(fs)
|
||||||
|
|
||||||
obiiter.WaitForLastPipe()
|
obiutils.WaitForLastPipe()
|
||||||
|
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -1,68 +0,0 @@
|
|||||||
package main
|
|
||||||
|
|
||||||
import (
|
|
||||||
"fmt"
|
|
||||||
"os"
|
|
||||||
|
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
|
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obifind"
|
|
||||||
)
|
|
||||||
|
|
||||||
func main() {
|
|
||||||
optionParser := obioptions.GenerateOptionParser(obifind.OptionSet)
|
|
||||||
|
|
||||||
_, args := optionParser(os.Args)
|
|
||||||
|
|
||||||
//prof, _ := os.Create("obifind.prof")
|
|
||||||
//pprof.StartCPUProfile(prof)
|
|
||||||
|
|
||||||
restrictions, err := obifind.ITaxonRestrictions()
|
|
||||||
if err != nil {
|
|
||||||
fmt.Printf("%+v", err)
|
|
||||||
}
|
|
||||||
|
|
||||||
switch {
|
|
||||||
case obifind.CLIRequestsPathForTaxid() >= 0:
|
|
||||||
taxonomy, err := obifind.CLILoadSelectedTaxonomy()
|
|
||||||
if err != nil {
|
|
||||||
fmt.Printf("%+v", err)
|
|
||||||
}
|
|
||||||
|
|
||||||
taxon, err := taxonomy.Taxon(obifind.CLIRequestsPathForTaxid())
|
|
||||||
|
|
||||||
if err != nil {
|
|
||||||
fmt.Printf("%+v", err)
|
|
||||||
}
|
|
||||||
|
|
||||||
s, err := taxon.Path()
|
|
||||||
|
|
||||||
if err != nil {
|
|
||||||
fmt.Printf("%+v", err)
|
|
||||||
}
|
|
||||||
|
|
||||||
obifind.TaxonWriter(s.Iterator(),
|
|
||||||
fmt.Sprintf("path:%d", taxon.Taxid()))
|
|
||||||
|
|
||||||
case len(args) == 0:
|
|
||||||
taxonomy, err := obifind.CLILoadSelectedTaxonomy()
|
|
||||||
if err != nil {
|
|
||||||
fmt.Printf("%+v", err)
|
|
||||||
}
|
|
||||||
|
|
||||||
obifind.TaxonWriter(restrictions(taxonomy.Iterator()), "")
|
|
||||||
|
|
||||||
default:
|
|
||||||
matcher, err := obifind.ITaxonNameMatcher()
|
|
||||||
|
|
||||||
if err != nil {
|
|
||||||
fmt.Printf("%+v", err)
|
|
||||||
}
|
|
||||||
|
|
||||||
for _, pattern := range args {
|
|
||||||
s := restrictions(matcher(pattern))
|
|
||||||
obifind.TaxonWriter(s, pattern)
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
//pprof.StopCPUProfile()
|
|
||||||
}
|
|
||||||
@@ -3,13 +3,11 @@ package main
|
|||||||
import (
|
import (
|
||||||
"os"
|
"os"
|
||||||
|
|
||||||
log "github.com/sirupsen/logrus"
|
|
||||||
|
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiiter"
|
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiconvert"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiconvert"
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obigrep"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obigrep"
|
||||||
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
|
||||||
)
|
)
|
||||||
|
|
||||||
func main() {
|
func main() {
|
||||||
@@ -37,13 +35,10 @@ func main() {
|
|||||||
_, args := optionParser(os.Args)
|
_, args := optionParser(os.Args)
|
||||||
|
|
||||||
sequences, err := obiconvert.CLIReadBioSequences(args...)
|
sequences, err := obiconvert.CLIReadBioSequences(args...)
|
||||||
|
obiconvert.OpenSequenceDataErrorMessage(args, err)
|
||||||
|
|
||||||
if err != nil {
|
|
||||||
log.Errorf("Cannot open file (%v)", err)
|
|
||||||
os.Exit(1)
|
|
||||||
}
|
|
||||||
selected := obigrep.CLIFilterSequence(sequences)
|
selected := obigrep.CLIFilterSequence(sequences)
|
||||||
obiconvert.CLIWriteBioSequences(selected, true)
|
obiconvert.CLIWriteBioSequences(selected, true)
|
||||||
obiiter.WaitForLastPipe()
|
obiutils.WaitForLastPipe()
|
||||||
|
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -3,34 +3,29 @@ package main
 import (
 	"os"
 
-	log "github.com/sirupsen/logrus"
-
-	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiiter"
+	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obidefault"
 	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiconvert"
 	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obijoin"
+	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
 
 	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
 )
 
 func main() {
-	obioptions.SetStrictReadWorker(2)
-	obioptions.SetStrictWriteWorker(2)
+	obidefault.SetStrictReadWorker(2)
+	obidefault.SetStrictWriteWorker(2)
 
 	optionParser := obioptions.GenerateOptionParser(obijoin.OptionSet)
 
 	_, args := optionParser(os.Args)
 
 	fs, err := obiconvert.CLIReadBioSequences(args...)
-
-	if err != nil {
-		log.Errorf("Cannot open file (%v)", err)
-		os.Exit(1)
-	}
+	obiconvert.OpenSequenceDataErrorMessage(args, err)
 
 	joined := obijoin.CLIJoinSequences(fs)
 
 	obiconvert.CLIWriteBioSequences(joined, true)
 
-	obiiter.WaitForLastPipe()
+	obiutils.WaitForLastPipe()
 
 }
51
cmd/obitools/obikmermatch/main.go
Normal file
@@ -0,0 +1,51 @@
+package main
+
+import (
+	"os"
+
+	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiiter"
+	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
+	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
+	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiconvert"
+	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obikmersim"
+	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
+)
+
+func main() {
+
+	defer obiseq.LogBioSeqStatus()
+
+	// go tool pprof -http=":8000" ./obipairing ./cpu.pprof
+	// f, err := os.Create("cpu.pprof")
+	// if err != nil {
+	// log.Fatal(err)
+	// }
+	// pprof.StartCPUProfile(f)
+	// defer pprof.StopCPUProfile()
+
+	// go tool trace cpu.trace
+	// ftrace, err := os.Create("cpu.trace")
+	// if err != nil {
+	// log.Fatal(err)
+	// }
+	// trace.Start(ftrace)
+	// defer trace.Stop()
+
+	optionParser := obioptions.GenerateOptionParser(obikmersim.MatchOptionSet)
+
+	_, args := optionParser(os.Args)
+
+	var err error
+	sequences := obiiter.NilIBioSequence
+
+	if !obikmersim.CLISelf() {
+		sequences, err = obiconvert.CLIReadBioSequences(args...)
+	}
+
+	obiconvert.OpenSequenceDataErrorMessage(args, err)
+
+	selected := obikmersim.CLIAlignSequences(sequences)
+	obiconvert.CLIWriteBioSequences(selected, true)
+	obiutils.WaitForLastPipe()
+
+}
59
cmd/obitools/obikmersimcount/main.go
Normal file
@@ -0,0 +1,59 @@
+package main
+
+import (
+	"log"
+	"os"
+
+	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiiter"
+	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
+	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
+	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiconvert"
+	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obikmersim"
+	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
+)
+
+func main() {
+
+	defer obiseq.LogBioSeqStatus()
+
+	// go tool pprof -http=":8000" ./obipairing ./cpu.pprof
+	// f, err := os.Create("cpu.pprof")
+	// if err != nil {
+	// log.Fatal(err)
+	// }
+	// pprof.StartCPUProfile(f)
+	// defer pprof.StopCPUProfile()
+
+	// go tool trace cpu.trace
+	// ftrace, err := os.Create("cpu.trace")
+	// if err != nil {
+	// log.Fatal(err)
+	// }
+	// trace.Start(ftrace)
+	// defer trace.Stop()
+
+	optionParser := obioptions.GenerateOptionParser(obikmersim.CountOptionSet)
+
+	_, args := optionParser(os.Args)
+
+	var err error
+	sequences := obiiter.NilIBioSequence
+
+	if !obikmersim.CLISelf() {
+		sequences, err = obiconvert.CLIReadBioSequences(args...)
+	}
+
+	obiconvert.OpenSequenceDataErrorMessage(args, err)
+
+	counted := obikmersim.CLILookForSharedKmers(sequences)
+	topull, err := obiconvert.CLIWriteBioSequences(counted, false)
+
+	if err != nil {
+		log.Panic(err)
+	}
+
+	topull.Consume()
+
+	obiutils.WaitForLastPipe()
+
+}
@@ -3,11 +3,9 @@ package main
 import (
 	"os"
 
-	log "github.com/sirupsen/logrus"
-
-	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiiter"
 	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiconvert"
 	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obilandmark"
+	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
 
 	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
 )
@@ -18,15 +16,11 @@ func main() {
 	_, args := optionParser(os.Args)
 
 	fs, err := obiconvert.CLIReadBioSequences(args...)
-
-	if err != nil {
-		log.Errorf("Cannot open file (%v)", err)
-		os.Exit(1)
-	}
+	obiconvert.OpenSequenceDataErrorMessage(args, err)
 
 	indexed := obilandmark.CLISelectLandmarkSequences(fs)
 
 	obiconvert.CLIWriteBioSequences(indexed, true)
-	obiiter.WaitForLastPipe()
+	obiutils.WaitForLastPipe()
 
 }
@@ -4,8 +4,6 @@ import (
 	"fmt"
 	"os"
 
-	log "github.com/sirupsen/logrus"
-
 	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
 	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
 	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiconvert"
@@ -39,11 +37,7 @@ func main() {
 	_, args := optionParser(os.Args)
 
 	fs, err := obiconvert.CLIReadBioSequences(args...)
-
-	if err != nil {
-		log.Errorf("Cannot open file (%v)", err)
-		os.Exit(1)
-	}
+	obiconvert.OpenSequenceDataErrorMessage(args, err)
 
 	matrix := obimatrix.IMatrix(fs)
 
44
cmd/obitools/obimicrosat/main.go
Normal file
@@ -0,0 +1,44 @@
+package main
+
+import (
+	"os"
+
+	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
+	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
+	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiconvert"
+	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obimicrosat"
+	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
+)
+
+func main() {
+
+	defer obiseq.LogBioSeqStatus()
+
+	// go tool pprof -http=":8000" ./obipairing ./cpu.pprof
+	// f, err := os.Create("cpu.pprof")
+	// if err != nil {
+	// log.Fatal(err)
+	// }
+	// pprof.StartCPUProfile(f)
+	// defer pprof.StopCPUProfile()
+
+	// go tool trace cpu.trace
+	// ftrace, err := os.Create("cpu.trace")
+	// if err != nil {
+	// log.Fatal(err)
+	// }
+	// trace.Start(ftrace)
+	// defer trace.Stop()
+
+	optionParser := obioptions.GenerateOptionParser(obimicrosat.OptionSet)
+
+	_, args := optionParser(os.Args)
+
+	sequences, err := obiconvert.CLIReadBioSequences(args...)
+	obiconvert.OpenSequenceDataErrorMessage(args, err)
+
+	selected := obimicrosat.CLIAnnotateMicrosat(sequences)
+	obiconvert.CLIWriteBioSequences(selected, true)
+	obiutils.WaitForLastPipe()
+
+}
@@ -6,10 +6,10 @@ import (
 
 	log "github.com/sirupsen/logrus"
 
-	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiiter"
 	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
 	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiconvert"
 	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obimultiplex"
+	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
 )
 
 func main() {
@@ -43,14 +43,11 @@ func main() {
 	}
 
 	sequences, err := obiconvert.CLIReadBioSequences(args...)
+	obiconvert.OpenSequenceDataErrorMessage(args, err)
 
-	if err != nil {
-		log.Errorf("Cannot open file (%v)", err)
-		os.Exit(1)
-	}
 	amplicons, _ := obimultiplex.IExtractBarcode(sequences)
 	obiconvert.CLIWriteBioSequences(amplicons, true)
 	amplicons.Wait()
-	obiiter.WaitForLastPipe()
+	obiutils.WaitForLastPipe()
 
 }
@@ -5,10 +5,11 @@ import (
 
 	log "github.com/sirupsen/logrus"
 
-	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiiter"
+	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obidefault"
 	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
 	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiconvert"
 	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obipairing"
+	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
 )
 
 func main() {
@@ -33,8 +34,8 @@ func main() {
 
 	optionParser(os.Args)
 
-	obioptions.SetStrictReadWorker(2)
-	obioptions.SetStrictWriteWorker(2)
+	obidefault.SetStrictReadWorker(2)
+	obidefault.SetStrictWriteWorker(2)
 	pairs, err := obipairing.CLIPairedSequence()
 
 	if err != nil {
@@ -51,10 +52,10 @@ func main() {
 		obipairing.CLIFastMode(),
 		obipairing.CLIFastRelativeScore(),
 		obipairing.CLIWithStats(),
-		obioptions.CLIParallelWorkers(),
+		obidefault.ParallelWorkers(),
 	)
 
 	obiconvert.CLIWriteBioSequences(paired, true)
 
-	obiiter.WaitForLastPipe()
+	obiutils.WaitForLastPipe()
 }
@@ -3,12 +3,11 @@ package main
 import (
 	"os"
 
-	log "github.com/sirupsen/logrus"
-
-	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiiter"
+	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obidefault"
 	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
 	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiconvert"
 	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obipcr"
+	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
 )
 
 func main() {
@@ -25,24 +24,20 @@ func main() {
 	// trace.Start(ftrace)
 	// defer trace.Stop()
 
-	obioptions.SetWorkerPerCore(2)
-	obioptions.SetReadWorkerPerCore(0.5)
-	obioptions.SetParallelFilesRead(obioptions.CLIParallelWorkers() / 4)
-	obioptions.SetBatchSize(10)
+	obidefault.SetWorkerPerCore(2)
+	obidefault.SetReadWorkerPerCore(0.5)
+	obidefault.SetParallelFilesRead(obidefault.ParallelWorkers() / 4)
+	obidefault.SetBatchSize(10)
 
 	optionParser := obioptions.GenerateOptionParser(obipcr.OptionSet)
 
 	_, args := optionParser(os.Args)
 
 	sequences, err := obiconvert.CLIReadBioSequences(args...)
-
-	if err != nil {
-		log.Errorf("Cannot open file (%v)", err)
-		os.Exit(1)
-	}
+	obiconvert.OpenSequenceDataErrorMessage(args, err)
 
 	amplicons, _ := obipcr.CLIPCR(sequences)
 	obiconvert.CLIWriteBioSequences(amplicons, true)
-	obiiter.WaitForLastPipe()
+	obiutils.WaitForLastPipe()
 
 }
@@ -3,11 +3,9 @@ package main
 import (
 	"os"
 
-	log "github.com/sirupsen/logrus"
-
-	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiiter"
 	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiconvert"
 	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obirefidx"
+	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
 
 	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
 )
@@ -18,15 +16,11 @@ func main() {
 	_, args := optionParser(os.Args)
 
 	fs, err := obiconvert.CLIReadBioSequences(args...)
-
-	if err != nil {
-		log.Errorf("Cannot open file (%v)", err)
-		os.Exit(1)
-	}
+	obiconvert.OpenSequenceDataErrorMessage(args, err)
 
 	indexed := obirefidx.IndexFamilyDB(fs)
 
 	obiconvert.CLIWriteBioSequences(indexed, true)
-	obiiter.WaitForLastPipe()
+	obiutils.WaitForLastPipe()
 
 }
@@ -3,11 +3,9 @@ package main
 import (
 	"os"
 
-	log "github.com/sirupsen/logrus"
-
-	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiiter"
 	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiconvert"
 	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obirefidx"
+	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
 
 	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
 )
@@ -18,14 +16,11 @@ func main() {
 	_, args := optionParser(os.Args)
 
 	fs, err := obiconvert.CLIReadBioSequences(args...)
+	obiconvert.OpenSequenceDataErrorMessage(args, err)
 
-	if err != nil {
-		log.Errorf("Cannot open file (%v)", err)
-		os.Exit(1)
-	}
 	indexed := obirefidx.IndexReferenceDB(fs)
 
 	obiconvert.CLIWriteBioSequences(indexed, true)
-	obiiter.WaitForLastPipe()
+	obiutils.WaitForLastPipe()
 
 }
@@ -4,13 +4,11 @@ import (
 	"fmt"
 	"os"
 
-	log "github.com/sirupsen/logrus"
-
-	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiiter"
 	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
 	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
 	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiconvert"
 	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiscript"
+	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
 )
 
 func main() {
@@ -43,15 +41,11 @@ func main() {
 	}
 
 	sequences, err := obiconvert.CLIReadBioSequences(args...)
-
-	if err != nil {
-		log.Errorf("Cannot open file (%v)", err)
-		os.Exit(1)
-	}
+	obiconvert.OpenSequenceDataErrorMessage(args, err)
 
 	annotator := obiscript.CLIScriptPipeline()
 	obiconvert.CLIWriteBioSequences(sequences.Pipe(annotator), true)
 
-	obiiter.WaitForLastPipe()
+	obiutils.WaitForLastPipe()
 
 }
@@ -4,13 +4,11 @@ import (
 	"fmt"
 	"os"
 
-	log "github.com/sirupsen/logrus"
-
-	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiiter"
 	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
 	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
 	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiconvert"
 	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obisplit"
+	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
 )
 
 func main() {
@@ -43,15 +41,11 @@ func main() {
 	}
 
 	sequences, err := obiconvert.CLIReadBioSequences(args...)
-
-	if err != nil {
-		log.Errorf("Cannot open file (%v)", err)
-		os.Exit(1)
-	}
+	obiconvert.OpenSequenceDataErrorMessage(args, err)
 
 	annotator := obisplit.CLISlitPipeline()
 	obiconvert.CLIWriteBioSequences(sequences.Pipe(annotator), true)
 
-	obiiter.WaitForLastPipe()
+	obiutils.WaitForLastPipe()
 
 }
@@ -5,7 +5,6 @@ import (
 	"fmt"
 	"os"
 
-	log "github.com/sirupsen/logrus"
 	"gopkg.in/yaml.v3"
 
 	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
@@ -39,11 +38,7 @@ func main() {
 	_, args := optionParser(os.Args)
 
 	fs, err := obiconvert.CLIReadBioSequences(args...)
-
-	if err != nil {
-		log.Errorf("Cannot open file (%v)", err)
-		os.Exit(1)
-	}
+	obiconvert.OpenSequenceDataErrorMessage(args, err)
 
 	summary := obisummary.ISummary(fs, obisummary.CLIMapSummary())
 
@@ -6,10 +6,12 @@ import (
 
 	log "github.com/sirupsen/logrus"
 
+	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obidefault"
 	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiiter"
+	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitax"
 	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiconvert"
-	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obifind"
 	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obitag"
+	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
 
 	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
 )
@@ -32,25 +34,21 @@ func main() {
 	// trace.Start(ftrace)
 	// defer trace.Stop()
 
-	obioptions.SetWorkerPerCore(2)
-	obioptions.SetStrictReadWorker(1)
-	obioptions.SetStrictWriteWorker(1)
-	obioptions.SetBatchSize(10)
+	obidefault.SetWorkerPerCore(2)
+	obidefault.SetStrictReadWorker(1)
+	obidefault.SetStrictWriteWorker(1)
+	obidefault.SetBatchSize(10)
 
 	optionParser := obioptions.GenerateOptionParser(obitag.OptionSet)
 
 	_, args := optionParser(os.Args)
 
 	fs, err := obiconvert.CLIReadBioSequences(args...)
+	obiconvert.OpenSequenceDataErrorMessage(args, err)
 
-	if err != nil {
-		log.Errorf("Cannot open file (%v)", err)
-		os.Exit(1)
-	}
-
-	taxo, error := obifind.CLILoadSelectedTaxonomy()
-	if error != nil {
-		log.Panicln(error)
+	taxo := obitax.DefaultTaxonomy()
+	if taxo == nil {
+		log.Panicln("No loaded taxonomy")
 	}
 
 	references := obitag.CLIRefDB()
@@ -64,7 +62,7 @@ func main() {
 	}
 
 	obiconvert.CLIWriteBioSequences(identified, true)
-	obiiter.WaitForLastPipe()
+	obiutils.WaitForLastPipe()
 
 	obitag.CLISaveRefetenceDB(references)
 
@@ -5,11 +5,12 @@ import (
 
 	log "github.com/sirupsen/logrus"
 
-	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiiter"
+	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obidefault"
 	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
 	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiconvert"
 	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obipairing"
 	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obitagpcr"
+	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
 )
 
 func main() {
@@ -30,7 +31,7 @@ func main() {
 	// trace.Start(ftrace)
 	// defer trace.Stop()
 
-	obioptions.SetWorkerPerCore(1)
+	obidefault.SetWorkerPerCore(1)
 
 	optionParser := obioptions.GenerateOptionParser(obitagpcr.OptionSet)
 
@@ -54,5 +55,5 @@ func main() {
 
 	obiconvert.CLIWriteBioSequences(paired, true)
 
-	obiiter.WaitForLastPipe()
+	obiutils.WaitForLastPipe()
 }
71
cmd/obitools/obitaxonomy/main.go
Normal file
@@ -0,0 +1,71 @@
+package main
+
+import (
+	"log"
+	"os"
+
+	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
+	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitax"
+	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obitaxonomy"
+	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
+)
+
+func main() {
+	optionParser := obioptions.GenerateOptionParser(obitaxonomy.OptionSet)
+
+	_, args := optionParser(os.Args)
+
+	var iterator *obitax.ITaxon
+
+	switch {
+
+	case obitaxonomy.CLIDumpSubtaxonomy():
+		iterator = obitaxonomy.CLISubTaxonomyIterator()
+
+	case obitaxonomy.CLIRequestsPathForTaxid() != "NA":
+
+		taxon := obitax.DefaultTaxonomy().Taxon(obitaxonomy.CLIRequestsPathForTaxid())
+
+		if taxon == nil {
+			log.Fatalf("Cannot identify the requested taxon: %s",
+				obitaxonomy.CLIRequestsPathForTaxid())
+		}
+
+		s := taxon.Path()
+
+		if s == nil {
+			log.Fatalf("Cannot extract taxonomic path describing %s", taxon.String())
+		}
+
+		iterator = s.Iterator()
+
+		if obitaxonomy.CLIWithQuery() {
+			iterator = iterator.AddMetadata("query", taxon.String())
+		}
+
+	case len(args) == 0:
+		iterator = obitax.DefaultTaxonomy().Iterator()
+	default:
+		iters := make([]*obitax.ITaxon, len(args))
+
+		for i, pat := range args {
+			ii := obitax.DefaultTaxonomy().IFilterOnName(pat, obitaxonomy.CLIFixedPattern(), true)
+			if obitaxonomy.CLIWithQuery() {
+				ii = ii.AddMetadata("query", pat)
+			}
+			iters[i] = ii
+		}
+
+		iterator = iters[0]
+
+		if len(iters) > 1 {
+			iterator = iterator.Concat(iters[1:]...)
+		}
+	}
+
+	iterator = obitaxonomy.CLITaxonRestrictions(iterator)
+	obitaxonomy.CLICSVTaxaWriter(iterator, true)
+
+	obiutils.WaitForLastPipe()
+
+}
@@ -3,13 +3,12 @@ package main
 import (
 	"os"
 
-	log "github.com/sirupsen/logrus"
-
-	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiiter"
+	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obidefault"
 	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
 	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
 	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiconvert"
 	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiuniq"
+	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
 )
 
 func main() {
@@ -32,20 +31,18 @@ func main() {
 	// trace.Start(ftrace)
 	// defer trace.Stop()
 
+	obidefault.SetBatchSize(10)
+	obidefault.SetReadQualities(false)
 	optionParser := obioptions.GenerateOptionParser(obiuniq.OptionSet)
 
 	_, args := optionParser(os.Args)
 
 	sequences, err := obiconvert.CLIReadBioSequences(args...)
-
-	if err != nil {
-		log.Errorf("Cannot open file (%v)", err)
-		os.Exit(1)
-	}
+	obiconvert.OpenSequenceDataErrorMessage(args, err)
 
 	unique := obiuniq.CLIUnique(sequences)
 	obiconvert.CLIWriteBioSequences(unique, true)
 
-	obiiter.WaitForLastPipe()
+	obiutils.WaitForLastPipe()
 
 }
@@ -3,36 +3,14 @@ package main
 import (
 	"os"
 
-	log "github.com/sirupsen/logrus"
-
-	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiiter"
-	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitools/obiconvert"
-
-	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
+	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitax"
+	"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
 )
 
 func main() {
-	optionParser := obioptions.GenerateOptionParser(obiconvert.OptionSet)
 
-	_, args := optionParser(os.Args)
-
-	fs, err := obiconvert.CLIReadBioSequences(args...)
-
-	if err != nil {
-		log.Errorf("Cannot open file (%v)", err)
-		os.Exit(1)
-	}
-
-	frags := obiiter.IFragments(
-		1000,
-		100,
-		10,
-		100,
-		obioptions.CLIParallelWorkers(),
-	)
-
-	obiconvert.CLIWriteBioSequences(fs.Pipe(frags), true)
-
-	obiiter.WaitForLastPipe()
+	obitax.DetectTaxonomyFormat(os.Args[1])
+	println(obiutils.RemoveAllExt("toto/tutu/test.txt"))
+	println(obiutils.Basename("toto/tutu/test.txt"))
 
 }
13
go.mod
@@ -1,12 +1,13 @@
|
|||||||
module git.metabarcoding.org/obitools/obitools4/obitools4
|
module git.metabarcoding.org/obitools/obitools4/obitools4
|
||||||
|
|
||||||
go 1.22.1
|
go 1.23.1
|
||||||
|
|
||||||
require (
|
require (
|
||||||
github.com/DavidGamba/go-getoptions v0.28.0
|
github.com/DavidGamba/go-getoptions v0.28.0
|
||||||
github.com/PaesslerAG/gval v1.2.2
|
github.com/PaesslerAG/gval v1.2.2
|
||||||
github.com/barkimedes/go-deepcopy v0.0.0-20220514131651-17c30cfc62df
|
github.com/barkimedes/go-deepcopy v0.0.0-20220514131651-17c30cfc62df
|
||||||
github.com/chen3feng/stl4go v0.1.1
|
github.com/chen3feng/stl4go v0.1.1
|
||||||
|
github.com/dlclark/regexp2 v1.11.4
|
||||||
github.com/goccy/go-json v0.10.3
|
github.com/goccy/go-json v0.10.3
|
||||||
github.com/klauspost/pgzip v1.2.6
|
github.com/klauspost/pgzip v1.2.6
|
||||||
github.com/pbnjay/memory v0.0.0-20210728143218-7b4eea64cf58
|
github.com/pbnjay/memory v0.0.0-20210728143218-7b4eea64cf58
|
||||||
@@ -23,20 +24,14 @@ require (
|
|||||||
)
|
)
|
||||||
|
|
||||||
require (
|
require (
|
||||||
github.com/bytedance/sonic v1.11.9 // indirect
|
github.com/Clever/csvlint v0.3.0 // indirect
|
||||||
github.com/bytedance/sonic/loader v0.1.1 // indirect
|
github.com/buger/jsonparser v1.1.1 // indirect
|
||||||
github.com/cloudwego/base64x v0.1.4 // indirect
|
|
||||||
github.com/cloudwego/iasm v0.2.0 // indirect
|
|
||||||
github.com/davecgh/go-spew v1.1.1 // indirect
|
github.com/davecgh/go-spew v1.1.1 // indirect
|
||||||
github.com/goombaio/orderedmap v0.0.0-20180924084748-ba921b7e2419 // indirect
|
github.com/goombaio/orderedmap v0.0.0-20180924084748-ba921b7e2419 // indirect
|
||||||
github.com/klauspost/cpuid/v2 v2.0.9 // indirect
|
|
||||||
github.com/kr/pretty v0.3.0 // indirect
|
github.com/kr/pretty v0.3.0 // indirect
|
||||||
github.com/kr/text v0.2.0 // indirect
|
github.com/kr/text v0.2.0 // indirect
|
||||||
github.com/montanaflynn/stats v0.7.1 // indirect
|
|
||||||
github.com/pmezard/go-difflib v1.0.0 // indirect
|
github.com/pmezard/go-difflib v1.0.0 // indirect
|
||||||
github.com/rogpeppe/go-internal v1.6.1 // indirect
|
github.com/rogpeppe/go-internal v1.6.1 // indirect
|
||||||
github.com/twitchyliquid64/golang-asm v0.15.1 // indirect
|
|
||||||
golang.org/x/arch v0.0.0-20210923205945-b76863e36670 // indirect
|
|
||||||
)
|
)
|
||||||
|
|
||||||
require (
|
require (
|
||||||
|
|||||||
34
go.sum
@@ -1,3 +1,5 @@
|
|||||||
|
github.com/Clever/csvlint v0.3.0 h1:58WEFXWy+i0fCbxTXscR2QwYESRuAUFjEGLgZs6j2iU=
|
||||||
|
github.com/Clever/csvlint v0.3.0/go.mod h1:+wLRuW/bI8NhpRoeyUBxqKsK35OhvgJhXHSWdKp5XJU=
|
||||||
github.com/DavidGamba/go-getoptions v0.28.0 h1:18wgEvfZdrlfIhVDGEBO3Dl0fkOyXqXLa0tLMCKxM1c=
|
github.com/DavidGamba/go-getoptions v0.28.0 h1:18wgEvfZdrlfIhVDGEBO3Dl0fkOyXqXLa0tLMCKxM1c=
|
||||||
github.com/DavidGamba/go-getoptions v0.28.0/go.mod h1:zE97E3PR9P3BI/HKyNYgdMlYxodcuiC6W68KIgeYT84=
|
github.com/DavidGamba/go-getoptions v0.28.0/go.mod h1:zE97E3PR9P3BI/HKyNYgdMlYxodcuiC6W68KIgeYT84=
|
||||||
github.com/PaesslerAG/gval v1.2.2 h1:Y7iBzhgE09IGTt5QgGQ2IdaYYYOU134YGHBThD+wm9E=
|
github.com/PaesslerAG/gval v1.2.2 h1:Y7iBzhgE09IGTt5QgGQ2IdaYYYOU134YGHBThD+wm9E=
|
||||||
@@ -6,27 +8,21 @@ github.com/PaesslerAG/jsonpath v0.1.0 h1:gADYeifvlqK3R3i2cR5B4DGgxLXIPb3TRTH1mGi
|
|||||||
github.com/PaesslerAG/jsonpath v0.1.0/go.mod h1:4BzmtoM/PI8fPO4aQGIusjGxGir2BzcV0grWtFzq1Y8=
|
github.com/PaesslerAG/jsonpath v0.1.0/go.mod h1:4BzmtoM/PI8fPO4aQGIusjGxGir2BzcV0grWtFzq1Y8=
|
||||||
github.com/barkimedes/go-deepcopy v0.0.0-20220514131651-17c30cfc62df h1:GSoSVRLoBaFpOOds6QyY1L8AX7uoY+Ln3BHc22W40X0=
|
github.com/barkimedes/go-deepcopy v0.0.0-20220514131651-17c30cfc62df h1:GSoSVRLoBaFpOOds6QyY1L8AX7uoY+Ln3BHc22W40X0=
|
||||||
github.com/barkimedes/go-deepcopy v0.0.0-20220514131651-17c30cfc62df/go.mod h1:hiVxq5OP2bUGBRNS3Z/bt/reCLFNbdcST6gISi1fiOM=
|
github.com/barkimedes/go-deepcopy v0.0.0-20220514131651-17c30cfc62df/go.mod h1:hiVxq5OP2bUGBRNS3Z/bt/reCLFNbdcST6gISi1fiOM=
|
||||||
github.com/bytedance/sonic v1.11.9 h1:LFHENlIY/SLzDWverzdOvgMztTxcfcF+cqNsz9pK5zg=
|
github.com/buger/jsonparser v1.1.1 h1:2PnMjfWD7wBILjqQbt530v576A/cAbQvEW9gGIpYMUs=
|
||||||
github.com/bytedance/sonic v1.11.9/go.mod h1:LysEHSvpvDySVdC2f87zGWf6CIKJcAvqab1ZaiQtds4=
|
github.com/buger/jsonparser v1.1.1/go.mod h1:6RYKKt7H4d4+iWqouImQ9R2FZql3VbhNgx27UK13J/0=
|
||||||
github.com/bytedance/sonic/loader v0.1.1 h1:c+e5Pt1k/cy5wMveRDyk2X4B9hF4g7an8N3zCYjJFNM=
|
|
||||||
github.com/bytedance/sonic/loader v0.1.1/go.mod h1:ncP89zfokxS5LZrJxl5z0UJcsk4M4yY2JpfqGeCtNLU=
|
|
||||||
github.com/chen3feng/stl4go v0.1.1 h1:0L1+mDw7pomftKDruM23f1mA7miavOj6C6MZeadzN2Q=
|
github.com/chen3feng/stl4go v0.1.1 h1:0L1+mDw7pomftKDruM23f1mA7miavOj6C6MZeadzN2Q=
|
||||||
github.com/chen3feng/stl4go v0.1.1/go.mod h1:5ml3psLgETJjRJnMbPE+JiHLrCpt+Ajc2weeTECXzWU=
|
github.com/chen3feng/stl4go v0.1.1/go.mod h1:5ml3psLgETJjRJnMbPE+JiHLrCpt+Ajc2weeTECXzWU=
|
||||||
github.com/cloudwego/base64x v0.1.4 h1:jwCgWpFanWmN8xoIUHa2rtzmkd5J2plF/dnLS6Xd/0Y=
|
|
||||||
github.com/cloudwego/base64x v0.1.4/go.mod h1:0zlkT4Wn5C6NdauXdJRhSKRlJvmclQ1hhJgA0rcu/8w=
|
|
||||||
github.com/cloudwego/iasm v0.2.0 h1:1KNIy1I1H9hNNFEEH3DVnI4UujN+1zjpuk6gwHLTssg=
|
|
||||||
github.com/cloudwego/iasm v0.2.0/go.mod h1:8rXZaNYT2n95jn+zTI1sDr+IgcD2GVs0nlbbQPiEFhY=
|
|
||||||
github.com/creack/pty v1.1.9/go.mod h1:oKZEueFk5CKHvIhNR5MUki03XCEU+Q6VDXinZuGJ33E=
|
github.com/creack/pty v1.1.9/go.mod h1:oKZEueFk5CKHvIhNR5MUki03XCEU+Q6VDXinZuGJ33E=
|
||||||
github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
|
github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
|
||||||
github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c=
|
github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c=
|
||||||
github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
|
github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
|
||||||
|
github.com/dlclark/regexp2 v1.11.4 h1:rPYF9/LECdNymJufQKmri9gV604RvvABwgOA8un7yAo=
|
||||||
|
github.com/dlclark/regexp2 v1.11.4/go.mod h1:DHkYz0B9wPfa6wondMfaivmHpzrQ3v9q8cnmRbL6yW8=
|
||||||
github.com/dsnet/compress v0.0.1 h1:PlZu0n3Tuv04TzpfPbrnI0HW/YwodEXDS+oPKahKF0Q=
|
github.com/dsnet/compress v0.0.1 h1:PlZu0n3Tuv04TzpfPbrnI0HW/YwodEXDS+oPKahKF0Q=
|
||||||
github.com/dsnet/compress v0.0.1/go.mod h1:Aw8dCMJ7RioblQeTqt88akK31OvO8Dhf5JflhBbQEHo=
|
github.com/dsnet/compress v0.0.1/go.mod h1:Aw8dCMJ7RioblQeTqt88akK31OvO8Dhf5JflhBbQEHo=
|
||||||
github.com/dsnet/golib v0.0.0-20171103203638-1ea166775780/go.mod h1:Lj+Z9rebOhdfkVLjJ8T6VcRQv3SXugXy999NBtR9aFY=
|
github.com/dsnet/golib v0.0.0-20171103203638-1ea166775780/go.mod h1:Lj+Z9rebOhdfkVLjJ8T6VcRQv3SXugXy999NBtR9aFY=
|
||||||
github.com/gabriel-vasile/mimetype v1.4.3 h1:in2uUcidCuFcDKtdcBxlR0rJ1+fsokWf+uqxgUFjbI0=
|
github.com/gabriel-vasile/mimetype v1.4.3 h1:in2uUcidCuFcDKtdcBxlR0rJ1+fsokWf+uqxgUFjbI0=
|
||||||
github.com/gabriel-vasile/mimetype v1.4.3/go.mod h1:d8uq/6HKRL6CGdk+aubisF/M5GcPfT7nKyLpA0lbSSk=
|
github.com/gabriel-vasile/mimetype v1.4.3/go.mod h1:d8uq/6HKRL6CGdk+aubisF/M5GcPfT7nKyLpA0lbSSk=
|
||||||
github.com/goccy/go-json v0.10.2 h1:CrxCmQqYDkv1z7lO7Wbh2HN93uovUHgrECaO5ZrCXAU=
|
|
||||||
github.com/goccy/go-json v0.10.2/go.mod h1:6MelG93GURQebXPDq3khkgXZkazVtN9CRI+MGFi0w8I=
|
|
||||||
github.com/goccy/go-json v0.10.3 h1:KZ5WoDbxAIgm2HNbYckL0se1fHD6rz5j4ywS6ebzDqA=
|
github.com/goccy/go-json v0.10.3 h1:KZ5WoDbxAIgm2HNbYckL0se1fHD6rz5j4ywS6ebzDqA=
|
||||||
github.com/goccy/go-json v0.10.3/go.mod h1:oq7eo15ShAhp70Anwd5lgX2pLfOS3QCiwU/PULtXL6M=
|
github.com/goccy/go-json v0.10.3/go.mod h1:oq7eo15ShAhp70Anwd5lgX2pLfOS3QCiwU/PULtXL6M=
|
||||||
github.com/goombaio/orderedmap v0.0.0-20180924084748-ba921b7e2419 h1:SajEQ6tktpF9SRIuzbiPOX9AEZZ53Bvw0k9Mzrts8Lg=
|
github.com/goombaio/orderedmap v0.0.0-20180924084748-ba921b7e2419 h1:SajEQ6tktpF9SRIuzbiPOX9AEZZ53Bvw0k9Mzrts8Lg=
|
||||||
@@ -37,13 +33,9 @@ github.com/k0kubun/go-ansi v0.0.0-20180517002512-3bf9e2903213/go.mod h1:vNUNkEQ1
|
|||||||
github.com/klauspost/compress v1.4.1/go.mod h1:RyIbtBH6LamlWaDj8nUwkbUhJ87Yi3uG0guNDohfE1A=
|
github.com/klauspost/compress v1.4.1/go.mod h1:RyIbtBH6LamlWaDj8nUwkbUhJ87Yi3uG0guNDohfE1A=
|
||||||
github.com/klauspost/compress v1.17.2 h1:RlWWUY/Dr4fL8qk9YG7DTZ7PDgME2V4csBXA8L/ixi4=
|
github.com/klauspost/compress v1.17.2 h1:RlWWUY/Dr4fL8qk9YG7DTZ7PDgME2V4csBXA8L/ixi4=
|
||||||
github.com/klauspost/compress v1.17.2/go.mod h1:ntbaceVETuRiXiv4DpjP66DpAtAGkEQskQzEyD//IeE=
|
github.com/klauspost/compress v1.17.2/go.mod h1:ntbaceVETuRiXiv4DpjP66DpAtAGkEQskQzEyD//IeE=
|
||||||
github.com/klauspost/cpuid v1.2.0 h1:NMpwD2G9JSFOE1/TJjGSo5zG7Yb2bTe7eq1jH+irmeE=
|
|
||||||
github.com/klauspost/cpuid v1.2.0/go.mod h1:Pj4uuM528wm8OyEC2QMXAi2YiTZ96dNQPGgoMS4s3ek=
|
github.com/klauspost/cpuid v1.2.0/go.mod h1:Pj4uuM528wm8OyEC2QMXAi2YiTZ96dNQPGgoMS4s3ek=
|
||||||
github.com/klauspost/cpuid/v2 v2.0.9 h1:lgaqFMSdTdQYdZ04uHyN2d/eKdOMyi2YLSvlQIBFYa4=
|
|
||||||
github.com/klauspost/cpuid/v2 v2.0.9/go.mod h1:FInQzS24/EEf25PyTYn52gqo7WaD8xa0213Md/qVLRg=
|
|
||||||
github.com/klauspost/pgzip v1.2.6 h1:8RXeL5crjEUFnR2/Sn6GJNWtSQ3Dk8pq4CL3jvdDyjU=
|
github.com/klauspost/pgzip v1.2.6 h1:8RXeL5crjEUFnR2/Sn6GJNWtSQ3Dk8pq4CL3jvdDyjU=
|
||||||
github.com/klauspost/pgzip v1.2.6/go.mod h1:Ch1tH69qFZu15pkjo5kYi6mth2Zzwzt50oCQKQE9RUs=
|
github.com/klauspost/pgzip v1.2.6/go.mod h1:Ch1tH69qFZu15pkjo5kYi6mth2Zzwzt50oCQKQE9RUs=
|
||||||
github.com/knz/go-libedit v1.10.1/go.mod h1:MZTVkCWyz0oBc7JOWP3wNAzd002ZbM/5hgShxwh4x8M=
|
|
||||||
github.com/kr/pretty v0.1.0/go.mod h1:dAy3ld7l9f0ibDNOQOHHMYYIIbhfbHSm3C4ZsoJORNo=
|
github.com/kr/pretty v0.1.0/go.mod h1:dAy3ld7l9f0ibDNOQOHHMYYIIbhfbHSm3C4ZsoJORNo=
|
||||||
github.com/kr/pretty v0.2.1/go.mod h1:ipq/a2n7PKx3OHsz4KJII5eveXtPO4qwEXGdVfWzfnI=
|
github.com/kr/pretty v0.2.1/go.mod h1:ipq/a2n7PKx3OHsz4KJII5eveXtPO4qwEXGdVfWzfnI=
|
||||||
github.com/kr/pretty v0.3.0 h1:WgNl7dwNpEZ6jJ9k1snq4pZsg7DOEN8hP9Xw0Tsjwk0=
|
github.com/kr/pretty v0.3.0 h1:WgNl7dwNpEZ6jJ9k1snq4pZsg7DOEN8hP9Xw0Tsjwk0=
|
||||||
@@ -58,8 +50,6 @@ github.com/mattn/go-runewidth v0.0.15 h1:UNAjwbU9l54TA3KzvqLGxwWjHmMgBUVhBiTjelZ
|
|||||||
github.com/mattn/go-runewidth v0.0.15/go.mod h1:Jdepj2loyihRzMpdS35Xk/zdY8IAYHsh153qUoGf23w=
|
github.com/mattn/go-runewidth v0.0.15/go.mod h1:Jdepj2loyihRzMpdS35Xk/zdY8IAYHsh153qUoGf23w=
|
||||||
github.com/mitchellh/colorstring v0.0.0-20190213212951-d06e56a500db h1:62I3jR2EmQ4l5rM/4FEfDWcRD+abF5XlKShorW5LRoQ=
|
github.com/mitchellh/colorstring v0.0.0-20190213212951-d06e56a500db h1:62I3jR2EmQ4l5rM/4FEfDWcRD+abF5XlKShorW5LRoQ=
|
||||||
github.com/mitchellh/colorstring v0.0.0-20190213212951-d06e56a500db/go.mod h1:l0dey0ia/Uv7NcFFVbCLtqEBQbrT4OCwCSKTEv6enCw=
|
github.com/mitchellh/colorstring v0.0.0-20190213212951-d06e56a500db/go.mod h1:l0dey0ia/Uv7NcFFVbCLtqEBQbrT4OCwCSKTEv6enCw=
|
||||||
github.com/montanaflynn/stats v0.7.1 h1:etflOAAHORrCC44V+aR6Ftzort912ZU+YLiSTuV8eaE=
|
|
||||||
github.com/montanaflynn/stats v0.7.1/go.mod h1:etXPPgVO6n31NxCd9KQUMvCM+ve0ruNzt6R8Bnaayow=
|
|
||||||
github.com/pbnjay/memory v0.0.0-20210728143218-7b4eea64cf58 h1:onHthvaw9LFnH4t2DcNVpwGmV9E1BkGknEliJkfwQj0=
|
github.com/pbnjay/memory v0.0.0-20210728143218-7b4eea64cf58 h1:onHthvaw9LFnH4t2DcNVpwGmV9E1BkGknEliJkfwQj0=
|
||||||
github.com/pbnjay/memory v0.0.0-20210728143218-7b4eea64cf58/go.mod h1:DXv8WO4yhMYhSNPKjeNKa5WY9YCIEBRbNzFFPJbWO6Y=
|
github.com/pbnjay/memory v0.0.0-20210728143218-7b4eea64cf58/go.mod h1:DXv8WO4yhMYhSNPKjeNKa5WY9YCIEBRbNzFFPJbWO6Y=
|
||||||
github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
|
github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
|
||||||
@@ -78,26 +68,18 @@ github.com/shopspring/decimal v1.3.1/go.mod h1:DKyhrW/HYNuLGql+MJL6WCR6knT2jwCFR
|
|||||||
github.com/sirupsen/logrus v1.9.3 h1:dueUQJ1C2q9oE3F7wvmSGAaVtTmUizReu6fjN8uqzbQ=
|
github.com/sirupsen/logrus v1.9.3 h1:dueUQJ1C2q9oE3F7wvmSGAaVtTmUizReu6fjN8uqzbQ=
|
||||||
github.com/sirupsen/logrus v1.9.3/go.mod h1:naHLuLoDiP4jHNo9R0sCBMtWGeIprob74mVsIT4qYEQ=
|
github.com/sirupsen/logrus v1.9.3/go.mod h1:naHLuLoDiP4jHNo9R0sCBMtWGeIprob74mVsIT4qYEQ=
|
||||||
github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME=
|
github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME=
|
||||||
github.com/stretchr/objx v0.4.0/go.mod h1:YvHI0jy2hoMjB+UWwv71VJQ9isScKT/TqJzVSSt89Yw=
|
|
||||||
github.com/stretchr/objx v0.5.0/go.mod h1:Yh+to48EsGEfYuaHDzXPcE3xhTkx73EhmCGUpEOglKo=
|
|
||||||
github.com/stretchr/testify v1.3.0/go.mod h1:M5WIy9Dh21IEIfnGCwXGc5bZfKNJtfHm1UVUgZn+9EI=
|
github.com/stretchr/testify v1.3.0/go.mod h1:M5WIy9Dh21IEIfnGCwXGc5bZfKNJtfHm1UVUgZn+9EI=
|
||||||
|
github.com/stretchr/testify v1.6.1/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg=
|
||||||
github.com/stretchr/testify v1.7.0/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg=
|
github.com/stretchr/testify v1.7.0/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg=
|
||||||
github.com/stretchr/testify v1.7.1/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg=
|
|
||||||
github.com/stretchr/testify v1.8.0/go.mod h1:yNjHg4UonilssWZ8iaSj1OCr/vHnekPRkoO+kdMU+MU=
|
|
||||||
github.com/stretchr/testify v1.8.1/go.mod h1:w2LPCIKwWwSfY2zedu0+kehJoqGctiVI29o6fzry7u4=
|
|
||||||
github.com/stretchr/testify v1.8.4 h1:CcVxjf3Q8PM0mHUKJCdn+eZZtm5yQwehR5yeSVQQcUk=
|
github.com/stretchr/testify v1.8.4 h1:CcVxjf3Q8PM0mHUKJCdn+eZZtm5yQwehR5yeSVQQcUk=
|
||||||
github.com/stretchr/testify v1.8.4/go.mod h1:sz/lmYIOXD/1dqDmKjjqLyZ2RngseejIcXlSw2iwfAo=
|
github.com/stretchr/testify v1.8.4/go.mod h1:sz/lmYIOXD/1dqDmKjjqLyZ2RngseejIcXlSw2iwfAo=
|
||||||
github.com/tevino/abool/v2 v2.1.0 h1:7w+Vf9f/5gmKT4m4qkayb33/92M+Um45F2BkHOR+L/c=
|
github.com/tevino/abool/v2 v2.1.0 h1:7w+Vf9f/5gmKT4m4qkayb33/92M+Um45F2BkHOR+L/c=
|
||||||
github.com/tevino/abool/v2 v2.1.0/go.mod h1:+Lmlqk6bHDWHqN1cbxqhwEAwMPXgc8I1SDEamtseuXY=
|
github.com/tevino/abool/v2 v2.1.0/go.mod h1:+Lmlqk6bHDWHqN1cbxqhwEAwMPXgc8I1SDEamtseuXY=
|
||||||
github.com/twitchyliquid64/golang-asm v0.15.1 h1:SU5vSMR7hnwNxj24w34ZyCi/FmDZTkS4MhqMhdFk5YI=
|
|
||||||
github.com/twitchyliquid64/golang-asm v0.15.1/go.mod h1:a1lVb/DtPvCB8fslRZhAngC2+aY1QWCk3Cedj/Gdt08=
|
|
||||||
github.com/ulikunitz/xz v0.5.6/go.mod h1:2bypXElzHzzJZwzH67Y6wb67pO62Rzfn7BSiF4ABRW8=
|
github.com/ulikunitz/xz v0.5.6/go.mod h1:2bypXElzHzzJZwzH67Y6wb67pO62Rzfn7BSiF4ABRW8=
|
||||||
github.com/ulikunitz/xz v0.5.11 h1:kpFauv27b6ynzBNT/Xy+1k+fK4WswhN/6PN5WhFAGw8=
|
github.com/ulikunitz/xz v0.5.11 h1:kpFauv27b6ynzBNT/Xy+1k+fK4WswhN/6PN5WhFAGw8=
|
||||||
github.com/ulikunitz/xz v0.5.11/go.mod h1:nbz6k7qbPmH4IRqmfOplQw/tblSgqTqBwxkY0oWt/14=
|
github.com/ulikunitz/xz v0.5.11/go.mod h1:nbz6k7qbPmH4IRqmfOplQw/tblSgqTqBwxkY0oWt/14=
|
||||||
github.com/yuin/gopher-lua v1.1.1 h1:kYKnWBjvbNP4XLT3+bPEwAXJx262OhaHDWDVOPjL46M=
|
github.com/yuin/gopher-lua v1.1.1 h1:kYKnWBjvbNP4XLT3+bPEwAXJx262OhaHDWDVOPjL46M=
|
||||||
github.com/yuin/gopher-lua v1.1.1/go.mod h1:GBR0iDaNXjAgGg9zfCvksxSRnQx76gclCIb7kdAd1Pw=
|
github.com/yuin/gopher-lua v1.1.1/go.mod h1:GBR0iDaNXjAgGg9zfCvksxSRnQx76gclCIb7kdAd1Pw=
|
||||||
golang.org/x/arch v0.0.0-20210923205945-b76863e36670 h1:18EFjUmQOcUvxNYSkA6jO9VAiXCnxFY6NyDX0bHDmkU=
|
|
||||||
golang.org/x/arch v0.0.0-20210923205945-b76863e36670/go.mod h1:5om86z9Hs0C8fWVUuoMHwpExlXzs5Tkyp9hOrfG7pp8=
|
|
||||||
golang.org/x/exp v0.0.0-20231006140011-7918f672742d h1:jtJma62tbqLibJ5sFQz8bKtEM8rJBtfilJ2qTU199MI=
|
golang.org/x/exp v0.0.0-20231006140011-7918f672742d h1:jtJma62tbqLibJ5sFQz8bKtEM8rJBtfilJ2qTU199MI=
|
||||||
golang.org/x/exp v0.0.0-20231006140011-7918f672742d/go.mod h1:ldy0pHrwJyGW56pPQzzkH36rKxoZW1tw7ZJpeKx+hdo=
|
golang.org/x/exp v0.0.0-20231006140011-7918f672742d/go.mod h1:ldy0pHrwJyGW56pPQzzkH36rKxoZW1tw7ZJpeKx+hdo=
|
||||||
golang.org/x/net v0.17.0 h1:pVaXccu2ozPjCXewfr1S7xza/zcXTity9cCdXQYSjIM=
|
golang.org/x/net v0.17.0 h1:pVaXccu2ozPjCXewfr1S7xza/zcXTity9cCdXQYSjIM=
|
||||||
@@ -120,8 +102,6 @@ gopkg.in/errgo.v2 v2.1.0/go.mod h1:hNsd1EY+bozCKY1Ytp96fpM3vjJbqLJn88ws8XvfDNI=
|
|||||||
gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
|
gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
|
||||||
gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
|
gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
|
||||||
gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
|
gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
|
||||||
nullprogram.com/x/optparse v1.0.0/go.mod h1:KdyPE+Igbe0jQUrVfMqDMeJQIJZEuyV7pjYmp6pbG50=
|
|
||||||
rsc.io/pdf v0.1.1/go.mod h1:n8OzWcQ6Sp37PL01nO98y4iUCRdTGarVfzxY20ICaU4=
|
|
||||||
scientificgo.org/special v0.0.0 h1:P6WJkECo6tgtvZAEfNXl+KEB9ReAatjKAeX8U07mjSc=
|
scientificgo.org/special v0.0.0 h1:P6WJkECo6tgtvZAEfNXl+KEB9ReAatjKAeX8U07mjSc=
|
||||||
scientificgo.org/special v0.0.0/go.mod h1:LoGVh9tS431RLTJo7gFlYDKFWq44cEb7QqL+M0EKtZU=
|
scientificgo.org/special v0.0.0/go.mod h1:LoGVh9tS431RLTJo7gFlYDKFWq44cEb7QqL+M0EKtZU=
|
||||||
scientificgo.org/testutil v0.0.0 h1:y356DHRo0tAz9zIFmxlhZoKDlHPHaWW/DCm9k3PhIMA=
|
scientificgo.org/testutil v0.0.0 h1:y356DHRo0tAz9zIFmxlhZoKDlHPHaWW/DCm9k3PhIMA=
|
||||||
|
|||||||
@@ -4,7 +4,7 @@ INSTALL_DIR="/usr/local"
|
|||||||
OBITOOLS_PREFIX=""
|
OBITOOLS_PREFIX=""
|
||||||
# default values
|
# default values
|
||||||
URL="https://go.dev/dl/"
|
URL="https://go.dev/dl/"
|
||||||
OBIURL4="https://git.metabarcoding.org/obitools/obitools4/obitools4/-/archive/master/obitools4-master.tar.gz"
|
OBIURL4="https://github.com/metabarcoding/obitools4/archive/refs/heads/master.zip"
|
||||||
INSTALL_DIR="/usr/local"
|
INSTALL_DIR="/usr/local"
|
||||||
OBITOOLS_PREFIX=""
|
OBITOOLS_PREFIX=""
|
||||||
|
|
||||||
@@ -106,8 +106,10 @@ curl "$GOURL" \
|
|||||||
PATH="$(pwd)/go/bin:$PATH"
|
PATH="$(pwd)/go/bin:$PATH"
|
||||||
export PATH
|
export PATH
|
||||||
|
|
||||||
curl -L "$OBIURL4" \
|
curl -L "$OBIURL4" > master.zip
|
||||||
| tar zxf -
|
unzip master.zip
|
||||||
|
|
||||||
|
echo "Install OBITOOLS from : $OBIURL4"
|
||||||
|
|
||||||
cd obitools4-master || exit
|
cd obitools4-master || exit
|
||||||
|
|
||||||
|
|||||||
@@ -12,7 +12,7 @@ func _Backtracking(pathMatrix []int, lseqA, lseqB int, path *[]int) []int {
|
|||||||
cp := cap(*path)
|
cp := cap(*path)
|
||||||
(*path) = slices.Grow((*path), needed)
|
(*path) = slices.Grow((*path), needed)
|
||||||
if cp < cap(*path) {
|
if cp < cap(*path) {
|
||||||
log.Infof("Resized path from %d to %d\n", cp, cap(*path))
|
log.Debugf("Resized path from %d to %d\n", cp, cap(*path))
|
||||||
}
|
}
|
||||||
p := cap(*path)
|
p := cap(*path)
|
||||||
*path = (*path)[:p]
|
*path = (*path)[:p]
|
||||||
|
|||||||
@@ -1,30 +1,73 @@
|
|||||||
package obialign
|
package obialign
|
||||||
|
|
||||||
|
import log "github.com/sirupsen/logrus"
|
||||||
|
|
||||||
|
// buffIndex converts a pair of coordinates (i, j) into a linear index in a
|
||||||
|
// row-major matrix whose rows are width cells wide. The coordinates start at -1,
|
||||||
|
// while the linear index starts at 0. The function adds 1 to both coordinates so
|
||||||
|
// that the (-1,-1) coordinate maps to position 0 in the matrix, and then computes
|
||||||
|
// the linear index by multiplying the shifted first coordinate by the width and
|
||||||
|
// adding the shifted second coordinate.
|
||||||
func buffIndex(i, j, width int) int {
|
func buffIndex(i, j, width int) int {
|
||||||
return (i+1)*width + (j + 1)
|
return (i+1)*width + (j + 1)
|
||||||
}
|
}
|
||||||
func LocatePattern(pattern, sequence []byte) (int, int, int) {
|
|
||||||
|
// LocatePattern is a function to locate a pattern in a sequence.
|
||||||
|
//
|
||||||
|
// It uses a dynamic programming approach to build a matrix of scores.
|
||||||
|
// The score at each cell is the maximum of the score of the cell
|
||||||
|
// above it (representing a deletion), the score of the cell to its
|
||||||
|
// left (representing an insertion), and the score of the cell
|
||||||
|
// diagonally above it (representing a match).
|
||||||
|
//
|
||||||
|
// The score of a match is 0 if the two characters are the same,
|
||||||
|
// and -1 if they are different.
|
||||||
|
//
|
||||||
|
// The function returns the start and end positions of the best
|
||||||
|
// match, as well as the number of errors in the best match.
|
||||||
|
func LocatePattern(id string, pattern, sequence []byte) (int, int, int) {
|
||||||
|
|
||||||
|
if len(pattern) >= len(sequence) {
|
||||||
|
log.Panicf("Sequence %s:Pattern %s must be shorter than sequence %s", id, pattern, sequence)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Pattern spreads over the columns
|
||||||
|
// Sequence spreads over the rows
|
||||||
width := len(pattern) + 1
|
width := len(pattern) + 1
|
||||||
buffsize := (len(pattern) + 1) * (len(sequence) + 1)
|
buffsize := (len(pattern) + 1) * (len(sequence) + 1)
|
||||||
buffer := make([]int, buffsize)
|
buffer := make([]int, buffsize)
|
||||||
|
|
||||||
|
// The path matrix keeps track of the best path through the matrix
|
||||||
|
// 0 : indicates the diagonal path
|
||||||
|
// 1 : indicates the up path
|
||||||
|
// -1 : indicates the left path
|
||||||
path := make([]int, buffsize)
|
path := make([]int, buffsize)
|
||||||
|
|
||||||
|
// Initialize the first row of the matrix
|
||||||
for j := 0; j < len(pattern); j++ {
|
for j := 0; j < len(pattern); j++ {
|
||||||
idx := buffIndex(-1, j, width)
|
idx := buffIndex(-1, j, width)
|
||||||
buffer[idx] = -j - 1
|
buffer[idx] = -j - 1
|
||||||
path[idx] = -1
|
path[idx] = -1
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Initialize the first column of the matrix
|
||||||
|
// Alignment is endgap free so first column = 0
|
||||||
|
// to allow primer to shift freely along the sequence
|
||||||
for i := -1; i < len(sequence); i++ {
|
for i := -1; i < len(sequence); i++ {
|
||||||
idx := buffIndex(i, -1, width)
|
idx := buffIndex(i, -1, width)
|
||||||
buffer[idx] = 0
|
buffer[idx] = 0
|
||||||
path[idx] = +1
|
path[idx] = +1
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Fills the matrix except the last column
|
||||||
|
// where gaps must be free too.
|
||||||
path[0] = 0
|
path[0] = 0
|
||||||
jmax := len(pattern) - 1
|
jmax := len(pattern) - 1
|
||||||
for i := 0; i < len(sequence); i++ {
|
for i := 0; i < len(sequence); i++ {
|
||||||
for j := 0; j < jmax; j++ {
|
for j := 0; j < jmax; j++ {
|
||||||
|
|
||||||
|
// Mismatch score = -1
|
||||||
|
// Match score = 0
|
||||||
match := -1
|
match := -1
|
||||||
if _samenuc(pattern[j], sequence[i]) {
|
if _samenuc(pattern[j], sequence[i]) {
|
||||||
match = 0
|
match = 0
|
||||||
@@ -33,6 +76,8 @@ func LocatePattern(pattern, sequence []byte) (int, int, int) {
|
|||||||
idx := buffIndex(i, j, width)
|
idx := buffIndex(i, j, width)
|
||||||
|
|
||||||
diag := buffer[buffIndex(i-1, j-1, width)] + match
|
diag := buffer[buffIndex(i-1, j-1, width)] + match
|
||||||
|
|
||||||
|
// Each gap costs -1
|
||||||
left := buffer[buffIndex(i, j-1, width)] - 1
|
left := buffer[buffIndex(i, j-1, width)] - 1
|
||||||
up := buffer[buffIndex(i-1, j, width)] - 1
|
up := buffer[buffIndex(i-1, j, width)] - 1
|
||||||
|
|
||||||
@@ -51,9 +96,12 @@ func LocatePattern(pattern, sequence []byte) (int, int, int) {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Fills the last column considering the free up gap
|
||||||
for i := 0; i < len(sequence); i++ {
|
for i := 0; i < len(sequence); i++ {
|
||||||
idx := buffIndex(i, jmax, width)
|
idx := buffIndex(i, jmax, width)
|
||||||
|
|
||||||
|
// Mismatch score = -1
|
||||||
|
// Match score = 0
|
||||||
match := -1
|
match := -1
|
||||||
if _samenuc(pattern[jmax], sequence[i]) {
|
if _samenuc(pattern[jmax], sequence[i]) {
|
||||||
match = 0
|
match = 0
|
||||||
@@ -65,6 +113,7 @@ func LocatePattern(pattern, sequence []byte) (int, int, int) {
|
|||||||
|
|
||||||
score := max(diag, up, left)
|
score := max(diag, up, left)
|
||||||
buffer[idx] = score
|
buffer[idx] = score
|
||||||
|
|
||||||
switch {
|
switch {
|
||||||
case score == left:
|
case score == left:
|
||||||
path[idx] = -1
|
path[idx] = -1
|
||||||
@@ -75,11 +124,13 @@ func LocatePattern(pattern, sequence []byte) (int, int, int) {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Backtracking of the alignment
|
||||||
|
|
||||||
i := len(sequence) - 1
|
i := len(sequence) - 1
|
||||||
j := jmax
|
j := jmax
|
||||||
end := -1
|
end := -1
|
||||||
lali := 0
|
lali := 0
|
||||||
for i > -1 && j > 0 {
|
for j > 0 { // Previously: i > -1 && j > 0
|
||||||
lali++
|
lali++
|
||||||
switch path[buffIndex(i, j, width)] {
|
switch path[buffIndex(i, j, width)] {
|
||||||
case 0:
|
case 0:
|
||||||
@@ -100,5 +151,9 @@ func LocatePattern(pattern, sequence []byte) (int, int, int) {
|
|||||||
}
|
}
|
||||||
|
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// log.Warnf("from : %d to: %d error: %d match: %v",
|
||||||
|
// i, end+1, -buffer[buffIndex(len(sequence)-1, len(pattern)-1, width)],
|
||||||
|
// string(sequence[i:(end+1)]))
|
||||||
return i, end + 1, -buffer[buffIndex(len(sequence)-1, len(pattern)-1, width)]
|
return i, end + 1, -buffer[buffIndex(len(sequence)-1, len(pattern)-1, width)]
|
||||||
}
|
}
|
||||||
|
|||||||
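The scoring scheme described in the LocatePattern comments above (match = 0, mismatch = -1, gap = -1, first column fixed at 0, free gaps in the last column) can be illustrated with a minimal, self-contained sketch. It is not the code of this commit: `patternErrors` and `max3` are hypothetical helpers, plain byte equality stands in for `_samenuc`, the free gap in the last column is assumed to be an unpenalized "up" move, and the pattern is assumed to be non-empty. Only the error count is computed; the path matrix and backtracking are left out.

```go
package main

import "fmt"

// buffIndex maps coordinates starting at -1 to a 0-based linear index
// in a row-major matrix whose rows are width cells wide.
func buffIndex(i, j, width int) int {
	return (i+1)*width + (j + 1)
}

// max3 returns the largest of three ints (hypothetical helper).
func max3(a, b, c int) int {
	m := a
	if b > m {
		m = b
	}
	if c > m {
		m = c
	}
	return m
}

// patternErrors is a hypothetical, simplified variant of the matrix fill:
// it returns the number of errors of the best placement of pattern
// anywhere along sequence. The pattern must not be empty.
func patternErrors(pattern, sequence []byte) int {
	width := len(pattern) + 1
	buffer := make([]int, width*(len(sequence)+1))

	// First row: j+1 pattern letters aligned against nothing cost j+1 gaps.
	for j := 0; j < len(pattern); j++ {
		buffer[buffIndex(-1, j, width)] = -j - 1
	}
	// The first column keeps its zero value, so the pattern can start
	// at any position of the sequence at no cost.

	jmax := len(pattern) - 1
	for i := 0; i < len(sequence); i++ {
		for j := 0; j <= jmax; j++ {
			match := -1 // mismatch score
			if pattern[j] == sequence[i] {
				match = 0 // match score
			}
			diag := buffer[buffIndex(i-1, j-1, width)] + match
			left := buffer[buffIndex(i, j-1, width)] - 1
			up := buffer[buffIndex(i-1, j, width)] - 1
			if j == jmax {
				// Assumption: moving down the last column is free, so the
				// sequence may extend past the end of the pattern.
				up = buffer[buffIndex(i-1, j, width)]
			}
			buffer[buffIndex(i, j, width)] = max3(diag, left, up)
		}
	}

	// Scores are never positive; the error count is the negated final score.
	return -buffer[buffIndex(len(sequence)-1, jmax, width)]
}

func main() {
	fmt.Println(patternErrors([]byte("ACGT"), []byte("TTACGTTT")))
	fmt.Println(patternErrors([]byte("ACGT"), []byte("TTACCTTT")))
}
```

Because the first column is zero and the last column can be descended for free, the value in the bottom-right cell is the best score over every placement of the pattern, which matches the return expression `-buffer[buffIndex(len(sequence)-1, len(pattern)-1, width)]` shown in the diff.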
@@ -1,6 +1,8 @@
|
|||||||
package obialign
|
package obialign
|
||||||
|
|
||||||
import (
|
import (
|
||||||
|
"log"
|
||||||
|
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obikmer"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obikmer"
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
|
||||||
)
|
)
|
||||||
@@ -313,6 +315,105 @@ func _FillMatrixPeRightAlign(seqA, qualA, seqB, qualB []byte, gap, scale float64
|
|||||||
|
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Gaps at the beginning and at the end of seqA are free
|
||||||
|
// With seqA spanning over lines and seqB over columns
|
||||||
|
//
|
||||||
|
// SeqA must be the longer sequence. If that constraint is not
|
||||||
|
// respected, the function will panic.
|
||||||
|
//
|
||||||
|
// TO BE FINISHED
|
||||||
|
// - First column gap = 0
|
||||||
|
// - Last column gaps = 0
|
||||||
|
//
|
||||||
|
// Paths are encoded :
|
||||||
|
// - 0 : for diagonal
|
||||||
|
// - -1 : for top
|
||||||
|
// - +1 : for left
|
||||||
|
func _FillMatrixPeCenterAlign(seqA, qualA, seqB, qualB []byte, gap, scale float64,
|
||||||
|
scoreMatrix, pathMatrix *[]int) int {
|
||||||
|
|
||||||
|
la := len(seqA)
|
||||||
|
lb := len(seqB)
|
||||||
|
|
||||||
|
if len(seqA) < len(seqB) {
|
||||||
|
log.Panicf("len(seqA) < len(seqB) : %d < %d", len(seqA), len(seqB))
|
||||||
|
}
|
||||||
|
|
||||||
|
// The actual gap score is the gap score times the mismatch between
|
||||||
|
// two bases with a score of 40
|
||||||
|
gapPenalty := int(scale*gap*float64(_NucScorePartMatchMismatch[40][40]) + 0.5)
|
||||||
|
|
||||||
|
needed := (la + 1) * (lb + 1)
|
||||||
|
|
||||||
|
if needed > cap(*scoreMatrix) {
|
||||||
|
*scoreMatrix = make([]int, needed)
|
||||||
|
}
|
||||||
|
|
||||||
|
if needed > cap(*pathMatrix) {
|
||||||
|
*pathMatrix = make([]int, needed)
|
||||||
|
}
|
||||||
|
|
||||||
|
*scoreMatrix = (*scoreMatrix)[:needed]
|
||||||
|
*pathMatrix = (*pathMatrix)[:needed]
|
||||||
|
|
||||||
|
// Sets the first position of the matrix with 0 score
|
||||||
|
_SetMatrices(scoreMatrix, pathMatrix, la, -1, -1, 0, 0)
|
||||||
|
|
||||||
|
// Fills the first column with score 0
|
||||||
|
for i := 0; i < la; i++ {
|
||||||
|
_SetMatrices(scoreMatrix, pathMatrix, la, i, -1, 0, -1)
|
||||||
|
}
|
||||||
|
|
||||||
|
// la1 := la - 1 // Except the last line (gaps are free on it)
|
||||||
|
lb1 := lb - 1 // Except the last column (gaps are free on it)
|
||||||
|
|
||||||
|
for j := 0; j < lb1; j++ {
|
||||||
|
|
||||||
|
// Fill the first line with scores corresponding to a set of gaps
|
||||||
|
_SetMatrices(scoreMatrix, pathMatrix, la, -1, j, (j+1)*gapPenalty, 1)
|
||||||
|
|
||||||
|
for i := 0; i < la; i++ {
|
||||||
|
left, diag, top := _GetMatrixFrom(scoreMatrix, la, i, j)
|
||||||
|
// log.Infof("LA: i : %d j : %d left : %d diag : %d top : %d\n", i, j, left, diag, top)
|
||||||
|
|
||||||
|
diag += _PairingScorePeAlign(seqA[i], qualA[i], seqB[j], qualB[j], scale)
|
||||||
|
left += gapPenalty
|
||||||
|
top += gapPenalty
|
||||||
|
|
||||||
|
switch {
|
||||||
|
case diag >= left && diag >= top:
|
||||||
|
_SetMatrices(scoreMatrix, pathMatrix, la, i, j, diag, 0)
|
||||||
|
case left >= diag && left >= top:
|
||||||
|
_SetMatrices(scoreMatrix, pathMatrix, la, i, j, left, +1)
|
||||||
|
default:
|
||||||
|
_SetMatrices(scoreMatrix, pathMatrix, la, i, j, top, -1)
|
||||||
|
}
|
||||||
|
// log.Infof("LA: i : %d j : %d left : %d diag : %d top : %d [%d]\n", i, j, left, diag, top, _GetMatrix(scoreMatrix, la, i, j))
|
||||||
|
}
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
for i := 0; i < la; i++ {
|
||||||
|
left, diag, top := _GetMatrixFrom(scoreMatrix, la, i, lb1)
|
||||||
|
// log.Infof("LA: i : %d j : %d left : %d diag : %d top : %d\n", i, j, left, diag, top)
|
||||||
|
|
||||||
|
diag += _PairingScorePeAlign(seqA[i], qualA[i], seqB[lb1], qualB[lb1], scale)
|
||||||
|
left += gapPenalty
|
||||||
|
|
||||||
|
switch {
|
||||||
|
case diag >= left && diag >= top:
|
||||||
|
_SetMatrices(scoreMatrix, pathMatrix, la, i, lb1, diag, 0)
|
||||||
|
case left >= diag && left >= top:
|
||||||
|
_SetMatrices(scoreMatrix, pathMatrix, la, i, lb1, left, +1)
|
||||||
|
default:
|
||||||
|
_SetMatrices(scoreMatrix, pathMatrix, la, i, lb1, top, -1)
|
||||||
|
}
|
||||||
|
// log.Infof("LA: i : %d j : %d left : %d diag : %d top : %d [%d]\n", i, j, left, diag, top, _GetMatrix(scoreMatrix, la, i, j))
|
||||||
|
}
|
||||||
|
|
||||||
|
return _GetMatrix(scoreMatrix, la, la-1, lb1)
|
||||||
|
}
|
||||||
|
|
||||||
func PELeftAlign(seqA, seqB *obiseq.BioSequence, gap, scale float64,
|
func PELeftAlign(seqA, seqB *obiseq.BioSequence, gap, scale float64,
|
||||||
arena PEAlignArena) (int, []int) {
|
arena PEAlignArena) (int, []int) {
|
||||||
|
|
||||||
@@ -359,9 +460,33 @@ func PERightAlign(seqA, seqB *obiseq.BioSequence, gap, scale float64,
|
|||||||
return score, path
|
return score, path
|
||||||
}
|
}
|
||||||
|
|
||||||
|
func PECenterAlign(seqA, seqB *obiseq.BioSequence, gap, scale float64,
|
||||||
|
arena PEAlignArena) (int, []int) {
|
||||||
|
|
||||||
|
if !_InitializedDnaScore {
|
||||||
|
_InitDNAScoreMatrix()
|
||||||
|
}
|
||||||
|
|
||||||
|
if arena.pointer == nil {
|
||||||
|
arena = MakePEAlignArena(seqA.Len(), seqB.Len())
|
||||||
|
}
|
||||||
|
|
||||||
|
score := _FillMatrixPeCenterAlign(seqA.Sequence(), seqA.Qualities(),
|
||||||
|
seqB.Sequence(), seqB.Qualities(), gap, scale,
|
||||||
|
&arena.pointer.scoreMatrix,
|
||||||
|
&arena.pointer.pathMatrix)
|
||||||
|
|
||||||
|
path := _Backtracking(arena.pointer.pathMatrix,
|
||||||
|
seqA.Len(), seqB.Len(),
|
||||||
|
&arena.pointer.path)
|
||||||
|
|
||||||
|
return score, path
|
||||||
|
}
|
||||||
|
|
||||||
func PEAlign(seqA, seqB *obiseq.BioSequence,
|
func PEAlign(seqA, seqB *obiseq.BioSequence,
|
||||||
gap, scale float64, fastAlign bool, delta int, fastScoreRel bool,
|
gap, scale float64, fastAlign bool, delta int, fastScoreRel bool,
|
||||||
arena PEAlignArena, shift_buff *map[int]int) (int, []int, int, int, float64) {
|
arena PEAlignArena, shift_buff *map[int]int) (bool, int, []int, int, int, float64) {
|
||||||
|
var isLeftAlign bool
|
||||||
var score, shift int
|
var score, shift int
|
||||||
var startA, startB int
|
var startA, startB int
|
||||||
var partLen, over int
|
var partLen, over int
|
||||||
@@ -412,6 +537,7 @@ func PEAlign(seqA, seqB *obiseq.BioSequence,
|
|||||||
rawSeqB = seqB.Sequence()[0:partLen]
|
rawSeqB = seqB.Sequence()[0:partLen]
|
||||||
qualSeqB = seqB.Qualities()[0:partLen]
|
qualSeqB = seqB.Qualities()[0:partLen]
|
||||||
extra3 = seqB.Len() - partLen
|
extra3 = seqB.Len() - partLen
|
||||||
|
isLeftAlign = true
|
||||||
score = _FillMatrixPeLeftAlign(
|
score = _FillMatrixPeLeftAlign(
|
||||||
rawSeqA, qualSeqA, rawSeqB, qualSeqB, gap, scale,
|
rawSeqA, qualSeqA, rawSeqB, qualSeqB, gap, scale,
|
||||||
&arena.pointer.scoreMatrix,
|
&arena.pointer.scoreMatrix,
|
||||||
@@ -433,7 +559,7 @@ func PEAlign(seqA, seqB *obiseq.BioSequence,
|
|||||||
rawSeqA = seqA.Sequence()[:partLen]
|
rawSeqA = seqA.Sequence()[:partLen]
|
||||||
qualSeqA = seqA.Qualities()[:partLen]
|
qualSeqA = seqA.Qualities()[:partLen]
|
||||||
extra3 = partLen - seqA.Len()
|
extra3 = partLen - seqA.Len()
|
||||||
|
isLeftAlign = false
|
||||||
score = _FillMatrixPeRightAlign(
|
score = _FillMatrixPeRightAlign(
|
||||||
rawSeqA, qualSeqA, rawSeqB, qualSeqB, gap, scale,
|
rawSeqA, qualSeqA, rawSeqB, qualSeqB, gap, scale,
|
||||||
&arena.pointer.scoreMatrix,
|
&arena.pointer.scoreMatrix,
|
||||||
@@ -457,6 +583,7 @@ func PEAlign(seqA, seqB *obiseq.BioSequence,
|
|||||||
qualSeqB = seqB.Qualities()[0:partLen]
|
qualSeqB = seqB.Qualities()[0:partLen]
|
||||||
extra3 = seqB.Len() - partLen
|
extra3 = seqB.Len() - partLen
|
||||||
score = 0
|
score = 0
|
||||||
|
isLeftAlign = true
|
||||||
} else {
|
} else {
|
||||||
startA = 0
|
startA = 0
|
||||||
startB = -shift
|
startB = -shift
|
||||||
@@ -465,6 +592,7 @@ func PEAlign(seqA, seqB *obiseq.BioSequence,
|
|||||||
partLen = len(qualSeqB)
|
partLen = len(qualSeqB)
|
||||||
extra3 = partLen - seqA.Len()
|
extra3 = partLen - seqA.Len()
|
||||||
qualSeqA = seqA.Qualities()[:partLen]
|
qualSeqA = seqA.Qualities()[:partLen]
|
||||||
|
isLeftAlign = false
|
||||||
}
|
}
|
||||||
score = 0
|
score = 0
|
||||||
for i, qualA := range qualSeqA {
|
for i, qualA := range qualSeqA {
|
||||||
@@ -501,6 +629,8 @@ func PEAlign(seqA, seqB *obiseq.BioSequence,
|
|||||||
len(rawSeqA), len(rawSeqB),
|
len(rawSeqA), len(rawSeqB),
|
||||||
&(arena.pointer.path))
|
&(arena.pointer.path))
|
||||||
|
|
||||||
|
isLeftAlign = false
|
||||||
|
|
||||||
scoreL := _FillMatrixPeLeftAlign(
|
scoreL := _FillMatrixPeLeftAlign(
|
||||||
rawSeqA, qualSeqA, rawSeqB, qualSeqB, gap, scale,
|
rawSeqA, qualSeqA, rawSeqB, qualSeqB, gap, scale,
|
||||||
&arena.pointer.scoreMatrix,
|
&arena.pointer.scoreMatrix,
|
||||||
@@ -510,9 +640,10 @@ func PEAlign(seqA, seqB *obiseq.BioSequence,
|
|||||||
path = _Backtracking(arena.pointer.pathMatrix,
|
path = _Backtracking(arena.pointer.pathMatrix,
|
||||||
len(rawSeqA), len(rawSeqB),
|
len(rawSeqA), len(rawSeqB),
|
||||||
&(arena.pointer.path))
|
&(arena.pointer.path))
|
||||||
|
isLeftAlign = true
|
||||||
}
|
}
|
||||||
|
|
||||||
}
|
}
|
||||||
|
|
||||||
return score, path, fastCount, over, fastScore
|
return isLeftAlign, score, path, fastCount, over, fastScore
|
||||||
}
|
}
|
||||||
|
|||||||
154
pkg/obialign/readalign.go
Normal file
@@ -0,0 +1,154 @@
|
|||||||
|
package obialign
|
||||||
|
|
||||||
|
import (
|
||||||
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obikmer"
|
||||||
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
|
||||||
|
)
|
||||||
|
|
||||||
|
func ReadAlign(seqA, seqB *obiseq.BioSequence,
|
||||||
|
gap, scale float64, delta int, fastScoreRel bool,
|
||||||
|
arena PEAlignArena, shift_buff *map[int]int) (int, []int, int, int, float64, bool) {
|
||||||
|
var score, shift int
|
||||||
|
var startA, startB int
|
||||||
|
var partLen, over int
|
||||||
|
var rawSeqA, qualSeqA []byte
|
||||||
|
var rawSeqB, qualSeqB []byte
|
||||||
|
var extra5, extra3 int
|
||||||
|
|
||||||
|
var path []int
|
||||||
|
|
||||||
|
if !_InitializedDnaScore {
|
||||||
|
_InitDNAScoreMatrix()
|
||||||
|
}
|
||||||
|
|
||||||
|
fastCount := -1
|
||||||
|
fastScore := -1.0
|
||||||
|
|
||||||
|
directAlignment := true
|
||||||
|
|
||||||
|
index := obikmer.Index4mer(seqA,
|
||||||
|
&arena.pointer.fastIndex,
|
||||||
|
&arena.pointer.fastBuffer)
|
||||||
|
|
||||||
|
shift, fastCount, fastScore = obikmer.FastShiftFourMer(index, shift_buff, seqA.Len(), seqB, fastScoreRel, nil)
|
||||||
|
|
||||||
|
seqBR := seqB.ReverseComplement(false)
|
||||||
|
shiftR, fastCountR, fastScoreR := obikmer.FastShiftFourMer(index, shift_buff, seqA.Len(), seqBR, fastScoreRel, nil)
|
||||||
|
|
||||||
|
if fastCount < fastCountR {
|
||||||
|
shift = shiftR
|
||||||
|
fastCount = fastCountR
|
||||||
|
fastScore = fastScoreR
|
||||||
|
seqB = seqBR
|
||||||
|
directAlignment = false
|
||||||
|
}
|
||||||
|
|
||||||
|
// Compute the overlapping region length
|
||||||
|
switch {
|
||||||
|
case shift > 0:
|
||||||
|
over = seqA.Len() - shift
|
||||||
|
case shift < 0:
|
||||||
|
over = seqB.Len() + shift
|
||||||
|
default:
|
||||||
|
over = min(seqA.Len(), seqB.Len())
|
||||||
|
}
|
||||||
|
|
||||||
|
// log.Warnf("fw/fw: %v shift=%d fastCount=%d/over=%d fastScore=%f",
|
||||||
|
// directAlignment, shift, fastCount, over, fastScore)
|
||||||
|
|
||||||
|
// log.Warnf(("seqA: %s\nseqB: %s\n"), seqA.String(), seqB.String())
|
||||||
|
|
||||||
|
// At least one mismatch exists in the overlapping region
|
||||||
|
if fastCount+3 < over {
|
||||||
|
|
||||||
|
if shift > 0 || (shift == 0 && seqB.Len() >= seqA.Len()) {
|
||||||
|
startA = shift - delta
|
||||||
|
if startA < 0 {
|
||||||
|
startA = 0
|
||||||
|
}
|
||||||
|
extra5 = -startA
|
||||||
|
startB = 0
|
||||||
|
|
||||||
|
rawSeqA = seqA.Sequence()[startA:]
|
||||||
|
qualSeqA = seqA.Qualities()[startA:]
|
||||||
|
partLen = len(rawSeqA)
|
||||||
|
if partLen > seqB.Len() {
|
||||||
|
partLen = seqB.Len()
|
||||||
|
}
|
||||||
|
rawSeqB = seqB.Sequence()[0:partLen]
|
||||||
|
qualSeqB = seqB.Qualities()[0:partLen]
|
||||||
|
extra3 = seqB.Len() - partLen
|
||||||
|
score = _FillMatrixPeLeftAlign(
|
||||||
|
rawSeqA, qualSeqA, rawSeqB, qualSeqB, gap, scale,
|
||||||
|
&arena.pointer.scoreMatrix,
|
||||||
|
&arena.pointer.pathMatrix)
|
||||||
|
} else {
|
||||||
|
|
||||||
|
startA = 0
|
||||||
|
startB = -shift - delta
|
||||||
|
if startB < 0 {
|
||||||
|
startB = 0
|
||||||
|
}
|
||||||
|
extra5 = startB
|
||||||
|
rawSeqB = seqB.Sequence()[startB:]
|
||||||
|
qualSeqB = seqB.Qualities()[startB:]
|
||||||
|
partLen = len(rawSeqB)
|
||||||
|
if partLen > seqA.Len() {
|
||||||
|
partLen = seqA.Len()
|
||||||
|
}
|
||||||
|
rawSeqA = seqA.Sequence()[:partLen]
|
||||||
|
qualSeqA = seqA.Qualities()[:partLen]
|
||||||
|
extra3 = partLen - seqA.Len()
|
||||||
|
|
||||||
|
score = _FillMatrixPeRightAlign(
|
||||||
|
rawSeqA, qualSeqA, rawSeqB, qualSeqB, gap, scale,
|
||||||
|
&arena.pointer.scoreMatrix,
|
||||||
|
&arena.pointer.pathMatrix)
|
||||||
|
}
|
||||||
|
|
||||||
|
path = _Backtracking(arena.pointer.pathMatrix,
|
||||||
|
len(rawSeqA), len(rawSeqB),
|
||||||
|
&arena.pointer.path)
|
||||||
|
|
||||||
|
} else {
|
||||||
|
|
||||||
|
// Both overlapping regions are identical
|
||||||
|
|
||||||
|
if shift > 0 || (shift == 0 && seqB.Len() >= seqA.Len()) {
|
||||||
|
startA = shift
|
||||||
|
startB = 0
|
||||||
|
extra5 = -startA
|
||||||
|
qualSeqA = seqA.Qualities()[startA:]
|
||||||
|
partLen = len(qualSeqA)
|
||||||
|
qualSeqB = seqB.Qualities()[0:partLen]
|
||||||
|
extra3 = seqB.Len() - partLen
|
||||||
|
score = 0
|
||||||
|
} else {
|
||||||
|
startA = 0
|
||||||
|
startB = -shift
|
||||||
|
extra5 = startB
|
||||||
|
qualSeqB = seqB.Qualities()[startB:]
|
||||||
|
partLen = len(qualSeqB)
|
||||||
|
extra3 = partLen - seqA.Len()
|
||||||
|
qualSeqA = seqA.Qualities()[:partLen]
|
||||||
|
}
|
||||||
|
|
||||||
|
score = 0
|
||||||
|
for i, qualA := range qualSeqA {
|
||||||
|
qualB := qualSeqB[i]
|
||||||
|
score += _NucScorePartMatchMatch[qualA][qualB]
|
||||||
|
}
|
||||||
|
|
||||||
|
path = arena.pointer.path[:0]
|
||||||
|
path = append(path, 0, partLen)
|
||||||
|
}
|
||||||
|
|
||||||
|
path[0] += extra5
|
||||||
|
if path[len(path)-1] == 0 {
|
||||||
|
path[len(path)-2] += extra3
|
||||||
|
} else {
|
||||||
|
path = append(path, extra3, 0)
|
||||||
|
}
|
||||||
|
|
||||||
|
return score, path, fastCount, over, fastScore, directAlignment
|
||||||
|
}
|
||||||
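The overlap computation in ReadAlign can be looked at in isolation. The following is a minimal sketch, not part of the commit: `overlapLength` is a hypothetical helper that reproduces the `switch` on `shift` with plain integer lengths instead of `obiseq.BioSequence` values. The reading of the sign of `shift` given in the comments is inferred from how `startA` and `startB` are set in the code above and should be treated as an assumption.

```go
package main

import "fmt"

// overlapLength mirrors the switch used in ReadAlign (hypothetical helper).
// Assumption: a positive shift means seqB starts `shift` bases after the
// start of seqA; a negative shift means seqA starts `-shift` bases after
// the start of seqB.
func overlapLength(lenA, lenB, shift int) int {
	switch {
	case shift > 0:
		return lenA - shift
	case shift < 0:
		return lenB + shift
	default:
		// No shift: the overlap is bounded by the shorter read.
		if lenA < lenB {
			return lenA
		}
		return lenB
	}
}

func main() {
	fmt.Println(overlapLength(100, 100, 20))
	fmt.Println(overlapLength(100, 90, -30))
	fmt.Println(overlapLength(100, 80, 0))
}
```

In ReadAlign the result is then compared with `fastCount+3` to decide whether a full dynamic-programming alignment is needed or whether the overlap can be scored directly from the qualities.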
@@ -137,6 +137,28 @@ char *reverseSequence(char *str,char isPattern)
|
|||||||
return str;
|
return str;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/* -------------------------------------------- */
|
||||||
|
/* lowercase sequence */
|
||||||
|
/* -------------------------------------------- */
|
||||||
|
|
||||||
|
#define IS_UPPER(c) (((c) >= 'A') && ((c) <= 'Z'))
|
||||||
|
#define TO_LOWER(c) ((c) - 'A' + 'a')
|
||||||
|
|
||||||
|
char *LowerSequence(char *seq)
|
||||||
|
{
|
||||||
|
char *cseq;
|
||||||
|
|
||||||
|
for (cseq = seq ; *cseq ; cseq++)
|
||||||
|
if (IS_UPPER(*cseq))
|
||||||
|
*cseq = TO_LOWER(*cseq);
|
||||||
|
|
||||||
|
return seq;
|
||||||
|
}
|
||||||
|
|
||||||
|
#undef IS_UPPER
|
||||||
|
#undef TO_LOWER
|
||||||
|
|
||||||
|
|
||||||
char *ecoComplementPattern(char *nucAcSeq)
|
char *ecoComplementPattern(char *nucAcSeq)
|
||||||
{
|
{
|
||||||
return reverseSequence(LXBioSeqComplement(nucAcSeq),1);
|
return reverseSequence(LXBioSeqComplement(nucAcSeq),1);
|
||||||
@@ -165,6 +187,7 @@ void UpperSequence(char *seq)
|
|||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
/* -------------------------------------------- */
|
/* -------------------------------------------- */
|
||||||
/* encode sequence */
|
/* encode sequence */
|
||||||
/* IS_UPPER is slightly faster than isupper */
|
/* IS_UPPER is slightly faster than isupper */
|
||||||
|
|||||||
@@ -9,6 +9,7 @@ import "C"
|
|||||||
import (
|
import (
|
||||||
"errors"
|
"errors"
|
||||||
"runtime"
|
"runtime"
|
||||||
|
"strings"
|
||||||
"unsafe"
|
"unsafe"
|
||||||
|
|
||||||
log "github.com/sirupsen/logrus"
|
log "github.com/sirupsen/logrus"
|
||||||
@@ -114,7 +115,7 @@ func (pattern ApatPattern) ReverseComplement() (ApatPattern, error) {
|
|||||||
C.free(unsafe.Pointer(errmsg))
|
C.free(unsafe.Pointer(errmsg))
|
||||||
return ApatPattern{nil}, errors.New(message)
|
return ApatPattern{nil}, errors.New(message)
|
||||||
}
|
}
|
||||||
spat := C.GoString(apc.cpat)
|
spat := strings.ToLower(C.GoString(apc.cpat))
|
||||||
ap := _ApatPattern{apc, spat}
|
ap := _ApatPattern{apc, spat}
|
||||||
|
|
||||||
runtime.SetFinalizer(&ap, func(p *_ApatPattern) {
|
runtime.SetFinalizer(&ap, func(p *_ApatPattern) {
|
||||||
@@ -296,6 +297,24 @@ func (pattern ApatPattern) FindAllIndex(sequence ApatSequence, begin, length int
|
|||||||
return loc
|
return loc
|
||||||
}
|
}
|
||||||
|
|
||||||
|
func (pattern ApatPattern) IsMatching(sequence ApatSequence, begin, length int) bool {
|
||||||
|
if begin < 0 {
|
||||||
|
begin = 0
|
||||||
|
}
|
||||||
|
|
||||||
|
if length < 0 {
|
||||||
|
length = sequence.Len()
|
||||||
|
}
|
||||||
|
|
||||||
|
nhits := int(C.ManberAll(sequence.pointer.pointer,
|
||||||
|
pattern.pointer.pointer,
|
||||||
|
0,
|
||||||
|
C.int32_t(begin),
|
||||||
|
C.int32_t(length+C.MAX_PAT_LEN)))
|
||||||
|
|
||||||
|
return nhits > 0
|
||||||
|
}
|
||||||
|
|
||||||
// BestMatch finds the best match of a given pattern in a sequence.
|
// BestMatch finds the best match of a given pattern in a sequence.
|
||||||
//
|
//
|
||||||
// The function identifies the first occurrence of the pattern in the sequence.
|
// The function identifies the first occurrence of the pattern in the sequence.
|
||||||
@@ -335,6 +354,11 @@ func (pattern ApatPattern) BestMatch(sequence ApatSequence, begin, length int) (
|
|||||||
nerr = best[2]
|
nerr = best[2]
|
||||||
end = best[1]
|
end = best[1]
|
||||||
|
|
||||||
|
if best[0] < 0 || best[1] > sequence.Len() {
|
||||||
|
matched = false
|
||||||
|
return
|
||||||
|
}
|
||||||
|
|
||||||
if nerr == 0 || !pattern.pointer.pointer.hasIndel {
|
if nerr == 0 || !pattern.pointer.pointer.hasIndel {
|
||||||
start = best[0]
|
start = best[0]
|
||||||
log.Debugln("No nws ", start, nerr)
|
log.Debugln("No nws ", start, nerr)
|
||||||
@@ -355,17 +379,31 @@ func (pattern ApatPattern) BestMatch(sequence ApatSequence, begin, length int) (
|
|||||||
best[0], nerr, int(pattern.pointer.pointer.patlen),
|
best[0], nerr, int(pattern.pointer.pointer.patlen),
|
||||||
sequence.Len(), start, end)
|
sequence.Len(), start, end)
|
||||||
|
|
||||||
from, to, score := obialign.LocatePattern((*cpattern)[0:int(pattern.pointer.pointer.patlen)], frg)
|
from, to, score := obialign.LocatePattern(sequence.pointer.reference.Id(),
|
||||||
|
(*cpattern)[0:int(pattern.pointer.pointer.patlen)],
|
||||||
|
frg)
|
||||||
|
|
||||||
// olderr := m[2]
|
// olderr := m[2]
|
||||||
|
|
||||||
nerr = score
|
nerr = score
|
||||||
start = start + from
|
start = start + from
|
||||||
end = start + to
|
end = start + to
|
||||||
log.Debugln("results", score, start, nerr)
|
log.Debugf("BestMatch on %s : score=%d [%d..%d]", sequence.pointer.reference.Id(), score, start, end)
|
||||||
return
|
return
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// FilterBestMatch filters the best non-overlapping matches of a given pattern in a sequence.
|
||||||
|
//
|
||||||
|
// It takes the following parameters:
|
||||||
|
// - pattern: the pattern to search for (ApatPattern).
|
||||||
|
// - sequence: the sequence to search in (ApatSequence).
|
||||||
|
// - begin: the starting index of the search (int).
|
||||||
|
// - length: the length of the search (int).
|
||||||
|
//
|
||||||
|
// It returns a slice of [3]int representing the locations of all non-overlapping matches in the sequence.
|
||||||
|
// The first two values of the [3]int indicate respectively the start and the end position of
|
||||||
|
// the match. Following the Go convention, the end position is not included in the
|
||||||
|
// match. The third value indicates the number of errors detected for this occurrence.
|
||||||
func (pattern ApatPattern) FilterBestMatch(sequence ApatSequence, begin, length int) (loc [][3]int) {
|
func (pattern ApatPattern) FilterBestMatch(sequence ApatSequence, begin, length int) (loc [][3]int) {
|
||||||
res := pattern.FindAllIndex(sequence, begin, length)
|
res := pattern.FindAllIndex(sequence, begin, length)
|
||||||
filtered := make([][3]int, 0, len(res))
|
filtered := make([][3]int, 0, len(res))
|
||||||
@@ -424,13 +462,15 @@ func (pattern ApatPattern) FilterBestMatch(sequence ApatSequence, begin, length
|
|||||||
func (pattern ApatPattern) AllMatches(sequence ApatSequence, begin, length int) (loc [][3]int) {
|
func (pattern ApatPattern) AllMatches(sequence ApatSequence, begin, length int) (loc [][3]int) {
|
||||||
res := pattern.FilterBestMatch(sequence, begin, length)
|
res := pattern.FilterBestMatch(sequence, begin, length)
|
||||||
|
|
||||||
|
j := 0
|
||||||
for _, m := range res {
|
for _, m := range res {
|
||||||
// Recompute the start and end position of the match
|
// Recompute the start and end position of the match
|
||||||
// when the pattern allows for indels
|
// when the pattern allows for indels
|
||||||
if m[2] > 0 && pattern.pointer.pointer.hasIndel {
|
if m[2] > 0 && pattern.pointer.pointer.hasIndel {
|
||||||
start := m[0] - m[2]
|
// log.Warnf("Locating indel on sequence %s[%s]", sequence.pointer.reference.Id(), pattern.String())
|
||||||
|
start := m[0] - m[2]*2
|
||||||
start = max(start, 0)
|
start = max(start, 0)
|
||||||
end := start + int(pattern.pointer.pointer.patlen) + 2*m[2]
|
end := start + int(pattern.pointer.pointer.patlen) + 4*m[2]
|
||||||
end = min(end, sequence.Len())
|
end = min(end, sequence.Len())
|
||||||
// 1 << 30 = 1,073,741,824 = 1Gb
|
// 1 << 30 = 1,073,741,824 = 1Gb
|
||||||
// It's a virtual array mapping the sequence to the pattern
|
// It's a virtual array mapping the sequence to the pattern
|
||||||
@@ -439,18 +479,24 @@ func (pattern ApatPattern) AllMatches(sequence ApatSequence, begin, length int)
|
|||||||
cpattern := (*[1 << 30]byte)(unsafe.Pointer(pattern.pointer.pointer.cpat))
|
cpattern := (*[1 << 30]byte)(unsafe.Pointer(pattern.pointer.pointer.cpat))
|
||||||
frg := sequence.pointer.reference.Sequence()[start:end]
|
frg := sequence.pointer.reference.Sequence()[start:end]
|
||||||
|
|
||||||
begin, end, score := obialign.LocatePattern((*cpattern)[0:int(pattern.pointer.pointer.patlen)], frg)
|
pb, pe, score := obialign.LocatePattern(
|
||||||
|
sequence.pointer.reference.Id(),
|
||||||
|
(*cpattern)[0:int(pattern.pointer.pointer.patlen)],
|
||||||
|
frg)
|
||||||
|
|
||||||
// olderr := m[2]
|
// olderr := m[2]
|
||||||
m[2] = score
|
m[2] = score
|
||||||
m[0] = start + begin
|
m[0] = start + pb
|
||||||
m[1] = start + end
|
m[1] = start + pe
|
||||||
|
|
||||||
// log.Warnf("seq[%d@%d:%d] %d: %s %d - %s:%s:%s", i, m[0], m[1], olderr, sequence.pointer.reference.Id(), score,
|
// log.Warnf("seq[%d@%d:%d] %d: %s %d - %s:%s:%s", i, m[0], m[1], olderr, sequence.pointer.reference.Id(), score,
|
||||||
// frg, (*cpattern)[0:int(pattern.pointer.pointer.patlen)], sequence.pointer.reference.Sequence()[m[0]:m[1]])
|
// frg, (*cpattern)[0:int(pattern.pointer.pointer.patlen)], sequence.pointer.reference.Sequence()[m[0]:m[1]])
|
||||||
}
|
}
|
||||||
|
|
||||||
|
if int(pattern.pointer.pointer.maxerr) >= m[2] {
|
||||||
|
res[j] = m
|
||||||
|
j++
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
return res[0:j]
|
||||||
// log.Debugf("All matches : %v", res)
|
|
||||||
|
|
||||||
return res
|
|
||||||
}
|
}
|
||||||
|
|||||||
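The new tail of AllMatches keeps only the matches whose recomputed error count does not exceed the pattern's `maxerr`, writing them back into `res` and returning `res[0:j]`. A minimal sketch of that in-place filtering idiom follows; the `match` alias and the `keepWithinMaxErr` name are hypothetical and only stand in for the `[3]int{start, end, errors}` triplets used above.

```go
package main

import "fmt"

// match stands in for the [3]int{start, end, errors} triplets (hypothetical alias).
type match = [3]int

// keepWithinMaxErr filters res in place: kept elements are compacted at the
// front of the slice and the backing array is reused, so no allocation occurs.
func keepWithinMaxErr(res []match, maxErr int) []match {
	j := 0
	for _, m := range res {
		if m[2] <= maxErr {
			res[j] = m
			j++
		}
	}
	return res[:j]
}

func main() {
	res := []match{{0, 5, 0}, {10, 15, 3}, {20, 25, 1}}
	fmt.Println(keepWithinMaxErr(res, 1))
}
```

Because `j` never runs ahead of the read index, overwriting `res[j]` cannot clobber an element that has not been examined yet.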
@@ -3,7 +3,7 @@ package obiapat
|
|||||||
import (
|
import (
|
||||||
log "github.com/sirupsen/logrus"
|
log "github.com/sirupsen/logrus"
|
||||||
|
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obidefault"
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
|
||||||
)
|
)
|
||||||
@@ -104,7 +104,7 @@ func MakeOptions(setters []WithOption) Options {
|
|||||||
extension: -1,
|
extension: -1,
|
||||||
fullExtension: false,
|
fullExtension: false,
|
||||||
circular: false,
|
circular: false,
|
||||||
parallelWorkers: obioptions.CLIParallelWorkers(),
|
parallelWorkers: obidefault.ParallelWorkers(),
|
||||||
batchSize: 100,
|
batchSize: 100,
|
||||||
forward: NilApatPattern,
|
forward: NilApatPattern,
|
||||||
cfwd: NilApatPattern,
|
cfwd: NilApatPattern,
|
||||||
@@ -529,7 +529,6 @@ func PCRSliceWorker(options ...WithOption) obiseq.SeqSliceWorker {
|
|||||||
opt := MakeOptions(options)
|
opt := MakeOptions(options)
|
||||||
worker := func(sequences obiseq.BioSequenceSlice) (obiseq.BioSequenceSlice, error) {
|
worker := func(sequences obiseq.BioSequenceSlice) (obiseq.BioSequenceSlice, error) {
|
||||||
result := _PCRSlice(sequences, opt)
|
result := _PCRSlice(sequences, opt)
|
||||||
sequences.Recycle(true)
|
|
||||||
return result, nil
|
return result, nil
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
40
pkg/obiapat/predicat.go
Normal file
@@ -0,0 +1,40 @@
|
|||||||
|
package obiapat
|
||||||
|
|
||||||
|
import (
|
||||||
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
|
||||||
|
log "github.com/sirupsen/logrus"
|
||||||
|
)
|
||||||
|
|
||||||
|
func IsPatternMatchSequence(pattern string, errormax int, bothStrand, allowIndels bool) obiseq.SequencePredicate {
|
||||||
|
|
||||||
|
pat, err := MakeApatPattern(pattern, errormax, allowIndels)
|
||||||
|
|
||||||
|
if err != nil {
|
||||||
|
log.Fatalf("error in sequence regular pattern syntax : %v", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
cpat, err := pat.ReverseComplement()
|
||||||
|
|
||||||
|
if err != nil {
|
||||||
|
log.Fatalf("cannot reverse complement the pattern : %v", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
f := func(sequence *obiseq.BioSequence) bool {
|
||||||
|
aseq, err := MakeApatSequence(sequence, false)
|
||||||
|
|
||||||
|
if err != nil {
|
||||||
|
log.Panicf("Cannot convert sequence %s to apat format", sequence.Id())
|
||||||
|
}
|
||||||
|
|
||||||
|
match := pat.IsMatching(aseq, 0, aseq.Len())
|
||||||
|
|
||||||
|
if !match && bothStrand {
|
||||||
|
|
||||||
|
match = cpat.IsMatching(aseq, 0, aseq.Len())
|
||||||
|
}
|
||||||
|
|
||||||
|
return match
|
||||||
|
}
|
||||||
|
|
||||||
|
return f
|
||||||
|
}
|
||||||
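The predicate built by IsPatternMatchSequence tries the forward pattern first and falls back to its reverse complement only when `bothStrand` is set. The sketch below shows that control flow without the apat C library: exact substring search via `strings.Contains` replaces the error-tolerant matcher, and `reverseComplement` and `matchesPattern` are hypothetical names introduced for illustration only.

```go
package main

import (
	"fmt"
	"strings"
)

// reverseComplement returns the reverse complement of a lowercase DNA string.
// Any symbol other than a, c, g, t is mapped to 'n' (simplifying assumption).
func reverseComplement(s string) string {
	comp := map[byte]byte{'a': 't', 'c': 'g', 'g': 'c', 't': 'a'}
	out := make([]byte, len(s))
	for i := 0; i < len(s); i++ {
		c, ok := comp[s[i]]
		if !ok {
			c = 'n'
		}
		out[len(s)-1-i] = c
	}
	return string(out)
}

// matchesPattern builds a predicate over sequences. Both patterns are
// prepared once, outside the closure, as in IsPatternMatchSequence.
func matchesPattern(pattern string, bothStrand bool) func(string) bool {
	pat := strings.ToLower(pattern)
	cpat := reverseComplement(pat)
	return func(seq string) bool {
		s := strings.ToLower(seq)
		match := strings.Contains(s, pat)
		if !match && bothStrand {
			match = strings.Contains(s, cpat)
		}
		return match
	}
}

func main() {
	both := matchesPattern("aacg", true)
	forwardOnly := matchesPattern("aacg", false)
	fmt.Println(both("ttcgttaa"), forwardOnly("ttcgttaa"))
}
```

Preparing the reverse-complemented pattern once keeps the per-sequence work down to at most two searches.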
@@ -12,6 +12,13 @@ import (
|
|||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
|
||||||
)
|
)
|
||||||
|
|
||||||
|
// tempDir creates a temporary directory with a prefix "obiseq_chunks_"
|
||||||
|
// in the system's temporary directory. It returns the path of the
|
||||||
|
// created directory and any error encountered during the creation process.
|
||||||
|
//
|
||||||
|
// If the directory creation is successful, the path to the new
|
||||||
|
// temporary directory is returned. If there is an error, it returns
|
||||||
|
// an empty string and the error encountered.
|
||||||
func tempDir() (string, error) {
|
func tempDir() (string, error) {
|
||||||
dir, err := os.MkdirTemp(os.TempDir(), "obiseq_chunks_")
|
dir, err := os.MkdirTemp(os.TempDir(), "obiseq_chunks_")
|
||||||
if err != nil {
|
if err != nil {
|
||||||
@@ -20,6 +27,19 @@ func tempDir() (string, error) {
|
|||||||
return dir, nil
|
return dir, nil
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// find searches for files with a specific extension in the given root directory
|
||||||
|
// and its subdirectories. It returns a slice of strings containing the paths
|
||||||
|
// of the found files.
|
||||||
|
//
|
||||||
|
// Parameters:
|
||||||
|
// - root: The root directory to start the search from.
|
||||||
|
// - ext: The file extension to look for (including the leading dot, e.g., ".txt").
|
||||||
|
//
|
||||||
|
// Returns:
|
||||||
|
// A slice of strings containing the paths of files that match the specified
|
||||||
|
// extension. If no files are found, an empty slice is returned. Any errors
|
||||||
|
// encountered during the directory traversal will be returned as part of the
|
||||||
|
// WalkDir function's error handling.
|
||||||
func find(root, ext string) []string {
|
func find(root, ext string) []string {
|
||||||
var a []string
|
var a []string
|
||||||
filepath.WalkDir(root, func(s string, d fs.DirEntry, e error) error {
|
filepath.WalkDir(root, func(s string, d fs.DirEntry, e error) error {
|
||||||
@@ -34,6 +54,24 @@ func find(root, ext string) []string {
|
|||||||
return a
|
return a
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// ISequenceChunkOnDisk processes a sequence iterator by distributing the sequences
|
||||||
|
// into chunks stored on disk. It uses a classifier to determine how to distribute
|
||||||
|
// the sequences and returns a new iterator for the processed sequences.
|
||||||
|
//
|
||||||
|
// Parameters:
|
||||||
|
// - iterator: An iterator of biosequences to be processed.
|
||||||
|
// - classifier: A pointer to a BioSequenceClassifier used to classify the sequences
|
||||||
|
// during distribution.
|
||||||
|
//
|
||||||
|
// Returns:
|
||||||
|
// An iterator of biosequences representing the processed chunks. If an error occurs
|
||||||
|
// during the creation of the temporary directory or any other operation, it returns
|
||||||
|
// an error along with a nil iterator.
|
||||||
|
//
|
||||||
|
// The function operates asynchronously, creating a temporary directory to store
|
||||||
|
// the sequence chunks. Once the processing is complete, the temporary directory
|
||||||
|
// is removed. The function logs the number of batches created and the processing
|
||||||
|
// status of each batch.
|
||||||
func ISequenceChunkOnDisk(iterator obiiter.IBioSequence,
|
func ISequenceChunkOnDisk(iterator obiiter.IBioSequence,
|
||||||
classifier *obiseq.BioSequenceClassifier) (obiiter.IBioSequence, error) {
|
classifier *obiseq.BioSequenceClassifier) (obiiter.IBioSequence, error) {
|
||||||
dir, err := tempDir()
|
dir, err := tempDir()
|
||||||
@@ -73,11 +111,11 @@ func ISequenceChunkOnDisk(iterator obiiter.IBioSequence,
|
|||||||
panic(err)
|
panic(err)
|
||||||
}
|
}
|
||||||
|
|
||||||
chunck := iseq.Load()
|
source, chunk := iseq.Load()
|
||||||
|
|
||||||
newIter.Push(obiiter.MakeBioSequenceBatch(order, chunck))
|
newIter.Push(obiiter.MakeBioSequenceBatch(source, order, chunk))
|
||||||
log.Infof("Start processing of batch %d/%d : %d sequences",
|
log.Infof("Start processing of batch %d/%d : %d sequences",
|
||||||
order, nbatch, len(chunck))
|
order, nbatch, len(chunk))
|
||||||
|
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
@@ -28,6 +28,7 @@ func ISequenceChunk(iterator obiiter.IBioSequence,
|
|||||||
|
|
||||||
jobDone := sync.WaitGroup{}
|
jobDone := sync.WaitGroup{}
|
||||||
chunks := make(map[int]*obiseq.BioSequenceSlice, 1000)
|
chunks := make(map[int]*obiseq.BioSequenceSlice, 1000)
|
||||||
|
sources := make(map[int]string, 1000)
|
||||||
|
|
||||||
for newflux := range dispatcher.News() {
|
for newflux := range dispatcher.News() {
|
||||||
jobDone.Add(1)
|
jobDone.Add(1)
|
||||||
@@ -43,12 +44,17 @@ func ISequenceChunk(iterator obiiter.IBioSequence,
|
|||||||
chunks[newflux] = chunk
|
chunks[newflux] = chunk
|
||||||
lock.Unlock()
|
lock.Unlock()
|
||||||
|
|
||||||
|
source := ""
|
||||||
for data.Next() {
|
for data.Next() {
|
||||||
b := data.Get()
|
b := data.Get()
|
||||||
|
source = b.Source()
|
||||||
*chunk = append(*chunk, b.Slice()...)
|
*chunk = append(*chunk, b.Slice()...)
|
||||||
b.Recycle(false)
|
|
||||||
}
|
}
|
||||||
|
|
||||||
|
lock.Lock()
|
||||||
|
sources[newflux] = source
|
||||||
|
lock.Unlock()
|
||||||
|
|
||||||
jobDone.Done()
|
jobDone.Done()
|
||||||
}(newflux)
|
}(newflux)
|
||||||
}
|
}
|
||||||
@@ -56,10 +62,10 @@ func ISequenceChunk(iterator obiiter.IBioSequence,
|
|||||||
jobDone.Wait()
|
jobDone.Wait()
|
||||||
order := 0
|
order := 0
|
||||||
|
|
||||||
for _, chunck := range chunks {
|
for i, chunk := range chunks {
|
||||||
|
|
||||||
if len(*chunck) > 0 {
|
if len(*chunk) > 0 {
|
||||||
newIter.Push(obiiter.MakeBioSequenceBatch(order, *chunck))
|
newIter.Push(obiiter.MakeBioSequenceBatch(sources[i], order, *chunk))
|
||||||
order++
|
order++
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
@@ -1,7 +1,7 @@
|
|||||||
package obichunk
|
package obichunk
|
||||||
|
|
||||||
import (
|
import (
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obidefault"
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
|
||||||
)
|
)
|
||||||
|
|
||||||
@@ -29,8 +29,8 @@ func MakeOptions(setters []WithOption) Options {
|
|||||||
navalue: "NA",
|
navalue: "NA",
|
||||||
cacheOnDisk: false,
|
cacheOnDisk: false,
|
||||||
batchCount: 100,
|
batchCount: 100,
|
||||||
batchSize: obioptions.CLIBatchSize(),
|
batchSize: obidefault.BatchSize(),
|
||||||
parallelWorkers: obioptions.CLIParallelWorkers(),
|
parallelWorkers: obidefault.ParallelWorkers(),
|
||||||
noSingleton: false,
|
noSingleton: false,
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
@@ -6,8 +6,8 @@ import (
|
|||||||
|
|
||||||
log "github.com/sirupsen/logrus"
|
log "github.com/sirupsen/logrus"
|
||||||
|
|
||||||
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obidefault"
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiiter"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiiter"
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
|
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
|
||||||
)
|
)
|
||||||
|
|
||||||
@@ -62,7 +62,7 @@ func ISequenceSubChunk(iterator obiiter.IBioSequence,
|
|||||||
nworkers int) (obiiter.IBioSequence, error) {
|
nworkers int) (obiiter.IBioSequence, error) {
|
||||||
|
|
||||||
if nworkers <= 0 {
|
if nworkers <= 0 {
|
||||||
nworkers = obioptions.CLIParallelWorkers()
|
nworkers = obidefault.ParallelWorkers()
|
||||||
}
|
}
|
||||||
|
|
||||||
newIter := obiiter.MakeIBioSequence()
|
newIter := obiiter.MakeIBioSequence()
|
||||||
@@ -90,7 +90,7 @@ func ISequenceSubChunk(iterator obiiter.IBioSequence,
|
|||||||
for iterator.Next() {
|
for iterator.Next() {
|
||||||
|
|
||||||
batch := iterator.Get()
|
batch := iterator.Get()
|
||||||
|
source := batch.Source()
|
||||||
if batch.Len() > 1 {
|
if batch.Len() > 1 {
|
||||||
classifier.Reset()
|
classifier.Reset()
|
||||||
|
|
||||||
@@ -107,8 +107,6 @@ func ISequenceSubChunk(iterator obiiter.IBioSequence,
|
|||||||
batch.Slice()[i] = nil
|
batch.Slice()[i] = nil
|
||||||
}
|
}
|
||||||
|
|
||||||
batch.Recycle(false)
|
|
||||||
|
|
||||||
_By(func(p1, p2 *sSS) bool {
|
_By(func(p1, p2 *sSS) bool {
|
||||||
return p1.code < p2.code
|
return p1.code < p2.code
|
||||||
}).Sort(ordered)
|
}).Sort(ordered)
|
||||||
@@ -117,7 +115,7 @@ func ISequenceSubChunk(iterator obiiter.IBioSequence,
|
|||||||
ss := obiseq.MakeBioSequenceSlice()
|
ss := obiseq.MakeBioSequenceSlice()
|
||||||
for i, v := range ordered {
|
for i, v := range ordered {
|
||||||
if v.code != last {
|
if v.code != last {
|
||||||
newIter.Push(obiiter.MakeBioSequenceBatch(nextOrder(), ss))
|
newIter.Push(obiiter.MakeBioSequenceBatch(source, nextOrder(), ss))
|
||||||
ss = obiseq.MakeBioSequenceSlice()
|
ss = obiseq.MakeBioSequenceSlice()
|
||||||
last = v.code
|
last = v.code
|
||||||
}
|
}
|
||||||
@@ -127,7 +125,7 @@ func ISequenceSubChunk(iterator obiiter.IBioSequence,
|
|||||||
}
|
}
|
||||||
|
|
||||||
if len(ss) > 0 {
|
if len(ss) > 0 {
|
||||||
newIter.Push(obiiter.MakeBioSequenceBatch(nextOrder(), ss))
|
newIter.Push(obiiter.MakeBioSequenceBatch(source, nextOrder(), ss))
|
||||||
}
|
}
|
||||||
} else {
|
} else {
|
||||||
newIter.Push(batch.Reorder(nextOrder()))
|
newIter.Push(batch.Reorder(nextOrder()))
|
||||||
|
|||||||
@@ -95,10 +95,7 @@ func IUniqueSequence(iterator obiiter.IBioSequence,
|
|||||||
|
|
||||||
if icat < 0 || len(batch.Slice()) == 1 {
|
if icat < 0 || len(batch.Slice()) == 1 {
|
||||||
// No more sub classification of sequence or only a single sequence
|
// No more sub classification of sequence or only a single sequence
|
||||||
if opts.NoSingleton() && len(batch.Slice()) == 1 && batch.Slice()[0].Count() == 1 {
|
if !(opts.NoSingleton() && len(batch.Slice()) == 1 && batch.Slice()[0].Count() == 1) {
|
||||||
// We remove singleton from output
|
|
||||||
batch.Recycle(true)
|
|
||||||
} else {
|
|
||||||
iUnique.Push(batch.Reorder(nextOrder()))
|
iUnique.Push(batch.Reorder(nextOrder()))
|
||||||
}
|
}
|
||||||
} else {
|
} else {
|
||||||
|
|||||||
26
pkg/obidefault/batch.go
Normal file
@@ -0,0 +1,26 @@
|
|||||||
|
package obidefault
|
||||||
|
|
||||||
|
var _BatchSize = 2000
|
||||||
|
|
||||||
|
// SetBatchSize sets the size of the sequence batches.
|
||||||
|
//
|
||||||
|
// n - an integer representing the size of the sequence batches.
|
||||||
|
func SetBatchSize(n int) {
|
||||||
|
_BatchSize = n
|
||||||
|
}
|
||||||
|
|
||||||
|
// BatchSize returns the expected size of the sequence batches.
|
||||||
|
//
|
||||||
|
// In Obitools, the sequences are processed in parallel by batches.
|
||||||
|
// The number of sequences in each batch is determined by the command line option
|
||||||
|
// --batch-size and the environment variable OBIBATCHSIZE.
|
||||||
|
//
|
||||||
|
// No parameters.
|
||||||
|
// Returns an integer value.
|
||||||
|
func BatchSize() int {
|
||||||
|
return _BatchSize
|
||||||
|
}
|
||||||
|
|
||||||
|
func BatchSizePtr() *int {
|
||||||
|
return &_BatchSize
|
||||||
|
}
|
||||||
15
pkg/obidefault/compressed.go
Normal file
15
pkg/obidefault/compressed.go
Normal file
@@ -0,0 +1,15 @@
|
|||||||
|
package obidefault
|
||||||
|
|
||||||
|
var __compressed__ = false
|
||||||
|
|
||||||
|
func CompressOutput() bool {
|
||||||
|
return __compressed__
|
||||||
|
}
|
||||||
|
|
||||||
|
func SetCompressOutput(b bool) {
|
||||||
|
__compressed__ = b
|
||||||
|
}
|
||||||
|
|
||||||
|
func CompressedPtr() *bool {
|
||||||
|
return &__compressed__
|
||||||
|
}
|
||||||
29
pkg/obidefault/quality.go
Normal file
29
pkg/obidefault/quality.go
Normal file
@@ -0,0 +1,29 @@
|
|||||||
|
package obidefault
|
||||||
|
|
||||||
|
var _Quality_Shift_Input = byte(33)
|
||||||
|
var _Quality_Shift_Output = byte(33)
|
||||||
|
var _Read_Qualities = true
|
||||||
|
|
||||||
|
func SetReadQualitiesShift(shift byte) {
|
||||||
|
_Quality_Shift_Input = shift
|
||||||
|
}
|
||||||
|
|
||||||
|
func ReadQualitiesShift() byte {
|
||||||
|
return _Quality_Shift_Input
|
||||||
|
}
|
||||||
|
|
||||||
|
func SetWriteQualitiesShift(shift byte) {
|
||||||
|
_Quality_Shift_Output = shift
|
||||||
|
}
|
||||||
|
|
||||||
|
func WriteQualitiesShift() byte {
|
||||||
|
return _Quality_Shift_Output
|
||||||
|
}
|
||||||
|
|
||||||
|
func SetReadQualities(read bool) {
|
||||||
|
_Read_Qualities = read
|
||||||
|
}
|
||||||
|
|
||||||
|
func ReadQualities() bool {
|
||||||
|
return _Read_Qualities
|
||||||
|
}
|
||||||
32
pkg/obidefault/taxonomy.go
Normal file
32
pkg/obidefault/taxonomy.go
Normal file
@@ -0,0 +1,32 @@
|
|||||||
|
package obidefault
|
||||||
|
|
||||||
|
var __taxonomy__ = ""
|
||||||
|
var __alternative_name__ = false
|
||||||
|
|
||||||
|
func SelectedTaxonomy() string {
|
||||||
|
return __taxonomy__
|
||||||
|
}
|
||||||
|
|
||||||
|
func HasSelectedTaxonomy() bool {
|
||||||
|
return __taxonomy__ != ""
|
||||||
|
}
|
||||||
|
|
||||||
|
func AreAlternativeNamesSelected() bool {
|
||||||
|
return __alternative_name__
|
||||||
|
}
|
||||||
|
|
||||||
|
func SelectedTaxonomyPtr() *string {
|
||||||
|
return &__taxonomy__
|
||||||
|
}
|
||||||
|
|
||||||
|
func AlternativeNamesSelectedPtr() *bool {
|
||||||
|
return &__alternative_name__
|
||||||
|
}
|
||||||
|
|
||||||
|
func SetSelectedTaxonomy(taxonomy string) {
|
||||||
|
__taxonomy__ = taxonomy
|
||||||
|
}
|
||||||
|
|
||||||
|
func SetAlternativeNamesSelected(alt bool) {
|
||||||
|
__alternative_name__ = alt
|
||||||
|
}
|
||||||
170
pkg/obidefault/workers.go
Normal file
170
pkg/obidefault/workers.go
Normal file
@@ -0,0 +1,170 @@
|
|||||||
|
package obidefault
|
||||||
|
|
||||||
|
import "runtime"
|
||||||
|
|
||||||
|
var _MaxAllowedCPU = runtime.NumCPU()
|
||||||
|
var _WorkerPerCore = 1.0
|
||||||
|
|
||||||
|
var _ReadWorkerPerCore = 0.25
|
||||||
|
var _WriteWorkerPerCore = 0.25
|
||||||
|
|
||||||
|
var _StrictReadWorker = 0
|
||||||
|
var _StrictWriteWorker = 0
|
||||||
|
|
||||||
|
var _ParallelFilesRead = 0
|
||||||
|
|
||||||
|
// ParallelWorkers returns the number of parallel workers used for
|
||||||
|
// computing the result.
|
||||||
|
//
|
||||||
|
// The number of parallel workers is determined by the command line option
|
||||||
|
// --max-cpu|-m and the environment variable OBIMAXCPU. This number is
|
||||||
|
// multiplied by the variable _WorkerPerCore.
|
||||||
|
//
|
||||||
|
// No parameters.
|
||||||
|
// Returns an integer representing the number of parallel workers.
|
||||||
|
func ParallelWorkers() int {
|
||||||
|
return int(float64(MaxCPU()) * float64(WorkerPerCore()))
|
||||||
|
}
|
||||||
|
|
||||||
|
// MaxCPU returns the maximum number of CPU cores allowed.
|
||||||
|
//
|
||||||
|
// The maximum number of CPU cores is determined by the command line option
|
||||||
|
// --max-cpu|-m and the environment variable OBIMAXCPU.
|
||||||
|
//
|
||||||
|
// No parameters.
|
||||||
|
// Returns an integer representing the maximum number of CPU cores allowed.
|
||||||
|
func MaxCPU() int {
|
||||||
|
return _MaxAllowedCPU
|
||||||
|
}
|
||||||
|
|
||||||
|
func MaxCPUPtr() *int {
|
||||||
|
return &_MaxAllowedCPU
|
||||||
|
}
|
||||||
|
|
||||||
|
// WorkerPerCore returns the number of workers per CPU core.
|
||||||
|
//
|
||||||
|
// No parameters.
|
||||||
|
// Returns a float64 representing the number of workers per CPU core.
|
||||||
|
func WorkerPerCore() float64 {
|
||||||
|
return _WorkerPerCore
|
||||||
|
}
|
||||||
|
|
||||||
|
// SetWorkerPerCore sets the number of workers per CPU core.
|
||||||
|
//
|
||||||
|
// It takes a float64 parameter representing the number of workers
|
||||||
|
// per CPU core and does not return any value.
|
||||||
|
func SetWorkerPerCore(n float64) {
|
||||||
|
_WorkerPerCore = n
|
||||||
|
}
|
||||||
|
|
||||||
|
// SetMaxCPU sets the maximum number of CPU cores allowed.
|
||||||
|
//
|
||||||
|
// n - an integer representing the new maximum number of CPU cores.
|
||||||
|
func SetMaxCPU(n int) {
|
||||||
|
_MaxAllowedCPU = n
|
||||||
|
}
|
||||||
|
|
||||||
|
// SetStrictReadWorker sets the number of workers for reading files.
|
||||||
|
//
|
||||||
|
// The number of workers dedicated to reading files is normally determined
|
||||||
|
// as the number of allowed CPU cores multiplied by the number of read workers per core.
|
||||||
|
// Setting the number of read workers using this function decouples the number
|
||||||
|
// of read workers from the number of CPU cores.
|
||||||
|
//
|
||||||
|
// n - an integer representing the number of workers to be set.
|
||||||
|
func SetStrictReadWorker(n int) {
|
||||||
|
_StrictReadWorker = n
|
||||||
|
}
|
||||||
|
|
||||||
|
func SetStrictWriteWorker(n int) {
|
||||||
|
_StrictWriteWorker = n
|
||||||
|
}
|
||||||
|
|
||||||
|
// SetReadWorkerPerCore sets the number of workers per CPU
|
||||||
|
// core for reading files.
|
||||||
|
//
|
||||||
|
// n float64
|
||||||
|
func SetReadWorkerPerCore(n float64) {
|
||||||
|
_ReadWorkerPerCore = n
|
||||||
|
}
|
||||||
|
|
||||||
|
func SetWriteWorkerPerCore(n float64) {
|
||||||
|
_WriteWorkerPerCore = n
|
||||||
|
}
|
||||||
|
|
||||||
|
// StrictReadWorker returns the strict number of workers for reading files.
|
||||||
|
//
|
||||||
|
// No parameters.
|
||||||
|
// Returns an integer representing the number of workers.
|
||||||
|
func StrictReadWorker() int {
|
||||||
|
return _StrictReadWorker
|
||||||
|
}
|
||||||
|
|
||||||
|
func StrictWriteWorker() int {
|
||||||
|
return _StrictWriteWorker
|
||||||
|
}
|
||||||
|
|
||||||
|
// ReadParallelWorkers returns the number of parallel workers used for
|
||||||
|
// reading files.
|
||||||
|
//
|
||||||
|
// The number of parallel workers is determined by the command line option
|
||||||
|
// --max-cpu|-m and the environment variable OBIMAXCPU. This number is
|
||||||
|
// multiplied by the variable _ReadWorkerPerCore.
|
||||||
|
//
|
||||||
|
// No parameters.
|
||||||
|
// Returns an integer representing the number of parallel workers.
|
||||||
|
func ReadParallelWorkers() int {
|
||||||
|
if StrictReadWorker() == 0 {
|
||||||
|
n := int(float64(MaxCPU()) * ReadWorkerPerCore())
|
||||||
|
if n == 0 {
|
||||||
|
n = 1
|
||||||
|
}
|
||||||
|
return n
|
||||||
|
} else {
|
||||||
|
return StrictReadWorker()
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func WriteParallelWorkers() int {
|
||||||
|
if StrictWriteWorker() == 0 {
|
||||||
|
n := int(float64(MaxCPU()) * WriteWorkerPerCore())
|
||||||
|
if n == 0 {
|
||||||
|
n = 1
|
||||||
|
}
|
||||||
|
return n
|
||||||
|
} else {
|
||||||
|
return StrictWriteWorker()
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// ReadWorkerPerCore returns the number of workers per CPU core for
|
||||||
|
// reading files.
|
||||||
|
//
|
||||||
|
// No parameters.
|
||||||
|
// Returns a float64 representing the number of workers per CPU core.
|
||||||
|
func ReadWorkerPerCore() float64 {
|
||||||
|
return _ReadWorkerPerCore
|
||||||
|
}
|
||||||
|
|
||||||
|
func WriteWorkerPerCore() float64 {
|
||||||
|
return _WriteWorkerPerCore
|
||||||
|
}
|
||||||
|
|
||||||
|
// ParallelFilesRead returns the number of files to be read in parallel.
|
||||||
|
//
|
||||||
|
// No parameters.
|
||||||
|
// Returns an integer representing the number of files to be read.
|
||||||
|
func ParallelFilesRead() int {
|
||||||
|
if _ParallelFilesRead == 0 {
|
||||||
|
return ReadParallelWorkers()
|
||||||
|
} else {
|
||||||
|
return _ParallelFilesRead
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// SetParallelFilesRead sets the number of files to be read in parallel.
|
||||||
|
//
|
||||||
|
// n - an integer representing the number of files to be set.
|
||||||
|
func SetParallelFilesRead(n int) {
|
||||||
|
_ParallelFilesRead = n
|
||||||
|
}
|
||||||
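The worker-count rule in pkg/obidefault/workers.go above can be illustrated with a minimal, self-contained sketch. The helper name `ioWorkers` is hypothetical and not part of obidefault; it only mirrors the rule visible in ReadParallelWorkers: a non-zero strict value wins, otherwise the CPU limit is multiplied by the per-core ratio, truncated, and floored to one worker.

```go
package main

import "fmt"

// ioWorkers is a hypothetical helper (not part of obidefault).
// It reproduces the rule used by ReadParallelWorkers:
//   - if a strict worker count is set (non-zero), use it as is;
//   - otherwise use maxCPU * perCore, truncated toward zero,
//     with a minimum of one worker.
func ioWorkers(maxCPU int, perCore float64, strict int) int {
	if strict != 0 {
		return strict
	}
	n := int(float64(maxCPU) * perCore)
	if n == 0 {
		n = 1
	}
	return n
}

func main() {
	fmt.Println(ioWorkers(8, 0.25, 0)) // ratio applied to 8 cores
	fmt.Println(ioWorkers(2, 0.25, 0)) // product truncates to 0, so the floor applies
	fmt.Println(ioWorkers(8, 0.25, 3)) // strict value overrides the ratio
}
```

With the default ratio of 0.25, any machine with fewer than four allowed cores falls back to a single read or write worker, which is why the floor matters.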
@@ -5,10 +5,10 @@ import (
|
|||||||
"io"
|
"io"
|
||||||
"os"
|
"os"
|
||||||
"path"
|
"path"
|
||||||
"unsafe"
|
"strings"
|
||||||
|
|
||||||
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obidefault"
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiiter"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiiter"
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
|
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
|
||||||
"github.com/goccy/go-json"
|
"github.com/goccy/go-json"
|
||||||
@@ -94,31 +94,40 @@ func _ParseCsvFile(source string,
|
|||||||
continue
|
continue
|
||||||
}
|
}
|
||||||
|
|
||||||
err := json.Unmarshal(unsafe.Slice(unsafe.StringData(field), len(field)), &val)
|
ft := header[i]
|
||||||
|
|
||||||
if err != nil {
|
switch {
|
||||||
val = field
|
case ft == "taxid":
|
||||||
} else {
|
sequence.SetTaxid(field)
|
||||||
if _, ok := val.(float64); ok {
|
case strings.HasSuffix(ft, "_taxid"):
|
||||||
if obiutils.IsIntegral(val.(float64)) {
|
sequence.SetTaxid(field, strings.TrimSuffix(ft, "_taxid"))
|
||||||
val = int(val.(float64))
|
default:
|
||||||
|
err := json.Unmarshal(obiutils.UnsafeBytes(field), &val)
|
||||||
|
|
||||||
|
if err != nil {
|
||||||
|
val = field
|
||||||
|
} else {
|
||||||
|
if _, ok := val.(float64); ok {
|
||||||
|
if obiutils.IsIntegral(val.(float64)) {
|
||||||
|
val = int(val.(float64))
|
||||||
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
|
||||||
|
|
||||||
sequence.SetAttribute(header[i], val)
|
sequence.SetAttribute(ft, val)
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
slice = append(slice, sequence)
|
slice = append(slice, sequence)
|
||||||
if len(slice) >= batchSize {
|
if len(slice) >= batchSize {
|
||||||
out.Push(obiiter.MakeBioSequenceBatch(o, slice))
|
out.Push(obiiter.MakeBioSequenceBatch(source, o, slice))
|
||||||
o++
|
o++
|
||||||
slice = obiseq.MakeBioSequenceSlice()
|
slice = obiseq.MakeBioSequenceSlice()
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
if len(slice) > 0 {
|
if len(slice) > 0 {
|
||||||
out.Push(obiiter.MakeBioSequenceBatch(o, slice))
|
out.Push(obiiter.MakeBioSequenceBatch(source, o, slice))
|
||||||
}
|
}
|
||||||
|
|
||||||
out.Done()
|
out.Done()
|
||||||
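The column dispatch introduced in _ParseCsvFile above can be sketched in isolation. This is an illustration under stated assumptions, not the obitools4 code: `decodeField` is a hypothetical function, `math.Trunc` stands in for obiutils.IsIntegral (whose implementation is not shown in this diff), and a plain `[]byte` conversion replaces obiutils.UnsafeBytes. From the diff, taxid columns appear to go to SetTaxid and skip SetAttribute, so the sketch reports them separately.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
	"strings"
)

// decodeField is a hypothetical stand-in for the per-column logic.
// It returns the rank named by a "<rank>_taxid" column (empty for the plain
// "taxid" column), whether the column is a taxid column, and the decoded value.
func decodeField(column, field string) (rank string, isTaxid bool, val interface{}) {
	switch {
	case column == "taxid":
		return "", true, field
	case strings.HasSuffix(column, "_taxid"):
		return strings.TrimSuffix(column, "_taxid"), true, field
	}

	// Any other column: try to decode the text as JSON and keep the raw
	// string when decoding fails.
	if err := json.Unmarshal([]byte(field), &val); err != nil {
		return "", false, field
	}

	// encoding/json decodes every number as float64; convert whole
	// numbers back to int, as the original code does.
	if f, ok := val.(float64); ok && f == math.Trunc(f) {
		val = int(f)
	}
	return "", false, val
}

func main() {
	for _, c := range [][2]string{
		{"taxid", "9606"},
		{"family_taxid", "9604"},
		{"count", "42"},
		{"score", "0.5"},
		{"sequence", "acgt"},
	} {
		rank, isTaxid, val := decodeField(c[0], c[1])
		fmt.Printf("%s: rank=%q taxid=%v value=%v (%T)\n", c[0], rank, isTaxid, val, val)
	}
}
```

Keeping taxid fields as strings avoids the JSON round trip turning an identifier into a number.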
@@ -134,7 +143,7 @@ func ReadCSV(reader io.Reader, options ...WithOption) (obiiter.IBioSequence, err
|
|||||||
go _ParseCsvFile(opt.Source(),
|
go _ParseCsvFile(opt.Source(),
|
||||||
reader,
|
reader,
|
||||||
out,
|
out,
|
||||||
byte(obioptions.InputQualityShift()),
|
obidefault.ReadQualitiesShift(),
|
||||||
opt.BatchSize())
|
opt.BatchSize())
|
||||||
|
|
||||||
go func() {
|
go func() {
|
||||||
@@ -148,9 +157,9 @@ func ReadCSV(reader io.Reader, options ...WithOption) (obiiter.IBioSequence, err
|
|||||||
func ReadCSVFromFile(filename string, options ...WithOption) (obiiter.IBioSequence, error) {
|
func ReadCSVFromFile(filename string, options ...WithOption) (obiiter.IBioSequence, error) {
|
||||||
|
|
||||||
options = append(options, OptionsSource(obiutils.RemoveAllExt((path.Base(filename)))))
|
options = append(options, OptionsSource(obiutils.RemoveAllExt((path.Base(filename)))))
|
||||||
file, err := Ropen(filename)
|
file, err := obiutils.Ropen(filename)
|
||||||
|
|
||||||
if err == ErrNoContent {
|
if err == obiutils.ErrNoContent {
|
||||||
log.Infof("file %s is empty", filename)
|
log.Infof("file %s is empty", filename)
|
||||||
return ReadEmptyFile(options...)
|
return ReadEmptyFile(options...)
|
||||||
}
|
}
|
||||||
@@ -164,9 +173,9 @@ func ReadCSVFromFile(filename string, options ...WithOption) (obiiter.IBioSequen
|
|||||||
|
|
||||||
func ReadCSVFromStdin(reader io.Reader, options ...WithOption) (obiiter.IBioSequence, error) {
|
func ReadCSVFromStdin(reader io.Reader, options ...WithOption) (obiiter.IBioSequence, error) {
|
||||||
options = append(options, OptionsSource(obiutils.RemoveAllExt("stdin")))
|
options = append(options, OptionsSource(obiutils.RemoveAllExt("stdin")))
|
||||||
input, err := Buf(os.Stdin)
|
input, err := obiutils.Buf(os.Stdin)
|
||||||
|
|
||||||
if err == ErrNoContent {
|
if err == obiutils.ErrNoContent {
|
||||||
log.Infof("stdin is empty")
|
log.Infof("stdin is empty")
|
||||||
return ReadEmptyFile(options...)
|
return ReadEmptyFile(options...)
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -1,22 +1,14 @@
|
|||||||
package obiformats
|
package obiformats
|
||||||
|
|
||||||
import (
|
import (
|
||||||
"bytes"
|
|
||||||
"encoding/csv"
|
|
||||||
"fmt"
|
"fmt"
|
||||||
"io"
|
|
||||||
"os"
|
|
||||||
"sync"
|
|
||||||
"time"
|
|
||||||
|
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiiter"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obidefault"
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
|
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
|
||||||
log "github.com/sirupsen/logrus"
|
|
||||||
)
|
)
|
||||||
|
|
||||||
func CSVRecord(sequence *obiseq.BioSequence, opt Options) []string {
|
func CSVSequenceRecord(sequence *obiseq.BioSequence, opt Options) []string {
|
||||||
keys := opt.CSVKeys()
|
keys := opt.CSVKeys()
|
||||||
record := make([]string, 0, len(keys)+4)
|
record := make([]string, 0, len(keys)+4)
|
||||||
|
|
||||||
@@ -30,14 +22,10 @@ func CSVRecord(sequence *obiseq.BioSequence, opt Options) []string {
|
|||||||
|
|
||||||
if opt.CSVTaxon() {
|
if opt.CSVTaxon() {
|
||||||
taxid := sequence.Taxid()
|
taxid := sequence.Taxid()
|
||||||
sn, ok := sequence.GetAttribute("scientific_name")
|
sn, ok := sequence.GetStringAttribute("scientific_name")
|
||||||
|
|
||||||
if !ok {
|
if !ok {
|
||||||
if taxid == 1 {
|
sn = opt.CSVNAValue()
|
||||||
sn = "root"
|
|
||||||
} else {
|
|
||||||
sn = opt.CSVNAValue()
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
|
|
||||||
record = append(record, fmt.Sprint(taxid), fmt.Sprint(sn))
|
record = append(record, fmt.Sprint(taxid), fmt.Sprint(sn))
|
||||||
@@ -66,7 +54,7 @@ func CSVRecord(sequence *obiseq.BioSequence, opt Options) []string {
|
|||||||
l := sequence.Len()
|
l := sequence.Len()
|
||||||
q := sequence.Qualities()
|
q := sequence.Qualities()
|
||||||
ascii := make([]byte, l)
|
ascii := make([]byte, l)
|
||||||
quality_shift := obioptions.OutputQualityShift()
|
quality_shift := obidefault.WriteQualitiesShift()
|
||||||
for j := 0; j < l; j++ {
|
for j := 0; j < l; j++ {
|
||||||
ascii[j] = uint8(q[j]) + uint8(quality_shift)
|
ascii[j] = uint8(q[j]) + uint8(quality_shift)
|
||||||
}
|
}
|
||||||
@@ -78,182 +66,3 @@ func CSVRecord(sequence *obiseq.BioSequence, opt Options) []string {
|
|||||||
|
|
||||||
return record
|
return record
|
||||||
}
|
}
|
||||||
|
|
||||||
func CSVHeader(opt Options) []string {
|
|
||||||
keys := opt.CSVKeys()
|
|
||||||
record := make([]string, 0, len(keys)+4)
|
|
||||||
|
|
||||||
if opt.CSVId() {
|
|
||||||
record = append(record, "id")
|
|
||||||
}
|
|
||||||
|
|
||||||
if opt.CSVCount() {
|
|
||||||
record = append(record, "count")
|
|
||||||
}
|
|
||||||
|
|
||||||
if opt.CSVTaxon() {
|
|
||||||
record = append(record, "taxid", "scientific_name")
|
|
||||||
}
|
|
||||||
|
|
||||||
if opt.CSVDefinition() {
|
|
||||||
record = append(record, "definition")
|
|
||||||
}
|
|
||||||
|
|
||||||
record = append(record, opt.CSVKeys()...)
|
|
||||||
|
|
||||||
if opt.CSVSequence() {
|
|
||||||
record = append(record, "sequence")
|
|
||||||
}
|
|
||||||
|
|
||||||
if opt.CSVQuality() {
|
|
||||||
record = append(record, "quality")
|
|
||||||
}
|
|
||||||
|
|
||||||
return record
|
|
||||||
}
|
|
||||||
|
|
||||||
func FormatCVSBatch(batch obiiter.BioSequenceBatch, opt Options) []byte {
|
|
||||||
buff := new(bytes.Buffer)
|
|
||||||
csv := csv.NewWriter(buff)
|
|
||||||
|
|
||||||
if batch.Order() == 0 {
|
|
||||||
csv.Write(CSVHeader(opt))
|
|
||||||
}
|
|
||||||
for _, s := range batch.Slice() {
|
|
||||||
csv.Write(CSVRecord(s, opt))
|
|
||||||
}
|
|
||||||
|
|
||||||
csv.Flush()
|
|
||||||
|
|
||||||
return buff.Bytes()
|
|
||||||
}
|
|
||||||
|
|
||||||
func WriteCSV(iterator obiiter.IBioSequence,
|
|
||||||
file io.WriteCloser,
|
|
||||||
options ...WithOption) (obiiter.IBioSequence, error) {
|
|
||||||
|
|
||||||
var auto_slot obiutils.Set[string]
|
|
||||||
opt := MakeOptions(options)
|
|
||||||
|
|
||||||
file, _ = obiutils.CompressStream(file, opt.CompressedFile(), opt.CloseFile())
|
|
||||||
|
|
||||||
newIter := obiiter.MakeIBioSequence()
|
|
||||||
|
|
||||||
nwriters := opt.ParallelWorkers()
|
|
||||||
|
|
||||||
obiiter.RegisterAPipe()
|
|
||||||
chunkchan := make(chan FileChunck)
|
|
||||||
|
|
||||||
newIter.Add(nwriters)
|
|
||||||
var waitWriter sync.WaitGroup
|
|
||||||
|
|
||||||
go func() {
|
|
||||||
newIter.WaitAndClose()
|
|
||||||
for len(chunkchan) > 0 {
|
|
||||||
time.Sleep(time.Millisecond)
|
|
||||||
}
|
|
||||||
close(chunkchan)
|
|
||||||
waitWriter.Wait()
|
|
||||||
}()
|
|
||||||
|
|
||||||
ff := func(iterator obiiter.IBioSequence) {
|
|
||||||
for iterator.Next() {
|
|
||||||
|
|
||||||
batch := iterator.Get()
|
|
||||||
|
|
||||||
chunkchan <- FileChunck{
|
|
||||||
FormatCVSBatch(batch, opt),
|
|
||||||
batch.Order(),
|
|
||||||
}
|
|
||||||
newIter.Push(batch)
|
|
||||||
}
|
|
||||||
newIter.Done()
|
|
||||||
}
|
|
||||||
|
|
||||||
next_to_send := 0
|
|
||||||
received := make(map[int]FileChunck, 100)
|
|
||||||
|
|
||||||
waitWriter.Add(1)
|
|
||||||
go func() {
|
|
||||||
for chunk := range chunkchan {
|
|
||||||
if chunk.order == next_to_send {
|
|
||||||
file.Write(chunk.text)
|
|
||||||
next_to_send++
|
|
||||||
chunk, ok := received[next_to_send]
|
|
||||||
for ok {
|
|
||||||
file.Write(chunk.text)
|
|
||||||
delete(received, next_to_send)
|
|
||||||
next_to_send++
|
|
||||||
chunk, ok = received[next_to_send]
|
|
||||||
}
|
|
||||||
} else {
|
|
||||||
received[chunk.order] = chunk
|
|
||||||
}
|
|
||||||
|
|
||||||
}
|
|
||||||
|
|
||||||
file.Close()
|
|
||||||
|
|
||||||
log.Debugln("End of the CSV file writing")
|
|
||||||
obiiter.UnregisterPipe()
|
|
||||||
waitWriter.Done()
|
|
||||||
}()
|
|
||||||
|
|
||||||
if opt.pointer.csv_auto {
|
|
||||||
if iterator.Next() {
|
|
||||||
batch := iterator.Get()
|
|
||||||
auto_slot = batch.Slice().AttributeKeys(true)
|
|
||||||
CSVKeys(auto_slot.Members())(opt)
|
|
||||||
iterator.PushBack()
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
log.Debugln("Start of the CSV file writing")
|
|
||||||
go ff(iterator)
|
|
||||||
for i := 0; i < nwriters-1; i++ {
|
|
||||||
go ff(iterator.Split())
|
|
||||||
}
|
|
||||||
|
|
||||||
return newIter, nil
|
|
||||||
}
|
|
||||||
|
|
||||||
func WriteCSVToStdout(iterator obiiter.IBioSequence,
|
|
||||||
options ...WithOption) (obiiter.IBioSequence, error) {
|
|
||||||
options = append(options, OptionDontCloseFile())
|
|
||||||
return WriteCSV(iterator, os.Stdout, options...)
|
|
||||||
}
|
|
||||||
|
|
||||||
func WriteCSVToFile(iterator obiiter.IBioSequence,
|
|
||||||
filename string,
|
|
||||||
options ...WithOption) (obiiter.IBioSequence, error) {
|
|
||||||
|
|
||||||
opt := MakeOptions(options)
|
|
||||||
flags := os.O_WRONLY | os.O_CREATE
|
|
||||||
|
|
||||||
if opt.AppendFile() {
|
|
||||||
flags |= os.O_APPEND
|
|
||||||
}
|
|
||||||
file, err := os.OpenFile(filename, flags, 0660)
|
|
||||||
|
|
||||||
if err != nil {
|
|
||||||
log.Fatalf("open file error: %v", err)
|
|
||||||
return obiiter.NilIBioSequence, err
|
|
||||||
}
|
|
||||||
|
|
||||||
options = append(options, OptionCloseFile())
|
|
||||||
|
|
||||||
iterator, err = WriteCSV(iterator, file, options...)
|
|
||||||
|
|
||||||
if opt.HaveToSavePaired() {
|
|
||||||
var revfile *os.File
|
|
||||||
|
|
||||||
revfile, err = os.OpenFile(opt.PairedFileName(), flags, 0660)
|
|
||||||
if err != nil {
|
|
||||||
log.Fatalf("open file error: %v", err)
|
|
||||||
return obiiter.NilIBioSequence, err
|
|
||||||
}
|
|
||||||
iterator, err = WriteCSV(iterator.PairedWith(), revfile, options...)
|
|
||||||
}
|
|
||||||
|
|
||||||
return iterator, err
|
|
||||||
}
|
|
||||||
|
|||||||
@@ -14,10 +14,40 @@ import (
|
|||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiiter"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiiter"
|
||||||
)
|
)
|
||||||
|
|
||||||
|
// SequenceBatchWriterToFile is a function type that defines a method for writing
|
||||||
|
// a batch of biosequences to a specified file. It takes an iterator of biosequences,
|
||||||
|
// a filename, and optional configuration options, and returns an iterator of biosequences
|
||||||
|
// along with any error encountered during the writing process.
|
||||||
|
//
|
||||||
|
// Parameters:
|
||||||
|
// - iterator: An iterator of biosequences to be written to the file.
|
||||||
|
// - filename: The name of the file where the sequences will be written.
|
||||||
|
// - options: Optional configuration options for the writing process.
|
||||||
|
//
|
||||||
|
// Returns:
|
||||||
|
// An iterator of biosequences that may have been modified during the writing process
|
||||||
|
// and an error if the writing operation fails.
|
||||||
type SequenceBatchWriterToFile func(iterator obiiter.IBioSequence,
|
type SequenceBatchWriterToFile func(iterator obiiter.IBioSequence,
|
||||||
filename string,
|
filename string,
|
||||||
options ...WithOption) (obiiter.IBioSequence, error)
|
options ...WithOption) (obiiter.IBioSequence, error)
|
||||||
|
|
||||||
|
// WriterDispatcher manages the writing of data to files based on a given
|
||||||
|
// prototype name and a dispatcher for distributing the sequences. It
|
||||||
|
// processes incoming data from the dispatcher in separate goroutines,
|
||||||
|
// formatting and writing the data to files as specified.
|
||||||
|
//
|
||||||
|
// Parameters:
|
||||||
|
// - prototypename: A string that serves as a template for naming the output files.
|
||||||
|
// - dispatcher: An instance of IDistribute that provides the data to be written
|
||||||
|
// and manages the distribution of sequences.
|
||||||
|
// - formater: A function of type SequenceBatchWriterToFile that formats and writes
|
||||||
|
// the sequences to the specified file.
|
||||||
|
// - options: Optional configuration options for the writing process.
|
||||||
|
//
|
||||||
|
// The function operates asynchronously, launching goroutines for each new data
|
||||||
|
// channel received from the dispatcher. It ensures that directories are created
|
||||||
|
// as needed and handles errors during the writing process. The function blocks
|
||||||
|
// until all writing jobs are completed.
|
||||||
func WriterDispatcher(prototypename string,
|
func WriterDispatcher(prototypename string,
|
||||||
dispatcher obiiter.IDistribute,
|
dispatcher obiiter.IDistribute,
|
||||||
formater SequenceBatchWriterToFile,
|
formater SequenceBatchWriterToFile,
|
||||||
@@ -34,7 +64,7 @@ func WriterDispatcher(prototypename string,
|
|||||||
data, err := dispatcher.Outputs(newflux)
|
data, err := dispatcher.Outputs(newflux)
|
||||||
|
|
||||||
if err != nil {
|
if err != nil {
|
||||||
log.Fatalf("Cannot retreive the new chanel : %v", err)
|
log.Fatalf("Cannot retrieve the new channel: %v", err)
|
||||||
}
|
}
|
||||||
|
|
||||||
key := dispatcher.Classifier().Value(newflux)
|
key := dispatcher.Classifier().Value(newflux)
|
||||||
@@ -58,7 +88,7 @@ func WriterDispatcher(prototypename string,
|
|||||||
info, err := os.Stat(directory)
|
info, err := os.Stat(directory)
|
||||||
switch {
|
switch {
|
||||||
case !os.IsNotExist(err) && !info.IsDir():
|
case !os.IsNotExist(err) && !info.IsDir():
|
||||||
log.Fatalf("Cannot Create the directory %s", directory)
|
log.Fatalf("Cannot create the directory %s", directory)
|
||||||
case os.IsNotExist(err):
|
case os.IsNotExist(err):
|
||||||
os.Mkdir(directory, 0755)
|
os.Mkdir(directory, 0755)
|
||||||
}
|
}
|
||||||
@@ -71,7 +101,7 @@ func WriterDispatcher(prototypename string,
|
|||||||
options...)
|
options...)
|
||||||
|
|
||||||
if err != nil {
|
if err != nil {
|
||||||
log.Fatalf("cannot open the output file for key %s",
|
log.Fatalf("Cannot open the output file for key %s",
|
||||||
dispatcher.Classifier().Value(newflux))
|
dispatcher.Classifier().Value(newflux))
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
@@ -122,7 +122,7 @@ func __read_ecopcr_bioseq__(file *__ecopcr_file__) (*obiseq.BioSequence, error)
|
|||||||
return bseq, nil
|
return bseq, nil
|
||||||
}
|
}
|
||||||
|
|
||||||
func ReadEcoPCR(reader io.Reader, options ...WithOption) obiiter.IBioSequence {
|
func ReadEcoPCR(reader io.Reader, options ...WithOption) (obiiter.IBioSequence, error) {
|
||||||
tag := make([]byte, 11)
|
tag := make([]byte, 11)
|
||||||
n, _ := reader.Read(tag)
|
n, _ := reader.Read(tag)
|
||||||
|
|
||||||
@@ -187,7 +187,7 @@ func ReadEcoPCR(reader io.Reader, options ...WithOption) obiiter.IBioSequence {
|
|||||||
slice = append(slice, seq)
|
slice = append(slice, seq)
|
||||||
ii++
|
ii++
|
||||||
if ii >= opt.BatchSize() {
|
if ii >= opt.BatchSize() {
|
||||||
newIter.Push(obiiter.MakeBioSequenceBatch(i, slice))
|
newIter.Push(obiiter.MakeBioSequenceBatch(opt.Source(), i, slice))
|
||||||
slice = obiseq.MakeBioSequenceSlice()
|
slice = obiseq.MakeBioSequenceSlice()
|
||||||
i++
|
i++
|
||||||
ii = 0
|
ii = 0
|
||||||
@@ -198,7 +198,7 @@ func ReadEcoPCR(reader io.Reader, options ...WithOption) obiiter.IBioSequence {
|
|||||||
}
|
}
|
||||||
|
|
||||||
if len(slice) > 0 {
|
if len(slice) > 0 {
|
||||||
newIter.Push(obiiter.MakeBioSequenceBatch(i, slice))
|
newIter.Push(obiiter.MakeBioSequenceBatch(opt.Source(), i, slice))
|
||||||
}
|
}
|
||||||
|
|
||||||
newIter.Done()
|
newIter.Done()
|
||||||
@@ -213,7 +213,7 @@ func ReadEcoPCR(reader io.Reader, options ...WithOption) obiiter.IBioSequence {
|
|||||||
newIter = newIter.CompleteFileIterator()
|
newIter = newIter.CompleteFileIterator()
|
||||||
}
|
}
|
||||||
|
|
||||||
return newIter
|
return newIter, nil
|
||||||
}
|
}
|
||||||
|
|
||||||
func ReadEcoPCRFromFile(filename string, options ...WithOption) (obiiter.IBioSequence, error) {
|
func ReadEcoPCRFromFile(filename string, options ...WithOption) (obiiter.IBioSequence, error) {
|
||||||
@@ -235,5 +235,5 @@ func ReadEcoPCRFromFile(filename string, options ...WithOption) (obiiter.IBioSeq
|
|||||||
reader = greader
|
reader = greader
|
||||||
}
|
}
|
||||||
|
|
||||||
return ReadEcoPCR(reader, options...), nil
|
return ReadEcoPCR(reader, options...)
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -15,7 +15,7 @@ import (
|
|||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
|
||||||
)
|
)
|
||||||
|
|
||||||
// _EndOfLastEntry finds the index of the last entry in the given byte slice 'buff'
|
// EndOfLastFlatFileEntry finds the index of the last entry in the given byte slice 'buff'
|
||||||
// using a pattern match of the form:
|
// using a pattern match of the form:
|
||||||
// <CR>?<LF>//<CR>?<LF>
|
// <CR>?<LF>//<CR>?<LF>
|
||||||
// where <CR> and <LF> are the ASCII codes for carriage return and line feed,
|
// where <CR> and <LF> are the ASCII codes for carriage return and line feed,
|
||||||
@@ -27,7 +27,7 @@ import (
|
|||||||
//
|
//
|
||||||
// Returns:
|
// Returns:
|
||||||
// int - the index of the end of the last entry or -1 if no match is found.
|
// int - the index of the end of the last entry or -1 if no match is found.
|
||||||
func _EndOfLastEntry(buff []byte) int {
|
func EndOfLastFlatFileEntry(buff []byte) int {
|
||||||
// 6 5 43 2 1
|
// 6 5 43 2 1
|
||||||
// <CR>?<LF>//<CR>?<LF>
|
// <CR>?<LF>//<CR>?<LF>
|
||||||
var i int
|
var i int
|
||||||
@@ -87,15 +87,9 @@ func _EndOfLastEntry(buff []byte) int {
|
|||||||
return -1
|
return -1
|
||||||
}
|
}
|
||||||
|
|
||||||
func _ParseEmblFile(source string, input ChannelSeqFileChunk,
|
func EmblChunkParser(withFeatureTable bool) func(string, io.Reader) (obiseq.BioSequenceSlice, error) {
|
||||||
out obiiter.IBioSequence,
|
parser := func(source string, input io.Reader) (obiseq.BioSequenceSlice, error) {
|
||||||
withFeatureTable bool,
|
scanner := bufio.NewScanner(input)
|
||||||
batch_size int,
|
|
||||||
total_seq_size int) {
|
|
||||||
|
|
||||||
for chunks := range input {
|
|
||||||
scanner := bufio.NewScanner(chunks.raw)
|
|
||||||
order := chunks.order
|
|
||||||
sequences := make(obiseq.BioSequenceSlice, 0, 100)
|
sequences := make(obiseq.BioSequenceSlice, 0, 100)
|
||||||
id := ""
|
id := ""
|
||||||
scientificName := ""
|
scientificName := ""
|
||||||
@@ -156,7 +150,31 @@ func _ParseEmblFile(source string, input ChannelSeqFileChunk,
|
|||||||
seqBytes = new(bytes.Buffer)
|
seqBytes = new(bytes.Buffer)
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
out.Push(obiiter.MakeBioSequenceBatch(order, sequences))
|
|
||||||
|
return sequences, nil
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
return parser
|
||||||
|
}
|
||||||
|
|
||||||
|
func _ParseEmblFile(
|
||||||
|
input ChannelFileChunk,
|
||||||
|
out obiiter.IBioSequence,
|
||||||
|
withFeatureTable bool,
|
||||||
|
) {
|
||||||
|
|
||||||
|
parser := EmblChunkParser(withFeatureTable)
|
||||||
|
|
||||||
|
for chunks := range input {
|
||||||
|
order := chunks.Order
|
||||||
|
sequences, err := parser(chunks.Source, chunks.Raw)
|
||||||
|
|
||||||
|
if err != nil {
|
||||||
|
log.Fatalf("%s : Cannot parse the embl file : %v", chunks.Source, err)
|
||||||
|
}
|
||||||
|
|
||||||
|
out.Push(obiiter.MakeBioSequenceBatch(chunks.Source, order, sequences))
|
||||||
}
|
}
|
||||||
|
|
||||||
out.Done()
|
out.Done()
|
||||||
@@ -166,12 +184,18 @@ func _ParseEmblFile(source string, input ChannelSeqFileChunk,
|
|||||||
// 6 5 43 2 1
|
// 6 5 43 2 1
|
||||||
//
|
//
|
||||||
// <CR>?<LF>//<CR>?<LF>
|
// <CR>?<LF>//<CR>?<LF>
|
||||||
func ReadEMBL(reader io.Reader, options ...WithOption) obiiter.IBioSequence {
|
func ReadEMBL(reader io.Reader, options ...WithOption) (obiiter.IBioSequence, error) {
|
||||||
opt := MakeOptions(options)
|
opt := MakeOptions(options)
|
||||||
|
|
||||||
buff := make([]byte, 1024*1024*1024*256)
|
buff := make([]byte, 1024*1024*128) // 128 MB
|
||||||
|
|
||||||
|
entry_channel := ReadFileChunk(
|
||||||
|
opt.Source(),
|
||||||
|
reader,
|
||||||
|
buff,
|
||||||
|
EndOfLastFlatFileEntry,
|
||||||
|
)
|
||||||
|
|
||||||
entry_channel := ReadSeqFileChunk(reader, buff, _EndOfLastEntry)
|
|
||||||
newIter := obiiter.MakeIBioSequence()
|
newIter := obiiter.MakeIBioSequence()
|
||||||
|
|
||||||
nworkers := opt.ParallelWorkers()
|
nworkers := opt.ParallelWorkers()
|
||||||
@@ -179,10 +203,11 @@ func ReadEMBL(reader io.Reader, options ...WithOption) obiiter.IBioSequence {
|
|||||||
// for j := 0; j < opt.ParallelWorkers(); j++ {
|
// for j := 0; j < opt.ParallelWorkers(); j++ {
|
||||||
for j := 0; j < nworkers; j++ {
|
for j := 0; j < nworkers; j++ {
|
||||||
newIter.Add(1)
|
newIter.Add(1)
|
||||||
go _ParseEmblFile(opt.Source(), entry_channel, newIter,
|
go _ParseEmblFile(
|
||||||
|
entry_channel,
|
||||||
|
newIter,
|
||||||
opt.WithFeatureTable(),
|
opt.WithFeatureTable(),
|
||||||
opt.BatchSize(),
|
)
|
||||||
opt.TotalSeqSize())
|
|
||||||
}
|
}
|
||||||
|
|
||||||
go func() {
|
go func() {
|
||||||
@@ -193,7 +218,7 @@ func ReadEMBL(reader io.Reader, options ...WithOption) obiiter.IBioSequence {
|
|||||||
newIter = newIter.CompleteFileIterator()
|
newIter = newIter.CompleteFileIterator()
|
||||||
}
|
}
|
||||||
|
|
||||||
return newIter
|
return newIter, nil
|
||||||
}
|
}
|
||||||
|
|
||||||
func ReadEMBLFromFile(filename string, options ...WithOption) (obiiter.IBioSequence, error) {
|
func ReadEMBLFromFile(filename string, options ...WithOption) (obiiter.IBioSequence, error) {
|
||||||
@@ -202,9 +227,9 @@ func ReadEMBLFromFile(filename string, options ...WithOption) (obiiter.IBioSeque
|
|||||||
|
|
||||||
options = append(options, OptionsSource(obiutils.RemoveAllExt((path.Base(filename)))))
|
options = append(options, OptionsSource(obiutils.RemoveAllExt((path.Base(filename)))))
|
||||||
|
|
||||||
reader, err = Ropen(filename)
|
reader, err = obiutils.Ropen(filename)
|
||||||
|
|
||||||
if err == ErrNoContent {
|
if err == obiutils.ErrNoContent {
|
||||||
log.Infof("file %s is empty", filename)
|
log.Infof("file %s is empty", filename)
|
||||||
return ReadEmptyFile(options...)
|
return ReadEmptyFile(options...)
|
||||||
}
|
}
|
||||||
@@ -214,5 +239,5 @@ func ReadEMBLFromFile(filename string, options ...WithOption) (obiiter.IBioSeque
|
|||||||
return obiiter.NilIBioSequence, err
|
return obiiter.NilIBioSequence, err
|
||||||
}
|
}
|
||||||
|
|
||||||
return ReadEMBL(reader, options...), nil
|
return ReadEMBL(reader, options...)
|
||||||
}
|
}
|
||||||
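Illustrative sketch (not part of this commit): the EMBL, FASTA and FASTQ readers in this diff all take the same shape. A `...ChunkParser()` factory returns a `func(string, io.Reader)` closure, and a `_Parse...File` worker ranges over a chunk channel and pushes one batch per chunk, tagged with the chunk's source and order. The Go sketch below reproduces that shape with hypothetical stand-in types and functions (`fileChunk`, `batch`, `lineChunkParser`, `parseChunks`, `run`); none of these names exist in obitools4. The real workers call `log.Fatalf` on a parse error, whereas the sketch simply skips the chunk.

```go
package main

import (
	"bufio"
	"io"
	"strings"
	"sync"
)

// fileChunk is a hypothetical stand-in for the items carried by
// ChannelFileChunk: one piece of an input file, its source name and
// its position in the file.
type fileChunk struct {
	Source string
	Raw    io.Reader
	Order  int
}

// batch is a hypothetical stand-in for a BioSequenceBatch.
type batch struct {
	Source string
	Order  int
	Lines  []string
}

// lineChunkParser mimics the shape of EmblChunkParser / FastaChunkParser:
// a factory returning a parser closure that knows nothing about channels.
// Here the "records" are just the lines of the chunk.
func lineChunkParser() func(string, io.Reader) ([]string, error) {
	return func(source string, input io.Reader) ([]string, error) {
		lines := make([]string, 0, 100)
		scanner := bufio.NewScanner(input)
		for scanner.Scan() {
			lines = append(lines, scanner.Text())
		}
		return lines, scanner.Err()
	}
}

// parseChunks is the worker: it drains the chunk channel, applies the
// parser and forwards each result tagged with the chunk's source and order.
func parseChunks(input <-chan fileChunk, out chan<- batch, wg *sync.WaitGroup) {
	defer wg.Done()
	parser := lineChunkParser()
	for chunk := range input {
		lines, err := parser(chunk.Source, chunk.Raw)
		if err != nil {
			// The code in the diff aborts with log.Fatalf here.
			continue
		}
		out <- batch{Source: chunk.Source, Order: chunk.Order, Lines: lines}
	}
}

// run feeds the given texts through nworkers workers and returns the
// resulting batches indexed by their order.
func run(texts []string, nworkers int) map[int]batch {
	input := make(chan fileChunk)
	out := make(chan batch)
	var wg sync.WaitGroup
	for i := 0; i < nworkers; i++ {
		wg.Add(1)
		go parseChunks(input, out, &wg)
	}
	go func() {
		for i, t := range texts {
			input <- fileChunk{Source: "demo", Raw: strings.NewReader(t), Order: i}
		}
		close(input)
	}()
	go func() {
		wg.Wait()
		close(out)
	}()
	result := make(map[int]batch)
	for b := range out {
		result[b.Order] = b
	}
	return result
}
```

Keeping the parser free of channel handling is what lets the same closure be reused outside the worker, for example on a single in-memory chunk.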
|
|||||||
@@ -14,7 +14,7 @@ import (
|
|||||||
log "github.com/sirupsen/logrus"
|
log "github.com/sirupsen/logrus"
|
||||||
)
|
)
|
||||||
|
|
||||||
func _EndOfLastFastaEntry(buffer []byte) int {
|
func EndOfLastFastaEntry(buffer []byte) int {
|
||||||
var i int
|
var i int
|
||||||
|
|
||||||
imax := len(buffer)
|
imax := len(buffer)
|
||||||
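Illustrative sketch (not part of this commit): `EndOfLastFastaEntry` takes a byte buffer and returns an `int`, and is handed to `ReadFileChunk` as the function that decides where a buffer can be cut. Its body is elided from this diff, so the Go code below is only a guess at the general idea, under the assumption that the cut point is the last `>` that begins a line. The names `lastRecordStart` and `splitAtLastRecord` are hypothetical, and the real function may handle more edge cases.

```go
package main

// lastRecordStart returns the index of the last '>' that starts a line in
// buffer, or -1 when no such position exists after the first byte.
// A '>' at index 0 is ignored on purpose: cutting there would leave an
// empty "complete" part.
func lastRecordStart(buffer []byte) int {
	for i := len(buffer) - 1; i > 0; i-- {
		if buffer[i] == '>' && (buffer[i-1] == '\n' || buffer[i-1] == '\r') {
			return i
		}
	}
	return -1
}

// splitAtLastRecord cuts buffer into the part that holds only complete
// records and the trailing, possibly truncated, last record. When no cut
// point is found the whole buffer is returned as the remainder.
func splitAtLastRecord(buffer []byte) (complete, rest []byte) {
	cut := lastRecordStart(buffer)
	if cut < 0 {
		return nil, buffer
	}
	return buffer[:cut], buffer[cut:]
}
```

With a splitter of this kind, a reader can parse the complete part immediately and prepend the remainder to the next block read from the input.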
@@ -39,24 +39,18 @@ func _EndOfLastFastaEntry(buffer []byte) int {
|
|||||||
return last
|
return last
|
||||||
}
|
}
|
||||||
|
|
||||||
func _ParseFastaFile(source string,
|
func FastaChunkParser() func(string, io.Reader) (obiseq.BioSequenceSlice, error) {
|
||||||
input ChannelSeqFileChunk,
|
|
||||||
out obiiter.IBioSequence,
|
|
||||||
no_order bool,
|
|
||||||
batch_size int,
|
|
||||||
chunck_order func() int,
|
|
||||||
) {
|
|
||||||
|
|
||||||
var identifier string
|
parser := func(source string, input io.Reader) (obiseq.BioSequenceSlice, error) {
|
||||||
var definition string
|
var identifier string
|
||||||
|
var definition string
|
||||||
|
|
||||||
idBytes := bytes.Buffer{}
|
idBytes := bytes.Buffer{}
|
||||||
defBytes := bytes.Buffer{}
|
defBytes := bytes.Buffer{}
|
||||||
seqBytes := bytes.Buffer{}
|
seqBytes := bytes.Buffer{}
|
||||||
|
|
||||||
for chunks := range input {
|
|
||||||
state := 0
|
state := 0
|
||||||
scanner := bufio.NewReader(chunks.raw)
|
scanner := bufio.NewReader(input)
|
||||||
start, _ := scanner.Peek(20)
|
start, _ := scanner.Peek(20)
|
||||||
if start[0] != '>' {
|
if start[0] != '>' {
|
||||||
log.Fatalf("%s : first character is not '>'", string(start))
|
log.Fatalf("%s : first character is not '>'", string(start))
|
||||||
@@ -64,7 +58,8 @@ func _ParseFastaFile(source string,
|
|||||||
if start[1] == ' ' {
|
if start[1] == ' ' {
|
||||||
log.Fatalf("%s :Strange", string(start))
|
log.Fatalf("%s :Strange", string(start))
|
||||||
}
|
}
|
||||||
sequences := make(obiseq.BioSequenceSlice, 0, batch_size)
|
|
||||||
|
sequences := obiseq.MakeBioSequenceSlice(100)[:0]
|
||||||
|
|
||||||
previous := byte(0)
|
previous := byte(0)
|
||||||
|
|
||||||
@@ -160,12 +155,6 @@ func _ParseFastaFile(source string,
|
|||||||
s := obiseq.NewBioSequence(identifier, rawseq, definition)
|
s := obiseq.NewBioSequence(identifier, rawseq, definition)
|
||||||
s.SetSource(source)
|
s.SetSource(source)
|
||||||
sequences = append(sequences, s)
|
sequences = append(sequences, s)
|
||||||
if no_order {
|
|
||||||
if len(sequences) == batch_size {
|
|
||||||
out.Push(obiiter.MakeBioSequenceBatch(chunck_order(), sequences))
|
|
||||||
sequences = make(obiseq.BioSequenceSlice, 0, batch_size)
|
|
||||||
}
|
|
||||||
}
|
|
||||||
state = 1
|
state = 1
|
||||||
} else {
|
} else {
|
||||||
// Error
|
// Error
|
||||||
@@ -209,13 +198,29 @@ func _ParseFastaFile(source string,
|
|||||||
sequences = append(sequences, s)
|
sequences = append(sequences, s)
|
||||||
}
|
}
|
||||||
|
|
||||||
if len(sequences) > 0 {
|
return sequences, nil
|
||||||
co := chunks.order
|
}
|
||||||
if no_order {
|
|
||||||
co = chunck_order()
|
return parser
|
||||||
}
|
}
|
||||||
out.Push(obiiter.MakeBioSequenceBatch(co, sequences))
|
|
||||||
|
func _ParseFastaFile(
|
||||||
|
input ChannelFileChunk,
|
||||||
|
out obiiter.IBioSequence,
|
||||||
|
) {
|
||||||
|
|
||||||
|
parser := FastaChunkParser()
|
||||||
|
|
||||||
|
for chunks := range input {
|
||||||
|
sequences, err := parser(chunks.Source, chunks.Raw)
|
||||||
|
// log.Warnf("Chunk(%d:%d) -%d- ", chunks.Order, l, sequences.Len())
|
||||||
|
|
||||||
|
if err != nil {
|
||||||
|
log.Fatalf("File %s : Cannot parse the fasta file : %v", chunks.Source, err)
|
||||||
}
|
}
|
||||||
|
|
||||||
|
out.Push(obiiter.MakeBioSequenceBatch(chunks.Source, chunks.Order, sequences))
|
||||||
|
|
||||||
}
|
}
|
||||||
|
|
||||||
out.Done()
|
out.Done()
|
||||||
@@ -228,26 +233,25 @@ func ReadFasta(reader io.Reader, options ...WithOption) (obiiter.IBioSequence, e
|
|||||||
|
|
||||||
nworker := opt.ParallelWorkers()
|
nworker := opt.ParallelWorkers()
|
||||||
|
|
||||||
buff := make([]byte, 1024*1024*1024)
|
buff := make([]byte, 1024*1024)
|
||||||
|
|
||||||
chkchan := ReadSeqFileChunk(reader, buff, _EndOfLastFastaEntry)
|
chkchan := ReadFileChunk(
|
||||||
chunck_order := obiutils.AtomicCounter()
|
opt.Source(),
|
||||||
|
reader,
|
||||||
|
buff,
|
||||||
|
EndOfLastFastaEntry,
|
||||||
|
)
|
||||||
|
|
||||||
for i := 0; i < nworker; i++ {
|
for i := 0; i < nworker; i++ {
|
||||||
out.Add(1)
|
out.Add(1)
|
||||||
go _ParseFastaFile(opt.Source(),
|
go _ParseFastaFile(chkchan, out)
|
||||||
chkchan,
|
|
||||||
out,
|
|
||||||
opt.NoOrder(),
|
|
||||||
opt.BatchSize(),
|
|
||||||
chunck_order)
|
|
||||||
}
|
}
|
||||||
|
|
||||||
go func() {
|
go func() {
|
||||||
out.WaitAndClose()
|
out.WaitAndClose()
|
||||||
}()
|
}()
|
||||||
|
|
||||||
newIter := out.SortBatches().Rebatch(opt.BatchSize())
|
newIter := out.SortBatches()
|
||||||
|
|
||||||
log.Debugln("Full file batch mode : ", opt.FullFileBatch())
|
log.Debugln("Full file batch mode : ", opt.FullFileBatch())
|
||||||
|
|
||||||
@@ -267,9 +271,9 @@ func ReadFasta(reader io.Reader, options ...WithOption) (obiiter.IBioSequence, e
|
|||||||
func ReadFastaFromFile(filename string, options ...WithOption) (obiiter.IBioSequence, error) {
|
func ReadFastaFromFile(filename string, options ...WithOption) (obiiter.IBioSequence, error) {
|
||||||
options = append(options, OptionsSource(obiutils.RemoveAllExt((path.Base(filename)))))
|
options = append(options, OptionsSource(obiutils.RemoveAllExt((path.Base(filename)))))
|
||||||
|
|
||||||
file, err := Ropen(filename)
|
file, err := obiutils.Ropen(filename)
|
||||||
|
|
||||||
if err == ErrNoContent {
|
if err == obiutils.ErrNoContent {
|
||||||
log.Infof("file %s is empty", filename)
|
log.Infof("file %s is empty", filename)
|
||||||
return ReadEmptyFile(options...)
|
return ReadEmptyFile(options...)
|
||||||
}
|
}
|
||||||
@@ -282,10 +286,10 @@ func ReadFastaFromFile(filename string, options ...WithOption) (obiiter.IBioSequ
|
|||||||
}
|
}
|
||||||
|
|
||||||
func ReadFastaFromStdin(reader io.Reader, options ...WithOption) (obiiter.IBioSequence, error) {
|
func ReadFastaFromStdin(reader io.Reader, options ...WithOption) (obiiter.IBioSequence, error) {
|
||||||
options = append(options, OptionsSource(obiutils.RemoveAllExt("stdin")))
|
options = append(options, OptionsSource("stdin"))
|
||||||
input, err := Buf(os.Stdin)
|
input, err := obiutils.Buf(os.Stdin)
|
||||||
|
|
||||||
if err == ErrNoContent {
|
if err == obiutils.ErrNoContent {
|
||||||
log.Infof("stdin is empty")
|
log.Infof("stdin is empty")
|
||||||
return ReadEmptyFile(options...)
|
return ReadEmptyFile(options...)
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -7,16 +7,17 @@ import (
|
|||||||
"os"
|
"os"
|
||||||
"path"
|
"path"
|
||||||
|
|
||||||
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obidefault"
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiiter"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiiter"
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
|
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
|
||||||
log "github.com/sirupsen/logrus"
|
log "github.com/sirupsen/logrus"
|
||||||
)
|
)
|
||||||
|
|
||||||
func _EndOfLastFastqEntry(buffer []byte) int {
|
func EndOfLastFastqEntry(buffer []byte) int {
|
||||||
var i int
|
var i int
|
||||||
|
|
||||||
|
// log.Warnf("EndOfLastFastqEntry(%d): %s", len(buffer), string(buffer[0:20]))
|
||||||
imax := len(buffer)
|
imax := len(buffer)
|
||||||
state := 0
|
state := 0
|
||||||
restart := imax - 1
|
restart := imax - 1
|
||||||
@@ -32,39 +33,48 @@ func _EndOfLastFastqEntry(buffer []byte) int {
|
|||||||
case 0:
|
case 0:
|
||||||
if C == '+' {
|
if C == '+' {
|
||||||
// Potential start of quality part step 1
|
// Potential start of quality part step 1
|
||||||
|
// log.Warn("Potential start of quality part step 1 - +")
|
||||||
state = 1
|
state = 1
|
||||||
restart = i
|
restart = i
|
||||||
}
|
}
|
||||||
case 1:
|
case 1:
|
||||||
if is_end_of_line {
|
if is_end_of_line {
|
||||||
// Potential start of quality part step 2
|
// Potential start of quality part step 2
|
||||||
|
// log.Warn("Potential start of quality part step 2 - +/end of line")
|
||||||
state = 2
|
state = 2
|
||||||
} else {
|
} else {
|
||||||
// it was not the start of quality part
|
// it was not the start of quality part
|
||||||
|
// log.Warn("it was not the start of quality part")
|
||||||
state = 0
|
state = 0
|
||||||
i = restart
|
i = restart
|
||||||
}
|
}
|
||||||
case 2:
|
case 2:
|
||||||
if is_sep {
|
if is_sep {
|
||||||
// Potential start of quality part step 2 (stay in the same state)
|
// Potential start of quality part step 2 (stay in the same state)
|
||||||
|
// log.Warn("Potential start of quality part step 2 - skipping separator")
|
||||||
state = 2
|
state = 2
|
||||||
} else if (C >= 'a' && C <= 'z') || (C >= 'A' && C <= 'Z') || C == '-' || C == '.' || C == '[' || C == ']' {
|
} else if (C >= 'a' && C <= 'z') || (C >= 'A' && C <= 'Z') || C == '-' || C == '.' || C == '[' || C == ']' {
|
||||||
// End of the sequence
|
// progressing along the sequence
|
||||||
|
// log.Warn("Detected the end of the sequence switching to state 3")
|
||||||
state = 3
|
state = 3
|
||||||
} else {
|
} else {
|
||||||
// it was not the start of quality part
|
// it was not the start of quality part
|
||||||
|
// log.Warn("it was not the start of quality part because is not preceded by sequence")
|
||||||
state = 0
|
state = 0
|
||||||
i = restart
|
i = restart
|
||||||
}
|
}
|
||||||
case 3:
|
case 3:
|
||||||
if is_end_of_line {
|
if is_end_of_line {
|
||||||
// Entering the header line
|
// Entering the header line
|
||||||
|
// log.Warn("Potentially entering the header line")
|
||||||
state = 4
|
state = 4
|
||||||
} else if (C >= 'a' && C <= 'z') || (C >= 'A' && C <= 'Z') || C == '-' || C == '.' || C == '[' || C == ']' {
|
} else if (C >= 'a' && C <= 'z') || (C >= 'A' && C <= 'Z') || C == '-' || C == '.' || C == '[' || C == ']' {
|
||||||
// progressing along the sequence
|
// progressing along the sequence
|
||||||
|
// log.Warn("Progressing along the sequence")
|
||||||
state = 3
|
state = 3
|
||||||
} else {
|
} else {
|
||||||
// it was not the sequence part
|
// it was not the sequence part
|
||||||
|
// log.Warnf("it was not the sequence part : %c", C)
|
||||||
state = 0
|
state = 0
|
||||||
i = restart
|
i = restart
|
||||||
}
|
}
|
||||||
@@ -72,6 +82,7 @@ func _EndOfLastFastqEntry(buffer []byte) int {
|
|||||||
if is_end_of_line {
|
if is_end_of_line {
|
||||||
state = 4
|
state = 4
|
||||||
} else {
|
} else {
|
||||||
|
|
||||||
state = 5
|
state = 5
|
||||||
}
|
}
|
||||||
case 5:
|
case 5:
|
||||||
@@ -80,15 +91,18 @@ func _EndOfLastFastqEntry(buffer []byte) int {
|
|||||||
state = 0
|
state = 0
|
||||||
i = restart
|
i = restart
|
||||||
} else if C == '@' {
|
} else if C == '@' {
|
||||||
|
// It was the header line
|
||||||
|
// log.Warn("It was the header line")
|
||||||
state = 6
|
state = 6
|
||||||
cut = i
|
cut = i
|
||||||
}
|
}
|
||||||
case 6:
|
case 6:
|
||||||
if is_end_of_line {
|
if is_end_of_line {
|
||||||
|
// log.Warn("====> End of the last sequence")
|
||||||
state = 7
|
state = 7
|
||||||
} else {
|
} else {
|
||||||
state = 0
|
// log.Warnf("%s: Strange it was not the end of the last sequence : %c : %s", string(buffer[0:40]), C, string(buffer[i-20:i+5]))
|
||||||
i = restart
|
state = 5
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
@@ -117,27 +131,20 @@ func _storeSequenceQuality(bytes *bytes.Buffer, out *obiseq.BioSequence, quality
|
|||||||
out.SetQualities(q)
|
out.SetQualities(q)
|
||||||
}
|
}
|
||||||
|
|
||||||
func _ParseFastqFile(source string,
|
func FastqChunkParser(quality_shift byte, with_quality bool) func(string, io.Reader) (obiseq.BioSequenceSlice, error) {
|
||||||
input ChannelSeqFileChunk,
|
parser := func(source string, input io.Reader) (obiseq.BioSequenceSlice, error) {
|
||||||
out obiiter.IBioSequence,
|
|
||||||
quality_shift byte,
|
|
||||||
no_order bool,
|
|
||||||
batch_size int,
|
|
||||||
chunck_order func() int,
|
|
||||||
) {
|
|
||||||
|
|
||||||
var identifier string
|
var identifier string
|
||||||
var definition string
|
var definition string
|
||||||
|
|
||||||
idBytes := bytes.Buffer{}
|
idBytes := bytes.Buffer{}
|
||||||
defBytes := bytes.Buffer{}
|
defBytes := bytes.Buffer{}
|
||||||
qualBytes := bytes.Buffer{}
|
qualBytes := bytes.Buffer{}
|
||||||
seqBytes := bytes.Buffer{}
|
seqBytes := bytes.Buffer{}
|
||||||
|
|
||||||
for chunks := range input {
|
|
||||||
state := 0
|
state := 0
|
||||||
scanner := bufio.NewReader(chunks.raw)
|
scanner := bufio.NewReader(input)
|
||||||
sequences := make(obiseq.BioSequenceSlice, 0, 100)
|
sequences := obiseq.MakeBioSequenceSlice(100)[:0]
|
||||||
previous := byte(0)
|
previous := byte(0)
|
||||||
|
|
||||||
for C, err := scanner.ReadByte(); err != io.EOF; C, err = scanner.ReadByte() {
|
for C, err := scanner.ReadByte(); err != io.EOF; C, err = scanner.ReadByte() {
|
||||||
@@ -256,15 +263,9 @@ func _ParseFastqFile(source string,
|
|||||||
}
|
}
|
||||||
case 10:
|
case 10:
|
||||||
if is_end_of_line {
|
if is_end_of_line {
|
||||||
_storeSequenceQuality(&qualBytes, sequences[len(sequences)-1], quality_shift)
|
if with_quality {
|
||||||
|
_storeSequenceQuality(&qualBytes, sequences[len(sequences)-1], quality_shift)
|
||||||
if no_order {
|
|
||||||
if len(sequences) == batch_size {
|
|
||||||
out.Push(obiiter.MakeBioSequenceBatch(chunck_order(), sequences))
|
|
||||||
sequences = make(obiseq.BioSequenceSlice, 0, batch_size)
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
|
|
||||||
state = 11
|
state = 11
|
||||||
} else {
|
} else {
|
||||||
qualBytes.WriteByte(C)
|
qualBytes.WriteByte(C)
|
||||||
@@ -288,14 +289,32 @@ func _ParseFastqFile(source string,
|
|||||||
_storeSequenceQuality(&qualBytes, sequences[len(sequences)-1], quality_shift)
|
_storeSequenceQuality(&qualBytes, sequences[len(sequences)-1], quality_shift)
|
||||||
state = 1
|
state = 1
|
||||||
}
|
}
|
||||||
|
|
||||||
co := chunks.order
|
|
||||||
if no_order {
|
|
||||||
co = chunck_order()
|
|
||||||
}
|
|
||||||
out.Push(obiiter.MakeBioSequenceBatch(co, sequences))
|
|
||||||
}
|
}
|
||||||
|
|
||||||
|
return sequences, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
return parser
|
||||||
|
}
|
||||||
|
|
||||||
|
func _ParseFastqFile(
|
||||||
|
input ChannelFileChunk,
|
||||||
|
out obiiter.IBioSequence,
|
||||||
|
quality_shift byte,
|
||||||
|
with_quality bool,
|
||||||
|
) {
|
||||||
|
|
||||||
|
parser := FastqChunkParser(quality_shift, with_quality)
|
||||||
|
|
||||||
|
for chunks := range input {
|
||||||
|
sequences, err := parser(chunks.Source, chunks.Raw)
|
||||||
|
|
||||||
|
if err != nil {
|
||||||
|
log.Fatalf("File %s : Cannot parse the fastq file : %v", chunks.Source, err)
|
||||||
|
}
|
||||||
|
|
||||||
|
out.Push(obiiter.MakeBioSequenceBatch(chunks.Source, chunks.Order, sequences))
|
||||||
|
|
||||||
}
|
}
|
||||||
|
|
||||||
out.Done()
|
out.Done()
|
||||||
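Illustrative sketch (not part of this commit): the new `with_quality` flag gates the call to `_storeSequenceQuality`, which receives the raw quality bytes and a `quality_shift`. The body of that function is not shown in this diff, so the Go code below assumes the usual FASTQ convention that each quality character encodes a score as its ASCII code minus the shift (33 for Phred+33). The names `decodeQualities`, `record` and `makeRecord` are hypothetical.

```go
package main

// decodeQualities converts an ASCII-encoded FASTQ quality line into numeric
// scores by subtracting the shift from every byte.
func decodeQualities(line []byte, shift byte) []byte {
	scores := make([]byte, len(line))
	for i, c := range line {
		scores[i] = c - shift
	}
	return scores
}

// record is a hypothetical, minimal sequence record.
type record struct {
	Seq  string
	Qual []byte // nil when qualities were not requested
}

// makeRecord builds a record and decodes the quality line only when
// withQuality is true, mirroring the `if with_quality { ... }` guard
// added in the diff.
func makeRecord(seq string, qualLine []byte, shift byte, withQuality bool) record {
	r := record{Seq: seq}
	if withQuality {
		r.Qual = decodeQualities(qualLine, shift)
	}
	return r
}
```

Skipping the decoding step when qualities are not needed avoids one allocation and one pass over the quality line per read.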
@@ -307,28 +326,31 @@ func ReadFastq(reader io.Reader, options ...WithOption) (obiiter.IBioSequence, e
|
|||||||
out := obiiter.MakeIBioSequence()
|
out := obiiter.MakeIBioSequence()
|
||||||
|
|
||||||
nworker := opt.ParallelWorkers()
|
nworker := opt.ParallelWorkers()
|
||||||
chunkorder := obiutils.AtomicCounter()
|
|
||||||
|
|
||||||
buff := make([]byte, 1024*1024*1024)
|
buff := make([]byte, 1024*1024)
|
||||||
|
|
||||||
chkchan := ReadSeqFileChunk(reader, buff, _EndOfLastFastqEntry)
|
chkchan := ReadFileChunk(
|
||||||
|
opt.Source(),
|
||||||
|
reader,
|
||||||
|
buff,
|
||||||
|
EndOfLastFastqEntry,
|
||||||
|
)
|
||||||
|
|
||||||
for i := 0; i < nworker; i++ {
|
for i := 0; i < nworker; i++ {
|
||||||
out.Add(1)
|
out.Add(1)
|
||||||
go _ParseFastqFile(opt.Source(),
|
go _ParseFastqFile(
|
||||||
chkchan,
|
chkchan,
|
||||||
out,
|
out,
|
||||||
byte(obioptions.InputQualityShift()),
|
obidefault.ReadQualitiesShift(),
|
||||||
opt.NoOrder(),
|
opt.ReadQualities(),
|
||||||
opt.BatchSize(),
|
)
|
||||||
chunkorder)
|
|
||||||
}
|
}
|
||||||
|
|
||||||
go func() {
|
go func() {
|
||||||
out.WaitAndClose()
|
out.WaitAndClose()
|
||||||
}()
|
}()
|
||||||
|
|
||||||
newIter := out.SortBatches().Rebatch(opt.BatchSize())
|
newIter := out.SortBatches()
|
||||||
|
|
||||||
log.Debugln("Full file batch mode : ", opt.FullFileBatch())
|
log.Debugln("Full file batch mode : ", opt.FullFileBatch())
|
||||||
|
|
||||||
@@ -348,9 +370,9 @@ func ReadFastq(reader io.Reader, options ...WithOption) (obiiter.IBioSequence, e
|
|||||||
func ReadFastqFromFile(filename string, options ...WithOption) (obiiter.IBioSequence, error) {
|
func ReadFastqFromFile(filename string, options ...WithOption) (obiiter.IBioSequence, error) {
|
||||||
options = append(options, OptionsSource(obiutils.RemoveAllExt((path.Base(filename)))))
|
options = append(options, OptionsSource(obiutils.RemoveAllExt((path.Base(filename)))))
|
||||||
|
|
||||||
file, err := Ropen(filename)
|
file, err := obiutils.Ropen(filename)
|
||||||
|
|
||||||
if err == ErrNoContent {
|
if err == obiutils.ErrNoContent {
|
||||||
log.Infof("file %s is empty", filename)
|
log.Infof("file %s is empty", filename)
|
||||||
return ReadEmptyFile(options...)
|
return ReadEmptyFile(options...)
|
||||||
}
|
}
|
||||||
@@ -364,9 +386,9 @@ func ReadFastqFromFile(filename string, options ...WithOption) (obiiter.IBioSequ
|
|||||||
|
|
||||||
func ReadFastqFromStdin(reader io.Reader, options ...WithOption) (obiiter.IBioSequence, error) {
|
func ReadFastqFromStdin(reader io.Reader, options ...WithOption) (obiiter.IBioSequence, error) {
|
||||||
options = append(options, OptionsSource(obiutils.RemoveAllExt("stdin")))
|
options = append(options, OptionsSource(obiutils.RemoveAllExt("stdin")))
|
||||||
input, err := Buf(os.Stdin)
|
input, err := obiutils.Buf(os.Stdin)
|
||||||
|
|
||||||
if err == ErrNoContent {
|
if err == obiutils.ErrNoContent {
|
||||||
log.Infof("stdin is empty")
|
log.Infof("stdin is empty")
|
||||||
return ReadEmptyFile(options...)
|
return ReadEmptyFile(options...)
|
||||||
}
|
}
|
||||||
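Illustrative sketch (not part of this commit): after this change the FASTA and FASTQ workers always tag a batch with `chunks.Order`, and the reader calls `out.SortBatches()` to restore file order, because parallel workers can finish out of order. The implementation of `SortBatches` is not part of this diff. The Go code below shows one common way to do such a reordering, assuming orders are consecutive integers starting at 0; `orderedBatch` and `sortBatches` are hypothetical names.

```go
package main

// orderedBatch is a hypothetical batch carrying only its order and a payload.
type orderedBatch struct {
	Order int
	Data  string
}

// sortBatches re-emits the batches read from in by increasing Order.
// Batches that arrive early are parked in a map until every batch with a
// smaller order has been sent. It assumes orders are 0, 1, 2, ... with no
// gaps; a missing order would leave later batches parked forever.
func sortBatches(in <-chan orderedBatch) <-chan orderedBatch {
	out := make(chan orderedBatch)
	go func() {
		defer close(out)
		pending := make(map[int]orderedBatch)
		next := 0
		for b := range in {
			pending[b.Order] = b
			// Flush every batch that is now contiguous with what was sent.
			for {
				ready, ok := pending[next]
				if !ok {
					break
				}
				out <- ready
				delete(pending, next)
				next++
			}
		}
	}()
	return out
}
```

The memory held by the parking map grows with how far ahead the fastest worker gets, which is one reason to keep chunk sizes bounded.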
|
|||||||
@@ -2,18 +2,208 @@ package obiformats
|
|||||||
|
|
||||||
import (
|
import (
|
||||||
"bytes"
|
"bytes"
|
||||||
"math"
|
"strconv"
|
||||||
"strings"
|
"strings"
|
||||||
"unsafe"
|
"unsafe"
|
||||||
|
|
||||||
log "github.com/sirupsen/logrus"
|
log "github.com/sirupsen/logrus"
|
||||||
|
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
|
||||||
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitax"
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
|
||||||
"github.com/goccy/go-json"
|
"github.com/buger/jsonparser"
|
||||||
)
|
)
|
||||||
|
|
||||||
func _parse_json_header_(header string, annotations obiseq.Annotation) string {
|
func _parse_json_map_string(str []byte, sequence *obiseq.BioSequence) (map[string]string, error) {
|
||||||
|
values := make(map[string]string)
|
||||||
|
jsonparser.ObjectEach(str,
|
||||||
|
func(key []byte, value []byte, dataType jsonparser.ValueType, offset int) (err error) {
|
||||||
|
skey := string(key)
|
||||||
|
values[skey] = string(value)
|
||||||
|
return
|
||||||
|
},
|
||||||
|
)
|
||||||
|
return values, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
func _parse_json_map_int(str []byte, sequence *obiseq.BioSequence) (map[string]int, error) {
|
||||||
|
values := make(map[string]int)
|
||||||
|
jsonparser.ObjectEach(str,
|
||||||
|
func(key []byte, value []byte, dataType jsonparser.ValueType, offset int) (err error) {
|
||||||
|
skey := string(key)
|
||||||
|
intval, err := jsonparser.ParseInt(value)
|
||||||
|
if err != nil {
|
||||||
|
return err
|
||||||
|
}
|
||||||
|
values[skey] = int(intval)
|
||||||
|
return nil
|
||||||
|
},
|
||||||
|
)
|
||||||
|
return values, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
func _parse_json_map_float(str []byte, sequence *obiseq.BioSequence) (map[string]float64, error) {
|
||||||
|
values := make(map[string]float64)
|
||||||
|
jsonparser.ObjectEach(str,
|
||||||
|
func(key []byte, value []byte, dataType jsonparser.ValueType, offset int) (err error) {
|
||||||
|
skey := string(key)
|
||||||
|
floatval, err := strconv.ParseFloat(obiutils.UnsafeString(value), 64)
|
||||||
|
if err != nil {
|
||||||
|
return err
|
||||||
|
}
|
||||||
|
values[skey] = float64(floatval)
|
||||||
|
return nil
|
||||||
|
},
|
||||||
|
)
|
||||||
|
return values, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
func _parse_json_map_bool(str []byte, sequence *obiseq.BioSequence) (map[string]bool, error) {
|
||||||
|
values := make(map[string]bool)
|
||||||
|
jsonparser.ObjectEach(str,
|
||||||
|
func(key []byte, value []byte, dataType jsonparser.ValueType, offset int) (err error) {
|
||||||
|
skey := string(key)
|
||||||
|
boolval, err := jsonparser.ParseBoolean(value)
|
||||||
|
if err != nil {
|
||||||
|
return err
|
||||||
|
}
|
||||||
|
values[skey] = boolval
|
||||||
|
return nil
|
||||||
|
},
|
||||||
|
)
|
||||||
|
return values, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
func _parse_json_map_interface(str []byte, sequence *obiseq.BioSequence) (map[string]interface{}, error) {
|
||||||
|
values := make(map[string]interface{})
|
||||||
|
jsonparser.ObjectEach(str,
|
||||||
|
func(key []byte, value []byte, dataType jsonparser.ValueType, offset int) (err error) {
|
||||||
|
skey := string(key)
|
||||||
|
switch dataType {
|
||||||
|
case jsonparser.String:
|
||||||
|
values[skey] = string(value)
|
||||||
|
case jsonparser.Number:
|
||||||
|
// Try to parse the number as an int first, then as a float if that fails.
|
||||||
|
values[skey], err = jsonparser.ParseInt(value)
|
||||||
|
if err != nil {
|
||||||
|
values[skey], err = strconv.ParseFloat(obiutils.UnsafeString(value), 64)
|
||||||
|
}
|
||||||
|
if err != nil {
|
||||||
|
return
|
||||||
|
}
|
||||||
|
case jsonparser.Boolean:
|
||||||
|
default:
|
||||||
|
values[skey] = string(value)
|
||||||
|
}
|
||||||
|
return
|
||||||
|
},
|
||||||
|
)
|
||||||
|
return values, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
func _parse_json_array_string(str []byte, sequence *obiseq.BioSequence) ([]string, error) {
|
||||||
|
values := make([]string, 0)
|
||||||
|
jsonparser.ArrayEach(str,
|
||||||
|
func(value []byte, dataType jsonparser.ValueType, offset int, err error) {
|
||||||
|
if dataType == jsonparser.String {
|
||||||
|
skey := string(value)
|
||||||
|
values = append(values, skey)
|
||||||
|
}
|
||||||
|
},
|
||||||
|
)
|
||||||
|
return values, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
func _parse_json_array_int(str []byte, sequence *obiseq.BioSequence) ([]int, error) {
|
||||||
|
values := make([]int, 0)
|
||||||
|
jsonparser.ArrayEach(str,
|
||||||
|
func(value []byte, dataType jsonparser.ValueType, offset int, err error) {
|
||||||
|
if dataType == jsonparser.Number {
|
||||||
|
intval, err := jsonparser.ParseInt(value)
|
||||||
|
if err != nil {
|
||||||
|
log.Fatalf("%s: Parsing int failed on value %s: %s", sequence.Id(), value, err)
|
||||||
|
}
|
||||||
|
values = append(values, int(intval))
|
||||||
|
}
|
||||||
|
},
|
||||||
|
)
|
||||||
|
return values, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
func _parse_json_array_float(str []byte, sequence *obiseq.BioSequence) ([]float64, error) {
|
||||||
|
values := make([]float64, 0)
|
||||||
|
jsonparser.ArrayEach(str,
|
||||||
|
func(value []byte, dataType jsonparser.ValueType, offset int, err error) {
|
||||||
|
if dataType == jsonparser.Number {
|
||||||
|
floatval, err := strconv.ParseFloat(obiutils.UnsafeString(value), 64)
|
||||||
|
if err == nil {
|
||||||
|
values = append(values, float64(floatval))
|
||||||
|
} else {
|
||||||
|
log.Fatalf("%s: Parsing float failed on value %s: %s", sequence.Id(), value, err)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
},
|
||||||
|
)
|
||||||
|
return values, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
func _parse_json_array_bool(str []byte, sequence *obiseq.BioSequence) ([]bool, error) {
|
||||||
|
values := make([]bool, 0)
|
||||||
|
jsonparser.ArrayEach(str,
|
||||||
|
func(value []byte, dataType jsonparser.ValueType, offset int, err error) {
|
||||||
|
if dataType == jsonparser.Boolean {
|
||||||
|
boolval, err := jsonparser.ParseBoolean(value)
|
||||||
|
if err != nil {
|
||||||
|
log.Fatalf("%s: Parsing bool failed on value %s: %s", sequence.Id(), value, err)
|
||||||
|
}
|
||||||
|
values = append(values, boolval)
|
||||||
|
}
|
||||||
|
},
|
||||||
|
)
|
||||||
|
return values, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
func _parse_json_array_interface(str []byte, sequence *obiseq.BioSequence) ([]interface{}, error) {
|
||||||
|
values := make([]interface{}, 0)
|
||||||
|
jsonparser.ArrayEach(str,
|
||||||
|
func(value []byte, dataType jsonparser.ValueType, offset int, err error) {
|
||||||
|
switch dataType {
|
||||||
|
case jsonparser.String:
|
||||||
|
values = append(values, string(value))
|
||||||
|
case jsonparser.Number:
|
||||||
|
// Try to parse the number as an int first, then as a float if that fails.
|
||||||
|
intval, err := jsonparser.ParseInt(value)
|
||||||
|
if err != nil {
|
||||||
|
floatval, err := strconv.ParseFloat(obiutils.UnsafeString(value), 64)
|
||||||
|
if err != nil {
|
||||||
|
values = append(values, string(value))
|
||||||
|
} else {
|
||||||
|
values = append(values, floatval)
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
values = append(values, intval)
|
||||||
|
}
|
||||||
|
case jsonparser.Boolean:
|
||||||
|
boolval, err := jsonparser.ParseBoolean(value)
|
||||||
|
if err != nil {
|
||||||
|
values = append(values, string(value))
|
||||||
|
} else {
|
||||||
|
values = append(values, boolval)
|
||||||
|
}
|
||||||
|
|
||||||
|
default:
|
||||||
|
values = append(values, string(value))
|
||||||
|
}
|
||||||
|
|
||||||
|
},
|
||||||
|
)
|
||||||
|
return values, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
func _parse_json_header_(header string, sequence *obiseq.BioSequence) string {
|
||||||
|
taxonomy := obitax.DefaultTaxonomy()
|
||||||
|
|
||||||
|
annotations := sequence.Annotations()
|
||||||
start := -1
|
start := -1
|
||||||
stop := -1
|
stop := -1
|
||||||
level := 0
|
level := 0
|
||||||
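Illustrative sketch (not part of this commit): the generic map and array helpers above parse a JSON number as an integer first and fall back to a float only when that fails, which replaces the old `math.Floor` post-processing of `float64` values. The Go code below isolates that rule using only the standard library; `parseNumber` is a hypothetical name, and it returns `int64` because that is what an integer parse naturally yields.

```go
package main

import "strconv"

// parseNumber converts the textual form of a JSON number into an int64 when
// it is a plain integer, and into a float64 otherwise. An error is returned
// when the text is neither.
func parseNumber(raw string) (interface{}, error) {
	if i, err := strconv.ParseInt(raw, 10, 64); err == nil {
		return i, nil
	}
	f, err := strconv.ParseFloat(raw, 64)
	if err != nil {
		return nil, err
	}
	return f, nil
}
```

Trying the integer parse first keeps values such as counts exact, instead of routing them through a floating-point representation.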
@@ -51,23 +241,136 @@ func _parse_json_header_(header string, annotations obiseq.Annotation) string {
|
|||||||
|
|
||||||
stop++
|
stop++
|
||||||
|
|
||||||
err := json.Unmarshal([]byte(header)[start:stop], &annotations)
|
jsonparser.ObjectEach(obiutils.UnsafeBytes(header[start:stop]),
|
||||||
|
func(key []byte, value []byte, dataType jsonparser.ValueType, offset int) error {
|
||||||
|
var err error
|
||||||
|
|
||||||
for k, v := range annotations {
|
skey := obiutils.UnsafeString(key)
|
||||||
switch vt := v.(type) {
|
|
||||||
case float64:
|
|
||||||
if vt == math.Floor(vt) {
|
|
||||||
annotations[k] = int(vt)
|
|
||||||
}
|
|
||||||
{
|
|
||||||
annotations[k] = vt
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
if err != nil {
|
switch {
|
||||||
log.Fatalf("annotation parsing error on %s : %v\n", header, err)
|
case skey == "id":
|
||||||
}
|
sequence.SetId(string(value))
|
||||||
|
case skey == "definition":
|
||||||
|
sequence.SetDefinition(string(value))
|
||||||
|
|
||||||
|
case skey == "count":
|
||||||
|
if dataType != jsonparser.Number {
|
||||||
|
log.Fatalf("%s: Count attribute must be numeric: %s", sequence.Id(), string(value))
|
||||||
|
}
|
||||||
|
count, err := jsonparser.ParseInt(value)
|
||||||
|
if err != nil {
|
||||||
|
log.Fatalf("%s: Cannot parse count %s", sequence.Id(), string(value))
|
||||||
|
}
|
||||||
|
sequence.SetCount(int(count))
|
||||||
|
|
||||||
|
case skey == "obiclean_weight":
|
||||||
|
weight, err := _parse_json_map_int(value, sequence)
|
||||||
|
if err != nil {
|
||||||
|
log.Fatalf("%s: Cannot parse obiclean weight %s", sequence.Id(), string(value))
|
||||||
|
}
|
||||||
|
annotations[skey] = weight
|
||||||
|
|
||||||
|
case skey == "obiclean_status":
|
||||||
|
status, err := _parse_json_map_string(value, sequence)
|
||||||
|
if err != nil {
|
||||||
|
log.Fatalf("%s: Cannot parse obiclean status %s", sequence.Id(), string(value))
|
||||||
|
}
|
||||||
|
annotations[skey] = status
|
||||||
|
|
||||||
|
case strings.HasPrefix(skey, "merged_"):
|
||||||
|
if dataType == jsonparser.Object {
|
||||||
|
data, err := _parse_json_map_int(value, sequence)
|
||||||
|
if err != nil {
|
||||||
|
log.Fatalf("%s: Cannot parse merged slot %s: %v", sequence.Id(), skey, err)
|
||||||
|
} else {
|
||||||
|
annotations[skey] = data
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
log.Fatalf("%s: Cannot parse merged slot %s", sequence.Id(), skey)
|
||||||
|
}
|
||||||
|
|
||||||
|
case skey == "taxid":
|
||||||
|
if dataType == jsonparser.Number || dataType == jsonparser.String {
|
||||||
|
taxid := obiutils.UnsafeString(value)
|
||||||
|
taxon := taxonomy.Taxon(taxid)
|
||||||
|
if taxon != nil {
|
||||||
|
sequence.SetTaxon(taxon)
|
||||||
|
} else {
|
||||||
|
sequence.SetTaxid(string(value))
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
log.Fatalf("%s: Cannot parse taxid %s", sequence.Id(), string(value))
|
||||||
|
}
|
||||||
|
|
||||||
|
case strings.HasSuffix(skey, "_taxid"):
|
||||||
|
if dataType == jsonparser.Number || dataType == jsonparser.String {
|
||||||
|
rank, _ := obiutils.SplitInTwo(skey, '_')
|
||||||
|
|
||||||
|
taxid := obiutils.UnsafeString(value)
|
||||||
|
taxon := taxonomy.Taxon(taxid)
|
||||||
|
|
||||||
|
if taxon != nil {
|
||||||
|
taxid = taxon.String()
|
||||||
|
} else {
|
||||||
|
taxid = string(value)
|
||||||
|
}
|
||||||
|
|
||||||
|
sequence.SetTaxid(taxid, rank)
|
||||||
|
} else {
|
||||||
|
log.Fatalf("%s: Cannot parse taxid %s", sequence.Id(), string(value))
|
||||||
|
}
|
||||||
|
|
||||||
|
default:
|
||||||
|
skey = strings.Clone(skey)
|
||||||
|
switch dataType {
|
||||||
|
case jsonparser.String:
|
||||||
|
annotations[skey] = string(value)
|
||||||
|
case jsonparser.Number:
|
||||||
|
// Try to parse the number as an int at first then as float if that fails.
|
||||||
|
annotations[skey], err = jsonparser.ParseInt(value)
|
||||||
|
if err != nil {
|
||||||
|
annotations[skey], err = strconv.ParseFloat(obiutils.UnsafeString(value), 64)
|
||||||
|
}
|
||||||
|
case jsonparser.Array:
|
||||||
|
annotations[skey], err = _parse_json_array_interface(value, sequence)
|
||||||
|
case jsonparser.Object:
|
||||||
|
annotations[skey], err = _parse_json_map_interface(value, sequence)
|
||||||
|
case jsonparser.Boolean:
|
||||||
|
annotations[skey], err = jsonparser.ParseBoolean(value)
|
||||||
|
case jsonparser.Null:
|
||||||
|
annotations[skey] = nil
|
||||||
|
default:
|
||||||
|
log.Fatalf("Unknown data type %v", dataType)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
if err != nil {
|
||||||
|
annotations[skey] = "NaN"
|
||||||
|
log.Fatalf("%s: Cannot parse value %s associated to key %s into a %s value",
|
||||||
|
sequence.Id(), string(value), skey, dataType.String())
|
||||||
|
}
|
||||||
|
|
||||||
|
return err
|
||||||
|
},
|
||||||
|
)
|
||||||
|
|
||||||
|
// err := json.Unmarshal([]byte(header)[start:stop], &annotations)
|
||||||
|
|
||||||
|
// for k, v := range annotations {
|
||||||
|
// switch vt := v.(type) {
|
||||||
|
// case float64:
|
||||||
|
// if vt == math.Floor(vt) {
|
||||||
|
// annotations[k] = int(vt)
|
||||||
|
// }
|
||||||
|
// {
|
||||||
|
// annotations[k] = vt
|
||||||
|
// }
|
||||||
|
// }
|
||||||
|
// }
|
||||||
|
|
||||||
|
// if err != nil {
|
||||||
|
// log.Fatalf("annotation parsing error on %s : %v\n", header, err)
|
||||||
|
// }
|
||||||
|
|
||||||
return strings.TrimSpace(header[stop:])
|
return strings.TrimSpace(header[stop:])
|
||||||
}
|
}
|
||||||
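The `jsonparser.Number` branch above tries an integer parse first and falls back to a float parse only if that fails. A minimal sketch of that fallback, assuming only the standard library: `ParseNumber` is a hypothetical helper name, and `strconv` stands in here for the `jsonparser.ParseInt` call used in the diff.

```go
package main

import "strconv"

// ParseNumber is a hypothetical helper (not part of obitools4) that mirrors
// the "int first, then float" strategy of the Number case above.
// It returns an int64 when the text is a valid base-10 integer,
// a float64 when it is any other valid number, and an error otherwise.
func ParseNumber(raw []byte) (interface{}, error) {
	s := string(raw)

	// First attempt: plain integer.
	if i, err := strconv.ParseInt(s, 10, 64); err == nil {
		return i, nil
	}

	// Fallback: floating point (also covers exponent notation such as 1e3).
	f, err := strconv.ParseFloat(s, 64)
	if err != nil {
		return nil, err
	}
	return f, nil
}
```

Returning the value only after a successful parse avoids leaving a zero placeholder in the annotation map between the two attempts, which is one possible design choice rather than what the commit does.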
@@ -78,7 +381,9 @@ func ParseFastSeqJsonHeader(sequence *obiseq.BioSequence) {
|
|||||||
|
|
||||||
definition_part := _parse_json_header_(
|
definition_part := _parse_json_header_(
|
||||||
definition,
|
definition,
|
||||||
sequence.Annotations())
|
sequence,
|
||||||
|
)
|
||||||
|
|
||||||
if len(definition_part) > 0 {
|
if len(definition_part) > 0 {
|
||||||
if sequence.HasDefinition() {
|
if sequence.HasDefinition() {
|
||||||
definition_part = sequence.Definition() + " " + definition_part
|
definition_part = sequence.Definition() + " " + definition_part
|
||||||
|
|||||||
@@ -14,8 +14,8 @@ import (
|
|||||||
|
|
||||||
log "github.com/sirupsen/logrus"
|
log "github.com/sirupsen/logrus"
|
||||||
|
|
||||||
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obidefault"
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiiter"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiiter"
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
|
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
|
||||||
)
|
)
|
||||||
@@ -69,7 +69,7 @@ func _FastseqReader(source string,
|
|||||||
slice = append(slice, rep)
|
slice = append(slice, rep)
|
||||||
ii++
|
ii++
|
||||||
if ii >= batch_size {
|
if ii >= batch_size {
|
||||||
iterator.Push(obiiter.MakeBioSequenceBatch(i, slice))
|
iterator.Push(obiiter.MakeBioSequenceBatch(source, i, slice))
|
||||||
slice = obiseq.MakeBioSequenceSlice()
|
slice = obiseq.MakeBioSequenceSlice()
|
||||||
i++
|
i++
|
||||||
ii = 0
|
ii = 0
|
||||||
@@ -77,7 +77,7 @@ func _FastseqReader(source string,
|
|||||||
|
|
||||||
}
|
}
|
||||||
if len(slice) > 0 {
|
if len(slice) > 0 {
|
||||||
iterator.Push(obiiter.MakeBioSequenceBatch(i, slice))
|
iterator.Push(obiiter.MakeBioSequenceBatch(source, i, slice))
|
||||||
}
|
}
|
||||||
iterator.Done()
|
iterator.Done()
|
||||||
|
|
||||||
@@ -92,7 +92,7 @@ func ReadFastSeqFromFile(filename string, options ...WithOption) (obiiter.IBioSe
|
|||||||
name := C.CString(filename)
|
name := C.CString(filename)
|
||||||
defer C.free(unsafe.Pointer(name))
|
defer C.free(unsafe.Pointer(name))
|
||||||
|
|
||||||
pointer := C.open_fast_sek_file(name, C.int32_t(obioptions.InputQualityShift()))
|
pointer := C.open_fast_sek_file(name, C.int32_t(obidefault.ReadQualitiesShift()))
|
||||||
|
|
||||||
var err error
|
var err error
|
||||||
err = nil
|
err = nil
|
||||||
@@ -151,7 +151,7 @@ func ReadFastSeqFromStdin(options ...WithOption) obiiter.IBioSequence {
|
|||||||
}(newIter)
|
}(newIter)
|
||||||
|
|
||||||
go _FastseqReader(opt.Source(),
|
go _FastseqReader(opt.Source(),
|
||||||
C.open_fast_sek_stdin(C.int32_t(obioptions.InputQualityShift())),
|
C.open_fast_sek_stdin(C.int32_t(obidefault.ReadQualitiesShift())),
|
||||||
newIter, opt.BatchSize())
|
newIter, opt.BatchSize())
|
||||||
|
|
||||||
log.Debugln("Full file batch mode : ", opt.FullFileBatch())
|
log.Debugln("Full file batch mode : ", opt.FullFileBatch())
|
||||||
|
|||||||
@@ -7,7 +7,6 @@ import (
|
|||||||
"io"
|
"io"
|
||||||
"os"
|
"os"
|
||||||
"strings"
|
"strings"
|
||||||
"sync"
|
|
||||||
"time"
|
"time"
|
||||||
|
|
||||||
log "github.com/sirupsen/logrus"
|
log "github.com/sirupsen/logrus"
|
||||||
@@ -76,7 +75,7 @@ func FormatFasta(seq *obiseq.BioSequence, formater FormatHeader) string {
|
|||||||
// - skipEmpty: a boolean indicating whether empty sequences should be skipped or not.
|
// - skipEmpty: a boolean indicating whether empty sequences should be skipped or not.
|
||||||
//
|
//
|
||||||
// It returns a byte array containing the formatted sequences.
|
// It returns a byte array containing the formatted sequences.
|
||||||
func FormatFastaBatch(batch obiiter.BioSequenceBatch, formater FormatHeader, skipEmpty bool) []byte {
|
func FormatFastaBatch(batch obiiter.BioSequenceBatch, formater FormatHeader, skipEmpty bool) *bytes.Buffer {
|
||||||
// Create a buffer to store the formatted sequences
|
// Create a buffer to store the formatted sequences
|
||||||
var bs bytes.Buffer
|
var bs bytes.Buffer
|
||||||
|
|
||||||
@@ -116,7 +115,7 @@ func FormatFastaBatch(batch obiiter.BioSequenceBatch, formater FormatHeader, ski
|
|||||||
}
|
}
|
||||||
|
|
||||||
// Return the byte array representation of the buffer
|
// Return the byte array representation of the buffer
|
||||||
return bs.Bytes()
|
return &bs
|
||||||
}
|
}
|
||||||
|
|
||||||
// WriteFasta writes a given iterator of bio sequences to a file in FASTA format.
|
// WriteFasta writes a given iterator of bio sequences to a file in FASTA format.
|
||||||
@@ -128,20 +127,17 @@ func WriteFasta(iterator obiiter.IBioSequence,
|
|||||||
options ...WithOption) (obiiter.IBioSequence, error) {
|
options ...WithOption) (obiiter.IBioSequence, error) {
|
||||||
opt := MakeOptions(options)
|
opt := MakeOptions(options)
|
||||||
|
|
||||||
iterator = iterator.Rebatch(opt.BatchSize())
|
|
||||||
file, _ = obiutils.CompressStream(file, opt.CompressedFile(), opt.CloseFile())
|
file, _ = obiutils.CompressStream(file, opt.CompressedFile(), opt.CloseFile())
|
||||||
|
|
||||||
newIter := obiiter.MakeIBioSequence()
|
newIter := obiiter.MakeIBioSequence()
|
||||||
|
|
||||||
nwriters := opt.ParallelWorkers()
|
nwriters := opt.ParallelWorkers()
|
||||||
|
|
||||||
obiiter.RegisterAPipe()
|
chunkchan := WriteFileChunk(file, opt.CloseFile())
|
||||||
chunkchan := make(chan FileChunck)
|
|
||||||
|
|
||||||
header_format := opt.FormatFastSeqHeader()
|
header_format := opt.FormatFastSeqHeader()
|
||||||
|
|
||||||
newIter.Add(nwriters)
|
newIter.Add(nwriters)
|
||||||
var waitWriter sync.WaitGroup
|
|
||||||
|
|
||||||
go func() {
|
go func() {
|
||||||
newIter.WaitAndClose()
|
newIter.WaitAndClose()
|
||||||
@@ -149,7 +145,7 @@ func WriteFasta(iterator obiiter.IBioSequence,
|
|||||||
time.Sleep(time.Millisecond)
|
time.Sleep(time.Millisecond)
|
||||||
}
|
}
|
||||||
close(chunkchan)
|
close(chunkchan)
|
||||||
waitWriter.Wait()
|
log.Debugf("Writing fasta file done")
|
||||||
}()
|
}()
|
||||||
|
|
||||||
ff := func(iterator obiiter.IBioSequence) {
|
ff := func(iterator obiiter.IBioSequence) {
|
||||||
@@ -159,10 +155,12 @@ func WriteFasta(iterator obiiter.IBioSequence,
|
|||||||
|
|
||||||
log.Debugf("Formating fasta chunk %d", batch.Order())
|
log.Debugf("Formating fasta chunk %d", batch.Order())
|
||||||
|
|
||||||
chunkchan <- FileChunck{
|
chunkchan <- FileChunk{
|
||||||
FormatFastaBatch(batch, header_format, opt.SkipEmptySequence()),
|
Source: batch.Source(),
|
||||||
batch.Order(),
|
Raw: FormatFastaBatch(batch, header_format, opt.SkipEmptySequence()),
|
||||||
|
Order: batch.Order(),
|
||||||
}
|
}
|
||||||
|
|
||||||
log.Debugf("Fasta chunk %d formated", batch.Order())
|
log.Debugf("Fasta chunk %d formated", batch.Order())
|
||||||
|
|
||||||
newIter.Push(batch)
|
newIter.Push(batch)
|
||||||
@@ -172,43 +170,10 @@ func WriteFasta(iterator obiiter.IBioSequence,
|
|||||||
|
|
||||||
log.Debugln("Start of the fasta file writing")
|
log.Debugln("Start of the fasta file writing")
|
||||||
go ff(iterator)
|
go ff(iterator)
|
||||||
for i := 0; i < nwriters-1; i++ {
|
for i := 1; i < nwriters; i++ {
|
||||||
go ff(iterator.Split())
|
go ff(iterator.Split())
|
||||||
}
|
}
|
||||||
|
|
||||||
next_to_send := 0
|
|
||||||
received := make(map[int]FileChunck, 100)
|
|
||||||
|
|
||||||
waitWriter.Add(1)
|
|
||||||
go func() {
|
|
||||||
for chunk := range chunkchan {
|
|
||||||
if chunk.order == next_to_send {
|
|
||||||
file.Write(chunk.text)
|
|
||||||
log.Debugf("Fasta chunk %d written", chunk.order)
|
|
||||||
next_to_send++
|
|
||||||
chunk, ok := received[next_to_send]
|
|
||||||
for ok {
|
|
||||||
file.Write(chunk.text)
|
|
||||||
log.Debugf("Fasta chunk %d written", chunk.order)
|
|
||||||
delete(received, next_to_send)
|
|
||||||
next_to_send++
|
|
||||||
chunk, ok = received[next_to_send]
|
|
||||||
}
|
|
||||||
} else {
|
|
||||||
log.Debugf("Store Fasta chunk %d", chunk.order)
|
|
||||||
received[chunk.order] = chunk
|
|
||||||
}
|
|
||||||
|
|
||||||
}
|
|
||||||
|
|
||||||
file.Close()
|
|
||||||
|
|
||||||
log.Debugln("End of the fasta file writing")
|
|
||||||
obiiter.UnregisterPipe()
|
|
||||||
waitWriter.Done()
|
|
||||||
|
|
||||||
}()
|
|
||||||
|
|
||||||
return newIter, nil
|
return newIter, nil
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -222,7 +187,8 @@ func WriteFasta(iterator obiiter.IBioSequence,
|
|||||||
// The function returns the same bio sequence iterator and an error if any occurred.
|
// The function returns the same bio sequence iterator and an error if any occurred.
|
||||||
func WriteFastaToStdout(iterator obiiter.IBioSequence,
|
func WriteFastaToStdout(iterator obiiter.IBioSequence,
|
||||||
options ...WithOption) (obiiter.IBioSequence, error) {
|
options ...WithOption) (obiiter.IBioSequence, error) {
|
||||||
options = append(options, OptionDontCloseFile())
|
// options = append(options, OptionDontCloseFile())
|
||||||
|
options = append(options, OptionCloseFile())
|
||||||
return WriteFasta(iterator, os.Stdout, options...)
|
return WriteFasta(iterator, os.Stdout, options...)
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
@@ -4,7 +4,6 @@ import (
|
|||||||
"bytes"
|
"bytes"
|
||||||
"io"
|
"io"
|
||||||
"os"
|
"os"
|
||||||
"sync"
|
|
||||||
"time"
|
"time"
|
||||||
|
|
||||||
log "github.com/sirupsen/logrus"
|
log "github.com/sirupsen/logrus"
|
||||||
@@ -14,6 +13,8 @@ import (
|
|||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
|
||||||
)
|
)
|
||||||
|
|
||||||
|
type FormatSeqBatch func(batch obiiter.BioSequenceBatch, formater FormatHeader, skipEmpty bool) *bytes.Buffer
|
||||||
|
|
||||||
func _formatFastq(buff *bytes.Buffer, seq *obiseq.BioSequence, formater FormatHeader) {
|
func _formatFastq(buff *bytes.Buffer, seq *obiseq.BioSequence, formater FormatHeader) {
|
||||||
|
|
||||||
info := ""
|
info := ""
|
||||||
@@ -49,7 +50,7 @@ func FormatFastq(seq *obiseq.BioSequence, formater FormatHeader) string {
|
|||||||
}
|
}
|
||||||
|
|
||||||
func FormatFastqBatch(batch obiiter.BioSequenceBatch,
|
func FormatFastqBatch(batch obiiter.BioSequenceBatch,
|
||||||
formater FormatHeader, skipEmpty bool) []byte {
|
formater FormatHeader, skipEmpty bool) *bytes.Buffer {
|
||||||
var bs bytes.Buffer
|
var bs bytes.Buffer
|
||||||
|
|
||||||
lt := 0
|
lt := 0
|
||||||
@@ -82,14 +83,7 @@ func FormatFastqBatch(batch obiiter.BioSequenceBatch,
|
|||||||
|
|
||||||
}
|
}
|
||||||
|
|
||||||
chunk := bs.Bytes()
|
return &bs
|
||||||
|
|
||||||
return chunk
|
|
||||||
}
|
|
||||||
|
|
||||||
type FileChunck struct {
|
|
||||||
text []byte
|
|
||||||
order int
|
|
||||||
}
|
}
|
||||||
|
|
||||||
func WriteFastq(iterator obiiter.IBioSequence,
|
func WriteFastq(iterator obiiter.IBioSequence,
|
||||||
@@ -97,7 +91,6 @@ func WriteFastq(iterator obiiter.IBioSequence,
|
|||||||
options ...WithOption) (obiiter.IBioSequence, error) {
|
options ...WithOption) (obiiter.IBioSequence, error) {
|
||||||
|
|
||||||
opt := MakeOptions(options)
|
opt := MakeOptions(options)
|
||||||
iterator = iterator.Rebatch(opt.BatchSize())
|
|
||||||
|
|
||||||
file, _ = obiutils.CompressStream(file, opt.CompressedFile(), opt.CloseFile())
|
file, _ = obiutils.CompressStream(file, opt.CompressedFile(), opt.CloseFile())
|
||||||
|
|
||||||
@@ -105,30 +98,28 @@ func WriteFastq(iterator obiiter.IBioSequence,
|
|||||||
|
|
||||||
nwriters := opt.ParallelWorkers()
|
nwriters := opt.ParallelWorkers()
|
||||||
|
|
||||||
obiiter.RegisterAPipe()
|
chunkchan := WriteFileChunk(file, opt.CloseFile())
|
||||||
chunkchan := make(chan FileChunck)
|
|
||||||
|
|
||||||
header_format := opt.FormatFastSeqHeader()
|
header_format := opt.FormatFastSeqHeader()
|
||||||
|
|
||||||
newIter.Add(nwriters)
|
newIter.Add(nwriters)
|
||||||
|
|
||||||
var waitWriter sync.WaitGroup
|
|
||||||
|
|
||||||
go func() {
|
go func() {
|
||||||
newIter.WaitAndClose()
|
newIter.WaitAndClose()
|
||||||
for len(chunkchan) > 0 {
|
for len(chunkchan) > 0 {
|
||||||
time.Sleep(time.Millisecond)
|
time.Sleep(time.Millisecond)
|
||||||
}
|
}
|
||||||
close(chunkchan)
|
close(chunkchan)
|
||||||
waitWriter.Wait()
|
log.Debugf("Writing fastq file done")
|
||||||
}()
|
}()
|
||||||
|
|
||||||
ff := func(iterator obiiter.IBioSequence) {
|
ff := func(iterator obiiter.IBioSequence) {
|
||||||
for iterator.Next() {
|
for iterator.Next() {
|
||||||
batch := iterator.Get()
|
batch := iterator.Get()
|
||||||
chunk := FileChunck{
|
chunk := FileChunk{
|
||||||
FormatFastqBatch(batch, header_format, opt.SkipEmptySequence()),
|
Source: batch.Source(),
|
||||||
batch.Order(),
|
Raw: FormatFastqBatch(batch, header_format, opt.SkipEmptySequence()),
|
||||||
|
Order: batch.Order(),
|
||||||
}
|
}
|
||||||
chunkchan <- chunk
|
chunkchan <- chunk
|
||||||
newIter.Push(batch)
|
newIter.Push(batch)
|
||||||
@@ -138,54 +129,18 @@ func WriteFastq(iterator obiiter.IBioSequence,
|
|||||||
|
|
||||||
log.Debugln("Start of the fastq file writing")
|
log.Debugln("Start of the fastq file writing")
|
||||||
go ff(iterator)
|
go ff(iterator)
|
||||||
for i := 0; i < nwriters-1; i++ {
|
for i := 1; i < nwriters; i++ {
|
||||||
go ff(iterator.Split())
|
go ff(iterator.Split())
|
||||||
}
|
}
|
||||||
|
|
||||||
next_to_send := 0
|
|
||||||
received := make(map[int]FileChunck, 100)
|
|
||||||
|
|
||||||
waitWriter.Add(1)
|
|
||||||
go func() {
|
|
||||||
for chunk := range chunkchan {
|
|
||||||
if chunk.order == next_to_send {
|
|
||||||
if chunk.text[0] != '@' {
|
|
||||||
log.Panicln("WriteFastq: FASTQ format error")
|
|
||||||
}
|
|
||||||
file.Write(chunk.text)
|
|
||||||
next_to_send++
|
|
||||||
chunk, ok := received[next_to_send]
|
|
||||||
for ok {
|
|
||||||
if chunk.text[0] != '@' {
|
|
||||||
log.Panicln("WriteFastq: FASTQ format error")
|
|
||||||
}
|
|
||||||
file.Write(chunk.text)
|
|
||||||
delete(received, next_to_send)
|
|
||||||
next_to_send++
|
|
||||||
chunk, ok = received[next_to_send]
|
|
||||||
}
|
|
||||||
} else {
|
|
||||||
if _, ok := received[chunk.order]; ok {
|
|
||||||
log.Panicln("WriteFastq: Two chunks with the same number")
|
|
||||||
}
|
|
||||||
received[chunk.order] = chunk
|
|
||||||
}
|
|
||||||
|
|
||||||
}
|
|
||||||
|
|
||||||
file.Close()
|
|
||||||
|
|
||||||
log.Debugln("End of the fastq file writing")
|
|
||||||
obiiter.UnregisterPipe()
|
|
||||||
waitWriter.Done()
|
|
||||||
}()
|
|
||||||
|
|
||||||
return newIter, nil
|
return newIter, nil
|
||||||
}
|
}
|
||||||
|
|
||||||
func WriteFastqToStdout(iterator obiiter.IBioSequence,
|
func WriteFastqToStdout(iterator obiiter.IBioSequence,
|
||||||
options ...WithOption) (obiiter.IBioSequence, error) {
|
options ...WithOption) (obiiter.IBioSequence, error) {
|
||||||
options = append(options, OptionDontCloseFile())
|
// options = append(options, OptionDontCloseFile())
|
||||||
|
options = append(options, OptionCloseFile())
|
||||||
|
|
||||||
return WriteFastq(iterator, os.Stdout, options...)
|
return WriteFastq(iterator, os.Stdout, options...)
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
@@ -5,17 +5,19 @@ import (
|
|||||||
"io"
|
"io"
|
||||||
"slices"
|
"slices"
|
||||||
|
|
||||||
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
|
||||||
log "github.com/sirupsen/logrus"
|
log "github.com/sirupsen/logrus"
|
||||||
)
|
)
|
||||||
|
|
||||||
var _FileChunkSize = 1 << 28
|
type SeqFileChunkParser func(string, io.Reader) (obiseq.BioSequenceSlice, error)
|
||||||
|
|
||||||
type SeqFileChunk struct {
|
type FileChunk struct {
|
||||||
raw io.Reader
|
Source string
|
||||||
order int
|
Raw *bytes.Buffer
|
||||||
|
Order int
|
||||||
}
|
}
|
||||||
|
|
||||||
type ChannelSeqFileChunk chan SeqFileChunk
|
type ChannelFileChunk chan FileChunk
|
||||||
|
|
||||||
type LastSeqRecord func([]byte) int
|
type LastSeqRecord func([]byte) int
|
||||||
|
|
||||||
@@ -32,13 +34,17 @@ type LastSeqRecord func([]byte) int
|
|||||||
//
|
//
|
||||||
// Returns:
|
// Returns:
|
||||||
// None
|
// None
|
||||||
func ReadSeqFileChunk(reader io.Reader,
|
func ReadFileChunk(
|
||||||
|
source string,
|
||||||
|
reader io.Reader,
|
||||||
buff []byte,
|
buff []byte,
|
||||||
splitter LastSeqRecord) ChannelSeqFileChunk {
|
splitter LastSeqRecord) ChannelFileChunk {
|
||||||
var err error
|
var err error
|
||||||
var fullbuff []byte
|
var fullbuff []byte
|
||||||
|
|
||||||
chunk_channel := make(ChannelSeqFileChunk)
|
chunk_channel := make(ChannelFileChunk)
|
||||||
|
|
||||||
|
fileChunkSize := len(buff)
|
||||||
|
|
||||||
go func() {
|
go func() {
|
||||||
size := 0
|
size := 0
|
||||||
@@ -65,11 +71,13 @@ func ReadSeqFileChunk(reader io.Reader,
|
|||||||
// Read from the reader in 1 MB increments until the end of the last entry is found
|
// Read from the reader in 1 MB increments until the end of the last entry is found
|
||||||
for end = splitter(buff); err == nil && end < 0; end = splitter(buff) {
|
for end = splitter(buff); err == nil && end < 0; end = splitter(buff) {
|
||||||
ic++
|
ic++
|
||||||
buff = slices.Grow(buff, _FileChunkSize)
|
buff = slices.Grow(buff, fileChunkSize)
|
||||||
l := len(buff)
|
l := len(buff)
|
||||||
extbuff := buff[l:(l + _FileChunkSize - 1)]
|
extbuff := buff[l:(l + fileChunkSize - 1)]
|
||||||
size, err = io.ReadFull(reader, extbuff)
|
size, err = io.ReadFull(reader, extbuff)
|
||||||
buff = buff[0:(l + size)]
|
buff = buff[0:(l + size)]
|
||||||
|
// log.Warnf("Splitter not found, attempting %d to read in %d B increments : len(buff) = %d/%d", ic, fileChunkSize, len(extbuff), len(buff))
|
||||||
|
|
||||||
}
|
}
|
||||||
|
|
||||||
fullbuff = buff
|
fullbuff = buff
|
||||||
@@ -87,8 +95,10 @@ func ReadSeqFileChunk(reader io.Reader,
|
|||||||
}
|
}
|
||||||
|
|
||||||
if len(buff) > 0 {
|
if len(buff) > 0 {
|
||||||
io := bytes.NewBuffer(slices.Clone(buff))
|
cbuff := slices.Clone(buff)
|
||||||
chunk_channel <- SeqFileChunk{io, i}
|
io := bytes.NewBuffer(cbuff)
|
||||||
|
// log.Warnf("chuck %d :Read %d bytes from file %s", i, io.Len(), source)
|
||||||
|
chunk_channel <- FileChunk{source, io, i}
|
||||||
i++
|
i++
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -96,7 +106,7 @@ func ReadSeqFileChunk(reader io.Reader,
|
|||||||
buff = fullbuff[0:lremain]
|
buff = fullbuff[0:lremain]
|
||||||
lcp := copy(buff, fullbuff[pnext:])
|
lcp := copy(buff, fullbuff[pnext:])
|
||||||
if lcp < lremain {
|
if lcp < lremain {
|
||||||
log.Fatalf("Error copying remaining data of chunck %d : %d < %d", i, lcp, lremain)
|
log.Fatalf("Error copying remaining data of chunk %d : %d < %d", i, lcp, lremain)
|
||||||
}
|
}
|
||||||
} else {
|
} else {
|
||||||
buff = buff[:0]
|
buff = buff[:0]
|
||||||
@@ -112,7 +122,7 @@ func ReadSeqFileChunk(reader io.Reader,
|
|||||||
// Send the last chunk to the channel
|
// Send the last chunk to the channel
|
||||||
if len(buff) > 0 {
|
if len(buff) > 0 {
|
||||||
io := bytes.NewBuffer(slices.Clone(buff))
|
io := bytes.NewBuffer(slices.Clone(buff))
|
||||||
chunk_channel <- SeqFileChunk{io, i}
|
chunk_channel <- FileChunk{source, io, i}
|
||||||
}
|
}
|
||||||
|
|
||||||
// Close the readers channel when the end of the file is reached
|
// Close the readers channel when the end of the file is reached
|
||||||
61
pkg/obiformats/file_chunk_write.go
Normal file
@@ -0,0 +1,61 @@
|
|||||||
|
package obiformats
|
||||||
|
|
||||||
|
import (
|
||||||
|
"io"
|
||||||
|
|
||||||
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
|
||||||
|
|
||||||
|
log "github.com/sirupsen/logrus"
|
||||||
|
)
|
||||||
|
|
||||||
|
func WriteFileChunk(
|
||||||
|
writer io.WriteCloser,
|
||||||
|
toBeClosed bool) ChannelFileChunk {
|
||||||
|
|
||||||
|
obiutils.RegisterAPipe()
|
||||||
|
chunk_channel := make(ChannelFileChunk)
|
||||||
|
|
||||||
|
go func() {
|
||||||
|
nextToPrint := 0
|
||||||
|
toBePrinted := make(map[int]FileChunk)
|
||||||
|
for chunk := range chunk_channel {
|
||||||
|
if chunk.Order == nextToPrint {
|
||||||
|
log.Debugf("Writing chunk: %d of length %d bytes",
|
||||||
|
chunk.Order,
|
||||||
|
len(chunk.Raw.Bytes()))
|
||||||
|
|
||||||
|
n, err := writer.Write(chunk.Raw.Bytes())
|
||||||
|
|
||||||
|
if err != nil {
|
||||||
|
log.Fatalf("Cannot write chunk %d: only %d of %d bytes written: %v",
|
||||||
|
chunk.Order, n, len(chunk.Raw.Bytes()), err)
|
||||||
|
}
|
||||||
|
nextToPrint++
|
||||||
|
|
||||||
|
chunk, ok := toBePrinted[nextToPrint]
|
||||||
|
for ok {
|
||||||
|
log.Debug("Writing buffered chunk : ", chunk.Order)
|
||||||
|
_, _ = writer.Write(chunk.Raw.Bytes())
|
||||||
|
delete(toBePrinted, nextToPrint)
|
||||||
|
nextToPrint++
|
||||||
|
chunk, ok = toBePrinted[nextToPrint]
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
toBePrinted[chunk.Order] = chunk
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
log.Debugf("File has to be closed: %v", toBeClosed)
|
||||||
|
if toBeClosed {
|
||||||
|
err := writer.Close()
|
||||||
|
if err != nil {
|
||||||
|
log.Fatalf("Cannot close the writer : %v", err)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
obiutils.UnregisterPipe()
|
||||||
|
log.Debugf("The writer has been closed")
|
||||||
|
}()
|
||||||
|
|
||||||
|
return chunk_channel
|
||||||
|
}
|
||||||
@@ -29,27 +29,11 @@ const (
|
|||||||
|
|
||||||
var _seqlenght_rx = regexp.MustCompile(" +([0-9]+) bp")
|
var _seqlenght_rx = regexp.MustCompile(" +([0-9]+) bp")
|
||||||
|
|
||||||
func _ParseGenbankFile(source string,
|
func GenbankChunkParser(withFeatureTable bool) func(string, io.Reader) (obiseq.BioSequenceSlice, error) {
|
||||||
input ChannelSeqFileChunk,
|
return func(source string, input io.Reader) (obiseq.BioSequenceSlice, error) {
|
||||||
out obiiter.IBioSequence,
|
state := inHeader
|
||||||
chunck_order func() int,
|
scanner := bufio.NewReader(input)
|
||||||
withFeatureTable bool,
|
sequences := obiseq.MakeBioSequenceSlice(100)[:0]
|
||||||
batch_size int,
|
|
||||||
total_seq_size int) {
|
|
||||||
state := inHeader
|
|
||||||
previous_chunk := -1
|
|
||||||
|
|
||||||
for chunks := range input {
|
|
||||||
|
|
||||||
if state != inHeader {
|
|
||||||
log.Fatalf("Unexpected state %d starting new chunk (id = %d, previous_chunk = %d)",
|
|
||||||
state, chunks.order, previous_chunk)
|
|
||||||
}
|
|
||||||
|
|
||||||
previous_chunk = chunks.order
|
|
||||||
scanner := bufio.NewReader(chunks.raw)
|
|
||||||
sequences := make(obiseq.BioSequenceSlice, 0, 100)
|
|
||||||
sumlength := 0
|
|
||||||
id := ""
|
id := ""
|
||||||
lseq := -1
|
lseq := -1
|
||||||
scientificName := ""
|
scientificName := ""
|
||||||
@@ -64,7 +48,7 @@ func _ParseGenbankFile(source string,
|
|||||||
nl++
|
nl++
|
||||||
line = string(bline)
|
line = string(bline)
|
||||||
if is_prefix || len(line) > 100 {
|
if is_prefix || len(line) > 100 {
|
||||||
log.Fatalf("Chunk %d : Line too long: %s", chunks.order, line)
|
log.Fatalf("From %s:Line too long: %s", source, line)
|
||||||
}
|
}
|
||||||
processed := false
|
processed := false
|
||||||
for !processed {
|
for !processed {
|
||||||
@@ -165,15 +149,6 @@ func _ParseGenbankFile(source string,
|
|||||||
// sequence.Len(), seqBytes.Len())
|
// sequence.Len(), seqBytes.Len())
|
||||||
|
|
||||||
sequences = append(sequences, sequence)
|
sequences = append(sequences, sequence)
|
||||||
sumlength += sequence.Len()
|
|
||||||
|
|
||||||
if len(sequences) == batch_size || sumlength > total_seq_size {
|
|
||||||
oo := chunck_order()
|
|
||||||
log.Debugln("Pushing sequence batch ", oo, " with ", len(sequences), " sequences")
|
|
||||||
out.Push(obiiter.MakeBioSequenceBatch(oo, sequences))
|
|
||||||
sequences = make(obiseq.BioSequenceSlice, 0, 100)
|
|
||||||
sumlength = 0
|
|
||||||
}
|
|
||||||
|
|
||||||
defBytes = bytes.NewBuffer(obiseq.GetSlice(200))
|
defBytes = bytes.NewBuffer(obiseq.GetSlice(200))
|
||||||
featBytes = new(bytes.Buffer)
|
featBytes = new(bytes.Buffer)
|
||||||
@@ -219,11 +194,24 @@ func _ParseGenbankFile(source string,
|
|||||||
|
|
||||||
}
|
}
|
||||||
|
|
||||||
if len(sequences) > 0 {
|
return sequences, nil
|
||||||
oo := chunck_order()
|
}
|
||||||
log.Debugln("Pushing sequence batch ", oo, " with ", len(sequences), " sequences")
|
}
|
||||||
out.Push(obiiter.MakeBioSequenceBatch(oo, sequences))
|
|
||||||
|
func _ParseGenbankFile(input ChannelFileChunk,
|
||||||
|
out obiiter.IBioSequence,
|
||||||
|
withFeatureTable bool) {
|
||||||
|
|
||||||
|
parser := GenbankChunkParser(withFeatureTable)
|
||||||
|
|
||||||
|
for chunks := range input {
|
||||||
|
sequences, err := parser(chunks.Source, chunks.Raw)
|
||||||
|
|
||||||
|
if err != nil {
|
||||||
|
log.Fatalf("File %s : Cannot parse the genbank file : %v", chunks.Source, err)
|
||||||
}
|
}
|
||||||
|
|
||||||
|
out.Push(obiiter.MakeBioSequenceBatch(chunks.Source, chunks.Order, sequences))
|
||||||
}
|
}
|
||||||
|
|
||||||
log.Debug("End of the Genbank thread")
|
log.Debug("End of the Genbank thread")
|
||||||
@@ -231,26 +219,31 @@ func _ParseGenbankFile(source string,
|
|||||||
|
|
||||||
}
|
}
|
||||||
|
|
||||||
func ReadGenbank(reader io.Reader, options ...WithOption) obiiter.IBioSequence {
|
func ReadGenbank(reader io.Reader, options ...WithOption) (obiiter.IBioSequence, error) {
|
||||||
opt := MakeOptions(options)
|
opt := MakeOptions(options)
|
||||||
// entry_channel := make(chan _FileChunk)
|
// entry_channel := make(chan _FileChunk)
|
||||||
|
|
||||||
buff := make([]byte, 1024*1024*1024*256)
|
buff := make([]byte, 1024*1024*128) // 128 MB
|
||||||
|
|
||||||
|
entry_channel := ReadFileChunk(
|
||||||
|
opt.Source(),
|
||||||
|
reader,
|
||||||
|
buff,
|
||||||
|
EndOfLastFlatFileEntry,
|
||||||
|
)
|
||||||
|
|
||||||
entry_channel := ReadSeqFileChunk(reader, buff, _EndOfLastEntry)
|
|
||||||
newIter := obiiter.MakeIBioSequence()
|
newIter := obiiter.MakeIBioSequence()
|
||||||
|
|
||||||
nworkers := opt.ParallelWorkers()
|
nworkers := opt.ParallelWorkers()
|
||||||
chunck_order := obiutils.AtomicCounter()
|
|
||||||
|
|
||||||
// for j := 0; j < opt.ParallelWorkers(); j++ {
|
// for j := 0; j < opt.ParallelWorkers(); j++ {
|
||||||
for j := 0; j < nworkers; j++ {
|
for j := 0; j < nworkers; j++ {
|
||||||
newIter.Add(1)
|
newIter.Add(1)
|
||||||
go _ParseGenbankFile(opt.Source(),
|
go _ParseGenbankFile(
|
||||||
entry_channel, newIter, chunck_order,
|
entry_channel,
|
||||||
|
newIter,
|
||||||
opt.WithFeatureTable(),
|
opt.WithFeatureTable(),
|
||||||
opt.BatchSize(),
|
)
|
||||||
opt.TotalSeqSize())
|
|
||||||
}
|
}
|
||||||
|
|
||||||
// go _ReadFlatFileChunk(reader, entry_channel)
|
// go _ReadFlatFileChunk(reader, entry_channel)
|
||||||
@@ -264,7 +257,7 @@ func ReadGenbank(reader io.Reader, options ...WithOption) obiiter.IBioSequence {
|
|||||||
newIter = newIter.CompleteFileIterator()
|
newIter = newIter.CompleteFileIterator()
|
||||||
}
|
}
|
||||||
|
|
||||||
return newIter
|
return newIter, nil
|
||||||
}
|
}
|
||||||
|
|
||||||
func ReadGenbankFromFile(filename string, options ...WithOption) (obiiter.IBioSequence, error) {
|
func ReadGenbankFromFile(filename string, options ...WithOption) (obiiter.IBioSequence, error) {
|
||||||
@@ -273,9 +266,9 @@ func ReadGenbankFromFile(filename string, options ...WithOption) (obiiter.IBioSe
|
|||||||
|
|
||||||
options = append(options, OptionsSource(obiutils.RemoveAllExt((path.Base(filename)))))
|
options = append(options, OptionsSource(obiutils.RemoveAllExt((path.Base(filename)))))
|
||||||
|
|
||||||
reader, err = Ropen(filename)
|
reader, err = obiutils.Ropen(filename)
|
||||||
|
|
||||||
if err == ErrNoContent {
|
if err == obiutils.ErrNoContent {
|
||||||
log.Infof("file %s is empty", filename)
|
log.Infof("file %s is empty", filename)
|
||||||
return ReadEmptyFile(options...)
|
return ReadEmptyFile(options...)
|
||||||
}
|
}
|
||||||
@@ -285,5 +278,5 @@ func ReadGenbankFromFile(filename string, options ...WithOption) (obiiter.IBioSe
|
|||||||
return obiiter.NilIBioSequence, err
|
return obiiter.NilIBioSequence, err
|
||||||
}
|
}
|
||||||
|
|
||||||
return ReadGenbank(reader, options...), nil
|
return ReadGenbank(reader, options...)
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -3,14 +3,15 @@ package obiformats
|
|||||||
import (
|
import (
|
||||||
"bufio"
|
"bufio"
|
||||||
"bytes"
|
"bytes"
|
||||||
"github.com/goccy/go-json"
|
|
||||||
"io"
|
"io"
|
||||||
"os"
|
"os"
|
||||||
"strconv"
|
"strconv"
|
||||||
"strings"
|
"strings"
|
||||||
"sync"
|
"sync/atomic"
|
||||||
"time"
|
"time"
|
||||||
|
|
||||||
|
"github.com/goccy/go-json"
|
||||||
|
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiiter"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiiter"
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
|
||||||
@@ -57,9 +58,17 @@ func JSONRecord(sequence *obiseq.BioSequence) []byte {
|
|||||||
return text
|
return text
|
||||||
}
|
}
|
||||||
|
|
||||||
func FormatJSONBatch(batch obiiter.BioSequenceBatch) []byte {
|
func FormatJSONBatch(batch obiiter.BioSequenceBatch) *bytes.Buffer {
|
||||||
buff := new(bytes.Buffer)
|
buff := new(bytes.Buffer)
|
||||||
|
|
||||||
json := bufio.NewWriter(buff)
|
json := bufio.NewWriter(buff)
|
||||||
|
|
||||||
|
if batch.Order() == 0 {
|
||||||
|
json.WriteString("[\n")
|
||||||
|
} else {
|
||||||
|
json.WriteString(",\n")
|
||||||
|
}
|
||||||
|
|
||||||
n := batch.Slice().Len() - 1
|
n := batch.Slice().Len() - 1
|
||||||
for i, s := range batch.Slice() {
|
for i, s := range batch.Slice() {
|
||||||
json.WriteString(" ")
|
json.WriteString(" ")
|
||||||
@@ -70,35 +79,36 @@ func FormatJSONBatch(batch obiiter.BioSequenceBatch) []byte {
|
|||||||
}
|
}
|
||||||
|
|
||||||
json.Flush()
|
json.Flush()
|
||||||
|
return buff
|
||||||
return buff.Bytes()
|
|
||||||
}
|
}
|
||||||
|
|
||||||
func WriteJSON(iterator obiiter.IBioSequence,
|
func WriteJSON(iterator obiiter.IBioSequence,
|
||||||
file io.WriteCloser,
|
file io.WriteCloser,
|
||||||
options ...WithOption) (obiiter.IBioSequence, error) {
|
options ...WithOption) (obiiter.IBioSequence, error) {
|
||||||
|
var latestChunk atomic.Int64
|
||||||
|
|
||||||
opt := MakeOptions(options)
|
opt := MakeOptions(options)
|
||||||
|
|
||||||
file, _ = obiutils.CompressStream(file, opt.CompressedFile(), opt.CloseFile())
|
file, _ = obiutils.CompressStream(file, opt.CompressedFile(), opt.CloseFile())
|
||||||
|
|
||||||
newIter := obiiter.MakeIBioSequence()
|
newIter := obiiter.MakeIBioSequence()
|
||||||
|
|
||||||
nwriters := opt.ParallelWorkers()
|
nwriters := opt.ParallelWorkers()
|
||||||
|
|
||||||
obiiter.RegisterAPipe()
|
chunkchan := WriteFileChunk(file, opt.CloseFile())
|
||||||
chunkchan := make(chan FileChunck)
|
|
||||||
|
|
||||||
newIter.Add(nwriters)
|
newIter.Add(nwriters)
|
||||||
var waitWriter sync.WaitGroup
|
|
||||||
|
|
||||||
go func() {
|
go func() {
|
||||||
newIter.WaitAndClose()
|
newIter.WaitAndClose()
|
||||||
|
|
||||||
|
chunkchan <- FileChunk{
|
||||||
|
Source: "end",
|
||||||
|
Raw: bytes.NewBuffer([]byte("\n]\n")),
|
||||||
|
Order: int(latestChunk.Load()) + 1,
|
||||||
|
}
|
||||||
for len(chunkchan) > 0 {
|
for len(chunkchan) > 0 {
|
||||||
time.Sleep(time.Millisecond)
|
time.Sleep(time.Millisecond)
|
||||||
}
|
}
|
||||||
close(chunkchan)
|
close(chunkchan)
|
||||||
waitWriter.Wait()
|
|
||||||
}()
|
}()
|
||||||
|
|
||||||
ff := func(iterator obiiter.IBioSequence) {
|
ff := func(iterator obiiter.IBioSequence) {
|
||||||
@@ -106,62 +116,32 @@ func WriteJSON(iterator obiiter.IBioSequence,
|
|||||||
|
|
||||||
batch := iterator.Get()
|
batch := iterator.Get()
|
||||||
|
|
||||||
chunkchan <- FileChunck{
|
ss := FileChunk{
|
||||||
FormatJSONBatch(batch),
|
Source: batch.Source(),
|
||||||
batch.Order(),
|
Raw: FormatJSONBatch(batch),
|
||||||
|
Order: batch.Order(),
|
||||||
}
|
}
|
||||||
|
|
||||||
|
chunkchan <- ss
|
||||||
|
latestChunk.Store(int64(batch.Order()))
|
||||||
newIter.Push(batch)
|
newIter.Push(batch)
|
||||||
}
|
}
|
||||||
newIter.Done()
|
newIter.Done()
|
||||||
}
|
}
|
||||||
|
|
||||||
next_to_send := 0
|
|
||||||
received := make(map[int]FileChunck, 100)
|
|
||||||
|
|
||||||
waitWriter.Add(1)
|
|
||||||
go func() {
|
|
||||||
for chunk := range chunkchan {
|
|
||||||
if chunk.order == next_to_send {
|
|
||||||
if next_to_send > 0 {
|
|
||||||
file.Write([]byte(",\n"))
|
|
||||||
}
|
|
||||||
file.Write(chunk.text)
|
|
||||||
next_to_send++
|
|
||||||
chunk, ok := received[next_to_send]
|
|
||||||
for ok {
|
|
||||||
file.Write(chunk.text)
|
|
||||||
delete(received, next_to_send)
|
|
||||||
next_to_send++
|
|
||||||
chunk, ok = received[next_to_send]
|
|
||||||
}
|
|
||||||
} else {
|
|
||||||
received[chunk.order] = chunk
|
|
||||||
}
|
|
||||||
|
|
||||||
}
|
|
||||||
|
|
||||||
file.Write([]byte("\n]\n"))
|
|
||||||
file.Close()
|
|
||||||
|
|
||||||
log.Debugln("End of the JSON file writing")
|
|
||||||
obiiter.UnregisterPipe()
|
|
||||||
waitWriter.Done()
|
|
||||||
|
|
||||||
}()
|
|
||||||
|
|
||||||
log.Debugln("Start of the JSON file writing")
|
log.Debugln("Start of the JSON file writing")
|
||||||
file.Write([]byte("[\n"))
|
for i := 1; i < nwriters; i++ {
|
||||||
go ff(iterator)
|
|
||||||
for i := 0; i < nwriters-1; i++ {
|
|
||||||
go ff(iterator.Split())
|
go ff(iterator.Split())
|
||||||
}
|
}
|
||||||
|
go ff(iterator)
|
||||||
|
|
||||||
return newIter, nil
|
return newIter, nil
|
||||||
}
|
}
|
||||||
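The writer goroutine removed in this hunk buffered out-of-order chunks in a map and flushed them once the next expected index arrived. A minimal standalone sketch of that reordering idea follows; the names `chunk` and `reorder` are hypothetical stand-ins and are not part of obitools4.

```go
package main

import (
	"fmt"
	"strings"
)

// chunk is a hypothetical stand-in for a formatted batch: a payload plus
// the order of the batch it was produced from.
type chunk struct {
	order int
	text  string
}

// reorder consumes chunks in arbitrary arrival order and returns their
// payloads concatenated in increasing order, starting at order 0.
func reorder(in <-chan chunk) string {
	var out strings.Builder
	pending := make(map[int]chunk)
	next := 0
	for c := range in {
		pending[c.order] = c
		// Flush every chunk that is now contiguous with what was written.
		for p, ok := pending[next]; ok; p, ok = pending[next] {
			out.WriteString(p.text)
			delete(pending, next)
			next++
		}
	}
	return out.String()
}

func main() {
	in := make(chan chunk, 3)
	in <- chunk{2, "c"}
	in <- chunk{0, "a"}
	in <- chunk{1, "b"}
	close(in)
	fmt.Println(reorder(in))
}
```

The map only grows while a gap exists, so memory use stays bounded by how far workers run ahead of the slowest one.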
|
|
||||||
func WriteJSONToStdout(iterator obiiter.IBioSequence,
|
func WriteJSONToStdout(iterator obiiter.IBioSequence,
|
||||||
options ...WithOption) (obiiter.IBioSequence, error) {
|
options ...WithOption) (obiiter.IBioSequence, error) {
|
||||||
options = append(options, OptionDontCloseFile())
|
options = append(options, OptionCloseFile())
|
||||||
|
|
||||||
return WriteJSON(iterator, os.Stdout, options...)
|
return WriteJSON(iterator, os.Stdout, options...)
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
@@ -1,169 +0,0 @@
|
|||||||
package ncbitaxdump
|
|
||||||
|
|
||||||
import (
|
|
||||||
"bufio"
|
|
||||||
"encoding/csv"
|
|
||||||
"fmt"
|
|
||||||
"io"
|
|
||||||
"os"
|
|
||||||
"path"
|
|
||||||
"strconv"
|
|
||||||
"strings"
|
|
||||||
|
|
||||||
log "github.com/sirupsen/logrus"
|
|
||||||
|
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obitax"
|
|
||||||
)
|
|
||||||
|
|
||||||
func loadNodeTable(reader io.Reader, taxonomy *obitax.Taxonomy) {
|
|
||||||
file := csv.NewReader(reader)
|
|
||||||
file.Comma = '|'
|
|
||||||
file.Comment = '#'
|
|
||||||
file.TrimLeadingSpace = true
|
|
||||||
file.ReuseRecord = true
|
|
||||||
|
|
||||||
n := 0
|
|
||||||
|
|
||||||
for record, err := file.Read(); err == nil; record, err = file.Read() {
|
|
||||||
n++
|
|
||||||
taxid, err := strconv.Atoi(strings.TrimSpace(record[0]))
|
|
||||||
|
|
||||||
if err != nil {
|
|
||||||
log.Panicf("Cannot read taxon taxid at line %d: %v", n, err)
|
|
||||||
}
|
|
||||||
|
|
||||||
parent, err := strconv.Atoi(strings.TrimSpace(record[1]))
|
|
||||||
|
|
||||||
if err != nil {
|
|
||||||
log.Panicf("Cannot read taxon parent taxid at line %d: %v", n, err)
|
|
||||||
}
|
|
||||||
|
|
||||||
rank := strings.TrimSpace(record[2])
|
|
||||||
|
|
||||||
taxonomy.AddNewTaxa(taxid, parent, rank, true, true)
|
|
||||||
}
|
|
||||||
|
|
||||||
taxonomy.ReindexParent()
|
|
||||||
}
|
|
||||||
|
|
||||||
func loadNameTable(reader io.Reader, taxonomy *obitax.Taxonomy, onlysn bool) int {
|
|
||||||
// file := csv.NewReader(reader)
|
|
||||||
// file.Comma = '|'
|
|
||||||
// file.Comment = '#'
|
|
||||||
// file.TrimLeadingSpace = true
|
|
||||||
// file.ReuseRecord = true
|
|
||||||
// file.LazyQuotes = true
|
|
||||||
file := bufio.NewReader(reader)
|
|
||||||
|
|
||||||
n := 0
|
|
||||||
l := 0
|
|
||||||
|
|
||||||
for line, prefix, err := file.ReadLine(); err == nil; line, prefix, err = file.ReadLine() {
|
|
||||||
l++
|
|
||||||
if prefix {
|
|
||||||
return -1
|
|
||||||
}
|
|
||||||
|
|
||||||
record := strings.Split(string(line), "|")
|
|
||||||
taxid, err := strconv.Atoi(strings.TrimSpace(record[0]))
|
|
||||||
|
|
||||||
if err != nil {
|
|
||||||
log.Panicf("Cannot read taxon name taxid at line %d: %v", l, err)
|
|
||||||
}
|
|
||||||
|
|
||||||
name := strings.TrimSpace(record[1])
|
|
||||||
classname := strings.TrimSpace(record[3])
|
|
||||||
|
|
||||||
if !onlysn || classname == "scientific name" {
|
|
||||||
n++
|
|
||||||
taxonomy.AddNewName(taxid, &name, &classname)
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
return n
|
|
||||||
}
|
|
||||||
|
|
||||||
func loadMergedTable(reader io.Reader, taxonomy *obitax.Taxonomy) int {
|
|
||||||
file := csv.NewReader(reader)
|
|
||||||
file.Comma = '|'
|
|
||||||
file.Comment = '#'
|
|
||||||
file.TrimLeadingSpace = true
|
|
||||||
file.ReuseRecord = true
|
|
||||||
|
|
||||||
n := 0
|
|
||||||
|
|
||||||
for record, err := file.Read(); err == nil; record, err = file.Read() {
|
|
||||||
n++
|
|
||||||
oldtaxid, err := strconv.Atoi(strings.TrimSpace(record[0]))
|
|
||||||
|
|
||||||
if err != nil {
|
|
||||||
log.Panicf("Cannot read alias taxid at line %d: %v", n, err)
|
|
||||||
}
|
|
||||||
newtaxid, err := strconv.Atoi(strings.TrimSpace(record[1]))
|
|
||||||
|
|
||||||
if err != nil {
|
|
||||||
log.Panicf("Cannot read alias new taxid at line %d: %v", n, err)
|
|
||||||
}
|
|
||||||
|
|
||||||
taxonomy.AddNewAlias(newtaxid, oldtaxid)
|
|
||||||
}
|
|
||||||
|
|
||||||
return n
|
|
||||||
}
|
|
||||||
|
|
||||||
func LoadNCBITaxDump(directory string, onlysn bool) (*obitax.Taxonomy, error) {
|
|
||||||
|
|
||||||
taxonomy := obitax.NewTaxonomy()
|
|
||||||
|
|
||||||
//
|
|
||||||
// Load the Taxonomy nodes
|
|
||||||
//
|
|
||||||
|
|
||||||
log.Printf("Loading Taxonomy nodes\n")
|
|
||||||
|
|
||||||
nodefile, err := os.Open(path.Join(directory, "nodes.dmp"))
|
|
||||||
if err != nil {
|
|
||||||
return nil, fmt.Errorf("cannot open nodes file from '%s'",
|
|
||||||
directory)
|
|
||||||
}
|
|
||||||
defer nodefile.Close()
|
|
||||||
|
|
||||||
buffered := bufio.NewReader(nodefile)
|
|
||||||
loadNodeTable(buffered, taxonomy)
|
|
||||||
log.Printf("%d Taxonomy nodes read\n", taxonomy.Len())
|
|
||||||
|
|
||||||
//
|
|
||||||
// Load the Taxonomy nodes
|
|
||||||
//
|
|
||||||
|
|
||||||
log.Printf("Loading Taxon names\n")
|
|
||||||
|
|
||||||
namefile, nerr := os.Open(path.Join(directory, "names.dmp"))
|
|
||||||
if nerr != nil {
|
|
||||||
return nil, fmt.Errorf("cannot open names file from '%s'",
|
|
||||||
directory)
|
|
||||||
}
|
|
||||||
defer namefile.Close()
|
|
||||||
|
|
||||||
n := loadNameTable(namefile, taxonomy, onlysn)
|
|
||||||
log.Printf("%d taxon names read\n", n)
|
|
||||||
|
|
||||||
//
|
|
||||||
// Load the merged taxa
|
|
||||||
//
|
|
||||||
|
|
||||||
log.Printf("Loading Merged taxa\n")
|
|
||||||
|
|
||||||
aliasfile, aerr := os.Open(path.Join(directory, "merged.dmp"))
|
|
||||||
if aerr != nil {
|
|
||||||
return nil, fmt.Errorf("cannot open merged file from '%s'",
|
|
||||||
directory)
|
|
||||||
}
|
|
||||||
defer aliasfile.Close()
|
|
||||||
|
|
||||||
buffered = bufio.NewReader(aliasfile)
|
|
||||||
n = loadMergedTable(buffered, taxonomy)
|
|
||||||
log.Printf("%d merged taxa read\n", n)
|
|
||||||
|
|
||||||
return taxonomy, nil
|
|
||||||
}
|
|
||||||
@@ -536,6 +536,24 @@ var library_parameter = map[string]func(library *obingslibrary.NGSLibrary, value
|
|||||||
},
|
},
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// ReadCSVNGSFilter reads an NGS filter configuration from a CSV file and returns
|
||||||
|
// an NGSLibrary. The CSV file must include columns for 'experiment', 'sample',
|
||||||
|
// 'sample_tag', 'forward_primer', and 'reverse_primer'. Additional columns are
|
||||||
|
// used to annotate PCR samples.
|
||||||
|
//
|
||||||
|
// Parameters:
|
||||||
|
// - reader: an io.Reader providing the CSV input.
|
||||||
|
//
|
||||||
|
// Returns:
|
||||||
|
// - A pointer to an NGSLibrary populated with the data from the CSV file.
|
||||||
|
// - An error if the CSV is malformed or required columns are missing.
|
||||||
|
//
|
||||||
|
// The function processes both data records and parameter lines starting with
|
||||||
|
// '@param'. Parameter lines configure various aspects of the library.
|
||||||
|
//
|
||||||
|
// Each row in the CSV is validated to ensure it has the correct number of columns.
|
||||||
|
// Duplicate tag pairs for the same marker result in an error. Primer uniqueness is
|
||||||
|
// checked, and any unknown parameters are logged as warnings.
|
||||||
func ReadCSVNGSFilter(reader io.Reader) (*obingslibrary.NGSLibrary, error) {
|
func ReadCSVNGSFilter(reader io.Reader) (*obingslibrary.NGSLibrary, error) {
|
||||||
ngsfilter := obingslibrary.MakeNGSLibrary()
|
ngsfilter := obingslibrary.MakeNGSLibrary()
|
||||||
file := csv.NewReader(reader)
|
file := csv.NewReader(reader)
|
||||||
@@ -576,6 +594,7 @@ func ReadCSVNGSFilter(reader io.Reader) (*obingslibrary.NGSLibrary, error) {
|
|||||||
extraColumns := make([]int, 0)
|
extraColumns := make([]int, 0)
|
||||||
|
|
||||||
for i, colName := range header {
|
for i, colName := range header {
|
||||||
|
|
||||||
switch colName {
|
switch colName {
|
||||||
case "experiment":
|
case "experiment":
|
||||||
experimentColIndex = i
|
experimentColIndex = i
|
||||||
@@ -642,6 +661,8 @@ func ReadCSVNGSFilter(reader io.Reader) (*obingslibrary.NGSLibrary, error) {
|
|||||||
|
|
||||||
}
|
}
|
||||||
|
|
||||||
|
ngsfilter.CheckPrimerUnicity()
|
||||||
|
|
||||||
for i := 0; i < len(params); i++ {
|
for i := 0; i < len(params); i++ {
|
||||||
param := params[i][1]
|
param := params[i][1]
|
||||||
if len(params[i]) < 3 {
|
if len(params[i]) < 3 {
|
||||||
|
|||||||
@@ -1,13 +1,14 @@
|
|||||||
package obiformats
|
package obiformats
|
||||||
|
|
||||||
import (
|
import (
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obidefault"
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
|
||||||
)
|
)
|
||||||
|
|
||||||
type __options__ struct {
|
type __options__ struct {
|
||||||
fastseq_header_parser obiseq.SeqAnnotator
|
fastseq_header_parser obiseq.SeqAnnotator
|
||||||
fastseq_header_writer func(*obiseq.BioSequence) string
|
fastseq_header_writer BioSequenceFormater
|
||||||
|
seqBatchFormater FormatSeqBatch
|
||||||
with_progress_bar bool
|
with_progress_bar bool
|
||||||
buffer_size int
|
buffer_size int
|
||||||
batch_size int
|
batch_size int
|
||||||
@@ -19,6 +20,7 @@ type __options__ struct {
|
|||||||
appendfile bool
|
appendfile bool
|
||||||
compressed bool
|
compressed bool
|
||||||
skip_empty bool
|
skip_empty bool
|
||||||
|
with_quality bool
|
||||||
csv_id bool
|
csv_id bool
|
||||||
csv_sequence bool
|
csv_sequence bool
|
||||||
csv_quality bool
|
csv_quality bool
|
||||||
@@ -44,10 +46,11 @@ func MakeOptions(setters []WithOption) Options {
|
|||||||
o := __options__{
|
o := __options__{
|
||||||
fastseq_header_parser: ParseGuessedFastSeqHeader,
|
fastseq_header_parser: ParseGuessedFastSeqHeader,
|
||||||
fastseq_header_writer: FormatFastSeqJsonHeader,
|
fastseq_header_writer: FormatFastSeqJsonHeader,
|
||||||
|
seqBatchFormater: nil,
|
||||||
with_progress_bar: false,
|
with_progress_bar: false,
|
||||||
buffer_size: 2,
|
buffer_size: 2,
|
||||||
parallel_workers: obioptions.CLIReadParallelWorkers(),
|
parallel_workers: obidefault.ReadParallelWorkers(),
|
||||||
batch_size: obioptions.CLIBatchSize(),
|
batch_size: obidefault.BatchSize(),
|
||||||
total_seq_size: 1024 * 1024 * 100, // 100 MB by default
|
total_seq_size: 1024 * 1024 * 100, // 100 MB by default
|
||||||
no_order: false,
|
no_order: false,
|
||||||
full_file_batch: false,
|
full_file_batch: false,
|
||||||
@@ -55,6 +58,7 @@ func MakeOptions(setters []WithOption) Options {
|
|||||||
appendfile: false,
|
appendfile: false,
|
||||||
compressed: false,
|
compressed: false,
|
||||||
skip_empty: false,
|
skip_empty: false,
|
||||||
|
with_quality: true,
|
||||||
csv_id: true,
|
csv_id: true,
|
||||||
csv_definition: false,
|
csv_definition: false,
|
||||||
csv_count: false,
|
csv_count: false,
|
||||||
@@ -103,6 +107,10 @@ func (opt Options) FormatFastSeqHeader() func(*obiseq.BioSequence) string {
|
|||||||
return opt.pointer.fastseq_header_writer
|
return opt.pointer.fastseq_header_writer
|
||||||
}
|
}
|
||||||
|
|
||||||
|
func (opt Options) SequenceFormater() FormatSeqBatch {
|
||||||
|
return opt.pointer.seqBatchFormater
|
||||||
|
}
|
||||||
|
|
||||||
func (opt Options) NoOrder() bool {
|
func (opt Options) NoOrder() bool {
|
||||||
return opt.pointer.no_order
|
return opt.pointer.no_order
|
||||||
}
|
}
|
||||||
@@ -127,6 +135,10 @@ func (opt Options) SkipEmptySequence() bool {
|
|||||||
return opt.pointer.skip_empty
|
return opt.pointer.skip_empty
|
||||||
}
|
}
|
||||||
|
|
||||||
|
func (opt Options) ReadQualities() bool {
|
||||||
|
return opt.pointer.with_quality
|
||||||
|
}
|
||||||
|
|
||||||
func (opt Options) CSVId() bool {
|
func (opt Options) CSVId() bool {
|
||||||
return opt.pointer.csv_id
|
return opt.pointer.csv_id
|
||||||
}
|
}
|
||||||
@@ -219,8 +231,6 @@ func OptionNoOrder(no_order bool) WithOption {
|
|||||||
return f
|
return f
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
func OptionsCompressed(compressed bool) WithOption {
|
func OptionsCompressed(compressed bool) WithOption {
|
||||||
f := WithOption(func(opt Options) {
|
f := WithOption(func(opt Options) {
|
||||||
opt.pointer.compressed = compressed
|
opt.pointer.compressed = compressed
|
||||||
@@ -237,6 +247,14 @@ func OptionsSkipEmptySequence(skip bool) WithOption {
|
|||||||
return f
|
return f
|
||||||
}
|
}
|
||||||
|
|
||||||
|
func OptionsReadQualities(read bool) WithOption {
|
||||||
|
f := WithOption(func(opt Options) {
|
||||||
|
opt.pointer.with_quality = read
|
||||||
|
})
|
||||||
|
|
||||||
|
return f
|
||||||
|
}
|
||||||
|
|
||||||
func OptionsNewFile() WithOption {
|
func OptionsNewFile() WithOption {
|
||||||
f := WithOption(func(opt Options) {
|
f := WithOption(func(opt Options) {
|
||||||
opt.pointer.appendfile = false
|
opt.pointer.appendfile = false
|
||||||
@@ -271,6 +289,14 @@ func OptionsFastSeqHeaderFormat(format func(*obiseq.BioSequence) string) WithOpt
|
|||||||
return f
|
return f
|
||||||
}
|
}
|
||||||
|
|
||||||
|
func OptionsSequenceFormater(formater FormatSeqBatch) WithOption {
|
||||||
|
f := WithOption(func(opt Options) {
|
||||||
|
opt.pointer.seqBatchFormater = formater
|
||||||
|
})
|
||||||
|
|
||||||
|
return f
|
||||||
|
}
|
||||||
|
|
||||||
func OptionsParallelWorkers(nworkers int) WithOption {
|
func OptionsParallelWorkers(nworkers int) WithOption {
|
||||||
f := WithOption(func(opt Options) {
|
f := WithOption(func(opt Options) {
|
||||||
opt.pointer.parallel_workers = nworkers
|
opt.pointer.parallel_workers = nworkers
|
||||||
|
|||||||
@@ -3,6 +3,8 @@ package obiformats
|
|||||||
import (
|
import (
|
||||||
"bufio"
|
"bufio"
|
||||||
"bytes"
|
"bytes"
|
||||||
|
"encoding/csv"
|
||||||
|
"errors"
|
||||||
"io"
|
"io"
|
||||||
"path"
|
"path"
|
||||||
"regexp"
|
"regexp"
|
||||||
@@ -15,6 +17,8 @@ import (
|
|||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
|
||||||
)
|
)
|
||||||
|
|
||||||
|
type SequenceReader func(reader io.Reader, options ...WithOption) (obiiter.IBioSequence, error)
|
||||||
|
|
||||||
// OBIMimeTypeGuesser is a function that takes an io.Reader as input and guesses the MIME type of the data.
|
// OBIMimeTypeGuesser is a function that takes an io.Reader as input and guesses the MIME type of the data.
|
||||||
// It uses several detectors to identify specific file formats, such as FASTA, FASTQ, ecoPCR2, GenBank, and EMBL.
|
// It uses several detectors to identify specific file formats, such as FASTA, FASTQ, ecoPCR2, GenBank, and EMBL.
|
||||||
// The function reads data from the input stream and analyzes it using the mimetype library.
|
// The function reads data from the input stream and analyzes it using the mimetype library.
|
||||||
@@ -37,6 +41,31 @@ import (
|
|||||||
// - io.Reader: A modified reader with the read data.
|
// - io.Reader: A modified reader with the read data.
|
||||||
// - error: Any error encountered during the process.
|
// - error: Any error encountered during the process.
|
||||||
func OBIMimeTypeGuesser(stream io.Reader) (*mimetype.MIME, io.Reader, error) {
|
func OBIMimeTypeGuesser(stream io.Reader) (*mimetype.MIME, io.Reader, error) {
|
||||||
|
csv := func(in []byte, limit uint32) bool {
|
||||||
|
in = dropLastLine(in, limit)
|
||||||
|
|
||||||
|
br := bytes.NewReader(in)
|
||||||
|
r := csv.NewReader(br)
|
||||||
|
r.Comma = ','
|
||||||
|
r.ReuseRecord = true
|
||||||
|
r.LazyQuotes = true
|
||||||
|
r.Comment = '#'
|
||||||
|
|
||||||
|
lines := 0
|
||||||
|
for {
|
||||||
|
_, err := r.Read()
|
||||||
|
if errors.Is(err, io.EOF) {
|
||||||
|
break
|
||||||
|
}
|
||||||
|
if err != nil {
|
||||||
|
return false
|
||||||
|
}
|
||||||
|
lines++
|
||||||
|
}
|
||||||
|
|
||||||
|
return r.FieldsPerRecord > 1 && lines > 1
|
||||||
|
}
|
||||||
|
|
||||||
fastaDetector := func(raw []byte, limit uint32) bool {
|
fastaDetector := func(raw []byte, limit uint32) bool {
|
||||||
ok, err := regexp.Match("^>[^ ]", raw)
|
ok, err := regexp.Match("^>[^ ]", raw)
|
||||||
return ok && err == nil
|
return ok && err == nil
|
||||||
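The CSV detector added above accepts a buffer when `encoding/csv` can parse it into more than one record of more than one field each. A simplified sketch of that heuristic, assuming the whole buffer is complete (it omits the `dropLastLine` truncation step) and using the hypothetical name `looksLikeCSV`:

```go
package main

import (
	"bytes"
	"encoding/csv"
	"errors"
	"fmt"
	"io"
)

// looksLikeCSV reports whether data parses as CSV with at least two
// records and at least two fields per record.
func looksLikeCSV(data []byte) bool {
	r := csv.NewReader(bytes.NewReader(data))
	r.ReuseRecord = true
	r.LazyQuotes = true
	r.Comment = '#'

	lines := 0
	for {
		_, err := r.Read()
		if errors.Is(err, io.EOF) {
			break
		}
		if err != nil {
			// Includes records whose field count differs from the first one.
			return false
		}
		lines++
	}
	// With FieldsPerRecord left at 0, Read sets it to the field count
	// of the first record.
	return r.FieldsPerRecord > 1 && lines > 1
}

func main() {
	fmt.Println(looksLikeCSV([]byte("id,count\nseq1,3\n")))
	fmt.Println(looksLikeCSV([]byte(">seq1\nACGT\n")))
}
```

A single-column file is rejected on purpose: without a second field there is nothing to distinguish it from ordinary line-oriented text.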
@@ -68,15 +97,17 @@ func OBIMimeTypeGuesser(stream io.Reader) (*mimetype.MIME, io.Reader, error) {
|
|||||||
mimetype.Lookup("text/plain").Extend(ecoPCR2Detector, "text/ecopcr2", ".ecopcr")
|
mimetype.Lookup("text/plain").Extend(ecoPCR2Detector, "text/ecopcr2", ".ecopcr")
|
||||||
mimetype.Lookup("text/plain").Extend(genbankDetector, "text/genbank", ".seq")
|
mimetype.Lookup("text/plain").Extend(genbankDetector, "text/genbank", ".seq")
|
||||||
mimetype.Lookup("text/plain").Extend(emblDetector, "text/embl", ".dat")
|
mimetype.Lookup("text/plain").Extend(emblDetector, "text/embl", ".dat")
|
||||||
|
mimetype.Lookup("text/plain").Extend(csv, "text/csv", ".csv")
|
||||||
|
|
||||||
mimetype.Lookup("application/octet-stream").Extend(fastaDetector, "text/fasta", ".fasta")
|
mimetype.Lookup("application/octet-stream").Extend(fastaDetector, "text/fasta", ".fasta")
|
||||||
mimetype.Lookup("application/octet-stream").Extend(fastqDetector, "text/fastq", ".fastq")
|
mimetype.Lookup("application/octet-stream").Extend(fastqDetector, "text/fastq", ".fastq")
|
||||||
mimetype.Lookup("application/octet-stream").Extend(ecoPCR2Detector, "text/ecopcr2", ".ecopcr")
|
mimetype.Lookup("application/octet-stream").Extend(ecoPCR2Detector, "text/ecopcr2", ".ecopcr")
|
||||||
mimetype.Lookup("application/octet-stream").Extend(genbankDetector, "text/genbank", ".seq")
|
mimetype.Lookup("application/octet-stream").Extend(genbankDetector, "text/genbank", ".seq")
|
||||||
mimetype.Lookup("application/octet-stream").Extend(emblDetector, "text/embl", ".dat")
|
mimetype.Lookup("application/octet-stream").Extend(emblDetector, "text/embl", ".dat")
|
||||||
|
mimetype.Lookup("application/octet-stream").Extend(csv, "text/csv", ".csv")
|
||||||
|
|
||||||
// Create a buffer to store the read data
|
// Create a buffer to store the read data
|
||||||
buf := make([]byte, 1024*128)
|
buf := make([]byte, 1024*1024)
|
||||||
n, err := io.ReadFull(stream, buf)
|
n, err := io.ReadFull(stream, buf)
|
||||||
|
|
||||||
if err != nil && err != io.ErrUnexpectedEOF {
|
if err != nil && err != io.ErrUnexpectedEOF {
|
||||||
@@ -85,6 +116,7 @@ func OBIMimeTypeGuesser(stream io.Reader) (*mimetype.MIME, io.Reader, error) {
|
|||||||
|
|
||||||
// Detect the MIME type using the mimetype library
|
// Detect the MIME type using the mimetype library
|
||||||
mimeType := mimetype.Detect(buf)
|
mimeType := mimetype.Detect(buf)
|
||||||
|
|
||||||
if mimeType == nil {
|
if mimeType == nil {
|
||||||
return nil, nil, err
|
return nil, nil, err
|
||||||
}
|
}
|
||||||
@@ -140,15 +172,15 @@ func OBIMimeTypeGuesser(stream io.Reader) (*mimetype.MIME, io.Reader, error) {
|
|||||||
// - error: An error if any occurred during the reading process.
|
// - error: An error if any occurred during the reading process.
|
||||||
func ReadSequencesFromFile(filename string,
|
func ReadSequencesFromFile(filename string,
|
||||||
options ...WithOption) (obiiter.IBioSequence, error) {
|
options ...WithOption) (obiiter.IBioSequence, error) {
|
||||||
var file *Reader
|
var file *obiutils.Reader
|
||||||
var reader io.Reader
|
var reader io.Reader
|
||||||
var err error
|
var err error
|
||||||
|
|
||||||
options = append(options, OptionsSource(obiutils.RemoveAllExt((path.Base(filename)))))
|
options = append(options, OptionsSource(obiutils.RemoveAllExt((path.Base(filename)))))
|
||||||
|
|
||||||
file, err = Ropen(filename)
|
file, err = obiutils.Ropen(filename)
|
||||||
|
|
||||||
if err == ErrNoContent {
|
if err == obiutils.ErrNoContent {
|
||||||
log.Infof("file %s is empty", filename)
|
log.Infof("file %s is empty", filename)
|
||||||
return ReadEmptyFile(options...)
|
return ReadEmptyFile(options...)
|
||||||
}
|
}
|
||||||
@@ -172,11 +204,11 @@ func ReadSequencesFromFile(filename string,
|
|||||||
case "text/fasta":
|
case "text/fasta":
|
||||||
return ReadFasta(reader, options...)
|
return ReadFasta(reader, options...)
|
||||||
case "text/ecopcr2":
|
case "text/ecopcr2":
|
||||||
return ReadEcoPCR(reader, options...), nil
|
return ReadEcoPCR(reader, options...)
|
||||||
case "text/embl":
|
case "text/embl":
|
||||||
return ReadEMBL(reader, options...), nil
|
return ReadEMBL(reader, options...)
|
||||||
case "text/genbank":
|
case "text/genbank":
|
||||||
return ReadGenbank(reader, options...), nil
|
return ReadGenbank(reader, options...)
|
||||||
case "text/csv":
|
case "text/csv":
|
||||||
return ReadCSV(reader, options...)
|
return ReadCSV(reader, options...)
|
||||||
default:
|
default:
|
||||||
|
|||||||
@@ -45,7 +45,8 @@ func WriteSequence(iterator obiiter.IBioSequence,
|
|||||||
|
|
||||||
func WriteSequencesToStdout(iterator obiiter.IBioSequence,
|
func WriteSequencesToStdout(iterator obiiter.IBioSequence,
|
||||||
options ...WithOption) (obiiter.IBioSequence, error) {
|
options ...WithOption) (obiiter.IBioSequence, error) {
|
||||||
options = append(options, OptionDontCloseFile())
|
// options = append(options, OptionDontCloseFile())
|
||||||
|
options = append(options, OptionCloseFile())
|
||||||
return WriteSequence(iterator, os.Stdout, options...)
|
return WriteSequence(iterator, os.Stdout, options...)
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
289
pkg/obifp/uint128.go
Normal file
@@ -0,0 +1,289 @@
|
|||||||
|
package obifp
|
||||||
|
|
||||||
|
import (
|
||||||
|
"math"
|
||||||
|
"math/bits"
|
||||||
|
|
||||||
|
log "github.com/sirupsen/logrus"
|
||||||
|
)
|
||||||
|
|
||||||
|
type Uint128 struct {
|
||||||
|
w1 uint64
|
||||||
|
w0 uint64
|
||||||
|
}
|
||||||
|
|
||||||
|
// Zero returns a zero-valued Uint128.
|
||||||
|
//
|
||||||
|
// No parameters.
|
||||||
|
// Returns a Uint128 value.
|
||||||
|
func (u Uint128) Zero() Uint128 {
|
||||||
|
return Uint128{w1: 0, w0: 0}
|
||||||
|
}
|
||||||
|
|
||||||
|
// MaxValue returns the maximum possible value for a Uint128.
|
||||||
|
//
|
||||||
|
// It returns a Uint128 with both the high (w1) and low (w0) words set to math.MaxUint64.
|
||||||
|
func (u Uint128) MaxValue() Uint128 {
|
||||||
|
return Uint128{
|
||||||
|
w1: math.MaxUint64,
|
||||||
|
w0: math.MaxUint64,
|
||||||
|
}
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
// IsZero checks if the Uint128 value is zero.
|
||||||
|
//
|
||||||
|
// It returns a boolean indicating whether the Uint128 value is zero.
|
||||||
|
func (u Uint128) IsZero() bool {
|
||||||
|
return u.w0 == 0 && u.w1 == 0
|
||||||
|
}
|
||||||
|
|
||||||
|
// Uint64 casts a Uint128 to a Uint64.
|
||||||
|
//
|
||||||
|
// A warning is logged if the high word is non-zero, in which case the value is truncated.
|
||||||
|
//
|
||||||
|
// No parameters.
|
||||||
|
// Returns a Uint64 value.
|
||||||
|
func (u Uint128) Uint64() Uint64 {
|
||||||
|
if u.w1 != 0 {
|
||||||
|
log.Warnf("Uint128 overflow at Uint64(%v)", u)
|
||||||
|
}
|
||||||
|
return Uint64{w0: u.w0}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Uint128 casts a Uint128 to a Uint128.
|
||||||
|
//
|
||||||
|
// Which is a no-op.
|
||||||
|
//
|
||||||
|
// No parameters.
|
||||||
|
// Returns a Uint128 value.
|
||||||
|
func (u Uint128) Uint128() Uint128 {
|
||||||
|
return u
|
||||||
|
}
|
||||||
|
|
||||||
|
// Uint256 casts a Uint128 to a Uint256.
|
||||||
|
//
|
||||||
|
// The conversion is widening, so no overflow can occur and nothing is logged.
|
||||||
|
//
|
||||||
|
// No parameters.
|
||||||
|
// Returns a Uint256 value.
|
||||||
|
func (u Uint128) Uint256() Uint256 {
|
||||||
|
return Uint256{0, 0, u.w1, u.w0}
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint128) Set64(v uint64) Uint128 {
|
||||||
|
|
||||||
|
return Uint128{
|
||||||
|
w1: 0,
|
||||||
|
w0: v,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// LeftShift performs a left shift operation on the Uint128 value by the specified number of bits.
|
||||||
|
//
|
||||||
|
// Parameters:
|
||||||
|
// - n: the number of bits to shift the Uint128 value to the left.
|
||||||
|
//
|
||||||
|
// Returns:
|
||||||
|
// - Uint128: the result of the left shift operation.
|
||||||
|
func (u Uint128) LeftShift(n uint) Uint128 {
|
||||||
|
lo, carry := Uint64{w0: u.w0}.LeftShift64(n, 0)
|
||||||
|
hi, _ := Uint64{w0: u.w1}.LeftShift64(n, carry)
|
||||||
|
return Uint128{w1: hi, w0: lo}
|
||||||
|
}
|
||||||
|
|
||||||
|
// RightShift performs a right shift operation on the Uint128 value by the specified number of bits.
|
||||||
|
//
|
||||||
|
// Parameters:
|
||||||
|
// - n: the number of bits to shift the Uint128 value to the right.
|
||||||
|
//
|
||||||
|
// Returns:
|
||||||
|
// - Uint128: the result of the right shift operation.
|
||||||
|
func (u Uint128) RightShift(n uint) Uint128 {
|
||||||
|
hi, carry := Uint64{w0: u.w1}.RightShift64(n, 0)
|
||||||
|
lo, _ := Uint64{w0: u.w0}.RightShift64(n, carry)
|
||||||
|
return Uint128{w1: hi, w0: lo}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Add performs addition of two Uint128 values and returns the result.
|
||||||
|
//
|
||||||
|
// Parameters:
|
||||||
|
// - v: the Uint128 value to add to the receiver.
|
||||||
|
//
|
||||||
|
// Returns:
|
||||||
|
// - Uint128: the result of the addition.
|
||||||
|
func (u Uint128) Add(v Uint128) Uint128 {
|
||||||
|
lo, carry := bits.Add64(u.w0, v.w0, 0)
|
||||||
|
hi, carry := bits.Add64(u.w1, v.w1, carry)
|
||||||
|
if carry != 0 {
|
||||||
|
log.Panicf("Uint128 overflow at Add(%v, %v)", u, v)
|
||||||
|
}
|
||||||
|
return Uint128{w1: hi, w0: lo}
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint128) Add64(v uint64) Uint128 {
|
||||||
|
lo, carry := bits.Add64(u.w0, v, 0)
|
||||||
|
hi, carry := bits.Add64(u.w1, 0, carry)
|
||||||
|
if carry != 0 {
|
||||||
|
log.Panicf("Uint128 overflow at Add64(%v, %v)", u, v)
|
||||||
|
}
|
||||||
|
return Uint128{w1: hi, w0: lo}
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint128) Sub(v Uint128) Uint128 {
|
||||||
|
lo, borrow := bits.Sub64(u.w0, v.w0, 0)
|
||||||
|
hi, borrow := bits.Sub64(u.w1, v.w1, borrow)
|
||||||
|
if borrow != 0 {
|
||||||
|
log.Panicf("Uint128 underflow at Sub(%v, %v)", u, v)
|
||||||
|
}
|
||||||
|
return Uint128{w1: hi, w0: lo}
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint128) Mul(v Uint128) Uint128 {
|
||||||
|
hi, lo := bits.Mul64(u.w0, v.w0)
|
||||||
|
p0, p1 := bits.Mul64(u.w1, v.w0)
|
||||||
|
p2, p3 := bits.Mul64(u.w0, v.w1)
|
||||||
|
hi, c0 := bits.Add64(hi, p1, 0)
|
||||||
|
hi, c1 := bits.Add64(hi, p3, c0)
|
||||||
|
if p0 != 0 || p2 != 0 || c1 != 0 {
|
||||||
|
log.Panicf("Uint128 overflow at Mul(%v, %v)", u, v)
|
||||||
|
}
|
||||||
|
return Uint128{w1: hi, w0: lo}
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint128) Mul64(v uint64) Uint128 {
|
||||||
|
hi, lo := bits.Mul64(u.w0, v)
|
||||||
|
p0, p1 := bits.Mul64(u.w1, v)
|
||||||
|
hi, c0 := bits.Add64(hi, p1, 0)
|
||||||
|
if p0 != 0 || c0 != 0 {
|
||||||
|
log.Panicf("Uint128 overflow at Mul64(%v, %v)", u, v)
|
||||||
|
}
|
||||||
|
return Uint128{w1: hi, w0: lo}
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint128) QuoRem(v Uint128) (q, r Uint128) {
|
||||||
|
if v.w1 == 0 {
|
||||||
|
var r64 uint64
|
||||||
|
q, r64 = u.QuoRem64(v.w0)
|
||||||
|
r = Uint128{w1: 0, w0: r64}
|
||||||
|
} else {
|
||||||
|
// generate a "trial quotient," guaranteed to be within 1 of the actual
|
||||||
|
// quotient, then adjust.
|
||||||
|
n := uint(bits.LeadingZeros64(v.w1))
|
||||||
|
v1 := v.LeftShift(n)
|
||||||
|
u1 := u.RightShift(1)
|
||||||
|
tq, _ := bits.Div64(u1.w1, u1.w0, v1.w1)
|
||||||
|
tq >>= 63 - n
|
||||||
|
if tq != 0 {
|
||||||
|
tq--
|
||||||
|
}
|
||||||
|
q = Uint128{w1: 0, w0: tq}
|
||||||
|
// calculate remainder using trial quotient, then adjust if remainder is
|
||||||
|
// greater than divisor
|
||||||
|
r = u.Sub(v.Mul64(tq))
|
||||||
|
if r.Cmp(v) >= 0 {
|
||||||
|
q = q.Add64(1)
|
||||||
|
r = r.Sub(v)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return
|
||||||
|
}
|
||||||
|
|
||||||
|
// QuoRem64 returns q = u/v and r = u%v.
|
||||||
|
func (u Uint128) QuoRem64(v uint64) (q Uint128, r uint64) {
|
||||||
|
if u.w1 < v {
|
||||||
|
q.w0, r = bits.Div64(u.w1, u.w0, v)
|
||||||
|
} else {
|
||||||
|
q.w1, r = bits.Div64(0, u.w1, v)
|
||||||
|
q.w0, r = bits.Div64(r, u.w0, v)
|
||||||
|
}
|
||||||
|
return
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint128) Div(v Uint128) Uint128 {
|
||||||
|
q, _ := u.QuoRem(v)
|
||||||
|
return q
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint128) Div64(v uint64) Uint128 {
|
||||||
|
q, _ := u.QuoRem64(v)
|
||||||
|
return q
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint128) Mod(v Uint128) Uint128 {
|
||||||
|
_, r := u.QuoRem(v)
|
||||||
|
return r
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint128) Mod64(v uint64) uint64 {
|
||||||
|
_, r := u.QuoRem64(v)
|
||||||
|
return r
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint128) Cmp(v Uint128) int {
|
||||||
|
switch {
|
||||||
|
case u.w1 > v.w1:
|
||||||
|
return 1
|
||||||
|
case u.w1 < v.w1:
|
||||||
|
return -1
|
||||||
|
case u.w0 > v.w0:
|
||||||
|
return 1
|
||||||
|
case u.w0 < v.w0:
|
||||||
|
return -1
|
||||||
|
default:
|
||||||
|
return 0
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint128) Cmp64(v uint64) int {
|
||||||
|
switch {
|
||||||
|
case u.w1 > 0:
|
||||||
|
return 1
|
||||||
|
case u.w0 > v:
|
||||||
|
return 1
|
||||||
|
case u.w0 < v:
|
||||||
|
return -1
|
||||||
|
default:
|
||||||
|
return 0
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint128) Equals(v Uint128) bool {
|
||||||
|
return u.Cmp(v) == 0
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint128) LessThan(v Uint128) bool {
|
||||||
|
return u.Cmp(v) < 0
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint128) GreaterThan(v Uint128) bool {
|
||||||
|
return u.Cmp(v) > 0
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint128) LessThanOrEqual(v Uint128) bool {
|
||||||
|
return !u.GreaterThan(v)
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint128) GreaterThanOrEqual(v Uint128) bool {
|
||||||
|
return !u.LessThan(v)
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint128) And(v Uint128) Uint128 {
|
||||||
|
return Uint128{w1: u.w1 & v.w1, w0: u.w0 & v.w0}
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint128) Or(v Uint128) Uint128 {
|
||||||
|
return Uint128{w1: u.w1 | v.w1, w0: u.w0 | v.w0}
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint128) Xor(v Uint128) Uint128 {
|
||||||
|
return Uint128{w1: u.w1 ^ v.w1, w0: u.w0 ^ v.w0}
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint128) Not() Uint128 {
|
||||||
|
return Uint128{w1: ^u.w1, w0: ^u.w0}
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint128) AsUint64() uint64 {
|
||||||
|
return u.w0
|
||||||
|
}
|
||||||
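Reviewer sketch, not part of the commit: a product of two 128-bit values fits in 128 bits only if at most one operand has a non-zero upper word, neither cross term spills past bit 127, and neither addition into the upper word carries. The type `u128` and the function `mulChecked` below are hypothetical names used for illustration; only the `math/bits` calls are real.

```go
package main

import (
	"fmt"
	"math/bits"
)

// u128 is a hypothetical stand-in for Uint128: hi is the upper word, lo the lower.
type u128 struct{ hi, lo uint64 }

// mulChecked returns the low 128 bits of u*v and reports whether the exact
// product fits in 128 bits.
func mulChecked(u, v u128) (u128, bool) {
	// bits.Mul64 returns (hi, lo), in that order.
	hi, lo := bits.Mul64(u.lo, v.lo)
	p0, p1 := bits.Mul64(u.hi, v.lo) // cross term, weight 2^64
	p2, p3 := bits.Mul64(u.lo, v.hi) // cross term, weight 2^64

	// Each addition gets a zero carry-in; a carry out of either one means
	// the result no longer fits.
	hi, c0 := bits.Add64(hi, p1, 0)
	hi, c1 := bits.Add64(hi, p3, 0)

	ok := !(u.hi != 0 && v.hi != 0) && // u.hi*v.hi has weight 2^128
		p0 == 0 && p2 == 0 && c0 == 0 && c1 == 0
	return u128{hi: hi, lo: lo}, ok
}

func main() {
	p, ok := mulChecked(u128{0, ^uint64(0)}, u128{0, 2})
	fmt.Println(p.hi, p.lo, ok) // 1 18446744073709551614 true

	_, ok = mulChecked(u128{1, 2}, u128{3, 4})
	fmt.Println(ok) // false: both upper words are set
}
```

Returning a boolean instead of panicking is a choice made for this sketch only; it keeps the overflow condition easy to test.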
250
pkg/obifp/uint128_test.go
Normal file
@@ -0,0 +1,250 @@
|
|||||||
|
package obifp
|
||||||
|
|
||||||
|
import (
|
||||||
|
"math"
|
||||||
|
"reflect"
|
||||||
|
|
||||||
|
"testing"
|
||||||
|
|
||||||
|
"github.com/stretchr/testify/assert"
|
||||||
|
)
|
||||||
|
|
||||||
|
func TestUint128_Add(t *testing.T) {
|
||||||
|
u := Uint128{w1: 1, w0: 2}
|
||||||
|
v := Uint128{w1: 3, w0: 4}
|
||||||
|
w := u.Add(v)
|
||||||
|
assert.Equal(t, Uint128{w1: 4, w0: 6}, w)
|
||||||
|
|
||||||
|
u = Uint128{w1: 0, w0: 0}
|
||||||
|
v = Uint128{w1: 0, w0: 0}
|
||||||
|
w = u.Add(v)
|
||||||
|
assert.Equal(t, Uint128{w1: 0, w0: 0}, w)
|
||||||
|
|
||||||
|
u = Uint128{w1: 0, w0: math.MaxUint64}
|
||||||
|
v = Uint128{w1: 0, w0: 1}
|
||||||
|
w = u.Add(v)
|
||||||
|
assert.Equal(t, Uint128{w1: 1, w0: 0}, w)
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestUint128_Add64(t *testing.T) {
|
||||||
|
u := Uint128{w1: 1, w0: 2}
|
||||||
|
v := uint64(3)
|
||||||
|
w := u.Add64(v)
|
||||||
|
assert.Equal(t, Uint128{w1: 1, w0: 5}, w)
|
||||||
|
|
||||||
|
u = Uint128{w1: 0, w0: 0}
|
||||||
|
v = uint64(0)
|
||||||
|
w = u.Add64(v)
|
||||||
|
assert.Equal(t, Uint128{w1: 0, w0: 0}, w)
|
||||||
|
|
||||||
|
u = Uint128{w1: 0, w0: math.MaxUint64}
|
||||||
|
v = uint64(1)
|
||||||
|
w = u.Add64(v)
|
||||||
|
assert.Equal(t, Uint128{w1: 1, w0: 0}, w)
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestUint128_Sub(t *testing.T) {
|
||||||
|
u := Uint128{w1: 4, w0: 6}
|
||||||
|
v := Uint128{w1: 3, w0: 4}
|
||||||
|
w := u.Sub(v)
|
||||||
|
assert.Equal(t, Uint128{w1: 1, w0: 2}, w)
|
||||||
|
|
||||||
|
u = Uint128{w1: 0, w0: 0}
|
||||||
|
v = Uint128{w1: 0, w0: 0}
|
||||||
|
w = u.Sub(v)
|
||||||
|
assert.Equal(t, Uint128{w1: 0, w0: 0}, w)
|
||||||
|
|
||||||
|
u = Uint128{w1: 1, w0: 0}
|
||||||
|
v = Uint128{w1: 0, w0: 1}
|
||||||
|
w = u.Sub(v)
|
||||||
|
assert.Equal(t, Uint128{w1: 0, w0: math.MaxUint64}, w)
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestUint128_Mul64(t *testing.T) {
|
||||||
|
u := Uint128{w1: 1, w0: 2}
|
||||||
|
v := uint64(3)
|
||||||
|
w := u.Mul64(v)
|
||||||
|
|
||||||
|
if w.w1 != 3 || w.w0 != 6 {
|
||||||
|
t.Errorf("Mul64(%v, %v) = %v, want %v", u, v, w, Uint128{w1: 3, w0: 6})
|
||||||
|
}
|
||||||
|
|
||||||
|
u = Uint128{w1: 0, w0: 0}
|
||||||
|
v = uint64(0)
|
||||||
|
w = u.Mul64(v)
|
||||||
|
|
||||||
|
if w.w1 != 0 || w.w0 != 0 {
|
||||||
|
t.Errorf("Mul64(%v, %v) = %v, want %v", u, v, w, Uint128{w1: 0, w0: 0})
|
||||||
|
}
|
||||||
|
|
||||||
|
u = Uint128{w1: 0, w0: math.MaxUint64}
|
||||||
|
v = uint64(2)
|
||||||
|
w = u.Mul64(v)
|
||||||
|
|
||||||
|
if w.w1 != 1 || w.w0 != 18446744073709551614 {
|
||||||
|
t.Errorf("Mul64(%v, %v) = %v, want %v", u, v, w, Uint128{w1: 1, w0: 18446744073709551614})
|
||||||
|
}
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestUint128_Mul(t *testing.T) {
|
||||||
|
tests := []struct {
|
||||||
|
name string
|
||||||
|
u Uint128
|
||||||
|
v Uint128
|
||||||
|
expected Uint128
|
||||||
|
}{
|
||||||
|
{
|
||||||
|
name: "simple multiplication",
|
||||||
|
u: Uint128{w1: 1, w0: 2},
|
||||||
|
v: Uint128{w1: 3, w0: 4},
|
||||||
|
expected: Uint128{w1: 10, w0: 8},
|
||||||
|
},
|
||||||
|
{
|
||||||
|
name: "multiplication with overflow",
|
||||||
|
u: Uint128{w1: 0, w0: math.MaxUint64},
|
||||||
|
v: Uint128{w1: 0, w0: 2},
|
||||||
|
expected: Uint128{w1: 1, w0: 18446744073709551614},
|
||||||
|
},
|
||||||
|
{
|
||||||
|
name: "multiplication with zero",
|
||||||
|
u: Uint128{w1: 0, w0: 0},
|
||||||
|
v: Uint128{w1: 0, w0: 0},
|
||||||
|
expected: Uint128{w1: 0, w0: 0},
|
||||||
|
},
|
||||||
|
{
|
||||||
|
name: "multiplication with large numbers",
|
||||||
|
u: Uint128{w1: 100, w0: 200},
|
||||||
|
v: Uint128{w1: 300, w0: 400},
|
||||||
|
expected: Uint128{w1: 100000, w0: 80000},
|
||||||
|
},
|
||||||
|
}
|
||||||
|
|
||||||
|
for _, tt := range tests {
|
||||||
|
t.Run(tt.name, func(t *testing.T) {
|
||||||
|
actual := tt.u.Mul(tt.v)
|
||||||
|
if !reflect.DeepEqual(actual, tt.expected) {
|
||||||
|
t.Errorf("Mul(%v, %v) = %v, want %v", tt.u, tt.v, actual, tt.expected)
|
||||||
|
}
|
||||||
|
})
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestUint128_QuoRem(t *testing.T) {
|
||||||
|
u := Uint128{w1: 3, w0: 8}
|
||||||
|
v := Uint128{w1: 0, w0: 4}
|
||||||
|
q, r := u.QuoRem(v)
|
||||||
|
assert.Equal(t, Uint128{w1: 0, w0: 3<<62 + 2}, q) // (3*2^64 + 8) / 4
|
||||||
|
assert.Equal(t, Uint128{w1: 0, w0: 0}, r)
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestUint128_QuoRem64(t *testing.T) {
|
||||||
|
u := Uint128{w1: 0, w0: 6}
|
||||||
|
v := uint64(3)
|
||||||
|
q, r := u.QuoRem64(v)
|
||||||
|
assert.Equal(t, Uint128{w1: 0, w0: 2}, q)
|
||||||
|
assert.Equal(t, uint64(0), r)
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestUint128_Div(t *testing.T) {
|
||||||
|
u := Uint128{w1: 3, w0: 8}
|
||||||
|
v := Uint128{w1: 0, w0: 4}
|
||||||
|
q := u.Div(v)
|
||||||
|
assert.Equal(t, Uint128{w1: 0, w0: 3<<62 + 2}, q) // (3*2^64 + 8) / 4
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestUint128_Div64(t *testing.T) {
|
||||||
|
u := Uint128{w1: 0, w0: 6}
|
||||||
|
v := uint64(3)
|
||||||
|
q := u.Div64(v)
|
||||||
|
assert.Equal(t, Uint128{w1: 0, w0: 2}, q)
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestUint128_Mod(t *testing.T) {
|
||||||
|
u := Uint128{w1: 3, w0: 8}
|
||||||
|
v := Uint128{w1: 0, w0: 4}
|
||||||
|
r := u.Mod(v)
|
||||||
|
assert.Equal(t, Uint128{w1: 0, w0: 0}, r)
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestUint128_Mod64(t *testing.T) {
|
||||||
|
u := Uint128{w1: 0, w0: 6}
|
||||||
|
v := uint64(3)
|
||||||
|
r := u.Mod64(v)
|
||||||
|
assert.Equal(t, uint64(0), r)
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestUint128_Cmp(t *testing.T) {
|
||||||
|
u := Uint128{w1: 1, w0: 2}
|
||||||
|
v := Uint128{w1: 3, w0: 4}
|
||||||
|
assert.Equal(t, -1, u.Cmp(v))
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestUint128_Cmp64(t *testing.T) {
|
||||||
|
u := Uint128{w1: 1, w0: 2}
|
||||||
|
v := uint64(3)
|
||||||
|
assert.Equal(t, 1, u.Cmp64(v)) // u.w1 != 0, so u exceeds any uint64
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestUint128_Equals(t *testing.T) {
|
||||||
|
u := Uint128{w1: 1, w0: 2}
|
||||||
|
v := Uint128{w1: 1, w0: 2}
|
||||||
|
assert.Equal(t, true, u.Equals(v))
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestUint128_LessThan(t *testing.T) {
|
||||||
|
u := Uint128{w1: 1, w0: 2}
|
||||||
|
v := Uint128{w1: 3, w0: 4}
|
||||||
|
assert.Equal(t, true, u.LessThan(v))
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestUint128_GreaterThan(t *testing.T) {
|
||||||
|
u := Uint128{w1: 1, w0: 2}
|
||||||
|
v := Uint128{w1: 3, w0: 4}
|
||||||
|
assert.Equal(t, false, u.GreaterThan(v))
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestUint128_LessThanOrEqual(t *testing.T) {
|
||||||
|
u := Uint128{w1: 1, w0: 2}
|
||||||
|
v := Uint128{w1: 3, w0: 4}
|
||||||
|
assert.Equal(t, true, u.LessThanOrEqual(v))
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestUint128_GreaterThanOrEqual(t *testing.T) {
|
||||||
|
u := Uint128{w1: 1, w0: 2}
|
||||||
|
v := Uint128{w1: 3, w0: 4}
|
||||||
|
assert.Equal(t, false, u.GreaterThanOrEqual(v))
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestUint128_And(t *testing.T) {
|
||||||
|
u := Uint128{w1: 1, w0: 2}
|
||||||
|
v := Uint128{w1: 3, w0: 4}
|
||||||
|
w := u.And(v)
|
||||||
|
assert.Equal(t, Uint128{w1: 1, w0: 0}, w)
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestUint128_Or(t *testing.T) {
|
||||||
|
u := Uint128{w1: 1, w0: 2}
|
||||||
|
v := Uint128{w1: 3, w0: 4}
|
||||||
|
w := u.Or(v)
|
||||||
|
assert.Equal(t, Uint128{w1: 3, w0: 6}, w)
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestUint128_Xor(t *testing.T) {
|
||||||
|
u := Uint128{w1: 1, w0: 2}
|
||||||
|
v := Uint128{w1: 3, w0: 4}
|
||||||
|
w := u.Xor(v)
|
||||||
|
assert.Equal(t, Uint128{w1: 2, w0: 6}, w)
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestUint128_Not(t *testing.T) {
|
||||||
|
u := Uint128{w1: 1, w0: 2}
|
||||||
|
w := u.Not()
|
||||||
|
assert.Equal(t, Uint128{w1: math.MaxUint64 - 1, w0: math.MaxUint64 - 2}, w)
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestUint128_AsUint64(t *testing.T) {
|
||||||
|
u := Uint128{w1: 0, w0: 5}
|
||||||
|
v := u.AsUint64()
|
||||||
|
assert.Equal(t, uint64(5), v)
|
||||||
|
}
|
||||||
307
pkg/obifp/uint256.go
Normal file
@@ -0,0 +1,307 @@
|
|||||||
|
package obifp
|
||||||
|
|
||||||
|
import (
|
||||||
|
"math"
|
||||||
|
"math/bits"
|
||||||
|
|
||||||
|
log "github.com/sirupsen/logrus"
|
||||||
|
)
|
||||||
|
|
||||||
|
type Uint256 struct {
|
||||||
|
w3 uint64
|
||||||
|
w2 uint64
|
||||||
|
w1 uint64
|
||||||
|
w0 uint64
|
||||||
|
}
|
||||||
|
|
||||||
|
// Zero returns a zero value of type Uint256.
|
||||||
|
//
|
||||||
|
// No parameters.
|
||||||
|
// Returns a Uint256 value of 0.
|
||||||
|
func (u Uint256) Zero() Uint256 {
|
||||||
|
return Uint256{}
|
||||||
|
}
|
||||||
|
|
||||||
|
// MaxValue returns the maximum possible value of type Uint256.
|
||||||
|
//
|
||||||
|
// No parameters.
|
||||||
|
// Returns the maximum value of type Uint256.
|
||||||
|
func (u Uint256) MaxValue() Uint256 {
|
||||||
|
return Uint256{
|
||||||
|
w3: math.MaxUint64,
|
||||||
|
w2: math.MaxUint64,
|
||||||
|
w1: math.MaxUint64,
|
||||||
|
w0: math.MaxUint64,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// IsZero checks if the Uint256 value is zero.
|
||||||
|
//
|
||||||
|
// No parameters.
|
||||||
|
// Returns a boolean indicating if the value is zero.
|
||||||
|
func (u Uint256) IsZero() bool {
|
||||||
|
return u == Uint256{}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Cast a Uint256 to a Uint64.
|
||||||
|
//
|
||||||
|
// A Warning will be logged if an overflow occurs.
|
||||||
|
//
|
||||||
|
// No parameters.
|
||||||
|
// Returns a Uint64 value.
|
||||||
|
func (u Uint256) Uint64() Uint64 {
|
||||||
|
if u.w3 != 0 || u.w2 != 0 || u.w1 != 0 {
|
||||||
|
log.Warnf("Uint256 overflow at Uint64(%v)", u)
|
||||||
|
}
|
||||||
|
return Uint64{w0: u.w0}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Cast a Uint256 to a Uint128.
|
||||||
|
//
|
||||||
|
// A Warning will be logged if an overflow occurs.
|
||||||
|
//
|
||||||
|
// No parameters.
|
||||||
|
// Returns a Uint128 value.
|
||||||
|
func (u Uint256) Uint128() Uint128 {
|
||||||
|
if u.w3 != 0 || u.w2 != 0 {
|
||||||
|
log.Warnf("Uint256 overflow at Uint128(%v)", u)
|
||||||
|
}
|
||||||
|
return Uint128{u.w1, u.w0}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Cast a Uint256 to a Uint256.
|
||||||
|
//
|
||||||
|
// Which is a no-op.
|
||||||
|
//
|
||||||
|
// No parameters.
|
||||||
|
// Returns a Uint256 value.
|
||||||
|
func (u Uint256) Uint256() Uint256 {
|
||||||
|
return u
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint256) Set64(v uint64) Uint256 {
|
||||||
|
|
||||||
|
return Uint256{
|
||||||
|
w3: 0,
|
||||||
|
w2: 0,
|
||||||
|
w1: 0,
|
||||||
|
w0: v,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint256) LeftShift(n uint) Uint256 {
|
||||||
|
w0, carry := Uint64{w0: u.w0}.LeftShift64(n, 0)
|
||||||
|
w1, carry := Uint64{w0: u.w1}.LeftShift64(n, carry)
|
||||||
|
w2, carry := Uint64{w0: u.w2}.LeftShift64(n, carry)
|
||||||
|
w3, _ := Uint64{w0: u.w3}.LeftShift64(n, carry)
|
||||||
|
return Uint256{w3, w2, w1, w0}
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint256) RightShift(n uint) Uint256 {
|
||||||
|
w3, carry := Uint64{w0: u.w3}.RightShift64(n, 0)
|
||||||
|
w2, carry := Uint64{w0: u.w2}.RightShift64(n, carry)
|
||||||
|
w1, carry := Uint64{w0: u.w1}.RightShift64(n, carry)
|
||||||
|
w0, _ := Uint64{w0: u.w0}.RightShift64(n, carry)
|
||||||
|
return Uint256{w3, w2, w1, w0}
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint256) Cmp(v Uint256) int {
|
||||||
|
switch {
|
||||||
|
case u.w3 > v.w3:
|
||||||
|
return 1
|
||||||
|
case u.w3 < v.w3:
|
||||||
|
return -1
|
||||||
|
case u.w2 > v.w2:
|
||||||
|
return 1
|
||||||
|
case u.w2 < v.w2:
|
||||||
|
return -1
|
||||||
|
case u.w1 > v.w1:
|
||||||
|
return 1
|
||||||
|
case u.w1 < v.w1:
|
||||||
|
return -1
|
||||||
|
case u.w0 > v.w0:
|
||||||
|
return 1
|
||||||
|
case u.w0 < v.w0:
|
||||||
|
return -1
|
||||||
|
default:
|
||||||
|
return 0
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Add performs addition of two Uint256 values and returns the result.
|
||||||
|
//
|
||||||
|
// Parameters:
|
||||||
|
// - v: the Uint256 value to add to the receiver.
|
||||||
|
//
|
||||||
|
// Returns:
|
||||||
|
// - Uint256: the result of the addition.
|
||||||
|
func (u Uint256) Add(v Uint256) Uint256 {
|
||||||
|
w0, carry := bits.Add64(u.w0, v.w0, 0)
|
||||||
|
w1, carry := bits.Add64(u.w1, v.w1, carry)
|
||||||
|
w2, carry := bits.Add64(u.w2, v.w2, carry)
|
||||||
|
w3, carry := bits.Add64(u.w3, v.w3, carry)
|
||||||
|
if carry != 0 {
|
||||||
|
log.Panicf("Uint256 overflow at Add(%v, %v)", u, v)
|
||||||
|
}
|
||||||
|
return Uint256{w3, w2, w1, w0}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Sub performs subtraction of two Uint256 values and returns the result.
|
||||||
|
//
|
||||||
|
// Parameters:
|
||||||
|
// - v: the Uint256 value to subtract from the receiver.
|
||||||
|
//
|
||||||
|
// Returns:
|
||||||
|
// - Uint256: the result of the subtraction.
|
||||||
|
func (u Uint256) Sub(v Uint256) Uint256 {
|
||||||
|
w0, borrow := bits.Sub64(u.w0, v.w0, 0)
|
||||||
|
w1, borrow := bits.Sub64(u.w1, v.w1, borrow)
|
||||||
|
w2, borrow := bits.Sub64(u.w2, v.w2, borrow)
|
||||||
|
w3, borrow := bits.Sub64(u.w3, v.w3, borrow)
|
||||||
|
if borrow != 0 {
|
||||||
|
log.Panicf("Uint256 underflow at Sub(%v, %v)", u, v)
|
||||||
|
}
|
||||||
|
return Uint256{w3, w2, w1, w0}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Mul performs multiplication of two Uint256 values and returns the result.
|
||||||
|
//
|
||||||
|
// Parameters:
|
||||||
|
// - v: the Uint256 value to multiply with the receiver.
|
||||||
|
//
|
||||||
|
// Returns:
|
||||||
|
// - Uint256: the result of the multiplication.
|
||||||
|
func (u Uint256) Mul(v Uint256) Uint256 {
|
||||||
|
var w0, w1, w2, w3, carry uint64
|
||||||
|
|
||||||
|
// bits.Mul64 returns (hi, lo), in that order.
|
||||||
|
w0High, w0Low := bits.Mul64(u.w0, v.w0)
|
||||||
|
w1High1, w1Low1 := bits.Mul64(u.w0, v.w1)
|
||||||
|
w1High2, w1Low2 := bits.Mul64(u.w1, v.w0)
|
||||||
|
w2High1, w2Low1 := bits.Mul64(u.w0, v.w2)
|
||||||
|
w2High2, w2Low2 := bits.Mul64(u.w1, v.w1)
|
||||||
|
w2High3, w2Low3 := bits.Mul64(u.w2, v.w0)
|
||||||
|
w3High1, w3Low1 := bits.Mul64(u.w0, v.w3)
|
||||||
|
w3High2, w3Low2 := bits.Mul64(u.w1, v.w2)
|
||||||
|
w3High3, w3Low3 := bits.Mul64(u.w2, v.w1)
|
||||||
|
w3High4, w3Low4 := bits.Mul64(u.w3, v.w0)
|
||||||
|
|
||||||
|
w0 = w0Low
|
||||||
|
|
||||||
|
w1, carry = bits.Add64(w1Low1, w1Low2, 0)
|
||||||
|
w1, _ = bits.Add64(w1, w0High, carry)
|
||||||
|
|
||||||
|
w2, carry = bits.Add64(w2Low1, w2Low2, 0)
|
||||||
|
w2, carry = bits.Add64(w2, w2Low3, carry)
|
||||||
|
w2, carry = bits.Add64(w2, w1High1, carry)
|
||||||
|
w2, _ = bits.Add64(w2, w1High2, carry)
|
||||||
|
|
||||||
|
w3, carry = bits.Add64(w3Low1, w3Low2, 0)
|
||||||
|
w3, carry = bits.Add64(w3, w3Low3, carry)
|
||||||
|
w3, carry = bits.Add64(w3, w3Low4, carry)
|
||||||
|
w3, carry = bits.Add64(w3, w2High1, carry)
|
||||||
|
w3, carry = bits.Add64(w3, w2High2, carry)
|
||||||
|
w3, carry = bits.Add64(w3, w2High3, carry)
|
||||||
|
|
||||||
|
if w3High1 != 0 || w3High2 != 0 || w3High3 != 0 || w3High4 != 0 || carry != 0 {
|
||||||
|
log.Panicf("Uint256 overflow at Mul(%v, %v)", u, v)
|
||||||
|
}
|
||||||
|
|
||||||
|
return Uint256{w3, w2, w1, w0}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Div performs division of two Uint256 values and returns the result.
|
||||||
|
//
|
||||||
|
// Parameters:
|
||||||
|
// - v: the Uint256 divisor; the receiver is the dividend.
|
||||||
|
//
|
||||||
|
// Returns:
|
||||||
|
// - Uint256: the result of the division.
|
||||||
|
func (u Uint256) Div(v Uint256) Uint256 {
|
||||||
|
if v.IsZero() {
|
||||||
|
log.Panicf("division by zero")
|
||||||
|
}
|
||||||
|
|
||||||
|
if u.IsZero() || u.LessThan(v) {
|
||||||
|
return Uint256{}
|
||||||
|
}
|
||||||
|
|
||||||
|
if v.Equals(Uint256{0, 0, 0, 1}) {
|
||||||
|
return u // Division by 1
|
||||||
|
}
|
||||||
|
|
||||||
|
var q, r Uint256
|
||||||
|
r = u
|
||||||
|
|
||||||
|
for r.GreaterThanOrEqual(v) {
|
||||||
|
var t Uint256 = v
|
||||||
|
var m Uint256 = Uint256{0, 0, 0, 1}
|
||||||
|
for t.LeftShift(1).LessThanOrEqual(r) {
|
||||||
|
t = t.LeftShift(1)
|
||||||
|
m = m.LeftShift(1)
|
||||||
|
}
|
||||||
|
r = r.Sub(t)
|
||||||
|
q = q.Add(m)
|
||||||
|
}
|
||||||
|
|
||||||
|
return q
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint256) Equals(v Uint256) bool {
|
||||||
|
return u.Cmp(v) == 0
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint256) LessThan(v Uint256) bool {
|
||||||
|
return u.Cmp(v) < 0
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint256) GreaterThan(v Uint256) bool {
|
||||||
|
return u.Cmp(v) > 0
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint256) LessThanOrEqual(v Uint256) bool {
|
||||||
|
return !u.GreaterThan(v)
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint256) GreaterThanOrEqual(v Uint256) bool {
|
||||||
|
return !u.LessThan(v)
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint256) And(v Uint256) Uint256 {
|
||||||
|
return Uint256{
|
||||||
|
w3: u.w3 & v.w3,
|
||||||
|
w2: u.w2 & v.w2,
|
||||||
|
w1: u.w1 & v.w1,
|
||||||
|
w0: u.w0 & v.w0,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint256) Or(v Uint256) Uint256 {
|
||||||
|
return Uint256{
|
||||||
|
w3: u.w3 | v.w3,
|
||||||
|
w2: u.w2 | v.w2,
|
||||||
|
w1: u.w1 | v.w1,
|
||||||
|
w0: u.w0 | v.w0,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint256) Xor(v Uint256) Uint256 {
|
||||||
|
return Uint256{
|
||||||
|
w3: u.w3 ^ v.w3,
|
||||||
|
w2: u.w2 ^ v.w2,
|
||||||
|
w1: u.w1 ^ v.w1,
|
||||||
|
w0: u.w0 ^ v.w0,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint256) Not() Uint256 {
|
||||||
|
return Uint256{
|
||||||
|
w3: ^u.w3,
|
||||||
|
w2: ^u.w2,
|
||||||
|
w1: ^u.w1,
|
||||||
|
w0: ^u.w0,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint256) AsUint64() uint64 {
|
||||||
|
return u.w0
|
||||||
|
}
|
||||||
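Reviewer sketch, not part of the commit: a schoolbook 256-bit multiplication can accumulate every partial product into a 512-bit buffer and propagate carries column by column, which makes the overflow test a simple check of the upper four words. The function name `mul256` and the little-endian `[4]uint64` layout are assumptions of this sketch, not the layout used by `Uint256`.

```go
package main

import (
	"fmt"
	"math/bits"
)

// mul256 multiplies two little-endian 4-word values (index 0 is the lowest
// word) and reports whether the exact product fits in 256 bits.
func mul256(x, y [4]uint64) (z [4]uint64, ok bool) {
	var prod [8]uint64 // full 512-bit product
	for i := 0; i < 4; i++ {
		var carry uint64
		for j := 0; j < 4; j++ {
			hi, lo := bits.Mul64(x[i], y[j]) // returns (hi, lo)
			var c uint64
			// Add the word already in this column, then the running carry.
			// hi cannot wrap: x*y + a + b is at most 2^128 - 1.
			lo, c = bits.Add64(lo, prod[i+j], 0)
			hi += c
			lo, c = bits.Add64(lo, carry, 0)
			hi += c
			prod[i+j] = lo
			carry = hi
		}
		prod[i+4] = carry
	}
	copy(z[:], prod[:4])
	ok = prod[4]|prod[5]|prod[6]|prod[7] == 0
	return z, ok
}

func main() {
	m := ^uint64(0)
	z, ok := mul256([4]uint64{m, 0, 0, 0}, [4]uint64{m, 0, 0, 0})
	fmt.Println(z, ok) // [1 18446744073709551614 0 0] true
}
```

Computing the full product costs sixteen word multiplications instead of ten, but it removes the need to reason separately about which partial products may overflow.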
237
pkg/obifp/uint64.go
Normal file
@@ -0,0 +1,237 @@
|
|||||||
|
package obifp
|
||||||
|
|
||||||
|
import (
|
||||||
|
"math"
|
||||||
|
"math/bits"
|
||||||
|
|
||||||
|
log "github.com/sirupsen/logrus"
|
||||||
|
)
|
||||||
|
|
||||||
|
type Uint64 struct {
|
||||||
|
w0 uint64
|
||||||
|
}
|
||||||
|
|
||||||
|
// Zero returns a zero value of type Uint64.
|
||||||
|
//
|
||||||
|
// No parameters.
|
||||||
|
// Returns a Uint64 value of 0.
|
||||||
|
func (u Uint64) Zero() Uint64 {
|
||||||
|
return Uint64{0}
|
||||||
|
}
|
||||||
|
|
||||||
|
// MaxValue returns the maximum possible value of type Uint64.
|
||||||
|
//
|
||||||
|
// No parameters.
|
||||||
|
// Returns the maximum value of type Uint64.
|
||||||
|
func (u Uint64) MaxValue() Uint64 {
|
||||||
|
return Uint64{math.MaxUint64}
|
||||||
|
}
|
||||||
|
|
||||||
|
// IsZero checks if the Uint64 value is zero.
|
||||||
|
//
|
||||||
|
// No parameters.
|
||||||
|
// Returns a boolean indicating if the value is zero.
|
||||||
|
func (u Uint64) IsZero() bool {
|
||||||
|
return u.w0 == 0
|
||||||
|
}
|
||||||
|
|
||||||
|
// Cast a Uint64 to a Uint64.
|
||||||
|
//
|
||||||
|
// Which is a no-op.
|
||||||
|
//
|
||||||
|
// No parameters.
|
||||||
|
// Returns the Uint64 value itself.
|
||||||
|
func (u Uint64) Uint64() Uint64 {
|
||||||
|
return u
|
||||||
|
}
|
||||||
|
|
||||||
|
// Cast a Uint64 to a Uint128.
|
||||||
|
//
|
||||||
|
// No parameters.
|
||||||
|
// Returns a Uint128 value with the high field set to 0 and the low field set to the value of the Uint64.
|
||||||
|
func (u Uint64) Uint128() Uint128 {
|
||||||
|
return Uint128{w1: 0, w0: u.w0}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Cast a Uint64 to a Uint256.
|
||||||
|
//
|
||||||
|
// No parameters.
|
||||||
|
// Returns a Uint256 value with the three high fields set to 0 and the low field set to the value of the Uint64.
|
||||||
|
func (u Uint64) Uint256() Uint256 {
|
||||||
|
return Uint256{w3: 0, w2: 0, w1: 0, w0: u.w0}
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint64) Set64(v uint64) Uint64 {
|
||||||
|
|
||||||
|
return Uint64{
|
||||||
|
w0: v,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// LeftShift64 performs a left shift operation on the Uint64 value by n bits, with carry-in from carryIn.
|
||||||
|
//
|
||||||
|
// The carry-in value is used as the first bit of the shifted value.
|
||||||
|
//
|
||||||
|
// The function returns u << n | (carryIn & ((1 << n) - 1)).
|
||||||
|
//
|
||||||
|
// This is a shift left operation, lowest bits are set with the lowest bits of
|
||||||
|
// the carry-in value instead of 0 as they would be in a classical left shift operation.
|
||||||
|
//
|
||||||
|
// Parameters:
|
||||||
|
// - n: the number of bits to shift by.
|
||||||
|
// - carryIn: the carry-in value.
|
||||||
|
//
|
||||||
|
// Returns:
|
||||||
|
// - value: the result of the left shift operation.
|
||||||
|
// - carry: the carry-out value.
|
||||||
|
func (u Uint64) LeftShift64(n uint, carryIn uint64) (value, carry uint64) {
|
||||||
|
switch {
|
||||||
|
case n == 0:
|
||||||
|
return u.w0, 0
|
||||||
|
|
||||||
|
case n < 64:
|
||||||
|
return u.w0<<n | (carryIn & ((1 << n) - 1)), u.w0 >> (64 - n)
|
||||||
|
|
||||||
|
case n == 64:
|
||||||
|
return carryIn, u.w0
|
||||||
|
|
||||||
|
case n < 128:
|
||||||
|
return carryIn, u.w0 << (n - 64)
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
log.Warnf("Uint64 overflow at LeftShift64(%v, %v)", u, n)
|
||||||
|
return 0, 0
|
||||||
|
}
|
||||||
|
|
||||||
|
// RightShift64 performs a right shift operation on the Uint64 value by n bits, with carry-out to carry.
|
||||||
|
//
|
||||||
|
// The function returns the result of the right shift operation and the carry-out value.
|
||||||
|
//
|
||||||
|
// Parameters:
|
||||||
|
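// - n: the number of bits to shift by.
|
||||||
|
// - carryIn: the carry-in value; for n < 64 its highest n bits fill the vacated high bits.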
|
||||||
|
//
|
||||||
|
// Returns:
|
||||||
|
// - value: the result of the right shift operation.
|
||||||
|
// - carry: the carry-out value.
|
||||||
|
func (u Uint64) RightShift64(n uint, carryIn uint64) (value, carry uint64) {
|
||||||
|
switch {
|
||||||
|
case n == 0:
|
||||||
|
return u.w0, 0
|
||||||
|
|
||||||
|
case n < 64:
|
||||||
|
return u.w0>>n | (carryIn & ^((1 << (64 - n)) - 1)), u.w0 << (64 - n)
|
||||||
|
|
||||||
|
case n == 64:
|
||||||
|
return carryIn, u.w0
|
||||||
|
|
||||||
|
case n < 128:
|
||||||
|
return carryIn, u.w0 >> (n - 64)
|
||||||
|
}
|
||||||
|
|
||||||
|
log.Warnf("Uint64 overflow at RightShift64(%v, %v)", u, n)
|
||||||
|
return 0, 0
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint64) Add64(v Uint64, carryIn uint64) (value, carry uint64) {
|
||||||
|
return bits.Add64(u.w0, v.w0, uint64(carryIn))
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint64) Sub64(v Uint64, carryIn uint64) (value, carry uint64) {
|
||||||
|
return bits.Sub64(u.w0, v.w0, uint64(carryIn))
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint64) Mul64(v Uint64) (value, carry uint64) {
|
||||||
|
return bits.Mul64(u.w0, v.w0)
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint64) LeftShift(n uint) Uint64 {
|
||||||
|
sl, _ := u.LeftShift64(n, 0)
|
||||||
|
return Uint64{w0: sl}
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint64) RightShift(n uint) Uint64 {
|
||||||
|
sr, _ := u.RightShift64(n, 0)
|
||||||
|
return Uint64{w0: sr}
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint64) Add(v Uint64) Uint64 {
|
||||||
|
value, carry := u.Add64(v, 0)
|
||||||
|
|
||||||
|
if carry != 0 {
|
||||||
|
log.Panicf("Uint64 overflow at Add(%v, %v)", u, v)
|
||||||
|
}
|
||||||
|
|
||||||
|
return Uint64{w0: value}
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint64) Sub(v Uint64) Uint64 {
|
||||||
|
value, carry := u.Sub64(v, 0)
|
||||||
|
|
||||||
|
if carry != 0 {
|
||||||
|
log.Panicf("Uint64 underflow at Sub(%v, %v)", u, v)
|
||||||
|
}
|
||||||
|
|
||||||
|
return Uint64{w0: value}
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint64) Mul(v Uint64) Uint64 {
|
||||||
|
value, carry := u.Mul64(v)
|
||||||
|
|
||||||
|
if carry != 0 {
|
||||||
|
log.Panicf("Uint64 overflow at Mul(%v, %v)", u, v)
|
||||||
|
}
|
||||||
|
|
||||||
|
return Uint64{w0: value}
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint64) Cmp(v Uint64) int {
|
||||||
|
switch {
|
||||||
|
case u.w0 < v.w0:
|
||||||
|
return -1
|
||||||
|
case u.w0 > v.w0:
|
||||||
|
return 1
|
||||||
|
default:
|
||||||
|
return 0
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint64) Equals(v Uint64) bool {
|
||||||
|
return u.Cmp(v) == 0
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint64) LessThan(v Uint64) bool {
|
||||||
|
return u.Cmp(v) < 0
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint64) GreaterThan(v Uint64) bool {
|
||||||
|
return u.Cmp(v) > 0
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint64) LessThanOrEqual(v Uint64) bool {
|
||||||
|
return !u.GreaterThan(v)
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint64) GreaterThanOrEqual(v Uint64) bool {
|
||||||
|
return !u.LessThan(v)
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint64) And(v Uint64) Uint64 {
|
||||||
|
return Uint64{w0: u.w0 & v.w0}
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint64) Or(v Uint64) Uint64 {
|
||||||
|
return Uint64{w0: u.w0 | v.w0}
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint64) Xor(v Uint64) Uint64 {
|
||||||
|
return Uint64{w0: u.w0 ^ v.w0}
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint64) Not() Uint64 {
|
||||||
|
return Uint64{w0: ^u.w0}
|
||||||
|
}
|
||||||
|
|
||||||
|
func (u Uint64) AsUint64() uint64 {
|
||||||
|
return u.w0
|
||||||
|
}
|
||||||
41
pkg/obifp/unint.go
Normal file
@@ -0,0 +1,41 @@
|
|||||||
|
package obifp
|
||||||
|
|
||||||
|
type FPUint[T Uint64 | Uint128 | Uint256] interface {
|
||||||
|
Zero() T
|
||||||
|
Set64(v uint64) T
|
||||||
|
|
||||||
|
IsZero() bool
|
||||||
|
LeftShift(n uint) T
|
||||||
|
RightShift(n uint) T
|
||||||
|
|
||||||
|
Add(v T) T
|
||||||
|
Sub(v T) T
|
||||||
|
Mul(v T) T
|
||||||
|
//Div(v T) T
|
||||||
|
|
||||||
|
And(v T) T
|
||||||
|
Or(v T) T
|
||||||
|
Xor(v T) T
|
||||||
|
Not() T
|
||||||
|
|
||||||
|
LessThan(v T) bool
|
||||||
|
LessThanOrEqual(v T) bool
|
||||||
|
GreaterThan(v T) bool
|
||||||
|
GreaterThanOrEqual(v T) bool
|
||||||
|
|
||||||
|
AsUint64() uint64
|
||||||
|
|
||||||
|
Uint64 | Uint128 | Uint256
|
||||||
|
}
|
||||||
|
|
||||||
|
func ZeroUint[T FPUint[T]]() T {
|
||||||
|
return *new(T)
|
||||||
|
}
|
||||||
|
|
||||||
|
func OneUint[T FPUint[T]]() T {
|
||||||
|
return ZeroUint[T]().Set64(1)
|
||||||
|
}
|
||||||
|
|
||||||
|
func From64[T FPUint[T]](v uint64) T {
|
||||||
|
return ZeroUint[T]().Set64(v)
|
||||||
|
}
|
||||||
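Reviewer sketch, not part of the commit: the `T FPUint[T]` pattern is a self-referential constraint, so generic code can call methods that take and return the concrete type itself. The toy types `word8` and `word16`, the constraint `number`, and the helpers `from64` and `sum` are hypothetical stand-ins, kept small so the example compiles on its own (Go 1.18 or later).

```go
package main

import "fmt"

// word8 and word16 are toy fixed-width types standing in for Uint64/Uint128.
type word8 struct{ v uint8 }
type word16 struct{ v uint16 }

func (w word8) Set64(x uint64) word8 { return word8{v: uint8(x)} }
func (w word8) Add(o word8) word8    { return word8{v: w.v + o.v} }
func (w word8) AsUint64() uint64     { return uint64(w.v) }

func (w word16) Set64(x uint64) word16 { return word16{v: uint16(x)} }
func (w word16) Add(o word16) word16   { return word16{v: w.v + o.v} }
func (w word16) AsUint64() uint64      { return uint64(w.v) }

// number restricts T to the listed types and requires methods expressed in
// terms of T itself.
type number[T word8 | word16] interface {
	Set64(v uint64) T
	Add(v T) T
	AsUint64() uint64
	word8 | word16
}

// from64 builds a T from a uint64 by calling Set64 on T's zero value.
func from64[T number[T]](v uint64) T {
	var zero T
	return zero.Set64(v)
}

// sum adds the given values using T's own Add method.
func sum[T number[T]](xs ...uint64) T {
	var acc T
	for _, x := range xs {
		acc = acc.Add(from64[T](x))
	}
	return acc
}

func main() {
	fmt.Println(sum[word8](1, 2, 3).AsUint64(), sum[word16](300, 400).AsUint64()) // 6 700
}
```

Because the zero value of each type is usable, helpers such as `from64` need no constructor argument beyond the type parameter.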
@@ -3,51 +3,108 @@ package obiiter
|
|||||||
import "git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
|
import "git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
|
||||||
|
|
||||||
type BioSequenceBatch struct {
|
type BioSequenceBatch struct {
|
||||||
slice obiseq.BioSequenceSlice
|
source string
|
||||||
order int
|
slice obiseq.BioSequenceSlice
|
||||||
|
order int
|
||||||
}
|
}
|
||||||
|
|
||||||
var NilBioSequenceBatch = BioSequenceBatch{nil, -1}
|
var NilBioSequenceBatch = BioSequenceBatch{"", nil, -1}
|
||||||
|
|
||||||
func MakeBioSequenceBatch(order int,
|
// MakeBioSequenceBatch creates a new BioSequenceBatch with the given source, order, and sequences.
|
||||||
|
//
|
||||||
|
// Parameters:
|
||||||
|
// - source: The source of the BioSequenceBatch.
|
||||||
|
// - order: The order of the BioSequenceBatch.
|
||||||
|
// - sequences: The slice of BioSequence.
|
||||||
|
//
|
||||||
|
// Returns:
|
||||||
|
// - BioSequenceBatch: The newly created BioSequenceBatch.
|
||||||
|
func MakeBioSequenceBatch(
|
||||||
|
source string,
|
||||||
|
order int,
|
||||||
sequences obiseq.BioSequenceSlice) BioSequenceBatch {
|
sequences obiseq.BioSequenceSlice) BioSequenceBatch {
|
||||||
|
|
||||||
return BioSequenceBatch{
|
return BioSequenceBatch{
|
||||||
slice: sequences,
|
source: source,
|
||||||
order: order,
|
slice: sequences,
|
||||||
|
order: order,
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Order returns the order of the BioSequenceBatch.
|
||||||
|
//
|
||||||
|
// Returns:
|
||||||
|
// - int: The order of the BioSequenceBatch.
|
||||||
func (batch BioSequenceBatch) Order() int {
|
func (batch BioSequenceBatch) Order() int {
|
||||||
return batch.order
|
return batch.order
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Source returns the source of the BioSequenceBatch.
|
||||||
|
//
|
||||||
|
// Returns:
|
||||||
|
// - string: The source of the BioSequenceBatch.
|
||||||
|
func (batch BioSequenceBatch) Source() string {
|
||||||
|
return batch.source
|
||||||
|
}
|
||||||
|
|
||||||
|
// Reorder updates the order of the BioSequenceBatch and returns the updated batch.
|
||||||
|
//
|
||||||
|
// Parameters:
|
||||||
|
// - newOrder: The new order value to assign to the BioSequenceBatch.
|
||||||
|
//
|
||||||
|
// Returns:
|
||||||
|
// - BioSequenceBatch: The updated BioSequenceBatch with the new order value.
|
||||||
func (batch BioSequenceBatch) Reorder(newOrder int) BioSequenceBatch {
|
func (batch BioSequenceBatch) Reorder(newOrder int) BioSequenceBatch {
|
||||||
batch.order = newOrder
|
batch.order = newOrder
|
||||||
return batch
|
return batch
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Slice returns the BioSequenceSlice contained within the BioSequenceBatch.
|
||||||
|
//
|
||||||
|
// Returns:
|
||||||
|
// - obiseq.BioSequenceSlice: The BioSequenceSlice contained within the BioSequenceBatch.
|
||||||
func (batch BioSequenceBatch) Slice() obiseq.BioSequenceSlice {
|
func (batch BioSequenceBatch) Slice() obiseq.BioSequenceSlice {
|
||||||
return batch.slice
|
return batch.slice
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Len returns the number of BioSequence elements in the given BioSequenceBatch.
|
||||||
|
//
|
||||||
|
// Parameters:
|
||||||
|
// - batch: The BioSequenceBatch to get the length from.
|
||||||
|
//
|
||||||
|
// Returns:
|
||||||
|
// - int: The number of BioSequence elements in the BioSequenceBatch.
|
||||||
func (batch BioSequenceBatch) Len() int {
|
func (batch BioSequenceBatch) Len() int {
|
||||||
return len(batch.slice)
|
return len(batch.slice)
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// NotEmpty reports whether the BioSequenceBatch contains at least one sequence.
|
||||||
|
//
|
||||||
|
// It checks if the BioSequenceSlice contained within the BioSequenceBatch is not empty.
|
||||||
|
//
|
||||||
|
// Returns:
|
||||||
|
// - bool: True if the BioSequenceBatch is not empty, false otherwise.
|
||||||
func (batch BioSequenceBatch) NotEmpty() bool {
|
func (batch BioSequenceBatch) NotEmpty() bool {
|
||||||
return batch.slice.NotEmpty()
|
return batch.slice.NotEmpty()
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Pop0 returns and removes the first element of the BioSequenceBatch.
|
||||||
|
//
|
||||||
|
// It does not take any parameters.
|
||||||
|
// It returns a pointer to a BioSequence object.
|
||||||
func (batch BioSequenceBatch) Pop0() *obiseq.BioSequence {
|
func (batch BioSequenceBatch) Pop0() *obiseq.BioSequence {
|
||||||
return batch.slice.Pop0()
|
return batch.slice.Pop0()
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// IsNil checks if the BioSequenceBatch's slice is nil.
|
||||||
|
//
|
||||||
|
// This function takes a BioSequenceBatch as a parameter and returns a boolean value indicating whether the slice of the BioSequenceBatch is nil or not.
|
||||||
|
//
|
||||||
|
// Parameters:
|
||||||
|
// - batch: The BioSequenceBatch to check for nil slice.
|
||||||
|
//
|
||||||
|
// Returns:
|
||||||
|
// - bool: True if the BioSequenceBatch's slice is nil, false otherwise.
|
||||||
func (batch BioSequenceBatch) IsNil() bool {
|
func (batch BioSequenceBatch) IsNil() bool {
|
||||||
return batch.slice == nil
|
return batch.slice == nil
|
||||||
}
|
}
|
||||||
|
|
||||||
func (batch BioSequenceBatch) Recycle(including_seq bool) {
|
|
||||||
batch.slice.Recycle(including_seq)
|
|
||||||
batch.slice = nil
|
|
||||||
}
|
|
||||||
|
|||||||
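The batch changes above add a `source` field next to `order` and `slice`, and `Reorder` keeps its value receiver. The following is a minimal sketch of that behavior, not the project's code: `Batch` and `MakeBatch` are hypothetical stand-ins, and plain strings replace `obiseq.BioSequence` values.

```go
package main

import "fmt"

// Batch is a simplified, hypothetical stand-in for BioSequenceBatch.
// Strings are used in place of obiseq.BioSequence values.
type Batch struct {
	source string
	order  int
	slice  []string
}

// MakeBatch mirrors the three-argument constructor shown in the diff.
func MakeBatch(source string, order int, sequences []string) Batch {
	return Batch{source: source, order: order, slice: sequences}
}

func (b Batch) Source() string { return b.source }
func (b Batch) Order() int     { return b.order }
func (b Batch) Len() int       { return len(b.slice) }

// Reorder has a value receiver: it changes a copy and returns it,
// so the caller's original batch keeps its order.
func (b Batch) Reorder(newOrder int) Batch {
	b.order = newOrder
	return b
}

func main() {
	b := MakeBatch("reads.fastq", 3, []string{"acgt", "ttga"})
	r := b.Reorder(0)
	fmt.Println(b.Source(), b.Order(), r.Order(), r.Len())
}
```

Because the receiver is a value, callers have to use the returned batch, as `FilterEmpty` does with `seqs.Reorder(order)`.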
@@ -10,31 +10,12 @@ import (
|
|||||||
|
|
||||||
log "github.com/sirupsen/logrus"
|
log "github.com/sirupsen/logrus"
|
||||||
|
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obidefault"
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
|
||||||
"github.com/tevino/abool/v2"
|
"github.com/tevino/abool/v2"
|
||||||
)
|
)
|
||||||
|
|
||||||
var globalLocker sync.WaitGroup
|
|
||||||
var globalLockerCounter = 0
|
|
||||||
|
|
||||||
func RegisterAPipe() {
|
|
||||||
globalLocker.Add(1)
|
|
||||||
globalLockerCounter++
|
|
||||||
log.Debugln(globalLockerCounter, " Pipes are registered now")
|
|
||||||
}
|
|
||||||
|
|
||||||
func UnregisterPipe() {
|
|
||||||
globalLocker.Done()
|
|
||||||
globalLockerCounter--
|
|
||||||
log.Debugln(globalLockerCounter, "are still registered")
|
|
||||||
}
|
|
||||||
|
|
||||||
func WaitForLastPipe() {
|
|
||||||
globalLocker.Wait()
|
|
||||||
}
|
|
||||||
|
|
||||||
// Structure implementing an iterator over bioseq.BioSequenceBatch
|
// Structure implementing an iterator over bioseq.BioSequenceBatch
|
||||||
// based on a channel.
|
// based on a channel.
|
||||||
type _IBioSequence struct {
|
type _IBioSequence struct {
|
||||||
@@ -78,7 +59,7 @@ func MakeIBioSequence() IBioSequence {
|
|||||||
i.lock = &lock
|
i.lock = &lock
|
||||||
ii := IBioSequence{&i}
|
ii := IBioSequence{&i}
|
||||||
|
|
||||||
RegisterAPipe()
|
obiutils.RegisterAPipe()
|
||||||
|
|
||||||
return ii
|
return ii
|
||||||
}
|
}
|
||||||
@@ -245,7 +226,7 @@ func (iterator IBioSequence) Push(batch BioSequenceBatch) {
|
|||||||
|
|
||||||
func (iterator IBioSequence) Close() {
|
func (iterator IBioSequence) Close() {
|
||||||
close(iterator.pointer.channel)
|
close(iterator.pointer.channel)
|
||||||
UnregisterPipe()
|
obiutils.UnregisterPipe()
|
||||||
}
|
}
|
||||||
|
|
||||||
func (iterator IBioSequence) WaitAndClose() {
|
func (iterator IBioSequence) WaitAndClose() {
|
||||||
@@ -424,9 +405,11 @@ func (iterator IBioSequence) Rebatch(size int) IBioSequence {
|
|||||||
order := 0
|
order := 0
|
||||||
iterator = iterator.SortBatches()
|
iterator = iterator.SortBatches()
|
||||||
buffer := obiseq.MakeBioSequenceSlice()
|
buffer := obiseq.MakeBioSequenceSlice()
|
||||||
|
source := ""
|
||||||
|
|
||||||
for iterator.Next() {
|
for iterator.Next() {
|
||||||
seqs := iterator.Get()
|
seqs := iterator.Get()
|
||||||
|
source = seqs.Source()
|
||||||
lc := seqs.Len()
|
lc := seqs.Len()
|
||||||
remains := lc
|
remains := lc
|
||||||
i := 0
|
i := 0
|
||||||
@@ -436,18 +419,17 @@ func (iterator IBioSequence) Rebatch(size int) IBioSequence {
|
|||||||
remains = lc - to_push - i
|
remains = lc - to_push - i
|
||||||
buffer = append(buffer, seqs.Slice()[i:(i+to_push)]...)
|
buffer = append(buffer, seqs.Slice()[i:(i+to_push)]...)
|
||||||
if len(buffer) == size {
|
if len(buffer) == size {
|
||||||
newIter.Push(MakeBioSequenceBatch(order, buffer))
|
newIter.Push(MakeBioSequenceBatch(source, order, buffer))
|
||||||
log.Debugf("Rebatch #%d pushed", order)
|
log.Debugf("Rebatch #%d pushed", order)
|
||||||
order++
|
order++
|
||||||
buffer = obiseq.MakeBioSequenceSlice()
|
buffer = obiseq.MakeBioSequenceSlice()
|
||||||
}
|
}
|
||||||
i += to_push
|
i += to_push
|
||||||
}
|
}
|
||||||
seqs.Recycle(false)
|
|
||||||
}
|
}
|
||||||
log.Debug("End of the rebatch loop")
|
log.Debug("End of the rebatch loop")
|
||||||
if len(buffer) > 0 {
|
if len(buffer) > 0 {
|
||||||
newIter.Push(MakeBioSequenceBatch(order, buffer))
|
newIter.Push(MakeBioSequenceBatch(source, order, buffer))
|
||||||
log.Debugf("Final Rebatch #%d pushed", order)
|
log.Debugf("Final Rebatch #%d pushed", order)
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -462,6 +444,39 @@ func (iterator IBioSequence) Rebatch(size int) IBioSequence {
|
|||||||
return newIter
|
return newIter
|
||||||
}
|
}
|
||||||
|
|
||||||
|
func (iterator IBioSequence) FilterEmpty() IBioSequence {
|
||||||
|
|
||||||
|
newIter := MakeIBioSequence()
|
||||||
|
|
||||||
|
newIter.Add(1)
|
||||||
|
|
||||||
|
go func() {
|
||||||
|
newIter.WaitAndClose()
|
||||||
|
}()
|
||||||
|
|
||||||
|
go func() {
|
||||||
|
order := 0
|
||||||
|
iterator = iterator.SortBatches()
|
||||||
|
|
||||||
|
for iterator.Next() {
|
||||||
|
seqs := iterator.Get()
|
||||||
|
lc := seqs.Len()
|
||||||
|
|
||||||
|
if lc > 0 {
|
||||||
|
newIter.Push(seqs.Reorder(order))
|
||||||
|
order++
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
newIter.Done()
|
||||||
|
}()
|
||||||
|
|
||||||
|
if iterator.IsPaired() {
|
||||||
|
newIter.MarkAsPaired()
|
||||||
|
}
|
||||||
|
|
||||||
|
return newIter
|
||||||
|
}
|
||||||
func (iterator IBioSequence) Recycle() {
|
func (iterator IBioSequence) Recycle() {
|
||||||
|
|
||||||
log.Debugln("Start recycling of Bioseq objects")
|
log.Debugln("Start recycling of Bioseq objects")
|
||||||
@@ -472,7 +487,6 @@ func (iterator IBioSequence) Recycle() {
|
|||||||
o := batch.Order()
|
o := batch.Order()
|
||||||
log.Debugln("Recycling batch #", o)
|
log.Debugln("Recycling batch #", o)
|
||||||
recycled += batch.Len()
|
recycled += batch.Len()
|
||||||
batch.Recycle(true)
|
|
||||||
log.Debugln("Batch #", o, " recycled")
|
log.Debugln("Batch #", o, " recycled")
|
||||||
}
|
}
|
||||||
log.Debugf("End of the recycling of %d Bioseq objects", recycled)
|
log.Debugf("End of the recycling of %d Bioseq objects", recycled)
|
||||||
@@ -480,8 +494,7 @@ func (iterator IBioSequence) Recycle() {
|
|||||||
|
|
||||||
func (iterator IBioSequence) Consume() {
|
func (iterator IBioSequence) Consume() {
|
||||||
for iterator.Next() {
|
for iterator.Next() {
|
||||||
batch := iterator.Get()
|
iterator.Get()
|
||||||
batch.Recycle(false)
|
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -499,7 +512,6 @@ func (iterator IBioSequence) Count(recycle bool) (int, int, int) {
|
|||||||
reads += seq.Count()
|
reads += seq.Count()
|
||||||
nucleotides += seq.Len()
|
nucleotides += seq.Len()
|
||||||
}
|
}
|
||||||
batch.Recycle(recycle)
|
|
||||||
}
|
}
|
||||||
log.Debugf("End of the counting of %d Bioseq objects", variants)
|
log.Debugf("End of the counting of %d Bioseq objects", variants)
|
||||||
return variants, reads, nucleotides
|
return variants, reads, nucleotides
|
||||||
@@ -526,12 +538,14 @@ func (iterator IBioSequence) DivideOn(predicate obiseq.SequencePredicate,
|
|||||||
trueOrder := 0
|
trueOrder := 0
|
||||||
falseOrder := 0
|
falseOrder := 0
|
||||||
iterator = iterator.SortBatches()
|
iterator = iterator.SortBatches()
|
||||||
|
source := ""
|
||||||
|
|
||||||
trueSlice := obiseq.MakeBioSequenceSlice()
|
trueSlice := obiseq.MakeBioSequenceSlice()
|
||||||
falseSlice := obiseq.MakeBioSequenceSlice()
|
falseSlice := obiseq.MakeBioSequenceSlice()
|
||||||
|
|
||||||
for iterator.Next() {
|
for iterator.Next() {
|
||||||
seqs := iterator.Get()
|
seqs := iterator.Get()
|
||||||
|
source = seqs.Source()
|
||||||
for _, s := range seqs.slice {
|
for _, s := range seqs.slice {
|
||||||
if predicate(s) {
|
if predicate(s) {
|
||||||
trueSlice = append(trueSlice, s)
|
trueSlice = append(trueSlice, s)
|
||||||
@@ -540,26 +554,25 @@ func (iterator IBioSequence) DivideOn(predicate obiseq.SequencePredicate,
|
|||||||
}
|
}
|
||||||
|
|
||||||
if len(trueSlice) == size {
|
if len(trueSlice) == size {
|
||||||
trueIter.Push(MakeBioSequenceBatch(trueOrder, trueSlice))
|
trueIter.Push(MakeBioSequenceBatch(source, trueOrder, trueSlice))
|
||||||
trueOrder++
|
trueOrder++
|
||||||
trueSlice = obiseq.MakeBioSequenceSlice()
|
trueSlice = obiseq.MakeBioSequenceSlice()
|
||||||
}
|
}
|
||||||
|
|
||||||
if len(falseSlice) == size {
|
if len(falseSlice) == size {
|
||||||
falseIter.Push(MakeBioSequenceBatch(falseOrder, falseSlice))
|
falseIter.Push(MakeBioSequenceBatch(source, falseOrder, falseSlice))
|
||||||
falseOrder++
|
falseOrder++
|
||||||
falseSlice = obiseq.MakeBioSequenceSlice()
|
falseSlice = obiseq.MakeBioSequenceSlice()
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
seqs.Recycle(false)
|
|
||||||
}
|
}
|
||||||
|
|
||||||
if len(trueSlice) > 0 {
|
if len(trueSlice) > 0 {
|
||||||
trueIter.Push(MakeBioSequenceBatch(trueOrder, trueSlice))
|
trueIter.Push(MakeBioSequenceBatch(source, trueOrder, trueSlice))
|
||||||
}
|
}
|
||||||
|
|
||||||
if len(falseSlice) > 0 {
|
if len(falseSlice) > 0 {
|
||||||
falseIter.Push(MakeBioSequenceBatch(falseOrder, falseSlice))
|
falseIter.Push(MakeBioSequenceBatch(source, falseOrder, falseSlice))
|
||||||
}
|
}
|
||||||
|
|
||||||
trueIter.Done()
|
trueIter.Done()
|
||||||
@@ -578,7 +591,7 @@ func (iterator IBioSequence) DivideOn(predicate obiseq.SequencePredicate,
|
|||||||
// A function that takes a predicate and a batch of sequences and returns a filtered batch of sequences.
|
// A function that takes a predicate and a batch of sequences and returns a filtered batch of sequences.
|
||||||
func (iterator IBioSequence) FilterOn(predicate obiseq.SequencePredicate,
|
func (iterator IBioSequence) FilterOn(predicate obiseq.SequencePredicate,
|
||||||
size int, sizes ...int) IBioSequence {
|
size int, sizes ...int) IBioSequence {
|
||||||
nworkers := obioptions.CLIReadParallelWorkers()
|
nworkers := obidefault.ReadParallelWorkers()
|
||||||
|
|
||||||
if len(sizes) > 0 {
|
if len(sizes) > 0 {
|
||||||
nworkers = sizes[0]
|
nworkers = sizes[0]
|
||||||
@@ -630,7 +643,7 @@ func (iterator IBioSequence) FilterOn(predicate obiseq.SequencePredicate,
|
|||||||
|
|
||||||
func (iterator IBioSequence) FilterAnd(predicate obiseq.SequencePredicate,
|
func (iterator IBioSequence) FilterAnd(predicate obiseq.SequencePredicate,
|
||||||
size int, sizes ...int) IBioSequence {
|
size int, sizes ...int) IBioSequence {
|
||||||
nworkers := obioptions.CLIReadParallelWorkers()
|
nworkers := obidefault.ReadParallelWorkers()
|
||||||
|
|
||||||
if len(sizes) > 0 {
|
if len(sizes) > 0 {
|
||||||
nworkers = sizes[0]
|
nworkers = sizes[0]
|
||||||
@@ -686,17 +699,21 @@ func (iterator IBioSequence) FilterAnd(predicate obiseq.SequencePredicate,
|
|||||||
|
|
||||||
// Load reads all the sequences available from an IBioSequence iterator into
|
// Load reads all the sequences available from an IBioSequence iterator into
|
||||||
// a large obiseq.BioSequenceSlice.
|
// a large obiseq.BioSequenceSlice.
|
||||||
func (iterator IBioSequence) Load() obiseq.BioSequenceSlice {
|
func (iterator IBioSequence) Load() (string, obiseq.BioSequenceSlice) {
|
||||||
|
|
||||||
|
chunk := obiseq.MakeBioSequenceSlice()
|
||||||
|
source := ""
|
||||||
|
|
||||||
chunck := obiseq.MakeBioSequenceSlice()
|
|
||||||
for iterator.Next() {
|
for iterator.Next() {
|
||||||
b := iterator.Get()
|
b := iterator.Get()
|
||||||
|
if source == "" {
|
||||||
|
source = b.Source()
|
||||||
|
}
|
||||||
log.Debugf("append %d sequences", b.Len())
|
log.Debugf("append %d sequences", b.Len())
|
||||||
chunck = append(chunck, b.Slice()...)
|
chunk = append(chunk, b.Slice()...)
|
||||||
b.Recycle(false)
|
|
||||||
}
|
}
|
||||||
|
|
||||||
return chunck
|
return source, chunk
|
||||||
}
|
}
|
||||||
|
|
||||||
// CompleteFileIterator generates a new iterator for reading a complete file.
|
// CompleteFileIterator generates a new iterator for reading a complete file.
|
||||||
@@ -718,10 +735,10 @@ func (iterator IBioSequence) CompleteFileIterator() IBioSequence {
|
|||||||
}()
|
}()
|
||||||
|
|
||||||
go func() {
|
go func() {
|
||||||
slice := iterator.Load()
|
source, slice := iterator.Load()
|
||||||
log.Printf("A batch of %d sequence is read", len(slice))
|
log.Printf("A batch of %d sequence is read", len(slice))
|
||||||
if len(slice) > 0 {
|
if len(slice) > 0 {
|
||||||
newIter.Push(MakeBioSequenceBatch(0, slice))
|
newIter.Push(MakeBioSequenceBatch(source, 0, slice))
|
||||||
}
|
}
|
||||||
newIter.Done()
|
newIter.Done()
|
||||||
}()
|
}()
|
||||||
@@ -735,7 +752,7 @@ func (iterator IBioSequence) CompleteFileIterator() IBioSequence {
|
|||||||
|
|
||||||
// It takes a slice of BioSequence objects, and returns an iterator that will return batches of
|
// It takes a slice of BioSequence objects, and returns an iterator that will return batches of
|
||||||
// BioSequence objects
|
// BioSequence objects
|
||||||
func IBatchOver(data obiseq.BioSequenceSlice,
|
func IBatchOver(source string, data obiseq.BioSequenceSlice,
|
||||||
size int, sizes ...int) IBioSequence {
|
size int, sizes ...int) IBioSequence {
|
||||||
|
|
||||||
newIter := MakeIBioSequence()
|
newIter := MakeIBioSequence()
|
||||||
@@ -755,7 +772,7 @@ func IBatchOver(data obiseq.BioSequenceSlice,
|
|||||||
if next > ldata {
|
if next > ldata {
|
||||||
next = ldata
|
next = ldata
|
||||||
}
|
}
|
||||||
newIter.Push(MakeBioSequenceBatch(batchid, data[i:next]))
|
newIter.Push(MakeBioSequenceBatch(source, batchid, data[i:next]))
|
||||||
batchid++
|
batchid++
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
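The `Rebatch` hunks above carry the source of the last batch read into every regrouped batch. The regrouping can be sketched without channels or goroutines as below. This is an illustrative simplification under stated assumptions: `Batch` and `rebatch` are hypothetical names, strings replace sequences, and `size` is assumed to be positive.

```go
package main

import "fmt"

// Batch is a hypothetical stand-in for BioSequenceBatch.
type Batch struct {
	source string
	order  int
	slice  []string
}

// rebatch regroups the sequences of the input batches into batches of
// exactly size elements (the last one may be shorter). Output batches are
// numbered from 0 and tagged with the most recently seen source.
// size is assumed to be > 0.
func rebatch(in []Batch, size int) []Batch {
	var out []Batch
	buffer := make([]string, 0, size)
	source := ""
	order := 0

	for _, b := range in {
		source = b.source
		for _, s := range b.slice {
			buffer = append(buffer, s)
			if len(buffer) == size {
				out = append(out, Batch{source: source, order: order, slice: buffer})
				order++
				// Start a fresh buffer so emitted batches never share storage.
				buffer = make([]string, 0, size)
			}
		}
	}

	// Flush the remaining sequences as a final, shorter batch.
	if len(buffer) > 0 {
		out = append(out, Batch{source: source, order: order, slice: buffer})
	}
	return out
}

func main() {
	in := []Batch{
		{source: "a.fasta", order: 0, slice: []string{"s1", "s2", "s3"}},
		{source: "a.fasta", order: 1, slice: []string{"s4", "s5"}},
	}
	for _, b := range rebatch(in, 2) {
		fmt.Println(b.source, b.order, b.slice)
	}
}
```

As in the diff, a batch that mixes sequences from two inputs takes the source of the input read last.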
@@ -4,9 +4,25 @@ import (
|
|||||||
"fmt"
|
"fmt"
|
||||||
"sync"
|
"sync"
|
||||||
|
|
||||||
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obidefault"
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
|
||||||
)
|
)
|
||||||
|
|
||||||
|
// IDistribute represents a distribution mechanism for biosequences.
|
||||||
|
// It manages the outputs of biosequences, provides a channel for
|
||||||
|
// new data notifications, and maintains a classifier for sequence
|
||||||
|
// classification. It is designed to facilitate the distribution
|
||||||
|
// of biosequences to various processing components.
|
||||||
|
//
|
||||||
|
// Fields:
|
||||||
|
// - outputs: A map that associates integer keys with corresponding
|
||||||
|
// biosequence outputs (IBioSequence).
|
||||||
|
// - news: A channel that sends notifications of new data available
|
||||||
|
// for processing, represented by integer identifiers.
|
||||||
|
// - classifier: A pointer to a BioSequenceClassifier used to classify
|
||||||
|
// the biosequences during distribution.
|
||||||
|
// - lock: A mutex for synchronizing access to the outputs and other
|
||||||
|
// shared resources to ensure thread safety.
|
||||||
type IDistribute struct {
|
type IDistribute struct {
|
||||||
outputs map[int]IBioSequence
|
outputs map[int]IBioSequence
|
||||||
news chan int
|
news chan int
|
||||||
@@ -26,16 +42,39 @@ func (dist *IDistribute) Outputs(key int) (IBioSequence, error) {
|
|||||||
return iter, nil
|
return iter, nil
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// News returns a channel that provides notifications of new data
|
||||||
|
// available for processing. The channel sends integer identifiers
|
||||||
|
// representing the new data.
|
||||||
func (dist *IDistribute) News() chan int {
|
func (dist *IDistribute) News() chan int {
|
||||||
return dist.news
|
return dist.news
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Classifier returns a pointer to the BioSequenceClassifier
|
||||||
|
// associated with the distribution mechanism. This classifier
|
||||||
|
// is used to classify biosequences during the distribution process.
|
||||||
func (dist *IDistribute) Classifier() *obiseq.BioSequenceClassifier {
|
func (dist *IDistribute) Classifier() *obiseq.BioSequenceClassifier {
|
||||||
return dist.classifier
|
return dist.classifier
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Distribute organizes the biosequences from the iterator into batches
|
||||||
|
// based on the provided classifier and batch sizes. It returns an
|
||||||
|
// IDistribute instance that manages the distribution of the sequences.
|
||||||
|
//
|
||||||
|
// Parameters:
|
||||||
|
// - class: A pointer to a BioSequenceClassifier used to classify
|
||||||
|
// the biosequences during distribution.
|
||||||
|
// - sizes: Optional integer values specifying the batch size. If
|
||||||
|
// no sizes are provided, a default batch size of 5000 is used.
|
||||||
|
//
|
||||||
|
// Returns:
|
||||||
|
// An IDistribute instance that contains the outputs of the
|
||||||
|
// classified biosequences, a channel for new data notifications,
|
||||||
|
// and the classifier used for distribution. The method operates
|
||||||
|
// asynchronously, processing the sequences in separate goroutines.
|
||||||
|
// It ensures that the outputs are closed and cleaned up once
|
||||||
|
// processing is complete.
|
||||||
func (iterator IBioSequence) Distribute(class *obiseq.BioSequenceClassifier, sizes ...int) IDistribute {
|
func (iterator IBioSequence) Distribute(class *obiseq.BioSequenceClassifier, sizes ...int) IDistribute {
|
||||||
batchsize := 5000
|
batchsize := obidefault.BatchSize()
|
||||||
|
|
||||||
outputs := make(map[int]IBioSequence, 100)
|
outputs := make(map[int]IBioSequence, 100)
|
||||||
slices := make(map[int]*obiseq.BioSequenceSlice, 100)
|
slices := make(map[int]*obiseq.BioSequenceSlice, 100)
|
||||||
@@ -61,9 +100,12 @@ func (iterator IBioSequence) Distribute(class *obiseq.BioSequenceClassifier, siz
|
|||||||
|
|
||||||
go func() {
|
go func() {
|
||||||
iterator = iterator.SortBatches()
|
iterator = iterator.SortBatches()
|
||||||
|
source := ""
|
||||||
|
|
||||||
for iterator.Next() {
|
for iterator.Next() {
|
||||||
seqs := iterator.Get()
|
seqs := iterator.Get()
|
||||||
|
source = seqs.Source()
|
||||||
|
|
||||||
for _, s := range seqs.Slice() {
|
for _, s := range seqs.Slice() {
|
||||||
key := class.Code(s)
|
key := class.Code(s)
|
||||||
slice, ok := slices[key]
|
slice, ok := slices[key]
|
||||||
@@ -84,18 +126,17 @@ func (iterator IBioSequence) Distribute(class *obiseq.BioSequenceClassifier, siz
|
|||||||
*slice = append(*slice, s)
|
*slice = append(*slice, s)
|
||||||
|
|
||||||
if len(*slice) == batchsize {
|
if len(*slice) == batchsize {
|
||||||
outputs[key].Push(MakeBioSequenceBatch(orders[key], *slice))
|
outputs[key].Push(MakeBioSequenceBatch(source, orders[key], *slice))
|
||||||
orders[key]++
|
orders[key]++
|
||||||
s := obiseq.MakeBioSequenceSlice()
|
s := obiseq.MakeBioSequenceSlice()
|
||||||
slices[key] = &s
|
slices[key] = &s
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
seqs.Recycle(false)
|
|
||||||
}
|
}
|
||||||
|
|
||||||
for key, slice := range slices {
|
for key, slice := range slices {
|
||||||
if len(*slice) > 0 {
|
if len(*slice) > 0 {
|
||||||
outputs[key].Push(MakeBioSequenceBatch(orders[key], *slice))
|
outputs[key].Push(MakeBioSequenceBatch(source, orders[key], *slice))
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
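The `Distribute` hunks above route each sequence to an output chosen by the classifier, emit a batch when a per-key buffer reaches the batch size, and then flush what is left. Below is a minimal synchronous sketch of that routing. The names `Batch`, `distribute` and `classify` are hypothetical, strings replace sequences, and a plain function replaces `obiseq.BioSequenceClassifier`.

```go
package main

import "fmt"

// Batch is a hypothetical stand-in for BioSequenceBatch.
type Batch struct {
	source string
	order  int
	slice  []string
}

// distribute sends every sequence to the output selected by classify.
// A batch is emitted each time a per-key buffer reaches batchSize;
// leftovers are flushed at the end. Orders are counted per key.
func distribute(in []Batch, classify func(string) int, batchSize int) map[int][]Batch {
	outputs := make(map[int][]Batch)
	buffers := make(map[int][]string)
	orders := make(map[int]int)
	source := ""

	for _, b := range in {
		source = b.source
		for _, s := range b.slice {
			key := classify(s)
			buffers[key] = append(buffers[key], s)
			if len(buffers[key]) == batchSize {
				outputs[key] = append(outputs[key],
					Batch{source: source, order: orders[key], slice: buffers[key]})
				orders[key]++
				buffers[key] = nil // the next append allocates a new buffer
			}
		}
	}

	for key, buf := range buffers {
		if len(buf) > 0 {
			outputs[key] = append(outputs[key],
				Batch{source: source, order: orders[key], slice: buf})
		}
	}
	return outputs
}

func main() {
	in := []Batch{{source: "x.fasta", order: 0, slice: []string{"a", "cc", "g", "tt", "a"}}}
	byLength := func(s string) int { return len(s) }
	for key, batches := range distribute(in, byLength, 2) {
		fmt.Println(key, len(batches))
	}
}
```

The real method pushes to per-key iterators from a goroutine. The sketch keeps only the bookkeeping of buffers, per-key orders and the propagated source.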
@@ -20,9 +20,11 @@ func IFragments(minsize, length, overlap, size, nworkers int) Pipeable {
|
|||||||
}()
|
}()
|
||||||
|
|
||||||
f := func(iterator IBioSequence, id int) {
|
f := func(iterator IBioSequence, id int) {
|
||||||
|
source := ""
|
||||||
for iterator.Next() {
|
for iterator.Next() {
|
||||||
news := obiseq.MakeBioSequenceSlice()
|
news := obiseq.MakeBioSequenceSlice()
|
||||||
sl := iterator.Get()
|
sl := iterator.Get()
|
||||||
|
source = sl.Source()
|
||||||
for _, s := range sl.Slice() {
|
for _, s := range sl.Slice() {
|
||||||
|
|
||||||
if s.Len() <= minsize {
|
if s.Len() <= minsize {
|
||||||
@@ -52,8 +54,7 @@ func IFragments(minsize, length, overlap, size, nworkers int) Pipeable {
|
|||||||
s.Recycle()
|
s.Recycle()
|
||||||
}
|
}
|
||||||
} // End of the slice loop
|
} // End of the slice loop
|
||||||
newiter.Push(MakeBioSequenceBatch(sl.Order(), news))
|
newiter.Push(MakeBioSequenceBatch(source, sl.Order(), news))
|
||||||
sl.Recycle(false)
|
|
||||||
} // End of the iterator loop
|
} // End of the iterator loop
|
||||||
|
|
||||||
// if len(news) > 0 {
|
// if len(news) > 0 {
|
||||||
|
|||||||
@@ -1,6 +1,7 @@
|
|||||||
package obiiter
|
package obiiter
|
||||||
|
|
||||||
import (
|
import (
|
||||||
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obidefault"
|
||||||
log "github.com/sirupsen/logrus"
|
log "github.com/sirupsen/logrus"
|
||||||
)
|
)
|
||||||
|
|
||||||
@@ -9,9 +10,11 @@ func (b BioSequenceBatch) IsPaired() bool {
|
|||||||
}
|
}
|
||||||
|
|
||||||
func (b BioSequenceBatch) PairedWith() BioSequenceBatch {
|
func (b BioSequenceBatch) PairedWith() BioSequenceBatch {
|
||||||
return MakeBioSequenceBatch(b.order,
|
return MakeBioSequenceBatch(
|
||||||
*b.slice.PairedWith())
|
b.Source(),
|
||||||
|
b.order,
|
||||||
|
*b.slice.PairedWith(),
|
||||||
|
)
|
||||||
}
|
}
|
||||||
|
|
||||||
func (b *BioSequenceBatch) PairTo(p *BioSequenceBatch) {
|
func (b *BioSequenceBatch) PairTo(p *BioSequenceBatch) {
|
||||||
@@ -38,8 +41,8 @@ func (iter IBioSequence) PairTo(p IBioSequence) IBioSequence {
|
|||||||
|
|
||||||
newIter := MakeIBioSequence()
|
newIter := MakeIBioSequence()
|
||||||
|
|
||||||
iter = iter.SortBatches()
|
iter = iter.SortBatches().Rebatch(obidefault.BatchSize())
|
||||||
p = p.SortBatches()
|
p = p.SortBatches().Rebatch(obidefault.BatchSize())
|
||||||
|
|
||||||
newIter.Add(1)
|
newIter.Add(1)
|
||||||
|
|
||||||
|
|||||||
@@ -3,7 +3,7 @@ package obiiter
|
|||||||
import (
|
import (
|
||||||
log "github.com/sirupsen/logrus"
|
log "github.com/sirupsen/logrus"
|
||||||
|
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obidefault"
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
|
||||||
)
|
)
|
||||||
|
|
||||||
@@ -18,7 +18,7 @@ import (
|
|||||||
func (iterator IBioSequence) MakeIWorker(worker obiseq.SeqWorker,
|
func (iterator IBioSequence) MakeIWorker(worker obiseq.SeqWorker,
|
||||||
breakOnError bool,
|
breakOnError bool,
|
||||||
sizes ...int) IBioSequence {
|
sizes ...int) IBioSequence {
|
||||||
nworkers := obioptions.CLIParallelWorkers()
|
nworkers := obidefault.ParallelWorkers()
|
||||||
|
|
||||||
if len(sizes) > 0 {
|
if len(sizes) > 0 {
|
||||||
nworkers = sizes[0]
|
nworkers = sizes[0]
|
||||||
@@ -34,13 +34,13 @@ func (iterator IBioSequence) MakeIWorker(worker obiseq.SeqWorker,
|
|||||||
// Parameters:
|
// Parameters:
|
||||||
// - predicate: A function that takes a sequence and returns a boolean value indicating whether the sequence satisfies a certain condition.
|
// - predicate: A function that takes a sequence and returns a boolean value indicating whether the sequence satisfies a certain condition.
|
||||||
// - worker: A function that takes a sequence and returns a modified version of the sequence.
|
// - worker: A function that takes a sequence and returns a modified version of the sequence.
|
||||||
// - sizes: Optional. One or more integers representing the number of workers to be used for parallel processing. If not provided, the number of workers will be determined by the obioptions.CLIReadParallelWorkers() function.
|
// - sizes: Optional. One or more integers representing the number of workers to be used for parallel processing. If not provided, the number of workers will be determined by the obidefault.ReadParallelWorkers() function.
|
||||||
//
|
//
|
||||||
// Return:
|
// Return:
|
||||||
// - newIter: A new IBioSequence iterator with the modified sequences.
|
// - newIter: A new IBioSequence iterator with the modified sequences.
|
||||||
func (iterator IBioSequence) MakeIConditionalWorker(predicate obiseq.SequencePredicate,
|
func (iterator IBioSequence) MakeIConditionalWorker(predicate obiseq.SequencePredicate,
|
||||||
worker obiseq.SeqWorker, breakOnError bool, sizes ...int) IBioSequence {
|
worker obiseq.SeqWorker, breakOnError bool, sizes ...int) IBioSequence {
|
||||||
nworkers := obioptions.CLIReadParallelWorkers()
|
nworkers := obidefault.ReadParallelWorkers()
|
||||||
|
|
||||||
if len(sizes) > 0 {
|
if len(sizes) > 0 {
|
||||||
nworkers = sizes[0]
|
nworkers = sizes[0]
|
||||||
@@ -63,7 +63,7 @@ func (iterator IBioSequence) MakeIConditionalWorker(predicate obiseq.SequencePre
|
|||||||
//
|
//
|
||||||
// The function returns a new IBioSequence containing the modified slices.
|
// The function returns a new IBioSequence containing the modified slices.
|
||||||
func (iterator IBioSequence) MakeISliceWorker(worker obiseq.SeqSliceWorker, breakOnError bool, sizes ...int) IBioSequence {
|
func (iterator IBioSequence) MakeISliceWorker(worker obiseq.SeqSliceWorker, breakOnError bool, sizes ...int) IBioSequence {
|
||||||
nworkers := obioptions.CLIParallelWorkers()
|
nworkers := obidefault.ParallelWorkers()
|
||||||
|
|
||||||
if len(sizes) > 0 {
|
if len(sizes) > 0 {
|
||||||
nworkers = sizes[0]
|
nworkers = sizes[0]
|
||||||
|
|||||||
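This comparison removes the package-local `RegisterAPipe`, `UnregisterPipe` and `WaitForLastPipe` from the iterator file and calls `obiutils.RegisterAPipe()` and `obiutils.UnregisterPipe()` instead. The page does not show how `obiutils` implements them. The sketch below shows one way to write such a registry around `sync.WaitGroup`; the lower-case names are hypothetical. It uses an atomic counter on the assumption that pipes may be registered from several goroutines, whereas the removed code used a plain `int`.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

var (
	pipes     sync.WaitGroup // tracks the pipes that are still open
	pipeCount atomic.Int64   // same information, readable for logging
)

// registerPipe records a newly opened pipe and returns the number of
// pipes registered after the call.
func registerPipe() int64 {
	pipes.Add(1)
	return pipeCount.Add(1)
}

// unregisterPipe records that a pipe was closed. The counter is updated
// before Done so that it is up to date once waitForLastPipe returns.
func unregisterPipe() int64 {
	n := pipeCount.Add(-1)
	pipes.Done()
	return n
}

// waitForLastPipe blocks until every registered pipe is unregistered.
func waitForLastPipe() {
	pipes.Wait()
}

func main() {
	for i := 0; i < 3; i++ {
		registerPipe()
		go unregisterPipe() // stands in for an iterator's Close()
	}
	waitForLastPipe()
	fmt.Println("open pipes:", pipeCount.Load())
}
```

In the diff, registration happens in the iterator constructors (`MakeIBioSequence`, `NewICSVRecord`) and unregistration in `Close`, so a program can wait for every pipeline to drain before it exits.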
348
pkg/obiitercsv/csv.go
Normal file
@@ -0,0 +1,348 @@
|
|||||||
|
package obiitercsv
|
||||||
|
|
||||||
|
import (
|
||||||
|
"fmt"
|
||||||
|
"slices"
|
||||||
|
"sync"
|
||||||
|
"sync/atomic"
|
||||||
|
"time"
|
||||||
|
|
||||||
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
|
||||||
|
"github.com/tevino/abool/v2"
|
||||||
|
|
||||||
|
log "github.com/sirupsen/logrus"
|
||||||
|
)
|
||||||
|
|
||||||
|
type CSVHeader []string
|
||||||
|
type CSVRecord map[string]interface{}
|
||||||
|
type CSVRecordBatch struct {
|
||||||
|
source string
|
||||||
|
data []CSVRecord
|
||||||
|
order int
|
||||||
|
}
|
||||||
|
|
||||||
|
var NilCSVRecordBatch = CSVRecordBatch{"", nil, -1}
|
||||||
|
|
||||||
|
// Structure implementing an iterator over CSVRecordBatch
|
||||||
|
// based on a channel.
|
||||||
|
type ICSVRecord struct {
|
||||||
|
channel chan CSVRecordBatch
|
||||||
|
current CSVRecordBatch
|
||||||
|
pushBack *abool.AtomicBool
|
||||||
|
all_done *sync.WaitGroup
|
||||||
|
lock *sync.RWMutex
|
||||||
|
buffer_size int32
|
||||||
|
batch_size int32
|
||||||
|
sequence_format string
|
||||||
|
finished *abool.AtomicBool
|
||||||
|
header CSVHeader
|
||||||
|
}
|
||||||
|
|
||||||
|
var NilIBioSequenceBatch = (*ICSVRecord)(nil)
|
||||||
|
|
||||||
|
func NewICSVRecord() *ICSVRecord {
|
||||||
|
|
||||||
|
i := ICSVRecord{
|
||||||
|
channel: make(chan CSVRecordBatch),
|
||||||
|
current: NilCSVRecordBatch,
|
||||||
|
pushBack: abool.New(),
|
||||||
|
batch_size: -1,
|
||||||
|
sequence_format: "",
|
||||||
|
finished: abool.New(),
|
||||||
|
header: make(CSVHeader, 0),
|
||||||
|
}
|
||||||
|
|
||||||
|
waiting := sync.WaitGroup{}
|
||||||
|
i.all_done = &waiting
|
||||||
|
lock := sync.RWMutex{}
|
||||||
|
i.lock = &lock
|
||||||
|
|
||||||
|
obiutils.RegisterAPipe()
|
||||||
|
|
||||||
|
return &i
|
||||||
|
}
|
||||||
|
|
||||||
|
func MakeCSVRecordBatch(source string, order int, data []CSVRecord) CSVRecordBatch {
|
||||||
|
return CSVRecordBatch{
|
||||||
|
source: source,
|
||||||
|
order: order,
|
||||||
|
data: data,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func (batch *CSVRecordBatch) Order() int {
|
||||||
|
return batch.order
|
||||||
|
}
|
||||||
|
|
||||||
|
func (batch *CSVRecordBatch) Source() string {
|
||||||
|
return batch.source
|
||||||
|
}
|
||||||
|
|
||||||
|
func (batch *CSVRecordBatch) Slice() []CSVRecord {
|
||||||
|
return batch.data
|
||||||
|
}
|
||||||
|
|
||||||
|
// NotEmpty reports whether the CSVRecordBatch contains at least one record.
|
||||||
|
//
|
||||||
|
// It checks that the data slice held by the CSVRecordBatch is not empty.
|
||||||
|
//
|
||||||
|
// Returns:
|
||||||
|
// - bool: True if the CSVRecordBatch is not empty, false otherwise.
|
||||||
|
func (batch *CSVRecordBatch) NotEmpty() bool {
|
||||||
|
return len(batch.data) > 0
|
||||||
|
}
|
||||||
|
|
||||||
|
// IsNil checks if the CSVRecordBatch's data slice is nil.
|
||||||
|
//
|
||||||
|
// This method returns a boolean value indicating whether the data slice of the CSVRecordBatch is nil or not.
|
||||||
|
//
|
||||||
|
// Parameters:
|
||||||
|
// - batch: The CSVRecordBatch to check for a nil data slice.
|
||||||
|
//
|
||||||
|
// Returns:
|
||||||
|
// - bool: True if the CSVRecordBatch's data slice is nil, false otherwise.
|
||||||
|
func (batch *CSVRecordBatch) IsNil() bool {
|
||||||
|
return batch.data == nil
|
||||||
|
}
|
||||||
|
|
||||||
|
func (iterator *ICSVRecord) Add(n int) {
|
||||||
|
if iterator == nil {
|
||||||
|
log.Panic("call of ICSVRecord.Add method on NilIBioSequenceBatch")
|
||||||
|
}
|
||||||
|
|
||||||
|
iterator.all_done.Add(n)
|
||||||
|
}
|
||||||
|
|
||||||
|
func (iterator *ICSVRecord) Done() {
|
||||||
|
if iterator == nil {
|
||||||
|
log.Panic("call of ICSVRecord.Done method on NilIBioSequenceBatch")
|
||||||
|
}
|
||||||
|
|
||||||
|
iterator.all_done.Done()
|
||||||
|
}
|
||||||
|
|
||||||
|
func (iterator *ICSVRecord) Unlock() {
|
||||||
|
if iterator == nil {
|
||||||
|
log.Panic("call of ICSVRecord.Unlock method on NilIBioSequenceBatch")
|
||||||
|
}
|
||||||
|
|
||||||
|
iterator.lock.Unlock()
|
||||||
|
}
|
||||||
|
|
||||||
|
func (iterator *ICSVRecord) Lock() {
|
||||||
|
if iterator == nil {
|
||||||
|
log.Panic("call of ICSVRecord.Lock method on NilIBioSequenceBatch")
|
||||||
|
}
|
||||||
|
|
||||||
|
iterator.lock.Lock()
|
||||||
|
}
|
||||||
|
|
||||||
|
func (iterator *ICSVRecord) RLock() {
|
||||||
|
if iterator == nil {
|
||||||
|
log.Panic("call of ICSVRecord.RLock method on NilIBioSequenceBatch")
|
||||||
|
}
|
||||||
|
|
||||||
|
iterator.lock.RLock()
|
||||||
|
}
|
||||||
|
|
||||||
|
func (iterator *ICSVRecord) RUnlock() {
|
||||||
|
if iterator == nil {
|
||||||
|
log.Panic("call of ICSVRecord.RUnlock method on NilIBioSequenceBatch")
|
||||||
|
}
|
||||||
|
|
||||||
|
iterator.lock.RUnlock()
|
||||||
|
}
|
||||||
|
|
||||||
|
func (iterator *ICSVRecord) Wait() {
|
||||||
|
if iterator == nil {
|
||||||
|
log.Panic("call of ICSVRecord.Wait method on NilIBioSequenceBatch")
|
||||||
|
}
|
||||||
|
|
||||||
|
iterator.all_done.Wait()
|
||||||
|
}
|
||||||
|
|
||||||
|
func (iterator *ICSVRecord) Channel() chan CSVRecordBatch {
|
||||||
|
if iterator == nil {
|
||||||
|
log.Panic("call of ICSVRecord.Channel method on NilIBioSequenceBatch")
|
||||||
|
}
|
||||||
|
|
||||||
|
return iterator.channel
|
||||||
|
}
|
||||||
|
|
||||||
|
func (iterator *ICSVRecord) IsNil() bool {
|
||||||
|
if iterator == nil {
|
||||||
|
log.Panic("call of ICSVRecord.IsNil method on NilIBioSequenceBatch")
|
||||||
|
}
|
||||||
|
|
||||||
|
return iterator == nil
|
||||||
|
}
|
||||||
|
|
||||||
|
func (iterator *ICSVRecord) BatchSize() int {
|
||||||
|
if iterator == nil {
|
||||||
|
log.Panic("call of ICSVRecord.BatchSize method on NilIBioSequenceBatch")
|
||||||
|
}
|
||||||
|
|
||||||
|
return int(atomic.LoadInt32(&iterator.batch_size))
|
||||||
|
}
|
||||||
|
|
||||||
|
func (iterator *ICSVRecord) SetBatchSize(size int) error {
|
||||||
|
if size >= 0 {
|
||||||
|
atomic.StoreInt32(&iterator.batch_size, int32(size))
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
|
||||||
|
return fmt.Errorf("size (%d) cannot be negative", size)
|
||||||
|
}
|
||||||
|
|
||||||
|
func (iterator *ICSVRecord) Split() *ICSVRecord {
|
||||||
|
iterator.lock.RLock()
|
||||||
|
defer iterator.lock.RUnlock()
|
||||||
|
i := ICSVRecord{
|
||||||
|
channel: iterator.channel,
|
||||||
|
current: NilCSVRecordBatch,
|
||||||
|
pushBack: abool.New(),
|
||||||
|
all_done: iterator.all_done,
|
||||||
|
buffer_size: iterator.buffer_size,
|
||||||
|
batch_size: iterator.batch_size,
|
||||||
|
sequence_format: iterator.sequence_format,
|
||||||
|
finished: iterator.finished,
|
||||||
|
header: iterator.header,
|
||||||
|
}
|
||||||
|
lock := sync.RWMutex{}
|
||||||
|
i.lock = &lock
|
||||||
|
|
||||||
|
return &i
|
||||||
|
}
|
||||||
|
|
||||||
|
func (iterator *ICSVRecord) Header() CSVHeader {
|
||||||
|
return iterator.header
|
||||||
|
}
|
||||||
|
|
||||||
|
func (iterator *ICSVRecord) SetHeader(header CSVHeader) {
|
||||||
|
iterator.header = header
|
||||||
|
}
|
||||||
|
|
||||||
|
func (iterator *ICSVRecord) AppendField(field string) {
|
||||||
|
iterator.header.AppendField(field)
|
||||||
|
}
|
||||||
|
|
||||||
|
func (iterator *ICSVRecord) Next() bool {
|
||||||
|
if iterator.pushBack.IsSet() {
|
||||||
|
iterator.pushBack.UnSet()
|
||||||
|
return true
|
||||||
|
}
|
||||||
|
|
||||||
|
if iterator.finished.IsSet() {
|
||||||
|
return false
|
||||||
|
}
|
||||||
|
|
||||||
|
next, ok := (<-iterator.channel)
|
||||||
|
|
||||||
|
if ok {
|
||||||
|
iterator.current = next
|
||||||
|
return true
|
||||||
|
}
|
||||||
|
|
||||||
|
iterator.current = NilCSVRecordBatch
|
||||||
|
iterator.finished.Set()
|
||||||
|
return false
|
||||||
|
}
|
||||||
|
|
||||||
|
func (iterator *ICSVRecord) PushBack() {
|
||||||
|
if !iterator.current.IsNil() {
|
||||||
|
iterator.pushBack.Set()
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// The 'Get' method returns the instance of CSVRecordBatch
|
||||||
|
// currently pointed to by the iterator. You have to use the
|
||||||
|
// 'Next' method to move to the next entry before calling
|
||||||
|
// 'Get' to retrieve the following instance.
|
||||||
|
func (iterator *ICSVRecord) Get() CSVRecordBatch {
|
||||||
|
return iterator.current
|
||||||
|
}
|
||||||
|
|
||||||
|
func (iterator *ICSVRecord) Push(batch CSVRecordBatch) {
|
||||||
|
if batch.IsNil() {
|
||||||
|
log.Panicln("A Nil batch is pushed on the channel")
|
||||||
|
}
|
||||||
|
// if batch.Len() == 0 {
|
||||||
|
// log.Panicln("An empty batch is pushed on the channel")
|
||||||
|
// }
|
||||||
|
|
||||||
|
iterator.channel <- batch
|
||||||
|
}
|
||||||
|
|
||||||
|
func (iterator *ICSVRecord) Close() {
|
||||||
|
close(iterator.channel)
|
||||||
|
obiutils.UnregisterPipe()
|
||||||
|
}
|
||||||
|
|
||||||
|
func (iterator *ICSVRecord) WaitAndClose() {
|
||||||
|
iterator.Wait()
|
||||||
|
|
||||||
|
for len(iterator.Channel()) > 0 {
|
||||||
|
time.Sleep(time.Millisecond)
|
||||||
|
}
|
||||||
|
|
||||||
|
iterator.Close()
|
||||||
|
}
|
||||||
|
|
||||||
|
// Finished returns 'true' value if no more data is available
|
||||||
|
// from the iterator.
|
||||||
|
func (iterator *ICSVRecord) Finished() bool {
|
||||||
|
return iterator.finished.IsSet()
|
||||||
|
}
|
||||||
|
|
||||||
|
// SortBatches returns a new iterator that delivers the batches in increasing order of their order number.
|
||||||
|
func (iterator *ICSVRecord) SortBatches(sizes ...int) *ICSVRecord {
|
||||||
|
|
||||||
|
newIter := NewICSVRecord()
|
||||||
|
|
||||||
|
newIter.Add(1)
|
||||||
|
|
||||||
|
go func() {
|
||||||
|
newIter.WaitAndClose()
|
||||||
|
}()
|
||||||
|
|
||||||
|
next_to_send := 0
|
||||||
|
//log.Println("wait for batch #", next_to_send)
|
||||||
|
received := make(map[int]CSVRecordBatch)
|
||||||
|
go func() {
|
||||||
|
for iterator.Next() {
|
||||||
|
batch := iterator.Get()
|
||||||
|
// log.Println("\nPushd seq #\n", batch.order, next_to_send)
|
||||||
|
|
||||||
|
if batch.order == next_to_send {
|
||||||
|
newIter.channel <- batch
|
||||||
|
next_to_send++
|
||||||
|
//log.Println("\nwait for batch #\n", next_to_send)
|
||||||
|
batch, ok := received[next_to_send]
|
||||||
|
for ok {
|
||||||
|
newIter.channel <- batch
|
||||||
|
delete(received, next_to_send)
|
||||||
|
next_to_send++
|
||||||
|
batch, ok = received[next_to_send]
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
received[batch.order] = batch
|
||||||
|
}
|
||||||
|
}
|
||||||
|
newIter.Done()
|
||||||
|
}()
|
||||||
|
|
||||||
|
return newIter
|
||||||
|
|
||||||
|
}
|
||||||
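The reordering idea used by SortBatches can be sketched in isolation. The example below is only an illustration under stated assumptions: the `batch` type, `reorder` and `collect` are hypothetical names, not part of obitools4, and the sketch leaves out the WaitGroup and pipe registration handled by the real iterator. Batches that arrive early are parked in a map until the expected order number shows up.

```go
package main

import "fmt"

// batch is a hypothetical stand-in for CSVRecordBatch: only the order matters here.
type batch struct {
	order int
	data  string
}

// reorder emits batches in increasing order, buffering early arrivals in a map.
func reorder(in <-chan batch) <-chan batch {
	out := make(chan batch)
	go func() {
		defer close(out)
		pending := make(map[int]batch)
		next := 0
		for b := range in {
			pending[b.order] = b
			// Flush every batch that is now contiguous with the last one sent.
			for {
				p, ok := pending[next]
				if !ok {
					break
				}
				out <- p
				delete(pending, next)
				next++
			}
		}
	}()
	return out
}

// collect feeds a fixed arrival sequence through reorder and returns the emitted orders.
func collect(arrival []int) []int {
	in := make(chan batch, len(arrival))
	for _, o := range arrival {
		in <- batch{order: o, data: fmt.Sprintf("batch-%d", o)}
	}
	close(in)
	var got []int
	for b := range reorder(in) {
		got = append(got, b.order)
	}
	return got
}

func main() {
	fmt.Println(collect([]int{2, 0, 1, 4, 3}))
}
```

The map only grows while a gap exists, so memory use depends on how far out of order the producers run.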
|
|
||||||
|
func (iterator *ICSVRecord) Consume() {
|
||||||
|
for iterator.Next() {
|
||||||
|
iterator.Get()
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func (head *CSVHeader) AppendField(field string) {
|
||||||
|
if !slices.Contains(*head, field) {
|
||||||
|
*head = append(*head, field)
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -10,6 +10,7 @@ import (
|
|||||||
"slices"
|
"slices"
|
||||||
|
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
|
||||||
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obistats"
|
||||||
log "github.com/sirupsen/logrus"
|
log "github.com/sirupsen/logrus"
|
||||||
)
|
)
|
||||||
|
|
||||||
@@ -39,6 +40,25 @@ var iupac = map[byte][]uint64{
|
|||||||
'n': {0, 1, 2, 3},
|
'n': {0, 1, 2, 3},
|
||||||
}
|
}
|
||||||
|
|
||||||
|
var revcompnuc = map[byte]byte{
|
||||||
|
'a': 't',
|
||||||
|
'c': 'g',
|
||||||
|
'g': 'c',
|
||||||
|
't': 'a',
|
||||||
|
'u': 'a',
|
||||||
|
'r': 'y',
|
||||||
|
'y': 'r',
|
||||||
|
's': 's',
|
||||||
|
'w': 'w',
|
||||||
|
'k': 'm',
|
||||||
|
'm': 'k',
|
||||||
|
'b': 'v',
|
||||||
|
'd': 'h',
|
||||||
|
'h': 'd',
|
||||||
|
'v': 'b',
|
||||||
|
'n': 'n',
|
||||||
|
}
|
||||||
|
|
||||||
var decode = map[uint64]byte{
|
var decode = map[uint64]byte{
|
||||||
0: 'a',
|
0: 'a',
|
||||||
1: 'c',
|
1: 'c',
|
||||||
@@ -293,10 +313,48 @@ func (g *DeBruijnGraph) LongestPath(max_length int) []uint64 {
|
|||||||
return path
|
return path
|
||||||
}
|
}
|
||||||
|
|
||||||
func (g *DeBruijnGraph) LongestConsensus(id string) (*obiseq.BioSequence, error) {
|
func (g *DeBruijnGraph) LongestConsensus(id string, min_cov float64) (*obiseq.BioSequence, error) {
|
||||||
|
if g.Len() == 0 {
|
||||||
|
return nil, fmt.Errorf("graph is empty")
|
||||||
|
}
|
||||||
//path := g.LongestPath(max_length)
|
//path := g.LongestPath(max_length)
|
||||||
path := g.HaviestPath()
|
path := g.HaviestPath()
|
||||||
s := g.DecodePath(path)
|
|
||||||
|
spath := path
|
||||||
|
|
||||||
|
if min_cov > 0 {
|
||||||
|
wp := make([]uint, len(path))
|
||||||
|
|
||||||
|
for i, n := range path {
|
||||||
|
wp[i] = g.graph[n]
|
||||||
|
}
|
||||||
|
|
||||||
|
mp := uint(float64(obistats.Mode(wp))*min_cov + 0.5)
|
||||||
|
|
||||||
|
from := 0
|
||||||
|
for i, n := range path {
|
||||||
|
if g.graph[n] < mp {
|
||||||
|
from = i + 1
|
||||||
|
} else {
|
||||||
|
break
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
to := len(path)
|
||||||
|
|
||||||
|
for i := len(path) - 1; i >= 0; i-- {
|
||||||
|
n := path[i]
|
||||||
|
if g.graph[n] < mp {
|
||||||
|
to = i
|
||||||
|
} else {
|
||||||
|
break
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
spath = path[from:to]
|
||||||
|
}
|
||||||
|
|
||||||
|
s := g.DecodePath(spath)
|
||||||
|
|
||||||
if len(s) > 0 {
|
if len(s) > 0 {
|
||||||
seq := obiseq.NewBioSequence(
|
seq := obiseq.NewBioSequence(
|
||||||
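The coverage trimming added to LongestConsensus can be illustrated on a plain slice of weights. This is a minimal sketch, not the project's code: `mode` is a hypothetical stand-in for `obistats.Mode` (its tie-breaking rule is an assumption), and `trimLowCoverage` returns the `from` and `to` bounds instead of slicing a graph path. The threshold is the mode of the weights multiplied by `minCov`, rounded to the nearest integer.

```go
package main

import "fmt"

// mode returns the most frequent value in ws; the smallest value wins ties
// (an assumption made for this sketch).
func mode(ws []uint) uint {
	counts := make(map[uint]int)
	for _, w := range ws {
		counts[w]++
	}
	var best uint
	bestCount := 0
	for w, c := range counts {
		if c > bestCount || (c == bestCount && w < best) {
			best, bestCount = w, c
		}
	}
	return best
}

// trimLowCoverage returns the half-open range [from, to) left after removing
// leading and trailing positions whose weight is below mode*minCov.
func trimLowCoverage(weights []uint, minCov float64) (int, int) {
	threshold := uint(float64(mode(weights))*minCov + 0.5)

	from := 0
	for from < len(weights) && weights[from] < threshold {
		from++
	}

	to := len(weights)
	// The "to > from" guard is specific to this sketch: it keeps the range
	// valid even if every weight is below the threshold.
	for to > from && weights[to-1] < threshold {
		to--
	}
	return from, to
}

func main() {
	from, to := trimLowCoverage([]uint{1, 2, 10, 10, 9, 10, 3, 1}, 0.5)
	fmt.Println(from, to)
}
```

Only the two ends are trimmed; a low-coverage position in the middle of the path is kept.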
@@ -368,6 +426,48 @@ func (g *DeBruijnGraph) Weight(index uint64) int {
|
|||||||
return int(val)
|
return int(val)
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// WeightMode returns the mode weight of the nodes in the DeBruijnGraph.
|
||||||
|
//
|
||||||
|
// It builds the distribution of node weights and keeps the most frequent weight, ignoring weights of 0 and 1.
|
||||||
|
// Finally, it returns the mode weight as an integer.
|
||||||
|
//
|
||||||
|
// Returns:
|
||||||
|
// - int: the mode weight value.
|
||||||
|
func (g *DeBruijnGraph) WeightMode() int {
|
||||||
|
dist := make(map[uint]int)
|
||||||
|
|
||||||
|
for _, w := range g.graph {
|
||||||
|
dist[w]++
|
||||||
|
}
|
||||||
|
|
||||||
|
maxi := 0
|
||||||
|
wmax := uint(0)
|
||||||
|
|
||||||
|
for k, v := range dist {
|
||||||
|
if v > maxi && k > 1 {
|
||||||
|
maxi = v
|
||||||
|
wmax = k
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
return int(wmax)
|
||||||
|
}
|
||||||
|
|
||||||
|
// WeightMean returns the mean weight of the nodes in the DeBruijnGraph.
|
||||||
|
//
|
||||||
|
// Returns:
|
||||||
|
// - float64: the mean weight of the nodes in the graph.
|
||||||
|
func (g *DeBruijnGraph) WeightMean() float64 {
|
||||||
|
sum := 0
|
||||||
|
|
||||||
|
for _, w := range g.graph {
|
||||||
|
i := int(w)
|
||||||
|
sum += i
|
||||||
|
}
|
||||||
|
|
||||||
|
return float64(sum) / float64(len(g.graph))
|
||||||
|
}
|
||||||
|
|
||||||
// append appends a sequence of nucleotides to the DeBruijnGraph.
|
// append appends a sequence of nucleotides to the DeBruijnGraph.
|
||||||
//
|
//
|
||||||
// Parameters:
|
// Parameters:
|
||||||
@@ -618,7 +718,7 @@ func (g *DeBruijnGraph) HaviestPath() []uint64 {
|
|||||||
}
|
}
|
||||||
|
|
||||||
if slices.Contains(heaviestPath, currentNode) {
|
if slices.Contains(heaviestPath, currentNode) {
|
||||||
log.Fatalf("Cycle detected %v -> %v (%v) len(%v)", heaviestPath, currentNode, startNodes, len(heaviestPath))
|
log.Panicf("Cycle detected %v -> %v (%v) len(%v), graph: %v", heaviestPath, currentNode, startNodes, len(heaviestPath), g.Len())
|
||||||
return nil
|
return nil
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
@@ -97,8 +97,19 @@ func Index4mer(seq *obiseq.BioSequence, index *[][]int, buffer *[]byte) [][]int
|
|||||||
}
|
}
|
||||||
|
|
||||||
// FastShiftFourMer runs a Fast algorithm (similar to the one used in FASTA) to compare two sequences.
|
// FastShiftFourMer runs a Fast algorithm (similar to the one used in FASTA) to compare two sequences.
|
||||||
// The returned values are two integer values. The shift between both the sequences and the count of
|
//
|
||||||
// matching 4mer when this shift is applied between both the sequences.
|
// Parameters:
|
||||||
|
// - index: A precomputed index of 4mer positions in a reference sequence.
|
||||||
|
// - shifts: A map to store the shift and count of matching 4mers.
|
||||||
|
// - lindex: The length of the indexed reference sequence.
|
||||||
|
// - seq: The sequence to be compared with the reference sequence.
|
||||||
|
// - relscore: A boolean indicating whether to calculate the relative score.
|
||||||
|
// - buffer: A byte buffer for encoding the sequence.
|
||||||
|
//
|
||||||
|
// Return type:
|
||||||
|
// - int: The shift between the two sequences with the maximum score.
|
||||||
|
// - int: The count of matching 4mers at the maximum score.
|
||||||
|
// - float64: The maximum score.
|
||||||
func FastShiftFourMer(index [][]int, shifts *map[int]int, lindex int, seq *obiseq.BioSequence, relscore bool, buffer *[]byte) (int, int, float64) {
|
func FastShiftFourMer(index [][]int, shifts *map[int]int, lindex int, seq *obiseq.BioSequence, relscore bool, buffer *[]byte) (int, int, float64) {
|
||||||
|
|
||||||
iternal_buffer := Encode4mer(seq, buffer)
|
iternal_buffer := Encode4mer(seq, buffer)
|
||||||
@@ -126,12 +137,15 @@ func FastShiftFourMer(index [][]int, shifts *map[int]int, lindex int, seq *obise
|
|||||||
score := float64(count)
|
score := float64(count)
|
||||||
if relscore {
|
if relscore {
|
||||||
over := -shift
|
over := -shift
|
||||||
if shift > 0 {
|
switch {
|
||||||
|
case shift > 0:
|
||||||
over += lindex
|
over += lindex
|
||||||
} else {
|
case shift < 0:
|
||||||
over = seq.Len() - over
|
over = seq.Len() - over
|
||||||
|
default:
|
||||||
|
over = min(lindex, seq.Len())
|
||||||
}
|
}
|
||||||
score = score / float64(over)
|
score = score / float64(over-3)
|
||||||
}
|
}
|
||||||
if score > maxscore {
|
if score > maxscore {
|
||||||
maxshift = shift
|
maxshift = shift
|
||||||
|
|||||||
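The change to the relative score can be read as "matching 4-mers divided by the number of 4-mers that fit in the overlap". The sketch below mirrors the arithmetic of the hunk above with hypothetical helper names (`overlapLen`, `relativeScore`). The guard against a non-positive denominator is an addition of this sketch and does not appear in the diff.

```go
package main

import "fmt"

// overlapLen returns the number of positions shared by the indexed reference
// (length refLen) and the query (length seqLen) for a given shift, following
// the three cases of the switch in the diff.
func overlapLen(shift, refLen, seqLen int) int {
	switch {
	case shift > 0:
		return refLen - shift
	case shift < 0:
		return seqLen + shift
	default:
		if refLen < seqLen {
			return refLen
		}
		return seqLen
	}
}

// relativeScore divides the count of matching 4-mers by the number of 4-mers
// contained in the overlap, which is the overlap length minus 3.
func relativeScore(count, shift, refLen, seqLen int) float64 {
	kmers := overlapLen(shift, refLen, seqLen) - 3
	if kmers <= 0 {
		// Guard added for this sketch only.
		return 0
	}
	return float64(count) / float64(kmers)
}

func main() {
	fmt.Println(relativeScore(17, 0, 20, 30))
	fmt.Println(relativeScore(7, 10, 20, 30))
}
```

An overlap of n positions contains n-3 overlapping 4-mers, so dividing by `over-3` lets a perfect match reach a score of 1.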
297
pkg/obikmer/kmermap.go
Normal file
@@ -0,0 +1,297 @@
|
|||||||
|
package obikmer
|
||||||
|
|
||||||
|
import (
|
||||||
|
"os"
|
||||||
|
"sort"
|
||||||
|
"unsafe"
|
||||||
|
|
||||||
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obifp"
|
||||||
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
|
||||||
|
"github.com/schollz/progressbar/v3"
|
||||||
|
log "github.com/sirupsen/logrus"
|
||||||
|
)
|
||||||
|
|
||||||
|
type KmerMap[T obifp.FPUint[T]] struct {
|
||||||
|
index map[T][]*obiseq.BioSequence
|
||||||
|
Kmersize uint
|
||||||
|
kmermask T
|
||||||
|
|
||||||
|
leftMask T
|
||||||
|
rightMask T
|
||||||
|
sparseMask T
|
||||||
|
|
||||||
|
SparseAt int
|
||||||
|
}
|
||||||
|
|
||||||
|
type KmerMatch map[*obiseq.BioSequence]int
|
||||||
|
|
||||||
|
func (k *KmerMap[T]) KmerSize() uint {
|
||||||
|
return k.Kmersize
|
||||||
|
}
|
||||||
|
|
||||||
|
func (k *KmerMap[T]) Len() int {
|
||||||
|
return len(k.index)
|
||||||
|
}
|
||||||
|
|
||||||
|
func (k *KmerMap[T]) KmerAsString(kmer T) string {
|
||||||
|
buff := make([]byte, k.Kmersize)
|
||||||
|
ks := int(k.Kmersize)
|
||||||
|
|
||||||
|
if k.SparseAt >= 0 {
|
||||||
|
ks--
|
||||||
|
}
|
||||||
|
|
||||||
|
for i, j := 0, int(k.Kmersize)-1; i < ks; i++ {
|
||||||
|
code := kmer.And(obifp.From64[T](3)).AsUint64()
|
||||||
|
buff[j] = decode[code]
|
||||||
|
j--
|
||||||
|
if k.SparseAt >= 0 && j == k.SparseAt {
|
||||||
|
buff[j] = '#'
|
||||||
|
j--
|
||||||
|
}
|
||||||
|
kmer = kmer.RightShift(2)
|
||||||
|
}
|
||||||
|
|
||||||
|
return string(buff)
|
||||||
|
}
|
||||||
|
|
||||||
|
func (k *KmerMap[T]) NormalizedKmerSlice(sequence *obiseq.BioSequence, buff *[]T) []T {
|
||||||
|
|
||||||
|
if sequence.Len() < int(k.Kmersize) {
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
|
||||||
|
makeSparseAt := func(kmer T) T {
|
||||||
|
|
||||||
|
if k.SparseAt == -1 {
|
||||||
|
return kmer
|
||||||
|
}
|
||||||
|
|
||||||
|
return kmer.And(k.leftMask).RightShift(2).Or(kmer.And(k.rightMask))
|
||||||
|
}
|
||||||
|
|
||||||
|
normalizedKmer := func(fw, rv T) T {
|
||||||
|
|
||||||
|
fw = makeSparseAt(fw)
|
||||||
|
rv = makeSparseAt(rv)
|
||||||
|
|
||||||
|
if fw.LessThan(rv) {
|
||||||
|
return fw
|
||||||
|
}
|
||||||
|
|
||||||
|
return rv
|
||||||
|
}
|
||||||
|
|
||||||
|
current := obifp.ZeroUint[T]()
|
||||||
|
ccurrent := obifp.ZeroUint[T]()
|
||||||
|
lshift := uint(2 * (k.Kmersize - 1))
|
||||||
|
|
||||||
|
sup := sequence.Len() - int(k.Kmersize) + 1
|
||||||
|
|
||||||
|
var rep []T
|
||||||
|
if buff == nil {
|
||||||
|
rep = make([]T, 0, sup)
|
||||||
|
} else {
|
||||||
|
rep = (*buff)[:0]
|
||||||
|
}
|
||||||
|
|
||||||
|
nuc := sequence.Sequence()
|
||||||
|
|
||||||
|
size := 0
|
||||||
|
for i := 0; i < len(nuc); i++ {
|
||||||
|
current = current.LeftShift(2)
|
||||||
|
ccurrent = ccurrent.RightShift(2)
|
||||||
|
|
||||||
|
code := iupac[nuc[i]]
|
||||||
|
ccode := iupac[revcompnuc[nuc[i]]]
|
||||||
|
|
||||||
|
if len(code) != 1 {
|
||||||
|
current = obifp.ZeroUint[T]()
|
||||||
|
ccurrent = obifp.ZeroUint[T]()
|
||||||
|
size = 0
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
|
||||||
|
current = current.Or(obifp.From64[T](uint64(code[0])))
|
||||||
|
ccurrent = ccurrent.Or(obifp.From64[T](uint64(ccode[0])).LeftShift(lshift))
|
||||||
|
|
||||||
|
size++
|
||||||
|
|
||||||
|
if size == int(k.Kmersize) {
|
||||||
|
|
||||||
|
kmer := normalizedKmer(current, ccurrent)
|
||||||
|
rep = append(rep, kmer)
|
||||||
|
size--
|
||||||
|
}
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
return rep
|
||||||
|
}
|
||||||
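The rolling computation of normalized (canonical) k-mers can be sketched with a plain `uint64` in place of the generic `obifp` type. Everything here is illustrative: `canonicalKmers` is a hypothetical name, sparse mode is left out, and the forward word is explicitly masked to 2k bits, which is a choice of this sketch. The 2-bit codes follow the `decode` table of the package (a=0, c=1, g=2, t=3), so the complement of code c is 3-c.

```go
package main

import "fmt"

// code maps unambiguous lowercase nucleotides to their 2-bit codes.
var code = map[byte]uint64{'a': 0, 'c': 1, 'g': 2, 't': 3}

// canonicalKmers returns, for each window of k unambiguous nucleotides, the
// smaller of the forward and reverse-complement 2-bit encodings.
func canonicalKmers(seq string, k uint) []uint64 {
	if k == 0 || k > 32 || uint(len(seq)) < k {
		return nil
	}
	mask := uint64(1)<<(2*k) - 1 // keeps the lowest 2k bits
	hiShift := 2 * (k - 1)       // position of the leftmost base

	var fw, rv uint64
	size := uint(0)
	var out []uint64

	for i := 0; i < len(seq); i++ {
		c, ok := code[seq[i]]
		if !ok {
			// Ambiguous symbol: restart the window after it.
			fw, rv, size = 0, 0, 0
			continue
		}
		fw = (fw<<2 | c) & mask        // append the base on the right
		rv = rv>>2 | (3-c)<<hiShift    // prepend its complement on the left
		if size < k {
			size++
		}
		if size == k {
			if fw < rv {
				out = append(out, fw)
			} else {
				out = append(out, rv)
			}
		}
	}
	return out
}

func main() {
	fmt.Println(canonicalKmers("aacc", 2))
}
```

Because a k-mer and its reverse complement map to the same value, a sequence and its reverse strand produce the same set of keys in the index.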
|
|
||||||
|
func (k *KmerMap[T]) Push(sequence *obiseq.BioSequence, maxocc ...int) {
|
||||||
|
maxoccurs := -1
|
||||||
|
if len(maxocc) > 0 {
|
||||||
|
maxoccurs = maxocc[0]
|
||||||
|
}
|
||||||
|
kmers := k.NormalizedKmerSlice(sequence, nil)
|
||||||
|
for _, kmer := range kmers {
|
||||||
|
seqs := k.index[kmer]
|
||||||
|
if maxoccurs == -1 || len(seqs) <= maxoccurs {
|
||||||
|
seqs = append(seqs, sequence)
|
||||||
|
k.index[kmer] = seqs
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func (k *KmerMap[T]) Query(sequence *obiseq.BioSequence) KmerMatch {
|
||||||
|
kmers := k.NormalizedKmerSlice(sequence, nil)
|
||||||
|
seqs := make([]*obiseq.BioSequence, 0)
|
||||||
|
rep := make(KmerMatch)
|
||||||
|
|
||||||
|
for _, kmer := range kmers {
|
||||||
|
if candidates, ok := k.index[kmer]; ok {
|
||||||
|
seqs = append(seqs, candidates...)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
sort.Slice(seqs,
|
||||||
|
func(i, j int) bool {
|
||||||
|
return uintptr(unsafe.Pointer(seqs[i])) < uintptr(unsafe.Pointer(seqs[j]))
|
||||||
|
})
|
||||||
|
|
||||||
|
prevseq := (*obiseq.BioSequence)(nil)
|
||||||
|
n := 0
|
||||||
|
for _, seq := range seqs {
|
||||||
|
|
||||||
|
if seq != prevseq {
|
||||||
|
if prevseq != nil && prevseq != sequence {
|
||||||
|
rep[prevseq] = n
|
||||||
|
}
|
||||||
|
n = 1
|
||||||
|
prevseq = seq
|
||||||
|
}
|
||||||
|
|
||||||
|
n++
|
||||||
|
}
|
||||||
|
|
||||||
|
if prevseq != nil {
|
||||||
|
rep[prevseq] = n
|
||||||
|
}
|
||||||
|
|
||||||
|
return rep
|
||||||
|
}
|
||||||
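The purpose of Query, counting how many k-mers each indexed sequence shares with the query, can also be expressed with a map of counters. The sketch below is an alternative formulation for illustration only, not the method used in the diff (which sorts candidate pointers and counts runs). The names `countShared`, the string identifiers and the `self` parameter are assumptions of this example.

```go
package main

import "fmt"

// countShared tallies, for each candidate identifier, how many of the query
// k-mers it shares with the index. The query's own identifier is skipped.
func countShared(index map[uint64][]string, queryKmers []uint64, self string) map[string]int {
	hits := make(map[string]int)
	for _, kmer := range queryKmers {
		for _, id := range index[kmer] {
			if id != self {
				hits[id]++
			}
		}
	}
	return hits
}

func main() {
	index := map[uint64][]string{
		1: {"s1", "s2"},
		2: {"s2"},
		3: {"query"},
	}
	fmt.Println(countShared(index, []uint64{1, 2, 3}, "query"))
}
```

A map avoids the sort and the `unsafe` pointer comparison, at the cost of one hash lookup per hit.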
|
|
||||||
|
func (k *KmerMatch) FilterMinCount(mincount int) {
|
||||||
|
for seq, count := range *k {
|
||||||
|
if count < mincount {
|
||||||
|
delete(*k, seq)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func (k *KmerMatch) Len() int {
|
||||||
|
return len(*k)
|
||||||
|
}
|
||||||
|
|
||||||
|
func (k *KmerMatch) Sequences() obiseq.BioSequenceSlice {
|
||||||
|
ks := make([]*obiseq.BioSequence, 0, len(*k))
|
||||||
|
|
||||||
|
for seq := range *k {
|
||||||
|
ks = append(ks, seq)
|
||||||
|
}
|
||||||
|
|
||||||
|
return ks
|
||||||
|
}
|
||||||
|
|
||||||
|
func (k *KmerMatch) Max() *obiseq.BioSequence {
|
||||||
|
max := 0
|
||||||
|
var maxseq *obiseq.BioSequence
|
||||||
|
for seq, n := range *k {
|
||||||
|
if max < n {
|
||||||
|
max = n
|
||||||
|
maxseq = seq
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return maxseq
|
||||||
|
}
|
||||||
|
|
||||||
|
func NewKmerMap[T obifp.FPUint[T]](
|
||||||
|
sequences obiseq.BioSequenceSlice,
|
||||||
|
kmersize uint,
|
||||||
|
sparse bool,
|
||||||
|
maxoccurs int) *KmerMap[T] {
|
||||||
|
idx := make(map[T][]*obiseq.BioSequence)
|
||||||
|
|
||||||
|
sparseAt := -1
|
||||||
|
|
||||||
|
if sparse && kmersize%2 == 0 {
|
||||||
|
log.Warnf("Kmer size must be odd when using sparse mode")
|
||||||
|
kmersize++
|
||||||
|
}
|
||||||
|
|
||||||
|
if !sparse && kmersize%2 == 1 {
|
||||||
|
log.Warnf("Kmer size must be even when not using sparse mode")
|
||||||
|
kmersize--
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
if sparse {
|
||||||
|
sparseAt = int(kmersize / 2)
|
||||||
|
}
|
||||||
|
|
||||||
|
kmermask := obifp.OneUint[T]().LeftShift(kmersize * 2).Sub(obifp.OneUint[T]())
|
||||||
|
leftMask := obifp.ZeroUint[T]()
|
||||||
|
rightMask := obifp.ZeroUint[T]()
|
||||||
|
|
||||||
|
if sparseAt >= 0 {
|
||||||
|
if sparseAt >= int(kmersize) {
|
||||||
|
sparseAt = -1
|
||||||
|
} else {
|
||||||
|
pos := kmersize - 1 - uint(sparseAt)
|
||||||
|
left := uint(sparseAt) * 2
|
||||||
|
right := pos * 2
|
||||||
|
|
||||||
|
leftMask = obifp.OneUint[T]().LeftShift(left).Sub(obifp.OneUint[T]()).LeftShift(right + 2)
|
||||||
|
rightMask = obifp.OneUint[T]().LeftShift(right).Sub(obifp.OneUint[T]())
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
kmap := &KmerMap[T]{
|
||||||
|
Kmersize: kmersize,
|
||||||
|
kmermask: kmermask,
|
||||||
|
leftMask: leftMask,
|
||||||
|
rightMask: rightMask,
|
||||||
|
index: idx,
|
||||||
|
SparseAt: sparseAt,
|
||||||
|
}
|
||||||
|
|
||||||
|
n := len(sequences)
|
||||||
|
pbopt := make([]progressbar.Option, 0, 5)
|
||||||
|
pbopt = append(pbopt,
|
||||||
|
progressbar.OptionSetWriter(os.Stderr),
|
||||||
|
progressbar.OptionSetWidth(15),
|
||||||
|
progressbar.OptionShowCount(),
|
||||||
|
progressbar.OptionShowIts(),
|
||||||
|
progressbar.OptionSetDescription("Indexing kmers"),
|
||||||
|
)
|
||||||
|
|
||||||
|
bar := progressbar.NewOptions(n, pbopt...)
|
||||||
|
|
||||||
|
for i, sequence := range sequences {
|
||||||
|
kmap.Push(sequence, maxoccurs)
|
||||||
|
if i%100 == 0 {
|
||||||
|
bar.Add(100)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
if maxoccurs >= 0 {
|
||||||
|
for k, s := range kmap.index {
|
||||||
|
if len(s) >= maxoccurs {
|
||||||
|
delete(kmap.index, k)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
return kmap
|
||||||
|
}
|
||||||
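The left and right masks built in NewKmerMap implement the sparse mode: the nucleotide at position `SparseAt` is removed from the packed k-mer. A minimal sketch of the same bit manipulation on a `uint64` follows; `dropMiddle` is a hypothetical helper, and the layout (2 bits per base, leftmost base in the most significant bits) is taken from the diff.

```go
package main

import "fmt"

// dropMiddle removes the 2-bit base at position sparseAt (0 = leftmost) from
// a k-mer of length k packed 2 bits per base, leftmost base first.
func dropMiddle(kmer uint64, k, sparseAt uint) uint64 {
	pos := k - 1 - sparseAt // number of bases to the right of the dropped one
	left := sparseAt * 2    // bit width of the left part
	right := pos * 2        // bit width of the right part

	leftMask := (uint64(1)<<left - 1) << (right + 2)
	rightMask := uint64(1)<<right - 1

	// Shift the left part down by one base and glue it to the right part.
	return (kmer&leftMask)>>2 | kmer&rightMask
}

func main() {
	// "acgta" packed as 00 01 10 11 00 is 108; dropping the middle base
	// leaves "acta", packed as 00 01 11 00.
	fmt.Println(dropMiddle(108, 5, 2))
}
```

With an odd k and `sparseAt = k/2`, both flanks have the same length, which is why the constructor forces an odd k-mer size in sparse mode.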
@@ -4,9 +4,10 @@ import (
|
|||||||
"bytes"
|
"bytes"
|
||||||
"fmt"
|
"fmt"
|
||||||
"os"
|
"os"
|
||||||
|
"reflect"
|
||||||
|
|
||||||
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obidefault"
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiiter"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiiter"
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
|
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
|
||||||
log "github.com/sirupsen/logrus"
|
log "github.com/sirupsen/logrus"
|
||||||
lua "github.com/yuin/gopher-lua"
|
lua "github.com/yuin/gopher-lua"
|
||||||
@@ -20,6 +21,8 @@ import (
|
|||||||
func NewInterpreter() *lua.LState {
|
func NewInterpreter() *lua.LState {
|
||||||
lua := lua.NewState()
|
lua := lua.NewState()
|
||||||
|
|
||||||
|
registerMutexType(lua)
|
||||||
|
|
||||||
RegisterObilib(lua)
|
RegisterObilib(lua)
|
||||||
RegisterObiContext(lua)
|
RegisterObiContext(lua)
|
||||||
|
|
||||||
@@ -117,11 +120,17 @@ func LuaWorker(proto *lua.FunctionProto) obiseq.SeqWorker {
|
|||||||
case *obiseq.BioSequenceSlice:
|
case *obiseq.BioSequenceSlice:
|
||||||
return *val, err
|
return *val, err
|
||||||
default:
|
default:
|
||||||
return nil, fmt.Errorf("worker function doesn't return the correct type")
|
r := reflect.TypeOf(val)
|
||||||
|
return nil, fmt.Errorf("worker function doesn't return the correct type %s", r)
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
return nil, fmt.Errorf("worker function doesn't return the correct type")
|
// If the worker returns nothing, the result is treated as a nil biosequence
|
||||||
|
if _, ok = lreponse.(*lua.LNilType); ok {
|
||||||
|
return nil, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
return nil, fmt.Errorf("worker function doesn't return the correct type %T", lreponse)
|
||||||
}
|
}
|
||||||
|
|
||||||
return f
|
return f
|
||||||
@@ -145,7 +154,7 @@ func LuaProcessor(iterator obiiter.IBioSequence, name, program string, breakOnEr
|
|||||||
newIter := obiiter.MakeIBioSequence()
|
newIter := obiiter.MakeIBioSequence()
|
||||||
|
|
||||||
if nworkers <= 0 {
|
if nworkers <= 0 {
|
||||||
nworkers = obioptions.CLIParallelWorkers()
|
nworkers = obidefault.ParallelWorkers()
|
||||||
}
|
}
|
||||||
|
|
||||||
newIter.Add(nworkers)
|
newIter.Add(nworkers)
|
||||||
@@ -225,8 +234,7 @@ func LuaProcessor(iterator obiiter.IBioSequence, name, program string, breakOnEr
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
newIter.Push(obiiter.MakeBioSequenceBatch(seqs.Order(), ns))
|
newIter.Push(obiiter.MakeBioSequenceBatch(seqs.Source(), seqs.Order(), ns))
|
||||||
seqs.Recycle(false)
|
|
||||||
}
|
}
|
||||||
|
|
||||||
newIter.Done()
|
newIter.Done()
|
||||||
|
|||||||
@@ -36,6 +36,8 @@ func obicontextGetSet(interpreter *lua.LState) int {
|
|||||||
__lua_obicontext.Store(key, float64(val))
|
__lua_obicontext.Store(key, float64(val))
|
||||||
case lua.LString:
|
case lua.LString:
|
||||||
__lua_obicontext.Store(key, string(val))
|
__lua_obicontext.Store(key, string(val))
|
||||||
|
case *lua.LUserData:
|
||||||
|
__lua_obicontext.Store(key, val.Value)
|
||||||
case *lua.LTable:
|
case *lua.LTable:
|
||||||
__lua_obicontext.Store(key, Table2Interface(interpreter, val))
|
__lua_obicontext.Store(key, Table2Interface(interpreter, val))
|
||||||
default:
|
default:
|
||||||
|
|||||||
@@ -1,6 +1,8 @@
|
|||||||
package obilua
|
package obilua
|
||||||
|
|
||||||
import (
|
import (
|
||||||
|
"sync"
|
||||||
|
|
||||||
log "github.com/sirupsen/logrus"
|
log "github.com/sirupsen/logrus"
|
||||||
|
|
||||||
lua "github.com/yuin/gopher-lua"
|
lua "github.com/yuin/gopher-lua"
|
||||||
@@ -44,8 +46,12 @@ func pushInterfaceToLua(L *lua.LState, val interface{}) {
|
|||||||
pushSliceNumericToLua(L, v)
|
pushSliceNumericToLua(L, v)
|
||||||
case []bool:
|
case []bool:
|
||||||
pushSliceBoolToLua(L, v)
|
pushSliceBoolToLua(L, v)
|
||||||
|
case []interface{}:
|
||||||
|
pushSliceInterfaceToLua(L, v)
|
||||||
case nil:
|
case nil:
|
||||||
L.Push(lua.LNil)
|
L.Push(lua.LNil)
|
||||||
|
case *sync.Mutex:
|
||||||
|
pushMutexToLua(L, v)
|
||||||
default:
|
default:
|
||||||
log.Fatalf("Cannot deal with value (%T) : %v", val, val)
|
log.Fatalf("Cannot deal with value (%T) : %v", val, val)
|
||||||
}
|
}
|
||||||
@@ -74,6 +80,29 @@ func pushMapStringInterfaceToLua(L *lua.LState, m map[string]interface{}) {
|
|||||||
L.Push(luaTable)
|
L.Push(luaTable)
|
||||||
}
|
}
|
||||||
|
|
||||||
|
func pushSliceInterfaceToLua(L *lua.LState, s []interface{}) {
|
||||||
|
// Create a new Lua table
|
||||||
|
luaTable := L.NewTable()
|
||||||
|
// Iterate over the Go slice and append each value to the Lua table
|
||||||
|
for _, value := range s {
|
||||||
|
switch v := value.(type) {
|
||||||
|
case int:
|
||||||
|
luaTable.Append(lua.LNumber(v))
|
||||||
|
case float64:
|
||||||
|
luaTable.Append(lua.LNumber(v))
|
||||||
|
case bool:
|
||||||
|
luaTable.Append(lua.LBool(v))
|
||||||
|
case string:
|
||||||
|
luaTable.Append(lua.LString(v))
|
||||||
|
default:
|
||||||
|
log.Fatalf("Doesn't deal with slice containing value %v of type %T", v, v)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Push the Lua table onto the stack
|
||||||
|
L.Push(luaTable)
|
||||||
|
}
|
||||||
|
|
||||||
// pushMapStringIntToLua creates a new Lua table and iterates over the Go map to set key-value pairs in the Lua table. It then pushes the Lua table onto the stack.
|
// pushMapStringIntToLua creates a new Lua table and iterates over the Go map to set key-value pairs in the Lua table. It then pushes the Lua table onto the stack.
|
||||||
//
|
//
|
||||||
// L *lua.LState - the Lua state
|
// L *lua.LState - the Lua state
|
||||||
|
|||||||
63
pkg/obilua/mutex.go
Normal file
@@ -0,0 +1,63 @@
|
|||||||
|
package obilua
|
||||||
|
|
||||||
|
import (
|
||||||
|
lua "github.com/yuin/gopher-lua"
|
||||||
|
|
||||||
|
"sync"
|
||||||
|
)
|
||||||
|
|
||||||
|
const luaMutexTypeName = "Mutex"
|
||||||
|
|
||||||
|
func registerMutexType(luaState *lua.LState) {
|
||||||
|
mutexType := luaState.NewTypeMetatable(luaMutexTypeName)
|
||||||
|
luaState.SetGlobal(luaMutexTypeName, mutexType)
|
||||||
|
|
||||||
|
luaState.SetField(mutexType, "new", luaState.NewFunction(newMutex))
|
||||||
|
|
||||||
|
luaState.SetField(mutexType, "__index",
|
||||||
|
luaState.SetFuncs(luaState.NewTable(),
|
||||||
|
mutexMethods))
|
||||||
|
}
|
||||||
|
|
||||||
|
func mutex2Lua(interpreter *lua.LState, mutex *sync.Mutex) lua.LValue {
|
||||||
|
ud := interpreter.NewUserData()
|
||||||
|
ud.Value = mutex
|
||||||
|
interpreter.SetMetatable(ud, interpreter.GetTypeMetatable(luaMutexTypeName))
|
||||||
|
|
||||||
|
return ud
|
||||||
|
}
|
||||||
|
|
||||||
|
func pushMutexToLua(luaState *lua.LState, mutex *sync.Mutex) {
|
||||||
|
luaState.Push(mutex2Lua(luaState, mutex))
|
||||||
|
}
|
||||||
|
func newMutex(luaState *lua.LState) int {
|
||||||
|
m := &sync.Mutex{}
|
||||||
|
pushMutexToLua(luaState, m)
|
||||||
|
return 1
|
||||||
|
}
|
||||||
|
|
||||||
|
var mutexMethods = map[string]lua.LGFunction{
|
||||||
|
"lock": mutexLock,
|
||||||
|
"unlock": mutexUnlock,
|
||||||
|
}
|
||||||
|
|
||||||
|
func checkMutex(L *lua.LState) *sync.Mutex {
|
||||||
|
ud := L.CheckUserData(1)
|
||||||
|
if mutex, ok := ud.Value.(*sync.Mutex); ok {
|
||||||
|
return mutex
|
||||||
|
}
|
||||||
|
L.ArgError(1, "Mutex expected")
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
|
||||||
|
func mutexLock(L *lua.LState) int {
|
||||||
|
mutex := checkMutex(L)
|
||||||
|
mutex.Lock()
|
||||||
|
return 0
|
||||||
|
}
|
||||||
|
|
||||||
|
func mutexUnlock(L *lua.LState) int {
|
||||||
|
mutex := checkMutex(L)
|
||||||
|
mutex.Unlock()
|
||||||
|
return 0
|
||||||
|
}
|
||||||
@@ -47,18 +47,21 @@ func newObiSeq(luaState *lua.LState) int {
|
|||||||
}
|
}
|
||||||
|
|
||||||
var bioSequenceMethods = map[string]lua.LGFunction{
|
var bioSequenceMethods = map[string]lua.LGFunction{
|
||||||
"id": bioSequenceGetSetId,
|
"id": bioSequenceGetSetId,
|
||||||
"sequence": bioSequenceGetSetSequence,
|
"sequence": bioSequenceGetSetSequence,
|
||||||
"qualities": bioSequenceGetSetQualities,
|
"qualities": bioSequenceGetSetQualities,
|
||||||
"definition": bioSequenceGetSetDefinition,
|
"definition": bioSequenceGetSetDefinition,
|
||||||
"count": bioSequenceGetSetCount,
|
"count": bioSequenceGetSetCount,
|
||||||
"taxid": bioSequenceGetSetTaxid,
|
"taxid": bioSequenceGetSetTaxid,
|
||||||
"attribute": bioSequenceGetSetAttribute,
|
"attribute": bioSequenceGetSetAttribute,
|
||||||
"len": bioSequenceGetLength,
|
"len": bioSequenceGetLength,
|
||||||
"has_sequence": bioSequenceHasSequence,
|
"has_sequence": bioSequenceHasSequence,
|
||||||
"has_qualities": bioSequenceHasQualities,
|
"has_qualities": bioSequenceHasQualities,
|
||||||
"source": bioSequenceGetSource,
|
"source": bioSequenceGetSource,
|
||||||
"md5": bioSequenceGetMD5,
|
"md5": bioSequenceGetMD5,
|
||||||
|
"md5_string": bioSequenceGetMD5String,
|
||||||
|
"subsequence": bioSequenceGetSubsequence,
|
||||||
|
"reverse_complement": bioSequenceGetRevcomp,
|
||||||
}
|
}
|
||||||
|
|
||||||
// checkBioSequence checks if the first argument in the Lua stack is a *obiseq.BioSequence.
|
// checkBioSequence checks if the first argument in the Lua stack is a *obiseq.BioSequence.
|
||||||
@@ -147,10 +150,10 @@ func bioSequenceGetSetCount(luaState *lua.LState) int {
|
|||||||
func bioSequenceGetSetTaxid(luaState *lua.LState) int {
|
func bioSequenceGetSetTaxid(luaState *lua.LState) int {
|
||||||
s := checkBioSequence(luaState)
|
s := checkBioSequence(luaState)
|
||||||
if luaState.GetTop() == 2 {
|
if luaState.GetTop() == 2 {
|
||||||
s.SetTaxid(luaState.CheckInt(2))
|
s.SetTaxid(luaState.CheckString(2))
|
||||||
return 0
|
return 0
|
||||||
}
|
}
|
||||||
luaState.Push(lua.LNumber(s.Taxid()))
|
luaState.Push(lua.LString(s.Taxid()))
|
||||||
return 1
|
return 1
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -224,3 +227,30 @@ func bioSequenceGetMD5(luaState *lua.LState) int {
|
|||||||
luaState.Push(rt)
|
luaState.Push(rt)
|
||||||
return 1
|
return 1
|
||||||
}
|
}
|
||||||
|
|
||||||
|
func bioSequenceGetMD5String(luaState *lua.LState) int {
|
||||||
|
s := checkBioSequence(luaState)
|
||||||
|
md5 := s.MD5String()
|
||||||
|
luaState.Push(lua.LString(md5))
|
||||||
|
return 1
|
||||||
|
}
|
||||||
|
|
||||||
|
func bioSequenceGetSubsequence(luaState *lua.LState) int {
|
||||||
|
s := checkBioSequence(luaState)
|
||||||
|
start := luaState.CheckInt(2)
|
||||||
|
end := luaState.CheckInt(3)
|
||||||
|
subseq, err := s.Subsequence(start, end, false)
|
||||||
|
if err != nil {
|
||||||
|
luaState.RaiseError("%s : Error on subseq: %v", s.Id(), err)
|
||||||
|
return 0
|
||||||
|
}
|
||||||
|
luaState.Push(obiseq2Lua(luaState, subseq))
|
||||||
|
return 1
|
||||||
|
}
|
||||||
|
|
||||||
|
func bioSequenceGetRevcomp(luaState *lua.LState) int {
|
||||||
|
s := checkBioSequence(luaState)
|
||||||
|
revcomp := s.ReverseComplement(false)
|
||||||
|
luaState.Push(obiseq2Lua(luaState, revcomp))
|
||||||
|
return 1
|
||||||
|
}
|
||||||
|
|||||||
@@ -2,9 +2,10 @@ package obingslibrary
|
|||||||
|
|
||||||
import (
|
import (
|
||||||
"fmt"
|
"fmt"
|
||||||
"log"
|
|
||||||
"strings"
|
"strings"
|
||||||
|
|
||||||
|
log "github.com/sirupsen/logrus"
|
||||||
|
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiapat"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiapat"
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
|
||||||
|
|||||||
@@ -3,7 +3,7 @@ package obingslibrary
|
|||||||
import (
|
import (
|
||||||
"fmt"
|
"fmt"
|
||||||
"math"
|
"math"
|
||||||
"sort"
|
"slices"
|
||||||
|
|
||||||
log "github.com/sirupsen/logrus"
|
log "github.com/sirupsen/logrus"
|
||||||
|
|
||||||
@@ -87,6 +87,8 @@ func lookForTag(seq string, delimiter byte) string {
|
|||||||
|
|
||||||
i := len(seq) - 1
|
i := len(seq) - 1
|
||||||
|
|
||||||
|
// log.Warnf("Provided fragment : %s", string(seq))
|
||||||
|
|
||||||
for i >= 0 && seq[i] != delimiter {
|
for i >= 0 && seq[i] != delimiter {
|
||||||
i--
|
i--
|
||||||
}
|
}
|
||||||
@@ -107,6 +109,7 @@ func lookForTag(seq string, delimiter byte) string {
|
|||||||
return ""
|
return ""
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// log.Warnf("extracted : %s", string(seq[begin:end]))
|
||||||
return seq[begin:end]
|
return seq[begin:end]
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -172,11 +175,12 @@ func (marker *Marker) beginDelimitedTagExtractor(
|
|||||||
begin int,
|
begin int,
|
||||||
forward bool) string {
|
forward bool) string {
|
||||||
|
|
||||||
taglength := marker.Forward_spacer + marker.Forward_tag_length
|
// log.Warn("beginDelimitedTagExtractor")
|
||||||
|
taglength := 2*marker.Forward_spacer + marker.Forward_tag_length
|
||||||
delimiter := marker.Forward_tag_delimiter
|
delimiter := marker.Forward_tag_delimiter
|
||||||
|
|
||||||
if !forward {
|
if !forward {
|
||||||
taglength = marker.Reverse_spacer + marker.Reverse_tag_length
|
taglength = 2*marker.Reverse_spacer + marker.Reverse_tag_length
|
||||||
delimiter = marker.Reverse_tag_delimiter
|
delimiter = marker.Reverse_tag_delimiter
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -240,6 +244,7 @@ func (marker *Marker) endDelimitedTagExtractor(
|
|||||||
sequence *obiseq.BioSequence,
|
sequence *obiseq.BioSequence,
|
||||||
end int,
|
end int,
|
||||||
forward bool) string {
|
forward bool) string {
|
||||||
|
// log.Warn("endDelimitedTagExtractor")
|
||||||
|
|
||||||
taglength := marker.Reverse_spacer + marker.Reverse_tag_length
|
taglength := marker.Reverse_spacer + marker.Reverse_tag_length
|
||||||
delimiter := marker.Reverse_tag_delimiter
|
delimiter := marker.Reverse_tag_delimiter
|
||||||
@@ -335,25 +340,37 @@ func (marker *Marker) beginTagExtractor(
|
|||||||
sequence *obiseq.BioSequence,
|
sequence *obiseq.BioSequence,
|
||||||
begin int,
|
begin int,
|
||||||
forward bool) string {
|
forward bool) string {
|
||||||
|
// log.Warnf("Forward : %v -> %d %c", forward, marker.Forward_spacer, marker.Forward_tag_delimiter)
|
||||||
|
// log.Warnf("Forward : %v -> %d %c", forward, marker.Reverse_spacer, marker.Reverse_tag_delimiter)
|
||||||
if forward {
|
if forward {
|
||||||
|
if marker.Forward_tag_length == 0 {
|
||||||
|
return ""
|
||||||
|
}
|
||||||
|
|
||||||
if marker.Forward_tag_delimiter == 0 {
|
if marker.Forward_tag_delimiter == 0 {
|
||||||
return marker.beginFixedTagExtractor(sequence, begin, forward)
|
return marker.beginFixedTagExtractor(sequence, begin, forward)
|
||||||
} else {
|
} else {
|
||||||
if marker.Forward_tag_indels == 0 {
|
if marker.Forward_tag_indels == 0 {
|
||||||
|
// log.Warnf("Delimited tag for forward primers %s", marker.forward.String())
|
||||||
return marker.beginDelimitedTagExtractor(sequence, begin, forward)
|
return marker.beginDelimitedTagExtractor(sequence, begin, forward)
|
||||||
} else {
|
} else {
|
||||||
// log.Warn("Rescue tag for forward primers")
|
// log.Warnf("Rescue tag for forward primers %s", marker.forward.String())
|
||||||
return marker.beginRescueTagExtractor(sequence, begin, forward)
|
return marker.beginRescueTagExtractor(sequence, begin, forward)
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
} else {
|
} else {
|
||||||
|
if marker.Reverse_tag_length == 0 {
|
||||||
|
return ""
|
||||||
|
}
|
||||||
|
|
||||||
if marker.Reverse_tag_delimiter == 0 {
|
if marker.Reverse_tag_delimiter == 0 {
|
||||||
return marker.beginFixedTagExtractor(sequence, begin, forward)
|
return marker.beginFixedTagExtractor(sequence, begin, forward)
|
||||||
} else {
|
} else {
|
||||||
if marker.Reverse_tag_indels == 0 {
|
if marker.Reverse_tag_indels == 0 {
|
||||||
|
// log.Warnf("Delimited tag for reverse/complement primers %s", marker.creverse.String())
|
||||||
return marker.beginDelimitedTagExtractor(sequence, begin, forward)
|
return marker.beginDelimitedTagExtractor(sequence, begin, forward)
|
||||||
} else {
|
} else {
|
||||||
// log.Warn("Rescue tag for reverse/complement primers")
|
// log.Warnf("Rescue tag for reverse/complement primers %s", marker.creverse.String())
|
||||||
return marker.beginRescueTagExtractor(sequence, begin, forward)
|
return marker.beginRescueTagExtractor(sequence, begin, forward)
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
@@ -365,24 +382,34 @@ func (marker *Marker) endTagExtractor(
|
|||||||
end int,
|
end int,
|
||||||
forward bool) string {
|
forward bool) string {
|
||||||
if forward {
|
if forward {
|
||||||
|
if marker.Reverse_tag_length == 0 {
|
||||||
|
return ""
|
||||||
|
}
|
||||||
|
|
||||||
if marker.Reverse_tag_delimiter == 0 {
|
if marker.Reverse_tag_delimiter == 0 {
|
||||||
return marker.endFixedTagExtractor(sequence, end, forward)
|
return marker.endFixedTagExtractor(sequence, end, forward)
|
||||||
} else {
|
} else {
|
||||||
if marker.Reverse_tag_indels == 0 {
|
if marker.Reverse_tag_indels == 0 {
|
||||||
|
// log.Warnf("Delimited tag for reverse primers %s", marker.reverse.String())
|
||||||
return marker.endDelimitedTagExtractor(sequence, end, forward)
|
return marker.endDelimitedTagExtractor(sequence, end, forward)
|
||||||
} else {
|
} else {
|
||||||
// log.Warn("Rescue tag for reverse primers")
|
// log.Warnf("Rescue tag for reverse primers %s", marker.reverse.String())
|
||||||
return marker.endRescueTagExtractor(sequence, end, forward)
|
return marker.endRescueTagExtractor(sequence, end, forward)
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
} else {
|
} else {
|
||||||
|
if marker.Forward_tag_length == 0 {
|
||||||
|
return ""
|
||||||
|
}
|
||||||
|
|
||||||
if marker.Forward_tag_delimiter == 0 {
|
if marker.Forward_tag_delimiter == 0 {
|
||||||
return marker.endFixedTagExtractor(sequence, end, forward)
|
return marker.endFixedTagExtractor(sequence, end, forward)
|
||||||
} else {
|
} else {
|
||||||
if marker.Forward_tag_indels == 0 {
|
if marker.Forward_tag_indels == 0 {
|
||||||
|
// log.Warnf("Delimited tag for forward/complement primers %s", marker.cforward.String())
|
||||||
return marker.endDelimitedTagExtractor(sequence, end, forward)
|
return marker.endDelimitedTagExtractor(sequence, end, forward)
|
||||||
} else {
|
} else {
|
||||||
// log.Warn("Rescue tag for forward/complement primers")
|
// log.Warnf("Rescue tag for forward/complement primers %s", marker.cforward.String())
|
||||||
return marker.endRescueTagExtractor(sequence, end, forward)
|
return marker.endRescueTagExtractor(sequence, end, forward)
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
@@ -609,9 +636,7 @@ func (library *NGSLibrary) ExtractMultiBarcode(sequence *obiseq.BioSequence) (ob
|
|||||||
}
|
}
|
||||||
|
|
||||||
if len(matches) > 0 {
|
if len(matches) > 0 {
|
||||||
sort.Slice(matches, func(i, j int) bool {
|
slices.SortFunc(matches, func(a, b PrimerMatch) int { return a.Begin - b.Begin })
|
||||||
return matches[i].Begin < matches[j].Begin
|
|
||||||
})
|
|
||||||
|
|
||||||
state := 0
|
state := 0
|
||||||
var from PrimerMatch
|
var from PrimerMatch
|
||||||
|
|||||||
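The hunk above swaps `sort.Slice` for `slices.SortFunc`, whose comparator returns a negative, zero, or positive `int` rather than a `bool`. The following is a minimal sketch of that pattern, not the project's code: `primerMatch` and `sortByBegin` are hypothetical stand-ins (only the `Begin` field comes from the diff), and `cmp.Compare` is used in place of the subtraction `a.Begin - b.Begin`. It assumes Go 1.21 or later.

```go
package main

import (
	"cmp"
	"fmt"
	"slices"
)

// primerMatch is a hypothetical stand-in for the PrimerMatch type in the
// diff; only the Begin field is taken from the source.
type primerMatch struct {
	Name  string
	Begin int
}

// sortByBegin orders matches in place by ascending Begin position.
// cmp.Compare yields -1, 0 or +1, which is what slices.SortFunc expects.
func sortByBegin(matches []primerMatch) {
	slices.SortFunc(matches, func(a, b primerMatch) int {
		return cmp.Compare(a.Begin, b.Begin)
	})
}

func main() {
	matches := []primerMatch{{"rev", 120}, {"fwd", 3}, {"mid", 57}}
	sortByBegin(matches)
	for _, m := range matches {
		fmt.Println(m.Name, m.Begin)
	}
}
```

For non-negative sequence positions the subtraction form gives the same ordering; `cmp.Compare` simply avoids any overflow concern with extreme values. Like `sort.Slice`, `slices.SortFunc` does not guarantee a stable order for equal elements.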
@@ -36,8 +36,9 @@ func MakeNGSLibrary() NGSLibrary {
|
|||||||
}
|
}
|
||||||
|
|
||||||
func (library *NGSLibrary) GetMarker(forward, reverse string) (*Marker, bool) {
|
func (library *NGSLibrary) GetMarker(forward, reverse string) (*Marker, bool) {
|
||||||
pair := PrimerPair{strings.ToLower(forward),
|
forward = strings.ToLower(forward)
|
||||||
strings.ToLower(reverse)}
|
reverse = strings.ToLower(reverse)
|
||||||
|
pair := PrimerPair{forward, reverse}
|
||||||
marker, ok := (library.Markers)[pair]
|
marker, ok := (library.Markers)[pair]
|
||||||
|
|
||||||
if ok {
|
if ok {
|
||||||
|
|||||||
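The `GetMarker` hunk above lowercases both primer strings before building the map key, so lookups do not depend on the case in which primers were written. A minimal sketch of that idea follows, under stated assumptions: `primerPair`, `marker`, `library`, and `getMarker` are simplified, hypothetical versions of the types in the diff, and only the lookup path is shown (whatever the real function does after `if ok {` is not visible in this hunk and is not reproduced).

```go
package main

import (
	"fmt"
	"strings"
)

// primerPair and marker are simplified, hypothetical versions of the
// PrimerPair and Marker types referenced in the diff.
type primerPair struct {
	Forward string
	Reverse string
}

type marker struct {
	Name string
}

type library struct {
	markers map[primerPair]*marker
}

// getMarker normalises both primers to lower case before using them as a
// map key, so "ACGT" and "acgt" resolve to the same entry.
func (l *library) getMarker(forward, reverse string) (*marker, bool) {
	forward = strings.ToLower(forward)
	reverse = strings.ToLower(reverse)
	m, ok := l.markers[primerPair{forward, reverse}]
	return m, ok
}

func main() {
	lib := &library{markers: map[primerPair]*marker{
		primerPair{"acgt", "ttga"}: &marker{Name: "demo"},
	}}
	if m, ok := lib.getMarker("ACGT", "TtGa"); ok {
		fmt.Println("found:", m.Name)
	}
}
```

Reassigning `forward` and `reverse` first, as the new code does, keeps the normalised strings available for any later use in the function instead of only inside the key literal.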
@@ -1,7 +1,7 @@
|
|||||||
package obingslibrary
|
package obingslibrary
|
||||||
|
|
||||||
import (
|
import (
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obioptions"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obidefault"
|
||||||
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
|
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiseq"
|
||||||
)
|
)
|
||||||
|
|
||||||
@@ -126,8 +126,8 @@ func MakeOptions(setters []WithOption) Options {
|
|||||||
allowedMismatch: 0,
|
allowedMismatch: 0,
|
||||||
allowsIndel: false,
|
allowsIndel: false,
|
||||||
withProgressBar: false,
|
withProgressBar: false,
|
||||||
parallelWorkers: obioptions.CLIParallelWorkers(),
|
parallelWorkers: obidefault.ParallelWorkers(),
|
||||||
batchSize: obioptions.CLIBatchSize(),
|
batchSize: obidefault.BatchSize(),
|
||||||
}
|
}
|
||||||
|
|
||||||
opt := Options{&o}
|
opt := Options{&o}
|
||||||
|
|||||||
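`MakeOptions` above fills a struct with defaults and then wraps it as `Options{&o}`, which is the usual shape of a functional-options constructor. The sketch below illustrates that pattern only; it is not the package's implementation. The `pointer` field name, the `OptionAllowedMismatch` setter, the loop over `setters`, and the literal default values (standing in for `obidefault.ParallelWorkers()` and `obidefault.BatchSize()`) are all assumptions.

```go
package main

import "fmt"

// options holds the actual settings; the field names follow the diff.
type options struct {
	allowedMismatch int
	parallelWorkers int
	batchSize       int
}

// Options wraps a pointer so that setters can mutate shared state.
// The field name "pointer" is an assumption made for this sketch.
type Options struct {
	pointer *options
}

// WithOption is a setter applied to an Options value.
type WithOption func(Options)

// OptionAllowedMismatch is a hypothetical setter used for illustration.
func OptionAllowedMismatch(n int) WithOption {
	return func(o Options) { o.pointer.allowedMismatch = n }
}

// MakeOptions starts from defaults and applies each setter in order.
// The constants 4 and 1000 are placeholders for the obidefault calls.
func MakeOptions(setters []WithOption) Options {
	o := options{
		allowedMismatch: 0,
		parallelWorkers: 4,
		batchSize:       1000,
	}
	opt := Options{&o}
	for _, set := range setters {
		set(opt)
	}
	return opt
}

func main() {
	opt := MakeOptions([]WithOption{OptionAllowedMismatch(2)})
	fmt.Println(opt.pointer.allowedMismatch, opt.pointer.batchSize)
}
```

Reading defaults from a small package such as `obidefault` rather than from the CLI option package keeps library code free of a dependency on command-line parsing, which appears to be the motivation for the import change in this hunk.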
Some files were not shown because too many files have changed in this diff