Last commit version

Eric Coissac
2025-03-01 08:24:26 +01:00
parent 13cd4c86ac
commit 1e541eac4c


@@ -1,19 +1,29 @@
 # OBITools release notes
-## Latest changes
+## March 2nd, 2025. Release 4.3.0
+A new documentation website is available at https://obitools4.metabarcoding.org.
+Its development is still in progress.
 ### Breaking changes
-- In `obimultiplex`, the short version of the **--tag-list** option used to specify the list
-of tags and primers to be used for the demultiplexing has been changed from `-t` to `-s`.
+- In `obimultiplex`, the short version of the **--tag-list** option used to
+specify the list of tags and primers to be used for the demultiplexing has
+been changed from `-t` to `-s`.
 - The command `obifind` is now renamed `obitaxonomy`.
-- The **--taxdump** option used to specify the path to the taxdump containing the NCBI taxonomy
-has been renamed to **--taxonomy**.
+- The **--taxdump** option used to specify the path to the taxdump containing
+the NCBI taxonomy has been renamed to **--taxonomy**.
 ### Bug fixes
+- Correction of a bug when using paired sequence file with the **--out** option.
+- Correction of a bug in `obitag` when trying to annotate very short sequence of
+4 bases or less.
 - In `obipairing`, correct the stats `seq_a_single` and `seq_b_single` when
 on right alignment mode
@@ -21,16 +31,32 @@
 the batch size and not reading the qualities from the fastq files as `obiuniq`
 is producing only fasta output without qualities.
+- In `obitag`, correct the wrong assignment of the **obitag_bestmatch**
+attribute.
+- In `obiclean`, the **--no-progress-bar** option disables all progress bars,
+not just the data.
+- Several fixes in reading FASTA and FASTQ files, including some code
+simplification and factorization.
+- Fixed a bug in all obitools that caused the same file to be processed
+multiple times, when specifying a directory name as input.
 ### New features
+- `obigrep` add a new **--valid-taxid** option to keep only sequence with a
+valid taxid
 - `obiclean` add a new **--min-sample-count** option with a default value of 1,
 asking to filter out sequences which are not occurring in at least the
 specified number of samples.
 - `obitoaxonomy` a new **--dump|D** option allows for dumping a sub-taxonomy.
-- Taxonomy dump can now be provided as a four-columns CSV file to the **--taxonomy**
-option.
+- Taxonomy dump can now be provided as a four-columns CSV file to the
+**--taxonomy** option.
 - NCBI Taxonomy dump does not need to be uncompressed and unarchived anymore. The
 path of the tar and gziped dump file can be directly specified using the
@@ -44,8 +70,23 @@
 - `md5_string()`: returning the MD5 check sum as a hexadecimal string,
 - `subsequence(from,to)`: allows extracting a subsequence on a 0 based
 coordinate system, upper bound excluded like in go.
-- `reverse_complement`: returning a sequence object corresponding to the reverse complement
-of the current sequence.
+- `reverse_complement`: returning a sequence object corresponding to the
+reverse complement of the current sequence.
+### Enhancement
+- In every *OBITools* command, the progress bar is automatically deactivated
+when the standard error output is redirected.
+- Because Genbank and ENA:EMBL contain very large sequences, while OBITools4
+are optimized As Genbank and ENA:EMBL contain very large sequences, while
+OBITools4 is optimized for short sequences, `obipcr` faces some problems
+with excessive consumption of computer resources, especially memory. Several
+improvements in the tuning of the default `obipcr` parameters and some new
+features, currently only available for FASTA and FASTQ file readers, have
+been implemented to limit the memory impact of `obipcr` without changing the
+computational efficiency too much.
+- Logging system and therefore format, have been homogenized.
 ### Change of git repository
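The `subsequence(from,to)` entry in the hunk above states that coordinates are 0-based with the upper bound excluded, "like in go". As a minimal sketch of that convention only, the snippet below uses plain Go string slicing; the `subsequence` helper is hypothetical and merely mirrors the described semantics. It is not the OBITools scripting API.

```go
package main

import "fmt"

// subsequence is a hypothetical helper (not part of OBITools) that mirrors
// the convention described in the release notes: 0-based coordinates with
// the upper bound excluded, exactly like the Go slice expression seq[from:to].
func subsequence(seq string, from, to int) string {
	return seq[from:to]
}

func main() {
	seq := "acgtacgt"
	// Keeps positions 2, 3 and 4; position 5 (the upper bound) is excluded.
	sub := subsequence(seq, 2, 5)
	fmt.Println(sub, len(sub)) // gta 3
}
```

Under this convention the extracted fragment always has `to - from` bases, and `subsequence(seq, 0, len(seq))` returns the whole sequence.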
@@ -54,35 +95,16 @@
 Take care for using the new install script for retrieving the new version.
 ```bash
-curl -L https://raw.githubusercontent.com/metabarcoding/obitools4/master/install_obitools.sh \
+curl -L https://metabarcoding.org/obitools4/install.sh \
 | bash
 ```
 or with options:
 ```bash
-curl -L https://raw.githubusercontent.com/metabarcoding/obitools4/master/install_obitools.sh \
+curl -L https://metabarcoding.org/obitools4/install.sh \
 | bash -s -- --install-dir test_install --obitools-prefix k
 ```
-### CPU limitation
-- By default, *OBITools4* tries to use all the computing power available on
-your computer. In some circumstances this can be problematic (e.g. if you
-are running on a computer cluster managed by your university). You can limit
-the number of CPU cores used by *OBITools4* or by using the **--max-cpu**
-option or by setting the **OBIMAXCPU** environment variable. Some strange
-behavior of *OBITools4* has been observed when users try to limit the
-maximum number of usable CPU cores to one. This seems to be caused by the Go
-language, and it is not obvious to get *OBITools4* to run correctly on a
-single core in all circumstances. Therefore, if you ask to use a single
-core, **OBITools4** will print a warning message and actually set this
-parameter to two cores. If you really want a single core, you can use the
-**--force-one-core** option. But be aware that this can lead to incorrect
-calculations.
-### New features
 - The output of the obitools will evolve to produce results only in standard
 formats such as fasta and fastq. For non-sequential data, the output will be
 in CSV format, with the separator `,`, the decimal separator `.`, and a
@@ -162,30 +184,22 @@
 Special data lines starting with `@param` in the first column allow configuring the algorithm. The options **--template** provided an over
 commented example of the CSV format, including all the possible options.
-### Enhancement
-- In every *OBITools* command, the progress bar is automatically deactivated
-when the standard error output is redirected.
-- Because Genbank and ENA:EMBL contain very large sequences, while OBITools4
-are optimized As Genbank and ENA:EMBL contain very large sequences, while
-OBITools4 is optimized for short sequences, `obipcr` faces some problems
-with excessive consumption of computer resources, especially memory. Several
-improvements in the tuning of the default `obipcr` parameters and some new
-features, currently only available for FASTA and FASTQ file readers, have
-been implemented to limit the memory impact of `obipcr` without changing the
-computational efficiency too much.
-- Logging system and therefore format, have been homogenized.
-### Bug
-- In `obitag`, correct the wrong assignment of the **obitag_bestmatch**
-attribute.
-- In `obiclean`, the **--no-progress-bar** option disables all progress bars,
-not just the data.
-- Several fixes in reading FASTA and FASTQ files, including some code
-simplification and factorization.
-- Fixed a bug in all obitools that caused the same file to be processed
-multiple times, when specifying a directory name as input.
+### CPU limitation
+- By default, *OBITools4* tries to use all the computing power available on
+your computer. In some circumstances this can be problematic (e.g. if you
+are running on a computer cluster managed by your university). You can limit
+the number of CPU cores used by *OBITools4* or by using the **--max-cpu**
+option or by setting the **OBIMAXCPU** environment variable. Some strange
+behavior of *OBITools4* has been observed when users try to limit the
+maximum number of usable CPU cores to one. This seems to be caused by the Go
+language, and it is not obvious to get *OBITools4* to run correctly on a
+single core in all circumstances. Therefore, if you ask to use a single
+core, **OBITools4** will print a warning message and actually set this
+parameter to two cores. If you really want a single core, you can use the
+**--force-one-core** option. But be aware that this can lead to incorrect
+calculations.
 ## April 2nd, 2024. Release 4.2.0