# Preface {.unnumbered}

Development of the first version of *OBITools* began in 2005, at the beginning of the DNA metabarcoding story at the Laboratoire d'Ecologie Alpine (LECA) in Grenoble. At that time, Pierre Taberlet, François Pompanon and I were thinking about the potential of this new methodology, which was still under development. Pierre and François concentrated on the laboratory methods, while I focused on the tools for analysing the sequences produced.

Two ideas were behind this development: I wanted something modular, and something easy to extend. To achieve the first goal, I decided to implement *OBITools* as a suite of Unix commands that mimic the classic Unix commands but are dedicated to sequence files. The basic Unix commands are very useful for automatically manipulating, parsing and editing text files. They work as a stream, processing the input text line by line. The result is a new text file that can be used as input for the next command. Such a design makes it possible to quickly develop a text processing pipeline by chaining simple elementary operations. The *OBITools* are the exact counterpart of these basic Unix commands, but the basic unit of information they process is a sequence (potentially spanning several lines of text), not a single line of text. Most *OBITools* consume sequence files and produce sequence files. Thus, the principles of chaining and modularity are respected.

To achieve the second goal, and to make it easy to extend the *OBITools* as our ideas about processing DNA metabarcoding data evolved, it was decided to develop them in an interpreted language: Python. Python 2, the version available at the time, allowed us to develop the *OBITools* efficiently. When parts of the algorithms were computationally demanding, they were implemented in C and linked to the Python code. Even though Python is not the most efficient language available, and even though computers were not as powerful as they are today, the datasets we could produce with 454 sequencers or early Solexa machines were small enough to be processed in a reasonable time.

The first public version of *OBITools* was [*OBITools2*](https://metabarcoding.org/obitools) [@Boyer2016-gq]. It was actually a cleaned-up and documented version of the *OBITools* that had been running at LECA for years and had not really been distributed, except to a few collaborators. This is where *OBITools* started its public life. Since then, the DNA metabarcoding spring schools have provided, and still provide, user training every year. But *OBITools2* soon suffered from two limitations: it was developed in Python 2, which was increasingly abandoned in favour of Python 3, and the size of the data kept increasing with the new Illumina machines. Python's intrinsic slowness, coupled with the increasing size of the datasets, made *OBITools* computation times increasingly long. The end of all maintenance of Python 2 by its developers also made a new version of *OBITools* necessary.

[*OBITools3*](https://metabarcoding.org/obitools3) was the first response to this crisis. Developed and maintained by [Céline Mercier](https://www.celine-mercier.info), *OBITools3* attempted to address several limitations of *OBITools2*. It is a completely new codebase, mainly written in Python 3, with most of the lower-level code written in C for efficiency. For the same reason, *OBITools3* also abandoned text files in favour of binary files. The text files have been replaced by a database structure that keeps track of every operation performed on the data.

Here we present *OBITools4*, which can be seen as a return to the origins of *OBITools*. While *OBITools3* offered traceability of analyses, which is in line with the concept of open science, and faster execution, *OBITools2* was more versatile and could be used for more than the analysis of DNA metabarcoding data. *OBITools4* is the third full implementation of *OBITools*. The idea behind this new version is to go back to the original design of *OBITools*, which ran on text files containing sequences, like the classic Unix commands, while running at least as fast as *OBITools3* and taking advantage of the multicore architecture of modern laptops. To this end, the idea of relying on an interpreted language was abandoned. The *OBITools4* are now fully implemented in the [Go](https://go.dev) language, with the exception of a few small pieces of specific code already implemented very efficiently in C.

*OBITools4* also implement a new format for the annotations inserted in the header of every sequence. Rather than relying on a format specific to *OBITools*, *OBITools4* use the [JSON](https://www.json.org) format by default. This simplifies the writing of parsers in any language, and thus allows *OBITools* to interact more easily with other software.
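To illustrate why JSON annotations simplify the writing of parsers, here is a minimal sketch in Go that relies only on the standard library. It is an illustration under stated assumptions, not the *OBITools4* parser: the function name `ParseHeader`, the example identifier and the annotation keys are hypothetical, and the title line is assumed to consist of an identifier followed by a space and a single JSON object.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// ParseHeader splits a sequence title line into an identifier and a map
// of annotations. Assumed layout (illustrative only):
//
//	>identifier {JSON object}
func ParseHeader(line string) (string, map[string]any, error) {
	line = strings.TrimPrefix(strings.TrimSpace(line), ">")

	// Everything before the first space is taken as the identifier.
	id, rest, _ := strings.Cut(line, " ")
	annotations := map[string]any{}

	rest = strings.TrimSpace(rest)
	if rest == "" {
		return id, annotations, nil // no annotations on this sequence
	}

	// The standard JSON decoder does all the parsing work.
	if err := json.Unmarshal([]byte(rest), &annotations); err != nil {
		return id, nil, fmt.Errorf("annotations of %q are not a JSON object: %w", id, err)
	}
	return id, annotations, nil
}

func main() {
	// Hypothetical header; the keys "count" and "sample" are examples.
	id, annotations, err := ParseHeader(`>seq0001 {"count": 12, "sample": "soil_A"}`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// JSON numbers are decoded as float64 when the target is map[string]any.
	fmt.Println(id, annotations["count"], annotations["sample"])
}
```

The same few lines could be written with the JSON library of any other language, which is the interoperability benefit mentioned above.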