*********************************************
The OBITools3 Data Management System (OBIDMS)
*********************************************

A complete DNA metabarcoding experiment relies on several kinds of data:

- the sequence data resulting from the sequencing of the PCR products,
- the description of the samples, including all their metadata,
- one or several reference databases used for the taxonomic annotation,
- one or several taxonomy databases.

Up to now, each of these categories of data was stored in separate
files, and nothing made it mandatory to keep them together.

The OBITools3 `Data Management System` (DMS) can be viewed as a basic
database system.

OBIDMS UML
==========

.. image:: ./UML/OBIDMS_UML.png

:download:`HTML version of the OBIDMS UML file <UML/ObiDMS_UML.class.violet.html>`

An OBIDMS directory contains:

* one `OBIDMS history file <#obidms-history-file>`_
* OBIDMS column directories

OBIDMS column directories
=========================

An OBIDMS column directory contains:

* all the versions of one OBIDMS column, each stored in a separate file (`OBIDMS column files <#obidms-column-files>`_)
* one `OBIDMS version file <#obidms-version-files>`_

The directory name is the column attribute followed by the extension ``.obicol``.

Example: ``count.obicol``

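
The naming rule can be illustrated with a short Python sketch based on ``pathlib``. It is not the OBITools3 implementation: the helpers ``column_directory`` and ``list_columns`` are hypothetical, and the only rule they rely on is the one stated above, namely that a column directory is named after its attribute with the ``.obicol`` extension.

.. code-block:: python

   from pathlib import Path

   COLUMN_DIR_SUFFIX = ".obicol"


   def column_directory(dms_dir, attribute):
       """Path of the directory holding every version of one column.

       Hypothetical helper: only the <attribute>.obicol naming rule
       comes from the OBIDMS description.
       """
       return Path(dms_dir) / f"{attribute}{COLUMN_DIR_SUFFIX}"


   def list_columns(dms_dir):
       """Sorted attribute names of the column directories found in an
       OBIDMS directory; any other entry (e.g. the history file) is ignored."""
       return sorted(
           entry.name[: -len(COLUMN_DIR_SUFFIX)]
           for entry in Path(dms_dir).iterdir()
           if entry.is_dir() and entry.name.endswith(COLUMN_DIR_SUFFIX)
       )

With this sketch, ``column_directory("my_dms", "count")`` points to ``my_dms/count.obicol`` (``my_dms`` being a hypothetical OBIDMS directory).
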
OBIDMS column files
===================

Each OBIDMS column file contains:

* a header containing metadata, whose size is a multiple of PAGESIZE (4096 bytes on most systems)
* lines of data, all of the same `OBIType <types.html#obitypes>`_

Header
------

The header of an OBIDMS column contains:

* the endian byte order
* the header size (a multiple of PAGESIZE)
* the number of lines of data allocated
* the number of lines of data actually used
* the `OBIType <types.html#obitypes>`_ (type of the data)
* the date of creation of the file
* the version of the OBIDMS column
* the column name
* the element names
* optional comments

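
To make the page-size rule concrete, here is a minimal Python sketch that packs most of the header fields listed above with the standard ``struct`` module and pads the result to a multiple of the page size. The binary layout is an assumption made for illustration only (field order, integer widths, a ``b"L"``/``b"B"`` endianness flag, a numeric OBIType code, a Unix timestamp for the creation date, and no element names); the real OBIDMS header layout may differ.

.. code-block:: python

   import mmap
   import struct
   import sys
   import time

   PAGESIZE = mmap.PAGESIZE  # 4096 bytes on most systems

   # Hypothetical fixed part of the header, in this order: endianness flag,
   # header size, lines allocated, lines used, OBIType code, creation date
   # (Unix time), column version, name length, comments length.
   FIELDS = "cQQQBqIII"


   def _format(flag):
       # The flag records the byte order the header was written with.
       return ("<" if flag == b"L" else ">") + FIELDS


   def header_size_for(n_bytes, pagesize=PAGESIZE):
       """Smallest multiple of pagesize (at least one page) holding n_bytes."""
       return max(1, -(-n_bytes // pagesize)) * pagesize


   def pack_header(name, obitype_code, version, line_count, lines_used,
                   comments="", created=None):
       flag = b"L" if sys.byteorder == "little" else b"B"
       fmt = _format(flag)
       created = int(time.time()) if created is None else created
       name_b, comments_b = name.encode("utf-8"), comments.encode("utf-8")
       size = header_size_for(struct.calcsize(fmt) + len(name_b) + len(comments_b))
       fixed = struct.pack(fmt, flag, size, line_count, lines_used, obitype_code,
                           created, version, len(name_b), len(comments_b))
       # Zero padding makes the data lines start on a page boundary.
       return (fixed + name_b + comments_b).ljust(size, b"\0")


   def unpack_header(buf):
       fmt = _format(buf[:1])
       (_, size, line_count, lines_used, obitype_code, created, version,
        name_len, comments_len) = struct.unpack_from(fmt, buf)
       start = struct.calcsize(fmt)
       name = buf[start:start + name_len].decode("utf-8")
       end = start + name_len + comments_len
       comments = buf[start + name_len:end].decode("utf-8")
       return {"header_size": size, "line_count": line_count,
               "lines_used": lines_used, "obitype_code": obitype_code,
               "created": created, "version": version,
               "name": name, "comments": comments}

A short header occupies exactly one page, while long comments push it to the next multiple of PAGESIZE.
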
Data
----

A line of data corresponds to a vector of elements, each associated with an element name.
The element names are stored in the header. An element is matched with its name by position:
the n-th element of a line corresponds to the n-th name in the list of element names. This
structure allows the storage of dictionary-like data.

Example: in the header, the attribute ``elements_names`` is associated with the value
``"sample_1;sample_2;sample_3"``, and a line of data of type ``OBInt_t`` is stored as an
``OBInt_t`` vector of size three, e.g. ``5|8|4``.

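
The positional matching between element names and elements can be sketched in Python as follows. The helper names are hypothetical; only the ``;``-separated ``elements_names`` value and the example line come from this page. ``IntColumnData`` additionally assumes, for illustration only, that integer lines of a fixed width are stored back to back as 64-bit values, so that line ``i`` occupies slots ``i * width`` to ``(i + 1) * width - 1``.

.. code-block:: python

   from array import array


   def line_to_dict(elements_names, line):
       """Pair each element of a line with the name at the same position."""
       names = elements_names.split(";")
       if len(names) != len(line):
           raise ValueError("a line must have one element per element name")
       return dict(zip(names, line))


   def dict_to_line(elements_names, values, missing=None):
       """Build a line vector in the order given by elements_names."""
       return [values.get(name, missing) for name in elements_names.split(";")]


   class IntColumnData:
       """Hypothetical in-memory store of fixed-width integer lines."""

       def __init__(self, elements_names):
           self.elements_names = elements_names
           self.width = len(elements_names.split(";"))
           self.values = array("q")  # 64-bit signed integers, lines back to back

       def append_line(self, line):
           if len(line) != self.width:
               raise ValueError("wrong number of elements")
           self.values.extend(line)

       def get_line(self, i):
           start = i * self.width
           return self.values[start:start + self.width].tolist()

For the example above, ``line_to_dict("sample_1;sample_2;sample_3", [5, 8, 4])`` returns ``{"sample_1": 5, "sample_2": 8, "sample_3": 4}``.
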
Mandatory columns
-----------------

The following columns must exist in an OBIDMS directory:

* the sequence identifiers column (type ``OBIStr_t``)

File name
---------

Each column file is named after the attribute of the data it contains, followed by ``@``,
its version number, and the extension ``.odc``.

Example: ``count@3.odc``

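
Building and parsing column file names can be sketched in Python with a regular expression. The function names are hypothetical; the pattern only encodes the ``<attribute>@<version>.odc`` rule described above.

.. code-block:: python

   import re

   # <attribute>@<version>.odc, e.g. count@3.odc
   _COLUMN_FILE = re.compile(r"^(?P<attribute>.+)@(?P<version>\d+)\.odc$")


   def column_file_name(attribute, version):
       """Name of the file holding one version of a column."""
       if version < 0:
           raise ValueError("version numbers start at 0")
       return f"{attribute}@{version}.odc"


   def parse_column_file_name(file_name):
       """Split a column file name into (attribute, version)."""
       match = _COLUMN_FILE.match(file_name)
       if match is None:
           raise ValueError(f"not an OBIDMS column file name: {file_name!r}")
       return match.group("attribute"), int(match.group("version"))

Because ``.+`` is greedy, the pattern splits on the last ``@``, so the sketch also copes with an attribute name that happens to contain one.
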
Modifications
-------------

An OBIDMS column file can only be modified by the process that created it, and only while its status is Open.

To modify a closed OBIDMS column file, a process must first clone it. Cloning creates a new version of the
file that belongs to the process: only that process can modify the new file, and only as long as its status
is Open. Once the process has finished writing the new version, it sets the file's status to Closed, and the
file can never be modified again.

Consequently, a column is stored in one file per version: a single file if it has only one version, several
files if it has several.

All the versions of a column are stored in the same directory.

Versioning
----------

The first version of a column is numbered 0, and each new version increments that
number by 1.

The number of the latest version of an OBIDMS column is stored in the `OBIDMS version file <#obidms-version-files>`_ of its directory.

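
The modification and versioning rules can be modelled as a small state machine. The following Python sketch is hypothetical (the class and method names are not part of OBITools3): it keeps column files in memory, uses an integer to stand for the owning process, and encodes only the rules stated above. A column file is writable only by its creator and only while Open, closing is final, and every new version, whether created or cloned, takes the next number starting from 0.

.. code-block:: python

   OPEN, CLOSED = "Open", "Closed"


   class ColumnFile:
       """One version of a column (one .odc file), held in memory."""

       def __init__(self, attribute, version, owner, lines=()):
           self.attribute = attribute
           self.version = version
           self.owner = owner  # identifier of the creating process
           self.lines = [list(line) for line in lines]
           self.status = OPEN

       def _check_writable(self, pid):
           if pid != self.owner or self.status != OPEN:
               raise PermissionError(
                   f"{self.attribute}@{self.version} is not writable by {pid}")

       def append(self, pid, line):
           self._check_writable(pid)
           self.lines.append(list(line))

       def close(self, pid):
           self._check_writable(pid)
           self.status = CLOSED  # final: no method reopens a file


   class Column:
       """All the versions of one column (one .obicol directory)."""

       def __init__(self, attribute):
           self.attribute = attribute
           self.versions = []

       @property
       def latest_version(self):
           # The number the version file would record (-1: no version yet).
           return len(self.versions) - 1

       def _new_version(self, pid, lines=()):
           column_file = ColumnFile(self.attribute, self.latest_version + 1,
                                    pid, lines)
           self.versions.append(column_file)
           return column_file

       def create(self, pid):
           return self._new_version(pid)

       def clone(self, version, pid):
           """Copy an existing version into a new, open version owned by pid."""
           return self._new_version(pid, self.versions[version].lines)

Cloning copies the lines, so writing to the new version never alters the closed one.
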
OBIDMS version files
====================

Each OBIDMS column is associated with an OBIDMS version file in its directory, which contains the number
of the latest version of the column.

File name
---------

OBIDMS version files are named after the attribute of the data contained in the column, and
have the extension ``.odv``.

Example: ``count.odv``

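
Reading and updating a version file can be sketched in Python as below. This page does not specify how the number is encoded, so the sketch assumes, purely for illustration, that it is stored as decimal text; the function names are hypothetical. Writing to a temporary file and renaming it is a design choice of this sketch, so that a reader never sees a half-written number.

.. code-block:: python

   from pathlib import Path


   def version_file(column_dir, attribute):
       """<attribute>.odv inside the column directory, e.g. count.odv."""
       return Path(column_dir) / f"{attribute}.odv"


   def read_latest_version(column_dir, attribute):
       """Latest version number, or None if no version file exists yet."""
       path = version_file(column_dir, attribute)
       if not path.exists():
           return None
       return int(path.read_text().strip())  # assumed decimal-text encoding


   def write_latest_version(column_dir, attribute, version):
       """Record a new latest version through a temporary file and a rename."""
       path = version_file(column_dir, attribute)
       tmp = path.with_name(path.name + ".tmp")
       tmp.write_text(f"{version}\n")
       tmp.replace(path)  # atomic on POSIX when both paths share a file system


   def next_version(column_dir, attribute):
       """Number for a new version: 0 for the first, then latest + 1."""
       latest = read_latest_version(column_dir, attribute)
       return 0 if latest is None else latest + 1
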
OBIDMS views
============

An OBIDMS view consists of a list of OBIDMS columns and lines. A view includes one version
of each mandatory column, and at most one version of any column. All the columns of
a view contain the same number of lines, in the same order.

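
The view rules can be checked with a short Python sketch. ``ColumnVersion`` and ``make_view`` are hypothetical, and ``"ID"`` is an assumed name for the mandatory sequence identifiers column, which this page does not name. Line order is not modelled; only the number of lines is compared.

.. code-block:: python

   from dataclasses import dataclass

   # Hypothetical name for the mandatory sequence identifiers column.
   MANDATORY_COLUMNS = frozenset({"ID"})


   @dataclass(frozen=True)
   class ColumnVersion:
       attribute: str
       version: int
       line_count: int


   def make_view(columns):
       """Check the view rules and return {attribute: version}."""
       attributes = [c.attribute for c in columns]
       if len(set(attributes)) != len(attributes):
           raise ValueError("a view includes only one version of each column")
       missing = MANDATORY_COLUMNS - set(attributes)
       if missing:
           raise ValueError(f"missing mandatory column(s): {sorted(missing)}")
       if len({c.line_count for c in columns}) > 1:
           raise ValueError("all the columns of a view must have the same number of lines")
       return {c.attribute: c.version for c in columns}

For example, a view made of version 0 of ``ID`` and version 2 of ``count``, both with three lines, is accepted and returned as ``{"ID": 0, "count": 2}``.
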
OBIDMS history file
===================

An OBIDMS history file consists of an ordered list of views and commands, each command leading
from one view to the next.

This history can be represented in the form of a ?? showing all the
operations ever performed in the OBIDMS directory and the views between them:

.. image:: ./images/history.png
   :width: 150px
   :align: center

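
The ordered list of views and commands can be sketched in Python as a linear history. This is a hypothetical in-memory model, not the on-disk format of the OBIDMS history file: views and commands are plain strings here, and the history is assumed to start from an initial view, with ``commands[i]`` leading from ``views[i]`` to ``views[i + 1]``.

.. code-block:: python

   from dataclasses import dataclass, field


   @dataclass
   class History:
       """Ordered views and the commands between them:
       commands[i] leads from views[i] to views[i + 1]."""
       views: list = field(default_factory=list)
       commands: list = field(default_factory=list)

       def record(self, command, new_view):
           """Append a command and the view it produced."""
           if not self.views:
               raise ValueError("record an initial view first")
           self.commands.append(command)
           self.views.append(new_view)

       def steps(self):
           """Yield (view_before, command, view_after) in chronological order."""
           for i, command in enumerate(self.commands):
               yield self.views[i], command, self.views[i + 1]

Keeping views and commands in two parallel lists makes the invariant ``len(views) == len(commands) + 1`` easy to maintain; a history with branches would need a graph instead.
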