updated the documentation with the special values, and the idea of column directories and column group directories.
Celine Mercier
2015-05-28 16:33:45 +02:00
parent 32fd7b5a6b
commit f4a123cd17
5 changed files with 1666 additions and 1591 deletions


@ -2,19 +2,19 @@
The OBItools3 Data Management System (OBIDMS) The OBItools3 Data Management System (OBIDMS)
********************************************* *********************************************
A complete DNA Metabarcoding experiment rely on several kinds of data. A complete DNA metabarcoding experiment relies on several kinds of data.
- The sequence data resulting of the PCR products sequencing, - The sequence data resulting from the sequencing of the PCR products,
- The description of the samples including all their metadata, - The description of the samples including all their metadata,
- One or several refence database used for the taxonomical annotation - One or several reference databases used for the taxonomic annotation,
- One or several taxonomies. - One or several taxonomy databases.
Up to now each of these categories of data were stored in separate Up to now, each of these categories of data were stored in separate
files an nothing obliged to keep them together. files, and nothing made it mandatory to keep them together.
The `Data Management System` (DMS) of OBITools3 can be considered The `Data Management System` (DMS) of OBITools3 can be regarded as a basic
as a basic database system. database system.
OBIDMS UML OBIDMS UML
@ -25,11 +25,30 @@ OBIDMS UML
:download:`html version of the OBIDMS UML file <UML/ObiDMS_UML.class.violet.html>` :download:`html version of the OBIDMS UML file <UML/ObiDMS_UML.class.violet.html>`
An OBIDMS directory consists of : An OBIDMS directory contains :
* OBIDMS column files * one `OBIDMS history file <#obidms-history-files>`_
* OBIDMS release files * Two different kinds of directories :
* OBIDMS dictionary files * OBIDMS column directories
* one OBIDMS history file * OBIDMS column group directories containing OBIDMS column directories
OBIDMS column directories
=========================
OBIDMS column directories contain :
* all the different versions of one OBIDMS column, each stored as a separate file (`OBIDMS column files <#obidms-column-files>`_)
* one `OBIDMS release file <#obidms-release-files>`_
The directory name is the column attribute, or sub-attribute if the column directory is in a column group directory.
OBIDMS column group directories
===============================
OBIDMS column group directories contain OBIDMS column directories. They are used to store dictionary-like data, where
each key corresponds to an OBIDMS column.
The directory name is the dictionary attribute. Each key is considered a sub-attribute and is associated with its column.
OBIDMS column files OBIDMS column files
@ -38,7 +57,7 @@ OBIDMS column files
Each OBIDMS column file contains : Each OBIDMS column file contains :
* a header of a size equal to a multiple of PAGESIZE (PAGESIZE being equal to 4096 bytes * a header of a size equal to a multiple of PAGESIZE (PAGESIZE being equal to 4096 bytes
on most systems) containing metadata on most systems) containing metadata
* one column of data with the same OBIType * one column of data with the same `OBIType <types.html#obitypes>`_
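The page-aligned header size described above can be sketched as a simple round-up. This is only an illustration: the function name ``obi_header_size`` and the rule that an empty header still takes one page are assumptions, not part of the OBIDMS sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the OBIDMS sources): round the size of the
 * metadata up to the next multiple of page_size, so that the header always
 * occupies a whole number of pages. */
static size_t obi_header_size(size_t metadata_bytes, size_t page_size)
{
    assert(page_size > 0);

    /* Integer ceiling division: number of pages needed for the metadata. */
    size_t pages = (metadata_bytes + page_size - 1) / page_size;

    /* Assumption: a header with no metadata still occupies one page. */
    if (pages == 0)
        pages = 1;

    return pages * page_size;
}
```

On POSIX systems the actual page size can be queried at run time with ``sysconf(_SC_PAGESIZE)`` from ``<unistd.h>`` rather than hard-coding 4096.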
Header Header
@ -48,27 +67,26 @@ The header of an OBIDMS column contains :
* Endian byte order * Endian byte order
* Header size (PAGESIZE multiple) * Header size (PAGESIZE multiple)
* * Number of lines of data
* File status : Open/Closed * Number of lines of data used
* Owner : PID of the process that created the file and is the only one allowed to modify it if it is open * `OBIType <types.html#obitypes>`_ (type of the data)
* Number of lines (total or without the header?) * Date of creation of the file
* OBIType * Version of the OBIDMS column
* Date of creation * The column name
* Version of the file
* Optional comments * Optional comments
Data Data
---- ----
A column of data with the same OBIType. A column of data with the same `OBIType <types.html#obitypes>`_.
Mandatory columns Mandatory columns
----------------- -----------------
Some columns must exist in an OBIDMS directory : Some columns must exist in an OBIDMS directory :
* sequence identifiers column (type *OBIStr_t*) * sequence identifiers column (type ``OBIStr_t``)
File name File name
@ -83,8 +101,7 @@ Example : ``count@3.odc``
Modifications Modifications
------------- -------------
An OBIDMS column file can only be modified by the process that created it, if its status is set to Open. Those informations are An OBIDMS column file can only be modified by the process that created it, and while its status is set to Open.
contained in the `header <#header>`_.
When a process wants to modify an OBIDMS column file that is closed, it must first clone it. Cloning creates a new version of the When a process wants to modify an OBIDMS column file that is closed, it must first clone it. Cloning creates a new version of the
file that belongs to the process, i.e., only that process can modify that file, as long as its status is set to Open. Once the process file that belongs to the process, i.e., only that process can modify that file, as long as its status is set to Open. Once the process
@ -94,6 +111,8 @@ again.
That means that one column is stored in one file (if there is only one version) That means that one column is stored in one file (if there is only one version)
or more (if there are several versions), and that there is one file per version. or more (if there are several versions), and that there is one file per version.
All the versions of one column are stored in one directory.
Versioning Versioning
---------- ----------
@ -101,13 +120,13 @@ Versioning
The first version of a column file is numbered 0, and each new version increments that The first version of a column file is numbered 0, and each new version increments that
number by 1. number by 1.
The number of the latest version of an OBIDMS column is stored in an `OBIDMS release file <formats.html#obidms-release-files>`_. The number of the latest version of an OBIDMS column is stored in the `OBIDMS release file <formats.html#obidms-release-files>`_ of its directory.
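As an illustration of the numbering scheme, the sketch below builds a column file name from a column name and a version number. The pattern ``<column name>@<version>.odc`` is inferred only from the example ``count@3.odc``, and the function name ``obi_column_file_name`` is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not from the OBIDMS sources): write
 * "<column_name>@<version>.odc" into buffer.
 * Returns 0 on success, -1 if the version is negative or the name
 * does not fit into the buffer. */
static int obi_column_file_name(char *buffer, size_t size,
                                const char *column_name, int version)
{
    assert(buffer != NULL && column_name != NULL);

    /* Versions start at 0 and grow by 1, so negative values are invalid. */
    if (version < 0)
        return -1;

    int written = snprintf(buffer, size, "%s@%d.odc", column_name, version);

    /* snprintf reports the length it wanted to write; a value >= size
     * means the result was truncated. */
    return (written >= 0 && (size_t)written < size) ? 0 : -1;
}
```

Under these assumptions, cloning the latest version ``n`` recorded in the release file would create the file for version ``n + 1``.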
OBIDMS release files OBIDMS release files
==================== ====================
Each OBIDMS column is associated with an OBIDMS release file that contains the number of the latest Each OBIDMS column is associated with an OBIDMS release file in its directory, which contains the number of the latest
version of the column. version of the column.
File name File name
@ -139,20 +158,3 @@ operations ever done in the OBIDMS directory and the views in between them :
.. image:: ./images/history.png .. image:: ./images/history.png
:width: 150 px :width: 150 px
:align: center :align: center
OBIType header file
========================
.. doxygenfile:: obitypes.h
OBIIntColumn header file
========================
.. doxygenfile:: obiintcolumn.h
OBIColumn header file
=====================
.. doxygenfile:: obicolumn.h


@ -0,0 +1,52 @@
==============
Special values
==============
NA values
=========
All OBITypes have an associated NA (Not Available) value.
NA values are implemented by specifying an explicit NA value for each type, corresponding to the R standards:
* For the types ``OBIInt_t``, ``OBIBool_t``, ``OBIIdx_t`` and ``OBITaxid_t``, the NA value is ``INT_MIN``.
* For the type ``OBIChar_t``: the NA value is ``\0`` (?).
* For the type ``OBIStr_t`` : the NA value is a tab followed by a space.
* For the type ``OBIFloat_t``::
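
    typedef union
    {
        double value;
        unsigned int word[2];
    } ieee_double;

    /* hw and lw index the high and low 32-bit words of the double.
       The values below assume a little-endian host; swap them on a
       big-endian one. */
    static const int hw = 1;
    static const int lw = 0;

    static double NA_value(void)
    {
        volatile ieee_double x;
        x.word[hw] = 0x7ff00000;  /* exponent bits all set: NaN or infinity */
        x.word[lw] = 1954;        /* non-zero payload: a NaN that marks NA */
        return x.value;
    }
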
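Because the ``OBIFloat_t`` NA value is a NaN whose low 32-bit word holds 1954, ``isnan()`` alone cannot tell NA apart from an ordinary NaN. The sketch below shows one way to build and detect it without depending on byte order. The function names are hypothetical, and the check on the low word follows the R convention rather than any confirmed OBITools3 code.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(double) == sizeof(uint64_t),
              "this sketch assumes 64-bit IEEE 754 doubles");

/* Hypothetical helper: build the NA bit pattern (high word 0x7ff00000,
 * low word 1954) as one 64-bit integer, which avoids choosing word
 * indices according to the host byte order. */
static double obi_float_na(void)
{
    const uint64_t bits = ((uint64_t)0x7ff00000u << 32) | 1954u;
    double value;
    memcpy(&value, &bits, sizeof value);
    return value;
}

/* Hypothetical helper: a value is NA if it is a NaN and its low 32-bit
 * word equals 1954. */
static bool obi_float_is_na(double x)
{
    uint64_t bits;
    if (!isnan(x))
        return false;
    memcpy(&bits, &x, sizeof bits);
    return (uint32_t)(bits & 0xffffffffu) == 1954u;
}
```

Only the low word is compared, since some hardware may alter the high bits of a NaN payload when the value is moved around.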
Minimum and maximum values for ``OBIInt_t``
===========================================
* Maximum value : ``INT_MAX``
* Minimum value : ``INT_MIN(-1?)``
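If ``INT_MIN`` is reserved as the NA value for ``OBIInt_t``, one possible reading of the open question above is that the smallest storable value is ``INT_MIN + 1``. The sketch below works under that assumption only. The macro and function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

static_assert(sizeof(long long) > sizeof(int),
              "this sketch needs a type wider than int");

/* Hypothetical macro name; the value INT_MIN comes from the NA values section. */
#define OBI_INT_NA INT_MIN

static bool obi_int_is_na(int value)
{
    return value == OBI_INT_NA;
}

/* Hypothetical helper: convert a wider integer to the stored type,
 * returning NA when the value falls outside the assumed storable range
 * [INT_MIN + 1, INT_MAX]. */
static int obi_int_from_ll(long long value)
{
    if (value <= (long long)INT_MIN || value > (long long)INT_MAX)
        return OBI_INT_NA;
    return (int)value;
}
```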
Infinity values for the type ``OBIFloat_t``
===========================================
* Positive infinity : ``INFINITY`` (should be defined in ``<math.h>``)
* Negative infinity : ``-INFINITY``
NaN value for the type ``OBIFloat_t``
=====================================
* NaN (Not a Number) value : ``NAN`` (should be defined in ``<math.h>`` but probably needs to be tested)


@ -6,20 +6,12 @@ OBITypes
.. image:: ./UML/OBITypes_UML.png .. image:: ./UML/OBITypes_UML.png
:download:`html version of the OBITypes UML file <UML/OBITypes_UML.class.violet.html>` :download:`html version of the OBITypes UML file <UML/OBITypes_UML.class.violet.html>`
.. note::
All OBITypes have an associated NA (Not Available) value.
We have currently two ideas for implementing NA values:
- By specifying an explicit NA value for each type
- By adding to each column of an OBIDMS a bit vector
indicating if the value is defined or not.
.. toctree:: .. toctree::
:maxdepth: 2 :maxdepth: 2
The elementary types <elementary> The elementary types <elementary>
The containers <containers> The containers <containers>
Special values <specialvalues>