updated documentation

This commit is contained in:
Celine Mercier
2015-05-18 17:57:10 +02:00
parent 196140af35
commit 1d39dcc559
7 changed files with 893 additions and 757 deletions

File diff suppressed because it is too large

@ -3,6 +3,66 @@ Formats
#######
********
OBITypes
********
.. note::
    All OBITypes have an associated NA (Not Available) value.
Atomic types
============
========= ========= ============ ==============================
Type      C type    OBIType      Definition
========= ========= ============ ==============================
integer   int32_t   OBIInt_t     a signed integer value
float     double    OBIFloat_t   a floating-point value
boolean   ?         OBIBool_t    a boolean true/false value
char      char      OBIChar_t    a character
index     size_t    OBIIdx_t     an index in a data structure
========= ========= ============ ==============================
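As an illustration of the table above, here is a minimal sketch that maps the OBIType names to ``ctypes`` equivalents of the listed C types. The dictionary ``OBITYPE_TO_CTYPE`` and the function ``element_size`` are hypothetical names introduced for this example, not part of OBITools3. ``OBIBool_t`` is left out because its C type is still undecided in the table.

```python
import ctypes

# Hypothetical mapping (illustration only): OBIType name -> ctypes type
# matching the C type given in the table. OBIBool_t is omitted because
# the table leaves its C type open ("?").
OBITYPE_TO_CTYPE = {
    "OBIInt_t": ctypes.c_int32,     # int32_t
    "OBIFloat_t": ctypes.c_double,  # double
    "OBIChar_t": ctypes.c_char,     # char
    "OBIIdx_t": ctypes.c_size_t,    # size_t
}


def element_size(obitype):
    """Return the size in bytes of one value of the given OBIType."""
    return ctypes.sizeof(OBITYPE_TO_CTYPE[obitype])
```

For instance, ``element_size("OBIInt_t")`` gives 4 on any platform, whereas the size of ``OBIIdx_t`` follows the platform's ``size_t``.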
Character string type
=====================
================ ====== ======== ==================
Type             C type OBIType  Definition
================ ====== ======== ==================
Character string char*  OBIStr_t a character string
================ ====== ======== ==================
Container types
===============
Lists
-----
Lists of values with an atomic OBIType.
Ensembles
---------
Ensembles of values with an atomic OBIType.
Dictionaries
------------
* Dictionaries of *OBIIdx_t* values indexed by *OBIStr_t* values, typically used for the storage of DNA sequences.
* Bit arrays recording the presence or absence of data in the above dictionaries.
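The two items above can be sketched in memory as follows. This is only one possible reading: the class ``SequenceDictionary``, its methods, and the choice of one presence bit per index are assumptions made for the example, not the OBITools3 implementation.

```python
class SequenceDictionary:
    """Toy dictionary: string keys (OBIStr_t-like) -> integer indices
    (OBIIdx_t-like), plus a bit array marking which indices hold data."""

    def __init__(self, capacity):
        self._capacity = capacity
        self._index_of = {}
        # One bit per possible index, rounded up to whole bytes.
        self._present = bytearray((capacity + 7) // 8)

    def add(self, key):
        """Return the index of key, assigning the next free index if new."""
        if key not in self._index_of:
            idx = len(self._index_of)
            if idx >= self._capacity:
                raise IndexError("dictionary is full")
            self._index_of[key] = idx
            self._present[idx // 8] |= 1 << (idx % 8)  # set the presence bit
        return self._index_of[key]

    def is_present(self, idx):
        """Tell whether data has been stored at the given index."""
        return bool(self._present[idx // 8] & (1 << (idx % 8)))
```

Adding the same sequence twice returns the same index, so a column only needs to store the index and each distinct sequence is kept once.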
The taxid type
--------------
A pair (*OBIInt_t* value, *OBIStr_t* value) corresponding to the taxid and a reference to a taxonomic database.
*********************************************
The OBItools3 Data Management System (OBIDMS)
*********************************************
@ -10,7 +70,8 @@ The OBItools3 Data Management System (OBIDMS)
An OBIDMS directory consists of :
* OBIDMS column files
* OBIDMS release files
* an OBIDMS history file
* OBIDMS dictionary files
* one OBIDMS history file
OBIDMS column files
@ -19,9 +80,37 @@ OBIDMS column files
Each OBIDMS column file contains :
* a header of a size equal to a multiple of PAGESIZE (PAGESIZE being equal to 4096 bytes
on most systems) containing metadata
* one column of data of the same type
* one column of data with the same OBIType
Header
------
The header of an OBIDMS column contains :
OBIDMS column files are read-only.
* Endian byte order
* Header size (PAGESIZE multiple)
*
* File status : Open/Closed
* Owner : PID of the process that created the file; only this process is allowed to modify the file, and only while its status is Open
* Number of lines (total or without the header?)
* OBIType
* Date of creation
* Version of the file
* Optional comments
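To make the header description more concrete, here is a minimal sketch that packs such fields and pads the result to a multiple of PAGESIZE. The field order, field widths, numeric codes, and the names ``HEADER_FORMAT`` and ``build_header`` are all assumptions made for this example; the actual on-disk layout is not specified here.

```python
import mmap
import struct
import time

PAGESIZE = mmap.PAGESIZE  # page size of the current system, often 4096

# Hypothetical fixed-width layout (illustration only), little-endian:
# byte-order flag, header size, status, owner PID, number of lines,
# OBIType code, creation date (Unix time), version.
HEADER_FORMAT = "<BQBqQBqI"


def build_header(status, owner_pid, n_lines, obitype_code, version, comments=b""):
    """Pack the header fields and pad them to a multiple of PAGESIZE."""
    raw_size = struct.calcsize(HEADER_FORMAT) + len(comments)
    n_pages = (raw_size + PAGESIZE - 1) // PAGESIZE  # round up to whole pages
    header_size = n_pages * PAGESIZE
    fixed = struct.pack(
        HEADER_FORMAT,
        1,                 # assumed code meaning "little-endian"
        header_size,
        status,            # assumed codes: 1 = Open, 0 = Closed
        owner_pid,
        n_lines,
        obitype_code,
        int(time.time()),
        version,
    )
    # Comments follow the fixed fields; the rest is zero padding.
    return (fixed + comments).ljust(header_size, b"\0")
```

Because the header size is itself stored in the header, a reader can locate the first data value without knowing the page size of the system that wrote the file.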
Data
----
A column of data with the same OBIType.
Mandatory columns
-----------------
Some columns must exist in an OBIDMS directory :
* sequence identifiers column (type *OBIStr_t*)
File name
@ -33,47 +122,30 @@ its version, separated by an ``@``, and with the extension ``.odc``.
Example : ``count@3.odc``
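The naming rule can be checked mechanically. The sketch below builds and parses such file names; the function names and the assumption that the part before the ``@`` contains neither ``@`` nor ``/`` are introduced for this example only.

```python
import re

# <name>@<version>.odc, where the version is a positive decimal number.
_COLUMN_FILE_RE = re.compile(r"(?P<name>[^@/]+)@(?P<version>[0-9]+)\.odc")


def column_file_name(name, version):
    """Build the file name of one version of a column."""
    return f"{name}@{version}.odc"


def parse_column_file_name(file_name):
    """Split a column file name into (name, version)."""
    match = _COLUMN_FILE_RE.fullmatch(file_name)
    if match is None:
        raise ValueError(f"not an OBIDMS column file name: {file_name!r}")
    return match.group("name"), int(match.group("version"))
```

With these helpers, ``parse_column_file_name("count@3.odc")`` yields the name ``count`` and the version number 3.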
Header
------
Modifications
-------------
The header of an OBIDMS column contains :
* Endian byte order
* PAGESIZE value / Size of the header
* Number of lines (total or without the header?)
* Data type (int, str...)
* Date of creation
* Version of the file
* Optional comments
An OBIDMS column file can only be modified by the process that created it, and only while its status is set to Open. Both pieces of
information are stored in the `header <#header>`_.
When a process wants to modify an OBIDMS column file that is closed, it must first clone it. Cloning creates a new version of the
file that belongs to the process, i.e., only that process can modify that file, as long as its status is set to Open. Once the process
has finished writing the new version of the column file, it sets the column file's status to Closed, and the file can never be modified
again.
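The rules above can be modelled with a small in-memory sketch. The class ``ColumnFile`` and its methods are hypothetical, and the sketch assumes that the cloned file is the latest version, so that the clone simply gets the next version number.

```python
class ColumnFile:
    """Toy in-memory stand-in for one version of an OBIDMS column file."""

    def __init__(self, name, version, values, owner_pid):
        self.name = name
        self.version = version
        self.values = list(values)
        self.owner_pid = owner_pid  # only this process may write
        self.is_open = True         # status: Open (True) or Closed (False)

    def write(self, line, value, pid):
        """Modify one line, if the file is open and pid is the owner."""
        if not self.is_open:
            raise PermissionError("column file is closed; clone it first")
        if pid != self.owner_pid:
            raise PermissionError("only the owner process may modify this file")
        self.values[line] = value

    def close(self):
        """Set the status to Closed; the file can never be modified again."""
        self.is_open = False

    def clone(self, pid):
        """Create a new open version owned by pid (assumes self is the latest)."""
        return ColumnFile(self.name, self.version + 1, self.values, pid)
```

A process that needs to change a closed version calls ``clone`` with its own PID, writes to the copy, then closes it; the original version keeps its values.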
Data
----
A column of data of the same type.
That means that one column is stored in one file (if there is only one version)
or more (if there are several versions), and that there is one file per version.
Versioning
----------
OBIDMS column files are read-only, and any modification leads to the creation of a new version
of the column file. That means that one column is stored in one file (if there is only one version)
or more (if there are several versions), and that there is one file per version.
The first version of a column file is numbered 1, and each new version increments that
number by 1.
The number of the latest version of an OBIDMS column is stored in an `OBIDMS release file <formats.html#obidms-release-files>`_.
Mandatory columns
-----------------
Some columns must exist in an OBIDMS directory :
* sequence identifiers column
OBIDMS release files
====================
@ -89,21 +161,28 @@ have the extension ``.odr``.
Example : ``count.odr``
OBIDMS history file
===================
An OBIDMS history file consists of data that can be represented in the form of a directed acyclic
graph presenting the history of all the operations ever done in the OBIDMS directory.
OBIDMS views
============
An OBIDMS view corresponds to a list of OBIDMS columns and lines. A view includes one version
An OBIDMS view consists of a list of OBIDMS columns and lines. A view includes one version
of each mandatory column. Only one version of each column is included. All the columns of
one view contain the same number of lines in the same order.
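The constraints on a view can be expressed as a short check. In this sketch a view is represented as a dictionary mapping each column name to a ``(version, values)`` pair, which by construction allows only one version per column; this representation and the function ``check_view`` are assumptions made for the example.

```python
def check_view(columns, mandatory):
    """Validate a view and return its number of lines.

    columns   -- dict: column name -> (version, list of values)
    mandatory -- names of the columns every view must include
    """
    missing = [name for name in mandatory if name not in columns]
    if missing:
        raise ValueError(f"missing mandatory columns: {missing}")
    # All the columns of one view must contain the same number of lines.
    lengths = {len(values) for _version, values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("columns of a view must have the same number of lines")
    return lengths.pop() if lengths else 0
```

The check does not cover line order, which depends on how the lines of a view are selected and is not described in enough detail here to model.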
OBIDMS history file
===================
An OBIDMS history file consists of an ordered list of views and commands, those commands leading
from one view to the next one.
This history can be represented in the form of a --- showing all the
operations ever done in the OBIDMS directory and the views in between them :
.. image:: ./images/history.png
    :width: 150 px
    :align: center
OBIDMS UML
==========

@ -100,12 +100,12 @@ Naming conventions
******************
.. todo::
    Look for usual naming conventions
    Look for common naming conventions
*****************
Programming rules
*****************
* The *int* type should never be used
*

Binary file not shown (before: 62 KiB, after: 63 KiB).

Binary file not shown (after: 17 KiB).

@ -11,6 +11,7 @@ OBITools3 documentation
Programming guidelines <guidelines>
Formats <formats>
Lines of thought <pistes>
Indices and tables

doc/source/pistes.rst

@ -0,0 +1,20 @@
################
Lines of thought
################
*****************************
What we want to be able to do
*****************************
* Handle missing values
* Modify a column while it is being written (mmap)
* Append values to the end of a column while it is being written (mmap)
*************
Miscellaneous
*************
* If the order of a column is changed, the column is rewritten (no index).
* Some mechanism to lock read access to one program at a time...