Compare commits

51 Commits

Author SHA1 Message Date
d1d26b9028 Simplify the code 2016-08-04 08:00:54 +02:00
8f0462c407 Merge branch 'master' into Eric_version_for_sequence
Conflicts:
	python/obitools3/obidms/_obidmscolumn_seq.pyx
2016-08-03 10:09:20 +02:00
000b9999ad Merge branch 'master' of git@git.metabarcoding.org:obitools/obitools3.git 2016-07-03 09:22:22 +02:00
aff9831c13 Substitute fprintf call by fputs call to conform with the new ubuntu
compilation rules
2016-07-03 09:21:56 +02:00
448fa8d325 first trial for a fasta formater 2016-07-03 09:18:52 +02:00
6af62d8124 Change a fprintf without argument to a fputs to comply with the new
default parameter on ubuntu
2016-07-03 08:25:06 +02:00
0869b9ba3f Closes issue #47 by storing each view in a separate file named with the
view's name and created upon view creation.
2016-06-30 11:41:30 +02:00
ad2af0b512 Some comments updated 2016-06-16 11:26:54 +02:00
38e603ed57 Deleted some redundant cython code 2016-06-10 10:34:47 +02:00
f438c3d913 OBIQUAL columns can now handle multiple elements per line 2016-06-09 15:54:36 +02:00
2a1ea3ba3f Setting NA values is now handled properly for OBI_SEQ, OBI_STR and
OBI_QUAL columns
2016-06-09 14:22:36 +02:00
fc3641d7ff Read-only AVLs are now hard-linked instead of copied when cloning an AVL
group to make it writable. Also fixed several bugs when handling AVL
groups.
2016-06-03 19:02:46 +02:00
799b942017 Deleted old debugging print 2016-06-03 18:57:32 +02:00
6e3f5b230e Fixed typo in doc 2016-06-03 18:56:45 +02:00
2f57f80c63 Fixed a bug where an unmapped variable would be read 2016-06-03 18:55:58 +02:00
2962c4d250 Goes with previous commit 2016-06-03 18:54:25 +02:00
69bf7ec2e7 NA value for OBI_STR and OBI_SEQ columns is now NULL 2016-06-03 18:53:22 +02:00
bac7ce7184 Start of the implementation of the export methods 2016-06-02 19:10:33 +02:00
f186395661 Trap potential exception generated by char* to bytes casts 2016-05-29 21:18:20 +02:00
85395dfc1a value returned for sequence is now bytes and no more str 2016-05-29 13:53:32 +02:00
f830389974 Add some comment on the location of the align method. 2016-05-29 12:58:31 +02:00
2e35229357 Add conversion checking on the value of a seq column 2016-05-29 12:54:13 +02:00
a8ed57dc6e few small changes 2016-05-21 12:29:55 +02:00
c3274d419c remove an extra debug log 2016-05-21 12:29:08 +02:00
cca0dbb46b Close issue #54 by adding a read1 method to the MagicKeyFile class 2016-05-21 12:24:48 +02:00
5a78157112 increase parsing speed of the header 2016-05-21 10:29:11 +02:00
0b9a41d952 Patch a bug about the reading of the last sequence 2016-05-21 10:28:03 +02:00
e681ca646d Fixed a problem with some columns being shorter in views and triggering
errors when trying to get values. Temporary fix that needs discussion
2016-05-20 18:45:29 +02:00
3b59043ea8 Major update: New column type to store sequence qualities. Closes #41 2016-05-20 16:45:22 +02:00
ffff91e76c Fixed variable name that had been accidentally changed for better
clarity
2016-05-18 13:27:41 +02:00
6a8df069ad Indexers are now cloned if needed to modify them after they've been
closed. Obligatory indexers' names now follow the same pattern as other
indexers (columnname_version). Closes #46 and #49.
2016-05-18 13:23:48 +02:00
8ae7644945 First version of quality handling (not working yet) and now it is
checked that a column is writable before enlarging it
2016-05-11 16:38:14 +02:00
b3c47809da First version of alignment functions (imported from suma* programs) 2016-05-11 16:36:23 +02:00
3567681339 Now when a column is added to a view, if there is a line selection, all
columns in the view are cloned first
2016-05-11 16:34:20 +02:00
757ef8509a Deleting CeCILL license duplicates 2016-05-09 11:17:45 +02:00
f961621f5d Minor improvements in _obidms Cython layer 2016-05-04 13:43:26 +02:00
bc12360490 Reworked and commented a bit the cython layer for dms, columns and views 2016-05-02 15:16:06 +02:00
872071b104 Removed a list of column pointers kept in the OBIView class that was not
really needed
2016-05-02 14:23:42 +02:00
32cc8968e8 Adding CeCILL license 2016-05-02 11:51:59 +02:00
d6481f0db8 Merge branch 'master' of git@git.metabarcoding.org:obitools/obitools3.git 2016-04-29 17:46:59 +02:00
a32920e401 Relative paths when creating or opening a DMS now work 2016-04-29 17:46:36 +02:00
31cf27d676 Added indexer function that returns the name of the indexer 2016-04-29 16:18:56 +02:00
baba2d742e commenting _obidms.pyx 2016-04-29 16:07:03 +02:00
5bd12079ae Added comments about listing columns and indexers in obidms functions 2016-04-29 16:06:01 +02:00
072ee5ac03 Re-re-fixed line breaks in README file 2016-04-29 15:44:40 +02:00
9fe21316ff Refixed line breaks in README file 2016-04-29 15:39:46 +02:00
3dc3aaa46b Fixed line breaks in README file 2016-04-29 15:36:58 +02:00
b371030edd Adding README file 2016-04-29 15:35:08 +02:00
b3976fa461 Merge branch 'luke_tests' 2016-04-28 11:17:24 +02:00
6ea2cfb9ca Merging luke_tests branch without the commit turning inline functions in macros 2016-04-28 11:17:18 +02:00
2d8c06f7b7 Fixed variable initialization for error detection 2016-04-26 14:38:46 +02:00
74 changed files with 4891 additions and 1165 deletions

40
README.md Normal file
View File

@ -0,0 +1,40 @@
The `OBITools3`: A package for the management of analyses and data in DNA metabarcoding
---------------------------------------------
DNA metabarcoding offers new perspectives for biodiversity research [1]. This approach to ecosystem studies relies heavily on Next-Generation Sequencing (NGS), and consequently requires the ability to process large volumes of data. The `OBITools` package satisfies this requirement thanks to a set of programs specifically designed for analyzing NGS data in a DNA metabarcoding context [2] - <http://metabarcoding.org/obitools>. Their capacity to filter and edit sequences while taking into account taxonomic annotation helps set up tailor-made analysis pipelines for a broad range of DNA metabarcoding applications, including biodiversity surveys or diet analyses.
**The `OBITools3`.** This new version of the `OBITools` looks to significantly improve the storage efficiency and the data processing speed. To this end, the `OBITools3` rely on an ad hoc database system, inside which all the data that a DNA metabarcoding experiment must consider is stored: the sequences, the metadata (describing for instance the samples), the database containing the reference sequences used for the taxonomic annotation, as well as the taxonomic databases. Besides the gain in efficiency, this new structure allows an easier access to all the data associated with an experiment.
**Column-oriented storage.** An analysis pipeline is a succession of commands, each computing one step of the analysis, where the result of command *n* is used by command *n+1*. DNA metabarcoding data can easily be represented in the form of tables, and each command can be regarded as an operation transforming one or several 'input' tables into one or several 'output' tables, which can be used by the next command. Many of the basic operations in a pipeline copy a large part of the input tables to the result tables without modification, and use only a small part of the input data for their calculations. In the original `OBITools`, those tables are kept in the form of annotated sequence files in the FASTA or FASTQ format. This has two consequences: i) keeping the intermediate results of the analysis pipeline means using disk space for a large volume of redundant data, ii) the encoding and decoding of information that is not actually used represents a large part of the processing time. The new database system used by the `OBITools3` (called DMS for Data Management System) relies on column-oriented storage. The columns are immutable and can be assembled in views representing the data tables. This way, the data not modified by a command in an input table can easily be associated with the result table without duplicating any information; and the data not used at all by a command can be associated with the result table without being read. This strategy results in a gain in disk space efficiency by limiting data redundancy, as well as a gain in execution time by limiting data reading, writing and conversion operations. Finally, as a means to optimize data access, each column is stored in a binary file directly mapped in memory for reading and writing operations.
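
The column-sharing idea can be illustrated with a small pure-Python sketch. It is not the DMS implementation, which is written in C on memory-mapped files; the `Column` and `View` classes, the `with_column` method and the column names are all hypothetical and only show how a result table can reference unmodified columns instead of copying them.

```python
from types import MappingProxyType


class Column:
    """An immutable, named column of values (hypothetical stand-in for a DMS column)."""

    def __init__(self, name, values, version=0):
        self.name = name
        self.version = version
        self._values = tuple(values)  # a tuple cannot be modified in place

    def __len__(self):
        return len(self._values)

    def __getitem__(self, line):
        return self._values[line]


class View:
    """A table assembled from columns; it references columns, it does not copy them."""

    def __init__(self, name, columns):
        self.name = name
        self.columns = MappingProxyType({c.name: c for c in columns})

    def with_column(self, name, column):
        """Return a new view named `name` in which `column` is added or replaced."""
        cols = dict(self.columns)
        cols[column.name] = column
        return View(name, cols.values())


ids = Column("ID", ["seq1", "seq2"])
seqs = Column("NUC_SEQ", ["acgt", "ttgaca"])
raw = View("raw", [ids, seqs])

# A command that only computes sequence lengths adds one column and
# leaves the others untouched, so they are shared rather than duplicated.
lengths = Column("seq_length", [len(s) for s in seqs])
annotated = raw.with_column("annotated", lengths)
```

Here `annotated.columns["ID"] is raw.columns["ID"]` holds: both views point at the same column object, which is the property the paragraph above relies on to avoid redundant storage.
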
**Storage optimization.** DNA metabarcoding data is intrinsically very redundant. For example, the same sequence corresponding to a species will be present several thousand times across all samples. In order to limit the disk space used and make comparison operations more efficient, data in the form of character strings is stored in columns using a complex indexing structure, efficient on millions of values, coupling hash functions, Bloom filters and AVL trees. Finally, DNA sequences are compressed by encoding each nucleotide on two or four bits depending on whether the sequences contain only the four nucleotides (A, C, G, T) or use the IUPAC codes.
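
As an illustration of the two-bit case, here is a minimal sketch that packs four nucleotides per byte. The bit values assigned to each base and the function names are assumptions made for the example; the actual layout used by the C layer is not described in this README, and the four-bit IUPAC variant is not shown.

```python
# Assumed code table: any assignment of the four 2-bit values would do.
CODES = {"a": 0b00, "c": 0b01, "g": 0b10, "t": 0b11}
BASES = "acgt"


def encode_2bit(seq):
    """Pack a sequence made only of a, c, g, t into bytes, four bases per byte."""
    packed = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq.lower()):
        shift = 6 - 2 * (i % 4)  # the first base of a byte goes in the two high bits
        packed[i // 4] |= CODES[base] << shift  # KeyError on any other symbol
    return bytes(packed)


def decode_2bit(packed, length):
    """Unpack `length` bases; the length is needed because the last byte may be padded."""
    bases = []
    for i in range(length):
        shift = 6 - 2 * (i % 4)
        bases.append(BASES[(packed[i // 4] >> shift) & 0b11])
    return "".join(bases)
```

A six-base sequence fits in two bytes instead of six, and `decode_2bit(encode_2bit(s), len(s))` gives back the lower-cased sequence.
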
**Saving the data processing history.** All the information used by the `OBITools3` is stored in immutable data structures in the DMS. If a command has to modify a column used as input to produce its result, a new version of that column is created, leaving the initial version intact. This storage system makes it possible to keep, at minimal cost, all the intermediate results produced by the pipeline. Storing metadata that describes every operation that has produced a view (a result table) in the DMS makes it possible to build a directed hypergraph, where each node corresponds to a view and each arrow to an operation. By retracing the dependency relationships in this hypergraph, it is possible to rebuild *a posteriori* the entire process that has produced a result table.
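
Retracing the dependencies amounts to a depth-first walk from the result view back to its inputs. The sketch below assumes a plain dictionary as the metadata store; the view names, the command names and the `history` function are invented for the illustration and do not reflect how the DMS records operations.

```python
# Each view records the command that produced it and the views it read.
provenance = {
    "reads":     {"command": "import",   "inputs": []},
    "reference": {"command": "import",   "inputs": []},
    "filtered":  {"command": "filter",   "inputs": ["reads"]},
    "annotated": {"command": "annotate", "inputs": ["filtered", "reference"]},
}


def history(view, provenance):
    """Return the operations needed to rebuild `view`, dependencies first."""
    ordered, seen = [], set()

    def visit(name):
        if name in seen:  # a view used by several operations is listed once
            return
        seen.add(name)
        step = provenance[name]
        for parent in step["inputs"]:
            visit(parent)
        ordered.append((step["command"], tuple(step["inputs"]), name))

    visit(view)
    return ordered


for command, inputs, output in history("annotated", provenance):
    print(command, inputs, "->", output)
```

An operation with several inputs, such as the hypothetical `annotate` step, is what makes the structure a hypergraph rather than a simple chain.
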
**Tools.** The `OBITools3` offer the same tools as the original `OBITools`. Eventually, new versions of `ecoPrimers` (PCR primer design) [3], `ecoPCR` (*in silico* PCR) [4], as well as `Sumatra` (sequence alignment) and `Sumaclust` (sequence alignment and clustering) [5] will be added, taking advantage of the database structure developed for the `OBITools3`.
**Implementation and availability.** The lower layers managing the DMS as well as all the compute-intensive functions are coded in `C99` for efficiency reasons. A `Cython` (<http://www.cython.org>) object layer allows for a simple but efficient implementation of the `OBITools3` commands in `Python 3.5`. The `OBITools3` are still in development, and the first functional versions are expected for autumn 2016.
**References.**
1. Taberlet P, Coissac E, Hajibabaei M, Rieseberg LH: Environmental DNA. Mol Ecol 2012:1789-1793.
2. Boyer F, Mercier C, Bonin A, Le Bras Y, Taberlet P, Coissac E: OBITools: a Unix-inspired software package for DNA metabarcoding. Mol Ecol Resour 2015:n/a-n/a.
3. Riaz T, Shehzad W, Viari A, Pompanon F, Taberlet P, Coissac E: ecoPrimers: inference of new DNA barcode markers from whole genome sequence analysis. Nucleic Acids Res 2011, 39:e145.
4. Ficetola GF, Coissac E, Zundel S, Riaz T, Shehzad W, Bessière J, Taberlet P, Pompanon F: An in silico approach for the evaluation of DNA barcodes. BMC Genomics 2010, 11:434.
5. Mercier C, Boyer F, Bonin A, Coissac E (2013) SUMATRA and SUMACLUST: fast and exact comparison and clustering of sequences. Available: <http://metabarcoding.org/sumatra> and <http://metabarcoding.org/sumaclust>

View File

@ -5,6 +5,7 @@ from ..utils cimport str2bytes
cdef extern from "stdio.h":
struct FILE
int fprintf(FILE *stream, char *format, ...)
int fputs(char *string, FILE *stream)
FILE* stderr
ctypedef unsigned int off_t "unsigned long long"

View File

@ -126,7 +126,7 @@ cdef class ProgressBar:
if twentyth != self.lastlog:
if self.ontty:
<void>fprintf(stderr,b'\n')
<void>fputs(b'\n',stderr)
self.logger.info('%s %5.1f %% remain : %02d:%02d:%02d' % (
bytes2str(self.head),

View File

@ -35,7 +35,6 @@ def addOptions(parser):
type=str,
help="Name of the default DMS for reading and writing data")
group.add_argument('--destination-view','-v',
action="store", dest="import:destview",
metavar='<VIEW NAME>',
@ -96,12 +95,14 @@ def run(config):
inputs = uopen(config['import']['filename'])
get_quality = False
if config['import']['seqinformat']=='fasta':
iseq = fastaIterator(inputs)
view_type="NUC_SEQS_VIEW"
elif config['import']['seqinformat']=='fastq':
iseq = fastqIterator(inputs)
view_type="NUC_SEQS_VIEW"
get_quality = True
else:
raise RuntimeError('No file format specified')
@ -120,13 +121,15 @@ def run(config):
view[i].set_id(seq['id'])
view[i].set_definition(seq['definition'])
view[i].set_sequence(seq['sequence'])
if get_quality :
view[i].set_quality(seq['quality'])
for tag in seq['tags'] :
#print(tag, seq['tags'][tag])
#if seq['tags'][tag] not in NA_list :
view[i][tag] = seq['tags'][tag]
i+=1
#print(view)
#print(i)
print(view.__repr__())
view.save_and_close()

View File

@ -9,6 +9,7 @@ cdef class MagicKeyFile:
cdef int pos
cpdef bytes read(self,int size=?)
cpdef bytes read1(self,int size=?)
cpdef int tell(self)

View File

@ -54,14 +54,17 @@ cdef class MagicKeyFile:
r = self.binary.read(size)
return r
cpdef bytes read1(self,int size=-1):
return self.read(size)
cpdef int tell(self):
cdef int p
if self.pos < self.keylength:
p = self.pos
else:
p = self.tell()
p = self.binary.tell()
return p
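
The `tell` change in this hunk replaces a call to `self.tell()`, which would recurse without end, with a call on the wrapped stream. The pure-Python sketch below shows the same pattern on a simplified wrapper. Only the attribute names `binary`, `pos` and `keylength` come from the hunk; the class name `PeekedFile`, the constructor and the `read` logic are assumptions, since the full `MagicKeyFile` class is not shown here.

```python
import io


class PeekedFile:
    """Wrap a binary stream whose first bytes were read in advance to sniff the format."""

    def __init__(self, binary, keylength=4):
        self.binary = binary
        self.key = binary.read(keylength)  # bytes consumed for format detection
        self.keylength = len(self.key)
        self.pos = 0                       # read position inside the key

    def read(self, size=-1):
        # Serve what is left of the key first, then fall through to the stream.
        if size < 0:
            head = self.key[self.pos:]
            self.pos += len(head)
            return head + self.binary.read()
        head = self.key[self.pos:self.pos + size]
        self.pos += len(head)
        if len(head) < size:
            return head + self.binary.read(size - len(head))
        return head

    def read1(self, size=-1):
        # Some consumers of file-like objects expect read1(); forwarding is enough here.
        return self.read(size)

    def tell(self):
        # Inside the key the wrapper knows the position itself; past it, the
        # wrapped stream does. Calling self.tell() here would never return.
        if self.pos < self.keylength:
            return self.pos
        return self.binary.tell()


f = PeekedFile(io.BytesIO(b">seq1\nacgt\n"))
```

With this wrapper, reading the first two bytes leaves `tell()` at 2, and once the key is exhausted `tell()` reports the position of the underlying `BytesIO`.
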

View File

@ -0,0 +1,10 @@
from ..utils cimport bytes2str
from .header cimport HeaderFormat
from cython.view cimport array as cvarray
cdef class FastaFormat:
cdef HeaderFormat headerFormater
cdef size_t sequenceBufferLength
cdef char* sequenceBuffer

View File

@ -0,0 +1,32 @@
cimport cython
from libc.stdlib cimport malloc, free, realloc
from libc.string cimport strncpy
cdef class FastaFormat:
def __init__(self, list tags=[], bint printNAKeys=False):
self.headerFormater = HeaderFormat(True,
tags,
printNAKeys)
@cython.boundscheck(False)
def __call__(self, dict data):
cdef bytes brawseq = data['sequence']
cdef size_t lseq = len(brawseq)
cdef size_t k=0
cdef list lines = []
for k in range(0,lseq,60):
lines.append(brawseq[k:(k+60)])
brawseq = b'\n'.join(lines)
return "%s\n%s" % (self.headerFormater(data),bytes2str(brawseq))
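
For readers unfamiliar with Cython, the wrapping step of `FastaFormat.__call__` can be written in plain Python as follows. This is a simplified sketch: the function name `format_fasta` and the `width` parameter are introduced for the example, and the header is reduced to the identifier and definition instead of going through `HeaderFormat`.

```python
def format_fasta(record, width=60):
    """Return a FASTA entry for a record with 'id', 'definition' and 'sequence' (bytes)."""
    seq = record["sequence"]
    # Cut the sequence into chunks of at most `width` bytes, as the hunk does with 60.
    lines = [seq[k:k + width] for k in range(0, len(seq), width)]
    header = ">" + record["id"]
    if record.get("definition"):
        header += " " + record["definition"]
    return "%s\n%s" % (header, b"\n".join(lines).decode("ascii"))


entry = format_fasta({"id": "seq1", "definition": "example", "sequence": b"acgt" * 35})
```

A 140-base sequence yields a header line followed by lines of 60, 60 and 20 bases, with no trailing newline, which matches the behaviour of the `"%s\n%s"` format in the hunk.
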

View File

@ -0,0 +1,7 @@
cdef class HeaderFormat:
cdef str start
cdef set tags
cdef bint printNaKeys
cdef size_t headerBufferLength

View File

@ -0,0 +1,60 @@
cdef class HeaderFormat:
def __init__(self, bint fastaHeader=True, list tags=[], bint printNAKeys=False):
'''
@param fastaHeader:
@type fastaHeader: `bool`
@param tags:
@type tags: `list` of `bytes`
@param printNAKeys:
@type printNAKeys: `bool`
'''
self.tags = set(tags)
self.printNaKeys = printNAKeys
if fastaHeader:
self.start=">"
else:
self.start="@"
self.headerBufferLength = 1000
#self.headerBuffer = []
def __call__(self, dict data):
cdef str header
cdef dict tags = data['tags']
cdef set ktags
cdef list lines = [""]
cdef str tagline
if self.tags is not None and self.tags:
ktags = self.tags
else:
ktags = set(tags.keys())
for k in ktags:
if k in tags:
value = tags[k]
if value is not None or self.printNaKeys:
lines.append("%s=%s;" % (k,tags[k]))
if len(lines) > 1:
tagline=" ".join(lines)
else:
tagline=""
if data['definition'] is not None:
header = "%s%s%s %s" % (self.start,data['id'],
tagline,
data['definition'])
else:
header = "%s%s%s" % (self.start,data['id'],
tagline)
return header
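
The header layout produced by `HeaderFormat.__call__` is `start`, identifier, the `key=value;` fields, then the definition. A plain-Python sketch of that logic is given below. The function name and keyword names are hypothetical, and the sorting of the keys is an addition of this sketch to make the output order stable; the hunk iterates over a `set`, so its order is not guaranteed.

```python
def format_header(record, tags=None, print_na_keys=False, fasta=True):
    """Build '>id key=value; key=value; definition' from a record dictionary."""
    available = record["tags"]
    # Restrict to the requested tags if any were given, else use every tag.
    wanted = sorted(tags) if tags else sorted(available)
    fields = [
        "%s=%s;" % (key, available[key])
        for key in wanted
        if key in available and (available[key] is not None or print_na_keys)
    ]
    parts = [(">" if fasta else "@") + record["id"]] + fields
    if record.get("definition") is not None:
        parts.append(record["definition"])
    return " ".join(parts)


record = {
    "id": "seq1",
    "definition": "test sequence",
    "tags": {"count": 3, "sample": "A", "taxid": None},
}
```

With the record above, the default call skips the `taxid` tag because its value is `None`, while `print_na_keys=True` keeps it, mirroring the `value is not None or self.printNaKeys` test in the hunk.
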

View File

@ -10,6 +10,8 @@
../../../src/encode.c
../../../src/murmurhash2.h
../../../src/murmurhash2.c
../../../src/obi_align.h
../../../src/obi_align.c
../../../src/obiavl.h
../../../src/obiavl.c
../../../src/obiblob_indexer.h
@ -21,8 +23,22 @@
../../../src/obidms_taxonomy.c
../../../src/obidms.h
../../../src/obidms.c
../../../src/obidmscolumn_bool.c
../../../src/obidmscolumn_bool.h
../../../src/obidmscolumn_char.c
../../../src/obidmscolumn_char.h
../../../src/obidmscolumn_float.c
../../../src/obidmscolumn_float.h
../../../src/obidmscolumn_idx.h
../../../src/obidmscolumn_idx.c
../../../src/obidmscolumn_int.c
../../../src/obidmscolumn_int.h
../../../src/obidmscolumn_qual.h
../../../src/obidmscolumn_qual.c
../../../src/obidmscolumn_seq.c
../../../src/obidmscolumn_seq.h
../../../src/obidmscolumn_str.c
../../../src/obidmscolumn_str.h
../../../src/obidmscolumn.h
../../../src/obidmscolumn.c
../../../src/obidmscolumndir.h
@ -35,17 +51,9 @@
../../../src/obitypes.c
../../../src/obiview.h
../../../src/obiview.c
../../../src/sse_banded_LCS_alignment.h
../../../src/sse_banded_LCS_alignment.c
../../../src/uint8_indexer.h
../../../src/uint8_indexer.c
../../../src/utils.h
../../../src/utils.c
../../../src/obidmscolumn_bool.c
../../../src/obidmscolumn_bool.h
../../../src/obidmscolumn_char.c
../../../src/obidmscolumn_char.h
../../../src/obidmscolumn_float.c
../../../src/obidmscolumn_float.h
../../../src/obidmscolumn_int.c
../../../src/obidmscolumn_int.h
../../../src/obidmscolumn_seq.c
../../../src/obidmscolumn_seq.h
../../../src/obidmscolumn_str.c
../../../src/obidmscolumn_str.h

View File

@ -6,11 +6,12 @@ from .capi.obiview cimport Obiview_p
from .capi.obitypes cimport obiversion_t, OBIType_t, index_t
from ._obitaxo cimport OBI_Taxonomy
cdef class OBIDMS_column:
cdef OBIDMS_column_p* pointer
cdef OBIDMS dms
cdef Obiview_p view
cdef OBIView view
cdef str data_type
cdef str dms_name
cdef str column_name
@ -45,7 +46,6 @@ cdef class OBIView:
cdef str name
cdef str comments
cdef dict columns
cdef dict columns_pp # TODO this dict might be unnecessary
cdef OBIDMS dms
cpdef delete_column(self, str column_name)
@ -70,6 +70,7 @@ cdef class OBIView_NUC_SEQS(OBIView):
cdef OBIDMS_column ids
cdef OBIDMS_column sequences
cdef OBIDMS_column definitions
cdef OBIDMS_column qualities
cpdef delete_column(self, str column_name)
@ -90,5 +91,5 @@ cdef class OBIDMS:
cpdef OBIView open_view(self, str view_name)
cpdef OBIView new_view(self, str view_name, object view_to_clone=*, list line_selection=*, str view_type=*, str comments=*)
cpdef dict read_view_infos(self, str view_name)
cpdef dict read_views(self)
# cpdef dict read_views(self) TODO

View File

@ -17,6 +17,7 @@ from .capi.obitypes cimport const_char_p, \
OBI_FLOAT, \
OBI_BOOL, \
OBI_CHAR, \
OBI_QUAL, \
OBI_STR, \
OBI_SEQ, \
name_data_type, \
@ -43,6 +44,9 @@ from ._obidmscolumn_bool cimport OBIDMS_column_bool, \
from ._obidmscolumn_char cimport OBIDMS_column_char, \
OBIDMS_column_multi_elts_char
from ._obidmscolumn_qual cimport OBIDMS_column_qual, \
OBIDMS_column_multi_elts_qual
from ._obidmscolumn_str cimport OBIDMS_column_str, \
OBIDMS_column_multi_elts_str
@ -50,19 +54,19 @@ from ._obidmscolumn_seq cimport OBIDMS_column_seq, \
OBIDMS_column_multi_elts_seq
from .capi.obiview cimport Obiview_p, \
Obiviews_infos_all_p, \
Obiview_infos_p, \
Column_reference_p, \
obi_new_view_nuc_seqs, \
obi_new_view, \
obi_new_view_cloned_from_name, \
obi_new_view_nuc_seqs_cloned_from_name, \
obi_view_map_file, \
obi_view_unmap_file, \
obi_open_view, \
obi_read_view_infos, \
obi_close_view_infos, \
obi_view_delete_column, \
obi_view_add_column, \
obi_view_get_column, \
obi_view_get_column, \
obi_view_get_pointer_on_column_in_view, \
obi_select_line, \
obi_select_lines, \
@ -70,7 +74,8 @@ from .capi.obiview cimport Obiview_p, \
VIEW_TYPE_NUC_SEQS, \
NUC_SEQUENCE_COLUMN, \
ID_COLUMN, \
DEFINITION_COLUMN
DEFINITION_COLUMN, \
QUALITY_COLUMN
from libc.stdlib cimport malloc
from cpython.pycapsule cimport PyCapsule_New, PyCapsule_GetPointer
@ -84,13 +89,13 @@ cdef class OBIDMS_column :
cdef OBIDMS_column_p column_p
cdef OBIDMS_column_p* column_pp
column_pp = <OBIDMS_column_p*> PyCapsule_GetPointer(((view.columns_pp)[column_name]), NULL) # or use C function
column_pp = obi_view_get_pointer_on_column_in_view(view.pointer, str2bytes(column_name))
column_p = column_pp[0] # TODO ugly cython dereferencing but can't find better
# Fill structure
self.pointer = column_pp
self.dms = view.dms
self.view = view.pointer # TODO pointer or instance?
self.view = view
self.data_type = bytes2str(name_data_type((column_p.header).returned_data_type))
self.column_name = bytes2str((column_p.header).name)
self.nb_elements_per_line = (column_p.header).nb_elements_per_line
@ -120,7 +125,7 @@ cdef class OBIDMS_column :
yield self.get_line(line_nb)
cpdef update_pointer(self):
self.pointer = <OBIDMS_column_p*> obi_view_get_pointer_on_column_in_view(self.view, str2bytes(self.column_name))
self.pointer = <OBIDMS_column_p*> obi_view_get_pointer_on_column_in_view(self.view.pointer, str2bytes(self.column_name))
cpdef list get_elements_names(self):
return self.elements_names
@ -137,12 +142,15 @@ cdef class OBIDMS_column :
cpdef str get_comments(self):
return bytes2str((self.pointer)[0].header.comments)
def __repr__(self) :
def __str__(self) :
cdef str to_print
to_print = ''
for line in self :
to_print = to_print + str(line) + "\n"
return to_print
def __repr__(self) :
return (self.column_name + ", version " + str((self.pointer)[0].header.version) + ", data type: " + self.data_type)
cpdef close(self):
if obi_close_column((self.pointer)[0]) < 0 :
@ -183,6 +191,11 @@ cdef class OBIDMS_column :
subclass = OBIDMS_column_char
else :
subclass = OBIDMS_column_multi_elts_char
elif col_type == OBI_QUAL :
if col_one_element_per_line :
subclass = OBIDMS_column_qual
else :
subclass = OBIDMS_column_multi_elts_qual
elif col_type == OBI_STR :
if col_one_element_per_line :
subclass = OBIDMS_column_str
@ -236,28 +249,28 @@ cdef class OBIDMS_column_line :
cdef class OBIView :
def __init__(self, OBIDMS dms, str view_name, bint new=False, object view_to_clone=None, list line_selection=None, str comments=""):
cdef Obiview_p view = NULL
cdef int i
cdef list col_list
cdef str col_name
cdef OBIDMS_column column
cdef OBIDMS_column_p column_p
cdef OBIDMS_column_p* column_pp
cdef Obiview_p view = NULL
cdef int i
cdef list col_list
cdef str col_name
cdef OBIDMS_column column
cdef OBIDMS_column_p column_p
cdef OBIDMS_column_header_p header
cdef index_t* line_selection_p
cdef object col_capsule
cdef index_t* line_selection_p
self.dms = dms
# Create the C array for the line selection if needed
if line_selection is not None :
line_selection_p = <index_t*> malloc((len(line_selection) + 1) * sizeof(index_t))
for i in range(len(line_selection)) :
line_selection_p[i] = line_selection[i] # TODO type problem?
line_selection_p[i] = line_selection[i]
line_selection_p[len(line_selection)] = -1
else :
line_selection_p = NULL
# Create the view if needed
if new :
if view_to_clone is not None :
if type(view_to_clone) == str :
@ -266,102 +279,73 @@ cdef class OBIView :
view = obi_new_view(dms.pointer, str2bytes(view_name), (<OBIView> view_to_clone).pointer, line_selection_p, str2bytes(comments))
elif view_to_clone is None :
view = obi_new_view(dms.pointer, str2bytes(view_name), NULL, line_selection_p, str2bytes(comments))
# Else, open the existing view
elif not new :
if view_name is not None :
view = obi_open_view(dms.pointer, str2bytes(view_name))
elif view_name is None :
view = obi_open_view(dms.pointer, NULL)
view = obi_open_view(dms.pointer, NULL) # TODO discuss
if view == NULL :
raise Exception("Error creating/opening view")
raise Exception("Error creating/opening a view")
self.pointer = view
self.name = bytes2str(view.name)
self.name = bytes2str(view.infos.name)
# go through columns to build list and open python object (TODO make separate function?)
# Go through columns to build list of corresponding python instances
self.columns = {}
self.columns_pp = {}
i = 0
while i < view.column_count :
column_pp = <OBIDMS_column_p*> ((view.columns)+i)
column_p = <OBIDMS_column_p> (view.columns)[i]
header = (column_p).header
col_name = bytes2str(header.name)
col_capsule = PyCapsule_New(column_pp, NULL, NULL) # TODO discuss
(self.columns_pp)[col_name] = col_capsule
subclass = OBIDMS_column.get_subclass_type(column_p)
for i in range(view.infos.column_count) :
column_p = <OBIDMS_column_p> (view.columns)[i]
header = (column_p).header
col_name = bytes2str(header.name)
subclass = OBIDMS_column.get_subclass_type(column_p)
self.columns[col_name] = subclass(self, col_name)
i+=1
def __repr__(self) :
cdef str s
cdef OBIDMS_column column
cdef OBIDMS_column_p column_p
s = self.name
s = s + ", " + self.comments + ", " + str(self.pointer.line_count) + " lines"
for column_name in self.columns : # TODO make function in OBIDMS_column class
column = self.columns[column_name]
column_p = (column.pointer)[0]
s = s + "\n" + column_name + ", version " + str(column_p.header.version) + ", data type: " + column.data_type
s = str(self.name) + ", " + str(self.comments) + ", " + str(self.pointer.infos.line_count) + " lines\n"
for column_name in self.columns :
s = s + self.columns[column_name].__repr__() + '\n'
return s
cpdef delete_column(self, str column_name) :
cdef int i
cdef Obiview_p view
cdef Obiview_p view_p
cdef OBIDMS_column column
cdef OBIDMS_column_p column_p
cdef OBIDMS_column_p* column_pp
cdef OBIDMS_column_header_p header
cdef str column_n
view = self.pointer
if obi_view_delete_column(view, str2bytes(column_name)) < 0 :
if obi_view_delete_column(view_p, str2bytes(column_name)) < 0 :
raise Exception("Problem deleting a column from a view")
# Update the dictionaries of column pointers and column objects, and update pointers in column objects (make function?):
(self.columns).pop(column_name)
(self.columns_pp).pop(column_name)
i = 0
while i < view.column_count :
column_pp = <OBIDMS_column_p*> ((view.columns)+i)
column_p = <OBIDMS_column_p> (view.columns)[i]
header = (column_p).header
col_name = bytes2str(header.name)
col_capsule = PyCapsule_New(column_pp, NULL, NULL)
(self.columns_pp)[col_name] = col_capsule
i+=1
for column_n in self.columns :
(self.columns[column_n]).update_pointer()
cpdef add_column(self,
str column_name,
obiversion_t version_number=-1,
str type='',
index_t nb_lines=0,
index_t nb_elements_per_line=1, # TODO 1?
index_t nb_elements_per_line=1,
list elements_names=None,
str indexer_name="",
str comments="",
bint create=True # TODO
bint create=True
) :
cdef bytes column_name_b
cdef bytes elements_names_b
cdef object subclass
cdef OBIDMS_column_p* column_pp
cdef OBIDMS_column_p column_p
column_name_b = str2bytes(column_name)
@ -380,6 +364,8 @@ cdef class OBIView :
data_type = OBI_BOOL
elif type == 'OBI_CHAR' :
data_type = OBI_CHAR
elif type == 'OBI_QUAL' :
data_type = OBI_QUAL
elif type == 'OBI_STR' :
data_type = OBI_STR
elif type == 'OBI_SEQ' :
@ -387,22 +373,16 @@ cdef class OBIView :
else :
raise Exception("Invalid provided data type")
if (obi_view_add_column(self.pointer, column_name_b, version_number, # should return pointer on column?
if (obi_view_add_column(self.pointer, column_name_b, version_number, # TODO should return pointer on column?
data_type, nb_lines, nb_elements_per_line,
elements_names_b, str2bytes(indexer_name),
str2bytes(comments), create) < 0) :
raise Exception("Problem adding a column in a view")
# Store the column pointer
column_pp = obi_view_get_pointer_on_column_in_view(self.pointer, column_name_b)
if column_pp == NULL :
raise Exception("Problem getting a column in a view")
col_capsule = PyCapsule_New(column_pp, NULL, NULL) # TODO
(self.columns_pp)[column_name] = col_capsule
# Get the column pointer
column_p = obi_view_get_column(self.pointer, column_name_b)
# Open and store the subclass
column_p = column_pp[0] # TODO ugly cython dereferencing
subclass = OBIDMS_column.get_subclass_type(column_p)
(self.columns)[column_name] = subclass(self, column_name)
@ -416,12 +396,12 @@ cdef class OBIView :
# iter on each line of all columns
# Declarations
cdef index_t lines_used
cdef index_t line_nb
cdef OBIView_line line # TODO for NUC SEQS View
cdef index_t lines_used
cdef index_t line_nb
cdef OBIView_line line # TODO Check that this works for NUC SEQ views
# Yield each line TODO line class
lines_used = (self.pointer).line_count
# Yield each line
lines_used = self.pointer.infos.line_count
for line_nb in range(lines_used) :
line = self[line_nb]
@ -431,7 +411,7 @@ cdef class OBIView :
def __getitem__(self, object item) :
if type(item) == str :
return (self.columns)[item]
elif type(item) == int : # TODO int?
elif type(item) == int :
return OBIView_line(self, item)
@ -444,7 +424,7 @@ cdef class OBIView :
cdef index_t* line_selection_p
line_selection_p = <index_t*> malloc((len(line_selection) + 1) * sizeof(index_t))
for i in range(len(line_selection)) :
line_selection_p[i] = line_selection[i] # TODO type problem?
line_selection_p[i] = line_selection[i]
line_selection_p[len(line_selection)] = -1
if obi_select_lines(self.pointer, line_selection_p) < 0 :
raise Exception("Problem selecting a list of lines")
@ -468,22 +448,21 @@ cdef class OBIView_NUC_SEQS(OBIView):
def __init__(self, OBIDMS dms, str view_name, bint new=False, object view_to_clone=None, list line_selection=None, str comments=""):
cdef Obiview_p view = NULL
cdef int i
cdef list col_list
cdef str col_name
cdef OBIDMS_column column
cdef OBIDMS_column_p column_p
cdef OBIDMS_column_p* column_pp
cdef Obiview_p view = NULL
cdef int i
cdef list col_list
cdef str col_name
cdef OBIDMS_column column
cdef OBIDMS_column_p column_p
cdef OBIDMS_column_header_p header
cdef index_t* line_selection_p
cdef index_t* line_selection_p
self.dms = dms
if line_selection is not None :
line_selection_p = <index_t*> malloc((len(line_selection) + 1) * sizeof(index_t))
for i in range(len(line_selection)) :
line_selection_p[i] = line_selection[i] # TODO type problem?
line_selection_p[i] = line_selection[i]
line_selection_p[len(line_selection)] = -1
else :
line_selection_p = NULL
@ -506,76 +485,29 @@ cdef class OBIView_NUC_SEQS(OBIView):
raise Exception("Error creating/opening view")
self.pointer = view
self.name = bytes2str(view.name)
self.comments = bytes2str(view.comments)
self.name = bytes2str(view.infos.name)
self.comments = bytes2str(view.infos.comments)
# go through columns to build list and open python object (TODO make separate function?)
# Go through columns to build list of corresponding python instances
self.columns = {}
self.columns_pp = {}
i = 0
while i < view.column_count :
column_pp = <OBIDMS_column_p*> ((view.columns)+i)
column_p = <OBIDMS_column_p> (view.columns)[i]
header = (column_p).header
col_name = bytes2str(header.name)
col_capsule = PyCapsule_New(column_pp, NULL, NULL) # TODO discuss
(self.columns_pp)[col_name] = col_capsule
subclass = OBIDMS_column.get_subclass_type(column_p)
for i in range(view.infos.column_count) :
column_p = <OBIDMS_column_p> (view.columns)[i]
header = (column_p).header
col_name = bytes2str(header.name)
subclass = OBIDMS_column.get_subclass_type(column_p)
self.columns[col_name] = subclass(self, col_name)
i+=1
self.ids = self.columns[bytes2str(ID_COLUMN)]
self.sequences = self.columns[bytes2str(NUC_SEQUENCE_COLUMN)]
self.definitions = self.columns[bytes2str(DEFINITION_COLUMN)]
cpdef delete_column(self, str column_name) :
cdef int i
cdef Obiview_p view
cdef OBIDMS_column column
cdef OBIDMS_column_p column_p
cdef OBIDMS_column_p* column_pp
cdef OBIDMS_column_header_p header
cdef str column_n
if ((column_name == bytes2str(ID_COLUMN)) or (column_name == bytes2str(NUC_SEQUENCE_COLUMN)) or (column_name == bytes2str(DEFINITION_COLUMN))) :
raise Exception("Can't delete an obligatory column from a NUC_SEQS view")
view = self.pointer
if obi_view_delete_column(view, str2bytes(column_name)) < 0 :
raise Exception("Problem deleting a column from a view")
# Update the dictionaries of column pointers and column objects, and update pointers in column objects (make function?):
(self.columns).pop(column_name)
(self.columns_pp).pop(column_name)
i = 0
while i < view.column_count :
column_pp = <OBIDMS_column_p*> ((view.columns)+i)
column_p = <OBIDMS_column_p> (view.columns)[i]
header = (column_p).header
col_name = bytes2str(header.name)
col_capsule = PyCapsule_New(column_pp, NULL, NULL)
(self.columns_pp)[col_name] = col_capsule
i+=1
for column_n in self.columns :
(self.columns[column_n]).update_pointer()
self.qualities = self.columns[bytes2str(QUALITY_COLUMN)]
def __getitem__(self, object item) :
if type(item) == str :
return (self.columns)[item]
elif type(item) == int : # TODO int?
elif type(item) == int :
return OBI_Nuc_Seq_Stored(self, item)
def __setitem__(self, index_t line_idx, OBI_Nuc_Seq sequence_obj) :
for key in sequence_obj :
self[line_idx][key] = sequence_obj[key]
@ -594,6 +526,7 @@ cdef class OBIView_line :
def __setitem__(self, str column_name, object value):
# TODO detect multiple elements (dict type)? put somewhere else? but more risky (in get)
# TODO OBI_QUAL ?
cdef type value_type
cdef str value_obitype
if column_name not in self.view :
@ -607,7 +540,7 @@ cdef class OBIView_line :
elif value_type == bool :
value_obitype = 'OBI_BOOL'
elif value_type == str :
if only_ATGC(str2bytes(value)) : # TODO
if only_ATGC(str2bytes(value)) : # TODO detect IUPAC?
value_obitype = 'OBI_SEQ'
elif len(value) == 1 :
value_obitype = 'OBI_CHAR'
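The hunk above picks an OBIType name from the Python type of the value being assigned. A minimal pure-Python sketch of that dispatch follows. `only_atgc` and `guess_obitype` are hypothetical stand-ins written for illustration; the `OBI_INT`, `OBI_FLOAT` and `OBI_STR` branches are assumptions, since the diff only shows the `bool`, `OBI_SEQ` and `OBI_CHAR` cases, and the exact behaviour of the real `only_ATGC` (case handling, empty strings) is not visible here.

```python
def only_atgc(value: str) -> bool:
    """Hypothetical stand-in for only_ATGC.

    Assumption: true for a non-empty string made only of A, C, G, T (either case).
    """
    return bool(value) and all(c in "ACGTacgt" for c in value)


def guess_obitype(value) -> str:
    """Return an OBIType name for a Python value, using exact type matches."""
    value_type = type(value)
    if value_type is int:        # assumed branch, not visible in the hunk
        return "OBI_INT"
    if value_type is float:      # assumed branch, not visible in the hunk
        return "OBI_FLOAT"
    if value_type is bool:
        return "OBI_BOOL"
    if value_type is str:
        # The sequence test comes first, so "A" is typed as a sequence, not a char.
        if only_atgc(value):
            return "OBI_SEQ"
        if len(value) == 1:
            return "OBI_CHAR"
        return "OBI_STR"         # assumed fallback for any other string
    raise TypeError("No OBIType known for {}".format(value_type.__name__))
```

Because the sequence test runs before the length test, a one-letter nucleotide such as `"A"` is classified as `OBI_SEQ`; whether that is desirable is the kind of question the `# TODO detect IUPAC?` comment leaves open.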
@ -669,7 +602,7 @@ cdef class OBIDMS :
view_class = OBIView_NUC_SEQS
else :
view_class = OBIView
return view_class(self, view_name)
@ -687,54 +620,84 @@ cdef class OBIDMS :
cpdef dict read_view_infos(self, str view_name) :
all_views = self.read_views()
return all_views[view_name]
cdef Obiview_infos_p view_infos_p
cdef dict view_infos_d
cdef Column_reference_p column_refs
cdef int i, j
cdef str column_name
view_infos_p = obi_view_map_file(self.pointer, str2bytes(view_name))
view_infos_d = {}
view_infos_d["name"] = bytes2str(view_infos_p.name)
view_infos_d["comments"] = bytes2str(view_infos_p.comments)
view_infos_d["view_type"] = bytes2str(view_infos_p.view_type)
view_infos_d["column_count"] = <int> view_infos_p.column_count
view_infos_d["line_count"] = <int> view_infos_p.line_count
view_infos_d["created_from"] = bytes2str(view_infos_p.created_from)
view_infos_d["creation_date"] = bytes2str(obi_format_date(view_infos_p.creation_date))
if (view_infos_p.all_lines) :
view_infos_d["line_selection"] = None
else :
view_infos_d["line_selection"] = {}
view_infos_d["line_selection"]["column_name"] = bytes2str((view_infos_p.line_selection).column_name)
view_infos_d["line_selection"]["version"] = <int> (view_infos_p.line_selection).version
view_infos_d["column_references"] = {}
column_refs = view_infos_p.column_references
for j in range(view_infos_d["column_count"]) :
column_name = bytes2str((column_refs[j]).column_name)
view_infos_d["column_references"][column_name] = {}
view_infos_d["column_references"][column_name]["version"] = column_refs[j].version
obi_view_unmap_file(self.pointer, view_infos_p)
return view_infos_d
cpdef dict read_views(self) : # TODO function that prints the dic nicely and function that prints 1 view. Add column type in col ref
cdef Obiviews_infos_all_p all_views_p
cdef Obiview_infos_p view_p
cdef Column_reference_p column_refs
cdef int nb_views
cdef int i, j
cdef str view_name
cdef str column_name
cdef dict views
cdef bytes name_b
views = {}
all_views_p = obi_read_view_infos(self.pointer)
if all_views_p == NULL :
raise Exception("No views to read")
nb_views = <int> (all_views_p.header).view_count
for i in range(nb_views) :
view_p = (<Obiview_infos_p> (all_views_p.view_infos)) + i
view_name = bytes2str(view_p.name)
views[view_name] = {}
views[view_name]["comments"] = bytes2str(view_p.comments)
views[view_name]["view_type"] = bytes2str(view_p.view_type)
views[view_name]["column_count"] = <int> view_p.column_count
views[view_name]["line_count"] = <int> view_p.line_count
views[view_name]["view_number"] = <int> view_p.view_number
views[view_name]["created_from"] = bytes2str(view_p.created_from)
views[view_name]["creation_date"] = bytes2str(obi_format_date(view_p.creation_date)) # TODO move this function in utils or somethings
if (view_p.all_lines) :
views[view_name]["line_selection"] = None
else :
views[view_name]["line_selection"] = {}
views[view_name]["line_selection"]["column_name"] = bytes2str((view_p.line_selection).column_name)
views[view_name]["line_selection"]["version"] = <int> (view_p.line_selection).version
views[view_name]["column_references"] = {}
column_refs = view_p.column_references
for j in range(views[view_name]["column_count"]) :
column_name = bytes2str((column_refs[j]).column_name)
views[view_name]["column_references"][column_name] = {}
views[view_name]["column_references"][column_name]["version"] = column_refs[j].version
obi_close_view_infos(all_views_p);
return views
# cpdef dict read_views(self) : # TODO function that prints the dic nicely and function that prints 1 view nicely. Add column type in col ref
#
# cdef Obiviews_infos_all_p all_views_p
# cdef Obiview_infos_p view_p
# cdef Column_reference_p column_refs
# cdef int nb_views
# cdef int i, j
# cdef str view_name
# cdef str column_name
# cdef dict views
# cdef bytes name_b
#
# views = {}
# all_views_p = obi_read_view_infos(self.pointer)
# if all_views_p == NULL :
# raise Exception("No views to read")
# nb_views = <int> (all_views_p.header).view_count
# for i in range(nb_views) :
# view_p = (<Obiview_infos_p> (all_views_p.view_infos)) + i
# view_name = bytes2str(view_p.name)
# views[view_name] = {}
# views[view_name]["comments"] = bytes2str(view_p.comments)
# views[view_name]["view_type"] = bytes2str(view_p.view_type)
# views[view_name]["column_count"] = <int> view_p.column_count
# views[view_name]["line_count"] = <int> view_p.line_count
# views[view_name]["view_number"] = <int> view_p.view_number
# views[view_name]["created_from"] = bytes2str(view_p.created_from)
# views[view_name]["creation_date"] = bytes2str(obi_format_date(view_p.creation_date))
# if (view_p.all_lines) :
# views[view_name]["line_selection"] = None
# else :
# views[view_name]["line_selection"] = {}
# views[view_name]["line_selection"]["column_name"] = bytes2str((view_p.line_selection).column_name)
# views[view_name]["line_selection"]["version"] = <int> (view_p.line_selection).version
# views[view_name]["column_references"] = {}
# column_refs = view_p.column_references
# for j in range(views[view_name]["column_count"]) :
# column_name = bytes2str((column_refs[j]).column_name)
# views[view_name]["column_references"][column_name] = {}
# views[view_name]["column_references"][column_name]["version"] = column_refs[j].version
#
# obi_close_view_infos(all_views_p);
#
# return views
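The shape of the dictionary built by `read_view_infos` can be sketched in plain Python. The key names below are the ones used in the diff; `view_infos_to_dict`, the `SimpleNamespace` record standing in for the mapped C structure, and all example values are made up for illustration. The sketch also assumes the creation date has already been formatted as a string.

```python
from types import SimpleNamespace


def view_infos_to_dict(infos):
    """Build the nested dictionary for one view from a record-like object."""
    d = {
        "name": infos.name,
        "comments": infos.comments,
        "view_type": infos.view_type,
        "column_count": int(infos.column_count),
        "line_count": int(infos.line_count),
        "created_from": infos.created_from,
        "creation_date": infos.creation_date,
    }
    # A view covering all lines has no line selection.
    if infos.all_lines:
        d["line_selection"] = None
    else:
        d["line_selection"] = {
            "column_name": infos.line_selection.column_name,
            "version": int(infos.line_selection.version),
        }
    # One entry per referenced column, keyed by column name.
    refs = infos.column_references[:d["column_count"]]
    d["column_references"] = {ref.column_name: {"version": ref.version} for ref in refs}
    return d


# Made-up example record (not taken from a real DMS).
example = SimpleNamespace(
    name="my_view", comments="", view_type="NUC_SEQS_VIEW",
    column_count=2, line_count=10, created_from="", creation_date="2016-08-04",
    all_lines=True, line_selection=None,
    column_references=[
        SimpleNamespace(column_name="ID", version=0),
        SimpleNamespace(column_name="NUC_SEQ", version=1),
    ],
)
infos_dict = view_infos_to_dict(example)
```

Keeping `line_selection` as `None` rather than an empty dictionary lets callers test for "all lines" with a simple `is None` check.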


@ -10,6 +10,8 @@
../../../src/encode.c
../../../src/murmurhash2.h
../../../src/murmurhash2.c
../../../src/obi_align.h
../../../src/obi_align.c
../../../src/obiavl.h
../../../src/obiavl.c
../../../src/obiblob_indexer.h
@ -21,8 +23,22 @@
../../../src/obidms_taxonomy.c
../../../src/obidms.h
../../../src/obidms.c
../../../src/obidmscolumn_bool.c
../../../src/obidmscolumn_bool.h
../../../src/obidmscolumn_char.c
../../../src/obidmscolumn_char.h
../../../src/obidmscolumn_float.c
../../../src/obidmscolumn_float.h
../../../src/obidmscolumn_idx.h
../../../src/obidmscolumn_idx.c
../../../src/obidmscolumn_int.c
../../../src/obidmscolumn_int.h
../../../src/obidmscolumn_qual.h
../../../src/obidmscolumn_qual.c
../../../src/obidmscolumn_seq.c
../../../src/obidmscolumn_seq.h
../../../src/obidmscolumn_str.c
../../../src/obidmscolumn_str.h
../../../src/obidmscolumn.h
../../../src/obidmscolumn.c
../../../src/obidmscolumndir.h
@ -35,17 +51,9 @@
../../../src/obitypes.c
../../../src/obiview.h
../../../src/obiview.c
../../../src/sse_banded_LCS_alignment.h
../../../src/sse_banded_LCS_alignment.c
../../../src/uint8_indexer.h
../../../src/uint8_indexer.c
../../../src/utils.h
../../../src/utils.c
../../../src/obidmscolumn_bool.c
../../../src/obidmscolumn_bool.h
../../../src/obidmscolumn_char.c
../../../src/obidmscolumn_char.h
../../../src/obidmscolumn_float.c
../../../src/obidmscolumn_float.h
../../../src/obidmscolumn_int.c
../../../src/obidmscolumn_int.h
../../../src/obidmscolumn_seq.c
../../../src/obidmscolumn_seq.h
../../../src/obidmscolumn_str.c
../../../src/obidmscolumn_str.h


@ -17,7 +17,7 @@ cdef class OBIDMS_column_bool(OBIDMS_column):
cpdef object get_line(self, index_t line_nb):
cdef obibool_t value
cdef object result
value = obi_column_get_obibool_with_elt_idx_in_view(self.view, (self.pointer)[0], line_nb, 0)
value = obi_column_get_obibool_with_elt_idx_in_view(self.view.pointer, (self.pointer)[0], line_nb, 0)
if obi_errno > 0 :
raise IndexError(line_nb)
if value == OBIBool_NA :
@ -29,7 +29,7 @@ cdef class OBIDMS_column_bool(OBIDMS_column):
cpdef set_line(self, index_t line_nb, object value):
if value is None :
value = OBIBool_NA
if obi_column_set_obibool_with_elt_idx_in_view(self.view, (self.pointer)[0], line_nb, 0, <obibool_t> value) < 0:
if obi_column_set_obibool_with_elt_idx_in_view(self.view.pointer, (self.pointer)[0], line_nb, 0, <obibool_t> value) < 0:
raise Exception("Problem setting a value in a column")
@ -38,7 +38,7 @@ cdef class OBIDMS_column_multi_elts_bool(OBIDMS_column_multi_elts):
cpdef object get_item(self, index_t line_nb, str element_name):
cdef obibool_t value
cdef object result
value = obi_column_get_obibool_with_elt_name_in_view(self.view, (self.pointer)[0], line_nb, str2bytes(element_name))
value = obi_column_get_obibool_with_elt_name_in_view(self.view.pointer, (self.pointer)[0], line_nb, str2bytes(element_name))
if obi_errno > 0 :
raise IndexError(line_nb, element_name)
if value == OBIBool_NA :
@ -56,7 +56,7 @@ cdef class OBIDMS_column_multi_elts_bool(OBIDMS_column_multi_elts):
result = {}
all_NA = True
for i in range(self.nb_elements_per_line) :
value = obi_column_get_obibool_with_elt_idx_in_view(self.view, (self.pointer)[0], line_nb, i)
value = obi_column_get_obibool_with_elt_idx_in_view(self.view.pointer, (self.pointer)[0], line_nb, i)
if obi_errno > 0 :
raise IndexError(line_nb)
if value == OBIBool_NA :
@ -73,5 +73,5 @@ cdef class OBIDMS_column_multi_elts_bool(OBIDMS_column_multi_elts):
cpdef set_item(self, index_t line_nb, str element_name, object value):
if value is None :
value = OBIBool_NA
if obi_column_set_obibool_with_elt_name_in_view(self.view, (self.pointer)[0], line_nb, str2bytes(element_name), <obibool_t> value) < 0:
if obi_column_set_obibool_with_elt_name_in_view(self.view.pointer, (self.pointer)[0], line_nb, str2bytes(element_name), <obibool_t> value) < 0:
raise Exception("Problem setting a value in a column")
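The typed column classes in these hunks all follow the same NA convention: `None` on the Python side is stored as a type-specific sentinel, and a multi-element line whose elements are all NA is returned as `None`. A pure-Python sketch of that convention for the boolean case follows. `BOOL_NA = 2` is a hypothetical sentinel chosen for the example (the real value of `OBIBool_NA` is defined in the C headers and is not shown in this diff), and `encode`, `decode` and `decode_line` are illustrative names.

```python
BOOL_NA = 2  # hypothetical sentinel, NOT the real value of OBIBool_NA


def encode(value):
    """Map a Python value to its stored form: None becomes the NA sentinel."""
    return BOOL_NA if value is None else int(bool(value))


def decode(raw):
    """Map a stored value back to Python: the NA sentinel becomes None."""
    return None if raw == BOOL_NA else bool(raw)


def decode_line(raw_values, element_names):
    """Decode one multi-element line; return None when every element is NA."""
    result = {name: decode(raw) for name, raw in zip(element_names, raw_values)}
    if all(value is None for value in result.values()):
        return None
    return result
```

For instance, `decode_line([1, BOOL_NA], ["a", "b"])` keeps the per-element `None`, while a line holding only sentinels collapses to a single `None`.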


@ -10,6 +10,8 @@
../../../src/encode.c
../../../src/murmurhash2.h
../../../src/murmurhash2.c
../../../src/obi_align.h
../../../src/obi_align.c
../../../src/obiavl.h
../../../src/obiavl.c
../../../src/obiblob_indexer.h
@ -21,8 +23,22 @@
../../../src/obidms_taxonomy.c
../../../src/obidms.h
../../../src/obidms.c
../../../src/obidmscolumn_bool.c
../../../src/obidmscolumn_bool.h
../../../src/obidmscolumn_char.c
../../../src/obidmscolumn_char.h
../../../src/obidmscolumn_float.c
../../../src/obidmscolumn_float.h
../../../src/obidmscolumn_idx.h
../../../src/obidmscolumn_idx.c
../../../src/obidmscolumn_int.c
../../../src/obidmscolumn_int.h
../../../src/obidmscolumn_qual.h
../../../src/obidmscolumn_qual.c
../../../src/obidmscolumn_seq.c
../../../src/obidmscolumn_seq.h
../../../src/obidmscolumn_str.c
../../../src/obidmscolumn_str.h
../../../src/obidmscolumn.h
../../../src/obidmscolumn.c
../../../src/obidmscolumndir.h
@ -35,17 +51,9 @@
../../../src/obitypes.c
../../../src/obiview.h
../../../src/obiview.c
../../../src/sse_banded_LCS_alignment.h
../../../src/sse_banded_LCS_alignment.c
../../../src/uint8_indexer.h
../../../src/uint8_indexer.c
../../../src/utils.h
../../../src/utils.c
../../../src/obidmscolumn_bool.c
../../../src/obidmscolumn_bool.h
../../../src/obidmscolumn_char.c
../../../src/obidmscolumn_char.h
../../../src/obidmscolumn_float.c
../../../src/obidmscolumn_float.h
../../../src/obidmscolumn_int.c
../../../src/obidmscolumn_int.h
../../../src/obidmscolumn_seq.c
../../../src/obidmscolumn_seq.h
../../../src/obidmscolumn_str.c
../../../src/obidmscolumn_str.h


@ -15,7 +15,7 @@ cdef class OBIDMS_column_char(OBIDMS_column):
cpdef object get_line(self, index_t line_nb):
cdef obichar_t value
cdef object result
value = obi_column_get_obichar_with_elt_idx_in_view(self.view, (self.pointer)[0], line_nb, 0)
value = obi_column_get_obichar_with_elt_idx_in_view(self.view.pointer, (self.pointer)[0], line_nb, 0)
if obi_errno > 0 :
raise IndexError(line_nb)
if value == OBIChar_NA :
@ -27,7 +27,7 @@ cdef class OBIDMS_column_char(OBIDMS_column):
cpdef set_line(self, index_t line_nb, object value):
if value is None :
value = OBIChar_NA
if obi_column_set_obichar_with_elt_idx_in_view(self.view, (self.pointer)[0], line_nb, 0, str2bytes(value)[0]) < 0:
if obi_column_set_obichar_with_elt_idx_in_view(self.view.pointer, (self.pointer)[0], line_nb, 0, str2bytes(value)[0]) < 0:
raise Exception("Problem setting a value in a column")
@ -36,7 +36,7 @@ cdef class OBIDMS_column_multi_elts_char(OBIDMS_column_multi_elts):
cpdef object get_item(self, index_t line_nb, str element_name):
cdef obichar_t value
cdef object result
value = obi_column_get_obichar_with_elt_name_in_view(self.view, (self.pointer)[0], line_nb, str2bytes(element_name))
value = obi_column_get_obichar_with_elt_name_in_view(self.view.pointer, (self.pointer)[0], line_nb, str2bytes(element_name))
if obi_errno > 0 :
raise IndexError(line_nb, element_name)
if value == OBIChar_NA :
@ -54,7 +54,7 @@ cdef class OBIDMS_column_multi_elts_char(OBIDMS_column_multi_elts):
result = {}
all_NA = True
for i in range(self.nb_elements_per_line) :
value = obi_column_get_obichar_with_elt_idx_in_view(self.view, (self.pointer)[0], line_nb, i)
value = obi_column_get_obichar_with_elt_idx_in_view(self.view.pointer, (self.pointer)[0], line_nb, i)
if obi_errno > 0 :
raise IndexError(line_nb)
if value == OBIChar_NA :
@ -71,6 +71,6 @@ cdef class OBIDMS_column_multi_elts_char(OBIDMS_column_multi_elts):
cpdef set_item(self, index_t line_nb, str element_name, object value):
if value is None :
value = OBIChar_NA
if obi_column_set_obichar_with_elt_name_in_view(self.view, (self.pointer)[0], line_nb, str2bytes(element_name), str2bytes(value)[0]) < 0:
if obi_column_set_obichar_with_elt_name_in_view(self.view.pointer, (self.pointer)[0], line_nb, str2bytes(element_name), str2bytes(value)[0]) < 0:
raise Exception("Problem setting a value in a column")


@ -10,6 +10,8 @@
../../../src/encode.c
../../../src/murmurhash2.h
../../../src/murmurhash2.c
../../../src/obi_align.h
../../../src/obi_align.c
../../../src/obiavl.h
../../../src/obiavl.c
../../../src/obiblob_indexer.h
@ -21,8 +23,22 @@
../../../src/obidms_taxonomy.c
../../../src/obidms.h
../../../src/obidms.c
../../../src/obidmscolumn_bool.c
../../../src/obidmscolumn_bool.h
../../../src/obidmscolumn_char.c
../../../src/obidmscolumn_char.h
../../../src/obidmscolumn_float.c
../../../src/obidmscolumn_float.h
../../../src/obidmscolumn_idx.h
../../../src/obidmscolumn_idx.c
../../../src/obidmscolumn_int.c
../../../src/obidmscolumn_int.h
../../../src/obidmscolumn_qual.h
../../../src/obidmscolumn_qual.c
../../../src/obidmscolumn_seq.c
../../../src/obidmscolumn_seq.h
../../../src/obidmscolumn_str.c
../../../src/obidmscolumn_str.h
../../../src/obidmscolumn.h
../../../src/obidmscolumn.c
../../../src/obidmscolumndir.h
@ -35,17 +51,9 @@
../../../src/obitypes.c
../../../src/obiview.h
../../../src/obiview.c
../../../src/sse_banded_LCS_alignment.h
../../../src/sse_banded_LCS_alignment.c
../../../src/uint8_indexer.h
../../../src/uint8_indexer.c
../../../src/utils.h
../../../src/utils.c
../../../src/obidmscolumn_bool.c
../../../src/obidmscolumn_bool.h
../../../src/obidmscolumn_char.c
../../../src/obidmscolumn_char.h
../../../src/obidmscolumn_float.c
../../../src/obidmscolumn_float.h
../../../src/obidmscolumn_int.c
../../../src/obidmscolumn_int.h
../../../src/obidmscolumn_seq.c
../../../src/obidmscolumn_seq.h
../../../src/obidmscolumn_str.c
../../../src/obidmscolumn_str.h


@ -15,7 +15,7 @@ cdef class OBIDMS_column_float(OBIDMS_column):
cpdef object get_line(self, index_t line_nb):
cdef obifloat_t value
cdef object result
value = obi_column_get_obifloat_with_elt_idx_in_view(self.view, (self.pointer)[0], line_nb, 0)
value = obi_column_get_obifloat_with_elt_idx_in_view(self.view.pointer, (self.pointer)[0], line_nb, 0)
if obi_errno > 0 :
raise IndexError(line_nb)
if value == OBIFloat_NA :
@ -27,7 +27,7 @@ cdef class OBIDMS_column_float(OBIDMS_column):
cpdef set_line(self, index_t line_nb, object value):
if value is None :
value = OBIFloat_NA
if obi_column_set_obifloat_with_elt_idx_in_view(self.view, (self.pointer)[0], line_nb, 0, <obifloat_t> value) < 0:
if obi_column_set_obifloat_with_elt_idx_in_view(self.view.pointer, (self.pointer)[0], line_nb, 0, <obifloat_t> value) < 0:
raise Exception("Problem setting a value in a column")
@ -36,7 +36,7 @@ cdef class OBIDMS_column_multi_elts_float(OBIDMS_column_multi_elts):
cpdef object get_item(self, index_t line_nb, str element_name):
cdef obifloat_t value
cdef object result
value = obi_column_get_obifloat_with_elt_name_in_view(self.view, (self.pointer)[0], line_nb, str2bytes(element_name))
value = obi_column_get_obifloat_with_elt_name_in_view(self.view.pointer, (self.pointer)[0], line_nb, str2bytes(element_name))
if obi_errno > 0 :
raise IndexError(line_nb, element_name)
if value == OBIFloat_NA :
@ -54,7 +54,7 @@ cdef class OBIDMS_column_multi_elts_float(OBIDMS_column_multi_elts):
result = {}
all_NA = True
for i in range(self.nb_elements_per_line) :
value = obi_column_get_obifloat_with_elt_idx_in_view(self.view, (self.pointer)[0], line_nb, i)
value = obi_column_get_obifloat_with_elt_idx_in_view(self.view.pointer, (self.pointer)[0], line_nb, i)
if obi_errno > 0 :
raise IndexError(line_nb)
if value == OBIFloat_NA :
@ -71,6 +71,6 @@ cdef class OBIDMS_column_multi_elts_float(OBIDMS_column_multi_elts):
cpdef set_item(self, index_t line_nb, str element_name, object value):
if value is None :
value = OBIFloat_NA
if obi_column_set_obifloat_with_elt_name_in_view(self.view, (self.pointer)[0], line_nb, str2bytes(element_name), <obifloat_t> value) < 0:
if obi_column_set_obifloat_with_elt_name_in_view(self.view.pointer, (self.pointer)[0], line_nb, str2bytes(element_name), <obifloat_t> value) < 0:
raise Exception("Problem setting a value in a column")


@ -10,6 +10,8 @@
../../../src/encode.c
../../../src/murmurhash2.h
../../../src/murmurhash2.c
../../../src/obi_align.h
../../../src/obi_align.c
../../../src/obiavl.h
../../../src/obiavl.c
../../../src/obiblob_indexer.h
@ -21,8 +23,22 @@
../../../src/obidms_taxonomy.c
../../../src/obidms.h
../../../src/obidms.c
../../../src/obidmscolumn_bool.c
../../../src/obidmscolumn_bool.h
../../../src/obidmscolumn_char.c
../../../src/obidmscolumn_char.h
../../../src/obidmscolumn_float.c
../../../src/obidmscolumn_float.h
../../../src/obidmscolumn_idx.h
../../../src/obidmscolumn_idx.c
../../../src/obidmscolumn_int.c
../../../src/obidmscolumn_int.h
../../../src/obidmscolumn_qual.h
../../../src/obidmscolumn_qual.c
../../../src/obidmscolumn_seq.c
../../../src/obidmscolumn_seq.h
../../../src/obidmscolumn_str.c
../../../src/obidmscolumn_str.h
../../../src/obidmscolumn.h
../../../src/obidmscolumn.c
../../../src/obidmscolumndir.h
@ -35,17 +51,9 @@
../../../src/obitypes.c
../../../src/obiview.h
../../../src/obiview.c
../../../src/sse_banded_LCS_alignment.h
../../../src/sse_banded_LCS_alignment.c
../../../src/uint8_indexer.h
../../../src/uint8_indexer.c
../../../src/utils.h
../../../src/utils.c
../../../src/obidmscolumn_bool.c
../../../src/obidmscolumn_bool.h
../../../src/obidmscolumn_char.c
../../../src/obidmscolumn_char.h
../../../src/obidmscolumn_float.c
../../../src/obidmscolumn_float.h
../../../src/obidmscolumn_int.c
../../../src/obidmscolumn_int.h
../../../src/obidmscolumn_seq.c
../../../src/obidmscolumn_seq.h
../../../src/obidmscolumn_str.c
../../../src/obidmscolumn_str.h


@ -17,7 +17,7 @@ cdef class OBIDMS_column_int(OBIDMS_column):
cpdef object get_line(self, index_t line_nb):
cdef obiint_t value
cdef object result
value = obi_column_get_obiint_with_elt_idx_in_view(self.view, (self.pointer)[0], line_nb, 0)
value = obi_column_get_obiint_with_elt_idx_in_view(self.view.pointer, (self.pointer)[0], line_nb, 0)
if obi_errno > 0 :
raise IndexError(line_nb)
if value == OBIInt_NA :
@ -29,7 +29,7 @@ cdef class OBIDMS_column_int(OBIDMS_column):
cpdef set_line(self, index_t line_nb, object value):
if value is None :
value = OBIInt_NA
if obi_column_set_obiint_with_elt_idx_in_view(self.view, (self.pointer)[0], line_nb, 0, <obiint_t> value) < 0:
if obi_column_set_obiint_with_elt_idx_in_view(self.view.pointer, (self.pointer)[0], line_nb, 0, <obiint_t> value) < 0:
raise Exception("Problem setting a value in a column")
@ -38,7 +38,7 @@ cdef class OBIDMS_column_multi_elts_int(OBIDMS_column_multi_elts):
cpdef object get_item(self, index_t line_nb, str element_name):
cdef obiint_t value
cdef object result
value = obi_column_get_obiint_with_elt_name_in_view(self.view, (self.pointer)[0], line_nb, str2bytes(element_name))
value = obi_column_get_obiint_with_elt_name_in_view(self.view.pointer, (self.pointer)[0], line_nb, str2bytes(element_name))
if obi_errno > 0 :
raise IndexError(line_nb, element_name)
if value == OBIInt_NA :
@ -56,7 +56,7 @@ cdef class OBIDMS_column_multi_elts_int(OBIDMS_column_multi_elts):
result = {}
all_NA = True
for i in range(self.nb_elements_per_line) :
value = obi_column_get_obiint_with_elt_idx_in_view(self.view, (self.pointer)[0], line_nb, i)
value = obi_column_get_obiint_with_elt_idx_in_view(self.view.pointer, (self.pointer)[0], line_nb, i)
if obi_errno > 0 :
raise IndexError(line_nb)
if value == OBIInt_NA :
@ -73,6 +73,6 @@ cdef class OBIDMS_column_multi_elts_int(OBIDMS_column_multi_elts):
cpdef set_item(self, index_t line_nb, str element_name, object value):
if value is None :
value = OBIInt_NA
if obi_column_set_obiint_with_elt_name_in_view(self.view, (self.pointer)[0], line_nb, str2bytes(element_name), <obiint_t> value) < 0:
if obi_column_set_obiint_with_elt_name_in_view(self.view.pointer, (self.pointer)[0], line_nb, str2bytes(element_name), <obiint_t> value) < 0:
raise Exception("Problem setting a value in a column")


@ -0,0 +1,59 @@
../../../src/bloom.h
../../../src/bloom.c
../../../src/char_str_indexer.h
../../../src/char_str_indexer.c
../../../src/crc64.h
../../../src/crc64.c
../../../src/dna_seq_indexer.h
../../../src/dna_seq_indexer.c
../../../src/encode.h
../../../src/encode.c
../../../src/murmurhash2.h
../../../src/murmurhash2.c
../../../src/obi_align.h
../../../src/obi_align.c
../../../src/obiavl.h
../../../src/obiavl.c
../../../src/obiblob_indexer.h
../../../src/obiblob_indexer.c
../../../src/obiblob.h
../../../src/obiblob.c
../../../src/obidebug.h
../../../src/obidms_taxonomy.h
../../../src/obidms_taxonomy.c
../../../src/obidms.h
../../../src/obidms.c
../../../src/obidmscolumn_bool.c
../../../src/obidmscolumn_bool.h
../../../src/obidmscolumn_char.c
../../../src/obidmscolumn_char.h
../../../src/obidmscolumn_float.c
../../../src/obidmscolumn_float.h
../../../src/obidmscolumn_idx.h
../../../src/obidmscolumn_idx.c
../../../src/obidmscolumn_int.c
../../../src/obidmscolumn_int.h
../../../src/obidmscolumn_qual.h
../../../src/obidmscolumn_qual.c
../../../src/obidmscolumn_seq.c
../../../src/obidmscolumn_seq.h
../../../src/obidmscolumn_str.c
../../../src/obidmscolumn_str.h
../../../src/obidmscolumn.h
../../../src/obidmscolumn.c
../../../src/obidmscolumndir.h
../../../src/obidmscolumndir.c
../../../src/obierrno.h
../../../src/obierrno.c
../../../src/obilittlebigman.h
../../../src/obilittlebigman.c
../../../src/obitypes.h
../../../src/obitypes.c
../../../src/obiview.h
../../../src/obiview.c
../../../src/sse_banded_LCS_alignment.h
../../../src/sse_banded_LCS_alignment.c
../../../src/uint8_indexer.h
../../../src/uint8_indexer.c
../../../src/utils.h
../../../src/utils.c


@ -0,0 +1,20 @@
#cython: language_level=3
from .capi.obitypes cimport index_t
from ._obidms cimport OBIDMS_column , OBIDMS_column_multi_elts
cdef class OBIDMS_column_qual(OBIDMS_column):
cpdef object get_line(self, index_t line_nb)
cpdef object get_str_line(self, index_t line_nb)
cpdef set_line(self, index_t line_nb, object value)
cpdef set_str_line(self, index_t line_nb, object value)
cdef class OBIDMS_column_multi_elts_qual(OBIDMS_column_multi_elts):
cpdef object get_item(self, index_t line_nb, str element_name)
cpdef object get_str_item(self, index_t line_nb, str element_name)
cpdef object get_line(self, index_t line_nb)
cpdef object get_str_line(self, index_t line_nb)
cpdef set_item(self, index_t line_nb, str element_name, object value)
cpdef set_str_item(self, index_t line_nb, str element_name, object value)


@ -0,0 +1,184 @@
#cython: language_level=3
from .capi.obiview cimport obi_column_get_obiqual_char_with_elt_name_in_view, \
obi_column_get_obiqual_char_with_elt_idx_in_view, \
obi_column_set_obiqual_char_with_elt_name_in_view, \
obi_column_set_obiqual_char_with_elt_idx_in_view, \
obi_column_get_obiqual_int_with_elt_name_in_view, \
obi_column_get_obiqual_int_with_elt_idx_in_view, \
obi_column_set_obiqual_int_with_elt_name_in_view, \
obi_column_set_obiqual_int_with_elt_idx_in_view
from .capi.obierrno cimport obi_errno
from .capi.obitypes cimport OBIQual_char_NA, OBIQual_int_NA, const_char_p
from ._obidms cimport OBIView
from obitools3.utils cimport str2bytes, bytes2str
from libc.stdlib cimport free
from libc.string cimport strcmp
from libc.stdint cimport uint8_t
from libc.stdlib cimport malloc
cdef class OBIDMS_column_qual(OBIDMS_column):
cpdef object get_line(self, index_t line_nb):
cdef const uint8_t* value
cdef int value_length
cdef object result
cdef int i
value = obi_column_get_obiqual_int_with_elt_idx_in_view(self.view.pointer, (self.pointer)[0], line_nb, 0, &value_length)
if obi_errno > 0 :
raise IndexError(line_nb)
if value == OBIQual_int_NA :
result = None
else :
result = []
for i in range(value_length) :
result.append(<int>value[i])
return result
cpdef object get_str_line(self, index_t line_nb):
cdef char* value
cdef object result
cdef int i
value = obi_column_get_obiqual_char_with_elt_idx_in_view(self.view.pointer, (self.pointer)[0], line_nb, 0)
if obi_errno > 0 :
raise IndexError(line_nb)
if value == OBIQual_char_NA :
result = None
else :
result = bytes2str(value)
free(value)
return result
cpdef set_line(self, index_t line_nb, object value):
cdef uint8_t* value_b
cdef int value_length
if value is None :
if obi_column_set_obiqual_int_with_elt_idx_in_view(self.view.pointer, (self.pointer)[0], line_nb, 0, OBIQual_int_NA, 0) < 0:
raise Exception("Problem setting a value in a column")
else :
value_length = len(value)
value_b = <uint8_t*> malloc(value_length * sizeof(uint8_t))
for i in range(value_length) :
value_b[i] = <uint8_t>value[i]
if obi_column_set_obiqual_int_with_elt_idx_in_view(self.view.pointer, (self.pointer)[0], line_nb, 0, value_b, value_length) < 0:
raise Exception("Problem setting a value in a column")
free(value_b)
cpdef set_str_line(self, index_t line_nb, object value):
if value is None :
if obi_column_set_obiqual_char_with_elt_idx_in_view(self.view.pointer, (self.pointer)[0], line_nb, 0, OBIQual_char_NA) < 0:
raise Exception("Problem setting a value in a column")
else :
if obi_column_set_obiqual_char_with_elt_idx_in_view(self.view.pointer, (self.pointer)[0], line_nb, 0, str2bytes(value)) < 0:
raise Exception("Problem setting a value in a column")
cdef class OBIDMS_column_multi_elts_qual(OBIDMS_column_multi_elts):
cpdef object get_item(self, index_t line_nb, str element_name):
cdef const uint8_t* value
cdef int value_length
cdef object result
cdef int i
value = obi_column_get_obiqual_int_with_elt_name_in_view(self.view.pointer, (self.pointer)[0], line_nb, str2bytes(element_name), &value_length)
if obi_errno > 0 :
raise IndexError(line_nb, element_name)
if value == OBIQual_int_NA :
result = None
else :
result = []
for i in range(value_length) :
result.append(<int>value[i])
return result
cpdef object get_str_item(self, index_t line_nb, str element_name):
cdef char* value
cdef object result
value = obi_column_get_obiqual_char_with_elt_name_in_view(self.view.pointer, (self.pointer)[0], line_nb, str2bytes(element_name))
if obi_errno > 0 :
raise IndexError(line_nb, element_name)
if value == OBIQual_char_NA :
result = None
else :
result = bytes2str(value)
free(value)
return result
cpdef object get_line(self, index_t line_nb) :
cdef const uint8_t* value
cdef int value_length
cdef object value_in_result
cdef dict result
cdef index_t i
cdef int j
cdef bint all_NA
result = {}
all_NA = True
for i in range(self.nb_elements_per_line) :
value = obi_column_get_obiqual_int_with_elt_idx_in_view(self.view.pointer, (self.pointer)[0], line_nb, i, &value_length)
if obi_errno > 0 :
raise IndexError(line_nb)
if value == OBIQual_int_NA :
value_in_result = None
else :
value_in_result = []
for j in range(value_length) :
value_in_result.append(<int>value[j])
result[self.elements_names[i]] = value_in_result
if all_NA and (value_in_result is not None) :
all_NA = False
if all_NA :
result = None
return result
cpdef object get_str_line(self, index_t line_nb) :
cdef char* value
cdef object value_in_result
cdef dict result
cdef index_t i
cdef bint all_NA
result = {}
all_NA = True
for i in range(self.nb_elements_per_line) :
value = obi_column_get_obiqual_char_with_elt_idx_in_view(self.view.pointer, (self.pointer)[0], line_nb, i)
if obi_errno > 0 :
raise IndexError(line_nb)
if value == OBIQual_char_NA :
value_in_result = None
else :
value_in_result = bytes2str(value)
free(value)
result[self.elements_names[i]] = value_in_result
if all_NA and (value_in_result is not None) :
all_NA = False
if all_NA :
result = None
return result
cpdef set_item(self, index_t line_nb, str element_name, object value):
cdef uint8_t* value_b
cdef int value_length
if value is None :
if obi_column_set_obiqual_int_with_elt_name_in_view(self.view.pointer, (self.pointer)[0], line_nb, str2bytes(element_name), OBIQual_int_NA, 0) < 0:
raise Exception("Problem setting a value in a column")
else :
value_length = len(value)
value_b = <uint8_t*> malloc(value_length * sizeof(uint8_t))
for i in range(value_length) :
value_b[i] = <uint8_t>value[i]
if obi_column_set_obiqual_int_with_elt_name_in_view(self.view.pointer, (self.pointer)[0], line_nb, str2bytes(element_name), value_b, value_length) < 0:
raise Exception("Problem setting a value in a column")
free(value_b)
cpdef set_str_item(self, index_t line_nb, str element_name, object value):
if value is None :
if obi_column_set_obiqual_char_with_elt_name_in_view(self.view.pointer, (self.pointer)[0], line_nb, str2bytes(element_name), OBIQual_char_NA) < 0:
raise Exception("Problem setting a value in a column")
else :
if obi_column_set_obiqual_char_with_elt_name_in_view(self.view.pointer, (self.pointer)[0], line_nb, str2bytes(element_name), str2bytes(value)) < 0:
raise Exception("Problem setting a value in a column")
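The integer setters and getters above move quality scores between a Python list and a `uint8_t` buffer. In pure Python the same conversion can be sketched with `bytes`, which owns its memory and rejects values outside 0..255. This is an illustrative equivalent under that assumption, not the code in the diff; `pack_scores` and `unpack_scores` are hypothetical names.

```python
from typing import List, Optional


def pack_scores(scores: Optional[List[int]]) -> Optional[bytes]:
    """Turn a list of quality scores into a byte buffer; None stays None (NA)."""
    if scores is None:
        return None
    # bytes() raises ValueError if any score is outside range(256).
    return bytes(scores)


def unpack_scores(buffer: Optional[bytes]) -> Optional[List[int]]:
    """Turn a byte buffer back into a list of integer scores; None stays None."""
    if buffer is None:
        return None
    return list(buffer)
```

Letting a managed object own the buffer means there is no separate release step to remember on error paths, which is one reason to prefer it when a raw `malloc` is not strictly needed.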


@ -10,6 +10,8 @@
../../../src/encode.c
../../../src/murmurhash2.h
../../../src/murmurhash2.c
../../../src/obi_align.h
../../../src/obi_align.c
../../../src/obiavl.h
../../../src/obiavl.c
../../../src/obiblob_indexer.h
@ -21,8 +23,22 @@
../../../src/obidms_taxonomy.c
../../../src/obidms.h
../../../src/obidms.c
../../../src/obidmscolumn_bool.c
../../../src/obidmscolumn_bool.h
../../../src/obidmscolumn_char.c
../../../src/obidmscolumn_char.h
../../../src/obidmscolumn_float.c
../../../src/obidmscolumn_float.h
../../../src/obidmscolumn_idx.h
../../../src/obidmscolumn_idx.c
../../../src/obidmscolumn_int.c
../../../src/obidmscolumn_int.h
../../../src/obidmscolumn_qual.h
../../../src/obidmscolumn_qual.c
../../../src/obidmscolumn_seq.c
../../../src/obidmscolumn_seq.h
../../../src/obidmscolumn_str.c
../../../src/obidmscolumn_str.h
../../../src/obidmscolumn.h
../../../src/obidmscolumn.c
../../../src/obidmscolumndir.h
@ -35,17 +51,9 @@
../../../src/obitypes.c
../../../src/obiview.h
../../../src/obiview.c
../../../src/sse_banded_LCS_alignment.h
../../../src/sse_banded_LCS_alignment.c
../../../src/uint8_indexer.h
../../../src/uint8_indexer.c
../../../src/utils.h
../../../src/utils.c
../../../src/obidmscolumn_bool.c
../../../src/obidmscolumn_bool.h
../../../src/obidmscolumn_char.c
../../../src/obidmscolumn_char.h
../../../src/obidmscolumn_float.c
../../../src/obidmscolumn_float.h
../../../src/obidmscolumn_int.c
../../../src/obidmscolumn_int.h
../../../src/obidmscolumn_seq.c
../../../src/obidmscolumn_seq.h
../../../src/obidmscolumn_str.c
../../../src/obidmscolumn_str.h


@ -1,12 +1,24 @@
#cython: language_level=3
from .capi.obitypes cimport index_t
from ._obidms cimport OBIDMS_column, OBIDMS_column_multi_elts
from ._obidms cimport OBIView, OBIDMS_column, OBIDMS_column_multi_elts
cdef class OBIDMS_column_seq(OBIDMS_column):
cpdef object get_line(self, index_t line_nb)
cpdef set_line(self, index_t line_nb, object value)
# TO DISCUSS:
# I'm not sure that this method should be declared here.
# Alignment should probably be declared outside of the sequence object.
cpdef align(self,
OBIView score_view,
OBIDMS_column score_column,
double threshold = *,
bint normalize = *,
int reference = *,
bint similarity_mode = *)
cdef class OBIDMS_column_multi_elts_seq(OBIDMS_column_multi_elts):
cpdef object get_item(self, index_t line_nb, str element_name)


@ -4,13 +4,15 @@ from .capi.obiview cimport obi_column_get_obiseq_with_elt_name_in_view, \
obi_column_get_obiseq_with_elt_idx_in_view, \
obi_column_set_obiseq_with_elt_name_in_view, \
obi_column_set_obiseq_with_elt_idx_in_view
from .capi.obialign cimport obi_align_one_column
from .capi.obierrno cimport obi_errno
from .capi.obitypes cimport OBISeq_NA, const_char_p
from ._obidms cimport OBIView
from obitools3.utils cimport str2bytes, bytes2str
from libc.stdlib cimport free
from libc.string cimport strcmp
cdef class OBIDMS_column_seq(OBIDMS_column):
@ -18,41 +20,67 @@ cdef class OBIDMS_column_seq(OBIDMS_column):
cpdef object get_line(self, index_t line_nb):
cdef char* value
cdef object result
value = obi_column_get_obiseq_with_elt_idx_in_view(self.view, (self.pointer)[0], line_nb, 0)
value = obi_column_get_obiseq_with_elt_idx_in_view(self.view.pointer, (self.pointer)[0], line_nb, 0)
if obi_errno > 0 :
raise IndexError(line_nb)
if strcmp(value, OBISeq_NA) == 0 :
if value == OBISeq_NA :
result = None
else :
result = bytes2str(value)
free(value)
try:
result = <bytes> value
finally:
free(value)
return result
cpdef set_line(self, index_t line_nb, object value):
cdef bytes value_b
if value is None :
value_b = OBISeq_NA
else :
elif isinstance(value, bytes) :
value_b = value
elif isinstance(value, str) :
value_b = str2bytes(value)
if obi_column_set_obiseq_with_elt_idx_in_view(self.view, (self.pointer)[0], line_nb, 0, value_b) < 0:
raise Exception("Problem setting a value in a column")
else:
raise TypeError('Sequence value must be of type Bytes, Str or None')
if obi_column_set_obiseq_with_elt_idx_in_view(self.view.pointer, (self.pointer)[0], line_nb, 0, value_b) < 0:
raise Exception("Problem setting a value in a column")
else :
if obi_column_set_obiseq_with_elt_idx_in_view(self.view.pointer, (self.pointer)[0], line_nb, 0, str2bytes(value)) < 0:
raise Exception("Problem setting a value in a column")
# TODO choose alignment type (lcs or other) with supplementary argument
cpdef align(self,
OBIView score_view,
OBIDMS_column score_column,
double threshold = 0.0,
bint normalize = True,
int reference = 0, # TODO
bint similarity_mode = True):
if (obi_align_one_column(self.view.pointer, (self.pointer)[0], score_view.pointer, (score_column.pointer)[0], threshold, normalize, reference, similarity_mode) < 0) :
raise Exception("An error occurred while aligning sequences")
cdef class OBIDMS_column_multi_elts_seq(OBIDMS_column_multi_elts):
cpdef object get_item(self, index_t line_nb, str element_name):
cdef char* value
cdef object result
value = obi_column_get_obiseq_with_elt_name_in_view(self.view, (self.pointer)[0], line_nb, str2bytes(element_name))
value = obi_column_get_obiseq_with_elt_name_in_view(self.view.pointer, (self.pointer)[0], line_nb, str2bytes(element_name))
if obi_errno > 0 :
raise IndexError(line_nb, element_name)
if strcmp(value, OBISeq_NA) == 0 :
if value == OBISeq_NA :
result = None
else :
result = bytes2str(value)
free(value)
try:
result = <bytes> value
finally:
free(value)
return result
cpdef object get_line(self, index_t line_nb) :
cdef char* value
cdef object value_in_result
@ -62,14 +90,16 @@ cdef class OBIDMS_column_multi_elts_seq(OBIDMS_column_multi_elts):
result = {}
all_NA = True
for i in range(self.nb_elements_per_line) :
value = obi_column_get_obiseq_with_elt_idx_in_view(self.view, (self.pointer)[0], line_nb, i)
value = obi_column_get_obiseq_with_elt_idx_in_view(self.view.pointer, (self.pointer)[0], line_nb, i)
if obi_errno > 0 :
raise IndexError(line_nb)
if strcmp(value, OBISeq_NA) == 0 :
if value == OBISeq_NA :
value_in_result = None
else :
value_in_result = bytes2str(value)
free(value)
try:
value_in_result = <bytes> value
finally:
free(value)
result[self.elements_names[i]] = value_in_result
if all_NA and (value_in_result is not None) :
all_NA = False
@ -79,10 +109,19 @@ cdef class OBIDMS_column_multi_elts_seq(OBIDMS_column_multi_elts):
cpdef set_item(self, index_t line_nb, str element_name, object value):
cdef bytes value_b
if value is None :
value_b = OBISeq_NA
else :
elif isinstance(value, bytes) :
value_b = value
elif isinstance(value, str) :
value_b = str2bytes(value)
if obi_column_set_obiseq_with_elt_name_in_view(self.view, (self.pointer)[0], line_nb, str2bytes(element_name), value_b) < 0:
else:
raise TypeError('Sequence value must be of type Bytes, Str or None')
if obi_column_set_obiseq_with_elt_name_in_view(self.view.pointer, (self.pointer)[0], line_nb, str2bytes(element_name), value_b) < 0:
raise Exception("Problem setting a value in a column")
# cpdef align(self, ): # TODO
# raise Exception("Columns with multiple sequences per line can't be aligned") # TODO discuss
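
Note on the setters above: the new `set_line` and `set_item` accept `None`, `bytes` or `str` and reject anything else. A minimal plain-Python sketch of that dispatch follows. The names `normalize_seq_value` and `SEQ_NA` are hypothetical, and the ASCII encoding is an assumption, since this diff does not show what `str2bytes` or `OBISeq_NA` actually do.

```python
# Hypothetical stand-in for the C-level OBISeq_NA sentinel.
SEQ_NA = None


def normalize_seq_value(value):
    """Return the bytes to store for a sequence value, or SEQ_NA for None."""
    if value is None:
        return SEQ_NA
    if isinstance(value, bytes):
        # Bytes are passed through unchanged.
        return value
    if isinstance(value, str):
        # Assumption: sequences are plain ASCII; the real str2bytes may differ.
        return value.encode("ascii")
    raise TypeError("Sequence value must be of type Bytes, Str or None")
```

Checking `bytes` before `str` keeps the common path free of any encoding step, which seems to be the intent of returning and storing sequences as bytes.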


@ -10,6 +10,8 @@
../../../src/encode.c
../../../src/murmurhash2.h
../../../src/murmurhash2.c
../../../src/obi_align.h
../../../src/obi_align.c
../../../src/obiavl.h
../../../src/obiavl.c
../../../src/obiblob_indexer.h
@ -21,8 +23,22 @@
../../../src/obidms_taxonomy.c
../../../src/obidms.h
../../../src/obidms.c
../../../src/obidmscolumn_bool.c
../../../src/obidmscolumn_bool.h
../../../src/obidmscolumn_char.c
../../../src/obidmscolumn_char.h
../../../src/obidmscolumn_float.c
../../../src/obidmscolumn_float.h
../../../src/obidmscolumn_idx.h
../../../src/obidmscolumn_idx.c
../../../src/obidmscolumn_int.c
../../../src/obidmscolumn_int.h
../../../src/obidmscolumn_qual.h
../../../src/obidmscolumn_qual.c
../../../src/obidmscolumn_seq.c
../../../src/obidmscolumn_seq.h
../../../src/obidmscolumn_str.c
../../../src/obidmscolumn_str.h
../../../src/obidmscolumn.h
../../../src/obidmscolumn.c
../../../src/obidmscolumndir.h
@ -35,17 +51,9 @@
../../../src/obitypes.c
../../../src/obiview.h
../../../src/obiview.c
../../../src/sse_banded_LCS_alignment.h
../../../src/sse_banded_LCS_alignment.c
../../../src/uint8_indexer.h
../../../src/uint8_indexer.c
../../../src/utils.h
../../../src/utils.c
../../../src/obidmscolumn_bool.c
../../../src/obidmscolumn_bool.h
../../../src/obidmscolumn_char.c
../../../src/obidmscolumn_char.h
../../../src/obidmscolumn_float.c
../../../src/obidmscolumn_float.h
../../../src/obidmscolumn_int.c
../../../src/obidmscolumn_int.h
../../../src/obidmscolumn_seq.c
../../../src/obidmscolumn_seq.h
../../../src/obidmscolumn_str.c
../../../src/obidmscolumn_str.h


@ -9,18 +9,16 @@ from .capi.obitypes cimport OBIStr_NA, const_char_p
from obitools3.utils cimport str2bytes, bytes2str
from libc.string cimport strcmp
cdef class OBIDMS_column_str(OBIDMS_column):
cpdef object get_line(self, index_t line_nb):
cdef const_char_p value
cdef object result
value = obi_column_get_obistr_with_elt_idx_in_view(self.view, (self.pointer)[0], line_nb, 0)
value = obi_column_get_obistr_with_elt_idx_in_view(self.view.pointer, (self.pointer)[0], line_nb, 0)
if obi_errno > 0 :
raise IndexError(line_nb)
if strcmp(value, OBIStr_NA) == 0 :
if value == OBIStr_NA :
result = None
else :
result = bytes2str(value)
@ -28,13 +26,12 @@ cdef class OBIDMS_column_str(OBIDMS_column):
return result
cpdef set_line(self, index_t line_nb, object value):
cdef bytes value_b
if value is None :
value_b = OBIStr_NA
if obi_column_set_obistr_with_elt_idx_in_view(self.view.pointer, (self.pointer)[0], line_nb, 0, OBIStr_NA) < 0:
raise Exception("Problem setting a value in a column")
else :
value_b = str2bytes(value)
if obi_column_set_obistr_with_elt_idx_in_view(self.view, (self.pointer)[0], line_nb, 0, value_b) < 0:
raise Exception("Problem setting a value in a column")
if obi_column_set_obistr_with_elt_idx_in_view(self.view.pointer, (self.pointer)[0], line_nb, 0, str2bytes(value)) < 0:
raise Exception("Problem setting a value in a column")
cdef class OBIDMS_column_multi_elts_str(OBIDMS_column_multi_elts):
@ -42,10 +39,10 @@ cdef class OBIDMS_column_multi_elts_str(OBIDMS_column_multi_elts):
cpdef object get_item(self, index_t line_nb, str element_name):
cdef const_char_p value
cdef object result
value = obi_column_get_obistr_with_elt_name_in_view(self.view, (self.pointer)[0], line_nb, str2bytes(element_name))
value = obi_column_get_obistr_with_elt_name_in_view(self.view.pointer, (self.pointer)[0], line_nb, str2bytes(element_name))
if obi_errno > 0 :
raise IndexError(line_nb, element_name)
if strcmp(value, OBIStr_NA) == 0 :
if value == OBIStr_NA :
result = None
else :
result = bytes2str(value)
@ -61,10 +58,10 @@ cdef class OBIDMS_column_multi_elts_str(OBIDMS_column_multi_elts):
result = {}
all_NA = True
for i in range(self.nb_elements_per_line) :
value = obi_column_get_obistr_with_elt_idx_in_view(self.view, (self.pointer)[0], line_nb, i)
value = obi_column_get_obistr_with_elt_idx_in_view(self.view.pointer, (self.pointer)[0], line_nb, i)
if obi_errno > 0 :
raise IndexError(line_nb)
if strcmp(value, OBIStr_NA) == 0 :
if value == OBIStr_NA :
value_in_result = None
else :
value_in_result = bytes2str(value)
@ -82,6 +79,6 @@ cdef class OBIDMS_column_multi_elts_str(OBIDMS_column_multi_elts):
value_b = OBIStr_NA
else :
value_b = str2bytes(value)
if obi_column_set_obistr_with_elt_name_in_view(self.view, (self.pointer)[0], line_nb, str2bytes(element_name), value_b) < 0:
if obi_column_set_obistr_with_elt_name_in_view(self.view.pointer, (self.pointer)[0], line_nb, str2bytes(element_name), value_b) < 0:
raise Exception("Problem setting a value in a column")


@ -10,6 +10,8 @@
../../../src/encode.c
../../../src/murmurhash2.h
../../../src/murmurhash2.c
../../../src/obi_align.h
../../../src/obi_align.c
../../../src/obiavl.h
../../../src/obiavl.c
../../../src/obiblob_indexer.h
@ -21,8 +23,22 @@
../../../src/obidms_taxonomy.c
../../../src/obidms.h
../../../src/obidms.c
../../../src/obidmscolumn_bool.c
../../../src/obidmscolumn_bool.h
../../../src/obidmscolumn_char.c
../../../src/obidmscolumn_char.h
../../../src/obidmscolumn_float.c
../../../src/obidmscolumn_float.h
../../../src/obidmscolumn_idx.h
../../../src/obidmscolumn_idx.c
../../../src/obidmscolumn_int.c
../../../src/obidmscolumn_int.h
../../../src/obidmscolumn_qual.h
../../../src/obidmscolumn_qual.c
../../../src/obidmscolumn_seq.c
../../../src/obidmscolumn_seq.h
../../../src/obidmscolumn_str.c
../../../src/obidmscolumn_str.h
../../../src/obidmscolumn.h
../../../src/obidmscolumn.c
../../../src/obidmscolumndir.h
@ -35,17 +51,9 @@
../../../src/obitypes.c
../../../src/obiview.h
../../../src/obiview.c
../../../src/sse_banded_LCS_alignment.h
../../../src/sse_banded_LCS_alignment.c
../../../src/uint8_indexer.h
../../../src/uint8_indexer.c
../../../src/utils.h
../../../src/utils.c
../../../src/obidmscolumn_bool.c
../../../src/obidmscolumn_bool.h
../../../src/obidmscolumn_char.c
../../../src/obidmscolumn_char.h
../../../src/obidmscolumn_float.c
../../../src/obidmscolumn_float.h
../../../src/obidmscolumn_int.c
../../../src/obidmscolumn_int.h
../../../src/obidmscolumn_seq.c
../../../src/obidmscolumn_seq.h
../../../src/obidmscolumn_str.c
../../../src/obidmscolumn_str.h


@ -4,27 +4,35 @@ from ._obidms cimport OBIView_line
cdef class OBI_Seq(dict) :
cdef str id
cdef str definition
cdef str sequence
cdef object id
cdef object definition
cdef object sequence
cpdef set_id(self, str id)
cpdef get_id(self)
cpdef set_definition(self, str definition)
cpdef get_definition(self)
cpdef get_sequence(self)
cpdef set_id(self, object id)
cpdef object get_id(self)
cpdef set_definition(self, object definition)
cpdef object get_definition(self)
cpdef object get_sequence(self)
cdef class OBI_Nuc_Seq(OBI_Seq) :
#cpdef str reverse_complement(self)
cpdef set_sequence(self, str sequence)
cdef object quality
#cpdef object reverse_complement(self)
cpdef set_sequence(self, object sequence)
cpdef set_quality(self, object quality)
cpdef object get_quality(self)
cdef class OBI_Nuc_Seq_Stored(OBIView_line) :
cpdef set_id(self, str id)
cpdef get_id(self)
cpdef set_definition(self, str definition)
cpdef get_definition(self)
cpdef set_sequence(self, str sequence)
cpdef get_sequence(self)
# cpdef str reverse_complement(self)
cpdef set_id(self, object id)
cpdef object get_id(self)
cpdef set_definition(self, object definition)
cpdef object get_definition(self)
cpdef set_sequence(self, object sequence)
cpdef object get_sequence(self)
cpdef set_quality(self, object quality)
cpdef object get_quality(self)
cpdef object get_str_quality(self)
# cpdef object reverse_complement(self)


@ -4,30 +4,31 @@ from obitools3.utils cimport bytes2str, str2bytes
from .capi.obiview cimport NUC_SEQUENCE_COLUMN, \
ID_COLUMN, \
DEFINITION_COLUMN
DEFINITION_COLUMN, \
QUALITY_COLUMN
cdef class OBI_Seq(dict) :
def __init__(self, str id, str seq, str definition=None) :
def __init__(self, object id, object seq, object definition=None) :
self.set_id(id)
self.set_sequence(seq)
if definition is not None :
self.set_definition(definition)
cpdef set_id(self, str id) :
cpdef set_id(self, object id) :
self.id = id
self[bytes2str(ID_COLUMN)] = id
cpdef get_id(self) :
return self.id
cpdef set_definition(self, str definition) :
cpdef set_definition(self, object definition) :
self.definition = definition
self[bytes2str(DEFINITION_COLUMN)] = definition
cpdef get_definition(self) :
return self.definition
cpdef get_sequence(self) :
return self.sequence
@ -37,34 +38,55 @@ cdef class OBI_Seq(dict) :
cdef class OBI_Nuc_Seq(OBI_Seq) :
cpdef set_sequence(self, str sequence) :
cpdef set_sequence(self, object sequence) :
self.sequence = sequence
self[bytes2str(NUC_SEQUENCE_COLUMN)] = sequence
cpdef set_quality(self, object quality) :
self.quality = quality
self[bytes2str(QUALITY_COLUMN)] = quality
cpdef get_quality(self) :
return self.quality
# cpdef str reverse_complement(self) : TODO in C ?
# pass
cdef class OBI_Nuc_Seq_Stored(OBIView_line) :
cpdef set_id(self, str id) :
# TODO store the str version of column name macros
cpdef set_id(self, object id) :
self[bytes2str(ID_COLUMN)] = id
cpdef get_id(self) :
cpdef object get_id(self) :
return self[bytes2str(ID_COLUMN)]
cpdef set_definition(self, str definition) :
cpdef set_definition(self, object definition) :
self[bytes2str(DEFINITION_COLUMN)] = definition
cpdef get_definition(self) :
cpdef object get_definition(self) :
return self[bytes2str(DEFINITION_COLUMN)]
cpdef set_sequence(self, str sequence) :
cpdef set_sequence(self, object sequence) :
self[bytes2str(NUC_SEQUENCE_COLUMN)] = sequence
cpdef get_sequence(self) :
cpdef object get_sequence(self) :
return self[bytes2str(NUC_SEQUENCE_COLUMN)]
cpdef set_quality(self, object quality) :
if (type(quality) == list) or (quality is None) :
self[bytes2str(QUALITY_COLUMN)] = quality
else : # Quality is in str form
(((self.view).columns)[bytes2str(QUALITY_COLUMN)]).set_str_line(self.index, quality)
cpdef object get_quality(self) :
return self[bytes2str(QUALITY_COLUMN)]
cpdef object get_str_quality(self) :
return ((self.view).columns)[bytes2str(QUALITY_COLUMN)].get_str_line(self.index)
# def __str__(self) :
# return self[bytes2str(NUC_SEQUENCE_COLUMN)] # or not


@ -10,6 +10,8 @@
../../../src/encode.c
../../../src/murmurhash2.h
../../../src/murmurhash2.c
../../../src/obi_align.h
../../../src/obi_align.c
../../../src/obiavl.h
../../../src/obiavl.c
../../../src/obiblob_indexer.h
@ -21,8 +23,22 @@
../../../src/obidms_taxonomy.c
../../../src/obidms.h
../../../src/obidms.c
../../../src/obidmscolumn_bool.c
../../../src/obidmscolumn_bool.h
../../../src/obidmscolumn_char.c
../../../src/obidmscolumn_char.h
../../../src/obidmscolumn_float.c
../../../src/obidmscolumn_float.h
../../../src/obidmscolumn_idx.h
../../../src/obidmscolumn_idx.c
../../../src/obidmscolumn_int.c
../../../src/obidmscolumn_int.h
../../../src/obidmscolumn_qual.h
../../../src/obidmscolumn_qual.c
../../../src/obidmscolumn_seq.c
../../../src/obidmscolumn_seq.h
../../../src/obidmscolumn_str.c
../../../src/obidmscolumn_str.h
../../../src/obidmscolumn.h
../../../src/obidmscolumn.c
../../../src/obidmscolumndir.h
@ -35,17 +51,9 @@
../../../src/obitypes.c
../../../src/obiview.h
../../../src/obiview.c
../../../src/sse_banded_LCS_alignment.h
../../../src/sse_banded_LCS_alignment.c
../../../src/uint8_indexer.h
../../../src/uint8_indexer.c
../../../src/utils.h
../../../src/utils.c
../../../src/obidmscolumn_bool.c
../../../src/obidmscolumn_bool.h
../../../src/obidmscolumn_char.c
../../../src/obidmscolumn_char.h
../../../src/obidmscolumn_float.c
../../../src/obidmscolumn_float.h
../../../src/obidmscolumn_int.c
../../../src/obidmscolumn_int.h
../../../src/obidmscolumn_seq.c
../../../src/obidmscolumn_seq.h
../../../src/obidmscolumn_str.c
../../../src/obidmscolumn_str.h


@ -0,0 +1,10 @@
#cython: language_level=3
from ..capi.obiview cimport Obiview_p
from ..capi.obidmscolumn cimport OBIDMS_column_p
cdef extern from "obi_align.h" nogil:
int obi_align_one_column(Obiview_p seq_view, OBIDMS_column_p seq_column, Obiview_p score_view, OBIDMS_column_p score_column, double threshold, bint normalize, int reference, bint similarity_mode)


@ -11,6 +11,8 @@ from ..capi.obitypes cimport const_char_p, \
index_t, \
time_t
from libc.stdint cimport uint8_t
cdef extern from "obidmscolumn.h" nogil:
@ -194,3 +196,46 @@ cdef extern from "obidmscolumn_seq.h" nogil:
index_t line_nb,
index_t element_idx)
cdef extern from "obidmscolumn_qual.h" nogil:
int obi_column_set_obiqual_char_with_elt_name(OBIDMS_column_p column,
index_t line_nb,
const_char_p element_name,
const_char_p value)
int obi_column_set_obiqual_char_with_elt_idx(OBIDMS_column_p column,
index_t line_nb,
index_t element_idx,
const_char_p value)
int obi_column_set_obiqual_int_with_elt_name(OBIDMS_column_p column,
index_t line_nb,
const_char_p element_name,
const uint8_t* value,
int value_length)
int obi_column_set_obiqual_int_with_elt_idx(OBIDMS_column_p column,
index_t line_nb,
index_t element_idx,
const uint8_t* value,
int value_length)
char* obi_column_get_obiqual_char_with_elt_name(OBIDMS_column_p column,
index_t line_nb,
const_char_p element_name)
char* obi_column_get_obiqual_char_with_elt_idx(OBIDMS_column_p column,
index_t line_nb,
index_t element_idx)
const uint8_t* obi_column_get_obiqual_int_with_elt_name(OBIDMS_column_p column,
index_t line_nb,
const_char_p element_name,
int* value_length)
const uint8_t* obi_column_get_obiqual_int_with_elt_idx(OBIDMS_column_p column,
index_t line_nb,
index_t element_idx,
int* value_length)


@ -1,7 +1,8 @@
#cython: language_level=3
from libc.stdint cimport int32_t, int64_t
from libc.stdint cimport int32_t, int64_t, uint8_t
from posix.types cimport time_t
@ -21,6 +22,7 @@ cdef extern from "obitypes.h" nogil:
OBI_FLOAT,
OBI_BOOL,
OBI_CHAR,
OBI_QUAL,
OBI_STR,
OBI_SEQ,
OBI_IDX
@ -46,5 +48,8 @@ cdef extern from "obitypes.h" nogil:
extern obibool_t OBIBool_NA
extern const_char_p OBISeq_NA
extern const_char_p OBIStr_NA
extern const_char_p OBIQual_char_NA
extern uint8_t* OBIQual_int_NA
const_char_p name_data_type(int data_type)


@ -12,6 +12,8 @@ from .obitypes cimport const_char_p, \
from ..capi.obidms cimport OBIDMS_p
from ..capi.obidmscolumn cimport OBIDMS_column_p
from libc.stdint cimport uint8_t
cdef extern from "obiview.h" nogil:
@ -19,21 +21,7 @@ cdef extern from "obiview.h" nogil:
extern const_char_p NUC_SEQUENCE_COLUMN
extern const_char_p ID_COLUMN
extern const_char_p DEFINITION_COLUMN
struct Obiview_t :
OBIDMS_p dms
const_char_p name
const_char_p created_from
const_char_p view_type
bint read_only
OBIDMS_column_p line_selection
OBIDMS_column_p new_line_selection
index_t line_count
int column_count
OBIDMS_column_p columns
const_char_p comments
ctypedef Obiview_t* Obiview_p
extern const_char_p QUALITY_COLUMN
struct Column_reference_t :
@ -44,7 +32,6 @@ cdef extern from "obiview.h" nogil:
struct Obiview_infos_t :
int view_number
time_t creation_date
const_char_p name
const_char_p created_from
@ -59,19 +46,15 @@ cdef extern from "obiview.h" nogil:
ctypedef Obiview_infos_t* Obiview_infos_p
struct Obiviews_header_t :
size_t header_size
size_t views_size
int view_count
ctypedef Obiviews_header_t* Obiviews_header_p
struct Obiview_t :
Obiview_infos_p infos
OBIDMS_p dms
bint read_only
OBIDMS_column_p line_selection
OBIDMS_column_p new_line_selection
OBIDMS_column_p columns
struct Obiviews_infos_all_t :
Obiviews_header_p header
Obiview_infos_p view_infos
ctypedef Obiviews_infos_all_t* Obiviews_infos_all_p
ctypedef Obiview_t* Obiview_p
Obiview_p obi_new_view_nuc_seqs(OBIDMS_p dms, const_char_p view_name, Obiview_p view_to_clone, index_t* line_selection, const_char_p comments)
@ -82,6 +65,10 @@ cdef extern from "obiview.h" nogil:
Obiview_p obi_new_view_nuc_seqs_cloned_from_name(OBIDMS_p dms, const_char_p view_name, const_char_p view_to_clone_name, index_t* line_selection, const_char_p comments)
Obiview_infos_p obi_view_map_file(OBIDMS_p dms, const char* view_name)
int obi_view_unmap_file(OBIDMS_p dms, Obiview_infos_p view_infos)
Obiview_p obi_open_view(OBIDMS_p dms, const_char_p view_name)
int obi_view_add_column(Obiview_p view,
@ -110,10 +97,6 @@ cdef extern from "obiview.h" nogil:
int obi_close_view(Obiview_p view)
int obi_save_and_close_view(Obiview_p view)
Obiviews_infos_all_p obi_read_view_infos(OBIDMS_p dms)
int obi_close_view_infos(Obiviews_infos_all_p views)
int obi_column_set_obiint_with_elt_name_in_view(Obiview_p view,
OBIDMS_column_p column,
@ -203,6 +186,54 @@ cdef extern from "obiview.h" nogil:
index_t line_nb,
index_t element_idx)
int obi_column_set_obiqual_char_with_elt_idx_in_view(Obiview_p view,
OBIDMS_column_p column,
index_t line_nb,
index_t element_idx,
const char* value)
int obi_column_set_obiqual_int_with_elt_idx_in_view(Obiview_p view,
OBIDMS_column_p column,
index_t line_nb,
index_t element_idx,
const uint8_t* value,
int value_length)
char* obi_column_get_obiqual_char_with_elt_idx_in_view(Obiview_p view,
OBIDMS_column_p column,
index_t line_nb,
index_t element_idx)
const uint8_t* obi_column_get_obiqual_int_with_elt_idx_in_view(Obiview_p view,
OBIDMS_column_p column,
index_t line_nb,
index_t element_idx,
int* value_length)
int obi_column_set_obiqual_char_with_elt_name_in_view(Obiview_p view,
OBIDMS_column_p column,
index_t line_nb,
const char* element_name,
const char* value)
int obi_column_set_obiqual_int_with_elt_name_in_view(Obiview_p view,
OBIDMS_column_p column,
index_t line_nb,
const char* element_name,
const uint8_t* value,
int value_length)
char* obi_column_get_obiqual_char_with_elt_name_in_view(Obiview_p view,
OBIDMS_column_p column,
index_t line_nb,
const char* element_name)
const uint8_t* obi_column_get_obiqual_int_with_elt_name_in_view(Obiview_p view,
OBIDMS_column_p column,
index_t line_nb,
const char* element_name,
int* value_length)
int obi_column_set_obistr_with_elt_name_in_view(Obiview_p view,
OBIDMS_column_p column,
index_t line_nb,


@ -36,7 +36,7 @@ if __name__ == '__main__':
if l['score'] > 350 :
line_selec.append(i)
i+=1
new_v = d.new_view(args.new_view, view_to_clone=v, line_selection=line_selec, view_type="NUC_SEQS_VIEW", comments="obigrep "+args.view+" to "+args.new_view) #args.key+" "+str(args.comparison)+" "+str(args.value)+" "+)
print("\n")


@ -1,5 +1,6 @@
#cython: language_level=3
from ..utils cimport str2bytes
from .header cimport parseHeader
from ..files.universalopener cimport uopen
from ..files.linebuffer cimport LineBuffer


@ -6,12 +6,15 @@ Created on 30 mars 2016
@author: coissac
'''
def fastaIterator(lineiterator, int buffersize=100000000):
cdef LineBuffer lb
cdef str ident
cdef str definition
cdef dict tags
cdef list s
cdef bytes sequence
cdef bytes quality
if isinstance(lineiterator,(str,bytes)):
lineiterator=uopen(lineiterator)
@ -28,10 +31,15 @@ def fastaIterator(lineiterator, int buffersize=100000000):
ident,tags,definition = parseHeader(line)
s = []
line = next(i)
while line[0]!='>':
s.append(line[0:-1])
line = next(i)
sequence = "".join(s)
try:
while line[0]!='>':
s.append(str2bytes(line)[0:-1])
line = next(i)
except StopIteration:
pass
sequence = b"".join(s)
quality = None
yield { "id" : ident,
@ -42,5 +50,6 @@ def fastaIterator(lineiterator, int buffersize=100000000):
"annotation" : {}
}
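
Note on the FASTA hunk above: the change collects sequence lines as bytes and stops cleanly when the input ends. A minimal standalone sketch of that idea, under stated assumptions: `iter_fasta` is a hypothetical helper, input lines are `str` ending in a newline, and ASCII is assumed for the encoding. It does not reproduce `parseHeader` or the dictionary yielded by `fastaIterator`.

```python
def iter_fasta(lines):
    """Yield (header, sequence_bytes) pairs from an iterable of text lines."""
    it = iter(lines)
    try:
        line = next(it)
    except StopIteration:
        return  # empty input: nothing to yield
    while True:
        header = line[1:].rstrip("\n")  # drop the leading '>'
        chunks = []
        line = None
        for line in it:
            if line.startswith(">"):
                break  # next record starts here
            chunks.append(line.rstrip("\n").encode("ascii"))
        else:
            line = None  # input exhausted, no further header
        yield header, b"".join(chunks)
        if line is None:
            return
```

Keeping the header line that terminated the inner loop, rather than discarding it, is what lets the last and the intermediate records be handled by the same code path.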


@ -1,5 +1,7 @@
#cython: language_level=3
from ..utils cimport str2bytes
from .header cimport parseHeader
from ..files.universalopener cimport uopen
from ..files.linebuffer cimport LineBuffer


@ -6,13 +6,13 @@ Created on 30 mars 2016
@author: coissac
'''
def fastqIterator(lineiterator, int buffersize=100000000):
cdef LineBuffer lb
cdef str ident
cdef str definition
cdef dict tags
cdef bytes sequence
cdef bytes quality
if isinstance(lineiterator,(str,bytes)):
lineiterator=uopen(lineiterator)
@ -25,9 +25,9 @@ def fastqIterator(lineiterator, int buffersize=100000000):
i = iter(lb)
for line in i:
ident,tags,definition = parseHeader(line)
sequence = next(i)[0:-1]
sequence = str2bytes(next(i)[0:-1])
next(i)
quality = next(i)[0:-1]
quality = str2bytes(next(i)[0:-1])
yield { "id" : ident,
"definition" : definition,
@ -38,4 +38,3 @@ def fastqIterator(lineiterator, int buffersize=100000000):
}
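
Note on the FASTQ hunk above: each record is read as four consecutive lines (header, sequence, separator, quality), with sequence and quality now converted to bytes. A small sketch of the same pattern, assuming `str` input lines and ASCII content; `iter_fastq` is a hypothetical name, and the explicit error on a truncated record is an addition that the diff itself does not make.

```python
def iter_fastq(lines):
    """Yield (header, sequence_bytes, quality_bytes) from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        try:
            sequence = next(it)
            next(it)  # skip the '+' separator line
            quality = next(it)
        except StopIteration:
            # Assumption: a partial trailing record is treated as an error.
            raise ValueError("truncated FASTQ record") from None
        yield (
            header[1:].rstrip("\n"),  # drop the leading '@'
            sequence.rstrip("\n").encode("ascii"),
            quality.rstrip("\n").encode("ascii"),
        )
```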


@ -30,20 +30,27 @@ __re_dict__ = re.compile("""^\{\ *
)
)*\ *\}$""", re.VERBOSE)
__re_val__ = re.compile("""(("[^"]*"|'[^']*') *: *([^,}]+|"[^"]*"|'[^']*') *[,}] *)""")
cdef object __etag__(str x):
cdef list elements
cdef tuple i
if __re_int__.match(x):
v=int(x)
elif __re_float__.match(x):
v=float(x)
elif __re_str__.match(x):
v=x[1:-1]
elif x=='None':
v=None
elif x=='False':
v=False
elif x=='True':
v=True
elif __re_dict__.match(x):
v=eval(x)
elements=__re_val__.findall(x)
v=dict([(i[1][1:-1],__etag__(i[2])) for i in elements])
else:
v=x
return v
@ -56,7 +63,7 @@ cpdef tuple parseHeader(str header):
cdef str second
m=header[1:-1].split(maxsplit=1)
ident=m[0]
if len(m)==1:
@ -75,4 +82,4 @@ cpdef tuple parseHeader(str header):
return ident,tags,definition
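
Note on the `__etag__` hunk above: the change appears to replace `eval(x)` on dictionary-like tag values with regex extraction of `key: value` pairs. A minimal sketch of that approach in plain Python follows. `RE_VAL` copies the pattern of `__re_val__` from the diff; `RE_INT`, `RE_FLOAT`, `RE_STR`, `etag` and `parse_dict_tag` are simplified, hypothetical stand-ins, since the original `__re_int__`, `__re_float__` and `__re_str__` patterns are not shown here. Only flat dictionaries are handled.

```python
import re

# Same pattern as __re_val__ in the diff: (whole match, quoted key, raw value).
RE_VAL = re.compile(r"""(("[^"]*"|'[^']*') *: *([^,}]+|"[^"]*"|'[^']*') *[,}] *)""")
# Simplified stand-ins for the patterns not visible in this hunk.
RE_INT = re.compile(r"^[+-]?[0-9]+$")
RE_FLOAT = re.compile(r"^[+-]?[0-9]+\.[0-9]*$")
RE_STR = re.compile(r"""^(?:"[^"]*"|'[^']*')$""")

CONSTANTS = {"None": None, "False": False, "True": True}


def etag(text):
    """Convert one raw tag value to int, float, str, None or bool."""
    text = text.strip()
    if RE_INT.match(text):
        return int(text)
    if RE_FLOAT.match(text):
        return float(text)
    if RE_STR.match(text):
        return text[1:-1]  # strip the surrounding quotes
    if text in CONSTANTS:
        return CONSTANTS[text]
    return text


def parse_dict_tag(text):
    """Parse a flat "{'k': v, ...}" string without calling eval()."""
    return {key[1:-1]: etag(value) for _, key, value in RE_VAL.findall(text)}
```

Avoiding `eval` means a header coming from an untrusted sequence file can no longer execute arbitrary Python while being parsed.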


@ -1,7 +1,5 @@
#cython: language_level=3
import sys
import io
cdef bytes str2bytes(str string):
"""

src/_sse.h (new file, 961 lines)

@ -0,0 +1,961 @@
#ifndef _SSE_H_
#define _SSE_H_
#include <string.h>
#include <inttypes.h>
#ifdef __SSE2__
#include <xmmintrin.h>
#else
typedef long long __m128i __attribute__ ((__vector_size__ (16), __may_alias__));
#endif /* __SSE2__ */
#ifndef MAX
#define MAX(x,y) (((x)>(y)) ? (x):(y))
#define MIN(x,y) (((x)<(y)) ? (x):(y))
#endif
#define ALIGN __attribute__((aligned(16)))
typedef __m128i vUInt8;
typedef __m128i vInt8;
typedef __m128i vUInt16;
typedef __m128i vInt16;
typedef __m128i vUInt64;
typedef union
{
__m128i i;
int64_t s64[ 2];
int16_t s16[ 8];
int8_t s8 [16];
uint8_t u8 [16];
uint16_t u16[8 ];
uint32_t u32[4 ];
uint64_t u64[2 ];
} um128;
typedef union
{
vUInt8 m;
uint8_t c[16];
} uchar_v;
typedef union
{
vUInt16 m;
uint16_t c[8];
} ushort_v;
typedef union
{
vUInt64 m;
uint64_t c[2];
} uint64_v;
#ifdef __SSE2__
static inline int8_t _s2_extract_epi8(__m128i r, const int p)
{
#define ACTIONP(r,x) return _mm_extract_epi16(r,x) & 0xFF
#define ACTIONI(r,x) return _mm_extract_epi16(r,x) >> 8
switch (p) {
case 0: ACTIONP(r,0);
case 1: ACTIONI(r,0);
case 2: ACTIONP(r,1);
case 3: ACTIONI(r,1);
case 4: ACTIONP(r,2);
case 5: ACTIONI(r,2);
case 6: ACTIONP(r,3);
case 7: ACTIONI(r,3);
case 8: ACTIONP(r,4);
case 9: ACTIONI(r,4);
case 10: ACTIONP(r,5);
case 11: ACTIONI(r,5);
case 12: ACTIONP(r,6);
case 13: ACTIONI(r,6);
case 14: ACTIONP(r,7);
case 15: ACTIONI(r,7);
}
#undef ACTIONP
#undef ACTIONI
return 0;
}
static inline __m128i _s2_max_epi8(__m128i a, __m128i b)
{
__m128i mask = _mm_cmpgt_epi8( a, b );
a = _mm_and_si128 (a,mask );
b = _mm_andnot_si128(mask,b);
return _mm_or_si128(a,b);
}
static inline __m128i _s2_min_epi8(__m128i a, __m128i b)
{
__m128i mask = _mm_cmplt_epi8( a, b );
a = _mm_and_si128 (a,mask );
b = _mm_andnot_si128(mask,b);
return _mm_or_si128(a,b);
}
static inline __m128i _s2_insert_epi8(__m128i r, int b, const int p)
{
#define ACTIONP(r,x) return _mm_insert_epi16(r,(_mm_extract_epi16(r,x) & 0xFF00) | (b & 0x00FF),x)
#define ACTIONI(r,x) return _mm_insert_epi16(r,(_mm_extract_epi16(r,x) & 0x00FF) | ((b << 8)& 0xFF00),x)
switch (p) {
case 0: ACTIONP(r,0);
case 1: ACTIONI(r,0);
case 2: ACTIONP(r,1);
case 3: ACTIONI(r,1);
case 4: ACTIONP(r,2);
case 5: ACTIONI(r,2);
case 6: ACTIONP(r,3);
case 7: ACTIONI(r,3);
case 8: ACTIONP(r,4);
case 9: ACTIONI(r,4);
case 10: ACTIONP(r,5);
case 11: ACTIONI(r,5);
case 12: ACTIONP(r,6);
case 13: ACTIONI(r,6);
case 14: ACTIONP(r,7);
case 15: ACTIONI(r,7);
}
#undef ACTIONP
#undef ACTIONI
return _mm_setzero_si128();
}
// Fill an SSE register with 16 copies of the same 8-bit integer value
#define _MM_SET1_EPI8(x) _mm_set1_epi8(x)
#define _MM_INSERT_EPI8(r,x,i) _s2_insert_epi8((r),(x),(i))
#define _MM_CMPEQ_EPI8(x,y) _mm_cmpeq_epi8((x),(y))
#define _MM_CMPGT_EPI8(x,y) _mm_cmpgt_epi8((x),(y))
#define _MM_CMPLT_EPI8(x,y) _mm_cmplt_epi8((x),(y))
#define _MM_MAX_EPI8(x,y) _s2_max_epi8((x),(y))
#define _MM_MIN_EPI8(x,y) _s2_min_epi8((x),(y))
#define _MM_ADD_EPI8(x,y) _mm_add_epi8((x),(y))
#define _MM_SUB_EPI8(x,y) _mm_sub_epi8((x),(y))
#define _MM_EXTRACT_EPI8(r,p) _s2_extract_epi8((r),(p))
#define _MM_MIN_EPU8(x,y) _mm_min_epu8((x),(y))
// Fill an SSE register with 8 copies of the same 16-bit integer value
#define _MM_SET1_EPI16(x) _mm_set1_epi16(x)
#define _MM_INSERT_EPI16(r,x,i) _mm_insert_epi16((r),(x),(i))
#define _MM_CMPEQ_EPI16(x,y) _mm_cmpeq_epi16((x),(y))
#define _MM_CMPGT_EPI16(x,y) _mm_cmpgt_epi16((x),(y))
#define _MM_CMPGT_EPU16(x,y) _mm_cmpgt_epu16((x),(y)) // does not exist ??
#define _MM_CMPLT_EPI16(x,y) _mm_cmplt_epi16((x),(y))
#define _MM_MAX_EPI16(x,y) _mm_max_epi16((x),(y))
#define _MM_MIN_EPI16(x,y) _mm_min_epi16((x),(y))
#define _MM_ADD_EPI16(x,y) _mm_add_epi16((x),(y))
#define _MM_SUB_EPI16(x,y) _mm_sub_epi16((x),(y))
#define _MM_EXTRACT_EPI16(r,p) _mm_extract_epi16((r),(p))
#define _MM_UNPACKLO_EPI8(a,b) _mm_unpacklo_epi8((a),(b))
#define _MM_UNPACKHI_EPI8(a,b) _mm_unpackhi_epi8((a),(b))
#define _MM_ADDS_EPU16(x,y) _mm_adds_epu16((x),(y))
// Multiplication
#define _MM_MULLO_EPI16(x,y) _mm_mullo_epi16((x), (y))
#define _MM_SRLI_EPI64(r,x) _mm_srli_epi64((r),(x))
#define _MM_SLLI_EPI64(r,x) _mm_slli_epi64((r),(x))
// Set a SSE Register to 0
#define _MM_SETZERO_SI128() _mm_setzero_si128()
#define _MM_AND_SI128(x,y) _mm_and_si128((x),(y))
#define _MM_ANDNOT_SI128(x,y) _mm_andnot_si128((x),(y))
#define _MM_OR_SI128(x,y) _mm_or_si128((x),(y))
#define _MM_XOR_SI128(x,y) _mm_xor_si128((x),(y))
#define _MM_SLLI_SI128(r,s) _mm_slli_si128((r),(s))
#define _MM_SRLI_SI128(r,s) _mm_srli_si128((r),(s))
// Load a SSE register from an unaligned address
#define _MM_LOADU_SI128(x) _mm_loadu_si128(x)
// Load a SSE register from an aligned address (/!\ not defined when SSE not available)
#define _MM_LOAD_SI128(x) _mm_load_si128(x)
// #define _MM_UNPACKLO_EPI8(x,y) _mm_unpacklo_epi8((x),(y))
#else /* __SSE2__ Not defined */
static inline __m128i _em_set1_epi8(int x)
{
um128 a;
x&=0xFF;
a.s8[0]=x;
a.s8[1]=x;
a.u16[1]=a.u16[0];
a.u32[1]=a.u32[0];
a.u64[1]=a.u64[0];
return a.i;
}
static inline __m128i _em_insert_epi8(__m128i r, int x, const int i)
{
um128 a;
a.i=r;
a.s8[i]=x & 0xFF;
return a.i;
}
static inline __m128i _em_cmpeq_epi8(__m128i a, __m128i b)
{
um128 x;
um128 y;
um128 r;
x.i=a;
y.i=b;
#define R(z) r.s8[z]=(x.s8[z]==y.s8[z]) ? 0xFF:0
R(0);
R(1);
R(2);
R(3);
R(4);
R(5);
R(6);
R(7);
R(8);
R(9);
R(10);
R(11);
R(12);
R(13);
R(14);
R(15);
#undef R
return r.i;
}
static inline __m128i _em_cmpgt_epi8(__m128i a, __m128i b)
{
um128 x;
um128 y;
um128 r;
x.i=a;
y.i=b;
#define R(z) r.s8[z]=(x.s8[z]>y.s8[z]) ? 0xFF:0
R(0);
R(1);
R(2);
R(3);
R(4);
R(5);
R(6);
R(7);
R(8);
R(9);
R(10);
R(11);
R(12);
R(13);
R(14);
R(15);
#undef R
return r.i;
}
static inline __m128i _em_cmplt_epi8(__m128i a, __m128i b)
{
um128 x;
um128 y;
um128 r;
x.i=a;
y.i=b;
#define R(z) r.s8[z]=(x.s8[z]<y.s8[z]) ? 0xFF:0
R(0);
R(1);
R(2);
R(3);
R(4);
R(5);
R(6);
R(7);
R(8);
R(9);
R(10);
R(11);
R(12);
R(13);
R(14);
R(15);
#undef R
return r.i;
}
static inline __m128i _em_max_epi8(__m128i a, __m128i b)
{
um128 x;
um128 y;
um128 r;
x.i=a;
y.i=b;
#define R(z) r.s8[z]=MAX(x.s8[z],y.s8[z])
R(0);
R(1);
R(2);
R(3);
R(4);
R(5);
R(6);
R(7);
R(8);
R(9);
R(10);
R(11);
R(12);
R(13);
R(14);
R(15);
#undef R
return r.i;
}
static inline __m128i _em_min_epi8(__m128i a, __m128i b)
{
um128 x;
um128 y;
um128 r;
x.i=a;
y.i=b;
#define R(z) r.s8[z]=MIN(x.s8[z],y.s8[z])
R(0);
R(1);
R(2);
R(3);
R(4);
R(5);
R(6);
R(7);
R(8);
R(9);
R(10);
R(11);
R(12);
R(13);
R(14);
R(15);
#undef R
return r.i;
}
static inline __m128i _em_add_epi8(__m128i a, __m128i b)
{
um128 x;
um128 y;
um128 r;
x.i=a;
y.i=b;
#define R(z) r.s8[z]=x.s8[z]+y.s8[z]
R(0);
R(1);
R(2);
R(3);
R(4);
R(5);
R(6);
R(7);
R(8);
R(9);
R(10);
R(11);
R(12);
R(13);
R(14);
R(15);
#undef R
return r.i;
}
static inline __m128i _em_sub_epi8(__m128i a, __m128i b)
{
um128 x;
um128 y;
um128 r;
x.i=a;
y.i=b;
#define R(z) r.s8[z]=x.s8[z]-y.s8[z]
R(0);
R(1);
R(2);
R(3);
R(4);
R(5);
R(6);
R(7);
R(8);
R(9);
R(10);
R(11);
R(12);
R(13);
R(14);
R(15);
#undef R
return r.i;
}
static inline int _em_extract_epi8(__m128i r, const int i)
{
um128 a;
a.i=r;
return a.s8[i] & 0xFF;
}
static inline __m128i _em_min_epu8(__m128i a, __m128i b)
{
um128 x;
um128 y;
um128 r;
x.i=a;
y.i=b;
#define R(z) r.u8[z]=MIN(x.u8[z],y.u8[z])
R(0);
R(1);
R(2);
R(3);
R(4);
R(5);
R(6);
R(7);
R(8);
R(9);
R(10);
R(11);
R(12);
R(13);
R(14);
R(15);
#undef R
return r.i;
}
static inline __m128i _em_set1_epi16(int x)
{
um128 a;
x&=0xFFFF;
a.s16[0]=x;
a.s16[1]=x;
a.u32[1]=a.u32[0];
a.u64[1]=a.u64[0];
return a.i;
}
static inline __m128i _em_insert_epi16(__m128i r, int x, const int i)
{
um128 a;
a.i=r;
a.s16[i]=x & 0xFFFF;
return a.i;
}
static inline __m128i _em_cmpeq_epi16(__m128i a, __m128i b)
{
um128 x;
um128 y;
um128 r;
x.i=a;
y.i=b;
#define R(z) r.s16[z]=(x.s16[z]==y.s16[z]) ? 0xFFFF:0
R(0);
R(1);
R(2);
R(3);
R(4);
R(5);
R(6);
R(7);
#undef R
return r.i;
}
static inline __m128i _em_cmpgt_epi16(__m128i a, __m128i b)
{
um128 x;
um128 y;
um128 r;
x.i=a;
y.i=b;
#define R(z) r.s16[z]=(x.s16[z]>y.s16[z]) ? 0xFFFF:0
R(0);
R(1);
R(2);
R(3);
R(4);
R(5);
R(6);
R(7);
#undef R
return r.i;
}
static inline __m128i _em_cmplt_epi16(__m128i a, __m128i b)
{
um128 x;
um128 y;
um128 r;
x.i=a;
y.i=b;
#define R(z) r.s16[z]=(x.s16[z]<y.s16[z]) ? 0xFFFF:0
R(0);
R(1);
R(2);
R(3);
R(4);
R(5);
R(6);
R(7);
#undef R
return r.i;
}
static inline __m128i _em_max_epi16(__m128i a, __m128i b)
{
um128 x;
um128 y;
um128 r;
x.i=a;
y.i=b;
#define R(z) r.s16[z]=MAX(x.s16[z],y.s16[z])
R(0);
R(1);
R(2);
R(3);
R(4);
R(5);
R(6);
R(7);
#undef R
return r.i;
}
static inline __m128i _em_min_epi16(__m128i a, __m128i b)
{
um128 x;
um128 y;
um128 r;
x.i=a;
y.i=b;
#define R(z) r.s16[z]=MIN(x.s16[z],y.s16[z])
R(0);
R(1);
R(2);
R(3);
R(4);
R(5);
R(6);
R(7);
#undef R
return r.i;
}
static inline __m128i _em_add_epi16(__m128i a, __m128i b)
{
um128 x;
um128 y;
um128 r;
x.i=a;
y.i=b;
#define R(z) r.s16[z]=x.s16[z]+y.s16[z]
R(0);
R(1);
R(2);
R(3);
R(4);
R(5);
R(6);
R(7);
#undef R
return r.i;
}
static inline __m128i _em_sub_epi16(__m128i a, __m128i b)
{
um128 x;
um128 y;
um128 r;
x.i=a;
y.i=b;
#define R(z) r.s16[z]=x.s16[z]-y.s16[z]
R(0);
R(1);
R(2);
R(3);
R(4);
R(5);
R(6);
R(7);
#undef R
return r.i;
}
static inline int _em_extract_epi16(__m128i r, const int i)
{
um128 a;
a.i=r;
return a.s16[i] & 0xFFFF;
}
static inline __m128i _em_unpacklo_epi8(__m128i a, __m128i b)
{
um128 x;
um128 y;
um128 r;
x.i=a;
y.i=b;
#define R(z) r.u16[z]=(uint16_t)((((uint16_t)(y.u8[z])) << 8) | (uint16_t)(x.u8[z]))
R(0);
R(1);
R(2);
R(3);
R(4);
R(5);
R(6);
R(7);
#undef R
return r.i;
}
static inline __m128i _em_unpackhi_epi8(__m128i a, __m128i b)
{
um128 x;
um128 y;
um128 r;
x.i=a;
y.i=b;
#define R(z) r.u16[z]=(uint16_t)((((uint16_t)(y.u8[z+8])) << 8) | (uint16_t)(x.u8[z+8]))
R(0);
R(1);
R(2);
R(3);
R(4);
R(5);
R(6);
R(7);
#undef R
return r.i;
}
static inline __m128i _em_adds_epu16(__m128i a, __m128i b)
{
um128 x;
um128 y;
um128 r;
x.i=a;
y.i=b;
#define R(z) r.u16[z]=(((uint32_t)x.u16[z]+(uint32_t)y.u16[z]) > 0xFFFF) ? 0xFFFF : (uint16_t)(x.u16[z]+y.u16[z])
R(0);
R(1);
R(2);
R(3);
R(4);
R(5);
R(6);
R(7);
#undef R
return r.i;
}
static inline __m128i _em_srli_epi64(__m128i a, int b)
{
um128 x;
x.i=a;
x.u64[0]>>=b;
x.u64[1]>>=b;
return x.i;
}
static inline __m128i _em_slli_epi64(__m128i a, int b)
{
um128 x;
x.i=a;
x.s64[0]<<=b;
x.s64[1]<<=b;
return x.i;
}
static inline __m128i _em_setzero_si128()
{
um128 x;
x.s64[0]=x.s64[1]=0;
return x.i;
}
static inline __m128i _em_and_si128(__m128i a, __m128i b)
{
um128 x;
um128 y;
um128 r;
x.i=a;
y.i=b;
#define R(z) r.u64[z]=x.u64[z] & y.u64[z]
R(0);
R(1);
#undef R
return r.i;
}
static inline __m128i _em_andnot_si128(__m128i a, __m128i b)
{
um128 x;
um128 y;
um128 r;
x.i=a;
y.i=b;
#define R(z) r.u64[z]=(~x.u64[z]) & y.u64[z]
R(0);
R(1);
#undef R
return r.i;
}
static inline __m128i _em_or_si128(__m128i a, __m128i b)
{
um128 x;
um128 y;
um128 r;
x.i=a;
y.i=b;
#define R(z) r.u64[z]=x.u64[z] | y.u64[z]
R(0);
R(1);
#undef R
return r.i;
}
static inline __m128i _em_xor_si128(__m128i a, __m128i b)
{
um128 x;
um128 y;
um128 r;
x.i=a;
y.i=b;
#define R(z) r.u64[z]=x.u64[z] ^ y.u64[z]
R(0);
R(1);
#undef R
return r.i;
}
static inline __m128i _em_slli_si128(__m128i a, int b)
{
um128 x;
x.i=a;
#define R(z) x.u8[z]=(z>=b) ? x.u8[z-b]:0
R(15);
R(14);
R(13);
R(12);
R(11);
R(10);
R(9);
R(8);
R(7);
R(6);
R(5);
R(4);
R(3);
R(2);
R(1);
R(0);
#undef R
return x.i;
}
static inline __m128i _em_srli_si128(__m128i a, int b)
{
um128 x;
x.i=a;
#define R(z) x.u8[z]=((b+z) > 15) ? 0:x.u8[z+b]
R(0);
R(1);
R(2);
R(3);
R(4);
R(5);
R(6);
R(7);
R(8);
R(9);
R(10);
R(11);
R(12);
R(13);
R(14);
R(15);
#undef R
return x.i;
}
inline static __m128i _em_loadu_si128(__m128i const *P)
{
um128 tmp;
um128 *pp=(um128*)P;
tmp.u8[0]=(*pp).u8[0];
tmp.u8[1]=(*pp).u8[1];
tmp.u8[2]=(*pp).u8[2];
tmp.u8[3]=(*pp).u8[3];
tmp.u8[4]=(*pp).u8[4];
tmp.u8[5]=(*pp).u8[5];
tmp.u8[6]=(*pp).u8[6];
tmp.u8[7]=(*pp).u8[7];
tmp.u8[8]=(*pp).u8[8];
tmp.u8[9]=(*pp).u8[9];
tmp.u8[10]=(*pp).u8[10];
tmp.u8[11]=(*pp).u8[11];
tmp.u8[12]=(*pp).u8[12];
tmp.u8[13]=(*pp).u8[13];
tmp.u8[14]=(*pp).u8[14];
tmp.u8[15]=(*pp).u8[15];
return tmp.i;
}
#define _MM_SET1_EPI8(x) _em_set1_epi8(x)
#define _MM_INSERT_EPI8(r,x,i) _em_insert_epi8((r),(x),(i))
#define _MM_CMPEQ_EPI8(x,y) _em_cmpeq_epi8((x),(y))
#define _MM_CMPGT_EPI8(x,y) _em_cmpgt_epi8((x),(y))
#define _MM_CMPLT_EPI8(x,y) _em_cmplt_epi8((x),(y))
#define _MM_MAX_EPI8(x,y) _em_max_epi8((x),(y))
#define _MM_MIN_EPI8(x,y) _em_min_epi8((x),(y))
#define _MM_ADD_EPI8(x,y) _em_add_epi8((x),(y))
#define _MM_SUB_EPI8(x,y) _em_sub_epi8((x),(y))
#define _MM_EXTRACT_EPI8(r,p) _em_extract_epi8((r),(p))
#define _MM_MIN_EPU8(x,y) _em_min_epu8((x),(y))
#define _MM_SET1_EPI16(x) _em_set1_epi16(x)
#define _MM_INSERT_EPI16(r,x,i) _em_insert_epi16((r),(x),(i))
#define _MM_CMPEQ_EPI16(x,y) _em_cmpeq_epi16((x),(y))
#define _MM_CMPGT_EPI16(x,y) _em_cmpgt_epi16((x),(y))
#define _MM_CMPLT_EPI16(x,y) _em_cmplt_epi16((x),(y))
#define _MM_MAX_EPI16(x,y) _em_max_epi16((x),(y))
#define _MM_MIN_EPI16(x,y) _em_min_epi16((x),(y))
#define _MM_ADD_EPI16(x,y) _em_add_epi16((x),(y))
#define _MM_SUB_EPI16(x,y) _em_sub_epi16((x),(y))
#define _MM_EXTRACT_EPI16(r,p) _em_extract_epi16((r),(p))
#define _MM_UNPACKLO_EPI8(a,b) _em_unpacklo_epi8((a),(b))
#define _MM_UNPACKHI_EPI8(a,b) _em_unpackhi_epi8((a),(b))
#define _MM_ADDS_EPU16(x,y) _em_adds_epu16((x),(y))
#define _MM_SRLI_EPI64(r,x) _em_srli_epi64((r),(x))
#define _MM_SLLI_EPI64(r,x) _em_slli_epi64((r),(x))
#define _MM_SETZERO_SI128() _em_setzero_si128()
#define _MM_AND_SI128(x,y) _em_and_si128((x),(y))
#define _MM_ANDNOT_SI128(x,y) _em_andnot_si128((x),(y))
#define _MM_OR_SI128(x,y) _em_or_si128((x),(y))
#define _MM_XOR_SI128(x,y) _em_xor_si128((x),(y))
#define _MM_SLLI_SI128(r,s) _em_slli_si128((r),(s))
#define _MM_SRLI_SI128(r,s) _em_srli_si128((r),(s))
#define _MM_LOADU_SI128(x) _em_loadu_si128(x)
#define _MM_LOAD_SI128(x) _em_loadu_si128(x)
#endif /* __SSE2__ */
#define _MM_NOT_SI128(x) _MM_XOR_SI128((x),(_MM_SET1_EPI8(0xFF)))
#endif
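
The fallback functions above emulate each SSE2 intrinsic lane by lane on a union. As a minimal sketch of the intended semantics for two of them (wrapping lane-wise subtraction and saturating unsigned addition), assuming a hypothetical `lanes16_t` struct in place of `um128` and hypothetical helper names that are not part of this header:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the um128 union: eight unsigned 16-bit lanes. */
typedef struct { uint16_t u16[8]; } lanes16_t;

/* All eight lanes set to v (scalar meaning of _mm_set1_epi16). */
static lanes16_t lanes_set1_16(uint16_t v)
{
    lanes16_t r;
    for (int z = 0; z < 8; z++)
        r.u16[z] = v;
    return r;
}

/* Read lane i (scalar meaning of _mm_extract_epi16). */
static int lanes_extract16(lanes16_t a, int i)
{
    assert(i >= 0 && i < 8);
    return a.u16[i];
}

/* Lane-wise subtraction that wraps modulo 2^16 (scalar meaning of _mm_sub_epi16). */
static lanes16_t lanes_sub16(lanes16_t a, lanes16_t b)
{
    lanes16_t r;
    for (int z = 0; z < 8; z++)
        r.u16[z] = (uint16_t)(a.u16[z] - b.u16[z]);
    return r;
}

/* Lane-wise unsigned addition clamped at 0xFFFF (scalar meaning of _mm_adds_epu16). */
static lanes16_t lanes_adds_u16(lanes16_t a, lanes16_t b)
{
    lanes16_t r;
    for (int z = 0; z < 8; z++)
    {
        uint32_t sum = (uint32_t)a.u16[z] + (uint32_t)b.u16[z];
        r.u16[z] = (sum > 0xFFFFu) ? 0xFFFFu : (uint16_t)sum;
    }
    return r;
}
```

The loops stand in for the unrolled `R(z)` macros; the point is only that a saturating add must clamp instead of wrapping, and that subtraction must not reuse the addition body.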

View File

@ -14,6 +14,7 @@
#include <stdio.h>
#include <math.h>
#include "char_str_indexer.h"
#include "obiblob.h"
#include "obiblob_indexer.h"
#include "obidebug.h"
@ -25,24 +26,16 @@
Obi_blob_p obi_str_to_blob(const char* value)
{
Obi_blob_p value_b;
int32_t length;
int32_t length;
// Compute the number of bytes on which the value will be encoded
length = strlen(value) + 1; // +1 to store \0 at the end (makes retrieving faster)
value_b = obi_blob((byte_t*)value, ELEMENT_SIZE_STR, length, length);
if (value_b == NULL)
{
obidebug(1, "\nError encoding a character string in a blob");
return NULL;
}
return value_b;
return obi_blob((byte_t*)value, ELEMENT_SIZE_STR, length, length);
}
char* obi_blob_to_str(Obi_blob_p value_b)
const char* obi_blob_to_str(Obi_blob_p value_b)
{
return value_b->value;
}
@ -67,7 +60,7 @@ index_t obi_index_char_str(Obi_indexer_p indexer, const char* value)
}
char* obi_retrieve_char_str(Obi_indexer_p indexer, index_t idx)
const char* obi_retrieve_char_str(Obi_indexer_p indexer, index_t idx)
{
Obi_blob_p value_b;

View File

@ -35,7 +35,7 @@
* @since October 2015
* @author Celine Mercier (celine.mercier@metabarcoding.org)
*/
Obi_blob_p obi_str_to_blob(char* value);
Obi_blob_p obi_str_to_blob(const char* value);
/**
@ -80,7 +80,7 @@ index_t obi_index_char_str(Obi_indexer_p indexer, const char* value);
* @since April 2016
* @author Celine Mercier (celine.mercier@metabarcoding.org)
*/
char* obi_retrieve_char_str(Obi_indexer_p indexer, index_t idx);
const char* obi_retrieve_char_str(Obi_indexer_p indexer, index_t idx);
#endif /* CHAR_STR_INDEXER_H_ */

View File

@ -14,6 +14,7 @@
#include <stdio.h>
#include <math.h>
#include "dna_seq_indexer.h"
#include "obiblob.h"
#include "obiblob_indexer.h"
#include "obidebug.h"

109
src/obi_align.c Normal file
View File

@ -0,0 +1,109 @@
/****************************************************************************
* Sequence alignment functions *
****************************************************************************/
/**
* @file obi_align.c
* @author Celine Mercier
* @date May 4th 2016
* @brief Functions handling sequence alignments.
*/
#include <stdlib.h>
#include <stdio.h>
#include <stdbool.h>
#include "obidebug.h"
#include "obierrno.h"
#include "obitypes.h"
#include "obiview.h"
#include "sse_banded_LCS_alignment.h"
#define DEBUG_LEVEL 0 // TODO has to be defined somewhere else (cython compil flag?)
// TODO
// use openMP pragmas : keep scores in memory and write them at the end? (should normally be fine with the index)
// write everything with stdint types?
// check NUC_SEQS and score type (int or float if normalize)
// what's with multiple sequence/line columns?
// make function that puts blobs in int16
int obi_align_one_column(Obiview_p seq_view, OBIDMS_column_p seq_column, Obiview_p score_view, OBIDMS_column_p score_column, double threshold, bool normalize, int reference, bool similarity_mode)
{
index_t i, j, k;
index_t seq_count;
char* seq1;
char* seq2;
double score;
k = 0;
if ((seq_column->header)->returned_data_type != OBI_SEQ)
{
obi_set_errno(OBI_ALIGN_ERROR);
obidebug(1, "\nTrying to align a column of a different type than OBI_SEQ");
return -1;
}
if ((normalize && ((score_column->header)->returned_data_type != OBI_FLOAT)) ||
(!normalize && ((score_column->header)->returned_data_type != OBI_INT)))
{
obi_set_errno(OBI_ALIGN_ERROR);
obidebug(1, "\nTrying to store alignment scores in a column of an inappropriate type");
return -1;
}
seq_count = (seq_column->header)->lines_used;
for (i=0; i < (seq_count - 1); i++)
{
for (j=i+1; j < seq_count; j++)
{
//fprintf(stderr, "\ni=%lld, j=%lld, k=%lld", i, j, k);
seq1 = obi_column_get_obiseq_with_elt_idx_in_view(seq_view, seq_column, i, 0);
seq2 = obi_column_get_obiseq_with_elt_idx_in_view(seq_view, seq_column, j, 0);
if ((seq1 == NULL) || (seq2 == NULL))
{
obidebug(1, "\nError retrieving sequences to align");
return -1;
}
// TODO filter
score = generic_sse_banded_lcs_align(seq1, seq2, threshold, normalize, reference, similarity_mode);
if (normalize)
{
if (obi_column_set_obifloat_with_elt_idx_in_view(score_view, score_column, k, 0, (obifloat_t) score) < 0)
{
obidebug(1, "\nError writing alignment score in a column");
return -1;
}
}
else
{
if (obi_column_set_obiint_with_elt_idx_in_view(score_view, score_column, k, 0, (obiint_t) score) < 0)
{
obidebug(1, "\nError writing alignment score in a column");
return -1;
}
}
k++;
}
}
return 0;
}
int obi_align_two_columns(OBIDMS_column_p seq_column_1, OBIDMS_column_p seq_column_2, OBIDMS_column_p score_column, double threshold, bool normalize, int reference, bool similarity_mode)
{
// TODO
return 0;
}
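
The double loop in `obi_align_one_column` writes the score of pair (i, j), with i < j, at line `k`, where `k` is incremented once per pair. As a hedged sketch related to the OpenMP TODO above: the same `k` can be computed directly from `i` and `j`, so iterations would not need a shared counter. The function names below are hypothetical and not part of obitools3.

```c
#include <assert.h>
#include <stdint.h>

/* Number of unordered pairs among n sequences. */
static int64_t pair_count(int64_t n)
{
    return n * (n - 1) / 2;
}

/* Rank of pair (i, j), 0 <= i < j < n, in the enumeration
 * (0,1), (0,2), ..., (0,n-1), (1,2), ..., (n-2,n-1).
 * Rows 0..i-1 contribute i*n - i*(i+1)/2 pairs; (j - i - 1) is the
 * offset of j inside row i. */
static int64_t pair_index(int64_t i, int64_t j, int64_t n)
{
    assert(0 <= i && i < j && j < n);
    return i * n - (i * (i + 1)) / 2 + (j - i - 1);
}
```

With n = 4, for example, the pairs (0,1), (0,2), (0,3), (1,2), (1,3), (2,3) map to lines 0 through 5, matching the order produced by the sequential counter.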

37
src/obi_align.h Normal file
View File

@ -0,0 +1,37 @@
/****************************************************************************
* Sequence alignment functions header file *
****************************************************************************/
/**
* @file obi_align.h
* @author Celine Mercier
* @date May 11th 2016
* @brief Header file for the functions handling the alignment of DNA sequences.
*/
#ifndef OBI_ALIGN_H_
#define OBI_ALIGN_H_
#include <stdlib.h>
#include <stdio.h>
#include <stdbool.h>
#include "obidms.h"
#include "obidmscolumn.h"
#include "obitypes.h"
/**
* @brief
*
* TODO
*
*/
int obi_align_one_column(Obiview_p seq_view, OBIDMS_column_p seq_column, Obiview_p score_view, OBIDMS_column_p score_column, double threshold, bool normalize, int reference, bool similarity_mode);
#endif /* OBI_ALIGN_H_ */

View File

@ -107,6 +107,42 @@ static char* build_avl_file_name(const char* avl_name);
static char* build_avl_data_file_name(const char* avl_name);
/**
* @brief Internal function building the full path of an AVL tree file.
*
* @warning The returned pointer has to be freed by the caller.
*
* @param dms A pointer to the OBIDMS to which the AVL tree belongs.
* @param avl_name The name of the AVL tree.
* @param avl_idx The index of the AVL if it's part of an AVL group, or -1 if not.
*
* @returns A pointer to the full path of the file where the AVL tree is stored.
* @retval NULL if an error occurred.
*
* @since May 2016
* @author Celine Mercier (celine.mercier@metabarcoding.org)
*/
static char* get_full_path_of_avl_file_name(OBIDMS_p dms, const char* avl_name, int avl_idx);
/**
* @brief Internal function building the full path of an AVL data file.
*
* @warning The returned pointer has to be freed by the caller.
*
* @param dms A pointer to the OBIDMS to which the AVL tree belongs.
* @param avl_name The name of the AVL tree.
* @param avl_idx The index of the AVL if it's part of an AVL group, or -1 if not.
*
* @returns A pointer to the full path of the file where the data referred to by the AVL tree is stored.
* @retval NULL if an error occurred.
*
* @since May 2016
* @author Celine Mercier (celine.mercier@metabarcoding.org)
*/
static char* get_full_path_of_avl_data_file_name(OBIDMS_p dms, const char* avl_name, int avl_idx);
/**
* @brief Internal function returning the size of an AVL tree header on this platform,
* including the size of the bloom filter associated with the AVL tree.
@ -253,9 +289,12 @@ int remap_an_avl(OBIDMS_avl_p avl);
/**
* @brief Internal function (re)mapping the tree and data parts of an AVL tree structure.
* @brief Internal function creating and adding a new AVL in an AVL group.
*
* @param avl A pointer to the AVL tree group structure.
* @warning The previous AVL in the list of the group is unmapped,
* if it's not the 1st AVL being added.
*
* @param avl A pointer to the AVL tree group structure.
*
* @retval 0 if the operation was successfully completed.
* @retval -1 if an error occurred.
@ -547,6 +586,102 @@ static char* build_avl_data_file_name(const char* avl_name)
}
static char* get_full_path_of_avl_file_name(OBIDMS_p dms, const char* avl_name, int avl_idx)
{
char* complete_avl_name;
char* full_path;
char* avl_file_name;
if (avl_idx >= 0)
{
complete_avl_name = build_avl_name_with_idx(avl_name, avl_idx);
if (complete_avl_name == NULL)
return NULL;
}
else
{
complete_avl_name = (char*) malloc((strlen(avl_name)+1)*sizeof(char));
if (complete_avl_name == NULL)
{
obi_set_errno(OBI_MALLOC_ERROR);
obidebug(1, "\nError allocating memory for an AVL name");
return NULL;
}
strcpy(complete_avl_name, avl_name);
}
avl_file_name = build_avl_file_name(complete_avl_name);
if (avl_file_name == NULL)
{
free(complete_avl_name);
return NULL;
}
full_path = get_full_path_of_avl_dir(dms, avl_name);
if (full_path == NULL)
{
free(complete_avl_name);
free(avl_file_name);
return NULL;
}
strcat(full_path, "/");
strcat(full_path, avl_file_name);
free(complete_avl_name);
free(avl_file_name);
return full_path;
}
static char* get_full_path_of_avl_data_file_name(OBIDMS_p dms, const char* avl_name, int avl_idx)
{
char* complete_avl_name;
char* full_path;
char* avl_data_file_name;
if (avl_idx >= 0)
{
complete_avl_name = build_avl_name_with_idx(avl_name, avl_idx);
if (complete_avl_name == NULL)
return NULL;
}
else
{
complete_avl_name = (char*) malloc((strlen(avl_name)+1)*sizeof(char));
if (complete_avl_name == NULL)
{
obi_set_errno(OBI_MALLOC_ERROR);
obidebug(1, "\nError allocating memory for an AVL name");
return NULL;
}
strcpy(complete_avl_name, avl_name);
}
avl_data_file_name = build_avl_data_file_name(complete_avl_name);
if (avl_data_file_name == NULL)
{
free(complete_avl_name);
return NULL;
}
full_path = get_full_path_of_avl_dir(dms, avl_name);
if (full_path == NULL)
{
free(complete_avl_name);
free(avl_data_file_name);
return NULL;
}
strcat(full_path, "/");
strcat(full_path, avl_data_file_name);
free(complete_avl_name);
free(avl_data_file_name);
return full_path;
}
size_t get_avl_header_size()
{
size_t header_size;
@ -646,7 +781,6 @@ int truncate_avl_to_size_used(OBIDMS_avl_p avl) // TODO is it necessary to unmap
file_descriptor,
(avl->header)->header_size
);
if (avl->tree == MAP_FAILED)
{
obi_set_errno(OBI_AVL_ERROR);
@ -923,23 +1057,24 @@ int remap_an_avl(OBIDMS_avl_p avl)
int add_new_avl_in_group(OBIDMS_avl_group_p avl_group)
{
// Check that maximum number of AVLs in a group was not reached
if (avl_group->current_avl_idx == (MAX_NB_OF_AVLS_IN_GROUP-1))
if (avl_group->last_avl_idx == (MAX_NB_OF_AVLS_IN_GROUP - 1))
{
obi_set_errno(OBI_AVL_ERROR);
obidebug(1, "\nError: Trying to add new AVL in AVL group but maximum number of AVLs in a group reached");
return -1;
}
// Unmap the previous AVL
if (unmap_an_avl((avl_group->sub_avls)[avl_group->current_avl_idx]) < 0)
return -1;
// Unmap the previous AVL if it's not the 1st
if (avl_group->last_avl_idx > 0)
if (unmap_an_avl((avl_group->sub_avls)[avl_group->last_avl_idx]) < 0)
return -1;
// Increment current AVL index
(avl_group->current_avl_idx)++;
(avl_group->last_avl_idx)++;
// Create the new AVL
(avl_group->sub_avls)[avl_group->current_avl_idx] = obi_create_avl(avl_group->dms, avl_group->name, avl_group->current_avl_idx);
if ((avl_group->sub_avls)[avl_group->current_avl_idx] == NULL)
(avl_group->sub_avls)[avl_group->last_avl_idx] = obi_create_avl(avl_group->dms, avl_group->name, avl_group->last_avl_idx);
if ((avl_group->sub_avls)[avl_group->last_avl_idx] == NULL)
{
obidebug(1, "\nError creating a new AVL tree in a group");
return -1;
@ -949,6 +1084,36 @@ int add_new_avl_in_group(OBIDMS_avl_group_p avl_group)
}
// TODO doc
int add_existing_avl_in_group(OBIDMS_avl_group_p avl_group_dest, OBIDMS_avl_group_p avl_group_source, int avl_idx)
{
if (link(get_full_path_of_avl_file_name(avl_group_source->dms, avl_group_source->name, avl_idx), get_full_path_of_avl_file_name(avl_group_dest->dms, avl_group_dest->name, avl_idx)) < 0)
{
obi_set_errno(OBI_AVL_ERROR);
obidebug(1, "\nError creating a hard link to an existing AVL tree file");
return -1;
}
if (link(get_full_path_of_avl_data_file_name(avl_group_source->dms, avl_group_source->name, avl_idx), get_full_path_of_avl_data_file_name(avl_group_dest->dms, avl_group_dest->name, avl_idx)) < 0)
{
obi_set_errno(OBI_AVL_ERROR);
obidebug(1, "\nError creating a hard link to an existing AVL data file");
return -1;
}
// Increment current AVL index
(avl_group_dest->last_avl_idx)++;
// Open AVL for that group TODO ideally not needed, but needed for now
avl_group_dest->sub_avls[avl_group_dest->last_avl_idx] = obi_open_avl(avl_group_source->dms, avl_group_source->name, avl_idx);
if ((avl_group_dest->sub_avls)[avl_group_dest->last_avl_idx] == NULL)
{
obidebug(1, "\nError opening an AVL to add in an AVL group");
return -1;
}
return 0;
}
int maybe_in_avl(OBIDMS_avl_p avl, Obi_blob_p value)
{
return (bloom_check(&((avl->header)->bloom_filter), value, obi_blob_sizeof(value)));
@ -1529,8 +1694,7 @@ OBIDMS_avl_p obi_create_avl(OBIDMS_p dms, const char* avl_name, int avl_idx)
// Bloom filter
bloom_init(&((avl->header)->bloom_filter), MAX_NODE_COUNT_PER_AVL);
if (avl_idx >= 0)
free(complete_avl_name);
free(complete_avl_name);
return avl;
}
@ -1777,8 +1941,7 @@ OBIDMS_avl_p obi_open_avl(OBIDMS_p dms, const char* avl_name, int avl_idx)
avl->dir_fd = avl_dir_file_descriptor;
avl->avl_fd = avl_file_descriptor;
if (avl_idx >= 0)
free(complete_avl_name);
free(complete_avl_name);
return avl;
}
@ -1806,6 +1969,7 @@ OBIDMS_avl_group_p obi_avl_group(OBIDMS_p dms, const char* avl_name)
OBIDMS_avl_group_p obi_create_avl_group(OBIDMS_p dms, const char* avl_name)
{
OBIDMS_avl_group_p avl_group;
char* avl_dir_name;
avl_group = (OBIDMS_avl_group_p) malloc(sizeof(OBIDMS_avl_group_t));
if (avl_group == NULL)
@ -1815,18 +1979,22 @@ OBIDMS_avl_group_p obi_create_avl_group(OBIDMS_p dms, const char* avl_name)
return NULL;
}
// Create 1st avl
(avl_group->sub_avls)[0] = obi_create_avl(dms, avl_name, 0);
if ((avl_group->sub_avls)[0] == NULL)
{
obidebug(1, "\nError creating the first AVL of an AVL group");
return NULL;
}
avl_group->current_avl_idx = 0;
avl_group->last_avl_idx = -1;
avl_group->dms = dms;
strcpy(avl_group->name, avl_name);
avl_group->dms = dms;
// Create the directory for that AVL group
avl_dir_name = get_full_path_of_avl_dir(dms, avl_name);
if (avl_dir_name == NULL)
return NULL;
if (mkdirat(dms->indexer_dir_fd, avl_dir_name, 00777) < 0)
{
obi_set_errno(OBI_AVL_ERROR);
obidebug(1, "\nError creating an AVL directory");
free(avl_dir_name);
return NULL;
}
// Add in the list of open indexers
obi_dms_list_indexer(dms, avl_group);
@ -1883,7 +2051,7 @@ OBIDMS_avl_group_p obi_open_avl_group(OBIDMS_p dms, const char* avl_name)
if ((avl_group->sub_avls)[i] == NULL)
return NULL;
}
avl_group->current_avl_idx = avl_count-1; // TODO latest. discuss
avl_group->last_avl_idx = avl_count-1; // TODO latest. discuss
strcpy(avl_group->name, avl_name);
avl_group->dms = dms;
@ -1901,6 +2069,59 @@ OBIDMS_avl_group_p obi_open_avl_group(OBIDMS_p dms, const char* avl_name)
}
void obi_clone_avl(OBIDMS_avl_p avl, OBIDMS_avl_p new_avl)
{
// Clone AVL tree
memcpy(new_avl->tree, avl->tree, (avl->header)->avl_size);
(new_avl->header)->avl_size = (avl->header)->avl_size;
(new_avl->header)->nb_items = (avl->header)->nb_items;
(new_avl->header)->root_idx = (avl->header)->root_idx;
(new_avl->header)->bloom_filter = (avl->header)->bloom_filter;
// Clone AVL data
memcpy((new_avl->data)->data, (avl->data)->data, ((avl->data)->header)->data_size_used);
((new_avl->data)->header)->data_size_used = ((avl->data)->header)->data_size_used;
((new_avl->data)->header)->data_size_max = ((avl->data)->header)->data_size_max;
((new_avl->data)->header)->nb_items = ((avl->data)->header)->nb_items;
}
OBIDMS_avl_group_p obi_clone_avl_group(OBIDMS_avl_group_p avl_group, const char* new_avl_name)
{
OBIDMS_avl_group_p new_avl_group;
int i;
// Create the new AVL group
new_avl_group = obi_create_avl_group(avl_group->dms, new_avl_name);
if (new_avl_group == NULL)
return NULL;
// Create hard links to all the full AVLs that won't be modified: all but the last one
for (i=0; i<(avl_group->last_avl_idx); i++)
{
if (add_existing_avl_in_group(new_avl_group, avl_group, i) < 0)
return NULL;
}
// Create the last AVL to copy data in it
if (add_new_avl_in_group(new_avl_group) < 0)
{
obi_close_avl_group(new_avl_group);
return NULL;
}
// Copy the data from the last AVL to a new one that can be modified
obi_clone_avl((avl_group->sub_avls)[avl_group->last_avl_idx], (new_avl_group->sub_avls)[new_avl_group->last_avl_idx]);
// Close old AVL group
if (obi_close_avl_group(avl_group) < 0)
{
obi_close_avl_group(new_avl_group);
return NULL;
}
return new_avl_group;
}
int obi_close_avl(OBIDMS_avl_p avl)
{
int ret_val = 0;
@ -1909,7 +2130,7 @@ int obi_close_avl(OBIDMS_avl_p avl)
ret_val = truncate_avl_to_size_used(avl);
if (munmap(avl->tree, (((avl->header)->nb_items_max) * sizeof(AVL_node_t))) < 0)
if (munmap(avl->tree, (avl->header)->avl_size) < 0)
{
obi_set_errno(OBI_AVL_ERROR);
obidebug(1, "\nError munmapping the tree of an AVL tree file");
@ -1946,9 +2167,17 @@ int obi_close_avl_group(OBIDMS_avl_group_p avl_group)
ret_val = obi_dms_unlist_indexer(avl_group->dms, avl_group);
// Close each AVL of the group
for (i=0; i < (avl_group->current_avl_idx); i++)
for (i=0; i <= (avl_group->last_avl_idx); i++)
{
// Remap all but the last AVL (already mapped) before closing to truncate and close properly
if (i < (avl_group->last_avl_idx))
{
if (remap_an_avl((avl_group->sub_avls)[i]) < 0)
ret_val = -1;
}
if (obi_close_avl((avl_group->sub_avls)[i]) < 0)
ret_val = -1;
}
free(avl_group);
}
@ -2157,18 +2386,29 @@ index_t obi_avl_group_add(OBIDMS_avl_group_p avl_group, Obi_blob_p value)
index_t index_with_avl;
int i;
if (maybe_in_avl((avl_group->sub_avls)[avl_group->current_avl_idx], value))
// Create 1st AVL if group is empty
if (avl_group->last_avl_idx == -1)
{
index_in_avl = (int32_t) obi_avl_find((avl_group->sub_avls)[avl_group->current_avl_idx], value);
if (add_new_avl_in_group(avl_group) < 0)
{
obidebug(1, "\nError creating the first AVL of an AVL group");
return -1;
}
}
if (maybe_in_avl((avl_group->sub_avls)[avl_group->last_avl_idx], value))
{
index_in_avl = (int32_t) obi_avl_find((avl_group->sub_avls)[avl_group->last_avl_idx], value);
if (index_in_avl >= 0)
{
index_with_avl = avl_group->current_avl_idx;
index_with_avl = avl_group->last_avl_idx;
index_with_avl = index_with_avl << 32;
index_with_avl = index_with_avl + index_in_avl;
return index_with_avl;
}
}
for (i=0; i < (avl_group->current_avl_idx); i++)
for (i=0; i < (avl_group->last_avl_idx); i++)
{
if (maybe_in_avl((avl_group->sub_avls)[i], value))
{
@ -2192,24 +2432,23 @@ index_t obi_avl_group_add(OBIDMS_avl_group_p avl_group, Obi_blob_p value)
// Check if the AVL group is writable
if (!(avl_group->writable))
{
obi_set_errno(OBI_AVL_ERROR);
obidebug(1, "\nTrying to add a value in an AVL group that is read-only.");
obi_set_errno(OBI_READ_ONLY_INDEXER_ERROR);
return -1;
}
// Check if need to make new AVL
if (((((avl_group->sub_avls)[avl_group->current_avl_idx])->header)->nb_items == MAX_NODE_COUNT_PER_AVL) || (((((avl_group->sub_avls)[avl_group->current_avl_idx])->data)->header)->data_size_used >= MAX_DATA_SIZE_PER_AVL))
if (((((avl_group->sub_avls)[avl_group->last_avl_idx])->header)->nb_items == MAX_NODE_COUNT_PER_AVL) || (((((avl_group->sub_avls)[avl_group->last_avl_idx])->data)->header)->data_size_used >= MAX_DATA_SIZE_PER_AVL))
{
if (add_new_avl_in_group(avl_group) < 0)
return -1;
}
// Add in the current AVL
bloom_add(&((((avl_group->sub_avls)[avl_group->current_avl_idx])->header)->bloom_filter), value, obi_blob_sizeof(value));
index_in_avl = (int32_t) obi_avl_add((avl_group->sub_avls)[avl_group->current_avl_idx], value);
bloom_add(&((((avl_group->sub_avls)[avl_group->last_avl_idx])->header)->bloom_filter), value, obi_blob_sizeof(value));
index_in_avl = (int32_t) obi_avl_add((avl_group->sub_avls)[avl_group->last_avl_idx], value);
// Build the index containing the AVL index
index_with_avl = avl_group->current_avl_idx;
index_with_avl = avl_group->last_avl_idx;
index_with_avl = index_with_avl << 32;
index_with_avl = index_with_avl + index_in_avl;
@ -2217,3 +2456,8 @@ index_t obi_avl_group_add(OBIDMS_avl_group_p avl_group, Obi_blob_p value)
}
const char* obi_avl_group_get_name(OBIDMS_avl_group_p avl_group)
{
return avl_group->name;
}
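
`obi_avl_group_add` returns a single index that carries both the position of the AVL in the group (upper 32 bits) and the position of the value inside that AVL (lower 32 bits). A minimal sketch of that packing and of the reverse operation, assuming `index_t` is a 64-bit signed integer; the helper names are hypothetical and do not exist in the library:

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t index_t; /* assumption: 64-bit signed index type */

/* Combine the AVL position in the group and the index inside that AVL. */
static index_t pack_group_index(int32_t avl_idx, int32_t index_in_avl)
{
    assert(avl_idx >= 0 && index_in_avl >= 0);
    return (index_t)(((uint64_t)avl_idx << 32) | (uint32_t)index_in_avl);
}

/* Upper 32 bits: which AVL of the group holds the value. */
static int32_t unpack_avl_idx(index_t idx)
{
    return (int32_t)(idx >> 32);
}

/* Lower 32 bits: index of the value inside that AVL. */
static int32_t unpack_index_in_avl(index_t idx)
{
    return (int32_t)(idx & 0xFFFFFFFF);
}
```

Since both halves are non-negative, the OR used here gives the same result as the shift-then-add sequence in the function above.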

View File

@ -73,7 +73,7 @@ typedef struct AVL_node {
* @brief OBIDMS AVL tree data header structure.
*/
typedef struct OBIDMS_avl_data_header {
int header_size; /**< Size of the header in bytes.
size_t header_size; /**< Size of the header in bytes.
*/
index_t data_size_used; /**< Size of the data used in bytes.
*/
@ -105,7 +105,7 @@ typedef struct OBIDMS_avl_data {
* @brief OBIDMS AVL tree header structure.
*/
typedef struct OBIDMS_avl_header {
int header_size; /**< Size of the header in bytes.
size_t header_size; /**< Size of the header in bytes.
*/
size_t avl_size; /**< Size of the AVL tree in bytes.
*/
@ -160,7 +160,7 @@ typedef struct OBIDMS_avl {
typedef struct OBIDMS_avl_group {
OBIDMS_avl_p sub_avls[MAX_NB_OF_AVLS_IN_GROUP]; /**< Array containing the pointers to the AVL trees of the group.
*/
int current_avl_idx; /**< Index in the sub_avls array of the AVL tree currently being filled.
int last_avl_idx; /**< Index in the sub_avls array of the AVL tree currently being filled.
*/
char name[AVL_MAX_NAME+1]; /**< Base name of the AVL group. The AVL trees in it have names of the form basename_idx.
*/
@ -290,6 +290,37 @@ OBIDMS_avl_group_p obi_create_avl_group(OBIDMS_p dms, const char* avl_name);
OBIDMS_avl_group_p obi_open_avl_group(OBIDMS_p dms, const char* avl_name);
/**
* @brief Clones an AVL.
*
* The tree and the data are both cloned into the new AVL.
*
* @param avl A pointer to the AVL to clone.
* @param new_avl A pointer to the new AVL to fill.
*
* @since May 2016
* @author Celine Mercier (celine.mercier@metabarcoding.org)
*/
void obi_clone_avl(OBIDMS_avl_p avl, OBIDMS_avl_p new_avl);
/**
* @brief Clones an AVL group.
*
* @warning The AVL group that has been cloned is closed by the function.
*
* @param avl_group A pointer to the AVL group to clone.
* @param new_avl_name The name of the new AVL group.
*
* @returns A pointer to the new AVL group structure.
* @retval NULL if an error occurred.
*
* @since May 2016
* @author Celine Mercier (celine.mercier@metabarcoding.org)
*/
OBIDMS_avl_group_p obi_clone_avl_group(OBIDMS_avl_group_p avl_group, const char* new_avl_name);
/**
* @brief Closes an AVL tree.
*
@ -410,5 +441,18 @@ Obi_blob_p obi_avl_group_get(OBIDMS_avl_group_p avl_group, index_t idx);
index_t obi_avl_group_add(OBIDMS_avl_group_p avl_group, Obi_blob_p value);
/**
* @brief Recovers the name of an AVL group.
*
* @param avl_group A pointer to the AVL group structure.
*
* @returns A pointer to the name of the AVL group.
*
* @since April 2016
* @author Celine Mercier (celine.mercier@metabarcoding.org)
*/
const char* obi_avl_group_get_name(OBIDMS_avl_group_p avl_group);
#endif /* OBIAVL_H_ */

View File

@ -23,6 +23,8 @@
#define ELEMENT_SIZE_STR (8) /**< The size of an element from a value of type character string.
*/
#define ELEMENT_SIZE_UINT8 (8) /**< The size of an element from a value of type uint8_t.
*/
#define ELEMENT_SIZE_SEQ_2 (2) /**< The size of an element from a value of type DNA sequence encoded on 2 bits.
*/
#define ELEMENT_SIZE_SEQ_4 (4) /**< The size of an element from a value of type DNA sequence encoded on 4 bits.

View File

@ -22,19 +22,23 @@
#define DEBUG_LEVEL 0 // TODO has to be defined somewhere else (cython compil flag?)
//inline int obi_indexer_exists(OBIDMS_p dms, const char* name);
inline int obi_indexer_exists(OBIDMS_p dms, const char* name);
//inline Obi_indexer_p obi_indexer(OBIDMS_p dms, const char* name);
inline Obi_indexer_p obi_indexer(OBIDMS_p dms, const char* name);
//inline Obi_indexer_p obi_create_indexer(OBIDMS_p dms, const char* name);
inline Obi_indexer_p obi_create_indexer(OBIDMS_p dms, const char* name);
//inline Obi_indexer_p obi_open_indexer(OBIDMS_p dms, const char* name);
inline Obi_indexer_p obi_open_indexer(OBIDMS_p dms, const char* name);
//inline int obi_close_indexer(Obi_indexer_p indexer);
inline Obi_indexer_p obi_clone_indexer(Obi_indexer_p indexer, const char* name);
//inline index_t obi_indexer_add(Obi_indexer_p indexer, Obi_blob_p value);
inline int obi_close_indexer(Obi_indexer_p indexer);
//inline Obi_blob_p obi_indexer_get(Obi_indexer_p indexer, index_t idx);
inline index_t obi_indexer_add(Obi_indexer_p indexer, Obi_blob_p value);
inline Obi_blob_p obi_indexer_get(Obi_indexer_p indexer, index_t idx);
inline const char* obi_indexer_get_name(Obi_indexer_p indexer);
char* obi_build_indexer_name(const char* column_name, obiversion_t column_version)

@ -47,7 +47,10 @@ typedef OBIDMS_avl_group_p Obi_indexer_p; /**< Typedef to refer to the pointer
* @since April 2016
* @author Celine Mercier (celine.mercier@metabarcoding.org)
*/
#define obi_indexer_exists obi_avl_exists
inline int obi_indexer_exists(OBIDMS_p dms, const char* name)
{
return obi_avl_exists(dms, name);
}
/**
@ -62,7 +65,10 @@ typedef OBIDMS_avl_group_p Obi_indexer_p; /**< Typedef to refer to the pointer
* @since April 2016
* @author Celine Mercier (celine.mercier@metabarcoding.org)
*/
#define obi_indexer obi_avl_group
inline Obi_indexer_p obi_indexer(OBIDMS_p dms, const char* name)
{
return obi_avl_group(dms, name);
}
/**
@ -77,7 +83,10 @@ typedef OBIDMS_avl_group_p Obi_indexer_p; /**< Typedef to refer to the pointer
* @since April 2016
* @author Celine Mercier (celine.mercier@metabarcoding.org)
*/
#define obi_create_indexer obi_create_avl_group
inline Obi_indexer_p obi_create_indexer(OBIDMS_p dms, const char* name)
{
return obi_create_avl_group(dms, name);
}
/**
@ -92,7 +101,28 @@ typedef OBIDMS_avl_group_p Obi_indexer_p; /**< Typedef to refer to the pointer
* @since April 2016
* @author Celine Mercier (celine.mercier@metabarcoding.org)
*/
#define obi_open_indexer obi_open_avl_group
inline Obi_indexer_p obi_open_indexer(OBIDMS_p dms, const char* name)
{
return obi_open_avl_group(dms, name);
}
/**
* @brief Clones an indexer.
*
* @param indexer The indexer to clone.
* @param name The name of the new indexer.
*
* @returns A pointer on the new indexer structure.
* @retval NULL if an error occurred.
*
* @since May 2016
* @author Celine Mercier (celine.mercier@metabarcoding.org)
*/
inline Obi_indexer_p obi_clone_indexer(Obi_indexer_p indexer, const char* name)
{
return obi_clone_avl_group(indexer, name);
}
/**
@ -106,7 +136,10 @@ typedef OBIDMS_avl_group_p Obi_indexer_p; /**< Typedef to refer to the pointer
* @since April 2016
* @author Celine Mercier (celine.mercier@metabarcoding.org)
*/
#define obi_close_indexer obi_close_avl_group
inline int obi_close_indexer(Obi_indexer_p indexer)
{
return obi_close_avl_group(indexer);
}
/**
@ -121,7 +154,10 @@ typedef OBIDMS_avl_group_p Obi_indexer_p; /**< Typedef to refer to the pointer
* @since April 2016
* @author Celine Mercier (celine.mercier@metabarcoding.org)
*/
#define obi_indexer_add obi_avl_group_add
inline index_t obi_indexer_add(Obi_indexer_p indexer, Obi_blob_p value)
{
return obi_avl_group_add(indexer, value);
}
/**
@ -135,7 +171,26 @@ typedef OBIDMS_avl_group_p Obi_indexer_p; /**< Typedef to refer to the pointer
* @since April 2016
* @author Celine Mercier (celine.mercier@metabarcoding.org)
*/
#define obi_indexer_get obi_avl_group_get
inline Obi_blob_p obi_indexer_get(Obi_indexer_p indexer, index_t idx)
{
return obi_avl_group_get(indexer, idx);
}
/**
* @brief Recovers the name of an indexer.
*
* @param indexer A pointer on the indexer.
*
* @returns A pointer on the name of the indexer.
*
* @since April 2016
* @author Celine Mercier (celine.mercier@metabarcoding.org)
*/
inline const char* obi_indexer_get_name(Obi_indexer_p indexer)
{
return obi_avl_group_get_name(indexer);
}
/**
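The hunks above replace macro aliases such as `#define obi_indexer_exists obi_avl_exists` with inline wrapper functions. A minimal sketch of that pattern follows, with a hypothetical `backend_add()` standing in for the AVL functions (none of the names below come from the OBITools3 sources). It uses `static inline`, which is one portable way to keep such wrappers in a header; the original code uses plain `inline` with separate declarations in the `.c` file.

```c
#include <stddef.h>

/* Hypothetical backend type and function, standing in for the AVL group API. */
typedef struct Backend {
    long   values[16];
    size_t count;
} Backend;

static long backend_add(Backend* backend, long value)
{
    if (backend == NULL || backend->count >= 16)
        return -1;                          /* no backend, or no room left */
    backend->values[backend->count] = value;
    return (long) (backend->count++);       /* index of the stored value */
}

/* Macro alias: the public name is only a textual rename of the backend symbol. */
#define indexer_add_macro backend_add

/* Typedef plus inline wrapper: the public name has its own prototype,
 * expressed with the public type. */
typedef Backend Indexer;

static inline long indexer_add(Indexer* indexer, long value)
{
    return backend_add(indexer, value);
}
```

With the wrapper, the public function keeps a stable signature even if the backend call later needs extra arguments or checks, which a plain alias cannot provide.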

@ -18,6 +18,7 @@
#include <fcntl.h>
#include <sys/types.h>
#include <dirent.h>
#include <unistd.h>
#include "obidms.h"
#include "obierrno.h"
@ -202,14 +203,14 @@ int create_dms_infos_file(int dms_file_descriptor, const char* dms_name)
*
**********************************************************************/
int obi_dms_exists(const char* dms_name)
int obi_dms_exists(const char* dms_path)
{
struct stat buffer;
char* directory_name;
int check_dir;
// Build and check the directory name
directory_name = build_directory_name(dms_name);
directory_name = build_directory_name(dms_path);
if (directory_name == NULL)
return -1;
@ -224,14 +225,15 @@ int obi_dms_exists(const char* dms_name)
}
OBIDMS_p obi_create_dms(const char* dms_name)
OBIDMS_p obi_create_dms(const char* dms_path)
{
char* directory_name;
DIR* dms_dir;
int dms_file_descriptor;
char* directory_name;
DIR* dms_dir;
int dms_file_descriptor;
size_t i, j;
// Build and check the directory name
directory_name = build_directory_name(dms_name);
directory_name = build_directory_name(dms_path);
if (directory_name == NULL)
return NULL;
@ -250,7 +252,7 @@ OBIDMS_p obi_create_dms(const char* dms_name)
return NULL;
}
// Get file descriptor of DMS directory to create the indexer directory
// Get file descriptor of DMS directory to create other directories
dms_dir = opendir(directory_name);
if (dms_dir == NULL)
{
@ -278,22 +280,40 @@ OBIDMS_p obi_create_dms(const char* dms_name)
return NULL;
}
// Create the view directory
if (mkdirat(dms_file_descriptor, VIEW_DIR_NAME, 00777) < 0)
{
obi_set_errno(OBIVIEW_ERROR);
obidebug(1, "\nProblem creating a view directory");
return NULL;
}
// Isolate the dms name
j = 0;
for (i=0; i<strlen(dms_path); i++)
{
if (dms_path[i] == '/')
j = i+1;
i++;
}
// Create the informations file
if (create_dms_infos_file(dms_file_descriptor, dms_name) < 0)
if (create_dms_infos_file(dms_file_descriptor, dms_path+j) < 0)
return NULL;
return obi_open_dms(dms_name);
return obi_open_dms(dms_path);
}
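Note that the loop isolating the DMS name advances `i` twice per iteration (once in the `for` header and once in the body), so it only examines every other character of `dms_path`. A minimal sketch of the same idea with the standard `strrchr()` follows; the helper name `dms_name_from_path` is hypothetical and not part of the code base.

```c
#include <string.h>

/* Hypothetical helper: returns a pointer to the part of path that follows
 * the last '/', or path itself when it contains no '/'.
 * The result points into path; nothing is allocated. */
static const char* dms_name_from_path(const char* path)
{
    const char* last_slash = strrchr(path, '/');

    if (last_slash == NULL)
        return path;
    return last_slash + 1;
}
```

A path ending in `/` yields an empty name, which is also what the `j = i+1` approach would produce.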
OBIDMS_p obi_open_dms(const char* dms_name)
OBIDMS_p obi_open_dms(const char* dms_path)
{
OBIDMS_p dms;
char* directory_name;
char* relative_directory_path;
char* infos_file_name;
int infos_file_descriptor;
bool little_endian_dms;
bool little_endian_platform;
size_t i, j;
dms = NULL;
@ -307,18 +327,37 @@ OBIDMS_p obi_open_dms(const char* dms_name)
}
// Build and check the directory name
directory_name = build_directory_name(dms_name);
if (directory_name == NULL)
relative_directory_path = build_directory_name(dms_path);
if (relative_directory_path == NULL)
{
free(dms);
return NULL;
}
strncpy(dms->directory_name, directory_name, OBIDMS_MAX_NAME);
free(directory_name);
// Build and store the absolute path to the DMS
if (getcwd(dms->directory_path, MAX_PATH_LEN) == NULL)
{
obi_set_errno(OBIDMS_UNKNOWN_ERROR);
obidebug(1, "\nError getting the absolute path to the current working directory");
free(relative_directory_path);
return NULL;
}
strcat(dms->directory_path, "/");
strcat(dms->directory_path, relative_directory_path);
free(relative_directory_path);
// Isolate and store the dms name
j = 0;
for (i=0; i<strlen(dms_path); i++)
{
if (dms_path[i] == '/')
j = i+1;
i++;
}
strcpy(dms->dms_name, dms_path+j);
// Try to open the directory
dms->directory = opendir(dms->directory_name);
dms->directory = opendir(dms->directory_path);
if (dms->directory == NULL)
{
switch (errno)
@ -355,7 +394,7 @@ OBIDMS_p obi_open_dms(const char* dms_name)
}
// Open informations file to check endianness
infos_file_name = build_infos_file_name(dms_name);
infos_file_name = build_infos_file_name(dms->dms_name);
infos_file_descriptor = openat(dms->dir_fd, infos_file_name, O_RDONLY, 0777);
if (infos_file_descriptor < 0)
{
@ -416,6 +455,30 @@ OBIDMS_p obi_open_dms(const char* dms_name)
return NULL;
}
// Open the view directory
dms->view_directory = opendir_in_dms(dms, VIEW_DIR_NAME);
if (dms->view_directory == NULL)
{
obi_set_errno(OBIDMS_UNKNOWN_ERROR);
obidebug(1, "\nError opening the view directory");
closedir(dms->indexer_directory);
closedir(dms->directory);
free(dms);
return NULL;
}
// Store the view directory's file descriptor
dms->view_dir_fd = dirfd(dms->view_directory);
if (dms->view_dir_fd < 0)
{
obi_set_errno(OBIDMS_UNKNOWN_ERROR);
obidebug(1, "\nError getting the file descriptor of the view directory");
closedir(dms->view_directory);
closedir(dms->directory);
free(dms);
return NULL;
}
// Initialize the list of opened columns
dms->opened_columns = (Opened_columns_list_p) malloc(sizeof(Opened_columns_list_t));
(dms->opened_columns)->nb_opened_columns = 0;
@ -455,7 +518,7 @@ int obi_close_dms(OBIDMS_p dms)
while ((dms->opened_columns)->nb_opened_columns > 0)
obi_close_column(*((dms->opened_columns)->columns));
// Close dms and indexer directories
// Close the DMS, view and indexer directories
if (closedir(dms->directory) < 0)
{
obi_set_errno(OBIDMS_MEMORY_ERROR);
@ -470,6 +533,13 @@ int obi_close_dms(OBIDMS_p dms)
free(dms);
return -1;
}
if (closedir(dms->view_directory) < 0)
{
obi_set_errno(OBIVIEW_ERROR);
obidebug(1, "\nError closing a view directory");
free(dms);
return -1;
}
free(dms);
}
@ -551,7 +621,7 @@ Obi_indexer_p obi_dms_get_indexer_from_list(OBIDMS_p dms, const char* indexer_na
for (i=0; i < (indexers_list->nb_opened_indexers); i++)
{
if (!strcmp(((indexers_list->indexers)[i])->name, indexer_name)) // TODO get_name function indexer
if (!strcmp(obi_indexer_get_name((indexers_list->indexers)[i]), indexer_name))
{ // Found the indexer already opened, return it
return (indexers_list->indexers)[i];
}
@ -561,7 +631,7 @@ Obi_indexer_p obi_dms_get_indexer_from_list(OBIDMS_p dms, const char* indexer_na
}
void obi_dms_list_indexer(OBIDMS_p dms, Obi_indexer_p indexer)
void obi_dms_list_indexer(OBIDMS_p dms, Obi_indexer_p indexer) // TODO add check if indexer already in list?
{
*(((dms->opened_indexers)->indexers)+((dms->opened_indexers)->nb_opened_indexers)) = indexer;
((dms->opened_indexers)->nb_opened_indexers)++;
@ -577,7 +647,7 @@ int obi_dms_unlist_indexer(OBIDMS_p dms, Obi_indexer_p indexer)
for (i=0; i < indexers_list->nb_opened_indexers; i++)
{
if (!strcmp(((indexers_list->indexers)[i])->name, indexer->name)) // TODO get_name function indexer
if (!strcmp(obi_indexer_get_name((indexers_list->indexers)[i]), indexer->name))
{ // Found the indexer. Rearrange list
(indexers_list->nb_opened_indexers)--;
(indexers_list->indexers)[i] = (indexers_list->indexers)[indexers_list->nb_opened_indexers];
@ -602,15 +672,7 @@ char* obi_dms_get_dms_path(OBIDMS_p dms)
return NULL;
}
if (getcwd(full_path, MAX_PATH_LEN) == NULL) // TODO store when opening
{
obi_set_errno(OBI_UTILS_ERROR);
obidebug(1, "\nError getting the path to a file or directory");
return NULL;
}
strcat(full_path, "/");
strcat(full_path, dms->directory_name);
strcpy(full_path, dms->directory_path);
return full_path;
}
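`obi_open_dms()` now builds the absolute path once, with `getcwd()` followed by two `strcat()` calls, and `obi_dms_get_dms_path()` only copies the stored result. A hedged sketch of a bounded variant follows: the hypothetical helper `join_path()` relies on `snprintf()` to detect truncation and leaves an already absolute path untouched. Whether a DMS path may be absolute is an assumption here, not something the diff states.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: writes "<base>/<rel>" into out, or rel alone when rel
 * is already absolute. Returns 0 on success, -1 if out is too small. */
static int join_path(char* out, size_t out_size, const char* base, const char* rel)
{
    int written;

    if (rel[0] == '/')
        written = snprintf(out, out_size, "%s", rel);
    else
        written = snprintf(out, out_size, "%s/%s", base, rel);

    if (written < 0 || (size_t) written >= out_size)
        return -1;      /* encoding error or truncated result */
    return 0;
}
```

In this sketch the caller would obtain `base` from `getcwd()` and pass the size of the destination buffer (for instance `MAX_PATH_LEN`) as `out_size`.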

@ -30,6 +30,8 @@
*/
#define INDEXER_DIR_NAME "OBIBLOB_INDEXERS" /**< The name of the Obiblob indexer directory.
*/
#define VIEW_DIR_NAME "VIEWS" /**< The name of the view directory.
*/
#define TAXONOMY_DIR_NAME "TAXONOMY" /**< The name of the taxonomy directory.
*/
#define MAX_NB_OPENED_COLUMNS (100) /**< The maximum number of columns open at the same time.
@ -78,9 +80,14 @@ typedef struct Opened_indexers_list {
* and opening of an OBITools Data Management System (DMS)
*/
typedef struct OBIDMS {
char dms_name[OBIDMS_MAX_NAME+1]; /**< The name of the DMS.
*/
char directory_name[OBIDMS_MAX_NAME+1]; /**< The name of the directory
* containing the DMS.
*/
char directory_path[MAX_PATH_LEN]; /**< The absolute path of the directory
* containing the DMS.
*/
DIR* directory; /**< A directory entry usable to
* refer and scan the database directory.
*/
@ -93,6 +100,12 @@ typedef struct OBIDMS {
int indexer_dir_fd; /**< The file descriptor of the directory entry
* usable to refer and scan the indexer directory.
*/
DIR* view_directory; /**< A directory entry usable to
* refer and scan the view directory.
*/
int view_dir_fd; /**< The file descriptor of the directory entry
* usable to refer and scan the view directory.
*/
bool little_endian; /**< Endianness of the database.
*/
Opened_columns_list_p opened_columns; /**< List of opened columns.
@ -105,7 +118,7 @@ typedef struct OBIDMS {
/**
* @brief Checks if an OBIDMS exists.
*
* @param dms_name A pointer to a C string containing the name of the database.
* @param dms_path A pointer to a C string containing the path to the database.
*
* @returns An integer value indicating the status of the database.
* @retval 1 if the database exists.
@ -115,7 +128,7 @@ typedef struct OBIDMS {
* @since May 2015
* @author Eric Coissac (eric.coissac@metabarcoding.org)
*/
int obi_dms_exists(const char* dms_name);
int obi_dms_exists(const char* dms_path);
/**
@ -127,7 +140,7 @@ int obi_dms_exists(const char* dms_name);
*
* A directory to store Obiblob indexers is also created.
*
* @param dms_name A pointer to a C string containing the name of the database.
* @param dms_path A pointer to a C string containing the path to the database.
* The actual directory name used to store the DMS will be
* `<dms_name>.obidms`.
*
@ -144,7 +157,7 @@ OBIDMS_p obi_create_dms(const char* dms_name);
/**
* @brief Opens an existing OBITools Data Management instance (OBIDMS).
*
* @param dms_name A pointer to a C string containing the name of the database.
* @param dms_path A pointer to a C string containing the path to the database.
*
* @returns A pointer to an OBIDMS structure describing the opened DMS.
* @retval NULL if an error occurred.
@ -153,7 +166,7 @@ OBIDMS_p obi_create_dms(const char* dms_name);
* @since May 2015
* @author Eric Coissac (eric.coissac@metabarcoding.org)
*/
OBIDMS_p obi_open_dms(const char* dms_name);
OBIDMS_p obi_open_dms(const char* dms_path);
/**
@ -162,7 +175,7 @@ OBIDMS_p obi_open_dms(const char* dms_name);
* If the database already exists, this function opens it, otherwise it
* creates a new database.
*
* @param dms_name A pointer to a C string containing the name of the database.
* @param dms_path A pointer to a C string containing the path to the database.
*
* @returns A pointer to an OBIDMS structure describing the OBIDMS.
* @retval NULL if an error occurred.
@ -171,7 +184,7 @@ OBIDMS_p obi_open_dms(const char* dms_name);
* @since May 2015
* @author Eric Coissac (eric.coissac@metabarcoding.org)
*/
OBIDMS_p obi_dms(const char* dms_name);
OBIDMS_p obi_dms(const char* dms_path);
/**
@ -230,6 +243,8 @@ OBIDMS_column_p obi_dms_get_column_from_list(OBIDMS_p dms, const char* column_na
/**
* @brief Adds a column identified by its name and its version number in the list of opened columns.
*
* @warning obi_dms_get_column_from_list() should be used first to check if the column isn't already listed.
*
* @param dms The OBIDMS.
* @param column A pointer on the column that should be added in the list of opened columns.
*
@ -269,6 +284,8 @@ Obi_indexer_p obi_dms_get_indexer_from_list(OBIDMS_p dms, const char* indexer_na
/**
* @brief Adds an indexer identified by its name in the list of opened indexers.
*
* @warning obi_dms_get_indexer_from_list() should be used first to check if the indexer isn't already listed.
*
* @param dms The OBIDMS.
* @param indexer A pointer on the indexer that should be added in the list of opened indexers.
*
@ -291,9 +308,19 @@ int obi_dms_unlist_indexer(OBIDMS_p dms, Obi_indexer_p indexer);
/**
* Function meant to disappear soon
* @brief Gets the full path to the DMS.
*
* @warning The returned pointer has to be freed by the caller.
*
* @param dms The DMS.
*
* @returns A pointer to the full path.
* @retval NULL if an error occurred.
*
* @since April 2016
* @author Celine Mercier (celine.mercier@metabarcoding.org)
*/
char* obi_dms_get_path(OBIDMS_p dms);
char* obi_dms_get_dms_path(OBIDMS_p dms);
/**

@ -587,7 +587,7 @@ OBIDMS_column_p obi_create_column(OBIDMS_p dms,
}
// Build the indexer name if needed
if ((data_type == OBI_STR) || (data_type == OBI_SEQ))
if ((data_type == OBI_STR) || (data_type == OBI_SEQ) || (data_type == OBI_QUAL))
{
if (strcmp(indexer_name, "") == 0)
{
@ -603,7 +603,7 @@ OBIDMS_column_p obi_create_column(OBIDMS_p dms,
}
returned_data_type = data_type;
if ((data_type == OBI_STR) || (data_type == OBI_SEQ))
if ((data_type == OBI_STR) || (data_type == OBI_SEQ) || (data_type == OBI_QUAL))
// stored data is indices referring to data stored elsewhere
stored_data_type = OBI_IDX;
else
@ -750,8 +750,8 @@ OBIDMS_column_p obi_create_column(OBIDMS_p dms,
if (comments != NULL)
strncpy(header->comments, comments, COMMENTS_MAX_LENGTH);
// If the data type is OBI_STR or OBI_SEQ, the associated obi_indexer is opened or created
if ((returned_data_type == OBI_STR) || (returned_data_type == OBI_SEQ))
// If the data type is OBI_STR, OBI_SEQ or OBI_QUAL, the associated obi_indexer is opened or created
if ((returned_data_type == OBI_STR) || (returned_data_type == OBI_SEQ) || (returned_data_type == OBI_QUAL))
{
new_column->indexer = obi_indexer(dms, final_indexer_name);
if (new_column->indexer == NULL)
@ -900,8 +900,8 @@ OBIDMS_column_p obi_open_column(OBIDMS_p dms,
column->writable = false;
// If the data type is OBI_STR or OBI_SEQ, the associated indexer is opened
if (((column->header)->returned_data_type == OBI_STR) || ((column->header)->returned_data_type == OBI_SEQ))
// If the data type is OBI_STR, OBI_SEQ or OBI_QUAL, the associated indexer is opened
if (((column->header)->returned_data_type == OBI_STR) || ((column->header)->returned_data_type == OBI_SEQ) || ((column->header)->returned_data_type == OBI_QUAL))
{
column->indexer = obi_open_indexer(dms, (column->header)->indexer_name);
if (column->indexer == NULL)
@ -940,8 +940,6 @@ OBIDMS_column_p obi_clone_column(OBIDMS_p dms,
size_t line_size;
index_t i, index;
obidebug(1, "\nline_selection == NULL = %d\n", (line_selection == NULL));
column_to_clone = obi_open_column(dms, column_name, version_number);
if (column_to_clone == NULL)
{
@ -1028,8 +1026,8 @@ int obi_close_column(OBIDMS_column_p column)
// Close column directory if it was the last column opened from that directory
close_dir = obi_dms_is_column_name_in_list(column->dms, (column->header)->name);
// If the data type is OBI_STR or OBI_SEQ, the associated indexer is closed
if (((column->header)->returned_data_type == OBI_STR) || ((column->header)->returned_data_type == OBI_SEQ))
// If the data type is OBI_STR, OBI_SEQ or OBI_QUAL, the associated indexer is closed
if (((column->header)->returned_data_type == OBI_STR) || ((column->header)->returned_data_type == OBI_SEQ) || ((column->header)->returned_data_type == OBI_QUAL))
if (obi_close_indexer(column->indexer) < 0)
ret_val = -1;
@ -1155,6 +1153,14 @@ int obi_enlarge_column(OBIDMS_column_p column)
int column_file_descriptor;
char* column_file_name;
// Check if the column is read-only
if (!(column->writable))
{
obi_set_errno(OBICOL_UNKNOWN_ERROR);
obidebug(1, "\nError trying to enlarge a read-only column");
return -1;
}
// Get the column file name
column_file_name = build_column_file_name((column->header)->name, (column->header)->version);
if (column_file_name == NULL)
@ -1209,12 +1215,12 @@ int obi_enlarge_column(OBIDMS_column_p column)
}
column->data = mmap(NULL,
new_data_size,
PROT_READ | PROT_WRITE,
MAP_SHARED,
column_file_descriptor,
header_size
);
new_data_size,
PROT_READ | PROT_WRITE,
MAP_SHARED,
column_file_descriptor,
header_size
);
if (column->data == MAP_FAILED)
{
obi_set_errno(OBICOL_UNKNOWN_ERROR);
@ -1274,23 +1280,29 @@ void obi_ini_to_NA_values(OBIDMS_column_p column,
}
break;
case OBI_STR: for (i=start;i<end;i++)
{
*(((index_t*) (column->data)) + i) = OBIIdx_NA;
}
break;
case OBI_SEQ: for (i=start;i<end;i++)
{
*(((index_t*) (column->data)) + i) = OBIIdx_NA;
}
break;
case OBI_IDX: for (i=start;i<end;i++)
{
*(((index_t*) (column->data)) + i) = OBIIdx_NA;
}
break;
case OBI_QUAL: for (i=start;i<end;i++) // case not used since OBI_QUAL is only a returned_data_type
{
*(((index_t*) (column->data)) + i) = OBIIdx_NA;
}
break;
case OBI_STR: for (i=start;i<end;i++) // case not used since OBI_STR is only a returned_data_type
{
*(((index_t*) (column->data)) + i) = OBIIdx_NA;
}
break;
case OBI_SEQ: for (i=start;i<end;i++) // case not used since OBI_SEQ is only a returned_data_type
{
*(((index_t*) (column->data)) + i) = OBIIdx_NA;
}
break;
}
}
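The `OBI_IDX`, `OBI_QUAL`, `OBI_STR` and `OBI_SEQ` cases all run the same loop, because each of these types is stored as an index. A sketch of how grouped `case` labels can express this once follows; the enum, the `index_t` typedef and the sentinel value are assumptions made for the example, not the definitions from `obitypes.h`.

```c
#include <stddef.h>
#include <stdint.h>

typedef int64_t index_t;                /* assumption for the sketch */
#define TOY_IDX_NA ((index_t) -1)       /* assumed sentinel value */

/* Hypothetical type codes, standing in for the OBI_* data types. */
typedef enum { TOY_INT, TOY_IDX, TOY_STR, TOY_SEQ, TOY_QUAL } ToyType;

/* Fills data[start..end) with the NA index for every index-backed type.
 * Returns 0 on success, -1 for types this sketch does not cover. */
static int toy_init_to_na(ToyType type, void* data, index_t start, index_t end)
{
    index_t i;

    switch (type)
    {
        case TOY_IDX:
        case TOY_STR:
        case TOY_SEQ:
        case TOY_QUAL:
            for (i = start; i < end; i++)
                ((index_t*) data)[i] = TOY_IDX_NA;
            return 0;
        default:
            return -1;   /* other types have their own NA values */
    }
}
```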
@ -1460,12 +1472,12 @@ int obi_column_prepare_to_set_value(OBIDMS_column_p column, index_t line_nb)
}
int obi_column_prepare_to_get_value(OBIDMS_column_p column, index_t line_nb)
int obi_column_prepare_to_get_value(OBIDMS_column_p column, index_t line_nb) // TODO problem with some columns in a view being empty or shorter and triggering an error because they've been truncated when the view was closed. Fixed with obiview.c in update_lines() for now
{
if ((line_nb+1) > ((column->header)->line_count))
{
obi_set_errno(OBICOL_UNKNOWN_ERROR);
obidebug(1, "\nError trying to get a value that is beyond the current number of lines used in the column");
obidebug(1, "\nError trying to get a value that is beyond the current number of lines of the column");
return -1;
}
return 0;

184
src/obidmscolumn_qual.c Normal file
@ -0,0 +1,184 @@
/****************************************************************************
* OBIDMS_column_qual functions *
****************************************************************************/
/**
* @file obidmscolumn_qual.c
* @author Celine Mercier
* @date May 4th 2016
* @brief Functions handling OBIColumns containing data in the form of indices referring to sequence qualities.
*/
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include "obidmscolumn_qual.h"
#include "obidmscolumn.h"
#include "obitypes.h"
#include "uint8_indexer.h"
/**********************************************************************
*
* D E F I N I T I O N O F T H E P U B L I C F U N C T I O N S
*
**********************************************************************/
int obi_column_set_obiqual_char_with_elt_idx(OBIDMS_column_p column, index_t line_nb, index_t element_idx, const char* value)
{
uint8_t* int_value;
int int_value_length;
int i;
int ret_value;
// Check NA value
if (value == OBIQual_char_NA)
{
ret_value = obi_column_set_obiqual_int_with_elt_idx(column, line_nb, element_idx, OBIQual_int_NA, 0);
}
else
{
int_value_length = strlen(value);
int_value = (uint8_t*) malloc(int_value_length * sizeof(uint8_t));
// Convert to a uint8_t array to index in that format
for (i=0; i<int_value_length; i++)
int_value[i] = ((uint8_t)(value[i])) - QUALITY_ASCII_BASE;
ret_value = obi_column_set_obiqual_int_with_elt_idx(column, line_nb, element_idx, int_value, int_value_length);
free(int_value);
}
return ret_value;
}
int obi_column_set_obiqual_int_with_elt_idx(OBIDMS_column_p column, index_t line_nb, index_t element_idx, const uint8_t* value, int value_length)
{
index_t idx;
char* new_indexer_name;
if (obi_column_prepare_to_set_value(column, line_nb) < 0)
return -1;
if (value == OBIQual_int_NA)
{
idx = OBIIdx_NA;
}
else
{
// Add the value in the indexer
idx = obi_index_uint8(column->indexer, value, value_length);
if (idx == -1) // An error occurred
{
if (obi_errno == OBI_READ_ONLY_INDEXER_ERROR)
{
// TODO PUT IN A COLUMN FUNCTION
// If the error is that the indexer is read-only, clone it
new_indexer_name = obi_build_indexer_name((column->header)->name, (column->header)->version);
if (new_indexer_name == NULL)
return -1;
column->indexer = obi_clone_indexer(column->indexer, new_indexer_name); // TODO Need to lock this somehow?
strcpy((column->header)->indexer_name, new_indexer_name);
free(new_indexer_name);
obi_set_errno(0);
// Add the value in the new indexer
idx = obi_index_uint8(column->indexer, value, value_length);
if (idx == -1)
return -1;
}
else
return -1;
}
}
// Add the value's index in the column
*(((index_t*) (column->data)) + (line_nb * ((column->header)->nb_elements_per_line)) + element_idx) = idx;
return 0;
}
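The setter above falls back to cloning the indexer when the first insertion fails with `OBI_READ_ONLY_INDEXER_ERROR`, then retries once. That control flow can be sketched with a toy in-memory indexer; every type and function below (`ToyIndexer`, `toy_add`, `toy_clone`, `add_with_clone`) is a hypothetical stand-in, not the OBITools3 API.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define TOY_CAPACITY        8
#define TOY_READ_ONLY_ERROR 1
#define TOY_FULL_ERROR      2

/* Hypothetical stand-in for an indexer: a small append-only table. */
typedef struct ToyIndexer {
    bool writable;
    int  count;
    int  values[TOY_CAPACITY];
} ToyIndexer;

static int toy_errno = 0;

/* Returns the index of the stored value, or -1 with toy_errno set. */
static int toy_add(ToyIndexer* indexer, int value)
{
    if (!indexer->writable)
    {
        toy_errno = TOY_READ_ONLY_ERROR;
        return -1;
    }
    if (indexer->count >= TOY_CAPACITY)
    {
        toy_errno = TOY_FULL_ERROR;
        return -1;
    }
    indexer->values[indexer->count] = value;
    return indexer->count++;
}

/* Returns a writable heap copy of source, or NULL if allocation fails. */
static ToyIndexer* toy_clone(const ToyIndexer* source)
{
    ToyIndexer* clone = malloc(sizeof *clone);

    if (clone == NULL)
        return NULL;
    memcpy(clone, source, sizeof *clone);
    clone->writable = true;
    return clone;
}

/* Adds value; if the indexer is read-only, swaps in a writable clone
 * and retries exactly once. */
static int add_with_clone(ToyIndexer** indexer_p, int value)
{
    ToyIndexer* clone;
    int idx = toy_add(*indexer_p, value);

    if (idx >= 0)
        return idx;
    if (toy_errno != TOY_READ_ONLY_ERROR)
        return -1;                      /* any other error is reported as is */

    clone = toy_clone(*indexer_p);
    if (clone == NULL)
        return -1;                      /* keep the original indexer on failure */
    *indexer_p = clone;
    toy_errno = 0;
    return toy_add(*indexer_p, value);
}
```

Unlike the original, the sketch checks the clone result before replacing the pointer, and it leaves the read-only source open, whereas `obi_clone_avl_group()` is documented as closing the group it clones.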
char* obi_column_get_obiqual_char_with_elt_idx(OBIDMS_column_p column, index_t line_nb, index_t element_idx)
{
char* value;
const uint8_t* int_value;
int int_value_length;
int i;
int_value = obi_column_get_obiqual_int_with_elt_idx(column, line_nb, element_idx, &int_value_length);
// Check NA
if (int_value == OBIQual_int_NA)
return OBIQual_char_NA;
value = (char*) malloc((int_value_length + 1) * sizeof(char));
// Encode int quality to char quality
for (i=0; i<int_value_length; i++)
value[i] = (char)(int_value[i] + QUALITY_ASCII_BASE);
value[i] = '\0';
return value;
}
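The character and integer forms of a quality string differ only by the `QUALITY_ASCII_BASE` offset of 33. A self-contained sketch of both conversions follows; the helper names `qual_chars_to_ints` and `qual_ints_to_chars` are hypothetical, and unlike the functions above they write into caller-provided buffers instead of allocating.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define QUALITY_ASCII_BASE (33)

/* Hypothetical helper: converts a quality string to integer scores.
 * out must hold at least strlen(chars) elements.
 * Returns the number of scores written. */
static size_t qual_chars_to_ints(const char* chars, uint8_t* out)
{
    size_t i;
    size_t length = strlen(chars);

    for (i = 0; i < length; i++)
        out[i] = (uint8_t) (((uint8_t) chars[i]) - QUALITY_ASCII_BASE);
    return length;
}

/* Hypothetical helper: converts integer scores back to a quality string.
 * out must hold at least length + 1 chars; the result is NUL-terminated. */
static void qual_ints_to_chars(const uint8_t* ints, size_t length, char* out)
{
    size_t i;

    for (i = 0; i < length; i++)
        out[i] = (char) (ints[i] + QUALITY_ASCII_BASE);
    out[length] = '\0';
}
```

For instance, the characters `!`, `5` and `I` (ASCII 33, 53 and 73) map to the scores 0, 20 and 40.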
const uint8_t* obi_column_get_obiqual_int_with_elt_idx(OBIDMS_column_p column, index_t line_nb, index_t element_idx, int* value_length)
{
index_t idx;
if (obi_column_prepare_to_get_value(column, line_nb) < 0)
return OBIQual_int_NA;
idx = *(((index_t*) (column->data)) + (line_nb * ((column->header)->nb_elements_per_line)) + element_idx);
// Check NA
if (idx == OBIIdx_NA)
return OBIQual_int_NA;
return obi_retrieve_uint8(column->indexer, idx, value_length);
}
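Both accessors address an element as `line_nb * nb_elements_per_line + element_idx` in a flat array of indices. A small sketch of that row-major addressing follows, with a hypothetical `ToyColumn` struct and an assumed 64-bit `index_t`; the added range check on `element_idx` is not part of the original functions.

```c
#include <stddef.h>
#include <stdint.h>

typedef int64_t index_t;     /* assumption: a signed 64-bit index type */

/* Hypothetical stand-in for a column holding several elements per line. */
typedef struct ToyColumn {
    index_t  line_count;
    index_t  nb_elements_per_line;
    index_t* data;           /* line_count * nb_elements_per_line entries */
} ToyColumn;

/* Returns a pointer to the slot of (line_nb, element_idx),
 * or NULL if either coordinate is out of range. */
static index_t* toy_slot(ToyColumn* column, index_t line_nb, index_t element_idx)
{
    if (line_nb < 0 || line_nb >= column->line_count)
        return NULL;
    if (element_idx < 0 || element_idx >= column->nb_elements_per_line)
        return NULL;
    return column->data + (line_nb * column->nb_elements_per_line) + element_idx;
}
```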
int obi_column_set_obiqual_char_with_elt_name(OBIDMS_column_p column, index_t line_nb, const char* element_name, const char* value)
{
index_t element_idx = obi_column_get_element_index_from_name(column, element_name);
if (element_idx == OBIIdx_NA)
return -1;
return obi_column_set_obiqual_char_with_elt_idx(column, line_nb, element_idx, value);
}
int obi_column_set_obiqual_int_with_elt_name(OBIDMS_column_p column, index_t line_nb, const char* element_name, const uint8_t* value, int value_length)
{
index_t element_idx = obi_column_get_element_index_from_name(column, element_name);
if (element_idx == OBIIdx_NA)
return -1;
return obi_column_set_obiqual_int_with_elt_idx(column, line_nb, element_idx, value, value_length);
}
char* obi_column_get_obiqual_char_with_elt_name(OBIDMS_column_p column, index_t line_nb, const char* element_name)
{
index_t element_idx = obi_column_get_element_index_from_name(column, element_name);
if (element_idx == OBIIdx_NA)
return OBIQual_char_NA;
return obi_column_get_obiqual_char_with_elt_idx(column, line_nb, element_idx);
}
const uint8_t* obi_column_get_obiqual_int_with_elt_name(OBIDMS_column_p column, index_t line_nb, const char* element_name, int* value_length)
{
index_t element_idx = obi_column_get_element_index_from_name(column, element_name);
if (element_idx == OBIIdx_NA)
return OBIQual_int_NA;
return obi_column_get_obiqual_int_with_elt_idx(column, line_nb, element_idx, value_length);
}
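The four `_with_elt_name` functions share one shape: resolve the element name to an index, return the error or NA value if the name is unknown, otherwise delegate to the `_with_elt_idx` variant. A toy sketch of that delegation follows; the name table, the linear search and the `-1` sentinel are assumptions for illustration, since the diff does not show how `obi_column_get_element_index_from_name()` stores or looks up names.

```c
#include <string.h>

#define TOY_IDX_NA (-1)     /* assumed sentinel for "no such element" */

/* Hypothetical lookup: position of name in names, or TOY_IDX_NA. */
static int toy_element_index(const char* const* names, int nb_names, const char* name)
{
    int i;

    for (i = 0; i < nb_names; i++)
        if (strcmp(names[i], name) == 0)
            return i;
    return TOY_IDX_NA;
}

/* Index-based accessor: the variant that does the actual work. */
static int toy_get_by_idx(const int* line, int element_idx)
{
    return line[element_idx];
}

/* Name-based accessor: resolves the name, then delegates.
 * Returns 0 and fills *value on success, -1 if the name is unknown. */
static int toy_get_by_name(const int* line, const char* const* names, int nb_names,
                           const char* name, int* value)
{
    int element_idx = toy_element_index(names, nb_names, name);

    if (element_idx == TOY_IDX_NA)
        return -1;
    *value = toy_get_by_idx(line, element_idx);
    return 0;
}
```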

204
src/obidmscolumn_qual.h Normal file
@ -0,0 +1,204 @@
/****************************************************************************
* OBIDMS_column_qual header file *
****************************************************************************/
/**
* @file obidmscolumn_qual.h
* @author Celine Mercier
* @date May 4th 2016
* @brief Header file for the functions handling OBIColumns containing data in the form of indices referring to sequence qualities.
*/
#ifndef OBIDMSCOLUMN_QUAL_H_
#define OBIDMSCOLUMN_QUAL_H_
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include "obidmscolumn.h"
#include "obitypes.h"
#define QUALITY_ASCII_BASE (33) /**< The ASCII base of sequence quality.
* Used to convert sequence qualities from characters to integers
* and the other way around.
*/
/**
* @brief Sets a value in an OBIDMS column containing data in the form of indices referring
* to sequence qualities handled by an indexer, and using the index of the element in the column's line.
*
* This function is for qualities in the character string format.
*
* @warning Pointers returned by obi_open_column() don't allow writing.
*
* @param column A pointer as returned by obi_create_column() or obi_clone_column().
* @param line_nb The number of the line where the value should be set.
* @param element_idx The index of the element that should be set in the line.
* @param value The value that should be set, in the character string format.
*
* @returns An integer value indicating the success of the operation.
* @retval 0 on success.
* @retval -1 if an error occurred.
*
* @since May 2016
* @author Celine Mercier (celine.mercier@metabarcoding.org)
*/
int obi_column_set_obiqual_char_with_elt_idx(OBIDMS_column_p column, index_t line_nb, index_t element_idx, const char* value);
/**
* @brief Sets a value in an OBIDMS column containing data in the form of indices referring
* to sequence qualities handled by an indexer, and using the index of the element in the column's line.
*
* This function is for qualities in the integer format.
*
* @warning Pointers returned by obi_open_column() don't allow writing.
*
* @param column A pointer as returned by obi_create_column() or obi_clone_column().
* @param line_nb The number of the line where the value should be set.
* @param element_idx The index of the element that should be set in the line.
* @param value The value that should be set, in the integer array format.
* @param value_length The length of the integer array.
*
* @returns An integer value indicating the success of the operation.
* @retval 0 on success.
* @retval -1 if an error occurred.
*
* @since May 2016
* @author Celine Mercier (celine.mercier@metabarcoding.org)
*/
int obi_column_set_obiqual_int_with_elt_idx(OBIDMS_column_p column, index_t line_nb, index_t element_idx, const uint8_t* value, int value_length);
/**
* @brief Recovers a value in an OBIDMS column containing data in the form of indices referring
* to sequence qualities handled by an indexer, and using the index of the element in the column's line.
*
* This function returns quality scores in the character string format.
*
* @param column A pointer as returned by obi_create_column().
* @param line_nb The number of the line where the value should be recovered.
* @param element_idx The index of the element that should be recovered in the line.
*
* @returns The recovered value, in the character string format.
* @retval OBIQual_char_NA the NA value of the type if an error occurred and obi_errno is set.
*
* @since May 2016
* @author Celine Mercier (celine.mercier@metabarcoding.org)
*/
char* obi_column_get_obiqual_char_with_elt_idx(OBIDMS_column_p column, index_t line_nb, index_t element_idx);
/**
* @brief Recovers a value in an OBIDMS column containing data in the form of indices referring
* to sequence qualities handled by an indexer, and using the index of the element in the column's line.
*
* This function returns quality scores in the integer format.
*
* @param column A pointer as returned by obi_create_column().
* @param line_nb The number of the line where the value should be recovered.
* @param element_idx The index of the element that should be recovered in the line.
* @param value_length A pointer on an integer to store the length of the integer array recovered.
*
* @returns The recovered value, in the integer array format.
* @retval OBIQual_int_NA the NA value of the type if an error occurred and obi_errno is set.
*
* @since May 2016
* @author Celine Mercier (celine.mercier@metabarcoding.org)
*/
const uint8_t* obi_column_get_obiqual_int_with_elt_idx(OBIDMS_column_p column, index_t line_nb, index_t element_idx, int* value_length);
/**
* @brief Sets a value in an OBIDMS column containing data in the form of indices referring
* to sequence qualities handled by an indexer, and using the name of the element in the column's line.
*
* This function is for quality scores in the character string format.
*
* @warning Pointers returned by obi_open_column() don't allow writing.
*
* @param column A pointer as returned by obi_create_column() or obi_clone_column().
* @param line_nb The number of the line where the value should be set.
* @param element_name The name of the element that should be set in the line.
* @param value The value that should be set, in the character string format.
*
* @returns An integer value indicating the success of the operation.
* @retval 0 on success.
* @retval -1 if an error occurred.
*
* @since May 2016
* @author Celine Mercier (celine.mercier@metabarcoding.org)
*/
int obi_column_set_obiqual_char_with_elt_name(OBIDMS_column_p column, index_t line_nb, const char* element_name, const char* value);
/**
* @brief Sets a value in an OBIDMS column containing data in the form of indices referring
* to sequence qualities handled by an indexer, and using the name of the element in the column's line.
*
* This function is for quality scores in the integer array format.
*
* @warning Pointers returned by obi_open_column() don't allow writing.
*
* @param column A pointer as returned by obi_create_column() or obi_clone_column().
* @param line_nb The number of the line where the value should be set.
* @param element_name The name of the element that should be set in the line.
* @param value The value that should be set, in the integer format.
* @param value_length The length of the integer array.
*
* @returns An integer value indicating the success of the operation.
* @retval 0 on success.
* @retval -1 if an error occurred.
*
* @since May 2016
* @author Celine Mercier (celine.mercier@metabarcoding.org)
*/
int obi_column_set_obiqual_int_with_elt_name(OBIDMS_column_p column, index_t line_nb, const char* element_name, const uint8_t* value, int value_length);
/**
* @brief Recovers a value in an OBIDMS column containing data in the form of indices referring
* to sequence qualities handled by an indexer, and using the name of the element in the column's line.
*
* This function returns quality scores in the character string format.
*
* @param column A pointer as returned by obi_create_column() or obi_clone_column().
* @param line_nb The number of the line where the value should be recovered.
* @param element_name The name of the element that should be recovered in the line.
*
* @returns The recovered value, in the character string format.
* @retval OBIQual_char_NA the NA value of the type if an error occurred and obi_errno is set.
*
* @since May 2016
* @author Celine Mercier (celine.mercier@metabarcoding.org)
*/
char* obi_column_get_obiqual_char_with_elt_name(OBIDMS_column_p column, index_t line_nb, const char* element_name);
/**
* @brief Recovers a value in an OBIDMS column containing data in the form of indices referring
* to sequence qualities handled by an indexer, and using the name of the element in the column's line.
*
* This function returns quality scores in the integer array format.
*
* @param column A pointer as returned by obi_create_column() or obi_clone_column().
* @param line_nb The number of the line where the value should be recovered.
* @param element_name The name of the element that should be recovered in the line.
* @param value_length A pointer to an integer in which the length of the recovered integer array is stored.
*
* @returns The recovered value, in the integer format.
* @retval OBIQual_int_NA the NA value of the type if an error occurred and obi_errno is set.
*
* @since May 2016
* @author Celine Mercier (celine.mercier@metabarcoding.org)
*/
const uint8_t* obi_column_get_obiqual_int_with_elt_name(OBIDMS_column_p column, index_t line_nb, const char* element_name, int* value_length);
#endif /* OBIDMSCOLUMN_QUAL_H_ */
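The accessors declared in this header expose the same quality data in two shapes: a character string, and a `uint8_t` array whose length is reported through `value_length`. As a rough sketch of how the two shapes could relate, the following converts between them. It assumes the Sanger/Phred+33 ASCII offset, which the header does not state; `QUAL_ASCII_OFFSET`, `QUAL_MAX_SCORE`, `qual_char_to_int()` and `qual_int_to_char()` are hypothetical names and not part of the library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* ASSUMPTION: Phred+33 encoding; the diff does not say which offset is used. */
#define QUAL_ASCII_OFFSET 33
#define QUAL_MAX_SCORE    (126 - QUAL_ASCII_OFFSET)   /* keeps output printable */

/* Hypothetical helper: string -> newly allocated score array.
 * Returns NULL on error, mirroring the NULL NA values of the header. */
uint8_t* qual_char_to_int(const char* qual, int* value_length)
{
    assert(value_length != NULL);
    if (qual == NULL)
        return NULL;
    size_t n = strlen(qual);
    uint8_t* scores = malloc(n > 0 ? n : 1);
    if (scores == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
    {
        unsigned char c = (unsigned char) qual[i];
        if (c < QUAL_ASCII_OFFSET || c > 126)   /* not a valid quality character */
        {
            free(scores);
            return NULL;
        }
        scores[i] = (uint8_t) (c - QUAL_ASCII_OFFSET);
    }
    *value_length = (int) n;
    return scores;
}

/* Hypothetical helper: score array -> newly allocated, NUL-terminated string. */
char* qual_int_to_char(const uint8_t* scores, int value_length)
{
    if (scores == NULL || value_length < 0)
        return NULL;
    char* qual = malloc((size_t) value_length + 1);
    if (qual == NULL)
        return NULL;
    for (int i = 0; i < value_length; i++)
    {
        if (scores[i] > QUAL_MAX_SCORE)
        {
            free(qual);
            return NULL;
        }
        qual[i] = (char) (scores[i] + QUAL_ASCII_OFFSET);
    }
    qual[value_length] = '\0';
    return qual;
}
```

Both helpers hand ownership of the returned buffer to the caller, which is a choice of this sketch; the `const` return type of the integer getters above suggests the library keeps ownership of what it returns.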

@ -27,14 +27,42 @@
int obi_column_set_obiseq_with_elt_idx(OBIDMS_column_p column, index_t line_nb, index_t element_idx, const char* value)
{
index_t idx;
char* new_indexer_name;
if (obi_column_prepare_to_set_value(column, line_nb) < 0)
return -1;
// Add the value in the indexer
idx = obi_index_dna_seq(column->indexer, value);
if (idx == -1)
return -1;
if (value == OBISeq_NA)
{
idx = OBIIdx_NA;
}
else
{
// Add the value in the indexer
idx = obi_index_dna_seq(column->indexer, value);
if (idx == -1) // An error occurred
{
if (obi_errno == OBI_READ_ONLY_INDEXER_ERROR)
{
// TODO PUT IN A COLUMN FUNCTION
// If the error is that the indexer is read-only, clone it
new_indexer_name = obi_build_indexer_name((column->header)->name, (column->header)->version);
if (new_indexer_name == NULL)
return -1;
column->indexer = obi_clone_indexer(column->indexer, new_indexer_name); // TODO Need to lock this somehow?
strcpy((column->header)->indexer_name, new_indexer_name);
free(new_indexer_name);
obi_set_errno(0);
// Add the value in the new indexer
idx = obi_index_dna_seq(column->indexer, value);
if (idx == -1)
return -1;
}
else
return -1;
}
}
// Add the value's index in the column
*(((index_t*) (column->data)) + (line_nb * ((column->header)->nb_elements_per_line)) + element_idx) = idx;
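The hunk above carries a `TODO PUT IN A COLUMN FUNCTION` note next to the clone-on-read-only retry. A minimal sketch of what such a shared helper might look like follows, using a toy indexer so that it compiles on its own. Every `toy_` name is hypothetical; only the error value 28 and the overall control flow are taken from the diff. Unlike the hunk, the sketch checks the clone result before replacing the indexer pointer, which is a choice of the sketch rather than something the diff does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef int64_t index_t;

#define TOY_IDX_NA          INT64_MIN   /* same value as OBIIdx_NA in the diff */
#define TOY_READ_ONLY_ERROR 28          /* same value as OBI_READ_ONLY_INDEXER_ERROR */

int toy_errno = 0;                      /* stand-in for obi_errno */

/* Toy indexer: hands out increasing indices unless it is read-only. */
typedef struct { bool read_only; index_t next_idx; } toy_indexer_t;

index_t toy_index_value(toy_indexer_t* indexer, const char* value)
{
    (void) value;
    if (indexer->read_only)
    {
        toy_errno = TOY_READ_ONLY_ERROR;
        return -1;
    }
    return indexer->next_idx++;
}

/* Writable copy of an indexer; NULL on allocation failure. */
toy_indexer_t* toy_clone_indexer(const toy_indexer_t* src)
{
    toy_indexer_t* clone = malloc(sizeof *clone);
    if (clone == NULL)
        return NULL;
    clone->read_only = false;
    clone->next_idx  = src->next_idx;
    return clone;
}

/* Indexes `value`, cloning the indexer once if it turns out to be read-only.
 * Returns 0 and stores the index in *idx on success, -1 on error. */
int toy_index_with_clone_on_read_only(toy_indexer_t** indexer, const char* value, index_t* idx)
{
    assert(indexer != NULL && *indexer != NULL && idx != NULL);
    if (value == NULL)                      /* NA value: nothing to index */
    {
        *idx = TOY_IDX_NA;
        return 0;
    }
    *idx = toy_index_value(*indexer, value);
    if (*idx != -1)
        return 0;
    if (toy_errno != TOY_READ_ONLY_ERROR)   /* any other error is fatal */
        return -1;
    toy_indexer_t* clone = toy_clone_indexer(*indexer);
    if (clone == NULL)                      /* keep the original on failure */
        return -1;
    *indexer  = clone;
    toy_errno = 0;
    *idx = toy_index_value(*indexer, value);
    return (*idx == -1) ? -1 : 0;
}
```

With a helper of this shape, the OBI_SEQ and OBI_STR setters would differ only in the indexing call they pass through.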

@ -27,14 +27,42 @@
int obi_column_set_obistr_with_elt_idx(OBIDMS_column_p column, index_t line_nb, index_t element_idx, const char* value)
{
index_t idx;
char* new_indexer_name;
if (obi_column_prepare_to_set_value(column, line_nb) < 0)
return -1;
// Add the value in the indexer
idx = obi_index_char_str(column->indexer, value);
if (idx == -1)
return -1;
if (value == OBIStr_NA)
{
idx = OBIIdx_NA;
}
else
{
// Add the value in the indexer
idx = obi_index_char_str(column->indexer, value);
if (idx == -1) // An error occurred
{
if (obi_errno == OBI_READ_ONLY_INDEXER_ERROR)
{
// TODO PUT IN A COLUMN FUNCTION
// If the error is that the indexer is read-only, clone it
new_indexer_name = obi_build_indexer_name((column->header)->name, (column->header)->version);
if (new_indexer_name == NULL)
return -1;
column->indexer = obi_clone_indexer(column->indexer, new_indexer_name); // TODO Need to lock this somehow?
strcpy((column->header)->indexer_name, new_indexer_name);
free(new_indexer_name);
obi_set_errno(0);
// Add the value in the new indexer
idx = obi_index_char_str(column->indexer, value);
if (idx == -1)
return -1;
}
else
return -1;
}
}
// Add the value's index in the column
*(((index_t*) (column->data)) + (line_nb * ((column->header)->nb_elements_per_line)) + element_idx) = idx;
@ -48,7 +76,7 @@ const char* obi_column_get_obistr_with_elt_idx(OBIDMS_column_p column, index_t l
index_t idx;
if (obi_column_prepare_to_get_value(column, line_nb) < 0)
return OBISeq_NA;
return OBIStr_NA;
idx = *(((index_t*) (column->data)) + (line_nb * ((column->header)->nb_elements_per_line)) + element_idx);
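Both the setter and the getter address a cell with the expression `line_nb * nb_elements_per_line + element_idx`, that is, a row-major layout with one row per line. The sketch below isolates that arithmetic; `toy_cell()` is a hypothetical helper, and the bounds assertion is an addition of the sketch that the hunk does not show.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int64_t index_t;

/* Returns the address of element `element_idx` of line `line_nb` in a
 * row-major array holding `nb_elements_per_line` indices per line. */
index_t* toy_cell(index_t* data, index_t nb_elements_per_line, index_t line_nb, index_t element_idx)
{
    assert(data != NULL);
    assert(line_nb >= 0);
    assert(element_idx >= 0 && element_idx < nb_elements_per_line);
    return data + (line_nb * nb_elements_per_line) + element_idx;
}
```

For a column with 3 elements per line, element 2 of line 1 is therefore the sixth cell of the data array.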

@ -114,6 +114,10 @@ extern int obi_errno;
*/
#define OBI_INDEXER_ERROR (27) /** Error handling a blob indexer
*/
#define OBI_READ_ONLY_INDEXER_ERROR (28) /** Error trying to modify a read-only blob indexer
*/
#define OBI_ALIGN_ERROR (29) /** Error while aligning sequences
*/
/**@}*/
#endif /* OBIERRNO_H_ */

@ -40,6 +40,9 @@ size_t obi_sizeof(OBIType_t type)
case OBI_CHAR: size = sizeof(obichar_t);
break;
case OBI_QUAL: size = sizeof(index_t);
break;
case OBI_STR: size = sizeof(index_t);
break;
@ -96,6 +99,9 @@ char* name_data_type(int data_type)
case OBI_CHAR: name = strdup("OBI_CHAR");
break;
case OBI_QUAL: name = strdup("OBI_QUAL");
break;
case OBI_STR: name = strdup("OBI_STR");
break;

@ -22,8 +22,10 @@
#define OBIIdx_NA (INT64_MIN) /**< NA value for indices */
#define OBIFloat_NA (float_NA()) /**< NA value for the type OBI_FLOAT */
#define OBIChar_NA (0) /**< NA value for the type OBI_CHAR */ // TODO not sure about this one as it can be impossible to distinguish from uninitialized values
#define OBISeq_NA ("\0") /**< NA value for the type OBI_SEQ */ // TODO discuss
#define OBIStr_NA ("\0") /**< NA value for the type OBI_STR */ // TODO discuss
#define OBISeq_NA (NULL) /**< NA value for the type OBI_SEQ */ // TODO discuss
#define OBIStr_NA (NULL) /**< NA value for the type OBI_STR */ // TODO discuss
#define OBIQual_char_NA (NULL) /**< NA value for the type OBI_QUAL if the quality is in character string format */
#define OBIQual_int_NA (NULL) /**< NA value for the type OBI_QUAL if the quality is in integer format */
/**
@ -45,6 +47,7 @@ typedef enum OBIType {
OBI_FLOAT, /**< a floating value (C type : double) */
OBI_BOOL, /**< a boolean true/false value, see obibool_t enum */
OBI_CHAR, /**< a character (C type : char) */
OBI_QUAL, /**< an index in a data structure (C type : int64_t) referring to a quality score array */
OBI_STR, /**< an index in a data structure (C type : int64_t) referring to a character string */
OBI_SEQ, /**< an index in a data structure (C type : int64_t) referring to a DNA sequence */
OBI_IDX /**< an index referring to a line in another column (C type : int64_t) */
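This hunk changes `OBISeq_NA` and `OBIStr_NA` from `"\0"` to `NULL`, and the setters in this diff test for NA with a pointer comparison (`value == OBISeq_NA`). A pointer comparison against a string literal only matches that particular literal object, so an empty string built elsewhere would not be recognised as NA, whereas `NULL` is unambiguous and leaves the empty string available as an ordinary value. A small sketch of the distinction, with hypothetical `toy_` names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_STR_NA (NULL)   /* same definition as the new OBIStr_NA */

/* NA is detected by pointer identity, as in the setters of this diff. */
bool toy_is_na(const char* value)
{
    return value == TOY_STR_NA;
}

/* An empty string is a real value and is detected by content. */
bool toy_is_empty(const char* value)
{
    return value != NULL && value[0] == '\0';
}
```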

File diff suppressed because it is too large

@ -17,6 +17,7 @@
#include <stdlib.h>
#include <errno.h>
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>
#include <time.h>
#include <math.h>
@ -33,8 +34,6 @@
*/
#define VIEW_TYPE_MAX_LENGTH (1024) /**< The maximum length of the type name of a view.
*/
#define OBIVIEW_FILE_NAME "obiviews" /**< The default name of a view file.
*/
#define LINES_COLUMN_NAME "LINES" /**< The name of the column containing the line selections
* in all views.
*/
@ -44,62 +43,17 @@
#define NUC_SEQUENCE_COLUMN "NUC_SEQ" /**< The name of the column containing the nucleotide sequences
* in NUC_SEQS_VIEW views.
*/
#define NUC_SEQUENCE_INDEXER "NUC_SEQ_INDEXER" /**< The name of the indexer containing the nucleotide sequences
* in NUC_SEQS_VIEW views.
*/
#define ID_COLUMN "ID" /**< The name of the column containing the sequence identifiers
* in NUC_SEQS_VIEW views.
*/
#define ID_INDEXER "ID_INDEXER" /**< The name of the indexer containing the sequence identifiers
* in NUC_SEQS_VIEW views.
*/
#define DEFINITION_COLUMN "DEFINITION" /**< The name of the column containing the sequence definitions
* in NUC_SEQS_VIEW views.
*/
#define DEFINITION_INDEXER "DEFINITION_INDEXER" /**< The name of the indexer containing the sequence definitions
#define QUALITY_COLUMN "QUALITY" /**< The name of the column containing the sequence qualities
* in NUC_SEQS_VIEW views.
*/
/**
* @brief Structure for an opened view.
*/
typedef struct Obiview {
OBIDMS_p dms; /**< A pointer on the DMS to which the view belongs.
*/
char name[OBIVIEW_NAME_MAX_LENGTH+1]; /**< Name of the view.
*/
char created_from[OBIVIEW_NAME_MAX_LENGTH+1]; /**< Name of the view from which that view was cloned if the view was cloned.
*/
char view_type[VIEW_TYPE_MAX_LENGTH+1]; /**< Type of the view if there is one.
* Types existing: NUC_SEQS_VIEW.
*/
bool read_only; /**< Whether the view is read-only or can be modified.
*/
OBIDMS_column_p line_selection; /**< A pointer on the column containing the line selection
* associated with the view if there is one.
* This line selection is read-only, and when a line from the view is read,
* it is this line selection that is used.
*/
OBIDMS_column_p new_line_selection; /**< A pointer on the column containing the new line selection being built
* to associate with the view, if there is one.
* When a line is selected with obi_select_line() or obi_select_lines(),
* it is recorded in this line selection.
*/
index_t line_count; /**< The number of lines in the view. Refers to the number of lines in each
* column of the view if line_selection is NULL, or to the line count of
* line_selection if it is not NULL.
*/
int column_count; /**< The number of columns in the view.
*/
OBIDMS_column_p columns[MAX_NB_OPENED_COLUMNS]; /**< Array of pointers on all the columns of the view.
*/
char comments[OBIVIEW_COMMENTS_MAX_LENGTH+1]; /**< Comments, additional information on the view.
*/
} Obiview_t, *Obiview_p;
/**
* @brief Structure referencing a column by its name and its version.
*/
@ -117,8 +71,6 @@ typedef struct Column_reference {
* Once a view has been written in the view file, it cannot be modified and can only be read.
*/
typedef struct Obiview_infos {
int view_number; /**< Number of the view in the view file.
*/
time_t creation_date; /**< Time at which the view was written in the view file.
*/
char name[OBIVIEW_NAME_MAX_LENGTH+1]; /**< Name of the view, used to identify it.
@ -143,7 +95,29 @@ typedef struct Obiview_infos {
} Obiview_infos_t, *Obiview_infos_p;
// TODO : Combine the common elements of the Obiview_infos and Obiview structures in one structure used by both?
/**
* @brief Structure for an opened view.
*/
typedef struct Obiview {
Obiview_infos_p infos; /**< A pointer to the mapped view information.
*/
OBIDMS_p dms; /**< A pointer on the DMS to which the view belongs.
*/
bool read_only; /**< Whether the view is read-only or can be modified.
*/
OBIDMS_column_p line_selection; /**< A pointer on the column containing the line selection
* associated with the view if there is one.
* This line selection is read-only, and when a line from the view is read,
* it is this line selection that is used.
*/
OBIDMS_column_p new_line_selection; /**< A pointer on the column containing the new line selection being built
* to associate with the view, if there is one.
* When a line is selected with obi_select_line() or obi_select_lines(),
* it is recorded in this line selection.
*/
OBIDMS_column_p columns[MAX_NB_OPENED_COLUMNS]; /**< Array of pointers on all the columns of the view.
*/
} Obiview_t, *Obiview_p;
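The structure comments say that when a line of the view is read, the read goes through `line_selection` if there is one. One plausible reading, assumed here and not confirmed by the diff, is that the selection stores, for each line of the view, the line number in the underlying columns. Under that assumption the lookup could be sketched as follows; `toy_resolve_line()` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int64_t index_t;

/* Maps a line number of the view to a line number of the underlying column.
 * ASSUMPTION: line_selection[i] holds the column line shown as view line i.
 * Returns -1 if line_nb is outside the view. */
index_t toy_resolve_line(const index_t* line_selection, index_t line_count, index_t line_nb)
{
    if (line_nb < 0 || line_nb >= line_count)
        return -1;
    if (line_selection == NULL)     /* no selection: the view shows every line */
        return line_nb;
    return line_selection[line_nb];
}
```

The `_in_view` accessors declared further down would then apply such a mapping before delegating to the plain column accessors.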
/**
@ -224,6 +198,7 @@ Obiview_p obi_new_view_cloned_from_name(OBIDMS_p dms, const char* view_name, con
* - NUC_SEQUENCE_COLUMN where nucleotide sequences are stored
* - ID_COLUMN where sequence identifiers are stored
* - DEFINITION_COLUMN where sequence definitions are stored
* - QUALITY_COLUMN where sequence qualities are stored
*
* @param dms A pointer on the OBIDMS.
* @param view_name The unique name of the view.
@ -255,6 +230,7 @@ Obiview_p obi_new_view_nuc_seqs(OBIDMS_p dms, const char* view_name, Obiview_p v
* - NUC_SEQUENCE_COLUMN where nucleotide sequences are stored
* - ID_COLUMN where sequence identifiers are stored
* - DEFINITION_COLUMN where sequence definitions are stored
* - QUALITY_COLUMN where sequence qualities are stored
*
* @param dms A pointer on the OBIDMS.
* @param view_name The unique name of the new view.
@ -272,6 +248,37 @@ Obiview_p obi_new_view_nuc_seqs(OBIDMS_p dms, const char* view_name, Obiview_p v
Obiview_p obi_new_view_nuc_seqs_cloned_from_name(OBIDMS_p dms, const char* view_name, const char* view_to_clone_name, index_t* line_selection, const char* comments);
/**
* @brief Maps a view file and returns the mapped structure stored in it.
*
* @param dms A pointer on the OBIDMS.
* @param view_name The unique name identifying the view.
*
* @returns A pointer on the mapped view infos structure.
* @retval NULL if an error occurred.
*
* @since June 2016
* @author Celine Mercier (celine.mercier@metabarcoding.org)
*/
Obiview_infos_p obi_view_map_file(OBIDMS_p dms, const char* view_name);
/**
* @brief Unmaps a view file.
*
* @param dms A pointer on the OBIDMS.
* @param view_infos A pointer on the mapped view infos structure.
*
* @returns A value indicating the success of the operation.
* @retval 0 if the operation was successfully completed.
* @retval -1 if an error occurred.
*
* @since June 2016
* @author Celine Mercier (celine.mercier@metabarcoding.org)
*/
int obi_view_unmap_file(OBIDMS_p dms, Obiview_infos_p view_infos);
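`obi_view_map_file()` and `obi_view_unmap_file()` are documented as mapping and unmapping a per-view file. Their implementation is not part of the visible diff; the sketch below only illustrates the general technique with POSIX `mmap()`, on a toy structure. `toy_view_infos_t`, `toy_view_map_file()` and `toy_view_unmap_file()` are hypothetical, and the real functions take a DMS pointer and a view name rather than a path.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for ftruncate() under strict C modes */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Toy stand-in for Obiview_infos_t. */
typedef struct { char name[64]; long line_count; } toy_view_infos_t;

/* Creates the file if needed, sizes it to one structure and maps it.
 * Returns NULL on error. */
toy_view_infos_t* toy_view_map_file(const char* path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t) sizeof(toy_view_infos_t)) < 0)
    {
        close(fd);
        return NULL;
    }
    void* mapped = mmap(NULL, sizeof(toy_view_infos_t),
                        PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);   /* the mapping stays valid after the descriptor is closed */
    return (mapped == MAP_FAILED) ? NULL : (toy_view_infos_t*) mapped;
}

/* Returns 0 on success, -1 on error, like the declared obi_view_unmap_file(). */
int toy_view_unmap_file(toy_view_infos_t* infos)
{
    if (infos == NULL)
        return -1;
    return (munmap(infos, sizeof *infos) < 0) ? -1 : 0;
}
```

Because the mapping is `MAP_SHARED`, fields written through the returned pointer end up in the file, which fits the "one file per view" layout described in the commit that closes issue #47.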
/**
* @brief Opens a view identified by its name stored in the view file.
*
@ -812,6 +819,194 @@ int obi_column_set_obiint_with_elt_name_in_view(Obiview_p view, OBIDMS_column_p
obiint_t obi_column_get_obiint_with_elt_name_in_view(Obiview_p view, OBIDMS_column_p column, index_t line_nb, const char* element_name);
/**
* @brief Sets a value in an OBIDMS column containing data in the form of indices referring
* to sequence qualities handled by an indexer, and using the index of the element in the column's line,
* in the context of a view.
*
* This function is for qualities in the character string format.
*
* @warning Pointers returned by obi_open_column() don't allow writing.
*
* @param view A pointer on the opened view.
* @param column A pointer as returned by obi_create_column() or obi_clone_column().
* @param line_nb The number of the line where the value should be set.
* @param element_idx The index of the element that should be set in the line.
* @param value The value that should be set, in the character string format.
*
* @returns An integer value indicating the success of the operation.
* @retval 0 on success.
* @retval -1 if an error occurred.
*
* @since May 2016
* @author Celine Mercier (celine.mercier@metabarcoding.org)
*/
int obi_column_set_obiqual_char_with_elt_idx_in_view(Obiview_p view, OBIDMS_column_p column, index_t line_nb, index_t element_idx, const char* value);
/**
* @brief Sets a value in an OBIDMS column containing data in the form of indices referring
* to sequence qualities handled by an indexer, and using the index of the element in the column's line,
* in the context of a view.
*
* This function is for qualities in the integer format.
*
* @warning Pointers returned by obi_open_column() don't allow writing.
*
* @param view A pointer on the opened view.
* @param column A pointer as returned by obi_create_column() or obi_clone_column().
* @param line_nb The number of the line where the value should be set.
* @param element_idx The index of the element that should be set in the line.
* @param value The value that should be set, in the integer array format.
* @param value_length The length of the integer array.
*
* @returns An integer value indicating the success of the operation.
* @retval 0 on success.
* @retval -1 if an error occurred.
*
* @since May 2016
* @author Celine Mercier (celine.mercier@metabarcoding.org)
*/
int obi_column_set_obiqual_int_with_elt_idx_in_view(Obiview_p view, OBIDMS_column_p column, index_t line_nb, index_t element_idx, const uint8_t* value, int value_length);
/**
* @brief Recovers a value in an OBIDMS column containing data in the form of indices referring
* to sequence qualities handled by an indexer, and using the index of the element in the column's line,
* in the context of a view.
*
* This function returns quality scores in the character string format.
*
* @param view A pointer on the opened view.
* @param column A pointer as returned by obi_create_column().
* @param line_nb The number of the line where the value should be recovered.
* @param element_idx The index of the element that should be recovered in the line.
*
* @returns The recovered value, in the character string format.
* @retval OBIQual_char_NA the NA value of the type if an error occurred and obi_errno is set.
*
* @since May 2016
* @author Celine Mercier (celine.mercier@metabarcoding.org)
*/
char* obi_column_get_obiqual_char_with_elt_idx_in_view(Obiview_p view, OBIDMS_column_p column, index_t line_nb, index_t element_idx);
/**
* @brief Recovers a value in an OBIDMS column containing data in the form of indices referring
* to sequence qualities handled by an indexer, and using the index of the element in the column's line,
* in the context of a view.
*
* This function returns quality scores in the integer format.
*
* @param view A pointer on the opened view.
* @param column A pointer as returned by obi_create_column().
* @param line_nb The number of the line where the value should be recovered.
* @param element_idx The index of the element that should be recovered in the line.
* @param value_length A pointer on an integer to store the length of the integer array recovered.
*
* @returns The recovered value, in the integer array format.
* @retval OBIQual_int_NA the NA value of the type if an error occurred and obi_errno is set.
*
* @since May 2016
* @author Celine Mercier (celine.mercier@metabarcoding.org)
*/
const uint8_t* obi_column_get_obiqual_int_with_elt_idx_in_view(Obiview_p view, OBIDMS_column_p column, index_t line_nb, index_t element_idx, int* value_length);
/**
* @brief Sets a value in an OBIDMS column containing data in the form of indices referring
* to sequence qualities handled by an indexer, and using the name of the element in the column's line,
* in the context of a view.
*
* This function is for quality scores in the character string format.
*
* @warning Pointers returned by obi_open_column() don't allow writing.
*
* @param view A pointer on the opened view.
* @param column A pointer as returned by obi_create_column() or obi_clone_column().
* @param line_nb The number of the line where the value should be set.
* @param element_name The name of the element that should be set in the line.
* @param value The value that should be set, in the character string format.
*
* @returns An integer value indicating the success of the operation.
* @retval 0 on success.
* @retval -1 if an error occurred.
*
* @since May 2016
* @author Celine Mercier (celine.mercier@metabarcoding.org)
*/
int obi_column_set_obiqual_char_with_elt_name_in_view(Obiview_p view, OBIDMS_column_p column, index_t line_nb, const char* element_name, const char* value);
/**
* @brief Sets a value in an OBIDMS column containing data in the form of indices referring
* to sequence qualities handled by an indexer, and using the name of the element in the column's line,
* in the context of a view.
*
* This function is for quality scores in the integer array format.
*
* @warning Pointers returned by obi_open_column() don't allow writing.
*
* @param view A pointer on the opened view.
* @param column A pointer as returned by obi_create_column() or obi_clone_column().
* @param line_nb The number of the line where the value should be set.
* @param element_name The name of the element that should be set in the line.
* @param value The value that should be set, in the integer format.
* @param value_length The length of the integer array.
*
* @returns An integer value indicating the success of the operation.
* @retval 0 on success.
* @retval -1 if an error occurred.
*
* @since May 2016
* @author Celine Mercier (celine.mercier@metabarcoding.org)
*/
int obi_column_set_obiqual_int_with_elt_name_in_view(Obiview_p view, OBIDMS_column_p column, index_t line_nb, const char* element_name, const uint8_t* value, int value_length);
/**
* @brief Recovers a value in an OBIDMS column containing data in the form of indices referring
* to sequence qualities handled by an indexer, and using the name of the element in the column's line,
* in the context of a view.
*
* This function returns quality scores in the character string format.
*
* @param view A pointer on the opened view.
* @param column A pointer as returned by obi_create_column() or obi_clone_column().
* @param line_nb The number of the line where the value should be recovered.
* @param element_name The name of the element that should be recovered in the line.
*
* @returns The recovered value, in the character string format.
* @retval OBIQual_char_NA the NA value of the type if an error occurred and obi_errno is set.
*
* @since May 2016
* @author Celine Mercier (celine.mercier@metabarcoding.org)
*/
char* obi_column_get_obiqual_char_with_elt_name_in_view(Obiview_p view, OBIDMS_column_p column, index_t line_nb, const char* element_name);
/**
* @brief Recovers a value in an OBIDMS column containing data in the form of indices referring
* to sequence qualities handled by an indexer, and using the name of the element in the column's line,
* in the context of a view.
*
* This function returns quality scores in the integer array format.
*
* @param view A pointer on the opened view.
* @param column A pointer as returned by obi_create_column() or obi_clone_column().
* @param line_nb The number of the line where the value should be recovered.
* @param element_name The name of the element that should be recovered in the line.
* @param value_length A pointer on an integer to store the length of the integer array recovered.
*
* @returns The recovered value, in the integer format.
* @retval OBIQual_int_NA the NA value of the type if an error occurred and obi_errno is set.
*
* @since May 2016
* @author Celine Mercier (celine.mercier@metabarcoding.org)
*/
const uint8_t* obi_column_get_obiqual_int_with_elt_name_in_view(Obiview_p view, OBIDMS_column_p column, index_t line_nb, const char* element_name, int* value_length);
/**
* @brief Sets a value in an OBIDMS column containing data with the type OBI_SEQ, using the index of the element in the line,
* in the context of a view.

@ -0,0 +1,696 @@
/*
* sse_banded_LCS_alignment.c
*
* Created on: 7 Nov. 2012
* Author: celine mercier
*/
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <stdint.h>
#include <stdbool.h>
#include "obierrno.h"
#include "obidebug.h"
#include "utils.h"
#include "_sse.h"
#include "sse_banded_LCS_alignment.h"
#define DEBUG_LEVEL 0 // TODO has to be defined somewhere else (Cython compilation flag?)
static void printreg(__m128i r)
{
int16_t a0,a1,a2,a3,a4,a5,a6,a7;
a0= _MM_EXTRACT_EPI16(r,0);
a1= _MM_EXTRACT_EPI16(r,1);
a2= _MM_EXTRACT_EPI16(r,2);
a3= _MM_EXTRACT_EPI16(r,3);
a4= _MM_EXTRACT_EPI16(r,4);
a5= _MM_EXTRACT_EPI16(r,5);
a6= _MM_EXTRACT_EPI16(r,6);
a7= _MM_EXTRACT_EPI16(r,7);
fprintf(stderr, "a00 :-> %7d %7d %7d %7d "
" %7d %7d %7d %7d "
"\n"
, a0,a1,a2,a3,a4,a5,a6,a7
);
}
static inline int extract_reg(__m128i r, int p)
{
switch (p) {
case 0: return(_MM_EXTRACT_EPI16(r,0));
case 1: return(_MM_EXTRACT_EPI16(r,1));
case 2: return(_MM_EXTRACT_EPI16(r,2));
case 3: return(_MM_EXTRACT_EPI16(r,3));
case 4: return(_MM_EXTRACT_EPI16(r,4));
case 5: return(_MM_EXTRACT_EPI16(r,5));
case 6: return(_MM_EXTRACT_EPI16(r,6));
case 7: return(_MM_EXTRACT_EPI16(r,7));
}
return(0);
}
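The alignment routines in this file choose between candidate scores lane by lane without branching: a comparison produces an all-ones or all-zeros mask per 16-bit lane, and the result is assembled as `(a AND mask) OR (b ANDNOT mask)`. The same mask is then reused to carry the matching alignment length along with the score. The scalar sketch below reproduces that idiom on plain arrays so it can be run without SSE; `toy_cmpgt()` and `toy_blend()` are hypothetical stand-ins for the `_MM_CMPGT_EPI16` and `_MM_AND_SI128`/`_MM_ANDNOT_SI128`/`_MM_OR_SI128` combinations used in the file.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8   /* eight 16-bit lanes per 128-bit register */

/* Per-lane "greater than": 0xFFFF where a > b, 0x0000 elsewhere. */
void toy_cmpgt(const int16_t* a, const int16_t* b, uint16_t* mask)
{
    for (int i = 0; i < LANES; i++)
        mask[i] = (a[i] > b[i]) ? 0xFFFFu : 0x0000u;
}

/* Per-lane select: out = mask ? if_set : if_clear,
 * computed as (if_set & mask) | (if_clear & ~mask). */
void toy_blend(const int16_t* if_set, const int16_t* if_clear, const uint16_t* mask, int16_t* out)
{
    for (int i = 0; i < LANES; i++)
    {
        uint16_t kept  = (uint16_t) ((uint16_t) if_set[i] & mask[i]);
        uint16_t other = (uint16_t) ((uint16_t) if_clear[i] & (uint16_t) ~mask[i]);
        out[i] = (int16_t) (kept | other);
    }
}
```

Blending `gap` over `diag` with the mask `gap > diag` yields the per-lane maximum, which is why the LCS-only routine can use `_MM_MAX_EPI16` directly, while the routine that also tracks the alignment length needs the explicit mask.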
// TODO warning on length order
void sse_banded_align_lcs_and_ali_len(int16_t* seq1, int16_t* seq2, int l1, int l2, int bandLengthLeft, int bandLengthTotal, int16_t* address, double* lcs_length, int* ali_length)
{
register int j;
int k1, k2;
int max, diff;
int l_reg, l_loc;
int line;
int numberOfRegistersPerLine;
int numberOfRegistersFor3Lines;
bool even_line;
bool odd_line;
bool even_BLL;
bool odd_BLL;
um128* SSEregisters;
um128* p_diag;
um128* p_gap1;
um128* p_gap2;
um128* p_diag_j;
um128* p_gap1_j;
um128* p_gap2_j;
um128 current;
um128* l_ali_SSEregisters;
um128* p_l_ali_diag;
um128* p_l_ali_gap1;
um128* p_l_ali_gap2;
um128* p_l_ali_diag_j;
um128* p_l_ali_gap1_j;
um128* p_l_ali_gap2_j;
um128 l_ali_current;
um128 nucs1;
um128 nucs2;
um128 scores;
um128 boolean_reg;
// Initialisations
odd_BLL = bandLengthLeft & 1;
even_BLL = !odd_BLL;
max = INT16_MAX - l1;
numberOfRegistersPerLine = bandLengthTotal / 8;
numberOfRegistersFor3Lines = 3 * numberOfRegistersPerLine;
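SSEregisters = (um128*) calloc(numberOfRegistersFor3Lines * 2, sizeof(um128));
if (SSEregisters == NULL)
{
obi_set_errno(OBI_MALLOC_ERROR);
obidebug(1, "\nError allocating memory for SSE registers for LCS alignment");
return;
}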
l_ali_SSEregisters = SSEregisters + numberOfRegistersFor3Lines;
// prepare the SSE registers
for (j=0; j<numberOfRegistersFor3Lines; j++)
l_ali_SSEregisters[j].i = _MM_LOAD_SI128(address+j*8);
p_diag = SSEregisters;
p_gap1 = SSEregisters+numberOfRegistersPerLine;
p_gap2 = SSEregisters+2*numberOfRegistersPerLine;
p_l_ali_diag = l_ali_SSEregisters;
p_l_ali_gap1 = l_ali_SSEregisters+numberOfRegistersPerLine;
p_l_ali_gap2 = l_ali_SSEregisters+2*numberOfRegistersPerLine;
// Loop on diagonals = 'lines' :
for (line=2; line <= l1+l2; line++)
{
odd_line = line & 1;
even_line = !odd_line;
// loop on the registers of a line :
for (j=0; j < numberOfRegistersPerLine; j++)
{
p_diag_j = p_diag+j;
p_gap1_j = p_gap1+j;
p_gap2_j = p_gap2+j;
p_l_ali_diag_j = p_l_ali_diag+j;
p_l_ali_gap1_j = p_l_ali_gap1+j;
p_l_ali_gap2_j = p_l_ali_gap2+j;
// comparing nucleotides for diagonal scores :
// k1 = position of the 1st nucleotide to align for seq1 and k2 = position of the 1st nucleotide to align for seq2
if (odd_line && odd_BLL)
k1 = (line / 2) + ((bandLengthLeft+1) / 2) - j*8;
else
k1 = (line / 2) + (bandLengthLeft/2) - j*8;
k2 = line - k1 - 1;
nucs1.i = _MM_LOADU_SI128(seq1+l1-k1);
nucs2.i = _MM_LOADU_SI128(seq2+k2);
/* fprintf(stderr, "\nnucs, r %d, k1 = %d, k2 = %d\n", j, k1, k2);
printreg(nucs1.i);
printreg(nucs2.i);
*/
// computing diagonal score :
scores.i = _MM_AND_SI128(_MM_CMPEQ_EPI16(nucs1.i, nucs2.i), _MM_SET1_EPI16(1));
current.i = _MM_ADDS_EPU16(p_diag_j->i, scores.i);
// Computing alignment length
l_ali_current.i = p_l_ali_diag_j->i;
boolean_reg.i = _MM_CMPGT_EPI16(p_gap1_j->i, current.i);
l_ali_current.i = _MM_OR_SI128(
_MM_AND_SI128(p_l_ali_gap1_j->i, boolean_reg.i),
_MM_ANDNOT_SI128(boolean_reg.i, l_ali_current.i));
current.i = _MM_OR_SI128(
_MM_AND_SI128(p_gap1_j->i, boolean_reg.i),
_MM_ANDNOT_SI128(boolean_reg.i, current.i));
boolean_reg.i = _MM_AND_SI128(
_MM_CMPEQ_EPI16(p_gap1_j->i, current.i),
_MM_CMPLT_EPI16(p_l_ali_gap1_j->i, l_ali_current.i));
l_ali_current.i = _MM_OR_SI128(
_MM_AND_SI128(p_l_ali_gap1_j->i, boolean_reg.i),
_MM_ANDNOT_SI128(boolean_reg.i, l_ali_current.i));
current.i = _MM_OR_SI128(
_MM_AND_SI128(p_gap1_j->i, boolean_reg.i),
_MM_ANDNOT_SI128(boolean_reg.i, current.i));
boolean_reg.i = _MM_CMPGT_EPI16(p_gap2_j->i, current.i);
l_ali_current.i = _MM_OR_SI128(
_MM_AND_SI128(p_l_ali_gap2_j->i, boolean_reg.i),
_MM_ANDNOT_SI128(boolean_reg.i, l_ali_current.i));
current.i = _MM_OR_SI128(
_MM_AND_SI128(p_gap2_j->i, boolean_reg.i),
_MM_ANDNOT_SI128(boolean_reg.i, current.i));
boolean_reg.i = _MM_AND_SI128(
_MM_CMPEQ_EPI16(p_gap2_j->i, current.i),
_MM_CMPLT_EPI16(p_l_ali_gap2_j->i, l_ali_current.i));
l_ali_current.i = _MM_OR_SI128(
_MM_AND_SI128(p_l_ali_gap2_j->i, boolean_reg.i),
_MM_ANDNOT_SI128(boolean_reg.i, l_ali_current.i));
current.i = _MM_OR_SI128(
_MM_AND_SI128(p_gap2_j->i, boolean_reg.i),
_MM_ANDNOT_SI128(boolean_reg.i, current.i));
/*
fprintf(stderr, "\nline = %d", line);
fprintf(stderr, "\nDiag, r %d : ", j);
printreg((*(p_diag_j)).i);
fprintf(stderr, "Gap1 : ");
printreg((*(p_gap1_j)).i);
fprintf(stderr, "Gap2 : ");
printreg((*(p_gap2_j)).i);
fprintf(stderr, "current : ");
printreg(current.i);
fprintf(stderr, "L ALI\nDiag r %d : ", j);
printreg((*(p_l_ali_diag_j)).i);
fprintf(stderr, "Gap1 : ");
printreg((*(p_l_ali_gap1_j)).i);
fprintf(stderr, "Gap2 : ");
printreg((*(p_l_ali_gap2_j)).i);
fprintf(stderr, "current : ");
printreg(l_ali_current.i);
*/
// diag = gap1 and gap1 = current
p_diag_j->i = p_gap1_j->i;
p_gap1_j->i = current.i;
// l_ali_diag = l_ali_gap1 and l_ali_gap1 = l_ali_current+1
p_l_ali_diag_j->i = p_l_ali_gap1_j->i;
p_l_ali_gap1_j->i = _MM_ADD_EPI16(l_ali_current.i, _MM_SET1_EPI16(1));
}
// shifts for gap2, to do only once all the registers of a line have been computed. Copy gap2, then load it from the copy?
for (j=0; j < numberOfRegistersPerLine; j++)
{
if ((odd_line && even_BLL) || (even_line && odd_BLL))
{
p_gap2[j].i = _MM_LOADU_SI128((p_gap1[j].s16)-1);
p_l_ali_gap2[j].i = _MM_LOADU_SI128((p_l_ali_gap1[j].s16)-1);
if (j == 0)
{
p_gap2[j].i = _MM_INSERT_EPI16(p_gap2[j].i, 0, 0);
p_l_ali_gap2[j].i = _MM_INSERT_EPI16(p_l_ali_gap2[j].i, max, 0);
}
}
else
{
p_gap2[j].i = _MM_LOADU_SI128(p_gap1[j].s16+1);
p_l_ali_gap2[j].i = _MM_LOADU_SI128(p_l_ali_gap1[j].s16+1);
if (j == numberOfRegistersPerLine - 1)
{
p_gap2[j].i = _MM_INSERT_EPI16(p_gap2[j].i, 0, 7);
p_l_ali_gap2[j].i = _MM_INSERT_EPI16(p_l_ali_gap2[j].i, max, 7);
}
}
}
// end shifts for gap2
}
/* /// Recovering LCS and alignment lengths \\\ */
// finding the location of the results in the registers :
diff = l1-l2;
if ((diff & 1) && odd_BLL)
l_loc = (int) floor((double)(bandLengthLeft) / (double)2) - floor((double)(diff) / (double)2);
else
l_loc = (int) floor((double)(bandLengthLeft) / (double)2) - ceil((double)(diff) / (double)2);
l_reg = (int)floor((double)l_loc/(double)8.0);
//fprintf(stderr, "\nl_reg = %d, l_loc = %d\n", l_reg, l_loc);
l_loc = l_loc - l_reg*8;
// extracting the results from the registers :
*lcs_length = extract_reg(p_gap1[l_reg].i, l_loc);
*ali_length = extract_reg(p_l_ali_gap1[l_reg].i, l_loc) - 1;
// freeing the registers
free(SSEregisters);
}
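For readers who want to check the vectorised routine against something simpler, here is a scalar, unbanded sketch of the same recurrence: the score is the LCS length, and among paths with the same score the shorter alignment is preferred, with the length incremented by one for every cell. It is not the author's reference implementation. The boundary values are an assumption of the sketch (plain global alignment, where leading gaps count towards the length); in the real routine they come from the `address` buffer, whose contents are not visible in this diff. `toy_lcs_and_ali_len()` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Full-matrix LCS with alignment length, same tie-breaking as the SSE code:
 * take a gap cell if its score is higher, or equal with a shorter alignment.
 * Returns 0 on success, -1 on allocation failure. */
int toy_lcs_and_ali_len(const char* a, const char* b, int* lcs_length, int* ali_length)
{
    assert(a != NULL && b != NULL && lcs_length != NULL && ali_length != NULL);
    size_t n = strlen(a), m = strlen(b), w = m + 1;
    int* lcs = malloc((n + 1) * w * sizeof *lcs);
    int* ali = malloc((n + 1) * w * sizeof *ali);
    if (lcs == NULL || ali == NULL)
    {
        free(lcs);
        free(ali);
        return -1;
    }

    /* ASSUMPTION: global-alignment boundaries (i or j gap columns). */
    for (size_t i = 0; i <= n; i++) { lcs[i * w] = 0; ali[i * w] = (int) i; }
    for (size_t j = 0; j <= m; j++) { lcs[j] = 0;     ali[j] = (int) j; }

    for (size_t i = 1; i <= n; i++)
    {
        for (size_t j = 1; j <= m; j++)
        {
            size_t diag = (i - 1) * w + (j - 1);
            size_t up   = (i - 1) * w + j;
            size_t left = i * w + (j - 1);

            int best = lcs[diag] + (a[i - 1] == b[j - 1]);   /* diagonal score */
            int len  = ali[diag];
            if (lcs[up] > best || (lcs[up] == best && ali[up] < len))
            {
                best = lcs[up];
                len  = ali[up];
            }
            if (lcs[left] > best || (lcs[left] == best && ali[left] < len))
            {
                best = lcs[left];
                len  = ali[left];
            }
            lcs[i * w + j] = best;
            ali[i * w + j] = len + 1;   /* one more alignment column */
        }
    }

    *lcs_length = lcs[n * w + m];
    *ali_length = ali[n * w + m];
    free(lcs);
    free(ali);
    return 0;
}
```

The banded SSE version visits the same cells along anti-diagonals and only within the band, so results should agree with this sketch only when the band is wide enough and the boundary conventions match.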
// TODO warning on length order
double sse_banded_align_just_lcs(int16_t* seq1, int16_t* seq2, int l1, int l2, int bandLengthLeft, int bandLengthTotal)
{
register int j;
int k1, k2;
int diff;
int l_reg, l_loc;
int16_t l_lcs;
int line;
int numberOfRegistersPerLine;
int numberOfRegistersFor3Lines;
bool even_line;
bool odd_line;
bool even_BLL;
bool odd_BLL;
um128* SSEregisters;
um128* p_diag;
um128* p_gap1;
um128* p_gap2;
um128* p_diag_j;
um128* p_gap1_j;
um128* p_gap2_j;
um128 current;
um128 nucs1;
um128 nucs2;
um128 scores;
// Initialisations
odd_BLL = bandLengthLeft & 1;
even_BLL = !odd_BLL;
numberOfRegistersPerLine = bandLengthTotal / 8;
numberOfRegistersFor3Lines = 3 * numberOfRegistersPerLine;
SSEregisters = malloc(numberOfRegistersFor3Lines * sizeof(um128));
if (SSEregisters == NULL)
{
obi_set_errno(OBI_MALLOC_ERROR);
obidebug(1, "\nError allocating memory for SSE registers for LCS alignment");
return 0; // TODO DOUBLE_MIN?
}
// prepare the SSE registers
for (j=0; j<numberOfRegistersFor3Lines; j++)
(*(SSEregisters+j)).i = _MM_SETZERO_SI128();
p_diag = SSEregisters;
p_gap1 = SSEregisters+numberOfRegistersPerLine;
p_gap2 = SSEregisters+2*numberOfRegistersPerLine;
// Loop on diagonals = 'lines' :
for (line=2; line <= l1+l2; line++)
{
odd_line = line & 1;
even_line = !odd_line;
// loop on the registers of a line :
for (j=0; j < numberOfRegistersPerLine; j++)
{
p_diag_j = p_diag+j;
p_gap1_j = p_gap1+j;
p_gap2_j = p_gap2+j;
// comparing nucleotides for diagonal scores :
// k1 = position of the 1st nucleotide to align for seq1 and k2 = position of the 1st nucleotide to align for seq2
if (odd_line && odd_BLL)
k1 = (line / 2) + ((bandLengthLeft+1) / 2) - j*8;
else
k1 = (line / 2) + (bandLengthLeft/2) - j*8;
k2 = line - k1 - 1;
nucs1.i = _MM_LOADU_SI128(seq1+l1-k1);
nucs2.i = _MM_LOADU_SI128(seq2+k2);
// computing diagonal score :
scores.i = _MM_AND_SI128(_MM_CMPEQ_EPI16(nucs1.i, nucs2.i), _MM_SET1_EPI16(1));
current.i = _MM_ADDS_EPU16((*(p_diag_j)).i, scores.i);
// current = max(gap1, current)
current.i = _MM_MAX_EPI16((*(p_gap1_j)).i, current.i);
// current = max(gap2, current)
current.i = _MM_MAX_EPI16((*(p_gap2_j)).i, current.i);
// diag = gap1 and gap1 = current
(*(p_diag_j)).i = (*(p_gap1_j)).i;
(*(p_gap1_j)).i = current.i;
}
// shifts for gap2, to do only once all the registers of a line have been computed
for (j=0; j < numberOfRegistersPerLine; j++)
{
if ((odd_line && even_BLL) || (even_line && odd_BLL))
{
(*(p_gap2+j)).i = _MM_LOADU_SI128(((*(p_gap1+j)).s16)-1);
if (j == 0)
{
(*(p_gap2+j)).i = _MM_INSERT_EPI16((*(p_gap2+j)).i, 0, 0);
}
}
else
{
(*(p_gap2+j)).i = _MM_LOADU_SI128(((*(p_gap1+j)).s16)+1);
if (j == numberOfRegistersPerLine - 1)
{
(*(p_gap2+j)).i = _MM_INSERT_EPI16((*(p_gap2+j)).i, 0, 7);
}
}
}
// end shifts for gap2
}
/* /// Recovering LCS length \\\ */
// finding the location of the results in the registers :
diff = l1-l2;
if ((diff & 1) && odd_BLL)
l_loc = (int) floor((double)(bandLengthLeft) / (double)2) - floor((double)(diff) / (double)2);
else
l_loc = (int) floor((double)(bandLengthLeft) / (double)2) - ceil((double)(diff) / (double)2);
l_reg = (int)floor((double)l_loc/(double)8.0);
//fprintf(stderr, "\nl_reg = %d, l_loc = %d\n", l_reg, l_loc);
l_loc = l_loc - l_reg*8;
// extracting LCS from the registers :
l_lcs = extract_reg((*(p_gap1+l_reg)).i, l_loc);
// freeing the registers
free(SSEregisters);
return((double) l_lcs);
}
int calculateLeftBandLength(int lmax, int LCSmin)
{
return (lmax - LCSmin);
}
int calculateRightBandLength(int lmin, int LCSmin)
{
return (lmin - LCSmin);
}
int calculateSSEBandLength(int bandLengthRight, int bandLengthLeft)
{
int bandLengthTotal= (double)(bandLengthRight + bandLengthLeft) / 2.0 + 1.0;
return (bandLengthTotal & (~ (int)7)) + (( bandLengthTotal & (int)7) ? 8:0); // Round up to the next multiple of 8 (unchanged if already a multiple)
}
// TODO that's gonna be fun to doc
int calculateSizeToAllocate(int maxLen, int minLen, int LCSmin)
{
int size;
size = calculateLeftBandLength(maxLen, LCSmin);
size *= 2;
size = (size & (~ (int)7)) + (( size & (int)7) ? 8:0); // Closest greater 8 multiple
size *= 3;
size += 16;
return(size*sizeof(int16_t));
}
void iniSeq(int16_t* seq, int size, int16_t iniValue)
{
int16_t* target = seq;
int16_t* end = target + (size_t)size;
for (; target < end; target++)
*target = iniValue;
}
void putSeqInSeq(int16_t* seq, char* s, int l, bool reverse)
{
int16_t *target=seq;
int16_t *end = target + (size_t)l;
char *source=s;
if (reverse)
for (source=s + (size_t)l-1; target < end; target++, source--)
*target=*source;
else
for (; target < end; source++,target++)
*target=*source;
}
void initializeAddressWithGaps(int16_t* address, int bandLengthTotal, int bandLengthLeft, int l1)
{
int i;
int address_00, x_address_10, address_01, address_01_shifted;
int numberOfRegistersPerLine;
int bm;
int value=INT16_MAX-l1;
numberOfRegistersPerLine = bandLengthTotal / 8;
bm = bandLengthLeft%2;
for (i=0; i < (3*numberOfRegistersPerLine*8); i++)
address[i] = value;
// 0,0 set to 1 and 0,1 and 1,0 set to 2
address_00 = bandLengthLeft / 2;
x_address_10 = address_00 + bm - 1;
address_01 = numberOfRegistersPerLine*8 + x_address_10;
address_01_shifted = numberOfRegistersPerLine*16 + address_00 - bm;
// fill address_00, address_01,+1, address_01_shifted,+1
address[address_00] = 1;
address[address_01] = 2;
address[address_01+1] = 2;
address[address_01_shifted] = 2;
address[address_01_shifted+1] = 2;
}
// TODO warning on length order
double sse_banded_lcs_align(int16_t* seq1, int16_t* seq2, int l1, int l2, bool normalize, int reference, bool similarity_mode, int16_t* address, int LCSmin)
{
double id;
int bandLengthRight, bandLengthLeft, bandLengthTotal;
int ali_length;
bandLengthLeft = calculateLeftBandLength(l1, LCSmin);
bandLengthRight = calculateRightBandLength(l2, LCSmin);
// fprintf(stderr, "\nBLL = %d, BLR = %d, LCSmin = %d\n", bandLengthLeft, bandLengthRight, LCSmin);
bandLengthTotal = calculateSSEBandLength(bandLengthRight, bandLengthLeft);
// fprintf(stderr, "\nBLT = %d\n", bandLengthTotal);
if ((reference == ALILEN) && (normalize || !similarity_mode))
{
initializeAddressWithGaps(address, bandLengthTotal, bandLengthLeft, l1);
sse_banded_align_lcs_and_ali_len(seq1, seq2, l1, l2, bandLengthLeft, bandLengthTotal, address, &id, &ali_length);
}
else
id = sse_banded_align_just_lcs(seq1, seq2, l1, l2, bandLengthLeft, bandLengthTotal);
// fprintf(stderr, "\nid before normalizations = %f", id);
// fprintf(stderr, "\nlcs = %f, ali = %d\n", id, ali_length);
if (!similarity_mode && !normalize)
switch(reference) {
case ALILEN: id = ali_length - id;
break;
case MAXLEN: id = l1 - id;
break;
case MINLEN: id = l2 - id;
}
// fprintf(stderr, "\n2>>> %f, %d\n", id, ali_length);
if (normalize)
switch(reference) {
case ALILEN: id = id / (double) ali_length;
break;
case MAXLEN: id = id / (double) l1;
break;
case MINLEN: id = id / (double) l2;
}
// fprintf(stderr, "\nid = %f\n", id);
return(id);
}
// PUBLIC FUNCTIONS
int calculateLCSmin(int l1, int l2, double threshold, bool normalize, int reference, bool similarity_mode)
{
int LCSmin;
if (threshold > 0)
{
if (normalize)
{
if (reference == MINLEN)
LCSmin = threshold*l2;
else // ref = maxlen or alilen
LCSmin = threshold*l1;
}
else if (similarity_mode)
LCSmin = threshold;
else if (reference == MINLEN) // not similarity_mode
LCSmin = l2 - threshold;
else // not similarity_mode and ref = maxlen or alilen
LCSmin = l1 - threshold;
}
else
LCSmin = 0;
return(LCSmin);
}
double generic_sse_banded_lcs_align(char* seq1, char* seq2, double threshold, bool normalize, int reference, bool similarity_mode)
{
double id;
int l1, l2;
int lmax, lmin;
int sizeToAllocateForBand, sizeToAllocateForSeqs;
int maxBLL;
int LCSmin;
int shift;
int16_t* address;
int16_t* iseq1;
int16_t* iseq2;
address = NULL;
l1 = strlen(seq1);
l2 = strlen(seq2);
if (l1 > l2)
{
lmax = l1;
lmin = l2;
}
else
{
lmax = l2;
lmin = l1;
}
// If the score is expressed as a normalized distance, get the corresponding identity
if (!similarity_mode && normalize)
threshold = 1.0 - threshold;
// Calculate the minimum LCS length corresponding to the threshold
LCSmin = calculateLCSmin(lmax, lmin, threshold, normalize, reference, similarity_mode);
// Allocate space for matrix band if the alignment length must be computed
if ((reference == ALILEN) && (normalize || !similarity_mode)) // cases in which alignment length must be computed
{
sizeToAllocateForBand = calculateSizeToAllocate(lmax, lmin, LCSmin);
address = obi_get_memory_aligned_on_16(sizeToAllocateForBand, &shift);
if (address == NULL)
{
obi_set_errno(OBI_MALLOC_ERROR);
obidebug(1, "\nError getting a memory address aligned on 16 bytes boundary");
return 0; // TODO DOUBLE_MIN
}
}
// Allocate space for the int16_t arrays representing the sequences
maxBLL = calculateLeftBandLength(lmax, LCSmin);
sizeToAllocateForSeqs = 2*maxBLL+lmax;
iseq1 = (int16_t*) malloc(sizeToAllocateForSeqs*sizeof(int16_t));
iseq2 = (int16_t*) malloc(sizeToAllocateForSeqs*sizeof(int16_t));
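if ((iseq1 == NULL) || (iseq2 == NULL))
{
obi_set_errno(OBI_MALLOC_ERROR);
obidebug(1, "\nError allocating memory for integer arrays to use in LCS alignment");
// Release whatever was allocated before returning (free(NULL) is a no-op)
free(iseq1);
free(iseq2);
if (address != NULL)
free(((char*) address) - shift); // shift is counted in bytes
return 0; // TODO DOUBLE_MIN
}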
// Initialize the int arrays
iniSeq(iseq1, (2*maxBLL)+lmax, 0);
iniSeq(iseq2, (2*maxBLL)+lmax, 255);
// Shift addresses to where the sequences have to be put
iseq1 = iseq1+maxBLL;
iseq2 = iseq2+maxBLL;
// Put the DNA sequences in the int arrays. Longest sequence must be first argument of sse_align function
if (l2 > l1)
{
putSeqInSeq(iseq1, seq2, l2, TRUE);
putSeqInSeq(iseq2, seq1, l1, FALSE);
// Compute alignment
id = sse_banded_lcs_align(iseq1, iseq2, l2, l1, normalize, reference, similarity_mode, address, LCSmin);
}
else
{
putSeqInSeq(iseq1, seq1, l1, TRUE);
putSeqInSeq(iseq2, seq2, l2, FALSE);
// Compute alignment
id = sse_banded_lcs_align(iseq1, iseq2, l1, l2, normalize, reference, similarity_mode, address, LCSmin);
}
// Free allocated elements
if (address != NULL)
free(((char*) address) - shift); // shift is counted in bytes, address is an int16_t*
free(iseq1-maxBLL);
free(iseq2-maxBLL);
return(id);
}
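
The banded SSE routines above are hard to check by inspection. A minimal sketch of a scalar reference that could be used to cross-check their LCS lengths on small inputs follows. `scalar_lcs_length` is a hypothetical helper, not part of this changeset; it uses the classic full dynamic programming table reduced to two rolling rows, with no band and no threshold.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference helper (not in the original sources):
 * plain O(l1*l2) LCS length with two rolling rows.
 * Returns -1 if memory cannot be allocated. */
int scalar_lcs_length(const char* s1, const char* s2)
{
    assert(s1 != NULL && s2 != NULL);

    size_t l1 = strlen(s1);
    size_t l2 = strlen(s2);
    int* prev = calloc(l2 + 1, sizeof(int));   /* row i-1, zero-initialised */
    int* curr = calloc(l2 + 1, sizeof(int));   /* row i */
    int result;

    if (prev == NULL || curr == NULL)
    {
        free(prev);
        free(curr);
        return -1;
    }

    for (size_t i = 1; i <= l1; i++)
    {
        for (size_t j = 1; j <= l2; j++)
        {
            if (s1[i-1] == s2[j-1])
                curr[j] = prev[j-1] + 1;                      /* match: extend the diagonal */
            else
                curr[j] = (prev[j] > curr[j-1]) ? prev[j] : curr[j-1];  /* best of the two gaps */
        }
        /* the row just computed becomes the previous row */
        int* tmp = prev;
        prev = curr;
        curr = tmp;
    }

    result = prev[l2];
    free(prev);
    free(curr);
    return result;
}
```

With a threshold of 0 and no normalization in similarity mode, `generic_sse_banded_lcs_align` is expected to return the same value as this helper, which makes it a convenient oracle for unit tests.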


@ -0,0 +1,23 @@
/*
* sse_banded_LCS_alignment.h
*
* Created on: november 29, 2012
* Author: mercier
*/
#ifndef SSE_BANDED_LCS_ALIGNMENT_H_
#define SSE_BANDED_LCS_ALIGNMENT_H_
#include <stdint.h>
#include <stdbool.h>
#define ALILEN (0) // TODO enum
#define MAXLEN (1)
#define MINLEN (2)
// TODO doc
int calculateLCSmin(int l1, int l2, double threshold, bool normalize, int reference, bool similarity_mode);
double generic_sse_banded_lcs_align(char* seq1, char* seq2, double threshold, bool normalize, int reference, bool similarity_mode);
#endif
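
As a worked example of how the threshold turns into a band width, here is a small sketch that repeats the arithmetic of `calculateLeftBandLength`, `calculateRightBandLength` and `calculateSSEBandLength` in one place. The names `round_up_to_8` and `band_length_total` are hypothetical and only serve the illustration; the formulas are copied from the functions in sse_banded_LCS_alignment.c.

```c
#include <assert.h>

/* Same rounding expression as in calculateSSEBandLength():
 * next multiple of 8, unchanged if n is already a multiple of 8. */
int round_up_to_8(int n)
{
    assert(n >= 0);
    return (n & ~7) + ((n & 7) ? 8 : 0);
}

/* Hypothetical wrapper: total band length, in int16 cells, for sequences
 * of lengths lmax >= lmin and a minimum LCS length LCSmin. */
int band_length_total(int lmax, int lmin, int LCSmin)
{
    int bandLengthLeft  = lmax - LCSmin;   /* calculateLeftBandLength  */
    int bandLengthRight = lmin - LCSmin;   /* calculateRightBandLength */
    int total = (int) ((double)(bandLengthRight + bandLengthLeft) / 2.0 + 1.0);

    return round_up_to_8(total);           /* one SSE register holds 8 int16 values */
}
```

For a normalized similarity threshold of 0.75 with `MAXLEN` as reference and lengths 100 and 96, `calculateLCSmin` gives 0.75 * 100 = 75, so the left and right band lengths are 25 and 21 and the total band is 24 cells, which is 3 registers per line.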

70
src/uint8_indexer.c Normal file

@ -0,0 +1,70 @@
/****************************************************************************
* Uint8 indexing functions *
****************************************************************************/
/**
* @file uint8_indexer.c
* @author Celine Mercier
* @date May 4th 2016
* @brief Functions handling the indexing and retrieval of uint8 arrays.
*/
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <math.h>
#include "uint8_indexer.h"
#include "obiblob.h"
#include "obiblob_indexer.h"
#include "obidebug.h"
#include "obitypes.h"
#define DEBUG_LEVEL 0 // TODO has to be defined somewhere else (cython compil flag?)
Obi_blob_p obi_uint8_to_blob(const uint8_t* value, int value_length)
{
return obi_blob((byte_t*)value, ELEMENT_SIZE_UINT8, value_length, value_length);
}
const uint8_t* obi_blob_to_uint8(Obi_blob_p value_b)
{
return ((uint8_t*) (value_b->value));
}
index_t obi_index_uint8(Obi_indexer_p indexer, const uint8_t* value, int value_length)
{
Obi_blob_p value_b;
index_t idx;
// Encode value
value_b = obi_uint8_to_blob(value, value_length);
if (value_b == NULL)
return -1;
// Add in the indexer
idx = obi_indexer_add(indexer, value_b);
free(value_b);
return idx;
}
const uint8_t* obi_retrieve_uint8(Obi_indexer_p indexer, index_t idx, int* value_length)
{
Obi_blob_p value_b;
// Get encoded value
value_b = obi_indexer_get(indexer, idx);
// Guard against a failed lookup before dereferencing the blob
if (value_b == NULL)
return NULL;
// Return decoded array
*value_length = value_b->length_decoded_value;
return obi_blob_to_uint8(value_b);
}

92
src/uint8_indexer.h Normal file

@ -0,0 +1,92 @@
/****************************************************************************
* uint8 indexer header file *
****************************************************************************/
/**
* @file uint8_indexer.h
* @author Celine Mercier
* @date May 4th 2016
* @brief Header file for the functions handling the indexing of uint8 arrays.
*/
#ifndef UINT8_INDEXER_H_
#define UINT8_INDEXER_H_
#include <stdlib.h>
#include <stdio.h>
#include "obidms.h"
#include "obitypes.h"
#include "obiblob.h"
#include "obiblob_indexer.h"
/**
* @brief Converts a uint8 array to a blob.
*
* @warning The blob must be freed by the caller.
*
* @param value The uint8 array to convert.
* @param value_length The length of the uint8 array to convert.
*
* @returns A pointer to the blob created.
* @retval NULL if an error occurred.
*
* @since May 2016
* @author Celine Mercier (celine.mercier@metabarcoding.org)
*/
Obi_blob_p obi_uint8_to_blob(const uint8_t* value, int value_length);
/**
* @brief Converts a blob to a uint8 array.
*
* @warning The array returned is mapped.
*
* @param value_b The blob to convert.
*
* @returns A pointer to the uint8 array contained in the blob.
* @retval NULL if an error occurred.
*
* @since May 2016
* @author Celine Mercier (celine.mercier@metabarcoding.org)
*/
const uint8_t* obi_blob_to_uint8(Obi_blob_p value_b);
/**
* @brief Stores a uint8 array in an indexer and returns the index.
*
* @param indexer The indexer structure.
* @param value The uint8 array to index.
* @param value_length The length of the uint8 array to index.
*
* @returns The index referring to the stored uint8 array in the indexer.
*
* @since May 2016
* @author Celine Mercier (celine.mercier@metabarcoding.org)
*/
index_t obi_index_uint8(Obi_indexer_p indexer, const uint8_t* value, int value_length);
/**
* @brief Retrieves a uint8 array from an indexer.
*
* @warning The array returned is mapped.
*
* @param indexer The indexer structure.
* @param idx The index referring to the uint8 array to retrieve in the indexer.
* @param value_length A pointer to an integer in which the length of the retrieved array is stored.
*
* @returns A pointer to the uint8 array.
*
* @since May 2016
* @author Celine Mercier (celine.mercier@metabarcoding.org)
*/
const uint8_t* obi_retrieve_uint8(Obi_indexer_p indexer, index_t idx, int* value_length);
#endif /* UINT8_INDEXER_H_ */


@ -65,6 +65,12 @@ char* obi_format_date(time_t date)
struct tm* tmp;
formatted_time = (char*) malloc(FORMATTED_TIME_LENGTH*sizeof(char));
if (formatted_time == NULL)
{
obi_set_errno(OBI_MALLOC_ERROR);
obidebug(1, "\nError allocating memory to format a date");
return NULL;
}
tmp = localtime(&date);
if (tmp == NULL)
@ -84,3 +90,27 @@ char* obi_format_date(time_t date)
return formatted_time;
}
void* obi_get_memory_aligned_on_16(int size, int* shift)
{
void* memory;
*shift = 0;
memory = (void*) malloc(size);
if (memory == NULL)
{
obi_set_errno(OBI_MALLOC_ERROR);
obidebug(1, "\nError allocating memory");
return NULL;
}
while ((((long long unsigned int) (memory))%16) != 0)
{
memory = ((char*) memory) + 1; // arithmetic on void* is a GNU extension, not standard C
(*shift)++;
}
return (memory);
}
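
Because `obi_get_memory_aligned_on_16` shifts the pointer inside a block of exactly `size` bytes, the usable chunk can end up smaller than requested. A possible variant, sketched here under the assumption that over-allocating 15 bytes is acceptable, keeps the same pointer-plus-shift contract while guaranteeing the full size. `alloc_aligned_on_16` and `free_aligned_on_16` are hypothetical names, not functions from this repository.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical variant: returns a 16-byte aligned pointer to at least
 * `size` usable bytes, and stores in *shift the byte offset from the
 * address returned by malloc(). Returns NULL on allocation failure. */
void* alloc_aligned_on_16(size_t size, int* shift)
{
    assert(shift != NULL);

    /* 15 spare bytes are enough to reach the next 16-byte boundary */
    unsigned char* raw = malloc(size + 15);
    if (raw == NULL)
        return NULL;

    *shift = (int) ((16 - ((uintptr_t) raw % 16)) % 16);
    return raw + *shift;
}

/* Releases a block obtained from alloc_aligned_on_16(). */
void free_aligned_on_16(void* aligned, int shift)
{
    if (aligned != NULL)
        free((unsigned char*) aligned - shift);   /* undo the shift, in bytes */
}
```

The byte-wise cast in `free_aligned_on_16` matters: subtracting the shift from a typed pointer such as `int16_t*` would move by `shift` elements instead of `shift` bytes.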


@ -53,4 +53,25 @@ int count_dir(char* dir_path);
char* obi_format_date(time_t date);
/**
* @brief Allocates a chunk of memory aligned on a 16-byte boundary.
*
* @warning The caller must free the memory by passing the returned pointer
* minus the shift (in bytes) to free().
* @warning The memory chunk pointed at by the returned pointer can be
* smaller than the size asked for, since the address is shifted
* to be aligned.
*
* @param size The size in bytes of the memory chunk to be allocated.
* Ideally the closest multiple of 16 greater than the wanted size.
* @param shift A pointer to an integer receiving the shift, in bytes, made to align
* the address. Used to free the memory chunk.
*
* @returns The pointer to the aligned memory.
*
* @since May 2016
* @author Celine Mercier (celine.mercier@metabarcoding.org)
*/
void* obi_get_memory_aligned_on_16(int size, int* shift);
#endif /* UTILS_H_ */