Compare commits

712 Commits

Author SHA1 Message Date
7c2787b6b3 trying to fix cython difficulties 2019-03-26 16:19:55 +01:00
14eca43eac Import taxo 2019-03-26 16:17:44 +01:00
0b4ea49539 Convert relative import and delete cfiles 2019-03-26 16:14:03 +01:00
cd88c37a7e Merge branch 'pip-standard' of git@git.metabarcoding.org:obitools/obitools3.git into pip-standard 2019-03-26 15:54:49 +01:00
1095a617a3 Patch relative import to absolute 2019-03-26 15:54:33 +01:00
5a05258fcb fixed relative cython imports to be absolute 2019-03-26 15:52:59 +01:00
10ab557259 First version of the simplified setup.py script 2019-03-26 15:40:31 +01:00
06178d9d61 Genbank file parser functions that should have been included in a
previous commit
2019-03-20 11:44:43 +01:00
3abe1b7ace obi_errno_to_exception function now properly reads obi_errno global
variable directly
2019-03-20 11:43:12 +01:00
802a3f5933 data import: entries now counted if there are multiple files 2019-03-18 18:16:39 +01:00
7e20870719 Added genbank parser 2019-03-15 16:06:27 +01:00
e8090a44c9 Fixed the ultimate bug with embl (and genbank) parsers: raising any
exception in a python generator makes it unable to resume. So now,
exceptions are not raised but printed, then functions return None and
that's handled at higher level.
2019-03-15 16:06:06 +01:00
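Note on e8090a44c9: a minimal Python sketch of the behaviour this message describes. Once an exception propagates out of a generator, the generator is finished, so every entry after the failing one is lost. The names parse_entry, raising_iterator and tolerant_iterator are hypothetical (they are not the obitools3 parser functions), and integers stand in for EMBL/GenBank entries.

```python
import sys


def parse_entry(text):
    """Hypothetical entry parser: returns the parsed value, or None on error."""
    try:
        return int(text)
    except ValueError as exc:
        # Print the problem instead of raising, as the commit message describes.
        print(f"skipping malformed entry {text!r}: {exc}", file=sys.stderr)
        return None


def raising_iterator(entries):
    """Generator that lets parse errors propagate to the caller."""
    for text in entries:
        yield int(text)  # a ValueError here terminates the generator for good


def tolerant_iterator(entries):
    """Generator that never raises; malformed entries come out as None."""
    for text in entries:
        yield parse_entry(text)


entries = ["1", "oops", "3"]

# Raising version: catching the error outside does not let us resume.
gen = raising_iterator(entries)
recovered = [next(gen)]
try:
    next(gen)
except ValueError:
    pass
recovered.extend(gen)  # the generator is already exhausted, nothing is added

# Tolerant version: None values are filtered at a higher level.
parsed = [value for value in tolerant_iterator(entries) if value is not None]
```

With the raising version only the first entry is recovered; the tolerant version also reaches the entry after the malformed one.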
832f582802 Fixed no-skip-on-error option :p 2019-03-15 16:04:04 +01:00
58d0c850c2 Made skip on error option True by default...... 2019-03-15 15:50:40 +01:00
7737211ac2 Small fix in embl and genbank features parser 2019-03-15 15:50:11 +01:00
c953f0cb00 Fixed embl import where sequences were not imported as Nuc_Seq objects 2019-03-15 11:41:07 +01:00
bb045c3ae9 added TAXID_COLUMN to C API declarations for Cython 2019-03-15 11:40:06 +01:00
2a4f1b8feb obi import: now properly uses macros for column names 2019-03-15 11:39:21 +01:00
24a63f8732 URIs: URIs built with autocomplete now work too 2019-03-15 10:52:27 +01:00
478d19ab43 Cleaner stderr prints 2019-03-13 18:36:31 +01:00
e3c565d6be Cleaner progress bar 2019-03-13 18:36:05 +01:00
d88390c6d8 Cython API: when importing a file in a DMS, its length is computed
beforehand for the progress bar
2019-03-13 18:35:32 +01:00
50e7cd61a6 added math.h import where needed 2019-03-13 11:17:25 +01:00
49d5f6fb1e removed deprecated comment 2019-03-13 11:17:04 +01:00
b45c2ee653 Cython API: cleaner column rewriting API 2019-03-13 11:13:55 +01:00
6afd1294a7 Cython API: Views: fixed a bug when rewriting a column with different
attributes (last line is not written anymore)
2019-03-12 16:40:30 +01:00
a9ba7744cf obidistutils: added fPIC flag needed for linux compilation and set
minimum python version to 3.7
2019-03-12 14:20:59 +01:00
185a95e667 cleaner Makefile 2019-03-11 15:20:10 +01:00
8835a1a983 removed -R compilation flag that gcc doesn't like 2019-03-07 15:55:46 +01:00
1ee50b7222 Fixed a bug when creating a column and checking the comments string if
it was NULL
2019-03-07 15:09:59 +01:00
720bb65b24 Installation: basic Makefile that creates the shared obi3 library used
by Cython
2019-03-07 14:01:37 +01:00
2a1ab9db29 Cython API, Views: guessing an obitype from a python value is now done
through the corresponding functions in utils
2019-03-07 13:57:37 +01:00
4bc52c08c2 minor changes 2019-03-07 13:53:37 +01:00
306da846e3 obidistutils: link obi3 C shared library instead of compiling all C
files with all modules (creating issues with global/static variables).
EXCEPT RUNTIME LINKING DOESN'T WORK YET
2019-03-07 13:50:29 +01:00
af57e532da obidistutils: create doc/sphinx dir if needed 2019-03-07 13:47:23 +01:00
52de6f2717 Update distutils for openmp and new version of pip 2019-02-19 17:30:53 +01:00
29c56572cf Add cfiles everywhere ;-) 2019-02-19 15:04:30 +01:00
de3d12af17 Renamed CAPI file 2019-02-19 14:50:30 +01:00
9ccddd5280 better cfiles 2019-02-19 14:11:29 +01:00
e026e9ec83 Fixed the new alignpaired end to work after ngsfilter with the 9879847
possible cases
2019-02-17 18:32:35 +01:00
4ddd1a1c37 embl iterator: only option on embl directories now works as intended 2019-02-12 16:46:08 +01:00
3015310535 Fixed a bug in kmer similarity computation where the fact that sequences
could be switched was not accounted for
2019-02-10 21:02:24 +01:00
08bcbcd357 ngsfilter: reworked to use apat library 2019-02-06 18:13:54 +01:00
04a3682307 Cython API: added API to use apat (pattern search) C library 2019-02-06 18:12:49 +01:00
6ca6d27ecb ecoPCR: fixed amplicon length computation bug 2019-02-06 18:11:20 +01:00
8f18907566 Cython API: changed revcomp attribute of Nuc_Seq class to is_revcomp to
be more explicit
2019-02-06 18:09:11 +01:00
0b62619e4e Various commentaries and insignificant fixes 2019-01-21 17:32:44 +01:00
c7f5b8d980 Alignpairedend: added alignment using shifting with best kmer similarity
(low level layer in C and Cython API)
2019-01-21 17:30:46 +01:00
59017c0d6b C: taxonomy: fixed a bug when checking for root node 2019-01-21 17:23:25 +01:00
9f6bba183f C: Added a function to get a nucleotide at a specific index in an
encoded sequence
2019-01-21 17:18:02 +01:00
2a6a112d29 obi import: fixed writing quality in views when appropriate (but still
not a satisfying solution)
2018-12-11 19:33:55 +01:00
c437931a35 Cython: fixed history dot graph for all views, and fixed history
recording for build_ref_db and ecotag
2018-12-10 17:09:00 +01:00
eb586b2f53 New command and C functions: obi ecotag 2018-12-09 19:19:32 +01:00
9556130b11 C obi_lcs: updated deprecated column names and associated comments 2018-12-09 19:17:13 +01:00
005aaeec06 C obi_lcs: fixed checking for identical sequences when aligning 2
columns
2018-12-09 19:16:44 +01:00
579f56bb54 obi align (pouic): fixed bug with the saved config when aligning 2
different views
2018-12-09 19:15:58 +01:00
da445066f3 C alignment filter: added a check for sequences not being equal when the
threshold requires that they are
2018-12-09 19:14:51 +01:00
0a407436da C Views: made an error message more specific 2018-12-09 19:14:05 +01:00
54efff36c4 C build ref db: fixed 2 bugs when setting arrays: size of an element in
bits not bytes and using view API instead of column API
2018-12-09 19:13:06 +01:00
6acb21712a Missing commit for build_ref_db: C API file for cython 2018-12-09 19:11:59 +01:00
12087a6c3a C, views: made 'view_exists' function public (now 'obi_view_exists') 2018-11-27 16:20:30 +01:00
fbabbceb5a Fixed a bug in the array indexer where the value's length was not
properly set to 0 if the value was NA (ignore previous commit with the
same message)
2018-11-27 16:18:34 +01:00
6f27734d71 Cython: fixed a bug in INT tuple columns where values were converted to
double instead of int
2018-11-27 16:14:56 +01:00
b3bfa9ca65 Fixed a bug in the array indexer where the value's length was not
properly set to 0 if the value was NA
2018-11-27 16:12:41 +01:00
ece942e771 new command: build_ref_db to build a reference database with metadata
for the taxonomic assignment of sequences
2018-11-27 16:11:18 +01:00
ef8dc85f3c C, taxonomy: new function to get the lowest common ancestor of two taxa 2018-11-27 16:00:29 +01:00
f942dd856f C: new function to build a reference database with LCA and score
metadata for the taxonomic assignment of sequences
2018-11-27 15:56:50 +01:00
730ea99f85 minor fixes and comments 2018-11-19 11:23:54 +01:00
4d51f4f015 obi import: better checking of whether to import quality 2018-11-19 11:23:39 +01:00
e9c1d5e48d AVLs: made maximum number of nodes per AVL 5 millions, as this combined
with keeping all AVLs mapped seems the most efficient. Now 1 million
sequences more or less constantly takes 1 minute.
2018-11-19 11:22:26 +01:00
7fc1b578cf AVLs: AVLs in a group are not unmapped and remapped constantly anymore
when adding new values, fixed a bug when calculating if an AVL data file
has reached the maximum size, fixed a casting bug, and added a boolean
so read-only AVLs files are not truncated
2018-11-19 11:19:07 +01:00
31053591b5 Fixed 2 bugs when checking qualities matching sequences predicate: now
closing and reopening indexers so that they are mapped properly, and
fixed memory leak when reading sequences
2018-11-19 11:05:53 +01:00
b0da36cb48 New command: obi align, except it's called obi pouic for now because of
a Cython compilation bug
2018-11-07 16:05:48 +01:00
d1f1fd432e Minor fixes 2018-11-07 16:04:17 +01:00
75a28929a7 Renamed Cython alignment library in an attempt to limit some compilation
bugs potentially involving name conflicts
2018-11-07 16:03:32 +01:00
d076ea9900 Alignment: updated functions to align columns (LCS) 2018-11-07 16:00:58 +01:00
6b1c41f3fb Changed an error message to be more specific 2018-11-07 13:37:25 +01:00
362df50fe9 Removed a deprecated element from the DMS structure 2018-11-07 13:36:08 +01:00
b1090574da View import: associated column information is now correctly updated
with the new versions
2018-11-07 13:35:11 +01:00
8faabd3ebf Cython, URI: Fixed a bug when using an output URI with just a view name
to use the default DMS
2018-11-02 19:04:27 +01:00
35f3e7c30b All commands now handle outputting to another DMS + small fixes 2018-11-02 19:03:09 +01:00
8a8e9e50b2 Fixed declaration going with previous commit 2018-10-31 18:01:04 +01:00
c7ff53b948 obi clean: temporary views are now deleted 2018-10-31 17:52:51 +01:00
1b7bccb236 Small improvement when checking if a view exists 2018-10-31 17:51:10 +01:00
d09aa43133 Cython API: added a function to get the full path to the DMS directory 2018-10-31 14:46:25 +01:00
123e5dc0ac Cython URI API: added an argument to only open the DMS and return the
rest as a character string
2018-10-31 14:45:17 +01:00
320561a582 Views: Added argument to not automatically create default columns in
typed views, a function to delete a view and fixed view history bug
2018-10-31 14:38:05 +01:00
92c0fbc9bf Fixed a bug where an imported column was not flagged as finished,
resulting in its deletion when reopening the DMS.
2018-10-29 17:39:30 +01:00
b11d52d630 Fixed a bug with the DMS counter being wrongly initialized to 0 instead
of 1 (generating memory bugs when using the counter)
2018-10-29 16:12:37 +01:00
6305282305 obi clean: made more efficient with arrays (speed ~x15 compared with
OBI1)
2018-10-21 17:59:02 +02:00
d53323e7f4 Fixed comments bug with obi head and obi tail 2018-10-21 17:39:17 +02:00
e18b762d81 Weird buggy Eclipse commit with nothing changed 2018-10-21 17:35:18 +02:00
0a0f0682a9 Better handling of errors and exceptions when new view name already
exists
2018-10-17 19:47:40 +02:00
4802e32f72 Cython: Sequence objects: repr() method now returns a Fasta or Fastq
formatted string
2018-10-17 16:53:42 +02:00
b027762059 Cython: export: fixed exception raising when no quality data when
exporting to fastq
2018-10-17 16:52:51 +02:00
da0e3d4043 Cython: added full handling of NA strings when importing files 2018-10-17 16:41:15 +02:00
da76f911db Cython: Views: improved repr() method 2018-10-17 15:54:03 +02:00
61ad2deeca obi uniq: Added line breaks when printing information to cut progress
bar properly
2018-10-17 15:53:28 +02:00
eb6d5581bd Cython: Progress bar: added a cut option to choose whether to do line
breaks every tenth of the full bar, set to False by default for lighter
printing
2018-10-17 15:52:26 +02:00
343dbc7e4d Cython: made the logger lighter (now prints just module name instead of
full module path)
2018-10-17 15:49:55 +02:00
6d018a2d28 Cython: Added 'modulename' in the config information 2018-10-17 15:47:44 +02:00
2c2df4e098 C: Added a trick to suppress compilation warnings about an unused
function actually called in a macro
2018-10-17 13:13:23 +02:00
8ce6dd6d1a Updated prototypes with no arguments with a void argument as suggested
by compilation warnings
2018-10-17 12:00:40 +02:00
df70086384 New command: obi export 2018-10-17 11:27:50 +02:00
32d8396ee2 Cython: Added fasta and fastq writers 2018-10-17 11:27:15 +02:00
6a8670d24a Cython: minor fixes 2018-10-17 11:26:13 +02:00
ec73fa840a Cython: obi stats fixed to work with reworked options (forgotten in a
previous commit)
2018-10-17 11:25:53 +02:00
11032ec90b Cython: Sequence objects: Quality strings are now returned as bytes
instead of str
2018-10-17 11:24:44 +02:00
8a9ba8b0a8 Cython: Added Column line methods to get a Column line as a str or
bytes, and elements (keys, values) with None values are not returned
anymore
2018-10-17 11:23:07 +02:00
135d3b6e67 Cython: updated the URI decoding to handle outputs other than DMS 2018-10-17 11:21:29 +02:00
58589e04be Cython: rearranged input and output format options to have both and
updated commands accordingly
2018-10-17 11:19:48 +02:00
e6bbe13d81 Cython: fasta and fastq parsers now return bytes and take NA string
argument
2018-10-17 11:16:20 +02:00
61b00d6013 Cython: fastq formatter 2018-10-09 16:41:14 +02:00
8029493c10 Cython: fasta and fastq header formatter 2018-10-09 16:41:00 +02:00
aa5ee53478 Cython: fasta writer 2018-10-09 16:40:30 +02:00
e31c8ea57a New command: obi history to print DMS or view history in bash, dot or
ascii formats
2018-10-07 19:11:36 +02:00
9e700ddc21 obi test: updated to test comments 2018-10-07 19:10:46 +02:00
e9a41c5b97 Commands: updated for JSON formatted comments with history 2018-10-07 19:10:34 +02:00
35cf2962cc Cython: DMS: JSON formatted comments and history handling 2018-10-07 19:06:59 +02:00
74be3c39f0 Cython: Views: JSON formatted comments and history handling 2018-10-07 19:06:23 +02:00
c6ee0bade9 Cython: Columns: goes with handling of JSON formatted comments 2018-10-07 19:04:50 +02:00
ffd5bc76bf Cython utils: functions convert to bytes or str and to remove all empty
objects from a complex object
2018-10-07 19:03:38 +02:00
704d9b0474 Cython: Columns: added support for JSON formatted comments 2018-10-07 18:59:43 +02:00
86bb582a17 Views: implemented handling of JSON formatted comments 2018-10-07 18:56:46 +02:00
bc8c394061 Columns: implemented handling of JSON formatted comments 2018-10-07 18:54:51 +02:00
cef458f570 Obierrno: added errno for JSON related errors 2018-10-07 18:53:53 +02:00
2736a92699 DMS: implemented full information file with JSON formatted comments 2018-10-07 18:53:25 +02:00
79f4185757 C library to handle JSON formatted comments using the cJSON library 2018-10-07 18:51:27 +02:00
1b6b6d825a obi grep: added all the missing filtering options 2018-08-14 17:11:41 +02:00
3847850a9d Taxonomy Cython API: added is_ancestor() function 2018-08-14 17:09:40 +02:00
b57e938cc4 New command: obi stats 2018-08-13 15:08:10 +02:00
2dc7fcceac Minor fixes 2018-08-10 10:39:46 +02:00
e096b929dc New command: obi tail 2018-08-10 10:39:26 +02:00
2c634dae7c New command: obi head 2018-08-10 10:29:37 +02:00
7a4cdc0cfe New command: obi sort 2018-08-09 18:10:47 +02:00
e8dc5eb123 Commands: ngsfilter and alignpairedend can now be used in whichever
order
2018-08-08 19:53:26 +02:00
3fcf29a76f More explicit predicate error when checking that sequences and qualities
match
2018-08-08 19:51:05 +02:00
080a97cccf Cython API: more explicit "Can't guess type" exception 2018-08-08 19:50:26 +02:00
9c9aec2556 Cython API: the associated sequence column for a quality column can now
be specified at the Python level
2018-08-08 19:49:56 +02:00
303648bd47 Cython: embl file parser 2018-07-28 17:14:10 +02:00
2ba6d16147 New command: obi ecopcr 2018-07-28 17:13:45 +02:00
275d85dc5d Cython: fixed a bug when reading an uncompressed file in binary mode
where the first 4 characters would not be read
2018-07-28 17:11:51 +02:00
a39f9697be Views: added macro for taxid column name 2018-07-28 17:10:11 +02:00
b98880b7fa Various non-important fixes and comments 2018-07-28 17:07:17 +02:00
895d09b133 obi import: 'taxid' columns are imported as 'TAXID' to fit view
predicates, and fixed taxdump import and DMS closing
2018-07-28 17:03:00 +02:00
c02c15b93f Cython API: URI decoding now returns the character string with the
object path if it could not be opened
2018-07-28 17:00:42 +02:00
3e8c187f0b Cython API: added EMBL parser and files to import are now read in binary
mode
2018-07-28 16:57:01 +02:00
7f6d1597fc Taxonomy: added functions to check if a taxonomy already exists in a
DMS, and added taxdump import from a compressed file
2018-07-28 16:48:11 +02:00
1de308a856 obi clean: option to only keep heads now works, fixed a bug where last
sequence was not properly labelled, and code is cleaned, fixed and error
checked
2018-05-31 15:11:41 +02:00
892ed83a33 Removed deprecated function declarations 2018-05-31 15:08:11 +02:00
6911bf4d70 obi clean: first version 2018-05-18 14:26:54 +02:00
f0c147c252 C API: Added a function to set an entire column to a specified (atomic)
value.
2018-05-17 15:59:16 +02:00
4aef20add8 Fixed a bug where the line selection column of a view would not be
flagged as finished
2018-05-17 15:17:19 +02:00
62614a8538 Cython API: fixed a bug in URI decoding and option handling where the
quality offset would not be read properly
2018-05-17 15:10:52 +02:00
ffebc6acfb Cython API: better handling of default quality offset value 2018-05-17 15:01:25 +02:00
b91b3176b0 obi uniq: fixed a bug where merged values were wrongly reinitialized 2018-05-17 14:58:15 +02:00
31d8ba5085 obi test: minor change 2018-05-17 14:54:45 +02:00
a166a169cf obi ngsfilter: fixed a bug with -u option 2018-05-17 14:53:53 +02:00
8a10072d99 obi annotate: fixed a bug with --with-taxon-at-rank option and minor
improvements
2018-05-17 14:51:18 +02:00
b380368264 Obi count command 2018-04-04 15:51:23 +02:00
1f4e82e6f6 Fixed three bugs in obi uniq 2018-04-04 15:50:10 +02:00
6825fc13ab Cython API: added ngsfilter file parser 2018-03-21 16:41:25 +01:00
49c17ab7b4 Cython API: added tabular file parser 2018-03-21 16:41:09 +01:00
2684535e26 New command: obi annotate 2018-03-21 16:39:31 +01:00
123fb9d7ba Cython API: in taxonomy, added get_taxon_at_rank() function for Taxonomy
class and rank_idx property for Taxon class
2018-03-21 16:38:26 +01:00
4c3478d8f8 Removed the predicate to check for a quality column (because for example
with obi annotate, clone view so clone predicate, then modify seq, so
quality is deleted, and predicate becomes a problem)
2018-03-21 16:37:19 +01:00
4a815785c4 obi import: added basic taxdump import 2018-03-21 16:35:44 +01:00
75b54c83ca obi grep: fixed bug when reading URIs 2018-03-21 16:34:57 +01:00
53cb3354b8 obi ls command 2018-03-19 13:08:41 +01:00
ea58e254da Cython API: repr function for DMS 2018-03-19 13:08:06 +01:00
9fb63d4894 Minor fixes 2018-03-16 19:05:09 +01:00
d4f7e02c85 New obi grep working with URI API 2018-03-16 19:04:54 +01:00
15e43bb9a1 Cython API: obi import can now import ngsfilter files and tabular files 2018-03-12 18:10:43 +01:00
8a0b95c1d6 New command: obi ngsfilter 2018-03-12 18:09:22 +01:00
dd225a255f obi uniq: better error checking 2018-03-12 18:04:53 +01:00
dad21823ff Cython API: trying to guess the type of a column when adding a None
value does not generate an exception anymore, and RollbackException can
now rollback several views
2018-03-12 18:03:37 +01:00
96bf2daae8 Cython API: added slices in Seq classes and fixes 2018-03-12 17:51:41 +01:00
e6c49b7941 Cython API: moved an eval function to utils 2018-03-12 17:49:54 +01:00
4960662332 Cython API: tobytes() function now handles None values 2018-03-12 17:25:12 +01:00
b2cfa4b52f Cython Sequence classes: reworked improved etc 2018-02-12 14:54:47 +01:00
94a899de12 Cython View API: added small tools 2018-02-12 14:48:27 +01:00
b48330a5c9 Fixed a little bug when cleaning unfinished views 2018-02-12 14:44:56 +01:00
74d880b817 Fixed default quality offset 2018-02-12 14:43:44 +01:00
00993d4215 Cython API: fixed a bug where the quality format would not be read
properly from the configuration values
2018-02-12 14:42:30 +01:00
370fb9272c obi uniq: better typing 2018-02-12 14:38:07 +01:00
c8097e14e1 obi import: removed old traces 2018-02-12 14:36:56 +01:00
01ef85658c New command: obi alignpairedend 2018-02-12 13:30:06 +01:00
f5a00c9322 Cython alignment library 2018-02-12 13:28:20 +01:00
156fb04e88 Implemented functions to build reverse complement sequences 2018-01-05 16:08:36 +01:00
428c4eb5e6 obi import: fixed creation of quality columns (to discuss) 2017-12-19 11:07:00 +01:00
1a5b499b5c Cython API to add an OBI_QUAL column after creating a view 2017-12-19 11:06:24 +01:00
b7b8ba7e5a Better handling of elements names in Cython 2017-12-13 23:12:14 +01:00
e9e7fac999 New obi uniq: stores columns with too many elements per line as
character strings, and keeps a minimum of things in the memory
2017-12-13 22:49:08 +01:00
1fd3323372 Columns: elements names information is now kept in a memory arena of
adapted size in the header, and added a boolean in the header indicating
whether the values should be evaluated (typically character strings to
be evaluated in Python)
2017-12-13 22:46:50 +01:00
2df5932b67 Cython column API: fixed a memory leak, optimized the reading of
elements names, added a __len__ method to Column_line, and the API for
columns with character strings to evaluate
2017-12-13 22:27:36 +01:00
b93b982a18 Cython: added an option for input taxdump and an option for the
maximum number of elements in columns with multiple elements per line
2017-12-13 22:25:15 +01:00
ea73047fc7 Added rewinddir before each readdir so that the directories are always
read properly
2017-11-24 18:04:58 +01:00
0998268955 Fixed two little potential bugs when cleaning unfinished columns and
deleted old trace
2017-11-24 18:03:59 +01:00
31726407a3 Taxonomy: fixed a bug where a pointer was not properly reallocated, and
a bug where the merged list of taxids was not built correctly
2017-11-24 18:01:30 +01:00
d21f4a6f90 Header parser: identifiers ending with ';' are now handled 2017-11-24 17:59:52 +01:00
9e3ac477eb OBIDMS: Opened DMS now have a counter associated so that DMS are not
actually opened several times by the same program, which triggers the
cleaning of unfinished views and columns (to discuss)
2017-11-24 17:58:47 +01:00
ee5d647d0d Taxonomy: fixed a bug in parental tree iterator 2017-11-24 17:55:17 +01:00
38fef5b9d4 obi test: better taxonomy testing 2017-11-24 17:54:10 +01:00
3ba7ce1c91 View rollback: version files and column directories aren't deleted
anymore to prevent indexer bug, and fixed a freeing bug
2017-11-15 17:27:26 +01:00
9a50803c00 Added tuple columns containing immutable indexed data arrays of any type 2017-11-15 13:48:59 +01:00
1684f96b79 Fixed a bug when flagging a read-only column as finished 2017-10-26 19:11:29 +02:00
43f65e7fd0 obi uniq: fixed bug where dictionary indexes were not read properly, and
added view rollback in case of an exception.
2017-10-26 19:00:05 +02:00
dfd51939a0 Views are now rolled back if an error occurs, and unfinished views and
columns are deleted when an OBIDMS is opened.
2017-10-26 18:58:48 +02:00
1ae634d56b Added atexit command to obi import, obi uniq and obi less 2017-10-16 11:09:55 +02:00
04e065094a All DMS opened by a program are now listed and closed with atexit system 2017-10-16 10:35:07 +02:00
5ddd1d9ae6 obi uniq: added taxonomy handling 2017-10-04 16:13:07 +02:00
9fc6868341 Increased maximum length for elements names 2017-10-04 16:10:53 +02:00
f2ece573ff Removed deprecated command 2017-10-04 16:09:41 +02:00
fb9b219abe Fixed a bug with taxonomy URIs not being read correctly 2017-10-04 16:00:30 +02:00
09a5f89849 Column API: improvements to be more flexible when referring to elements
in columns with several elements per line.
2017-10-04 15:59:23 +02:00
535692b020 Taxonomy: new functions and improvements 2017-10-04 15:55:13 +02:00
0ab081f79e Updated obi test to work with changes in taxonomy API 2017-10-04 15:50:32 +02:00
1cb05de7e3 Basic obi less 2017-10-04 15:46:26 +02:00
532d8e9cd7 obi import: small efficiency improvement when dealing with NA values 2017-10-04 15:44:48 +02:00
b4088a7928 Cython API: Added basic taxonomy option 2017-10-04 15:42:17 +02:00
ae24a807da obi uniq: added the option to merge ids, except it only works on small
sets until lists are implemented properly using obiblobs
2017-09-25 17:28:03 +02:00
75c15594c4 obi uniq: added option to use categories additionally to the sequence to
determine uniqueness
2017-09-25 10:56:43 +02:00
5ed6835e0e Fixed a bug where the new line count when truncating a column would not
be computed correctly when dealing with high numbers (bad automatic type
for intermediate result)
2017-09-25 10:52:19 +02:00
41dec03448 Merge branch 'master' of git@git.metabarcoding.org:obitools/obitools3.git 2017-09-18 16:08:31 +02:00
7c57bd33e5 Added check to prevent views from having the name 'taxonomy' (used for
URIs)
2017-09-15 14:54:55 +02:00
a776e46e6d Add the command name in the log 2017-09-15 14:51:13 +02:00
0e140df0fb Cython API: added some imports in __init__ files 2017-09-14 18:30:04 +02:00
4bb071c048 Merge branch 'master' of
git@git.metabarcoding.org:obitools/obitools3.git

Conflicts:
	python/obitools3/commands/import.pyx
2017-09-05 08:59:45 +02:00
5045d0c2e9 xxx 2017-09-05 08:58:07 +02:00
73bca6288f New obi uniq 2017-08-20 18:04:21 +02:00
6a2759eee6 obi import with new input/output API 2017-08-20 17:58:36 +02:00
38029b1f77 Forgot a ; 2017-08-20 17:56:18 +02:00
663a1a1091 Cython API: column elements: added possibility to check if an element
exists from its index, and a dict-like get() method
2017-08-20 17:44:05 +02:00
c6d5436a58 Cython API: fixed a bug where iteration on a NUC_SEQS view would not be
done correctly (bug appeared with optimization modifications done
lately)
2017-08-20 17:41:41 +02:00
47cad285d6 Cython API: fixed 2 little bugs in Seq API 2017-08-20 17:39:30 +02:00
74f15d1a23 Cython API: Various fixes in input handlers (parsers, openers etc).
Mostly working but not bug-free
2017-08-20 17:37:51 +02:00
c559ddf487 BUG FIX: creation of a new column would fail because of a case not
handled when a high number of elements per line would imply less than
one line per memory page
2017-08-20 17:30:23 +02:00
93cff94e7f Fixed some compilation warnings 2017-08-20 17:25:58 +02:00
9744a48a67 BUG FIX: seemingly identical obiblobs would have different hash values
because of the padding added by the compiler. Fixed by using calloc
instead of malloc for obiblob memory allocation.
2017-08-20 17:25:15 +02:00
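Note on 9744a48a67: a small Python sketch of why padding bytes can make two logically identical structures hash differently, and why zero-filled allocation avoids it. Everything here is an assumption made for illustration: the Blob layout is hypothetical (not the real obiblob structure), SHA-1 stands in for whatever hash the indexer uses, and a pre-filled bytearray simulates the arbitrary contents of malloc'ed memory.

```python
import ctypes
import hashlib


class Blob(ctypes.Structure):
    # Hypothetical layout: a 1-byte field followed by a 4-byte field.
    # With native alignment the compiler usually inserts 3 padding bytes.
    _fields_ = [("tag", ctypes.c_uint8), ("length", ctypes.c_uint32)]


def blob_bytes(fill_byte, tag, length):
    """Build a Blob on top of memory pre-filled with fill_byte.

    A non-zero fill_byte mimics malloc (leftover garbage in the padding),
    fill_byte == 0 mimics calloc (padding is zeroed).
    """
    raw = bytearray([fill_byte]) * ctypes.sizeof(Blob)
    blob = Blob.from_buffer(raw)  # the structure shares memory with raw
    blob.tag = tag
    blob.length = length
    return bytes(raw)  # raw bytes, padding included


def digest(data):
    """Stand-in hash over the raw bytes of the structure."""
    return hashlib.sha1(data).hexdigest()


# Number of padding bytes on this platform (typically 3).
padding = ctypes.sizeof(Blob) - (
    ctypes.sizeof(ctypes.c_uint8) + ctypes.sizeof(ctypes.c_uint32)
)

# Same field values, different leftover garbage: the digests diverge.
malloc_like_a = digest(blob_bytes(0xAA, 1, 42))
malloc_like_b = digest(blob_bytes(0x55, 1, 42))

# Same field values on zeroed memory: the digests agree.
calloc_like_a = digest(blob_bytes(0x00, 1, 42))
calloc_like_b = digest(blob_bytes(0x00, 1, 42))
```

On platforms where the layout has no padding the two malloc-like digests would coincide, which is why the sketch computes padding explicitly rather than assuming a value.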
6afdc9fb5f AVLs: Added an error check 2017-08-20 17:21:06 +02:00
6f202363f4 Fixed a typo in doc 2017-08-20 17:20:13 +02:00
7f1ff49aa2 Cython API to import a column and a view from a DMS to another DMS 2017-08-03 16:34:02 +02:00
4b86aa67a8 New C functions to import a column and a view from a DMS to another DMS 2017-08-03 16:33:12 +02:00
a3e81930c2 Views: finished handling and documenting the conditions for an existing
column to be added to a view
2017-08-03 16:32:22 +02:00
644b55b49f Fixed doc typo 2017-08-03 16:29:25 +02:00
927c684fc2 Utils: new function to copy the content of a file into another file 2017-08-03 16:28:54 +02:00
344566d9e9 AVLs: made some functions public and changed some rights to be able to
import AVLs from a DMS to another
2017-08-03 16:27:43 +02:00
407f61a408 Add the possibility to create temporary objects like a temporary
directory and a temporary DMS
2017-07-28 16:33:19 +02:00
09ddd74652 Merge branch 'master' of git@git.metabarcoding.org:obitools/obitools3.git 2017-07-28 15:57:01 +02:00
7c0d882bc9 Patch a bug when creating a DMS not in the current directory. Use the
basename function to locate the DMS name instead of the loop...
2017-07-28 15:56:21 +02:00
35b0c55a8c Cython API: various improvements and checks 2017-07-28 13:15:13 +02:00
b9c65a871f Patch decoding of URL 2017-07-28 12:41:28 +02:00
84bb93096f Cython API: fixes and improvements in Column API 2017-07-28 10:27:04 +02:00
01c69e7e25 Cython API: fixed a bug when printing a column 2017-07-28 10:01:56 +02:00
adf5cbef97 Added DMS method to create a DMS if it doesn't already exist, otherwise
opens it
2017-07-28 09:55:43 +02:00
da48a9d1af Patch group of option : types must be callable not a string 2017-07-28 09:36:18 +02:00
9482c663c0 minor comments and changes 2017-07-27 19:46:34 +02:00
c5f3fdc295 Increased maximum element names length in columns 2017-07-27 19:44:49 +02:00
89e2f80fd8 Goes with previous commit 2017-07-27 19:43:00 +02:00
7112f44fb7 Bug fixes for input handlers, openers, parsers etc. Compiling but not
tested
2017-07-27 19:42:44 +02:00
b2fc1f4611 obi uniq: first version 2017-07-27 19:40:19 +02:00
75f691d55a Cython API: Seq classes reworked 2017-07-27 19:39:58 +02:00
0655063bb0 Cython API: view_NUC_SEQS changes to go with previous commits 2017-07-27 19:39:26 +02:00
9701b1230c Cython API: OBIWrapper.new method is now OBIWrapper.new_wrapper to avoid
mismatching method definitions with subclasses
2017-07-27 19:38:25 +02:00
f8a4428674 Cython API: DMS test_open method doesn't raise an exception anymore 2017-07-27 19:36:28 +02:00
1a0f18a11a Cython API: added a __setitem__ method to the View class that can detect
if the item is a Line and create the corresponding columns if needed +
minor changes
2017-07-27 19:35:28 +02:00
3d7aa52c90 Cython API: Fixed a bug when setting NA values in Column_multi_elts, and
added some properties
2017-07-27 19:31:15 +02:00
69c50ff922 Cython API: added a Column subclass to allow direct access to indexes
for columns that store indexes referring to other data
2017-07-27 19:29:10 +02:00
c91969126b Cython C API declarations to go with previous commit 2017-07-27 19:26:59 +02:00
15d383fa8b Added possibility to specify the offset for encoding and decoding
sequence quality character strings
2017-07-27 19:24:41 +02:00
99ceed5fff Cython API: renamed OBI_Taxonomy to Taxonomy and OBI_Taxon to Taxon 2017-07-27 19:21:45 +02:00
fa8f826cdc Cleanup the end of the file 2017-07-27 16:07:39 +02:00
dc91174a5e Complete the input option group functions 2017-07-27 16:06:48 +02:00
ec65f00cf2 Complete the fasta iterator to manage new input options 2017-07-27 16:05:30 +02:00
8d9cdb4d03 Complete the fastq iterator to manage new input options 2017-07-27 16:05:17 +02:00
949e5f9baf Make a first full version of the URI decoder 2017-07-27 16:04:31 +02:00
3c6a05be54 Add option to the default config corresponding to the parsing of the
inputs
2017-07-27 16:03:47 +02:00
8781ecab1f Add a factory checking the file format and returning the correct
iterator. First version working only with fasta and fastq nucleic
formats
2017-07-27 16:02:52 +02:00
0f6ae7dfa6 Options stuff... ;-) 2017-07-25 13:07:03 +02:00
28259cd88b Beginning of URI decoder -- !!! NOT YET FULLY IMPLEMENTED !!! 2017-07-25 13:05:58 +02:00
b24be84b0a Add a first group of options 2017-07-25 11:14:30 +02:00
59dd0a8a8c Standardized and improved the API to create new columns, updated the doc 2017-07-18 17:34:32 +02:00
c88df2e12c First version of automatic ID and COUNT columns, to discuss (for now,
columns created when NUC_SEQ views are closed if the columns don't
already exist)
2017-07-17 17:31:09 +02:00
1e57bfacb4 Fixed some C documentation 2017-07-17 16:45:08 +02:00
3e6aecc635 Added a C function to add a COUNT column to a view with all lines set to
1
2017-07-11 16:44:23 +02:00
ced9a268a1 obi import: added an option to specify the NA value in the input file
(default is 'NA', same as in R's read.table function)
2017-07-11 12:10:33 +02:00
df2ad41150 Cython API: Added a width property to views, corresponding to their
column count
2017-07-11 11:46:32 +02:00
f8895e879d Cython API: Added a function to get a column from its index in the view 2017-07-11 11:36:42 +02:00
b729b8928f obi less: fixed bug when the length of a view would be less than the
default number of lines printed
2017-07-10 17:04:02 +02:00
b6b95f26b6 obi import: Skipping sequences is now done through the iterators so that
sequences are not uselessly parsed
2017-07-10 17:02:30 +02:00
b94ec9557f Cython API: None values aren't included anymore in the dictionary
returned when getting a line from a column with multiple elements per
line, and reworked that function to be more optimized
2017-07-07 17:28:53 +02:00
143bddf1d1 Cython API: Added an __iter__ method to the class Column_line (iterating
on the elements names) (previously an iteration would work but with
unexpected results)
2017-07-07 15:41:10 +02:00
a718081ebd Bug with error handling: for now obi_errno needs to be passed to the
function handling errors and exceptions, as it can't read the right
value of the global obi_errno (Cython configuration problem?)
2017-07-07 15:36:11 +02:00
740d021276 obi import: fixed bugs when rewriting a column: a bug with new elements
names ignoring previous elements names found, a bug with the global
obi_errno being reset too late, and a bug with the column dictionary
used by obi import not being updated after rewriting a column
2017-07-07 15:33:43 +02:00
906343187b Fixed bug with view option in obi less and obi check 2017-07-06 16:42:27 +02:00
c3cd57a9e3 Removed deprecated file 2017-07-06 10:57:14 +02:00
f03928c679 Committing minor comments before merging branch with master 2017-07-06 10:56:39 +02:00
717ee46f08 Commented a loose print 2017-07-05 18:02:18 +02:00
313508cc94 Better *Seq* classes but still need work 2017-07-05 17:53:46 +02:00
535fc2af83 Column rewriter and optimized View getter 2017-07-05 17:49:05 +02:00
3bbc2ae469 More optimized Column item getter 2017-07-05 17:37:19 +02:00
5ee0b3989a Cython API: set_line of Column_multi_elts now accepts as values argument
any class where values are referenced by keys with an iterator
2017-07-05 17:32:32 +02:00
d10192ab0e C functions to detect IUPAC sequences 2017-07-05 17:26:03 +02:00
101f764cce New obi import with rewriting of columns when column type or line
elements (keys) change
2017-07-05 17:15:23 +02:00
cb5ad2ed2d Added functions to try to open a DMS if it exists 2017-07-05 15:38:22 +02:00
f5e992abbf Added a check on the element when setting a value in a column 2017-07-05 14:49:20 +02:00
1d2996c6c0 Better handling and tracing of Index Errors between C and Cython 2017-07-05 14:45:43 +02:00
f6631f3857 Removed deprecated declarations 2017-07-05 14:42:21 +02:00
3f5fef10b9 obi test: minor changes 2017-07-05 14:37:27 +02:00
20c72af697 Basic obi check command to check DMS and view information 2017-07-05 13:54:19 +02:00
d252131950 Basic obi less command 2017-07-05 13:44:12 +02:00
ca16ce0bb0 Basic obi grep with new Cython API 2017-07-05 11:58:10 +02:00
ac94b35336 Removed unused import 2017-07-05 11:52:31 +02:00
2d65db4ebc Goes with c2af955b : forgotten files for NUC_SEQS views 2017-04-21 15:15:12 +02:00
4b037ae236 Updated obi test to test NUC_SEQS views and the taxonomy API 2017-04-21 12:09:04 +02:00
c2af955b78 Cython view API: added NUC_SEQS views and sequence classes + changed
cloning API
2017-04-21 12:08:14 +02:00
71b1a43df8 Added functions to clone views with a simpler API 2017-04-21 11:58:15 +02:00
1725b8b80c Reworked taxonomy Cython API to be a subclass of OBIWrapper 2017-04-21 11:54:05 +02:00
ab0d08293e Cython API: removed unnecessary imports 2017-04-21 11:51:05 +02:00
2f0c4b90d7 Fixed a problem where a view would have a wrong line count after adding
a first column to it if there was already a Line selection associated
(happening when cloning), and fixed a bad error check.
2017-04-14 16:25:55 +02:00
537b9847da Minor C doc clarification 2017-04-14 16:23:17 +02:00
b998373be5 Cython API: updated the test command for the new API and deactivated the
other commands for now
2017-04-14 16:21:33 +02:00
6f780148e2 Cython API: added taxonomy API 2017-04-14 16:20:30 +02:00
0e08fc486a Cython API: fixed bug when deleting a column from a view where the
Cython wrapper wasn't closed, and fixed the Line selection
materialization
2017-04-14 16:19:18 +02:00
2bbee64e57 Cython API: fixed problems with Column class 2017-04-14 16:14:41 +02:00
693859eec2 Cython API: fixed conversion bugs when setting and getting values
(especially NA values) in OBI_CHAR, OBI_STR and OBI_SEQ columns
2017-04-14 16:07:23 +02:00
a3fad27190 Cython API: automatic importing of column classes now works 2017-04-06 15:45:02 +02:00
f351540b0b Merge branch 'Eric_new_Python_API' of git@git.metabarcoding.org:obitools/obitools3.git into Eric_new_Python_API 2017-04-06 15:39:52 +02:00
6dccaa0213 Patch the registering function : register_all_column_classes 2017-04-06 15:37:51 +02:00
5de9e0de51 Cython API: now using const char* instead of char* for the type of
values read from OBI_STR columns
2017-04-06 15:15:20 +02:00
ad8de80353 Views: better checks when adding an existing column to a view 2017-04-06 14:44:07 +02:00
8cd3e3604f Cython Column API 2017-04-06 14:42:11 +02:00
255f3c92ae Cython View API 2017-04-06 14:41:58 +02:00
08be4e231d Cython Object API 2017-04-06 14:41:43 +02:00
b5b7995411 new Cython DMS API 2017-04-06 14:41:26 +02:00
0dfb1eb3e6 Cython typed columns 2017-04-06 14:40:44 +02:00
381194194c Cython API: compiling but not working 2017-03-06 16:07:02 +01:00
778acc48cd Added linked lists to handle lists of column pointers in views (not
tested)
2017-03-06 16:06:17 +01:00
3319ede837 Views: Column dictionaries now store and return pointers on column
pointers instead of column pointers.
2017-02-22 13:49:50 +01:00
fc20b83ad1 Merging 2017-02-20 14:56:04 +01:00
431c1c8c6a Merge branch 'Eric_new_Python_API' of
git@git.metabarcoding.org:obitools/obitools3.git into
Eric_new_Python_API

Conflicts:
	python/obitools3/obidms/_obidms.pxd
	python/obitools3/obidms/_obidms.pyx
	python/obitools3/obidms/_obidmscolumn_bool.pyx
	python/obitools3/obidms/_obidmscolumn_str.pyx
	python/obitools3/obidms/_obiseq.pxd
	python/obitools3/obidms/_obiseq.pyx
	python/obitools3/obidms/_obitaxo.pxd
	python/obitools3/obidms/_obitaxo.pyx
	python/obitools3/obidms/_obiview.pxd
	python/obitools3/obidms/_obiview.pyx
	python/obitools3/obidms/_obiview_nuc_seq.pxd
	python/obitools3/obidms/_obiview_nuc_seq.pyx
	python/obitools3/obidms/_obiview_nuc_seq_qual.pxd
	python/obitools3/obidms/_obiview_nuc_seq_qual.pyx
	python/obitools3/obidms/capi/obialign.pxd
	python/obitools3/obidms/capi/obidmscolumn.pxd
	python/obitools3/obidms/capi/obitaxonomy.pxd
	python/obitools3/obidms/capi/obiview.pxd
2017-02-20 14:55:36 +01:00
f23315e26f New Cython API: compiles but doesn't work 2017-02-17 15:14:06 +01:00
071a3b61ab Merged master fixed conflict. 2017-02-14 10:58:43 +01:00
e524041013 Views: Files for unfinished views now have the extension
'.obiview_unfinished', renamed to '.obiview' when the view is finished.
2017-02-07 17:16:09 +01:00
a9102620f5 Fixed missing email address 2017-02-07 17:14:10 +01:00
7e9932f488 Fixed a C function declaration 2017-02-07 17:12:56 +01:00
e50da64ea1 The elements names when a column contains several elements per line are
now formatted with '\0' as separator and handled in a more optimized way
2017-01-31 16:48:06 +01:00
651c1d7845 utilities: bsearch and qsort with additional user_data pointer argument 2017-01-31 16:45:47 +01:00
c0bcdce724 Taxonomy: documentation for all the functions, and fixed bugs when
closing the taxonomy (overwriting of .pdx files, missing freeing, and
re-placed a misplaced condition)
2017-01-18 18:22:49 +01:00
c065c1914a Taxonomy: adding, writing and reading preferred names, changed some
function names, and fixed a bug with taxa indices not being properly
initialized
2017-01-16 17:28:20 +01:00
0385a92e02 Taxonomy: Refactored the taxdump reading, and little fixes 2017-01-11 16:36:08 +01:00
cf7f2de016 Modify __init__ and close method to deal with registration process 2017-01-10 14:26:16 +01:00
5122ad52a7 Merge branch 'Eric_new_Python_API' of git@git.metabarcoding.org:obitools/obitools3.git into Eric_new_Python_API 2017-01-10 14:07:50 +01:00
4b02ba73ac Add the OBIObject concept 2017-01-10 14:07:10 +01:00
41ad3deec0 Taxonomy: information about deleted taxids is now read from
delnodes.dmp file and added to *.adx file
2017-01-09 17:28:49 +01:00
d68374018b Taxonomy: functions to read the *.adx file (containing the deprecated
and current taxids and their corresponding indices in the taxa
structure) and to find the taxa using the merged index.
2017-01-06 15:52:21 +01:00
f396625f98 Taxonomy: function to write *.adx files 2017-01-05 15:37:13 +01:00
897032387f Taxonomy: reading merged.dmp file in taxdump 2017-01-05 14:28:36 +01:00
4a1d3167a7 Last change on my branch 2017-01-02 16:46:52 +01:00
153c22257f Last change on my branch 2017-01-02 16:46:17 +01:00
2139bfc748 refactoring... 2017-01-02 13:05:22 +01:00
65f3b16e6d Refactoring ... 2016-12-29 18:22:05 +01:00
0526386337 first working DMS class 2016-12-27 06:17:45 +01:00
62caf1346e temporary remove some files 2016-12-26 15:03:24 +01:00
3ac6e85fb3 Big refactoring 4 2016-12-26 14:58:03 +01:00
5156f6bb9e Big refactoring 3 2016-12-26 14:18:01 +01:00
e6db2086d5 Big refactoring 2 2016-12-26 13:56:31 +01:00
daacd0df76 Strong refactoring 1 2016-12-26 13:35:31 +01:00
8e92bf6dac LCS alignment: it is now checked that sequences are not longer than what
a 16-bit integer can code for (as the LCS and alignment lengths are
kept in 16-bit registers)
2016-12-22 17:06:23 +01:00
30e4359c85 LCS alignment: documentation for all the lowest level functions 2016-12-22 17:03:51 +01:00
5c50e5b378 Embryo of code for openMP parallelization of LCS alignment but
deactivated for now because can't make it compile with cython/clang
2016-12-20 11:46:58 +01:00
3cedd00d7f Add register function for column type 2016-12-20 11:13:57 +01:00
82fbe43980 transfer method to obiviews 2016-12-20 08:18:47 +01:00
d1a972dfcb patch import 2016-12-20 08:15:42 +01:00
f43dc3e3ab separate the obicolumn classes in new files 2016-12-20 08:15:08 +01:00
9c71b06117 Removed deprecated TODOs 2016-12-19 14:36:40 +01:00
3bf5260174 Merge branch 'master' of git@git.metabarcoding.org:obitools/obitools3.git 2016-12-19 10:31:18 +01:00
857a5198e4 Updated `obi lcs` for the LCS alignment of two columns 2016-12-16 19:40:36 +01:00
d99447c12b C function for LCS alignment of two columns, and optimized and fixed
line count bug in function to align one column
2016-12-16 19:39:02 +01:00
303bd6f445 Added function to build kmer table for 2 columns, and fixed bug (with
line count) when building kmer table of one column
2016-12-16 19:10:18 +01:00
490f5fe6b9 Updated deprecated code in cython API for columns (using line count of
view instead of column)
2016-12-16 19:04:21 +01:00
191c83aafc Added missing *.cfiles 2016-12-15 15:28:34 +01:00
04d39c62ab Try for a new API 2016-12-14 08:44:44 +01:00
9b24818fe2 Refactored alignment code for minimum redundancy between the function
that aligns 1 column and the function that aligns 2 columns
2016-12-13 17:18:12 +01:00
06cb7a9a58 Some change in the way to manage access to special items of the
dictionary like sequence or quality
2016-12-13 12:49:34 +01:00
fc55fc117d Some cosmetic changes to the code 2016-12-13 12:48:13 +01:00
4ef5cb0d87 Move the OBIView_NUC_SEQS class to files _obiview_nuc_seq.pxd and
_obiview_nuc_seq.pyx to avoid circular inclusion
2016-12-13 12:46:49 +01:00
fc805e5443 Remove some warnings in the editor 2016-12-13 08:29:22 +01:00
8d7ef7d3d1 patch the distutils to add the C source directory in the include path.
This should solve most of the compilation problems related to .h files
located in this directory
2016-12-13 08:02:09 +01:00
8afb1644e9 Alignment: API rework. 'obi align' is now 'obi lcs', and the results are
now written to columns automatically created in the output view, all
optimally handled at the C level.
2016-12-12 11:58:59 +01:00
fa4e4ffaff Changed the cython API to create new views so as to have different
functions for the different cases
2016-12-07 14:17:57 +01:00
936be64c34 Goes with 5e0c9f87 (missing ';' and fixed compilation warnings) 2016-12-05 11:18:29 +01:00
5e0c9f878b Added the doc for the function building the element names, and a missing
free
2016-12-05 10:46:21 +01:00
852e5488c8 The default element names for columns with multiple elements per line
are now "0;1;2;...;n"
2016-12-02 17:54:51 +01:00
e60497651c Updated the documentation for the functions to set and get in the
context of a view
2016-11-30 12:22:47 +01:00
4ad8c16a73 Finished adding all the functions to directly set and get indices in
columns containing indices referring to any type of data.
2016-11-30 11:08:11 +01:00
6f6099687d Sequence alignment: if no sequence column is given and the view has the
type NUC_SEQS_VIEW, the default sequence column is aligned
2016-11-29 16:52:41 +01:00
98d0849653 Sequence alignment: added the possibility to specify the index of the
sequences to align in a column containing multiple sequences per line (C
level for now)
2016-11-29 16:15:02 +01:00
5fb025f310 When aligning, it is now quickly checked whether the sequences are
identical using their indexes
2016-11-28 11:39:29 +01:00
8ce6f6c80b Added an argument to specify whether the two sequences can be identical
when applying filters before aligning
2016-11-28 11:38:02 +01:00
3e53f9418b Added functions to recover the indexes themselves from any column
referring to indexed values
2016-11-28 11:35:19 +01:00
d40d2d0c76 Fixed error in documentation 2016-11-28 10:55:23 +01:00
f897e87600 When closing a view, it is now automatically checked that all OBI_QUAL
columns correspond to their associated OBI_SEQ column
2016-11-25 12:04:57 +01:00
70e056a2aa It is now impossible to open or clone a view that is not finished (= has
been closed at least once)
2016-11-24 11:19:07 +01:00
8abbfa203a Good file for commit 6fa9a8bd: When a view is cloned, a comment is added
to the new view specifying the name of the cloned view
2016-11-23 11:32:39 +01:00
6fa9a8bd76 When a view is cloned, a comment is added to the new view specifying the
name of the cloned view
2016-11-23 11:29:21 +01:00
76a4c6b14e Fixed a bug when cloning a view and checking its type 2016-11-23 11:28:17 +01:00
0ab9e6c05a When adding an existing column to a view, it is checked that the
column's line count is at least the view's line count. This can't be
more stringent for reasons that need to be rediscussed
2016-11-23 11:04:53 +01:00
70c49e214a Added the kmer filter to LCS alignments, and now obiblobs containing
encoded sequences are directly put in int16_t arrays for the alignment
2016-11-18 16:29:28 +01:00
08e67a090f Changed the inline functions syntax, which should make it compatible
with more compilers
2016-11-18 16:21:26 +01:00
621b4972db Functions to get obiblobs through views 2016-11-18 15:59:50 +01:00
7d022c1a52 If the indexer name is NULL when creating a column, it now becomes the
column name
2016-11-18 15:56:51 +01:00
1c71c195fc Goes with a0ebc2d8 2016-11-10 15:01:29 +01:00
54cfeffd85 Goes with 8f724f4f, forgotten file 2016-11-10 14:48:31 +01:00
a0ebc2d871 Functions to directly retrieve Obiblobs from indexers 2016-11-10 14:45:28 +01:00
8f724f4f8e Some code refactoring 2016-11-09 16:48:00 +01:00
359578814b Added view type property to OBIView cython class and updated obi export
to use it
2016-11-08 17:49:59 +01:00
51b23915ca Added properties for Nuc_Seq cython classes (and updated commands using
them)
2016-11-08 16:59:32 +01:00
b5b889c4a2 Fixed the OBI_Nuc_Seq_Stored cython class not being up to date with the
new properties of its parent class
2016-11-08 11:26:37 +01:00
36ac315125 Fixed bugs with python view type when creating a new view, and a bug
when trying to guess the obi type of a nucleotide sequence when its type
was bytes
2016-11-08 11:23:54 +01:00
8291693309 obi grep: updated to work with the new line selection class and within
the local sequence environment, and progress bar functioning
2016-11-08 11:19:12 +01:00
4bc19c3e49 obi export: view type is now checked and progress bar functioning 2016-11-08 11:17:20 +01:00
2d2fe5279d Added functions to add new taxa to a taxonomy with handling of
associated *.ldx files
2016-11-03 17:59:21 +01:00
2504bf0fa9 Added an iterator to the OBI_Taxonomy cython class 2016-11-02 11:08:18 +01:00
d8a257e711 Taxonomy handling functions in C. Features: read taxdump, read binary
files, write binary files. Not fully handled yet: *.adx, *.pdx, *.ldx,
merged.dmp and delnodes.dmp files.
2016-10-27 18:56:11 +02:00
b63d0fb9fb Added C functions to write .rdx, .tdx, .ndx binary taxonomy files from a
taxonomy C structure
2016-10-14 17:03:10 +02:00
0dfd67ec89 The endianness of binary taxonomy files is now correctly checked 2016-10-10 17:04:29 +02:00
0faaac49cf The taxonomy directory of the DMS is now automatically created with the
DMS
2016-10-10 17:02:51 +02:00
1b07109e51 Removed deprecated code 2016-10-10 17:01:51 +02:00
60ab503a14 Added properties in the OBI_Taxonomy class 2016-10-10 17:01:17 +02:00
2dcfdc59fc When a new view is created with a line selection, the view to clone is
automatically found + compacted redundant code + fixed potential bug
when cloning a NUC_SEQS view by name
2016-10-06 17:55:18 +02:00
399fc2c051 Removed deprecated source files previously used for tests 2016-09-30 17:49:37 +02:00
9cd57deca9 Added OBIView_line_selection class to make new line selections
associated with the view to clone, and improved and renamed method
closing a view
2016-09-30 17:48:53 +02:00
d88811ed7d Added a seed option to the obi test command for reproducible tests 2016-09-29 17:34:48 +02:00
8c402101e4 Renamed private attributes as _* and removed some deprecated code 2016-09-28 16:56:44 +02:00
1a7b42018e Added some error checking when opening or creating a view 2016-09-28 14:28:34 +02:00
b717e8bb8b Added properties for the OBIView class and cleaned up deprecated code 2016-09-28 14:26:23 +02:00
03a2c8ef7c Finished restructuring the OBIDMS_column class properties 2016-09-27 14:16:30 +02:00
a7f891d1c9 Added a lines_used property to the OBIDMS_column class 2016-09-26 18:04:28 +02:00
bd50b3f972 Added version property to OBIDMS_column class 2016-09-26 17:45:10 +02:00
81380363b7 Added original_name property to OBIDMS_column class 2016-09-26 17:31:32 +02:00
a4b8349274 Added data_type property to OBIDMS_column class 2016-09-26 17:12:20 +02:00
a474391b27 Added nb_elements_per_line property to OBIDMS_column class 2016-09-26 17:01:13 +02:00
a0bc45cc92 Added elements_names property to OBIDMS_column class 2016-09-26 16:53:16 +02:00
76f89717fe Added alias property to OBIDMS_column cython class 2016-09-26 16:12:48 +02:00
b408a4f6eb Changed file name limits to adapt to system limits + minor changes 2016-09-22 18:05:07 +02:00
b083745f56 Deleted the "new line selection while editing a view" system 2016-09-22 11:19:29 +02:00
43f3c69a40 Fixed bug when cloning column with line selection 2016-09-21 17:50:21 +02:00
e79507b629 Fixed bugs in the process ensuring that all the columns of a view have
the same line count, fixed a bug when trying to set a value in a view
when a line selection exists, fixed a bug when adding a new column to a
view where line counts would be wrong
2016-09-21 17:42:17 +02:00
bb25723d99 Improved documentation of a function 2016-09-21 17:30:39 +02:00
a0da984003 Fixed bug where columns would not get truncated to the right size, and
fixed bug where column directories would be open and not closed in some
instances
2016-09-21 17:28:52 +02:00
802bae110b Removed deprecated function 2016-09-21 17:09:59 +02:00
dd55aef3e5 Added column class method to get the unique references (name and
version) of a column
2016-09-21 17:08:44 +02:00
9ac522fde1 Better obi test command 2016-09-21 17:06:35 +02:00
6adb9eb623 Should solve issue #56 2016-09-19 21:40:40 +02:00
8f49553d5a First version of the obi test command, testing that the OBITools3 work
correctly
2016-09-15 12:26:07 +02:00
986f90c59e Fixed bug where column directories weren't closed correctly, leading to
too many file descriptors open, and added error checking when closing
file descriptors
2016-09-15 12:18:40 +02:00
a240ec0169 Added error checking when closing file descriptors 2016-09-15 11:58:56 +02:00
0a3c23d9d0 Added a missing closedir 2016-09-15 11:58:34 +02:00
8724445fa1 Added error checking when closing files 2016-09-15 11:50:30 +02:00
de189fd7e0 Fixed major bug when cloning an AVL where the bloom filter was not
copied properly (because copying a structure via assignment does not
work for structures with a variable size)
2016-09-15 11:47:02 +02:00
9a97f1f633 View predicates are now carried over when cloning a view 2016-09-06 16:22:24 +02:00
00014eb023 View files now have the *.obiview extension 2016-09-06 14:19:13 +02:00
acc0da2d0b Readjusted some limits for file names and file numbers to be under OS
limits
2016-09-05 12:39:04 +02:00
668696fc5a Fixed major bug: when setting all the columns of a view to the same
number of lines, columns are now cloned before being enlarged if needed
+ predicate functions now print error messages if the predicates are not
respected
2016-09-05 12:37:36 +02:00
ba84ef4847 Fixed typo 2016-09-05 12:31:06 +02:00
c9dce03295 Fixed major bug when cloning an AVL group (last AVL of new group was not
correctly enlarged before copying the data) + minor improvements
2016-09-05 12:29:52 +02:00
eb82d088cb Added some view class methods 2016-09-05 12:20:00 +02:00
f46ea0b988 Finished fixing issues with DMS paths 2016-08-30 11:09:45 +02:00
5b2e370ffb Fixed a bug when using an absolute path for a DMS 2016-08-29 17:30:31 +02:00
8d360b0fac Minor improvements to obi export command 2016-08-19 17:49:22 +02:00
b34769b27c Minor improvements to obi export command 2016-08-19 17:46:55 +02:00
2d0a714e37 Basic obi export command exporting from view to fasta or fastq format,
for testing purposes
2016-08-19 17:40:58 +02:00
7b780ffb28 View files now have a dynamic size to allow unlimited comments size 2016-08-18 17:57:03 +02:00
e4129610cf Quality columns are now optional in NUC_SEQS views + minor fixes 2016-08-16 15:17:26 +02:00
cf839522e7 Minor update and fix to obi grep command 2016-08-12 17:45:44 +02:00
10b22f79da The cython subclass is now correctly chosen when cloning a view 2016-08-12 17:39:19 +02:00
ad8e10f2d1 Reworked the alignment API a bit 2016-08-12 15:56:07 +02:00
92cad61417 Fixed bug when closing views with no associated predicate 2016-08-12 15:52:38 +02:00
64a745ce0b First very basic version of obi grep command 2016-08-11 17:32:08 +02:00
2d8ac2b035 Fixed bug when creating an OBI_IDX column 2016-08-11 17:30:32 +02:00
5b7917bb5a Fixed bug when writing predicates in view file 2016-08-11 17:30:09 +02:00
d3c58780a0 Added __len__ function to OBIViews that returns the line count 2016-08-10 17:20:23 +02:00
029d395da1 Added __iter__ function to OBIView lines 2016-08-10 17:08:22 +02:00
bea02cc7a5 Added (temporary?) check for the type of quality strings because the
import now seems to return them with bytes type
2016-08-10 16:25:45 +02:00
4ba01617af Fixed obscure compilation bug 2016-08-10 15:26:40 +02:00
bec684d5e2 Fixed merge conflict 2016-08-10 15:05:37 +02:00
2aaa87edcc 1st version of obi align command and reworked functions that handle
column alignment
2016-08-10 14:51:02 +02:00
400a3f9f3d Merge branch 'Eric_version_for_sequence'
Conflicts:
	python/obitools3/obidms/_obidmscolumn_seq.pyx
2016-08-04 09:42:42 +02:00
d1d26b9028 Simplify the code 2016-08-04 08:00:54 +02:00
465ea81c77 Merge branch 'master' of git@git.metabarcoding.org:obitools/obitools3.git 2016-08-03 10:13:47 +02:00
1e6d6e32e0 Switch to Cython version >= 0.24 2016-08-03 10:13:10 +02:00
ccc877764e Patch a bug in the printing of the progress bar leading to a bus error
when compiled with some C compilers and Cython >= 0.24
2016-08-03 10:12:23 +02:00
8f0462c407 Merge branch 'master' into Eric_version_for_sequence
Conflicts:
	python/obitools3/obidms/_obidmscolumn_seq.pyx
2016-08-03 10:09:20 +02:00
26b8e1f215 Modified C API to set and get in columns: added functions to set and get
using column names instead of pointers, and changed function names
2016-08-02 16:33:19 +02:00
312f50ff0f Major update: Column aliases. Columns are now identified in the context
of a view by an alias that can be modified.
2016-08-01 18:25:30 +02:00
3843485a04 Deleted deprecated function declaration that would make compilation
impossible and fixed error in documentation
2016-07-22 16:21:02 +02:00
20425a5d2b Deleted deprecated structure declarations 2016-07-19 15:48:56 +02:00
56e4848ebd The predicates associated with a view are now described in its comments
field
2016-07-19 15:31:21 +02:00
8850e40b6e Minor changes for better presentation 2016-07-19 15:30:17 +02:00
b89af38109 Goes with 38718320 2016-07-18 13:57:49 +02:00
38718320f9 First version for the association of one column to another. Closes #55 2016-07-15 15:38:49 +02:00
8ee85c3005 A first version of predicate functions that are checked when a new view
is saved and closed
2016-07-12 14:54:11 +02:00
000b9999ad Merge branch 'master' of git@git.metabarcoding.org:obitools/obitools3.git 2016-07-03 09:22:22 +02:00
aff9831c13 Replace fprintf call with fputs call to conform with the new ubuntu
compilation rules
2016-07-03 09:21:56 +02:00
448fa8d325 first trial for a fasta formatter 2016-07-03 09:18:52 +02:00
6af62d8124 Change a fprintf without argument to a fputs to comply with the new
default parameter on ubuntu
2016-07-03 08:25:06 +02:00
0869b9ba3f Closes issue #47 by storing each view in a separate file named with the
view's name and created upon view creation.
2016-06-30 11:41:30 +02:00
ad2af0b512 Some comments updated 2016-06-16 11:26:54 +02:00
38e603ed57 Deleted some redundant cython code 2016-06-10 10:34:47 +02:00
f438c3d913 OBIQUAL columns can now handle multiple elements per line 2016-06-09 15:54:36 +02:00
2a1ea3ba3f Setting NA values is now handled properly for OBI_SEQ, OBI_STR and
OBI_QUAL columns
2016-06-09 14:22:36 +02:00
fc3641d7ff Read-only AVLs are now hard-linked instead of copied when cloning an AVL
group to make it writable. Also fixed several bugs when handling AVL
groups.
2016-06-03 19:02:46 +02:00
799b942017 Deleted old debugging print 2016-06-03 18:57:32 +02:00
6e3f5b230e Fixed typo in doc 2016-06-03 18:56:45 +02:00
2f57f80c63 Fixed a bug where an unmapped variable would be read 2016-06-03 18:55:58 +02:00
2962c4d250 Goes with previous commit 2016-06-03 18:54:25 +02:00
69bf7ec2e7 NA value for OBI_STR and OBI_SEQ columns is now NULL 2016-06-03 18:53:22 +02:00
bac7ce7184 Start of the implementation of the export methods 2016-06-02 19:10:33 +02:00
f186395661 Trap potential exception generated by char* to bytes casts 2016-05-29 21:18:20 +02:00
85395dfc1a value returned for sequence is now bytes and no longer str 2016-05-29 13:53:32 +02:00
f830389974 Add some comment on the location of the align method. 2016-05-29 12:58:31 +02:00
2e35229357 Add conversion checking on the value of a seq column 2016-05-29 12:54:13 +02:00
a8ed57dc6e few small changes 2016-05-21 12:29:55 +02:00
c3274d419c remove an extra debug log 2016-05-21 12:29:08 +02:00
cca0dbb46b Close issue #54 by adding a read1 method to the MagicKeyFile class 2016-05-21 12:24:48 +02:00
5a78157112 increase parsing speed of the header 2016-05-21 10:29:11 +02:00
0b9a41d952 Patch a bug about the reading of the last sequence 2016-05-21 10:28:03 +02:00
e681ca646d Fixed a problem with some columns being shorter in views and triggering
errors when trying to get values. Temporary fix that needs discussion
2016-05-20 18:45:29 +02:00
3b59043ea8 Major update: New column type to store sequence qualities. Closes #41 2016-05-20 16:45:22 +02:00
ffff91e76c Fixed variable name that had been accidentally changed for better
clarity
2016-05-18 13:27:41 +02:00
6a8df069ad Indexers are now cloned if needed to modify them after they've been
closed. Obligatory indexers' names now follow the same pattern as other
indexers (columnname_version). Closes #46 and #49.
2016-05-18 13:23:48 +02:00
8ae7644945 First version of quality handling (not working yet) and now it is
checked that a column is writable before enlarging it
2016-05-11 16:38:14 +02:00
b3c47809da First version of alignment functions (imported from suma* programs) 2016-05-11 16:36:23 +02:00
3567681339 Now when a column is added to a view, if there is a line selection, all
columns in the view are cloned first
2016-05-11 16:34:20 +02:00
757ef8509a Deleting CeCILL license duplicates 2016-05-09 11:17:45 +02:00
f961621f5d Minor improvements in _obidms Cython layer 2016-05-04 13:43:26 +02:00
bc12360490 Reworked and commented the cython layer for dms, columns and views a bit 2016-05-02 15:16:06 +02:00
872071b104 Removed a list of column pointers kept in the OBIView class that was not
really needed
2016-05-02 14:23:42 +02:00
32cc8968e8 Adding CeCILL license 2016-05-02 11:51:59 +02:00
d6481f0db8 Merge branch 'master' of git@git.metabarcoding.org:obitools/obitools3.git 2016-04-29 17:46:59 +02:00
a32920e401 Relative paths when creating or opening a DMS now work 2016-04-29 17:46:36 +02:00
31cf27d676 Added indexer function that returns the name of the indexer 2016-04-29 16:18:56 +02:00
baba2d742e commenting _obidms.pyx 2016-04-29 16:07:03 +02:00
5bd12079ae Added comments about listing columns and indexers in obidms functions 2016-04-29 16:06:01 +02:00
072ee5ac03 Re-re-fixed line breaks in README file 2016-04-29 15:44:40 +02:00
9fe21316ff Refixed line breaks in README file 2016-04-29 15:39:46 +02:00
3dc3aaa46b Fixed line breaks in README file 2016-04-29 15:36:58 +02:00
b371030edd Adding README file 2016-04-29 15:35:08 +02:00
b3976fa461 Merge branch 'luke_tests' 2016-04-28 11:17:24 +02:00
6ea2cfb9ca Merging luke_tests branch without the commit turning inline functions in macros 2016-04-28 11:17:18 +02:00
0eca86107e Pseudo obihead for tests 2016-04-27 14:27:28 +02:00
0de953a3ef pseudo obigrep for tests 2016-04-27 14:19:55 +02:00
f3b20b809d Fixed bug with indexer names being defined and generating seg fault if
creating a column not using indexers
2016-04-27 14:01:36 +02:00
d159b921eb Fixed obi import trying to print all lines at the end (source of
segfault?)
2016-04-27 13:14:19 +02:00
4e4cf46b16 Added all C files as source files for all cython files to stop having
that kind of problem with linux systems
2016-04-27 10:44:24 +02:00
6b61533650 Added more C source files for _obiseq 2016-04-27 10:41:00 +02:00
419885485b Added files in _obitaxo C sources for cython 2016-04-27 10:30:16 +02:00
0c8504b6db Commented #ifdef directive for detect_bucket_size function because it
causes errors
2016-04-27 10:24:40 +02:00
654c34a1a6 changed inline functions to macros to make it work on Luke 2016-04-26 15:40:12 +02:00
2d8c06f7b7 Fixed variable initialization for error detection 2016-04-26 14:38:46 +02:00
a6c8d35491 import command a bit modified for tests 2016-04-26 14:29:54 +02:00
366264828e Renamed MurmurHash2.c file to murmurhash2.c as it could be a problem 2016-04-26 14:29:17 +02:00
d3a6ff6043 Removed deprecated code 2016-04-26 14:27:16 +02:00
5ca84b91dc Merge branch 'master' of git@git.metabarcoding.org:obitools/obitools3.git 2016-04-25 18:35:57 +02:00
87935c6678 Fixed all compilation problems with new function names, locations etc 2016-04-25 18:35:02 +02:00
92980508c0 Made the function to clone a column in the context of a view private 2016-04-25 18:15:25 +02:00
65880db422 Made function to update the line count of a view private 2016-04-25 18:11:37 +02:00
767d9c7804 Reordered view functions for better coherence 2016-04-25 18:07:58 +02:00
2566377e2a Updated the documentation for utils functions 2016-04-25 18:02:58 +02:00
1fbbdd43f9 Updated obiversion_t declaration 2016-04-25 17:58:37 +02:00
8cdfbb379e Documentation for views and reworked the code a little 2016-04-25 17:58:12 +02:00
0a55e26520 Reworked obiview code and added more comments 2016-04-25 11:37:53 +02:00
68a8509c12 Updated documentation in obitypes.h 2016-04-25 10:33:01 +02:00
5f98d2ed5c Fixed the calculation of the size of data for OBI_STR and OBI_SEQ
columns
2016-04-25 10:26:51 +02:00
ef1be141c1 Update license to English version 2016-04-23 18:03:50 +02:00
bbfd40d56d Add license 2016-04-23 18:03:10 +02:00
5d08da46a2 Updated the documentation in obidmscolumn.h 2016-04-22 17:55:53 +02:00
66045acf1d Creating a column now uses the function to create the indexer name if
one was not provided
2016-04-22 17:47:00 +02:00
6977c4315c Improved function to build an indexer name 2016-04-22 17:38:23 +02:00
839b3000a8 Added a function to build indexer names 2016-04-22 17:08:23 +02:00
ffa4557928 changed MAP_PRIVATE flags to MAP_SHARED when opening a column because it
seems a lot more efficient
2016-04-22 16:26:24 +02:00
003cd11362 Fixed initialization of NA values for OBI_STR and OBI_SEQ columns 2016-04-22 16:14:23 +02:00
c87227b65a Uncommented an error message that doesn't need to be commented anymore 2016-04-22 16:11:56 +02:00
c07e75f2ac Updated the documentation for OBI_STR columns 2016-04-22 15:59:32 +02:00
6b394a5cf7 Updated the documentation for OBI_SEQ columns 2016-04-22 15:58:20 +02:00
2416b8ccd8 Deleted more unused inclusions in OBI_STR and OBI_SEQ column types code 2016-04-22 15:56:09 +02:00
b9921e111d Removed unused inclusions and definitions in all column types code 2016-04-22 15:50:19 +02:00
8f5aa8841d Removed unused definition in OBI_IDX columns code 2016-04-22 15:44:30 +02:00
900d67de87 Updated the documentation for columns with the type OBI_IDX 2016-04-22 15:43:39 +02:00
22e3c3eeed Updated the documentation for obidms functions 2016-04-22 11:28:09 +02:00
4ead37ee48 Finished moving obiblob functions to obiblob files and documentation for
obiblob functions
2016-04-21 15:18:14 +02:00
bce360bbd5 Documentation for obiblob indexer API 2016-04-21 15:08:40 +02:00
2a68cb26f8 Improved AVL tree documentation 2016-04-21 15:07:27 +02:00
043e70ff49 Updated AVL documentation 2016-04-21 14:39:03 +02:00
66021367f6 Moved some blob functions to obiblob.c 2016-04-21 14:20:26 +02:00
e69f44ae3d Little annotations for the murmur hash function. 2016-04-21 13:53:29 +02:00
1941a3785e Updated encode functions documentation 2016-04-21 13:46:02 +02:00
c7b8db6a2e Replaced malloc+memset with calloc 2016-04-21 13:45:39 +02:00
1dc4a3be49 Documentation for DNA sequence indexing functions 2016-04-21 13:36:51 +02:00
09597016fd Short doc for crc function 2016-04-21 13:23:52 +02:00
1a2fa0923c Documented the functions indexing and retrieving character strings 2016-04-21 11:35:21 +02:00
00f2f2cc51 Documented changes made in bloom functions 2016-04-21 11:22:31 +02:00
7a88ca619a First obi import (doesn't import tags yet because NA values aren't
handled)
2016-04-15 17:00:08 +02:00
eddd19a245 Changes in obi commands 2016-04-15 16:59:21 +02:00
2aafecc3b5 Changed sequence 'description' to 'definition' everywhere 2016-04-15 16:31:43 +02:00
094b2371e9 Deleted obsolete directory 2016-04-15 14:44:31 +02:00
c1034d300d merging and fixed git conflict with obiavl.h 2016-04-15 13:23:29 +02:00
02d67c257f The default name of an AVL is now the column name + '_indexer', and when
an AVL is opened (as opposed to created), it is read-only
2016-04-15 12:55:26 +02:00
e04ea85d1e Fixed problematic __str__ method and useless declarations in the
OBI_Nuc_Seq_Stored class
2016-04-15 11:22:05 +02:00
527d3555f0 Moved the functions getting full paths for files and directories to
obidms.c/.h files
2016-04-15 11:11:13 +02:00
71492ad229 Made the handling of listing and unlisting opened columns and indexers
functions in the obidms files.
2016-04-15 10:49:12 +02:00
73d64e5aff Renamed 'unmap_header' function to 'close_header' 2016-04-14 15:19:27 +02:00
4cb52e1632 Made the truncating of columns automatic when closing them (note:
already the case for AVLs)
2016-04-14 15:13:30 +02:00
9d042f7bd0 Refactored and relocated the set and get functions of all column types,
both within and out of the context of a view
2016-04-13 15:10:24 +02:00
5ec2d8842e Character string indexer API 2016-04-12 17:21:01 +02:00
04c9470f7d Fixed and cleaned DNA_seq_indexer API 2016-04-12 17:20:24 +02:00
be05c889e2 DNA_seq_indexer API 2016-04-12 16:38:47 +02:00
04e3a7b5a9 Added more references in cython .cfiles files because it seems necessary
for linux distributions
2016-04-12 15:10:54 +02:00
d8107533d8 Obiblob_indexer API 2016-04-12 14:53:33 +02:00
cd4e65e190 Fixed typo and includes in obiblob files 2016-04-12 14:52:27 +02:00
375bfcce8a Renamed "Obi_byte_arrays" to "Obiblobs" and moved Obiblob functions to
separate obiblob.c and obiblob.h files
2016-04-12 11:21:14 +02:00
c225cfd8b6 Fixed bug with retrieval of values from AVLs (bad cast in byte array
structure)
2016-04-11 17:07:22 +02:00
6fe4c6134a Allows for calling getConfiguration without parameter for getting the
default configuration
2016-04-11 13:31:09 +02:00
966b1325ed Deleted declaration of obsolete public function 2016-04-11 11:14:20 +02:00
019dfc01b4 Branch to refactor and debug (AVLs bugged) 2016-04-08 15:38:57 +02:00
45c9c5075c A first version of the fasta parser 2016-04-01 18:15:54 +02:00
20b97c972b Add boolean type in the tag evaluation 2016-04-01 13:42:24 +02:00
efc4a4a3c6 Reduce the call count to eval. This reduces by 3 the time of fast(q|a)
header processing
2016-04-01 08:54:06 +02:00
ce6ea89c21 Add the missing bootstrappip module 2016-03-31 17:28:03 +02:00
4207db7c17 Transfers bug patch from orgasm 2016-03-31 16:53:09 +02:00
1cd35b3359 first version of a fastq parser 2016-03-31 10:47:12 +02:00
f51a6df5b2 Add a class buffering lines during a text file reading 2016-03-30 14:53:25 +02:00
94417e1330 patch the uncompress module to be able to deal with remote file 2016-03-29 20:57:39 +02:00
2e17dbce55 Adds a uopen function able to open transparently a local or a remote
file compressed or not
2016-03-29 20:56:54 +02:00
a9eed1f5d9 Adds class for uncompressing transparently compressed files on line 2016-03-29 18:21:04 +02:00
2dfab3f378 Some changes in relation with the new obitools3.apps module 2016-03-28 15:05:59 +02:00
e583098a96 change in the obi programme according to the new obitools3.apps module
creation
2016-03-28 15:05:02 +02:00
b926ca3997 A template for a command 2016-03-28 15:04:06 +02:00
aacfefad26 A set of utility functions for creating commands 2016-03-28 15:03:26 +02:00
edc4fd7b3e Fixed minor warning 2016-03-25 16:11:52 +01:00
ff6c27acf2 Implemented the retrieval of values with groups of AVLs 2016-03-25 15:35:16 +01:00
69856f18dd untested (and no possible retrieval) of CRC used to represent data in
AVL trees
2016-03-24 16:38:11 +01:00
2c084c8cf7 Switch to 10000000 per avl 2016-03-23 16:13:28 +01:00
58ac860cc7 Added macro for the bloom filter parameters and deleted old unused
macros for crc
2016-03-23 13:33:40 +01:00
d44117d625 obiimport function for testing purposes 2016-03-23 13:00:02 +01:00
6bd42132c4 Minor fixes to silence warnings and replaced two asprintf uses 2016-03-23 12:58:53 +01:00
4085904362 Merge branch 'multiple_avls_bloom' 2016-03-22 14:14:10 +01:00
b04b4b5902 made POSIX compliant 2016-03-21 11:33:06 +01:00
383e738ab7 Merge branch 'master' of git@git.metabarcoding.org:obitools/obitools3.git 2016-03-18 15:49:53 +01:00
3681cecb4d Multiple AVLs with bloom filters (very raw test version) 2016-03-18 11:06:02 +01:00
545ed8111a Code for tests storing data in multiple AVLs.
(note: unretrievable data as implemented)
2016-03-11 15:34:55 +01:00
86071d30c9 Minor improvement in AVL initial size calculation 2016-03-11 14:07:40 +01:00
21d1b2ed3e First implementation of taxonomy reading 2016-03-11 13:56:38 +01:00
6157633137 prototype for the obi unix command and the count sub command 2016-03-08 16:06:00 +01:00
a08def47e6 It is now impossible to create a view with a name identical to one of an
existing written view
2016-03-01 13:36:54 +01:00
fc5a12bad7 Closes #34 2016-02-29 17:56:55 +01:00
e323d8e702 Cython classes for nucleotide sequences (outside or in the context of a
view)
2016-02-29 16:33:30 +01:00
b350ea0393 Fixed minor error 2016-02-29 16:28:34 +01:00
8e9e21a02e Increased the maximum depth of AVL trees 2016-02-29 16:27:23 +01:00
4df313c54a Added Obiviews specialized for the handling of nucleotide sequences 2016-02-25 09:43:27 +01:00
ffc68d448f Deleted a forgotten print statement 2016-02-18 15:15:42 +01:00
a8f03248a8 Major update : views 2016-02-18 10:38:51 +01:00
cfaf069095 Fixed more typos and formatting imperfections. 2015-12-11 17:37:25 +01:00
a6144eabe2 Fixed typos 2015-12-11 17:26:20 +01:00
c139367555 DNA sequences and character strings are now handled using AVL trees. 2015-12-11 17:24:44 +01:00
1586956d57 Added the lists of opened columns and arrays in the OBIDMS structure,
and a counter in the OBIDMS column structure; fixed some bugs and
created tests for referring columns that are bound to disappear anyway.
2015-12-02 17:32:07 +01:00
b45b496b0e Major update: new type of columns containing indices referring to lines
in other columns
2015-11-29 11:57:07 +01:00
2cf10cb6f0 Column type is now passed as a character string when creating the column
(either 'OBI_INT', 'OBI_FLOAT', 'OBI_BOOL', 'OBI_CHAR', 'OBI_STR' or
'OBI_SEQ')
2015-11-23 15:48:27 +01:00
5a5516303d deleting useless .pyc files 2015-11-23 14:43:34 +01:00
d6a99bafea Fixed a major bug with the versioning of columns that was introduced in
f6ec8ba9
2015-11-23 13:34:51 +01:00
08f2657e18 Increased maximum line count of columns to 1^9 2015-11-23 13:23:18 +01:00
6aa2f92930 DNA sequences are now encoded on 4 bits when they are in IUPAC 2015-11-20 15:32:09 +01:00
87044b41d8 modified the encoding function on 2 bits a little 2015-11-20 11:32:47 +01:00
6ab1c83302 New column type for DNA sequences. Only for those coded on 2 bits (only
'ATGCatgc') for now.
2015-11-19 18:12:48 +01:00
e371248567 changed version to 0.0.0 2015-11-19 18:11:21 +01:00
dbf9463238 The endianness of a DMS is now stored in the OBIDMS structure 2015-11-18 15:35:09 +01:00
eb12af4da4 Fixed minor error in the documentation of a function. 2015-11-16 15:38:01 +01:00
e8417b4f6f The endianness of an OBIDMS is now stored in an information file that
is read when opening the OBIDMS.
2015-11-16 14:37:51 +01:00
6579566c6e Minor changes in code to improve readability and fix C compilation
warnings
2015-11-10 14:37:58 +01:00
410e2e02a0 When retrieving the header of a column, the version number of the column
wanted can now be provided.
2015-11-10 13:30:10 +01:00
8ce4f264aa When enlarging a column, the function doesn't try anymore to keep the
mapped region at the same pointer (never works), and unmap/remap
instead.
2015-11-10 13:18:36 +01:00
d885eb48ff The header size when creating a column is now calculated according to
the size of the header structure and the page size of the platform.
2015-11-10 13:09:30 +01:00
661fe3606a In OBI_CHAR columns, characters are now given and retrieved as decoded
(unicode) characters.
2015-11-10 11:24:08 +01:00
c4b7e579cf Comments in column headers are now working. 2015-11-10 10:56:45 +01:00
f6ec8ba963 The header size is now directly read in the file when a column or an
array is opened.
2015-11-09 17:50:32 +01:00
0e3d6ed2d7 Methods __len__ (number of lines used) and __sizeof__ (total size in
bytes) implemented for columns.
2015-11-09 15:56:20 +01:00
01bfc14503 The data size in bytes is now stored in the header of a column. 2015-11-09 15:55:00 +01:00
65c1b1e8b2 Minor changes to make the creation of files and directories cleaner 2015-11-09 15:22:01 +01:00
b37bd8f21c File descriptors for dms, column and array directories are now stored in
structures.
2015-11-09 15:06:02 +01:00
05e3956a0c Minor changes in code to improve readability (freeing some character
strings earlier)
2015-11-09 11:22:51 +01:00
9b066f4327 Major update: obiarrays with columns containing indices referring to
character strings.
2015-11-06 17:55:15 +01:00
456551ffeb obi arrays that don't work because of cython bug passing wrong pointers 2015-11-03 14:22:00 +01:00
ecb9d97adb Reorganized the code to have less functions, and the functions to get
and format the creation date of a column are now working.
2015-10-15 15:12:45 +02:00
0eaa5aa784 Major changes : new cython subclasses to handle columns with multiple
elements per line in a more efficient way + now elements_names are
passed as a list + new function to recover only the header of a column
2015-10-14 18:05:34 +02:00
21923e213d The unit tests now test for None values 2015-10-12 18:02:40 +02:00
6877fc4892 Fixed a critical bug where values were initialized to NA at the wrong
location when there were multiple elements per line
2015-10-12 17:54:36 +02:00
dbed3d9d1d New module for unit testing with PyUnit 2015-10-09 15:42:57 +02:00
fc8bf16769 Fixed a critical bug in the computation of the new number of lines of a
column when truncating
2015-10-09 13:49:48 +02:00
e114a3c9cb fixed a critical bug where data size was not calculated correctly and
column directory is now closed when column is closed
2015-10-09 10:25:40 +02:00
ebc9f6f512 fixed a bug where Cython was casting doubles in floats 2015-10-08 15:28:30 +02:00
2b3f03ec28 Removed deprecated script 2015-10-08 10:46:46 +02:00
8fd9c06be2 Fixed missing file for documentation compilation 2015-10-08 10:45:54 +02:00
b553eef781 Method to close a DMS is uncommented but not complete yet (columns have
to be closed separately)
2015-10-08 10:44:13 +02:00
ee4c513fd4 Fixed a bug where cloning a column would fail if the data was empty 2015-10-08 10:36:02 +02:00
c013e6ad33 fixed typo in doxygen doc 2015-10-08 10:33:19 +02:00
c98d567e2f Updated the documentation and restructured a bit because it wasn't
compiling (note: Breathe not working)
2015-10-06 11:09:01 +02:00
392f110c8d new functions in the OBIDMS_column class to raise NotImplementedError
exceptions and to get the creation date of a column
2015-10-02 13:51:26 +02:00
6ced3c4896 new functions to get the creation date of a column 2015-10-02 13:47:53 +02:00
4b8bf41a71 closes #13, obi_errno is initialized to 0 2015-10-02 13:46:34 +02:00
c59a244e9d Fixed little typo 2015-09-30 12:07:13 +02:00
4b7f2d268b Doxygen documentation corrected and completed. 2015-09-30 12:03:46 +02:00
389 changed files with 63325 additions and 5040 deletions

LICENSE Executable file

@ -0,0 +1,518 @@
CeCILL FREE SOFTWARE LICENSE AGREEMENT
Version 2.1 dated 2013-06-21
Notice
This Agreement is a Free Software license agreement that is the result
of discussions between its authors in order to ensure compliance with
the two main principles guiding its drafting:
* firstly, compliance with the principles governing the distribution
of Free Software: access to source code, broad rights granted to users,
* secondly, the election of a governing law, French law, with which it
is conformant, both as regards the law of torts and intellectual
property law, and the protection that it offers to both authors and
holders of the economic rights over software.
The authors of the CeCILL (for Ce[a] C[nrs] I[nria] L[ogiciel] L[ibre])
license are:
Commissariat à l'énergie atomique et aux énergies alternatives - CEA, a
public scientific, technical and industrial research establishment,
having its principal place of business at 25 rue Leblanc, immeuble Le
Ponant D, 75015 Paris, France.
Centre National de la Recherche Scientifique - CNRS, a public scientific
and technological establishment, having its principal place of business
at 3 rue Michel-Ange, 75794 Paris cedex 16, France.
Institut National de Recherche en Informatique et en Automatique -
Inria, a public scientific and technological establishment, having its
principal place of business at Domaine de Voluceau, Rocquencourt, BP
105, 78153 Le Chesnay cedex, France.
Preamble
The purpose of this Free Software license agreement is to grant users
the right to modify and redistribute the software governed by this
license within the framework of an open source distribution model.
The exercising of this right is conditional upon certain obligations for
users so as to preserve this status for all subsequent redistributions.
In consideration of access to the source code and the rights to copy,
modify and redistribute granted by the license, users are provided only
with a limited warranty and the software's author, the holder of the
economic rights, and the successive licensors only have limited liability.
In this respect, the risks associated with loading, using, modifying
and/or developing or reproducing the software by the user are brought to
the user's attention, given its Free Software status, which may make it
complicated to use, with the result that its use is reserved for
developers and experienced professionals having in-depth computer
knowledge. Users are therefore encouraged to load and test the
suitability of the software as regards their requirements in conditions
enabling the security of their systems and/or data to be ensured and,
more generally, to use and operate it in the same conditions of
security. This Agreement may be freely reproduced and published,
provided it is not altered, and that no provisions are either added or
removed herefrom.
This Agreement may apply to any or all software for which the holder of
the economic rights decides to submit the use thereof to its provisions.
Frequently asked questions can be found on the official website of the
CeCILL licenses family (http://www.cecill.info/index.en.html) for any
necessary clarification.
Article 1 - DEFINITIONS
For the purpose of this Agreement, when the following expressions
commence with a capital letter, they shall have the following meaning:
Agreement: means this license agreement, and its possible subsequent
versions and annexes.
Software: means the software in its Object Code and/or Source Code form
and, where applicable, its documentation, "as is" when the Licensee
accepts the Agreement.
Initial Software: means the Software in its Source Code and possibly its
Object Code form and, where applicable, its documentation, "as is" when
it is first distributed under the terms and conditions of the Agreement.
Modified Software: means the Software modified by at least one
Contribution.
Source Code: means all the Software's instructions and program lines to
which access is required so as to modify the Software.
Object Code: means the binary files originating from the compilation of
the Source Code.
Holder: means the holder(s) of the economic rights over the Initial
Software.
Licensee: means the Software user(s) having accepted the Agreement.
Contributor: means a Licensee having made at least one Contribution.
Licensor: means the Holder, or any other individual or legal entity, who
distributes the Software under the Agreement.
Contribution: means any or all modifications, corrections, translations,
adaptations and/or new functions integrated into the Software by any or
all Contributors, as well as any or all Internal Modules.
Module: means a set of sources files including their documentation that
enables supplementary functions or services in addition to those offered
by the Software.
External Module: means any or all Modules, not derived from the
Software, so that this Module and the Software run in separate address
spaces, with one calling the other when they are run.
Internal Module: means any or all Module, connected to the Software so
that they both execute in the same address space.
GNU GPL: means the GNU General Public License version 2 or any
subsequent version, as published by the Free Software Foundation Inc.
GNU Affero GPL: means the GNU Affero General Public License version 3 or
any subsequent version, as published by the Free Software Foundation Inc.
EUPL: means the European Union Public License version 1.1 or any
subsequent version, as published by the European Commission.
Parties: mean both the Licensee and the Licensor.
These expressions may be used both in singular and plural form.
Article 2 - PURPOSE
The purpose of the Agreement is the grant by the Licensor to the
Licensee of a non-exclusive, transferable and worldwide license for the
Software as set forth in Article 5 <#scope> hereinafter for the whole
term of the protection granted by the rights over said Software.
Article 3 - ACCEPTANCE
3.1 The Licensee shall be deemed as having accepted the terms and
conditions of this Agreement upon the occurrence of the first of the
following events:
* (i) loading the Software by any or all means, notably, by
downloading from a remote server, or by loading from a physical medium;
* (ii) the first time the Licensee exercises any of the rights granted
hereunder.
3.2 One copy of the Agreement, containing a notice relating to the
characteristics of the Software, to the limited warranty, and to the
fact that its use is restricted to experienced users has been provided
to the Licensee prior to its acceptance as set forth in Article 3.1
<#accepting> hereinabove, and the Licensee hereby acknowledges that it
has read and understood it.
Article 4 - EFFECTIVE DATE AND TERM
4.1 EFFECTIVE DATE
The Agreement shall become effective on the date when it is accepted by
the Licensee as set forth in Article 3.1 <#accepting>.
4.2 TERM
The Agreement shall remain in force for the entire legal term of
protection of the economic rights over the Software.
Article 5 - SCOPE OF RIGHTS GRANTED
The Licensor hereby grants to the Licensee, who accepts, the following
rights over the Software for any or all use, and for the term of the
Agreement, on the basis of the terms and conditions set forth hereinafter.
Besides, if the Licensor owns or comes to own one or more patents
protecting all or part of the functions of the Software or of its
components, the Licensor undertakes not to enforce the rights granted by
these patents against successive Licensees using, exploiting or
modifying the Software. If these patents are transferred, the Licensor
undertakes to have the transferees subscribe to the obligations set
forth in this paragraph.
5.1 RIGHT OF USE
The Licensee is authorized to use the Software, without any limitation
as to its fields of application, with it being hereinafter specified
that this comprises:
1. permanent or temporary reproduction of all or part of the Software
by any or all means and in any or all form.
2. loading, displaying, running, or storing the Software on any or all
medium.
3. entitlement to observe, study or test its operation so as to
determine the ideas and principles behind any or all constituent
elements of said Software. This shall apply when the Licensee
carries out any or all loading, displaying, running, transmission or
storage operation as regards the Software, that it is entitled to
carry out hereunder.
5.2 ENTITLEMENT TO MAKE CONTRIBUTIONS
The right to make Contributions includes the right to translate, adapt,
arrange, or make any or all modifications to the Software, and the right
to reproduce the resulting software.
The Licensee is authorized to make any or all Contributions to the
Software provided that it includes an explicit notice that it is the
author of said Contribution and indicates the date of the creation thereof.
5.3 RIGHT OF DISTRIBUTION
In particular, the right of distribution includes the right to publish,
transmit and communicate the Software to the general public on any or
all medium, and by any or all means, and the right to market, either in
consideration of a fee, or free of charge, one or more copies of the
Software by any means.
The Licensee is further authorized to distribute copies of the modified
or unmodified Software to third parties according to the terms and
conditions set forth hereinafter.
5.3.1 DISTRIBUTION OF SOFTWARE WITHOUT MODIFICATION
The Licensee is authorized to distribute true copies of the Software in
Source Code or Object Code form, provided that said distribution
complies with all the provisions of the Agreement and is accompanied by:
1. a copy of the Agreement,
2. a notice relating to the limitation of both the Licensor's warranty
and liability as set forth in Articles 8 and 9,
and that, in the event that only the Object Code of the Software is
redistributed, the Licensee allows effective access to the full Source
Code of the Software for a period of at least three years from the
distribution of the Software, it being understood that the additional
acquisition cost of the Source Code shall not exceed the cost of the
data transfer.
5.3.2 DISTRIBUTION OF MODIFIED SOFTWARE
When the Licensee makes a Contribution to the Software, the terms and
conditions for the distribution of the resulting Modified Software
become subject to all the provisions of this Agreement.
The Licensee is authorized to distribute the Modified Software, in
source code or object code form, provided that said distribution
complies with all the provisions of the Agreement and is accompanied by:
1. a copy of the Agreement,
2. a notice relating to the limitation of both the Licensor's warranty
and liability as set forth in Articles 8 and 9,
and, in the event that only the object code of the Modified Software is
redistributed,
3. a note stating the conditions of effective access to the full source
code of the Modified Software for a period of at least three years
from the distribution of the Modified Software, it being understood
that the additional acquisition cost of the source code shall not
exceed the cost of the data transfer.
5.3.3 DISTRIBUTION OF EXTERNAL MODULES
When the Licensee has developed an External Module, the terms and
conditions of this Agreement do not apply to said External Module, that
may be distributed under a separate license agreement.
5.3.4 COMPATIBILITY WITH OTHER LICENSES
The Licensee can include a code that is subject to the provisions of one
of the versions of the GNU GPL, GNU Affero GPL and/or EUPL in the
Modified or unmodified Software, and distribute that entire code under
the terms of the same version of the GNU GPL, GNU Affero GPL and/or EUPL.
The Licensee can include the Modified or unmodified Software in a code
that is subject to the provisions of one of the versions of the GNU GPL,
GNU Affero GPL and/or EUPL and distribute that entire code under the
terms of the same version of the GNU GPL, GNU Affero GPL and/or EUPL.
Article 6 - INTELLECTUAL PROPERTY
6.1 OVER THE INITIAL SOFTWARE
The Holder owns the economic rights over the Initial Software. Any or
all use of the Initial Software is subject to compliance with the terms
and conditions under which the Holder has elected to distribute its work
and no one shall be entitled to modify the terms and conditions for the
distribution of said Initial Software.
The Holder undertakes that the Initial Software will remain ruled at
least by this Agreement, for the duration set forth in Article 4.2 <#term>.
6.2 OVER THE CONTRIBUTIONS
The Licensee who develops a Contribution is the owner of the
intellectual property rights over this Contribution as defined by
applicable law.
6.3 OVER THE EXTERNAL MODULES
The Licensee who develops an External Module is the owner of the
intellectual property rights over this External Module as defined by
applicable law and is free to choose the type of agreement that shall
govern its distribution.
6.4 JOINT PROVISIONS
The Licensee expressly undertakes:
1. not to remove, or modify, in any manner, the intellectual property
notices attached to the Software;
2. to reproduce said notices, in an identical manner, in the copies of
the Software modified or not.
The Licensee undertakes not to directly or indirectly infringe the
intellectual property rights on the Software of the Holder and/or
Contributors, and to take, where applicable, vis-à-vis its staff, any
and all measures required to ensure respect of said intellectual
property rights of the Holder and/or Contributors.
Article 7 - RELATED SERVICES
7.1 Under no circumstances shall the Agreement oblige the Licensor to
provide technical assistance or maintenance services for the Software.
However, the Licensor is entitled to offer this type of services. The
terms and conditions of such technical assistance, and/or such
maintenance, shall be set forth in a separate instrument. Only the
Licensor offering said maintenance and/or technical assistance services
shall incur liability therefor.
7.2 Similarly, any Licensor is entitled to offer to its licensees, under
its sole responsibility, a warranty, that shall only be binding upon
itself, for the redistribution of the Software and/or the Modified
Software, under terms and conditions that it is free to decide. Said
warranty, and the financial terms and conditions of its application,
shall be subject of a separate instrument executed between the Licensor
and the Licensee.
Article 8 - LIABILITY
8.1 Subject to the provisions of Article 8.2, the Licensee shall be
entitled to claim compensation for any direct loss it may have suffered
from the Software as a result of a fault on the part of the relevant
Licensor, subject to providing evidence thereof.
8.2 The Licensor's liability is limited to the commitments made under
this Agreement and shall not be incurred as a result of in particular:
(i) loss due the Licensee's total or partial failure to fulfill its
obligations, (ii) direct or consequential loss that is suffered by the
Licensee due to the use or performance of the Software, and (iii) more
generally, any consequential loss. In particular the Parties expressly
agree that any or all pecuniary or business loss (i.e. loss of data,
loss of profits, operating loss, loss of customers or orders,
opportunity cost, any disturbance to business activities) or any or all
legal proceedings instituted against the Licensee by a third party,
shall constitute consequential loss and shall not provide entitlement to
any or all compensation from the Licensor.
Article 9 - WARRANTY
9.1 The Licensee acknowledges that the scientific and technical
state-of-the-art when the Software was distributed did not enable all
possible uses to be tested and verified, nor for the presence of
possible defects to be detected. In this respect, the Licensee's
attention has been drawn to the risks associated with loading, using,
modifying and/or developing and reproducing the Software which are
reserved for experienced users.
The Licensee shall be responsible for verifying, by any or all means,
the suitability of the product for its requirements, its good working
order, and for ensuring that it shall not cause damage to either persons
or properties.
9.2 The Licensor hereby represents, in good faith, that it is entitled
to grant all the rights over the Software (including in particular the
rights set forth in Article 5 <#scope>).
9.3 The Licensee acknowledges that the Software is supplied "as is" by
the Licensor without any other express or tacit warranty, other than
that provided for in Article 9.2 <#good-faith> and, in particular,
without any warranty as to its commercial value, its secured, safe,
innovative or relevant nature.
Specifically, the Licensor does not warrant that the Software is free
from any error, that it will operate without interruption, that it will
be compatible with the Licensee's own equipment and software
configuration, nor that it will meet the Licensee's requirements.
9.4 The Licensor does not either expressly or tacitly warrant that the
Software does not infringe any third party intellectual property right
relating to a patent, software or any other property right. Therefore,
the Licensor disclaims any and all liability towards the Licensee
arising out of any or all proceedings for infringement that may be
instituted in respect of the use, modification and redistribution of the
Software. Nevertheless, should such proceedings be instituted against
the Licensee, the Licensor shall provide it with technical and legal
expertise for its defense. Such technical and legal expertise shall be
decided on a case-by-case basis between the relevant Licensor and the
Licensee pursuant to a memorandum of understanding. The Licensor
disclaims any and all liability as regards the Licensee's use of the
name of the Software. No warranty is given as regards the existence of
prior rights over the name of the Software or as regards the existence
of a trademark.
Article 10 - TERMINATION
10.1 In the event of a breach by the Licensee of its obligations
hereunder, the Licensor may automatically terminate this Agreement
thirty (30) days after notice has been sent to the Licensee and has
remained ineffective.
10.2 A Licensee whose Agreement is terminated shall no longer be
authorized to use, modify or distribute the Software. However, any
licenses that it may have granted prior to termination of the Agreement
shall remain valid subject to their having been granted in compliance
with the terms and conditions hereof.
Article 11 - MISCELLANEOUS
11.1 EXCUSABLE EVENTS
Neither Party shall be liable for any or all delay, or failure to
perform the Agreement, that may be attributable to an event of force
majeure, an act of God or an outside cause, such as defective
functioning or interruptions of the electricity or telecommunications
networks, network paralysis following a virus attack, intervention by
government authorities, natural disasters, water damage, earthquakes,
fire, explosions, strikes and labor unrest, war, etc.
11.2 Any failure by either Party, on one or more occasions, to invoke
one or more of the provisions hereof, shall under no circumstances be
interpreted as being a waiver by the interested Party of its right to
invoke said provision(s) subsequently.
11.3 The Agreement cancels and replaces any or all previous agreements,
whether written or oral, between the Parties and having the same
purpose, and constitutes the entirety of the agreement between said
Parties concerning said purpose. No supplement or modification to the
terms and conditions hereof shall be effective as between the Parties
unless it is made in writing and signed by their duly authorized
representatives.
11.4 In the event that one or more of the provisions hereof were to
conflict with a current or future applicable act or legislative text,
said act or legislative text shall prevail, and the Parties shall make
the necessary amendments so as to comply with said act or legislative
text. All other provisions shall remain effective. Similarly, invalidity
of a provision of the Agreement, for any reason whatsoever, shall not
cause the Agreement as a whole to be invalid.
11.5 LANGUAGE
The Agreement is drafted in both French and English and both versions
are deemed authentic.
Article 12 - NEW VERSIONS OF THE AGREEMENT
12.1 Any person is authorized to duplicate and distribute copies of this
Agreement.
12.2 So as to ensure coherence, the wording of this Agreement is
protected and may only be modified by the authors of the License, who
reserve the right to periodically publish updates or new versions of the
Agreement, each with a separate number. These subsequent versions may
address new issues encountered by Free Software.
12.3 Any Software distributed under a given version of the Agreement may
only be subsequently distributed under the same version of the Agreement
or a subsequent version, subject to the provisions of Article 5.3.4
<#compatibility>.
Article 13 - GOVERNING LAW AND JURISDICTION
13.1 The Agreement is governed by French law. The Parties agree to
endeavor to seek an amicable solution to any disagreements or disputes
that may arise during the performance of the Agreement.
13.2 Failing an amicable solution within two (2) months as from their
occurrence, and unless emergency proceedings are necessary, the
disagreements or disputes shall be referred to the Paris Courts having
jurisdiction, by the more diligent Party.

0
MANIFEST.in Normal file → Executable file
View File

40
README.md Executable file
View File

@ -0,0 +1,40 @@
The `OBITools3`: A package for the management of analyses and data in DNA metabarcoding
---------------------------------------------
DNA metabarcoding offers new perspectives for biodiversity research [1]. This approach to ecosystem studies relies heavily on Next-Generation Sequencing (NGS), and consequently requires the ability to process large volumes of data. The `OBITools` package satisfies this requirement thanks to a set of programs specifically designed for analyzing NGS data in a DNA metabarcoding context [2] - <http://metabarcoding.org/obitools>. Their capacity to filter and edit sequences while taking taxonomic annotation into account helps to set up tailor-made analysis pipelines for a broad range of DNA metabarcoding applications, including biodiversity surveys or diet analyses.
**The `OBITools3`.** This new version of the `OBITools` aims to significantly improve storage efficiency and data processing speed. To this end, the `OBITools3` rely on an ad hoc database system that stores all the data a DNA metabarcoding experiment must consider: the sequences, the metadata (describing for instance the samples), the database containing the reference sequences used for the taxonomic annotation, as well as the taxonomic databases. Besides the gain in efficiency, this new structure gives easier access to all the data associated with an experiment.
**Column-oriented storage.** An analysis pipeline corresponds to a succession of commands, each computing one step of the analysis, where the result of command *n* is used by command *n+1*. DNA metabarcoding data can easily be represented in the form of tables, and each command can be regarded as an operation transforming one or several 'input' tables into one or several 'output' tables, which can be used by the next command. Many of the basic operations in a pipeline copy a large part of the input tables to the result tables without modification, and use only a small part of the input data for their calculations. In the original `OBITools`, those tables are kept in the form of annotated sequence files in the FASTA or FASTQ format. This has two consequences: i) keeping the intermediate results of the analysis pipeline means using disk space for a large volume of redundant data, ii) the encoding and decoding of information that is not actually used represents a large part of the processing time. The new database system used by the `OBITools3` (called DMS for Data Management System) relies on column-oriented storage. The columns are immutable and can be assembled in views representing the data tables. This way, the data not modified by a command in an input table can easily be associated with the result table without duplicating any information; and the data not used at all by a command can be associated with the result table without being read. This strategy results in a gain in disk space efficiency by limiting data redundancy, as well as a gain in execution time by limiting data reading, writing and conversion operations. Finally, as a means of optimizing data access, each column is stored in a binary file directly mapped in memory for reading and writing operations.
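To make the idea of views sharing immutable columns more concrete, here is a minimal Python sketch. It is only an illustration: the `Column` and `View` classes, the `with_column` method and the `is_rare` column are hypothetical names, not the actual DMS API, and real columns live in memory-mapped binary files rather than in tuples.

```python
from types import MappingProxyType


class Column:
    """Hypothetical immutable column: values are frozen in a tuple."""

    def __init__(self, name, values):
        self.name = name
        self.values = tuple(values)


class View:
    """Hypothetical view: a table assembled from shared, named columns."""

    def __init__(self, columns):
        # Read-only mapping so a view cannot be altered after creation.
        self.columns = MappingProxyType(dict(columns))

    def with_column(self, column):
        """Return a new view with one column added or replaced."""
        updated = dict(self.columns)
        updated[column.name] = column
        return View(updated)


# Input view with two columns.
sequences = Column("sequence", ["acgt", "ttga", "acgt"])
counts = Column("count", [12, 3, 7])
input_view = View({"sequence": sequences, "count": counts})

# An illustrative command that reads only "count" and writes "is_rare".
is_rare = Column("is_rare", [c < 5 for c in input_view.columns["count"].values])
output_view = input_view.with_column(is_rare)

# The untouched "sequence" column is the very same object in both views:
# it was neither read nor copied by the command.
shared = output_view.columns["sequence"] is input_view.columns["sequence"]
print(shared)
```

In this sketch the output view only costs the storage of the new column, which is the property the paragraph above describes.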
**Storage optimization.** DNA metabarcoding data is intrinsically very redundant. For example, the same sequence corresponding to a species will be present several thousand times across all samples. In order to limit the disk space used and make comparison operations more efficient, data in the form of character strings is stored in columns using a complex indexing structure, efficient on millions of values, coupling hash functions, Bloom filters and AVL trees. Finally, DNA sequences are compressed by encoding each nucleotide on two or four bits depending on whether the sequences contain only the four nucleotides (A, C, G, T) or use the IUPAC codes.
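As a rough illustration of the two-bit case, the following Python sketch packs four nucleotides per byte. The code table (`A=0, C=1, G=2, T=3`), the bit order and the function names are assumptions made for this example; the text above does not specify the layout actually used by the `OBITools3`, and the four-bit IUPAC case is not covered here.

```python
# Assumed 2-bit code table; the real DMS layout may differ.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack_2bit(seq):
    """Pack an A/C/G/T-only sequence into bytes, four nucleotides per byte."""
    packed = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq.upper()):
        shift = 6 - 2 * (i % 4)  # first nucleotide goes in the high bits
        packed[i // 4] |= CODES[base] << shift
    return bytes(packed)


def unpack_2bit(packed, length):
    """Recover `length` nucleotides from a 2-bit packed buffer."""
    out = []
    for i in range(length):
        shift = 6 - 2 * (i % 4)
        out.append(BASES[(packed[i // 4] >> shift) & 0b11])
    return "".join(out)


seq = "ACGTTGCA"
packed = pack_2bit(seq)
print(len(seq), len(packed))
```

The sequence length has to be stored alongside the packed bytes, since the last byte may be only partially filled.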
**Saving the data processing history.** All the information used by the `OBITools3` is stored in immutable data structures in the DMS. If a command has to modify a column used as input to produce its result, a new version of that column is created, leaving the initial version intact. This storage system makes it possible to keep, at minimal cost, all the intermediate results produced by the pipeline. Storing metadata that describes all the operations that have produced a view (a result table) in the DMS makes it possible to build a directed hypergraph, where each node corresponds to a view and each arrow to an operation. By retracing the dependency relationships in this hypergraph, it is possible to rebuild *a posteriori* the entire process that has produced a result table.
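Retracing such a history can be sketched as a depth-first walk over recorded operations. The record format below (a list of dictionaries with `command`, `inputs` and `output` keys), the view names and the command names are hypothetical and chosen only for illustration; they do not describe how the DMS actually stores its metadata.

```python
# Each illustrative record names a command, its input views and its output view.
operations = [
    {"command": "import", "inputs": [], "output": "reads"},
    {"command": "import", "inputs": [], "output": "refdb"},
    {"command": "uniq", "inputs": ["reads"], "output": "dereplicated"},
    {"command": "ecotag", "inputs": ["dereplicated", "refdb"], "output": "annotated"},
]


def trace(view, operations):
    """Return the operations needed to rebuild `view`, in execution order."""
    produced_by = {op["output"]: op for op in operations}
    ordered, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        op = produced_by[name]
        # An operation can have several inputs (hence a hypergraph):
        # rebuild every input view before the operation itself.
        for parent in op["inputs"]:
            visit(parent)
        ordered.append(op)

    visit(view)
    return ordered


history = [(op["command"], op["output"]) for op in trace("annotated", operations)]
print(history)
```

Because every view is immutable, replaying the returned operations in order would, under these assumptions, reproduce the requested result table.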
**Tools.** The `OBITools3` offer the same tools as the original `OBITools`. Eventually, new versions of `ecoPrimers` (PCR primer design) [3], `ecoPCR` (*in silico* PCR) [4], as well as `Sumatra` (sequence alignment) and `Sumaclust` (sequence alignment and clustering) [5] will be added, taking advantage of the database structure developed for the `OBITools3`.
**Implementation and availability.** The lower layers managing the DMS as well as all the compute-intensive functions are coded in `C99` for efficiency reasons. A `Cython` (<http://www.cython.org>) object layer allows for a simple but efficient implementation of the `OBITools3` commands in `Python 3.5`. The `OBITools3` are still in development, and the first functional versions are expected for autumn 2016.
**References.**
1. Taberlet P, Coissac E, Hajibabaei M, Rieseberg LH: Environmental DNA. Mol Ecol 2012:1789-1793.
2. Boyer F, Mercier C, Bonin A, Le Bras Y, Taberlet P, Coissac E: OBITools: a Unix-inspired software package for DNA metabarcoding. Mol Ecol Resour 2015:n/a-n/a.
3. Riaz T, Shehzad W, Viari A, Pompanon F, Taberlet P, Coissac E: ecoPrimers: inference of new DNA barcode markers from whole genome sequence analysis. Nucleic Acids Res 2011, 39:e145.
4. Ficetola GF, Coissac E, Zundel S, Riaz T, Shehzad W, Bessière J, Taberlet P, Pompanon F: An in silico approach for the evaluation of DNA barcodes. BMC Genomics 2010, 11:434.
5. Mercier C, Boyer F, Bonin A, Coissac E (2013) SUMATRA and SUMACLUST: fast and exact comparison and clustering of sequences. Available: <http://metabarcoding.org/sumatra> and <http://metabarcoding.org/sumaclust>

0
c-sandbox/obicount/Makefile Normal file → Executable file
View File

0
c-sandbox/obicount/obicount.c Normal file → Executable file
View File

View File

@ -6,12 +6,28 @@ Created on 20 oct. 2012
import os
from distutils import sysconfig
from distutils.core import Command
from distutils.sysconfig import customize_compiler
from distutils.sysconfig import customize_compiler as customize_compiler_ori
from distutils.errors import DistutilsSetupError
from distutils import log
from distutils.ccompiler import show_compilers
def customize_compiler(compiler):
customize_compiler_ori(compiler)
compilername = compiler.compiler[0]
if ("gcc" in compilername or "g++" in compilername):
cc_cmd = ' '.join(compiler.compiler + ['-fopenmp'])
ccshared= ' '.join(x for x in sysconfig.get_config_vars("ccshared") if x is not None)
compiler.set_executables(
compiler=cc_cmd,
compiler_so=cc_cmd + ' ' + ccshared
)
class build_exe(Command):
description = "build an executable -- Abstract command "
@ -80,6 +96,7 @@ class build_exe(Command):
else:
self.extra_compile_args.append('-m%s' % self.sse)
# XXX same as for build_ext -- what about 'self.define' and
# 'self.undef' ?
@ -96,7 +113,7 @@ class build_exe(Command):
dry_run=self.dry_run,
force=self.force)
customize_compiler(self.compiler)
if self.include_dirs is not None:
self.compiler.set_include_dirs(self.include_dirs)
if self.define is not None:

View File

@ -7,107 +7,137 @@ Created on 13 fevr. 2014
from distutils import log
import os
from Cython.Distutils import build_ext as ori_build_ext # @UnresolvedImport
from Cython.Compiler import Options as cython_options # @UnresolvedImport
from distutils import sysconfig
from distutils.errors import DistutilsSetupError
class build_ext(ori_build_ext):
def modifyDocScripts(self):
build_dir_file=open("doc/sphinx/build_dir.txt","w")
print(self.build_lib,file=build_dir_file)
build_dir_file.close()
def initialize_options(self):
ori_build_ext.initialize_options(self) # @UndefinedVariable
self.littlebigman = None
self.built_files = None
def finalize_options(self):
ori_build_ext.finalize_options(self) # @UndefinedVariable
self.set_undefined_options('littlebigman',
('littlebigman', 'littlebigman'))
self.set_undefined_options('build_files',
('files', 'built_files'))
self.cython_c_in_temp = 1
if self.littlebigman =='-DLITTLE_END':
if self.define is None:
self.define=[('LITTLE_END',None)]
else:
self.define.append(('LITTLE_END',None))
def substitute_sources(self,exe_name,sources):
"""
Substitutes source file name starting by an @ by the actual
name of the built file (see --> build_files)
"""
sources = list(sources)
for i in range(len(sources)):
message = "%s :-> %s" % (exe_name,sources[i])
if sources[i][0]=='@':
try:
filename = self.built_files[sources[i][1:]]
except KeyError:
tmpfilename = os.path.join(self.build_temp,sources[i][1:])
if os.path.isfile (tmpfilename):
filename = tmpfilename
else:
raise DistutilsSetupError(
'The %s filename declared in the source '
'files of the program %s have not been '
'built by the installation process' % (sources[i],
exe_name))
sources[i]=filename
log.info("%s changed to %s",message,filename)
else:
log.info("%s ok",message)
return sources
def build_extensions(self):
# First, sanity-check the 'extensions' list
for ext in self.extensions:
ext.sources = self.substitute_sources(ext.name,ext.sources)
def _customize_compiler(compiler):
compilername = compiler.compiler[0]
if ("gcc" in compilername or "g++" in compilername):
cc_cmd = ' '.join(compiler.compiler + ['-fopenmp'])
ccshared= ' '.join(x for x in sysconfig.get_config_vars("ccshared") if x is not None)
self.check_extensions_list(self.extensions)
compiler.set_executables(
compiler=cc_cmd,
compiler_so=cc_cmd + ' ' + ccshared
)
for ext in self.extensions:
log.info("%s :-> %s",ext.name,ext.sources)
ext.sources = self.cython_sources(ext.sources, ext)
self.build_extension(ext)
def run(self):
self.modifyDocScripts()
for cmd_name in self.get_sub_commands():
self.run_command(cmd_name)
cython_options.annotate = True
ori_build_ext.run(self) # @UndefinedVariable
def has_files(self):
return self.distribution.has_files()
def has_executables(self):
return self.distribution.has_executables()
try:
from Cython.Distutils import build_ext as ori_build_ext # @UnresolvedImport
from Cython.Compiler import Options as cython_options # @UnresolvedImport
class build_ext(ori_build_ext):
sub_commands = [('build_files',has_files),
('build_cexe', has_executables)
] + \
ori_build_ext.sub_commands
def modifyDocScripts(self):
try:
os.mkdir("doc/sphinx")
except:
pass
build_dir_file=open("doc/sphinx/build_dir.txt","w")
print(self.build_lib,file=build_dir_file)
build_dir_file.close()
def initialize_options(self):
ori_build_ext.initialize_options(self) # @UndefinedVariable
self.littlebigman = None
self.built_files = None
def finalize_options(self):
super(build_ext, self).finalize_options()
self.set_undefined_options('littlebigman',
('littlebigman', 'littlebigman'))
self.set_undefined_options('build_files',
('files', 'built_files'))
self.cython_c_in_temp = 1
if self.littlebigman =='-DLITTLE_END':
if self.define is None:
self.define=[('LITTLE_END',None)]
else:
self.define.append(('LITTLE_END',None))
def substitute_sources(self,exe_name,sources):
"""
Substitutes source file name starting by an @ by the actual
name of the built file (see --> build_files)
"""
sources = list(sources)
for i in range(len(sources)):
message = "%s :-> %s" % (exe_name,sources[i])
if sources[i][0]=='@':
try:
filename = self.built_files[sources[i][1:]]
except KeyError:
tmpfilename = os.path.join(self.build_temp,sources[i][1:])
if os.path.isfile (tmpfilename):
filename = tmpfilename
else:
raise DistutilsSetupError(
'The %s filename declared in the source '
'files of the program %s have not been '
'built by the installation process' % (sources[i],
exe_name))
sources[i]=filename
log.info("%s changed to %s",message,filename)
else:
log.info("%s ok",message)
return sources
def build_extensions(self):
# First, sanity-check the 'extensions' list
for ext in self.extensions:
ext.sources = self.substitute_sources(ext.name,ext.sources)
self.check_extensions_list(self.extensions)
print("pouic")
print(ext.sources)
print("pouac")
for ext in self.extensions:
log.info("%s :-> %s",ext.name,ext.sources)
ext.sources = self.cython_sources(ext.sources, ext)
self.build_extension(ext)
def build_extensions(self): # TODO what?? double? is it supposed to be build_extension?
if hasattr(self, 'compiler'):
_customize_compiler(self.compiler)
if hasattr(self, 'shlib_compiler'):
_customize_compiler(self.shlib_compiler)
ori_build_ext.build_extensions(self)
def run(self):
self.modifyDocScripts()
for cmd_name in self.get_sub_commands():
self.run_command(cmd_name)
cython_options.annotate = True
ori_build_ext.run(self) # @UndefinedVariable
def has_files(self):
return self.distribution.has_files()
def has_executables(self):
return self.distribution.has_executables()
sub_commands = [('build_files',has_files),
('build_cexe', has_executables)
] + ori_build_ext.sub_commands
except ImportError:
from distutils.command import build_ext # @UnusedImport

View File

@ -9,12 +9,12 @@ import os.path
import glob
import sys
# try:
# from setuptools.extension import Extension
# except ImportError:
# from distutils.extension import Extension
try:
from setuptools.extension import Extension
except ImportError:
from distutils.extension import Extension
from distutils.extension import Extension
# from distutils.extension import Extension
from obidistutils.serenity.checkpackage import install_requirements,\
check_requirements, \
@ -46,11 +46,17 @@ def findCython(root,base=None,pyrexs=None):
base=[]
for module in (path.basename(path.dirname(x))
for x in glob.glob(path.join(root,'*','__init__.py'))):
for pyrex in glob.glob(path.join(root,module,'*.pyx')):
libabspath = os.path.abspath('obi_libdir')
obiabspath = os.path.abspath('.')
pyrexs.append(Extension('.'.join(base+[module,path.splitext(path.basename(pyrex))[0]]),
[pyrex]
[pyrex],
library_dirs=[libabspath],
include_dirs=[libabspath],
libraries=["obi3"],
runtime_library_dirs=[libabspath],
extra_link_args=["-Wl,-rpath,"+libabspath, "-L"+libabspath]
)
)
try:
@ -63,14 +69,16 @@ def findCython(root,base=None,pyrexs=None):
log.info("Cython module : %s",cfiles)
incdir = set(os.path.dirname(x) for x in cfiles if x[-2:]==".h")
cfiles = [x for x in cfiles if x[-2:]==".c"]
pyrexs[-1].sources.extend(cfiles)
#cfiles = [x for x in cfiles if x[-2:]==".c"]
#pyrexs[-1].sources.extend(cfiles)
pyrexs[-1].include_dirs.extend(incdir)
pyrexs[-1].extra_compile_args.extend(['-msse2',
'-Wno-unused-function',
'-Wmissing-braces',
'-Wchar-subscripts'])
'-Wchar-subscripts',
'-fPIC'
])
except IOError:
pass
@ -135,7 +143,7 @@ def setup(**attrs):
log.set_threshold(log.INFO)
minversion = attrs.get("pythonmin",'3.4')
minversion = attrs.get("pythonmin",'3.7')
maxversion = attrs.get('pythonmax',None)
fork = attrs.get('fork',False)
requirementfile = attrs.get('requirements','requirements.txt')
@ -223,4 +231,4 @@ def setup(**attrs):
from distutils.core import setup as ori_setup
ori_setup(**attrs)
return ori_setup(**attrs)

View File

@ -4,12 +4,12 @@ Created on 20 oct. 2012
@author: coissac
'''
# try:
# from setuptools.dist import Distribution as ori_Distribution
# except ImportError:
# from distutils.dist import Distribution as ori_Distribution
try:
from setuptools.dist import Distribution as ori_Distribution
except ImportError:
from distutils.dist import Distribution as ori_Distribution
from distutils.dist import Distribution as ori_Distribution
# from distutils.dist import Distribution as ori_Distribution
class Distribution(ori_Distribution):

View File

@ -81,9 +81,15 @@ def serenity_mode(package,version):
argparser.add_argument('--serenity',
dest='serenity',
action='store_true',
default=False,
default=True,
help='Switch the installer to serenity mode. Everything is installed in a virtualenv')
argparser.add_argument('--no-serenity',
dest='serenity',
action='store_false',
default=True,
help='Switch the installer in the no serenity mode.')
argparser.add_argument('--virtualenv',
dest='virtual',
type=str,

View File

@ -0,0 +1,36 @@
'''
Created on 22 janv. 2016
@author: coissac
'''
import sys
from urllib import request
import os.path
from obidistutils.serenity.util import get_serenity_dir
from obidistutils.serenity.rerun import rerun_with_anothe_python
from obidistutils.serenity.checkpython import is_a_virtualenv_python
getpipurl="https://bootstrap.pypa.io/get-pip.py"
def bootstrap():
getpipfile=os.path.join(get_serenity_dir(),"get-pip.py")
with request.urlopen(getpipurl) as getpip:
with open(getpipfile,"wb") as out:
for l in getpip:
out.write(l)
python = sys.executable
if is_a_virtualenv_python():
command= "%s %s" % (python,getpipfile)
else:
command= "%s %s --user" % (python,getpipfile)
os.system(command)
rerun_with_anothe_python(python)

View File

@ -5,27 +5,35 @@ Created on 2 oct. 2014
'''
import re
import os
import pip # @UnresolvedImport
from pip.utils import get_installed_distributions # @UnresolvedImport
from distutils.version import StrictVersion # @UnusedImport
from distutils.errors import DistutilsError
from distutils import log
import os.path
import sys
import subprocess
class RequirementError(Exception):
pass
def is_installed(requirement):
pipcommand = os.path.join(os.path.dirname(sys.executable),'pip')
pipjson = subprocess.run([pipcommand,"list","--format=json"],
capture_output=True).stdout
packages = eval(pipjson)
requirement_project,requirement_relation,requirement_version = parse_package_requirement(requirement)
package = [x for x in get_installed_distributions() if x.project_name==requirement_project]
package = [x for x in packages if x["name"]==requirement_project]
if len(package)==1:
if requirement_version is not None and requirement_relation is not None:
rep = (len(package)==1) and eval("StrictVersion('%s') %s StrictVersion('%s')" % (package[0].version,
if ( requirement_version is not None
and requirement_relation is not None):
rep = (len(package)==1) and eval("StrictVersion('%s') %s StrictVersion('%s')" % (package[0]["version"],
requirement_relation,
requirement_version)
)
@ -39,20 +47,23 @@ def is_installed(requirement):
log.info("Look for package %s (%s%s) : ok version %s installed" % (requirement_project,
requirement_relation,
requirement_version,
package[0].version))
package[0]["version"]))
else:
log.info("Look for package %s : ok version %s installed" % (requirement_project,
package[0].version))
package[0]["version"]))
else:
if len(package)!=1:
log.info("Look for package %s (%s%s) : not installed" % (requirement_project,
requirement_relation,
requirement_version))
if requirement_version is not None and requirement_relation is not None:
log.info("Look for package %s (%s%s) : not installed" % (requirement_project,
requirement_relation,
requirement_version))
else:
log.info("Look for package %s : not installed" % requirement_project)
else:
log.info("Look for package %s (%s%s) : failed only version %s installed" % (requirement_project,
requirement_relation,
requirement_version,
package[0].version))
package[0]["version"]))
return rep
@ -81,7 +92,7 @@ def install_requirements(requirementfile='requirements.txt'):
ok = is_installed(x)
if not ok:
log.info(" Installing requirement : %s" % x)
pip_install_package(x)
pip_install_package(x,requirement=requirementfile)
install_something=True
if x[0:3]=='pip':
return True
@ -134,8 +145,9 @@ def get_package_requirement(package,requirementfile='requirements.txt'):
return None
def pip_install_package(package,directory=None,upgrade=True):
def pip_install_package(package,directory=None,requirement=None):
pipcommand = os.path.join(os.path.dirname(sys.executable),'pip')
if directory is not None:
log.info(' installing %s in directory %s' % (package,str(directory)))
@ -145,8 +157,9 @@ def pip_install_package(package,directory=None,upgrade=True):
args = ['install']
if upgrade:
args.append('--upgrade')
if requirement:
args.append('--requirement')
args.append(requirement)
if 'https_proxy' in os.environ:
args.append('--proxy=%s' % os.environ['https_proxy'])
@ -156,5 +169,7 @@ def pip_install_package(package,directory=None,upgrade=True):
args.append(package)
return pip.main(args)
pip = subprocess.run([pipcommand] + args)
return pip

View File

@ -59,7 +59,7 @@ def serenity_virtualenv(envname,package,version,minversion='3.4',maxversion=None
clear=True,
symlinks=False,
with_pip=True)
# check the newly created virtualenv
return serenity_virtualenv(envname,package,version)

View File

@ -5,7 +5,7 @@
* Author: coissac
*/
#include<stdio.h>
#include <stdio.h>
int main(int argc, char *argv[])
{

2
doc/.gitignore vendored Normal file → Executable file
View File

@ -1,3 +1,5 @@
/build/
/doxygen/
/build_dir.txt
/.DS_Store
/.gitignore

0
doc/Doxyfile Normal file → Executable file
View File

2
doc/Makefile Normal file → Executable file
View File

@ -57,7 +57,7 @@ html:
@echo "Generating Doxygen documentation..."
doxygen Doxyfile
@echo "Doxygen documentation generated. \n"
$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
$(SPHINXBUILD) -b html -c ./ $(ALLSPHINXOPTS) $(BUILDDIR)/html
@echo
@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."

12
doc/sphinx/source/conf.py → doc/conf.py Normal file → Executable file
View File

@ -33,7 +33,7 @@ extensions = [
'sphinx.ext.autodoc',
'sphinx.ext.todo',
'sphinx.ext.coverage',
'sphinx.ext.pngmath',
'sphinx.ext.imgmath',
'sphinx.ext.ifconfig',
'sphinx.ext.viewcode',
'breathe',
@ -51,7 +51,7 @@ source_suffix = '.rst'
#source_encoding = 'utf-8-sig'
# The master toctree document.
master_doc = 'index'
master_doc = 'source/index'
# General information about the project.
project = u'OBITools3'
@ -292,7 +292,9 @@ texinfo_documents = [
#texinfo_no_detailmenu = False
#Breathe configuration
sys.path.append( "../breathe/" )
breathe_projects = { "OBITools3": "../doxygen/xml/" }
sys.path.append( "breathe/" )
breathe_projects = { "OBITools3": "doxygen/xml/" }
breathe_default_project = "OBITools3"
#breathe_projects_source = {
# "auto" : ( "../src", ["obidms.h", "obiavl.h"] )
# }

View File

@ -1,4 +0,0 @@
# Ignore everything in this directory
*
# Except this file
!.gitignore

30
doc/sphinx/source/DMS.rst → doc/source/DMS.rst Normal file → Executable file
View File

@ -13,7 +13,7 @@ Up to now, each of these categories of data were stored in separate
files, and nothing made it mandatory to keep them together.
The `Data Management System` (DMS) of OBITools3 can be regarded as a basic
The `Data Management System` (DMS) of OBITools3 can be viewed as a basic
database system.
@ -27,9 +27,7 @@ OBIDMS UML
An OBIDMS directory contains :
* one `OBIDMS history file <#obidms-history-files>`_
* Two different kinds of directories :
* OBIDMS column directories
* OBIDMS column group directories containing OBIDMS column directories
* OBIDMS column directories
OBIDMS column directories
@ -39,16 +37,9 @@ OBIDMS column directories contain :
* all the different versions of one OBIDMS column, in the form of different files (`OBIDMS column files <#obidms-column-files>`_)
* one `OBIDMS version file <#obidms-version-files>`_
The directory name is the column attribute, or sub-attribute if the column directory is in a column group directory.
The directory name is the column attribute with the extension ``.obicol``.
OBIDMS column group directories
===============================
OBIDMS column group directories contain OBIDMS column directories. They are used to store dictionary-like data, where
each key corresponds to an OBIDMS column.
The directory name is the dictionary attribute. Each key is considered a sub-attribute and is associated to its column.
Example: ``count.obicol``
OBIDMS column files
@ -57,7 +48,7 @@ OBIDMS column files
Each OBIDMS column file contains :
* a header of a size equal to a multiple of PAGESIZE (PAGESIZE being equal to 4096 bytes
on most systems) containing metadata
* one column of data with the same `OBIType <types.html#obitypes>`_
* Lines of data with the same `OBIType <types.html#obitypes>`_
Header
@ -79,7 +70,14 @@ The header of an OBIDMS column contains :
Data
----
A column of data with the same `OBIType <types.html#obitypes>`_.
A line of data corresponds to a vector of elements. Each element is associated with an element name.
Element names are stored in the header. The correspondence between an element and its name is given
by their order in the lists of elements and element names. This structure allows the storage of
dictionary-like data.
Example: In the header, the attribute ``elements_names`` will be associated with the value ``"sample_1;
sample_2;sample_3"``, and a line of data with the type ``OBInt_t`` will be stored as an ``OBInt_t`` vector
of size three e.g. ``5|8|4``.
Mandatory columns
@ -158,3 +156,5 @@ operations ever done in the OBIDMS directory and the views in between them :
.. image:: ./images/history.png
:width: 150 px
:align: center

BIN
doc/source/UML/OBIDMS_UML.png Executable file

Binary file not shown.


4
doc/sphinx/source/data.rst → doc/source/data.rst Normal file → Executable file
View File

@ -2,8 +2,8 @@
Data in OBITools3
#################
The OBITools3 inaugure a new way to manage DNA metabarcoding data.
They rely on a `Data management System` (DMS) that can be considered as
The OBITools3 introduce a new way to manage DNA metabarcoding data.
They rely on a `Data management System` (DMS) that can be viewed as
a simplified database system.

View File

View File

@ -70,7 +70,7 @@ Tickets should always be labeled with the branches for which they are relevant.
Documentation
*************
C functions are documented in the header files.
C functions are documented in the header files for public functions, and in the source file for private functions.
**************
@ -92,7 +92,7 @@ C99 :
* Object layer
* OBITools3 library
`Python 3 <https://www.python.org/>`_ :
`Python 3.5 <https://www.python.org/>`_ :
* Top layer code (scripts)
For the documentation, `Sphinx <http://sphinx-doc.org/>`_ should be used for both the original
@ -111,6 +111,8 @@ Enum members, macros, constants: ``ALL_CAPS``
Functions, local variables: ``lower_case``
Public functions: ``obi_lower_case``
Functions that shouldn't be called directly: ``_lower_case`` (``_`` prefix)
Global variables: ``g_lower_case`` (``g_`` prefix)
@ -120,9 +122,6 @@ Pointers: ``pointer_ptr`` (``_ptr`` suffi
.. note::
Underscores are used to delimit 'words'.
.. todo::
``obi_function`` for public functions names?
*****************
Programming rules


3
doc/sphinx/source/index.rst → doc/source/index.rst Normal file → Executable file
View File

@ -11,8 +11,7 @@ OBITools3 documentation
Programming guidelines <guidelines>
Data structures <data>
Ideas to explore <pistes>
Code documentation <code_doc/codedoc>
Indices and tables
------------------

View File

@ -7,13 +7,16 @@ NA values
=========
All OBITypes have an associated NA (Not Available) value.
NA values are implemented by specifying an explicit NA value for each type, corresponding to the R standards:
NA values are implemented by specifying an explicit NA value for each type,
corresponding to the R standards as much as possible:
* For the types ``OBIInt_t``, ``OBIBool_t``, ``OBIIdx_t`` and ``OBITaxid_t``, the NA value is ``INT_MIN``.
* For the type ``OBIInt_t``, the NA value is ``INT_MIN``.
* For the type ``OBIChar_t``: the NA value is ``\0`` (?).
* For the type ``OBIBool_t``, the NA value is ``2``.
* For the type ``OBIStr_t`` : the NA value is ``\0`` (?).
* For the types ``OBIIdx_t`` and ``OBITaxid_t``, the NA value is ``SIZE_MAX``.
* For the type ``OBIChar_t``: the NA value is ``\0``.
* For the type ``OBIFloat_t``::
@ -29,7 +32,7 @@ NA values are implemented by specifying an explicit NA value for each type, corr
x.word[hw] = 0x7ff00000;
x.word[lw] = 1954;
return x.value;
}
}
Minimum and maximum values for ``OBIInt_t``

8
doc/sphinx/source/types.rst → doc/source/types.rst Normal file → Executable file
View File

@ -4,14 +4,18 @@ OBITypes
.. image:: ./UML/OBITypes_UML.png
:download:`html version of the OBITypes UML file <UML/OBITypes_UML.class.violet.html>`
.. image:: ./UML/Obicolumn_classes_UML.png
:download:`html version of the OBIDMS classes UML file <UML/Obicolumn_classes_UML.class.violet.html>`
.. toctree::
:maxdepth: 2
The elementary types <elementary>
The containers <containers>
Special values <specialvalues>


View File

@ -1,23 +0,0 @@
################
Ideas to explore
################
*****************************
What we want to be able to do
*****************************
* Handle missing values
* Modify a column while it is being written (mmap)
* Append values to the end of a column file while it is being written (mmap)
*************
Miscellaneous
*************
* If the order of a column is changed, the column is rewritten (no index).
* Use of semaphores for reading
* Use of heaps for indexing character strings. Each column whose
type is OBIStr_t is stored in 3 files: one file containing the strings, one
file containing the indices, and one file containing the heap.

71
python/obi.py Executable file

@ -0,0 +1,71 @@
#!/usr/local/bin/python3.4
'''
obi -- shortdesc
obi is a description
It defines classes_and_methods
@author: user_name
@copyright: 2014 organization_name. All rights reserved.
@license: license
@contact: user_email
@deffield updated: Updated
'''
default_config = { 'software' : "The OBITools",
'log' : False,
'loglevel' : 'INFO',
'progress' : True,
'inputURI' : None,
'outputURI' : None,
'defaultdms' : None,
'inputview' : None,
'outputview' : None,
'skip' : 0,
'only' : None,
'fileformat' : None,
'skiperror' : True,
'qualityformat' : b'sanger',
'offset' : -1,
'noquality' : False,
'seqtype' : b'nuc',
"header" : False,
"sep" : None,
"quote" : [b"'",b'"'],
"dec" : b".",
"nastring" : b"NA",
"stripwhite" : True,
"blanklineskip" : True,
"commentchar" : b"#",
"nocreatedms" : False
}
root_config_name='obi'
from obitools3.apps.config import getConfiguration # @UnresolvedImport
from obitools3.version import version
__all__ = []
__version__ = version
__date__ = '2014-09-28'
__updated__ = '2014-09-28'
DEBUG = 1
TESTRUN = 0
PROFILE = 0
if __name__ =="__main__":
    config = getConfiguration(root_config_name,
                              default_config)
    config[root_config_name]['module'].run(config)

0
python/obitools3/__init__.py Normal file → Executable file

0
python/obitools3/__init__.pyc Normal file → Executable file

@ -0,0 +1,110 @@
../../../src/obi_lcs.h
../../../src/obi_lcs.c
../../../src/obierrno.h
../../../src/obierrno.c
../../../src/upperband.h
../../../src/upperband.c
../../../src/sse_banded_LCS_alignment.h
../../../src/sse_banded_LCS_alignment.c
../../../src/obiblob.h
../../../src/obiblob.c
../../../src/utils.h
../../../src/utils.c
../../../src/obidms.h
../../../src/obidms.c
../../../src/libjson/json_utils.h
../../../src/libjson/json_utils.c
../../../src/libjson/cJSON.h
../../../src/libjson/cJSON.c
../../../src/obiavl.h
../../../src/obiavl.c
../../../src/bloom.h
../../../src/bloom.c
../../../src/crc64.h
../../../src/crc64.c
../../../src/murmurhash2.h
../../../src/murmurhash2.c
../../../src/obidmscolumn.h
../../../src/obidmscolumn.c
../../../src/obitypes.h
../../../src/obitypes.c
../../../src/obidmscolumndir.h
../../../src/obidmscolumndir.c
../../../src/obiblob_indexer.h
../../../src/obiblob_indexer.c
../../../src/obiview.h
../../../src/obiview.c
../../../src/hashtable.h
../../../src/hashtable.c
../../../src/linked_list.h
../../../src/linked_list.c
../../../src/obidmscolumn_array.h
../../../src/obidmscolumn_array.c
../../../src/obidmscolumn_blob.h
../../../src/obidmscolumn_blob.c
../../../src/obidmscolumn_idx.h
../../../src/obidmscolumn_idx.c
../../../src/obidmscolumn_bool.h
../../../src/obidmscolumn_bool.c
../../../src/obidmscolumn_char.h
../../../src/obidmscolumn_char.c
../../../src/obidmscolumn_float.h
../../../src/obidmscolumn_float.c
../../../src/obidmscolumn_int.h
../../../src/obidmscolumn_int.c
../../../src/obidmscolumn_qual.h
../../../src/obidmscolumn_qual.c
../../../src/obidmscolumn_seq.h
../../../src/obidmscolumn_seq.c
../../../src/obidmscolumn_str.h
../../../src/obidmscolumn_str.c
../../../src/array_indexer.h
../../../src/array_indexer.c
../../../src/char_str_indexer.h
../../../src/char_str_indexer.c
../../../src/dna_seq_indexer.h
../../../src/dna_seq_indexer.c
../../../src/encode.c
../../../src/encode.h
../../../src/uint8_indexer.c
../../../src/uint8_indexer.h
../../../src/build_reference_db.c
../../../src/build_reference_db.h
../../../src/kmer_similarity.c
../../../src/kmer_similarity.h
../../../src/obi_clean.c
../../../src/obi_clean.h
../../../src/obi_ecopcr.c
../../../src/obi_ecopcr.h
../../../src/obi_ecotag.c
../../../src/obi_ecotag.h
../../../src/obidms_taxonomy.c
../../../src/obidms_taxonomy.h
../../../src/obilittlebigman.c
../../../src/obilittlebigman.h
../../../src/_sse.h
../../../src/obidebug.h
../../../src/libecoPCR/libapat/CODES/dft_code.h
../../../src/libecoPCR/libapat/CODES/dna_code.h
../../../src/libecoPCR/libapat/CODES/prot_code.h
../../../src/libecoPCR/libapat/apat_parse.c
../../../src/libecoPCR/libapat/apat_search.c
../../../src/libecoPCR/libapat/apat.h
../../../src/libecoPCR/libapat/Gmach.h
../../../src/libecoPCR/libapat/Gtypes.h
../../../src/libecoPCR/libapat/libstki.c
../../../src/libecoPCR/libapat/libstki.h
../../../src/libecoPCR/libthermo/nnparams.h
../../../src/libecoPCR/libthermo/nnparams.c
../../../src/libecoPCR/ecoapat.c
../../../src/libecoPCR/ecodna.c
../../../src/libecoPCR/ecoError.c
../../../src/libecoPCR/ecoMalloc.c
../../../src/libecoPCR/ecoPCR.h


@ -0,0 +1,3 @@
#cython: language_level=3
cpdef buildArgumentParser(str configname, str softname)


@ -0,0 +1,62 @@
#cython: language_level=3

'''
Created on 27 March 2016

@author: coissac
'''

import argparse
import sys

from .command import getCommandsList


class ObiParser(argparse.ArgumentParser):
    def error(self, message):
        # Print the error, then the full help, and exit with status 2
        sys.stderr.write('error: %s\n' % message)
        self.print_help()
        sys.exit(2)


cpdef buildArgumentParser(str configname,
                          str softname):
    parser = ObiParser()

    parser.add_argument('--version', dest='%s:version' % configname,
                        action='store_true',
                        default=False,
                        help='Print the version of %s' % softname)

    parser.add_argument('--log', dest='%s:log' % configname,
                        action='store',
                        type=str,
                        default=None,
                        help='Create a logfile')

    parser.add_argument('--no-progress', dest='%s:progress' % configname,
                        action='store_false',
                        default=None,
                        help='Do not print the progress bar during analyses')

    subparsers = parser.add_subparsers(title='subcommands',
                                       description='valid subcommands',
                                       help='additional help')

    commands = getCommandsList()

    # Register one sub-parser per command module exposing a run() function
    for c in commands:
        module = commands[c]

        if hasattr(module, "run"):
            if hasattr(module, "__title__"):
                sub = subparsers.add_parser(c,help=module.__title__)
            else:
                sub = subparsers.add_parser(c)

            if hasattr(module, "addOptions"):
                module.addOptions(sub)

            sub.set_defaults(**{'%s:module' % configname : module})
            sub.set_defaults(**{'%s:modulename' % configname : c})

    return parser
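The `ObiParser` class above overrides `argparse.ArgumentParser.error` so that a bad command line prints the message, then the full help, and exits with status 2. Below is a minimal standalone sketch of that pattern. The names `HelpOnErrorParser`, `try_parse` and the `--count` option are hypothetical and exist only for this illustration.

```python
import argparse
import sys


class HelpOnErrorParser(argparse.ArgumentParser):
    """Parser that prints the full help whenever parsing fails."""

    def error(self, message):
        sys.stderr.write('error: %s\n' % message)
        self.print_help()
        sys.exit(2)


def try_parse(argv):
    """Return the parsed --count value, or a string describing the exit code."""
    parser = HelpOnErrorParser(prog="demo")
    parser.add_argument("--count", type=int, default=0)
    try:
        return parser.parse_args(argv).count
    except SystemExit as exc:
        # sys.exit(2) in error() surfaces here as SystemExit with code 2
        return "exit %d" % exc.code


ok = try_parse(["--count", "3"])
failed = try_parse(["--count", "not-a-number"])
```

Catching `SystemExit` is only done here to make the behaviour observable; a real command-line entry point would normally let the exit propagate.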


@ -0,0 +1,110 @@
../../../src/obi_lcs.h
../../../src/obi_lcs.c
../../../src/obierrno.h
../../../src/obierrno.c
../../../src/upperband.h
../../../src/upperband.c
../../../src/sse_banded_LCS_alignment.h
../../../src/sse_banded_LCS_alignment.c
../../../src/obiblob.h
../../../src/obiblob.c
../../../src/utils.h
../../../src/utils.c
../../../src/obidms.h
../../../src/obidms.c
../../../src/libjson/json_utils.h
../../../src/libjson/json_utils.c
../../../src/libjson/cJSON.h
../../../src/libjson/cJSON.c
../../../src/obiavl.h
../../../src/obiavl.c
../../../src/bloom.h
../../../src/bloom.c
../../../src/crc64.h
../../../src/crc64.c
../../../src/murmurhash2.h
../../../src/murmurhash2.c
../../../src/obidmscolumn.h
../../../src/obidmscolumn.c
../../../src/obitypes.h
../../../src/obitypes.c
../../../src/obidmscolumndir.h
../../../src/obidmscolumndir.c
../../../src/obiblob_indexer.h
../../../src/obiblob_indexer.c
../../../src/obiview.h
../../../src/obiview.c
../../../src/hashtable.h
../../../src/hashtable.c
../../../src/linked_list.h
../../../src/linked_list.c
../../../src/obidmscolumn_array.h
../../../src/obidmscolumn_array.c
../../../src/obidmscolumn_blob.h
../../../src/obidmscolumn_blob.c
../../../src/obidmscolumn_idx.h
../../../src/obidmscolumn_idx.c
../../../src/obidmscolumn_bool.h
../../../src/obidmscolumn_bool.c
../../../src/obidmscolumn_char.h
../../../src/obidmscolumn_char.c
../../../src/obidmscolumn_float.h
../../../src/obidmscolumn_float.c
../../../src/obidmscolumn_int.h
../../../src/obidmscolumn_int.c
../../../src/obidmscolumn_qual.h
../../../src/obidmscolumn_qual.c
../../../src/obidmscolumn_seq.h
../../../src/obidmscolumn_seq.c
../../../src/obidmscolumn_str.h
../../../src/obidmscolumn_str.c
../../../src/array_indexer.h
../../../src/array_indexer.c
../../../src/char_str_indexer.h
../../../src/char_str_indexer.c
../../../src/dna_seq_indexer.h
../../../src/dna_seq_indexer.c
../../../src/encode.c
../../../src/encode.h
../../../src/uint8_indexer.c
../../../src/uint8_indexer.h
../../../src/build_reference_db.c
../../../src/build_reference_db.h
../../../src/kmer_similarity.c
../../../src/kmer_similarity.h
../../../src/obi_clean.c
../../../src/obi_clean.h
../../../src/obi_ecopcr.c
../../../src/obi_ecopcr.h
../../../src/obi_ecotag.c
../../../src/obi_ecotag.h
../../../src/obidms_taxonomy.c
../../../src/obidms_taxonomy.h
../../../src/obilittlebigman.c
../../../src/obilittlebigman.h
../../../src/_sse.h
../../../src/obidebug.h
../../../src/libecoPCR/libapat/CODES/dft_code.h
../../../src/libecoPCR/libapat/CODES/dna_code.h
../../../src/libecoPCR/libapat/CODES/prot_code.h
../../../src/libecoPCR/libapat/apat_parse.c
../../../src/libecoPCR/libapat/apat_search.c
../../../src/libecoPCR/libapat/apat.h
../../../src/libecoPCR/libapat/Gmach.h
../../../src/libecoPCR/libapat/Gtypes.h
../../../src/libecoPCR/libapat/libstki.c
../../../src/libecoPCR/libapat/libstki.h
../../../src/libecoPCR/libthermo/nnparams.h
../../../src/libecoPCR/libthermo/nnparams.c
../../../src/libecoPCR/ecoapat.c
../../../src/libecoPCR/ecodna.c
../../../src/libecoPCR/ecoError.c
../../../src/libecoPCR/ecoMalloc.c
../../../src/libecoPCR/ecoPCR.h


@ -0,0 +1,3 @@
#cython: language_level=3
cdef object loadCommand(str name,loader)


@ -0,0 +1,44 @@
#cython: language_level=3

'''
Created on 27 March 2016

@author: coissac
'''

import pkgutil

from obitools3 import commands


cdef object loadCommand(str name,loader):
    '''
    Load a command module from its name and an ImpLoader

    This function is for internal use

    @param name: name of the module
    @type name: str
    @param loader: the module loader
    @type loader: ImpLoader

    @return the loaded module
    @rtype: module
    '''

    module = loader.find_module(name).load_module(name)
    return module


def getCommandsList():
    '''
    Returns the list of sub-commands available to the main `obi` command

    @return: a dict instance with key corresponding to each command and
             value corresponding to the module
    @rtype: dict
    '''

    # x is a (module_finder, name, ispkg) tuple; packages are skipped
    cdef dict cmds = dict((x[1],loadCommand(x[1],x[0]))
                          for x in pkgutil.iter_modules(commands.__path__)
                          if not x[2])
    return cmds
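`getCommandsList` above discovers sub-commands by walking the `commands` package with `pkgutil.iter_modules` and loading every entry that is not itself a package. Here is a minimal pure-Python sketch of the same discovery idea. Two things are assumptions of this sketch rather than the project's code: it loads modules with `importlib.import_module` instead of the `find_module`/`load_module` pair, which belongs to the legacy import API, and it uses the standard `json` package as a stand-in for `obitools3.commands`. The function name `list_commands` is hypothetical.

```python
import importlib
import json  # stand-in package, used only for the demonstration
import pkgutil


def list_commands(package):
    """Map each non-package submodule name of `package` to its module."""
    found = {}
    for info in pkgutil.iter_modules(package.__path__):
        if not info.ispkg:
            # Import by fully qualified name, e.g. "json.decoder"
            found[info.name] = importlib.import_module(
                "%s.%s" % (package.__name__, info.name))
    return found


commands = list_commands(json)
```

With this layout, adding a sub-command only requires dropping a new module into the package; nothing has to be registered by hand.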


@ -0,0 +1,110 @@
../../../src/obi_lcs.h
../../../src/obi_lcs.c
../../../src/obierrno.h
../../../src/obierrno.c
../../../src/upperband.h
../../../src/upperband.c
../../../src/sse_banded_LCS_alignment.h
../../../src/sse_banded_LCS_alignment.c
../../../src/obiblob.h
../../../src/obiblob.c
../../../src/utils.h
../../../src/utils.c
../../../src/obidms.h
../../../src/obidms.c
../../../src/libjson/json_utils.h
../../../src/libjson/json_utils.c
../../../src/libjson/cJSON.h
../../../src/libjson/cJSON.c
../../../src/obiavl.h
../../../src/obiavl.c
../../../src/bloom.h
../../../src/bloom.c
../../../src/crc64.h
../../../src/crc64.c
../../../src/murmurhash2.h
../../../src/murmurhash2.c
../../../src/obidmscolumn.h
../../../src/obidmscolumn.c
../../../src/obitypes.h
../../../src/obitypes.c
../../../src/obidmscolumndir.h
../../../src/obidmscolumndir.c
../../../src/obiblob_indexer.h
../../../src/obiblob_indexer.c
../../../src/obiview.h
../../../src/obiview.c
../../../src/hashtable.h
../../../src/hashtable.c
../../../src/linked_list.h
../../../src/linked_list.c
../../../src/obidmscolumn_array.h
../../../src/obidmscolumn_array.c
../../../src/obidmscolumn_blob.h
../../../src/obidmscolumn_blob.c
../../../src/obidmscolumn_idx.h
../../../src/obidmscolumn_idx.c
../../../src/obidmscolumn_bool.h
../../../src/obidmscolumn_bool.c
../../../src/obidmscolumn_char.h
../../../src/obidmscolumn_char.c
../../../src/obidmscolumn_float.h
../../../src/obidmscolumn_float.c
../../../src/obidmscolumn_int.h
../../../src/obidmscolumn_int.c
../../../src/obidmscolumn_qual.h
../../../src/obidmscolumn_qual.c
../../../src/obidmscolumn_seq.h
../../../src/obidmscolumn_seq.c
../../../src/obidmscolumn_str.h
../../../src/obidmscolumn_str.c
../../../src/array_indexer.h
../../../src/array_indexer.c
../../../src/char_str_indexer.h
../../../src/char_str_indexer.c
../../../src/dna_seq_indexer.h
../../../src/dna_seq_indexer.c
../../../src/encode.c
../../../src/encode.h
../../../src/uint8_indexer.c
../../../src/uint8_indexer.h
../../../src/build_reference_db.c
../../../src/build_reference_db.h
../../../src/kmer_similarity.c
../../../src/kmer_similarity.h
../../../src/obi_clean.c
../../../src/obi_clean.h
../../../src/obi_ecopcr.c
../../../src/obi_ecopcr.h
../../../src/obi_ecotag.c
../../../src/obi_ecotag.h
../../../src/obidms_taxonomy.c
../../../src/obidms_taxonomy.h
../../../src/obilittlebigman.c
../../../src/obilittlebigman.h
../../../src/_sse.h
../../../src/obidebug.h
../../../src/libecoPCR/libapat/CODES/dft_code.h
../../../src/libecoPCR/libapat/CODES/dna_code.h
../../../src/libecoPCR/libapat/CODES/prot_code.h
../../../src/libecoPCR/libapat/apat_parse.c
../../../src/libecoPCR/libapat/apat_search.c
../../../src/libecoPCR/libapat/apat.h
../../../src/libecoPCR/libapat/Gmach.h
../../../src/libecoPCR/libapat/Gtypes.h
../../../src/libecoPCR/libapat/libstki.c
../../../src/libecoPCR/libapat/libstki.h
../../../src/libecoPCR/libthermo/nnparams.h
../../../src/libecoPCR/libthermo/nnparams.c
../../../src/libecoPCR/ecoapat.c
../../../src/libecoPCR/ecodna.c
../../../src/libecoPCR/ecoError.c
../../../src/libecoPCR/ecoMalloc.c
../../../src/libecoPCR/ecoPCR.h


@ -0,0 +1,10 @@
#cython: language_level=3
cpdef str setRootConfigName(str rootname)
cpdef str getRootConfigName()
cdef dict buildDefaultConfiguration(str root_config_name,
dict config)
cpdef dict getConfiguration(str root_config_name=?,
dict config=?)

114
python/obitools3/apps/config.pyx Executable file

@ -0,0 +1,114 @@
#cython: language_level=3

'''
Created on 27 March 2016

@author: coissac
'''

import sys

from .command import getCommandsList
from .logging cimport getLogger
from .arguments cimport buildArgumentParser

from ..version import version

cdef dict __default_config__ = {}


cpdef str setRootConfigName(str rootname):
    global __default_config__
    # If a root section already exists, move it under the new name
    if '__root_config__' in __default_config__:
        if __default_config__["__root_config__"] in __default_config__:
            __default_config__[rootname]=__default_config__[__default_config__["__root_config__"]]
            del __default_config__[__default_config__["__root_config__"]]
    __default_config__['__root_config__']=rootname
    return rootname


cpdef str getRootConfigName():
    global __default_config__
    return __default_config__.get('__root_config__',None)


cdef dict buildDefaultConfiguration(str root_config_name,
                                    dict config):
    global __default_config__

    __default_config__.clear()
    setRootConfigName(root_config_name)

    __default_config__[root_config_name]=config

    config['version']=version
    commands = getCommandsList()

    # One configuration section per command module
    for c in commands:
        module = commands[c]

        assert hasattr(module, "run")

        if hasattr(module, 'default_config'):
            __default_config__[c]=module.default_config
        else:
            __default_config__[c]={}

    return __default_config__


cpdef dict getConfiguration(str root_config_name="__default__",
                            dict config={}):
    global __default_config__

    # The configuration is built once, then returned as-is
    if '__done__' in __default_config__:
        return __default_config__

    if root_config_name=="__default__":
        raise RuntimeError("No root_config_name specified")

    if not config:
        raise RuntimeError("Base configuration is empty")

    config = buildDefaultConfiguration(root_config_name,
                                       config)

    parser = buildArgumentParser(root_config_name,
                                 config[root_config_name]['software'])

    options = vars(parser.parse_args())

    if options['%s:version' % root_config_name]:
        print("%s - Version %s" % (config[root_config_name]['software'],
                                   config[root_config_name]['version']))
        sys.exit(0)

    # Option names have the form "section:key"; None means "not given"
    for k in options:
        section,key = k.split(':')
        s = config[section]
        if options[k] is not None:
            s[key]=options[k]

    if not 'module' in config[root_config_name]:
        print('\nError: No command specified',file=sys.stderr)
        parser.print_help()
        sys.exit(2)

    getLogger(config)

    config['__done__']=True

    return config


def logger(level, *messages):
    try:
        config=getConfiguration()
        root = config["__root_config__"]
        l = config[root]['logger']
        if config[root]['verbose']:
            getattr(l, level)(*messages)
    except:
        print(*messages,file=sys.stderr)
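`getConfiguration` above merges command-line options into the default configuration by giving every option a `dest` of the form `section:key`, then splitting on the colon and overwriting the default only when the parsed value is not `None`. A minimal sketch of that merge step follows. It assumes a single `obi` section with two options; `merge_options` is a hypothetical helper, not a function of the project.

```python
import argparse


def merge_options(config, argv):
    """Overlay parsed command-line options onto a nested config dict."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--log', dest='obi:log', default=None)
    parser.add_argument('--no-progress', dest='obi:progress',
                        action='store_false', default=None)

    # vars() exposes the namespace as a dict keyed by the "section:key" dests
    options = vars(parser.parse_args(argv))

    for k, value in options.items():
        section, key = k.split(':')
        if value is not None:       # None means the option was not given
            config[section][key] = value
    return config


untouched = merge_options({'obi': {'log': False, 'progress': True}}, [])
merged = merge_options({'obi': {'log': False, 'progress': True}},
                       ['--no-progress', '--log', 'run.log'])
```

Using `None` as the argparse default is what lets the configured default survive when a flag is absent, which is why `--no-progress` is declared with `default=None` rather than `default=True`.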


@ -0,0 +1,110 @@
../../../src/obi_lcs.h
../../../src/obi_lcs.c
../../../src/obierrno.h
../../../src/obierrno.c
../../../src/upperband.h
../../../src/upperband.c
../../../src/sse_banded_LCS_alignment.h
../../../src/sse_banded_LCS_alignment.c
../../../src/obiblob.h
../../../src/obiblob.c
../../../src/utils.h
../../../src/utils.c
../../../src/obidms.h
../../../src/obidms.c
../../../src/libjson/json_utils.h
../../../src/libjson/json_utils.c
../../../src/libjson/cJSON.h
../../../src/libjson/cJSON.c
../../../src/obiavl.h
../../../src/obiavl.c
../../../src/bloom.h
../../../src/bloom.c
../../../src/crc64.h
../../../src/crc64.c
../../../src/murmurhash2.h
../../../src/murmurhash2.c
../../../src/obidmscolumn.h
../../../src/obidmscolumn.c
../../../src/obitypes.h
../../../src/obitypes.c
../../../src/obidmscolumndir.h
../../../src/obidmscolumndir.c
../../../src/obiblob_indexer.h
../../../src/obiblob_indexer.c
../../../src/obiview.h
../../../src/obiview.c
../../../src/hashtable.h
../../../src/hashtable.c
../../../src/linked_list.h
../../../src/linked_list.c
../../../src/obidmscolumn_array.h
../../../src/obidmscolumn_array.c
../../../src/obidmscolumn_blob.h
../../../src/obidmscolumn_blob.c
../../../src/obidmscolumn_idx.h
../../../src/obidmscolumn_idx.c
../../../src/obidmscolumn_bool.h
../../../src/obidmscolumn_bool.c
../../../src/obidmscolumn_char.h
../../../src/obidmscolumn_char.c
../../../src/obidmscolumn_float.h
../../../src/obidmscolumn_float.c
../../../src/obidmscolumn_int.h
../../../src/obidmscolumn_int.c
../../../src/obidmscolumn_qual.h
../../../src/obidmscolumn_qual.c
../../../src/obidmscolumn_seq.h
../../../src/obidmscolumn_seq.c
../../../src/obidmscolumn_str.h
../../../src/obidmscolumn_str.c
../../../src/array_indexer.h
../../../src/array_indexer.c
../../../src/char_str_indexer.h
../../../src/char_str_indexer.c
../../../src/dna_seq_indexer.h
../../../src/dna_seq_indexer.c
../../../src/encode.c
../../../src/encode.h
../../../src/uint8_indexer.c
../../../src/uint8_indexer.h
../../../src/build_reference_db.c
../../../src/build_reference_db.h
../../../src/kmer_similarity.c
../../../src/kmer_similarity.h
../../../src/obi_clean.c
../../../src/obi_clean.h
../../../src/obi_ecopcr.c
../../../src/obi_ecopcr.h
../../../src/obi_ecotag.c
../../../src/obi_ecotag.h
../../../src/obidms_taxonomy.c
../../../src/obidms_taxonomy.h
../../../src/obilittlebigman.c
../../../src/obilittlebigman.h
../../../src/_sse.h
../../../src/obidebug.h
../../../src/libecoPCR/libapat/CODES/dft_code.h
../../../src/libecoPCR/libapat/CODES/dna_code.h
../../../src/libecoPCR/libapat/CODES/prot_code.h
../../../src/libecoPCR/libapat/apat_parse.c
../../../src/libecoPCR/libapat/apat_search.c
../../../src/libecoPCR/libapat/apat.h
../../../src/libecoPCR/libapat/Gmach.h
../../../src/libecoPCR/libapat/Gtypes.h
../../../src/libecoPCR/libapat/libstki.c
../../../src/libecoPCR/libapat/libstki.h
../../../src/libecoPCR/libthermo/nnparams.h
../../../src/libecoPCR/libthermo/nnparams.c
../../../src/libecoPCR/ecoapat.c
../../../src/libecoPCR/ecodna.c
../../../src/libecoPCR/ecoError.c
../../../src/libecoPCR/ecoMalloc.c
../../../src/libecoPCR/ecoPCR.h


@ -0,0 +1,3 @@
#cython: language_level=3
cpdef getLogger(dict config)


@ -0,0 +1,48 @@
#cython: language_level=3

'''
Created on 27 March 2016

@author: coissac
'''

import logging
import sys


cpdef getLogger(dict config):
    '''
    Returns the logger as defined by the command line option
    or by the config file

    :param config:
    '''

    root = config["__root_config__"]

    level = config[root]['loglevel']
    logfile= config[root]['log']

    rootlogger = logging.getLogger()
    # "%%" keeps the logging placeholders intact while "%s" takes the module name
    logFormatter = logging.Formatter("%%(asctime)s [%s : %%(levelname)-5.5s] %%(message)s" % config[root]['modulename'])

    stderrHandler = logging.StreamHandler(sys.stderr)
    stderrHandler.setFormatter(logFormatter)

    rootlogger.addHandler(stderrHandler)

    if logfile:
        fileHandler = logging.FileHandler(logfile)
        fileHandler.setFormatter(logFormatter)
        rootlogger.addHandler(fileHandler)

    # Unknown level names fall back to INFO
    try:
        loglevel = getattr(logging, level)
    except:
        loglevel = logging.INFO

    rootlogger.setLevel(loglevel)

    config[root]['logger']=rootlogger
    config[root]['verbose']=True

    return rootlogger
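The formatter in `getLogger` above is built in two stages: the `%s` is filled with the command name right away, while the doubled `%%(...)` placeholders survive as `%(...)` fields for the `logging` module to fill per record. A minimal sketch of that two-stage formatting follows. It assumes a named logger writing to an in-memory stream instead of the root logger and `stderr`, and `make_logger` is a hypothetical helper.

```python
import io
import logging


def make_logger(modulename, level="INFO", stream=None):
    """Return a logger whose lines are tagged with `modulename`."""
    # After "%" substitution: "%(asctime)s [import : %(levelname)-5.5s] %(message)s"
    fmt = "%%(asctime)s [%s : %%(levelname)-5.5s] %%(message)s" % modulename

    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(fmt))

    log = logging.getLogger("demo." + modulename)  # named logger, not the root one
    log.addHandler(handler)
    log.setLevel(getattr(logging, level, logging.INFO))
    log.propagate = False   # keep the demo output out of the root handlers
    return log


buffer = io.StringIO()
make_logger("import", stream=buffer).warning("3 entries skipped")
line = buffer.getvalue()
```

The `-5.5s` conversion pads and truncates the level name to exactly five characters, so `WARNING` is written as `WARNI` and the bracketed prefix keeps a fixed width.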


@ -0,0 +1,272 @@
def __addInputOption(optionManager):

    optionManager.add_argument(
        dest='obi:inputURI',
        metavar='INPUT',
        help='Data source URI')

    group = optionManager.add_argument_group("Restriction to a sub-part options",
                                             "Limit the analysis to a sub-part of the input")

    group.add_argument('--skip',
                       action="store", dest="obi:skip",
                       metavar='<N>',
                       default=None,
                       type=int,
                       help="skip the first N sequences")

    group.add_argument('--only',
                       action="store", dest="obi:only",
                       metavar='<N>',
                       default=None,
                       type=int,
                       help="process only N sequences")


def __addImportInputOption(optionManager):
    group = optionManager.add_argument_group("Input format options for imported files")

    # All format flags share one dest, so the last one given wins
    group.add_argument('--fasta-input',
                       action="store_const", dest="obi:inputformat",
                       default=None,
                       const=b'fasta',
                       help="Input file is in sanger fasta format")

    group.add_argument('--fastq-input',
                       action="store_const", dest="obi:inputformat",
                       default=None,
                       const=b'fastq',
                       help="Input file is in fastq format")

    group.add_argument('--embl-input',
                       action="store_const", dest="obi:inputformat",
                       default=None,
                       const=b'embl',
                       help="Input file is in embl nucleic format")

    group.add_argument('--genbank-input',
                       action="store_const", dest="obi:inputformat",
                       default=None,
                       const=b'genbank',
                       help="Input file is in genbank nucleic format")

    group.add_argument('--ngsfilter-input',
                       action="store_const", dest="obi:inputformat",
                       default=None,
                       const=b'ngsfilter',
                       help="Input file is an ngsfilter file")

    group.add_argument('--ecopcr-result-input',
                       action="store_const", dest="obi:inputformat",
                       default=None,
                       const=b'ecopcr',
                       help="Input file is the result of an ecoPCR (version 2)")

    group.add_argument('--ecoprimers-result-input',
                       action="store_const", dest="obi:inputformat",
                       default=None,
                       const=b'ecoprimers',
                       help="Input file is the result of an ecoPrimers run")

    group.add_argument('--tabular-input',
                       action="store_const", dest="obi:inputformat",
                       default=None,
                       const=b'tabular',
                       help="Input file is a tabular file")

    group.add_argument('--no-skip-on-error',
                       action="store_false", dest="obi:skiperror",
                       default=True,
                       help="Don't skip sequence entries with parsing errors (default: they are skipped)")

    group.add_argument('--no-quality',
                       action="store_true", dest="obi:noquality",
                       default=False,
                       help="Do not import fastQ quality")

    group.add_argument('--quality-sanger',
                       action="store_const", dest="obi:qualityformat",
                       default=None,
                       const=b'sanger',
                       help="Fastq quality is encoded following sanger format (standard fastq)")

    group.add_argument('--quality-solexa',
                       action="store_const", dest="obi:qualityformat",
                       default=None,
                       const=b'solexa',
                       help="Fastq quality is encoded following solexa sequencer format")

    group.add_argument('--nuc',
                       action="store_const", dest="obi:moltype",
                       default=None,
                       const=b'nuc',
                       help="Input file contains nucleic sequences")

    group.add_argument('--prot',
                       action="store_const", dest="obi:moltype",
                       default=None,
                       const=b'pep',
                       help="Input file contains protein sequences")

    group.add_argument('--input-na-string',
                       action="store", dest="obi:inputnastring",
                       default="NA",
                       type=str,
                       help="String associated with Non Available (NA) values in the input")


def __addTabularInputOption(optionManager):
    group = optionManager.add_argument_group("Input format options for tabular files")

    group.add_argument('--header',
                       action="store_true", dest="obi:header",
                       default=False,
                       help="First line of tabular file contains column names")

    group.add_argument('--sep',
                       action="store", dest="obi:sep",
                       default=None,
                       type=str,
                       help="Column separator")

    group.add_argument('--dec',
                       action="store", dest="obi:dec",
                       default=".",
                       type=str,
                       help="Decimal separator")

    # store_false with default=True: passing the flag sets the value to False
    group.add_argument('--strip-white',
                       action="store_false", dest="obi:stripwhite",
                       default=True,
                       help="Remove white chars at the beginning and the end of values")

    # store_false with default=True: passing the flag sets the value to False
    group.add_argument('--blank-line-skip',
                       action="store_false", dest="obi:blanklineskip",
                       default=True,
                       help="Skip empty lines")

    group.add_argument('--comment-char',
                       action="store", dest="obi:commentchar",
                       default="#",
                       type=str,
                       help="Lines starting with this character are treated as comments")


def __addTaxdumpInputOption(optionManager): # TODO maybe not the best way to do it
    group = optionManager.add_argument_group("Input format options for taxdump")

    group.add_argument('--taxdump',
                       action="store_true", dest="obi:taxdump",
                       default=False,
                       help="Whether the input is a taxdump")


def __addTaxonomyOption(optionManager):
    group = optionManager.add_argument_group("Input format options for taxonomy")

    group.add_argument('--taxonomy',
                       action="store", dest="obi:taxoURI",
                       default=None,
                       help="Taxonomy URI")

    #TODO option bool to download taxo if URI doesn't exist


def addMinimalInputOption(optionManager):
    __addInputOption(optionManager)


def addImportInputOption(optionManager):
    __addInputOption(optionManager)
    __addImportInputOption(optionManager)


def addTabularInputOption(optionManager):
    __addTabularInputOption(optionManager)


def addTaxonomyOption(optionManager):
    __addTaxonomyOption(optionManager)


def addTaxdumpInputOption(optionManager):
    __addTaxdumpInputOption(optionManager)


def addAllInputOption(optionManager):
    __addInputOption(optionManager)
    __addImportInputOption(optionManager)
    __addTabularInputOption(optionManager)
    __addTaxonomyOption(optionManager)
    __addTaxdumpInputOption(optionManager)


def __addOutputOption(optionManager):

    optionManager.add_argument(
        dest='obi:outputURI',
        metavar='OUTPUT',
        help='Data destination URI')


def __addDMSOutputOption(optionManager):
    group = optionManager.add_argument_group("Output options for DMS data")

    group.add_argument('--no-create-dms',
                       action="store_true", dest="obi:nocreatedms",
                       default=False,
                       help="Don't create an output DMS if it does not exist")

    group.add_argument('--max-elts',
                       action="store", dest="obi:maxelts",
                       metavar='<N>',
                       default=1000,
                       type=int,
                       help="Maximum number of elements per line in a column "
                            "(e.g. the number of different keys in a dictionary-type "
                            "key from sequence headers). If the number of different keys "
                            "is greater than N, the values are stored as character strings")


def __addExportOutputOption(optionManager):
    group = optionManager.add_argument_group("Output format options for exported files")

    group.add_argument('--fasta-output',
                       action="store_const", dest="obi:outputformat",
                       default=None,
                       const=b'fasta',
                       help="Output file is in sanger fasta format")

    group.add_argument('--fastq-output',
                       action="store_const", dest="obi:outputformat",
                       default=None,
                       const=b'fastq',
                       help="Output file is in fastq format")

    group.add_argument('--print-na',
                       action="store_true", dest="obi:printna",
                       default=False,
                       help="Print Non Available (NA) values in the output")

    group.add_argument('--output-na-string',
                       action="store", dest="obi:outputnastring",
                       default="NA",
                       type=str,
                       help="String associated with Non Available (NA) values in the output")


def addMinimalOutputOption(optionManager):
    __addOutputOption(optionManager)
    __addDMSOutputOption(optionManager)


def addExportOutputOption(optionManager):
    __addOutputOption(optionManager)
    __addExportOutputOption(optionManager)


def addAllOutputOption(optionManager):
    __addOutputOption(optionManager)
    __addDMSOutputOption(optionManager)
    __addExportOutputOption(optionManager)
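The option helpers above rely on two `argparse` behaviours: several `store_const` flags writing to one shared `dest` (so the format flags act as alternatives), and `store_false` flags paired with `default=True`. A minimal sketch of both follows, reduced to three format flags and one boolean. `build_parser` and `parse` are hypothetical names, and the reduced option set is an assumption of this sketch.

```python
import argparse


def build_parser():
    """Build a small parser mirroring the shared-dest flag pattern."""
    parser = argparse.ArgumentParser()
    group = parser.add_argument_group("Input format options")

    # Each flag stores its own constant into the same destination
    for name in ('fasta', 'fastq', 'embl'):
        group.add_argument('--%s-input' % name,
                           action='store_const', dest='obi:inputformat',
                           default=None, const=name.encode('ascii'))

    # Absent flag: True (the default). Present flag: False.
    group.add_argument('--no-skip-on-error',
                       action='store_false', dest='obi:skiperror',
                       default=True)
    return parser


def parse(argv):
    """Return the parsed options as a plain dict keyed by dest."""
    return vars(build_parser().parse_args(argv))


none_given = parse([])
two_formats = parse(['--fasta-input', '--embl-input'])
strict = parse(['--no-skip-on-error'])
```

Nothing in this pattern rejects conflicting format flags: when two are given, the later one silently overwrites the earlier one. If that is undesirable, `add_mutually_exclusive_group` is the standard `argparse` way to turn the conflict into an error.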


@ -0,0 +1,110 @@
../../../src/obi_lcs.h
../../../src/obi_lcs.c
../../../src/obierrno.h
../../../src/obierrno.c
../../../src/upperband.h
../../../src/upperband.c
../../../src/sse_banded_LCS_alignment.h
../../../src/sse_banded_LCS_alignment.c
../../../src/obiblob.h
../../../src/obiblob.c
../../../src/utils.h
../../../src/utils.c
../../../src/obidms.h
../../../src/obidms.c
../../../src/libjson/json_utils.h
../../../src/libjson/json_utils.c
../../../src/libjson/cJSON.h
../../../src/libjson/cJSON.c
../../../src/obiavl.h
../../../src/obiavl.c
../../../src/bloom.h
../../../src/bloom.c
../../../src/crc64.h
../../../src/crc64.c
../../../src/murmurhash2.h
../../../src/murmurhash2.c
../../../src/obidmscolumn.h
../../../src/obidmscolumn.c
../../../src/obitypes.h
../../../src/obitypes.c
../../../src/obidmscolumndir.h
../../../src/obidmscolumndir.c
../../../src/obiblob_indexer.h
../../../src/obiblob_indexer.c
../../../src/obiview.h
../../../src/obiview.c
../../../src/hashtable.h
../../../src/hashtable.c
../../../src/linked_list.h
../../../src/linked_list.c
../../../src/obidmscolumn_array.h
../../../src/obidmscolumn_array.c
../../../src/obidmscolumn_blob.h
../../../src/obidmscolumn_blob.c
../../../src/obidmscolumn_idx.h
../../../src/obidmscolumn_idx.c
../../../src/obidmscolumn_bool.h
../../../src/obidmscolumn_bool.c
../../../src/obidmscolumn_char.h
../../../src/obidmscolumn_char.c
../../../src/obidmscolumn_float.h
../../../src/obidmscolumn_float.c
../../../src/obidmscolumn_int.h
../../../src/obidmscolumn_int.c
../../../src/obidmscolumn_qual.h
../../../src/obidmscolumn_qual.c
../../../src/obidmscolumn_seq.h
../../../src/obidmscolumn_seq.c
../../../src/obidmscolumn_str.h
../../../src/obidmscolumn_str.c
../../../src/array_indexer.h
../../../src/array_indexer.c
../../../src/char_str_indexer.h
../../../src/char_str_indexer.c
../../../src/dna_seq_indexer.h
../../../src/dna_seq_indexer.c
../../../src/encode.c
../../../src/encode.h
../../../src/uint8_indexer.c
../../../src/uint8_indexer.h
../../../src/build_reference_db.c
../../../src/build_reference_db.h
../../../src/kmer_similarity.c
../../../src/kmer_similarity.h
../../../src/obi_clean.c
../../../src/obi_clean.h
../../../src/obi_ecopcr.c
../../../src/obi_ecopcr.h
../../../src/obi_ecotag.c
../../../src/obi_ecotag.h
../../../src/obidms_taxonomy.c
../../../src/obidms_taxonomy.h
../../../src/obilittlebigman.c
../../../src/obilittlebigman.h
../../../src/_sse.h
../../../src/obidebug.h
../../../src/libecoPCR/libapat/CODES/dft_code.h
../../../src/libecoPCR/libapat/CODES/dna_code.h
../../../src/libecoPCR/libapat/CODES/prot_code.h
../../../src/libecoPCR/libapat/apat_parse.c
../../../src/libecoPCR/libapat/apat_search.c
../../../src/libecoPCR/libapat/apat.h
../../../src/libecoPCR/libapat/Gmach.h
../../../src/libecoPCR/libapat/Gtypes.h
../../../src/libecoPCR/libapat/libstki.c
../../../src/libecoPCR/libapat/libstki.h
../../../src/libecoPCR/libthermo/nnparams.h
../../../src/libecoPCR/libthermo/nnparams.c
../../../src/libecoPCR/ecoapat.c
../../../src/libecoPCR/ecodna.c
../../../src/libecoPCR/ecoError.c
../../../src/libecoPCR/ecoMalloc.c
../../../src/libecoPCR/ecoPCR.h


@ -0,0 +1,65 @@
#cython: language_level=3

cdef extern from "stdio.h":
    struct FILE
    int fprintf(FILE *stream, char *format, ...)
    int fputs(char *string, FILE *stream)
    FILE* stderr
    ctypedef unsigned int off_t "unsigned long long"

cdef extern from "unistd.h":
    int fsync(int fd)

cdef extern from "time.h":
    struct tm :
        int tm_yday
        int tm_hour
        int tm_min
        int tm_sec

    enum: CLOCKS_PER_SEC

    ctypedef int time_t
    ctypedef int clock_t
    ctypedef int suseconds_t

    struct timeval:
        time_t      tv_sec     # seconds
        suseconds_t tv_usec    # microseconds

    struct timezone :
        int tz_minuteswest     # minutes west of Greenwich
        int tz_dsttime         # type of DST correction

    int gettimeofday(timeval *tv, timezone *tz)

    tm *gmtime_r(time_t *clock, tm *result)
    time_t time(time_t *tloc)
    clock_t clock()

cdef class ProgressBar:
    cdef off_t maxi
    cdef clock_t starttime
    cdef clock_t lasttime
    cdef clock_t tickcount
    cdef int freq
    cdef int cycle
    cdef int arrow
    cdef int lastlog
    cdef bint ontty
    cdef int fd
    cdef bint cut

    cdef bytes _head
    cdef char *chead

    cdef object logger

    cdef char *wheel
    cdef char *spaces
    cdef char* diese

    cdef clock_t clock(self)

@ -0,0 +1,157 @@
#cython: language_level=3
'''
Created on 27 March 2016
@author: coissac
'''
from ..utils cimport str2bytes, bytes2str
from .config cimport getConfiguration
import sys
cdef class ProgressBar:
cdef clock_t clock(self):
cdef clock_t t
cdef timeval tp
cdef clock_t s
<void> gettimeofday(&tp,NULL)
s = <clock_t> (<double> tp.tv_usec * 1.e-6 * <double> CLOCKS_PER_SEC)
t = tp.tv_sec * CLOCKS_PER_SEC + s
return t
def __init__(self,
off_t maxi,
dict config={},
str head="",
double seconde=0.1,
cut=False):
self.starttime = self.clock()
self.lasttime = self.starttime
self.tickcount = <clock_t> (seconde * CLOCKS_PER_SEC)
self.freq = 1
self.cycle = 0
self.arrow = 0
self.lastlog = 0
if not config:
config=getConfiguration()
self.ontty = sys.stderr.isatty()
if (maxi<=0):
maxi=1
self.maxi = maxi
self.head = head
self.chead = self._head
self.cut = cut
self.logger=config[config["__root_config__"]]["logger"]
self.wheel = '|/-\\'
self.spaces=' ' \
' ' \
' ' \
' ' \
' '
self.diese ='##########' \
'##########' \
'##########' \
'##########' \
'##########'
def __call__(self, object pos, bint force=False):
cdef off_t ipos
cdef clock_t elapsed
cdef clock_t newtime
cdef clock_t delta
cdef clock_t more
cdef double percent
cdef tm remain
cdef int days,hour,minu,sec
cdef off_t fraction
cdef int twentyth
self.cycle+=1
if self.cycle % self.freq == 0 or force:
self.cycle=1
newtime = self.clock()
delta = newtime - self.lasttime
self.lasttime = newtime
elapsed = newtime - self.starttime
# print(" ",delta,elapsed,elapsed/CLOCKS_PER_SEC,self.tickcount)
if delta < self.tickcount / 5 :
self.freq*=2
elif delta > self.tickcount * 5 and self.freq>1:
self.freq//=2 # integer halving: freq is a C int and language_level=3 makes / a true division
if callable(pos):
ipos=pos()
else:
ipos=pos
if ipos==0:
ipos=1
percent = <double>ipos/<double>self.maxi
more = <time_t>((<double>elapsed / percent * (1. - percent))/CLOCKS_PER_SEC)
<void>gmtime_r(&more, &remain)
days = remain.tm_yday
hour = remain.tm_hour
minu = remain.tm_min
sec = remain.tm_sec
if self.ontty:
fraction=<int>(percent * 50.)
self.arrow=(self.arrow+1) % 4
if days:
<void>fprintf(stderr,b'\r%s %5.1f %% |%.*s%c%.*s] remain : %d days %02d:%02d:%02d\033[K',
self.chead,
percent*100,
fraction,self.diese,
self.wheel[self.arrow],
50-fraction,self.spaces,
days,hour,minu,sec)
else:
<void>fprintf(stderr,b'\r%s %5.1f %% |%.*s%c%.*s] remain : %02d:%02d:%02d\033[K',
self.chead,
percent*100.,
fraction,self.diese,
self.wheel[self.arrow],
50-fraction,self.spaces,
hour,minu,sec)
if self.cut:
tenth = int(percent * 10)
if tenth != self.lastlog:
if self.ontty:
<void>fputs(b'\n',stderr)
self.logger.info('%s %5.1f %% remain : %02d:%02d:%02d\033[K' % (
bytes2str(self._head),
percent*100.,
hour,minu,sec))
self.lastlog=tenth
else:
self.cycle+=1
property head:
def __get__(self):
return self._head
def __set__(self,str value):
self._head=str2bytes(value)
self.chead=self._head
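
The timing logic of ProgressBar.__call__ above can be sketched in plain Python. This is only an illustration under stated assumptions, not the Cython implementation: the helper names remaining_seconds, adapt_frequency and render_bar are hypothetical, times are plain floats in seconds rather than clock_t ticks, and the rotating wheel character is left out of the bar.

```python
def remaining_seconds(elapsed, pos, maxi):
    """Estimate the remaining time from the elapsed time and the fraction done."""
    pos = max(pos, 1)              # mirrors the ipos == 0 guard in the original
    percent = pos / maxi
    return elapsed / percent * (1.0 - percent)


def adapt_frequency(freq, delta, tick):
    """Refresh less often when calls arrive too fast, more often when too slow."""
    if delta < tick / 5:
        return freq * 2
    if delta > tick * 5 and freq > 1:
        return freq // 2           # integer halving keeps freq a positive int
    return freq


def render_bar(percent, width=50):
    """Draw a fixed-width bar made of '#' and spaces (hypothetical helper)."""
    filled = int(percent * width)
    return "|" + "#" * filled + " " * (width - filled) + "]"
```

The refresh period is counted in calls, not in seconds: the bar is only redrawn every freq calls, and freq is tuned so that redraws land roughly once per tick.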

@ -0,0 +1,110 @@
../../../src/obi_lcs.h
../../../src/obi_lcs.c
../../../src/obierrno.h
../../../src/obierrno.c
../../../src/upperband.h
../../../src/upperband.c
../../../src/sse_banded_LCS_alignment.h
../../../src/sse_banded_LCS_alignment.c
../../../src/obiblob.h
../../../src/obiblob.c
../../../src/utils.h
../../../src/utils.c
../../../src/obidms.h
../../../src/obidms.c
../../../src/libjson/json_utils.h
../../../src/libjson/json_utils.c
../../../src/libjson/cJSON.h
../../../src/libjson/cJSON.c
../../../src/obiavl.h
../../../src/obiavl.c
../../../src/bloom.h
../../../src/bloom.c
../../../src/crc64.h
../../../src/crc64.c
../../../src/murmurhash2.h
../../../src/murmurhash2.c
../../../src/obidmscolumn.h
../../../src/obidmscolumn.c
../../../src/obitypes.h
../../../src/obitypes.c
../../../src/obidmscolumndir.h
../../../src/obidmscolumndir.c
../../../src/obiblob_indexer.h
../../../src/obiblob_indexer.c
../../../src/obiview.h
../../../src/obiview.c
../../../src/hashtable.h
../../../src/hashtable.c
../../../src/linked_list.h
../../../src/linked_list.c
../../../src/obidmscolumn_array.h
../../../src/obidmscolumn_array.c
../../../src/obidmscolumn_blob.h
../../../src/obidmscolumn_blob.c
../../../src/obidmscolumn_idx.h
../../../src/obidmscolumn_idx.c
../../../src/obidmscolumn_bool.h
../../../src/obidmscolumn_bool.c
../../../src/obidmscolumn_char.h
../../../src/obidmscolumn_char.c
../../../src/obidmscolumn_float.h
../../../src/obidmscolumn_float.c
../../../src/obidmscolumn_int.h
../../../src/obidmscolumn_int.c
../../../src/obidmscolumn_qual.h
../../../src/obidmscolumn_qual.c
../../../src/obidmscolumn_seq.h
../../../src/obidmscolumn_seq.c
../../../src/obidmscolumn_str.h
../../../src/obidmscolumn_str.c
../../../src/array_indexer.h
../../../src/array_indexer.c
../../../src/char_str_indexer.h
../../../src/char_str_indexer.c
../../../src/dna_seq_indexer.h
../../../src/dna_seq_indexer.c
../../../src/encode.c
../../../src/encode.h
../../../src/uint8_indexer.c
../../../src/uint8_indexer.h
../../../src/build_reference_db.c
../../../src/build_reference_db.h
../../../src/kmer_similarity.c
../../../src/kmer_similarity.h
../../../src/obi_clean.c
../../../src/obi_clean.h
../../../src/obi_ecopcr.c
../../../src/obi_ecopcr.h
../../../src/obi_ecotag.c
../../../src/obi_ecotag.h
../../../src/obidms_taxonomy.c
../../../src/obidms_taxonomy.h
../../../src/obilittlebigman.c
../../../src/obilittlebigman.h
../../../src/_sse.h
../../../src/obidebug.h
../../../src/libecoPCR/libapat/CODES/dft_code.h
../../../src/libecoPCR/libapat/CODES/dna_code.h
../../../src/libecoPCR/libapat/CODES/prot_code.h
../../../src/libecoPCR/libapat/apat_parse.c
../../../src/libecoPCR/libapat/apat_search.c
../../../src/libecoPCR/libapat/apat.h
../../../src/libecoPCR/libapat/Gmach.h
../../../src/libecoPCR/libapat/Gtypes.h
../../../src/libecoPCR/libapat/libstki.c
../../../src/libecoPCR/libapat/libstki.h
../../../src/libecoPCR/libthermo/nnparams.h
../../../src/libecoPCR/libthermo/nnparams.c
../../../src/libecoPCR/ecoapat.c
../../../src/libecoPCR/ecodna.c
../../../src/libecoPCR/ecoError.c
../../../src/libecoPCR/ecoMalloc.c
../../../src/libecoPCR/ecoPCR.h

python/obitools3/apps/temp.pxd Executable file

@ -0,0 +1,8 @@
#cython: language_level=3
'''
Created on 28 July 2017
@author: coissac
'''

python/obitools3/apps/temp.pyx Executable file

@ -0,0 +1,99 @@
#cython: language_level=3
'''
Created on 28 July 2017
@author: coissac
'''
from os import environb,getpid
from os.path import join, isdir
from tempfile import TemporaryDirectory, _get_candidate_names
from shutil import rmtree
from atexit import register
from obitools3.dms.dms import DMS
from obitools3.apps.config import getConfiguration
from obitools3.apps.config import logger
from obitools3.dms.dms cimport DMS
from obitools3.utils cimport tobytes,tostr
cpdef get_temp_dir():
"""
Returns the name of a temporary directory specific to this instance of obitools.
This is an application function. It cannot be called outside of an obi command.
It requires a valid configuration.
If the function is called several times from the same obi session, the same
directory is returned.
If the OBITMP environment variable exists, the temporary directory is created
inside that directory.
The directory is automatically destroyed at the end of the process.
@return: the name of the temporary directory.
"""
cdef bytes tmpdirname
cdef dict config = getConfiguration()
root = config["__root_config__"]
try:
return config[root]["tempdir"].name
except KeyError:
pass
try:
basedir=environb[b'OBITMP']
except KeyError:
basedir=None
tmp = TemporaryDirectory(dir=basedir)
config[root]["tempdir"]=tmp
return tmp.name
cpdef get_temp_dir_name():
"""
Returns the name of the temporary directory
specific to this instance of obitools.
@return: the name of the temporary directory.
@see get_temp_dir
"""
return get_temp_dir() # get_temp_dir() already returns the name; the original call recursed forever
cpdef get_temp_dms():
cdef bytes tmpdirname # @DuplicatedSignature
cdef dict config = getConfiguration() # @DuplicatedSignature
cdef DMS tmpdms
root = config["__root_config__"]
try:
return config[root]["tempdms"]
except KeyError:
pass
tmpdirname=get_temp_dir()
tempname = join(tmpdirname,
b"obi.%d.%s" % (getpid(),
tobytes(next(_get_candidate_names())))
)
tmpdms = DMS.new(tempname)
config[root]["tempdms"]=tmpdms
return tmpdms
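
The caching pattern used by get_temp_dir above can be sketched in plain Python. This is a minimal sketch under assumptions: a plain dict named _cache stands in for config[root], the environment is read through a mapping passed as a parameter so it can be tested, and OBITMP is read as a str rather than through environb.

```python
import os
from tempfile import TemporaryDirectory

_cache = {}  # hypothetical stand-in for config[root] in the original


def get_temp_dir_name(cache=_cache, environ=os.environ):
    """Return the session's temporary directory name, creating it on first call."""
    try:
        return cache["tempdir"].name
    except KeyError:
        pass
    basedir = environ.get("OBITMP")   # None lets tempfile pick the default location
    tmp = TemporaryDirectory(dir=basedir)
    cache["tempdir"] = tmp            # holding the object keeps the directory alive
    return tmp.name
```

Storing the TemporaryDirectory object itself, rather than only its name, matters: the directory is removed when the object is cleaned up or finalized, so the cache entry is what ties the directory's lifetime to the session.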

@ -0,0 +1,103 @@
../../../src/obi_lcs.h
../../../src/obi_lcs.c
../../../src/obierrno.h
../../../src/obierrno.c
../../../src/upperband.h
../../../src/upperband.c
../../../src/sse_banded_LCS_alignment.h
../../../src/sse_banded_LCS_alignment.c
../../../src/obiblob.h
../../../src/obiblob.c
../../../src/utils.h
../../../src/utils.c
../../../src/obidms.h
../../../src/obidms.c
../../../src/libjson/json_utils.h
../../../src/libjson/json_utils.c
../../../src/libjson/cJSON.h
../../../src/libjson/cJSON.c
../../../src/obiavl.h
../../../src/obiavl.c
../../../src/bloom.h
../../../src/bloom.c
../../../src/crc64.h
../../../src/crc64.c
../../../src/murmurhash2.h
../../../src/murmurhash2.c
../../../src/obidmscolumn.h
../../../src/obidmscolumn.c
../../../src/obitypes.h
../../../src/obitypes.c
../../../src/obidmscolumndir.h
../../../src/obidmscolumndir.c
../../../src/obiblob_indexer.h
../../../src/obiblob_indexer.c
../../../src/obiview.h
../../../src/obiview.c
../../../src/hashtable.h
../../../src/hashtable.c
../../../src/linked_list.h
../../../src/linked_list.c
../../../src/obidmscolumn_array.h
../../../src/obidmscolumn_array.c
../../../src/obidmscolumn_blob.h
../../../src/obidmscolumn_blob.c
../../../src/obidmscolumn_idx.h
../../../src/obidmscolumn_idx.c
../../../src/obidmscolumn_bool.h
../../../src/obidmscolumn_bool.c
../../../src/obidmscolumn_char.h
../../../src/obidmscolumn_char.c
../../../src/obidmscolumn_float.h
../../../src/obidmscolumn_float.c
../../../src/obidmscolumn_int.h
../../../src/obidmscolumn_int.c
../../../src/obidmscolumn_qual.h
../../../src/obidmscolumn_qual.c
../../../src/obidmscolumn_seq.h
../../../src/obidmscolumn_seq.c
../../../src/obidmscolumn_str.h
../../../src/obidmscolumn_str.c
../../../src/array_indexer.h
../../../src/array_indexer.c
../../../src/char_str_indexer.h
../../../src/char_str_indexer.c
../../../src/dna_seq_indexer.h
../../../src/dna_seq_indexer.c
../../../src/encode.c
../../../src/encode.h
../../../src/uint8_indexer.c
../../../src/uint8_indexer.h
../../../src/build_reference_db.c
../../../src/build_reference_db.h
../../../src/kmer_similarity.c
../../../src/kmer_similarity.h
../../../src/obi_clean.c
../../../src/obi_clean.h
../../../src/obi_ecopcr.c
../../../src/obi_ecopcr.h
../../../src/obi_ecotag.c
../../../src/obi_ecotag.h
../../../src/obidms_taxonomy.c
../../../src/obidms_taxonomy.h
../../../src/obilittlebigman.c
../../../src/obilittlebigman.h
../../../src/_sse.h
../../../src/obidebug.h
../../../src/libecoPCR/libapat/CODES/dft_code.h
../../../src/libecoPCR/libapat/CODES/dna_code.h
../../../src/libecoPCR/libapat/CODES/prot_code.h
../../../src/libecoPCR/libapat/apat_parse.c
../../../src/libecoPCR/libapat/apat_search.c
../../../src/libecoPCR/libapat/apat.h
../../../src/libecoPCR/libapat/Gmach.h
../../../src/libecoPCR/libapat/Gtypes.h
../../../src/libecoPCR/libapat/libstki.c
../../../src/libecoPCR/libapat/libstki.h
../../../src/libecoPCR/libthermo/nnparams.h
../../../src/libecoPCR/libthermo/nnparams.c
../../../src/libecoPCR/ecoapat.c
../../../src/libecoPCR/ecodna.c
../../../src/libecoPCR/ecoError.c
../../../src/libecoPCR/ecoMalloc.c
../../../src/libecoPCR/ecoPCR.h

@ -0,0 +1,18 @@
#cython: language_level=3
cpdef align_columns(bytes dms_n,
bytes input_view_1_n,
bytes output_view_n,
bytes input_view_2_n=*,
bytes input_column_1_n=*,
bytes input_column_2_n=*,
bytes input_elt_1_n=*,
bytes input_elt_2_n=*,
bytes id_column_1_n=*,
bytes id_column_2_n=*,
double threshold=*, bint normalize=*,
int reference=*, bint similarity_mode=*,
bint print_seq=*, bint print_count=*,
bytes comments=*,
int thread_count=*)

@ -0,0 +1,274 @@
#cython: language_level=3
from obitools3.apps.progress cimport ProgressBar # @UnresolvedImport
from obitools3.dms import DMS
from obitools3.dms.view.view cimport View
from obitools3.uri.decode import open_uri
from obitools3.apps.optiongroups import addMinimalInputOption, addMinimalOutputOption
from obitools3.dms.view import RollbackException
from obitools3.apps.config import logger
from obitools3.utils cimport tobytes, str2bytes
from obitools3.dms.capi.obilcsalign cimport obi_lcs_align_one_column, \
obi_lcs_align_two_columns
import time
import sys
__title__="Aligns one sequence column with itself or two sequence columns"
def addOptions(parser):
addMinimalInputOption(parser)
addMinimalOutputOption(parser)
group=parser.add_argument_group('obi align specific options')
group.add_argument('--input-2', '-I',
action="store", dest="align:inputuri2",
metavar='<INPUT URI>',
default="",
type=str,
help="Optionally, the URI of the second input to align with the first one.")
group.add_argument('--threshold','-t',
action="store", dest="align:threshold",
metavar='<THRESHOLD>',
default=0.0,
type=float,
help="Score threshold. If the score is normalized and expressed in similarity (default),"
" it is an identity, e.g. 0.95 for an identity of 95%%. If the score is normalized"
" and expressed in distance, it is (1.0 - identity), e.g. 0.05 for an identity of 95%%."
" If the score is not normalized and expressed in similarity, it is the length of the"
" Longest Common Subsequence. If the score is not normalized and expressed in distance,"
" it is (reference length - LCS length)."
" Only sequence pairs with a similarity above <THRESHOLD> are printed. Default: 0.00"
" (no threshold).")
group.add_argument('--longest-length','-L',
action="store_const", dest="align:reflength",
default=0,
const=1,
help="The reference length is the length of the longest sequence."
" Default: the reference length is the length of the alignment.")
group.add_argument('--shortest-length','-l',
action="store_const", dest="align:reflength",
default=0,
const=2,
help="The reference length is the length of the shortest sequence."
" Default: the reference length is the length of the alignment.")
group.add_argument('--raw','-r',
action="store_false", dest="align:normalize",
default=True,
help="Raw score, not normalized. Default: score is normalized with the reference sequence length.")
group.add_argument('--distance','-D',
action="store_false", dest="align:similarity",
default=True,
help="Score is expressed in distance. Default: score is expressed in similarity.")
group.add_argument('--print-seq','-s',
action="store_true", dest="align:printseq",
default=False,
help="The nucleotide sequences are written in the output view. Default: they are not written.")
group.add_argument('--print-count','-n',
action="store_true", dest="align:printcount",
default=False,
help="Sequence counts are written in the output view. Default: they are not written.")
group.add_argument('--thread-count','-p', # TODO should probably be in a specific option group
action="store", dest="align:threadcount",
metavar='<THREAD COUNT>',
default=1,
type=int,
help="Number of threads to use for the computation. Default: one.")
cpdef align_columns(bytes dms_n,
bytes input_view_1_n,
bytes output_view_n,
bytes input_view_2_n=b"",
bytes input_column_1_n=b"",
bytes input_column_2_n=b"",
bytes input_elt_1_n=b"",
bytes input_elt_2_n=b"",
bytes id_column_1_n=b"",
bytes id_column_2_n=b"",
double threshold=0.0, bint normalize=True,
int reference=0, bint similarity_mode=True,
bint print_seq=False, bint print_count=False,
bytes comments=b"{}",
int thread_count=1) :
if input_view_2_n == b"" and input_column_2_n == b"" :
if obi_lcs_align_one_column(dms_n, \
input_view_1_n, \
input_column_1_n, \
input_elt_1_n, \
id_column_1_n, \
output_view_n, \
comments, \
print_seq, \
print_count, \
threshold, normalize, reference, similarity_mode,
thread_count) < 0 :
raise Exception("Error aligning sequences")
else:
if obi_lcs_align_two_columns(dms_n, \
input_view_1_n, \
input_view_2_n, \
input_column_1_n, \
input_column_2_n, \
input_elt_1_n, \
input_elt_2_n, \
id_column_1_n, \
id_column_2_n, \
output_view_n, \
comments, \
print_seq, \
print_count, \
threshold, normalize, reference, similarity_mode) < 0 :
raise Exception("Error aligning sequences")
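
The four score conventions described in the --threshold help text, and the three reference-length modes set by -L and -l, can be written out as a short Python sketch. It only restates the option help; it is not the C code behind obi_lcs_align_one_column, and the function names reference_length and lcs_score are hypothetical.

```python
def reference_length(len_a, len_b, ali_length, mode=0):
    """mode 0: alignment length (default), 1: longest sequence, 2: shortest."""
    if mode == 1:
        return max(len_a, len_b)
    if mode == 2:
        return min(len_a, len_b)
    return ali_length


def lcs_score(lcs_length, ref_length, normalize=True, similarity=True):
    """Express an LCS length as a similarity or distance, raw or normalized."""
    raw = lcs_length if similarity else ref_length - lcs_length
    return raw / ref_length if normalize else raw
```

For an LCS of 95 over a reference length of 100, this gives 0.95 (normalized similarity), 0.05 (normalized distance), 95 (raw similarity) and 5 (raw distance), matching the examples in the help text.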
def run(config):
DMS.obi_atexit()
logger("info", "obi align")
# Open the input: only the DMS
input = open_uri(config['obi']['inputURI'],
dms_only=True)
if input is None:
raise Exception("Could not read input")
i_dms = input[0]
i_dms_name = input[0].name
i_uri = input[1]
i_view_name = i_uri.split(b"/")[0]
i_column_name = b""
i_element_name = b""
if len(i_uri.split(b"/")) >= 2: # also set the column name when an element name follows it
i_column_name = i_uri.split(b"/")[1]
if len(i_uri.split(b"/")) == 3:
i_element_name = i_uri.split(b"/")[2]
if len(i_uri.split(b"/")) > 3:
raise Exception("Input URI contains too many elements:", config['obi']['inputURI'])
# Open the second input if there is one
i_dms_2 = None
i_dms_name_2 = b""
original_i_view_name_2 = b""
i_view_name_2 = b""
i_column_name_2 = b""
i_element_name_2 = b""
if config['align']['inputuri2']:
input_2 = open_uri(config['align']['inputuri2'],
dms_only=True)
if input_2 is None:
raise Exception("Could not read second input")
i_dms_2 = input_2[0]
i_dms_name_2 = i_dms_2.name
i_uri_2 = input_2[1]
original_i_view_name_2 = i_uri_2.split(b"/")[0]
if len(i_uri_2.split(b"/")) >= 2: # also set the column name when an element name follows it
i_column_name_2 = i_uri_2.split(b"/")[1]
if len(i_uri_2.split(b"/")) == 3:
i_element_name_2 = i_uri_2.split(b"/")[2]
if len(i_uri_2.split(b"/")) > 3:
raise Exception("Input URI contains too many elements:", config['align']['inputuri2'])
# If the 2 input DMS are not the same, temporarily import 2nd input view in first input DMS
if i_dms != i_dms_2:
temp_i_view_name_2 = original_i_view_name_2
i=0
while temp_i_view_name_2 in i_dms: # Making sure view name is unique in input DMS
temp_i_view_name_2 = original_i_view_name_2+b"_"+str2bytes(str(i))
i+=1
i_view_name_2 = temp_i_view_name_2
View.import_view(i_dms_2.full_path[:-7], i_dms.full_path[:-7], original_i_view_name_2, i_view_name_2)
# Open the output: only the DMS
output = open_uri(config['obi']['outputURI'],
input=False,
dms_only=True)
if output is None:
raise Exception("Could not create output")
o_dms = output[0]
o_dms_name = o_dms.name
final_o_view_name = output[1]
# If the input and output DMS are not the same, align creating a temporary view in the input dms that will be exported to
# the right DMS and deleted in the other afterwards.
if i_dms != o_dms:
temporary_view_name = final_o_view_name
i=0
while temporary_view_name in i_dms: # Making sure view name is unique in input DMS
temporary_view_name = final_o_view_name+b"_"+str2bytes(str(i))
i+=1
o_view_name = temporary_view_name
else:
o_view_name = final_o_view_name
# Save command config in View comments
command_line = " ".join(sys.argv[1:])
i_dms_list = [i_dms_name]
if i_dms_name_2:
i_dms_list.append(i_dms_name_2)
i_view_list = [i_view_name]
if original_i_view_name_2:
i_view_list.append(original_i_view_name_2)
comments = View.print_config(config, "align", command_line, input_dms_name=i_dms_list, input_view_name=i_view_list)
# Call cython alignment function
# Using default ID columns of the view. TODO discuss adding option
align_columns(i_dms_name, \
i_view_name, \
o_view_name, \
input_view_2_n = i_view_name_2, \
input_column_1_n = i_column_name, \
input_column_2_n = i_column_name_2, \
input_elt_1_n = i_element_name, \
input_elt_2_n = i_element_name_2, \
id_column_1_n = b"", \
id_column_2_n = b"", \
threshold = config['align']['threshold'], \
normalize = config['align']['normalize'], \
reference = config['align']['reflength'], \
similarity_mode = config['align']['similarity'], \
print_seq = config['align']['printseq'], \
print_count = config['align']['printcount'], \
comments = comments, \
thread_count = config['align']['threadcount'])
# If the input and output DMS are not the same, export result view to output DMS
if i_dms != o_dms:
View.import_view(i_dms.full_path[:-7], o_dms.full_path[:-7], o_view_name, final_o_view_name)
# Save command config in output DMS comments
o_dms.record_command_line(command_line)
#print("\n\nOutput view:\n````````````", file=sys.stderr)
#print(repr(o_dms[final_o_view_name]), file=sys.stderr)
# If the two input DMS are different, delete the temporary input view in the first input DMS
if i_dms_2 and i_dms != i_dms_2:
View.delete_view(i_dms, i_view_name_2)
i_dms_2.close()
# If the input and the output DMS are different, delete the temporary result view in the input DMS
if i_dms != o_dms:
View.delete_view(i_dms, o_view_name)
o_dms.close()
i_dms.close()
logger("info", "Done.")
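
Two small steps of run() above lend themselves to a standalone sketch: splitting the URI remainder into view, column and element names, and picking a view name that does not clash with an existing one. The helpers split_view_uri and unique_name are hypothetical and not part of obitools3. The sketch assumes that a three-part URI should fill in both the column and the element name, and it uses a plain set where the original tests membership in the DMS.

```python
def split_view_uri(uri):
    """Split b'view[/column[/element]]' into its three parts (b'' when absent)."""
    parts = uri.split(b"/")
    if len(parts) > 3:
        raise ValueError("Input URI contains too many elements: %r" % uri)
    view = parts[0]
    column = parts[1] if len(parts) >= 2 else b""
    element = parts[2] if len(parts) == 3 else b""
    return view, column, element


def unique_name(base, existing):
    """Append _0, _1, ... to base until the name is not in existing."""
    name = base
    i = 0
    while name in existing:
        name = base + b"_" + str(i).encode()
        i += 1
    return name
```

Splitting once and indexing the resulting list avoids calling split repeatedly on the same bytes object.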

@ -0,0 +1,103 @@
../../../src/obi_lcs.h
../../../src/obi_lcs.c
../../../src/obierrno.h
../../../src/obierrno.c
../../../src/upperband.h
../../../src/upperband.c
../../../src/sse_banded_LCS_alignment.h
../../../src/sse_banded_LCS_alignment.c
../../../src/obiblob.h
../../../src/obiblob.c
../../../src/utils.h
../../../src/utils.c
../../../src/obidms.h
../../../src/obidms.c
../../../src/libjson/json_utils.h
../../../src/libjson/json_utils.c
../../../src/libjson/cJSON.h
../../../src/libjson/cJSON.c
../../../src/obiavl.h
../../../src/obiavl.c
../../../src/bloom.h
../../../src/bloom.c
../../../src/crc64.h
../../../src/crc64.c
../../../src/murmurhash2.h
../../../src/murmurhash2.c
../../../src/obidmscolumn.h
../../../src/obidmscolumn.c
../../../src/obitypes.h
../../../src/obitypes.c
../../../src/obidmscolumndir.h
../../../src/obidmscolumndir.c
../../../src/obiblob_indexer.h
../../../src/obiblob_indexer.c
../../../src/obiview.h
../../../src/obiview.c
../../../src/hashtable.h
../../../src/hashtable.c
../../../src/linked_list.h
../../../src/linked_list.c
../../../src/obidmscolumn_array.h
../../../src/obidmscolumn_array.c
../../../src/obidmscolumn_blob.h
../../../src/obidmscolumn_blob.c
../../../src/obidmscolumn_idx.h
../../../src/obidmscolumn_idx.c
../../../src/obidmscolumn_bool.h
../../../src/obidmscolumn_bool.c
../../../src/obidmscolumn_char.h
../../../src/obidmscolumn_char.c
../../../src/obidmscolumn_float.h
../../../src/obidmscolumn_float.c
../../../src/obidmscolumn_int.h
../../../src/obidmscolumn_int.c
../../../src/obidmscolumn_qual.h
../../../src/obidmscolumn_qual.c
../../../src/obidmscolumn_seq.h
../../../src/obidmscolumn_seq.c
../../../src/obidmscolumn_str.h
../../../src/obidmscolumn_str.c
../../../src/array_indexer.h
../../../src/array_indexer.c
../../../src/char_str_indexer.h
../../../src/char_str_indexer.c
../../../src/dna_seq_indexer.h
../../../src/dna_seq_indexer.c
../../../src/encode.c
../../../src/encode.h
../../../src/uint8_indexer.c
../../../src/uint8_indexer.h
../../../src/build_reference_db.c
../../../src/build_reference_db.h
../../../src/kmer_similarity.c
../../../src/kmer_similarity.h
../../../src/obi_clean.c
../../../src/obi_clean.h
../../../src/obi_ecopcr.c
../../../src/obi_ecopcr.h
../../../src/obi_ecotag.c
../../../src/obi_ecotag.h
../../../src/obidms_taxonomy.c
../../../src/obidms_taxonomy.h
../../../src/obilittlebigman.c
../../../src/obilittlebigman.h
../../../src/_sse.h
../../../src/obidebug.h
../../../src/libecoPCR/libapat/CODES/dft_code.h
../../../src/libecoPCR/libapat/CODES/dna_code.h
../../../src/libecoPCR/libapat/CODES/prot_code.h
../../../src/libecoPCR/libapat/apat_parse.c
../../../src/libecoPCR/libapat/apat_search.c
../../../src/libecoPCR/libapat/apat.h
../../../src/libecoPCR/libapat/Gmach.h
../../../src/libecoPCR/libapat/Gtypes.h
../../../src/libecoPCR/libapat/libstki.c
../../../src/libecoPCR/libapat/libstki.h
../../../src/libecoPCR/libthermo/nnparams.h
../../../src/libecoPCR/libthermo/nnparams.c
../../../src/libecoPCR/ecoapat.c
../../../src/libecoPCR/ecodna.c
../../../src/libecoPCR/ecoError.c
../../../src/libecoPCR/ecoMalloc.c
../../../src/libecoPCR/ecoPCR.h

@ -0,0 +1,4 @@
#cython: language_level=3
cdef object buildAlignment(object direct, object reverse)

@ -0,0 +1,249 @@
#cython: language_level=3
from obitools3.apps.progress cimport ProgressBar # @UnresolvedImport
from obitools3.dms import DMS
from obitools3.dms.view import RollbackException
from obitools3.dms.view.typed_view.view_NUC_SEQS cimport View_NUC_SEQS
from obitools3.dms.column.column cimport Column
from obitools3.dms.capi.obiview cimport QUALITY_COLUMN
from obitools3.dms.capi.obitypes cimport OBI_QUAL
from obitools3.apps.optiongroups import addMinimalInputOption, addMinimalOutputOption
from obitools3.uri.decode import open_uri
from obitools3.apps.config import logger
from obitools3.libalign._qsassemble import QSolexaReverseAssemble
from obitools3.libalign._qsrassemble import QSolexaRightReverseAssemble
from obitools3.libalign._solexapairend import buildConsensus, buildJoinedSequence
from obitools3.dms.obiseq cimport Nuc_Seq
from obitools3.libalign.shifted_ali cimport Kmer_similarity, Ali_shifted
from obitools3.commands.ngsfilter import REVERSE_SEQ_COLUMN_NAME, REVERSE_QUALITY_COLUMN_NAME
import sys
import os
__title__="Aligns paired-end reads"
def addOptions(parser):
addMinimalInputOption(parser)
addMinimalOutputOption(parser)
group = parser.add_argument_group('obi alignpairedend specific options')
group.add_argument('-R', '--reverse-reads',
action="store", dest="alignpairedend:reverse",
metavar="<URI>",
default=None,
type=str,
help="URI to the reverse reads if they are in a different view than the forward reads")
group.add_argument('--score-min',
action="store", dest="alignpairedend:smin",
metavar="#.###",
default=None,
type=float,
help="Minimum score for keeping alignments")
group.add_argument('-A', '--true-ali',
action="store_true", dest="alignpairedend:trueali",
default=False,
help="Performs gap-free end alignment of sequences instead of using kmers to compute alignments (slower).")
group.add_argument('-k', '--kmer-size',
action="store", dest="alignpairedend:kmersize",
metavar="#",
default=3,
type=int,
help="K-mer size for kmer comparisons, between 1 and 4 (not when using -A option; default: 3)")
la = QSolexaReverseAssemble()
ra = QSolexaRightReverseAssemble()
cdef object buildAlignment(object direct, object reverse):
if len(direct)==0 or len(reverse)==0:
return None
la.seqA = direct
la.seqB = reverse
ali=la()
ali.direction='left'
ra.seqA = direct
ra.seqB = reverse
rali=ra()
rali.direction='right'
if ali.score < rali.score:
ali = rali
return ali
def alignmentIterator(entries, aligner):
if type(entries) == list:
two_views = True
forward = entries[0]
reverse = entries[1]
entries_len = len(forward)
else:
two_views = False
entries_len = len(entries)
for i in range(entries_len):
if two_views:
seqF = forward[i]
seqR = reverse[i]
else:
seqF = Nuc_Seq.new_from_stored(entries[i])
seqR = Nuc_Seq(seqF.id, seqF[REVERSE_SEQ_COLUMN_NAME], quality=seqF[REVERSE_QUALITY_COLUMN_NAME])
seqR.index = i
ali = aligner(seqF, seqR)
if ali is None:
continue
yield ali
def run(config):
DMS.obi_atexit()
logger("info", "obi alignpairedend")
# Open the input
two_views = False
forward = None
reverse = None
input = None
input = open_uri(config['obi']['inputURI'])
if input is None:
raise Exception("Could not open input reads")
if input[2] != View_NUC_SEQS:
raise NotImplementedError('obi alignpairedend only works on NUC_SEQS views')
if "reverse" in config["alignpairedend"]:
two_views = True
forward = input[1]
rinput = open_uri(config["alignpairedend"]["reverse"])
if rinput is None:
raise Exception("Could not open reverse reads")
if rinput[2] != View_NUC_SEQS:
raise NotImplementedError('obi alignpairedend only works on NUC_SEQS views')
reverse = rinput[1]
if len(forward) != len(reverse):
raise Exception("Error: the numbers of forward and reverse reads differ")
entries = [forward, reverse]
input_dms_name = [forward.dms.name, reverse.dms.name]
input_view_name = [forward.name, reverse.name]
else:
entries = input[1]
input_dms_name = [entries.dms.name]
input_view_name = [entries.name]
if two_views:
entries_len = len(forward)
else:
entries_len = len(entries)
# Open the output
output = open_uri(config['obi']['outputURI'],
input=False,
newviewtype=View_NUC_SEQS)
if output is None:
raise Exception("Could not create output view")
view = output[1]
Column.new_column(view, QUALITY_COLUMN, OBI_QUAL) #TODO output URI quality option?
if 'smin' in config['alignpairedend']:
smin = config['alignpairedend']['smin']
else:
smin = 0
# Initialize the progress bar
pb = ProgressBar(entries_len, config, seconde=5)
if config['alignpairedend']['trueali']:
kmer_ali = False
aligner = buildAlignment
else :
kmer_ali = True
if type(entries) == list:
forward = entries[0]
reverse = entries[1]
aligner = Kmer_similarity(forward, view2=reverse, kmer_size=config['alignpairedend']['kmersize'])
else:
aligner = Kmer_similarity(entries, column2=entries[REVERSE_SEQ_COLUMN_NAME], qual_column2=entries[REVERSE_QUALITY_COLUMN_NAME], kmer_size=config['alignpairedend']['kmersize'])
ba = alignmentIterator(entries, aligner)
i = 0
for ali in ba:
pb(i)
consensus = view[i]
if not two_views:
seqF = entries[i]
else:
seqF = forward[i]
if smin > 0:
if (ali.score > smin) :
buildConsensus(ali, consensus, seqF)
else:
if not two_views:
seqR = Nuc_Seq(seqF.id, seqF[REVERSE_SEQ_COLUMN_NAME], quality = seqF[REVERSE_QUALITY_COLUMN_NAME])
else:
seqR = reverse[i]
buildJoinedSequence(ali, seqR, consensus, forward=seqF)
consensus[b"smin"] = smin
else:
buildConsensus(ali, consensus, seqF)
if kmer_ali :
ali.free()
i+=1
pb(i, force=True)
print("", file=sys.stderr)
if kmer_ali :
aligner.free()
# Save command config in View and DMS comments
command_line = " ".join(sys.argv[1:])
view.write_config(config, "alignpairedend", command_line, input_dms_name=input_dms_name, input_view_name=input_view_name)
output[0].record_command_line(command_line)
#print("\n\nOutput view:\n````````````", file=sys.stderr)
#print(repr(view), file=sys.stderr)
input[0].close()
if two_views:
rinput[0].close()
output[0].close()
logger("info", "Done.")
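
The two decisions made per read pair above, which of the left and right alignments to keep in buildAlignment and whether to build a consensus or a joined sequence in run(), can be isolated in a short Python sketch. The Ali namedtuple and the function names best_alignment and merge_mode are hypothetical stand-ins; the real alignment objects come from QSolexaReverseAssemble, QSolexaRightReverseAssemble or Kmer_similarity.

```python
from collections import namedtuple

# Hypothetical stand-in for an alignment result: only the fields used here.
Ali = namedtuple("Ali", "score direction")


def best_alignment(left, right):
    """Keep the left alignment unless the right one scores strictly higher."""
    return right if left.score < right.score else left


def merge_mode(score, smin=0):
    """Return 'join' when a minimum score is set and not exceeded, else 'consensus'."""
    if smin > 0 and not score > smin:
        return "join"
    return "consensus"
```

Note that both comparisons are strict: a tie between the two alignments keeps the left one, and a score exactly equal to smin leads to a joined sequence rather than a consensus.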

@ -0,0 +1,103 @@
../../../src/obi_lcs.h
../../../src/obi_lcs.c
../../../src/obierrno.h
../../../src/obierrno.c
../../../src/upperband.h
../../../src/upperband.c
../../../src/sse_banded_LCS_alignment.h
../../../src/sse_banded_LCS_alignment.c
../../../src/obiblob.h
../../../src/obiblob.c
../../../src/utils.h
../../../src/utils.c
../../../src/obidms.h
../../../src/obidms.c
../../../src/libjson/json_utils.h
../../../src/libjson/json_utils.c
../../../src/libjson/cJSON.h
../../../src/libjson/cJSON.c
../../../src/obiavl.h
../../../src/obiavl.c
../../../src/bloom.h
../../../src/bloom.c
../../../src/crc64.h
../../../src/crc64.c
../../../src/murmurhash2.h
../../../src/murmurhash2.c
../../../src/obidmscolumn.h
../../../src/obidmscolumn.c
../../../src/obitypes.h
../../../src/obitypes.c
../../../src/obidmscolumndir.h
../../../src/obidmscolumndir.c
../../../src/obiblob_indexer.h
../../../src/obiblob_indexer.c
../../../src/obiview.h
../../../src/obiview.c
../../../src/hashtable.h
../../../src/hashtable.c
../../../src/linked_list.h
../../../src/linked_list.c
../../../src/obidmscolumn_array.h
../../../src/obidmscolumn_array.c
../../../src/obidmscolumn_blob.h
../../../src/obidmscolumn_blob.c
../../../src/obidmscolumn_idx.h
../../../src/obidmscolumn_idx.c
../../../src/obidmscolumn_bool.h
../../../src/obidmscolumn_bool.c
../../../src/obidmscolumn_char.h
../../../src/obidmscolumn_char.c
../../../src/obidmscolumn_float.h
../../../src/obidmscolumn_float.c
../../../src/obidmscolumn_int.h
../../../src/obidmscolumn_int.c
../../../src/obidmscolumn_qual.h
../../../src/obidmscolumn_qual.c
../../../src/obidmscolumn_seq.h
../../../src/obidmscolumn_seq.c
../../../src/obidmscolumn_str.h
../../../src/obidmscolumn_str.c
../../../src/array_indexer.h
../../../src/array_indexer.c
../../../src/char_str_indexer.h
../../../src/char_str_indexer.c
../../../src/dna_seq_indexer.h
../../../src/dna_seq_indexer.c
../../../src/encode.c
../../../src/encode.h
../../../src/uint8_indexer.c
../../../src/uint8_indexer.h
../../../src/build_reference_db.c
../../../src/build_reference_db.h
../../../src/kmer_similarity.c
../../../src/kmer_similarity.h
../../../src/obi_clean.c
../../../src/obi_clean.h
../../../src/obi_ecopcr.c
../../../src/obi_ecopcr.h
../../../src/obi_ecotag.c
../../../src/obi_ecotag.h
../../../src/obidms_taxonomy.c
../../../src/obidms_taxonomy.h
../../../src/obilittlebigman.c
../../../src/obilittlebigman.h
../../../src/_sse.h
../../../src/obidebug.h
../../../src/libecoPCR/libapat/CODES/dft_code.h
../../../src/libecoPCR/libapat/CODES/dna_code.h
../../../src/libecoPCR/libapat/CODES/prot_code.h
../../../src/libecoPCR/libapat/apat_parse.c
../../../src/libecoPCR/libapat/apat_search.c
../../../src/libecoPCR/libapat/apat.h
../../../src/libecoPCR/libapat/Gmach.h
../../../src/libecoPCR/libapat/Gtypes.h
../../../src/libecoPCR/libapat/libstki.c
../../../src/libecoPCR/libapat/libstki.h
../../../src/libecoPCR/libthermo/nnparams.h
../../../src/libecoPCR/libthermo/nnparams.c
../../../src/libecoPCR/ecoapat.c
../../../src/libecoPCR/ecodna.c
../../../src/libecoPCR/ecoError.c
../../../src/libecoPCR/ecoMalloc.c
../../../src/libecoPCR/ecoPCR.h


@ -0,0 +1,382 @@
#cython: language_level=3

from obitools3.apps.progress cimport ProgressBar  # @UnresolvedImport
from obitools3.dms import DMS
from obitools3.dms.view.view cimport View, Line_selection
from obitools3.uri.decode import open_uri
from obitools3.apps.optiongroups import addMinimalInputOption, addTaxonomyOption, addMinimalOutputOption
from obitools3.dms.view import RollbackException
from functools import reduce
from obitools3.apps.config import logger
from obitools3.utils cimport tobytes, str2bytes
from obitools3.dms.capi.obiview cimport NUC_SEQUENCE_COLUMN, \
                                        ID_COLUMN, \
                                        DEFINITION_COLUMN, \
                                        QUALITY_COLUMN, \
                                        COUNT_COLUMN

import time
import math
import sys

__title__="Annotate views with new tags and edit existing annotations"

SPECIAL_COLUMNS = [NUC_SEQUENCE_COLUMN, ID_COLUMN, DEFINITION_COLUMN, QUALITY_COLUMN]

def addOptions(parser):

    addMinimalInputOption(parser)
    addTaxonomyOption(parser)
    addMinimalOutputOption(parser)

    group=parser.add_argument_group('obi annotate specific options')

    group.add_argument('--seq-rank',  # TODO seq/elt/line???
                       action="store_true",
                       dest="annotate:add_rank",
                       default=False,
                       help="Add a rank attribute to the sequence "
                            "indicating the sequence position in the data.")

    group.add_argument('-R', '--rename-tag',
                       action="append",
                       dest="annotate:rename_tags",
                       metavar="<OLD_NAME:NEW_NAME>",
                       type=str,
                       default=[],
                       help="Change tag name from OLD_NAME to NEW_NAME.")

    group.add_argument('-D', '--delete-tag',
                       action="append",
                       dest="annotate:delete_tags",
                       metavar="<TAG_NAME>",
                       type=str,
                       default=[],
                       help="Delete tag TAG_NAME.")

    group.add_argument('-S', '--set-tag',
                       action="append",
                       dest="annotate:set_tags",
                       metavar="<TAG_NAME:PYTHON_EXPRESSION>",
                       type=str,
                       default=[],
                       help="Add a new tag named TAG_NAME with "
                            "a value computed from PYTHON_EXPRESSION.")

    group.add_argument('--set-identifier',
                       action="store",
                       dest="annotate:set_identifier",
                       metavar="<PYTHON_EXPRESSION>",
                       type=str,
                       default=None,
                       help="Set sequence identifier with "
                            "a value computed from PYTHON_EXPRESSION.")

    group.add_argument('--set-sequence',
                       action="store",
                       dest="annotate:set_sequence",
                       metavar="<PYTHON_EXPRESSION>",
                       type=str,
                       default=None,
                       help="Change the sequence itself with "
                            "a value computed from PYTHON_EXPRESSION.")

    group.add_argument('--set-definition',
                       action="store",
                       dest="annotate:set_definition",
                       metavar="<PYTHON_EXPRESSION>",
                       type=str,
                       default=None,
                       help="Set sequence definition with "
                            "a value computed from PYTHON_EXPRESSION.")

    group.add_argument('--run',
                       action="store",
                       dest="annotate:run",
                       metavar="<PYTHON_EXPRESSION>",
                       type=str,
                       default=None,
                       help="Run a python expression on each element.")

    group.add_argument('-C', '--clear',
                       action="store_true",
                       dest="annotate:clear",
                       default=False,
                       help="Clear all tags except the obligatory ones.")

    group.add_argument('-k','--keep',
                       action='append',
                       dest="annotate:keep",
                       metavar="<TAG>",
                       default=[],
                       type=str,
                       help="Only keep this tag. (Can be specified several times.)")

    group.add_argument('--length',
                       action="store_true",
                       dest="annotate:length",
                       default=False,
                       help="Add 'seq_length' tag with sequence length.")

    group.add_argument('--with-taxon-at-rank',
                       action='append',
                       dest="annotate:taxon_at_rank",
                       metavar="<RANK_NAME>",
                       default=[],
                       type=str,
                       help="Add taxonomy annotation at the specified rank level RANK_NAME.")

def sequenceTaggerGenerator(config, taxo=None):

    toSet=None
    newId=None
    newDef=None
    newSeq=None
    length=None
    add_rank=None
    run=None

    if 'set_tags' in config['annotate']:  # TODO default option problem, to fix
        toSet = [x.split(':',1) for x in config['annotate']['set_tags'] if len(x.split(':',1))==2]
    if 'set_identifier' in config['annotate']:
        newId = config['annotate']['set_identifier']
    if 'set_definition' in config['annotate']:
        newDef = config['annotate']['set_definition']
    if 'set_sequence' in config['annotate']:
        newSeq = config['annotate']['set_sequence']
    if 'length' in config['annotate']:
        length = config['annotate']['length']
    if 'add_rank' in config["annotate"]:
        add_rank = config["annotate"]["add_rank"]
    if 'run' in config['annotate']:
        run = config['annotate']['run']
    counter = [0]

    for i in range(len(toSet)):
        for j in range(len(toSet[i])):
            toSet[i][j] = tobytes(toSet[i][j])

    annoteRank=[]
    if config['annotate']['taxon_at_rank']:
        if taxo is not None:
            annoteRank = config['annotate']['taxon_at_rank']
        else:
            raise Exception("A taxonomy must be provided to annotate taxon ranks")

    def sequenceTagger(seq):

        if counter[0]>=0:
            counter[0]+=1

        for rank in annoteRank:
            if 'taxid' in seq:
                taxid = seq['taxid']
                if taxid is not None:
                    rtaxid = taxo.get_taxon_at_rank(taxid, rank)
                    if rtaxid is not None:
                        scn = taxo.get_scientific_name(rtaxid)
                    else:
                        scn=None
                    seq[rank]=rtaxid
                    seq["%s_name"%rank]=scn

        if add_rank:
            seq['seq_rank']=counter[0]

        for i,v in toSet:
            #try:
            if taxo is not None:
                environ = {'taxonomy' : taxo, 'sequence':seq, 'counter':counter[0], 'math':math}
            else:
                environ = {'sequence':seq, 'counter':counter[0], 'math':math}
            val = eval(v, environ, seq)
            #except Exception,e: # TODO discuss usefulness of this
            #    if options.onlyValid:
            #        raise e
            #    val = v
            seq[i]=val

        if length:
            seq['seq_length']=len(seq)

        if newId is not None:
            # try:
            if taxo is not None:
                environ = {'taxonomy' : taxo, 'sequence':seq, 'counter':counter[0], 'math':math}
            else:
                environ = {'sequence':seq, 'counter':counter[0], 'math':math}
            val = eval(newId, environ, seq)
            # except Exception,e:
            #     if options.onlyValid:
            #         raise e
            #     val = newId
            seq.id=val

        if newDef is not None:
            # try:
            if taxo is not None:
                environ = {'taxonomy' : taxo, 'sequence':seq, 'counter':counter[0], 'math':math}
            else:
                environ = {'sequence':seq, 'counter':counter[0], 'math':math}
            val = eval(newDef, environ, seq)
            # except Exception,e:
            #     if options.onlyValid:
            #         raise e
            #     val = newDef
            seq.definition=val
        #
        if newSeq is not None:
            # try:
            if taxo is not None:
                environ = {'taxonomy' : taxo, 'sequence':seq, 'counter':counter[0], 'math':math}
            else:
                environ = {'sequence':seq, 'counter':counter[0], 'math':math}
            val = eval(newSeq, environ, seq)
            # except Exception,e:
            #     if options.onlyValid:
            #         raise e
            #     val = newSeq
            seq.seq=val
            if 'seq_length' in seq:
                seq['seq_length']=len(seq)
            # Delete quality since it must match the sequence.
            # TODO discuss deleting for each sequence separately
            if QUALITY_COLUMN in seq:
                seq.view.delete_column(QUALITY_COLUMN)

        if run is not None:
            # try:
            if taxo is not None:
                environ = {'taxonomy' : taxo, 'sequence':seq, 'counter':counter[0], 'math':math}
            else:
                environ = {'sequence':seq, 'counter':counter[0], 'math':math}
            eval(run, environ, seq)
            # except Exception,e:
            #     if options.onlyValid:
            #         raise e

    return sequenceTagger

def run(config):

    DMS.obi_atexit()

    logger("info", "obi annotate")

    # Open the input
    input = open_uri(config['obi']['inputURI'])
    if input is None:
        raise Exception("Could not read input view")
    i_dms = input[0]
    i_view = input[1]
    i_view_name = input[1].name

    # Open the output: only the DMS, as the output view is going to be created by cloning the input view
    # (could eventually be done via an open_uri() argument)
    output = open_uri(config['obi']['outputURI'],
                      input=False,
                      dms_only=True)
    if output is None:
        raise Exception("Could not create output view")
    o_dms = output[0]
    o_view_name = output[1]

    # If the input and output DMS are not the same, import the input view in the output DMS before cloning it to modify it
    # (could be the other way around: clone and modify in the input DMS then import the new view in the output DMS)
    if i_dms != o_dms:
        imported_view_name = i_view_name
        i=0
        while imported_view_name in o_dms:  # Making sure view name is unique in output DMS
            imported_view_name = i_view_name+b"_"+str2bytes(str(i))
            i+=1
        View.import_view(i_dms.full_path[:-7], o_dms.full_path[:-7], i_view_name, imported_view_name)
        i_view = o_dms[imported_view_name]

    # Clone output view from input view
    o_view = i_view.clone(o_view_name)
    if o_view is None:
        raise Exception("Couldn't create output view")
    i_view.close()

    # Open taxonomy if there is one
    if 'taxoURI' in config['obi'] and config['obi']['taxoURI'] is not None:
        taxo_uri = open_uri(config['obi']['taxoURI'])
        if taxo_uri is None:
            raise Exception("Couldn't open taxonomy")
        taxo = taxo_uri[1]
    else :
        taxo = None

    # Initialize the progress bar
    pb = ProgressBar(len(o_view), config, seconde=5)

    try:

        # Apply edits
        # Edits at view level
        if 'delete_tags' in config['annotate']:
            toDelete = config['annotate']['delete_tags'][:]
        if 'rename_tags' in config['annotate']:
            toRename = [x.split(':',1) for x in config['annotate']['rename_tags'] if len(x.split(':',1))==2]
        if 'clear' in config['annotate']:
            clear = config['annotate']['clear']
        if 'keep' in config['annotate']:
            keep = config['annotate']['keep']
        for i in range(len(toDelete)):
            toDelete[i] = tobytes(toDelete[i])
        for i in range(len(toRename)):
            for j in range(len(toRename[i])):
                toRename[i][j] = tobytes(toRename[i][j])
        for i in range(len(keep)):
            keep[i] = tobytes(keep[i])
        keep = set(keep)

        if clear or keep:
            keys = [k for k in o_view.keys()]
            for k in keys:
                if k not in keep and k not in SPECIAL_COLUMNS:
                    o_view.delete_column(k)
        else:
            for k in toDelete:
                o_view.delete_column(k)
            for old_name, new_name in toRename:
                if old_name in o_view:
                    o_view.rename_column(old_name, new_name)

        # Edits at line level
        sequenceTagger = sequenceTaggerGenerator(config, taxo=taxo)
        for i in range(len(o_view)):
            pb(i)
            sequenceTagger(o_view[i])

    except Exception as e:
        raise RollbackException("obi annotate error, rolling back view: "+str(e), o_view)

    pb(i, force=True)
    print("", file=sys.stderr)

    # Save command config in View and DMS comments
    command_line = " ".join(sys.argv[1:])
    input_dms_name=[input[0].name]
    input_view_name=[i_view_name]
    if 'taxoURI' in config['obi'] and config['obi']['taxoURI'] is not None:
        input_dms_name.append(config['obi']['taxoURI'].split("/")[-3])
        input_view_name.append("taxonomy/"+config['obi']['taxoURI'].split("/")[-1])
    o_view.write_config(config, "annotate", command_line, input_dms_name=input_dms_name, input_view_name=input_view_name)
    output[0].record_command_line(command_line)

    #print("\n\nOutput view:\n````````````", file=sys.stderr)
    #print(repr(o_view), file=sys.stderr)

    # If the input and the output DMS are different, delete the temporary imported view used to create the final view
    if i_dms != o_dms:
        View.delete_view(o_dms, imported_view_name)
        o_dms.close()
    i_dms.close()

    logger("info", "Done.")
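
The `-S TAG_NAME:PYTHON_EXPRESSION` handling above can be illustrated outside of obitools3 with a minimal pure-Python sketch. It assumes a plain `dict` stands in for a view line; `make_tagger` is a hypothetical name used only for this illustration and is not part of the commit.

```python
import math


def make_tagger(set_tags):
    """Build a per-record tagger from 'TAG:EXPRESSION' strings (hypothetical helper)."""
    # Split on the first ':' only, so the expression itself may contain colons.
    # Entries without any ':' are silently ignored, as in the diff above.
    pairs = [p.split(":", 1) for p in set_tags if len(p.split(":", 1)) == 2]
    counter = [0]  # mutable cell shared with the closure

    def tagger(record):
        counter[0] += 1
        for tag, expression in pairs:
            # Helper names go in the globals; the record's own tags act as locals,
            # so an expression can refer to an existing tag by its name.
            environ = {"sequence": record, "counter": counter[0], "math": math}
            record[tag] = eval(expression, environ, record)
        return record

    return tagger


tagger = make_tagger(["double_count:count * 2", "label:'rec_%d' % counter"])
print(tagger({"count": 3}))
```

Because the expressions go through `eval`, they run with the full privileges of the calling process, so this pattern is only reasonable when the command line comes from a trusted user.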


@ -0,0 +1,103 @@
../../../src/obi_lcs.h
../../../src/obi_lcs.c
../../../src/obierrno.h
../../../src/obierrno.c
../../../src/upperband.h
../../../src/upperband.c
../../../src/sse_banded_LCS_alignment.h
../../../src/sse_banded_LCS_alignment.c
../../../src/obiblob.h
../../../src/obiblob.c
../../../src/utils.h
../../../src/utils.c
../../../src/obidms.h
../../../src/obidms.c
../../../src/libjson/json_utils.h
../../../src/libjson/json_utils.c
../../../src/libjson/cJSON.h
../../../src/libjson/cJSON.c
../../../src/obiavl.h
../../../src/obiavl.c
../../../src/bloom.h
../../../src/bloom.c
../../../src/crc64.h
../../../src/crc64.c
../../../src/murmurhash2.h
../../../src/murmurhash2.c
../../../src/obidmscolumn.h
../../../src/obidmscolumn.c
../../../src/obitypes.h
../../../src/obitypes.c
../../../src/obidmscolumndir.h
../../../src/obidmscolumndir.c
../../../src/obiblob_indexer.h
../../../src/obiblob_indexer.c
../../../src/obiview.h
../../../src/obiview.c
../../../src/hashtable.h
../../../src/hashtable.c
../../../src/linked_list.h
../../../src/linked_list.c
../../../src/obidmscolumn_array.h
../../../src/obidmscolumn_array.c
../../../src/obidmscolumn_blob.h
../../../src/obidmscolumn_blob.c
../../../src/obidmscolumn_idx.h
../../../src/obidmscolumn_idx.c
../../../src/obidmscolumn_bool.h
../../../src/obidmscolumn_bool.c
../../../src/obidmscolumn_char.h
../../../src/obidmscolumn_char.c
../../../src/obidmscolumn_float.h
../../../src/obidmscolumn_float.c
../../../src/obidmscolumn_int.h
../../../src/obidmscolumn_int.c
../../../src/obidmscolumn_qual.h
../../../src/obidmscolumn_qual.c
../../../src/obidmscolumn_seq.h
../../../src/obidmscolumn_seq.c
../../../src/obidmscolumn_str.h
../../../src/obidmscolumn_str.c
../../../src/array_indexer.h
../../../src/array_indexer.c
../../../src/char_str_indexer.h
../../../src/char_str_indexer.c
../../../src/dna_seq_indexer.h
../../../src/dna_seq_indexer.c
../../../src/encode.c
../../../src/encode.h
../../../src/uint8_indexer.c
../../../src/uint8_indexer.h
../../../src/build_reference_db.c
../../../src/build_reference_db.h
../../../src/kmer_similarity.c
../../../src/kmer_similarity.h
../../../src/obi_clean.c
../../../src/obi_clean.h
../../../src/obi_ecopcr.c
../../../src/obi_ecopcr.h
../../../src/obi_ecotag.c
../../../src/obi_ecotag.h
../../../src/obidms_taxonomy.c
../../../src/obidms_taxonomy.h
../../../src/obilittlebigman.c
../../../src/obilittlebigman.h
../../../src/_sse.h
../../../src/obidebug.h
../../../src/libecoPCR/libapat/CODES/dft_code.h
../../../src/libecoPCR/libapat/CODES/dna_code.h
../../../src/libecoPCR/libapat/CODES/prot_code.h
../../../src/libecoPCR/libapat/apat_parse.c
../../../src/libecoPCR/libapat/apat_search.c
../../../src/libecoPCR/libapat/apat.h
../../../src/libecoPCR/libapat/Gmach.h
../../../src/libecoPCR/libapat/Gtypes.h
../../../src/libecoPCR/libapat/libstki.c
../../../src/libecoPCR/libapat/libstki.h
../../../src/libecoPCR/libthermo/nnparams.h
../../../src/libecoPCR/libthermo/nnparams.c
../../../src/libecoPCR/ecoapat.c
../../../src/libecoPCR/ecodna.c
../../../src/libecoPCR/ecoError.c
../../../src/libecoPCR/ecoMalloc.c
../../../src/libecoPCR/ecoPCR.h


@ -0,0 +1,105 @@
#cython: language_level=3

from obitools3.apps.progress cimport ProgressBar  # @UnresolvedImport
from obitools3.dms.dms cimport DMS
from obitools3.dms.view import RollbackException
from obitools3.dms.capi.build_reference_db cimport build_reference_db
from obitools3.apps.optiongroups import addMinimalInputOption, addTaxonomyOption, addMinimalOutputOption
from obitools3.uri.decode import open_uri
from obitools3.apps.config import logger
from obitools3.utils cimport tobytes, str2bytes
from obitools3.dms.view.view cimport View
from obitools3.dms.view.typed_view.view_NUC_SEQS cimport View_NUC_SEQS

import sys

__title__="Build a reference database for ecotag"

def addOptions(parser):

    addMinimalInputOption(parser)
    addTaxonomyOption(parser)
    addMinimalOutputOption(parser)

    group = parser.add_argument_group('obi build_ref_db specific options')

    group.add_argument('--threshold','-t',
                       action="store", dest="build_ref_db:threshold",
                       metavar='<THRESHOLD>',
                       default=0.0,
                       type=float,
                       help="Score threshold as a normalized identity, e.g. 0.95 for an identity of 95%%. Default: 0.00"
                            " (no threshold).")

def run(config):

    DMS.obi_atexit()

    logger("info", "obi build_ref_db")

    # Open the input: only the DMS
    input = open_uri(config['obi']['inputURI'],
                     dms_only=True)
    if input is None:
        raise Exception("Could not read input")
    i_dms = input[0]
    i_dms_name = input[0].name
    i_view_name = input[1]

    # Open the output: only the DMS
    output = open_uri(config['obi']['outputURI'],
                      input=False,
                      dms_only=True)
    if output is None:
        raise Exception("Could not create output")
    o_dms = output[0]
    final_o_view_name = output[1]

    # If the input and output DMS are not the same, build the database creating a temporary view that will be exported to
    # the right DMS and deleted in the other afterwards.
    if i_dms != o_dms:
        temporary_view_name = final_o_view_name
        i=0
        while temporary_view_name in i_dms:  # Making sure view name is unique in input DMS
            temporary_view_name = final_o_view_name+b"_"+str2bytes(str(i))
            i+=1
        o_view_name = temporary_view_name
    else:
        o_view_name = final_o_view_name

    # Read taxonomy name
    taxonomy_name = config['obi']['taxoURI'].split("/")[-1]  # Robust in theory

    # Save command config in View comments
    command_line = " ".join(sys.argv[1:])
    input_dms_name=[i_dms_name]
    input_view_name= [i_view_name]
    input_dms_name.append(config['obi']['taxoURI'].split("/")[-3])
    input_view_name.append("taxonomy/"+config['obi']['taxoURI'].split("/")[-1])
    comments = View.print_config(config, "build_ref_db", command_line, input_dms_name=input_dms_name, input_view_name=input_view_name)

    if build_reference_db(tobytes(i_dms_name), tobytes(i_view_name), tobytes(taxonomy_name), tobytes(o_view_name), comments, config['build_ref_db']['threshold']) < 0:
        raise Exception("Error building a reference database")

    # If the input and output DMS are not the same, export result view to output DMS
    if i_dms != o_dms:
        View.import_view(i_dms.full_path[:-7], o_dms.full_path[:-7], o_view_name, final_o_view_name)

    # Save command config in DMS comments
    o_dms.record_command_line(command_line)

    #print("\n\nOutput view:\n````````````", file=sys.stderr)
    #print(repr(o_dms[final_o_view_name]), file=sys.stderr)

    # If the input and the output DMS are different, delete the temporary result view in the input DMS
    if i_dms != o_dms:
        View.delete_view(i_dms, o_view_name)
        o_dms.close()

    i_dms.close()

    logger("info", "Done.")
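
The "making sure view name is unique" loop used when the input and output DMS differ can be sketched in isolation. This is only an illustration: a Python `set` of byte strings stands in for the DMS membership test, and `unique_view_name` is a hypothetical helper that does not exist in obitools3.

```python
def unique_view_name(base, existing):
    """Return `base`, or `base` + b"_<i>" for the first i that is not taken."""
    name = base
    i = 0
    while name in existing:
        # Same suffix scheme as in the diff: base name, underscore, counter.
        name = base + b"_" + str(i).encode("ascii")
        i += 1
    return name


print(unique_view_name(b"reads", {b"reads", b"reads_0"}))  # b'reads_1'
```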


@ -0,0 +1,103 @@
../../../src/obi_lcs.h
../../../src/obi_lcs.c
../../../src/obierrno.h
../../../src/obierrno.c
../../../src/upperband.h
../../../src/upperband.c
../../../src/sse_banded_LCS_alignment.h
../../../src/sse_banded_LCS_alignment.c
../../../src/obiblob.h
../../../src/obiblob.c
../../../src/utils.h
../../../src/utils.c
../../../src/obidms.h
../../../src/obidms.c
../../../src/libjson/json_utils.h
../../../src/libjson/json_utils.c
../../../src/libjson/cJSON.h
../../../src/libjson/cJSON.c
../../../src/obiavl.h
../../../src/obiavl.c
../../../src/bloom.h
../../../src/bloom.c
../../../src/crc64.h
../../../src/crc64.c
../../../src/murmurhash2.h
../../../src/murmurhash2.c
../../../src/obidmscolumn.h
../../../src/obidmscolumn.c
../../../src/obitypes.h
../../../src/obitypes.c
../../../src/obidmscolumndir.h
../../../src/obidmscolumndir.c
../../../src/obiblob_indexer.h
../../../src/obiblob_indexer.c
../../../src/obiview.h
../../../src/obiview.c
../../../src/hashtable.h
../../../src/hashtable.c
../../../src/linked_list.h
../../../src/linked_list.c
../../../src/obidmscolumn_array.h
../../../src/obidmscolumn_array.c
../../../src/obidmscolumn_blob.h
../../../src/obidmscolumn_blob.c
../../../src/obidmscolumn_idx.h
../../../src/obidmscolumn_idx.c
../../../src/obidmscolumn_bool.h
../../../src/obidmscolumn_bool.c
../../../src/obidmscolumn_char.h
../../../src/obidmscolumn_char.c
../../../src/obidmscolumn_float.h
../../../src/obidmscolumn_float.c
../../../src/obidmscolumn_int.h
../../../src/obidmscolumn_int.c
../../../src/obidmscolumn_qual.h
../../../src/obidmscolumn_qual.c
../../../src/obidmscolumn_seq.h
../../../src/obidmscolumn_seq.c
../../../src/obidmscolumn_str.h
../../../src/obidmscolumn_str.c
../../../src/array_indexer.h
../../../src/array_indexer.c
../../../src/char_str_indexer.h
../../../src/char_str_indexer.c
../../../src/dna_seq_indexer.h
../../../src/dna_seq_indexer.c
../../../src/encode.c
../../../src/encode.h
../../../src/uint8_indexer.c
../../../src/uint8_indexer.h
../../../src/build_reference_db.c
../../../src/build_reference_db.h
../../../src/kmer_similarity.c
../../../src/kmer_similarity.h
../../../src/obi_clean.c
../../../src/obi_clean.h
../../../src/obi_ecopcr.c
../../../src/obi_ecopcr.h
../../../src/obi_ecotag.c
../../../src/obi_ecotag.h
../../../src/obidms_taxonomy.c
../../../src/obidms_taxonomy.h
../../../src/obilittlebigman.c
../../../src/obilittlebigman.h
../../../src/_sse.h
../../../src/obidebug.h
../../../src/libecoPCR/libapat/CODES/dft_code.h
../../../src/libecoPCR/libapat/CODES/dna_code.h
../../../src/libecoPCR/libapat/CODES/prot_code.h
../../../src/libecoPCR/libapat/apat_parse.c
../../../src/libecoPCR/libapat/apat_search.c
../../../src/libecoPCR/libapat/apat.h
../../../src/libecoPCR/libapat/Gmach.h
../../../src/libecoPCR/libapat/Gtypes.h
../../../src/libecoPCR/libapat/libstki.c
../../../src/libecoPCR/libapat/libstki.h
../../../src/libecoPCR/libthermo/nnparams.h
../../../src/libecoPCR/libthermo/nnparams.c
../../../src/libecoPCR/ecoapat.c
../../../src/libecoPCR/ecodna.c
../../../src/libecoPCR/ecoError.c
../../../src/libecoPCR/ecoMalloc.c
../../../src/libecoPCR/ecoPCR.h


@ -0,0 +1,124 @@
#cython: language_level=3

from obitools3.apps.progress cimport ProgressBar  # @UnresolvedImport
from obitools3.dms.dms cimport DMS
from obitools3.dms.view import RollbackException
from obitools3.dms.capi.obiclean cimport obi_clean
from obitools3.apps.optiongroups import addMinimalInputOption, addMinimalOutputOption
from obitools3.uri.decode import open_uri
from obitools3.apps.config import logger
from obitools3.utils cimport tobytes, str2bytes
from obitools3.dms.view.view cimport View
from obitools3.dms.view.typed_view.view_NUC_SEQS cimport View_NUC_SEQS

import sys

__title__="Tag a set of sequences for PCR and sequencing errors identification"

def addOptions(parser):

    addMinimalInputOption(parser)
    addMinimalOutputOption(parser)

    group = parser.add_argument_group('obi clean specific options')

    group.add_argument('--distance', '-d',
                       action="store", dest="clean:distance",
                       metavar='<DISTANCE>',
                       default=1.0,
                       type=float,
                       help="Maximum number of errors between two variant sequences. Default: 1.")

    group.add_argument('--sample-tag', '-s',
                       action="store",
                       dest="clean:sample-tag-name",
                       metavar="<SAMPLE TAG NAME>",
                       type=str,
                       default="merged_sample",
                       help="Name of the tag where sample counts are kept.")

    group.add_argument('--ratio', '-r',
                       action="store", dest="clean:ratio",
                       metavar='<RATIO>',
                       default=0.5,
                       type=float,
                       help="Maximum ratio between the counts of two sequences so that the less abundant one can be considered"
                            " a variant of the more abundant one. Default: 0.5.")

    group.add_argument('--heads-only', '-H',
                       action="store_true",
                       dest="clean:heads-only",
                       default=False,
                       help="Only sequences labeled as heads are kept in the output. Default: False")

    group.add_argument('--cluster-tags', '-C',
                       action="store_true",
                       dest="clean:cluster-tags",
                       default=False,
                       help="Adds tags for each sequence giving its cluster's head and weight for each sample.")

def run(config):

    DMS.obi_atexit()

    logger("info", "obi clean")

    # Open the input: only the DMS
    input = open_uri(config['obi']['inputURI'],
                     dms_only=True)
    if input is None:
        raise Exception("Could not read input")
    i_dms = input[0]
    i_dms_name = input[0].name
    i_view_name = input[1]

    # Open the output: only the DMS
    output = open_uri(config['obi']['outputURI'],
                      input=False,
                      dms_only=True)
    if output is None:
        raise Exception("Could not create output")
    o_dms = output[0]
    final_o_view_name = output[1]

    # If the input and output DMS are not the same, run obiclean creating a temporary view that will be exported to
    # the right DMS and deleted in the other afterwards.
    if i_dms != o_dms:
        temporary_view_name = final_o_view_name
        i=0
        while temporary_view_name in i_dms:  # Making sure view name is unique in input DMS
            temporary_view_name = final_o_view_name+b"_"+str2bytes(str(i))
            i+=1
        o_view_name = temporary_view_name
    else:
        o_view_name = final_o_view_name

    # Save command config in View comments
    command_line = " ".join(sys.argv[1:])
    comments = View.print_config(config, "clean", command_line, input_dms_name=[i_dms_name], input_view_name=[i_view_name])

    if obi_clean(tobytes(i_dms_name), tobytes(i_view_name), tobytes(config['clean']['sample-tag-name']), tobytes(o_view_name), comments, \
                 config['clean']['distance'], config['clean']['ratio'], config['clean']['heads-only'], 1) < 0:
        raise Exception("Error running obiclean")

    # If the input and output DMS are not the same, export result view to output DMS
    if i_dms != o_dms:
        View.import_view(i_dms.full_path[:-7], o_dms.full_path[:-7], o_view_name, final_o_view_name)

    # Save command config in DMS comments
    o_dms.record_command_line(command_line)

    #print("\n\nOutput view:\n````````````", file=sys.stderr)
    #print(repr(o_dms[final_o_view_name]), file=sys.stderr)

    # If the input and the output DMS are different, delete the temporary result view in the input DMS
    if i_dms != o_dms:
        View.delete_view(i_dms, o_view_name)
        o_dms.close()

    i_dms.close()

    logger("info", "Done.")
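
The help strings for `--distance` and `--ratio` describe when a less abundant sequence may be treated as a variant of a more abundant one. The sketch below is one reading of those two sentences only: the real decision is made inside the C function `obi_clean`, which this diff does not show, so the function name `may_be_variant`, its signature, and the inclusive comparisons are all assumptions.

```python
def may_be_variant(count_minor, count_major, n_errors,
                   max_distance=1.0, max_ratio=0.5):
    """Hypothetical predicate mirroring the --distance and --ratio help texts."""
    if count_major <= 0:
        # No meaningful abundance ratio can be computed.
        return False
    # Assumption: both limits are inclusive ("maximum" in the help strings).
    close_enough = n_errors <= max_distance
    rare_enough = (count_minor / count_major) <= max_ratio
    return close_enough and rare_enough


print(may_be_variant(10, 100, 1))   # one error, ratio 0.1
print(may_be_variant(80, 100, 1))   # one error, ratio 0.8
```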


@ -0,0 +1,103 @@
../../../src/obi_lcs.h
../../../src/obi_lcs.c
../../../src/obierrno.h
../../../src/obierrno.c
../../../src/upperband.h
../../../src/upperband.c
../../../src/sse_banded_LCS_alignment.h
../../../src/sse_banded_LCS_alignment.c
../../../src/obiblob.h
../../../src/obiblob.c
../../../src/utils.h
../../../src/utils.c
../../../src/obidms.h
../../../src/obidms.c
../../../src/libjson/json_utils.h
../../../src/libjson/json_utils.c
../../../src/libjson/cJSON.h
../../../src/libjson/cJSON.c
../../../src/obiavl.h
../../../src/obiavl.c
../../../src/bloom.h
../../../src/bloom.c
../../../src/crc64.h
../../../src/crc64.c
../../../src/murmurhash2.h
../../../src/murmurhash2.c
../../../src/obidmscolumn.h
../../../src/obidmscolumn.c
../../../src/obitypes.h
../../../src/obitypes.c
../../../src/obidmscolumndir.h
../../../src/obidmscolumndir.c
../../../src/obiblob_indexer.h
../../../src/obiblob_indexer.c
../../../src/obiview.h
../../../src/obiview.c
../../../src/hashtable.h
../../../src/hashtable.c
../../../src/linked_list.h
../../../src/linked_list.c
../../../src/obidmscolumn_array.h
../../../src/obidmscolumn_array.c
../../../src/obidmscolumn_blob.h
../../../src/obidmscolumn_blob.c
../../../src/obidmscolumn_idx.h
../../../src/obidmscolumn_idx.c
../../../src/obidmscolumn_bool.h
../../../src/obidmscolumn_bool.c
../../../src/obidmscolumn_char.h
../../../src/obidmscolumn_char.c
../../../src/obidmscolumn_float.h
../../../src/obidmscolumn_float.c
../../../src/obidmscolumn_int.h
../../../src/obidmscolumn_int.c
../../../src/obidmscolumn_qual.h
../../../src/obidmscolumn_qual.c
../../../src/obidmscolumn_seq.h
../../../src/obidmscolumn_seq.c
../../../src/obidmscolumn_str.h
../../../src/obidmscolumn_str.c
../../../src/array_indexer.h
../../../src/array_indexer.c
../../../src/char_str_indexer.h
../../../src/char_str_indexer.c
../../../src/dna_seq_indexer.h
../../../src/dna_seq_indexer.c
../../../src/encode.c
../../../src/encode.h
../../../src/uint8_indexer.c
../../../src/uint8_indexer.h
../../../src/build_reference_db.c
../../../src/build_reference_db.h
../../../src/kmer_similarity.c
../../../src/kmer_similarity.h
../../../src/obi_clean.c
../../../src/obi_clean.h
../../../src/obi_ecopcr.c
../../../src/obi_ecopcr.h
../../../src/obi_ecotag.c
../../../src/obi_ecotag.h
../../../src/obidms_taxonomy.c
../../../src/obidms_taxonomy.h
../../../src/obilittlebigman.c
../../../src/obilittlebigman.h
../../../src/_sse.h
../../../src/obidebug.h
../../../src/libecoPCR/libapat/CODES/dft_code.h
../../../src/libecoPCR/libapat/CODES/dna_code.h
../../../src/libecoPCR/libapat/CODES/prot_code.h
../../../src/libecoPCR/libapat/apat_parse.c
../../../src/libecoPCR/libapat/apat_search.c
../../../src/libecoPCR/libapat/apat.h
../../../src/libecoPCR/libapat/Gmach.h
../../../src/libecoPCR/libapat/Gtypes.h
../../../src/libecoPCR/libapat/libstki.c
../../../src/libecoPCR/libapat/libstki.h
../../../src/libecoPCR/libthermo/nnparams.h
../../../src/libecoPCR/libthermo/nnparams.c
../../../src/libecoPCR/ecoapat.c
../../../src/libecoPCR/ecodna.c
../../../src/libecoPCR/ecoError.c
../../../src/libecoPCR/ecoMalloc.c
../../../src/libecoPCR/ecoPCR.h


@ -0,0 +1,55 @@
#cython: language_level=3

from obitools3.uri.decode import open_uri
from obitools3.apps.config import logger
from obitools3.dms import DMS
from obitools3.apps.optiongroups import addMinimalInputOption
from obitools3.dms.capi.obiview cimport COUNT_COLUMN

__title__="Counts sequence records"

def addOptions(parser):

    addMinimalInputOption(parser)

    group = parser.add_argument_group('obi count specific options')

    group.add_argument('-s','--sequence',
                       action="store_true", dest="count:sequence",
                       default=False,
                       help="Prints only the number of sequence records.")

    group.add_argument('-a','--all',
                       action="store_true", dest="count:all",
                       default=False,
                       help="Prints only the total count of sequence records (if a sequence has no `count` attribute, its default count is 1) (default: False).")

def run(config):

    DMS.obi_atexit()

    logger("info", "obi count")

    # Open the input
    input = open_uri(config['obi']['inputURI'])
    if input is None:
        raise Exception("Could not read input")
    entries = input[1]

    count1 = len(entries)
    count2 = 0

    if COUNT_COLUMN in entries and ((config['count']['sequence'] == config['count']['all']) or (config['count']['all'])) :
        for e in entries:
            count2+=e[COUNT_COLUMN]

    if COUNT_COLUMN in entries and (config['count']['sequence'] == config['count']['all']):
        print(count1,count2)
    elif COUNT_COLUMN in entries and config['count']['all']:
        print(count2)
    else:
        print(count1)
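
The branching at the end of `run()` decides which of the two numbers gets printed. A minimal stand-alone sketch of that decision table follows, assuming a list of integers stands in for the count column and `None` means the view has no such column; `count_summary` is a hypothetical name introduced for this illustration.

```python
def count_summary(counts, n_records, only_records=False, only_total=False):
    """Return the tuple of numbers that would be printed.

    counts: per-record count values, or None when there is no count column.
    """
    if counts is None:
        # Without a count column only the number of records is available.
        return (n_records,)
    if only_records == only_total:
        # Neither flag or both flags: report records and summed counts.
        return (n_records, sum(counts))
    if only_total:
        return (sum(counts),)
    return (n_records,)


print(count_summary([2, 3], 2))                   # (2, 5)
print(count_summary([2, 3], 2, only_total=True))  # (5,)
```

As in the diff, passing both flags gives the same result as passing neither.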


@ -0,0 +1,103 @@
../../../src/obi_lcs.h
../../../src/obi_lcs.c
../../../src/obierrno.h
../../../src/obierrno.c
../../../src/upperband.h
../../../src/upperband.c
../../../src/sse_banded_LCS_alignment.h
../../../src/sse_banded_LCS_alignment.c
../../../src/obiblob.h
../../../src/obiblob.c
../../../src/utils.h
../../../src/utils.c
../../../src/obidms.h
../../../src/obidms.c
../../../src/libjson/json_utils.h
../../../src/libjson/json_utils.c
../../../src/libjson/cJSON.h
../../../src/libjson/cJSON.c
../../../src/obiavl.h
../../../src/obiavl.c
../../../src/bloom.h
../../../src/bloom.c
../../../src/crc64.h
../../../src/crc64.c
../../../src/murmurhash2.h
../../../src/murmurhash2.c
../../../src/obidmscolumn.h
../../../src/obidmscolumn.c
../../../src/obitypes.h
../../../src/obitypes.c
../../../src/obidmscolumndir.h
../../../src/obidmscolumndir.c
../../../src/obiblob_indexer.h
../../../src/obiblob_indexer.c
../../../src/obiview.h
../../../src/obiview.c
../../../src/hashtable.h
../../../src/hashtable.c
../../../src/linked_list.h
../../../src/linked_list.c
../../../src/obidmscolumn_array.h
../../../src/obidmscolumn_array.c
../../../src/obidmscolumn_blob.h
../../../src/obidmscolumn_blob.c
../../../src/obidmscolumn_idx.h
../../../src/obidmscolumn_idx.c
../../../src/obidmscolumn_bool.h
../../../src/obidmscolumn_bool.c
../../../src/obidmscolumn_char.h
../../../src/obidmscolumn_char.c
../../../src/obidmscolumn_float.h
../../../src/obidmscolumn_float.c
../../../src/obidmscolumn_int.h
../../../src/obidmscolumn_int.c
../../../src/obidmscolumn_qual.h
../../../src/obidmscolumn_qual.c
../../../src/obidmscolumn_seq.h
../../../src/obidmscolumn_seq.c
../../../src/obidmscolumn_str.h
../../../src/obidmscolumn_str.c
../../../src/array_indexer.h
../../../src/array_indexer.c
../../../src/char_str_indexer.h
../../../src/char_str_indexer.c
../../../src/dna_seq_indexer.h
../../../src/dna_seq_indexer.c
../../../src/encode.c
../../../src/encode.h
../../../src/uint8_indexer.c
../../../src/uint8_indexer.h
../../../src/build_reference_db.c
../../../src/build_reference_db.h
../../../src/kmer_similarity.c
../../../src/kmer_similarity.h
../../../src/obi_clean.c
../../../src/obi_clean.h
../../../src/obi_ecopcr.c
../../../src/obi_ecopcr.h
../../../src/obi_ecotag.c
../../../src/obi_ecotag.h
../../../src/obidms_taxonomy.c
../../../src/obidms_taxonomy.h
../../../src/obilittlebigman.c
../../../src/obilittlebigman.h
../../../src/_sse.h
../../../src/obidebug.h
../../../src/libecoPCR/libapat/CODES/dft_code.h
../../../src/libecoPCR/libapat/CODES/dna_code.h
../../../src/libecoPCR/libapat/CODES/prot_code.h
../../../src/libecoPCR/libapat/apat_parse.c
../../../src/libecoPCR/libapat/apat_search.c
../../../src/libecoPCR/libapat/apat.h
../../../src/libecoPCR/libapat/Gmach.h
../../../src/libecoPCR/libapat/Gtypes.h
../../../src/libecoPCR/libapat/libstki.c
../../../src/libecoPCR/libapat/libstki.h
../../../src/libecoPCR/libthermo/nnparams.h
../../../src/libecoPCR/libthermo/nnparams.c
../../../src/libecoPCR/ecoapat.c
../../../src/libecoPCR/ecodna.c
../../../src/libecoPCR/ecoError.c
../../../src/libecoPCR/ecoMalloc.c
../../../src/libecoPCR/ecoPCR.h


@ -0,0 +1,202 @@
#cython: language_level=3
from obitools3.apps.progress cimport ProgressBar # @UnresolvedImport
from obitools3.dms.dms cimport DMS
from obitools3.dms.capi.obidms cimport OBIDMS_p
from obitools3.dms.view import RollbackException
from obitools3.dms.capi.obiecopcr cimport obi_ecopcr
from obitools3.apps.optiongroups import addMinimalInputOption, addMinimalOutputOption, addTaxonomyOption
from obitools3.uri.decode import open_uri
from obitools3.apps.config import logger
from obitools3.utils cimport tobytes
from obitools3.dms.view.typed_view.view_NUC_SEQS cimport View_NUC_SEQS
from obitools3.dms.view import View
from libc.stdlib cimport malloc, free
from libc.stdint cimport int32_t
import sys
__title__="in silico PCR"
# TODO: add option to output unique ids
def addOptions(parser):

    addMinimalInputOption(parser)
    addTaxonomyOption(parser)
    addMinimalOutputOption(parser)

    group = parser.add_argument_group('obi ecopcr specific options')

    group.add_argument('--primer1', '-F',
                       action="store", dest="ecopcr:primer1",
                       metavar='<PRIMER>',
                       type=str,
                       help="Forward primer.")

    group.add_argument('--primer2', '-R',
                       action="store", dest="ecopcr:primer2",
                       metavar='<PRIMER>',
                       type=str,
                       help="Reverse primer.")

    group.add_argument('--error', '-e',
                       action="store", dest="ecopcr:error",
                       metavar='<ERROR>',
                       default=0,
                       type=int,
                       help="Maximum number of errors (mismatches) allowed per primer. Default: 0.")

    group.add_argument('--min-length', '-l',
                       action="store",
                       dest="ecopcr:min-length",
                       metavar="<MINIMUM LENGTH>",
                       type=int,
                       default=0,
                       help="Minimum length of the in silico amplified DNA fragment, excluding primers.")

    group.add_argument('--max-length', '-L',
                       action="store",
                       dest="ecopcr:max-length",
                       metavar="<MAXIMUM LENGTH>",
                       type=int,
                       default=0,
                       help="Maximum length of the in silico amplified DNA fragment, excluding primers.")

    group.add_argument('--restrict-to-taxid', '-r',
                       action="append",
                       dest="ecopcr:restrict-to-taxid",
                       metavar="<TAXID>",
                       type=int,
                       default=[],
                       help="Only the sequence records corresponding to the taxonomic group identified "
                            "by TAXID are considered for the in silico PCR. The TAXID is an integer "
                            "that can be found in the NCBI taxonomic database.")

    group.add_argument('--ignore-taxid', '-i',
                       action="append",
                       dest="ecopcr:ignore-taxid",
                       metavar="<TAXID>",
                       type=int,
                       default=[],
                       help="The sequences of the taxonomic group identified by TAXID are not considered for the in silico PCR.")

    group.add_argument('--circular', '-c',
                       action="store_true",
                       dest="ecopcr:circular",
                       default=False,
                       help="Considers that the input sequences are circular (e.g. mitochondrial or chloroplastic DNA).")

    group.add_argument('--salt-concentration', '-a',
                       action="store",
                       dest="ecopcr:salt-concentration",
                       metavar="<FLOAT>",
                       type=float,
                       default=0.05,
                       help="Salt concentration used for estimating the Tm. Default: 0.05.")

    group.add_argument('--salt-correction-method', '-m',
                       action="store",
                       dest="ecopcr:salt-correction-method",
                       metavar="<1|2>",
                       type=int,
                       default=1,
                       help="Defines the method used for estimating the Tm (melting temperature) between the primers and their corresponding "
                            "target sequences. SANTALUCIA: 1, or OWCZARZY: 2. Default: 1.")

    group.add_argument('--keep-nucs', '-D',
                       action="store",
                       dest="ecopcr:keep-nucs",
                       metavar="<INTEGER>",
                       type=int,
                       default=0,
                       help="Keeps the specified number of nucleotides on each side of the in silico amplified sequences "
                            "(already including the amplified DNA fragment plus the two target sequences of the primers).")

    group.add_argument('--kingdom-mode', '-k',
                       action="store_true",
                       dest="ecopcr:kingdom-mode",
                       default=False,
                       help="Print in the output the kingdom of the in silico amplified sequences (default: print the superkingdom).")

def run(config):

    cdef int32_t* restrict_to_taxids_p = NULL
    cdef int32_t* ignore_taxids_p = NULL

    restrict_to_taxids_len = len(config['ecopcr']['restrict-to-taxid'])
    restrict_to_taxids_p = <int32_t*> malloc((restrict_to_taxids_len + 1) * sizeof(int32_t))  # +1 for the -1 flagging the end of the array
    for i in range(restrict_to_taxids_len) :
        restrict_to_taxids_p[i] = config['ecopcr']['restrict-to-taxid'][i]
    restrict_to_taxids_p[restrict_to_taxids_len] = -1

    ignore_taxids_len = len(config['ecopcr']['ignore-taxid'])
    ignore_taxids_p = <int32_t*> malloc((ignore_taxids_len + 1) * sizeof(int32_t))  # +1 for the -1 flagging the end of the array
    for i in range(ignore_taxids_len) :
        ignore_taxids_p[i] = config['ecopcr']['ignore-taxid'][i]
    ignore_taxids_p[ignore_taxids_len] = -1

    DMS.obi_atexit()

    logger("info", "obi ecopcr")

    # Open the input: only the DMS
    input = open_uri(config['obi']['inputURI'],
                     dms_only=True)
    if input is None:
        raise Exception("Could not read input")
    i_dms = input[0]
    i_dms_name = input[0].name
    i_view_name = input[1]

    # Open the output: only the DMS
    output = open_uri(config['obi']['outputURI'],
                      input=False,
                      dms_only=True)
    if output is None:
        raise Exception("Could not create output")
    o_dms = output[0]
    o_dms_name = output[0].name
    o_view_name = output[1]

    # Read taxonomy name
    taxonomy_name = config['obi']['taxoURI'].split("/")[-1]  # Robust in theory

    # Save command config in View comments
    command_line = " ".join(sys.argv[1:])
    input_dms_name=[i_dms_name]
    input_view_name= [i_view_name]
    input_dms_name.append(config['obi']['taxoURI'].split("/")[-3])
    input_view_name.append("taxonomy/"+config['obi']['taxoURI'].split("/")[-1])
    comments = View.print_config(config, "ecopcr", command_line, input_dms_name=input_dms_name, input_view_name=input_view_name)

    # TODO: primers in comments?

    if obi_ecopcr(tobytes(i_dms_name), tobytes(i_view_name), tobytes(taxonomy_name), \
                  tobytes(o_dms_name), tobytes(o_view_name), comments, \
                  tobytes(config['ecopcr']['primer1']), tobytes(config['ecopcr']['primer2']), \
                  config['ecopcr']['error'], \
                  config['ecopcr']['min-length'], config['ecopcr']['max-length'], \
                  restrict_to_taxids_p, ignore_taxids_p, \
                  config['ecopcr']['circular'], config['ecopcr']['salt-concentration'], config['ecopcr']['salt-correction-method'], \
                  config['ecopcr']['keep-nucs'], config['ecopcr']['kingdom-mode']) < 0:
        raise Exception("Error running ecopcr")

    # Save command config in DMS comments
    o_dms.record_command_line(command_line)

    free(restrict_to_taxids_p)
    free(ignore_taxids_p)

    #print("\n\nOutput view:\n````````````", file=sys.stderr)
    #print(repr(o_dms[o_view_name]), file=sys.stderr)

    o_dms.close()

    logger("info", "Done.")
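
The run() function above passes the taxid lists to the C layer as int32 arrays terminated by a -1 sentinel, so no separate length argument is needed. The same convention can be sketched in plain Python with ctypes. The helper names build_taxid_array and read_until_sentinel are hypothetical and not part of obitools3; this is only an illustration of the layout, not the project's implementation.

```python
import ctypes


def build_taxid_array(taxids):
    """Return a ctypes int32 array holding the taxids followed by a -1 sentinel.

    Hypothetical helper: mirrors the malloc'ed, -1-terminated arrays
    built in run(), but lets ctypes own the memory (no explicit free).
    """
    n = len(taxids)
    arr = (ctypes.c_int32 * (n + 1))()  # +1 slot for the sentinel
    for i, taxid in enumerate(taxids):
        arr[i] = taxid
    arr[n] = -1  # marks the end of the array
    return arr


def read_until_sentinel(arr):
    """Read values the way a C consumer would: stop at the first -1."""
    values = []
    i = 0
    while arr[i] != -1:
        values.append(arr[i])
        i += 1
    return values
```

An empty option list still yields a valid one-element array containing only the sentinel, which is why the allocation above always reserves len + 1 slots.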


@ -0,0 +1,103 @@
../../../src/obi_lcs.h
../../../src/obi_lcs.c
../../../src/obierrno.h
../../../src/obierrno.c
../../../src/upperband.h
../../../src/upperband.c
../../../src/sse_banded_LCS_alignment.h
../../../src/sse_banded_LCS_alignment.c
../../../src/obiblob.h
../../../src/obiblob.c
../../../src/utils.h
../../../src/utils.c
../../../src/obidms.h
../../../src/obidms.c
../../../src/libjson/json_utils.h
../../../src/libjson/json_utils.c
../../../src/libjson/cJSON.h
../../../src/libjson/cJSON.c
../../../src/obiavl.h
../../../src/obiavl.c
../../../src/bloom.h
../../../src/bloom.c
../../../src/crc64.h
../../../src/crc64.c
../../../src/murmurhash2.h
../../../src/murmurhash2.c
../../../src/obidmscolumn.h
../../../src/obidmscolumn.c
../../../src/obitypes.h
../../../src/obitypes.c
../../../src/obidmscolumndir.h
../../../src/obidmscolumndir.c
../../../src/obiblob_indexer.h
../../../src/obiblob_indexer.c
../../../src/obiview.h
../../../src/obiview.c
../../../src/hashtable.h
../../../src/hashtable.c
../../../src/linked_list.h
../../../src/linked_list.c
../../../src/obidmscolumn_array.h
../../../src/obidmscolumn_array.c
../../../src/obidmscolumn_blob.h
../../../src/obidmscolumn_blob.c
../../../src/obidmscolumn_idx.h
../../../src/obidmscolumn_idx.c
../../../src/obidmscolumn_bool.h
../../../src/obidmscolumn_bool.c
../../../src/obidmscolumn_char.h
../../../src/obidmscolumn_char.c
../../../src/obidmscolumn_float.h
../../../src/obidmscolumn_float.c
../../../src/obidmscolumn_int.h
../../../src/obidmscolumn_int.c
../../../src/obidmscolumn_qual.h
../../../src/obidmscolumn_qual.c
../../../src/obidmscolumn_seq.h
../../../src/obidmscolumn_seq.c
../../../src/obidmscolumn_str.h
../../../src/obidmscolumn_str.c
../../../src/array_indexer.h
../../../src/array_indexer.c
../../../src/char_str_indexer.h
../../../src/char_str_indexer.c
../../../src/dna_seq_indexer.h
../../../src/dna_seq_indexer.c
../../../src/encode.c
../../../src/encode.h
../../../src/uint8_indexer.c
../../../src/uint8_indexer.h
../../../src/build_reference_db.c
../../../src/build_reference_db.h
../../../src/kmer_similarity.c
../../../src/kmer_similarity.h
../../../src/obi_clean.c
../../../src/obi_clean.h
../../../src/obi_ecopcr.c
../../../src/obi_ecopcr.h
../../../src/obi_ecotag.c
../../../src/obi_ecotag.h
../../../src/obidms_taxonomy.c
../../../src/obidms_taxonomy.h
../../../src/obilittlebigman.c
../../../src/obilittlebigman.h
../../../src/_sse.h
../../../src/obidebug.h
../../../src/libecoPCR/libapat/CODES/dft_code.h
../../../src/libecoPCR/libapat/CODES/dna_code.h
../../../src/libecoPCR/libapat/CODES/prot_code.h
../../../src/libecoPCR/libapat/apat_parse.c
../../../src/libecoPCR/libapat/apat_search.c
../../../src/libecoPCR/libapat/apat.h
../../../src/libecoPCR/libapat/Gmach.h
../../../src/libecoPCR/libapat/Gtypes.h
../../../src/libecoPCR/libapat/libstki.c
../../../src/libecoPCR/libapat/libstki.h
../../../src/libecoPCR/libthermo/nnparams.h
../../../src/libecoPCR/libthermo/nnparams.c
../../../src/libecoPCR/ecoapat.c
../../../src/libecoPCR/ecodna.c
../../../src/libecoPCR/ecoError.c
../../../src/libecoPCR/ecoMalloc.c
../../../src/libecoPCR/ecoPCR.h


@ -0,0 +1,129 @@
#cython: language_level=3
from obitools3.apps.progress cimport ProgressBar # @UnresolvedImport
from obitools3.dms.dms cimport DMS
from obitools3.dms.view import RollbackException
from obitools3.dms.capi.obiecotag cimport obi_ecotag
from obitools3.apps.optiongroups import addMinimalInputOption, addTaxonomyOption, addMinimalOutputOption
from obitools3.uri.decode import open_uri
from obitools3.apps.config import logger
from obitools3.utils cimport tobytes, str2bytes
from obitools3.dms.view.view cimport View
from obitools3.dms.view.typed_view.view_NUC_SEQS cimport View_NUC_SEQS
import sys
__title__="Taxonomic assignment of sequences"

def addOptions(parser):

    addMinimalInputOption(parser)
    addTaxonomyOption(parser)
    addMinimalOutputOption(parser)

    group = parser.add_argument_group('obi ecotag specific options')

    group.add_argument('--ref-database','-R',
                       action="store", dest="ecotag:ref_view",
                       metavar='<REF_VIEW>',
                       type=str,
                       help="URI of the view containing the reference database as built by the build_ref_db command.")

    group.add_argument('--minimum-identity','-m',
                       action="store", dest="ecotag:threshold",
                       metavar='<THRESHOLD>',
                       default=0.0,
                       type=float,
                       help="Minimum identity to consider for assignment, as a normalized identity, e.g. 0.95 for an identity of 95%%. "
                            "Default: 0.00 (no threshold).")

def run(config):

    DMS.obi_atexit()

    logger("info", "obi ecotag")

    # Open the query view: only the DMS
    input = open_uri(config['obi']['inputURI'],
                     dms_only=True)
    if input is None:
        raise Exception("Could not read input")
    i_dms = input[0]
    i_dms_name = input[0].name
    i_view_name = input[1]

    # Open the reference view: only the DMS
    ref = open_uri(config['ecotag']['ref_view'],
                   dms_only=True)
    if ref is None:
        raise Exception("Could not read reference view URI")
    ref_dms = ref[0]
    ref_dms_name = ref[0].name
    ref_view_name = ref[1]

    # Open the output: only the DMS
    output = open_uri(config['obi']['outputURI'],
                      input=False,
                      dms_only=True)
    if output is None:
        raise Exception("Could not create output")
    o_dms = output[0]
    final_o_view_name = output[1]

    # If the input and output DMS are not the same, run ecotag creating a temporary view that will be exported to
    # the right DMS and deleted in the other afterwards.
    if i_dms != o_dms:
        temporary_view_name = final_o_view_name
        i=0
        while temporary_view_name in i_dms:  # Making sure view name is unique in input DMS
            temporary_view_name = final_o_view_name+b"_"+str2bytes(str(i))
            i+=1
        o_view_name = temporary_view_name
    else:
        o_view_name = final_o_view_name

    # Read taxonomy DMS and name
    taxo = open_uri(config['obi']['taxoURI'],
                    dms_only=True)
    if taxo is None:  # Same check as for the other URIs
        raise Exception("Could not open taxonomy")
    taxo_dms_name = taxo[0].name
    taxo_dms = taxo[0]
    taxonomy_name = config['obi']['taxoURI'].split("/")[-1]  # Robust in theory

    # Save command config in View comments
    command_line = " ".join(sys.argv[1:])
    input_dms_name=[i_dms_name]
    input_view_name= [i_view_name]
    input_dms_name.append(ref_dms_name)
    input_view_name.append(ref_view_name)
    input_dms_name.append(config['obi']['taxoURI'].split("/")[-3])
    input_view_name.append("taxonomy/"+config['obi']['taxoURI'].split("/")[-1])
    comments = View.print_config(config, "ecotag", command_line, input_dms_name=input_dms_name, input_view_name=input_view_name)

    if obi_ecotag(tobytes(i_dms_name), tobytes(i_view_name), \
                  tobytes(ref_dms_name), tobytes(ref_view_name), \
                  tobytes(taxo_dms_name), tobytes(taxonomy_name), \
                  tobytes(o_view_name), comments,
                  config['ecotag']['threshold']) < 0:
        raise Exception("Error running ecotag")

    # If the input and output DMS are not the same, export result view to output DMS
    if i_dms != o_dms:
        View.import_view(i_dms.full_path[:-7], o_dms.full_path[:-7], o_view_name, final_o_view_name)

    # Save command config in DMS comments
    o_dms.record_command_line(command_line)

    #print("\n\nOutput view:\n````````````", file=sys.stderr)
    #print(repr(o_dms[final_o_view_name]), file=sys.stderr)

    # If the input and the output DMS are different, delete the temporary result view in the input DMS
    if i_dms != o_dms:
        View.delete_view(i_dms, o_view_name)
        o_dms.close()

    i_dms.close()

    logger("info", "Done.")
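
When the input and output DMS differ, run() above first writes the result into a temporary view of the input DMS, choosing a name that does not collide with any existing view. That naming loop can be sketched on its own as follows. The function unique_view_name is a hypothetical helper, and a plain set of bytes names stands in for the DMS membership test.

```python
def unique_view_name(final_name, existing_names):
    """Return final_name, or final_name + b"_<i>" for the first free i.

    Hypothetical helper: `existing_names` is any container supporting
    `in` (here a set of bytes), standing in for `name in i_dms`.
    """
    candidate = final_name
    i = 0
    while candidate in existing_names:
        candidate = final_name + b"_" + str(i).encode("ascii")
        i += 1
    return candidate
```

For example, with views b"out" and b"out_0" already present, the helper returns b"out_1"; with no collision it returns the requested name unchanged.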


@ -0,0 +1,103 @@
../../../src/obi_lcs.h
../../../src/obi_lcs.c
../../../src/obierrno.h
../../../src/obierrno.c
../../../src/upperband.h
../../../src/upperband.c
../../../src/sse_banded_LCS_alignment.h
../../../src/sse_banded_LCS_alignment.c
../../../src/obiblob.h
../../../src/obiblob.c
../../../src/utils.h
../../../src/utils.c
../../../src/obidms.h
../../../src/obidms.c
../../../src/libjson/json_utils.h
../../../src/libjson/json_utils.c
../../../src/libjson/cJSON.h
../../../src/libjson/cJSON.c
../../../src/obiavl.h
../../../src/obiavl.c
../../../src/bloom.h
../../../src/bloom.c
../../../src/crc64.h
../../../src/crc64.c
../../../src/murmurhash2.h
../../../src/murmurhash2.c
../../../src/obidmscolumn.h
../../../src/obidmscolumn.c
../../../src/obitypes.h
../../../src/obitypes.c
../../../src/obidmscolumndir.h
../../../src/obidmscolumndir.c
../../../src/obiblob_indexer.h
../../../src/obiblob_indexer.c
../../../src/obiview.h
../../../src/obiview.c
../../../src/hashtable.h
../../../src/hashtable.c
../../../src/linked_list.h
../../../src/linked_list.c
../../../src/obidmscolumn_array.h
../../../src/obidmscolumn_array.c
../../../src/obidmscolumn_blob.h
../../../src/obidmscolumn_blob.c
../../../src/obidmscolumn_idx.h
../../../src/obidmscolumn_idx.c
../../../src/obidmscolumn_bool.h
../../../src/obidmscolumn_bool.c
../../../src/obidmscolumn_char.h
../../../src/obidmscolumn_char.c
../../../src/obidmscolumn_float.h
../../../src/obidmscolumn_float.c
../../../src/obidmscolumn_int.h
../../../src/obidmscolumn_int.c
../../../src/obidmscolumn_qual.h
../../../src/obidmscolumn_qual.c
../../../src/obidmscolumn_seq.h
../../../src/obidmscolumn_seq.c
../../../src/obidmscolumn_str.h
../../../src/obidmscolumn_str.c
../../../src/array_indexer.h
../../../src/array_indexer.c
../../../src/char_str_indexer.h
../../../src/char_str_indexer.c
../../../src/dna_seq_indexer.h
../../../src/dna_seq_indexer.c
../../../src/encode.c
../../../src/encode.h
../../../src/uint8_indexer.c
../../../src/uint8_indexer.h
../../../src/build_reference_db.c
../../../src/build_reference_db.h
../../../src/kmer_similarity.c
../../../src/kmer_similarity.h
../../../src/obi_clean.c
../../../src/obi_clean.h
../../../src/obi_ecopcr.c
../../../src/obi_ecopcr.h
../../../src/obi_ecotag.c
../../../src/obi_ecotag.h
../../../src/obidms_taxonomy.c
../../../src/obidms_taxonomy.h
../../../src/obilittlebigman.c
../../../src/obilittlebigman.h
../../../src/_sse.h
../../../src/obidebug.h
../../../src/libecoPCR/libapat/CODES/dft_code.h
../../../src/libecoPCR/libapat/CODES/dna_code.h
../../../src/libecoPCR/libapat/CODES/prot_code.h
../../../src/libecoPCR/libapat/apat_parse.c
../../../src/libecoPCR/libapat/apat_search.c
../../../src/libecoPCR/libapat/apat.h
../../../src/libecoPCR/libapat/Gmach.h
../../../src/libecoPCR/libapat/Gtypes.h
../../../src/libecoPCR/libapat/libstki.c
../../../src/libecoPCR/libapat/libstki.h
../../../src/libecoPCR/libthermo/nnparams.h
../../../src/libecoPCR/libthermo/nnparams.c
../../../src/libecoPCR/ecoapat.c
../../../src/libecoPCR/ecodna.c
../../../src/libecoPCR/ecoError.c
../../../src/libecoPCR/ecoMalloc.c
../../../src/libecoPCR/ecoPCR.h


@ -0,0 +1,69 @@
#cython: language_level=3
from obitools3.apps.progress cimport ProgressBar # @UnresolvedImport
from obitools3.uri.decode import open_uri
from obitools3.apps.config import logger
from obitools3.dms import DMS
from obitools3.dms.obiseq import Nuc_Seq
from obitools3.apps.optiongroups import addMinimalInputOption, \
addExportOutputOption
import sys
__title__="Export a view to a different file format"

def addOptions(parser):

    addMinimalInputOption(parser)
    addExportOutputOption(parser)


def run(config):

    DMS.obi_atexit()

    logger("info", "obi export : exports a view to a different file format")

    # Open the input
    input = open_uri(config['obi']['inputURI'])
    if input is None:
        raise Exception("Could not read input")
    iview = input[1]

    # Open the output
    output = open_uri(config['obi']['outputURI'],
                      input=False)
    if output is None:
        raise Exception("Could not open output URI")

    output_object = output[0]
    writer = output[1]

    # Check that the input view has the type NUC_SEQS if needed  # TODO discuss, maybe bool property
    if (output[2] == Nuc_Seq) and (iview.type != b"NUC_SEQS_VIEW") :  # Nuc_Seq_Stored? TODO
        raise Exception("Error: the view to export in fasta or fastq format is not a NUC_SEQS view")

    # Initialize the progress bar
    pb = ProgressBar(len(iview), config, seconde=5)

    i=0
    for seq in iview :
        pb(i)
        try:
            writer(seq)
        except StopIteration:
            break
        i+=1

    pb(i, force=True)
    print("", file=sys.stderr)

    # TODO save command in input dms?

    output_object.close()
    iview.close()
    input[0].close()

    logger("info", "Done.")
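
The export loop above stops as soon as the writer raises StopIteration and otherwise counts each record it writes. A minimal stand-alone sketch of that pattern is given below. The names export_records and report_every are hypothetical, and a simple periodic message on stderr stands in for the ProgressBar class.

```python
import sys


def export_records(records, writer, report_every=1000, log=sys.stderr):
    """Feed each record to `writer`; stop early if it raises StopIteration.

    Hypothetical helper: returns the number of records actually written.
    """
    written = 0
    for record in records:
        try:
            writer(record)
        except StopIteration:
            # The writer signals that it does not want any more records.
            break
        written += 1
        if written % report_every == 0:
            print(f"{written} records written", file=log)
    return written
```

Counting only after a successful write keeps the returned total equal to the number of records that reached the output.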


@ -0,0 +1,103 @@
../../../src/obi_lcs.h
../../../src/obi_lcs.c
../../../src/obierrno.h
../../../src/obierrno.c
../../../src/upperband.h
../../../src/upperband.c
../../../src/sse_banded_LCS_alignment.h
../../../src/sse_banded_LCS_alignment.c
../../../src/obiblob.h
../../../src/obiblob.c
../../../src/utils.h
../../../src/utils.c
../../../src/obidms.h
../../../src/obidms.c
../../../src/libjson/json_utils.h
../../../src/libjson/json_utils.c
../../../src/libjson/cJSON.h
../../../src/libjson/cJSON.c
../../../src/obiavl.h
../../../src/obiavl.c
../../../src/bloom.h
../../../src/bloom.c
../../../src/crc64.h
../../../src/crc64.c
../../../src/murmurhash2.h
../../../src/murmurhash2.c
../../../src/obidmscolumn.h
../../../src/obidmscolumn.c
../../../src/obitypes.h
../../../src/obitypes.c
../../../src/obidmscolumndir.h
../../../src/obidmscolumndir.c
../../../src/obiblob_indexer.h
../../../src/obiblob_indexer.c
../../../src/obiview.h
../../../src/obiview.c
../../../src/hashtable.h
../../../src/hashtable.c
../../../src/linked_list.h
../../../src/linked_list.c
../../../src/obidmscolumn_array.h
../../../src/obidmscolumn_array.c
../../../src/obidmscolumn_blob.h
../../../src/obidmscolumn_blob.c
../../../src/obidmscolumn_idx.h
../../../src/obidmscolumn_idx.c
../../../src/obidmscolumn_bool.h
../../../src/obidmscolumn_bool.c
../../../src/obidmscolumn_char.h
../../../src/obidmscolumn_char.c
../../../src/obidmscolumn_float.h
../../../src/obidmscolumn_float.c
../../../src/obidmscolumn_int.h
../../../src/obidmscolumn_int.c
../../../src/obidmscolumn_qual.h
../../../src/obidmscolumn_qual.c
../../../src/obidmscolumn_seq.h
../../../src/obidmscolumn_seq.c
../../../src/obidmscolumn_str.h
../../../src/obidmscolumn_str.c
../../../src/array_indexer.h
../../../src/array_indexer.c
../../../src/char_str_indexer.h
../../../src/char_str_indexer.c
../../../src/dna_seq_indexer.h
../../../src/dna_seq_indexer.c
../../../src/encode.c
../../../src/encode.h
../../../src/uint8_indexer.c
../../../src/uint8_indexer.h
../../../src/build_reference_db.c
../../../src/build_reference_db.h
../../../src/kmer_similarity.c
../../../src/kmer_similarity.h
../../../src/obi_clean.c
../../../src/obi_clean.h
../../../src/obi_ecopcr.c
../../../src/obi_ecopcr.h
../../../src/obi_ecotag.c
../../../src/obi_ecotag.h
../../../src/obidms_taxonomy.c
../../../src/obidms_taxonomy.h
../../../src/obilittlebigman.c
../../../src/obilittlebigman.h
../../../src/_sse.h
../../../src/obidebug.h
../../../src/libecoPCR/libapat/CODES/dft_code.h
../../../src/libecoPCR/libapat/CODES/dna_code.h
../../../src/libecoPCR/libapat/CODES/prot_code.h
../../../src/libecoPCR/libapat/apat_parse.c
../../../src/libecoPCR/libapat/apat_search.c
../../../src/libecoPCR/libapat/apat.h
../../../src/libecoPCR/libapat/Gmach.h
../../../src/libecoPCR/libapat/Gtypes.h
../../../src/libecoPCR/libapat/libstki.c
../../../src/libecoPCR/libapat/libstki.h
../../../src/libecoPCR/libthermo/nnparams.h
../../../src/libecoPCR/libthermo/nnparams.c
../../../src/libecoPCR/ecoapat.c
../../../src/libecoPCR/ecodna.c
../../../src/libecoPCR/ecoError.c
../../../src/libecoPCR/ecoMalloc.c
../../../src/libecoPCR/ecoPCR.h


@ -0,0 +1,352 @@
#cython: language_level=3
from obitools3.apps.progress cimport ProgressBar # @UnresolvedImport
from obitools3.dms import DMS
from obitools3.dms.view.view cimport View, Line_selection
from obitools3.uri.decode import open_uri
from obitools3.apps.optiongroups import addMinimalInputOption, addTaxonomyOption, addMinimalOutputOption
from obitools3.dms.view import RollbackException
from obitools3.apps.config import logger
from obitools3.utils cimport tobytes, str2bytes
from functools import reduce
import time
import re
import sys
__title__="Grep view lines that match the given predicates"
# TODO should sequences that have a grepped attribute at None be grepped or not? (in obi1 they are but....)

def addOptions(parser):

    addMinimalInputOption(parser)
    addTaxonomyOption(parser)
    addMinimalOutputOption(parser)

    group=parser.add_argument_group("obi grep specific options")

    group.add_argument("--predicate", "-p",
                       action="append", dest="grep:grep_predicates",
                       metavar="<PREDICATE>",
                       default=None,
                       type=str,
                       help="Python boolean expression to be evaluated in the "
                            "sequence/line context. The attribute name can be "
                            "used in the expression as a variable name. "
                            "An extra variable named 'sequence' or 'line' refers "
                            "to the sequence or line object itself. "
                            "Several -p options can be used on the same "
                            "command line.")

    group.add_argument("-S", "--sequence",
                       action="store", dest="grep:seq_pattern",
                       metavar="<REGULAR_PATTERN>",
                       type=str,
                       help="Regular expression pattern used to select "
                            "the sequence. The pattern is case insensitive.")

    group.add_argument("-D", "--definition",
                       action="store", dest="grep:def_pattern",
                       metavar="<REGULAR_PATTERN>",
                       type=str,
                       help="Regular expression pattern used to select "
                            "the definition of the sequence. The pattern is case sensitive.")

    group.add_argument("-I", "--identifier",
                       action="store", dest="grep:id_pattern",
                       metavar="<REGULAR_PATTERN>",
                       type=str,
                       help="Regular expression pattern used to select "
                            "the identifier of the sequence. The pattern is case sensitive.")

    group.add_argument("--id-list",
                       action="store", dest="grep:id_list",
                       metavar="<FILE_NAME>",
                       type=str,
                       help="File containing the identifiers of the sequences to select.")

    group.add_argument("-a", "--attribute",
                       action="append", dest="grep:attribute_patterns",
                       type=str,
                       default=[],
                       metavar="<ATTRIBUTE_NAME>:<REGULAR_PATTERN>",
                       help="Regular expression pattern matched against "
                            "the attributes of the sequence. "
                            "The pattern is case sensitive. "
                            "Several -a options can be used on the same "
                            "command line.")

    group.add_argument("-A", "--has-attribute",
                       action="append", dest="grep:attributes",
                       type=str,
                       default=[],
                       metavar="<ATTRIBUTE_NAME>",
                       help="Select records with the attribute <ATTRIBUTE_NAME> "
                            "defined (not set to NA value). "
                            "Several -A options can be used on the same "
                            "command line.")

    group.add_argument("-L", "--lmax",
                       action="store", dest="grep:lmax",
                       metavar="<MAX_LENGTH>",
                       type=int,
                       help="Keep sequences shorter than MAX_LENGTH.")

    group.add_argument("-l", "--lmin",
                       action="store", dest="grep:lmin",
                       metavar="<MIN_LENGTH>",
                       type=int,
                       help="Keep sequences longer than MIN_LENGTH.")

    group.add_argument("-v", "--invert-selection",
                       action="store_true", dest="grep:invert_selection",
                       default=False,
                       help="Invert the selection.")

    group=parser.add_argument_group("Taxonomy filtering specific options")  #TODO put somewhere else? not in grep

    group.add_argument('--require-rank',
                       action="append", dest="grep:required_ranks",
                       metavar="<RANK_NAME>",
                       type=str,
                       default=[],
                       help="Select sequences with a taxid that is or has "
                            "a parent of rank <RANK_NAME>.")

    group.add_argument('-r', '--required',
                       action="append", dest="grep:required_taxids",
                       metavar="<TAXID>",
                       type=int,
                       default=[],
                       help="Select the sequences having the ancestor of taxid <TAXID>. "
                            "If several ancestors are specified (with \n'-r taxid1 -r taxid2'), "
                            "the sequences having at least one of them are selected.")

    # TODO useless option equivalent to -r -v?
    group.add_argument('-i','--ignore',
                       action="append", dest="grep:ignored_taxids",
                       metavar="<TAXID>",
                       type=int,
                       default=[],
                       help="Ignore the sequences having the ancestor of taxid <TAXID>. "
                            "If several ancestors are specified (with \n'-i taxid1 -i taxid2'), "
                            "the sequences having at least one of them are ignored.")

def Filter_generator(options, tax_filter):
    #taxfilter = taxonomyFilterGenerator(options)

    # Initialize conditions
    predicates = None
    if "grep_predicates" in options:  # key matches dest="grep:grep_predicates"
        predicates = options["grep_predicates"]
    attributes = None
    if "attributes" in options:
        attributes = options["attributes"]
    lmax = None
    if "lmax" in options:
        lmax = options["lmax"]
    lmin = None
    if "lmin" in options:
        lmin = options["lmin"]
    invert_selection = options["invert_selection"]
    id_set = None
    if "id_list" in options:
        # Read the identifiers as bytes so that they can be compared with line.id
        with open(options["id_list"], 'rb') as id_file:
            id_set = set(x.strip() for x in id_file)

    # Initialize the regular expression patterns
    seq_pattern = None
    if "seq_pattern" in options:
        seq_pattern = re.compile(tobytes(options["seq_pattern"]), re.I)

    id_pattern = None
    if "id_pattern" in options:
        id_pattern = re.compile(tobytes(options["id_pattern"]))

    def_pattern = None
    if "def_pattern" in options:
        def_pattern = re.compile(tobytes(options["def_pattern"]))

    attribute_patterns={}
    if "attribute_patterns" in options:
        for p in options["attribute_patterns"]:
            attribute, pattern = p.split(":", 1)
            attribute_patterns[tobytes(attribute)] = re.compile(tobytes(pattern))

    def filter(line, loc_env):
        cdef bint good = True

        if seq_pattern and hasattr(line, "seq"):
            good = <bint>(seq_pattern.search(line.seq))

        if good and id_pattern and hasattr(line, "id"):
            good = <bint>(id_pattern.search(line.id))

        if good and id_set is not None and hasattr(line, "id"):
            good = line.id in id_set

        if good and def_pattern and hasattr(line, "definition"):
            good = <bint>(def_pattern.search(line.definition))

        if good and attributes:  # TODO discuss that we test not None
            good = reduce(lambda bint x, bint y: x and y,
                          (line[attribute] is not None for attribute in attributes),
                          True)

        if good and attribute_patterns:
            # Every attribute named in a pattern must be defined and match its pattern
            good = (reduce(lambda bint x, bint y : x and y,
                           (line[attribute] is not None for attribute in attribute_patterns),
                           True)
                    and
                    reduce(lambda bint x, bint y: x and y,
                           (<bint>(attribute_patterns[attribute].search(tobytes(str(line[attribute]))))
                            for attribute in attribute_patterns),
                           True)
                    )

        if good and predicates:
            good = (reduce(lambda bint x, bint y: x and y,
                           (bool(eval(p, loc_env, line))
                            for p in predicates), True))

        if good and lmin:
            good = len(line) >= lmin

        if good and lmax:
            good = len(line) <= lmax

        if good:
            good = tax_filter(line)

        if invert_selection :
            good = not good

        return good

    return filter

def Taxonomy_filter_generator(taxo, options):
    if taxo is not None:
        def tax_filter(seq):
            good = True
            if b'TAXID' in seq and seq[b'TAXID'] is not None:  # TODO use macro
                taxid = seq[b'TAXID']
                if "required_ranks" in options and options["required_ranks"]:
                    taxon_at_rank = reduce(lambda x,y: x and y,
                                           (taxo.get_taxon_at_rank(seq[b'TAXID'], rank) is not None
                                            for rank in options["required_ranks"]),
                                           True)
                    good = good and taxon_at_rank
                if "required_taxids" in options and options["required_taxids"]:
                    good = good and reduce(lambda x,y: x or y,
                                           (taxo.is_ancestor(r, taxid)
                                            for r in options["required_taxids"]),
                                           False)
                if "ignored_taxids" in options and options["ignored_taxids"]:
                    good = good and not reduce(lambda x,y: x or y,
                                               (taxo.is_ancestor(r,taxid)
                                                for r in options["ignored_taxids"]),
                                               False)
            return good
    else:
        def tax_filter(seq):
            return True
    return tax_filter
def run(config):
DMS.obi_atexit()
logger("info", "obi grep")
# Open the input
input = open_uri(config["obi"]["inputURI"])
if input is None:
raise Exception("Could not read input view")
i_dms = input[0]
i_view = input[1]
# Open the output: only the DMS
output = open_uri(config['obi']['outputURI'],
input=False,
dms_only=True)
if output is None:
raise Exception("Could not create output view")
o_dms = output[0]
o_view_name_final = output[1]
o_view_name = o_view_name_final
# If the input and output DMS are not the same, create output view in input DMS first, then export it
# to output DMS, making sure the temporary view name is unique in the input DMS
if i_dms != o_dms:
i=0
while o_view_name in i_dms:
o_view_name = o_view_name_final+b"_"+str2bytes(str(i))
i+=1
if 'taxoURI' in config['obi'] and config['obi']['taxoURI'] is not None:
taxo_uri = open_uri(config["obi"]["taxoURI"])
if taxo_uri is None:
raise Exception("Couldn't open taxonomy")
taxo = taxo_uri[1]
else :
taxo = None
# Initialize the progress bar
pb = ProgressBar(len(i_view), config, seconde=5)
# Apply filter
tax_filter = Taxonomy_filter_generator(taxo, config["grep"])
filter = Filter_generator(config["grep"], tax_filter)
selection = Line_selection(i_view)
for i in range(len(i_view)):
pb(i)
line = i_view[i]
loc_env = {"sequence": line, "line": line, "taxonomy": taxo}
good = filter(line, loc_env)
if good :
selection.append(i)
pb(i, force=True)
print("", file=sys.stderr)
# Create output view with the line selection
try:
o_view = selection.materialize(o_view_name)
except Exception, e:
raise RollbackException("obi grep error, rollbacking view: "+str(e), o_view)
# Save command config in View and DMS comments
command_line = " ".join(sys.argv[1:])
input_dms_name=[input[0].name]
input_view_name=[input[1].name]
if 'taxoURI' in config['obi'] and config['obi']['taxoURI'] is not None:
input_dms_name.append(config['obi']['taxoURI'].split("/")[-3])
input_view_name.append("taxonomy/"+config['obi']['taxoURI'].split("/")[-1])
o_view.write_config(config, "grep", command_line, input_dms_name=input_dms_name, input_view_name=input_view_name)
o_dms.record_command_line(command_line)
# If input and output DMS are not the same, export the temporary view to the output DMS
# and delete the temporary view in the input DMS
if i_dms != o_dms:
o_view.close()
View.import_view(i_dms.full_path[:-7], o_dms.full_path[:-7], o_view_name, o_view_name_final)
o_view = o_dms[o_view_name_final]
#print("\n\nOutput view:\n````````````", file=sys.stderr)
#print(repr(o_view), file=sys.stderr)
# If the input and the output DMS are different, delete the temporary imported view used to create the final view
if i_dms != o_dms:
View.delete_view(i_dms, o_view_name)
o_dms.close()
i_dms.close()
logger("info", "Done.")
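Review note: the selection loop in this file can be sketched in plain Python. This is only a minimal illustration under stated assumptions, not the obitools3 implementation: `select_lines` and the dict-based lines are hypothetical stand-ins for `Filter_generator` and `Line_selection`, and predicates are assumed to be Python expressions evaluated against the same local environment (`sequence`, `line`, `taxonomy`) that the loop builds.

```python
def select_lines(lines, predicates, taxonomy=None):
    """Return the indices of the lines for which every predicate is true.

    Hypothetical stand-in for the Filter_generator / Line_selection pair:
    each predicate is a Python expression evaluated with the names
    'sequence', 'line' and 'taxonomy' bound, mirroring loc_env.
    """
    selection = []
    for i, line in enumerate(lines):
        loc_env = {"sequence": line, "line": line, "taxonomy": taxonomy}
        # eval() is acceptable in a sketch; never use it on untrusted input
        if all(eval(p, {}, loc_env) for p in predicates):
            selection.append(i)
    return selection


# Hypothetical records: plain dicts instead of view lines
lines = [{"id": "a", "count": 1}, {"id": "b", "count": 5}, {"id": "c", "count": 3}]
kept = select_lines(lines, ["line['count'] > 1", "sequence['id'] != 'c'"])
print(kept)
```

Only indices are stored, which is why the real command can materialize the output view from the selection afterwards instead of copying lines during the scan.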

@ -0,0 +1,103 @@
../../../src/obi_lcs.h
../../../src/obi_lcs.c
../../../src/obierrno.h
../../../src/obierrno.c
../../../src/upperband.h
../../../src/upperband.c
../../../src/sse_banded_LCS_alignment.h
../../../src/sse_banded_LCS_alignment.c
../../../src/obiblob.h
../../../src/obiblob.c
../../../src/utils.h
../../../src/utils.c
../../../src/obidms.h
../../../src/obidms.c
../../../src/libjson/json_utils.h
../../../src/libjson/json_utils.c
../../../src/libjson/cJSON.h
../../../src/libjson/cJSON.c
../../../src/obiavl.h
../../../src/obiavl.c
../../../src/bloom.h
../../../src/bloom.c
../../../src/crc64.h
../../../src/crc64.c
../../../src/murmurhash2.h
../../../src/murmurhash2.c
../../../src/obidmscolumn.h
../../../src/obidmscolumn.c
../../../src/obitypes.h
../../../src/obitypes.c
../../../src/obidmscolumndir.h
../../../src/obidmscolumndir.c
../../../src/obiblob_indexer.h
../../../src/obiblob_indexer.c
../../../src/obiview.h
../../../src/obiview.c
../../../src/hashtable.h
../../../src/hashtable.c
../../../src/linked_list.h
../../../src/linked_list.c
../../../src/obidmscolumn_array.h
../../../src/obidmscolumn_array.c
../../../src/obidmscolumn_blob.h
../../../src/obidmscolumn_blob.c
../../../src/obidmscolumn_idx.h
../../../src/obidmscolumn_idx.c
../../../src/obidmscolumn_bool.h
../../../src/obidmscolumn_bool.c
../../../src/obidmscolumn_char.h
../../../src/obidmscolumn_char.c
../../../src/obidmscolumn_float.h
../../../src/obidmscolumn_float.c
../../../src/obidmscolumn_int.h
../../../src/obidmscolumn_int.c
../../../src/obidmscolumn_qual.h
../../../src/obidmscolumn_qual.c
../../../src/obidmscolumn_seq.h
../../../src/obidmscolumn_seq.c
../../../src/obidmscolumn_str.h
../../../src/obidmscolumn_str.c
../../../src/array_indexer.h
../../../src/array_indexer.c
../../../src/char_str_indexer.h
../../../src/char_str_indexer.c
../../../src/dna_seq_indexer.h
../../../src/dna_seq_indexer.c
../../../src/encode.c
../../../src/encode.h
../../../src/uint8_indexer.c
../../../src/uint8_indexer.h
../../../src/build_reference_db.c
../../../src/build_reference_db.h
../../../src/kmer_similarity.c
../../../src/kmer_similarity.h
../../../src/obi_clean.c
../../../src/obi_clean.h
../../../src/obi_ecopcr.c
../../../src/obi_ecopcr.h
../../../src/obi_ecotag.c
../../../src/obi_ecotag.h
../../../src/obidms_taxonomy.c
../../../src/obidms_taxonomy.h
../../../src/obilittlebigman.c
../../../src/obilittlebigman.h
../../../src/_sse.h
../../../src/obidebug.h
../../../src/libecoPCR/libapat/CODES/dft_code.h
../../../src/libecoPCR/libapat/CODES/dna_code.h
../../../src/libecoPCR/libapat/CODES/prot_code.h
../../../src/libecoPCR/libapat/apat_parse.c
../../../src/libecoPCR/libapat/apat_search.c
../../../src/libecoPCR/libapat/apat.h
../../../src/libecoPCR/libapat/Gmach.h
../../../src/libecoPCR/libapat/Gtypes.h
../../../src/libecoPCR/libapat/libstki.c
../../../src/libecoPCR/libapat/libstki.h
../../../src/libecoPCR/libthermo/nnparams.h
../../../src/libecoPCR/libthermo/nnparams.c
../../../src/libecoPCR/ecoapat.c
../../../src/libecoPCR/ecodna.c
../../../src/libecoPCR/ecoError.c
../../../src/libecoPCR/ecoMalloc.c
../../../src/libecoPCR/ecoPCR.h

@ -0,0 +1,106 @@
#cython: language_level=3
from obitools3.apps.progress cimport ProgressBar # @UnresolvedImport
from obitools3.dms import DMS
from obitools3.dms.view.view cimport View, Line_selection
from obitools3.uri.decode import open_uri
from obitools3.apps.optiongroups import addMinimalInputOption, addMinimalOutputOption
from obitools3.dms.view import RollbackException
from obitools3.apps.config import logger
from obitools3.utils cimport str2bytes
import time
import sys
__title__="Keep the N first lines of a view."
def addOptions(parser):
addMinimalInputOption(parser)
addMinimalOutputOption(parser)
group=parser.add_argument_group('obi head specific options')
group.add_argument('-n', '--sequence-count',
action="store", dest="head:count",
metavar='<N>',
default=10,
type=int,
help="Number of first records to keep.")
def run(config):
DMS.obi_atexit()
logger("info", "obi head")
# Open the input
input = open_uri(config["obi"]["inputURI"])
if input is None:
raise Exception("Could not read input view")
i_dms = input[0]
i_view = input[1]
# Open the output: only the DMS
output = open_uri(config['obi']['outputURI'],
input=False,
dms_only=True)
if output is None:
raise Exception("Could not create output view")
o_dms = output[0]
o_view_name_final = output[1]
o_view_name = o_view_name_final
# If the input and output DMS are not the same, create output view in input DMS first, then export it
# to output DMS, making sure the temporary view name is unique in the input DMS
if i_dms != o_dms:
i=0
while o_view_name in i_dms:
o_view_name = o_view_name_final+b"_"+str2bytes(str(i))
i+=1
n = min(config['head']['count'], len(i_view))
# Initialize the progress bar
pb = ProgressBar(n, config, seconde=5)
selection = Line_selection(i_view)
for i in range(n):
pb(i)
selection.append(i)
pb(i, force=True)
print("", file=sys.stderr)
# Create output view with the line selection
try:
o_view = selection.materialize(o_view_name)
except Exception as e:
raise RollbackException("obi head error, rollbacking view: "+str(e), o_view)
# Save command config in DMS comments
command_line = " ".join(sys.argv[1:])
o_view.write_config(config, "head", command_line, input_dms_name=[i_dms.name], input_view_name=[i_view.name])
o_dms.record_command_line(command_line)
# If input and output DMS are not the same, export the temporary view to the output DMS
# and delete the temporary view in the input DMS
if i_dms != o_dms:
o_view.close()
View.import_view(i_dms.full_path[:-7], o_dms.full_path[:-7], o_view_name, o_view_name_final)
o_view = o_dms[o_view_name_final]
#print("\n\nOutput view:\n````````````", file=sys.stderr)
#print(repr(view), file=sys.stderr)
# If the input and the output DMS are different, delete the temporary imported view used to create the final view
if i_dms != o_dms:
View.delete_view(i_dms, o_view_name)
o_dms.close()
i_dms.close()
logger("info", "Done.")
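Review note: the temporary-name loop used when the input and output DMS differ can be isolated as a small helper. The sketch below is an assumption-laden illustration: `unique_view_name` is a hypothetical name, a plain set stands in for the `in i_dms` membership test, and `str(i).encode("ascii")` stands in for `str2bytes(str(i))`.

```python
def unique_view_name(final_name, existing_names):
    """Return final_name if it is free, else final_name + b"_<i>" for the first free i."""
    name = final_name
    i = 0
    while name in existing_names:
        # Same suffix scheme as the command: b"_0", b"_1", ...
        name = final_name + b"_" + str(i).encode("ascii")
        i += 1
    return name


# Hypothetical contents of the input DMS
taken = {b"out", b"out_0"}
print(unique_view_name(b"out", taken))
print(unique_view_name(b"fresh", taken))
```

The final name is kept separately because the temporary view is later exported under `o_view_name_final` and then deleted from the input DMS.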

@ -0,0 +1,103 @@
../../../src/obi_lcs.h
../../../src/obi_lcs.c
../../../src/obierrno.h
../../../src/obierrno.c
../../../src/upperband.h
../../../src/upperband.c
../../../src/sse_banded_LCS_alignment.h
../../../src/sse_banded_LCS_alignment.c
../../../src/obiblob.h
../../../src/obiblob.c
../../../src/utils.h
../../../src/utils.c
../../../src/obidms.h
../../../src/obidms.c
../../../src/libjson/json_utils.h
../../../src/libjson/json_utils.c
../../../src/libjson/cJSON.h
../../../src/libjson/cJSON.c
../../../src/obiavl.h
../../../src/obiavl.c
../../../src/bloom.h
../../../src/bloom.c
../../../src/crc64.h
../../../src/crc64.c
../../../src/murmurhash2.h
../../../src/murmurhash2.c
../../../src/obidmscolumn.h
../../../src/obidmscolumn.c
../../../src/obitypes.h
../../../src/obitypes.c
../../../src/obidmscolumndir.h
../../../src/obidmscolumndir.c
../../../src/obiblob_indexer.h
../../../src/obiblob_indexer.c
../../../src/obiview.h
../../../src/obiview.c
../../../src/hashtable.h
../../../src/hashtable.c
../../../src/linked_list.h
../../../src/linked_list.c
../../../src/obidmscolumn_array.h
../../../src/obidmscolumn_array.c
../../../src/obidmscolumn_blob.h
../../../src/obidmscolumn_blob.c
../../../src/obidmscolumn_idx.h
../../../src/obidmscolumn_idx.c
../../../src/obidmscolumn_bool.h
../../../src/obidmscolumn_bool.c
../../../src/obidmscolumn_char.h
../../../src/obidmscolumn_char.c
../../../src/obidmscolumn_float.h
../../../src/obidmscolumn_float.c
../../../src/obidmscolumn_int.h
../../../src/obidmscolumn_int.c
../../../src/obidmscolumn_qual.h
../../../src/obidmscolumn_qual.c
../../../src/obidmscolumn_seq.h
../../../src/obidmscolumn_seq.c
../../../src/obidmscolumn_str.h
../../../src/obidmscolumn_str.c
../../../src/array_indexer.h
../../../src/array_indexer.c
../../../src/char_str_indexer.h
../../../src/char_str_indexer.c
../../../src/dna_seq_indexer.h
../../../src/dna_seq_indexer.c
../../../src/encode.c
../../../src/encode.h
../../../src/uint8_indexer.c
../../../src/uint8_indexer.h
../../../src/build_reference_db.c
../../../src/build_reference_db.h
../../../src/kmer_similarity.c
../../../src/kmer_similarity.h
../../../src/obi_clean.c
../../../src/obi_clean.h
../../../src/obi_ecopcr.c
../../../src/obi_ecopcr.h
../../../src/obi_ecotag.c
../../../src/obi_ecotag.h
../../../src/obidms_taxonomy.c
../../../src/obidms_taxonomy.h
../../../src/obilittlebigman.c
../../../src/obilittlebigman.h
../../../src/_sse.h
../../../src/obidebug.h
../../../src/libecoPCR/libapat/CODES/dft_code.h
../../../src/libecoPCR/libapat/CODES/dna_code.h
../../../src/libecoPCR/libapat/CODES/prot_code.h
../../../src/libecoPCR/libapat/apat_parse.c
../../../src/libecoPCR/libapat/apat_search.c
../../../src/libecoPCR/libapat/apat.h
../../../src/libecoPCR/libapat/Gmach.h
../../../src/libecoPCR/libapat/Gtypes.h
../../../src/libecoPCR/libapat/libstki.c
../../../src/libecoPCR/libapat/libstki.h
../../../src/libecoPCR/libthermo/nnparams.h
../../../src/libecoPCR/libthermo/nnparams.c
../../../src/libecoPCR/ecoapat.c
../../../src/libecoPCR/ecodna.c
../../../src/libecoPCR/ecoError.c
../../../src/libecoPCR/ecoMalloc.c
../../../src/libecoPCR/ecoPCR.h

@ -0,0 +1,57 @@
#cython: language_level=3
from obitools3.apps.optiongroups import addMinimalInputOption
from obitools3.uri.decode import open_uri
from obitools3.dms import DMS
from obitools3.dms.view import View
from obitools3.utils cimport bytes2str
__title__="Command line histories and view history graphs"
def addOptions(parser):
addMinimalInputOption(parser)
group=parser.add_argument_group('obi history specific options')
group.add_argument('--bash', '-b',
action="store_const", dest="history:format",
default="bash",
const="bash",
help="Print history in bash format")
group.add_argument('--dot', '-d',
action="store_const", dest="history:format",
default="bash",
const="dot",
help="Print history in DOT format (default: bash format)")
group.add_argument('--ascii', '-a',
action="store_const", dest="history:format",
default="bash",
const="ascii",
help="Print history in ASCII format (only for views; default: bash format)")
def run(config):
cdef object entries
DMS.obi_atexit()
input = open_uri(config['obi']['inputURI'])
entries = input[1]
if config['history']['format'] == "bash" :
print(bytes2str(entries.bash_history))
elif config['history']['format'] == "dot" :
print(bytes2str(entries.dot_history_graph))
elif config['history']['format'] == "ascii" :
if isinstance(entries, View):
print(bytes2str(entries.ascii_history_graph))
else:
raise Exception("ASCII history only available for views")
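Review note: the three format flags share one `dest` and use `store_const`, so the last flag given on the command line wins and `bash` applies when none is given. The standalone sketch below uses only the standard `argparse` module; `chosen_format` is a hypothetical helper, and the parser is a simplified stand-in for the one obitools3 builds.

```python
import argparse

parser = argparse.ArgumentParser(prog="obi history")
group = parser.add_argument_group("obi history specific options")
for flag, short, fmt in (("--bash", "-b", "bash"),
                         ("--dot", "-d", "dot"),
                         ("--ascii", "-a", "ascii")):
    # Every flag writes its own constant to the same destination
    group.add_argument(flag, short,
                       action="store_const", dest="history:format",
                       default="bash", const=fmt)


def chosen_format(argv):
    """Return the history format selected by the given argument list."""
    # The dest contains a colon, so read it through vars() rather than attribute access
    return vars(parser.parse_args(argv))["history:format"]


print(chosen_format([]))
print(chosen_format(["-d"]))
```

If the flags are meant to be exclusive, `parser.add_mutually_exclusive_group()` would reject combinations such as `-d -a` instead of silently keeping the last one.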

@ -0,0 +1,103 @@
../../../src/obi_lcs.h
../../../src/obi_lcs.c
../../../src/obierrno.h
../../../src/obierrno.c
../../../src/upperband.h
../../../src/upperband.c
../../../src/sse_banded_LCS_alignment.h
../../../src/sse_banded_LCS_alignment.c
../../../src/obiblob.h
../../../src/obiblob.c
../../../src/utils.h
../../../src/utils.c
../../../src/obidms.h
../../../src/obidms.c
../../../src/libjson/json_utils.h
../../../src/libjson/json_utils.c
../../../src/libjson/cJSON.h
../../../src/libjson/cJSON.c
../../../src/obiavl.h
../../../src/obiavl.c
../../../src/bloom.h
../../../src/bloom.c
../../../src/crc64.h
../../../src/crc64.c
../../../src/murmurhash2.h
../../../src/murmurhash2.c
../../../src/obidmscolumn.h
../../../src/obidmscolumn.c
../../../src/obitypes.h
../../../src/obitypes.c
../../../src/obidmscolumndir.h
../../../src/obidmscolumndir.c
../../../src/obiblob_indexer.h
../../../src/obiblob_indexer.c
../../../src/obiview.h
../../../src/obiview.c
../../../src/hashtable.h
../../../src/hashtable.c
../../../src/linked_list.h
../../../src/linked_list.c
../../../src/obidmscolumn_array.h
../../../src/obidmscolumn_array.c
../../../src/obidmscolumn_blob.h
../../../src/obidmscolumn_blob.c
../../../src/obidmscolumn_idx.h
../../../src/obidmscolumn_idx.c
../../../src/obidmscolumn_bool.h
../../../src/obidmscolumn_bool.c
../../../src/obidmscolumn_char.h
../../../src/obidmscolumn_char.c
../../../src/obidmscolumn_float.h
../../../src/obidmscolumn_float.c
../../../src/obidmscolumn_int.h
../../../src/obidmscolumn_int.c
../../../src/obidmscolumn_qual.h
../../../src/obidmscolumn_qual.c
../../../src/obidmscolumn_seq.h
../../../src/obidmscolumn_seq.c
../../../src/obidmscolumn_str.h
../../../src/obidmscolumn_str.c
../../../src/array_indexer.h
../../../src/array_indexer.c
../../../src/char_str_indexer.h
../../../src/char_str_indexer.c
../../../src/dna_seq_indexer.h
../../../src/dna_seq_indexer.c
../../../src/encode.c
../../../src/encode.h
../../../src/uint8_indexer.c
../../../src/uint8_indexer.h
../../../src/build_reference_db.c
../../../src/build_reference_db.h
../../../src/kmer_similarity.c
../../../src/kmer_similarity.h
../../../src/obi_clean.c
../../../src/obi_clean.h
../../../src/obi_ecopcr.c
../../../src/obi_ecopcr.h
../../../src/obi_ecotag.c
../../../src/obi_ecotag.h
../../../src/obidms_taxonomy.c
../../../src/obidms_taxonomy.h
../../../src/obilittlebigman.c
../../../src/obilittlebigman.h
../../../src/_sse.h
../../../src/obidebug.h
../../../src/libecoPCR/libapat/CODES/dft_code.h
../../../src/libecoPCR/libapat/CODES/dna_code.h
../../../src/libecoPCR/libapat/CODES/prot_code.h
../../../src/libecoPCR/libapat/apat_parse.c
../../../src/libecoPCR/libapat/apat_search.c
../../../src/libecoPCR/libapat/apat.h
../../../src/libecoPCR/libapat/Gmach.h
../../../src/libecoPCR/libapat/Gtypes.h
../../../src/libecoPCR/libapat/libstki.c
../../../src/libecoPCR/libapat/libstki.h
../../../src/libecoPCR/libthermo/nnparams.h
../../../src/libecoPCR/libthermo/nnparams.c
../../../src/libecoPCR/ecoapat.c
../../../src/libecoPCR/ecodna.c
../../../src/libecoPCR/ecoError.c
../../../src/libecoPCR/ecoMalloc.c
../../../src/libecoPCR/ecoPCR.h

@ -0,0 +1,301 @@
#cython: language_level=3
import sys
import os
from obitools3.apps.progress cimport ProgressBar # @UnresolvedImport
from obitools3.dms.view.view cimport View
from obitools3.dms.view import RollbackException
from obitools3.dms.view.typed_view.view_NUC_SEQS cimport View_NUC_SEQS
from obitools3.dms.column.column cimport Column
from obitools3.dms.obiseq cimport Nuc_Seq
from obitools3.dms import DMS
from obitools3.dms.taxo.taxo cimport Taxonomy
from obitools3.utils cimport tobytes, \
get_obitype, \
update_obitype
from obitools3.dms.capi.obiview cimport VIEW_TYPE_NUC_SEQS, \
NUC_SEQUENCE_COLUMN, \
ID_COLUMN, \
DEFINITION_COLUMN, \
QUALITY_COLUMN, \
COUNT_COLUMN, \
TAXID_COLUMN
from obitools3.dms.capi.obitypes cimport obitype_t, \
OBI_VOID, \
OBI_QUAL
from obitools3.dms.capi.obierrno cimport obi_errno
from obitools3.apps.optiongroups import addImportInputOption, \
addTabularInputOption, \
addTaxdumpInputOption, \
addMinimalOutputOption
from obitools3.uri.decode import open_uri
from obitools3.apps.config import logger
__title__="Imports sequences from different formats into a DMS"
default_config = { 'destview' : None,
'skip' : 0,
'only' : None,
'skiperror' : False,
'seqinformat' : None,
'moltype' : 'nuc',
'source' : None
}
def addOptions(parser):
addImportInputOption(parser)
addTabularInputOption(parser)
addTaxdumpInputOption(parser)
addMinimalOutputOption(parser)
def run(config):
cdef tuple input
cdef tuple output
cdef int i
cdef type value_type
cdef obitype_t value_obitype
cdef obitype_t old_type
cdef obitype_t new_type
cdef bint get_quality
cdef bint NUC_SEQS_view
cdef int nb_elts
cdef object d
cdef View view
cdef object entries
cdef object entry
cdef Column id_col
cdef Column def_col
cdef Column seq_col
cdef Column qual_col
cdef Column old_column
cdef bint rewrite
cdef dict dcols
cdef int skipping
cdef bytes tag
cdef object value
cdef list elt_names
cdef int old_nb_elements_per_line
cdef int new_nb_elements_per_line
cdef list old_elements_names
cdef list new_elements_names
cdef ProgressBar pb
global obi_errno
DMS.obi_atexit()
logger("info", "obi import: imports an object (file(s), obiview, taxonomy...) into a DMS")
entry_count = -1
if not config['obi']['taxdump']:
input = open_uri(config['obi']['inputURI'])
if input is None: # TODO check for bytes instead now?
raise Exception("Could not open input URI")
entry_count = input[4]
logger("info", "Importing %d entries", entry_count)
# TODO a bit dirty?
if input[2]==Nuc_Seq:
v = View_NUC_SEQS
else:
v = View
else:
v = None
output = open_uri(config['obi']['outputURI'],
input=False,
newviewtype=v)
if output is None:
raise Exception("Could not create output view")
# Read taxdump
if config['obi']['taxdump']: # The input is a taxdump to import in a DMS
taxo = Taxonomy.open_taxdump(output[0], config['obi']['inputURI'])
taxo.write(output[1])
taxo.close()
output[0].record_command_line(" ".join(sys.argv[1:]))
output[0].close()
return
if entry_count >= 0:
pb = ProgressBar(entry_count, config, seconde=5)
else:
pb = None
entries = input[1]
NUC_SEQS_view = False
if isinstance(output[1], View) :
view = output[1]
if output[2] == View_NUC_SEQS :
NUC_SEQS_view = True
else:
raise NotImplementedError()
# Save basic columns in variables for optimization
if NUC_SEQS_view :
id_col = view[ID_COLUMN]
def_col = view[DEFINITION_COLUMN]
seq_col = view[NUC_SEQUENCE_COLUMN]
dcols = {}
i = 0
for entry in entries :
if entry is None: # error or exception handled at lower level, not raised because Python generators can't resume after any exception is raised
if config['obi']['skiperror']:
# i is left unchanged, so the next valid entry is written to this line
continue
else:
raise RollbackException("obi import error, rollbacking view", view)
if pb is not None:
pb(i)
if NUC_SEQS_view:
id_col[i] = entry.id
def_col[i] = entry.definition
seq_col[i] = entry.seq
# Check if there is a sequencing quality associated by checking the first entry # TODO haven't found a more robust solution yet
if i == 0:
get_quality = QUALITY_COLUMN in entry
if get_quality:
Column.new_column(view, QUALITY_COLUMN, OBI_QUAL)
qual_col = view[QUALITY_COLUMN]
if get_quality:
qual_col[i] = entry.quality
for tag in entry :
if tag != ID_COLUMN and tag != DEFINITION_COLUMN and tag != NUC_SEQUENCE_COLUMN and tag != QUALITY_COLUMN : # TODO dirty
value = entry[tag]
if tag == b"taxid":
tag = TAXID_COLUMN
if tag == b"count":
tag = COUNT_COLUMN
if tag not in dcols :
value_type = type(value)
nb_elts = 1
value_obitype = OBI_VOID
if value_type == dict or value_type == list :
nb_elts = len(value)
elt_names = list(value)
else :
nb_elts = 1
elt_names = None
value_obitype = get_obitype(value)
if value_obitype != OBI_VOID :
dcols[tag] = (Column.new_column(view, tag, value_obitype, nb_elements_per_line=nb_elts, elements_names=elt_names), value_obitype)
# Fill value
dcols[tag][0][i] = value
# TODO else log error?
else :
rewrite = False
# Check type compatibility
old_type = dcols[tag][1]
new_type = OBI_VOID
new_type = update_obitype(old_type, value)
if old_type != new_type :
rewrite = True
try:
# Fill value
dcols[tag][0][i] = value
except IndexError :
value_type = type(value)
old_column = dcols[tag][0]
old_nb_elements_per_line = old_column.nb_elements_per_line
new_nb_elements_per_line = 0
old_elements_names = old_column.elements_names
new_elements_names = None
#####################################################################
# Check the length and keys of column lines if needed
if value_type == dict : # Check dictionary keys
for k in value :
if k not in old_elements_names :
new_elements_names = list(set(old_elements_names+[tobytes(k) for k in value]))
rewrite = True
break
elif value_type == list or value_type == tuple : # Check vector length
if old_nb_elements_per_line < len(value) :
new_nb_elements_per_line = len(value)
rewrite = True
#####################################################################
if rewrite :
if new_nb_elements_per_line == 0 and new_elements_names is not None :
new_nb_elements_per_line = len(new_elements_names)
# Reset obierrno
obi_errno = 0
dcols[tag] = (view.rewrite_column_with_diff_attributes(old_column.name,
new_data_type=new_type,
new_nb_elements_per_line=new_nb_elements_per_line,
new_elements_names=new_elements_names,
rewrite_last_line=False),
new_type)
# Update the dictionary:
for t in dcols :
dcols[t] = (view[t], dcols[t][1])
# Fill value
dcols[tag][0][i] = value
i+=1
if pb is not None:
pb(i, force=True)
print("", file=sys.stderr)
# Save command config in View and DMS comments
command_line = " ".join(sys.argv[1:])
view.write_config(config, "import", command_line, input_str=[os.path.abspath(config['obi']['inputURI'])])
output[0].record_command_line(command_line)
#print("\n\nOutput view:\n````````````", file=sys.stderr)
#print(repr(view), file=sys.stderr)
try:
input[0].close()
except AttributeError:
pass
try:
output[0].close()
except AttributeError:
pass
logger("info", "Done.")
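Review note: the import loop treats a `None` entry as "the parser hit an error", because a Python generator is finished once an exception propagates out of it. The sketch below illustrates that convention with hypothetical names (`parse_entries`, `import_entries`) and integer parsing standing in for the real sequence parsers; it is not the obitools3 code.

```python
import sys


def parse_entries(raw_records):
    """Yield one parsed value per record, or None when a record is malformed."""
    for raw in raw_records:
        try:
            value = int(raw)
        except ValueError as e:
            # Report the problem but keep the generator alive
            print("parse error: %s" % e, file=sys.stderr)
            value = None
        yield value


def import_entries(raw_records, skip_errors=True):
    """Collect parsed entries, skipping or failing on malformed ones."""
    lines = []
    for entry in parse_entries(raw_records):
        if entry is None:
            if skip_errors:
                continue
            raise RuntimeError("import error, malformed entry")
        lines.append(entry)
    return lines


print(import_entries(["1", "x", "3"]))
```

Had `parse_entries` raised instead, the consumer could not have asked the same generator for the record after the bad one, which is the situation the comment in the loop describes.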

@ -0,0 +1,103 @@
../../../src/obi_lcs.h
../../../src/obi_lcs.c
../../../src/obierrno.h
../../../src/obierrno.c
../../../src/upperband.h
../../../src/upperband.c
../../../src/sse_banded_LCS_alignment.h
../../../src/sse_banded_LCS_alignment.c
../../../src/obiblob.h
../../../src/obiblob.c
../../../src/utils.h
../../../src/utils.c
../../../src/obidms.h
../../../src/obidms.c
../../../src/libjson/json_utils.h
../../../src/libjson/json_utils.c
../../../src/libjson/cJSON.h
../../../src/libjson/cJSON.c
../../../src/obiavl.h
../../../src/obiavl.c
../../../src/bloom.h
../../../src/bloom.c
../../../src/crc64.h
../../../src/crc64.c
../../../src/murmurhash2.h
../../../src/murmurhash2.c
../../../src/obidmscolumn.h
../../../src/obidmscolumn.c
../../../src/obitypes.h
../../../src/obitypes.c
../../../src/obidmscolumndir.h
../../../src/obidmscolumndir.c
../../../src/obiblob_indexer.h
../../../src/obiblob_indexer.c
../../../src/obiview.h
../../../src/obiview.c
../../../src/hashtable.h
../../../src/hashtable.c
../../../src/linked_list.h
../../../src/linked_list.c
../../../src/obidmscolumn_array.h
../../../src/obidmscolumn_array.c
../../../src/obidmscolumn_blob.h
../../../src/obidmscolumn_blob.c
../../../src/obidmscolumn_idx.h
../../../src/obidmscolumn_idx.c
../../../src/obidmscolumn_bool.h
../../../src/obidmscolumn_bool.c
../../../src/obidmscolumn_char.h
../../../src/obidmscolumn_char.c
../../../src/obidmscolumn_float.h
../../../src/obidmscolumn_float.c
../../../src/obidmscolumn_int.h
../../../src/obidmscolumn_int.c
../../../src/obidmscolumn_qual.h
../../../src/obidmscolumn_qual.c
../../../src/obidmscolumn_seq.h
../../../src/obidmscolumn_seq.c
../../../src/obidmscolumn_str.h
../../../src/obidmscolumn_str.c
../../../src/array_indexer.h
../../../src/array_indexer.c
../../../src/char_str_indexer.h
../../../src/char_str_indexer.c
../../../src/dna_seq_indexer.h
../../../src/dna_seq_indexer.c
../../../src/encode.c
../../../src/encode.h
../../../src/uint8_indexer.c
../../../src/uint8_indexer.h
../../../src/build_reference_db.c
../../../src/build_reference_db.h
../../../src/kmer_similarity.c
../../../src/kmer_similarity.h
../../../src/obi_clean.c
../../../src/obi_clean.h
../../../src/obi_ecopcr.c
../../../src/obi_ecopcr.h
../../../src/obi_ecotag.c
../../../src/obi_ecotag.h
../../../src/obidms_taxonomy.c
../../../src/obidms_taxonomy.h
../../../src/obilittlebigman.c
../../../src/obilittlebigman.h
../../../src/_sse.h
../../../src/obidebug.h
../../../src/libecoPCR/libapat/CODES/dft_code.h
../../../src/libecoPCR/libapat/CODES/dna_code.h
../../../src/libecoPCR/libapat/CODES/prot_code.h
../../../src/libecoPCR/libapat/apat_parse.c
../../../src/libecoPCR/libapat/apat_search.c
../../../src/libecoPCR/libapat/apat.h
../../../src/libecoPCR/libapat/Gmach.h
../../../src/libecoPCR/libapat/Gtypes.h
../../../src/libecoPCR/libapat/libstki.c
../../../src/libecoPCR/libapat/libstki.h
../../../src/libecoPCR/libthermo/nnparams.h
../../../src/libecoPCR/libthermo/nnparams.c
../../../src/libecoPCR/ecoapat.c
../../../src/libecoPCR/ecodna.c
../../../src/libecoPCR/ecoError.c
../../../src/libecoPCR/ecoMalloc.c
../../../src/libecoPCR/ecoPCR.h

@ -0,0 +1,44 @@
#cython: language_level=3
from obitools3.apps.optiongroups import addMinimalInputOption
from obitools3.uri.decode import open_uri
from obitools3.dms import DMS
__title__="Less equivalent"
def addOptions(parser):
addMinimalInputOption(parser)
group=parser.add_argument_group('obi less specific options')
group.add_argument('--print', '-n',
action="store", dest="less:print",
metavar='<N>',
default=10,
type=int,
help="Print N entries (default: 10)")
def run(config):
cdef object entries
cdef int n
DMS.obi_atexit()
input = open_uri(config['obi']['inputURI'])
entries = input[1]
if config['less']['print'] > len(entries) :
n = len(entries)
else :
n = config['less']['print']
# Print
for i in range(n) :
print(repr(entries[i]))

@ -0,0 +1,103 @@
../../../src/obi_lcs.h
../../../src/obi_lcs.c
../../../src/obierrno.h
../../../src/obierrno.c
../../../src/upperband.h
../../../src/upperband.c
../../../src/sse_banded_LCS_alignment.h
../../../src/sse_banded_LCS_alignment.c
../../../src/obiblob.h
../../../src/obiblob.c
../../../src/utils.h
../../../src/utils.c
../../../src/obidms.h
../../../src/obidms.c
../../../src/libjson/json_utils.h
../../../src/libjson/json_utils.c
../../../src/libjson/cJSON.h
../../../src/libjson/cJSON.c
../../../src/obiavl.h
../../../src/obiavl.c
../../../src/bloom.h
../../../src/bloom.c
../../../src/crc64.h
../../../src/crc64.c
../../../src/murmurhash2.h
../../../src/murmurhash2.c
../../../src/obidmscolumn.h
../../../src/obidmscolumn.c
../../../src/obitypes.h
../../../src/obitypes.c
../../../src/obidmscolumndir.h
../../../src/obidmscolumndir.c
../../../src/obiblob_indexer.h
../../../src/obiblob_indexer.c
../../../src/obiview.h
../../../src/obiview.c
../../../src/hashtable.h
../../../src/hashtable.c
../../../src/linked_list.h
../../../src/linked_list.c
../../../src/obidmscolumn_array.h
../../../src/obidmscolumn_array.c
../../../src/obidmscolumn_blob.h
../../../src/obidmscolumn_blob.c
../../../src/obidmscolumn_idx.h
../../../src/obidmscolumn_idx.c
../../../src/obidmscolumn_bool.h
../../../src/obidmscolumn_bool.c
../../../src/obidmscolumn_char.h
../../../src/obidmscolumn_char.c
../../../src/obidmscolumn_float.h
../../../src/obidmscolumn_float.c
../../../src/obidmscolumn_int.h
../../../src/obidmscolumn_int.c
../../../src/obidmscolumn_qual.h
../../../src/obidmscolumn_qual.c
../../../src/obidmscolumn_seq.h
../../../src/obidmscolumn_seq.c
../../../src/obidmscolumn_str.h
../../../src/obidmscolumn_str.c
../../../src/array_indexer.h
../../../src/array_indexer.c
../../../src/char_str_indexer.h
../../../src/char_str_indexer.c
../../../src/dna_seq_indexer.h
../../../src/dna_seq_indexer.c
../../../src/encode.c
../../../src/encode.h
../../../src/uint8_indexer.c
../../../src/uint8_indexer.h
../../../src/build_reference_db.c
../../../src/build_reference_db.h
../../../src/kmer_similarity.c
../../../src/kmer_similarity.h
../../../src/obi_clean.c
../../../src/obi_clean.h
../../../src/obi_ecopcr.c
../../../src/obi_ecopcr.h
../../../src/obi_ecotag.c
../../../src/obi_ecotag.h
../../../src/obidms_taxonomy.c
../../../src/obidms_taxonomy.h
../../../src/obilittlebigman.c
../../../src/obilittlebigman.h
../../../src/_sse.h
../../../src/obidebug.h
../../../src/libecoPCR/libapat/CODES/dft_code.h
../../../src/libecoPCR/libapat/CODES/dna_code.h
../../../src/libecoPCR/libapat/CODES/prot_code.h
../../../src/libecoPCR/libapat/apat_parse.c
../../../src/libecoPCR/libapat/apat_search.c
../../../src/libecoPCR/libapat/apat.h
../../../src/libecoPCR/libapat/Gmach.h
../../../src/libecoPCR/libapat/Gtypes.h
../../../src/libecoPCR/libapat/libstki.c
../../../src/libecoPCR/libapat/libstki.h
../../../src/libecoPCR/libthermo/nnparams.h
../../../src/libecoPCR/libthermo/nnparams.c
../../../src/libecoPCR/ecoapat.c
../../../src/libecoPCR/ecodna.c
../../../src/libecoPCR/ecoError.c
../../../src/libecoPCR/ecoMalloc.c
../../../src/libecoPCR/ecoPCR.h

@ -0,0 +1,28 @@
#cython: language_level=3
from obitools3.uri.decode import open_uri
from obitools3.apps.config import logger
from obitools3.dms import DMS
from obitools3.apps.optiongroups import addMinimalInputOption
__title__="Print a preview of a DMS, view, column..."
def addOptions(parser):
addMinimalInputOption(parser)
def run(config):
DMS.obi_atexit()
logger("info", "obi ls")
# Open the input
input = open_uri(config['obi']['inputURI'])
if input is None:
raise Exception("Could not read input")
print(repr(input[1]))

@ -0,0 +1,103 @@
../../../src/obi_lcs.h
../../../src/obi_lcs.c
../../../src/obierrno.h
../../../src/obierrno.c
../../../src/upperband.h
../../../src/upperband.c
../../../src/sse_banded_LCS_alignment.h
../../../src/sse_banded_LCS_alignment.c
../../../src/obiblob.h
../../../src/obiblob.c
../../../src/utils.h
../../../src/utils.c
../../../src/obidms.h
../../../src/obidms.c
../../../src/libjson/json_utils.h
../../../src/libjson/json_utils.c
../../../src/libjson/cJSON.h
../../../src/libjson/cJSON.c
../../../src/obiavl.h
../../../src/obiavl.c
../../../src/bloom.h
../../../src/bloom.c
../../../src/crc64.h
../../../src/crc64.c
../../../src/murmurhash2.h
../../../src/murmurhash2.c
../../../src/obidmscolumn.h
../../../src/obidmscolumn.c
../../../src/obitypes.h
../../../src/obitypes.c
../../../src/obidmscolumndir.h
../../../src/obidmscolumndir.c
../../../src/obiblob_indexer.h
../../../src/obiblob_indexer.c
../../../src/obiview.h
../../../src/obiview.c
../../../src/hashtable.h
../../../src/hashtable.c
../../../src/linked_list.h
../../../src/linked_list.c
../../../src/obidmscolumn_array.h
../../../src/obidmscolumn_array.c
../../../src/obidmscolumn_blob.h
../../../src/obidmscolumn_blob.c
../../../src/obidmscolumn_idx.h
../../../src/obidmscolumn_idx.c
../../../src/obidmscolumn_bool.h
../../../src/obidmscolumn_bool.c
../../../src/obidmscolumn_char.h
../../../src/obidmscolumn_char.c
../../../src/obidmscolumn_float.h
../../../src/obidmscolumn_float.c
../../../src/obidmscolumn_int.h
../../../src/obidmscolumn_int.c
../../../src/obidmscolumn_qual.h
../../../src/obidmscolumn_qual.c
../../../src/obidmscolumn_seq.h
../../../src/obidmscolumn_seq.c
../../../src/obidmscolumn_str.h
../../../src/obidmscolumn_str.c
../../../src/array_indexer.h
../../../src/array_indexer.c
../../../src/char_str_indexer.h
../../../src/char_str_indexer.c
../../../src/dna_seq_indexer.h
../../../src/dna_seq_indexer.c
../../../src/encode.c
../../../src/encode.h
../../../src/uint8_indexer.c
../../../src/uint8_indexer.h
../../../src/build_reference_db.c
../../../src/build_reference_db.h
../../../src/kmer_similarity.c
../../../src/kmer_similarity.h
../../../src/obi_clean.c
../../../src/obi_clean.h
../../../src/obi_ecopcr.c
../../../src/obi_ecopcr.h
../../../src/obi_ecotag.c
../../../src/obi_ecotag.h
../../../src/obidms_taxonomy.c
../../../src/obidms_taxonomy.h
../../../src/obilittlebigman.c
../../../src/obilittlebigman.h
../../../src/_sse.h
../../../src/obidebug.h
../../../src/libecoPCR/libapat/CODES/dft_code.h
../../../src/libecoPCR/libapat/CODES/dna_code.h
../../../src/libecoPCR/libapat/CODES/prot_code.h
../../../src/libecoPCR/libapat/apat_parse.c
../../../src/libecoPCR/libapat/apat_search.c
../../../src/libecoPCR/libapat/apat.h
../../../src/libecoPCR/libapat/Gmach.h
../../../src/libecoPCR/libapat/Gtypes.h
../../../src/libecoPCR/libapat/libstki.c
../../../src/libecoPCR/libapat/libstki.h
../../../src/libecoPCR/libthermo/nnparams.h
../../../src/libecoPCR/libthermo/nnparams.c
../../../src/libecoPCR/ecoapat.c
../../../src/libecoPCR/ecodna.c
../../../src/libecoPCR/ecoError.c
../../../src/libecoPCR/ecoMalloc.c
../../../src/libecoPCR/ecoPCR.h

@ -0,0 +1,604 @@
#cython: language_level=3
from obitools3.apps.progress cimport ProgressBar # @UnresolvedImport
from obitools3.dms import DMS
from obitools3.dms.view import RollbackException
from obitools3.dms.view.typed_view.view_NUC_SEQS cimport View_NUC_SEQS
from obitools3.dms.column.column cimport Column, Column_line
from obitools3.apps.optiongroups import addMinimalInputOption, addMinimalOutputOption
from obitools3.uri.decode import open_uri
from obitools3.apps.config import logger
from obitools3.libalign._freeendgapfm import FreeEndGapFullMatch
from obitools3.libalign.apat_pattern import Primer_search
from obitools3.dms.obiseq cimport Nuc_Seq
from obitools3.dms.capi.obitypes cimport OBI_SEQ, OBI_QUAL
from obitools3.dms.capi.apat cimport MAX_PATTERN
from obitools3.utils cimport tobytes
from libc.stdint cimport INT32_MAX
from functools import reduce
import math
import sys
REVERSE_SEQ_COLUMN_NAME = b"REVERSE_SEQUENCE" # used by alignpairedend tool
REVERSE_QUALITY_COLUMN_NAME = b"REVERSE_QUALITY" # used by alignpairedend tool
__title__="Assigns sequence records to the corresponding experiment/sample based on DNA tags and primers"
def addOptions(parser):

    addMinimalInputOption(parser)
    addMinimalOutputOption(parser)

    group = parser.add_argument_group('obi ngsfilter specific options')

    group.add_argument('-t','--info-view',
                       action="store", dest="ngsfilter:info_view",
                       metavar="<URI>",
                       type=str,
                       default=None,
                       help="URI to the view containing the sample definitions (tags, primers, sample names, ...)")

    group.add_argument('-R', '--reverse-reads',
                       action="store", dest="ngsfilter:reverse",
                       metavar="<URI>",
                       default=None,
                       type=str,
                       help="URI to the reverse reads if the paired-end reads haven't been aligned yet")

    group.add_argument('-u','--unidentified',
                       action="store", dest="ngsfilter:unidentified",
                       metavar="<URI>",
                       type=str,
                       default=None,
                       help="URI to the view used to store the sequences unassigned to any sample")

    group.add_argument('-e','--error',
                       action="store", dest="ngsfilter:error",
                       metavar="###",
                       type=int,
                       default=2,
                       help="Number of errors allowed for matching primers [default = 2]")

class Primer:

    collection={}

    def __init__(self, sequence, taglength, forward=True, max_errors=2, verbose=False, primer_pair_idx=0, primer_idx=0):
        '''
        @param sequence: raw primer sequence
        @type sequence: bytes
        @param taglength: length of the tag expected next to the primer, or None if the primer has no tag
        @type taglength: int or None
        @param forward: True for a forward primer, False for a reverse primer
        @type forward: bool
        '''

        assert sequence not in Primer.collection \
            or Primer.collection[sequence]==taglength, \
            "Primer %s must always be used with tags of the same length" % sequence

        Primer.collection[sequence]=taglength

        self.primer_pair_idx = primer_pair_idx
        self.primer_idx = primer_idx
        self.is_revcomp = False
        self.revcomp = None
        self.raw=sequence
        self.sequence = Nuc_Seq(b"primer", sequence)
        self.lseq = len(self.sequence)
        self.max_errors=max_errors
        self.taglength=taglength
        self.forward = forward
        self.verbose=verbose

    def reverse_complement(self):
        p = Primer(self.raw,
                   self.taglength,
                   not self.forward,
                   verbose=self.verbose,
                   max_errors=self.max_errors,
                   primer_pair_idx=self.primer_pair_idx,
                   primer_idx=self.primer_idx)
        p.sequence=p.sequence.reverse_complement
        p.is_revcomp = True
        p.revcomp = None
        return p

    def __hash__(self):
        return hash(str(self.raw))

    def __eq__(self,primer):
        return self.raw==primer.raw

    def __call__(self, sequence, same_sequence=False, pattern=0, begin=0):

        if len(sequence) <= self.lseq:
            return None

        ali = self.aligner.search_one_primer(sequence.seq,
                                             self.primer_pair_idx,
                                             self.primer_idx,
                                             reverse_comp=self.is_revcomp,
                                             same_sequence=same_sequence,
                                             pattern_ref=pattern,
                                             begin=begin)

        if ali is None:  # no match
            return None

        errors, start = ali.first_encountered()

        if errors <= self.max_errors:
            end = start + self.lseq
            if self.taglength is not None:
                if self.sequence.is_revcomp:
                    if (len(sequence)-end) >= self.taglength:
                        tag_start = len(sequence) - end - self.taglength
                        tag = sequence.reverse_complement[tag_start:tag_start+self.taglength].seq
                    else:
                        tag=None
                else:
                    if start >= self.taglength:
                        tag = tobytes((sequence[start - self.taglength:start].seq).lower())  # turn back to lowercase because apat turned to uppercase
                    else:
                        tag=None
            else:
                tag=None

            return errors,start,end,tag

        return None

    def __str__(self):
        return "%s: %s" % ({True:'D',False:'R'}[self.forward],self.raw)

    __repr__=__str__

cdef read_info_view(info_view, max_errors=2, verbose=False, not_aligned=False):
    infos = {}
    primer_list = []
    i=0
    for p in info_view:
        forward=Primer(p[b'forward_primer'],
                       len(p[b'forward_tag']) if p[b'forward_tag']!=b'-' else None,
                       True,
                       max_errors=max_errors,
                       verbose=verbose,
                       primer_pair_idx=i,
                       primer_idx=0)

        fp = infos.get(forward,{})
        infos[forward]=fp

        reverse=Primer(p[b'reverse_primer'],
                       len(p[b'reverse_tag']) if p[b'reverse_tag']!=b'-' else None,
                       False,
                       max_errors=max_errors,
                       verbose=verbose,
                       primer_pair_idx=i,
                       primer_idx=1)

        primer_list.append((p[b'forward_primer'], p[b'reverse_primer']))

        rp = infos.get(reverse,{})
        infos[reverse]=rp

        if not_aligned:
            cf=forward
            cr=reverse
            cf.revcomp = forward.reverse_complement()
            cr.revcomp = reverse.reverse_complement()
            dpp=fp.get(cr,{})
            fp[cr]=dpp
            rpp=rp.get(cf,{})
            rp[cf]=rpp
        else:
            cf=forward.reverse_complement()
            cr=reverse.reverse_complement()
            dpp=fp.get(cr,{})
            fp[cr]=dpp
            rpp=rp.get(cf,{})
            rp[cf]=rpp

        tags = (p[b'forward_tag'] if p[b'forward_tag']!=b'-' else None,
                p[b'reverse_tag'] if p[b'reverse_tag']!=b'-' else None)

        assert tags not in dpp, \
            "Tag pair %s is already used with primer pairs: (%s,%s)" % (str(tags),forward,reverse)

        # Save additional data
        special_keys = [b'forward_primer', b'reverse_primer', b'forward_tag', b'reverse_tag']
        data={}
        for key in p:
            if key not in special_keys:
                data[key] = p[key]

        dpp[tags] = data
        rpp[tags] = data

        i+=1

    return infos, primer_list

cdef tuple annotate(sequences, infos, verbose=False):

    def sortMatch(match):
        if match[1] is None:
            return INT32_MAX
        else:
            return match[1][1]

    def sortReverseMatch(match):
        if match[1] is None:
            return -1
        else:
            return match[1][1]

    not_aligned = len(sequences) > 1
    sequenceF = sequences[0]
    sequenceR = None
    if not not_aligned:
        final_sequence = sequenceF
    else:
        final_sequence = sequenceF.clone()  # TODO maybe not cloning and then deleting quality tags is more efficient

    if not_aligned:
        sequenceR = sequences[1]
        final_sequence[REVERSE_SEQ_COLUMN_NAME] = sequenceR.seq  # used by alignpairedend tool
        final_sequence[REVERSE_QUALITY_COLUMN_NAME] = sequenceR.quality  # used by alignpairedend tool

    for seq in sequences:
        if hasattr(seq, "quality_array"):
            q = -reduce(lambda x,y:x+y,(math.log10(z) for z in seq.quality_array),0)/len(seq.quality_array)*10
            seq[b'avg_quality']=q
            q = -reduce(lambda x,y:x+y,(math.log10(z) for z in seq.quality_array[0:10]),0)
            seq[b'head_quality']=q
            if len(seq.quality_array[10:-10]) :
                q = -reduce(lambda x,y:x+y,(math.log10(z) for z in seq.quality_array[10:-10]),0)/len(seq.quality_array[10:-10])*10
                seq[b'mid_quality']=q
            q = -reduce(lambda x,y:x+y,(math.log10(z) for z in seq.quality_array[-10:]),0)
            seq[b'tail_quality']=q

    # Try direct matching:
    directmatch = []
    first_matched_seq = None
    second_matched_seq = None
    for seq in sequences:
        new_seq = True
        pattern = 0
        for p in infos:
            if pattern == MAX_PATTERN:
                new_seq = True
                pattern = 0
            directmatch.append((p, p(seq, same_sequence=not new_seq, pattern=pattern), seq))
            new_seq = False
            pattern+=1

    # Choose the match closest to the start of (one of) the sequence(s)
    directmatch = sorted(directmatch, key=sortMatch)
    all_direct_matches = directmatch
    directmatch = directmatch[0] if directmatch[0][1] is not None else None

    if directmatch is None:
        final_sequence[b'error']=b'No primer match'
        return False, final_sequence

    first_matched_seq = directmatch[2]
    if id(first_matched_seq) == id(sequenceF) and not_aligned:
        second_matched_seq = sequenceR
    else:
        second_matched_seq = sequenceF

    match = first_matched_seq[directmatch[1][1]:directmatch[1][2]]

    if not not_aligned:
        final_sequence[b'seq_length_ori']=len(final_sequence)

    if not not_aligned or id(first_matched_seq) == id(sequenceF):
        final_sequence = final_sequence[directmatch[1][2]:]
    else:
        cut_seq = sequenceR[directmatch[1][2]:]
        final_sequence[REVERSE_SEQ_COLUMN_NAME] = cut_seq.seq  # used by alignpairedend tool
        final_sequence[REVERSE_QUALITY_COLUMN_NAME] = cut_seq.quality  # used by alignpairedend tool

    if directmatch[0].forward:
        final_sequence[b'direction']=b'forward'
        final_sequence[b'forward_errors']=directmatch[1][0]
        final_sequence[b'forward_primer']=directmatch[0].raw
        final_sequence[b'forward_match']=match.seq
    else:
        final_sequence[b'direction']=b'reverse'
        final_sequence[b'reverse_errors']=directmatch[1][0]
        final_sequence[b'reverse_primer']=directmatch[0].raw
        final_sequence[b'reverse_match']=match.seq

    # Keep only paired reverse primer
    infos = infos[directmatch[0]]

    # If not aligned, look for the other match among the already computed matches (choose the one that makes the biggest amplicon)
    if not_aligned:
        i=1
        # Check the bound first so that the index is never read past the end of the list
        while i<len(all_direct_matches) and all_direct_matches[i][1] is None and all_direct_matches[i][0].forward:
            i+=1
        if i < len(all_direct_matches):
            reversematch = all_direct_matches[i]
        else:
            reversematch = None

    # Look for the other primer in the other direction on the sequence, or,
    # if the sequences are not already aligned and the reverse primer was not found in the most likely sequence
    # (the one without the forward primer), try matching on the same sequence as the first match (primer in the other direction)
    if not not_aligned or reversematch is None or reversematch[1] is None:
        if not not_aligned:
            sequence_to_match = second_matched_seq
        else:
            sequence_to_match = first_matched_seq
        reversematch = []
        # Compute begin
        begin=directmatch[1][2]+1  # end of match + 1 on the same sequence
        # Try reverse matching on the other sequence:
        new_seq = True
        pattern = 0
        for p in infos:
            if pattern == MAX_PATTERN:
                new_seq = True
                pattern = 0
            if not_aligned:
                primer=p.revcomp
            else:
                primer=p
            reversematch.append((primer, primer(sequence_to_match, same_sequence=not new_seq, pattern=pattern, begin=begin)))
            new_seq = False
            pattern+=1
        # Choose the match closest to the end of the sequence
        reversematch = sorted(reversematch, key=sortReverseMatch, reverse=True)
        all_reverse_matches = reversematch
        reversematch = reversematch[0] if reversematch[0][1] is not None else None

    if reversematch is None and None not in infos:
        if directmatch[0].forward:
            message = b'No reverse primer match'
        else:
            message = b'No direct primer match'
        final_sequence[b'error']=message
        return False, final_sequence

    if reversematch is None:
        final_sequence[b'status']=b'partial'

        if directmatch[0].forward:
            tags=(directmatch[1][3],None)
        else:
            tags=(None,directmatch[1][3])

        samples = infos[None]

    else:
        final_sequence[b'status']=b'full'

        match = second_matched_seq[reversematch[1][1]:reversematch[1][2]]
        match = match.reverse_complement

        if not not_aligned or id(second_matched_seq) == id(sequenceF):
            final_sequence = final_sequence[0:reversematch[1][1]]
        else:
            cut_seq = sequenceR[reversematch[1][2]:]
            final_sequence[REVERSE_SEQ_COLUMN_NAME] = cut_seq.seq  # used by alignpairedend tool
            final_sequence[REVERSE_QUALITY_COLUMN_NAME] = cut_seq.quality  # used by alignpairedend tool

        if directmatch[0].forward:
            tags=(directmatch[1][3], reversematch[1][3])
            final_sequence[b'reverse_errors'] = reversematch[1][0]
            final_sequence[b'reverse_primer'] = reversematch[0].raw
            final_sequence[b'reverse_match'] = match.seq
        else:
            tags=(reversematch[1][3], directmatch[1][3])
            final_sequence[b'forward_errors'] = reversematch[1][0]
            final_sequence[b'forward_primer'] = reversematch[0].raw
            final_sequence[b'forward_match'] = match.seq

        if tags[0] is not None:
            final_sequence[b'forward_tag'] = tags[0]
        if tags[1] is not None:
            final_sequence[b'reverse_tag'] = tags[1]

        samples = infos[reversematch[0]]

    if not directmatch[0].forward and not not_aligned:  # don't reverse complement if not_aligned
        final_sequence = final_sequence.reverse_complement

    sample=None

    if tags[0] is not None:  # Direct tag known
        if tags[1] is not None:  # Reverse tag known
            sample = samples.get(tags, None)
        else:  # Only direct tag known
            s=[samples[x] for x in samples if x[0]==tags[0]]
            if len(s)==1:
                sample=s[0]
            elif len(s)>1:
                final_sequence[b'error']=b'multiple samples match tags'
                return False, final_sequence
            else:
                sample=None
    else:
        if tags[1] is not None:  # Only reverse tag known
            s=[samples[x] for x in samples if x[1]==tags[1]]
            if len(s)==1:
                sample=s[0]
            elif len(s)>1:
                final_sequence[b'error']=b'multiple samples match tags'
                return False, final_sequence
            else:
                sample=None

    if sample is None:
        final_sequence[b'error']=b"Cannot assign sequence to a sample"
        return False, final_sequence

    final_sequence.update(sample)

    if not not_aligned:
        final_sequence[b'seq_length']=len(final_sequence)

    return True, final_sequence

def run(config):

    DMS.obi_atexit()

    logger("info", "obi ngsfilter")

    assert config['ngsfilter']['info_view'] is not None, "Option -t must be specified"

    # Open the input
    forward = None
    reverse = None
    input = None
    not_aligned = False

    input = open_uri(config['obi']['inputURI'])
    if input is None:
        raise Exception("Could not open input reads")
    if input[2] != View_NUC_SEQS:
        raise NotImplementedError('obi ngsfilter only works on NUC_SEQS views')

    if "reverse" in config["ngsfilter"]:

        forward = input[1]

        rinput = open_uri(config["ngsfilter"]["reverse"])
        if rinput is None:
            raise Exception("Could not open reverse reads")
        if rinput[2] != View_NUC_SEQS:
            raise NotImplementedError('obi ngsfilter only works on NUC_SEQS views')

        reverse = rinput[1]

        if len(forward) != len(reverse):
            raise Exception("Error: the numbers of forward and reverse reads differ")

        entries = [forward, reverse]
        not_aligned = True

        input_dms_name = [forward.dms.name, reverse.dms.name]
        input_view_name = [forward.name, reverse.name]

    else:
        entries = input[1]
        input_dms_name = [entries.dms.name]
        input_view_name = [entries.name]

    if not_aligned:
        entries_len = len(forward)
    else:
        entries_len = len(entries)

    # Open the output
    output = open_uri(config['obi']['outputURI'],
                      input=False,
                      newviewtype=View_NUC_SEQS)

    if output is None:
        raise Exception("Could not create output view")

    o_view = output[1]

    # Open the view containing the information about the tags and the primers
    info_input = open_uri(config['ngsfilter']['info_view'])
    if info_input is None:
        raise Exception("Could not read the view containing the information about the tags and the primers")
    info_view = info_input[1]
    input_dms_name.append(info_input[0].name)
    input_view_name.append(info_input[1].name)

    # Open the unidentified view
    if 'unidentified' in config['ngsfilter'] and config['ngsfilter']['unidentified'] is not None:  # TODO keyError if undefined problem
        unidentified_input = open_uri(config['ngsfilter']['unidentified'],
                                      input=False,
                                      newviewtype=View_NUC_SEQS)
        if unidentified_input is None:
            raise Exception("Could not open the view containing the unidentified reads")
        unidentified = unidentified_input[1]
    else:
        unidentified_input = None
        unidentified = None

    # Initialize the progress bar
    pb = ProgressBar(entries_len, config, seconde=5)

    # Check and store primers and tags
    infos, primer_list = read_info_view(info_view, max_errors=config['ngsfilter']['error'], verbose=False, not_aligned=not_aligned)  # TODO obi verbose option

    aligner = Primer_search(primer_list, config['ngsfilter']['error'])

    for p in infos:
        p.aligner = aligner
        for paired_p in infos[p]:
            paired_p.aligner = aligner
            if paired_p.revcomp is not None:
                paired_p.revcomp.aligner = aligner

    if not_aligned:  # create columns used by alignpairedend tool
        Column.new_column(o_view, REVERSE_SEQ_COLUMN_NAME, OBI_SEQ)
        Column.new_column(o_view, REVERSE_QUALITY_COLUMN_NAME, OBI_QUAL, associated_column_name=REVERSE_SEQ_COLUMN_NAME, associated_column_version=o_view[REVERSE_SEQ_COLUMN_NAME].version)

        # The unidentified view is optional: only create its columns if it was requested
        if unidentified is not None:
            Column.new_column(unidentified, REVERSE_SEQ_COLUMN_NAME, OBI_SEQ)
            Column.new_column(unidentified, REVERSE_QUALITY_COLUMN_NAME, OBI_QUAL, associated_column_name=REVERSE_SEQ_COLUMN_NAME, associated_column_version=unidentified[REVERSE_SEQ_COLUMN_NAME].version)

    g = 0
    u = 0
    try:
        for i in range(entries_len):
            pb(i)
            if not_aligned:
                modseq = [Nuc_Seq.new_from_stored(forward[i]), Nuc_Seq.new_from_stored(reverse[i])]
            else:
                modseq = [Nuc_Seq.new_from_stored(entries[i])]
            good, oseq = annotate(modseq, infos)
            if good:
                o_view[g].set(oseq.id, oseq.seq, definition=oseq.definition, quality=oseq.quality, tags=oseq)
                g+=1
            elif unidentified is not None:
                unidentified[u].set(oseq.id, oseq.seq, definition=oseq.definition, quality=oseq.quality, tags=oseq)
                u+=1
    except Exception as e:
        raise RollbackException("obi ngsfilter error, rolling back views: "+str(e), o_view, unidentified)

    pb(i, force=True)
    print("", file=sys.stderr)

    # Save command config in View and DMS comments
    command_line = " ".join(sys.argv[1:])
    o_view.write_config(config, "ngsfilter", command_line, input_dms_name=input_dms_name, input_view_name=input_view_name)
    if unidentified is not None:
        unidentified.write_config(config, "ngsfilter", command_line, input_dms_name=input_dms_name, input_view_name=input_view_name)
        # Add comment about unidentified seqs
        unidentified.comments["info"] = "View containing sequences categorized as unidentified by the ngsfilter command"
    output[0].record_command_line(command_line)

    #print("\n\nOutput view:\n````````````", file=sys.stderr)
    #print(repr(o_view), file=sys.stderr)

    input[0].close()
    output[0].close()
    info_input[0].close()
    if unidentified_input is not None:
        unidentified_input[0].close()
    aligner.free()

    logger("info", "Done.")
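
Reviewer note: the last part of `annotate` above resolves a (forward tag, reverse tag) pair to a sample, falling back to a single known tag when the other one could not be read. The plain-Python sketch below isolates that lookup so it can be tested without a DMS. The function name `assign_sample` and the use of `str` instead of `bytes` are assumptions made for illustration; they are not part of obitools3.

```python
def assign_sample(samples, tags):
    """Resolve a (forward_tag, reverse_tag) pair to a sample.

    `samples` maps (forward_tag, reverse_tag) tuples to sample data.
    Either tag may be None when it could not be read from the sequence.
    Returns (sample, error): exactly one of the two is None.
    Hypothetical helper mirroring the lookup at the end of annotate().
    """
    no_sample = "Cannot assign sequence to a sample"
    fwd, rev = tags

    # Both tags known: exact lookup on the pair
    if fwd is not None and rev is not None:
        sample = samples.get(tags)
        return (sample, None) if sample is not None else (None, no_sample)

    # No tag known: nothing to look up
    if fwd is None and rev is None:
        return None, no_sample

    # Exactly one tag known: the match must be unique on that position
    pos, tag = (0, fwd) if fwd is not None else (1, rev)
    candidates = [data for key, data in samples.items() if key[pos] == tag]
    if len(candidates) == 1:
        return candidates[0], None
    if len(candidates) > 1:
        return None, "multiple samples match tags"
    return None, no_sample
```

Returning the error message instead of writing it into the sequence keeps the sketch free of side effects; in `annotate` the same message ends up in the `error` tag of the sequence.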


@ -0,0 +1,103 @@
../../../src/obi_lcs.h
../../../src/obi_lcs.c
../../../src/obierrno.h
../../../src/obierrno.c
../../../src/upperband.h
../../../src/upperband.c
../../../src/sse_banded_LCS_alignment.h
../../../src/sse_banded_LCS_alignment.c
../../../src/obiblob.h
../../../src/obiblob.c
../../../src/utils.h
../../../src/utils.c
../../../src/obidms.h
../../../src/obidms.c
../../../src/libjson/json_utils.h
../../../src/libjson/json_utils.c
../../../src/libjson/cJSON.h
../../../src/libjson/cJSON.c
../../../src/obiavl.h
../../../src/obiavl.c
../../../src/bloom.h
../../../src/bloom.c
../../../src/crc64.h
../../../src/crc64.c
../../../src/murmurhash2.h
../../../src/murmurhash2.c
../../../src/obidmscolumn.h
../../../src/obidmscolumn.c
../../../src/obitypes.h
../../../src/obitypes.c
../../../src/obidmscolumndir.h
../../../src/obidmscolumndir.c
../../../src/obiblob_indexer.h
../../../src/obiblob_indexer.c
../../../src/obiview.h
../../../src/obiview.c
../../../src/hashtable.h
../../../src/hashtable.c
../../../src/linked_list.h
../../../src/linked_list.c
../../../src/obidmscolumn_array.h
../../../src/obidmscolumn_array.c
../../../src/obidmscolumn_blob.h
../../../src/obidmscolumn_blob.c
../../../src/obidmscolumn_idx.h
../../../src/obidmscolumn_idx.c
../../../src/obidmscolumn_bool.h
../../../src/obidmscolumn_bool.c
../../../src/obidmscolumn_char.h
../../../src/obidmscolumn_char.c
../../../src/obidmscolumn_float.h
../../../src/obidmscolumn_float.c
../../../src/obidmscolumn_int.h
../../../src/obidmscolumn_int.c
../../../src/obidmscolumn_qual.h
../../../src/obidmscolumn_qual.c
../../../src/obidmscolumn_seq.h
../../../src/obidmscolumn_seq.c
../../../src/obidmscolumn_str.h
../../../src/obidmscolumn_str.c
../../../src/array_indexer.h
../../../src/array_indexer.c
../../../src/char_str_indexer.h
../../../src/char_str_indexer.c
../../../src/dna_seq_indexer.h
../../../src/dna_seq_indexer.c
../../../src/encode.c
../../../src/encode.h
../../../src/uint8_indexer.c
../../../src/uint8_indexer.h
../../../src/build_reference_db.c
../../../src/build_reference_db.h
../../../src/kmer_similarity.c
../../../src/kmer_similarity.h
../../../src/obi_clean.c
../../../src/obi_clean.h
../../../src/obi_ecopcr.c
../../../src/obi_ecopcr.h
../../../src/obi_ecotag.c
../../../src/obi_ecotag.h
../../../src/obidms_taxonomy.c
../../../src/obidms_taxonomy.h
../../../src/obilittlebigman.c
../../../src/obilittlebigman.h
../../../src/_sse.h
../../../src/obidebug.h
../../../src/libecoPCR/libapat/CODES/dft_code.h
../../../src/libecoPCR/libapat/CODES/dna_code.h
../../../src/libecoPCR/libapat/CODES/prot_code.h
../../../src/libecoPCR/libapat/apat_parse.c
../../../src/libecoPCR/libapat/apat_search.c
../../../src/libecoPCR/libapat/apat.h
../../../src/libecoPCR/libapat/Gmach.h
../../../src/libecoPCR/libapat/Gtypes.h
../../../src/libecoPCR/libapat/libstki.c
../../../src/libecoPCR/libapat/libstki.h
../../../src/libecoPCR/libthermo/nnparams.h
../../../src/libecoPCR/libthermo/nnparams.c
../../../src/libecoPCR/ecoapat.c
../../../src/libecoPCR/ecodna.c
../../../src/libecoPCR/ecoError.c
../../../src/libecoPCR/ecoMalloc.c
../../../src/libecoPCR/ecoPCR.h


@ -0,0 +1,144 @@
#cython: language_level=3
from obitools3.apps.progress cimport ProgressBar # @UnresolvedImport
from obitools3.dms import DMS
from obitools3.dms.view.view cimport View, Line_selection
from obitools3.uri.decode import open_uri
from obitools3.apps.optiongroups import addMinimalInputOption, addMinimalOutputOption
from obitools3.dms.view import RollbackException
from obitools3.apps.config import logger
from obitools3.utils cimport str2bytes
from obitools3.dms.capi.obitypes cimport OBI_BOOL, \
OBI_CHAR, \
OBI_FLOAT, \
OBI_INT, \
OBI_QUAL, \
OBI_SEQ, \
OBI_STR, \
OBIBool_NA, \
OBIChar_NA, \
OBIFloat_NA, \
OBIInt_NA
import time
import sys
NULL_VALUE = {OBI_BOOL: OBIBool_NA,
OBI_CHAR: OBIChar_NA,
OBI_FLOAT: OBIFloat_NA,
OBI_INT: OBIInt_NA,
OBI_QUAL: [],
OBI_SEQ: b"",
OBI_STR: b""}
__title__="Sort view lines according to the value of a given attribute."
def addOptions(parser):

    addMinimalInputOption(parser)
    addMinimalOutputOption(parser)

    group=parser.add_argument_group('obi sort specific options')

    group.add_argument('--key', '-k',
                       action="append", dest="sort:keys",
                       metavar='<TAG NAME>',
                       default=[],
                       type=str,
                       help="Attribute used to sort the sequence records.")

    group.add_argument('--reverse', '-r',
                       action="store_true", dest="sort:reverse",
                       default=False,
                       help="Sort in reverse order.")

def line_cmp(line, key, pb):
    pb  # no-op: the caller already updated the progress bar when evaluating this argument
    if line[key] is None:
        return NULL_VALUE[line.view[key].data_type_int]
    else:
        return line[key]

def run(config):

    DMS.obi_atexit()

    logger("info", "obi sort")

    # Open the input
    input = open_uri(config["obi"]["inputURI"])
    if input is None:
        raise Exception("Could not read input view")
    i_dms = input[0]
    i_view = input[1]

    # Open the output: only the DMS
    output = open_uri(config['obi']['outputURI'],
                      input=False,
                      dms_only=True)
    if output is None:
        raise Exception("Could not create output view")
    o_dms = output[0]
    o_view_name_final = output[1]
    o_view_name = o_view_name_final

    # If the input and output DMS are not the same, create output view in input DMS first, then export it
    # to output DMS, making sure the temporary view name is unique in the input DMS
    if i_dms != o_dms:
        i=0
        while o_view_name in i_dms:
            o_view_name = o_view_name_final+b"_"+str2bytes(str(i))
            i+=1

    # Initialize the progress bar
    pb = ProgressBar(len(i_view), config, seconde=5)

    keys = config['sort']['keys']

    selection = Line_selection(i_view)

    for i in range(len(i_view)):  # TODO special function?
        selection.append(i)

    for k in keys:  # TODO order?
        selection.sort(key=lambda line_idx: line_cmp(i_view[line_idx], k, pb(line_idx)), reverse=config['sort']['reverse'])

    pb(len(i_view), force=True)
    print("", file=sys.stderr)

    # Create output view with the sorted line selection
    try:
        o_view = selection.materialize(o_view_name)
    except Exception as e:
        # NOTE: o_view is still unbound here if materialize() itself raised
        raise RollbackException("obi sort error, rolling back view: "+str(e), o_view)

    # Save command config in View and DMS comments
    command_line = " ".join(sys.argv[1:])
    input_dms_name=[input[0].name]
    input_view_name=[input[1].name]
    o_view.write_config(config, "sort", command_line, input_dms_name=input_dms_name, input_view_name=input_view_name)
    o_dms.record_command_line(command_line)

    # If input and output DMS are not the same, export the temporary view to the output DMS
    # and delete the temporary view in the input DMS
    if i_dms != o_dms:
        o_view.close()
        View.import_view(i_dms.full_path[:-7], o_dms.full_path[:-7], o_view_name, o_view_name_final)
        o_view = o_dms[o_view_name_final]

    #print("\n\nOutput view:\n````````````", file=sys.stderr)
    #print(repr(o_view), file=sys.stderr)

    # If the input and the output DMS are different, delete the temporary imported view used to create the final view
    if i_dms != o_dms:
        View.delete_view(i_dms, o_view_name)
        o_dms.close()

    i_dms.close()

    logger("info", "Done.")
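
Reviewer note on the `# TODO order?` in `run` above: the loop sorts once per key, in the order the keys were given, so with a stable sort the last `-k` ends up as the primary key. If the first `-k` is meant to be the primary key (an assumption, the source does not say), the keys have to be applied in reverse. The sketch below shows this on plain dictionaries; `sort_by_keys` and `null_values` are hypothetical names, with `null_values` standing in for the `NULL_VALUE` placeholders.

```python
def sort_by_keys(rows, keys, null_values, reverse=False):
    """Stable multi-key sort in which keys[0] is the primary key.

    `rows` is a list of dicts, `null_values` maps each key to the
    placeholder used when a row has no value (None) for that key.
    Hypothetical helper, not part of obitools3.
    """
    def value(row, key):
        v = row.get(key)
        return null_values[key] if v is None else v

    ordered = list(rows)
    # Sort on the least significant key first: each later pass is stable,
    # so it keeps the earlier order among rows that compare equal.
    for key in reversed(keys):
        ordered.sort(key=lambda row, k=key: value(row, k), reverse=reverse)
    return ordered
```

The same effect could be had with a single pass and a tuple key, but the repeated stable pass stays close to how `Line_selection.sort` is called in the loop above.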


@ -0,0 +1,103 @@
../../../src/obi_lcs.h
../../../src/obi_lcs.c
../../../src/obierrno.h
../../../src/obierrno.c
../../../src/upperband.h
../../../src/upperband.c
../../../src/sse_banded_LCS_alignment.h
../../../src/sse_banded_LCS_alignment.c
../../../src/obiblob.h
../../../src/obiblob.c
../../../src/utils.h
../../../src/utils.c
../../../src/obidms.h
../../../src/obidms.c
../../../src/libjson/json_utils.h
../../../src/libjson/json_utils.c
../../../src/libjson/cJSON.h
../../../src/libjson/cJSON.c
../../../src/obiavl.h
../../../src/obiavl.c
../../../src/bloom.h
../../../src/bloom.c
../../../src/crc64.h
../../../src/crc64.c
../../../src/murmurhash2.h
../../../src/murmurhash2.c
../../../src/obidmscolumn.h
../../../src/obidmscolumn.c
../../../src/obitypes.h
../../../src/obitypes.c
../../../src/obidmscolumndir.h
../../../src/obidmscolumndir.c
../../../src/obiblob_indexer.h
../../../src/obiblob_indexer.c
../../../src/obiview.h
../../../src/obiview.c
../../../src/hashtable.h
../../../src/hashtable.c
../../../src/linked_list.h
../../../src/linked_list.c
../../../src/obidmscolumn_array.h
../../../src/obidmscolumn_array.c
../../../src/obidmscolumn_blob.h
../../../src/obidmscolumn_blob.c
../../../src/obidmscolumn_idx.h
../../../src/obidmscolumn_idx.c
../../../src/obidmscolumn_bool.h
../../../src/obidmscolumn_bool.c
../../../src/obidmscolumn_char.h
../../../src/obidmscolumn_char.c
../../../src/obidmscolumn_float.h
../../../src/obidmscolumn_float.c
../../../src/obidmscolumn_int.h
../../../src/obidmscolumn_int.c
../../../src/obidmscolumn_qual.h
../../../src/obidmscolumn_qual.c
../../../src/obidmscolumn_seq.h
../../../src/obidmscolumn_seq.c
../../../src/obidmscolumn_str.h
../../../src/obidmscolumn_str.c
../../../src/array_indexer.h
../../../src/array_indexer.c
../../../src/char_str_indexer.h
../../../src/char_str_indexer.c
../../../src/dna_seq_indexer.h
../../../src/dna_seq_indexer.c
../../../src/encode.c
../../../src/encode.h
../../../src/uint8_indexer.c
../../../src/uint8_indexer.h
../../../src/build_reference_db.c
../../../src/build_reference_db.h
../../../src/kmer_similarity.c
../../../src/kmer_similarity.h
../../../src/obi_clean.c
../../../src/obi_clean.h
../../../src/obi_ecopcr.c
../../../src/obi_ecopcr.h
../../../src/obi_ecotag.c
../../../src/obi_ecotag.h
../../../src/obidms_taxonomy.c
../../../src/obidms_taxonomy.h
../../../src/obilittlebigman.c
../../../src/obilittlebigman.h
../../../src/_sse.h
../../../src/obidebug.h
../../../src/libecoPCR/libapat/CODES/dft_code.h
../../../src/libecoPCR/libapat/CODES/dna_code.h
../../../src/libecoPCR/libapat/CODES/prot_code.h
../../../src/libecoPCR/libapat/apat_parse.c
../../../src/libecoPCR/libapat/apat_search.c
../../../src/libecoPCR/libapat/apat.h
../../../src/libecoPCR/libapat/Gmach.h
../../../src/libecoPCR/libapat/Gtypes.h
../../../src/libecoPCR/libapat/libstki.c
../../../src/libecoPCR/libapat/libstki.h
../../../src/libecoPCR/libthermo/nnparams.h
../../../src/libecoPCR/libthermo/nnparams.c
../../../src/libecoPCR/ecoapat.c
../../../src/libecoPCR/ecodna.c
../../../src/libecoPCR/ecoError.c
../../../src/libecoPCR/ecoMalloc.c
../../../src/libecoPCR/ecoPCR.h


@ -0,0 +1,265 @@
#cython: language_level=3
from obitools3.apps.progress cimport ProgressBar # @UnresolvedImport
from obitools3.dms import DMS
from obitools3.uri.decode import open_uri
from obitools3.apps.optiongroups import addMinimalInputOption, addTaxonomyOption
from obitools3.dms.view import RollbackException
from obitools3.apps.config import logger
from obitools3.dms.capi.obiview cimport COUNT_COLUMN
from functools import reduce
import math
import time
import sys
__title__="Compute basic statistics for attribute values."
'''
`obi stats` computes basic statistics for attribute values of sequence records.
The sequence records can be categorized or not using one or several ``-c`` options.
By default, only the number of sequence records and the total count are computed for each category.
Additional statistics can be computed for attribute values in each category, such as:
- minimum value (``-m`` option)
- maximum value (``-M`` option)
- mean value (``-a`` option)
- variance (``-v`` option)
- standard deviation (``-s`` option)
The result is a contingency table with the different categories in rows, and the
computed statistics in columns.
'''
# TODO: when is the taxonomy possibly used?
def addOptions(parser):

    addMinimalInputOption(parser)
    addTaxonomyOption(parser)

    group=parser.add_argument_group('obi stats specific options')

    group.add_argument('-c','--category-attribute',
                       action="append", dest="stats:categories",
                       metavar="<Attribute Name>",
                       default=[],
                       help="Attribute used to categorize the records.")

    group.add_argument('-m','--min',
                       action="append", dest="stats:minimum",
                       metavar="<Attribute Name>",
                       default=[],
                       help="Compute the minimum value of attribute for each category.")

    group.add_argument('-M','--max',
                       action="append", dest="stats:maximum",
                       metavar="<Attribute Name>",
                       default=[],
                       help="Compute the maximum value of attribute for each category.")

    group.add_argument('-a','--mean',
                       action="append", dest="stats:mean",
                       metavar="<Attribute Name>",
                       default=[],
                       help="Compute the mean value of attribute for each category.")

    group.add_argument('-v','--variance',
                       action="append", dest="stats:var",
                       metavar="<Attribute Name>",
                       default=[],
                       help="Compute the variance of attribute for each category.")

    group.add_argument('-s','--std-dev',
                       action="append", dest="stats:sd",
                       metavar="<Attribute Name>",
                       default=[],
                       help="Compute the standard deviation of attribute for each category.")

def statistics(values, attributes, func):
    stat={}
    lstat={}

    for var in attributes:
        if var in values:
            stat[var]={}
            lstat[var]=0
            for c in values[var]:
                v = values[var][c]
                m = func(v)
                stat[var][c]=m
                lm=len(str(m))
                if lm > lstat[var]:
                    lstat[var]=lm

    return stat, lstat


def minimum(values, options):
    return statistics(values, options['minimum'], min)


def maximum(values, options):
    return statistics(values, options['maximum'], max)


def mean(values, options):
    def average(v):
        s = reduce(lambda x,y:x+y,v,0)
        return float(s)/len(v)
    return statistics(values, options['mean'], average)


def variance(v):
    if len(v)==1:
        return 0
    s = reduce(lambda x,y:(x[0]+y,x[1]+y**2),v,(0.,0.))
    return s[1]/(len(v)-1) - s[0]**2/len(v)/(len(v)-1)


def varpop(values, options):
    return statistics(values, options['var'], variance)


def sd(values, options):
    def stddev(v):
        return math.sqrt(variance(v))
    return statistics(values, options['sd'], stddev)

def run(config):

    DMS.obi_atexit()

    logger("info", "obi stats")

    # Open the input
    input = open_uri(config['obi']['inputURI'])
    if input is None:
        raise Exception("Could not read input view")
    i_view = input[1]

    if 'taxoURI' in config['obi'] and config['obi']['taxoURI'] is not None:
        taxo_uri = open_uri(config['obi']['taxoURI'])
        if taxo_uri is None:
            raise Exception("Couldn't open taxonomy")
        taxo = taxo_uri[1]
    else :
        taxo = None

    # Collect values for every attribute that any statistic needs, including variance and standard deviation
    statistics = set(config['stats']['minimum']) | set(config['stats']['maximum']) | set(config['stats']['mean']) | set(config['stats']['var']) | set(config['stats']['sd'])
    total = 0
    catcount={}
    totcount={}
    values={}
    lcat=0

    # Initialize the progress bar
    pb = ProgressBar(len(i_view), config, seconde=5)

    for i in range(len(i_view)):
        pb(i)
        line = i_view[i]
        category = []
        for c in config['stats']['categories']:
            try:
                if taxo is not None:
                    loc_env = {'sequence': line, 'line': line, 'taxonomy': taxo}
                else:
                    loc_env = {'sequence': line, 'line': line}

                v = eval(c, loc_env, line)
                lv=len(str(v))
                if lv > lcat:
                    lcat=lv
                category.append(v)
            except Exception:
                category.append(None)
                if 4 > lcat:
                    lcat=4
        category=tuple(category)
        catcount[category]=catcount.get(category,0)+1
        try:
            totcount[category]=totcount.get(category,0)+line[COUNT_COLUMN]
        except KeyError:
            totcount[category]=totcount.get(category,0)+1
        for var in statistics:
            if var in line:
                v = line[var]
                if var not in values:
                    values[var]={}
                if category not in values[var]:
                    values[var][category]=[]
                values[var][category].append(v)

    pb(i, force=True)
    print("", file=sys.stderr)

    mini, lmini = minimum(values, config['stats'])
    maxi, lmaxi = maximum(values, config['stats'])
    avg, lavg = mean(values, config['stats'])
    varp, lvarp = varpop(values, config['stats'])
    sigma, lsigma = sd(values, config['stats'])

    pcat = "%%-%ds" % lcat
    if config['stats']['minimum']:
        minvar= "min_%%-%ds" % max(len(x) for x in config['stats']['minimum'])
    else:
        minvar= "%s"

    if config['stats']['maximum']:
        maxvar= "max_%%-%ds" % max(len(x) for x in config['stats']['maximum'])
    else:
        maxvar= "%s"

    if config['stats']['mean']:
        meanvar= "mean_%%-%ds" % max(len(x) for x in config['stats']['mean'])
    else:
        meanvar= "%s"

    if config['stats']['var']:
        varvar= "var_%%-%ds" % max(len(x) for x in config['stats']['var'])
    else:
        varvar= "%s"

    if config['stats']['sd']:
        sdvar= "sd_%%-%ds" % max(len(x) for x in config['stats']['sd'])
    else:
        sdvar= "%s"

    hcat = "\t".join([pcat % x for x in config['stats']['categories']]) + "\t" +\
           "\t".join([minvar % x for x in config['stats']['minimum']]) + "\t" +\
           "\t".join([maxvar % x for x in config['stats']['maximum']]) + "\t" +\
           "\t".join([meanvar % x for x in config['stats']['mean']]) + "\t" +\
           "\t".join([varvar % x for x in config['stats']['var']]) + "\t" +\
           "\t".join([sdvar % x for x in config['stats']['sd']]) + \
           "\t count" + \
           "\t total"
    print(hcat)
    for c in catcount:
        for v in c:
            print(pcat % str(v)+"\t", end="")
        for m in config['stats']['minimum']:
            print((("%%%dd" % lmini[m]) % mini[m][c])+"\t", end="")
        for m in config['stats']['maximum']:
            print((("%%%dd" % lmaxi[m]) % maxi[m][c])+"\t", end="")
        for m in config['stats']['mean']:
            print((("%%%df" % lavg[m]) % avg[m][c])+"\t", end="")
        for m in config['stats']['var']:
            print((("%%%df" % lvarp[m]) % varp[m][c])+"\t", end="")
        for m in config['stats']['sd']:
            print((("%%%df" % lsigma[m]) % sigma[m][c])+"\t", end="")
        print("%7d" %catcount[c], end="")
        print("%9d" %totcount[c])

    input[0].close()

    logger("info", "Done.")

@@ -0,0 +1,103 @@
../../../src/obi_lcs.h
../../../src/obi_lcs.c
../../../src/obierrno.h
../../../src/obierrno.c
../../../src/upperband.h
../../../src/upperband.c
../../../src/sse_banded_LCS_alignment.h
../../../src/sse_banded_LCS_alignment.c
../../../src/obiblob.h
../../../src/obiblob.c
../../../src/utils.h
../../../src/utils.c
../../../src/obidms.h
../../../src/obidms.c
../../../src/libjson/json_utils.h
../../../src/libjson/json_utils.c
../../../src/libjson/cJSON.h
../../../src/libjson/cJSON.c
../../../src/obiavl.h
../../../src/obiavl.c
../../../src/bloom.h
../../../src/bloom.c
../../../src/crc64.h
../../../src/crc64.c
../../../src/murmurhash2.h
../../../src/murmurhash2.c
../../../src/obidmscolumn.h
../../../src/obidmscolumn.c
../../../src/obitypes.h
../../../src/obitypes.c
../../../src/obidmscolumndir.h
../../../src/obidmscolumndir.c
../../../src/obiblob_indexer.h
../../../src/obiblob_indexer.c
../../../src/obiview.h
../../../src/obiview.c
../../../src/hashtable.h
../../../src/hashtable.c
../../../src/linked_list.h
../../../src/linked_list.c
../../../src/obidmscolumn_array.h
../../../src/obidmscolumn_array.c
../../../src/obidmscolumn_blob.h
../../../src/obidmscolumn_blob.c
../../../src/obidmscolumn_idx.h
../../../src/obidmscolumn_idx.c
../../../src/obidmscolumn_bool.h
../../../src/obidmscolumn_bool.c
../../../src/obidmscolumn_char.h
../../../src/obidmscolumn_char.c
../../../src/obidmscolumn_float.h
../../../src/obidmscolumn_float.c
../../../src/obidmscolumn_int.h
../../../src/obidmscolumn_int.c
../../../src/obidmscolumn_qual.h
../../../src/obidmscolumn_qual.c
../../../src/obidmscolumn_seq.h
../../../src/obidmscolumn_seq.c
../../../src/obidmscolumn_str.h
../../../src/obidmscolumn_str.c
../../../src/array_indexer.h
../../../src/array_indexer.c
../../../src/char_str_indexer.h
../../../src/char_str_indexer.c
../../../src/dna_seq_indexer.h
../../../src/dna_seq_indexer.c
../../../src/encode.c
../../../src/encode.h
../../../src/uint8_indexer.c
../../../src/uint8_indexer.h
../../../src/build_reference_db.c
../../../src/build_reference_db.h
../../../src/kmer_similarity.c
../../../src/kmer_similarity.h
../../../src/obi_clean.c
../../../src/obi_clean.h
../../../src/obi_ecopcr.c
../../../src/obi_ecopcr.h
../../../src/obi_ecotag.c
../../../src/obi_ecotag.h
../../../src/obidms_taxonomy.c
../../../src/obidms_taxonomy.h
../../../src/obilittlebigman.c
../../../src/obilittlebigman.h
../../../src/_sse.h
../../../src/obidebug.h
../../../src/libecoPCR/libapat/CODES/dft_code.h
../../../src/libecoPCR/libapat/CODES/dna_code.h
../../../src/libecoPCR/libapat/CODES/prot_code.h
../../../src/libecoPCR/libapat/apat_parse.c
../../../src/libecoPCR/libapat/apat_search.c
../../../src/libecoPCR/libapat/apat.h
../../../src/libecoPCR/libapat/Gmach.h
../../../src/libecoPCR/libapat/Gtypes.h
../../../src/libecoPCR/libapat/libstki.c
../../../src/libecoPCR/libapat/libstki.h
../../../src/libecoPCR/libthermo/nnparams.h
../../../src/libecoPCR/libthermo/nnparams.c
../../../src/libecoPCR/ecoapat.c
../../../src/libecoPCR/ecodna.c
../../../src/libecoPCR/ecoError.c
../../../src/libecoPCR/ecoMalloc.c
../../../src/libecoPCR/ecoPCR.h

@@ -0,0 +1,110 @@
#cython: language_level=3

from obitools3.apps.progress cimport ProgressBar  # @UnresolvedImport
from obitools3.dms import DMS
from obitools3.dms.view.view cimport View, Line_selection
from obitools3.uri.decode import open_uri
from obitools3.apps.optiongroups import addMinimalInputOption, addMinimalOutputOption
from obitools3.dms.view import RollbackException
from obitools3.apps.config import logger
from obitools3.utils cimport str2bytes

import time
import sys


__title__="Keep the N last lines of a view."


def addOptions(parser):

    addMinimalInputOption(parser)
    addMinimalOutputOption(parser)

    group=parser.add_argument_group('obi tail specific options')

    group.add_argument('-n', '--sequence-count',
                       action="store", dest="tail:count",
                       metavar='<N>',
                       default=10,
                       type=int,
                       help="Number of last records to keep.")


def run(config):

    DMS.obi_atexit()

    logger("info", "obi tail")

    # Open the input
    input = open_uri(config["obi"]["inputURI"])
    if input is None:
        raise Exception("Could not read input view")
    i_dms = input[0]
    i_view = input[1]

    # Open the output: only the DMS
    output = open_uri(config['obi']['outputURI'],
                      input=False,
                      dms_only=True)
    if output is None:
        raise Exception("Could not create output view")
    o_dms = output[0]
    o_view_name_final = output[1]
    o_view_name = o_view_name_final

    # If the input and output DMS are not the same, create output view in input DMS first, then export it
    # to output DMS, making sure the temporary view name is unique in the input DMS
    if i_dms != o_dms:
        i=0
        while o_view_name in i_dms:
            o_view_name = o_view_name_final+b"_"+str2bytes(str(i))
            i+=1

    start = max(len(i_view) - config['tail']['count'], 0)

    # Initialize the progress bar
    pb = ProgressBar(len(i_view) - start, config, seconde=5)

    selection = Line_selection(i_view)

    for i in range(start, len(i_view)):
        pb(i)
        selection.append(i)

    pb(i, force=True)
    print("", file=sys.stderr)

    # Save command config in View comments
    command_line = " ".join(sys.argv[1:])
    comments = View.get_config_dict(config, "tail", command_line, input_dms_name=[i_dms.name], input_view_name=[i_view.name])

    # Create output view with the line selection
    try:
        o_view = selection.materialize(o_view_name)
    except Exception as e:
        raise RollbackException("obi tail error, rollbacking view: "+str(e), o_view)

    # Save command config in DMS comments
    command_line = " ".join(sys.argv[1:])
    o_view.write_config(config, "tail", command_line, input_dms_name=[i_dms.name], input_view_name=[i_view.name])
    o_dms.record_command_line(command_line)

    # If input and output DMS are not the same, export the temporary view to the output DMS
    # and delete the temporary view in the input DMS
    if i_dms != o_dms:
        o_view.close()
        View.import_view(i_dms.full_path[:-7], o_dms.full_path[:-7], o_view_name, o_view_name_final)
        o_view = o_dms[o_view_name_final]

    #print("\n\nOutput view:\n````````````", file=sys.stderr)
    #print(repr(o_view), file=sys.stderr)

    # If the input and the output DMS are different, delete the temporary imported view used to create the final view
    if i_dms != o_dms:
        View.delete_view(i_dms, o_view_name)
        o_dms.close()
    i_dms.close()

    logger("info", "Done.")

Some files were not shown because too many files have changed in this diff.