72 Commits

Author SHA1 Message Date
15f033332c Patch a bug leading to double pseudogene tagging
Former-commit-id: 35e27b66dc2f350b72544626da12a758b40da071
Former-commit-id: d01e79b8e7450e4aa734a8d04e81573602a58fec
2018-11-20 17:39:38 +01:00
2ff6ff3308 When proteins are searched for without stop codons, add an extra
option, PASS1_LOOK_FOR_PSEUDO, that allows a second search pass with stop
codons allowed (pseudogene search).

The PASS1_ALLOW_STOP is set back to 0 and the new PASS1_LOOK_FOR_PSEUDO
is set to 1

Former-commit-id: 318327af6bdc3fbdfbe7f438ff7cbea22863a0ab
Former-commit-id: a130baf2b1c3bf1158d367d3633b02600f04674a
2018-11-20 16:02:23 +01:00
a040adb132 Check the translation for stop codons and add a pseudogene qualifier
if any are present.

Former-commit-id: 11b612fcdfa1fdd2a2614148b5b1772954e62e70
Former-commit-id: 02c87c99e5ece530640e521a577867e74ed1541e
2018-11-20 15:59:57 +01:00
4f18ef51d0 Redirect output of pushd and popd to /dev/null
Former-commit-id: e6ce2c7387b5abd0ef3be9b58c23bbfe596a5aff
Former-commit-id: 85e9495c91660380d531efb63a8f81aa393805cf
2018-05-11 16:20:39 +02:00
c691818059 Changes in .gitignore
Former-commit-id: 7e9fcd4ed6487e52562b274d87345e0e46f1458d
Former-commit-id: bf7a58dcf96db9f8c132977aa7d3f6af97cce0f5
2018-04-05 18:15:40 +02:00
0a5a65ab26 Change the notation algorithm to take advantage of the new CAU tRNA
reference library

Former-commit-id: 32650f41c4a7f95ce5da78c1f520438b35c1d4d1
Former-commit-id: 7ee31aaed2aca437b689fc7930095279fce0051b
2018-04-05 17:59:12 +02:00
81657a288a Modify script to accept compressed genome files
Former-commit-id: f816e3ce8b10e2ca3f1aa9ae969c24e699368e25
Former-commit-id: 16fb412552debdfd2172926e8a8b63be05257bdf
2018-04-05 17:58:19 +02:00
ee634cc779 Simplify CAU tRNA reference database building to keep only CAU tRNA
from plastomes where the three categories of CAU tRNA (Met/Ile/fMet)
are annotated

Former-commit-id: 67dc445698e22fe8a503c6700977c79e4817d302
Former-commit-id: 6e84303543b0752a7946bdde6e5114cfe6eef8da
2018-04-05 17:55:31 +02:00
fc821d6be8 Final small changes to patch the bug related to complex filenames
Former-commit-id: c59d7b5e7f8c8f37e955e44b354521c312cfc2c4
Former-commit-id: e9d8bc55d4542b276e91104672ba7dddb53c0c6a
2018-01-25 08:53:27 +01:00
640294b47e Yet another attempt to solve the bug...
Former-commit-id: 0a5ece1e927034a7001e2e1bcd2743d9b9e3ec6d
Former-commit-id: 0aafb797b73c8beb4d8662784c8537e6f0c13c5d
2018-01-24 16:41:35 +01:00
44a75f6fd7 Comment out phase 2 CDS searching
Former-commit-id: ca048f8c762475a2ca02735a20b90576b0222462
Former-commit-id: 455ffc2945c49f701f7406930fbe2e4e166d172d
2018-01-24 16:12:49 +01:00
238b500e1a Add missing file...
Former-commit-id: f71b0396212bb8cd2df1ca1a4e5847f30c613a48
Former-commit-id: 17cc9616d8835548e996712545d4cc0e1833f90f
2018-01-24 15:13:31 +01:00
8d2ec19fe8 Patch a bug to launch exonerate on complex filenames
Former-commit-id: e8357a639a22cb123985a0ed487dfd4018c9bb0a
Former-commit-id: a2e1c2ce75c0eac9574b7a68506f6f209e54ea89
2018-01-24 15:07:04 +01:00
f74bb0d973 Patch a bug blocking the exonerate execution when the genome filename is
too long or complex

Former-commit-id: a9da8eab920f422609b41be2e16d65e0569f953c
Former-commit-id: 6829ae3081bea4a1d16ec8d3bad10e51f01f51d7
2018-01-23 07:32:12 +01:00
a25ab81b38 Add logs to print the sequence length and whether the sequence is
reverse-complemented

Former-commit-id: ba55f354ea7a51119fe44bcb36aa5927194293e2
Former-commit-id: dd7715be54ac92c9625f0a2c30e572b7aee76dc7
2018-01-18 22:00:07 +01:00
08d7c940a4 Patch a bug in the final sequence formatting that occurs when the
input sequence does not have 60 characters per line

Former-commit-id: 213735f5b9f3cd817053e284d7844cfdd69726c6
Former-commit-id: 074b4aaac0eac00de9b3b48e75804417ce780a2d
2018-01-18 21:58:50 +01:00
04ea0f110d Allows for reporting
Former-commit-id: af7999b0f3c69be9c796799813950adbdb0fb0e8
Former-commit-id: f8a6f2a26c58a02aa6d076bd3005a02f906de82a
2016-10-20 09:31:54 -03:00
1ac0af03c2 Patch the new ycf1 specific parameters
Former-commit-id: 66f848b351a6b8186ff03a7059aa167f39ed29a1
Former-commit-id: fd4260434739725ff967138089eaeeb013812784
2016-10-09 07:19:35 -03:00
8156d5dd2f Add specific exonerate parameters for ycf1
Former-commit-id: c956dde7ad2183b72fe5221333876747db97b361
Former-commit-id: 5ddf35ea93eadadecb063277afd513e8ae73e559
2016-10-09 07:11:20 -03:00
001c1dcac1 For a given protein, consider only clusters with a score of at least
95% of the best score

Former-commit-id: cfdc6fcd37a4036d8bcca27bc7e120e60a94998d
Former-commit-id: f45bb7922f28165fd3baa1bc67bf815a759d1590
2016-10-09 04:24:08 -03:00
54413e7420 Change awk to $AwkCmd
Signed-off-by: Eric Coissac <eric.coissac@metabarcoding.org>
Former-commit-id: 79d7c6cc4333c8f72cef71f9c5323c151bb0e6b7
Former-commit-id: 869cf28bb894c95297fc0f80e424a55d347f2a65
2016-10-09 01:25:57 -03:00
87453701b7 Change some parameters in program calls
Former-commit-id: 3ed8760844007def1d8c5a9cf4eaee01d571fe0b
Former-commit-id: b15127c8f8a601b33e09daccc645cbb8a1f23a2e
2016-10-06 12:37:57 -03:00
4992483b80 Change some blastx parameters to get better matches by taking into
account intron size and the correct genetic code

Former-commit-id: 6600123fbdce2070058074e82c791c7fc260c39b
Former-commit-id: ac413cc4a49844d4fa4087107aa84680d36f3df1
2016-10-06 12:36:43 -03:00
e4f3081fa8 Switch to the speedup mode because of the slowdown imposed by the new
exonerate parameters

Former-commit-id: 30f2caea735460bcc4dfa61adde72d7da2fb6f2e
Former-commit-id: 0537c77f5bc16d766b3cbd668dcd1e1711140937
2016-10-06 12:35:32 -03:00
16b5e2927d Make changes to better detect frameshifted pseudogenes and annotate
them correctly

Former-commit-id: d827908d63149941538e686b48f60a132173cb80
Former-commit-id: 2841c75b415c6c8fa35a6a90e23cf82c3c84408b
2016-10-06 10:06:37 -03:00
860cd217d4 Add the management of pseudogenes
Former-commit-id: 26d91366e483cf17c440b251ab1e8ac5390699fe
Former-commit-id: 0d3d69ba351bd174fe08387a474fd1137559e38a
2016-10-06 08:56:45 -03:00
d4da1d01fd A new set of cleaned proteins for the CDS detector, prepared using the
clusterizecore.sh script from the detectors/cds/lib folder.

The CDS detector is now modified to use the clean.fst files.


Former-commit-id: e30a53b5b6b658388af4b2640b30e6765c729894
Former-commit-id: 3015ad50d25248fb117ab00e816b00fde1f9ba1d
2016-10-05 09:31:24 -03:00
466308267e Add a patch for chloroplast annotation when no inverted repeats are
detected

Former-commit-id: 7e3ddd41cf0d0788223382fedbf45b183974233e
Former-commit-id: e5a8ceb825f78d243e37d22cd6b2e91f403c0ee8
2016-05-02 15:32:28 +02:00
8113b80d47 Add annotation of nuclear rDNA cistron
Former-commit-id: ee54019ddddbea4d17956622968f6ce673b609e1
Former-commit-id: 5e5381cf59409ca3dc01098b0e3f330efe0a6a32
2016-05-02 10:56:40 +02:00
20d0bcfbf8 First trial to automatically clean up the core CDS database
Former-commit-id: dc61a61816084f385f1aa89324b08f81602b4353
Former-commit-id: ee8bf1a08e4af4f4d8d12a1e2a83c5f688e5f7e8
2016-04-25 23:41:18 +02:00
536a451510 Call tcsh explicitly to work around a path bug
Former-commit-id: e6c05a695a6872dd5fb8acd96ee031844dd21fa0
Former-commit-id: 7740135e0861b796e85fce0c9c62a4793f836c2b
2016-04-13 17:32:10 +02:00
f466f5505a Change the hash-bang line of the csh shell scripts
Former-commit-id: 115a1955c5883ffd0909cb05e887f70fa561b6e6
Former-commit-id: 5e6be182d5a3ec910f5deed27014227f34bd4745
2016-04-13 16:51:58 +02:00
69434c5b86 Add the latest tcsh able to deal with large PATH (at least 4096)
Former-commit-id: 32011d9b239e2c5ed93646a8173b285f377693a3
Former-commit-id: 6e804387bfacfc4e9242ef3f7014642044f3aa2c
2016-04-13 16:21:50 +02:00
ab37af3b03 Add the name of the org.annot pipeline in the CDS inference
Former-commit-id: 497194fafc15da0d80ee7dcb4cf11551d21061bd
Former-commit-id: ea502a0d75d7ff638258a5a15b8ff759cd6e28fa
2015-12-18 08:56:55 +01:00
a4e053989b Specify the genetic code during the aragorn call.
Former-commit-id: 6f18008c34dcb33059accc02edef681a26848416
Former-commit-id: a7313f06a23a307a0384b88e3bc8a1d7b9292e07
2015-12-18 08:39:48 +01:00
cf54e7dcb1 Close #15. The bug in intron location was actually due to a
misinterpretation of the aragorn output format. tRNA and intron
locations are now consistent with most of the locations extracted from
GenBank files, to within one or two base pairs.

Former-commit-id: dac4fb731e0edaeaebde9edc5350fce38ad99601
Former-commit-id: f8a0590342aec2db1fe5deb4475b8a9380891a48
2015-12-18 08:39:04 +01:00
89c4f17fc4 Patch a bug in the generation of tRNA locations for
intron-containing genes on the reverse-complement strand.

Former-commit-id: 729905450d60c9b2e76ac73567b3efb09cb1bb86
Former-commit-id: 722dc77682ef3da8a746879c52072c46adb9de71
2015-11-28 16:11:14 +01:00
b7282fb30d minor addition in cds/compare
Former-commit-id: e865ea931fb2fc76f49b72d823eda712138647e3
Former-commit-id: 3d8d2bd249907fa4fbb7fae2ee06cf6090f62d5e
2015-11-15 13:13:36 +01:00
2d404b5b24 removed need of R igraph from chlorodb/subdb
Former-commit-id: 574aace9be5804d728a877110f5f475d61644f75
Former-commit-id: 2e7ea63447643830a62f18a364327d7b396ec140
2015-11-14 22:13:55 +01:00
d83201fd2f minor bug in chlorodb
Former-commit-id: 7017655ac86e7b7837c7b581bf8a1abb86c08b30
Former-commit-id: dcedd4e32e3c7ce302eed94abd2b975a4506df97
2015-11-14 15:16:16 +01:00
6f43ede11e cds test on core and shell
Former-commit-id: 9be1f2c23d00a2678489090c4f6d04ffc0124061
Former-commit-id: 823ca0890900bf6f81b158cafc46c78049fcf080
2015-11-13 22:41:34 +01:00
405631f527 cds go_test bug fixed
Former-commit-id: f73133dca83d02a0c223e98a3ac82fdb0d03c5ae
Former-commit-id: 3db7c0037f7c109f4479490480d4323a55206c6a
2015-11-13 22:37:22 +01:00
42707c281c added test for chlorodb
Former-commit-id: 639cbbdc91a6c7f11544dbbe1fa0c47e1e28eaad
Former-commit-id: 59f6ff3f727d01f3ed4d553a554b322b24119b06
2015-11-13 18:53:47 +01:00
13b03062d5 add doc for chlorodb
Former-commit-id: 55fd288275b46bd02170029b9bc683ad34c3c611
Former-commit-id: 329bedc41c4355741a5bbd3f2056ac1025f3da7f
2015-11-13 17:55:24 +01:00
e4d6a8484d cds/tools/chlorodb added
Former-commit-id: 0579e878a69b7c285ca71870e9ca5730649a2fda
Former-commit-id: 7cced5b488441d87bf070a9a444317db0e048880
2015-11-13 17:41:18 +01:00
0d5f0c1f20 summary added in comparison
Former-commit-id: 9a267727234dc9026ce3a54f543e62d8f609945a
Former-commit-id: 556a34a214aea8824f34d4a2c117f09527d88146
2015-11-11 18:51:15 +01:00
dd9b23bc77 Merge branch 'master' of git.metabarcoding.org:org-asm/org-annotate
Former-commit-id: 14936719198c993d2e38b2c4d8f78dfa5c46c0b4
Former-commit-id: 8e795225e073e077fbc6835b9d9746b6c6ed95cf
2015-11-10 22:15:29 +01:00
9108ce75f1 fixed too many partial CDS bug
Former-commit-id: d733a46f4e92f755f38e452f03a28062de6739f1
Former-commit-id: 36bdc324d2b9a0491d07d40a7e68a4cf7ea73984
2015-11-10 22:15:01 +01:00
262995a486 added cds/tools compare and chlorodb
Former-commit-id: 31633e6bc503eb08ddb507e58e3ab1a6d2ba6027
Former-commit-id: 830c771b5453f3482f283ee069234a34127bf08f
2015-11-10 09:29:05 +01:00
813e3958ba Change minimum length for considering a match from 1000 to 100.
Former-commit-id: 6bd827ca011ee71d83e98710edc837f56a089875
Former-commit-id: 454f080c0b163f238951541eec23b5946f914f28
2015-11-09 17:35:59 +01:00