Update wolf_tutorial
@@ -27,7 +27,7 @@ The new database system used by the OBITools3 (called **DMS** for Data Managemen
 Any hybrid of those 2 works too.

 * View names must be unique within a DMS, in other words, views can not be overwritten.

 * All tools accept different input and output DMS.

 * If the output DMS is not given, the input DMS is used.
@@ -70,7 +70,7 @@ Download this archive containing the reads and the ngs file:

 And unzip it:

 tar -zxvf wolf_tutorial.tar.gz


 1. Import the first set of reads, with :
@@ -139,11 +139,28 @@ Unlike the OBITools1, the OBITools3 make it possible to run ngsfilter before ali

 #### Build a reference database

-1. Download the sequences, or rather, just the files with mammal sequences for this tutorial:
+Building the reference database is costly in time and disk space so you can simply download this already built one:
+
+[v05_refs.fasta.gz](/uploads/fe71e2e103014d70a1bf9307e377ce2b/v05_refs.fasta.gz)
+
+With the associated taxdump:
+
+[taxdump.tar.gz](https://drive.google.com/file/d/1lMV5PWg122ZmhQtx0iq5m5iUKvVBZMUC/view?usp=sharing)
+
+And import them (note that you could import them in another DMS):
+
+obi import v05_refs.fasta.gz wolf/v05_refs
+obi import --taxdump taxdump.tar.gz wolf/taxonomy/my_tax
+
+You can then resume at the next part "**Clean the database**".
+
+Otherwise, to build the database yourself from the start:
+
+1. Download the sequences (except human and environmental samples):

 mkdir EMBL
 cd EMBL
-wget -nH --cut-dirs=5 -Arel_std_mam_\*.dat.gz -m ftp://ftp.ebi.ac.uk/pub/databases/embl/release/std/
+wget -nH --cut-dirs=5 -A rel_std_*.dat.gz -R rel_std_hum_*.dat.gz,rel_std_env_*.dat.gz -m ftp://ftp.ebi.ac.uk/pub/databases/embl/release/std/
 cd ..

 2. Import the sequences in the DMS:
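The new `wget` line combines an accept list (`-A`) with a reject list (`-R`). The sketch below emulates that selection in plain shell with `case` patterns, so the effect of the three globs can be checked without downloading anything. It is only an illustration of the pattern logic, assuming a file is kept when it matches the accept pattern and none of the reject patterns; it is not how `wget` is implemented. The function name `keep_file` and the sample file names are hypothetical.

```shell
#!/bin/sh
# keep_file NAME: succeed if NAME would pass the accept/reject globs used in
# the updated wget command (assumed semantics: accept AND NOT reject).
keep_file() {
    case "$1" in
        rel_std_hum_*.dat.gz|rel_std_env_*.dat.gz) return 1 ;;  # on the reject list
        rel_std_*.dat.gz) return 0 ;;                            # on the accept list
        *) return 1 ;;                                           # matches neither list
    esac
}

# Hypothetical file names, for illustration only.
for f in rel_std_mam_01_r143.dat.gz rel_std_hum_01_r143.dat.gz \
         rel_std_env_01_r143.dat.gz README.txt; do
    if keep_file "$f"; then
        echo "keep $f"
    else
        echo "skip $f"
    fi
done
```

When running the real command, quoting the patterns (for example `-A 'rel_std_*.dat.gz'`) keeps the shell from expanding them against files in the current directory before `wget` sees them.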
@@ -164,23 +181,33 @@ For EMBL files, you can give the path to a directory with several EMBL files.

 5. Use ecoPCR to simulate an *in silico* PCR with the V05 primers:

-obi ecopcr -e 3 -l 50 -L 150 -F TTAGATACCCCACTATGC -R TAGAACAGGCTCCTCTAG --taxonomy wolf/taxonomy/my_tax wolf/embl_refs wolf/v05_db
+obi ecopcr -e 3 -l 50 -L 150 -F TTAGATACCCCACTATGC -R TAGAACAGGCTCCTCTAG --taxonomy wolf/taxonomy/my_tax wolf/embl_refs wolf/v05_refs

 #### Clean the database

 1. Filter sequences so that they have a good taxonomic description at the species, genus, and family levels:

-obi grep --require-rank=species --require-rank=genus --require-rank=family --taxonomy wolf/taxonomy/my_tax wolf/v05_db wolf/v05_db_clean
+obi grep --require-rank=species --require-rank=genus --require-rank=family --taxonomy wolf/taxonomy/my_tax wolf/v05_refs wolf/v05_refs_clean

-2. Build the reference database specifically used by the OBITools3 to make ecotag efficient:
+2. Dereplicate identical sequences:

-obi build_ref_db -t 0.95 --taxonomy wolf/taxonomy/my_tax wolf/v05_db_clean wolf/v05_db_definitive
+obi uniq --taxonomy wolf/taxonomy/my_tax wolf/v05_refs_clean wolf/v05_refs_uniq
+
+3. Ensure that the dereplicated sequences have a taxid at the family level:
+
+obi grep --require-rank=family --taxonomy wolf/taxonomy/my_tax wolf/v05_refs_uniq wolf/v05_refs_uniq_clean
+
+
+4. Build the reference database specifically used by the OBITools3 to make ecotag efficient:
+
+obi build_ref_db -t 0.97 --taxonomy wolf/taxonomy/my_tax wolf/v05_refs_uniq_clean wolf/v05_db_97

 #### Assign each sequence to a taxon

 Once the reference database is built, taxonomic assignment can be done using the `ecotag` command:

-obi ecotag -m 0.95 --taxonomy wolf/taxonomy/my_tax -R wolf/v05_db_definitive wolf/cleaned_sequences wolf/assigned_sequences
+obi ecotag -m 0.97 --taxonomy wolf/taxonomy/my_tax -R wolf/v05_db_97 wolf/cleaned_sequences wolf/assigned_sequences

 ### 8. After the taxonomic assignment

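This hunk moves `build_ref_db -t` and `ecotag -m` from 0.95 to 0.97 together and renames the reference view to `wolf/v05_db_97`. A minimal sketch, assuming the two values are meant to stay in sync, keeps the threshold in a single variable and derives the view name from it. The names `THRESHOLD`, `TAX`, `SUFFIX`, `REF_DB`, `DRY_RUN` and `run` are hypothetical helpers introduced for illustration; only the two `obi` command lines come from the tutorial. By default the script just prints the commands.

```shell
#!/bin/sh
# One place for the identity threshold shared by build_ref_db (-t) and ecotag (-m).
THRESHOLD=0.97
TAX=wolf/taxonomy/my_tax

# Derive the view suffix from the threshold by stripping the leading "0." (0.97 -> 97).
SUFFIX=${THRESHOLD#0.}
REF_DB="wolf/v05_db_${SUFFIX}"

# run CMD...: print the command; execute it only when DRY_RUN=0 (hypothetical switch).
run() {
    echo "$*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

run obi build_ref_db -t "$THRESHOLD" --taxonomy "$TAX" wolf/v05_refs_uniq_clean "$REF_DB"
run obi ecotag -m "$THRESHOLD" --taxonomy "$TAX" -R "$REF_DB" wolf/cleaned_sequences wolf/assigned_sequences
```

Setting `DRY_RUN=0` in the environment would execute the two commands instead of only printing them, which requires the `obi` executable and the views created in the earlier steps.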
@@ -216,7 +243,7 @@ or convert the dot file to a png image file:

 You will get something like this:



 #### Export the results
