Update wolf_tutorial

Celine Mercier
2019-09-01 18:19:32 +02:00
parent 0ba6b57157
commit c616d9dada

@ -1,6 +1,6 @@
# Wolf tutorial with the OBITools3
A (cooler) remake of the infamous [wolf tutorial](https://pythonhosted.org/OBITools/wolves.html). And a work in progress.
The OBITools3 version of the [wolf tutorial](https://pythonhosted.org/OBITools/wolves.html) made for the first OBITools.
### 0.1 Before starting: the OBITools3 data structure
@ -44,28 +44,44 @@ The new database system used by the OBITools3 (called **DMS** for Data Managemen
### 0.3 Before starting: installing the OBITools3
Not working yet...
This is going to change, but for now:
Requirements: **python3, python3-venv, git, CMake**
Then you can do:
git clone https://git.metabarcoding.org/obitools/obitools3.git
cd obitools3
python3 -m venv obi3-env
. obi3-env/bin/activate
pip install cython
python setup.py install
And test the installation with:
obi test
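Before running the steps above, it can help to confirm that the listed requirements are available. The following is a minimal sketch, not part of the OBITools3: `missing_tools` is a hypothetical helper name, and it assumes the requirements are exposed as the commands `python3`, `git` and `cmake`. `python3-venv` is a package rather than a command, so it is not covered here.

```shell
# Hypothetical helper (not part of the OBITools3): print every command
# from the argument list that cannot be found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumption: the requirements are available under these command names.
missing=$(missing_tools python3 git cmake)
if [ -n "$missing" ]; then
    # Left unquoted on purpose so the names are printed on a single line.
    echo "Missing requirements:" $missing
else
    echo "python3, git and cmake were found"
fi
```

If anything is reported as missing, install it with your system's package manager before cloning the repository.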
### 1. Import the sequencing data in a DMS
Download the reads and the ngs file:
Download this archive containing the reads and the ngs file:
[wolf_F.fastq.gz](/uploads/09dada3587189c3b3a7af7024981c074/wolf_F.fastq.gz)
[wolf_tutorial.tar.gz](/uploads/9b86f67ad05815ddee14526640d81137/wolf_tutorial.tar.gz)
[wolf_R.fastq.gz](/uploads/a95dbad14b75474c8307cab56fa083ca/wolf_R.fastq.gz)
And extract it:
tar -zxvf wolf_tutorial.tar.gz
[wolf_diet_ngsfilter.txt](/uploads/379d01fabbe9adf21d33c1fd8f5ee43c/wolf_diet_ngsfilter.txt)
1. Import the first set of reads with:
obi import --quality-solexa wolf_tutorial/wolf_F.fastq.gz wolf/reads1
obi import --quality-solexa wolf_tutorial/wolf_F.fastq wolf/reads1
`--quality-solexa` is the appropriate fastq quality option because it's an old dataset, `wolf_tutorial/wolf_F.fastq` is the path to the file to import, `wolf` is the path to the DMS that will be automatically created, and `reads1` is the name of the view into which the file will be imported.
2. Import the second set of reads:
obi import --quality-solexa wolf_tutorial/wolf_R.fastq.gz wolf/reads2
obi import --quality-solexa wolf_tutorial/wolf_R.fastq wolf/reads2
3. Import the [ngsfilter file](https://pythonhosted.org/OBITools/scripts/ngsfilter.html) describing the primers and tags used for each sample:
@ -99,7 +115,7 @@ Unlike the OBITools1, the OBITools3 make it possible to run ngsfilter before ali
### 4. Remove unaligned sequence records
obi grep -p "mode!=b'joined'" wolf/aligned_reads wolf/good_sequences
obi grep -a mode:alignment wolf/aligned_reads wolf/good_sequences
### 5. Dereplicate reads into unique sequences
@ -109,7 +125,7 @@ Unlike the OBITools1, the OBITools3 make it possible to run ngsfilter before ali
1. First let's clean the useless metadata and keep only the `COUNT` and `merged_sample` (count by sample) tags:
obi annotate -k COUNT -k merged_sample wolf/dereplicated_sequences wolf/cleaned_metadata_sequences
obi annotate -k COUNT -k MERGED_sample wolf/dereplicated_sequences wolf/cleaned_metadata_sequences
2. Keep only the sequences having a count greater than or equal to 10 and a length shorter than 80 bp:
@ -117,7 +133,7 @@ Unlike the OBITools1, the OBITools3 make it possible to run ngsfilter before ali
3. Clean the sequences from PCR/sequencing errors (sequence variants):
obi clean -s merged_sample -r 0.05 -H wolf/denoised_sequences wolf/cleaned_sequences
obi clean -s MERGED_sample -r 0.05 -H wolf/denoised_sequences wolf/cleaned_sequences
### 7. Taxonomic assignment of the sequences