diff --git a/wolf_tutorial.md b/wolf_tutorial.md
index a5dee6f..7e898df 100644
--- a/wolf_tutorial.md
+++ b/wolf_tutorial.md
@@ -1,6 +1,6 @@
 # Wolf tutorial with the OBITools3
 
-A (cooler) remake of the infamous [wolf tutorial](https://pythonhosted.org/OBITools/wolves.html). And a work in progress.
+The OBITools3 version of the [wolf tutorial](https://pythonhosted.org/OBITools/wolves.html) made for the first OBITools.
 
 ### 0.1 Before starting: the OBITools3 data structure
 
@@ -44,28 +44,44 @@ The new database system used by the OBITools3 (called **DMS** for Data Managemen
 
 ### 0.3 Before starting: installing the OBITools3
 
-Not working yet...
+This is going to change, but for now:
+
+Requirements: **python3, python3-venv, git, CMake**
+
+Then you can do:
+
+    git clone https://git.metabarcoding.org/obitools/obitools3.git
+    cd obitools3
+    python3 -m venv obi3-env
+    . obi3-env/bin/activate
+    pip install cython
+    python setup.py install
+
+And test the installation with:
+
+    obi test
 
 ### 1. Import the sequencing data in a DMS
 
-Download the reads and the ngs file:
+Download this archive containing the reads and the ngs file:
 
-[wolf_F.fastq.gz](/uploads/09dada3587189c3b3a7af7024981c074/wolf_F.fastq.gz)
+[wolf_tutorial.tar.gz](/uploads/9b86f67ad05815ddee14526640d81137/wolf_tutorial.tar.gz)
 
-[wolf_R.fastq.gz](/uploads/a95dbad14b75474c8307cab56fa083ca/wolf_R.fastq.gz)
+And unzip it:
+
+    tar -zxvf wolf_tutorial.tar.gz
 
-[wolf_diet_ngsfilter.txt](/uploads/379d01fabbe9adf21d33c1fd8f5ee43c/wolf_diet_ngsfilter.txt)
 
 
 1. Import the first set of reads, with :
 
-    obi import --quality-solexa wolf_tutorial/wolf_F.fastq.gz wolf/reads1
+    obi import --quality-solexa wolf_tutorial/wolf_F.fastq wolf/reads1
 
 `--quality-solexa` is the appropriate fastq quality option because it's an old dataset, `wolf_tutorial/wolf_F.fastq` is the path to the file to import, `wolf` is the path to the DMS that will be automatically created, and `reads1` is the name of the view into which the file will be imported.
 
 2. Import the second set of reads:
 
-    obi import --quality-solexa wolf_tutorial/wolf_R.fastq.gz wolf/reads2
+    obi import --quality-solexa wolf_tutorial/wolf_R.fastq wolf/reads2
 
 3. Import the [ngsfilter file](https://pythonhosted.org/OBITools/scripts/ngsfilter.html) describing the primers and tags used for each sample:
 
@@ -99,7 +115,7 @@ Unlike the OBITools1, the OBITools3 make it possible to run ngsfilter before ali
 
 ### 4. Remove unaligned sequence records
 
-    obi grep -p "mode!=b'joined'" wolf/aligned_reads wolf/good_sequences
+    obi grep -a mode:alignment wolf/aligned_reads wolf/good_sequences
 
 ### 5. Dereplicate reads into unique sequences
 
@@ -109,7 +125,7 @@ Unlike the OBITools1, the OBITools3 make it possible to run ngsfilter before ali
 
 1. First let's clean the useless metadata and keep only the `COUNT` and `merged_sample` (count by sample) tags:
 
-    obi annotate -k COUNT -k merged_sample wolf/dereplicated_sequences wolf/cleaned_metadata_sequences
+    obi annotate -k COUNT -k MERGED_sample wolf/dereplicated_sequences wolf/cleaned_metadata_sequences
 
 2. Keep only the sequences having a count greater or equal to 10 and a length shorter than 80 bp:
 
@@ -117,7 +133,7 @@ Unlike the OBITools1, the OBITools3 make it possible to run ngsfilter before ali
 
 3. Clean the sequences from PCR/sequencing errors (sequence variants):
 
-    obi clean -s merged_sample -r 0.05 -H wolf/denoised_sequences wolf/cleaned_sequences
+    obi clean -s MERGED_sample -r 0.05 -H wolf/denoised_sequences wolf/cleaned_sequences
 
 ### 7. Taxonomic assignment of the sequences
 