[{"id":0,"href":"/obidoc/docs/cookbook/illumina/","title":"Analysing an Illumina data set","section":"Cookbook","content":" The wolf diet tutorial # Here is a short tutorial for analyzing metabarcoding data, on an Illumina dataset from a wolf diet study, using the OBITools 4 and basic unix commands. It presents the following analysis steps:\nPairing (i.e. partial alignment) of forward and reverse reads Exclusion of unpaired reads Reads demultiplexing (i.e. assignment to their original sample) Reads dereplication Dataset denoising Sequence taxonomic assignment Exporting the results in a tabular format The dataset to analyze and the reference database # The dataset used in this tutorial corresponds to data obtained from the analysis of four wolf scats using the protocol published in ( Citation: Shehzad,\u0026#32;Riaz \u0026amp; al.,\u0026#32;2012 Shehzad,\u0026#32; W.,\u0026#32; Riaz,\u0026#32; T.,\u0026#32; Nawaz,\u0026#32; M.,\u0026#32; Miquel,\u0026#32; C.,\u0026#32; Poillot,\u0026#32; C.,\u0026#32; Shah,\u0026#32; S.,\u0026#32; Pompanon,\u0026#32; F.,\u0026#32; Coissac,\u0026#32; E.\u0026#32;\u0026amp;\u0026#32;Taberlet,\u0026#32; P. \u0026#32; (2012). \u0026#32;Carnivore diet analysis based on next-generation sequencing: application to the leopard cat (Prionailurus bengalensis) in Pakistan: LEOPARD CAT DIET. Molecular ecology,\u0026#32;21(8).\u0026#32;1951–1965. https://doi.org/10.1111/j.1365-294X.2011.05424.x ) for carnivore diet assessment. After extraction of DNA from feces, DNA amplification was performed using the Vert01 primers (TTAGATACCCCACTATGC and TAGAACAGGCTCCTCTAG amplifying the 12S-V5 region ( Citation: Riaz,\u0026#32;Shehzad \u0026amp; al.,\u0026#32;2011 Riaz,\u0026#32; T.,\u0026#32; Shehzad,\u0026#32; W.,\u0026#32; Viari,\u0026#32; A.,\u0026#32; Pompanon,\u0026#32; F.,\u0026#32; Taberlet,\u0026#32; P.\u0026#32;\u0026amp;\u0026#32;Coissac,\u0026#32; E. \u0026#32; (2011). 
\u0026#32;ecoPrimers: inference of new DNA barcode markers from whole genome sequence analysis. Nucleic acids research,\u0026#32;39(21).\u0026#32;e145. https://doi.org/10.1093/nar/gkr732 ) ), together with a wolf blocking oligonucleotide.\nAn archive containing all the files needed for the analysis can be downloaded by clicking here: wolf_diet_dataset\nThe downloaded archive can be extracted using the following Unix command:\ntar zxvf wolf_diet_dataset.tgz It creates a directory named wolf_data, containing the following files:\nTwo fastq files generated by the sequencing of DNA extracted and amplified from four wolf feces using the Genome Analyzer IIx platform (Illumina) and the paired-end (2 x 108 bp) sequencing chemistry:\nwolf_F.fastq.gz with the forward sequences wolf_R.fastq.gz with the reverse sequences A csv tabular file for the reads demultiplexing step, named wolf_diet_ngsfilter.csv. This file contains the primer and tag sequences used for each sample. The tags are short, specific sequences added to the 5' end of each primer to distinguish the different samples.\nA reference database in fasta format named db_v05_r117.fasta.gz, extracted from the EMBL release 117 following the procedure indicated in the tutorial build a reference database.\nWe recommend creating a new folder to store the results and keep them separate from the raw data:\nmkdir results Recover full length sequences from forward and reverse reads # When working with paired-end sequencing results in which the forward and reverse reads are expected to overlap, the first step is to assemble them in order to recover the corresponding full length sequence.\nThe forward and reverse reads of the same fragment are located at the same line position in both fastq files. These two files are used as inputs by the obipairing program to assemble the forward and reverse reads. 
This program then returns the reconstructed sequence as output:\nobipairing --min-identity=0.8 \\ --min-overlap=10 \\ -F wolf_data/wolf_F.fastq.gz \\ -R wolf_data/wolf_R.fastq.gz \\ \u0026gt; results/wolf.fastq The --min-identity and --min-overlap options make it possible to discard sequences with low alignment