{
"hash": "86547a9298483fb00c80e5530b9c8997",
"result": {
"markdown": "# OBITools V4 Tutorial\n\nHere is a short tutorial on how to analyze DNA metabarcoding data produced on Illumina sequencers using:\n\n- the OBITools\n- some basic Unix commands\n\n## Wolves diet based on DNA metabarcoding\n\nThe data used in this tutorial correspond to the analysis of four wolf scats, using the protocol published in @Shehzad2012-pn for assessing carnivore diet. After extracting DNA from the faeces, the DNA amplifications were carried out using the primers `TTAGATACCCCACTATGC` and `TAGAACAGGCTCCTCTAG` amplifiying the *12S-V5* region [@Riaz2011-gn], together with a wolf blocking oligonucleotide.\n\nThe complete data set can be downloaded here: [the tutorial dataset](wolf_diet.tgz) \n\nOnce the data file is downloaded, using a UNIX terminal unarchive the data from the `tgz` file.\n\n\n\n\n::: {.cell}\n\n```{.bash .cell-code}\ntar zxvf wolf_diet.tgz\n```\n:::\n\n\n\n\nThat command create a new directory named `wolf_data` containing every required data files:\n\n- `fastq <fastq>` files resulting of aGA IIx (Illumina) paired-end (2 x 108 bp) \n sequencing assay of DNA extracted and amplified from four wolf faeces:\n\n - `wolf_F.fastq`\n - `wolf_R.fastq`\n\n- the file describing the primers and tags used for all samples\n sequenced:\n\n - `wolf_diet_ngsfilter.txt` The tags correspond to short and\n specific sequences added on the 5\\' end of each primer to\n distinguish the different samples\n\n- the file containing the reference database in a fasta format:\n\n - `db_v05_r117.fasta` This reference database has been extracted\n from the release 117 of EMBL using `obipcr`\n\n\n\n\n::: {.cell}\n\n:::\n\n\n\n\nTo not mix raw data and processed data a new directory called `results` is created.\n\n\n\n\n::: {.cell}\n\n```{.bash .cell-code}\nmkdir results\n```\n:::\n\n\n\n \n## Step by step analysis\n\n### Recover full sequence reads from forward and reverse partial reads\n\nWhen using the result of a paired-end sequencing assay with 
supposedly\noverlapping forward and reverse reads, the first step is to recover the\nassembled sequence.\n\nThe forward and reverse reads of the same fragment are *at the same line\nposition* in the two fastq files obtained after sequencing. Based on\nthese two files, the assembly of the forward and reverse reads is done\nwith the `obipairing` utility that aligns the two reads and returns the\nreconstructed sequence.\n\nIn our case, the command is:\n\n\n\n\n::: {.cell}\n\n```{.bash .cell-code}\nobipairing --min-identity=0.8 \\\n --min-overlap=10 \\\n -F wolf_data/wolf_F.fastq \\\n -R wolf_data/wolf_R.fastq \\\n > results/wolf.fastq \n```\n:::\n\n\n\n\nThe `--min-identity` and `--min-overlap` options allow\ndiscarding sequences with low alignment quality. If, after the alignment,\nthe overlapping part of the reads is shorter than 10 base pairs or the \nidentity over this aligned region is below 80%, then in the output file\nthe forward and reverse reads are not aligned but concatenated, and the value of \nthe `mode` attribute in the sequence header is set to `join` instead of `alignment`.\n\n### Remove unaligned sequence records\n\nUnaligned sequences (`mode=join`)\ncannot be used. The following command removes them from the\ndataset:\n\n\n\n\n::: {.cell}\n\n```{.bash .cell-code}\nobigrep -p 'annotations.mode != \"join\"' \\\n results/wolf.fastq > results/wolf.ali.fastq\n```\n:::\n\n\n\n\nThe `-p` option requires a Go-like expression. 
`annotations.mode != \"join\"` means that\nif the value of the `mode` annotation of a sequence is\ndifferent from `join`, the corresponding sequence record will be kept.\n\nThe first sequence record of `wolf.ali.fastq` can be obtained using the\nfollowing command line:\n\n\n\n\n::: {.cell}\n\n```{.bash .cell-code}\nhead -n 4 results/wolf.ali.fastq\n```\n:::\n\n\n\n\nThe following output appears in your terminal window.\n\n```\n@HELIUM_000100422_612GNAAXX:7:108:5640:3823#0/1 {\"ali
"supporting": [
"tutorial_files/figure-epub"
],
"filters": [
"rmarkdown/pagebreak.lua"
],
"includes": {},
"engineDependencies": {
"knitr": [
"{\"type\":\"list\",\"attributes\":{},\"value\":[]}"
]
},
"preserve": null,
"postProcess": false
}
}