2014-01-30 13:28:35 +00:00
parent 695ab6addb
commit 1cc212e246
3 changed files with 36 additions and 5 deletions


@ -9,6 +9,8 @@ Contents:
.. toctree::
:maxdepth: 2
attributes/avg_quality
attributes/complemented
attributes/count
attributes/cut
attributes/direction
@ -19,10 +21,14 @@ Contents:
attributes/forward_error
attributes/forward_match
attributes/forward_primer
attributes/forward_score
attributes/forward_tag
attributes/forward_tm
attributes/genus
attributes/genus_name
attributes/head_quality
attributes/merged_star
attributes/mid_quality
attributes/mode
attributes/occurrence
attributes/order
@ -33,16 +39,19 @@ Contents:
attributes/reverse_error
attributes/reverse_match
attributes/reverse_primer
attributes/reverse_score
attributes/reverse_tag
attributes/reverse_tm
attributes/sample
attributes/scientific_name
attributes/score
attributes/seq_length
attributes/seq_length_ori
attributes/sminL
attributes/sminR
attributes/species
attributes/species_name
attributes/status
attributes/tail_quality
attributes/taxid


@ -2,11 +2,11 @@ merged_*
========
The `merged_*` attribute is built based on another attribute `*` (for example,
`sample`). `merged_*` is a dictionary with the values of the `*` attribute as
keys, and the numbers of times the sequence was observed for each key as
corresponding values. For instance, `merged_sample={'X1':12, 'X2'=10}`
means that the sequence was observed 12 times in the sample 'X1' and 10 times
in the sample 'X2'.
`sample`) by the :doc:`obiuniq <../scripts/obiuniq>` program. The value associated with the `merged_*`
attribute is a contingency table summarizing how often each modality of the `*` attribute was observed.
For instance, `merged_sample={'X1':12, 'X2':10}` means that among the 22 identical sequences merged
by :doc:`obiuniq <../scripts/obiuniq>`, the `sample` attribute was set 12 times to the modality 'X1'
and 10 times to the modality 'X2'.
Attribute added by the program:
- :doc:`obiuniq <../scripts/obiuniq>`
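
The contingency table described above can be illustrated with a minimal Python sketch. It is not
the `obiuniq` implementation: the `merge_attribute` helper and the representation of sequence
records as plain dictionaries are assumptions made only for this example.

```python
from collections import Counter


def merge_attribute(records, key):
    """Count how often each value of `key` occurs among the merged records.

    `records` is an iterable of dictionaries standing in for the attributes
    of identical sequences (a hypothetical representation for this sketch).
    Records that lack `key` are ignored.
    """
    return dict(Counter(r[key] for r in records if key in r))


# 22 identical sequences: 12 observed in sample 'X1', 10 in sample 'X2'.
records = [{"sample": "X1"}] * 12 + [{"sample": "X2"}] * 10

merged_sample = merge_attribute(records, "sample")
print(merged_sample)  # {'X1': 12, 'X2': 10}
```

The values of the table sum to the number of merged sequences, here 22.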


@ -1,3 +1,25 @@
The OBITools formatted taxonomy
===============================
Management of the taxonomy
--------------------------
Filtering and annotation steps in the processing of DNA metabarcoding sequence data are greatly
eased by explicitly associating taxonomic information with sequences and by providing easy
access to the taxonomy. Taxonomic information, including a taxonomic identifier, can thus be
stored in the set of attributes of each sequence record. Specifically, the `taxid` attribute
is the one used by the OBITools when querying the taxonomic information of a sequence record, although
several OBITools commands can also annotate sequence records with other taxonomy-related attributes for
the user's convenience. The value of the `taxid` attribute must be a unique integer referring
unambiguously to one taxon in the associated taxonomic database. Although this is not mandatory,
the NCBI taxonomy is the preferred source of taxonomic information, as the OBITools provide commands
to easily extract the full taxonomic information from it. The `obitaxonomy` command builds
a taxonomic database in the OBITools format from a dump of the NCBI taxonomic database
(downloadable at the following URL: ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz).
Moreover, the `obitaxonomy` command can enrich an existing taxonomy with private taxa, which makes it
possible to associate sequence records with taxa not initially present in the reference taxonomic database.
As the OBITools have access to the full topology of the taxonomic tree, they can infer higher taxonomic
levels from a taxon identifier (e.g. the family, order, class or phylum corresponding to a genus),
which makes the annotation and querying of taxonomic information simple and efficient.
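
How a higher taxonomic level can be derived from a taxon identifier is sketched below in Python.
This is an illustration of the idea only, not the OBITools API: the `TAXONOMY` table and the
`taxon_at_rank` function are hypothetical, and the parent links are simplified (intermediate
nodes of the NCBI taxonomy are left out).

```python
# Toy taxonomy: taxid -> (parent taxid, rank, scientific name).
# Simplified for illustration; intermediate nodes are omitted.
TAXONOMY = {
    9606: (9605, "species", "Homo sapiens"),
    9605: (9604, "genus", "Homo"),
    9604: (9443, "family", "Hominidae"),
    9443: (40674, "order", "Primates"),
    40674: (7711, "class", "Mammalia"),
    7711: (None, "phylum", "Chordata"),
}


def taxon_at_rank(taxid, rank):
    """Walk up the parent links from `taxid` until a taxon of `rank` is found.

    Returns a (taxid, name) tuple, or None if no ancestor has that rank.
    """
    while taxid is not None:
        parent, node_rank, name = TAXONOMY[taxid]
        if node_rank == rank:
            return taxid, name
        taxid = parent
    return None


# Family and order corresponding to the genus Homo (taxid 9605).
print(taxon_at_rank(9605, "family"))
print(taxon_at_rank(9605, "order"))
```

Because each taxon stores only a link to its parent, a single `taxid` per sequence record is
enough to recover every higher rank on demand.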