documentation updates

Former-commit-id: 39653082c9cd026299f6fcabef7446d569704e1f
2026-02-03 14:50:33 +00:00 · 2023-08-14 15:20:02 +02:00
parent 70a77c9ec6
commit 845c76abeb
30 changed files with 4303 additions and 117 deletions
--- a/doc/build/_book/search.json
+++ b/doc/build/_book/search.json
@@ -137,14 +137,14 @@
    "href": "expressions.html#function-defined-in-the-language",
    "title": "7  OBITools expression language",
    "section": "7.2 Function defined in the language",
-    "text": "7.2 Function defined in the language\n\nInstrospection functions\n\nlen(x)is a generic function allowing to retreive the size of a object. It returns the length of a sequences, the number of element in a map like annotations, the number of elements in an array. The reurned value is an int.\n\n\n\nCast functions\n\nint(x) converts if possible the x value to an integer value. The function returns an int.\nnumeric(x) converts if possible the x value to a float value. The function returns a float.\nbool(x) converts if possible the x value to a boolean value. The function returns a bool.\n\n\n\nString related functions\n\nprintf(format,...) allows to combine several values to build a string. format follows the classical C printf syntax. The function returns a string.\nsubspc(x) substitutes every space in the x string by the underscore (_) character. The function returns a string."
+    "text": "7.2 Function defined in the language\n\nIntrospection functions\n\nlen(x)\n\nA generic function that retrieves the size of an object. It returns the length of a sequence, the number of elements in a map such as annotations, or the number of elements in an array. The returned value is an int.\n\ncontains(map,key)\n\nTests whether the map contains a value associated with key.\n\n\n\n\nCast functions\n\nint(x)\n\nConverts the x value to an integer value, if possible. The function returns an int.\n\nnumeric(x)\n\nConverts the x value to a float value, if possible. The function returns a float.\n\nbool(x)\n\nConverts the x value to a boolean value, if possible. The function returns a bool.\n\n\n\n\nString related functions\n\nprintf(format,...)\n\nCombines several values to build a string. format follows the classical C printf syntax. The function returns a string.\n\nsubspc(x)\n\nSubstitutes every space in the x string with the underscore (_) character. The function returns a string.\n\n\n\n\nCondition function\n\nifelse(condition,val1,val2)\n\nThe condition value has to be a bool value. If it is true, the function returns val1; otherwise, it returns val2.\n\n\n\n\n7.2.1 Sequence analysis related function\n\ncomposition(sequence)\n\nThe nucleotide composition of the sequence is returned as a map indexed by a, c, g, or t, where each value is the number of occurrences of that nucleotide. A fifth key, others, accounts for all other symbols.\n\ngcskew(sequence)\n\nComputes the excess of g compared to c in the sequence, known as the GC skew.\n\\[\nSkew_{GC}=\\frac{G-C}{G+C}\n\\]"
  },
  {
    "objectID": "expressions.html#accessing-to-the-sequence-annotations",
    "href": "expressions.html#accessing-to-the-sequence-annotations",
    "title": "7  OBITools expression language",
    "section": "7.3 Accessing to the sequence annotations",
-    "text": "7.3 Accessing to the sequence annotations\nThe annotations variable is a map object containing all the annotations associated to the currently processed sequence. Index of the map are the attribute names. It exists to possibillities to retreive an annotation. It is possible to use the classical [] indexing operator, putting the attribute name quoted by double quotes between them.\nannotations[\"direction\"]\nThe above code retreives the direction annotation. A second notation using the dot (.) is often more convenient.\nannotations.direction\nSpecial attributes of the sequence are accessible only by dedicated methods of the sequence object.\n\nThe sequence identifier : Id()\nTHe sequence definition : Definition()"
+    "text": "7.3 Accessing to the sequence annotations\nThe annotations variable is a map object containing all the annotations associated with the currently processed sequence. The keys of the map are the attribute names. There are two ways to retrieve an annotation. The first uses the classical [] indexing operator, with the attribute name enclosed in double quotes between the brackets.\nannotations[\"direction\"]\nThe above code retrieves the direction annotation. A second notation using the dot (.) is often more convenient.\nannotations.direction\nSpecial attributes of the sequence are accessible only through dedicated methods of the sequence object.\n\nThe sequence identifier: Id()\nThe sequence definition: Definition()\n\nsequence.Id()"
  },
  {
    "objectID": "comm_metabarcode_design.html#obipcr",
@@ -174,6 +174,13 @@
    "section": "10.2 obitag",
    "text": "10.2 obitag"
  },
+  {
+    "objectID": "comm_annotation.html#obitagpcr",
+    "href": "comm_annotation.html#obitagpcr",
+    "title": "10  Sequence annotations",
+    "section": "10.3 obitagpcr",
+    "text": "10.3 obitagpcr"
+  },
  {
    "objectID": "comm_computation.html#obipairing",
    "href": "comm_computation.html#obipairing",
@@ -214,7 +221,7 @@
    "href": "comm_sampling.html#obigrep-filters-sequence-files-according-to-numerous-conditions",
    "title": "12  Sequence sampling and filtering",
    "section": "12.1 obigrep – filters sequence files according to numerous conditions",
-    "text": "12.1 obigrep – filters sequence files according to numerous conditions\nThe obigrep command is somewhat analogous to the standard Unix grep command. It selects a subset of sequence records from a sequence file. A sequence record is a complex object consisting of an identifier, a set of attributes (a key, defined by its name, associated with a value), a definition, and the sequence itself. Instead of working text line by text line like the standard Unix tool, obigrep selection is done sequence record by sequence record. A large number of options allow you to refine the selection on any element of the sequence. obigrep allows you to specify multiple conditions simultaneously (which take on the value TRUE or FALSE) and only those sequence records which meet all conditions (all conditions are TRUE) are selected. obigrep is able to work on two paired read files. The selection criteria apply to one or the other of the readings in each pair depending on the mode chosen (--paired-mode option). In all cases the selection is applied in the same way to both files, thus maintaining their consistency.\n\n12.1.1 The options usable with obigrep\n\n12.1.1.1 Selecting sequences based on their caracteristics\nSequences can be selected on several of their caracteristics, their length, their id, their sequence. Options allow for specifying the condition if selection.\n\n--min-count | -c COUNT\n\nonly sequences reprensenting at least COUNT reads will be selected. That option rely on the count attribute. If the count attribute is not defined for a sequence record, it is assumed equal to \\(1\\).\n\n--max-count | -C COUNT\n\nonly sequences reprensenting no more than COUNT reads will be selected. That option rely on the count attribute. If the count attribute is not defined for a sequence record, it is assumed equal to \\(1\\).\n\nExample\n\nSelecting sequence records representing at least five reads in the dataset.\n\n\nobigrep -c 5 data_SPER01.fasta > data_norare_SPER01.fasta"
+    "text": "12.1 obigrep – filters sequence files according to numerous conditions\nThe obigrep command is somewhat analogous to the standard Unix grep command. It selects a subset of sequence records from a sequence file. A sequence record is a complex object consisting of an identifier, a set of attributes (a key, defined by its name, associated with a value), a definition, and the sequence itself. Instead of working line by line like the standard Unix tool, obigrep performs its selection sequence record by sequence record. A large number of options allow you to refine the selection on any element of the sequence record. obigrep allows you to specify multiple conditions simultaneously (each evaluating to TRUE or FALSE), and only those sequence records which meet all conditions (all conditions are TRUE) are selected. obigrep is able to work on two paired read files. The selection criteria apply to one or the other of the reads in each pair depending on the mode chosen (--paired-mode option). In all cases the selection is applied in the same way to both files, thus maintaining their consistency.\n\n12.1.1 The options usable with obigrep\n\n12.1.1.1 Selecting sequences based on their characteristics\nSequences can be selected on several of their characteristics: their length, their id, their sequence. Options allow for specifying the selection condition.\nSelection based on the sequence\nSequence records can be selected according to whether or not they match a pattern. The simplest pattern is a short sequence (e.g. AACCTT), but regular patterns allow searching for more complex patterns. For example, A[TG]C+G matches an A, followed by a T or a G, then one or several C, and finally a G.\n\n--sequence|-s PATTERN\n\nRegular expression pattern to be tested against the sequence itself. The pattern is case insensitive. A complete description of the regular pattern grammar is available here.\n\nExamples:\n\nSelects only the sequence records that contain an EcoRI restriction site.\n\n\nobigrep -s 'GAATTC' seq1.fasta > seq2.fasta\nSelects only the sequence records that contain a stretch of at least 10 A.\nobigrep -s 'A{10,}' seq1.fasta > seq2.fasta\nSelects only the sequence records that do not contain ambiguous nucleotides.\nobigrep -s '^[ACGT]+$' seq1.fasta > seq2.fasta\n\n--min-count | -c COUNT\n\nOnly sequences representing at least COUNT reads will be selected. This option relies on the count attribute. If the count attribute is not defined for a sequence record, it is assumed to be equal to \\(1\\).\n\n--max-count | -C COUNT\n\nOnly sequences representing no more than COUNT reads will be selected. This option relies on the count attribute. If the count attribute is not defined for a sequence record, it is assumed to be equal to \\(1\\).\n\nExamples\n\nSelecting sequence records representing at least five reads in the dataset.\n\n\nobigrep -c 5 data_SPER01.fasta > data_norare_SPER01.fasta"
  },
  {
    "objectID": "comm_utilities.html#obicount",
@@ -252,11 +259,11 @@
    "text": "The sequence iterator\nThe pakage obiter provides an iterator mecanism for manipulating sequences. The main class provided by this package is obiiter.IBioSequence. An IBioSequence iterator provides batch of sequences.\n\nBasic usage of a sequence iterator\nMany functions, among them functions reading sequences from a text file, return a IBioSequence iterator. The iterator class provides two main methods:\n\nNext() bool\nGet() obiiter.BioSequenceBatch\n\nThe Next method moves the iterator to the next value, while the Get method returns the currently pointed value. Using them, it is possible to loop over the data as in the following code chunk.\nimport (\n    \"git.metabarcoding.org/lecasofts/go/obitools/pkg/obiformats\"\n)\n\nfunc main() {\n    mydata := obiformats.ReadFastSeqFromFile(\"myfile.fasta\")\n       \n    for mydata.Next() {\n        data := mydata.Get()\n        //\n        // Whatever you want to do with the data chunk\n        //\n    }\n}\nAn obiseq.BioSequenceBatch instance is a set of sequences stored in an obiseq.BioSequenceSlice and a sequence number. The number of sequences in a batch is not defined. A batch can even contain zero sequences, if for example all sequences initially included in the batch have been filtered out at some stage of their processing.\n\n\nThe Pipable functions\nA function consuming a obiiter.IBioSequence and returning a obiiter.IBioSequence is of class obiiter.Pipable.\n\n\nThe Teeable functions\nA function consuming a obiiter.IBioSequence and returning two obiiter.IBioSequence instance is of class obiiter.Teeable."
  },
  {
-    "objectID": "annexes.html",
-    "href": "annexes.html",
+    "objectID": "annexes.html#sequence-attributes",
+    "href": "annexes.html#sequence-attributes",
    "title": "Appendix A — Annexes",
-    "section": "",
-    "text": "A.0.1 Sequence attributes\n\nA.0.1.1 Reserved sequence attributes\n\nA.0.1.1.1 ali_dir\n\nA.0.1.1.1.1 Type : string\nThe attribute can contain 2 string values \"left\" or \"right\".\n\n\nA.0.1.1.1.2 Set by the obipairing tool\nThe alignment generated by obipairing is a 3’-end gap free algorithm. Two cases can occur when aligning the forward and reverse reads. If the barcode is long enough, both the reads overlap only on their 3’ ends. In such case, the alignment direction ali_dir is set to left. If the barcode is shorter than the read length, the paired reads overlap by their 5’ ends, and the complete barcode is sequenced by both the reads. In that later case, ali_dir is set to right.\n\n\n\nA.0.1.1.2 ali_length\n\nA.0.1.1.2.1 Set by the obipairing tool\nLength of the aligned parts when merging forward and reverse reads\n\n\n\nA.0.1.1.3 count : the number of sequence occurrences\n\nA.0.1.1.3.1 Set by the obiuniq tool\nThe count attribute indicates how-many strictly identical sequences have been merged in a single record. It contains an integer value. If it is absent this means that the sequence record represents a single occurrence of the sequence.\n\n\nA.0.1.1.3.2 Getter : method Count()\nThe Count() method allows to access to the count attribute as an integer value. If the count attribute is not defined for the given sequence, the value 1 is returned\n\n\n\nA.0.1.1.4 merged_*\n\nA.0.1.1.4.1 Type : map[string]int\n\n\nA.0.1.1.4.2 Set by the obiuniq tool\nThe -m option of the obiuniq tools allows for keeping track of the distribution of the values stored in given attribute of interest. Often this option is used to summarise distribution of a sequence variant accross samples when obiuniq is run after running obimultiplex. The actual name of the attribute depends on the name of the monitored attribute. If -m option is used with the attribute sample, then this attribute names merged_sample.\n\n\n\nA.0.1.1.5 mode\n\nA.0.1.1.5.1 Set by the obipairing tool\nobitag_ref_index\n\n\nA.0.1.1.5.2 Set by the obirefidx tool.\nIt resumes to which taxonomic annotation a match to that sequence must lead according to the number of differences existing between the query sequence and the reference sequence having that tag.\n\n\nA.0.1.1.5.3 Getter : method Count()\n\n\n\nA.0.1.1.6 pairing_mismatches\n\nA.0.1.1.6.1 Set by the obipairing tool\n\n\n\nA.0.1.1.7 score\n\nA.0.1.1.7.1 Set by the obipairing tool\n\n\n\nA.0.1.1.8 score_norm\n\nA.0.1.1.8.1 Set by the obipairing tool"
+    "section": "A.1 Sequence attributes",
+    "text": "A.1 Sequence attributes\nali_dir (string)\n\nSet by the obipairing tool\nThe attribute can contain one of two string values: left or right.\n\nThe alignment generated by obipairing relies on a 3’-end gap free algorithm. Two cases can occur when aligning the forward and reverse reads. If the barcode is long enough, the reads overlap only on their 3’ ends. In that case, the alignment direction ali_dir is set to left. If the barcode is shorter than the read length, the paired reads overlap by their 5’ ends, and the complete barcode is sequenced by both reads. In the latter case, ali_dir is set to right.\nali_length (int)\n\nSet by the obipairing tool\n\nLength of the aligned parts when merging forward and reverse reads.\ncount (int)\n\nSet by the obiuniq tool\nGetter : method Count()\nSetter : method SetCount(int)\n\nThe count attribute indicates how many strictly identical reads have been merged into a single record. It contains an integer value. If it is absent, the sequence record represents a single occurrence of the sequence.\nThe Count() method gives access to the count attribute as an integer value. If the count attribute is not defined for the given sequence, the value 1 is returned.\nmerged_* (map[string]int)\n\nSet by the obiuniq tool\n\nThe -m option of the obiuniq tool keeps track of the distribution of the values stored in a given attribute of interest. This option is often used to summarise the distribution of a sequence variant across samples when obiuniq is run after obimultiplex. The actual name of the attribute depends on the name of the monitored attribute. If the -m option is used with the attribute sample, then this attribute is named merged_sample.\nmode (string)\n\nSet by the obipairing tool\nThe attribute can contain one of two string values: join or alignment.\n\nobitag_ref_index (map[string]string)\n\nSet by the obirefidx tool.\n\nIt summarises to which taxonomic annotation a match to that sequence must lead, according to the number of differences between the query sequence and the reference sequence carrying that tag.\n   {\"0\":\"9606@Homo sapiens@species\",\n    \"2\":\"207598@Homininae@subfamily\",\n    \"3\":\"9604@Hominidae@family\",\n    \"8\":\"314295@Hominoidea@superfamily\",\n    \"10\":\"9526@Catarrhini@parvorder\",\n    \"12\":\"1437010@Boreoeutheria@clade\",\n    \"16\":\"9347@Eutheria@clade\",\n    \"17\":\"40674@Mammalia@class\",\n    \"22\":\"117571@Euteleostomi@clade\",\n    \"25\":\"7776@Gnathostomata@clade\",\n    \"29\":\"33213@Bilateria@clade\",\n    \"30\":\"6072@Eumetazoa@clade\"}\npairing_mismatches (map[string]string)\n\nSet by the obipairing tool\n\nseq_a_single (int)\n\nSet by the obipairing tool\n\nseq_ab_match (int)\n\nSet by the obipairing tool\n\nseq_b_single (int)\n\nSet by the obipairing tool\n\nscore (int)\n\nSet by the obipairing tool\n\nscore_norm (float)\n\nSet by the obipairing tool\nThe value ranges between 0 and 1.\n\nScore of the alignment between forward and reverse reads expressed as a fraction of identity."
  },
  {
    "objectID": "references.html",