obitools4/autodoc/docmd/pkg/obiformats/fastseq_obi_header.md


OBIFormats Package: Semantic Description

The obiformats package provides parsing and formatting utilities for OBI-compliant FASTA headers, enabling structured annotation of biological sequences.

  • It supports parsing key-value annotations embedded in sequence definitions (e.g., key=value;), including nested dictionaries.

  • Four core matching helpers detect keys and value types:

    • __match__key__: Identifies assignment patterns (Key = ...).
    • __obi_header_value_numeric_pattern__: Matches floats/integers (e.g., 42.0;).
    • __obi_header_value_string_pattern__: Matches quoted strings (e.g., 'example';).
    • __match__dict__: Parses balanced {...} blocks, handling nested structures and string delimiters.
  • Boolean detection (__is_true__/__is_false__) handles multiple case variants (e.g., true, True, TRUE).
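
The balanced-brace matching and boolean detection described above can be sketched as follows. This is a minimal illustration, not the package's code: `matchDict` and `isTrue` are hypothetical names, and the sketch assumes that quoted strings contain no escaped quote characters.

```go
package main

import "fmt"

// matchDict returns the index just past the '}' that closes the '{' at
// text[0], or -1 if text does not start with '{' or the braces never balance.
// Braces inside single- or double-quoted strings are ignored.
// Hypothetical helper for illustration; it is not the package's own matcher.
func matchDict(text string) int {
	if len(text) == 0 || text[0] != '{' {
		return -1
	}
	depth := 0
	var quote byte // 0 outside a string, otherwise the opening quote character
	for i := 0; i < len(text); i++ {
		c := text[i]
		switch {
		case quote != 0:
			// Inside a string: only the matching quote ends it.
			if c == quote {
				quote = 0
			}
		case c == '\'' || c == '"':
			quote = c
		case c == '{':
			depth++
		case c == '}':
			depth--
			if depth == 0 {
				return i + 1
			}
		}
	}
	return -1 // unbalanced input
}

// isTrue accepts the three case variants listed above (hypothetical helper).
func isTrue(s string) bool {
	return s == "true" || s == "True" || s == "TRUE"
}

func main() {
	header := `{'a':{'x':1},'b':'}'}; count=3;`
	end := matchDict(header)
	fmt.Println(header[:end]) // the dictionary block, without the trailing text
	fmt.Println(isTrue("True"), isTrue("yes"))
}
```

Tracking the current quote character is what lets a `}` inside a string value pass without closing the block.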

  • The main entry point, ParseOBIFeatures(text string, annotations obiseq.Annotation),
    iteratively extracts key-value pairs from a header string and populates an Annotation map.

    • Numeric values are stored as integers if they have no fractional part.
    • Dictionary-like strings (e.g., {'a':1,'b':2}) are JSON-unmarshalled into typed maps:
      • *_count → map[string]int,
      • merged_* → wrapped in a statistics object (obiseq.StatsOnValues),
      • *_status / *_mutation → map[string]string.
  • ParseFastSeqOBIHeader(sequence *obiseq.BioSequence) applies this parsing to a sequence's definition line, moving the annotations into its metadata map and preserving any leftover text.
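
The extraction loop and type detection described above might look roughly like the sketch below. It is written under stated assumptions rather than taken from the package: `parseFeatures`, `typedValue`, and both regular expressions are hypothetical, a plain `map[string]any` stands in for `obiseq.Annotation`, values are assumed to contain no `;`, and the `merged_*` case is omitted because it needs `obiseq.StatsOnValues`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
	"regexp"
	"strconv"
	"strings"
)

// Hypothetical patterns; the package's own matchers may accept other syntax.
var (
	keyPattern = regexp.MustCompile(`^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=\s*`)
	numPattern = regexp.MustCompile(`^[+-]?\d+(\.\d+)?([eE][+-]?\d+)?$`)
)

// typedValue converts the raw text of one value into a typed Go value.
func typedValue(key, raw string) any {
	switch {
	case raw == "true" || raw == "True" || raw == "TRUE":
		return true
	case raw == "false" || raw == "False" || raw == "FALSE":
		return false
	case strings.HasPrefix(raw, "{"):
		// Assumption: single quotes become double quotes so that
		// encoding/json accepts the dictionary.
		js := []byte(strings.ReplaceAll(raw, "'", `"`))
		if strings.HasSuffix(key, "_count") {
			var m map[string]int
			if json.Unmarshal(js, &m) == nil {
				return m
			}
		} else if strings.HasSuffix(key, "_status") || strings.HasSuffix(key, "_mutation") {
			var m map[string]string
			if json.Unmarshal(js, &m) == nil {
				return m
			}
		}
		return raw // other dictionaries are kept as text in this sketch
	case len(raw) >= 2 && raw[0] == '\'' && raw[len(raw)-1] == '\'':
		return raw[1 : len(raw)-1]
	case numPattern.MatchString(raw):
		if f, err := strconv.ParseFloat(raw, 64); err == nil {
			if f == math.Trunc(f) {
				return int(f) // no fractional part: store as integer
			}
			return f
		}
	}
	return raw
}

// parseFeatures consumes leading "key=value;" pairs, stores them in
// annotations, and returns the text that is left over.
func parseFeatures(text string, annotations map[string]any) string {
	for {
		m := keyPattern.FindStringSubmatchIndex(text)
		if m == nil {
			return strings.TrimSpace(text)
		}
		key, rest := text[m[2]:m[3]], text[m[1]:]
		end := strings.IndexByte(rest, ';')
		if end < 0 {
			return strings.TrimSpace(text)
		}
		annotations[key] = typedValue(key, strings.TrimSpace(rest[:end]))
		text = rest[end+1:]
	}
}

func main() {
	ann := map[string]any{}
	rest := parseFeatures(`count=42.0; name='seq1'; sample_count={'a':1,'b':2}; free text`, ann)
	fmt.Println(ann, rest)
}
```

Splitting on the first `;` keeps the sketch short; a fuller version would use balanced-brace and quoted-string matching so that values may themselves contain `;`.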

  • WriteFastSeqOBIHeader(buffer *bytes.Buffer, sequence) serializes annotations back into OBI header format:

    • Strings and booleans use key=value;.
    • Maps/dicts are JSON-encoded, then single-quoted for compatibility.
    • Special handling ensures obiseq.StatsOnValues are safely marshalled.
  • FormatFastSeqOBIHeader(sequence) returns the formatted header as a string (zero-copy via unsafe.String for performance).
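
The serialization rules above can be illustrated with a short sketch. The names `writeHeader` and `formatHeader` are hypothetical, a plain `map[string]any` stands in for the sequence's annotations, and the sorted key order and the single space between pairs are assumptions of this example, not documented behaviour.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"sort"
	"strings"
	"unsafe"
)

// writeHeader appends the annotations to buf as "key=value;" pairs.
func writeHeader(buf *bytes.Buffer, annotations map[string]any) error {
	keys := make([]string, 0, len(annotations))
	for k := range annotations {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic order (assumption of this sketch)

	for i, k := range keys {
		if i > 0 {
			buf.WriteByte(' ')
		}
		switch v := annotations[k].(type) {
		case string:
			fmt.Fprintf(buf, "%s=%s;", k, v)
		case bool, int, float64:
			fmt.Fprintf(buf, "%s=%v;", k, v)
		default:
			// Maps and other structured values: JSON-encode, then swap
			// double quotes for single quotes.
			js, err := json.Marshal(v)
			if err != nil {
				return err
			}
			fmt.Fprintf(buf, "%s=%s;", k, strings.ReplaceAll(string(js), `"`, "'"))
		}
	}
	return nil
}

// formatHeader returns the header as a string without copying the bytes.
// This is safe only because buf is never written to again afterwards.
func formatHeader(annotations map[string]any) (string, error) {
	var buf bytes.Buffer
	if err := writeHeader(&buf, annotations); err != nil {
		return "", err
	}
	b := buf.Bytes()
	return unsafe.String(unsafe.SliceData(b), len(b)), nil
}

func main() {
	header, err := formatHeader(map[string]any{
		"count":        3,
		"name":         "seq1",
		"sample_count": map[string]int{"a": 1, "b": 2},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(header)
}
```

The `unsafe.String` conversion avoids one allocation per header, at the cost of requiring that the underlying bytes are never modified while the string is in use.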

  • Designed to interoperate with the broader OBITools4 ecosystem (obiseq, obiutils), supporting both human-readable and machine-processable sequence metadata.