mirror of https://github.com/metabarcoding/obitools4.git synced 2026-05-01 04:20:40 +00:00

Files

T

Eric Coissac 8c7017a99d ⬆️ version bump to v4.5

- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5"
- Update version.txt from 4.29 → .30
(automated by Makefile)

2026-04-13 13:34:53 +02:00

11 KiB

Raw Blame History

Objective

Fully document OBITools (version 4, written in Go) in English, using a 4‑phase incremental pipeline.

You MUST use the available MCP servers:

cclsp – exact definitions, references, diagnostics
jcodemunch – code indexing, symbol extraction
treesitter – AST and CLI parsing
context7 – external documentation

All tool calls must follow the exact API described in the MCP server documentation. If a required tool is unavailable, you MUST log the error and stop execution.

Tool call format (CRITICAL)

Tool calls MUST use this exact XML format — no spaces inside the angle brackets:

<function=tool_name>
{"param": "value"}
</function>

FORBIDDEN — these variants will cause parse errors and must NEVER be used:

< function=tool_name > (spaces around the tag name)
< function = tool_name> (spaces around =)
<function = tool_name> (space before =)

The opening tag is <function=tool_name> with zero spaces inside < and >.

Global rules

** You are not allowed to read twice the same file in a row. **

Language

All generated documentation MUST be in English.
If an existing documentation file is in French:
1. Translate it to English
2. Save the original as .fr.md before overwriting
3. Write the new English version

Execution mode (STRICT)

You are operating in STRICT TOOL MODE:

If a file must be written, you MUST use the Shell tool.
You MUST NOT read entire directory listings into memory.
You MUST work with one item at a time using a simple text file as a task queue.

Reading files before writing

Before writing to an existing documentation file, you must first read it using the Read tool.
When documenting a single Go source file, you only need to read that one file (plus up to 4-5 helper files if needed for context).
Do NOT read the entire codebase - only what is necessary to document the current file.

Rules

Always write the full file (no partial updates).
Paths are relative to the project root; directories are created implicitly.
Content must be valid UTF‑8; use \n line endings.
Do not wrap content in backticks.

Progress tracking: task queue files

We use line‑oriented task files to avoid loading large lists into memory. Each phase has its own task file:

docs/todo/phase1.txt – list of Go files (one per line) to document.
docs/todo/phase1bis.txt – same list, but after phase1 is done.
docs/todo/phase2.txt – list of packages.
docs/todo/phase3.txt – list of tools.

How it works:

At the start of a phase, if the task file does not exist, it is created by scanning the codebase once (Phase 0 or Phase X init).
Each run of the LLM processes only the first line of the task file.
After processing the item (success or permanent failure), the line is removed from the task file.
- On success, the line is deleted (no extra sentinel file needed).
- On transient failure (retry < 3), we keep the line but increment a retry counter stored in a separate file.
- On permanent failure (retry ≥ 3), we move the line to a failed.txt file and log the error.
The LLM then exits (or continues if the task file is still non‑empty, but it must never load more than one line).

This way, the LLM’s context never holds more than a single task at a time.

Retry mechanism

For each item (e.g., internal/align/align.go), we maintain a retry counter in:

docs/retry/phase1/internal/align/align.go.count

If the file does not exist, retries = 0.
Each time processing fails, we increment the counter (write the new number).
If after increment the counter < 3, we keep the line in the task file.
If counter reaches 3, we remove the line from the task file, add it to docs/failed/phase1/internal/align/align.go.failed (just a marker), and log the error.

Documentation quality requirements (CRITICAL)

Documentation MUST NOT be superficial. For each documented element (file, function, struct, package):

You MUST explain:

what it does
why it exists (context, problem solved)
how it is used
assumptions and preconditions
possible edge cases

Forbidden patterns

Vague phrases like “This function handles…”, “Utility for…”, “Helper function…”.
Generic descriptions that could apply to any project.

Required content per element type

Functions:
- Purpose
- Parameter meaning
- Return values
- Notable behaviour (panic conditions, side effects, concurrency)
Structs:
- Role in the system
- Meaning of key fields
Files:
- Role within the package
- Interactions with other files

Anti‑generic rule

If the description could apply to any project, it is INVALID. You MUST include domain‑specific context (bioinformatics, sequence processing, etc.) and concrete behaviour.

Quality validation

Before marking an item as done (i.e., creating the .done sentinel), you MUST perform a self‑validation:

Check that all required sections are present.
Verify that no forbidden patterns remain.

If validation fails, increment the retry counter and keep the item pending.

Directory structure

docs/
  todo/                          # task queues
    phase1.txt
    phase1bis.txt
    phase2.txt
    phase3.txt
  retry/                         # retry counters
    phase1/                      # mirrors file structure
      internal/align/align.go.count
    phase1bis/
    phase2/
    phase3/
  failed/                        # permanent failure markers
    phase1/
      internal/align/align.go.failed
    phase1bis/
    phase2/
    phase3/
  phase1/                        # actual documentation
    <relative_path>/<file>.go.md
  phase2/
    <package>.md
  phase3/
    <tool>.md
  error.log

Phase 0: Initialization

Ensure required directories exist: docs/todo, docs/retry, docs/failed, docs/phase1, docs/phase2, docs/phase3.
If docs/todo/phase1.txt does not exist:
- Use find pkg -name "*.go" ! -name "*_test.go" ! -path "*/cmd/*" to list all Go files (excluding tests and main.go).
- Write the list (one relative path per line, e.g., internal/align/align.go) to docs/todo/phase1.txt.
Do the same for phase2 and phase3 later when those phases start.
No other state is stored.

Phase 1: File documentation

Processing rule:

Read the first line of docs/todo/phase1.txt (using head -n 1).
If the file is empty, Phase 1 is complete → proceed to Phase 1bis initialization.
Otherwise, process that single file.

Processing a file:

Let relpath be the line content (e.g., internal/align/align.go).
Check if a permanent failure marker exists at docs/failed/phase1/${relpath}.failed. If yes, remove the line from the task file and skip (line will be deleted).
If the documentation file docs/phase1/${relpath}.go.md exists go directly to its validation (step 6).
Otherwise, generate documentation for that file (using MCP tools as before).
Write the documentation to docs/phase1/${relpath}.go.md.
Validate quality.
If validation succeeds:
- Remove the line from the task file.
- Remove any retry counter file for this item.
- (No sentinel needed; the removal from todo indicates completion.)
If validation fails:
- Increment retry counter:
  - If docs/retry/phase1/${relpath}.count does not exist, set to 1.
  - Else read it, add 1, write back.
- If new counter >= 3:
  - Remove line from task file.
  - Create docs/failed/phase1/${relpath}.failed.
  - Log error.
- If new counter < 3:
  - Keep the line in the task file (do nothing, it stays as first line for next run).
Exit (or stop if this was a single run). The next invocation will read the first line again (same if retry, or next if removed).

Important:

Do not read more than one line.
Do not attempt to process multiple items in one run.
The LLM should finish after handling one item.

Phase 1bis: Review and harmonization

When Phase 1 is complete (i.e., docs/todo/phase1.txt empty), we initialize docs/todo/phase1bis.txt with the same list of files (the ones that succeeded).
But note: we need to know which files were successfully documented. Since we removed lines from phase1.txt on success, we need a record. The simplest is to reuse the same list but we can generate it by listing the existing .go.md files in docs/phase1/ (since every successful file has a .go.md).
Thus, Phase 1bis initialization:

If docs/todo/phase1bis.txt does not exist, create it by listing all .go.md files under docs/phase1/, stripping the docs/phase1/ prefix and the .go.md suffix, and writing the relative path (same format as phase1).

Then processing is identical to Phase 1, but using docs/todo/phase1bis.txt and output is overwriting the same .go.md files (with improvements). Retry counters go in docs/retry/phase1bis/.

Phase 2: Package documentation

When Phase 1bis is complete (docs/todo/phase1bis.txt empty), initialize docs/todo/phase2.txt:

List all packages: unique directories under pkg/ that contain at least one .go file and are not tools.
Write each package identifier (e.g., align, internal/align) as a line.

Processing: read first line, generate docs/phase2/<package>.md, validate, remove line on success, retry logic in docs/retry/phase2/.

Phase 3: Tool documentation

When Phase 2 complete, initialize docs/todo/phase3.txt:

List all directories under cmd/ that contain a main.go. Write each tool name as a line.

Processing: read first line, generate docs/phase3/<tool>.md, validate, remove line on success, retry logic in docs/retry/phase3/.

Finalization

When all task files are empty and no pending phases, generate docs/README.md by:

Listing all package docs (files in docs/phase2/) and linking.
Listing all tool docs (files in docs/phase3/) and linking.

Write using Shell.

Execution flow summary

Phase 0: Create directories and initial todo/phase1.txt if missing. Exit.
Phase 1:
- If todo/phase1.txt exists and non‑empty → process first line.
- Else → move to Phase 1bis initialization.
Phase 1bis:
- If todo/phase1bis.txt does not exist → create from successful phase1 docs.
- If non‑empty → process first line.
- Else → move to Phase 2 initialization.
Phase 2: similar.
Phase 3: similar.
Finalization: generate README.

The LLM should be invoked repeatedly (e.g., by a scheduler) until all phases are done. Each invocation processes exactly one item.

Important reminders

Always call Shell to write files; never output content in plain text.
Validate quality before removing a line from the task file.
Log all failures to docs/error.log in JSON lines format.
If any MCP tool fails, treat as failure and increment retry counter.
Never read more than one line from a task file in a single run.

11 KiB Raw Blame History Unescape Escape