From 962b6bafb4ae7101dff9c3a0cc557a8cbd3a8dc1 Mon Sep 17 00:00:00 2001 From: b3nw Date: Sun, 19 Apr 2026 04:01:38 +0000 Subject: [PATCH] Move project documentation to .private and exclude from git --- .gitignore | 1 + ROADMAP.md | 75 ----- docs/infrastructure/git-sync.md | 50 --- docs/infrastructure/wikijs-auth.md | 36 -- docs/schema/hierarchy.md | 41 --- docs/schema/state-schema.md | 83 ----- docs/validation/quality-control.md | 60 ---- eve-online-wiki-plan.md | 524 ----------------------------- wiki-plan-review-blockers.md | 117 ------- 9 files changed, 1 insertion(+), 986 deletions(-) create mode 100644 .gitignore delete mode 100644 ROADMAP.md delete mode 100644 docs/infrastructure/git-sync.md delete mode 100644 docs/infrastructure/wikijs-auth.md delete mode 100644 docs/schema/hierarchy.md delete mode 100644 docs/schema/state-schema.md delete mode 100644 docs/validation/quality-control.md delete mode 100644 eve-online-wiki-plan.md delete mode 100644 wiki-plan-review-blockers.md diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..1112a5f --- /dev/null +++ b/.gitignore @@ -0,0 +1 @@ +.private/ diff --git a/ROADMAP.md b/ROADMAP.md deleted file mode 100644 index 489a265..0000000 --- a/ROADMAP.md +++ /dev/null @@ -1,75 +0,0 @@ -# Implementation Roadmap: EVE Online Automated Wiki - -This roadmap tracks the progress of the automated wiki system. Status indicators: `[ ]` Todo, `[/]` In-Progress, `[x]` Done, `[!]` Blocked. - ---- - -## Phase 1: Foundation (Current Phase) -*Goal: Establish the core infrastructure, databases, and sync layers.* - -- [ ] **Infrastructure Deployment** - - [ ] Deploy PostgreSQL for Wiki.js & LangGraph Checkpointing. - - [ ] Deploy Redis for Agent Heartbeats. - - [ ] Deploy Wiki.js with read-only UI configuration. - - *Verification:* Verify all containers are running and healthy via `docker ps` or Komodo. 
-- [ ] **Synchronization & Auth** - - [ ] Initialize Git repository with directory structure from `docs/infrastructure/git-sync.md`. - - [ ] Configure Wiki.js Git Storage backend. - - [ ] Generate and store Wiki.js API token in environment. - - *Verification:* Successful API "Hello World" call to Wiki.js. -- [ ] **LangGraph Initialization** - - [ ] Implement `WikiState` Pydantic model (`docs/schema/state-schema.md`). - - [ ] Configure `PostgresSaver` for persistent checkpointing. - - *Verification:* Run a skeleton graph and verify state persists in Postgres. - ---- - -## Phase 2: Content Pipeline -*Goal: Implement the primary extraction and publication agents.* - -- [ ] **Agent D: ESI Data Collector** - - [ ] Implement Swagger-based ESI client with rate limiting (`docs/validation/quality-control.md`). - - [ ] Create extraction logic for Ship and Module data. -- [ ] **Agent A: Source Harvester** - - [ ] Implement MediaWiki API client for EVE University. - - [ ] Implement Google Sites parser for WCKG. -- [ ] **Agent E & F: Validation Layer** - - [ ] Implement the weighted scoring formula (`docs/validation/quality-control.md`). - - [ ] Implement the "Must Pass" numerical validation against ESI. -- [ ] **Initial Seed Run** - - [ ] Execute Agent A for a subset of 50 pages. - - [ ] Perform full validation and Git sync. - - *Verification:* Verify pages render correctly in Wiki.js with proper hierarchy. - ---- - -## Phase 3: Automated Monitoring & Updates -*Goal: Enable daily updates and patch note tracking.* - -- [ ] **Agent B: Patch Note Monitor** - - [ ] Implement RSS polling with browser-headers for EVE Online. - - [ ] Implement LLM-based diff analysis for patch notes. -- [ ] **Failure Handling & Alerts** - - [ ] Implement Tiered Response Matrix and Webhook alerts. - - [ ] Implement the "Correction Request" feedback loop. -- [ ] **Scheduling** - - [ ] Configure daily batch runs at 02:00 UTC. 
- ---- - -## Phase 4: Expansion & Advanced Features -*Goal: Handle major game changes and link optimization.* - -- [ ] **Cross-link Generation** - - [ ] Implement automated entity matching for wiki-linking. -- [ ] **Expansion Protocol** - - [ ] Create "Freeze Mode" logic for major CCP expansions. -- [ ] **Asset Management** - - [ ] Finalize local vs S3 asset storage for images. - ---- - -## Task Manifest (Unallocated) -- [ ] Define Skill, Item, and Faction schemas. -- [ ] Implement content merge strategy for multi-source pages. -- [ ] Document final "Human-in-the-loop" emergency runbook. diff --git a/docs/infrastructure/git-sync.md b/docs/infrastructure/git-sync.md deleted file mode 100644 index f9fe209..0000000 --- a/docs/infrastructure/git-sync.md +++ /dev/null @@ -1,50 +0,0 @@ -# Git Sync Protocol & Repository Structure - -The Git repository serves as the **Single Source of Truth (SSOT)** for all wiki content. This document defines how data is structured and synchronized. - -## 1. Repository Layout - -```text -/ -├── content/ # All Markdown wiki pages -│ ├── ships/ -│ ├── modules/ -│ └── ... -├── assets/ # Images, PDFs, and static files -│ ├── images/ -│ └── ... -├── schema/ # Shared validation schemas (JSON Schema) -├── metadata/ # Global metadata (e.g., typeID mapping table) -│ └── mapping.json -└── .wikijsignore # Files to be ignored by Wiki.js sync -``` - -## 2. Synchronization Flow - -### Primary Write Path (Agent -> Git -> Wiki.js) -1. **Agent Modification:** An agent (A, B, or C) generates or updates a Markdown file. -2. **Local Commit:** The agent commits the change to its local clone of the repository. -3. **Push to Origin:** The agent pushes to the `main` branch. -4. **Wiki.js Sync:** - - Wiki.js is configured with the "Git" storage target. - - It pulls changes from the `main` branch at a set interval (default: 5 minutes) or via Webhook. - - Wiki.js renders the new Markdown content in the UI. 
- -### Wiki.js to Git (Optional / Prohibited) -- **Status:** DISABLED -- **Rationale:** Since human editing is disabled, there should be no writes originating from the Wiki.js UI. This prevents merge conflicts and ensures the Agent pipeline remains the sole source of content. - -## 3. Commit Standards - -To ensure a clean audit trail, all commits must follow the Conventional Commits-style with agent identifiers: - -**Format:** `[AGENT_ID] action: description (hash: source_hash)` - -**Examples:** -- `[AGENT_A] seed: ships/caldari/condor (hash: sha256_...)` -- `[AGENT_B] update: ships/amarr/abaddon (patch: 2026-04-16)` -- `[AGENT_G] asset: images/ships/condor.png` - -## 4. Conflict Resolution -- **Strategy:** Last-Write-Wins (LWW) based on Git commit timestamp. -- **Merge Logic:** Automated merges are preferred. If a conflict occurs (rare in agent-only environments), the pipeline will halt and trigger a "System Alert" for manual intervention. diff --git a/docs/infrastructure/wikijs-auth.md b/docs/infrastructure/wikijs-auth.md deleted file mode 100644 index 470d4e1..0000000 --- a/docs/infrastructure/wikijs-auth.md +++ /dev/null @@ -1,36 +0,0 @@ -# Wiki.js API Authentication & Security Strategy - -## 1. Authentication Method -- **Token Type:** Permanent API Keys (Bearer Tokens) -- **Generation:** Generated via the Wiki.js Administration Area -> API Keys -- **Storage:** Stored as environment variables in the agent runtime environment (e.g., `WIKIJS_API_TOKEN`). - -## 2. 
Permission Scopes -To maintain security, the API token used by agents will be restricted to the minimum necessary scopes: - -| Scope | Requirement | Justification | -|-------|-------------|---------------| -| `write:pages` | Mandatory | Allows agents to create and update content | -| `read:pages` | Mandatory | Allows agents to check existing content before updates | -| `write:assets` | Mandatory | Allows Agent G to upload images/files | -| `read:assets` | Mandatory | Allows checking for existing assets | -| `read:tags` | Optional | Allows metadata tagging | -| `manage:system` | **Prohibited** | Agents must NOT have administrative system access | - -## 3. Token Rotation Policy -- **Frequency:** Tokens should be rotated every 90 days. -- **Process:** - 1. Generate new token in Wiki.js. - 2. Update environment variable in agent deployment (Komodo/Docker). - 3. Verify connectivity. - 4. Revoke old token. - -## 4. Write Access Control -- **Human Editing:** All human accounts in Wiki.js will be assigned to a "Read-Only" group. -- **Agent Editing:** Only the API account (associated with the token) will have write permissions. -- **Emergency Bypass:** A single "Admin" account will be maintained for emergency manual intervention, protected by 2FA. - -## 5. Security Best Practices -- **TLS:** All API calls MUST be made over HTTPS. -- **IP Whitelisting:** If possible, Wiki.js should be configured to only accept API requests from the IP of the agent runner. -- **Audit Logs:** Enable Wiki.js audit logging to track all changes made via the API token. diff --git a/docs/schema/hierarchy.md b/docs/schema/hierarchy.md deleted file mode 100644 index eafa5c0..0000000 --- a/docs/schema/hierarchy.md +++ /dev/null @@ -1,41 +0,0 @@ -# Wiki Page Hierarchy & URL Structure - -To ensure a structured and navigable wiki, all pages must follow this hierarchical pathing and categorization schema. - -## 1. 
Top-Level Categories - -| Directory | Content Type | URL Pattern | -|-----------|--------------|-------------| -| `ships/` | All ship hulls | `/ships/{race}/{group}/{ship_name}` | -| `modules/`| All ship modules | `/modules/{category}/{group}/{module_name}` | -| `mechanics/`| Game mechanics/guides | `/mechanics/{category}/{topic}` | -| `items/` | General items (ammo, etc) | `/items/{category}/{item_name}` | -| `skills/` | Character skills | `/skills/{category}/{skill_name}` | -| `factions/`| NPC Factions | `/factions/{faction_name}` | - -## 2. Detailed Pathing Examples - -### Ships -- **Path:** `/ships/caldari/interceptors/raptor` -- **Breadcrumb:** Ships > Caldari > Interceptors > Raptor - -### Modules -- **Path:** `/modules/shield/shield-extenders/large-shield-extender-ii` -- **Breadcrumb:** Modules > Shield > Shield Extenders > Large Shield Extender II - -### Mechanics -- **Path:** `/mechanics/combat/tracking-guide` -- **Breadcrumb:** Mechanics > Combat > Tracking Guide - -## 3. Redirect & Alias Strategy -- **TypeID Redirects:** Every page must be aliased by its ESI TypeID (e.g., `/id/603` -> `/ships/caldari/interceptors/raptor`) to allow easy linking from external tools. -- **Lowercase Enforcement:** All URLs must be strictly lowercase. -- **Slugification:** Spaces replaced by hyphens, special characters removed. - -## 4. Metadata Requirement -Each page MUST include the following metadata in its frontmatter to support the hierarchy: -```yaml -path: "ships/caldari/interceptors/raptor" -parent: "ships/caldari/interceptors" -order: 10 # Optional: for sorting in lists -``` diff --git a/docs/schema/state-schema.md b/docs/schema/state-schema.md deleted file mode 100644 index 9443c7b..0000000 --- a/docs/schema/state-schema.md +++ /dev/null @@ -1,83 +0,0 @@ -# LangGraph State Schema Definition - -This document defines the shared state object used by the LangGraph orchestration layer. - -## 1. 
Shared State Object (`WikiState`) - -```python -from typing import List, Optional, Dict, Annotated -from pydantic import BaseModel, Field -from datetime import datetime - -class ContentSource(BaseModel): - name: str # e.g., "eve-university", "wckg", "esi" - url: str - content_hash: str - extracted_at: datetime - -class ValidationResult(BaseModel): - category: str # "structural", "content", "numerical", "cross-reference" - passed: bool - score: float # 0.0 to 1.0 - feedback: Optional[str] = None - details: Dict = {} - -class PageMetadata(BaseModel): - page_type: str # "ship", "module", "mechanic", "guide" - source: str - source_url: str - imported_date: datetime - last_updated: datetime - last_validated: datetime - update_frequency: str - validation_score: float - categories: List[str] - -class WikiPage(BaseModel): - path: str # e.g., "ships/frigates/condor" - title: str - content_markdown: str - metadata: PageMetadata - frontmatter: Dict - assets: List[str] # List of local asset paths - -class WikiState(BaseModel): - # Core processing data - current_page: Optional[WikiPage] = None - proposed_changes: List[WikiPage] = [] - - # Pipeline tracking - sources: List[ContentSource] = [] - validation_pipeline_results: List[ValidationResult] = [] - - # Control flow - retry_count: int = 0 - max_retries: int = 3 - is_approved: bool = False - error: Optional[str] = None - - # Checkpointing metadata - thread_id: str - checkpoint_id: Optional[str] = None -``` - -## 2. Page Type Schemas - -### Ship Schema Overlay -Extends the numerical validation requirements. - -```python -class ShipData(BaseModel): - type_id: int - group: str - race: str - hull_stats: Dict[str, int] - fitting_stats: Dict[str, int] - velocity: int - skill_requirements: List[Dict[str, int]] -``` - -## 3. 
Storage Strategy -- **State Persistence:** Redis (via `RedisSaver` for LangGraph) -- **Content Persistence:** Git (Markdown + YAML Frontmatter) -- **Asset Persistence:** Local filesystem / S3-compatible storage diff --git a/docs/validation/quality-control.md b/docs/validation/quality-control.md deleted file mode 100644 index e9df8af..0000000 --- a/docs/validation/quality-control.md +++ /dev/null @@ -1,60 +0,0 @@ -# Quality Control & Operations Protocol - -This document defines the automated decision-making logic for content validation and the system-wide response to operational failures. - -## 1. Validation Scoring Formula (Blocker #6) - -Agent E (Validation) and Agent F (Numerical) calculate a combined **Confidence Score (0-100%)**. A score of **95% or higher** is required for auto-publication to the `main` branch. - -### 1.1 Weighted Categories - -| Category | Weight | Must Pass? | Criteria | -|----------|--------|------------|----------| -| **Numerical (ESI)** | 40% | **YES** | Stats (HP, Slots, PG/CPU) must match ESI ±0%. | -| **Structural** | 20% | **YES** | Valid YAML, required fields present, correct URL path. | -| **Relational** | 20% | NO | Internal links resolve, TypeIDs are valid. | -| **Semantic** | 20% | NO | Prose description matches structured data intent. | - -### 1.2 The "Must Pass" Rule -If any **"Must Pass"** category fails (score < 100% in that category), the total Confidence Score is immediately capped at **0%**, regardless of other categories. This prevents LLM hallucinations from overriding official game data. - -### 1.3 Scoring Logic -`Total Score = (ESI * 0.4) + (Struct * 0.2) + (Relat * 0.2) + (Semant * 0.2)` - ---- - -## 2. Failure Handling Matrix (Blocker #8) - -The system distinguishes between transient infrastructure issues and content quality failures. 
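Reviewer note: the scoring rules in sections 1.1 to 1.3 of the deleted `quality-control.md` can be sketched as follows. This is only a minimal illustration, not code from the repository. The names `WEIGHTS`, `MUST_PASS`, `confidence_score`, and `is_publishable` are hypothetical, per-category scores are assumed to be on a 0-100 scale, and the publication cutoff is taken as "95% or higher" as that section words it.

```python
from typing import Dict

# Category weights from the "Weighted Categories" table (assumed key names).
WEIGHTS: Dict[str, float] = {
    "numerical": 0.4,
    "structural": 0.2,
    "relational": 0.2,
    "semantic": 0.2,
}

# Categories marked "Must Pass" in the table.
MUST_PASS = {"numerical", "structural"}

# "95% or higher" is required for auto-publication.
PUBLISH_THRESHOLD = 95.0


def confidence_score(scores: Dict[str, float]) -> float:
    """Combine per-category scores (0-100) into a total Confidence Score."""
    # Must Pass rule: anything below 100% in a mandatory category
    # caps the total at 0, regardless of the other categories.
    for category in MUST_PASS:
        if scores[category] < 100.0:
            return 0.0
    # Otherwise apply the weighted sum from the "Scoring Logic" formula.
    return sum(scores[category] * weight for category, weight in WEIGHTS.items())


def is_publishable(scores: Dict[str, float]) -> bool:
    """True when the combined score meets the auto-publication threshold."""
    return confidence_score(scores) >= PUBLISH_THRESHOLD
```

For instance, a page scoring 100 on both mandatory categories, 90 on Relational, and 80 on Semantic totals 94 and would fall into the Tier 2 regeneration band rather than being published.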
- -### 2.1 Tiered Response Matrix - -| Error Type | Tier | Initial Action | Max Retries | Escalation | -|------------|------|----------------|-------------|------------| -| **API Timeout / 5xx** | 1 | Exponential Backoff (1m, 5m, 15m) | 3 | Tier 3 Alert | -| **Validation Fail (70-94%)** | 2 | Regeneration with Error Feedback | 2 | Tier 3 Alert | -| **Validation Fail (<70%)** | 2 | Immediate Rejection | 1 | Tier 3 Alert | -| **Auth Failure / 401** | 3 | Halt Pipeline | 0 | **Critical System Alert** | -| **Git Conflict** | 3 | Halt Pipeline | 0 | **Critical System Alert** | - -### 2.2 Feedback Loop (Tier 2) -When a page fails validation with a score of 70-94%, Agent E sends a "Correction Request" back to the source agent (A, B, or C) containing: -1. The specific field that failed. -2. The expected value (from ESI or Schema). -3. The current value produced. - -### 2.3 Critical System Alerting -Tier 3 failures trigger a webhook notification to the system administrator. The LangGraph state is persisted as a "Suspended" checkpoint, allowing for manual inspection and resume via LangSmith. - ---- - -## 3. Rate Limiting Policy (Blocker #11) - -To ensure stability and prevent IP bans, the following global limits are enforced at the transport layer: - -| Target | Rate Limit | Burst | -|--------|------------|-------| -| **ESI API** | 20 req / sec | 50 | -| **MediaWiki API** | 2 req / sec | 5 | -| **Wiki.js API** | 5 req / sec | 10 | -| **LLM APIs** | Per Provider Tier | N/A | diff --git a/eve-online-wiki-plan.md b/eve-online-wiki-plan.md deleted file mode 100644 index a717ab1..0000000 --- a/eve-online-wiki-plan.md +++ /dev/null @@ -1,524 +0,0 @@ -# EVE Online Automated Wiki System - High Level Plan - -## 1. 
Wiki Software Recommendation: Wiki.js - -**Why Wiki.js:** -- Modern, open-source (AGPL-3.0), actively maintained -- First-class Docker support with official image -- REST API built-in for automated content updates -- Markdown-based editing (great for AI-generated content) -- Git-based storage option for complete version control -- Built-in search, analytics, and access controls -- Lightweight (~200MB RAM) -- **Perfect for agent-only workflow:** Can disable all human editing entirely while retaining API write access - -**Alternatives considered:** - -| Software | Pros | Cons | -|----------|------|------| -| MediaWiki | Industry standard, massive extension ecosystem | Heavy, PHP, API is less developer-friendly | -| DokuWiki | Flat file, extremely simple | No native API, dated interface | -| BookStack | Structured organization | Less suited for interconnected knowledge | -| Wiki.js | Modern API, Git sync, Docker-native, read-only UI support | Younger project, smaller community | - ---- - -## 2. 
System Architecture - -``` -┌─────────────────────────────────────────────────────────────────────────────┐ -│ EVE Online Wiki System │ -├─────────────────────────────────────────────────────────────────────────────┤ -│ │ -│ ┌─────────────────────────────────────────────────────────────────────┐ │ -│ │ LangGraph Orchestration Layer │ │ -│ │ ┌───────────────────────────────────────────────────────────────┐ │ │ -│ │ │ StateGraph (Main Graph) │ │ │ -│ │ │ │ │ │ -│ │ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │ │ -│ │ │ │ Source │ │ Patch │ │External │ │ ESI │ │ │ │ -│ │ │ │Harvester│ │ Monitor │ │ Monitor │ │Collector│ │ │ │ -│ │ │ │ Node │ │ Node │ │ Node │ │ Node │ │ │ │ -│ │ │ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ │ │ │ -│ │ │ └────────────┴─────────────┴─────────────┘ │ │ │ -│ │ │ │ │ │ │ -│ │ │ ┌────────▼────────┐ │ │ │ -│ │ │ │ Validation │ │ │ │ -│ │ │ │ Subgraph │ │ │ │ -│ │ │ │ (E → F → G) │ │ │ │ -│ │ │ └────────┬────────┘ │ │ │ -│ │ │ │ │ │ │ -│ │ │ ┌────────▼────────┐ │ │ │ -│ │ │ │ Git Sync │ │ │ │ -│ │ │ │ Node │ │ │ │ -│ │ │ └────────┬────────┘ │ │ │ -│ │ │ │ │ │ │ -│ │ │ ┌────────▼────────┐ │ │ │ -│ │ │ │ Wiki.js API │ │ │ │ -│ │ │ │ Node │ │ │ │ -│ │ │ └─────────────────┘ │ │ │ -│ │ └───────────────────────────────────────────────────────────────┘ │ │ -│ │ │ │ -│ │ LangGraph Features: │ │ -│ │ • Checkpointing: Durable state persistence across crashes │ │ -│ │ • Conditional Edges: Dynamic routing based on validation results │ │ -│ │ • Subgraphs: Nested validation pipeline (E→F→G) as single node │ │ -│ │ • Streaming: Real-time token output from LLM agents │ │ -│ │ • LangSmith: Built-in observability and tracing │ │ -│ └─────────────────────────────────────────────────────────────────────┘ │ -│ │ -└─────────────────────────────────────────────────────────────────────────────┘ -``` - -**Infrastructure Stack:** -- **Wiki.js** - Content management (read-only UI) -- **PostgreSQL** - Primary database for Wiki.js (production-grade, required) 
-- **Redis** - State persistence backend for LangGraph checkpointing -- **LangGraph** - Agent orchestration framework (MIT licensed, built on LangChain) -- **LangSmith** - Observability platform for tracing agent executions -- **Git** - Content storage backend for version control - -**Architecture Principles:** -- All content flows top-to-bottom, no exceptions -- 100% agent-only editing - no human accounts have write access -- All changes go through full validation pipeline before publication -- Complete audit trail maintained in Git storage backend -- Immutable content sources are isolated from publication layer -- Git sync acts as single source of truth for all content -- LangGraph maintains checkpoint state and full audit log via LangSmith traces -- No direct API writes from edge agents - all writes go through pipeline -- Daily batch updates for prose/content (not continuous) - scheduled at 02:00 UTC -- ESI structured data updates run on independent schedule (hourly for static data, daily for dynamic) - ---- - -## 3. Agent Specifications - -### Agent A: Initial Wiki Construction - -**Purpose:** Seed the wiki with existing content from source sites. - -**Inputs:** -- Source URLs (EVE University wiki, WCKG, CCP Support, ESI API) -- Content scope (what categories/sections to import) -- Deduplication and merge strategy - -**Process:** -1. **Source Extraction:** - - **EVE University:** Utilize MediaWiki `api.php` for structured content retrieval (avoids HTML scraping issues). - - **WCKG:** Specialized Google Sites parser for dynamic content rendering. - - **CCP Support:** Content extraction with browser headers to bypass Cloudflare challenges. -2. Extract structured content (ships, modules, mechanics) -3. Normalize content format to Markdown -4. Extract all images and references -5. Compute content hash (SHA-256) for each page and skip if unchanged since last import -6. 
Create wiki pages with proper hierarchy (see [Page Hierarchy Strategy](docs/schema/hierarchy.md)) -7. Tag pages with complete metadata -8. Generate cross-links between related pages -9. Pass all content to Validation Agent before publication - -**Scheduling:** One-time run (with option to replay) - ---- - -### Agent B: Patch Note Monitor - -**Purpose:** Detect EVE Online patch changes and update affected wiki pages. - -**Inputs:** -- RSS feed: `https://www.eveonline.com/rss/patch-notes` (Verified accessible via GET) -- Update frequency (recommended: daily) -- Affected page mapping (which pages relate to which game systems) - -**Process:** -1. Poll RSS feed on schedule (Standard RSS 2.0 parsing) -2. Parse new patch entries using LLM content analysis -3. Identify *exact* content changes required for affected pages -4. Generate complete revised page content, not just append sections -5. Pass proposed changes to Validation Agent -6. If validation passes, apply update automatically -7. If validation fails, retry generation or flag for system alert - ---- - -## 3.5 Infrastructure Protocols - -### Git Sync Protocol -- **Single Source of Truth:** Git repository acts as the primary storage. -- **Bi-directional Sync:** - - Agent Write -> Git Commit -> Wiki.js Push - - Wiki.js renders directly from the Git-backed storage. -- **Repository Structure:** - - `/content`: Markdown files mapping to wiki paths (e.g., `content/ships/frigates/condor.md`) - - `/assets`: Images and files mapping to local paths. -- **Commit Format:** `[AGENT_ID] update: path/to/page (hash: abc123)` - -### API Authentication -- **Strategy:** Bearer tokens with minimum scopes (`write:pages`, `write:assets`). -- **Storage:** Managed via environment variables. -- See [Wiki.js API Auth Strategy](docs/infrastructure/wikijs-auth.md) for details. - -### Shared State -- **Schema:** Managed via LangGraph `WikiState` Pydantic model. -- See [State Schema Definition](docs/schema/state-schema.md) for details. 
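Reviewer note: the commit format from the Git Sync Protocol above, `[AGENT_ID] update: path/to/page (hash: abc123)`, combined with the SHA-256 content hash from Agent A's process, could be produced by something like the sketch below. The function names and the `hash_len` parameter are hypothetical. Truncating the digest is an assumption based on the short `abc123` example, since the plan does not say how much of the hash to include.

```python
import hashlib


def content_hash(markdown: str) -> str:
    """Return the SHA-256 hex digest of a page's Markdown content."""
    return hashlib.sha256(markdown.encode("utf-8")).hexdigest()


def commit_message(
    agent_id: str,
    action: str,
    page_path: str,
    markdown: str,
    hash_len: int = 12,  # assumed truncation length, not specified in the plan
) -> str:
    """Build a commit message in the '[AGENT_ID] action: path (hash: ...)' format."""
    digest = content_hash(markdown)[:hash_len]
    return f"[{agent_id}] {action}: {page_path} (hash: {digest})"


if __name__ == "__main__":
    # Example usage with placeholder page content.
    print(commit_message("AGENT_A", "seed", "ships/caldari/condor", "# Condor\n"))
```

Because the hash is derived from the page body, the same function can also back the "skip if unchanged since last import" check by comparing digests before committing.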
- ---- - -### Agent C: External Wiki Monitor - -**Purpose:** Track changes on source wikis and refresh content. - -**Inputs:** -- Monitored URLs and change detection rules -- Check frequency (recommended: weekly) - -**Process:** -1. Poll source sites on schedule respecting robots.txt -2. Detect new pages or modified content -3. Compare against imported content hash in local wiki -4. Ignore minor formatting/link changes -5. Generate revised page content with merged changes -6. Pass proposed changes to Validation Agent -7. Apply update automatically on validation pass - ---- - -### Agent D: ESI Data Collector - -**Purpose:** Pull official structured data directly from CCP's ESI API. - -**Inputs:** -- ESI API endpoints for ships, modules, items, skills -- Update frequency: hourly for static data (independent of daily batch), daily for dynamic data (part of 02:00 UTC batch) - -**Process:** -1. Poll ESI API on schedule with proper rate limiting -2. Extract structured game data -3. Generate or update data-driven pages automatically -4. Merge structured data with human-readable content from other sources -5. Compute content hash and skip if unchanged since last poll -6. Pass proposed changes to Validation Agent - ---- - -### Agent E: Content Validation & Review Agent - -**Purpose:** Automated quality control for all content changes. **Replaces all human review.** - -**Validation Rules:** -1. **Structural validation:** Markdown syntax, page hierarchy, metadata presence -2. **Content validation:** Factual consistency, no broken references, completeness -3. **Change validation:** Diff analysis, only expected changes applied, no unintended modifications -4. **Cross-reference validation:** All internal links resolve correctly -5. **TOS compliance:** Proper attribution included for all imported content - -**Process:** -1. Receive proposed change from upstream agent -2. Run all validation checks -3. Generate confidence score (0-100%) -4. 
If score > 95%: Approve for publication -5. If score 70-95%: Request regeneration with feedback -6. If score <70%: Reject change and generate system alert - ---- - -### Agent F: Numerical Validation Layer - -**Purpose:** Rule-based validation for game data, separate from LLM validation. Catches LLM hallucinations in structured data. - -**Validation Categories:** - -| Data Type | Validation Rules | Source of Truth | -|-----------|------------------|-----------------| -| Ship stats | Base HP, velocity, slots, fitting stats within ±0% of ESI | ESI API | -| Module stats | CPU, PG, range, damage multipliers | ESI API | -| Skill requirements | Prerequisites match skill tree | ESI API | -| Fitting calculations | Must pass CPU/PG budget checks | Local calculation | -| Market data | Prices non-negative, volume non-negative | ESI API | -| Links/IDs | All typeIDs resolve to valid entities | ESI API lookup | - -**Process:** -1. Extract all numerical values from proposed content -2. Cross-reference against ESI API for game data -3. Flag any discrepancies >0% for ship/module stats -4. Reject content with invalid typeIDs or broken references -5. Log all validation results for audit - -**Override Rules:** -- Numerical validation failures = auto-reject (no LLM override possible) -- Historical content (archived ships/modules) flagged for manual review - ---- - -### Agent G: Asset & Reference Handler - -**Purpose:** Centralized management of all images, links, and external references. - -**Process:** -1. Receive all extracted images and references from other agents -2. Download images to local storage (respecting copyright/attribution) -3. Rewrite all image URLs to local wiki paths -4. Rewrite all external links to reference original source -5. Add source attribution footer to all pages -6. Check for broken links on every update -7. Maintain asset integrity across all pages - ---- - -## 4. Model Intelligence Tiers - -Different agents require different levels of LLM reasoning. 
Using appropriate models reduces cost and improves reliability. - -| Agent | Tier | Model Requirements | Justification | -|-------|------|--------------------|---------------| -| A: Source Harvester | Low | Basic extraction model (e.g., GPT-4o-mini, Claude Haiku) | Template-based extraction, structured output format | -| B: Patch Note Monitor | High | Strong reasoning model (e.g., Claude Sonnet, GPT-4o) | Requires understanding game mechanics to map changes to pages | -| C: External Wiki Monitor | Low | Basic extraction model | Simple change detection and content extraction | -| D: ESI Data Collector | None | No LLM needed | Pure API calls, structured data, programmatic transformation | -| E: Content Validation | Medium | Balanced model (e.g., Claude Sonnet) | Needs semantic understanding but structured validation rules | -| F: Numerical Validation | None | No LLM needed | Pure rule-based, deterministic validation | -| G: Asset Handler | Low | Basic model for categorization | Mostly file operations, minimal reasoning | - -**Recommended Model Stack:** -- **High reasoning:** Claude 3.5 Sonnet / GPT-4o (Agents B, E) -- **Low cost:** Claude 3.5 Haiku / GPT-4o-mini (Agents A, C, G) -- **No LLM:** Agents D, F (programmatic only) - -**Daily Batch Cost Estimate:** -With daily updates (not continuous), typical daily operations: -- ~10-20 patch note analyses (Agent B): ~$0.10-0.30 -- ~50-100 content validations (Agent E): ~$0.50-1.00 -- ~100-200 extractions (Agents A, C, G): ~$0.10-0.20 -- **Daily total: ~$0.70-1.50** (note: subscription tiers have rate limits and token caps; bulk operations like initial import may temporarily exceed these, requiring pay-per-use fallback) - ---- - -## 5. Content Schema Per Page Type - -Each page type has a template with required fields that Agent E validates structurally before semantic validation. 
- -### Ship Page Template -```yaml -page_type: ship -required_fields: - - name: string - - type_id: integer (ESI) - - group: string (e.g., "Interceptor", "Battleship") - - race: string (Caldari, Minmatar, Amarr, Gallente) - - hull_stats: - hp_shield: integer - hp_armor: integer - hp_structure: integer - - fitting_stats: - cpu_output: integer - powergrid_output: integer - high_slots: integer - med_slots: integer - low_slots: integer - rig_slots: integer - - velocity: integer - - skill_requirements: list[{skill_id: integer, level: integer}] - - description: string (prose, sourced from external wiki or generated) -optional_fields: - - role_bonus: string - - ship_bonus: list[string] - - capacitor_capacity: integer - - targeting_range: integer - - drone_bandwidth: integer - - probe_launcher_fitting: boolean -``` - -### Module Page Template -```yaml -page_type: module -required_fields: - - name: string - - type_id: integer (ESI) - - group: string (e.g., "Shield Booster", "Afterburner") - - slot: string (high, mid, low, rig) - - cpu_usage: integer - - powergrid_usage: integer - - description: string -optional_fields: - - duration: integer - - range: integer - - damage_multiplier: float - - skill_requirements: list[{skill_id: integer, level: integer}] - - meta_level: integer - - tech_level: integer (1 or 2) -``` - -### Mechanic/Guide Page Template -```yaml -page_type: mechanic -required_fields: - - title: string - - summary: string (1-3 sentences) - - categories: list[string] - - source: string (eve-university | wckg | ccp | generated) - - last_reviewed: date -optional_fields: - - related_ships: list[string] - - related_modules: list[string] - - related_mechanics: list[string] - - difficulty: string (beginner | intermediate | advanced) -``` - -### Validation Against Schema -Agent E enforces: -1. All `required_fields` present and non-empty -2. All integer fields contain valid integers (no strings, no nulls) -3. 
 All `type_id` fields pass Agent F numerical validation against ESI -4. All `skill_requirements` reference valid typeIDs -5. Page type matches one of the defined templates (reject unknown types) - ---- - -## 6. Agent Health Monitoring - -LangGraph provides built-in checkpointing for state persistence, but agent-level health monitoring requires a separate heartbeat system. - -**Heartbeat Protocol:** -- Each LangGraph node (agent) sends a `HEARTBEAT` message to Redis every 60 seconds during active operation, every 5 minutes when idle -- Heartbeat payload: `{ node_name, status: healthy|degraded|error, thread_id, last_completed_at, checkpoint_id }` -- Heartbeat registry uses Redis with TTL (3x interval for stale, 10x for dead) - -**LangGraph Checkpointing:** -- LangGraph's `MemorySaver` or `PostgresSaver` persists graph state at each step -- Workflows resume exactly where they left off after crashes -- Checkpoint TTL configurable (24-48 hours for batch workflows, session-based for conversational) - -**Staleness Detection:** -- If no heartbeat received within 3x the expected interval → mark agent as `stale` -- If no heartbeat received within 10x the expected interval → mark agent as `dead` and trigger critical alert -- Stale nodes: LangGraph checkpoint indicates last state, new invocations wait for recovery -- Dead nodes: halt dependent pipeline stages, escalate alert - -**LangSmith Integration:** -- Every LLM call, tool invocation, and state transition emits traces to LangSmith -- Query LangSmith audit logs for execution history, latency, token usage -- Alerts configured via LangSmith webhooks for validation failures - -**Alerting:** -- Agent status transitions emit events to the audit log -- Critical alerts (dead node, repeated validation failures, checkpoint gaps > threshold) notify via configured channel (webhook, email, etc.) - ---- - -## 7.
Standard Page Metadata - -All pages will include standard frontmatter: -```yaml -source: eve-university | wckg | ccp | esi | generated -source_url: https://... -imported_date: 2026-04-16 -last_updated: 2026-04-16 -last_validated: 2026-04-16 -update_frequency: daily | weekly | monthly -validation_score: 98 -categories: [ships, pvp, modules, industry] -``` - ---- - -## 8. Implementation Phases - -### Phase 0: Pre-Work & Compliance -- Confirm scraping TOS with source wiki maintainers -- Implement rate limiting and proper User-Agent headers -- Define metadata schema and validation rules -- Test content extraction on sample pages - -### Phase 1: Foundation -- Deploy PostgreSQL database via Docker (production configuration) -- Deploy Redis instance for LangGraph checkpointing + heartbeat registry -- Deploy Wiki.js via Docker (connected to PostgreSQL) -- **Disable all human write permissions** - configure API-only write access -- Configure Git storage backend for complete change history -- Configure Git sync layer as single source of truth -- Set up HTTPS and domain routing -- Establish automated backup strategy -- Deploy LangGraph with `StateGraph` defining all agent nodes and edges -- Configure LangSmith for observability (tracing, audit logs) -- Deploy agent heartbeat monitoring (Redis TTL registry) - -### Phase 2: Content Pipeline -- Deploy Source Harvester Agent -- Deploy Validation Agent -- Deploy Asset Handler Agent -- Deploy ESI Data Collector Agent -- Execute initial import with full validation pipeline -- Establish content quality baseline - -### Phase 2.5: Smoke Test -- Run Agent A on 50 representative pages across all page types (ships, modules, mechanics, guides) -- Pass all 50 pages through the full validation pipeline (Agents E + F + G) -- Calibrate validation thresholds based on results (adjust confidence scoring weights) -- Verify merge logic when ESI data and external wiki content overlap on same pages -- Confirm Git sync round-trip: write → Git → 
Wiki.js render matches expected output -- Identify and fix integration bugs before full import -- Document baseline validation pass rate and failure patterns - -### Phase 3: Automated Monitoring -- Deploy Patch Note Monitor Agent -- Implement LLM-based patch parsing and content generation -- Configure validation thresholds -- Test end-to-end update workflow - -### Phase 4: External Change Tracking -- Deploy External Wiki Monitor Agent -- Configure source site monitoring -- Implement change detection and merge logic -- Set up system alerting for failures - -### Phase 5: Major Expansion Handling -- Create expansion detection webhook (CCP announces expansions 2-4 weeks ahead) -- Build bulk update workflow for expansion releases -- Implement "freeze" mode during expansion deployment (content locked until ESI stabilizes) -- Create post-expansion audit job to verify all affected pages -- Document expansion runbook for manual triggering - -**Expansion Workflow:** -1. Expansion announced → Create tracking ticket -2. Expansion deploys → Freeze wiki updates, wait for ESI stability (typically 24-48h) -3. Run bulk ESI sync → Update all ship/module/item pages -4. Run Patch Note Agent → Process expansion notes, generate new pages -5. Run full validation → All pages validated against new ESI data -6. Unfreeze → Resume daily batch updates - ---- - -## 9. Validation Questions - -### Wiki Infrastructure -1. **Hosting requirements:** What server/container host will run this? (RAM/CPU allocation) -2. **Access & secrets management:** Plan for storing ESI credentials, Git credentials, and Wiki.js API tokens in a secrets manager (e.g., Vault, AWS Secrets Manager). -3. **Backup requirements:** How many days of backup retention are required? -4. **User access:** Will this wiki be public read-only, or require authentication? -5. **Storage:** How much content do you anticipate? (affects storage planning) - -### Content Scope -6. 
**Priority domains:** Should we prioritize specific game aspects? (PVP, mining, industry, nullsec, etc.) -7. **Content age:** Should imported content include historical versions, or only current state? -8. **Completeness threshold:** What's an acceptable import percentage? (80% of pages vs. all) - -### Agent Behavior -9. **Validation threshold:** What minimum validation score should be required for auto-approval? (Recommended: 95%) -10. **Conflict resolution:** If multiple sources have conflicting information, which source takes priority? -11. **Update frequency:** How fresh should content be? (real-time, daily, weekly) -12. **Alerting:** How should the system notify on validation failures or errors? - -### Operational -13. **Monitoring access:** Do you have access to the Nginx Proxy Manager instance for SSL/proxy configuration? -14. **Container management:** Will you use Komodo or another container management platform, or manual Docker? -15. **Error handling:** Should the system pause and alert on repeated failures, or continue with skipped items? - -## 10. Next Steps - -Once questions are answered, I can: -1. Provide detailed Docker Compose configuration for Wiki.js with read-only UI and secrets integration -2. Design the LangGraph StateGraph specification (node definitions, edge conditions, state schema) -3. Define the patch-note-to-wiki mapping schema -4. Create the content import runbook for Agent A -5. Implement the standard metadata schema and validation rules -6. Configure LangSmith dashboards for wiki content monitoring \ No newline at end of file diff --git a/wiki-plan-review-blockers.md b/wiki-plan-review-blockers.md deleted file mode 100644 index df71dbb..0000000 --- a/wiki-plan-review-blockers.md +++ /dev/null @@ -1,117 +0,0 @@ -# EVE Online Wiki Plan Review - Implementation Blockers - -Review completed 2026-04-16. Legal issues excluded per request. - ---- - -## ✅ RESOLVED BLOCKERS - -### 1. 
State Schema Definition -- **Status**: RESOLVED -- **Reference**: [State Schema Definition](docs/schema/state-schema.md) - -### 2. Wiki.js API Authentication Flow -- **Status**: RESOLVED -- **Reference**: [Wiki.js API Auth Strategy](docs/infrastructure/wikijs-auth.md) - -### 3. ESI Client Specification -- **Status**: RESOLVED -- **Reference**: [ESI Client Design](docs/infrastructure/esi-client.md) - -### 4. Content Hash Algorithm -- **Status**: RESOLVED -- **Outcome**: SHA-256 standardized. - -### 5. Git Sync Layer Specification -- **Status**: RESOLVED -- **Reference**: [Git Sync Protocol](docs/infrastructure/git-sync.md) - -### 6. Validation Agent Scoring Formula -- **Status**: RESOLVED -- **Reference**: [Quality Control Protocol](docs/validation/quality-control.md) -- **Outcome**: Defined 4-category weighted scoring with "Must Pass" logic. - -### 7. Page Hierarchy Specification -- **Status**: RESOLVED -- **Reference**: [Wiki Page Hierarchy](docs/schema/hierarchy.md) - -### 8. Failure Handling Behavior -- **Status**: RESOLVED -- **Reference**: [Quality Control Protocol](docs/validation/quality-control.md) -- **Outcome**: Established 3-Tier response matrix and feedback loop logic. - -### 11. Rate Limiting Specifications -- **Status**: RESOLVED -- **Reference**: [Quality Control Protocol](docs/validation/quality-control.md) -- **Outcome**: Defined specific req/sec limits for ESI, MediaWiki, and Wiki.js. - ---- - -## 🚨 CRITICAL BLOCKERS (Cannot start implementation without these) - -*No critical blockers remain.* - ---- - -## 🟡 MEDIUM IMPACT ISSUES (Need resolution before Phase 2) - -### 9. Cross-link Generation Logic Missing -- **Location**: Agent A (line 114) -- **Issue**: "Generate cross-links between related pages" has no logic defined -- **Missing**: Link extraction rules, entity matching strategy, and placement rules - -### 10. 
Content Merge Strategy Undefined -- **Location**: Agent D (line 172) -- **Issue**: "Merge structured data with human-readable content" has no strategy -- **Impact**: Cannot resolve conflicts between ESI data and wiki content - -### 11. No Rate Limiting Specifications -- **Location**: All external agents -- **Issue**: Only Agent A specifies rate limiting (1 request/second) -- **Missing**: Rate limits for: - - ESI API - - External wiki scraping - - Wiki.js API writes - - Git operations - -### 12. Page Template Schema Incomplete -- **Location**: Content schema section -- **Missing fields**: - - Ship pages: capacitor, targeting, drone stats marked optional but required for validation - - No schema for skill pages, item pages, or faction pages - ---- - -## 🟢 MINOR ISSUES / CLARIFICATIONS NEEDED - -### 13. LangGraph Checkpoint Implementation -- **Location**: Line 361 -- **Issue**: Mentions both `MemorySaver` and `PostgresSaver` but doesn't specify which to use -- **Recommendation**: Use `PostgresSaver` for production durability - -### 14. Agent Scheduling Mechanism -- **Location**: All agent specifications -- **Issue**: No specification for how scheduled agents will be triggered -- **Options**: Cron jobs, LangGraph timers, external scheduler - -### 15. Asset Storage Strategy -- **Location**: Agent G -- **Issue**: "Download images to local storage" doesn't specify storage backend -- **Options**: Wiki.js asset manager, S3, local filesystem - ---- - -## 📋 RECOMMENDED NEXT STEPS - -To unblock implementation immediately, resolve these **4 critical items first**: - -1. Define the shared `State` schema for LangGraph -2. Specify Wiki.js API authentication strategy -3. Define ESI client implementation requirements -4. Specify content hashing algorithm and storage - -No showstopper architectural issues were identified. The design is sound and follows best practices for agent orchestration with LangGraph. \ No newline at end of file