From 31c3083ff4a066341402e90236fd9e89d88305b7 Mon Sep 17 00:00:00 2001
From: b3nw
Date: Sun, 19 Apr 2026 03:57:03 +0000
Subject: [PATCH] Initial wiki structure

---
 ROADMAP.md                         |  75 +++++
 assets/images/.gitkeep             |   0
 content/.gitkeep                   |   0
 deploy/docker-compose.yml          |  55 +++
 docs/infrastructure/git-sync.md    |  50 +++
 docs/infrastructure/wikijs-auth.md |  36 ++
 docs/schema/hierarchy.md           |  41 +++
 docs/schema/state-schema.md        |  83 +++++
 docs/validation/quality-control.md |  60 ++++
 eve-online-wiki-plan.md            | 524 +++++++++++++++++++++++++++++
 wiki-plan-review-blockers.md       | 117 +++++++
 11 files changed, 1041 insertions(+)
 create mode 100644 ROADMAP.md
 create mode 100644 assets/images/.gitkeep
 create mode 100644 content/.gitkeep
 create mode 100644 deploy/docker-compose.yml
 create mode 100644 docs/infrastructure/git-sync.md
 create mode 100644 docs/infrastructure/wikijs-auth.md
 create mode 100644 docs/schema/hierarchy.md
 create mode 100644 docs/schema/state-schema.md
 create mode 100644 docs/validation/quality-control.md
 create mode 100644 eve-online-wiki-plan.md
 create mode 100644 wiki-plan-review-blockers.md

diff --git a/ROADMAP.md b/ROADMAP.md
new file mode 100644
index 0000000..489a265
--- /dev/null
+++ b/ROADMAP.md
@@ -0,0 +1,75 @@
+# Implementation Roadmap: EVE Online Automated Wiki
+
+This roadmap tracks the progress of the automated wiki system. Status indicators: `[ ]` Todo, `[/]` In-Progress, `[x]` Done, `[!]` Blocked.
+
+---
+
+## Phase 1: Foundation (Current Phase)
+*Goal: Establish the core infrastructure, databases, and sync layers.*
+
+- [ ] **Infrastructure Deployment**
+  - [ ] Deploy PostgreSQL for Wiki.js & LangGraph Checkpointing.
+  - [ ] Deploy Redis for Agent Heartbeats.
+  - [ ] Deploy Wiki.js with read-only UI configuration.
+  - *Verification:* Verify all containers are running and healthy via `docker ps` or Komodo.
+- [ ] **Synchronization & Auth**
+  - [ ] Initialize Git repository with directory structure from `docs/infrastructure/git-sync.md`.
+  - [ ] Configure Wiki.js Git Storage backend.
+  - [ ] Generate and store Wiki.js API token in environment.
+  - *Verification:* Successful API "Hello World" call to Wiki.js.
+- [ ] **LangGraph Initialization**
+  - [ ] Implement `WikiState` Pydantic model (`docs/schema/state-schema.md`).
+  - [ ] Configure `PostgresSaver` for persistent checkpointing.
+  - *Verification:* Run a skeleton graph and verify state persists in Postgres.
+
+---
+
+## Phase 2: Content Pipeline
+*Goal: Implement the primary extraction and publication agents.*
+
+- [ ] **Agent D: ESI Data Collector**
+  - [ ] Implement Swagger-based ESI client with rate limiting (`docs/validation/quality-control.md`).
+  - [ ] Create extraction logic for Ship and Module data.
+- [ ] **Agent A: Source Harvester**
+  - [ ] Implement MediaWiki API client for EVE University.
+  - [ ] Implement Google Sites parser for WCKG.
+- [ ] **Agent E & F: Validation Layer**
+  - [ ] Implement the weighted scoring formula (`docs/validation/quality-control.md`).
+  - [ ] Implement the "Must Pass" numerical validation against ESI.
+- [ ] **Initial Seed Run**
+  - [ ] Execute Agent A for a subset of 50 pages.
+  - [ ] Perform full validation and Git sync.
+  - *Verification:* Verify pages render correctly in Wiki.js with proper hierarchy.
+
+---
+
+## Phase 3: Automated Monitoring & Updates
+*Goal: Enable daily updates and patch note tracking.*
+
+- [ ] **Agent B: Patch Note Monitor**
+  - [ ] Implement RSS polling with browser headers for EVE Online.
+  - [ ] Implement LLM-based diff analysis for patch notes.
+- [ ] **Failure Handling & Alerts**
+  - [ ] Implement Tiered Response Matrix and Webhook alerts.
+  - [ ] Implement the "Correction Request" feedback loop.
+- [ ] **Scheduling**
+  - [ ] Configure daily batch runs at 02:00 UTC.
+
+---
+
+## Phase 4: Expansion & Advanced Features
+*Goal: Handle major game changes and link optimization.*
+
+- [ ] **Cross-link Generation**
+  - [ ] Implement automated entity matching for wiki-linking.
+- [ ] **Expansion Protocol**
+  - [ ] Create "Freeze Mode" logic for major CCP expansions.
+- [ ] **Asset Management**
+  - [ ] Finalize local vs S3 asset storage for images.
+
+---
+
+## Task Manifest (Unallocated)
+- [ ] Define Skill, Item, and Faction schemas.
+- [ ] Implement content merge strategy for multi-source pages.
+- [ ] Document final "Human-in-the-loop" emergency runbook.
diff --git a/assets/images/.gitkeep b/assets/images/.gitkeep
new file mode 100644
index 0000000..e69de29
diff --git a/content/.gitkeep b/content/.gitkeep
new file mode 100644
index 0000000..e69de29
diff --git a/deploy/docker-compose.yml b/deploy/docker-compose.yml
new file mode 100644
index 0000000..da546b3
--- /dev/null
+++ b/deploy/docker-compose.yml
@@ -0,0 +1,55 @@
+services:
+  db:
+    image: postgres:16-alpine
+    environment:
+      POSTGRES_DB: wikijs
+      POSTGRES_PASSWORD: ${DB_PASS:-wikijsrocks}
+      POSTGRES_USER: wikijs
+    healthcheck:
+      test: ["CMD-SHELL", "pg_isready -U wikijs"]
+      interval: 10s
+      timeout: 5s
+      retries: 5
+    logging:
+      driver: "json-file"
+      options:
+        max-size: "10m"
+    restart: unless-stopped
+    volumes:
+      - db-data:/var/lib/postgresql/data
+
+  wiki:
+    image: requarks/wiki:2
+    depends_on:
+      db:
+        condition: service_healthy
+    environment:
+      DB_TYPE: postgres
+      DB_HOST: db
+      DB_PORT: 5432
+      DB_USER: wikijs
+      DB_PASS: ${DB_PASS:-wikijsrocks}
+      DB_NAME: wikijs
+    logging:
+      driver: "json-file"
+      options:
+        max-size: "10m"
+    restart: unless-stopped
+    ports:
+      - "3010:3000"
+
+  redis:
+    image: redis:7-alpine
+    command: redis-server --save 60 1 --loglevel warning
+    healthcheck:
+      test: ["CMD-SHELL", "redis-cli ping | grep PONG"]
+      interval: 10s
+      timeout: 5s
+      retries: 5
+    restart: unless-stopped
+    volumes:
+      - redis-data:/data
+
+volumes:
+  db-data:
+  redis-data:
diff --git a/docs/infrastructure/git-sync.md b/docs/infrastructure/git-sync.md
new file mode 100644
index 0000000..f9fe209
--- /dev/null
+++ b/docs/infrastructure/git-sync.md
@@ -0,0 +1,50 @@
+# Git Sync Protocol & Repository Structure
+
+The Git repository serves as the **Single Source of Truth (SSOT)** for all wiki content. This document defines how data is structured and synchronized.
+
+## 1. Repository Layout
+
+```text
+/
+├── content/          # All Markdown wiki pages
+│   ├── ships/
+│   ├── modules/
+│   └── ...
+├── assets/           # Images, PDFs, and static files
+│   ├── images/
+│   └── ...
+├── schema/           # Shared validation schemas (JSON Schema)
+├── metadata/         # Global metadata (e.g., typeID mapping table)
+│   └── mapping.json
+└── .wikijsignore     # Files to be ignored by Wiki.js sync
+```
+
+## 2. Synchronization Flow
+
+### Primary Write Path (Agent -> Git -> Wiki.js)
+1. **Agent Modification:** An agent (A, B, or C) generates or updates a Markdown file.
+2. **Local Commit:** The agent commits the change to its local clone of the repository.
+3. **Push to Origin:** The agent pushes to the `main` branch.
+4. **Wiki.js Sync:**
+   - Wiki.js is configured with the "Git" storage target.
+   - It pulls changes from the `main` branch at a set interval (default: 5 minutes) or via Webhook.
+   - Wiki.js renders the new Markdown content in the UI.
+
+### Wiki.js to Git (Disabled)
+- **Status:** DISABLED
+- **Rationale:** Since human editing is disabled, there should be no writes originating from the Wiki.js UI. This prevents merge conflicts and ensures the Agent pipeline remains the sole source of content.
+
+## 3. Commit Standards
+
+To ensure a clean audit trail, all commits must follow a Conventional Commits-style format prefixed with an agent identifier:
+
+**Format:** `[AGENT_ID] action: description (hash: source_hash)`
+
+**Examples:**
+- `[AGENT_A] seed: ships/caldari/frigates/condor (hash: sha256_...)`
+- `[AGENT_B] update: ships/amarr/battleships/abaddon (patch: 2026-04-16)`
+- `[AGENT_G] asset: images/ships/condor.png`
+
+## 4. Conflict Resolution
+- **Strategy:** Last-Write-Wins (LWW) based on Git commit timestamp.
+- **Merge Logic:** Automated merges are preferred. If a conflict occurs (rare in agent-only environments), the pipeline will halt and trigger a "System Alert" for manual intervention.
diff --git a/docs/infrastructure/wikijs-auth.md b/docs/infrastructure/wikijs-auth.md
new file mode 100644
index 0000000..470d4e1
--- /dev/null
+++ b/docs/infrastructure/wikijs-auth.md
@@ -0,0 +1,36 @@
+# Wiki.js API Authentication & Security Strategy
+
+## 1. Authentication Method
+- **Token Type:** Permanent API Keys (Bearer Tokens)
+- **Generation:** Generated via the Wiki.js Administration Area -> API Keys
+- **Storage:** Stored as environment variables in the agent runtime environment (e.g., `WIKIJS_API_TOKEN`).
+
+## 2. Permission Scopes
+To maintain security, the API token used by agents will be restricted to the minimum necessary scopes:
+
+| Scope | Requirement | Justification |
+|-------|-------------|---------------|
+| `write:pages` | Mandatory | Allows agents to create and update content |
+| `read:pages` | Mandatory | Allows agents to check existing content before updates |
+| `write:assets` | Mandatory | Allows Agent G to upload images/files |
+| `read:assets` | Mandatory | Allows checking for existing assets |
+| `read:tags` | Optional | Allows metadata tagging |
+| `manage:system` | **Prohibited** | Agents must NOT have administrative system access |
+
+## 3. Token Rotation Policy
+- **Frequency:** Tokens should be rotated every 90 days.
+- **Process:**
+  1. Generate new token in Wiki.js.
+  2. Update environment variable in agent deployment (Komodo/Docker).
+  3. Verify connectivity.
+  4. Revoke old token.
+
+## 4. Write Access Control
+- **Human Editing:** All human accounts in Wiki.js will be assigned to a "Read-Only" group.
+- **Agent Editing:** Only the API account (associated with the token) will have write permissions.
+- **Emergency Bypass:** A single "Admin" account will be maintained for emergency manual intervention, protected by 2FA.
+
+## 5. Security Best Practices
+- **TLS:** All API calls MUST be made over HTTPS.
+- **IP Whitelisting:** If possible, Wiki.js should be configured to only accept API requests from the IP of the agent runner.
+- **Audit Logs:** Enable Wiki.js audit logging to track all changes made via the API token.
diff --git a/docs/schema/hierarchy.md b/docs/schema/hierarchy.md
new file mode 100644
index 0000000..eafa5c0
--- /dev/null
+++ b/docs/schema/hierarchy.md
@@ -0,0 +1,41 @@
+# Wiki Page Hierarchy & URL Structure
+
+To ensure a structured and navigable wiki, all pages must follow this hierarchical pathing and categorization schema.
+
+## 1. Top-Level Categories
+
+| Directory | Content Type | URL Pattern |
+|-----------|--------------|-------------|
+| `ships/` | All ship hulls | `/ships/{race}/{group}/{ship_name}` |
+| `modules/` | All ship modules | `/modules/{category}/{group}/{module_name}` |
+| `mechanics/` | Game mechanics/guides | `/mechanics/{category}/{topic}` |
+| `items/` | General items (ammo, etc.) | `/items/{category}/{item_name}` |
+| `skills/` | Character skills | `/skills/{category}/{skill_name}` |
+| `factions/` | NPC Factions | `/factions/{faction_name}` |
+
+## 2. Detailed Pathing Examples
+
+### Ships
+- **Path:** `/ships/caldari/interceptors/raptor`
+- **Breadcrumb:** Ships > Caldari > Interceptors > Raptor
+
+### Modules
+- **Path:** `/modules/shield/shield-extenders/large-shield-extender-ii`
+- **Breadcrumb:** Modules > Shield > Shield Extenders > Large Shield Extender II
+
+### Mechanics
+- **Path:** `/mechanics/combat/tracking-guide`
+- **Breadcrumb:** Mechanics > Combat > Tracking Guide
+
+## 3. Redirect & Alias Strategy
+- **TypeID Redirects:** Every page backed by an ESI TypeID must also be aliased by that TypeID (e.g., `/id/{type_id}` -> `/ships/caldari/interceptors/raptor`) to allow easy linking from external tools.
+- **Lowercase Enforcement:** All URLs must be strictly lowercase.
+- **Slugification:** Spaces are replaced by hyphens and special characters are removed.
+
+## 4. Metadata Requirement
+Each page MUST include the following metadata in its frontmatter to support the hierarchy:
+```yaml
+path: "ships/caldari/interceptors/raptor"
+parent: "ships/caldari/interceptors"
+order: 10 # Optional: for sorting in lists
+```
diff --git a/docs/schema/state-schema.md b/docs/schema/state-schema.md
new file mode 100644
index 0000000..9443c7b
--- /dev/null
+++ b/docs/schema/state-schema.md
@@ -0,0 +1,83 @@
+# LangGraph State Schema Definition
+
+This document defines the shared state object used by the LangGraph orchestration layer.
+
+## 1. Shared State Object (`WikiState`)
+
+```python
+from typing import List, Optional, Dict, Annotated
+from pydantic import BaseModel, Field
+from datetime import datetime
+
+class ContentSource(BaseModel):
+    name: str  # e.g., "eve-university", "wckg", "esi"
+    url: str
+    content_hash: str
+    extracted_at: datetime
+
+class ValidationResult(BaseModel):
+    category: str  # "structural", "content", "numerical", "cross-reference"
+    passed: bool
+    score: float  # 0.0 to 1.0
+    feedback: Optional[str] = None
+    details: Dict = {}
+
+class PageMetadata(BaseModel):
+    page_type: str  # "ship", "module", "mechanic", "guide"
+    source: str
+    source_url: str
+    imported_date: datetime
+    last_updated: datetime
+    last_validated: datetime
+    update_frequency: str
+    validation_score: float
+    categories: List[str]
+
+class WikiPage(BaseModel):
+    path: str  # e.g., "ships/caldari/frigates/condor"
+    title: str
+    content_markdown: str
+    metadata: PageMetadata
+    frontmatter: Dict
+    assets: List[str]  # List of local asset paths
+
+class WikiState(BaseModel):
+    # Core processing data
+    current_page: Optional[WikiPage] = None
+    proposed_changes: List[WikiPage] = []
+
+    # Pipeline tracking
+    sources: List[ContentSource] = []
+    validation_pipeline_results: List[ValidationResult] = []
+
+    # Control flow
+    retry_count: int = 0
+    max_retries: int = 3
+    is_approved: bool = False
+    error: Optional[str] = None
+
+    # Checkpointing metadata
+    thread_id: str
+    checkpoint_id: Optional[str] = None
+```
+
+## 2. Page Type Schemas
+
+### Ship Schema Overlay
+Extends the numerical validation requirements.
+
+```python
+class ShipData(BaseModel):
+    type_id: int
+    group: str
+    race: str
+    hull_stats: Dict[str, int]
+    fitting_stats: Dict[str, int]
+    velocity: int
+    skill_requirements: List[Dict[str, int]]
+```
+
+## 3. Storage Strategy
+- **State Persistence:** Redis (via `RedisSaver` for LangGraph)
+- **Content Persistence:** Git (Markdown + YAML Frontmatter)
+- **Asset Persistence:** Local filesystem / S3-compatible storage
diff --git a/docs/validation/quality-control.md b/docs/validation/quality-control.md
new file mode 100644
index 0000000..e9df8af
--- /dev/null
+++ b/docs/validation/quality-control.md
@@ -0,0 +1,60 @@
+# Quality Control & Operations Protocol
+
+This document defines the automated decision-making logic for content validation and the system-wide response to operational failures.
+
+## 1. Validation Scoring Formula (Blocker #6)
+
+Agent E (Validation) and Agent F (Numerical) calculate a combined **Confidence Score (0-100%)**. A score of **95% or higher** is required for auto-publication to the `main` branch.
+
+### 1.1 Weighted Categories
+
+| Category | Weight | Must Pass? | Criteria |
+|----------|--------|------------|----------|
+| **Numerical (ESI)** | 40% | **YES** | Stats (HP, Slots, PG/CPU) must match ESI exactly (±0%). |
+| **Structural** | 20% | **YES** | Valid YAML, required fields present, correct URL path. |
+| **Relational** | 20% | NO | Internal links resolve, TypeIDs are valid. |
+| **Semantic** | 20% | NO | Prose description matches structured data intent. |
+
+### 1.2 The "Must Pass" Rule
+If any **"Must Pass"** category fails (score < 100% in that category), the total Confidence Score is immediately set to **0%**, regardless of other categories. This prevents LLM hallucinations from overriding official game data.
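
The weighting in 1.1 and the Must Pass rule in 1.2 can be sketched in Python roughly as follows. This is a minimal illustration, not the Agent E/F implementation. It assumes each category arrives as a fraction from 0.0 to 1.0, the same scale as `ValidationResult.score` in the state schema. The names `WEIGHTS`, `MUST_PASS`, `confidence_score`, and `can_auto_publish` are hypothetical.

```python
# Hypothetical sketch of the Confidence Score from sections 1.1-1.3.
# Category scores are assumed to be fractions in [0.0, 1.0].
WEIGHTS = {"numerical": 0.4, "structural": 0.2, "relational": 0.2, "semantic": 0.2}
MUST_PASS = {"numerical", "structural"}
PUBLISH_THRESHOLD = 95.0  # percent, "95% or higher"


def confidence_score(scores: dict[str, float]) -> float:
    """Return the weighted Confidence Score as a percentage (0-100)."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing category scores: {sorted(missing)}")
    for category, value in scores.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{category} score out of range: {value}")
    # Must Pass rule: any Must Pass category below 100% zeroes the total.
    if any(scores[c] < 1.0 for c in MUST_PASS):
        return 0.0
    total = sum(WEIGHTS[c] * scores[c] for c in WEIGHTS)
    # Round to absorb floating-point noise such as 96.00000000000001.
    return round(total * 100, 2)


def can_auto_publish(scores: dict[str, float]) -> bool:
    """Apply the 95%-or-higher auto-publication threshold."""
    return confidence_score(scores) >= PUBLISH_THRESHOLD
```

Suppose both Must Pass categories score 1.0 and Relational and Semantic each score 0.9. The score is then 96.0 and the page qualifies for auto-publication. Dropping Numerical to 0.99 returns 0.0.

Whenever the Must Pass categories pass, they contribute a fixed 60%. So the 95% threshold is reached only when Relational and Semantic sum to at least 1.75.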
+
+### 1.3 Scoring Logic
+`Total Score = (ESI * 0.4) + (Struct * 0.2) + (Relat * 0.2) + (Semant * 0.2)`
+
+---
+
+## 2. Failure Handling Matrix (Blocker #8)
+
+The system distinguishes between transient infrastructure issues and content quality failures.
+
+### 2.1 Tiered Response Matrix
+
+| Error Type | Tier | Initial Action | Max Retries | Escalation |
+|------------|------|----------------|-------------|------------|
+| **API Timeout / 5xx** | 1 | Exponential Backoff (1m, 5m, 15m) | 3 | Tier 3 Alert |
+| **Validation Fail (70-94%)** | 2 | Regeneration with Error Feedback | 2 | Tier 3 Alert |
+| **Validation Fail (<70%)** | 2 | Immediate Rejection | 1 | Tier 3 Alert |
+| **Auth Failure / 401** | 3 | Halt Pipeline | 0 | **Critical System Alert** |
+| **Git Conflict** | 3 | Halt Pipeline | 0 | **Critical System Alert** |
+
+### 2.2 Feedback Loop (Tier 2)
+When a page fails validation with a score of 70-94%, Agent E sends a "Correction Request" back to the source agent (A, B, or C) containing:
+1. The specific field that failed.
+2. The expected value (from ESI or Schema).
+3. The current value produced.
+
+### 2.3 Critical System Alerting
+Tier 3 failures trigger a webhook notification to the system administrator. The LangGraph state is persisted as a "Suspended" checkpoint, which can be inspected manually (e.g., through LangSmith traces) and then resumed.
+
+---
+
+## 3. Rate Limiting Policy (Blocker #11)
+
+To ensure stability and prevent IP bans, the following global limits are enforced at the transport layer:
+
+| Target | Rate Limit | Burst |
+|--------|------------|-------|
+| **ESI API** | 20 req / sec | 50 |
+| **MediaWiki API** | 2 req / sec | 5 |
+| **Wiki.js API** | 5 req / sec | 10 |
+| **LLM APIs** | Per Provider Tier | N/A |
diff --git a/eve-online-wiki-plan.md b/eve-online-wiki-plan.md
new file mode 100644
index 0000000..a717ab1
--- /dev/null
+++ b/eve-online-wiki-plan.md
@@ -0,0 +1,524 @@
+# EVE Online Automated Wiki System - High Level Plan
+
+## 1. Wiki Software Recommendation: Wiki.js
+
+**Why Wiki.js:**
+- Modern, open-source (AGPL-3.0), actively maintained
+- First-class Docker support with official image
+- GraphQL API built-in for automated content updates
+- Markdown-based editing (great for AI-generated content)
+- Git-based storage option for complete version control
+- Built-in search, analytics, and access controls
+- Lightweight (~200MB RAM)
+- **Perfect for agent-only workflow:** Can disable all human editing entirely while retaining API write access
+
+**Alternatives considered:**
+
+| Software | Pros | Cons |
+|----------|------|------|
+| MediaWiki | Industry standard, massive extension ecosystem | Heavy, PHP, API is less developer-friendly |
+| DokuWiki | Flat file, extremely simple | Only an XML-RPC remote API, dated interface |
+| BookStack | Structured organization | Less suited for interconnected knowledge |
+| Wiki.js | Modern API, Git sync, Docker-native, read-only UI support | Younger project, smaller community |
+
+---
+
+## 2. System Architecture
+
+```
+┌─────────────────────────────────────────────────────────────────────────────┐
+│                           EVE Online Wiki System                            │
+├─────────────────────────────────────────────────────────────────────────────┤
+│                                                                             │
+│  ┌───────────────────────────────────────────────────────────────────────┐  │
+│  │                     LangGraph Orchestration Layer                     │  │
+│  │  ┌─────────────────────────────────────────────────────────────────┐  │  │
+│  │  │                     StateGraph (Main Graph)                     │  │  │
+│  │  │                                                                 │  │  │
+│  │  │      ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐      │  │  │
+│  │  │      │ Source  │   │  Patch  │   │External │   │   ESI   │      │  │  │
+│  │  │      │Harvester│   │ Monitor │   │ Monitor │   │Collector│      │  │  │
+│  │  │      │  Node   │   │  Node   │   │  Node   │   │  Node   │      │  │  │
+│  │  │      └────┬────┘   └────┬────┘   └────┬────┘   └────┬────┘      │  │  │
+│  │  │           └─────────────┴──────┬──────┴─────────────┘           │  │  │
+│  │  │                                │                                │  │  │
+│  │  │                       ┌────────▼────────┐                       │  │  │
+│  │  │                       │   Validation    │                       │  │  │
+│  │  │                       │    Subgraph     │                       │  │  │
+│  │  │                       │   (E → F → G)   │                       │  │  │
+│  │  │                       └────────┬────────┘                       │  │  │
+│  │  │                                │                                │  │  │
+│  │  │                       ┌────────▼────────┐                       │  │  │
+│  │  │                       │    Git Sync     │                       │  │  │
+│  │  │                       │      Node       │                       │  │  │
+│  │  │                       └────────┬────────┘                       │  │  │
+│  │  │                                │                                │  │  │
+│  │  │                       ┌────────▼────────┐                       │  │  │
+│  │  │                       │   Wiki.js API   │                       │  │  │
+│  │  │                       │      Node       │                       │  │  │
+│  │  │                       └─────────────────┘                       │  │  │
+│  │  └─────────────────────────────────────────────────────────────────┘  │  │
+│  │                                                                       │  │
+│  │  LangGraph Features:                                                  │  │
+│  │  • Checkpointing: Durable state persistence across crashes            │  │
+│  │  • Conditional Edges: Dynamic routing based on validation results     │  │
+│  │  • Subgraphs: Nested validation pipeline (E→F→G) as a single node     │  │
+│  │  • Streaming: Real-time token output from LLM agents                  │  │
+│  │  • LangSmith: Built-in observability and tracing                      │  │
+│  └───────────────────────────────────────────────────────────────────────┘  │
+│                                                                             │
+└─────────────────────────────────────────────────────────────────────────────┘
+```
+
+**Infrastructure Stack:**
+- **Wiki.js** - Content management (read-only UI)
+- **PostgreSQL** - Primary database for Wiki.js (production-grade, required)
+- **Redis** - State persistence backend for LangGraph checkpointing
+- **LangGraph** - Agent orchestration framework (MIT licensed, built on LangChain)
+- **LangSmith** - Observability platform for tracing agent executions
+- **Git** - Content storage backend for version control
+
+**Architecture Principles:**
+- All content flows top-to-bottom, no exceptions
+- 100% agent-only editing - no human accounts have write access
+- All changes go through the full validation pipeline before publication
+- Complete audit trail maintained in Git storage backend
+- Immutable content sources are isolated from the publication layer
+- Git sync acts as the single source of truth for all content
+- LangGraph maintains checkpoint state and full audit log via LangSmith traces
+- No direct API writes from edge agents - all writes go through the pipeline
+- Daily batch updates for prose/content (not continuous) - scheduled at 02:00 UTC
+- ESI structured data updates run on an independent schedule (hourly for static data, daily for dynamic)
+
+---
+
+## 3. Agent Specifications
+
+### Agent A: Initial Wiki Construction
+
+**Purpose:** Seed the wiki with existing content from source sites.
+
+**Inputs:**
+- Source URLs (EVE University wiki, WCKG, CCP Support, ESI API)
+- Content scope (what categories/sections to import)
+- Deduplication and merge strategy
+
+**Process:**
+1. **Source Extraction:**
+   - **EVE University:** Utilize MediaWiki `api.php` for structured content retrieval (avoids HTML scraping issues).
+   - **WCKG:** Specialized Google Sites parser for dynamic content rendering.
+   - **CCP Support:** Content extraction with browser headers to bypass Cloudflare challenges.
+2. Extract structured content (ships, modules, mechanics)
+3. Normalize content format to Markdown
+4. Extract all images and references
+5. Compute content hash (SHA-256) for each page and skip if unchanged since last import
+6. Create wiki pages with proper hierarchy (see [Page Hierarchy Strategy](docs/schema/hierarchy.md))
+7. Tag pages with complete metadata
+8. Generate cross-links between related pages
+9. Pass all content to Validation Agent before publication
+
+**Scheduling:** One-time run (with option to replay)
+
+---
+
+### Agent B: Patch Note Monitor
+
+**Purpose:** Detect EVE Online patch changes and update affected wiki pages.
+
+**Inputs:**
+- RSS feed: `https://www.eveonline.com/rss/patch-notes` (Verified accessible via GET)
+- Update frequency (recommended: daily)
+- Affected page mapping (which pages relate to which game systems)
+
+**Process:**
+1. Poll RSS feed on schedule (Standard RSS 2.0 parsing)
+2. Parse new patch entries using LLM content analysis
+3. Identify *exact* content changes required for affected pages
+4. Generate complete revised page content rather than appending sections
+5. Pass proposed changes to Validation Agent
+6. If validation passes, apply update automatically
+7. If validation fails, retry generation or flag for system alert
+
+---
+
+## 3.5 Infrastructure Protocols
+
+### Git Sync Protocol
+- **Single Source of Truth:** Git repository acts as the primary storage.
+- **One-way Sync:**
+  - Agent Write -> Git Commit -> Wiki.js Pull
+  - Wiki.js renders directly from the Git-backed storage.
+- **Repository Structure:**
+  - `/content`: Markdown files mapping to wiki paths (e.g., `content/ships/caldari/frigates/condor.md`)
+  - `/assets`: Images and files mapping to local paths.
+- **Commit Format:** `[AGENT_ID] update: path/to/page (hash: abc123)`
+
+### API Authentication
+- **Strategy:** Bearer tokens with minimum scopes (`write:pages`, `write:assets`).
+- **Storage:** Managed via environment variables.
+- See [Wiki.js API Auth Strategy](docs/infrastructure/wikijs-auth.md) for details.
+
+### Shared State
+- **Schema:** Managed via LangGraph `WikiState` Pydantic model.
+- See [State Schema Definition](docs/schema/state-schema.md) for details.
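
The commit format above and Agent A's SHA-256 skip-if-unchanged step could be handled by a small helper like the following. This is a sketch under several assumptions:

- Agent IDs are `AGENT_` followed by one capital letter.
- The trailing `(key: value)` group is optional, as in the `[AGENT_G] asset: ...` example in `git-sync.md`.
- The hash in the subject is shortened to 12 hex characters purely for readability.
- All function names are hypothetical.

```python
import hashlib
import re
from typing import NamedTuple, Optional

# Assumed grammar: "[AGENT_X] action: path" plus an optional " (key: value)" suffix.
COMMIT_RE = re.compile(
    r"^\[(?P<agent>AGENT_[A-Z])\] (?P<action>[a-z]+): (?P<path>\S+)"
    r"(?: \((?P<key>[a-z]+): (?P<value>[^)]+)\))?$"
)


class CommitSubject(NamedTuple):
    agent: str
    action: str
    path: str
    key: Optional[str]
    value: Optional[str]


def content_hash(markdown: str) -> str:
    """Full SHA-256 hex digest of a page body."""
    return hashlib.sha256(markdown.encode("utf-8")).hexdigest()


def is_unchanged(markdown: str, previous_hash: Optional[str]) -> bool:
    """True when the page body matches the hash recorded at the last import."""
    return previous_hash is not None and content_hash(markdown) == previous_hash


def format_commit(agent: str, action: str, path: str, markdown: str) -> str:
    # Assumption: a 12-character prefix is enough to identify the source hash.
    return f"[{agent}] {action}: {path} (hash: {content_hash(markdown)[:12]})"


def parse_commit(subject: str) -> Optional[CommitSubject]:
    """Split a commit subject into its parts, or return None if it is malformed."""
    match = COMMIT_RE.match(subject)
    return CommitSubject(**match.groupdict()) if match else None
```

The full digest is what `is_unchanged` compares against, so the shortened form only affects the human-readable commit subject.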
+
+---
+
+### Agent C: External Wiki Monitor
+
+**Purpose:** Track changes on source wikis and refresh content.
+
+**Inputs:**
+- Monitored URLs and change detection rules
+- Check frequency (recommended: weekly)
+
+**Process:**
+1. Poll source sites on schedule, respecting robots.txt
+2. Detect new pages or modified content
+3. Compare against imported content hash in local wiki
+4. Ignore minor formatting/link changes
+5. Generate revised page content with merged changes
+6. Pass proposed changes to Validation Agent
+7. Apply update automatically on validation pass
+
+---
+
+### Agent D: ESI Data Collector
+
+**Purpose:** Pull official structured data directly from CCP's ESI API.
+
+**Inputs:**
+- ESI API endpoints for ships, modules, items, skills
+- Update frequency: hourly for static data (independent of daily batch), daily for dynamic data (part of 02:00 UTC batch)
+
+**Process:**
+1. Poll ESI API on schedule with proper rate limiting
+2. Extract structured game data
+3. Generate or update data-driven pages automatically
+4. Merge structured data with human-readable content from other sources
+5. Compute content hash and skip if unchanged since last poll
+6. Pass proposed changes to Validation Agent
+
+---
+
+### Agent E: Content Validation & Review Agent
+
+**Purpose:** Automated quality control for all content changes. **Replaces all human review.**
+
+**Validation Rules:**
+1. **Structural validation:** Markdown syntax, page hierarchy, metadata presence
+2. **Content validation:** Factual consistency, no broken references, completeness
+3. **Change validation:** Diff analysis, only expected changes applied, no unintended modifications
+4. **Cross-reference validation:** All internal links resolve correctly
+5. **TOS compliance:** Proper attribution included for all imported content
+
+**Process:**
+1. Receive proposed change from upstream agent
+2. Run all validation checks
+3. Generate confidence score (0-100%)
+4. If score is 95% or higher: Approve for publication
+5. If score is 70-94%: Request regeneration with feedback
+6. If score is below 70%: Reject change and generate system alert
+
+---
+
+### Agent F: Numerical Validation Layer
+
+**Purpose:** Rule-based validation for game data, separate from LLM validation. Catches LLM hallucinations in structured data.
+
+**Validation Categories:**
+
+| Data Type | Validation Rules | Source of Truth |
+|-----------|------------------|-----------------|
+| Ship stats | Base HP, velocity, slots, fitting stats within ±0% of ESI | ESI API |
+| Module stats | CPU, PG, range, damage multipliers | ESI API |
+| Skill requirements | Prerequisites match skill tree | ESI API |
+| Fitting calculations | Must pass CPU/PG budget checks | Local calculation |
+| Market data | Prices non-negative, volume non-negative | ESI API |
+| Links/IDs | All typeIDs resolve to valid entities | ESI API lookup |
+
+**Process:**
+1. Extract all numerical values from proposed content
+2. Cross-reference against ESI API for game data
+3. Flag any discrepancies >0% for ship/module stats
+4. Reject content with invalid typeIDs or broken references
+5. Log all validation results for audit
+
+**Override Rules:**
+- Numerical validation failures = auto-reject (no LLM override possible)
+- Historical content (archived ships/modules) flagged for manual review
+
+---
+
+### Agent G: Asset & Reference Handler
+
+**Purpose:** Centralized management of all images, links, and external references.
+
+**Process:**
+1. Receive all extracted images and references from other agents
+2. Download images to local storage (respecting copyright/attribution)
+3. Rewrite all image URLs to local wiki paths
+4. Rewrite all external links to reference original source
+5. Add source attribution footer to all pages
+6. Check for broken links on every update
+7. Maintain asset integrity across all pages
+
+---
+
+## 4. Model Intelligence Tiers
+
+Different agents require different levels of LLM reasoning. Using appropriate models reduces cost and improves reliability.
+
+| Agent | Tier | Model Requirements | Justification |
+|-------|------|--------------------|---------------|
+| A: Source Harvester | Low | Basic extraction model (e.g., GPT-4o-mini, Claude Haiku) | Template-based extraction, structured output format |
+| B: Patch Note Monitor | High | Strong reasoning model (e.g., Claude Sonnet, GPT-4o) | Requires understanding game mechanics to map changes to pages |
+| C: External Wiki Monitor | Low | Basic extraction model | Simple change detection and content extraction |
+| D: ESI Data Collector | None | No LLM needed | Pure API calls, structured data, programmatic transformation |
+| E: Content Validation | Medium | Balanced model (e.g., Claude Sonnet) | Needs semantic understanding but structured validation rules |
+| F: Numerical Validation | None | No LLM needed | Pure rule-based, deterministic validation |
+| G: Asset Handler | Low | Basic model for categorization | Mostly file operations, minimal reasoning |
+
+**Recommended Model Stack:**
+- **High reasoning:** Claude 3.5 Sonnet / GPT-4o (Agents B, E)
+- **Low cost:** Claude 3.5 Haiku / GPT-4o-mini (Agents A, C, G)
+- **No LLM:** Agents D, F (programmatic only)
+
+**Daily Batch Cost Estimate:**
+With daily updates (not continuous), typical daily operations:
+- ~10-20 patch note analyses (Agent B): ~$0.10-0.30
+- ~50-100 content validations (Agent E): ~$0.50-1.00
+- ~100-200 extractions (Agents A, C, G): ~$0.10-0.20
+- **Daily total: ~$0.70-1.50** (note: subscription tiers have rate limits and token caps; bulk operations like initial import may temporarily exceed these, requiring pay-per-use fallback)
+
+---
+
+## 5. Content Schema Per Page Type
+
+Each page type has a template with required fields that Agent E validates structurally before semantic validation.
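
The structural pass that runs before semantic validation could be approximated as below. This is a hedged sketch, not Agent E's actual code, and it makes these assumptions:

- `REQUIRED_FIELDS` is a hypothetical, hand-copied subset of the Module and Mechanic/Guide templates in this section. The nested Ship fields are left out for brevity.
- Frontmatter has already been parsed from YAML into a dict.
- `last_reviewed` is a `datetime.date`, which is what a loader such as PyYAML produces for an unquoted `2026-04-16`.

```python
from datetime import date
from typing import Any

# Hypothetical subset of the page templates: field name -> expected Python type.
REQUIRED_FIELDS: dict[str, dict[str, type]] = {
    "module": {
        "name": str,
        "type_id": int,
        "group": str,
        "slot": str,
        "cpu_usage": int,
        "powergrid_usage": int,
        "description": str,
    },
    "mechanic": {
        "title": str,
        "summary": str,
        "categories": list,
        "source": str,
        "last_reviewed": date,
    },
}


def structural_errors(frontmatter: dict[str, Any]) -> list[str]:
    """Return a list of structural problems; an empty list means the check passed."""
    page_type = frontmatter.get("page_type")
    template = REQUIRED_FIELDS.get(page_type)
    if template is None:
        # Reject page types that match no defined template.
        return [f"unknown page_type: {page_type!r}"]

    errors = []
    for field, expected in template.items():
        value = frontmatter.get(field)
        if value is None or value == "" or value == []:
            # Required fields must be present and non-empty.
            errors.append(f"missing or empty field: {field}")
        elif expected is int and (isinstance(value, bool) or not isinstance(value, int)):
            # Integers only; bool is a subclass of int in Python, so exclude it.
            errors.append(f"{field} must be an integer, got {type(value).__name__}")
        elif not isinstance(value, expected):
            errors.append(f"{field} must be {expected.__name__}, got {type(value).__name__}")
    return errors
```

The checks that `type_id` and skill typeIDs resolve against ESI need live lookups and belong to Agent F, so they are not modelled here.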
+ +### Ship Page Template +```yaml +page_type: ship +required_fields: + - name: string + - type_id: integer (ESI) + - group: string (e.g., "Interceptor", "Battleship") + - race: string (Caldari, Minmatar, Amarr, Gallente) + - hull_stats: + hp_shield: integer + hp_armor: integer + hp_structure: integer + - fitting_stats: + cpu_output: integer + powergrid_output: integer + high_slots: integer + med_slots: integer + low_slots: integer + rig_slots: integer + - velocity: integer + - skill_requirements: list[{skill_id: integer, level: integer}] + - description: string (prose, sourced from external wiki or generated) +optional_fields: + - role_bonus: string + - ship_bonus: list[string] + - capacitor_capacity: integer + - targeting_range: integer + - drone_bandwidth: integer + - probe_launcher_fitting: boolean +``` + +### Module Page Template +```yaml +page_type: module +required_fields: + - name: string + - type_id: integer (ESI) + - group: string (e.g., "Shield Booster", "Afterburner") + - slot: string (high, mid, low, rig) + - cpu_usage: integer + - powergrid_usage: integer + - description: string +optional_fields: + - duration: integer + - range: integer + - damage_multiplier: float + - skill_requirements: list[{skill_id: integer, level: integer}] + - meta_level: integer + - tech_level: integer (1 or 2) +``` + +### Mechanic/Guide Page Template +```yaml +page_type: mechanic +required_fields: + - title: string + - summary: string (1-3 sentences) + - categories: list[string] + - source: string (eve-university | wckg | ccp | generated) + - last_reviewed: date +optional_fields: + - related_ships: list[string] + - related_modules: list[string] + - related_mechanics: list[string] + - difficulty: string (beginner | intermediate | advanced) +``` + +### Validation Against Schema +Agent E enforces: +1. All `required_fields` present and non-empty +2. All integer fields contain valid integers (no strings, no nulls) +3. 
All `type_id` fields pass Agent F numerical validation against ESI +4. All `skill_requirements` reference valid typeIDs +5. Page type matches one of the defined templates (reject unknown types) + +--- + +## 6. Agent Health Monitoring + +LangGraph provides built-in checkpointing for state persistence, but agent-level health monitoring requires a separate heartbeat system. + +**Heartbeat Protocol:** +- Each LangGraph node (agent) sends a `HEARTBEAT` message to Redis every 60 seconds during active operation, every 5 minutes when idle +- Heartbeat payload: `{ node_name, status: healthy|degraded|error, thread_id, last_completed_at, checkpoint_id }` +- Heartbeat registry uses Redis with TTL (3x interval for stale, 10x for dead) + +**LangGraph Checkpointing:** +- LangGraph's `MemorySaver` or `PostgresSaver` persists graph state at each step +- Workflows resume from the last saved checkpoint after crashes +- Checkpoint TTL configurable (24-48 hours for batch workflows, session-based for conversational) + +**Staleness Detection:** +- If no heartbeat is received within 3x the expected interval → mark agent as `stale` +- If no heartbeat is received within 10x the expected interval → mark agent as `dead` and trigger critical alert +- Stale nodes: LangGraph checkpoint indicates last state, new invocations wait for recovery +- Dead nodes: halt dependent pipeline stages, escalate alert + +**LangSmith Integration:** +- Every LLM call, tool invocation, and state transition emits traces to LangSmith +- Query LangSmith audit logs for execution history, latency, and token usage +- Alerts configured via LangSmith webhooks for validation failures + +**Alerting:** +- Agent status transitions emit events to the audit log +- Critical alerts (dead node, repeated validation failures, checkpoint gaps > threshold) notify via configured channel (webhook, email, etc.) + +--- + +## 7.
Standard Page Metadata + +All pages will include standard frontmatter: +```yaml +source: eve-university | wckg | ccp | esi | generated +source_url: https://... +imported_date: 2026-04-16 +last_updated: 2026-04-16 +last_validated: 2026-04-16 +update_frequency: daily | weekly | monthly +validation_score: 98 +categories: [ships, pvp, modules, industry] +``` + +--- + +## 8. Implementation Phases + +### Phase 0: Pre-Work & Compliance +- Confirm scraping TOS with source wiki maintainers +- Implement rate limiting and proper User-Agent headers +- Define metadata schema and validation rules +- Test content extraction on sample pages + +### Phase 1: Foundation +- Deploy PostgreSQL database via Docker (production configuration) +- Deploy Redis instance for LangGraph checkpointing + heartbeat registry +- Deploy Wiki.js via Docker (connected to PostgreSQL) +- **Disable all human write permissions** - configure API-only write access +- Configure Git storage backend for complete change history +- Configure Git sync layer as single source of truth +- Set up HTTPS and domain routing +- Establish automated backup strategy +- Deploy LangGraph with `StateGraph` defining all agent nodes and edges +- Configure LangSmith for observability (tracing, audit logs) +- Deploy agent heartbeat monitoring (Redis TTL registry) + +### Phase 2: Content Pipeline +- Deploy Source Harvester Agent +- Deploy Validation Agent +- Deploy Asset Handler Agent +- Deploy ESI Data Collector Agent +- Execute initial import with full validation pipeline +- Establish content quality baseline + +### Phase 2.5: Smoke Test +- Run Agent A on 50 representative pages across all page types (ships, modules, mechanics, guides) +- Pass all 50 pages through the full validation pipeline (Agents E + F + G) +- Calibrate validation thresholds based on results (adjust confidence scoring weights) +- Verify merge logic when ESI data and external wiki content overlap on same pages +- Confirm Git sync round-trip: write → Git → 
Wiki.js render matches expected output +- Identify and fix integration bugs before full import +- Document baseline validation pass rate and failure patterns + +### Phase 3: Automated Monitoring +- Deploy Patch Note Monitor Agent +- Implement LLM-based patch parsing and content generation +- Configure validation thresholds +- Test end-to-end update workflow + +### Phase 4: External Change Tracking +- Deploy External Wiki Monitor Agent +- Configure source site monitoring +- Implement change detection and merge logic +- Set up system alerting for failures + +### Phase 5: Major Expansion Handling +- Create expansion detection webhook (CCP announces expansions 2-4 weeks ahead) +- Build bulk update workflow for expansion releases +- Implement "freeze" mode during expansion deployment (content locked until ESI stabilizes) +- Create post-expansion audit job to verify all affected pages +- Document expansion runbook for manual triggering + +**Expansion Workflow:** +1. Expansion announced → Create tracking ticket +2. Expansion deploys → Freeze wiki updates, wait for ESI stability (typically 24-48h) +3. Run bulk ESI sync → Update all ship/module/item pages +4. Run Patch Note Agent → Process expansion notes, generate new pages +5. Run full validation → All pages validated against new ESI data +6. Unfreeze → Resume daily batch updates + +--- + +## 9. Validation Questions + +### Wiki Infrastructure +1. **Hosting requirements:** What server/container host will run this? (RAM/CPU allocation) +2. **Access & secrets management:** Plan for storing ESI credentials, Git credentials, and Wiki.js API tokens in a secrets manager (e.g., Vault, AWS Secrets Manager). +3. **Backup requirements:** How many days of backup retention are required? +4. **User access:** Will this wiki be public read-only, or require authentication? +5. **Storage:** How much content do you anticipate? (affects storage planning) + +### Content Scope +6.
**Priority domains:** Should we prioritize specific game aspects? (PVP, mining, industry, nullsec, etc.) +7. **Content age:** Should imported content include historical versions, or only current state? +8. **Completeness threshold:** What's an acceptable import percentage? (80% of pages vs. all) + +### Agent Behavior +9. **Validation threshold:** What minimum validation score should be required for auto-approval? (Recommended: 95%) +10. **Conflict resolution:** If multiple sources have conflicting information, which source takes priority? +11. **Update frequency:** How fresh should content be? (real-time, daily, weekly) +12. **Alerting:** How should the system notify on validation failures or errors? + +### Operational +13. **Monitoring access:** Do you have access to the Nginx Proxy Manager instance for SSL/proxy configuration? +14. **Container management:** Will you use Komodo or another container management platform, or manual Docker? +15. **Error handling:** Should the system pause and alert on repeated failures, or continue with skipped items? + +## 10. Next Steps + +Once questions are answered, I can: +1. Provide detailed Docker Compose configuration for Wiki.js with read-only UI and secrets integration +2. Design the LangGraph StateGraph specification (node definitions, edge conditions, state schema) +3. Define the patch-note-to-wiki mapping schema +4. Create the content import runbook for Agent A +5. Implement the standard metadata schema and validation rules +6. Configure LangSmith dashboards for wiki content monitoring \ No newline at end of file diff --git a/wiki-plan-review-blockers.md b/wiki-plan-review-blockers.md new file mode 100644 index 0000000..df71dbb --- /dev/null +++ b/wiki-plan-review-blockers.md @@ -0,0 +1,117 @@ +# EVE Online Wiki Plan Review - Implementation Blockers + +Review completed 2026-04-16. Legal issues excluded per request. + +--- + +## ✅ RESOLVED BLOCKERS + +### 1.
State Schema Definition +- **Status**: RESOLVED +- **Reference**: [State Schema Definition](docs/schema/state-schema.md) + +### 2. Wiki.js API Authentication Flow +- **Status**: RESOLVED +- **Reference**: [Wiki.js API Auth Strategy](docs/infrastructure/wikijs-auth.md) + +### 3. ESI Client Specification +- **Status**: RESOLVED +- **Reference**: [ESI Client Design](docs/infrastructure/esi-client.md) + +### 4. Content Hash Algorithm +- **Status**: RESOLVED +- **Outcome**: SHA-256 standardized. + +### 5. Git Sync Layer Specification +- **Status**: RESOLVED +- **Reference**: [Git Sync Protocol](docs/infrastructure/git-sync.md) + +### 6. Validation Agent Scoring Formula +- **Status**: RESOLVED +- **Reference**: [Quality Control Protocol](docs/validation/quality-control.md) +- **Outcome**: Defined 4-category weighted scoring with "Must Pass" logic. + +### 7. Page Hierarchy Specification +- **Status**: RESOLVED +- **Reference**: [Wiki Page Hierarchy](docs/schema/hierarchy.md) + +### 8. Failure Handling Behavior +- **Status**: RESOLVED +- **Reference**: [Quality Control Protocol](docs/validation/quality-control.md) +- **Outcome**: Established 3-Tier response matrix and feedback loop logic. + +### 11. Rate Limiting Specifications +- **Status**: RESOLVED +- **Reference**: [Quality Control Protocol](docs/validation/quality-control.md) +- **Outcome**: Defined specific req/sec limits for ESI, MediaWiki, and Wiki.js. + +--- + +## 🚨 CRITICAL BLOCKERS (Cannot start implementation without these) + +*No critical blockers remain.* + +--- + +## 🟡 MEDIUM IMPACT ISSUES (Need resolution before Phase 2) + +--- + +## 🟡 MEDIUM IMPACT ISSUES (Need resolution before Phase 2) + +### 9. Cross-link Generation Logic Missing +- **Location**: Agent A (line 114) +- **Issue**: "Generate cross-links between related pages" has no logic defined +- **Missing**: Link extraction rules, entity matching strategy, and placement rules + +### 10. 
Content Merge Strategy Undefined +- **Location**: Agent D (line 172) +- **Issue**: "Merge structured data with human-readable content" has no strategy +- **Impact**: Cannot resolve conflicts between ESI data and wiki content + +### 11. No Rate Limiting Specifications +- **Location**: All external agents +- **Issue**: Only Agent A specifies rate limiting (1 request/second) +- **Missing**: Rate limits for: + - ESI API + - External wiki scraping + - Wiki.js API writes + - Git operations + +### 12. Page Template Schema Incomplete +- **Location**: Content schema section +- **Missing fields**: + - Ship pages: capacitor, targeting, drone stats marked optional but required for validation + - No schema for skill pages, item pages, or faction pages + +--- + +## 🟢 MINOR ISSUES / CLARIFICATIONS NEEDED + +### 13. LangGraph Checkpoint Implementation +- **Location**: Line 361 +- **Issue**: Mentions both `MemorySaver` and `PostgresSaver` but doesn't specify which to use +- **Recommendation**: Use `PostgresSaver` for production durability + +### 14. Agent Scheduling Mechanism +- **Location**: All agent specifications +- **Issue**: No specification for how scheduled agents will be triggered +- **Options**: Cron jobs, LangGraph timers, external scheduler + +### 15. Asset Storage Strategy +- **Location**: Agent G +- **Issue**: "Download images to local storage" doesn't specify storage backend +- **Options**: Wiki.js asset manager, S3, local filesystem + +--- + +## 📋 RECOMMENDED NEXT STEPS + +To unblock implementation immediately, resolve these **4 critical items first**: + +1. Define the shared `State` schema for LangGraph +2. Specify Wiki.js API authentication strategy +3. Define ESI client implementation requirements +4. Specify content hashing algorithm and storage + +No showstopper architectural issues were identified. The design is sound and follows best practices for agent orchestration with LangGraph. \ No newline at end of file
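The content-hashing item, resolved above as SHA-256, lends itself to a minimal change-detection sketch in Python. The normalization rules here are assumptions for illustration and are not specified by the plan: NFC Unicode, LF line endings, trailing whitespace stripped, and a single final newline. `content_hash` and `has_changed` are hypothetical helpers. Where the stored hash lives is deliberately left open.

```python
import hashlib
import unicodedata
from typing import Optional


def content_hash(markdown: str) -> str:
    """SHA-256 hex digest of page content after light normalization."""
    # Assumed normalization so cosmetic differences do not register as edits:
    # NFC Unicode form, LF line endings, trailing whitespace stripped per line,
    # leading/trailing blank lines dropped, and a single final newline.
    text = unicodedata.normalize("NFC", markdown)
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    lines = [line.rstrip() for line in text.split("\n")]
    normalized = "\n".join(lines).strip("\n") + "\n"
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_changed(new_markdown: str, stored_hash: Optional[str]) -> bool:
    """True when no hash is stored yet or the new content hashes differently."""
    return stored_hash is None or content_hash(new_markdown) != stored_hash
```

Normalizing before hashing means a source page re-saved with CRLF line endings or trailing spaces does not trigger a spurious update and validation run.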