Initial wiki structure

2026-04-19 03:57:03 +00:00
commit 31c3083ff4
11 changed files with 1041 additions and 0 deletions

75
ROADMAP.md Normal file

@@ -0,0 +1,75 @@
# Implementation Roadmap: EVE Online Automated Wiki
This roadmap tracks the progress of the automated wiki system. Status indicators: `[ ]` Todo, `[/]` In-Progress, `[x]` Done, `[!]` Blocked.
---
## Phase 1: Foundation (Current Phase)
*Goal: Establish the core infrastructure, databases, and sync layers.*
- [ ] **Infrastructure Deployment**
  - [ ] Deploy PostgreSQL for Wiki.js & LangGraph Checkpointing.
  - [ ] Deploy Redis for Agent Heartbeats.
  - [ ] Deploy Wiki.js with read-only UI configuration.
  - *Verification:* Verify all containers are running and healthy via `docker ps` or Komodo.
- [ ] **Synchronization & Auth**
  - [ ] Initialize Git repository with directory structure from `docs/infrastructure/git-sync.md`.
  - [ ] Configure Wiki.js Git Storage backend.
  - [ ] Generate and store Wiki.js API token in environment.
  - *Verification:* Successful API "Hello World" call to Wiki.js.
- [ ] **LangGraph Initialization**
  - [ ] Implement `WikiState` Pydantic model (`docs/schema/state-schema.md`).
  - [ ] Configure `PostgresSaver` for persistent checkpointing.
  - *Verification:* Run a skeleton graph and verify state persists in Postgres.
---
## Phase 2: Content Pipeline
*Goal: Implement the primary extraction and publication agents.*
- [ ] **Agent D: ESI Data Collector**
  - [ ] Implement Swagger-based ESI client with rate limiting (`docs/validation/quality-control.md`).
  - [ ] Create extraction logic for Ship and Module data.
- [ ] **Agent A: Source Harvester**
  - [ ] Implement MediaWiki API client for EVE University.
  - [ ] Implement Google Sites parser for WCKG.
- [ ] **Agent E & F: Validation Layer**
  - [ ] Implement the weighted scoring formula (`docs/validation/quality-control.md`).
  - [ ] Implement the "Must Pass" numerical validation against ESI.
- [ ] **Initial Seed Run**
  - [ ] Execute Agent A for a subset of 50 pages.
  - [ ] Perform full validation and Git sync.
  - *Verification:* Verify pages render correctly in Wiki.js with proper hierarchy.
---
## Phase 3: Automated Monitoring & Updates
*Goal: Enable daily updates and patch note tracking.*
- [ ] **Agent B: Patch Note Monitor**
  - [ ] Implement RSS polling with browser-headers for EVE Online.
  - [ ] Implement LLM-based diff analysis for patch notes.
- [ ] **Failure Handling & Alerts**
  - [ ] Implement Tiered Response Matrix and Webhook alerts.
  - [ ] Implement the "Correction Request" feedback loop.
- [ ] **Scheduling**
  - [ ] Configure daily batch runs at 02:00 UTC.
---
## Phase 4: Expansion & Advanced Features
*Goal: Handle major game changes and link optimization.*
- [ ] **Cross-link Generation**
  - [ ] Implement automated entity matching for wiki-linking.
- [ ] **Expansion Protocol**
  - [ ] Create "Freeze Mode" logic for major CCP expansions.
- [ ] **Asset Management**
  - [ ] Finalize local vs S3 asset storage for images.
---
## Task Manifest (Unallocated)
- [ ] Define Skill, Item, and Faction schemas.
- [ ] Implement content merge strategy for multi-source pages.
- [ ] Document final "Human-in-the-loop" emergency runbook.

0
assets/images/.gitkeep Normal file

0
content/.gitkeep Normal file

55
deploy/docker-compose.yml Normal file

@@ -0,0 +1,55 @@
services:
  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_DB: wikijs
      POSTGRES_PASSWORD: ${DB_PASS:-wikijsrocks}
      POSTGRES_USER: wikijs
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U wikijs"]
      interval: 10s
      timeout: 5s
      retries: 5
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
    restart: unless-stopped
    volumes:
      - db-data:/var/lib/postgresql/data
  wiki:
    image: requarks/wiki:2
    depends_on:
      db:
        condition: service_healthy
    environment:
      DB_TYPE: postgres
      DB_HOST: db
      DB_PORT: 5432
      DB_USER: wikijs
      DB_PASS: ${DB_PASS:-wikijsrocks}
      DB_NAME: wikijs
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
    restart: unless-stopped
    ports:
      - "3010:3000"
  redis:
    image: redis:7-alpine
    command: redis-server --save 60 1 --loglevel warning
    healthcheck:
      test: ["CMD-SHELL", "redis-cli ping | grep PONG"]
      interval: 10s
      timeout: 5s
      retries: 5
    restart: unless-stopped
    volumes:
      - redis-data:/data
volumes:
  db-data:
  redis-data:


@@ -0,0 +1,50 @@
# Git Sync Protocol & Repository Structure
The Git repository serves as the **Single Source of Truth (SSOT)** for all wiki content. This document defines how data is structured and synchronized.
## 1. Repository Layout
```text
/
├── content/ # All Markdown wiki pages
│ ├── ships/
│ ├── modules/
│ └── ...
├── assets/ # Images, PDFs, and static files
│ ├── images/
│ └── ...
├── schema/ # Shared validation schemas (JSON Schema)
├── metadata/ # Global metadata (e.g., typeID mapping table)
│ └── mapping.json
└── .wikijsignore # Files to be ignored by Wiki.js sync
```
## 2. Synchronization Flow
### Primary Write Path (Agent -> Git -> Wiki.js)
1. **Agent Modification:** An agent (A, B, or C) generates or updates a Markdown file.
2. **Local Commit:** The agent commits the change to its local clone of the repository.
3. **Push to Origin:** The agent pushes to the `main` branch.
4. **Wiki.js Sync:**
   - Wiki.js is configured with the "Git" storage target.
   - It pulls changes from the `main` branch at a set interval (default: 5 minutes) or via Webhook.
   - Wiki.js renders the new Markdown content in the UI.
### Wiki.js to Git (Disabled)
- **Status:** DISABLED
- **Rationale:** Since human editing is disabled, there should be no writes originating from the Wiki.js UI. This prevents merge conflicts and ensures the Agent pipeline remains the sole source of content.
## 3. Commit Standards
To ensure a clean audit trail, all commits must follow a Conventional Commits-style format prefixed with an agent identifier:
**Format:** `[AGENT_ID] action: description (hash: source_hash)`
**Examples:**
- `[AGENT_A] seed: ships/caldari/condor (hash: sha256_...)`
- `[AGENT_B] update: ships/amarr/abaddon (patch: 2026-04-16)`
- `[AGENT_G] asset: images/ships/condor.png`
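
As a minimal sketch of how an agent might build and check subjects in this format, the helpers below format a commit message and parse it back with a regular expression. The names `format_commit`, `parse_commit`, and `COMMIT_RE` are hypothetical, and the pattern assumes single-letter agent IDs, a path-like description without spaces, and at most one parenthesized `key: value` suffix, as in the examples above.

```python
import re
from typing import Dict, Optional

# "[AGENT_X] action: description" with an optional " (key: value)" suffix.
# Assumption: agent IDs are a single uppercase letter, as in the examples.
COMMIT_RE = re.compile(
    r"^\[(AGENT_[A-Z])\] ([a-z]+): (\S+)(?: \(([a-z]+): ([^)]+)\))?$"
)
FIELDS = ("agent", "action", "description", "suffix_key", "suffix_value")


def format_commit(agent_id: str, action: str, description: str,
                  suffix_key: Optional[str] = None,
                  suffix_value: Optional[str] = None) -> str:
    """Build a commit subject such as '[AGENT_A] seed: ships/... (hash: ...)'."""
    message = f"[{agent_id}] {action}: {description}"
    if suffix_key and suffix_value:
        message += f" ({suffix_key}: {suffix_value})"
    return message


def parse_commit(message: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the parsed fields, or None if the subject does not match."""
    match = COMMIT_RE.match(message)
    return dict(zip(FIELDS, match.groups())) if match else None
```

A check like `parse_commit(subject) is None` could run in a pre-push hook to reject subjects that would break the audit trail.
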
## 4. Conflict Resolution
- **Strategy:** Last-Write-Wins (LWW) based on Git commit timestamp.
- **Merge Logic:** Automated merges are preferred. If a conflict occurs (rare in agent-only environments), the pipeline will halt and trigger a "System Alert" for manual intervention.


@@ -0,0 +1,36 @@
# Wiki.js API Authentication & Security Strategy
## 1. Authentication Method
- **Token Type:** Permanent API Keys (Bearer Tokens)
- **Generation:** Generated via the Wiki.js Administration Area -> API Keys
- **Storage:** Stored as environment variables in the agent runtime environment (e.g., `WIKIJS_API_TOKEN`).
## 2. Permission Scopes
To maintain security, the API token used by agents will be restricted to the minimum necessary scopes:
| Scope | Requirement | Justification |
|-------|-------------|---------------|
| `write:pages` | Mandatory | Allows agents to create and update content |
| `read:pages` | Mandatory | Allows agents to check existing content before updates |
| `write:assets` | Mandatory | Allows Agent G to upload images/files |
| `read:assets` | Mandatory | Allows checking for existing assets |
| `read:tags` | Optional | Allows metadata tagging |
| `manage:system` | **Prohibited** | Agents must NOT have administrative system access |
## 3. Token Rotation Policy
- **Frequency:** Tokens should be rotated every 90 days.
- **Process:**
  1. Generate new token in Wiki.js.
  2. Update environment variable in agent deployment (Komodo/Docker).
  3. Verify connectivity.
  4. Revoke old token.
## 4. Write Access Control
- **Human Editing:** All human accounts in Wiki.js will be assigned to a "Read-Only" group.
- **Agent Editing:** Only the API account (associated with the token) will have write permissions.
- **Emergency Bypass:** A single "Admin" account will be maintained for emergency manual intervention, protected by 2FA.
## 5. Security Best Practices
- **TLS:** All API calls MUST be made over HTTPS.
- **IP Whitelisting:** If possible, Wiki.js should be configured to only accept API requests from the IP of the agent runner.
- **Audit Logs:** Enable Wiki.js audit logging to track all changes made via the API token.

41
docs/schema/hierarchy.md Normal file

@@ -0,0 +1,41 @@
# Wiki Page Hierarchy & URL Structure
To ensure a structured and navigable wiki, all pages must follow this hierarchical pathing and categorization schema.
## 1. Top-Level Categories
| Directory | Content Type | URL Pattern |
|-----------|--------------|-------------|
| `ships/` | All ship hulls | `/ships/{race}/{group}/{ship_name}` |
| `modules/`| All ship modules | `/modules/{category}/{group}/{module_name}` |
| `mechanics/`| Game mechanics/guides | `/mechanics/{category}/{topic}` |
| `items/` | General items (ammo, etc) | `/items/{category}/{item_name}` |
| `skills/` | Character skills | `/skills/{category}/{skill_name}` |
| `factions/`| NPC Factions | `/factions/{faction_name}` |
## 2. Detailed Pathing Examples
### Ships
- **Path:** `/ships/caldari/interceptors/raptor`
- **Breadcrumb:** Ships > Caldari > Interceptors > Raptor
### Modules
- **Path:** `/modules/shield/shield-extenders/large-shield-extender-ii`
- **Breadcrumb:** Modules > Shield > Shield Extenders > Large Shield Extender II
### Mechanics
- **Path:** `/mechanics/combat/tracking-guide`
- **Breadcrumb:** Mechanics > Combat > Tracking Guide
## 3. Redirect & Alias Strategy
- **TypeID Redirects:** Every page must be aliased by its ESI TypeID (e.g., `/id/603` -> `/ships/caldari/interceptors/raptor`) to allow easy linking from external tools.
- **Lowercase Enforcement:** All URLs must be strictly lowercase.
- **Slugification:** Spaces replaced by hyphens, special characters removed.
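
The lowercase and slugification rules above can be sketched as a small helper. This is one possible reading, not the project's implementation: `slugify` and `ship_path` are hypothetical names, and dropping every character outside `a-z`, `0-9`, and `-` (including non-ASCII letters) is an assumption about what "special characters removed" means.

```python
import re


def slugify(name: str) -> str:
    """Lowercase, replace whitespace runs with hyphens, drop other characters."""
    slug = name.strip().lower()
    slug = re.sub(r"\s+", "-", slug)        # spaces -> hyphens
    slug = re.sub(r"[^a-z0-9-]", "", slug)  # remove special characters
    slug = re.sub(r"-{2,}", "-", slug)      # collapse repeated hyphens
    return slug.strip("-")


def ship_path(race: str, group: str, ship_name: str) -> str:
    """Build a URL following the /ships/{race}/{group}/{ship_name} pattern."""
    return "/ships/" + "/".join(slugify(part) for part in (race, group, ship_name))
```

For example, `slugify("Large Shield Extender II")` yields `large-shield-extender-ii`, matching the module path shown in section 2.
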
## 4. Metadata Requirement
Each page MUST include the following metadata in its frontmatter to support the hierarchy:
```yaml
path: "ships/caldari/interceptors/raptor"
parent: "ships/caldari/interceptors"
order: 10 # Optional: for sorting in lists
```


@@ -0,0 +1,83 @@
# LangGraph State Schema Definition
This document defines the shared state object used by the LangGraph orchestration layer.
## 1. Shared State Object (`WikiState`)
```python
from typing import List, Optional, Dict, Annotated
from pydantic import BaseModel, Field
from datetime import datetime


class ContentSource(BaseModel):
    name: str  # e.g., "eve-university", "wckg", "esi"
    url: str
    content_hash: str
    extracted_at: datetime


class ValidationResult(BaseModel):
    category: str  # "structural", "content", "numerical", "cross-reference"
    passed: bool
    score: float  # 0.0 to 1.0
    feedback: Optional[str] = None
    details: Dict = {}


class PageMetadata(BaseModel):
    page_type: str  # "ship", "module", "mechanic", "guide"
    source: str
    source_url: str
    imported_date: datetime
    last_updated: datetime
    last_validated: datetime
    update_frequency: str
    validation_score: float
    categories: List[str]


class WikiPage(BaseModel):
    path: str  # e.g., "ships/frigates/condor"
    title: str
    content_markdown: str
    metadata: PageMetadata
    frontmatter: Dict
    assets: List[str]  # List of local asset paths


class WikiState(BaseModel):
    # Core processing data
    current_page: Optional[WikiPage] = None
    proposed_changes: List[WikiPage] = []
    # Pipeline tracking
    sources: List[ContentSource] = []
    validation_pipeline_results: List[ValidationResult] = []
    # Control flow
    retry_count: int = 0
    max_retries: int = 3
    is_approved: bool = False
    error: Optional[str] = None
    # Checkpointing metadata
    thread_id: str
    checkpoint_id: Optional[str] = None
```
## 2. Page Type Schemas
### Ship Schema Overlay
Extends the numerical validation requirements.
```python
class ShipData(BaseModel):
    type_id: int
    group: str
    race: str
    hull_stats: Dict[str, int]
    fitting_stats: Dict[str, int]
    velocity: int
    skill_requirements: List[Dict[str, int]]
```
## 3. Storage Strategy
- **State Persistence:** Redis (via `RedisSaver` for LangGraph)
- **Content Persistence:** Git (Markdown + YAML Frontmatter)
- **Asset Persistence:** Local filesystem / S3-compatible storage


@@ -0,0 +1,60 @@
# Quality Control & Operations Protocol
This document defines the automated decision-making logic for content validation and the system-wide response to operational failures.
## 1. Validation Scoring Formula (Blocker #6)
Agent E (Validation) and Agent F (Numerical) calculate a combined **Confidence Score (0-100%)**. A score of **95% or higher** is required for auto-publication to the `main` branch.
### 1.1 Weighted Categories
| Category | Weight | Must Pass? | Criteria |
|----------|--------|------------|----------|
| **Numerical (ESI)** | 40% | **YES** | Stats (HP, Slots, PG/CPU) must match ESI ±0%. |
| **Structural** | 20% | **YES** | Valid YAML, required fields present, correct URL path. |
| **Relational** | 20% | NO | Internal links resolve, TypeIDs are valid. |
| **Semantic** | 20% | NO | Prose description matches structured data intent. |
### 1.2 The "Must Pass" Rule
If any **"Must Pass"** category fails (score < 100% in that category), the total Confidence Score is immediately capped at **0%**, regardless of other categories. This prevents LLM hallucinations from overriding official game data.
### 1.3 Scoring Logic
`Total Score = (ESI * 0.4) + (Struct * 0.2) + (Relat * 0.2) + (Semant * 0.2)`
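
A minimal sketch of the formula combined with the "Must Pass" rule from section 1.2 follows. The function name `confidence_score` and the category keys are hypothetical; per-category scores are assumed to be on a 0-100 scale.

```python
from typing import Dict

# Weights and must-pass flags taken from the table in section 1.1.
WEIGHTS = {"numerical": 0.4, "structural": 0.2, "relational": 0.2, "semantic": 0.2}
MUST_PASS = ("numerical", "structural")


def confidence_score(scores: Dict[str, float]) -> float:
    """Combine per-category scores (0-100) into a total Confidence Score."""
    # Any must-pass category below 100% forces the total to 0%.
    if any(scores[category] < 100 for category in MUST_PASS):
        return 0.0
    return sum(WEIGHTS[category] * scores[category] for category in WEIGHTS)


def can_auto_publish(scores: Dict[str, float]) -> bool:
    """Auto-publication requires a total of 95% or higher."""
    return confidence_score(scores) >= 95
```

With both must-pass categories at 100, the two remaining categories contribute at most 40 points, so auto-publication needs their average to be at least 87.5.
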
---
## 2. Failure Handling Matrix (Blocker #8)
The system distinguishes between transient infrastructure issues and content quality failures.
### 2.1 Tiered Response Matrix
| Error Type | Tier | Initial Action | Max Retries | Escalation |
|------------|------|----------------|-------------|------------|
| **API Timeout / 5xx** | 1 | Exponential Backoff (1m, 5m, 15m) | 3 | Tier 3 Alert |
| **Validation Fail (70-94%)** | 2 | Regeneration with Error Feedback | 2 | Tier 3 Alert |
| **Validation Fail (<70%)** | 2 | Immediate Rejection | 1 | Tier 3 Alert |
| **Auth Failure / 401** | 3 | Halt Pipeline | 0 | **Critical System Alert** |
| **Git Conflict** | 3 | Halt Pipeline | 0 | **Critical System Alert** |
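
One way to read the Tier 1 and Tier 2 rows of the matrix as code is sketched below. The function names and the returned action strings are hypothetical, and treating any score below 95 (including fractional values such as 94.5) as a Tier 2 failure is an assumption.

```python
from typing import Optional

BACKOFF_MINUTES = (1, 5, 15)  # Tier 1 schedule: 1m, 5m, 15m


def backoff_delay_seconds(attempt: int) -> Optional[int]:
    """Delay before retry number `attempt` (1-based); None when retries are exhausted."""
    if 1 <= attempt <= len(BACKOFF_MINUTES):
        return BACKOFF_MINUTES[attempt - 1] * 60
    return None


def validation_action(score: float, retry_count: int) -> str:
    """Map a Confidence Score and the retries already used to the next action."""
    if score >= 95:
        return "publish"
    if score >= 70:
        # Regeneration with error feedback, at most 2 retries.
        return "regenerate_with_feedback" if retry_count < 2 else "escalate_tier_3"
    # Below 70%: immediate rejection, at most 1 retry.
    return "reject" if retry_count < 1 else "escalate_tier_3"
```

Auth failures and Git conflicts are not modeled here because the matrix halts the pipeline for them without retrying.
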
### 2.2 Feedback Loop (Tier 2)
When a page fails validation with a score of 70-94%, Agent E sends a "Correction Request" back to the source agent (A, B, or C) containing:
1. The specific field that failed.
2. The expected value (from ESI or Schema).
3. The current value produced.
### 2.3 Critical System Alerting
Tier 3 failures trigger a webhook notification to the system administrator. The LangGraph state is persisted as a "Suspended" checkpoint, allowing for manual inspection and resume via LangSmith.
---
## 3. Rate Limiting Policy (Blocker #11)
To ensure stability and prevent IP bans, the following global limits are enforced at the transport layer:
| Target | Rate Limit | Burst |
|--------|------------|-------|
| **ESI API** | 20 req / sec | 50 |
| **MediaWiki API** | 2 req / sec | 5 |
| **Wiki.js API** | 5 req / sec | 10 |
| **LLM APIs** | Per Provider Tier | N/A |
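
The rate and burst columns map naturally onto a token bucket, though the document does not say which algorithm is used. The sketch below is an assumption in that sense: `TokenBucket` and `LIMITS` are hypothetical names, and the injectable `clock` exists only to make the behavior testable.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `burst`."""

    def __init__(self, rate: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = float(burst)
        self.tokens = float(burst)  # start full so an initial burst is allowed
        self.clock = clock
        self.updated = clock()

    def try_acquire(self) -> bool:
        """Consume one token if available; return False when the caller must wait."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# (rate per second, burst) pairs from the table above.
LIMITS = {"esi": (20, 50), "mediawiki": (2, 5), "wikijs": (5, 10)}
```

A client would create one bucket per target, for example `TokenBucket(*LIMITS["esi"])`, and sleep briefly whenever `try_acquire()` returns `False`.
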

524
eve-online-wiki-plan.md Normal file

@@ -0,0 +1,524 @@
# EVE Online Automated Wiki System - High Level Plan
## 1. Wiki Software Recommendation: Wiki.js
**Why Wiki.js:**
- Modern, open-source (AGPL-3.0), actively maintained
- First-class Docker support with official image
- GraphQL API built-in for automated content updates
- Markdown-based editing (great for AI-generated content)
- Git-based storage option for complete version control
- Built-in search, analytics, and access controls
- Lightweight (~200MB RAM)
- **Perfect for agent-only workflow:** Can disable all human editing entirely while retaining API write access
**Alternatives considered:**
| Software | Pros | Cons |
|----------|------|------|
| MediaWiki | Industry standard, massive extension ecosystem | Heavy, PHP, API is less developer-friendly |
| DokuWiki | Flat file, extremely simple | No native API, dated interface |
| BookStack | Structured organization | Less suited for interconnected knowledge |
| Wiki.js | Modern API, Git sync, Docker-native, read-only UI support | Younger project, smaller community |
---
## 2. System Architecture
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ EVE Online Wiki System │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ LangGraph Orchestration Layer │ │
│ │ ┌───────────────────────────────────────────────────────────────┐ │ │
│ │ │ StateGraph (Main Graph) │ │ │
│ │ │ │ │ │
│ │ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │ │
│ │ │ │ Source │ │ Patch │ │External │ │ ESI │ │ │ │
│ │ │ │Harvester│ │ Monitor │ │ Monitor │ │Collector│ │ │ │
│ │ │ │ Node │ │ Node │ │ Node │ │ Node │ │ │ │
│ │ │ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ │ │ │
│ │ │ └────────────┴─────────────┴─────────────┘ │ │ │
│ │ │ │ │ │ │
│ │ │ ┌────────▼────────┐ │ │ │
│ │ │ │ Validation │ │ │ │
│ │ │ │ Subgraph │ │ │ │
│ │ │ │ (E → F → G) │ │ │ │
│ │ │ └────────┬────────┘ │ │ │
│ │ │ │ │ │ │
│ │ │ ┌────────▼────────┐ │ │ │
│ │ │ │ Git Sync │ │ │ │
│ │ │ │ Node │ │ │ │
│ │ │ └────────┬────────┘ │ │ │
│ │ │ │ │ │ │
│ │ │ ┌────────▼────────┐ │ │ │
│ │ │ │ Wiki.js API │ │ │ │
│ │ │ │ Node │ │ │ │
│ │ │ └─────────────────┘ │ │ │
│ │ └───────────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ │ LangGraph Features: │ │
│ │ • Checkpointing: Durable state persistence across crashes │ │
│ │ • Conditional Edges: Dynamic routing based on validation results │ │
│ │ • Subgraphs: Nested validation pipeline (E→F→G) as single node │ │
│ │ • Streaming: Real-time token output from LLM agents │ │
│ │ • LangSmith: Built-in observability and tracing │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
**Infrastructure Stack:**
- **Wiki.js** - Content management (read-only UI)
- **PostgreSQL** - Primary database for Wiki.js (production-grade, required)
- **Redis** - State persistence backend for LangGraph checkpointing
- **LangGraph** - Agent orchestration framework (MIT licensed, built on LangChain)
- **LangSmith** - Observability platform for tracing agent executions
- **Git** - Content storage backend for version control
**Architecture Principles:**
- All content flows top-to-bottom, no exceptions
- 100% agent-only editing - no human accounts have write access
- All changes go through full validation pipeline before publication
- Complete audit trail maintained in Git storage backend
- Immutable content sources are isolated from publication layer
- Git sync acts as single source of truth for all content
- LangGraph maintains checkpoint state and full audit log via LangSmith traces
- No direct API writes from edge agents - all writes go through pipeline
- Daily batch updates for prose/content (not continuous) - scheduled at 02:00 UTC
- ESI structured data updates run on independent schedule (hourly for static data, daily for dynamic)
---
## 3. Agent Specifications
### Agent A: Initial Wiki Construction
**Purpose:** Seed the wiki with existing content from source sites.
**Inputs:**
- Source URLs (EVE University wiki, WCKG, CCP Support, ESI API)
- Content scope (what categories/sections to import)
- Deduplication and merge strategy
**Process:**
1. **Source Extraction:**
   - **EVE University:** Utilize MediaWiki `api.php` for structured content retrieval (avoids HTML scraping issues).
   - **WCKG:** Specialized Google Sites parser for dynamic content rendering.
   - **CCP Support:** Content extraction with browser headers to bypass Cloudflare challenges.
2. Extract structured content (ships, modules, mechanics)
3. Normalize content format to Markdown
4. Extract all images and references
5. Compute content hash (SHA-256) for each page and skip if unchanged since last import
6. Create wiki pages with proper hierarchy (see [Page Hierarchy Strategy](docs/schema/hierarchy.md))
7. Tag pages with complete metadata
8. Generate cross-links between related pages
9. Pass all content to Validation Agent before publication
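
Step 5 (hash each page and skip unchanged content) could look roughly like the sketch below. The helper names and the in-memory `known_hashes` mapping are hypothetical, and normalizing line endings and surrounding whitespace before hashing is an assumption, not something the plan specifies.

```python
import hashlib
from typing import Dict


def content_hash(markdown: str) -> str:
    """SHA-256 hex digest of the page after light normalization."""
    normalized = markdown.replace("\r\n", "\n").strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def needs_import(path: str, markdown: str, known_hashes: Dict[str, str]) -> bool:
    """Return True and record the new hash if the page changed since the last import."""
    digest = content_hash(markdown)
    if known_hashes.get(path) == digest:
        return False  # unchanged: skip this page
    known_hashes[path] = digest
    return True
```

In practice the hash store would need to persist between runs, for instance alongside the metadata kept in the Git repository.
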
**Scheduling:** One-time run (with option to replay)
---
### Agent B: Patch Note Monitor
**Purpose:** Detect EVE Online patch changes and update affected wiki pages.
**Inputs:**
- RSS feed: `https://www.eveonline.com/rss/patch-notes` (Verified accessible via GET)
- Update frequency (recommended: daily)
- Affected page mapping (which pages relate to which game systems)
**Process:**
1. Poll RSS feed on schedule (Standard RSS 2.0 parsing)
2. Parse new patch entries using LLM content analysis
3. Identify *exact* content changes required for affected pages
4. Generate complete revised page content, not just append sections
5. Pass proposed changes to Validation Agent
6. If validation passes, apply update automatically
7. If validation fails, retry generation or flag for system alert
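
Steps 1 and 2 begin with finding entries that have not been processed yet. A minimal sketch using only the standard library is shown below; it assumes a plain RSS 2.0 document with `guid` or `link` elements on each item, and the name `new_entries` and the `seen_guids` set are hypothetical. Fetching the feed and the LLM analysis are left out.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Set


def new_entries(rss_xml: str, seen_guids: Set[str]) -> List[Dict[str, str]]:
    """Return RSS 2.0 items whose guid (or link, as a fallback) has not been seen."""
    root = ET.fromstring(rss_xml)
    entries = []
    for item in root.iterfind("./channel/item"):
        guid = item.findtext("guid") or item.findtext("link")
        if guid and guid not in seen_guids:
            entries.append({
                "guid": guid,
                "title": item.findtext("title", default=""),
                "link": item.findtext("link", default=""),
            })
    return entries
```

The caller would add each processed `guid` to `seen_guids` only after the corresponding update passes validation, so a failed run is retried on the next poll.
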
---
## 3.5 Infrastructure Protocols
### Git Sync Protocol
- **Single Source of Truth:** Git repository acts as the primary storage.
- **One-way Sync:**
  - Agent Write -> Git Commit -> Wiki.js Pull
  - Wiki.js renders directly from the Git-backed storage.
- **Repository Structure:**
  - `/content`: Markdown files mapping to wiki paths (e.g., `content/ships/frigates/condor.md`)
  - `/assets`: Images and files mapping to local paths.
- **Commit Format:** `[AGENT_ID] update: path/to/page (hash: abc123)`
### API Authentication
- **Strategy:** Bearer tokens with minimum scopes (`write:pages`, `write:assets`).
- **Storage:** Managed via environment variables.
- See [Wiki.js API Auth Strategy](docs/infrastructure/wikijs-auth.md) for details.
### Shared State
- **Schema:** Managed via LangGraph `WikiState` Pydantic model.
- See [State Schema Definition](docs/schema/state-schema.md) for details.
---
### Agent C: External Wiki Monitor
**Purpose:** Track changes on source wikis and refresh content.
**Inputs:**
- Monitored URLs and change detection rules
- Check frequency (recommended: weekly)
**Process:**
1. Poll source sites on schedule respecting robots.txt
2. Detect new pages or modified content
3. Compare against imported content hash in local wiki
4. Ignore minor formatting/link changes
5. Generate revised page content with merged changes
6. Pass proposed changes to Validation Agent
7. Apply update automatically on validation pass
---
### Agent D: ESI Data Collector
**Purpose:** Pull official structured data directly from CCP's ESI API.
**Inputs:**
- ESI API endpoints for ships, modules, items, skills
- Update frequency: hourly for static data (independent of daily batch), daily for dynamic data (part of 02:00 UTC batch)
**Process:**
1. Poll ESI API on schedule with proper rate limiting
2. Extract structured game data
3. Generate or update data-driven pages automatically
4. Merge structured data with human-readable content from other sources
5. Compute content hash and skip if unchanged since last poll
6. Pass proposed changes to Validation Agent
---
### Agent E: Content Validation & Review Agent
**Purpose:** Automated quality control for all content changes. **Replaces all human review.**
**Validation Rules:**
1. **Structural validation:** Markdown syntax, page hierarchy, metadata presence
2. **Content validation:** Factual consistency, no broken references, completeness
3. **Change validation:** Diff analysis, only expected changes applied, no unintended modifications
4. **Cross-reference validation:** All internal links resolve correctly
5. **TOS compliance:** Proper attribution included for all imported content
**Process:**
1. Receive proposed change from upstream agent
2. Run all validation checks
3. Generate confidence score (0-100%)
4. If score ≥ 95%: Approve for publication
5. If score 70-94%: Request regeneration with feedback
6. If score <70%: Reject change and generate system alert
---
### Agent F: Numerical Validation Layer
**Purpose:** Rule-based validation for game data, separate from LLM validation. Catches LLM hallucinations in structured data.
**Validation Categories:**
| Data Type | Validation Rules | Source of Truth |
|-----------|------------------|-----------------|
| Ship stats | Base HP, velocity, slots, fitting stats within ±0% of ESI | ESI API |
| Module stats | CPU, PG, range, damage multipliers | ESI API |
| Skill requirements | Prerequisites match skill tree | ESI API |
| Fitting calculations | Must pass CPU/PG budget checks | Local calculation |
| Market data | Prices non-negative, volume non-negative | ESI API |
| Links/IDs | All typeIDs resolve to valid entities | ESI API lookup |
**Process:**
1. Extract all numerical values from proposed content
2. Cross-reference against ESI API for game data
3. Flag any discrepancies >0% for ship/module stats
4. Reject content with invalid typeIDs or broken references
5. Log all validation results for audit
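
Steps 2 and 3 amount to an exact comparison against ESI values. The sketch below assumes the ESI data has already been fetched into a dictionary; `find_discrepancies` is a hypothetical name, and the numbers in the usage note are illustrative rather than real game data.

```python
from typing import Dict, List, Optional, Tuple


def find_discrepancies(
    proposed: Dict[str, int], esi: Dict[str, int]
) -> List[Tuple[str, int, Optional[int]]]:
    """Return (field, expected, actual) for each ESI field the page misstates or omits."""
    return [
        (field, expected, proposed.get(field))
        for field, expected in sorted(esi.items())
        if proposed.get(field) != expected  # zero tolerance: any difference counts
    ]
```

For instance, comparing `{"high_slots": 4, "hp_shield": 450}` against ESI values `{"high_slots": 3, "hp_shield": 450}` reports `("high_slots", 3, 4)`, which can be passed back to the source agent in the same expected-versus-current form the Correction Request uses.
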
**Override Rules:**
- Numerical validation failures = auto-reject (no LLM override possible)
- Historical content (archived ships/modules) flagged for manual review
---
### Agent G: Asset & Reference Handler
**Purpose:** Centralized management of all images, links, and external references.
**Process:**
1. Receive all extracted images and references from other agents
2. Download images to local storage (respecting copyright/attribution)
3. Rewrite all image URLs to local wiki paths
4. Rewrite all external links to reference original source
5. Add source attribution footer to all pages
6. Check for broken links on every update
7. Maintain asset integrity across all pages
---
## 4. Model Intelligence Tiers
Different agents require different levels of LLM reasoning. Using appropriate models reduces cost and improves reliability.
| Agent | Tier | Model Requirements | Justification |
|-------|------|--------------------|---------------|
| A: Source Harvester | Low | Basic extraction model (e.g., GPT-4o-mini, Claude Haiku) | Template-based extraction, structured output format |
| B: Patch Note Monitor | High | Strong reasoning model (e.g., Claude Sonnet, GPT-4o) | Requires understanding game mechanics to map changes to pages |
| C: External Wiki Monitor | Low | Basic extraction model | Simple change detection and content extraction |
| D: ESI Data Collector | None | No LLM needed | Pure API calls, structured data, programmatic transformation |
| E: Content Validation | Medium | Balanced model (e.g., Claude Sonnet) | Needs semantic understanding but structured validation rules |
| F: Numerical Validation | None | No LLM needed | Pure rule-based, deterministic validation |
| G: Asset Handler | Low | Basic model for categorization | Mostly file operations, minimal reasoning |
**Recommended Model Stack:**
- **High reasoning:** Claude 3.5 Sonnet / GPT-4o (Agents B, E)
- **Low cost:** Claude 3.5 Haiku / GPT-4o-mini (Agents A, C, G)
- **No LLM:** Agents D, F (programmatic only)
**Daily Batch Cost Estimate:**
With daily updates (not continuous), typical daily operations:
- ~10-20 patch note analyses (Agent B): ~$0.10-0.30
- ~50-100 content validations (Agent E): ~$0.50-1.00
- ~100-200 extractions (Agents A, C, G): ~$0.10-0.20
- **Daily total: ~$0.70-1.50** (note: subscription tiers have rate limits and token caps; bulk operations like initial import may temporarily exceed these, requiring pay-per-use fallback)
---
## 5. Content Schema Per Page Type
Each page type has a template with required fields that Agent E validates structurally before semantic validation.
### Ship Page Template
```yaml
page_type: ship
required_fields:
- name: string
- type_id: integer (ESI)
- group: string (e.g., "Interceptor", "Battleship")
- race: string (Caldari, Minmatar, Amarr, Gallente)
- hull_stats:
    hp_shield: integer
    hp_armor: integer
    hp_structure: integer
- fitting_stats:
    cpu_output: integer
    powergrid_output: integer
    high_slots: integer
    med_slots: integer
    low_slots: integer
    rig_slots: integer
- velocity: integer
- skill_requirements: list[{skill_id: integer, level: integer}]
- description: string (prose, sourced from external wiki or generated)
optional_fields:
- role_bonus: string
- ship_bonus: list[string]
- capacitor_capacity: integer
- targeting_range: integer
- drone_bandwidth: integer
- probe_launcher_fitting: boolean
```
### Module Page Template
```yaml
page_type: module
required_fields:
- name: string
- type_id: integer (ESI)
- group: string (e.g., "Shield Booster", "Afterburner")
- slot: string (high, mid, low, rig)
- cpu_usage: integer
- powergrid_usage: integer
- description: string
optional_fields:
- duration: integer
- range: integer
- damage_multiplier: float
- skill_requirements: list[{skill_id: integer, level: integer}]
- meta_level: integer
- tech_level: integer (1 or 2)
```
### Mechanic/Guide Page Template
```yaml
page_type: mechanic
required_fields:
- title: string
- summary: string (1-3 sentences)
- categories: list[string]
- source: string (eve-university | wckg | ccp | generated)
- last_reviewed: date
optional_fields:
- related_ships: list[string]
- related_modules: list[string]
- related_mechanics: list[string]
- difficulty: string (beginner | intermediate | advanced)
```
### Validation Against Schema
Agent E enforces:
1. All `required_fields` present and non-empty
2. All integer fields contain valid integers (no strings, no nulls)
3. All `type_id` fields pass Agent F numerical validation against ESI
4. All `skill_requirements` reference valid typeIDs
5. Page type matches one of the defined templates (reject unknown types)
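
Rules 1, 2, and 5 can be checked without an LLM. The sketch below covers only the module template and is an illustration under assumptions: `REQUIRED_FIELDS` and `validate_frontmatter` are hypothetical names, and the page is assumed to arrive as an already-parsed dictionary.

```python
from typing import Any, Dict, List

# Required fields for the module template, mapped to their expected Python types.
REQUIRED_FIELDS: Dict[str, Dict[str, type]] = {
    "module": {
        "name": str, "type_id": int, "group": str, "slot": str,
        "cpu_usage": int, "powergrid_usage": int, "description": str,
    },
}


def validate_frontmatter(page: Dict[str, Any]) -> List[str]:
    """Return a list of structural errors; an empty list means the page passes."""
    template = REQUIRED_FIELDS.get(page.get("page_type"))
    if template is None:
        return [f"unknown page_type: {page.get('page_type')!r}"]
    errors = []
    for field, expected in template.items():
        value = page.get(field)
        if value is None or value == "":
            errors.append(f"missing required field: {field}")
        elif isinstance(value, bool) or not isinstance(value, expected):
            # bool is a subclass of int in Python, so reject it explicitly.
            errors.append(f"{field} must be of type {expected.__name__}")
    return errors
```

Rules 3 and 4 need ESI lookups and would run afterwards in Agent F.
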
---
## 6. Agent Health Monitoring
LangGraph provides built-in checkpointing for state persistence, but agent-level health monitoring requires a separate heartbeat system.
**Heartbeat Protocol:**
- Each LangGraph node (agent) sends a `HEARTBEAT` message to Redis every 60 seconds during active operation, every 5 minutes when idle
- Heartbeat payload: `{ node_name, status: healthy|degraded|error, thread_id, last_completed_at, checkpoint_id }`
- Heartbeat registry uses Redis with TTL (3x interval for stale, 10x for dead)
**LangGraph Checkpointing:**
- LangGraph's `MemorySaver` or `PostgresSaver` persists graph state at each step
- Workflows resume exactly where they left off after crashes
- Checkpoint TTL configurable (24-48 hours for batch workflows, session-based for conversational)
**Staleness Detection:**
- If no heartbeat received within 3x the expected interval → mark agent as `stale`
- If no heartbeat received within 10x the expected interval → mark agent as `dead` and trigger critical alert
- Stale nodes: LangGraph checkpoint indicates last state, new invocations wait for recovery
- Dead nodes: halt dependent pipeline stages, escalate alert
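
The 3x and 10x thresholds can be expressed as a small classifier. This is a sketch only: `heartbeat_status` is a hypothetical name, and how the age of the last heartbeat is obtained (for example from Redis key TTLs) is left out.

```python
def heartbeat_status(seconds_since_last: float, interval: float = 60.0) -> str:
    """Classify a node from the age of its last heartbeat.

    `interval` is the expected heartbeat period: 60 s while active, 300 s when idle.
    """
    if seconds_since_last > 10 * interval:
        return "dead"   # triggers a critical alert
    if seconds_since_last > 3 * interval:
        return "stale"
    return "healthy"
```

With the active interval of 60 seconds, a node becomes `stale` after 3 minutes of silence and `dead` after 10 minutes; for idle nodes the same thresholds fall at 15 and 50 minutes.
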
**LangSmith Integration:**
- Every LLM call, tool invocation, and state transition emits traces to LangSmith
- Query LangSmith audit logs for execution history, latency, token usage
- Alerts configured via LangSmith webhooks for validation failures
**Alerting:**
- Agent status transitions emit events to the audit log
- Critical alerts (dead node, repeated validation failures, checkpoint gaps > threshold) notify via configured channel (webhook, email, etc.)
---
## 7. Standard Page Metadata
All pages will include standard frontmatter:
```yaml
source: eve-university | wckg | ccp | esi | generated
source_url: https://...
imported_date: 2026-04-16
last_updated: 2026-04-16
last_validated: 2026-04-16
update_frequency: daily | weekly | monthly
validation_score: 98
categories: [ships, pvp, modules, industry]
```
---
## 8. Implementation Phases
### Phase 0: Pre-Work & Compliance
- Confirm scraping TOS with source wiki maintainers
- Implement rate limiting and proper User-Agent headers
- Define metadata schema and validation rules
- Test content extraction on sample pages
### Phase 1: Foundation
- Deploy PostgreSQL database via Docker (production configuration)
- Deploy Redis instance for LangGraph checkpointing + heartbeat registry
- Deploy Wiki.js via Docker (connected to PostgreSQL)
- **Disable all human write permissions** - configure API-only write access
- Configure Git storage backend for complete change history
- Configure Git sync layer as single source of truth
- Set up HTTPS and domain routing
- Establish automated backup strategy
- Deploy LangGraph with `StateGraph` defining all agent nodes and edges
- Configure LangSmith for observability (tracing, audit logs)
- Deploy agent heartbeat monitoring (Redis TTL registry)
### Phase 2: Content Pipeline
- Deploy Source Harvester Agent
- Deploy Validation Agent
- Deploy Asset Handler Agent
- Deploy ESI Data Collector Agent
- Execute initial import with full validation pipeline
- Establish content quality baseline
### Phase 2.5: Smoke Test
- Run Agent A on 50 representative pages across all page types (ships, modules, mechanics, guides)
- Pass all 50 pages through the full validation pipeline (Agents E + F + G)
- Calibrate validation thresholds based on results (adjust confidence scoring weights)
- Verify merge logic when ESI data and external wiki content overlap on same pages
- Confirm Git sync round-trip: write → Git → Wiki.js render matches expected output
- Identify and fix integration bugs before full import
- Document baseline validation pass rate and failure patterns
### Phase 3: Automated Monitoring
- Deploy Patch Note Monitor Agent
- Implement LLM-based patch parsing and content generation
- Configure validation thresholds
- Test end-to-end update workflow
### Phase 4: External Change Tracking
- Deploy External Wiki Monitor Agent
- Configure source site monitoring
- Implement change detection and merge logic
- Set up system alerting for failures
### Phase 5: Major Expansion Handling
- Create expansion detection webhook (CCP announces expansions 2-4 weeks ahead)
- Build bulk update workflow for expansion releases
- Implement "freeze" mode during expansion deployment (content locked until ESI stabilizes)
- Create post-expansion audit job to verify all affected pages
- Document expansion runbook for manual triggering
**Expansion Workflow:**
1. Expansion announced → Create tracking ticket
2. Expansion deploys → Freeze wiki updates, wait for ESI stability (typically 24-48h)
3. Run bulk ESI sync → Update all ship/module/item pages
4. Run Patch Note Agent → Process expansion notes, generate new pages
5. Run full validation → All pages validated against new ESI data
6. Unfreeze → Resume daily batch updates
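The six steps above can be modeled as a small state machine that also answers whether daily batch updates may run. This is an illustrative sketch: the enum, the `advance` and `daily_updates_allowed` helpers, and the choice to keep daily updates running between announcement and deployment are assumptions, not part of the plan.

```python
from enum import Enum


class ExpansionPhase(Enum):
    NORMAL = "normal"            # daily batch updates running
    ANNOUNCED = "announced"      # step 1: tracking ticket created
    FROZEN = "frozen"            # step 2: waiting for ESI to stabilize
    ESI_SYNC = "esi_sync"        # step 3: bulk ESI sync
    PATCH_NOTES = "patch_notes"  # step 4: Patch Note Agent run
    VALIDATION = "validation"    # step 5: full validation


# Each phase may only advance to the next one in the documented order.
NEXT_PHASE = {
    ExpansionPhase.NORMAL: ExpansionPhase.ANNOUNCED,
    ExpansionPhase.ANNOUNCED: ExpansionPhase.FROZEN,
    ExpansionPhase.FROZEN: ExpansionPhase.ESI_SYNC,
    ExpansionPhase.ESI_SYNC: ExpansionPhase.PATCH_NOTES,
    ExpansionPhase.PATCH_NOTES: ExpansionPhase.VALIDATION,
    ExpansionPhase.VALIDATION: ExpansionPhase.NORMAL,  # step 6: unfreeze
}


def advance(phase: ExpansionPhase) -> ExpansionPhase:
    """Move to the next step of the expansion workflow."""
    return NEXT_PHASE[phase]


def daily_updates_allowed(phase: ExpansionPhase) -> bool:
    """Assumption: daily updates pause only during the freeze window (steps 2-5)."""
    return phase in (ExpansionPhase.NORMAL, ExpansionPhase.ANNOUNCED)
```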
---
## 9. Validation Questions
### Wiki Infrastructure
1. **Hosting requirements:** What server/container host will run this? (RAM/CPU allocation)
2. **Access & secrets management:** Plan for storing ESI credentials, Git credentials, and Wiki.js API tokens in a secrets manager (e.g., Vault, AWS Secrets Manager).
3. **Backup requirements:** How many days of backup retention are required?
4. **User access:** Will this wiki be public read-only, or require authentication?
5. **Storage:** How much content do you anticipate? (affects storage planning)
### Content Scope
6. **Priority domains:** Should we prioritize specific game aspects? (PVP, mining, industry, nullsec, etc.)
7. **Content age:** Should imported content include historical versions, or only current state?
8. **Completeness threshold:** What's an acceptable import percentage? (80% of pages vs. all)
### Agent Behavior
9. **Validation threshold:** What minimum validation score should be required for auto-approval? (Recommended: 95%)
10. **Conflict resolution:** If multiple sources have conflicting information, which source takes priority?
11. **Update frequency:** How fresh should content be? (real-time, daily, weekly)
12. **Alerting:** How should the system notify on validation failures or errors?
### Operational
13. **Monitoring access:** Do you have access to the Nginx Proxy Manager instance for SSL/proxy configuration?
14. **Container management:** Will you use Komodo or another container management platform, or manual Docker?
15. **Error handling:** Should the system pause and alert on repeated failures, or continue with skipped items?
## 10. Next Steps
Once questions are answered, I can:
1. Provide detailed Docker Compose configuration for Wiki.js with read-only UI and secrets integration
2. Design the LangGraph StateGraph specification (node definitions, edge conditions, state schema)
3. Define the patch-note-to-wiki mapping schema
4. Create the content import runbook for Agent A
5. Implement the standard metadata schema and validation rules
6. Configure LangSmith dashboards for wiki content monitoring


@@ -0,0 +1,117 @@
# EVE Online Wiki Plan Review - Implementation Blockers
Review completed 2026-04-16. Legal issues excluded per request.
---
## ✅ RESOLVED BLOCKERS
### 1. State Schema Definition
- **Status**: RESOLVED
- **Reference**: [State Schema Definition](docs/schema/state-schema.md)
### 2. Wiki.js API Authentication Flow
- **Status**: RESOLVED
- **Reference**: [Wiki.js API Auth Strategy](docs/infrastructure/wikijs-auth.md)
### 3. ESI Client Specification
- **Status**: RESOLVED
- **Reference**: [ESI Client Design](docs/infrastructure/esi-client.md)
### 4. Content Hash Algorithm
- **Status**: RESOLVED
- **Outcome**: SHA-256 standardized.
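A content hash along these lines can be computed with the standard library. The helper name `content_hash` and the line-ending normalization are assumptions added for illustration; the review only states that SHA-256 is the standard.

```python
import hashlib


def content_hash(text: str) -> str:
    """Return the SHA-256 hex digest of page content encoded as UTF-8."""
    # Assumption: normalize CRLF to LF so the same page hashes identically
    # regardless of which platform produced it.
    normalized = text.replace("\r\n", "\n")
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

Comparing the stored digest with a freshly computed one is a cheap way to detect whether a page has changed since the last sync.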
### 5. Git Sync Layer Specification
- **Status**: RESOLVED
- **Reference**: [Git Sync Protocol](docs/infrastructure/git-sync.md)
### 6. Validation Agent Scoring Formula
- **Status**: RESOLVED
- **Reference**: [Quality Control Protocol](docs/validation/quality-control.md)
- **Outcome**: Defined 4-category weighted scoring with "Must Pass" logic.
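As a rough illustration of weighted scoring with a "Must Pass" gate, the sketch below combines per-category scores into a single 0 to 100 value. The actual categories, weights, and gate behavior live in the Quality Control Protocol and are not reproduced here; the function name, the category names in the usage note, and the rule that a failed must-pass check forces the score to 0 are all assumptions.

```python
def validation_score(scores: dict[str, float],
                     weights: dict[str, float],
                     must_pass: dict[str, bool]) -> float:
    """Weighted average on a 0-100 scale (hypothetical helper).

    Assumption: any failed must-pass check overrides the weighted result
    and yields 0, so the page can never be auto-approved.
    """
    if not all(must_pass.values()):
        return 0.0
    total_weight = sum(weights.values())
    # Normalize by the total weight so weights need not sum to exactly 1 or 100.
    return sum(scores[name] * w for name, w in weights.items()) / total_weight
```

For example, with placeholder categories `accuracy`, `completeness`, `formatting`, and `links` weighted 40/30/20/10, a page scoring 100/90/80/100 and passing every must-pass check would score 93, below the recommended 95% auto-approval threshold.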
### 7. Page Hierarchy Specification
- **Status**: RESOLVED
- **Reference**: [Wiki Page Hierarchy](docs/schema/hierarchy.md)
### 8. Failure Handling Behavior
- **Status**: RESOLVED
- **Reference**: [Quality Control Protocol](docs/validation/quality-control.md)
- **Outcome**: Established 3-Tier response matrix and feedback loop logic.
### 11. Rate Limiting Specifications
- **Status**: RESOLVED
- **Reference**: [Quality Control Protocol](docs/validation/quality-control.md)
- **Outcome**: Defined specific req/sec limits for ESI, MediaWiki, and Wiki.js.
---
## 🚨 CRITICAL BLOCKERS (Cannot start implementation without these)
*No critical blockers remain.*
---
## 🟡 MEDIUM IMPACT ISSUES (Need resolution before Phase 2)
### 9. Cross-link Generation Logic Missing
- **Location**: Agent A (line 114)
- **Issue**: "Generate cross-links between related pages" has no logic defined
- **Missing**: Link extraction rules, entity matching strategy, and placement rules
### 10. Content Merge Strategy Undefined
- **Location**: Agent D (line 172)
- **Issue**: "Merge structured data with human-readable content" has no strategy
- **Impact**: Cannot resolve conflicts between ESI data and wiki content
### 11. No Rate Limiting Specifications
- **Status**: RESOLVED (see item 11 under Resolved Blockers; limits are defined in the Quality Control Protocol)
- **Location**: All external agents
- **Issue**: Only Agent A specifies rate limiting (1 request/second)
- **Missing**: Rate limits for:
  - ESI API
  - External wiki scraping
  - Wiki.js API writes
  - Git operations
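A per-service limiter could take the following shape. This is a minimal sketch: the class name `MinIntervalLimiter` and its interface are assumptions, and the only rate taken from the plan is Agent A's 1 request/second; limits for the other services would come from the Quality Control Protocol.

```python
import time


class MinIntervalLimiter:
    """Space calls at least 1 / rate_per_sec seconds apart (hypothetical helper)."""

    def __init__(self, rate_per_sec: float,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / rate_per_sec
        # clock and sleep are injectable so the limiter can be tested without waiting.
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = clock()

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay:
            self._sleep(delay)
        # Schedule the following call one interval after this one was released.
        self._next_allowed = max(now, self._next_allowed) + self.min_interval
        return delay
```

Each external client (ESI, MediaWiki, Wiki.js writes, Git) would hold its own limiter instance and call `wait()` before every request, for example `MinIntervalLimiter(1.0)` for Agent A.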
### 12. Page Template Schema Incomplete
- **Location**: Content schema section
- **Missing fields**:
  - Ship pages: capacitor, targeting, drone stats marked optional but required for validation
  - No schema for skill pages, item pages, or faction pages
---
## 🟢 MINOR ISSUES / CLARIFICATIONS NEEDED
### 13. LangGraph Checkpoint Implementation
- **Location**: Line 361
- **Issue**: Mentions both `MemorySaver` and `PostgresSaver` but doesn't specify which to use
- **Recommendation**: Use `PostgresSaver` for production durability
### 14. Agent Scheduling Mechanism
- **Location**: All agent specifications
- **Issue**: No specification for how scheduled agents will be triggered
- **Options**: Cron jobs, LangGraph timers, external scheduler
### 15. Asset Storage Strategy
- **Location**: Agent G
- **Issue**: "Download images to local storage" doesn't specify storage backend
- **Options**: Wiki.js asset manager, S3, local filesystem
---
## 📋 RECOMMENDED NEXT STEPS
The four items originally flagged as critical have since been resolved (see items 1-4 under Resolved Blockers):
1. Define the shared `State` schema for LangGraph
2. Specify Wiki.js API authentication strategy
3. Define ESI client implementation requirements
4. Specify content hashing algorithm and storage

The next step is to resolve the medium-impact issues above before Phase 2.
No showstopper architectural issues were identified. The design is sound and follows best practices for agent orchestration with LangGraph.