EVE Online Automated Wiki System - High Level Plan
1. Wiki Software Recommendation: Wiki.js
Why Wiki.js:
- Modern, open-source (AGPL-3.0), actively maintained
- First-class Docker support with official image
- GraphQL API built in for automated content updates
- Markdown-based editing (great for AI-generated content)
- Git-based storage option for complete version control
- Built-in search, analytics, and access controls
- Lightweight (~200MB RAM)
- Perfect for agent-only workflow: Can disable all human editing entirely while retaining API write access
Alternatives considered:
| Software | Pros | Cons |
|---|---|---|
| MediaWiki | Industry standard, massive extension ecosystem | Heavy, PHP, API is less developer-friendly |
| DokuWiki | Flat file, extremely simple | No native API, dated interface |
| BookStack | Structured organization | Less suited for interconnected knowledge |
| Wiki.js | Modern API, Git sync, Docker-native, read-only UI support | Younger project, smaller community |
2. System Architecture
┌─────────────────────────────────────────────────────────────────────────────┐
│                           EVE Online Wiki System                            │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                    LangGraph Orchestration Layer                    │   │
│   │  ┌───────────────────────────────────────────────────────────────┐  │   │
│   │  │                    StateGraph (Main Graph)                    │  │   │
│   │  │                                                               │  │   │
│   │  │     ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐     │  │   │
│   │  │     │ Source  │   │  Patch  │   │External │   │   ESI   │     │  │   │
│   │  │     │Harvester│   │ Monitor │   │ Monitor │   │Collector│     │  │   │
│   │  │     │  Node   │   │  Node   │   │  Node   │   │  Node   │     │  │   │
│   │  │     └────┬────┘   └────┬────┘   └────┬────┘   └────┬────┘     │  │   │
│   │  │          └─────────────┴──────┬──────┴─────────────┘          │  │   │
│   │  │                               │                               │  │   │
│   │  │                      ┌────────▼────────┐                      │  │   │
│   │  │                      │   Validation    │                      │  │   │
│   │  │                      │    Subgraph     │                      │  │   │
│   │  │                      │   (E → F → G)   │                      │  │   │
│   │  │                      └────────┬────────┘                      │  │   │
│   │  │                               │                               │  │   │
│   │  │                      ┌────────▼────────┐                      │  │   │
│   │  │                      │    Git Sync     │                      │  │   │
│   │  │                      │      Node       │                      │  │   │
│   │  │                      └────────┬────────┘                      │  │   │
│   │  │                               │                               │  │   │
│   │  │                      ┌────────▼────────┐                      │  │   │
│   │  │                      │   Wiki.js API   │                      │  │   │
│   │  │                      │      Node       │                      │  │   │
│   │  │                      └─────────────────┘                      │  │   │
│   │  └───────────────────────────────────────────────────────────────┘  │   │
│   │                                                                     │   │
│   │  LangGraph Features:                                                │   │
│   │  • Checkpointing: Durable state persistence across crashes          │   │
│   │  • Conditional Edges: Dynamic routing based on validation results   │   │
│   │  • Subgraphs: Nested validation pipeline (E→F→G) as single node     │   │
│   │  • Streaming: Real-time token output from LLM agents                │   │
│   │  • LangSmith: Built-in observability and tracing                    │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
Infrastructure Stack:
- Wiki.js - Content management (read-only UI)
- PostgreSQL - Primary database for Wiki.js (production-grade, required)
- Redis - State persistence backend for LangGraph checkpointing
- LangGraph - Agent orchestration framework (MIT licensed, built on LangChain)
- LangSmith - Observability platform for tracing agent executions
- Git - Content storage backend for version control
Architecture Principles:
- All content flows top-to-bottom, no exceptions
- 100% agent-only editing - no human accounts have write access
- All changes go through full validation pipeline before publication
- Complete audit trail maintained in Git storage backend
- Immutable content sources are isolated from publication layer
- Git sync acts as single source of truth for all content
- LangGraph maintains checkpoint state and full audit log via LangSmith traces
- No direct API writes from edge agents - all writes go through pipeline
- Daily batch updates for prose/content (not continuous) - scheduled at 02:00 UTC
- ESI structured data updates run on independent schedule (hourly for static data, daily for dynamic)
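As a rough illustration of the 02:00 UTC daily batch principle, the next run time can be computed from the current time. This is a minimal sketch; the function name `next_batch_run` is hypothetical, and a real deployment would more likely rely on cron or a scheduler library.

```python
from datetime import datetime, time, timedelta, timezone


def next_batch_run(now: datetime) -> datetime:
    """Return the next 02:00 UTC batch start strictly after `now`.

    `now` must be timezone-aware; it is converted to UTC first.
    """
    now = now.astimezone(timezone.utc)
    candidate = datetime.combine(now.date(), time(2, 0), tzinfo=timezone.utc)
    if candidate <= now:
        # Today's 02:00 slot has already started or passed; use tomorrow's.
        candidate += timedelta(days=1)
    return candidate
```

Treating a call made at exactly 02:00 as "already started" avoids scheduling the same batch twice.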
3. Agent Specifications
Agent A: Initial Wiki Construction
Purpose: Seed the wiki with existing content from source sites.
Inputs:
- Source URLs (EVE University wiki, WCKG, CCP Support, ESI API)
- Content scope (what categories/sections to import)
- Deduplication and merge strategy
Process:
- Source Extraction:
  - EVE University: Use the MediaWiki `api.php` endpoint for structured content retrieval (avoids HTML scraping issues).
  - WCKG: Specialized Google Sites parser for dynamic content rendering.
  - CCP Support: Content extraction with browser headers to bypass Cloudflare challenges.
- Extract structured content (ships, modules, mechanics)
- Normalize content format to Markdown
- Extract all images and references
- Compute content hash (SHA-256) for each page and skip if unchanged since last import
- Create wiki pages with proper hierarchy (see Page Hierarchy Strategy)
- Tag pages with complete metadata
- Generate cross-links between related pages
- Pass all content to Validation Agent before publication
Scheduling: One-time run (with option to replay)
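The hash-and-skip step in the process above can be sketched as follows. The names `content_hash`, `needs_import`, and the in-memory `seen` dictionary are illustrative assumptions; the plan does not say where hashes are stored.

```python
import hashlib


def content_hash(markdown: str) -> str:
    """Return the SHA-256 hex digest of a page's Markdown content."""
    return hashlib.sha256(markdown.encode("utf-8")).hexdigest()


def needs_import(path: str, markdown: str, seen: dict[str, str]) -> bool:
    """Return True (and record the new hash) if the page is new or changed.

    `seen` maps wiki paths to the hash recorded at the previous import.
    """
    digest = content_hash(markdown)
    if seen.get(path) == digest:
        return False  # unchanged since last import, skip it
    seen[path] = digest
    return True
```

Hashing the normalized Markdown rather than the raw source HTML keeps the check stable when only the source site's page chrome changes.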
Agent B: Patch Note Monitor
Purpose: Detect EVE Online patch changes and update affected wiki pages.
Inputs:
- RSS feed: `https://www.eveonline.com/rss/patch-notes` (verified accessible via GET)
- Update frequency (recommended: daily)
- Affected page mapping (which pages relate to which game systems)
Process:
- Poll RSS feed on schedule (Standard RSS 2.0 parsing)
- Parse new patch entries using LLM content analysis
- Identify exact content changes required for affected pages
- Generate complete revised page content, not just append sections
- Pass proposed changes to Validation Agent
- If validation passes, apply update automatically
- If validation fails, retry generation or flag for system alert
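The polling step can be illustrated with the standard library's XML parser. This is a sketch under assumptions: the feed is taken to use plain RSS 2.0 `item` elements with `guid`, `title`, `link`, and `pubDate` children, and the sample feed below is invented for the example rather than copied from the real one.

```python
import xml.etree.ElementTree as ET

# Invented sample data; the real feed content will differ.
SAMPLE_FEED = """<rss version="2.0"><channel><title>Patch Notes</title>
<item><title>Patch 1</title><link>https://example.invalid/1</link><guid>patch-1</guid></item>
<item><title>Patch 2</title><link>https://example.invalid/2</link><guid>patch-2</guid></item>
</channel></rss>"""


def new_patch_entries(rss_xml: str, seen_guids: set[str]) -> list[dict[str, str]]:
    """Return feed items whose GUID (or link, as a fallback) is not in `seen_guids`."""
    root = ET.fromstring(rss_xml)
    entries = []
    for item in root.iter("item"):
        guid = item.findtext("guid") or item.findtext("link") or ""
        if not guid or guid in seen_guids:
            continue
        entries.append({
            "guid": guid,
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": item.findtext("pubDate", default=""),
        })
    return entries
```

The function does not modify `seen_guids`; the caller would add a GUID only after the corresponding update has passed validation, so a failed run is retried on the next poll.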
3.5 Infrastructure Protocols
Git Sync Protocol
- Single Source of Truth: Git repository acts as the primary storage.
- Bi-directional Sync:
- Agent Write -> Git Commit -> Wiki.js Push
- Wiki.js renders directly from the Git-backed storage.
- Repository Structure:
  - `/content`: Markdown files mapping to wiki paths (e.g., `content/ships/frigates/condor.md`)
  - `/assets`: Images and files mapping to local paths.
- Commit Format:
[AGENT_ID] update: path/to/page (hash: abc123)
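A small sketch of the repository path mapping and commit format described above. The helper names are hypothetical, and the use of a six-character SHA-256 prefix for the hash is an assumption based only on the `abc123` example.

```python
import hashlib
from pathlib import PurePosixPath


def repo_path(wiki_path: str) -> str:
    """Map a wiki path such as 'ships/frigates/condor' to its Markdown file."""
    return str(PurePosixPath("content") / wiki_path.strip("/")) + ".md"


def commit_message(agent_id: str, wiki_path: str, markdown: str) -> str:
    """Build a message in the '[AGENT_ID] update: path (hash: abc123)' format."""
    # Assumption: the short hash is the first 6 hex digits of the content's SHA-256.
    short_hash = hashlib.sha256(markdown.encode("utf-8")).hexdigest()[:6]
    return f"[{agent_id}] update: {wiki_path.strip('/')} (hash: {short_hash})"
```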
API Authentication
- Strategy: Bearer tokens with minimum scopes (`write:pages`, `write:assets`).
- Storage: Managed via environment variables.
- See Wiki.js API Auth Strategy for details.
Shared State
- Schema: Managed via LangGraph `WikiState` Pydantic model.
- See State Schema Definition for details.
Agent C: External Wiki Monitor
Purpose: Track changes on source wikis and refresh content.
Inputs:
- Monitored URLs and change detection rules
- Check frequency (recommended: weekly)
Process:
- Poll source sites on schedule respecting robots.txt
- Detect new pages or modified content
- Compare against imported content hash in local wiki
- Ignore minor formatting/link changes
- Generate revised page content with merged changes
- Pass proposed changes to Validation Agent
- Apply update automatically on validation pass
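One way to ignore minor formatting and link changes is to normalize the text before hashing it. The sketch below is an assumption about what "minor" means (whitespace differences and changed link targets); the plan does not define it, and the function names are illustrative.

```python
import hashlib
import re


def normalize(markdown: str) -> str:
    """Drop link targets and collapse whitespace so cosmetic edits do not register."""
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", markdown)  # keep link text only
    text = re.sub(r"\s+", " ", text)  # collapse runs of whitespace and newlines
    return text.strip()


def fingerprint(markdown: str) -> str:
    """SHA-256 of the normalized content, used for change detection."""
    return hashlib.sha256(normalize(markdown).encode("utf-8")).hexdigest()
```

Two revisions with the same fingerprint would be treated as unchanged and skipped; any change to the visible wording produces a different fingerprint.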
Agent D: ESI Data Collector
Purpose: Pull official structured data directly from CCP's ESI API.
Inputs:
- ESI API endpoints for ships, modules, items, skills
- Update frequency: hourly for static data (independent of daily batch), daily for dynamic data (part of 02:00 UTC batch)
Process:
- Poll ESI API on schedule with proper rate limiting
- Extract structured game data
- Generate or update data-driven pages automatically
- Merge structured data with human-readable content from other sources
- Compute content hash and skip if unchanged since last poll
- Pass proposed changes to Validation Agent
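The "proper rate limiting" step could be as simple as enforcing a minimum gap between requests. The class below is a minimal sketch, not the ESI client itself: the class name is hypothetical, the interval is left to the caller, and actual ESI limits are not stated in this plan. The clock and sleep functions are injectable so the behavior can be tested without waiting.

```python
import time


class MinIntervalLimiter:
    """Block so that consecutive calls are at least `min_interval` seconds apart."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next request may start

    def wait(self) -> float:
        """Sleep if needed before the next request; return the delay applied."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self._min_interval
        return delay
```

A collector loop would call `limiter.wait()` immediately before each HTTP request.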
Agent E: Content Validation & Review Agent
Purpose: Automated quality control for all content changes. Replaces all human review.
Validation Rules:
- Structural validation: Markdown syntax, page hierarchy, metadata presence
- Content validation: Factual consistency, no broken references, completeness
- Change validation: Diff analysis, only expected changes applied, no unintended modifications
- Cross-reference validation: All internal links resolve correctly
- TOS compliance: Proper attribution included for all imported content
Process:
- Receive proposed change from upstream agent
- Run all validation checks
- Generate confidence score (0-100%)
- If score > 95%: Approve for publication
- If score 70-95%: Request regeneration with feedback
- If score <70%: Reject change and generate system alert
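The three score bands above map naturally onto a routing function. This is a sketch with hypothetical names; in LangGraph, a function of this shape could be passed to `add_conditional_edges` as the routing callable, though the exact wiring is not specified in this plan.

```python
from enum import Enum


class Decision(Enum):
    APPROVE = "approve"
    REGENERATE = "regenerate"
    REJECT = "reject"


def route(score: float) -> Decision:
    """Map a 0-100 confidence score to the next pipeline step."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score > 95:
        return Decision.APPROVE      # publish
    if score >= 70:
        return Decision.REGENERATE   # send back with feedback
    return Decision.REJECT           # reject and raise a system alert
```

Note that a score of exactly 95 falls in the regeneration band, matching the "> 95%" approval rule.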
Agent F: Numerical Validation Layer
Purpose: Rule-based validation for game data, separate from LLM validation. Catches LLM hallucinations in structured data.
Validation Categories:
| Data Type | Validation Rules | Source of Truth |
|---|---|---|
| Ship stats | Base HP, velocity, slots, fitting stats must match ESI exactly (0% tolerance) | ESI API |
| Module stats | CPU, PG, range, damage multipliers | ESI API |
| Skill requirements | Prerequisites match skill tree | ESI API |
| Fitting calculations | Must pass CPU/PG budget checks | Local calculation |
| Market data | Prices non-negative, volume non-negative | ESI API |
| Links/IDs | All typeIDs resolve to valid entities | ESI API lookup |
Process:
- Extract all numerical values from proposed content
- Cross-reference against ESI API for game data
- Flag any discrepancies >0% for ship/module stats
- Reject content with invalid typeIDs or broken references
- Log all validation results for audit
Override Rules:
- Numerical validation failures = auto-reject (no LLM override possible)
- Historical content (archived ships/modules) flagged for manual review
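The exact-match check for ship and module stats can be sketched as a plain dictionary comparison, with no LLM involved. The function name and the field names in the usage example are illustrative, and the numbers are made up rather than real game values.

```python
def find_discrepancies(proposed: dict[str, float], esi: dict[str, float]) -> list[str]:
    """Compare page values against ESI reference values; any difference is a failure."""
    problems = []
    for field, expected in esi.items():
        if field not in proposed:
            problems.append(f"{field}: missing (ESI has {expected})")
        elif proposed[field] != expected:
            problems.append(f"{field}: page has {proposed[field]}, ESI has {expected}")
    return problems


# Hypothetical values for illustration only.
esi_reference = {"hp_shield": 400, "high_slots": 4}
page_values = {"hp_shield": 450, "high_slots": 4}
issues = find_discrepancies(page_values, esi_reference)
```

A non-empty result would trigger the auto-reject rule, since numerical failures cannot be overridden by an LLM.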
Agent G: Asset & Reference Handler
Purpose: Centralized management of all images, links, and external references.
Process:
- Receive all extracted images and references from other agents
- Download images to local storage (respecting copyright/attribution)
- Rewrite all image URLs to local wiki paths
- Rewrite all external links to reference original source
- Add source attribution footer to all pages
- Check for broken links on every update
- Maintain asset integrity across all pages
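The image URL rewriting step might look like the following sketch. It assumes standard Markdown image syntax and a flat `/assets` directory; the function name is hypothetical, and a real handler would also need to deal with filename collisions and perform the actual downloads.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Matches ![alt](http(s)://...) image references.
IMAGE_PATTERN = re.compile(r"!\[([^\]]*)\]\((https?://[^)\s]+)\)")


def localize_images(markdown: str, asset_root: str = "/assets") -> tuple[str, list[str]]:
    """Rewrite remote image URLs to local paths; return new text and URLs to download."""
    remote_urls: list[str] = []

    def replace(match: re.Match) -> str:
        alt, url = match.group(1), match.group(2)
        remote_urls.append(url)
        filename = PurePosixPath(urlparse(url).path).name
        return f"![{alt}]({asset_root}/{filename})"

    return IMAGE_PATTERN.sub(replace, markdown), remote_urls
```

Returning the list of original URLs lets the caller download each image and record its source for the attribution footer.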
4. Model Intelligence Tiers
Different agents require different levels of LLM reasoning. Using appropriate models reduces cost and improves reliability.
| Agent | Tier | Model Requirements | Justification |
|---|---|---|---|
| A: Source Harvester | Low | Basic extraction model (e.g., GPT-4o-mini, Claude Haiku) | Template-based extraction, structured output format |
| B: Patch Note Monitor | High | Strong reasoning model (e.g., Claude Sonnet, GPT-4o) | Requires understanding game mechanics to map changes to pages |
| C: External Wiki Monitor | Low | Basic extraction model | Simple change detection and content extraction |
| D: ESI Data Collector | None | No LLM needed | Pure API calls, structured data, programmatic transformation |
| E: Content Validation | Medium | Balanced model (e.g., Claude Sonnet) | Needs semantic understanding but structured validation rules |
| F: Numerical Validation | None | No LLM needed | Pure rule-based, deterministic validation |
| G: Asset Handler | Low | Basic model for categorization | Mostly file operations, minimal reasoning |
Recommended Model Stack:
- High reasoning: Claude 3.5 Sonnet / GPT-4o (Agents B, E)
- Low cost: Claude 3.5 Haiku / GPT-4o-mini (Agents A, C, G)
- No LLM: Agents D, F (programmatic only)
Daily Batch Cost Estimate: With daily updates (not continuous), typical daily operations:
- ~10-20 patch note analyses (Agent B): ~$0.10-0.30
- ~50-100 content validations (Agent E): ~$0.50-1.00
- ~100-200 extractions (Agents A, C, G): ~$0.10-0.20
- Daily total: ~$0.70-1.50 (note: subscription tiers have rate limits and token caps; bulk operations like initial import may temporarily exceed these, requiring pay-per-use fallback)
5. Content Schema Per Page Type
Each page type has a template with required fields that Agent E validates structurally before semantic validation.
Ship Page Template
page_type: ship
required_fields:
- name: string
- type_id: integer (ESI)
- group: string (e.g., "Interceptor", "Battleship")
- race: string (Caldari, Minmatar, Amarr, Gallente)
- hull_stats:
    hp_shield: integer
    hp_armor: integer
    hp_structure: integer
- fitting_stats:
    cpu_output: integer
    powergrid_output: integer
    high_slots: integer
    med_slots: integer
    low_slots: integer
    rig_slots: integer
- velocity: integer
- skill_requirements: list[{skill_id: integer, level: integer}]
- description: string (prose, sourced from external wiki or generated)
optional_fields:
- role_bonus: string
- ship_bonus: list[string]
- capacitor_capacity: integer
- targeting_range: integer
- drone_bandwidth: integer
- probe_launcher_fitting: boolean
Module Page Template
page_type: module
required_fields:
- name: string
- type_id: integer (ESI)
- group: string (e.g., "Shield Booster", "Afterburner")
- slot: string (high, mid, low, rig)
- cpu_usage: integer
- powergrid_usage: integer
- description: string
optional_fields:
- duration: integer
- range: integer
- damage_multiplier: float
- skill_requirements: list[{skill_id: integer, level: integer}]
- meta_level: integer
- tech_level: integer (1 or 2)
Mechanic/Guide Page Template
page_type: mechanic
required_fields:
- title: string
- summary: string (1-3 sentences)
- categories: list[string]
- source: string (eve-university | wckg | ccp | generated)
- last_reviewed: date
optional_fields:
- related_ships: list[string]
- related_modules: list[string]
- related_mechanics: list[string]
- difficulty: string (beginner | intermediate | advanced)
Validation Against Schema
Agent E enforces:
- All `required_fields` present and non-empty
- All integer fields contain valid integers (no strings, no nulls)
- All `type_id` fields pass Agent F numerical validation against ESI
- All `skill_requirements` reference valid typeIDs
- Page type matches one of the defined templates (reject unknown types)
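The structural part of these checks can be sketched with a table of required fields per page type. Only the module template is shown here, and the `REQUIRED_FIELDS` table and `validate_page` function are illustrative names; the ESI-backed checks are left to Agent F.

```python
# Required fields for the module template, with their expected Python types.
REQUIRED_FIELDS = {
    "module": {
        "name": str, "type_id": int, "group": str, "slot": str,
        "cpu_usage": int, "powergrid_usage": int, "description": str,
    },
}


def validate_page(page: dict) -> list[str]:
    """Return a list of structural errors; an empty list means the page passes."""
    page_type = page.get("page_type")
    schema = REQUIRED_FIELDS.get(page_type)
    if schema is None:
        return [f"unknown page_type: {page_type!r}"]
    errors = []
    for field, expected in schema.items():
        value = page.get(field)
        if value is None or value == "":
            errors.append(f"{field}: missing or empty")
        elif isinstance(value, bool) or not isinstance(value, expected):
            # bool is excluded explicitly because it is a subclass of int.
            errors.append(f"{field}: expected {expected.__name__}")
    return errors
```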
6. Agent Health Monitoring
LangGraph provides built-in checkpointing for state persistence, but agent-level health monitoring requires a separate heartbeat system.
Heartbeat Protocol:
- Each LangGraph node (agent) sends a `HEARTBEAT` message to Redis every 60 seconds during active operation, every 5 minutes when idle
- Heartbeat payload: `{ node_name, status: healthy|degraded|error, thread_id, last_completed_at, checkpoint_id }`
- Heartbeat registry uses Redis with TTL (3x interval for stale, 10x for dead)
LangGraph Checkpointing:
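- A LangGraph checkpointer persists graph state at each step; `PostgresSaver` survives process restarts, while `MemorySaver` keeps state in memory only and is suitable for local testing
- Workflows resume from the last saved checkpoint after a crash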
- Checkpoint TTL configurable (24-48 hours for batch workflows, session-based for conversational)
Staleness Detection:
- If no heartbeat received within 3x the expected interval → mark agent as `stale`
- If no heartbeat received within 10x the expected interval → mark agent as `dead` and trigger critical alert
- Stale nodes: LangGraph checkpoint indicates last state, new invocations wait for recovery
- Dead nodes: halt dependent pipeline stages, escalate alert
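The 3x and 10x thresholds reduce to a comparison on heartbeat age. A minimal sketch, assuming timestamps in seconds and a hypothetical `classify` helper; reading the last heartbeat from Redis is left out.

```python
def classify(last_heartbeat: float, now: float, interval: float = 60.0) -> str:
    """Classify a node from the age of its last heartbeat.

    `interval` is the expected heartbeat period: 60 s when active, 300 s when idle.
    """
    age = now - last_heartbeat
    if age > 10 * interval:
        return "dead"     # halt dependent stages, escalate alert
    if age > 3 * interval:
        return "stale"    # new invocations wait for recovery
    return "healthy"
```

With the default 60-second interval, a node becomes stale after 3 minutes of silence and dead after 10 minutes.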
LangSmith Integration:
- Every LLM call, tool invocation, and state transition emits traces to LangSmith
- Query LangSmith audit logs for execution history, latency, token usage
- Alerts configured via LangSmith webhooks for validation failures
Alerting:
- Agent status transitions emit events to the audit log
- Critical alerts (dead node, repeated validation failures, checkpoint gaps > threshold) notify via configured channel (webhook, email, etc.)
7. Standard Page Metadata
All pages will include standard frontmatter:
source: eve-university | wckg | ccp | esi | generated
source_url: https://...
imported_date: 2026-04-16
last_updated: 2026-04-16
last_validated: 2026-04-16
update_frequency: daily | weekly | monthly
validation_score: 98
categories: [ships, pvp, modules, industry]
8. Implementation Phases
Phase 0: Pre-Work & Compliance
- Confirm scraping TOS with source wiki maintainers
- Implement rate limiting and proper User-Agent headers
- Define metadata schema and validation rules
- Test content extraction on sample pages
Phase 1: Foundation
- Deploy PostgreSQL database via Docker (production configuration)
- Deploy Redis instance for LangGraph checkpointing + heartbeat registry
- Deploy Wiki.js via Docker (connected to PostgreSQL)
- Disable all human write permissions - configure API-only write access
- Configure Git storage backend for complete change history
- Configure Git sync layer as single source of truth
- Set up HTTPS and domain routing
- Establish automated backup strategy
- Deploy LangGraph with `StateGraph` defining all agent nodes and edges
- Configure LangSmith for observability (tracing, audit logs)
- Deploy agent heartbeat monitoring (Redis TTL registry)
Phase 2: Content Pipeline
- Deploy Source Harvester Agent
- Deploy Validation Agent
- Deploy Asset Handler Agent
- Deploy ESI Data Collector Agent
- Execute initial import with full validation pipeline
- Establish content quality baseline
Phase 2.5: Smoke Test
- Run Agent A on 50 representative pages across all page types (ships, modules, mechanics, guides)
- Pass all 50 pages through the full validation pipeline (Agents E + F + G)
- Calibrate validation thresholds based on results (adjust confidence scoring weights)
- Verify merge logic when ESI data and external wiki content overlap on same pages
- Confirm Git sync round-trip: write → Git → Wiki.js render matches expected output
- Identify and fix integration bugs before full import
- Document baseline validation pass rate and failure patterns
Phase 3: Automated Monitoring
- Deploy Patch Note Monitor Agent
- Implement LLM-based patch parsing and content generation
- Configure validation thresholds
- Test end-to-end update workflow
Phase 4: External Change Tracking
- Deploy External Wiki Monitor Agent
- Configure source site monitoring
- Implement change detection and merge logic
- Set up system alerting for failures
Phase 5: Major Expansion Handling
- Create expansion detection webhook (CCP announces expansions 2-4 weeks ahead)
- Build bulk update workflow for expansion releases
- Implement "freeze" mode during expansion deployment (content locked until ESI stabilizes)
- Create post-expansion audit job to verify all affected pages
- Document expansion runbook for manual triggering
Expansion Workflow:
- Expansion announced → Create tracking ticket
- Expansion deploys → Freeze wiki updates, wait for ESI stability (typically 24-48h)
- Run bulk ESI sync → Update all ship/module/item pages
- Run Patch Note Agent → Process expansion notes, generate new pages
- Run full validation → All pages validated against new ESI data
- Unfreeze → Resume daily batch updates
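The expansion workflow is a linear sequence of stages with a freeze window in the middle, which can be sketched as a small state machine. The stage names and helper functions below are hypothetical labels for the six steps listed above.

```python
# One stage per step of the expansion workflow, in order.
EXPANSION_STAGES = [
    "announced", "frozen", "esi_synced",
    "patch_notes_processed", "validated", "unfrozen",
]
# Daily batch updates are blocked from the freeze until the unfreeze.
FROZEN_STAGES = {"frozen", "esi_synced", "patch_notes_processed", "validated"}


def next_stage(stage: str) -> str:
    """Return the stage that follows `stage`; raises ValueError if unknown or final."""
    index = EXPANSION_STAGES.index(stage)
    if index == len(EXPANSION_STAGES) - 1:
        raise ValueError("expansion workflow already complete")
    return EXPANSION_STAGES[index + 1]


def daily_batch_allowed(stage: str) -> bool:
    """Return True if the regular 02:00 UTC batch may run during `stage`."""
    if stage not in EXPANSION_STAGES:
        raise ValueError(f"unknown stage: {stage}")
    return stage not in FROZEN_STAGES
```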
9. Validation Questions
Wiki Infrastructure
1. Hosting requirements: What server/container host will run this? (RAM/CPU allocation)
2. Access & secrets management: Plan for storing ESI credentials, Git credentials, and Wiki.js API tokens in a secrets manager (e.g., Vault, AWS Secrets Manager).
3. Backup requirements: How many days of backup retention are required?
4. User access: Will this wiki be public read-only, or require authentication?
5. Storage: How much content do you anticipate? (affects storage planning)
Content Scope
6. Priority domains: Should we prioritize specific game aspects? (PVP, mining, industry, nullsec, etc.)
7. Content age: Should imported content include historical versions, or only current state?
8. Completeness threshold: What's an acceptable import percentage? (80% of pages vs. all)
Agent Behavior
9. Validation threshold: What minimum validation score should be required for auto-approval? (Recommended: 95%)
10. Conflict resolution: If multiple sources have conflicting information, which source takes priority?
11. Update frequency: How fresh should content be? (real-time, daily, weekly)
12. Alerting: How should the system notify on validation failures or errors?
Operational
13. Monitoring access: Do you have access to the Nginx Proxy Manager instance for SSL/proxy configuration?
14. Container management: Will you use Komodo or another container management platform, or manual Docker?
15. Error handling: Should the system pause and alert on repeated failures, or continue with skipped items?
10. Next Steps
Once questions are answered, I can:
- Provide detailed Docker Compose configuration for Wiki.js with read-only UI and secrets integration
- Design the LangGraph StateGraph specification (node definitions, edge conditions, state schema)
- Define the patch-note-to-wiki mapping schema
- Create the content import runbook for Agent A
- Implement the standard metadata schema and validation rules
- Configure LangSmith dashboards for wiki content monitoring