b3nw/eve-wiki-content

b3nw 31c3083ff4 Initial wiki structure

2026-04-19 03:57:03 +00:00

EVE Online Automated Wiki System - High Level Plan

1. Wiki Software Recommendation: Wiki.js

Why Wiki.js:

Modern, open-source (AGPL-3.0), actively maintained
First-class Docker support with official image
GraphQL API built-in for automated content updates
Markdown-based editing (great for AI-generated content)
Git-based storage option for complete version control
Built-in search, analytics, and access controls
Lightweight (~200MB RAM)
Well suited to an agent-only workflow: human editing can be disabled entirely while API write access is retained

Alternatives considered:

Software	Pros	Cons
MediaWiki	Industry standard, massive extension ecosystem	Heavy, PHP, API is less developer-friendly
DokuWiki	Flat file, extremely simple	No native API, dated interface
BookStack	Structured organization	Less suited for interconnected knowledge
Wiki.js	Modern API, Git sync, Docker-native, read-only UI support	Younger project, smaller community

2. System Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│ EVE Online Wiki System                                                       │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│ ┌─────────────────────────────────────────────────────────────────────┐    │
│ │                     LangGraph Orchestration Layer                     │    │
│ │  ┌───────────────────────────────────────────────────────────────┐  │    │
│ │  │                    StateGraph (Main Graph)                     │  │    │
│ │  │                                                                │  │    │
│ │  │  ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐      │  │    │
│ │  │  │ Source  │   │ Patch   │   │External │   │   ESI   │      │  │    │
│ │  │  │Harvester│   │ Monitor │   │ Monitor │   │Collector│      │  │    │
│ │  │  │  Node   │   │  Node   │   │  Node   │   │  Node   │      │  │    │
│ │  │  └────┬────┘   └────┬────┘   └────┬────┘   └────┬────┘      │  │    │
│ │  │       └────────────┴─────────────┴─────────────┘            │  │    │
│ │  │                        │                                    │  │    │
│ │  │               ┌────────▼────────┐                          │  │    │
│ │  │               │   Validation    │                          │  │    │
│ │  │               │   Subgraph      │                          │  │    │
│ │  │               │ (E → F → G)     │                          │  │    │
│ │  │               └────────┬────────┘                          │  │    │
│ │  │                        │                                    │  │    │
│ │  │               ┌────────▼────────┐                          │  │    │
│ │  │               │    Git Sync     │                          │  │    │
│ │  │               │     Node        │                          │  │    │
│ │  │               └────────┬────────┘                          │  │    │
│ │  │                        │                                    │  │    │
│ │  │               ┌────────▼────────┐                          │  │    │
│ │  │               │   Wiki.js API   │                          │  │    │
│ │  │               │     Node        │                          │  │    │
│ │  │               └─────────────────┘                          │  │    │
│ │  └───────────────────────────────────────────────────────────────┘  │    │
│ │                                                                       │    │
│ │  LangGraph Features:                                                  │    │
│ │  • Checkpointing: Durable state persistence across crashes           │    │
│ │  • Conditional Edges: Dynamic routing based on validation results    │    │
│ │  • Subgraphs: Nested validation pipeline (E→F→G) as single node     │    │
│ │  • Streaming: Real-time token output from LLM agents                │    │
│ │  • LangSmith: Built-in observability and tracing                    │    │
│ └─────────────────────────────────────────────────────────────────────┘    │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Infrastructure Stack:

Wiki.js - Content management (read-only UI)
PostgreSQL - Primary database for Wiki.js (production-grade, required)
Redis - State persistence backend for LangGraph checkpointing
LangGraph - Agent orchestration framework (MIT licensed, built on LangChain)
LangSmith - Observability platform for tracing agent executions
Git - Content storage backend for version control

Architecture Principles:

All content flows top-to-bottom, no exceptions
100% agent-only editing - no human accounts have write access
All changes go through full validation pipeline before publication
Complete audit trail maintained in Git storage backend
Immutable content sources are isolated from publication layer
Git sync acts as single source of truth for all content
LangGraph maintains checkpoint state and full audit log via LangSmith traces
No direct API writes from edge agents - all writes go through pipeline
Daily batch updates for prose/content (not continuous) - scheduled at 02:00 UTC
ESI structured data updates run on independent schedule (hourly for static data, daily for dynamic)

3. Agent Specifications

Agent A: Initial Wiki Construction

Purpose: Seed the wiki with existing content from source sites.

Inputs:

Source URLs (EVE University wiki, WCKG, CCP Support, ESI API)
Content scope (what categories/sections to import)
Deduplication and merge strategy

Process:

Source Extraction:
- EVE University: Utilize MediaWiki api.php for structured content retrieval (avoids HTML scraping issues).
- WCKG: Specialized Google Sites parser for dynamic content rendering.
- CCP Support: Content extraction with browser headers to bypass Cloudflare challenges.
Extract structured content (ships, modules, mechanics)
Normalize content format to Markdown
Extract all images and references
Compute content hash (SHA-256) for each page and skip if unchanged since last import
Create wiki pages with proper hierarchy (see Page Hierarchy Strategy)
Tag pages with complete metadata
Generate cross-links between related pages
Pass all content to Validation Agent before publication
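The hash-and-skip step above can be sketched as follows. This is a minimal illustration, not the plan's implementation: the function names, the in-memory `seen` registry, and the line-ending normalization are assumptions added for the example.

```python
import hashlib


def content_hash(markdown: str) -> str:
    """Return the SHA-256 hex digest of a page's Markdown."""
    # Assumption: normalize line endings and trailing whitespace first, so
    # purely cosmetic differences do not look like a content change.
    lines = markdown.replace("\r\n", "\n").split("\n")
    normalized = "\n".join(line.rstrip() for line in lines)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def needs_import(path: str, markdown: str, seen: dict[str, str]) -> bool:
    """Return True (and record the new hash) if the page changed since last import."""
    digest = content_hash(markdown)
    if seen.get(path) == digest:
        return False  # unchanged: skip this page
    seen[path] = digest
    return True


# Hypothetical usage: `seen` would be loaded from persistent storage in practice.
seen: dict[str, str] = {}
first = needs_import("ships/frigates/condor", "# Condor\n", seen)
second = needs_import("ships/frigates/condor", "# Condor\r\n", seen)
```

Here `first` is True and `second` is False, because the two inputs differ only in line endings.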

Scheduling: One-time run (with option to replay)

Agent B: Patch Note Monitor

Purpose: Detect EVE Online patch changes and update affected wiki pages.

Inputs:

RSS feed: https://www.eveonline.com/rss/patch-notes (Verified accessible via GET)
Update frequency (recommended: daily)
Affected page mapping (which pages relate to which game systems)

Process:

Poll RSS feed on schedule (Standard RSS 2.0 parsing)
Parse new patch entries using LLM content analysis
Identify exact content changes required for affected pages
Generate complete revised page content, not just append sections
Pass proposed changes to Validation Agent
If validation passes, apply update automatically
If validation fails, retry generation or flag for system alert
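The polling and parsing steps could start from something like the sketch below, which uses only the standard library. The sample feed is made up for illustration (it is not real patch-note content), and `new_entries` and the `seen_guids` set are hypothetical names; fetching the feed and the LLM analysis are left out.

```python
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 sample standing in for the fetched feed body.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Patch Notes</title>
    <item>
      <title>Example patch A</title>
      <link>https://example.invalid/a</link>
      <guid>a</guid>
    </item>
    <item>
      <title>Example patch B</title>
      <link>https://example.invalid/b</link>
      <guid>b</guid>
    </item>
  </channel>
</rss>"""


def new_entries(feed_xml: str, seen_guids: set[str]) -> list[dict[str, str]]:
    """Return feed items whose guid (or link, as a fallback) has not been seen."""
    root = ET.fromstring(feed_xml)
    entries = []
    for item in root.iterfind("./channel/item"):
        guid = item.findtext("guid") or item.findtext("link") or ""
        if not guid or guid in seen_guids:
            continue
        entries.append({
            "guid": guid,
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        })
    return entries


# Only the unseen entry ("b") is returned for further analysis.
pending = new_entries(SAMPLE_FEED, seen_guids={"a"})
```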

3.5 Infrastructure Protocols

Git Sync Protocol

Single Source of Truth: Git repository acts as the primary storage.
Bi-directional Sync:
- Agent Write -> Git Commit -> Wiki.js Push
- Wiki.js renders directly from the Git-backed storage.
Repository Structure:
- /content: Markdown files mapping to wiki paths (e.g., content/ships/frigates/condor.md)
- /assets: Images and files mapping to local paths.
Commit Format: [AGENT_ID] update: path/to/page (hash: abc123)
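As a rough sketch of the repository layout and commit format above: the helper names are hypothetical, and truncating the hash to six characters is an assumption based on the `abc123` example.

```python
from pathlib import PurePosixPath


def content_file_for(wiki_path: str) -> str:
    """Map a wiki path such as 'ships/frigates/condor' to its file under /content."""
    clean = wiki_path.strip("/")
    return str(PurePosixPath("content") / f"{clean}.md")


def commit_message(agent_id: str, wiki_path: str, content_hash: str) -> str:
    """Build a message in the form '[AGENT_ID] update: path/to/page (hash: abc123)'."""
    short = content_hash[:6]  # assumption: short hash, as in the example format
    return f"[{agent_id}] update: {wiki_path.strip('/')} (hash: {short})"


path = content_file_for("/ships/frigates/condor")
message = commit_message("AGENT_D", "ships/frigates/condor", "abc123def456")
```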

API Authentication

Strategy: Bearer tokens with minimum scopes (write:pages, write:assets).
Storage: Managed via environment variables.
See Wiki.js API Auth Strategy for details.
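A minimal sketch of reading the token from the environment and building the request headers. The variable name `WIKIJS_API_TOKEN` is hypothetical, and scope enforcement is assumed to happen on the Wiki.js side when the token is issued.

```python
import os
from typing import Mapping

TOKEN_ENV_VAR = "WIKIJS_API_TOKEN"  # hypothetical name, not defined by the plan


def auth_headers(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return HTTP headers carrying the bearer token, failing loudly if it is missing."""
    token = env.get(TOKEN_ENV_VAR, "").strip()
    if not token:
        raise RuntimeError(f"{TOKEN_ENV_VAR} is not set")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }


# Passing a mapping explicitly keeps the function easy to test.
headers = auth_headers({"WIKIJS_API_TOKEN": "example-token"})
```

Failing at startup when the variable is absent keeps an agent from running a full batch only to be rejected at the write step.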

Shared State

Schema: Managed via LangGraph WikiState Pydantic model.
See State Schema Definition for details.

Agent C: External Wiki Monitor

Purpose: Track changes on source wikis and refresh content.

Inputs:

Monitored URLs and change detection rules
Check frequency (recommended: weekly)

Process:

Poll source sites on schedule respecting robots.txt
Detect new pages or modified content
Compare against imported content hash in local wiki
Ignore minor formatting/link changes
Generate revised page content with merged changes
Pass proposed changes to Validation Agent
Apply update automatically on validation pass
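One way to ignore minor formatting and link changes is to normalize both versions before comparing them. The sketch below is an assumption about what "minor" means (link targets, emphasis markers, whitespace, letter case); the plan does not define it, and the function names are hypothetical.

```python
import re

_LINK = re.compile(r"\[([^\]]*)\]\([^)]*\)")  # [text](target) -> text
_MARKERS = re.compile(r"[*_`]")                # emphasis and inline-code markers
_WS = re.compile(r"\s+")


def normalize(markdown: str) -> str:
    """Reduce Markdown to comparable text: no link targets, markers, or extra spaces."""
    text = _LINK.sub(r"\1", markdown)
    text = _MARKERS.sub("", text)
    return _WS.sub(" ", text).strip().lower()


def is_substantive_change(old: str, new: str) -> bool:
    """Return True only if the normalized texts differ."""
    return normalize(old) != normalize(new)


old = "The **Condor** is a [frigate](https://old.example/frigate)."
new = "The *Condor* is a  [frigate](https://new.example/Frigate)."
changed = is_substantive_change(old, new)  # False: only formatting and link target moved
```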

Agent D: ESI Data Collector

Purpose: Pull official structured data directly from CCP's ESI API.

Inputs:

ESI API endpoints for ships, modules, items, skills
Update frequency: hourly for static data (independent of daily batch), daily for dynamic data (part of 02:00 UTC batch)

Process:

Poll ESI API on schedule with proper rate limiting
Extract structured game data
Generate or update data-driven pages automatically
Merge structured data with human-readable content from other sources
Compute content hash and skip if unchanged since last poll
Pass proposed changes to Validation Agent
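"Proper rate limiting" is not specified further in this plan. As a placeholder, a simple minimum-interval limiter might look like the sketch below; the class name and the interval value are assumptions, and ESI's own limits should be taken from CCP's documentation rather than from this example.

```python
import time
from typing import Callable


class IntervalLimiter:
    """Allow at most one call per `min_interval` seconds."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock  # injectable so tests need not really sleep
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds waited."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay:
            self._sleep(delay)
        # Schedule the following slot relative to when this call actually runs.
        self._next_allowed = max(now, self._next_allowed) + self.min_interval
        return delay


# Hypothetical usage: call limiter.wait() before each ESI request.
limiter = IntervalLimiter(min_interval=0.5)
```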

Agent E: Content Validation & Review Agent

Purpose: Automated quality control for all content changes. Replaces all human review.

Validation Rules:

Structural validation: Markdown syntax, page hierarchy, metadata presence
Content validation: Factual consistency, no broken references, completeness
Change validation: Diff analysis, only expected changes applied, no unintended modifications
Cross-reference validation: All internal links resolve correctly
TOS compliance: Proper attribution included for all imported content

Process:

Receive proposed change from upstream agent
Run all validation checks
Generate confidence score (0-100%)
If score > 95%: Approve for publication
If score 70-95%: Request regeneration with feedback
If score <70%: Reject change and generate system alert
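The three score bands translate into a small routing function. This is a sketch: the function name and the returned labels are illustrative, and a function of this shape is the kind of thing a LangGraph conditional edge could call to pick the next node.

```python
def route(score: float) -> str:
    """Map a confidence score (0-100) to the next pipeline action."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score > 95:
        return "approve"      # publish
    if score >= 70:
        return "regenerate"   # send back upstream with feedback
    return "reject"           # drop the change and raise a system alert


decisions = [route(s) for s in (98, 95, 70, 69.9)]
```

Note that a score of exactly 95 falls in the regeneration band, since approval requires a score strictly above 95.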

Agent F: Numerical Validation Layer

Purpose: Rule-based validation for game data, separate from LLM validation. Catches LLM hallucinations in structured data.

Validation Categories:

Data Type	Validation Rules	Source of Truth
Ship stats	Base HP, velocity, slots, fitting stats must match ESI exactly (0% tolerance)	ESI API
Module stats	CPU, PG, range, damage multipliers	ESI API
Skill requirements	Prerequisites match skill tree	ESI API
Fitting calculations	Must pass CPU/PG budget checks	Local calculation
Market data	Prices non-negative, volume non-negative	ESI API
Links/IDs	All typeIDs resolve to valid entities	ESI API lookup

Process:

Extract all numerical values from proposed content
Cross-reference against ESI API for game data
Flag any discrepancy, however small, in ship/module stats (0% tolerance)
Reject content with invalid typeIDs or broken references
Log all validation results for audit

Override Rules:

Numerical validation failures = auto-reject (no LLM override possible)
Historical content (archived ships/modules) flagged for manual review
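The exact-match rule is deterministic, so it can be expressed without an LLM. In the sketch below the field names and values are placeholders (not real ship stats), the ESI lookup is replaced by a plain dictionary, and treating a value with no ESI counterpart as a failure is an assumption.

```python
def find_discrepancies(proposed: dict[str, float], esi: dict[str, float]) -> list[str]:
    """Compare every extracted value against the ESI reference; any difference counts."""
    problems = []
    for field, value in proposed.items():
        if field not in esi:
            # Assumption: a value that cannot be verified is treated as a failure.
            problems.append(f"{field}: no ESI value to check against")
        elif value != esi[field]:
            problems.append(f"{field}: page has {value}, ESI has {esi[field]}")
    return problems


def numerical_verdict(proposed: dict[str, float], esi: dict[str, float]) -> str:
    """Auto-reject on any discrepancy; there is no override path."""
    return "reject" if find_discrepancies(proposed, esi) else "pass"


# Placeholder numbers for illustration only.
esi_reference = {"hp_shield": 100, "high_slots": 3}
verdict = numerical_verdict({"hp_shield": 101, "high_slots": 3}, esi_reference)
```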

Agent G: Asset & Reference Handler

Purpose: Centralized management of all images, links, and external references.

Process:

Receive all extracted images and references from other agents
Download images to local storage (respecting copyright/attribution)
Rewrite all image URLs to local wiki paths
Rewrite all external links to reference original source
Add source attribution footer to all pages
Check for broken links on every update
Maintain asset integrity across all pages
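The URL-rewriting step might look like the following sketch. It only handles Markdown image syntax with absolute http(s) URLs; the `/assets/` prefix follows the repository structure above, while the hash-prefixed file naming is an assumption made to avoid name collisions. Downloading and attribution are left out.

```python
import hashlib
import re
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Matches ![alt](http(s)://...) but not ordinary [text](url) links.
_IMAGE = re.compile(r"!\[([^\]]*)\]\((https?://[^)\s]+)\)")


def local_asset_path(url: str) -> str:
    """Derive a stable local path for a remote image URL."""
    name = PurePosixPath(urlparse(url).path).name or "image"
    prefix = hashlib.sha256(url.encode("utf-8")).hexdigest()[:8]
    return f"/assets/{prefix}-{name}"


def rewrite_images(markdown: str) -> tuple[str, dict[str, str]]:
    """Rewrite remote image URLs to local paths; return new text and url -> path map."""
    mapping: dict[str, str] = {}

    def _swap(match: re.Match) -> str:
        alt, url = match.group(1), match.group(2)
        mapping[url] = local_asset_path(url)
        return f"![{alt}]({mapping[url]})"

    return _IMAGE.sub(_swap, markdown), mapping


text = "Intro ![Condor](https://img.example/ships/condor.png) and [link](https://x.example)."
rewritten, downloads = rewrite_images(text)
```

The returned mapping doubles as the download list: each key is a remote URL to fetch, each value the local path to store it under.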

4. Model Intelligence Tiers

Different agents require different levels of LLM reasoning. Using appropriate models reduces cost and improves reliability.

Agent	Tier	Model Requirements	Justification
A: Source Harvester	Low	Basic extraction model (e.g., GPT-4o-mini, Claude Haiku)	Template-based extraction, structured output format
B: Patch Note Monitor	High	Strong reasoning model (e.g., Claude Sonnet, GPT-4o)	Requires understanding game mechanics to map changes to pages
C: External Wiki Monitor	Low	Basic extraction model	Simple change detection and content extraction
D: ESI Data Collector	None	No LLM needed	Pure API calls, structured data, programmatic transformation
E: Content Validation	Medium	Balanced model (e.g., Claude Sonnet)	Needs semantic understanding but structured validation rules
F: Numerical Validation	None	No LLM needed	Pure rule-based, deterministic validation
G: Asset Handler	Low	Basic model for categorization	Mostly file operations, minimal reasoning

Recommended Model Stack:

High and medium reasoning: Claude 3.5 Sonnet / GPT-4o (Agents B, E)
Low cost: Claude 3.5 Haiku / GPT-4o-mini (Agents A, C, G)
No LLM: Agents D, F (programmatic only)

Daily Batch Cost Estimate: With daily updates (not continuous), typical daily operations:

~10-20 patch note analyses (Agent B): ~$0.10-0.30
~50-100 content validations (Agent E): ~$0.50-1.00
~100-200 extractions (Agents A, C, G): ~$0.10-0.20
Daily total: ~$0.70-1.50 (note: subscription tiers have rate limits and token caps; bulk operations like initial import may temporarily exceed these, requiring pay-per-use fallback)
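The daily total is the sum of the three line items above. The short check below reproduces it; the dictionary keys are illustrative labels, and the monthly projection assumes a 30-day month, which the plan does not state.

```python
# (low, high) daily cost per line item, in USD, as listed above.
COSTS_USD = {
    "patch note analyses (Agent B)": (0.10, 0.30),
    "content validations (Agent E)": (0.50, 1.00),
    "extractions (Agents A, C, G)": (0.10, 0.20),
}


def total_range(costs: dict[str, tuple[float, float]]) -> tuple[float, float]:
    """Sum the low and high ends separately, rounded to cents."""
    low = sum(lo for lo, _ in costs.values())
    high = sum(hi for _, hi in costs.values())
    return round(low, 2), round(high, 2)


daily = total_range(COSTS_USD)                      # (0.7, 1.5)
monthly = tuple(round(x * 30, 2) for x in daily)    # assumes 30 days: (21.0, 45.0)
```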

5. Content Schema Per Page Type

Each page type has a template with required fields that Agent E validates structurally before semantic validation.

Ship Page Template

page_type: ship
required_fields:
  - name: string
  - type_id: integer (ESI)
  - group: string (e.g., "Interceptor", "Battleship")
  - race: string (Caldari, Minmatar, Amarr, Gallente)
  - hull_stats:
      hp_shield: integer
      hp_armor: integer
      hp_structure: integer
  - fitting_stats:
      cpu_output: integer
      powergrid_output: integer
      high_slots: integer
      med_slots: integer
      low_slots: integer
      rig_slots: integer
  - velocity: integer
  - skill_requirements: list[{skill_id: integer, level: integer}]
  - description: string (prose, sourced from external wiki or generated)
optional_fields:
  - role_bonus: string
  - ship_bonus: list[string]
  - capacitor_capacity: integer
  - targeting_range: integer
  - drone_bandwidth: integer
  - probe_launcher_fitting: boolean

Module Page Template

page_type: module
required_fields:
  - name: string
  - type_id: integer (ESI)
  - group: string (e.g., "Shield Booster", "Afterburner")
  - slot: string (high, mid, low, rig)
  - cpu_usage: integer
  - powergrid_usage: integer
  - description: string
optional_fields:
  - duration: integer
  - range: integer
  - damage_multiplier: float
  - skill_requirements: list[{skill_id: integer, level: integer}]
  - meta_level: integer
  - tech_level: integer (1 or 2)

Mechanic/Guide Page Template

page_type: mechanic
required_fields:
  - title: string
  - summary: string (1-3 sentences)
  - categories: list[string]
  - source: string (eve-university | wckg | ccp | generated)
  - last_reviewed: date
optional_fields:
  - related_ships: list[string]
  - related_modules: list[string]
  - related_mechanics: list[string]
  - difficulty: string (beginner | intermediate | advanced)

Validation Against Schema

Agent E enforces:

All required_fields present and non-empty
All integer fields contain valid integers (no strings, no nulls)
All type_id fields pass Agent F numerical validation against ESI
All skill_requirements reference valid typeIDs
Page type matches one of the defined templates (reject unknown types)
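A sketch of the structural part of these checks, covering presence, non-emptiness, and integer typing. Only a flat subset of the ship template is modeled, and the names `TEMPLATES` and `validate_page` are hypothetical; nested fields and the ESI lookup belong to later stages.

```python
# Flat subset of the ship template's required fields, for brevity.
SHIP_REQUIRED = {
    "name": str,
    "type_id": int,
    "group": str,
    "race": str,
    "velocity": int,
    "description": str,
}
TEMPLATES = {"ship": SHIP_REQUIRED}


def validate_page(page: dict) -> list[str]:
    """Return a list of structural errors; an empty list means the page passes."""
    template = TEMPLATES.get(page.get("page_type"))
    if template is None:
        return [f"unknown page_type: {page.get('page_type')!r}"]
    errors = []
    for field, expected in template.items():
        value = page.get(field)
        if value is None or value == "":
            errors.append(f"{field}: missing or empty")
        elif isinstance(value, bool) or not isinstance(value, expected):
            # bool is a subclass of int in Python, so it is rejected explicitly.
            errors.append(f"{field}: expected {expected.__name__}, got {type(value).__name__}")
    return errors


# Placeholder values, not real game data.
page = {"page_type": "ship", "name": "Example Ship", "type_id": 1,
        "group": "Frigate", "race": "Caldari", "velocity": 1, "description": "text"}
errors = validate_page(page)
```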

6. Agent Health Monitoring

LangGraph provides built-in checkpointing for state persistence, but agent-level health monitoring requires a separate heartbeat system.

Heartbeat Protocol:

Each LangGraph node (agent) sends a HEARTBEAT message to Redis every 60 seconds during active operation and every 5 minutes when idle
Heartbeat payload: { node_name, status: healthy|degraded|error, thread_id, last_completed_at, checkpoint_id }
Heartbeat registry uses Redis with TTL (3x interval for stale, 10x for dead)

LangGraph Checkpointing:

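LangGraph's checkpointer persists graph state at each step; use a durable backend such as PostgresSaver for this (MemorySaver keeps state in process memory only, so it suits tests rather than crash recovery)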
Workflows resume exactly where they left off after crashes
Checkpoint TTL configurable (24-48 hours for batch workflows, session-based for conversational)

Staleness Detection:

If no heartbeat received within 3x the expected interval → mark agent as stale
If no heartbeat received within 10x the expected interval → mark agent as dead and trigger critical alert
Stale nodes: the LangGraph checkpoint records the last known state, and new invocations wait for recovery
Dead nodes: halt dependent pipeline stages, escalate alert
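The 3x and 10x thresholds can be sketched as a pure function over timestamps. The function name and status labels are illustrative, and reading the last heartbeat time from Redis is left out.

```python
def classify(last_heartbeat: float, now: float, interval: float) -> str:
    """Classify a node from the age of its last heartbeat (all values in seconds)."""
    age = now - last_heartbeat
    if age > 10 * interval:
        return "dead"     # trigger critical alert, halt dependent stages
    if age > 3 * interval:
        return "stale"    # new invocations wait for recovery
    return "healthy"


ACTIVE_INTERVAL = 60   # seconds between heartbeats during active operation
IDLE_INTERVAL = 300    # seconds between heartbeats when idle

statuses = [classify(0, age, ACTIVE_INTERVAL) for age in (60, 181, 601)]
```

The same heartbeat age can mean different things depending on mode: 181 seconds of silence is stale for an active node but healthy for an idle one.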

LangSmith Integration:

Every LLM call, tool invocation, and state transition emits traces to LangSmith
Query LangSmith audit logs for execution history, latency, token usage
Alerts configured via LangSmith webhooks for validation failures

Alerting:

Agent status transitions emit events to the audit log
Critical alerts (dead node, repeated validation failures, checkpoint gaps > threshold) notify via configured channel (webhook, email, etc.)

7. Standard Page Metadata

All pages will include standard frontmatter:

source: eve-university | wckg | ccp | esi | generated
source_url: https://...
imported_date: 2026-04-16
last_updated: 2026-04-16
last_validated: 2026-04-16
update_frequency: daily | weekly | monthly
validation_score: 98
categories: [ships, pvp, modules, industry]

8. Implementation Phases

Phase 0: Pre-Work & Compliance

Confirm scraping TOS with source wiki maintainers
Implement rate limiting and proper User-Agent headers
Define metadata schema and validation rules
Test content extraction on sample pages

Phase 1: Foundation

Deploy PostgreSQL database via Docker (production configuration)
Deploy Redis instance for LangGraph checkpointing + heartbeat registry
Deploy Wiki.js via Docker (connected to PostgreSQL)
Disable all human write permissions - configure API-only write access
Configure Git storage backend for complete change history
Configure Git sync layer as single source of truth
Set up HTTPS and domain routing
Establish automated backup strategy
Deploy LangGraph with StateGraph defining all agent nodes and edges
Configure LangSmith for observability (tracing, audit logs)
Deploy agent heartbeat monitoring (Redis TTL registry)

Phase 2: Content Pipeline

Deploy Source Harvester Agent
Deploy Validation Agent
Deploy Asset Handler Agent
Deploy ESI Data Collector Agent
Execute initial import with full validation pipeline
Establish content quality baseline

Phase 2.5: Smoke Test

Run Agent A on 50 representative pages across all page types (ships, modules, mechanics, guides)
Pass all 50 pages through the full validation pipeline (Agents E + F + G)
Calibrate validation thresholds based on results (adjust confidence scoring weights)
Verify merge logic when ESI data and external wiki content overlap on same pages
Confirm Git sync round-trip: write → Git → Wiki.js render matches expected output
Identify and fix integration bugs before full import
Document baseline validation pass rate and failure patterns

Phase 3: Automated Monitoring

Deploy Patch Note Monitor Agent
Implement LLM-based patch parsing and content generation
Configure validation thresholds
Test end-to-end update workflow

Phase 4: External Change Tracking

Deploy External Wiki Monitor Agent
Configure source site monitoring
Implement change detection and merge logic
Set up system alerting for failures

Phase 5: Major Expansion Handling

Create expansion detection webhook (CCP announces expansions 2-4 weeks ahead)
Build bulk update workflow for expansion releases
Implement "freeze" mode during expansion deployment (content locked until ESI stabilizes)
Create post-expansion audit job to verify all affected pages
Document expansion runbook for manual triggering

Expansion Workflow:

Expansion announced → Create tracking ticket
Expansion deploys → Freeze wiki updates, wait for ESI stability (typically 24-48h)
Run bulk ESI sync → Update all ship/module/item pages
Run Patch Note Agent → Process expansion notes, generate new pages
Run full validation → All pages validated against new ESI data
Unfreeze → Resume daily batch updates
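The expansion workflow is a fixed sequence of stages, which can be modeled as a small state machine. The stage names below paraphrase the steps above, and allowing the daily batch to keep running during the announcement stage is an assumption.

```python
from enum import Enum


class Stage(Enum):
    ANNOUNCED = 1     # tracking ticket created
    FROZEN = 2        # expansion deployed, waiting for ESI stability
    ESI_SYNC = 3      # bulk ESI sync of ship/module/item pages
    PATCH_NOTES = 4   # Patch Note Agent processes expansion notes
    VALIDATION = 5    # full validation against new ESI data
    UNFROZEN = 6      # daily batch updates resume


def next_stage(current: Stage) -> Stage:
    """Advance to the following stage; the workflow never skips or loops."""
    if current is Stage.UNFROZEN:
        raise ValueError("workflow already complete")
    return Stage(current.value + 1)


def daily_batch_allowed(stage: Stage) -> bool:
    """Daily batch updates run only outside the freeze window."""
    return stage in (Stage.ANNOUNCED, Stage.UNFROZEN)


stage = next_stage(Stage.ANNOUNCED)   # Stage.FROZEN
```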

9. Validation Questions

Wiki Infrastructure

1. Hosting requirements: What server/container host will run this? (RAM/CPU allocation)
2. Access & secrets management: Plan for storing ESI credentials, Git credentials, and Wiki.js API tokens in a secrets manager (e.g., Vault, AWS Secrets Manager).
3. Backup requirements: How many days of backup retention are required?
4. User access: Will this wiki be public read-only, or require authentication?
5. Storage: How much content do you anticipate? (affects storage planning)

Content Scope

6. Priority domains: Should we prioritize specific game aspects? (PVP, mining, industry, nullsec, etc.)
7. Content age: Should imported content include historical versions, or only current state?
8. Completeness threshold: What's an acceptable import percentage? (80% of pages vs. all)

Agent Behavior

9. Validation threshold: What minimum validation score should be required for auto-approval? (Recommended: 95%)
10. Conflict resolution: If multiple sources have conflicting information, which source takes priority?
11. Update frequency: How fresh should content be? (real-time, daily, weekly)
12. Alerting: How should the system notify on validation failures or errors?

Operational

13. Monitoring access: Do you have access to the Nginx Proxy Manager instance for SSL/proxy configuration?
14. Container management: Will you use Komodo or another container management platform, or manual Docker?
15. Error handling: Should the system pause and alert on repeated failures, or continue with skipped items?

10. Next Steps

Once questions are answered, I can:

Provide detailed Docker Compose configuration for Wiki.js with read-only UI and secrets integration
Design the LangGraph StateGraph specification (node definitions, edge conditions, state schema)
Define the patch-note-to-wiki mapping schema
Create the content import runbook for Agent A
Implement the standard metadata schema and validation rules
Configure LangSmith dashboards for wiki content monitoring
