
# EVE Online Automated Wiki System - High Level Plan

## 1. Wiki Software Recommendation: Wiki.js

**Why Wiki.js:**
- Modern, open-source (AGPL-3.0), actively maintained
- First-class Docker support with official image
- GraphQL API built-in for automated content updates
- Markdown-based editing (great for AI-generated content)
- Git-based storage option for complete version control
- Built-in search, analytics, and access controls
- Lightweight (~200MB RAM)
- **Perfect for agent-only workflow:** Can disable all human editing entirely while retaining API write access

**Alternatives considered:**

| Software | Pros | Cons |
|----------|------|------|
| MediaWiki | Industry standard, massive extension ecosystem | Heavy, PHP, API is less developer-friendly |
| DokuWiki | Flat file, extremely simple | No native API, dated interface |
| BookStack | Structured organization | Less suited for interconnected knowledge |
| Wiki.js | Modern API, Git sync, Docker-native, read-only UI support | Younger project, smaller community |

---

## 2. System Architecture

```
┌─────────────────────────────────────────────────────────────────────────────┐
│ EVE Online Wiki System                                                       │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│ ┌─────────────────────────────────────────────────────────────────────┐    │
│ │                     LangGraph Orchestration Layer                     │    │
│ │  ┌───────────────────────────────────────────────────────────────┐  │    │
│ │  │                    StateGraph (Main Graph)                     │  │    │
│ │  │                                                                │  │    │
│ │  │  ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐      │  │    │
│ │  │  │ Source  │   │ Patch   │   │External │   │   ESI   │      │  │    │
│ │  │  │Harvester│   │ Monitor │   │ Monitor │   │Collector│      │  │    │
│ │  │  │  Node   │   │  Node   │   │  Node   │   │  Node   │      │  │    │
│ │  │  └────┬────┘   └────┬────┘   └────┬────┘   └────┬────┘      │  │    │
│ │  │       └────────────┴─────────────┴─────────────┘            │  │    │
│ │  │                        │                                    │  │    │
│ │  │               ┌────────▼────────┐                          │  │    │
│ │  │               │   Validation    │                          │  │    │
│ │  │               │   Subgraph      │                          │  │    │
│ │  │               │ (E → F → G)     │                          │  │    │
│ │  │               └────────┬────────┘                          │  │    │
│ │  │                        │                                    │  │    │
│ │  │               ┌────────▼────────┐                          │  │    │
│ │  │               │    Git Sync     │                          │  │    │
│ │  │               │     Node        │                          │  │    │
│ │  │               └────────┬────────┘                          │  │    │
│ │  │                        │                                    │  │    │
│ │  │               ┌────────▼────────┐                          │  │    │
│ │  │               │   Wiki.js API   │                          │  │    │
│ │  │               │     Node        │                          │  │    │
│ │  │               └─────────────────┘                          │  │    │
│ │  └───────────────────────────────────────────────────────────────┘  │    │
│ │                                                                       │    │
│ │  LangGraph Features:                                                  │    │
│ │  • Checkpointing: Durable state persistence across crashes           │    │
│ │  • Conditional Edges: Dynamic routing based on validation results    │    │
│ │  • Subgraphs: Nested validation pipeline (E→F→G) as single node     │    │
│ │  • Streaming: Real-time token output from LLM agents                │    │
│ │  • LangSmith: Built-in observability and tracing                    │    │
│ └─────────────────────────────────────────────────────────────────────┘    │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
```

**Infrastructure Stack:**
- **Wiki.js** - Content management (read-only UI)
- **PostgreSQL** - Primary database for Wiki.js (production-grade, required)
- **Redis** - State persistence backend for LangGraph checkpointing
- **LangGraph** - Agent orchestration framework (MIT licensed, built on LangChain)
- **LangSmith** - Observability platform for tracing agent executions
- **Git** - Content storage backend for version control

**Architecture Principles:**
- All content flows top-to-bottom, no exceptions
- 100% agent-only editing - no human accounts have write access
- All changes go through full validation pipeline before publication
- Complete audit trail maintained in Git storage backend
- Immutable content sources are isolated from publication layer
- Git sync acts as single source of truth for all content
- LangGraph maintains checkpoint state and full audit log via LangSmith traces
- No direct API writes from edge agents - all writes go through pipeline
- Daily batch updates for prose/content (not continuous) - scheduled at 02:00 UTC
- ESI structured data updates run on independent schedule (hourly for static data, daily for dynamic)
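
As a minimal sketch of the daily batch schedule above, the next 02:00 UTC run time can be computed with the standard library alone. The function name `next_batch_run` is a hypothetical helper, not part of any scheduler this plan commits to.

```python
from datetime import datetime, time, timedelta, timezone

BATCH_TIME_UTC = time(2, 0)  # 02:00 UTC, as stated in the architecture principles


def next_batch_run(now: datetime) -> datetime:
    """Return the next 02:00 UTC batch start strictly after `now`.

    `now` is assumed to be timezone-aware; it is converted to UTC first.
    """
    now_utc = now.astimezone(timezone.utc)
    candidate = datetime.combine(now_utc.date(), BATCH_TIME_UTC, tzinfo=timezone.utc)
    if candidate <= now_utc:
        # Today's slot has already passed, so schedule tomorrow's.
        candidate += timedelta(days=1)
    return candidate
```

A real deployment would more likely hand this to cron or a workflow scheduler; the sketch only pins down the "once per day at 02:00 UTC" rule.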

---

## 3. Agent Specifications

### Agent A: Initial Wiki Construction

**Purpose:** Seed the wiki with existing content from source sites.

**Inputs:**
- Source URLs (EVE University wiki, WCKG, CCP Support, ESI API)
- Content scope (what categories/sections to import)
- Deduplication and merge strategy

**Process:**
1. **Source Extraction:**
   - **EVE University:** Utilize MediaWiki `api.php` for structured content retrieval (avoids HTML scraping issues).
   - **WCKG:** Specialized Google Sites parser for dynamic content rendering.
   - **CCP Support:** Content extraction with browser headers to bypass Cloudflare challenges.
2. Extract structured content (ships, modules, mechanics)
3. Normalize content format to Markdown
4. Extract all images and references
5. Compute content hash (SHA-256) for each page and skip if unchanged since last import
6. Create wiki pages with proper hierarchy (see [Page Hierarchy Strategy](docs/schema/hierarchy.md))
7. Tag pages with complete metadata
8. Generate cross-links between related pages
9. Pass all content to Validation Agent before publication
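
Step 5 (hash and skip if unchanged) could look roughly like the sketch below. The helper names and the in-memory `known_hashes` dict are assumptions for illustration; a real run would persist hashes between imports, and the whitespace normalization is an added assumption so that cosmetic differences do not trigger a re-import.

```python
import hashlib


def content_hash(markdown: str) -> str:
    """Return the SHA-256 hex digest of a page's normalized Markdown."""
    # Normalize line endings and trailing whitespace before hashing.
    lines = markdown.replace("\r\n", "\n").split("\n")
    normalized = "\n".join(line.rstrip() for line in lines).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def needs_import(path: str, markdown: str, known_hashes: dict[str, str]) -> bool:
    """True when the page is new or differs from the last imported version."""
    return known_hashes.get(path) != content_hash(markdown)
```

After a successful import, the caller would store `known_hashes[path] = content_hash(markdown)` so the next run can skip the page.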

**Scheduling:** One-time run (with option to replay)

---

### Agent B: Patch Note Monitor

**Purpose:** Detect EVE Online patch changes and update affected wiki pages.

**Inputs:**
- RSS feed: `https://www.eveonline.com/rss/patch-notes` (Verified accessible via GET)
- Update frequency (recommended: daily)
- Affected page mapping (which pages relate to which game systems)

**Process:**
1. Poll RSS feed on schedule (Standard RSS 2.0 parsing)
2. Parse new patch entries using LLM content analysis
3. Identify *exact* content changes required for affected pages
4. Generate complete revised page content, not just append sections
5. Pass proposed changes to Validation Agent
6. If validation passes, apply update automatically
7. If validation fails, retry generation or flag for system alert
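
Steps 1 and 2 (poll the feed, pick out new entries) can be sketched with the standard library's XML parser. The sample feed below is invented for illustration and only assumes the standard RSS 2.0 `channel`/`item` layout; the actual fields in the patch-notes feed should be checked against a live response. Fetching over HTTP is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in plain RSS 2.0 shape; not copied from the real feed.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Patch Notes</title>
    <item>
      <title>Patch Notes For Version 1</title>
      <link>https://example.invalid/patch-1</link>
      <guid>patch-1</guid>
    </item>
    <item>
      <title>Patch Notes For Version 2</title>
      <link>https://example.invalid/patch-2</link>
      <guid>patch-2</guid>
    </item>
  </channel>
</rss>"""


def new_entries(feed_xml: str, seen_guids: set[str]) -> list[dict[str, str]]:
    """Return feed items whose guid (or link, as a fallback) has not been seen."""
    root = ET.fromstring(feed_xml)
    entries = []
    for item in root.iterfind("./channel/item"):
        guid = item.findtext("guid") or item.findtext("link") or ""
        if not guid or guid in seen_guids:
            continue
        entries.append({
            "guid": guid,
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        })
    return entries
```

Each returned entry would then be handed to the LLM analysis step; the set of seen guids needs to be persisted between runs.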

---

## 3.5 Infrastructure Protocols

### Git Sync Protocol
- **Single Source of Truth:** Git repository acts as the primary storage.
- **Sync direction:** One-way, from agents to the wiki. Wiki.js never originates edits.
  - Agent write -> Git commit -> Wiki.js storage sync
  - Wiki.js serves the content it syncs from the Git-backed storage.
- **Repository Structure:**
  - `/content`: Markdown files mapping to wiki paths (e.g., `content/ships/frigates/condor.md`)
  - `/assets`: Images and files mapping to local paths.
- **Commit Format:** `[AGENT_ID] update: path/to/page (hash: abc123)`
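
The repository layout and commit format above can be illustrated with two small helpers. Both function names are hypothetical, and using the first six hex characters of a SHA-256 digest as the short hash is an assumption based on the `abc123` placeholder.

```python
import hashlib


def page_to_repo_path(wiki_path: str) -> str:
    """Map a wiki path such as 'ships/frigates/condor' to its file under /content."""
    return "content/" + wiki_path.strip("/") + ".md"


def commit_message(agent_id: str, wiki_path: str, markdown: str) -> str:
    """Build a message in the form '[AGENT_ID] update: path/to/page (hash: abc123)'."""
    # Assumption: the short hash is the first 6 hex chars of the content's SHA-256.
    short_hash = hashlib.sha256(markdown.encode("utf-8")).hexdigest()[:6]
    return f"[{agent_id}] update: {wiki_path.strip('/')} (hash: {short_hash})"
```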

### API Authentication
- **Strategy:** Bearer tokens with minimum scopes (`write:pages`, `write:assets`).
- **Storage:** Managed via environment variables.
- See [Wiki.js API Auth Strategy](docs/infrastructure/wikijs-auth.md) for details.

### Shared State
- **Schema:** Managed via LangGraph `WikiState` Pydantic model.
- See [State Schema Definition](docs/schema/state-schema.md) for details.

---

### Agent C: External Wiki Monitor

**Purpose:** Track changes on source wikis and refresh content.

**Inputs:**
- Monitored URLs and change detection rules
- Check frequency (recommended: weekly)

**Process:**
1. Poll source sites on schedule respecting robots.txt
2. Detect new pages or modified content
3. Compare against imported content hash in local wiki
4. Ignore minor formatting/link changes
5. Generate revised page content with merged changes
6. Pass proposed changes to Validation Agent
7. Apply update automatically on validation pass
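
One possible way to implement step 4 (ignore minor formatting/link changes) is to compare a fingerprint of the text after stripping link targets, emphasis markers, and whitespace differences. This is a rough heuristic under stated assumptions, not a full Markdown parser; the function names are illustrative.

```python
import hashlib
import re


def semantic_fingerprint(markdown: str) -> str:
    """Hash the page text with link targets, emphasis markers, and spacing removed."""
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", markdown)  # keep link text, drop URL
    text = re.sub(r"[*_`]", "", text)                          # drop emphasis/code markers
    text = re.sub(r"\s+", " ", text).strip().lower()           # collapse whitespace
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def is_substantive_change(old: str, new: str) -> bool:
    """True when the two versions differ in more than formatting or link targets."""
    return semantic_fingerprint(old) != semantic_fingerprint(new)
```

Because underscores are stripped, identifiers such as `type_id` lose their separator in the fingerprint; that is acceptable for change detection but the fingerprint should not be used as displayable text.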

---

### Agent D: ESI Data Collector

**Purpose:** Pull official structured data directly from CCP's ESI API.

**Inputs:**
- ESI API endpoints for ships, modules, items, skills
- Update frequency: hourly for static data (independent of daily batch), daily for dynamic data (part of 02:00 UTC batch)

**Process:**
1. Poll ESI API on schedule with proper rate limiting
2. Extract structured game data
3. Generate or update data-driven pages automatically
4. Merge structured data with human-readable content from other sources
5. Compute content hash and skip if unchanged since last poll
6. Pass proposed changes to Validation Agent

---

### Agent E: Content Validation & Review Agent

**Purpose:** Automated quality control for all content changes. **Replaces all human review.**

**Validation Rules:**
1. **Structural validation:** Markdown syntax, page hierarchy, metadata presence
2. **Content validation:** Factual consistency, no broken references, completeness
3. **Change validation:** Diff analysis, only expected changes applied, no unintended modifications
4. **Cross-reference validation:** All internal links resolve correctly
5. **TOS compliance:** Proper attribution included for all imported content

**Process:**
1. Receive proposed change from upstream agent
2. Run all validation checks
3. Generate confidence score (0-100%)
4. If score > 95%: Approve for publication
5. If score 70-95%: Request regeneration with feedback
6. If score <70%: Reject change and generate system alert
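
The three score bands translate directly into a routing function, which is the kind of logic a conditional edge in the orchestration graph would call. The function name and return labels are assumptions; the thresholds mirror the list above, so a score of exactly 95 falls into the regeneration band.

```python
def route_by_confidence(score: float) -> str:
    """Map a validation confidence score (0-100) to the next pipeline action."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score > 95:
        return "approve"      # publish
    if score >= 70:
        return "regenerate"   # send back upstream with feedback
    return "reject"           # drop the change and raise a system alert
```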

---

### Agent F: Numerical Validation Layer

**Purpose:** Rule-based validation for game data, separate from LLM validation. Catches LLM hallucinations in structured data.

**Validation Categories:**

| Data Type | Validation Rules | Source of Truth |
|-----------|------------------|-----------------|
| Ship stats | Base HP, velocity, slots, and fitting stats must match ESI exactly | ESI API |
| Module stats | CPU, PG, range, damage multipliers | ESI API |
| Skill requirements | Prerequisites match skill tree | ESI API |
| Fitting calculations | Must pass CPU/PG budget checks | Local calculation |
| Market data | Prices non-negative, volume non-negative | ESI API |
| Links/IDs | All typeIDs resolve to valid entities | ESI API lookup |

**Process:**
1. Extract all numerical values from proposed content
2. Cross-reference against ESI API for game data
3. Flag any discrepancy in ship/module stats (exact match required)
4. Reject content with invalid typeIDs or broken references
5. Log all validation results for audit

**Override Rules:**
- Numerical validation failures = auto-reject (no LLM override possible)
- Historical content (archived ships/modules) flagged for manual review
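
A minimal sketch of the exact-match check, assuming both the proposed page and the ESI reference have already been reduced to flat dictionaries of integers. The field names follow the ship template in this plan; the numbers in the usage example are made up and are not real ship stats.

```python
def find_discrepancies(page_stats: dict[str, int], esi_stats: dict[str, int]) -> list[str]:
    """List every field where the page disagrees with (or omits) the ESI value."""
    problems = []
    for field, expected in esi_stats.items():
        actual = page_stats.get(field)
        if actual is None:
            problems.append(f"{field}: missing (ESI has {expected})")
        elif actual != expected:
            problems.append(f"{field}: page has {actual}, ESI has {expected}")
    return problems


def numerical_verdict(page_stats: dict[str, int], esi_stats: dict[str, int]) -> str:
    """Any discrepancy is an automatic reject, with no LLM override."""
    return "reject" if find_discrepancies(page_stats, esi_stats) else "pass"
```

For example, with hypothetical values `esi = {"hp_shield": 400, "high_slots": 4}`, a page claiming `{"hp_shield": 450, "high_slots": 4}` is rejected with a single logged discrepancy for `hp_shield`.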

---

### Agent G: Asset & Reference Handler

**Purpose:** Centralized management of all images, links, and external references.

**Process:**
1. Receive all extracted images and references from other agents
2. Download images to local storage (respecting copyright/attribution)
3. Rewrite all image URLs to local wiki paths
4. Rewrite all external links to reference original source
5. Add source attribution footer to all pages
6. Check for broken links on every update
7. Maintain asset integrity across all pages
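
Step 3 (rewriting image URLs to local wiki paths) might be sketched as a regex pass over Markdown image syntax. The `/assets` root matches the repository structure in this plan; the function name is hypothetical, and filename collisions between different sources are deliberately ignored here.

```python
import posixpath
import re
from urllib.parse import urlparse

# Matches Markdown images with an absolute http(s) URL: ![alt](https://...)
IMAGE_PATTERN = re.compile(r"!\[([^\]]*)\]\((https?://[^)\s]+)\)")


def rewrite_image_urls(markdown: str, asset_root: str = "/assets") -> tuple[str, dict[str, str]]:
    """Return rewritten Markdown plus a mapping of remote URL -> local path."""
    mapping: dict[str, str] = {}

    def _replace(match: re.Match) -> str:
        alt, url = match.group(1), match.group(2)
        filename = posixpath.basename(urlparse(url).path) or "image"
        local = f"{asset_root}/{filename}"
        mapping[url] = local  # the caller downloads each URL to its local path
        return f"![{alt}]({local})"

    return IMAGE_PATTERN.sub(_replace, markdown), mapping
```

Ordinary links are left untouched, since only the `![...](...)` form is matched; the returned mapping doubles as the download list for step 2.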

---

## 4. Model Intelligence Tiers

Different agents require different levels of LLM reasoning. Using appropriate models reduces cost and improves reliability.

| Agent | Tier | Model Requirements | Justification |
|-------|------|--------------------|---------------|
| A: Source Harvester | Low | Basic extraction model (e.g., GPT-4o-mini, Claude Haiku) | Template-based extraction, structured output format |
| B: Patch Note Monitor | High | Strong reasoning model (e.g., Claude Sonnet, GPT-4o) | Requires understanding game mechanics to map changes to pages |
| C: External Wiki Monitor | Low | Basic extraction model | Simple change detection and content extraction |
| D: ESI Data Collector | None | No LLM needed | Pure API calls, structured data, programmatic transformation |
| E: Content Validation | Medium | Balanced model (e.g., Claude Sonnet) | Needs semantic understanding but structured validation rules |
| F: Numerical Validation | None | No LLM needed | Pure rule-based, deterministic validation |
| G: Asset Handler | Low | Basic model for categorization | Mostly file operations, minimal reasoning |

**Recommended Model Stack:**
- **High/medium reasoning:** Claude 3.5 Sonnet / GPT-4o (Agents B, E)
- **Low cost:** Claude 3.5 Haiku / GPT-4o-mini (Agents A, C, G)
- **No LLM:** Agents D, F (programmatic only)

**Daily Batch Cost Estimate:**
With daily updates (not continuous), typical daily operations:
- ~10-20 patch note analyses (Agent B): ~$0.10-0.30
- ~50-100 content validations (Agent E): ~$0.50-1.00
- ~100-200 extractions (Agents A, C, G): ~$0.10-0.20
- **Daily total: ~$0.70-1.50** (note: subscription tiers have rate limits and token caps; bulk operations like initial import may temporarily exceed these, requiring pay-per-use fallback)
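
The daily total is simply the sum of the three line items. The short check below reproduces it and extends it to a monthly figure, assuming a 30-day month (an assumption not stated in the estimate above).

```python
# Low/high daily cost per activity in USD, copied from the estimate above.
DAILY_COST_RANGES_USD = {
    "patch_note_analyses": (0.10, 0.30),
    "content_validations": (0.50, 1.00),
    "extractions": (0.10, 0.20),
}


def total_range(ranges: dict[str, tuple[float, float]]) -> tuple[float, float]:
    """Sum the low ends and the high ends separately, rounded to cents."""
    low = sum(lo for lo, _ in ranges.values())
    high = sum(hi for _, hi in ranges.values())
    return round(low, 2), round(high, 2)


daily = total_range(DAILY_COST_RANGES_USD)
# Assumption: 30 days per month.
monthly = (round(daily[0] * 30, 2), round(daily[1] * 30, 2))
```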

---

## 5. Content Schema Per Page Type

Each page type has a template with required fields that Agent E validates structurally before semantic validation.

### Ship Page Template
```yaml
page_type: ship
required_fields:
  - name: string
  - type_id: integer (ESI)
  - group: string (e.g., "Interceptor", "Battleship")
  - race: string (Caldari, Minmatar, Amarr, Gallente)
  - hull_stats:
      hp_shield: integer
      hp_armor: integer
      hp_structure: integer
  - fitting_stats:
      cpu_output: integer
      powergrid_output: integer
      high_slots: integer
      med_slots: integer
      low_slots: integer
      rig_slots: integer
  - velocity: integer
  - skill_requirements: list[{skill_id: integer, level: integer}]
  - description: string (prose, sourced from external wiki or generated)
optional_fields:
  - role_bonus: string
  - ship_bonus: list[string]
  - capacitor_capacity: integer
  - targeting_range: integer
  - drone_bandwidth: integer
  - probe_launcher_fitting: boolean
```

### Module Page Template
```yaml
page_type: module
required_fields:
  - name: string
  - type_id: integer (ESI)
  - group: string (e.g., "Shield Booster", "Afterburner")
  - slot: string (high, mid, low, rig)
  - cpu_usage: integer
  - powergrid_usage: integer
  - description: string
optional_fields:
  - duration: integer
  - range: integer
  - damage_multiplier: float
  - skill_requirements: list[{skill_id: integer, level: integer}]
  - meta_level: integer
  - tech_level: integer (1 or 2)
```

### Mechanic/Guide Page Template
```yaml
page_type: mechanic
required_fields:
  - title: string
  - summary: string (1-3 sentences)
  - categories: list[string]
  - source: string (eve-university | wckg | ccp | generated)
  - last_reviewed: date
optional_fields:
  - related_ships: list[string]
  - related_modules: list[string]
  - related_mechanics: list[string]
  - difficulty: string (beginner | intermediate | advanced)
```

### Validation Against Schema
Agent E enforces:
1. All `required_fields` present and non-empty
2. All integer fields contain valid integers (no strings, no nulls)
3. All `type_id` fields pass Agent F numerical validation against ESI
4. All `skill_requirements` reference valid typeIDs
5. Page type matches one of the defined templates (reject unknown types)
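
Rules 1, 2, and 5 are purely structural and could be checked without an LLM, for example as below. Only the module template is filled in for brevity, and the helper name is an assumption; the ship and mechanic templates would be added to `REQUIRED_FIELDS` the same way.

```python
# Required fields per page type; only "module" is shown in this sketch.
REQUIRED_FIELDS = {
    "module": {
        "name": str,
        "type_id": int,
        "group": str,
        "slot": str,
        "cpu_usage": int,
        "powergrid_usage": int,
        "description": str,
    },
}


def validate_page(page: dict) -> list[str]:
    """Return a list of structural errors; an empty list means the page passes."""
    page_type = page.get("page_type")
    if page_type not in REQUIRED_FIELDS:
        return [f"page_type: unknown type {page_type!r}"]
    errors = []
    for field, expected_type in REQUIRED_FIELDS[page_type].items():
        value = page.get(field)
        if value is None or value == "":
            errors.append(f"{field}: missing or empty")
        elif isinstance(value, bool) or not isinstance(value, expected_type):
            # bool is a subclass of int in Python, so reject it explicitly.
            errors.append(f"{field}: expected {expected_type.__name__}")
    return errors
```

Rules 3 and 4 (typeID checks against ESI) need a lookup and are left to the numerical validation layer.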

---

## 6. Agent Health Monitoring

LangGraph provides built-in checkpointing for state persistence, but agent-level health monitoring requires a separate heartbeat system.

**Heartbeat Protocol:**
- Each LangGraph node (agent) sends a `HEARTBEAT` message to Redis every 60 seconds during active operation, every 5 minutes when idle
- Heartbeat payload: `{ node_name, status: healthy|degraded|error, thread_id, last_completed_at, checkpoint_id }`
- Heartbeat registry uses Redis with TTL (3x interval for stale, 10x for dead)
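
A sketch of how a node might build its heartbeat before writing it to Redis. The Redis call itself is omitted; the function name is hypothetical, and expiring the key at 10x the interval (so a missing key means `dead`, while `stale` is derived from the timestamp) is one possible reading of the TTL rule, stated here as an assumption.

```python
import json

ACTIVE_INTERVAL_S = 60   # every 60 seconds during active operation
IDLE_INTERVAL_S = 300    # every 5 minutes when idle
VALID_STATUSES = {"healthy", "degraded", "error"}


def build_heartbeat(node_name: str, status: str, thread_id: str,
                    last_completed_at: str, checkpoint_id: str,
                    active: bool = True) -> tuple[str, int]:
    """Return (JSON payload, key TTL in seconds) for one heartbeat."""
    if status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {status}")
    interval = ACTIVE_INTERVAL_S if active else IDLE_INTERVAL_S
    payload = json.dumps({
        "node_name": node_name,
        "status": status,
        "thread_id": thread_id,
        "last_completed_at": last_completed_at,
        "checkpoint_id": checkpoint_id,
    }, sort_keys=True)
    # Assumption: the key expires at the "dead" threshold (10x the interval).
    return payload, 10 * interval
```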

**LangGraph Checkpointing:**
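- LangGraph's checkpointer persists graph state at each step; use a durable saver such as `PostgresSaver` (or a Redis-backed checkpointer, matching the infrastructure stack above), since `MemorySaver` keeps state in process memory only and does not survive a crash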
- Workflows resume exactly where they left off after crashes
- Checkpoint TTL configurable (24-48 hours for batch workflows, session-based for conversational)

**Staleness Detection:**
- If no heartbeat received within 3x the expected interval → mark agent as `stale`
- If no heartbeat received within 10x the expected interval → mark agent as `dead` and trigger critical alert
- Stale nodes: LangGraph checkpoint indicates last state, new invocations wait for recovery
- Dead nodes: halt dependent pipeline stages, escalate alert
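
The 3x and 10x rules reduce to a small classification function. The name and the string labels are assumptions; the thresholds come from the list above.

```python
def classify_heartbeat(seconds_since_last: float, expected_interval: float) -> str:
    """Classify a node from the age of its last heartbeat.

    `expected_interval` is 60 for active nodes and 300 for idle nodes
    under the heartbeat protocol in this plan.
    """
    if seconds_since_last > 10 * expected_interval:
        return "dead"     # halt dependent stages, trigger a critical alert
    if seconds_since_last > 3 * expected_interval:
        return "stale"    # new invocations wait for recovery
    return "healthy"
```

Note that the same elapsed time can mean different things: 600 seconds of silence is `stale` for an active node but still `healthy` for an idle one.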

**LangSmith Integration:**
- Every LLM call, tool invocation, and state transition emits traces to LangSmith
- Query LangSmith traces for execution history, latency, and token usage
- Alerts configured via LangSmith webhooks for validation failures

**Alerting:**
- Agent status transitions emit events to the audit log
- Critical alerts (dead node, repeated validation failures, checkpoint gaps > threshold) notify via configured channel (webhook, email, etc.)

---

## 7. Standard Page Metadata

All pages will include standard frontmatter:
```yaml
source: eve-university | wckg | ccp | esi | generated
source_url: https://...
imported_date: 2026-04-16
last_updated: 2026-04-16
last_validated: 2026-04-16
update_frequency: daily | weekly | monthly
validation_score: 98
categories: [ships, pvp, modules, industry]
```

---

## 8. Implementation Phases

### Phase 0: Pre-Work & Compliance
- Confirm scraping TOS with source wiki maintainers
- Implement rate limiting and proper User-Agent headers
- Define metadata schema and validation rules
- Test content extraction on sample pages

### Phase 1: Foundation
- Deploy PostgreSQL database via Docker (production configuration)
- Deploy Redis instance for LangGraph checkpointing + heartbeat registry
- Deploy Wiki.js via Docker (connected to PostgreSQL)
- **Disable all human write permissions** - configure API-only write access
- Configure Git storage backend for complete change history
- Configure Git sync layer as single source of truth
- Set up HTTPS and domain routing
- Establish automated backup strategy
- Deploy LangGraph with `StateGraph` defining all agent nodes and edges
- Configure LangSmith for observability (tracing, audit logs)
- Deploy agent heartbeat monitoring (Redis TTL registry)

### Phase 2: Content Pipeline
- Deploy Source Harvester Agent
- Deploy Validation Agent
- Deploy Asset Handler Agent
- Deploy ESI Data Collector Agent
- Execute initial import with full validation pipeline
- Establish content quality baseline

### Phase 2.5: Smoke Test
- Run Agent A on 50 representative pages across all page types (ships, modules, mechanics, guides)
- Pass all 50 pages through the full validation pipeline (Agents E + F + G)
- Calibrate validation thresholds based on results (adjust confidence scoring weights)
- Verify merge logic when ESI data and external wiki content overlap on same pages
- Confirm Git sync round-trip: write → Git → Wiki.js render matches expected output
- Identify and fix integration bugs before full import
- Document baseline validation pass rate and failure patterns

### Phase 3: Automated Monitoring
- Deploy Patch Note Monitor Agent
- Implement LLM-based patch parsing and content generation
- Configure validation thresholds
- Test end-to-end update workflow

### Phase 4: External Change Tracking
- Deploy External Wiki Monitor Agent
- Configure source site monitoring
- Implement change detection and merge logic
- Set up system alerting for failures

### Phase 5: Major Expansion Handling
- Create expansion detection webhook (CCP announces expansions 2-4 weeks ahead)
- Build bulk update workflow for expansion releases
- Implement "freeze" mode during expansion deployment (content locked until ESI stabilizes)
- Create post-expansion audit job to verify all affected pages
- Document expansion runbook for manual triggering

**Expansion Workflow:**
1. Expansion announced → Create tracking ticket
2. Expansion deploys → Freeze wiki updates, wait for ESI stability (typically 24-48h)
3. Run bulk ESI sync → Update all ship/module/item pages
4. Run Patch Note Agent → Process expansion notes, generate new pages
5. Run full validation → All pages validated against new ESI data
6. Unfreeze → Resume daily batch updates
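
The workflow above is a strictly linear sequence, so it can be modeled as an ordered list of stages with a freeze check. The stage names and helper functions are hypothetical labels for the six steps, offered as one way to keep the daily batch from running while the wiki is frozen.

```python
# One stage per workflow step, in order.
EXPANSION_STAGES = [
    "announced",
    "frozen",
    "esi_synced",
    "patch_notes_processed",
    "validated",
    "unfrozen",
]


def next_stage(stage: str) -> str:
    """Advance to the following stage; unknown or final stages raise ValueError."""
    index = EXPANSION_STAGES.index(stage)  # ValueError for unknown stages
    if index == len(EXPANSION_STAGES) - 1:
        raise ValueError("expansion workflow already complete")
    return EXPANSION_STAGES[index + 1]


def daily_batch_allowed(stage: str) -> bool:
    """Daily updates run only before the freeze and after the unfreeze."""
    return stage in {"announced", "unfrozen"}
```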

---

## 9. Validation Questions

### Wiki Infrastructure
1. **Hosting requirements:** What server/container host will run this? (RAM/CPU allocation)
2. **Access & secrets management:** Plan for storing ESI credentials, Git credentials, and Wiki.js API tokens in a secrets manager (e.g., Vault, AWS Secrets Manager).
3. **Backup requirements:** How many days of backup retention are required?
4. **User access:** Will this wiki be public read-only, or require authentication?
5. **Storage:** How much content do you anticipate? (affects storage planning)

### Content Scope
6. **Priority domains:** Should we prioritize specific game aspects? (PVP, mining, industry, nullsec, etc.)
7. **Content age:** Should imported content include historical versions, or only current state?
8. **Completeness threshold:** What's an acceptable import percentage? (80% of pages vs. all)

### Agent Behavior
9. **Validation threshold:** What minimum validation score should be required for auto-approval? (Recommended: 95%)
10. **Conflict resolution:** If multiple sources have conflicting information, which source takes priority?
11. **Update frequency:** How fresh should content be? (real-time, daily, weekly)
12. **Alerting:** How should the system notify on validation failures or errors?

### Operational
13. **Monitoring access:** Do you have access to the Nginx Proxy Manager instance for SSL/proxy configuration?
14. **Container management:** Will you use Komodo or another container management platform, or manual Docker?
15. **Error handling:** Should the system pause and alert on repeated failures, or continue with skipped items?

---

## 10. Next Steps

Once questions are answered, I can:
1. Provide detailed Docker Compose configuration for Wiki.js with read-only UI and secrets integration
2. Design the LangGraph StateGraph specification (node definitions, edge conditions, state schema)
3. Define the patch-note-to-wiki mapping schema
4. Create the content import runbook for Agent A
5. Implement the standard metadata schema and validation rules
6. Configure LangSmith dashboards for wiki content monitoring