Files
zerogravity/docs/architecture.md
Nikketryhard 3d87c04d20 docs: overhaul docs, add architecture and traces, update README/GEMINI
- Add docs/architecture.md with 4 mermaid diagrams
- Add docs/mitm.md with 3 mermaid diagrams (replaces mitm-interception-status)
- Add docs/traces.md documenting per-call trace system
- Rewrite README.md to be concise with mermaid and doc refs
- Rewrite GEMINI.md for core philosophy and agent usage
- Clean extension-server-analysis.md (remove stale debug sections)
- Delete temp docs: standalone-ls-todo, panel-stream-investigation,
  endpoint-gap-analysis, request-comparison
2026-02-18 01:31:18 -06:00

243 lines
11 KiB
Markdown

# Architecture
## System Overview
```mermaid
flowchart LR
Client["Client\n(curl, SDK, etc.)"]
Proxy["Proxy\n:8741"]
LS["Standalone LS\n:random"]
MITM["MITM Proxy\n:8742"]
Google["Google API\ndaily-cloudcode-pa\n.googleapis.com"]
Client -- "OpenAI / Gemini\nHTTP API" --> Proxy
Proxy -- "gRPC\n(protobuf)" --> LS
LS -- "HTTPS :443\n(iptables redirect)" --> MITM
MITM -- "TLS\n(BoringSSL)" --> Google
style Proxy fill:#7c3aed,color:#fff
style MITM fill:#dc2626,color:#fff
style LS fill:#2563eb,color:#fff
style Google fill:#059669,color:#fff
```
The proxy translates OpenAI/Gemini API requests into gRPC calls to a standalone Language Server (LS) binary. A MITM proxy sits between the LS and Google's API to intercept traffic, inject tools/params, and capture real token usage.
---
## Request Lifecycle
```mermaid
sequenceDiagram
participant C as Client
participant P as Proxy
participant S as MitmStore
participant LS as Standalone LS
participant M as MITM Proxy
participant G as Google API
C->>P: POST /v1/chat/completions
P->>P: Parse request, resolve model
P->>S: register_request(cascade_id, tools, params, image)
P->>LS: SendMessage(cascade_id, ".")
Note over P: Waits on MITM channel
LS->>M: HTTPS POST streamGenerateContent
M->>S: take_request(cascade_id)
M->>M: modify_request(inject tools, params, user text)
M->>G: Forward modified request
G-->>M: SSE stream (text deltas + usage)
M->>S: dispatch TextDelta, Usage events
M-->>LS: Forward (original) response
S-->>P: MitmEvent::TextDelta
S-->>P: MitmEvent::Usage
S-->>P: MitmEvent::ResponseComplete
P-->>C: OpenAI-format JSON/SSE response
```
---
## Module Map
```mermaid
graph TD
subgraph "API Layer"
mod_api["api/mod.rs\n(router)"]
completions["completions.rs"]
responses["responses.rs"]
gemini["gemini.rs"]
search["search.rs"]
models["models.rs"]
types["types.rs"]
util["util.rs"]
polling["polling.rs"]
end
subgraph "MITM Layer"
proxy_mitm["proxy.rs\n(TLS termination)"]
h2["h2_handler.rs\n(HTTP/2 framing)"]
intercept["intercept.rs\n(SSE parsing)"]
modify["modify.rs\n(request injection)"]
store["store.rs\n(MitmStore)"]
proto_mitm["proto.rs\n(protobuf codec)"]
ca["ca.rs\n(cert generation)"]
end
subgraph "Core"
main["main.rs"]
backend["backend.rs\n(gRPC client)"]
session["session.rs"]
trace["trace.rs"]
warmup["warmup.rs"]
constants["constants.rs"]
quota["quota.rs"]
end
subgraph "Standalone LS"
spawn["spawn.rs"]
discovery["discovery.rs"]
stub["stub.rs\n(extension server)"]
end
subgraph "Protobuf"
proto_mod["proto/mod.rs"]
wire["proto/wire.rs"]
end
main --> mod_api
main --> backend
main --> store
main --> spawn
mod_api --> completions & responses & gemini & search
completions & responses & gemini --> store
completions & responses & gemini --> backend
store --> intercept
proxy_mitm --> h2 --> intercept & modify
modify --> store
intercept --> store
spawn --> discovery & stub
backend --> proto_mod --> wire
style store fill:#dc2626,color:#fff
style mod_api fill:#7c3aed,color:#fff
style proxy_mitm fill:#ea580c,color:#fff
style main fill:#0d9488,color:#fff
```
---
## Endpoints
| Method | Path | Handler | Description |
| ---------- | ---------------------- | --------------------------------- | --------------------------------------- |
| `POST` | `/v1/responses` | `responses::handle_responses` | OpenAI Responses API (streaming + sync) |
| `POST` | `/v1/chat/completions` | `completions::handle_completions` | OpenAI Chat Completions API |
| `POST` | `/v1/gemini` | `gemini::handle_gemini` | Custom Gemini endpoint |
| `POST` | `/v1beta/{*path}` | `gemini::handle_gemini_v1beta` | Official Gemini v1beta routes |
| `GET/POST` | `/v1/search` | `search::handle_search_*` | Web search via Google grounding |
| `GET` | `/v1/models` | `handle_models` | List available models |
| `GET` | `/v1/sessions` | `handle_list_sessions` | List active sessions |
| `DELETE` | `/v1/sessions/{id}` | `handle_delete_session` | Delete a session |
| `POST` | `/v1/token` | `handle_set_token` | Set OAuth token at runtime |
| `GET` | `/v1/usage` | `handle_usage` | MITM-intercepted token usage |
| `GET` | `/v1/quota` | `handle_quota` | LS quota (credits, rate limits) |
| `GET` | `/health` | `handle_health` | Health check |
---
## MITM Event Flow
```mermaid
stateDiagram-v2
[*] --> Registered: register_request()
Registered --> GateWait: LS sends HTTPS request
GateWait --> Matched: MITM matches cascade_id
Matched --> Modifying: modify_request()
Modifying --> Streaming: Forward to Google
Streaming --> Streaming: TextDelta / ThinkingDelta
Streaming --> UsageCaptured: Usage event
UsageCaptured --> Complete: ResponseComplete
Streaming --> Error: UpstreamError
Streaming --> FnCall: FunctionCall
Complete --> [*]
Error --> [*]
FnCall --> Registered: Tool round (re-register)
```
---
## CLI Flags
| Flag | Default | Description |
| -------------------- | ------- | --------------------------------------------------------- |
| `--port <PORT>` | `8741` | Proxy listen port |
| `--headless` | `true` | Fully standalone — no running Antigravity app needed |
| `--classic` | `false` | Attach to running Antigravity (alias for `--no-headless`) |
| `--no-mitm` | `false` | Disable MITM proxy entirely |
| `--mitm-port <PORT>` | `8742` | MITM proxy port |
| `--no-standalone` | `false` | Attach to real LS instead of spawning standalone |
| `--no-trace` | `false` | Disable per-call debug traces |
| `-v, --verbose` | `false` | Info-level logging |
| `-d, --debug` | `false` | Debug-level logging |
---
## Source Files
| File | Lines | Purpose |
| ------------------------- | ----: | ---------------------------------------------------------- |
| `api/responses.rs` | 1796 | Responses API handler (sync, streaming, multi-turn, tools) |
| `mitm/modify.rs` | 1418 | Request modification (tool/image/param injection) |
| `api/completions.rs` | 1241 | Chat Completions handler (OpenAI compat) |
| `mitm/proxy.rs` | 1165 | TLS-terminating MITM proxy |
| `api/gemini.rs` | 1055 | Gemini API handler (native format) |
| `snapshot.rs` | 695 | State snapshots |
| `backend.rs` | 660 | gRPC client to LS |
| `mitm/store.rs` | 651 | Central state store + event channels |
| `mitm/proto.rs` | 649 | Protobuf encode/decode for MITM |
| `mitm/intercept.rs` | 640 | SSE response parser + usage extraction |
| `main.rs` | 527 | CLI, startup, wiring |
| `trace.rs` | 509 | Per-call debug trace system |
| `mitm/h2_handler.rs` | 477 | HTTP/2 frame handling |
| `standalone/spawn.rs` | 464 | LS process spawning |
| `api/search.rs` | 443 | Web search endpoint |
| `api/types.rs` | 416 | Shared request/response types |
| `standalone/discovery.rs` | 340 | LS config discovery from `/proc` |
| `proto/mod.rs` | 340 | Hand-rolled protobuf encoder |
| `api/polling.rs` | 340 | Cascade polling fallback |
| `standalone/stub.rs` | ~300 | Extension server gRPC stub |
| `proto/wire.rs` | ~200 | Wire-format protobuf helpers |
| `constants.rs` | ~100 | Model IDs, service names |
---
## Models
| Proxy Name | LS Placeholder | Description |
| ------------------- | ----------------------- | ---------------------------------------- |
| `opus-4.6` | `MODEL_PLACEHOLDER_M26` | Claude Opus 4.6 (Thinking) — **default** |
| `opus-4.5` | `MODEL_PLACEHOLDER_M12` | Claude Opus 4.5 (Thinking) |
| `gemini-3-pro-high` | `MODEL_PLACEHOLDER_M8` | Gemini 3 Pro (High quality) |
| `gemini-3-pro` | `MODEL_PLACEHOLDER_M7` | Gemini 3 Pro (Low quality) |
| `gemini-3-flash` | `MODEL_PLACEHOLDER_M18` | Gemini 3 Flash |
---
## Stealth Features
| Feature | Implementation |
| ------------------ | --------------------------------------------------------------- |
| TLS fingerprint | BoringSSL via `wreq` — Chrome JA3/JA4 + H2 fingerprint |
| Protobuf | Hand-rolled encoder producing byte-exact match to real webview |
| Warmup | Mimics real webview startup RPC sequence |
| Heartbeat | Periodic keep-alive matching real webview lifecycle |
| Reactive streaming | `StreamCascadeReactiveUpdates` for real-time state diffs |
| Jitter | Randomized intervals on warmup/heartbeat |
| Session reuse | Cascades reused for multi-turn (matches real webview) |
| Version detection | Auto-detects Chrome/Electron/app versions from installed binary |