# Architecture

## System Overview

```mermaid
flowchart LR
    Client["Client\n(curl, SDK, etc.)"]
    Proxy["Proxy\n:8741"]
    LS["Standalone LS\n:random"]
    MITM["MITM Proxy\n:8742"]
    Google["Google API\ndaily-cloudcode-pa\n.googleapis.com"]

    Client -- "OpenAI / Gemini\nHTTP API" --> Proxy
    Proxy -- "gRPC\n(protobuf)" --> LS
    LS -- "HTTPS :443\n(iptables redirect)" --> MITM
    MITM -- "TLS\n(BoringSSL)" --> Google

    style Proxy fill:#7c3aed,color:#fff
    style MITM fill:#dc2626,color:#fff
    style LS fill:#2563eb,color:#fff
    style Google fill:#059669,color:#fff
```

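The proxy (port `8741`) translates OpenAI- and Gemini-style HTTP requests into gRPC calls to a standalone Language Server (LS) binary. The LS's outbound HTTPS traffic is redirected by iptables to a MITM proxy (port `8742`), which sits between the LS and Google's API, injects tools and parameters into outgoing requests, and captures real token usage from the responses.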

---

## Request Lifecycle

```mermaid
sequenceDiagram
    participant C as Client
    participant P as Proxy
    participant S as MitmStore
    participant LS as Standalone LS
    participant M as MITM Proxy
    participant G as Google API

    C->>P: POST /v1/chat/completions
    P->>P: Parse request, resolve model
    P->>S: register_request(cascade_id, tools, params, image)
    P->>LS: SendMessage(cascade_id, ".")
    Note over P: Waits on MITM channel

    LS->>M: HTTPS POST streamGenerateContent
    M->>S: take_request(cascade_id)
    M->>M: modify_request(inject tools, params, user text)
    M->>G: Forward modified request
    G-->>M: SSE stream (text deltas + usage)
    M->>S: dispatch TextDelta, Usage events
    M-->>LS: Forward (original) response

    S-->>P: MitmEvent::TextDelta
    S-->>P: MitmEvent::Usage
    S-->>P: MitmEvent::ResponseComplete
    P-->>C: OpenAI-format JSON/SSE response
```
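
The register/take handoff in the diagram above can be sketched as a map of pending requests keyed by `cascade_id`, each carrying the sending half of an event channel. This is a minimal illustration, not the code in `mitm/store.rs`: the field names, the `dispatch` helper, and the use of `std::sync::mpsc` (rather than an async channel) are all assumptions.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Mutex;

/// Events the MITM side reports back to the waiting API handler.
#[derive(Debug, Clone, PartialEq)]
pub enum MitmEvent {
    TextDelta(String),
    Usage { input_tokens: u64, output_tokens: u64 },
    ResponseComplete,
}

/// What the API handler wants injected into the LS request.
/// (Hypothetical fields; the real store also tracks params and images.)
pub struct PendingRequest {
    pub user_text: String,
    pub tools: Vec<String>,
    events: Sender<MitmEvent>,
}

impl PendingRequest {
    /// Sends an event to the handler; returns false if it stopped listening.
    pub fn dispatch(&self, event: MitmEvent) -> bool {
        self.events.send(event).is_ok()
    }
}

#[derive(Default)]
pub struct MitmStore {
    pending: Mutex<HashMap<String, PendingRequest>>,
}

impl MitmStore {
    /// Called by the API handler before it sends the placeholder message to the LS.
    pub fn register_request(
        &self,
        cascade_id: &str,
        user_text: &str,
        tools: Vec<String>,
    ) -> Receiver<MitmEvent> {
        let (tx, rx) = channel();
        let request = PendingRequest {
            user_text: user_text.to_string(),
            tools,
            events: tx,
        };
        self.pending
            .lock()
            .unwrap()
            .insert(cascade_id.to_string(), request);
        rx
    }

    /// Called by the MITM proxy when the LS request for this cascade arrives.
    /// Removing the entry guarantees each registration is consumed once.
    pub fn take_request(&self, cascade_id: &str) -> Option<PendingRequest> {
        self.pending.lock().unwrap().remove(cascade_id)
    }
}
```

In this sketch the handler blocks on the returned `Receiver` until it sees `MitmEvent::ResponseComplete`, which mirrors the "Waits on MITM channel" note in the diagram.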

---

## Module Map

```mermaid
graph TD
    subgraph "API Layer"
        mod_api["api/mod.rs\n(router)"]
        completions["completions.rs"]
        responses["responses.rs"]
        gemini["gemini.rs"]
        search["search.rs"]
        models["models.rs"]
        types["types.rs"]
        util["util.rs"]
        polling["polling.rs"]
    end

    subgraph "MITM Layer"
        proxy_mitm["proxy.rs\n(TLS termination)"]
        h2["h2_handler.rs\n(HTTP/2 framing)"]
        intercept["intercept.rs\n(SSE parsing)"]
        modify["modify.rs\n(request injection)"]
        store["store.rs\n(MitmStore)"]
        proto_mitm["proto.rs\n(protobuf codec)"]
        ca["ca.rs\n(cert generation)"]
    end

    subgraph "Core"
        main["main.rs"]
        backend["backend.rs\n(gRPC client)"]
        session["session.rs"]
        trace["trace.rs"]
        warmup["warmup.rs"]
        constants["constants.rs"]
        quota["quota.rs"]
    end

    subgraph "Standalone LS"
        spawn["spawn.rs"]
        discovery["discovery.rs"]
        stub["stub.rs\n(extension server)"]
    end

    subgraph "Protobuf"
        proto_mod["proto/mod.rs"]
        wire["proto/wire.rs"]
    end

    main --> mod_api
    main --> backend
    main --> store
    main --> spawn
    mod_api --> completions & responses & gemini & search
    completions & responses & gemini --> store
    completions & responses & gemini --> backend
    store --> intercept
    proxy_mitm --> h2 --> intercept & modify
    modify --> store
    intercept --> store
    spawn --> discovery & stub
    backend --> proto_mod --> wire

    style store fill:#dc2626,color:#fff
    style mod_api fill:#7c3aed,color:#fff
    style proxy_mitm fill:#ea580c,color:#fff
    style main fill:#0d9488,color:#fff
```
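
To give a feel for the kind of work `intercept.rs` is labeled with (SSE parsing), here is a small sketch that splits a buffered Server-Sent Events body into the `data:` payload of each event. The function name is hypothetical, and the real parser presumably works incrementally on streamed chunks and then decodes the JSON payloads, which this sketch does not attempt.

```rust
/// Splits a buffered SSE body into the `data:` payload of each event.
/// Multiple `data:` lines within one event are joined with '\n',
/// as the SSE format prescribes.
pub fn sse_data_payloads(body: &str) -> Vec<String> {
    let mut payloads = Vec::new();
    let mut current: Vec<&str> = Vec::new();

    for line in body.lines() {
        if line.is_empty() {
            // A blank line terminates the current event.
            if !current.is_empty() {
                payloads.push(current.join("\n"));
                current.clear();
            }
        } else if let Some(rest) = line.strip_prefix("data:") {
            // One optional space after the colon is not part of the payload.
            current.push(rest.strip_prefix(' ').unwrap_or(rest));
        }
        // Other fields (`event:`, `id:`) and `:` comments are ignored here.
    }

    // Leniency for buffered input: flush a final event that lacks its
    // terminating blank line (a strict SSE client would discard it).
    if !current.is_empty() {
        payloads.push(current.join("\n"));
    }
    payloads
}
```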

---

## Endpoints

| Method     | Path                   | Handler                           | Description                             |
| ---------- | ---------------------- | --------------------------------- | --------------------------------------- |
| `POST`     | `/v1/responses`        | `responses::handle_responses`     | OpenAI Responses API (streaming + sync) |
| `POST`     | `/v1/chat/completions` | `completions::handle_completions` | OpenAI Chat Completions API             |
| `POST`     | `/v1/gemini`           | `gemini::handle_gemini`           | Custom Gemini endpoint                  |
| `POST`     | `/v1beta/{*path}`      | `gemini::handle_gemini_v1beta`    | Official Gemini v1beta routes           |
| `GET/POST` | `/v1/search`           | `search::handle_search_*`         | Web search via Google grounding         |
| `GET`      | `/v1/models`           | `handle_models`                   | List available models                   |
| `GET`      | `/v1/sessions`         | `handle_list_sessions`            | List active sessions                    |
| `DELETE`   | `/v1/sessions/{id}`    | `handle_delete_session`           | Delete a session                        |
| `POST`     | `/v1/token`            | `handle_set_token`                | Set OAuth token at runtime              |
| `GET`      | `/v1/usage`            | `handle_usage`                    | MITM-intercepted token usage            |
| `GET`      | `/v1/quota`            | `handle_quota`                    | LS quota (credits, rate limits)         |
| `GET`      | `/health`              | `handle_health`                   | Health check                            |
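
The table above is effectively a route table. The sketch below expresses it as a plain `match` so the wildcard (`{*path}`) and path-parameter (`{id}`) rows are explicit. It is illustrative only: `route` is a hypothetical helper that returns handler names as strings, whereas the real router in `api/mod.rs` wires up actual handler functions.

```rust
/// Maps (method, path) to the handler name listed in the endpoints table.
pub fn route(method: &str, path: &str) -> Option<&'static str> {
    match (method, path) {
        ("POST", "/v1/responses") => Some("responses::handle_responses"),
        ("POST", "/v1/chat/completions") => Some("completions::handle_completions"),
        ("POST", "/v1/gemini") => Some("gemini::handle_gemini"),
        ("GET" | "POST", "/v1/search") => Some("search::handle_search_*"),
        ("GET", "/v1/models") => Some("handle_models"),
        ("GET", "/v1/sessions") => Some("handle_list_sessions"),
        ("POST", "/v1/token") => Some("handle_set_token"),
        ("GET", "/v1/usage") => Some("handle_usage"),
        ("GET", "/v1/quota") => Some("handle_quota"),
        ("GET", "/health") => Some("handle_health"),
        // `{*path}`: any non-empty remainder under /v1beta/, slashes included.
        ("POST", p)
            if p.strip_prefix("/v1beta/")
                .map_or(false, |rest| !rest.is_empty()) =>
        {
            Some("gemini::handle_gemini_v1beta")
        }
        // `{id}`: exactly one non-empty path segment.
        ("DELETE", p)
            if p.strip_prefix("/v1/sessions/")
                .map_or(false, |id| !id.is_empty() && !id.contains('/')) =>
        {
            Some("handle_delete_session")
        }
        _ => None,
    }
}
```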

---

## MITM Event Flow

```mermaid
stateDiagram-v2
    [*] --> Registered: register_request()

    Registered --> GateWait: LS sends HTTPS request
    GateWait --> Matched: MITM matches cascade_id

    Matched --> Modifying: modify_request()
    Modifying --> Streaming: Forward to Google

    Streaming --> Streaming: TextDelta / ThinkingDelta
    Streaming --> UsageCaptured: Usage event
    UsageCaptured --> Complete: ResponseComplete
    Streaming --> Error: UpstreamError
    Streaming --> FnCall: FunctionCall

    Complete --> [*]
    Error --> [*]
    FnCall --> Registered: Tool round (re-register)
```
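
The state diagram above translates directly into a transition function. The sketch below encodes exactly the edges drawn in the diagram; the `State` and `Trigger` enums and `next_state` are illustrative names, not types from the codebase.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Registered,
    GateWait,
    Matched,
    Modifying,
    Streaming,
    UsageCaptured,
    Complete,
    Error,
    FnCall,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Trigger {
    LsRequest,        // LS sends HTTPS request
    CascadeMatched,   // MITM matches cascade_id
    ModifyRequest,    // modify_request()
    Forwarded,        // Forward to Google
    Delta,            // TextDelta / ThinkingDelta
    Usage,            // Usage event
    ResponseComplete, // ResponseComplete
    UpstreamError,    // UpstreamError
    FunctionCall,     // FunctionCall
    ToolRound,        // Tool round (re-register)
}

/// Returns the next state, or `None` if the diagram has no such edge.
pub fn next_state(state: State, trigger: Trigger) -> Option<State> {
    use State::*;
    use Trigger as T;
    match (state, trigger) {
        (Registered, T::LsRequest) => Some(GateWait),
        (GateWait, T::CascadeMatched) => Some(Matched),
        (Matched, T::ModifyRequest) => Some(Modifying),
        (Modifying, T::Forwarded) => Some(Streaming),
        (Streaming, T::Delta) => Some(Streaming),
        (Streaming, T::Usage) => Some(UsageCaptured),
        (UsageCaptured, T::ResponseComplete) => Some(Complete),
        (Streaming, T::UpstreamError) => Some(Error),
        (Streaming, T::FunctionCall) => Some(FnCall),
        (FnCall, T::ToolRound) => Some(Registered),
        _ => None,
    }
}
```

`Complete` and `Error` are terminal: every trigger from them yields `None`, while `FnCall` loops back to `Registered` for the next tool round.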

---

## CLI Flags

| Flag                 | Default | Description                                               |
| -------------------- | ------- | --------------------------------------------------------- |
| `--port <PORT>`      | `8741`  | Proxy listen port                                         |
| `--headless`         | `true`  | Fully standalone — no running Antigravity app needed      |
| `--classic`          | `false` | Attach to running Antigravity (alias for `--no-headless`) |
| `--no-mitm`          | `false` | Disable MITM proxy entirely                               |
| `--mitm-port <PORT>` | `8742`  | MITM proxy port                                           |
| `--no-standalone`    | `false` | Attach to real LS instead of spawning standalone          |
| `--no-trace`         | `false` | Disable per-call debug traces                             |
| `-v, --verbose`      | `false` | Info-level logging                                        |
| `-d, --debug`        | `false` | Debug-level logging                                       |
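
To make the defaults and aliases in the table concrete, here is a std-only sketch of how these flags could resolve into a configuration struct. The `Config` struct and `parse_args` function are hypothetical; the actual CLI in `main.rs` may well use an argument-parsing crate and different field names.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Config {
    pub port: u16,
    pub headless: bool,
    pub mitm: bool,
    pub mitm_port: u16,
    pub standalone: bool,
    pub trace: bool,
    pub verbose: bool,
    pub debug: bool,
}

impl Default for Config {
    /// Defaults taken from the CLI flags table.
    fn default() -> Self {
        Self {
            port: 8741,
            headless: true,
            mitm: true,
            mitm_port: 8742,
            standalone: true,
            trace: true,
            verbose: false,
            debug: false,
        }
    }
}

fn port_value<S: AsRef<str>>(value: Option<S>, flag: &str) -> Result<u16, String> {
    let raw = value.ok_or_else(|| format!("{flag} requires a value"))?;
    raw.as_ref()
        .parse()
        .map_err(|_| format!("invalid port for {flag}: {}", raw.as_ref()))
}

/// Parses the arguments that follow the program name.
pub fn parse_args<I, S>(args: I) -> Result<Config, String>
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    let mut cfg = Config::default();
    let mut it = args.into_iter();
    while let Some(arg) = it.next() {
        match arg.as_ref() {
            "--port" => cfg.port = port_value(it.next(), "--port")?,
            "--mitm-port" => cfg.mitm_port = port_value(it.next(), "--mitm-port")?,
            "--headless" => cfg.headless = true,
            // `--classic` is documented as an alias for `--no-headless`.
            "--classic" | "--no-headless" => cfg.headless = false,
            "--no-mitm" => cfg.mitm = false,
            "--no-standalone" => cfg.standalone = false,
            "--no-trace" => cfg.trace = false,
            "-v" | "--verbose" => cfg.verbose = true,
            "-d" | "--debug" => cfg.debug = true,
            other => return Err(format!("unknown flag: {other}")),
        }
    }
    Ok(cfg)
}
```

For example, `parse_args(["--port", "9000", "--no-mitm"])` yields a config that listens on port 9000 with the MITM proxy disabled and every other field left at its default.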

---

## Source Files

| File                      | Lines | Purpose                                                    |
| ------------------------- | ----: | ---------------------------------------------------------- |
| `api/responses.rs`        |  1796 | Responses API handler (sync, streaming, multi-turn, tools) |
| `mitm/modify.rs`          |  1418 | Request modification (tool/image/param injection)          |
| `api/completions.rs`      |  1241 | Chat Completions handler (OpenAI compat)                   |
| `mitm/proxy.rs`           |  1165 | TLS-terminating MITM proxy                                 |
| `api/gemini.rs`           |  1055 | Gemini API handler (native format)                         |
| `snapshot.rs`             |   695 | State snapshots                                            |
| `backend.rs`              |   660 | gRPC client to LS                                          |
| `mitm/store.rs`           |   651 | Central state store + event channels                       |
| `mitm/proto.rs`           |   649 | Protobuf encode/decode for MITM                            |
| `mitm/intercept.rs`       |   640 | SSE response parser + usage extraction                     |
| `main.rs`                 |   527 | CLI, startup, wiring                                       |
| `trace.rs`                |   509 | Per-call debug trace system                                |
| `mitm/h2_handler.rs`      |   477 | HTTP/2 frame handling                                      |
| `standalone/spawn.rs`     |   464 | LS process spawning                                        |
| `api/search.rs`           |   443 | Web search endpoint                                        |
| `api/types.rs`            |   416 | Shared request/response types                              |
| `standalone/discovery.rs` |   340 | LS config discovery from `/proc`                           |
| `proto/mod.rs`            |   340 | Hand-rolled protobuf encoder                               |
| `api/polling.rs`          |   340 | Cascade polling fallback                                   |
| `standalone/stub.rs`      |  ~300 | Extension server gRPC stub                                 |
| `proto/wire.rs`           |  ~200 | Wire-format protobuf helpers                               |
| `constants.rs`            |  ~100 | Model IDs, service names                                   |
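
Since `proto/mod.rs` and `proto/wire.rs` are described as a hand-rolled protobuf encoder, a short sketch of the standard protobuf wire format may help readers unfamiliar with it. The functions below implement the generic varint and length-delimited encodings from the protobuf specification; their names and signatures are illustrative and are not taken from `proto/wire.rs`.

```rust
/// Appends `value` as a base-128 varint (7 payload bits per byte,
/// high bit set on every byte except the last).
pub fn encode_varint(mut value: u64, out: &mut Vec<u8>) {
    while value >= 0x80 {
        out.push((value as u8 & 0x7f) | 0x80);
        value >>= 7;
    }
    out.push(value as u8);
}

/// Appends a field tag: (field_number << 3) | wire_type, as a varint.
pub fn encode_tag(field: u32, wire_type: u8, out: &mut Vec<u8>) {
    encode_varint((u64::from(field) << 3) | u64::from(wire_type), out);
}

/// Appends a length-delimited (wire type 2) string field.
pub fn encode_string_field(field: u32, value: &str, out: &mut Vec<u8>) {
    encode_tag(field, 2, out);
    encode_varint(value.len() as u64, out);
    out.extend_from_slice(value.as_bytes());
}

/// Decodes a varint from the start of `bytes`.
/// Returns the value and the number of bytes consumed, or `None`
/// if the input ends before the varint terminates (max 10 bytes).
pub fn decode_varint(bytes: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &b) in bytes.iter().enumerate().take(10) {
        value |= u64::from(b & 0x7f) << (7 * i);
        if b & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}
```

As a sanity check against the protobuf encoding guide, the value 150 encodes to the two bytes `0x96 0x01`, and the string `"testing"` in field 2 encodes to `0x12 0x07` followed by its seven UTF-8 bytes.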

---

## Models

| Proxy Name          | LS Placeholder          | Description                              |
| ------------------- | ----------------------- | ---------------------------------------- |
| `opus-4.6`          | `MODEL_PLACEHOLDER_M26` | Claude Opus 4.6 (Thinking) — **default** |
| `opus-4.5`          | `MODEL_PLACEHOLDER_M12` | Claude Opus 4.5 (Thinking)               |
| `gemini-3-pro-high` | `MODEL_PLACEHOLDER_M8`  | Gemini 3 Pro (High quality)              |
| `gemini-3-pro`      | `MODEL_PLACEHOLDER_M7`  | Gemini 3 Pro (Low quality)               |
| `gemini-3-flash`    | `MODEL_PLACEHOLDER_M18` | Gemini 3 Flash                           |
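
The "resolve model" step of the request lifecycle amounts to a lookup from proxy name to LS placeholder. A minimal sketch of that lookup, using the values from the table above, might look like this; the function names are hypothetical, and the real definitions presumably live in `constants.rs`.

```rust
/// The model marked as default in the models table.
pub const DEFAULT_MODEL: &str = "opus-4.6";

/// Maps a proxy-facing model name to the placeholder the LS expects.
pub fn ls_placeholder(proxy_name: &str) -> Option<&'static str> {
    match proxy_name {
        "opus-4.6" => Some("MODEL_PLACEHOLDER_M26"),
        "opus-4.5" => Some("MODEL_PLACEHOLDER_M12"),
        "gemini-3-pro-high" => Some("MODEL_PLACEHOLDER_M8"),
        "gemini-3-pro" => Some("MODEL_PLACEHOLDER_M7"),
        "gemini-3-flash" => Some("MODEL_PLACEHOLDER_M18"),
        _ => None,
    }
}

/// Falls back to the default model when the request names none.
/// Unknown names yield `None` so the caller can decide how to reject them.
pub fn resolve_model(requested: Option<&str>) -> Option<&'static str> {
    ls_placeholder(requested.unwrap_or(DEFAULT_MODEL))
}
```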

---

## Stealth Features

| Feature            | Implementation                                                  |
| ------------------ | --------------------------------------------------------------- |
| TLS fingerprint    | BoringSSL via `wreq` — Chrome JA3/JA4 + H2 fingerprint          |
| Protobuf           | Hand-rolled encoder producing byte-exact match to real webview  |
| Warmup             | Mimics real webview startup RPC sequence                        |
| Heartbeat          | Periodic keep-alive matching real webview lifecycle             |
| Reactive streaming | `StreamCascadeReactiveUpdates` for real-time state diffs        |
| Jitter             | Randomized intervals on warmup/heartbeat                        |
| Session reuse      | Cascades reused for multi-turn (matches real webview)           |
| Version detection  | Auto-detects Chrome/Electron/app versions from installed binary |
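
As an illustration of the jitter row above, the sketch below randomizes an interval within a symmetric window around a base value. Everything here is an assumption made for the example: the xorshift generator is a dependency-free stand-in for whatever RNG the project uses, and the 30 s base and 5 s jitter in the usage note are made-up numbers, not the project's real heartbeat timing.

```rust
use std::time::Duration;

/// Tiny xorshift64 PRNG; adequate for timing jitter, not for cryptography.
pub struct XorShift64(u64);

impl XorShift64 {
    pub fn new(seed: u64) -> Self {
        // xorshift must not be seeded with zero, so substitute a constant.
        Self(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Returns a duration in `[base - max_jitter, base + max_jitter]`
/// (clamped at zero on the low end), at millisecond granularity.
pub fn jittered(base: Duration, max_jitter: Duration, rng: &mut XorShift64) -> Duration {
    let low = base.saturating_sub(max_jitter);
    let high = base + max_jitter;
    let span_ms = (high - low).as_millis() as u64 + 1;
    low + Duration::from_millis(rng.next_u64() % span_ms)
}
```

A heartbeat loop could then sleep for `jittered(Duration::from_secs(30), Duration::from_secs(5), &mut rng)` between pings, so that consecutive intervals vary instead of repeating at a fixed period.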