docs: overhaul docs, add architecture and traces, update README/GEMINI

- Add docs/architecture.md with 4 mermaid diagrams
- Add docs/mitm.md with 3 mermaid diagrams (replaces mitm-interception-status)
- Add docs/traces.md documenting per-call trace system
- Rewrite README.md to be concise with mermaid and doc refs
- Rewrite GEMINI.md for core philosophy and agent usage
- Clean extension-server-analysis.md (remove stale debug sections)
- Delete temp docs: standalone-ls-todo, panel-stream-investigation, endpoint-gap-analysis, request-comparison
2026-02-18 01:31:18 -06:00
parent 28d3296c87
commit 3d87c04d20
11 changed files with 679 additions and 1305 deletions
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -0,0 +1,242 @@
+# Architecture
+
+## System Overview
+
+```mermaid
+flowchart LR
+    Client["Client\n(curl, SDK, etc.)"]
+    Proxy["Proxy\n:8741"]
+    LS["Standalone LS\n:random"]
+    MITM["MITM Proxy\n:8742"]
+    Google["Google API\ndaily-cloudcode-pa\n.googleapis.com"]
+
+    Client -- "OpenAI / Gemini\nHTTP API" --> Proxy
+    Proxy -- "gRPC\n(protobuf)" --> LS
+    LS -- "HTTPS :443\n(iptables redirect)" --> MITM
+    MITM -- "TLS\n(BoringSSL)" --> Google
+
+    style Proxy fill:#7c3aed,color:#fff
+    style MITM fill:#dc2626,color:#fff
+    style LS fill:#2563eb,color:#fff
+    style Google fill:#059669,color:#fff
+```
+
+The proxy translates OpenAI/Gemini API requests into gRPC calls to a standalone Language Server (LS) binary. A MITM proxy sits between the LS and Google's API to intercept traffic, inject tools/params, and capture real token usage.
+
+---
+
+## Request Lifecycle
+
+```mermaid
+sequenceDiagram
+    participant C as Client
+    participant P as Proxy
+    participant S as MitmStore
+    participant LS as Standalone LS
+    participant M as MITM Proxy
+    participant G as Google API
+
+    C->>P: POST /v1/chat/completions
+    P->>P: Parse request, resolve model
+    P->>S: register_request(cascade_id, tools, params, image)
+    P->>LS: SendMessage(cascade_id, ".")
+    Note over P: Waits on MITM channel
+
+    LS->>M: HTTPS POST streamGenerateContent
+    M->>S: take_request(cascade_id)
+    M->>M: modify_request(inject tools, params, user text)
+    M->>G: Forward modified request
+    G-->>M: SSE stream (text deltas + usage)
+    M->>S: dispatch TextDelta, Usage events
+    M-->>LS: Forward (original) response
+
+    S-->>P: MitmEvent::TextDelta
+    S-->>P: MitmEvent::Usage
+    S-->>P: MitmEvent::ResponseComplete
+    P-->>C: OpenAI-format JSON/SSE response
+```
+
+---
+
+## Module Map
+
+```mermaid
+graph TD
+    subgraph "API Layer"
+        mod_api["api/mod.rs\n(router)"]
+        completions["completions.rs"]
+        responses["responses.rs"]
+        gemini["gemini.rs"]
+        search["search.rs"]
+        models["models.rs"]
+        types["types.rs"]
+        util["util.rs"]
+        polling["polling.rs"]
+    end
+
+    subgraph "MITM Layer"
+        proxy_mitm["proxy.rs\n(TLS termination)"]
+        h2["h2_handler.rs\n(HTTP/2 framing)"]
+        intercept["intercept.rs\n(SSE parsing)"]
+        modify["modify.rs\n(request injection)"]
+        store["store.rs\n(MitmStore)"]
+        proto_mitm["proto.rs\n(protobuf codec)"]
+        ca["ca.rs\n(cert generation)"]
+    end
+
+    subgraph "Core"
+        main["main.rs"]
+        backend["backend.rs\n(gRPC client)"]
+        session["session.rs"]
+        trace["trace.rs"]
+        warmup["warmup.rs"]
+        constants["constants.rs"]
+        quota["quota.rs"]
+    end
+
+    subgraph "Standalone LS"
+        spawn["spawn.rs"]
+        discovery["discovery.rs"]
+        stub["stub.rs\n(extension server)"]
+    end
+
+    subgraph "Protobuf"
+        proto_mod["proto/mod.rs"]
+        wire["proto/wire.rs"]
+    end
+
+    main --> mod_api
+    main --> backend
+    main --> store
+    main --> spawn
+    mod_api --> completions & responses & gemini & search
+    completions & responses & gemini --> store
+    completions & responses & gemini --> backend
+    store --> intercept
+    proxy_mitm --> h2 --> intercept & modify
+    modify --> store
+    intercept --> store
+    spawn --> discovery & stub
+    backend --> proto_mod --> wire
+
+    style store fill:#dc2626,color:#fff
+    style mod_api fill:#7c3aed,color:#fff
+    style proxy_mitm fill:#ea580c,color:#fff
+    style main fill:#0d9488,color:#fff
+```
+
+---
+
+## Endpoints
+
+| Method     | Path                   | Handler                           | Description                             |
+| ---------- | ---------------------- | --------------------------------- | --------------------------------------- |
+| `POST`     | `/v1/responses`        | `responses::handle_responses`     | OpenAI Responses API (streaming + sync) |
+| `POST`     | `/v1/chat/completions` | `completions::handle_completions` | OpenAI Chat Completions API             |
+| `POST`     | `/v1/gemini`           | `gemini::handle_gemini`           | Custom Gemini endpoint                  |
+| `POST`     | `/v1beta/{*path}`      | `gemini::handle_gemini_v1beta`    | Official Gemini v1beta routes           |
+| `GET/POST` | `/v1/search`           | `search::handle_search_*`         | Web search via Google grounding         |
+| `GET`      | `/v1/models`           | `handle_models`                   | List available models                   |
+| `GET`      | `/v1/sessions`         | `handle_list_sessions`            | List active sessions                    |
+| `DELETE`   | `/v1/sessions/{id}`    | `handle_delete_session`           | Delete a session                        |
+| `POST`     | `/v1/token`            | `handle_set_token`                | Set OAuth token at runtime              |
+| `GET`      | `/v1/usage`            | `handle_usage`                    | MITM-intercepted token usage            |
+| `GET`      | `/v1/quota`            | `handle_quota`                    | LS quota (credits, rate limits)         |
+| `GET`      | `/health`              | `handle_health`                   | Health check                            |
+
+---
+
+## MITM Event Flow
+
+```mermaid
+stateDiagram-v2
+    [*] --> Registered: register_request()
+
+    Registered --> GateWait: LS sends HTTPS request
+    GateWait --> Matched: MITM matches cascade_id
+
+    Matched --> Modifying: modify_request()
+    Modifying --> Streaming: Forward to Google
+
+    Streaming --> Streaming: TextDelta / ThinkingDelta
+    Streaming --> UsageCaptured: Usage event
+    UsageCaptured --> Complete: ResponseComplete
+    Streaming --> Error: UpstreamError
+    Streaming --> FnCall: FunctionCall
+
+    Complete --> [*]
+    Error --> [*]
+    FnCall --> Registered: Tool round (re-register)
+```
+
+---
+
+## CLI Flags
+
+| Flag                 | Default | Description                                               |
+| -------------------- | ------- | --------------------------------------------------------- |
+| `--port <PORT>`      | `8741`  | Proxy listen port                                         |
+| `--headless`         | `true`  | Fully standalone — no running Antigravity app needed      |
+| `--classic`          | `false` | Attach to running Antigravity (alias for `--no-headless`) |
+| `--no-mitm`          | `false` | Disable MITM proxy entirely                               |
+| `--mitm-port <PORT>` | `8742`  | MITM proxy port                                           |
+| `--no-standalone`    | `false` | Attach to real LS instead of spawning standalone          |
+| `--no-trace`         | `false` | Disable per-call debug traces                             |
+| `-v, --verbose`      | `false` | Info-level logging                                        |
+| `-d, --debug`        | `false` | Debug-level logging                                       |
+
+---
+
+## Source Files
+
+| File                      | Lines | Purpose                                                    |
+| ------------------------- | ----: | ---------------------------------------------------------- |
+| `api/responses.rs`        |  1796 | Responses API handler (sync, streaming, multi-turn, tools) |
+| `mitm/modify.rs`          |  1418 | Request modification (tool/image/param injection)          |
+| `api/completions.rs`      |  1241 | Chat Completions handler (OpenAI compat)                   |
+| `mitm/proxy.rs`           |  1165 | TLS-terminating MITM proxy                                 |
+| `api/gemini.rs`           |  1055 | Gemini API handler (native format)                         |
+| `snapshot.rs`             |   695 | State snapshots                                            |
+| `backend.rs`              |   660 | gRPC client to LS                                          |
+| `mitm/store.rs`           |   651 | Central state store + event channels                       |
+| `mitm/proto.rs`           |   649 | Protobuf encode/decode for MITM                            |
+| `mitm/intercept.rs`       |   640 | SSE response parser + usage extraction                     |
+| `main.rs`                 |   527 | CLI, startup, wiring                                       |
+| `trace.rs`                |   509 | Per-call debug trace system                                |
+| `mitm/h2_handler.rs`      |   477 | HTTP/2 frame handling                                      |
+| `standalone/spawn.rs`     |   464 | LS process spawning                                        |
+| `api/search.rs`           |   443 | Web search endpoint                                        |
+| `api/types.rs`            |   416 | Shared request/response types                              |
+| `standalone/discovery.rs` |   340 | LS config discovery from `/proc`                           |
+| `proto/mod.rs`            |   340 | Hand-rolled protobuf encoder                               |
+| `api/polling.rs`          |   340 | Cascade polling fallback                                   |
+| `standalone/stub.rs`      |  ~300 | Extension server gRPC stub                                 |
+| `proto/wire.rs`           |  ~200 | Wire-format protobuf helpers                               |
+| `constants.rs`            |  ~100 | Model IDs, service names                                   |
+
+---
+
+## Models
+
+| Proxy Name          | LS Placeholder          | Description                              |
+| ------------------- | ----------------------- | ---------------------------------------- |
+| `opus-4.6`          | `MODEL_PLACEHOLDER_M26` | Claude Opus 4.6 (Thinking) — **default** |
+| `opus-4.5`          | `MODEL_PLACEHOLDER_M12` | Claude Opus 4.5 (Thinking)               |
+| `gemini-3-pro-high` | `MODEL_PLACEHOLDER_M8`  | Gemini 3 Pro (High quality)              |
+| `gemini-3-pro`      | `MODEL_PLACEHOLDER_M7`  | Gemini 3 Pro (Low quality)               |
+| `gemini-3-flash`    | `MODEL_PLACEHOLDER_M18` | Gemini 3 Flash                           |
+
+---
+
+## Stealth Features
+
+| Feature            | Implementation                                                  |
+| ------------------ | --------------------------------------------------------------- |
+| TLS fingerprint    | BoringSSL via `wreq` — Chrome JA3/JA4 + H2 fingerprint          |
+| Protobuf           | Hand-rolled encoder producing byte-exact match to real webview  |
+| Warmup             | Mimics real webview startup RPC sequence                        |
+| Heartbeat          | Periodic keep-alive matching real webview lifecycle             |
+| Reactive streaming | `StreamCascadeReactiveUpdates` for real-time state diffs        |
+| Jitter             | Randomized intervals on warmup/heartbeat                        |
+| Session reuse      | Cascades reused for multi-turn (matches real webview)           |
+| Version detection  | Auto-detects Chrome/Electron/app versions from installed binary |
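
The "MITM Event Flow" state diagram in `docs/architecture.md` above can be read as a small transition table. The following is a minimal sketch of that reading, not code from the repository: the `FlowState` and `Signal` enums and the `next_state` function are hypothetical names chosen for illustration, and only the transitions drawn in the diagram are encoded.

```rust
/// States taken from the "MITM Event Flow" diagram (type name is hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FlowState {
    Registered,
    GateWait,
    Matched,
    Modifying,
    Streaming,
    UsageCaptured,
    Complete,
    Error,
    FnCall,
}

/// Illustrative signals, one per edge label in the diagram (names are assumptions).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Signal {
    LsRequest,        // "LS sends HTTPS request"
    CascadeMatched,   // "MITM matches cascade_id"
    ModifyRequest,    // "modify_request()"
    Forwarded,        // "Forward to Google"
    Delta,            // "TextDelta / ThinkingDelta"
    Usage,            // "Usage event"
    ResponseComplete, // "ResponseComplete"
    UpstreamError,    // "UpstreamError"
    FunctionCall,     // "FunctionCall"
    ToolRound,        // "Tool round (re-register)"
}

/// Returns the next state, or `None` when the diagram has no such edge.
fn next_state(state: FlowState, signal: Signal) -> Option<FlowState> {
    use FlowState::*;
    match (state, signal) {
        (Registered, Signal::LsRequest) => Some(GateWait),
        (GateWait, Signal::CascadeMatched) => Some(Matched),
        (Matched, Signal::ModifyRequest) => Some(Modifying),
        (Modifying, Signal::Forwarded) => Some(Streaming),
        (Streaming, Signal::Delta) => Some(Streaming),
        (Streaming, Signal::Usage) => Some(UsageCaptured),
        (UsageCaptured, Signal::ResponseComplete) => Some(Complete),
        (Streaming, Signal::UpstreamError) => Some(Error),
        (Streaming, Signal::FunctionCall) => Some(FnCall),
        (FnCall, Signal::ToolRound) => Some(Registered),
        _ => None,
    }
}
```

Returning `Option` keeps unexpected event orderings explicit: a caller can log or reject a `None` instead of silently staying in the current state.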
--- a/docs/endpoint-gap-analysis.md
+++ b/docs/endpoint-gap-analysis.md
@@ -1,130 +0,0 @@
-# Endpoint Gap Analysis
-
-> **Updated:** 2026-02-15  
-> **Sources:** [OpenAI Chat Completions API](https://platform.openai.com/docs/api-reference/chat/create), [OpenAI Responses API](https://platform.openai.com/docs/api-reference/responses), [Gemini Thinking Mode](https://ai.google.dev/gemini-api/docs/thinking-mode), proxy source code  
-> **Method:** Full source audit cross-referenced against context7 OpenAI API docs
-
----
-
-## What's Implemented
-
-### All Endpoints
-
-- ✅ Sync + streaming modes
-- ✅ Model selection + validation
-- ✅ OAuth auth check
-- ✅ Timeout control
-- ✅ Tool definitions, tool choice, tool results (OpenAI → Gemini auto-conversion)
-- ✅ MITM bypass path for custom tools
-- ✅ Thinking/reasoning in both sync and streaming
-- ✅ Generation params forwarded via MITM (`temperature`, `top_p`, `top_k`, `max_output_tokens`, `stop_sequences`, `frequency_penalty`, `presence_penalty`)
-- ✅ `reasoning_effort` / `thinkingLevel` — forwarded as `generationConfig.thinkingConfig.thinkingLevel`
-- ✅ `response_format: {type: "json_object"}` — injected as `responseMimeType: "application/json"`
-- ✅ Google Search grounding — `web_search: true` (Completions), `tools: [{type: "web_search_preview"}]` (Responses), `google_search: true` (Gemini)
-- ✅ `/v1/search` endpoint — dedicated web search via Google Search grounding, returns structured results + citations
-- ✅ Image uploads — `input_image` / `image_url` with base64 data URIs, injected via MITM as `inlineData`
-- ✅ Upstream error propagation — Google API errors (400, 429, 500) returned to client instantly instead of hanging
-
-### Reasoning Effort → Thinking Level Mapping
-
-| OpenAI `reasoning_effort` | Google `thinkingLevel` | Gemini 3 Pro | Gemini 3 Flash |
-| :-----------------------: | :--------------------: | :----------: | :------------: |
-|          `"low"`          |        `"low"`         |      ✅      |       ✅       |
-|        `"medium"`         |       `"medium"`       |      ❌      |       ✅       |
-|         `"high"`          |        `"high"`        | ✅ (default) |  ✅ (default)  |
-|             —             |      `"minimal"`       |      ❌      |       ✅       |
-
-### Completions-Specific
-
-- ✅ `stream_options.include_usage` — final chunk with usage before `[DONE]`
-- ✅ `completion_tokens_details.reasoning_tokens` — thinking token count
-- ✅ `prompt_tokens_details.cached_tokens` — cache read tokens
-- ✅ `temperature`, `top_p`, `max_tokens`, `max_completion_tokens`, `frequency_penalty`, `presence_penalty`
-- ✅ `reasoning_effort`
-- ✅ `stop` — string or array, forwarded as `generationConfig.stopSequences`
-- ✅ `response_format: {type: "json_object"}` — injects `responseMimeType`
-- ✅ `response_format: {type: "json_schema", json_schema: {...}}` — injects `responseMimeType` + `responseSchema` via MITM
-- ✅ `n` (multiple choices) — fires N parallel cascades, collects into `choices[]` (sync only, capped at 5)
-- ✅ `conversation` — session ID for multi-turn cascade reuse (custom extension)
-- ✅ `reasoning_content` — thinking text in assistant message
-- ✅ `system_fingerprint` — `fp_<version>` in sync + all streaming chunks
-- ✅ `service_tier` — `"default"` in sync + all streaming chunks
-- ✅ `logprobs: null` — in every choice (sync + streaming)
-- ✅ `metadata` — accepted in request, ignored
-- ✅ `finish_reason` — correctly maps Google's `MAX_TOKENS`→`"length"`, `SAFETY`→`"content_filter"`, etc.
-- ✅ Full `messages[]` history — all user, assistant, system, tool messages forwarded
-
-### Responses-Specific
-
-- ✅ Full streaming event set (all `response.*` events including reasoning summary)
-- ✅ `temperature`, `top_p`, `max_output_tokens`
-- ✅ `reasoning_effort` — echoed from client request
-- ✅ `thinking_signature` for multi-turn thinking chains
-- ✅ `instructions`, `metadata`, `user` — echoed in response
-- ✅ Usage with MITM-intercepted real tokens
-- ✅ `max_tool_calls` — limits tool calls returned per response
-- ✅ `conversation` — session reuse
-- ✅ `previous_response_id`, `store`, `parallel_tool_calls`, `truncation`, `text.format`, `tool_choice` — echoed
-- ✅ `tools` — echoed from client request (was previously always `[]`)
-- ✅ `text.format` — `{format: {type: "json_schema", ...}}` injects `responseMimeType` + `responseSchema` via MITM, echoed in response
-
-### Gemini-Specific
-
-- ✅ Native tool format (no conversion needed)
-- ✅ `usageMetadata` in sync **and streaming** responses
-- ✅ `temperature`, `topP`, `topK`, `maxOutputTokens`, `stopSequences`
-- ✅ `thinkingLevel`
-- ✅ Session/conversation reuse
-- ✅ Array/multipart `input` — strings, string arrays, `{text: "..."}` object arrays
-
----
-
-## Fixed Bugs
-
-| #   | Bug                              | Fix                                                                                                                                         |
-| --- | -------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------- |
-| B1  | Messages history dropped         | `extract_chat_input` now calls `build_conversation_with_tools` with ALL messages — full multi-turn via `messages[]` works.                  |
-| B2  | `finish_reason` never `"length"` | `google_to_openai_finish_reason()` helper maps `MAX_TOKENS`→`"length"`, `SAFETY`/`RECITATION`/etc→`"content_filter"`. Applied to all paths. |
-| B3  | `reasoning` always null          | `build_response_object` now echoes client's `reasoning_effort` from `RequestParams`.                                                        |
-| B4  | `tool_choice` always `"auto"`    | Changed from `&'static str` to `serde_json::Value`. Echoes whatever the client sent.                                                        |
-| B5  | `tools` always `[]`              | Echoes the client's tools array in the response.                                                                                            |
-| B7  | `temperature`/`top_p` wrong      | Already defaults to `1.0` via `unwrap_or(1.0)`. Was a false positive — no fix needed.                                                       |
-
-### Acceptable / Won't Fix
-
-| #   | Bug                                       | Status                                                                                                      |
-| --- | ----------------------------------------- | ----------------------------------------------------------------------------------------------------------- |
-| B6  | `Usage::estimate` fake tokens as fallback | Only triggers on timeout/error paths. Heuristic `len/4` is reasonable for timeouts where output tokens = 0. |
-
----
-
-## TODO — New Features
-
-### Trivial (all done ✅)
-
-All trivial response shape fixes have been implemented.
-
-### Medium (schema injection via MITM) — all done ✅
-
-All structured output features have been implemented.
-
-### Hard (new features)
-
-| #   | Gap                       | API  | Notes                                                      |
-| --- | ------------------------- | ---- | ---------------------------------------------------------- |
-| 7   | **`parallel_tool_calls`** | Both | Accept param, echo in response. Can't enforce server-side. |
-
-### Stretch (research needed)
-
-| #   | Gap             | API  | Notes                                                            |
-| --- | --------------- | ---- | ---------------------------------------------------------------- |
-| 12  | **Audio input** | Both | Audio modalities not yet supported. Vision/images work via MITM. |
-
----
-
-## Won't Implement
-
-| #   | Gap                             | Reason                                                                   |
-| --- | ------------------------------- | ------------------------------------------------------------------------ |
-| 9   | `prediction` (Predicted Output) | Inference-level speculative decoding optimization. No Gemini equivalent. |
-| 10  | `logprobs` / `top_logprobs`     | Gemini never exposes token-level log probabilities.                      |
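
The removed gap analysis describes a `google_to_openai_finish_reason()` helper (bug B2) that maps Google finish reasons onto OpenAI ones. A minimal sketch of such a mapping is shown here, not the repository's implementation: only `MAX_TOKENS`, `SAFETY`, and `RECITATION` come from the text above, and the fallback to `"stop"` for every other value (including `STOP`) is an assumption.

```rust
/// Sketch of a Google-to-OpenAI finish reason mapping.
/// Only the first two arms are stated in the docs; the fallback is assumed.
fn google_to_openai_finish_reason(reason: &str) -> &'static str {
    match reason {
        // Output was cut off by the token limit.
        "MAX_TOKENS" => "length",
        // The docs list "SAFETY/RECITATION/etc"; the remaining values are
        // not named there, so only these two are handled explicitly.
        "SAFETY" | "RECITATION" => "content_filter",
        // Assumption: anything else, including "STOP", is a normal stop.
        _ => "stop",
    }
}
```

Keeping the mapping in one helper matches the note that the fix was "applied to all paths": sync and streaming handlers can share it.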
--- a/docs/extension-server-analysis.md
+++ b/docs/extension-server-analysis.md
@@ -304,47 +304,3 @@ Both use `Connect-Protocol-Version: 1` header.

 5. All other methods — return empty success
   - `GetChromeDevtoolsMcpUrl`, `ShowAnnotation`, `OpenFilePointer`, etc.
-
----
-
-## Current Stub Issues (from latest debug log)
-
-### Issue 1: "key not found"
-
-```
-E0215 20:05:56.311541 server.go:558] Failed to get OAuth token: key not found
-```
-
-The `GetSecretValue` response doesn't match what the LS expects. The LS calls `GetSecretValue` with a specific key, but our stub ignores the key and always returns the token. The "key not found" error suggests the LS's state sync layer caches by key and doesn't find the expected entry.
-
-**Root cause**: The LS doesn't just call `GetSecretValue` — it goes through the `UnifiedStateSyncClient` which uses `GetRow(key)`. The state sync is a key-value store. The LS looks up a specific key in state sync, and the state sync client calls `GetSecretValue` on the extension server. Since our stub returns an empty protobuf for everything except `GetSecretValue`, the state sync's initial `SubscribeToUnifiedStateSyncTopic` gets no data, and subsequent `GetRow()` calls return "key not found".
-
-### Issue 2: "unknown model key MODEL_PLACEHOLDER_M18"
-
-```
-E0215 20:05:56.358443 interceptor.go:74] SendUserCascadeMessage: unknown model key MODEL_PLACEHOLDER_M18
-```
-
-The model configuration isn't loaded because `Cache(loadCodeAssistResponse)` failed. This cache depends on `userInfo` which depends on the OAuth token. Fix the token flow and this should resolve.
-
-### Issue 3: "mkdir permission denied"
-
-```
-E0215 20:05:56.311614 log.go:380] Failed to create artifacts directory...mkdir /tmp/antigravity-standalone/.gemini/antigravity-standalone/brain/.../: permission denied
-```
-
-The LS tries to create directories under the `gemini_dir`. This is non-fatal but noisy.
-
----
-
-## Recommended Fix Strategy
-
-The current approach of parsing individual methods won't scale — ALL 53+ methods are ServerStream and need envelope framing.
-
-**Better approach**: Instead of understanding every method, ensure:
-
-1. **Every response** uses Connect streaming envelope framing (`0x02 + len + {}` minimum)
-2. **GetSecretValue** returns the token in a data envelope before the end-of-stream
-3. **Content-Type** is always `application/connect+proto`
-4. **Connection: close** to avoid HTTP keep-alive issues
-5. Create the `gemini_dir` with proper permissions before spawning the LS
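
The removed "Recommended Fix Strategy" relies on Connect streaming envelope framing (`0x02 + len + {}`). As a rough illustration, the sketch below builds such envelopes: one flags byte, a 4-byte big-endian payload length, then the payload, with flag `0x02` marking end-of-stream. The function names `envelope` and `stream_body` are hypothetical and do not come from `stub.rs`.

```rust
/// Flag bit that marks the end-of-stream envelope in Connect streaming.
const FLAG_END_STREAM: u8 = 0x02;

/// Wraps `payload` in a Connect envelope: flags (1 byte) + length (u32 BE) + payload.
fn envelope(flags: u8, payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(5 + payload.len());
    out.push(flags);
    out.extend_from_slice(&(payload.len() as u32).to_be_bytes());
    out.extend_from_slice(payload);
    out
}

/// Builds a full response body: an optional data envelope followed by the
/// minimal end-of-stream envelope carrying `{}`.
fn stream_body(data: Option<&[u8]>) -> Vec<u8> {
    let mut body = Vec::new();
    if let Some(msg) = data {
        // Flags 0x00: uncompressed data message.
        body.extend(envelope(0x00, msg));
    }
    body.extend(envelope(FLAG_END_STREAM, b"{}"));
    body
}
```

Under this sketch, a method that returns "empty success" would send `stream_body(None)`, while `GetSecretValue` would pass its encoded message as `Some(..)` so the data envelope precedes the end-of-stream marker.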
--- a/docs/mitm-interception-status.md
+++ b/docs/mitm-interception-status.md
@@ -1,159 +0,0 @@
-# MITM Traffic Interception — Status
-
-## Status: ✅ FULLY WORKING (Standalone Mode)
-
-MITM interception is operational for the standalone LS. The proxy intercepts,
-decrypts, and parses all LLM API traffic with per-model token usage capture.
-
-## How It Works
-
-```
-Client → Proxy (8741) → Standalone LS (as antigravity-ls user)
-                           ↓ (port 443 traffic)
-                        iptables REDIRECT (UID-scoped)
-                           ↓
-                        MITM Proxy (8742)
-                           ↓ (TLS decrypt + parse SSE)
-                        Google API (daily-cloudcode-pa.googleapis.com)
-```
-
-### Components
-
-1. **UID-scoped iptables** (`scripts/mitm-redirect.sh`)
-   - Creates `antigravity-ls` system user
-   - iptables rule: redirect UID's port-443 → MITM port
-   - Only the standalone LS is affected — no side effects on other software
-
-2. **Combined CA bundle** (`src/standalone.rs`)
-   - Go's `SSL_CERT_FILE` replaces (not appends) the system trust store
-   - Proxy concatenates system CAs + MITM CA → `/tmp/antigravity-mitm-combined-ca.pem`
-   - Set as `SSL_CERT_FILE` on the standalone LS process
-
-3. **`sudo -u` spawning** (`src/standalone.rs`)
-   - If `antigravity-ls` user exists, LS is spawned via `sudo -n -u antigravity-ls`
-   - Env vars passed via `/usr/bin/env KEY=VALUE` args
-   - Falls back to current user if the dedicated user doesn't exist
-
-4. **Google SSE parser** (`src/mitm/intercept.rs`)
-   - Parses `data: {"response": {"usageMetadata": {...}}}` events
-   - Extracts `promptTokenCount`, `candidatesTokenCount`, `thoughtsTokenCount`
-   - Handles both Google and Anthropic SSE formats
-
-5. **Transparent proxy** (`src/mitm/proxy.rs`)
-   - Detects iptables-redirected connections via TLS ClientHello SNI
-   - Terminates TLS with dynamically generated certs
-   - Forwards HTTP/1.1 requests upstream with real DNS resolution (`dig @8.8.8.8`)
-   - Chunked response detection for fast completion
-
-6. **Request modification** (`src/mitm/modify.rs`)
-   - Strips LS system instructions down to `<identity>` block only
-   - Removes stale conversation history (keeps only last user message)
-   - Injects client tools, tool configs, generation params
-   - Injects images as `inlineData` (base64) into user message parts
-   - Injects tool results as `functionResponse` parts
-   - Enables Google Search grounding when requested
-   - Updates `Content-Length` header after body modification
-
-7. **Upstream error capture** (`src/mitm/store.rs`)
-   - Captures Google API error responses (HTTP 400, 429, 500, etc.)
-   - Parses error JSON for message and status fields
-   - Stores in `MitmStore` for immediate forwarding to client
-   - Prevents request hangs on upstream failures
-
-## What We Tried (Historical)
-
-### 1. Extension Patch — `detectAndUseProxy` ✅ Still Active
-
-Patches `detectAndUseProxy=1` in the extension JS. Makes auxiliary traffic
-(Unleash, etc.) honor `HTTPS_PROXY`. Harmless, still applied.
-
-### 2. MITM Wrapper (`mitm-wrapper.sh`) ⚠️ Superseded
-
-Sets env vars on the main LS process. Works for routing but the main LS's
-LLM client ignores `HTTPS_PROXY`. Superseded by standalone mode.
-
-### 3. iptables REDIRECT (All Traffic) ❌ Abandoned
-
-Redirected ALL port-443 traffic. Caused redirect loops, broke other HTTPS
-traffic. Replaced by UID-scoped redirect.
-
-### 4. DNS Redirect (`/etc/hosts`) ❌ Abandoned
-
-Same TLS trust issue as #3. Unnecessary with UID-scoped iptables.
-
-### 5. Standalone LS + UID-scoped iptables ✅ WORKING
-
-Current solution. Full MITM interception with zero side effects.
-
-## The Original Blocker (SOLVED)
-
-> The LS's Go LLM HTTP client uses a custom `tls.Config` that does NOT read
-> from `SSL_CERT_FILE` or the system CA store.
-
-**This turned out to be wrong.** The Go client DOES honor `SSL_CERT_FILE` when:
-
-- The env var is set BEFORE the process starts (not injected later)
-- The value contains a combined bundle (system CAs + custom CA)
-- `SSL_CERT_DIR` is set to `/dev/null` to force exclusive use of `SSL_CERT_FILE`
-
-The standalone LS gives us full control over the process environment at spawn
-time, which is why this approach works while the wrapper approach didn't.
-
-## Technical Details
-
-### API Endpoint
-
-`POST https://daily-cloudcode-pa.googleapis.com/v1internal:streamGenerateContent?alt=sse`
-
-### SSE Response Format
-
-```
-data: {"response": {"candidates": [{"content": {"role": "model", "parts": [{"text": "..."}]}}],
-       "usageMetadata": {"promptTokenCount": 1514, "candidatesTokenCount": 25,
-                         "totalTokenCount": 1539, "thoughtsTokenCount": 52},
-       "modelVersion": "gemini-3-flash"}, "traceId": "...", "metadata": {}}
-```
-
-Last event includes `"finishReason": "STOP"` in the candidate.
-
-### Other Intercepted Endpoints
-
-| Endpoint                    | Type     | Content          |
-| --------------------------- | -------- | ---------------- |
-| `fetchUserInfo`             | Protobuf | User info        |
-| `loadCodeAssist`            | Protobuf | Extension config |
-| `fetchAvailableModels`      | Protobuf | Model catalog    |
-| `webDocsOptions`            | Protobuf | Docs config      |
-| `streamGenerateContent`     | SSE/JSON | LLM responses ✅ |
-| `recordCodeAssistMetrics`   | Protobuf | Telemetry        |
-| `recordTrajectoryAnalytics` | Protobuf | Telemetry        |
-
-### Model IDs
-
-| Placeholder             | Model               |
-| ----------------------- | ------------------- |
-| `MODEL_PLACEHOLDER_M18` | Gemini 3 Flash      |
-| `MODEL_PLACEHOLDER_M8`  | Gemini 3 Pro (High) |
-| `MODEL_PLACEHOLDER_M7`  | Gemini 3 Pro (Low)  |
-| `MODEL_PLACEHOLDER_M26` | Claude Opus 4.6     |
-| `MODEL_PLACEHOLDER_M12` | Claude Opus 4.5     |
-
-### Setup
-
-```bash
-# One-time setup (creates user + iptables rule)
-sudo ./scripts/mitm-redirect.sh install
-
-# Run proxy (standalone + MITM are default)
-RUST_LOG=info ./target/release/antigravity-proxy
-
-# Check usage
-curl -s http://localhost:8741/v1/usage | jq .
-```
-
-### Cleanup
-
-```bash
-# Remove iptables rule + user
-sudo ./scripts/mitm-redirect.sh uninstall
-```
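
The SSE format shown above carries token counts such as `promptTokenCount` inside `usageMetadata`. The sketch below pulls one integer field out of a single `data:` line using a plain string scan so that it needs no external crates. It is only an illustration of the idea: `extract_count` is a hypothetical name, and real code would more likely use a proper JSON parser.

```rust
/// Naive scan for `"<key>": <integer>` in one SSE `data:` line.
/// Returns `None` for non-data lines, missing keys, or non-numeric values.
fn extract_count(line: &str, key: &str) -> Option<u64> {
    // Only `data:` lines carry the JSON payload.
    let payload = line.strip_prefix("data:")?;
    let needle = format!("\"{key}\":");
    let start = payload.find(&needle)? + needle.len();
    // Collect the run of ASCII digits that follows the key.
    let digits: String = payload[start..]
        .trim_start()
        .chars()
        .take_while(|c| c.is_ascii_digit())
        .collect();
    digits.parse().ok()
}
```

A string scan like this would also match the key inside model-generated text, which is one reason a structured parser is the safer choice outside of a sketch.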
--- a/docs/mitm.md
+++ b/docs/mitm.md
@@ -0,0 +1,167 @@
+# MITM Proxy
+
+## Overview
+
+The built-in MITM proxy intercepts all traffic between the standalone LS and Google's API. It decrypts TLS, parses SSE responses, captures real token usage, and modifies requests to inject tools, parameters, and images.
+
+```mermaid
+sequenceDiagram
+    participant LS as Standalone LS
+    participant IPT as iptables
+    participant MITM as MITM Proxy :8742
+    participant Store as MitmStore
+    participant G as Google API
+
+    LS->>IPT: HTTPS :443
+    IPT->>MITM: REDIRECT (UID-scoped)
+    MITM->>MITM: TLS terminate (dynamic cert)
+    MITM->>Store: Match request by cascade_id
+    Store-->>MITM: RequestContext (tools, params, image)
+    MITM->>MITM: modify_request()
+    MITM->>G: Forward modified request
+    G-->>MITM: SSE stream
+    MITM->>MITM: Parse SSE, extract usage
+    MITM->>Store: Dispatch events (TextDelta, Usage, etc.)
+    MITM-->>LS: Forward original response
+```
+
+---
+
+## Components
+
+```mermaid
+graph TD
+    subgraph "MITM Module"
+        proxy["proxy.rs\nTLS termination\nSNI-based routing"]
+        h2["h2_handler.rs\nHTTP/2 frame handling"]
+        intercept["intercept.rs\nSSE parser\nUsage extraction"]
+        modify["modify.rs\nRequest injection\n(tools, params, images)"]
+        store["store.rs\nMitmStore\nEvent channels"]
+        proto["proto.rs\nProtobuf codec"]
+        ca["ca.rs\nCA + dynamic certs"]
+    end
+
+    proxy --> h2
+    h2 --> intercept
+    h2 --> modify
+    modify --> store
+    intercept --> store
+    proxy --> ca
+    modify --> proto
+
+    style store fill:#dc2626,color:#fff
+    style proxy fill:#ea580c,color:#fff
+```
+
+| File            | Purpose                                                                                       |
+| --------------- | --------------------------------------------------------------------------------------------- |
+| `proxy.rs`      | Accepts iptables-redirected connections, terminates TLS via SNI, manages connection lifecycle |
+| `h2_handler.rs` | HTTP/2 frame-level handling for CONNECT-style proxying                                        |
+| `intercept.rs`  | Parses Google's SSE `data:` lines, extracts `usageMetadata`, detects `finishReason`           |
+| `modify.rs`     | Injects tools, generation params, images, tool results, Google Search grounding into requests |
+| `store.rs`      | Central state — `RequestContext` registry, event channels (`MitmEvent`), usage accumulation   |
+| `proto.rs`      | Protobuf encode/decode for intercepted request/response bodies                                |
+| `ca.rs`         | Generates CA certificate and per-domain leaf certs for TLS termination                        |
+
+---
+
+## Request Modification
+
+When the MITM proxy intercepts an outgoing request from the LS, it applies modifications from the `RequestContext` stored by the API handler:
+
+```mermaid
+flowchart TD
+    A["Original LS Request"] --> B{"Has tools?"}
+    B -- Yes --> C["Inject tool definitions\n+ toolConfig"]
+    B -- No --> D{"Has generation params?"}
+    C --> D
+    D -- Yes --> E["Inject temperature, top_p,\nmax_output_tokens, stop_sequences,\nfrequency/presence_penalty"]
+    D -- No --> F{"Has image?"}
+    E --> F
+    F -- Yes --> G["Inject inlineData\n(base64) into user parts"]
+    F -- No --> H{"Has tool results?"}
+    G --> H
+    H -- Yes --> I["Inject functionResponse\nparts"]
+    H -- No --> J{"Google Search?"}
+    I --> J
+    J -- Yes --> K["Enable Google Search\ngrounding tool"]
+    J -- No --> L["Replace user text\nwith real input"]
+    K --> L
+    L --> M["Update Content-Length"]
+    M --> N["Forward to Google"]
+
+    style A fill:#2563eb,color:#fff
+    style N fill:#059669,color:#fff
+```
+
+---
+
+## SSE Response Format
+
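+Google's API streams the response as SSE events. Each event is a single `data:` line carrying one JSON object (wrapped here for readability):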
+
+```
+data: {"response": {"candidates": [{"content": {"role": "model", "parts": [{"text": "..."}]}}],
+       "usageMetadata": {"promptTokenCount": 1514, "candidatesTokenCount": 25,
+                         "totalTokenCount": 1539, "thoughtsTokenCount": 52},
+       "modelVersion": "gemini-3-flash"}, "traceId": "...", "metadata": {}}
+```
+
+On a normal completion, the last event also carries `"finishReason": "STOP"` inside the candidate object.
+
+---
+
+## MitmEvent Channel
+
+Events are dispatched through `tokio::sync::mpsc` channels from the MITM proxy to the API handlers:
+
+| Event                   | Source         | Data                                          |
+| ----------------------- | -------------- | --------------------------------------------- |
+| `TextDelta(String)`     | `intercept.rs` | Incremental text from model                   |
+| `ThinkingDelta(String)` | `intercept.rs` | Thinking/reasoning text                       |
+| `Usage(ApiUsage)`       | `intercept.rs` | Token counts (input, output, thinking, cache) |
+| `FunctionCall(Vec)`     | `intercept.rs` | Tool calls from model                         |
+| `Grounding(Value)`      | `intercept.rs` | Google Search grounding metadata              |
+| `ResponseComplete`      | `intercept.rs` | Stream finished                               |
+| `UpstreamError(Value)`  | `intercept.rs` | Google API error (400, 429, 500)              |
+
+---
+
+## Setup
+
+### UID-Scoped iptables (Classic Mode)
+
+```bash
+# One-time setup — creates antigravity-ls user + iptables rule
+sudo ./scripts/mitm-redirect.sh install
+
+# Run proxy (standalone LS + MITM both enabled by default)
+RUST_LOG=info ./target/release/antigravity-proxy
+
+# Check intercepted usage
+curl -s http://localhost:8741/v1/usage | jq .
+
+# Cleanup
+sudo ./scripts/mitm-redirect.sh uninstall
+```
+
+### Headless Mode
+
+No iptables or sudo needed. The LS connects through `HTTPS_PROXY` instead:
+
+```bash
+RUST_LOG=info ./target/release/antigravity-proxy --headless
+```
+
+---
+
+## Intercepted Endpoints
+
+| Endpoint                    | Type     | Content                   |
+| --------------------------- | -------- | ------------------------- |
+| `streamGenerateContent`     | SSE/JSON | LLM responses ✅ (parsed) |
+| `fetchUserInfo`             | Protobuf | User info                 |
+| `loadCodeAssist`            | Protobuf | Extension config          |
+| `fetchAvailableModels`      | Protobuf | Model catalog             |
+| `recordCodeAssistMetrics`   | Protobuf | Telemetry (ignored)       |
+| `recordTrajectoryAnalytics` | Protobuf | Telemetry (ignored)       |
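
The `MitmEvent` table in `docs/mitm.md` above lists what the API handlers receive from `intercept.rs`. The following is a minimal sketch of a consumer for that stream, not the project's actual code. It assumes `std::sync::mpsc` as a dependency-free stand-in for `tokio::sync::mpsc`, reduces the `Value` and `Vec` payloads to plain strings, and invents the `ApiUsage` field names, the `Collected` struct, and the `collect_response` and `demo` functions for illustration.

```rust
use std::sync::mpsc::{self, Receiver};

/// Token counts from the upstream API. Field names are assumptions.
#[derive(Debug, Default, Clone, PartialEq)]
struct ApiUsage {
    input: u64,
    output: u64,
    thinking: u64,
    cache_read: u64,
}

/// Simplified stand-in for the proxy's event type.
/// The real payload types (JSON values, function-call structs) are replaced by strings.
#[derive(Debug, Clone, PartialEq)]
enum MitmEvent {
    TextDelta(String),
    ThinkingDelta(String),
    Usage(ApiUsage),
    FunctionCall(Vec<String>),
    Grounding(String),
    ResponseComplete,
    UpstreamError(String),
}

/// Everything accumulated for one response (hypothetical helper type).
#[derive(Debug, Default, PartialEq)]
struct Collected {
    text: String,
    thinking: String,
    usage: Option<ApiUsage>,
    function_calls: Vec<String>,
    grounded: bool,
}

/// Drain events until the stream completes or the upstream reports an error.
fn collect_response(rx: &Receiver<MitmEvent>) -> Result<Collected, String> {
    let mut out = Collected::default();
    // `iter()` blocks for the next event and ends once every sender is dropped.
    for event in rx.iter() {
        match event {
            MitmEvent::TextDelta(s) => out.text.push_str(&s),
            MitmEvent::ThinkingDelta(s) => out.thinking.push_str(&s),
            MitmEvent::Usage(u) => out.usage = Some(u),
            MitmEvent::FunctionCall(calls) => out.function_calls.extend(calls),
            MitmEvent::Grounding(_) => out.grounded = true,
            MitmEvent::ResponseComplete => return Ok(out),
            MitmEvent::UpstreamError(e) => return Err(e),
        }
    }
    Err("channel closed before ResponseComplete".to_string())
}

/// Simulates the interceptor side: two text deltas, usage, then completion.
fn demo() -> Result<Collected, String> {
    let (tx, rx) = mpsc::channel();
    let usage = ApiUsage { input: 284, output: 13, thinking: 37, cache_read: 0 };
    for event in [
        MitmEvent::TextDelta("Ahoy there, ".to_string()),
        MitmEvent::TextDelta("matey!".to_string()),
        MitmEvent::Usage(usage),
        MitmEvent::ResponseComplete,
    ] {
        tx.send(event).map_err(|e| e.to_string())?;
    }
    collect_response(&rx)
}
```

Treating `UpstreamError` as a terminal event lets a handler return the error to the client immediately instead of waiting for a timeout; whether the real handlers do exactly this is not stated in the docs.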
--- a/docs/panel-stream-investigation.md
+++ b/docs/panel-stream-investigation.md
@@ -1,93 +0,0 @@
-# Panel Stream Investigation — Dead End
-
-## Summary
-
-Investigated `StreamCascadePanelReactiveUpdates` RPC as a potential source for
-progressive thinking text. **Result: dead end.** The panel state only contains
-UI metadata (`plan_status`, `user_settings`), not thinking content.
-
-## What We Tried
-
-### 1. Subscribe with Cascade ID
-
-Attempted to subscribe to `StreamCascadePanelReactiveUpdates` using the cascade
-ID as the reactive component identifier:
-
-```json
-{ "protocolVersion": 1, "id": "<cascade-id>" }
-```
-
-**Result:** `"reactive component <cascade-id> not found"`
-
-### 2. Retry with Delays
-
-Added retry logic (3 attempts, 500ms/1s/1.5s delays) to handle the possibility
-that the panel state is created asynchronously after cascade start.
-
-**Result:** Same error on all attempts. The panel state uses a different
-identifier than the cascade ID.
-
-### 3. InitializeCascadePanelState Analysis
-
-Examined the RPC that creates panel state:
-
-```js
-await this.client.initializeCascadePanelState({ metadata: e, userStatus: t });
-```
-
-Takes workspace metadata + user status, not cascade ID. Panel state is
-workspace-scoped, not cascade-scoped.
-
-## CascadePanelState Proto Definition
-
-```
-exa.cortex_pb.CascadePanelState:
-  field 1: plan_status  (PlanStatus)
-  field 2: user_settings (UserSettings)
-```
-
-Only 2 fields — neither contains thinking text.
-
-## Where Thinking Text Actually Lives
-
-Thinking text flows through **`StreamCascadeReactiveUpdates`** (the cascade
-reactive diffs that we already subscribe to):
-
-```
-CascadeState (jetski_cortex_pb)
-  └─ field 2: trajectory (gemini_coder.Trajectory)
-       └─ field 2: steps[] (gemini_coder.Step)
-            └─ field 20: planner_response (CortexStepPlannerResponse)
-                 ├─ field 1: response (string — streams progressively)
-                 ├─ field 3: thinking (string — raw thinking text)
-                 ├─ field 8: modified_response (string)
-                 └─ field 11: thinking_duration (Duration)
-```
-
-### Observed Behavior (gemini-3-flash)
-
-- Thinking text arrives as a **single atomic diff** (341 chars, one shot)
-- Response text streams progressively across many diffs (26 → 1796 chars)
-- Total diffs per request: ~20
-
-### Current Proxy Approach
-
-The proxy already captures thinking text correctly through polling
-`GetCascadeTrajectory` + `extract_thinking_content()`. No reactive diff
-parsing needed for current functionality.
-
-### Future: Progressive Thinking for Extended-Thinking Models
-
-For Opus models with extended thinking, the thinking text _might_ arrive
-progressively across multiple reactive diffs. If needed:
-
-1. Parse reactive diff JSON for field 3 changes within field 20
-2. Diff the thinking text between updates for incremental deltas
-3. Emit `response.reasoning_summary_text.delta` events as thinking grows
-
-## Cleanup
-
-- Removed `stream_cascade_panel_updates()` from `backend.rs`
-- Removed panel stream subscription + retry code from `responses.rs`
-- `StreamCascadeReactiveUpdates` (cascade diffs) is still used for
-  real-time notification of state changes (with polling as fallback)
--- a/docs/standalone-ls-todo.md
+++ b/docs/standalone-ls-todo.md
@@ -1,93 +0,0 @@
-# Standalone LS for Proxy Isolation
-
-## Status: ✅ FULLY IMPLEMENTED (incl. headless mode + MITM)
-
-Two modes available:
-
-- **Normal standalone** (default) — steals config from running Antigravity, optional UID isolation
-- **Headless** (`--headless`) — fully independent, no running Antigravity required
-
-## Headless Mode
-
-Pass `--headless` to the proxy. This:
-
-1. Generates its own CSRF token (random UUID)
-2. Passes `-extension_server_port=0` to the LS (disables extension server callbacks)
-3. Passes `-standalone=true` to the LS binary (built-in standalone flag)
-4. Uses `HTTPS_PROXY` env var for MITM (no iptables/sudo required)
-5. No `/proc` scanning, no dependency on running Antigravity
-
-```bash
-# Headless (no Antigravity needed)
-RUST_LOG=info ./target/release/antigravity-proxy --headless
-
-# With MITM disabled
-./target/release/antigravity-proxy --headless --no-mitm
-```
-
-## Normal Standalone Mode
-
-The default mode (disable with `--no-standalone`):
-
-1. Discovers `extension_server_port` and `csrf_token` from the real LS (via `/proc/PID/cmdline`)
-2. Picks a random free port
-3. Builds init metadata protobuf (via `proto::build_init_metadata()`)
-4. Spawns the LS binary with correct args and env vars
-5. Feeds init metadata via stdin, then closes it
-6. Waits for TCP readiness (retry loop)
-7. Kills the child on proxy shutdown (via `Drop`)
-
-### UID Isolation (MITM mode)
-
-When `scripts/mitm-redirect.sh install` has been run:
-
-1. The `antigravity-ls` system user exists
-2. iptables redirects that UID's port-443 traffic → MITM proxy port
-3. The proxy spawns the LS via `sudo -n -u antigravity-ls`
-4. Environment variables (`SSL_CERT_FILE`, etc.) are passed via `/usr/bin/env`
-5. A combined CA bundle (system CAs + MITM CA) is written to `/tmp/antigravity-mitm-combined-ca.pem`
-6. Only the standalone LS traffic is intercepted — no impact on other software
-
-## LS Binary Flags (Reference)
-
-From `language_server_linux_x64 --help`:
-
-| Flag                     | Default | Description                           |
-| ------------------------ | ------- | ------------------------------------- |
-| `-standalone`            | `false` | Whether to run in standalone mode     |
-| `-extension_server_port` | `0`     | Extension server port. If 0, not used |
-| `-csrf_token`            | `""`    | CSRF token for RPC auth               |
-| `-server_port`           | `42100` | Port for LS ↔ extension               |
-| `-enable_lsp`            | `false` | Enable LSP protocol                   |
-| `-cloud_code_endpoint`   | `""`    | CCPA API URL                          |
-| `-parent_pipe_path`      | `""`    | Monitors parent process liveness      |
-
-## Key Technical Details
-
-- Init metadata protobuf field 34 = `detect_and_use_proxy` (1=ENABLED)
-- Model IDs: M18=Flash, M8=Pro-High, M7=Pro-Low, M26=Opus4.6, M12=Opus4.5
-- LS binary: `/usr/share/antigravity/resources/app/extensions/antigravity/bin/language_server_linux_x64`
-- API endpoint: `daily-cloudcode-pa.googleapis.com/v1internal:streamGenerateContent?alt=sse`
-
-## Test Results (2026-02-15)
-
-| Endpoint                          | Result                      |
-| --------------------------------- | --------------------------- |
-| `GET /health`                     | OK                          |
-| `GET /v1/models`                  | OK, 5 models                |
-| `GET /v1/sessions`                | OK                          |
-| `GET /v1/quota`                   | OK, real plan/credits       |
-| `GET /v1/usage`                   | OK, real MITM tokens        |
-| `POST /v1/responses` (sync)       | OK                          |
-| `POST /v1/responses` (stream)     | OK, full SSE event set      |
-| `POST /v1/responses` (multi-turn) | OK, context preserved       |
-| `POST /v1/responses` (tools)      | OK, function calls captured |
-| `POST /v1/responses` (images)     | OK, MITM injection          |
-| `POST /v1/chat/completions`       | OK                          |
-| `POST /v1/gemini`                 | OK                          |
-| `GET/POST /v1/search`             | OK, grounding + citations   |
-| MITM interception                 | OK, TLS decrypt + parse     |
-| MITM request modification         | OK, tools/images/params     |
-| MITM usage capture                | OK, per-model token counts  |
-| MITM error capture                | OK, instant client feedback |
-| UID isolation                     | OK, no side effects         |
--- a/docs/traces.md
+++ b/docs/traces.md
@@ -0,0 +1,118 @@
+# Trace System
+
+Per-call debug traces for inspecting request/response flow. Every API call writes a structured trace directory.
+
+## Location
+
+```
+~/.config/antigravity-proxy/traces/{YYYY-MM-DD}/{HH-MM-SS.sss}_{cascade_short}/
+```
+
+Disable with `--no-trace`.
+
+## Files Per Trace
+
+| File            | Purpose                                                    |
+| --------------- | ---------------------------------------------------------- |
+| `meta.txt`      | One-line grep-friendly summary                             |
+| `summary.md`    | Human-readable trace overview with tables                  |
+| `request.json`  | Client request metadata (message count, preview, tools)    |
+| `response.json` | Token usage (input, output, thinking, cache)               |
+| `turns.json`    | Per-turn details (MITM match, gate wait, response preview) |
+
+## Data Flow
+
+```mermaid
+sequenceDiagram
+    participant H as API Handler
+    participant T as TraceHandle
+    participant D as Disk
+
+    H->>T: trace.start(cascade_id, endpoint, model)
+    H->>T: set_client_request(preview, tool_count, ...)
+    Note over H: Request processing...
+    H->>T: start_turn()
+    H->>T: record_mitm_match(gate_wait_ms)
+    Note over H: Response arrives...
+    H->>T: record_response(text_len, preview, finish_reason)
+    H->>T: set_usage(input, output, thinking, cache)
+    H->>T: finish("completed")
+    T->>D: Write meta.txt, summary.md, request.json, response.json, turns.json
+```
+
+## Example: meta.txt
+
+```
+cascade=e57e3ddf endpoint=POST gemini model=gemini-3-flash outcome=completed duration=1865ms stream=false
+```
+
+## Example: request.json
+
+```json
+{
+  "message_count": 2,
+  "tool_count": 3,
+  "tool_round_count": 0,
+  "user_text_len": 46,
+  "user_text_preview": "You are a pirate.\n\nSay ahoy in exactly 3 words",
+  "system_prompt": true,
+  "has_image": false
+}
+```
+
+## Example: turns.json
+
+```json
+[
+  {
+    "turn": 0,
+    "mitm_matched": true,
+    "gate_wait_ms": 90,
+    "response": {
+      "text_len": 18,
+      "thinking_len": 0,
+      "text_preview": "Ahoy there, matey!",
+      "finish_reason": "stop",
+      "grounding": false
+    }
+  }
+]
+```
+
+## Example: response.json
+
+```json
+{
+  "usage": {
+    "input_tokens": 284,
+    "output_tokens": 13,
+    "thinking_tokens": 37,
+    "cache_read": 0
+  }
+}
+```
+
+## Outcomes
+
+| Outcome          | When                              |
+| ---------------- | --------------------------------- |
+| `completed`      | Normal response received          |
+| `tool_call`      | Model returned function calls     |
+| `upstream_error` | Google API returned an error      |
+| `timeout`        | No response within timeout window |
+| `mitm_timeout`   | MITM gate match timed out         |
+
+## Agent Usage
+
+Traces are designed for LLM consumption. To inspect the last trace:
+
+```bash
+# Find latest trace
+ls -t ~/.config/antigravity-proxy/traces/$(date +%Y-%m-%d)/ | head -1
+
+# Read the summary
+cat ~/.config/antigravity-proxy/traces/$(date +%Y-%m-%d)/$(ls -t ~/.config/antigravity-proxy/traces/$(date +%Y-%m-%d)/ | head -1)/summary.md
+
+# Grep for failures
+grep -E 'outcome=[a-z_]*(error|timeout)' ~/.config/antigravity-proxy/traces/$(date +%Y-%m-%d)/*/meta.txt
+```
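
The `meta.txt` line shown in `docs/traces.md` above is a sequence of `key=value` pairs, so it can also be consumed programmatically. The sketch below is one possible parser, not part of the proxy. The function names `parse_meta_line`, `is_failure`, and `duration_ms` are hypothetical, and the rule that a token without `=` continues the previous value is an assumption made so that `endpoint=POST gemini` from the example stays intact.

```rust
use std::collections::HashMap;

/// Split a meta.txt line into key/value fields.
/// Assumption: a token without '=' belongs to the previous field's value.
fn parse_meta_line(line: &str) -> HashMap<String, String> {
    let mut fields: HashMap<String, String> = HashMap::new();
    let mut last_key: Option<String> = None;
    for token in line.split_whitespace() {
        match token.split_once('=') {
            Some((key, value)) if !key.is_empty() => {
                fields.insert(key.to_string(), value.to_string());
                last_key = Some(key.to_string());
            }
            _ => {
                // Continuation token: append it to the most recent value.
                if let Some(key) = &last_key {
                    if let Some(value) = fields.get_mut(key) {
                        value.push(' ');
                        value.push_str(token);
                    }
                }
            }
        }
    }
    fields
}

/// True for the error and timeout outcomes listed in the Outcomes table.
fn is_failure(outcome: &str) -> bool {
    matches!(outcome, "upstream_error" | "timeout" | "mitm_timeout")
}

/// Parse the `duration` field (for example "1865ms") into milliseconds.
fn duration_ms(fields: &HashMap<String, String>) -> Option<u64> {
    fields.get("duration")?.strip_suffix("ms")?.parse().ok()
}
```

Keeping the outcome check as an explicit list mirrors the Outcomes table, so a new outcome value has to be classified deliberately rather than matched by a substring.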