docs: overhaul docs, add architecture and traces, update README/GEMINI

- Add docs/architecture.md with 4 mermaid diagrams
- Add docs/mitm.md with 3 mermaid diagrams (replaces mitm-interception-status)
- Add docs/traces.md documenting per-call trace system
- Rewrite README.md to be concise with mermaid and doc refs
- Rewrite GEMINI.md for core philosophy and agent usage
- Clean extension-server-analysis.md (remove stale debug sections)
- Delete temp docs: standalone-ls-todo, panel-stream-investigation, endpoint-gap-analysis, request-comparison
2026-02-18 01:31:18 -06:00
parent 28d3296c87
commit 3d87c04d20
11 changed files with 679 additions and 1305 deletions
--- a/GEMINI.md
+++ b/GEMINI.md
@@ -2,288 +2,125 @@

 OpenAI-compatible proxy that intercepts and relays requests to Google's Antigravity language server, impersonating the real Electron webview.

-## Quick Start
+## Core Philosophy

-```bash
-# Headless mode (no running Antigravity app needed)
-RUST_LOG=info ./target/release/antigravity-proxy --headless
+### Stealth Goal

-# Classic mode (requires running Antigravity + sudo setup for MITM)
-sudo ./scripts/mitm-redirect.sh install
-proxyctl start
+The primary objective is to make Google's upstream API unable to distinguish proxy requests from real Antigravity webview traffic. Unlike `cliProxyApi` or other known proxy patterns, this proxy:

-# Or run directly
-RUST_LOG=info ./target/release/antigravity-proxy
-```
+- Produces **byte-exact protobuf** matching real webview format
+- Uses **BoringSSL TLS fingerprinting** with Chrome JA3/JA4 + H2 signatures (version auto-detected)
+- Performs **warmup and heartbeat RPCs** mimicking real webview lifecycle
+- Applies **jitter** to all intervals to avoid automation fingerprints
+- **Reuses cascades** for multi-turn just like the real webview

-Default port: **8741**
+### Stability Approach

-## CLI Tools
+The Language Server (LS) binary is a closed-source Go program with many unknown mechanics. To avoid instability:
+
+1. **Send dummy prompts to the LS** — the proxy sends `"."` as the cascade message. The LS receives minimal input to reduce the chance of panics or unexpected behavior.
+2. **All real content goes through MITM** — the MITM proxy intercepts the LS's outgoing request and replaces the dummy prompt with the real user input, injects tools, images, generation params, etc.
+3. **Never send results back to the LS** — tool results, function responses, and follow-ups are injected into the _next_ MITM-intercepted request. The LS is used as a dumb relay that triggers API calls — nothing more.
+4. **Pass as little as possible** — the LS only needs a cascade ID and a dummy message. Everything else is handled by the MITM layer.
+
+This "LS as dumb relay" pattern keeps the LS interactions minimal and predictable, avoiding the many unknown edge cases in its internal state machine.
+
+## Agent Quick Reference

 ### `proxyctl` — Daemon Manager

-Symlinked to `~/.local/bin/proxyctl` for global access. Manages the proxy as a systemd user service.
-
-| Command               | Description                             |
-| --------------------- | --------------------------------------- |
-| `proxyctl start`      | Start the proxy daemon                  |
-| `proxyctl stop`       | Stop the proxy daemon                   |
-| `proxyctl restart`    | Rebuild + restart                       |
-| `proxyctl rebuild`    | Build release binary only               |
-| `proxyctl status`     | Service status + quota + usage          |
-| `proxyctl logs [N]`   | Tail last N lines (default 30) + follow |
-| `proxyctl logs-all`   | Full log dump (no follow)               |
-| `proxyctl test [msg]` | Quick test request (gemini-3-flash)     |
-| `proxyctl health`     | Health check                            |
-
-### `mitm-redirect.sh` — MITM Setup
-
-One-time setup script for UID-scoped iptables traffic redirection.
+`proxyctl` commands exit immediately (not foreground) — safe for agent use via fast-bash MCP.

 ```bash
-sudo ./scripts/mitm-redirect.sh install    # create user + iptables rule
-sudo ./scripts/mitm-redirect.sh uninstall  # remove user + iptables rule
-sudo ./scripts/mitm-redirect.sh status     # check current state
+# Rebuild and restart after code changes
+proxyctl restart
+
+# Quick test
+proxyctl test "say hi in 3 words"
+
+# Check status
+proxyctl status
+
+# Check health
+proxyctl health
 ```

+| Command               | Description                         |
+| --------------------- | ----------------------------------- |
+| `proxyctl start`      | Start the proxy daemon              |
+| `proxyctl stop`       | Stop the proxy daemon               |
+| `proxyctl restart`    | Rebuild + restart                   |
+| `proxyctl rebuild`    | Build release binary only           |
+| `proxyctl status`     | Service status + quota + usage      |
+| `proxyctl logs [N]`   | Tail last N lines + follow          |
+| `proxyctl logs-all`   | Full log dump (no follow)           |
+| `proxyctl test [msg]` | Quick test request (gemini-3-flash) |
+| `proxyctl health`     | Health check                        |
+
+### Testing After Changes
+
+```bash
+# 1. Rebuild + restart
+proxyctl restart
+
+# 2. Test an endpoint
+curl -s http://localhost:8741/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{"model": "gemini-3-flash", "messages": [{"role": "user", "content": "Say hi"}]}' | jq .
+
+# 3. Inspect latest trace
+TRACE_DIR=~/.config/antigravity-proxy/traces/$(date +%Y-%m-%d)
+cat "$TRACE_DIR/$(ls -t "$TRACE_DIR" | head -1)/summary.md"
+```
+
+### Dev vs Production Models
+
+- **`gemini-3-flash`** — use for all development and testing
+- **`opus-4.6`** — production only, has quota limits
+
 ## Endpoints

-| Method     | Path                   | Description                                                 |
-| ---------- | ---------------------- | ----------------------------------------------------------- |
-| `POST`     | `/v1/responses`        | **Responses API** (primary) — supports `stream: true/false` |
-| `POST`     | `/v1/chat/completions` | Chat Completions API (OpenAI compat shim)                   |
-| `GET/POST` | `/v1/search`           | **Web Search** — Google Search grounding, returns results   |
-| `GET`      | `/v1/models`           | List available models                                       |
-| `GET`      | `/v1/sessions`         | List active sessions                                        |
-| `DELETE`   | `/v1/sessions/:id`     | Delete a session                                            |
-| `POST`     | `/v1/token`            | Set OAuth token at runtime                                  |
-| `GET`      | `/v1/usage`            | MITM-intercepted token usage stats                          |
-| `GET`      | `/v1/quota`            | LS quota — credits, per-model rate limits, reset timers     |
-| `GET`      | `/health`              | Health check                                                |
-
-## Available Models
-
-| Name                | Label                                    |
-| ------------------- | ---------------------------------------- |
-| `opus-4.6`          | Claude Opus 4.6 (Thinking) — **default** |
-| `opus-4.5`          | Claude Opus 4.5 (Thinking)               |
-| `gemini-3-pro-high` | Gemini 3 Pro (High)                      |
-| `gemini-3-pro`      | Gemini 3 Pro (Low)                       |
-| `gemini-3-flash`    | Gemini 3 Flash                           |
-
-## Development & Testing
-
-- **Dev/testing model**: `gemini-3-flash` — use this for all development, debugging, and iterative testing
-- **Production model**: `opus-4.6` — use sparingly for real-world validation only (has quota limit)
-- See `docs/ls-binary-analysis.md` for full reverse-engineered model catalog and proto enum mappings
-
-## Example: Responses API
-
-### Sync
-
-```bash
-curl -s http://localhost:8741/v1/responses \
-  -H "Content-Type: application/json" \
-  -d '{
-    "model": "gemini-3-flash",
-    "input": "Say hello in exactly 3 words",
-    "stream": false,
-    "timeout": 60
-  }' | jq .
-```
-
-### Streaming
-
-```bash
-curl -N http://localhost:8741/v1/responses \
-  -H "Content-Type: application/json" \
-  -d '{
-    "model": "gemini-3-flash",
-    "input": "Say hello in exactly 3 words",
-    "stream": true,
-    "timeout": 60
-  }'
-```
-
-### Multi-turn (session reuse)
-
-```bash
-curl -s http://localhost:8741/v1/responses \
-  -H "Content-Type: application/json" \
-  -d '{
-    "model": "gemini-3-flash",
-    "input": "What is 2+2?",
-    "conversation": "my-session-1",
-    "stream": false
-  }' | jq .
-
-# Follow-up in same cascade:
-curl -s http://localhost:8741/v1/responses \\
-  -H "Content-Type: application/json" \\
-  -d '{
-    "model": "gemini-3-flash",
-    "input": "Now multiply that by 10",
-    "conversation": "my-session-1",
-    "stream": false
-  }' | jq .
-```
-
-## Web Search
-
-The proxy supports Google Search grounding in two ways:
-
-### 1. Dedicated Search Endpoint (`/v1/search`)
-
-Returns structured search results with citations:
-
-```bash
-# Quick GET search
-curl -s 'http://localhost:8741/v1/search?q=latest+rust+news' | jq .
-
-# Full POST search with options
-curl -s http://localhost:8741/v1/search \\
-  -H "Content-Type: application/json" \\
-  -d '{
-    "query": "latest Rust programming news",
-    "model": "gemini-3-flash",
-    "timeout": 30
-  }' | jq .
-```
-
-Response includes `summary`, `results[]` (title + URL), `citations[]`, and raw `grounding_metadata`.
-
-### 2. Inline Grounding (on any endpoint)
-
-Enable Google Search grounding on regular requests:
-
-```bash
-# Completions API
-curl -s http://localhost:8741/v1/chat/completions \\
-  -H "Content-Type: application/json" \\
-  -d '{
-    "model": "gemini-3-flash",
-    "messages": [{"role": "user", "content": "What happened in tech today?"}],
-    "web_search": true
-  }' | jq .
-
-# Responses API (OpenAI-style tool)
-curl -s http://localhost:8741/v1/responses \\
-  -H "Content-Type: application/json" \\
-  -d '{
-    "model": "gemini-3-flash",
-    "input": "What happened in tech today?",
-    "tools": [{"type": "web_search_preview"}],
-    "stream": false
-  }' | jq .
-
-# Gemini API
-curl -s http://localhost:8741/v1/gemini \\
-  -H "Content-Type: application/json" \\
-  -d '{
-    "model": "gemini-3-flash",
-    "message": "What happened in tech today?",
-    "google_search": true
-  }' | jq .
-```
+| Method     | Path                              | Description                          |
+| ---------- | --------------------------------- | ------------------------------------ |
+| `POST`     | `/v1/responses`                   | Responses API (sync + streaming)     |
+| `POST`     | `/v1/chat/completions`            | Chat Completions API (OpenAI compat) |
+| `POST`     | `/v1/gemini`                      | Native Gemini API                    |
+| `POST`     | `/v1beta/models/{model}:{action}` | Official Gemini v1beta routes        |
+| `GET/POST` | `/v1/search`                      | Web Search via Google grounding      |
+| `GET`      | `/v1/models`                      | List available models                |
+| `GET`      | `/v1/sessions`                    | List active sessions                 |
+| `DELETE`   | `/v1/sessions/{id}`               | Delete a session                     |
+| `POST`     | `/v1/token`                       | Set OAuth token at runtime           |
+| `GET`      | `/v1/usage`                       | MITM-intercepted token usage         |
+| `GET`      | `/v1/quota`                       | LS quota and rate limits             |
+| `GET`      | `/health`                         | Health check                         |

 ## Authentication

-The proxy needs an OAuth token. Three ways to provide it:
+The proxy needs an OAuth token:

-1. **Environment variable**: `export ANTIGRAVITY_OAUTH_TOKEN=ya29.xxx`
-2. **Token file**: `echo 'ya29.xxx' > ~/.config/antigravity-proxy-token`
-3. **Runtime API**: `curl -X POST http://localhost:8741/v1/token -d '{"token":"ya29.xxx"}'`
+1. **Env var**: `ANTIGRAVITY_OAUTH_TOKEN=ya29.xxx`
+2. **Token file**: `~/.config/antigravity-proxy-token`
+3. **Runtime**: `curl -X POST http://localhost:8741/v1/token -d '{"token":"ya29.xxx"}'`

-## Version Detection
+## CLI Flags

-Version strings (Antigravity, Chrome, Electron, Client) are **auto-detected** at startup from the installed Antigravity app:
+| Flag                 | Default | Description                                               |
+| -------------------- | ------- | --------------------------------------------------------- |
+| `--headless`         | `true`  | Fully standalone — no running Antigravity app needed      |
+| `--classic`          | `false` | Attach to running Antigravity (alias for `--no-headless`) |
+| `--port <PORT>`      | `8741`  | Proxy listen port                                         |
+| `--no-mitm`          | `false` | Disable MITM proxy                                        |
+| `--mitm-port <PORT>` | `8742`  | MITM proxy port                                           |
+| `--no-standalone`    | `false` | Attach to real LS instead of spawning standalone          |
+| `--no-trace`         | `false` | Disable per-call debug traces                             |

-- `product.json` → app version + client/IDE version
-- Binary → Chrome + Electron versions via `strings`
+## Documentation

-Falls back to hardcoded values if the app isn't installed. No manual updates needed when Antigravity updates.
+See `docs/` for detailed documentation:

-## Standalone LS
-
-By default, the proxy spawns its own Language Server instance for full isolation.
-
-### Headless Mode (`--headless`)
-
-Fully independent — no running Antigravity app, no sudo, no iptables:
-
-1. Generates its own CSRF token (random UUID)
-2. Passes `-standalone=true` and `-extension_server_port=0` to the LS binary
-3. Uses `HTTPS_PROXY` for MITM (no iptables required)
-4. Only needs the LS binary installed at the standard path
-
-### Classic Mode (default)
-
-1. Discovers the main LS config (`extension_server_port`, `csrf_token`) from the running Antigravity app
-2. Spawns a standalone LS binary on a random port
-3. Builds init metadata protobuf (model config, `detect_and_use_proxy=ENABLED`)
-4. If MITM is active, spawns as `antigravity-ls` user for UID-scoped traffic interception
-5. Kills the child on proxy shutdown
-
-Disable with `--no-standalone` to attach to the real LS instead.
-
-**Module:** `src/standalone.rs`
-
-## Stealth Features
-
-- **TLS fingerprint**: BoringSSL with Chrome JA3/JA4 + H2 fingerprint via `wreq` (version auto-detected)
-- **Protobuf**: Hand-rolled encoder producing byte-exact match to real webview traffic
-- **Warmup**: Mimics real webview startup RPC calls
-- **Heartbeat**: Periodic keep-alive matching real webview lifecycle
-- **Reactive streaming**: `StreamCascadeReactiveUpdates` for real-time state diffs (polling fallback)
-- **Jitter**: Randomized intervals to avoid automation fingerprint
-- **Session reuse**: Cascades reused for multi-turn, matching real webview behavior
-- **MITM proxy**: TLS-intercepting proxy for real token usage capture
-
-## MITM Proxy
-
-Built-in MITM proxy intercepts LS ↔ Google API traffic to capture **real** token usage (input, output, thinking tokens). Enabled by default with the standalone LS. Disable with `--no-mitm`.
-
-### How It Works
-
-```
-Client → Proxy (8741) → Standalone LS (as antigravity-ls user)
-                           ↓ (port 443 traffic)
-                        iptables REDIRECT (UID-scoped)
-                           ↓
-                        MITM Proxy (8742)
-                           ↓ (TLS decrypt + parse SSE)
-                        Google API (daily-cloudcode-pa.googleapis.com)
-```
-
-### Setup
-
-```bash
-# One-time setup (creates user + iptables rule)
-sudo ./scripts/mitm-redirect.sh install
-
-# Run proxy (standalone LS + MITM are both on by default)
-RUST_LOG=info ./target/release/antigravity-proxy
-
-# Check intercepted usage
-curl -s http://localhost:8741/v1/usage | jq .
-
-# Cleanup
-sudo ./scripts/mitm-redirect.sh uninstall
-```
-
-### Details
-
-- **UID-scoped iptables**: Only the standalone LS's traffic is intercepted (no side effects)
-- **Combined CA bundle**: System CAs + MITM CA → `/tmp/antigravity-mitm-combined-ca.pem`
-- **Google SSE parsing**: Extracts `promptTokenCount`, `candidatesTokenCount`, `thoughtsTokenCount`
-- **Init metadata**: Protobuf field 34 `detect_and_use_proxy` set to ENABLED (1)
-- See `docs/mitm-interception-status.md` for full technical details
-- See `docs/ls-binary-analysis.md` for proto enum mappings and model IDs
-
-### CLI Flags
-
-- `--headless`: Fully standalone — no running Antigravity app required
-- `--no-mitm`: Disable MITM proxy entirely
-- `--no-standalone`: Attach to existing LS instead of spawning standalone
-- `--mitm-port <PORT>`: Override MITM proxy port (default: auto-assign)
-- `--port <PORT>`: Override proxy listen port (default: 8741)
+- `architecture.md` — system overview, module map, request lifecycle (mermaid diagrams)
+- `mitm.md` — MITM proxy internals, event flow, request modification
+- `traces.md` — per-call debug trace system
+- `extension-server-analysis.md` — extension server protocol reverse engineering
+- `ls-binary-analysis.md` — LS binary reverse engineering, model catalog, gRPC services
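
The "Inspect latest trace" step added to GEMINI.md builds the path from today's date and parses `ls -t`, so it would miss a trace written just before midnight. A minimal sketch of a more tolerant lookup follows. It assumes the layout implied by that step (`<root>/<YYYY-MM-DD>/<trace>/summary.md`) and that the newest `summary.md` by modification time belongs to the latest call. The function name `latest_trace_summary` is hypothetical and not part of the repository.

```bash
# Sketch: print the path of the newest trace summary under a trace root.
# Assumed layout (from the GEMINI.md commands): <root>/<YYYY-MM-DD>/<trace>/summary.md
# `latest_trace_summary` is an illustrative helper, not an existing script.
latest_trace_summary() {
  local root="${1:-$HOME/.config/antigravity-proxy/traces}"
  local newest="" candidate

  # Scan every day directory, not only today's.
  for candidate in "$root"/*/*/summary.md; do
    [ -f "$candidate" ] || continue   # glob matched nothing
    # -nt is true when the left file has a newer modification time.
    if [ -z "$newest" ] || [ "$candidate" -nt "$newest" ]; then
      newest="$candidate"
    fi
  done

  [ -n "$newest" ] || return 1        # no traces found
  printf '%s\n' "$newest"
}
```

Possible usage after `proxyctl test`: `cat "$(latest_trace_summary)"`. The function returns a non-zero status when no trace exists, so it can be chained with `&&` in scripts.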