feat: MITM interception for standalone LS with UID isolation

- Spawn standalone LS as dedicated 'antigravity-ls' user via sudo
- UID-scoped iptables redirect (port 443 → MITM proxy) via mitm-redirect.sh
- Combined CA bundle (system CAs + MITM CA) for Go TLS trust
- Transparent TLS interception with chunked response detection
- Google SSE parser for streamGenerateContent usage extraction
- Timeouts on all MITM operations (TLS handshake, upstream, idle)
- Forward response data immediately (no buffering)
- Per-model token usage capture (input, output, thinking)
- Update docs and known issues to reflect resolved TLS blocker
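The sketch below shows the Google SSE usage extraction in std-only Rust. It is a hypothetical illustration and is not the code in `src/mitm/`. The names `UsageTotals`, `read_count` and `extract_usage` are made up. The sketch also makes two assumptions: each event arrives as a single `data: {json}` line, and the last event carrying `usageMetadata` holds the final counts. A naive substring scan stands in for a real JSON parser.

```rust
/// Hypothetical container for the three per-model counters.
#[derive(Debug, Default, PartialEq)]
pub struct UsageTotals {
    pub input: u64,    // promptTokenCount
    pub output: u64,   // candidatesTokenCount
    pub thinking: u64, // thoughtsTokenCount
}

/// Return the unsigned integer that follows `"key":` in `json`, if any.
/// Naive substring scan: fine for a sketch, not a general JSON parser.
fn read_count(json: &str, key: &str) -> Option<u64> {
    let needle = format!("\"{}\":", key);
    let start = json.find(&needle)? + needle.len();
    let digits: String = json[start..]
        .trim_start()
        .chars()
        .take_while(|c| c.is_ascii_digit())
        .collect();
    digits.parse().ok()
}

/// Scan a `streamGenerateContent?alt=sse` body and keep the counts from the
/// last `data:` event that contains a `usageMetadata` object.
pub fn extract_usage(sse_body: &str) -> Option<UsageTotals> {
    let mut latest = None;
    for line in sse_body.lines() {
        let Some(payload) = line.strip_prefix("data:") else { continue };
        let Some(idx) = payload.find("\"usageMetadata\"") else { continue };
        let meta = &payload[idx..];
        latest = Some(UsageTotals {
            input: read_count(meta, "promptTokenCount").unwrap_or(0),
            output: read_count(meta, "candidatesTokenCount").unwrap_or(0),
            thinking: read_count(meta, "thoughtsTokenCount").unwrap_or(0),
        });
    }
    latest
}

fn main() {
    let body = "data: {\"candidates\":[],\"usageMetadata\":{\"promptTokenCount\":12}}\n\n\
                data: {\"usageMetadata\":{\"promptTokenCount\":12,\"candidatesTokenCount\":34,\"thoughtsTokenCount\":5}}\n\n";
    println!("{:?}", extract_usage(body));
    // prints: Some(UsageTotals { input: 12, output: 34, thinking: 5 })
}
```

The sketch keeps only the last `usageMetadata` rather than summing every event. That is correct only if the counters are cumulative across the stream, which is an assumption here rather than something this commit states.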
2026-02-14 17:50:12 -06:00
parent 6842bfeaa5
commit d4de436856
10 changed files with 1156 additions and 478 deletions
--- a/KNOWN_ISSUES.md
+++ b/KNOWN_ISSUES.md
@@ -1,92 +1,62 @@
 # Known Issues & Future Work

-All fixable issues from the original report have been resolved. The remaining
-items require either architectural changes, new features, or deep investigation
-of the Go language server binary.
+All critical blockers have been resolved. MITM interception is fully working
+in standalone mode with UID-scoped iptables redirection.

 ---

-## 🔴 Blockers (Require Deep Investigation)
+## ✅ Resolved

-### 1. LS Go LLM Client Ignores System TLS Trust Store
+### ~~LS Go LLM Client Ignores System TLS Trust Store~~

-**File:** `docs/mitm-interception-status.md`
+**Status: SOLVED (2026-02-14)**

-The LS binary's Go HTTP client for LLM API calls uses a custom `tls.Config` that
-does **not** trust system CAs or honor `SSL_CERT_FILE`. Our MITM proxy can route
-traffic but not decrypt it.
+Previously the #1 blocker. The standalone LS (`--standalone` flag) now routes
+all LLM API traffic through the MITM proxy with full decryption.

-**Investigation status:** All practical approaches have been tried and failed:
+**Solution:**

- iptables REDIRECT → redirect loop + broke all HTTPS traffic
- DNS redirect → same TLS trust failure
- LD_PRELOAD → Go doesn't use libc for syscalls
- SSLKEYLOGFILE → Go doesn't support it
+1. **UID-scoped iptables** — `scripts/mitm-redirect.sh` creates an `antigravity-ls`
+   system user. iptables redirects only that UID's port-443 traffic → MITM port.
+2. **Combined CA bundle** — The Go client honors `SSL_CERT_FILE` when set on
+   the standalone process. A combined bundle (system CAs + MITM CA) is written
+   to `/tmp/antigravity-mitm-combined-ca.pem`.
+3. **`sudo -u` spawning** — The proxy spawns the LS as the `antigravity-ls` user,
+   so only the standalone LS traffic is intercepted. No impact on other software.
+4. **Google SSE parsing** — MITM parses `streamGenerateContent?alt=sse` responses
+   and extracts `promptTokenCount`, `candidatesTokenCount`, `thoughtsTokenCount`.

-**Remaining options (untried):**
-
- Binary patching Go TLS verification (fragile, breaks on updates)
- Full standalone LS control (see issue #2)
- eBPF/ptrace syscall interception (complex)
- Network namespace isolation (complex setup)
-
-**Confidence: <30%** — all easy paths exhausted. Requires reverse engineering the Go binary's TLS setup.
-
-**See:** `docs/mitm-interception-status.md` for full analysis
+**Verified:** `/v1/usage` returns per-model token usage from intercepted traffic.

 ---

-### 2. Standalone LS Cascades Silently Fail
+## 🟡 Medium (Architecture / Future Work)

-**File:** `docs/standalone-ls-todo.md`
-
-Standalone LS (outside Antigravity) accepts `StartCascade` RPCs without error
-but cascade never progresses. No output.
-
-**Suspected blockers:**
-
- Missing auth context (OAuth token propagation)
- Different Unleash feature flags between main and standalone instances
- Missing initialization steps (`LoadCodeAssist`, `OnboardUser`)
- Missing extension server callbacks (`WriteCascadeEdit`, `ExecuteCommand`)
-
-**Confidence: <30%** — too many unknowns. Needs systematic debugging with the standalone LS.
-
-**See:** `docs/standalone-ls-todo.md` for investigation plan
-
---
-
-## Medium (Architecture / Future Work)
-
-### 3. Cascade Correlation Is Heuristic
+### 1. Cascade Correlation Is Heuristic

 **File:** `src/mitm/intercept.rs` — `extract_cascade_hint()`

-The MITM proxy matches intercepted API traffic to cascade IDs heuristically:
+Currently all intercepted usage is stored under `_latest`: the LS sends the
+Google SSE request body with chunked encoding (so `content_length=0`), and the
+hint extractor doesn't capture chunked bodies. When it does see a body, it
+matches intercepted API traffic to cascade IDs heuristically:

- HTTP/1.1 path: scans JSON body for `metadata.user_id` or `workspace_id`
- gRPC/H2 path: recursively searches proto fields for UUID strings
-
-If neither method finds a match, usage is stored under `_latest` but never
-consumed (since `take_usage()` requires exact cascade ID match).
-
-**Confidence: <50%** — can't test without working MITM interception (blocked by issue #1). The heuristic is reasonable but unverified against real traffic.
+**Impact:** Usage shows up in `/v1/usage` aggregate stats but isn't correlated
+to specific cascades. Not blocking — aggregate usage is the primary use case.

 ---

-### 4. Request Modification Not Implemented
+### 2. Request Modification Not Implemented

 **File:** `src/mitm/proxy.rs` — `modify_requests: bool`

-The `MitmConfig.modify_requests` flag is plumbed through the entire call chain
-but hardcoded to `false`. No modification logic exists. This is intentional
-scaffolding for future use.
-
-**Status:** Not a bug — reserved for potential request mutation features.
+The `MitmConfig.modify_requests` flag is plumbed through but hardcoded to `false`.
+Reserved for future request mutation features (e.g., injecting custom system
+prompts, modifying model selection).

 ---

-### 5. Polling-Based Cascade Updates vs Streaming RPC
+### 3. Polling-Based Cascade Updates vs Streaming RPC

 **File:** `src/api/polling.rs`

@@ -94,23 +64,26 @@ We poll `GetCascadeTrajectorySteps` on a timer. The LS has a
 `StreamCascadeReactiveUpdates` streaming gRPC method that pushes updates
 in real-time. Polling works but adds latency.

-**Status:** Functional but suboptimal. Switching to streaming requires
-implementing a gRPC streaming client with reconnection handling. Not blocking.
+**Status:** Functional but suboptimal.

 ---

 ## 🟢 Low

-### 6. No Integration Tests for MITM Module
+### 4. MITM Integration Tests

-Unit tests cover protobuf decoding and intercept parsing (17 tests pass), but
-no integration tests for:
+Unit tests cover protobuf decoding and intercept parsing (18 tests pass).
+Now that interception works, the full MITM pipeline (including Google SSE
+response parsing) would benefit from integration tests for:

- TLS interception end-to-end with the generated CA
- Full HTTP/1.1 request/response cycle through the proxy
- gRPC (HTTP/2) request/response cycle through `h2_handler`
- Store recording and retrieval under concurrency
+
+---
+
+### 5. MITM for Main Antigravity Session

-**Status:** The MITM can't intercept real traffic anyway (blocked by issue #1),
-so integration tests would be somewhat hypothetical. Worth adding when the TLS
-blocker is resolved.
+The current MITM only works for the standalone LS (`--standalone` mode).
+Intercepting the main Antigravity session's LS is harder because:
+
+- The main LS is managed by the Antigravity app, not by us
+- It runs under the user's own UID, so a UID-scoped rule would redirect all
+  of that user's port-443 traffic
+- The `mitm-wrapper.sh` approach sets env vars, but the LLM client ignores
+  `HTTPS_PROXY` unless `detect_and_use_proxy` is ENABLED via init metadata
+
+**Workaround:** Use `--standalone` mode for all proxy traffic.
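The sketch below shows the UID-scoped redirect that `--standalone` mode depends on. `redirect_rule_args` is a hypothetical helper, not the contents of `scripts/mitm-redirect.sh`, and the MITM port (8443) is a placeholder. The sketch only assembles the `iptables` arguments, using standard options: the `nat` table, the `OUTPUT` chain, the `owner` match with `--uid-owner`, and `REDIRECT --to-ports`. Actually applying the rule requires root.

```rust
/// Build the argv for an iptables rule that sends outbound TCP/443 traffic
/// owned by `user` to a local MITM port. `append = true` emits `-A` (install);
/// `append = false` emits `-D` (remove the same rule on teardown).
/// This only builds arguments; it never invokes iptables.
fn redirect_rule_args(user: &str, mitm_port: u16, append: bool) -> Vec<String> {
    let action = if append { "-A" } else { "-D" };
    let mut args: Vec<String> = [
        "-t", "nat", action, "OUTPUT",      // nat table, locally generated packets
        "-p", "tcp", "--dport", "443",      // outbound HTTPS only
        "-m", "owner", "--uid-owner", user, // scoped to one user name or UID
        "-j", "REDIRECT", "--to-ports",     // redirect to a port on this host
    ]
    .iter()
    .map(|s| s.to_string())
    .collect();
    args.push(mitm_port.to_string());
    args
}

fn main() {
    // 8443 is a placeholder; the real port is whatever the MITM proxy binds.
    let args = redirect_rule_args("antigravity-ls", 8443, true);
    println!("iptables {}", args.join(" "));
    // prints: iptables -t nat -A OUTPUT -p tcp --dport 443 -m owner --uid-owner antigravity-ls -j REDIRECT --to-ports 8443
}
```

The proxy runs under the invoking user's UID, not `antigravity-ls`. Its own upstream connections to port 443 therefore don't match the rule. This is what avoids the redirect loop that the earlier unscoped `REDIRECT` attempt hit.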