feat: MITM interception for standalone LS with UID isolation

- Spawn standalone LS as dedicated 'antigravity-ls' user via sudo
- UID-scoped iptables redirect (port 443 → MITM proxy) via mitm-redirect.sh
- Combined CA bundle (system CAs + MITM CA) for Go TLS trust
- Transparent TLS interception with chunked response detection
- Google SSE parser for streamGenerateContent usage extraction
- Timeouts on all MITM operations (TLS handshake, upstream, idle)
- Forward response data immediately (no buffering)
- Per-model token usage capture (input, output, thinking)
- Update docs and known issues to reflect resolved TLS blocker
Nikketryhard
2026-02-14 17:50:12 -06:00
parent 6842bfeaa5
commit d4de436856
10 changed files with 1156 additions and 478 deletions


@@ -1,92 +1,62 @@
# Known Issues & Future Work
All fixable issues from the original report have been resolved. The remaining
items require either architectural changes, new features, or deep investigation
of the Go language server binary.
All critical blockers have been resolved. MITM interception is fully working
in standalone mode with UID-scoped iptables redirection.
---
## 🔴 Blockers (Require Deep Investigation)
## ✅ Resolved
### 1. LS Go LLM Client Ignores System TLS Trust Store
### ~~LS Go LLM Client Ignores System TLS Trust Store~~
**File:** `docs/mitm-interception-status.md`
**Status: SOLVED (2026-02-14)**
The LS binary's Go HTTP client for LLM API calls uses a custom `tls.Config` that
does **not** trust system CAs or honor `SSL_CERT_FILE`. Our MITM proxy can route
traffic but not decrypt it.
Previously the #1 blocker. The standalone LS (`--standalone` flag) now routes
all LLM API traffic through the MITM proxy with full decryption.
**Investigation status:** All practical approaches have been tried and failed:
**Solution:**
- iptables REDIRECT → redirect loop + broke all HTTPS traffic
- DNS redirect → same TLS trust failure
- LD_PRELOAD → Go doesn't use libc for syscalls
- SSLKEYLOGFILE → Go doesn't support it
1. **UID-scoped iptables**: `scripts/mitm-redirect.sh` creates an `antigravity-ls`
system user. iptables redirects only that UID's port-443 traffic → MITM port.
2. **Combined CA bundle** — The Go client honors `SSL_CERT_FILE` when set on
the standalone process. A combined bundle (system CAs + MITM CA) is written
to `/tmp/antigravity-mitm-combined-ca.pem`.
3. **`sudo -u` spawning** — The proxy spawns the LS as the `antigravity-ls` user,
so only the standalone LS traffic is intercepted. No impact on other software.
4. **Google SSE parsing** — MITM parses `streamGenerateContent?alt=sse` responses
and extracts `promptTokenCount`, `candidatesTokenCount`, `thoughtsTokenCount`.
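
A minimal sketch of steps 2 and 3, not the project's actual code: `combine_ca_bundles` and `ls_spawn_args` are hypothetical helpers, and the LS binary path is left to the caller. Because `sudo` resets the environment by default, the sketch assumes `SSL_CERT_FILE` is set through `env` inside the `sudo` invocation. How the real proxy passes the variable is not stated here.

```rust
use std::process::Command;

/// Concatenate the system CA bundle and the MITM CA into one PEM text.
/// Each part is terminated with a newline so the PEM blocks never fuse.
fn combine_ca_bundles(system_pem: &str, mitm_pem: &str) -> String {
    let mut out = String::with_capacity(system_pem.len() + mitm_pem.len() + 2);
    for pem in [system_pem, mitm_pem] {
        out.push_str(pem);
        if !pem.ends_with('\n') {
            out.push('\n');
        }
    }
    out
}

/// Arguments for `sudo` that run the LS as the dedicated user.
/// ASSUMPTION: `env VAR=value` is used so the variable survives sudo's
/// default environment reset.
fn ls_spawn_args(user: &str, ca_bundle: &str, ls_binary: &str, ls_args: &[&str]) -> Vec<String> {
    let mut args = vec![
        "-u".to_string(),
        user.to_string(),
        "env".to_string(),
        format!("SSL_CERT_FILE={}", ca_bundle),
        ls_binary.to_string(),
    ];
    args.extend(ls_args.iter().map(|a| a.to_string()));
    args
}

/// Build (but do not run) the command that spawns the LS.
fn ls_command(user: &str, ca_bundle: &str, ls_binary: &str, ls_args: &[&str]) -> Command {
    let mut cmd = Command::new("sudo");
    cmd.args(ls_spawn_args(user, ca_bundle, ls_binary, ls_args));
    cmd
}
```

Since the process runs under its own UID, an owner-matched iptables rule can target its port-443 traffic without touching other software on the machine.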
**Remaining options (untried):**
- Binary patching Go TLS verification (fragile, breaks on updates)
- Full standalone LS control (see issue #2)
- eBPF/ptrace syscall interception (complex)
- Network namespace isolation (complex setup)
**Confidence: <30%** — all easy paths exhausted. Requires reverse engineering the Go binary's TLS setup.
**See:** `docs/mitm-interception-status.md` for full analysis
**Verified:** `/v1/usage` returns per-model token usage from intercepted traffic.
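
The usage extraction from step 4 of the solution could look roughly like the sketch below. It is illustrative only: `Usage`, `find_u64`, and `parse_sse_usage` are hypothetical names, and the naive key scan stands in for a real JSON parser to keep the example dependency-free. It also assumes a later SSE event with counts supersedes an earlier one.

```rust
/// Token counts pulled from one streamed response (hypothetical type).
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
struct Usage {
    input: u64,    // promptTokenCount
    output: u64,   // candidatesTokenCount
    thinking: u64, // thoughtsTokenCount
}

/// Find `"key": <digits>` in a JSON text and parse the digits.
/// This is a naive scan, not a JSON parser; production code should
/// deserialize the payload properly.
fn find_u64(json: &str, key: &str) -> Option<u64> {
    let needle = format!("\"{}\"", key);
    let start = json.find(&needle)? + needle.len();
    let rest = json[start..].trim_start().strip_prefix(':')?.trim_start();
    let end = rest
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(rest.len());
    rest[..end].parse().ok()
}

/// Walk the `data:` lines of an SSE body and keep the last event that
/// carries a prompt token count. Missing fields default to zero.
fn parse_sse_usage(body: &str) -> Option<Usage> {
    let mut latest = None;
    for line in body.lines() {
        if let Some(payload) = line.strip_prefix("data:") {
            if let Some(input) = find_u64(payload, "promptTokenCount") {
                latest = Some(Usage {
                    input,
                    output: find_u64(payload, "candidatesTokenCount").unwrap_or(0),
                    thinking: find_u64(payload, "thoughtsTokenCount").unwrap_or(0),
                });
            }
        }
    }
    latest
}
```

Because each `data:` line is handled independently, the same routine can run incrementally on chunks as they are forwarded, without buffering the whole response.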
---
### 2. Standalone LS Cascades Silently Fail
## 🟡 Medium (Architecture / Future Work)
**File:** `docs/standalone-ls-todo.md`
Standalone LS (outside Antigravity) accepts `StartCascade` RPCs without error
but cascade never progresses. No output.
**Suspected blockers:**
- Missing auth context (OAuth token propagation)
- Different Unleash feature flags between main and standalone instances
- Missing initialization steps (`LoadCodeAssist`, `OnboardUser`)
- Missing extension server callbacks (`WriteCascadeEdit`, `ExecuteCommand`)
**Confidence: <30%** — too many unknowns. Needs systematic debugging with the standalone LS.
**See:** `docs/standalone-ls-todo.md` for investigation plan
---
## Medium (Architecture / Future Work)
### 3. Cascade Correlation Is Heuristic
### 1. Cascade Correlation Is Heuristic
**File:** `src/mitm/intercept.rs`, `extract_cascade_hint()`
The MITM proxy matches intercepted API traffic to cascade IDs heuristically:
The MITM proxy matches intercepted API traffic to cascade IDs heuristically.
Currently all intercepted usage is stored under `_latest` because the Google
SSE request body is empty (`content_length=0` — the LS sends the request body
via chunked encoding that isn't captured in the hint extractor).
- HTTP/1.1 path: scans JSON body for `metadata.user_id` or `workspace_id`
- gRPC/H2 path: recursively searches proto fields for UUID strings
If neither method finds a match, usage is stored under `_latest` but never
consumed (since `take_usage()` requires exact cascade ID match).
**Confidence: <50%** — can't test without working MITM interception (blocked by issue #1). The heuristic is reasonable but unverified against real traffic.
**Impact:** Usage shows up in `/v1/usage` aggregate stats but isn't correlated
to specific cascades. Not blocking — aggregate usage is the primary use case.
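
The `_latest` fallback described in this section can be pictured with a small in-memory store. This is a hedged sketch, not the contents of `src/mitm/`: `UsageStore`, its fields, and `aggregate_total` are hypothetical, and only the `_latest` key and the exact-match behavior of `take_usage()` come from the notes above.

```rust
use std::collections::HashMap;

/// Key used when no cascade ID could be extracted from the request.
const LATEST_KEY: &str = "_latest";

/// Hypothetical store: total token counts recorded per cascade ID.
#[derive(Default)]
struct UsageStore {
    by_cascade: HashMap<String, Vec<u64>>,
}

impl UsageStore {
    /// Record usage under the hinted cascade ID, or `_latest` if none.
    fn record(&mut self, cascade_hint: Option<&str>, total_tokens: u64) {
        let key = cascade_hint.unwrap_or(LATEST_KEY).to_string();
        self.by_cascade.entry(key).or_default().push(total_tokens);
    }

    /// Remove and return usage for an exact cascade ID match only.
    /// Entries filed under `_latest` are never returned for a real ID.
    fn take_usage(&mut self, cascade_id: &str) -> Option<Vec<u64>> {
        self.by_cascade.remove(cascade_id)
    }

    /// Sum over every entry, including the uncorrelated `_latest` bucket.
    fn aggregate_total(&self) -> u64 {
        self.by_cascade.values().flatten().sum()
    }
}
```

With an empty request body there is no hint, so every record lands in `_latest`: the aggregate total still grows, while per-cascade lookups come back empty.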
---
### 4. Request Modification Not Implemented
### 2. Request Modification Not Implemented
**File:** `src/mitm/proxy.rs`, `modify_requests: bool`
The `MitmConfig.modify_requests` flag is plumbed through the entire call chain
but hardcoded to `false`. No modification logic exists. This is intentional
scaffolding for future use.
**Status:** Not a bug — reserved for potential request mutation features.
The `MitmConfig.modify_requests` flag is plumbed through but hardcoded to `false`.
Reserved for future request mutation features (e.g., injecting custom system
prompts, modifying model selection).
---
### 5. Polling-Based Cascade Updates vs Streaming RPC
### 3. Polling-Based Cascade Updates vs Streaming RPC
**File:** `src/api/polling.rs`
@@ -94,23 +64,26 @@
We poll `GetCascadeTrajectorySteps` on a timer. The LS has a
`StreamCascadeReactiveUpdates` streaming gRPC method that pushes updates
in real-time. Polling works but adds latency.
**Status:** Functional but suboptimal. Switching to streaming requires
implementing a gRPC streaming client with reconnection handling. Not blocking.
**Status:** Functional but suboptimal.
---
## 🟢 Low
### 6. No Integration Tests for MITM Module
### 4. MITM Integration Tests
Unit tests cover protobuf decoding and intercept parsing (17 tests pass), but
no integration tests for:
Unit tests cover protobuf decoding and intercept parsing (18 tests pass).
Integration tests for the full MITM pipeline (TLS interception, response
parsing, usage recording) would be valuable now that interception works.
- TLS interception end-to-end with the generated CA
- Full HTTP/1.1 request/response cycle through the proxy
- gRPC (HTTP/2) request/response cycle through `h2_handler`
- Store recording and retrieval under concurrency
### 5. MITM for Main Antigravity Session
**Status:** The MITM can't intercept real traffic anyway (blocked by issue #1),
so integration tests would be somewhat hypothetical. Worth adding when the TLS
blocker is resolved.
The current MITM only works for the standalone LS (`--standalone` mode).
Intercepting the main Antigravity session's LS is harder because:
- The main LS is managed by the Antigravity app, not by us
- UID-scoped iptables can't target it without affecting all user traffic
- The `mitm-wrapper.sh` approach sets env vars but the LLM client ignores
`HTTPS_PROXY` unless `detect_and_use_proxy` is ENABLED via init metadata
**Workaround:** Use `--standalone` mode for all proxy traffic.