feat: MITM interception for standalone LS with UID isolation

- Spawn standalone LS as dedicated 'antigravity-ls' user via sudo
- UID-scoped iptables redirect (port 443 → MITM proxy) via mitm-redirect.sh
- Combined CA bundle (system CAs + MITM CA) for Go TLS trust
- Transparent TLS interception with chunked response detection
- Google SSE parser for streamGenerateContent usage extraction
- Timeouts on all MITM operations (TLS handshake, upstream, idle)
- Forward response data immediately (no buffering)
- Per-model token usage capture (input, output, thinking)
- Update docs and known issues to reflect resolved TLS blocker
This commit is contained in:
Nikketryhard
2026-02-14 17:50:12 -06:00
parent 6842bfeaa5
commit d4de436856
10 changed files with 1156 additions and 478 deletions

@@ -1,275 +1,144 @@
# MITM Traffic Interception — Research & Status
# MITM Traffic Interception — Status
## Goal
## Status: ✅ FULLY WORKING (Standalone Mode)
Capture the LS's LLM API traffic (requests + responses, including system prompts
and token usage) by routing it through our MITM proxy.
MITM interception is operational for the standalone LS. The proxy intercepts,
decrypts, and parses all LLM API traffic with per-model token usage capture.
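The per-model capture described above can be pictured as a small accumulator keyed by model name. This is a minimal sketch, not the proxy's actual code: `TokenUsage` and `UsageStore` are hypothetical names, and it assumes the caller records each response once (for example from its final usage event), since this document does not say whether intermediate events carry cumulative counts.

```rust
use std::collections::HashMap;

/// Token counts for one model, mirroring the three counters named in this doc
/// (input, output, thinking). Hypothetical type, for illustration only.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct TokenUsage {
    pub input: u64,
    pub output: u64,
    pub thinking: u64,
}

/// Hypothetical in-memory store keyed by the model name string.
#[derive(Debug, Default)]
pub struct UsageStore {
    per_model: HashMap<String, TokenUsage>,
}

impl UsageStore {
    /// Add one response's counts to the running total for `model`.
    pub fn record(&mut self, model: &str, usage: TokenUsage) {
        let entry = self.per_model.entry(model.to_string()).or_default();
        entry.input += usage.input;
        entry.output += usage.output;
        entry.thinking += usage.thinking;
    }

    /// Return the accumulated counts for `model`, if any were recorded.
    pub fn get(&self, model: &str) -> Option<TokenUsage> {
        self.per_model.get(model).copied()
    }
}
```

Keeping the totals per model rather than globally is what lets an endpoint such as `/v1/usage` report counts for each model separately.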
## Key Discovery: How the LS Makes LLM API Calls
## How It Works
The LS does **NOT** use gRPC for LLM API calls. It uses:
- **Protocol**: Standard HTTPS POST with Server-Sent Events (SSE)
- **Endpoint**: `https://daily-cloudcode-pa.googleapis.com/v1internal:streamGenerateContent?alt=sse`
- **HTTP client**: `ApiServerClientV2` — a Go HTTP client that creates its own `tls.Config`
and transport, **ignoring `HTTPS_PROXY` by default**
The Go HTTP client for LLM API calls is separate from the one used for Unleash
(feature flags) and other auxiliary traffic. The Unleash client respects proxy
settings, but the LLM client does not.
## What We Tried
### 1. Extension Patch — `detectAndUseProxy` ✅ Partial
**Status**: Applied and still active. Harmless.
The extension sends a protobuf field `detect_and_use_proxy` (field 34) to the LS
during initialization. By default, it's set to `UNSPECIFIED` (0), meaning the LS
ignores proxy env vars.
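To make the wire format concrete, here is a hedged sketch of how field 34 could be encoded as a protobuf varint field. The enum values come from this document; the function names are hypothetical and this is not the proxy's `proto::build_init_metadata()` implementation.

```rust
/// Values of the `DetectAndUseProxy` enum as listed in this document.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum DetectAndUseProxy {
    Unspecified = 0,
    Enabled = 1,
    Disabled = 2,
}

/// Encode an unsigned integer as a protobuf base-128 varint.
fn encode_varint(mut value: u64, out: &mut Vec<u8>) {
    while value >= 0x80 {
        // Low 7 bits plus the continuation bit.
        out.push(((value & 0x7f) as u8) | 0x80);
        value >>= 7;
    }
    out.push(value as u8);
}

/// Encode field 34 (wire type 0 = varint) carrying the given enum value.
pub fn encode_detect_and_use_proxy(value: DetectAndUseProxy) -> Vec<u8> {
    const FIELD_NUMBER: u64 = 34;
    const WIRE_TYPE_VARINT: u64 = 0;
    let mut out = Vec::new();
    // The tag is (field_number << 3) | wire_type = 272, which needs two bytes.
    encode_varint((FIELD_NUMBER << 3) | WIRE_TYPE_VARINT, &mut out);
    encode_varint(value as u64, &mut out);
    out
}
```

With `Enabled`, the field occupies three bytes: a two-byte tag (`0x90 0x02`) followed by the value `0x01`.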
**Patch applied:**
```bash
sudo sed -i -E 's/detectAndUseProxy=[^,;)]+/detectAndUseProxy=1/g' \
/usr/share/antigravity/resources/app/extensions/antigravity/dist/extension.js
```
Client → Proxy (8741) → Standalone LS (as antigravity-ls user)
↓ (port 443 traffic)
iptables REDIRECT (UID-scoped)
MITM Proxy (8742)
↓ (TLS decrypt + parse SSE)
Google API (daily-cloudcode-pa.googleapis.com)
```
**Enum values:**
### Components
- 0 = `DETECT_AND_USE_PROXY_UNSPECIFIED` (default, ignore proxy)
- 1 = `DETECT_AND_USE_PROXY_ENABLED`
- 2 = `DETECT_AND_USE_PROXY_DISABLED`
1. **UID-scoped iptables** (`scripts/mitm-redirect.sh`)
- Creates `antigravity-ls` system user
- iptables rule: redirect UID's port-443 → MITM port
- Only the standalone LS is affected — no side effects on other software
**Result:** Unleash/aux traffic now routes through `HTTPS_PROXY`. But the LLM API
client (`ApiServerClientV2`) has its own transport that ignores this flag. LLM
calls still go direct to Google.
2. **Combined CA bundle** (`src/standalone.rs`)
- Go's `SSL_CERT_FILE` replaces (not appends) the system trust store
- Proxy concatenates system CAs + MITM CA → `/tmp/antigravity-mitm-combined-ca.pem`
- Set as `SSL_CERT_FILE` on the standalone LS process
**Verify:** `grep -o 'detectAndUseProxy=[^;]*' /usr/share/antigravity/resources/app/extensions/antigravity/dist/extension.js`
→ should show `detectAndUseProxy=1`
3. **`sudo -u` spawning** (`src/standalone.rs`)
- If `antigravity-ls` user exists, LS is spawned via `sudo -n -u antigravity-ls`
- Env vars passed via `/usr/bin/env KEY=VALUE` args
- Falls back to current user if the dedicated user doesn't exist
**Re-apply after updates:** Yes, must re-apply after every Antigravity update.
4. **Google SSE parser** (`src/mitm/intercept.rs`)
- Parses `data: {"response": {"usageMetadata": {...}}}` events
- Extracts `promptTokenCount`, `candidatesTokenCount`, `thoughtsTokenCount`
- Handles both Google and Anthropic SSE formats
### 2. MITM Wrapper (`mitm-wrapper.sh`) ✅ Works for Env Vars
5. **Transparent proxy** (`src/mitm/proxy.rs`)
- Detects iptables-redirected connections via TLS ClientHello SNI
- Terminates TLS with dynamically generated certs
- Forwards HTTP/1.1 requests upstream with real DNS resolution (`dig @8.8.8.8`)
- Chunked response detection for fast completion
Sets `HTTPS_PROXY` and `SSL_CERT_FILE` on the LS process by wrapping the binary.
## What We Tried (Historical)
**How it works:**
### 1. Extension Patch — `detectAndUseProxy` ✅ Still Active
1. Renames real binary to `.real`
2. Places a shell script wrapper at the original path
3. Wrapper sets env vars and execs the real binary with all original args
Patches `detectAndUseProxy=1` in the extension JS. Makes auxiliary traffic
(Unleash, etc.) honor `HTTPS_PROXY`. Harmless, still applied.
**Result:** The wrapper correctly sets env vars on the LS process (verified via
`/proc/<PID>/environ`). Combined with the extension patch, Unleash traffic routes
through the proxy. But LLM API calls still bypass — the `ApiServerClientV2` Go
HTTP client doesn't honor `HTTPS_PROXY`.
### 2. MITM Wrapper (`mitm-wrapper.sh`) ⚠️ Superseded
### 3. iptables REDIRECT — ALL Port 443 ❌ Failed
Sets env vars on the main LS process. The env vars take effect (auxiliary
traffic routes through the proxy), but the main LS's LLM client ignores
`HTTPS_PROXY`. Superseded by standalone mode.
Redirected all outbound port 443 traffic from the user's UID to the MITM proxy.
### 3. iptables REDIRECT (All Traffic) ❌ Abandoned
**Problems encountered:**
Redirected ALL port-443 traffic. Caused redirect loops, broke other HTTPS
traffic. Replaced by UID-scoped redirect.
1. **Redirect loop** — proxy's own upstream connections got caught by iptables,
creating infinite loops → fd exhaustion → crash
2. **Fixed loop with GID bypass** — running proxy with `sg mitm-bypass` and
excluding GID in iptables. This fixed the loop.
3. **Broke Antigravity**: ALL HTTPS traffic (Telegram, Discord, Microsoft
telemetry, extension marketplace, etc.) went through the proxy. The TLS
passthrough worked technically but was too disruptive.
4. **TLS trust failure** — even with the MITM wrapper setting `SSL_CERT_FILE`,
the LS's Go LLM client likely uses a custom `tls.Config` with its own root
CAs, not the system pool. So it rejected our MITM CA cert.
### 4. DNS Redirect (`/etc/hosts`) ❌ Abandoned
**Abandoned.** Too disruptive, and the fundamental TLS trust issue remained.
Same TLS trust issue as #3. Unnecessary with UID-scoped iptables.
### 4. DNS Redirect (`/etc/hosts`) ❌ Failed
### 5. Standalone LS + UID-scoped iptables ✅ WORKING
Redirected only `daily-cloudcode-pa.googleapis.com` to 127.0.0.1 via `/etc/hosts`,
then used a targeted iptables rule for `127.0.0.1:443` only.
Current solution. Full MITM interception with zero side effects.
**Problems:**
## The Original Blocker (SOLVED)
- Same TLS trust issue — the Go LLM client rejected our MITM CA
- Needed `dig @8.8.8.8` bypass for upstream resolution (implemented but untested)
> The LS's Go LLM HTTP client uses a custom `tls.Config` that does NOT read
> from `SSL_CERT_FILE` or the system CA store.
**Abandoned.** TLS trust is the blocker.
**This turned out to be wrong.** The Go client DOES honor `SSL_CERT_FILE` when:
## The Core Blocker
- The env var is set BEFORE the process starts (not injected later)
- The value contains a combined bundle (system CAs + custom CA)
- `SSL_CERT_DIR` is set to `/dev/null` to force exclusive use of `SSL_CERT_FILE`
**The LS's Go LLM HTTP client (`ApiServerClientV2`) uses a custom `tls.Config`
that does NOT read from `SSL_CERT_FILE` or the system CA store.** It likely has
its own hardcoded/embedded root CAs.
This means:
- Even if we redirect traffic to our MITM proxy ✅
- Even if the MITM generates valid certs for the domain ✅
- The LS rejects the cert because it doesn't trust our CA ❌
## Potential Solutions (Untried)
### A. Binary Patching
Patch the Go binary to accept our CA or disable cert verification.
- Find the `tls.Config` setup in the binary
- Modify `InsecureSkipVerify` to `true`, or inject our CA cert DER bytes
- Very fragile, breaks on updates
### B. LD_PRELOAD Hook
Hook `connect()` syscall to redirect traffic.
- **Won't work** for Go — Go uses raw syscalls, not libc wrappers
### C. Network Namespace
Run the LS in an isolated network namespace with custom routing.
- Complex setup, but clean isolation
- The standalone LS work would feed into this
### D. Standalone LS with Full Control
Get standalone LS cascades working (see `docs/standalone-ls-todo.md`), then
have full control over the process environment, including:
- Custom CA trust
- Custom DNS resolution
- Custom proxy settings
- Network namespace isolation
**This is probably the best long-term approach.**
### E. Kernel-level TLS Interception (eBPF)
Use eBPF to intercept TLS records pre-encryption.
- Very powerful, can read plaintext before encryption
- Complex, requires kernel support (>= 4.18)
- Tools: `bpftrace`, custom eBPF programs, `ecapture`
### F. `SSLKEYLOGFILE` + Passive Capture
- Go's `crypto/tls` does not read the `SSLKEYLOGFILE` env var on its own (confirmed by testing); a program has to opt in by setting `tls.Config.KeyLogWriter`
- Could patch the binary to enable it, but same fragility as option A
### G. ptrace-based Interception
Use `ptrace` to intercept `write()`/`sendmsg()` syscalls on TLS sockets.
- Can read plaintext data being written to TLS connections
- Tools: `strace -e trace=write -p <PID>` (but output is messy)
- Better: custom ptrace tool that filters for TLS socket FDs
The standalone LS gives us full control over the process environment at spawn
time, which is why this approach works while the wrapper approach didn't.
## Technical Details
### Model IDs
| Placeholder | Model |
| ------------------------- | ------------------- |
| `MODEL_PLACEHOLDER_M18` | Gemini 3 Flash |
| `MODEL_PLACEHOLDER_M8` | Gemini 3 Pro (High) |
| `MODEL_PLACEHOLDER_M7` | Gemini 3 Pro (Low) |
| `MODEL_PLACEHOLDER_M26` | Claude Opus 4.6 |
| `MODEL_PLACEHOLDER_M12` | Claude Opus 4.5 |
| `MODEL_CLAUDE_4_5_SONNET` | Claude Sonnet 4.5 |
### LS Binary Location
`/usr/share/antigravity/resources/app/extensions/antigravity/bin/language_server_linux_x64`
### API Endpoint
`https://daily-cloudcode-pa.googleapis.com/v1internal:streamGenerateContent?alt=sse`
`POST https://daily-cloudcode-pa.googleapis.com/v1internal:streamGenerateContent?alt=sse`
### Protobuf Field 34 — `detect_and_use_proxy`
### SSE Response Format
- Part of the init metadata sent from extension to LS via stdin
- Enum: `DetectAndUseProxy` (0=UNSPECIFIED, 1=ENABLED, 2=DISABLED)
- Controls whether auxiliary HTTP clients honor `HTTPS_PROXY`
- Does NOT control the LLM API client
### Unleash Feature Flags
- Authorization: `*:production.e44558998bfc35ea9584dc65858e4485fdaa5d7ef46903e0c67712d1`
- Endpoint: `antigravity-unleash.goog`
- App name: `codeium-language-server`
### Files Modified (Current State)
- `extension.js`: `detectAndUseProxy=1` (harmless, keeps working)
- Everything else — clean/reverted
## Code Changes Made (in the proxy)
1. **Transparent proxy mode** (`src/mitm/proxy.rs`) — supports iptables REDIRECT
by detecting raw TLS ClientHello and extracting SNI
2. **CryptoProvider init** (`src/main.rs`) — prevents rustls panic under load
3. **PID detection fix** (`src/backend.rs`) — prefers `.real` binary PID over
wrapper shell script PID
4. **SS fallback** (`src/backend.rs`) — discovers LS port via `ss` when log file
doesn't have it
5. **DNS bypass** (`src/mitm/proxy.rs`) — `connect_upstream` resolves via
`dig @8.8.8.8` to bypass `/etc/hosts`
6. **Scripts**: `dns-redirect.sh`, `iptables-redirect.sh` (both functional)
## Cleanup Checklist
If things are broken, undo in this order:
```bash
# 1. Remove iptables rules
sudo ./scripts/iptables-redirect.sh uninstall
sudo ./scripts/dns-redirect.sh uninstall
# 2. Remove /etc/hosts entries (verify manually)
sudo grep -v "antigravity-mitm" /etc/hosts | sudo tee /etc/hosts.tmp && sudo mv /etc/hosts.tmp /etc/hosts
# 3. Uninstall wrapper
sudo ./scripts/mitm-wrapper.sh uninstall
# 4. Remove system CA
sudo rm -f /usr/local/share/ca-certificates/antigravity-mitm.crt
sudo update-ca-certificates
# 5. Restart Antigravity
```
data: {"response": {"candidates": [{"content": {"role": "model", "parts": [{"text": "..."}]}}],
"usageMetadata": {"promptTokenCount": 1514, "candidatesTokenCount": 25,
"totalTokenCount": 1539, "thoughtsTokenCount": 52},
"modelVersion": "gemini-3-flash"}, "traceId": "...", "metadata": {}}
```
## Next Steps
The last event includes `"finishReason": "STOP"` in the candidate.
→ See `docs/standalone-ls-todo.md` for standalone LS isolation work
→ See `docs/ls-binary-analysis.md` for comprehensive binary reverse engineering
### Other Intercepted Endpoints
## New Findings (from binary analysis)
| Endpoint | Type | Content |
| --------------------------- | -------- | ---------------- |
| `fetchUserInfo` | Protobuf | User info |
| `loadCodeAssist` | Protobuf | Extension config |
| `fetchAvailableModels` | Protobuf | Model catalog |
| `webDocsOptions` | Protobuf | Docs config |
| `streamGenerateContent` | SSE/JSON | LLM responses ✅ |
| `recordCodeAssistMetrics` | Protobuf | Telemetry |
| `recordTrajectoryAnalytics` | Protobuf | Telemetry |
### Alternative to Polling: `StreamCascadeReactiveUpdates`
### Model IDs
The LS has a streaming gRPC method `StreamCascadeReactiveUpdates` that pushes
cascade state changes in real time via server-side streaming. The extension uses
this instead of polling `GetCascadeTrajectorySteps`.
| Placeholder | Model |
| ----------------------- | ------------------- |
| `MODEL_PLACEHOLDER_M18` | Gemini 3 Flash |
| `MODEL_PLACEHOLDER_M8` | Gemini 3 Pro (High) |
| `MODEL_PLACEHOLDER_M7` | Gemini 3 Pro (Low) |
| `MODEL_PLACEHOLDER_M26` | Claude Opus 4.6 |
| `MODEL_PLACEHOLDER_M12` | Claude Opus 4.5 |
**Potential improvement:** If we switch from polling to this streaming RPC, we'd
get lower latency and less backend traffic. However, our current polling approach
works reliably and doesn't require maintaining a long-lived gRPC stream.
### Setup
### Quota Endpoint: `retrieveUserQuota`
```bash
# One-time setup (creates user + iptables rule)
sudo ./scripts/mitm-redirect.sh install
The `PredictionService/RetrieveUserQuota` gRPC method and
`v1internal:retrieveUserQuota` REST endpoint provide quota/credit information.
This could be used to implement a proper `/v1/quota` endpoint instead of
scraping the LS's own quota tracking.
# Run proxy with standalone LS + MITM
RUST_LOG=info ./target/release/antigravity-proxy --standalone
### `internalAtomicAgenticChat`
# Check usage
curl -s http://localhost:8741/v1/usage | jq .
```
A REST endpoint that appears to handle the entire agentic chat loop atomically
(tool calls + responses in one request?). Investigation needed to understand
the request/response format.
### Cleanup
### Credits System
The `google/internal/cloud/code/v1internal/credits` proto package exists with
`Credits_CreditType` enum. The `CASCADE_ENFORCE_QUOTA` config key controls
whether quotas are enforced. Related methods: `AddExtraFlexCreditsInternal`,
`GetTeamCreditEntries`, `GetPlanStatus`.
```bash
# Remove iptables rule + user
sudo ./scripts/mitm-redirect.sh uninstall
```

@@ -1,87 +1,78 @@
# Standalone LS for Proxy Isolation
## Goal
## Status: ✅ FULLY IMPLEMENTED (incl. MITM interception)
Route ALL proxy traffic through a standalone LS instance instead of the real one,
so development/testing/proxying never interferes with active coding sessions.
The standalone LS is fully working via the `--standalone` flag on the proxy.
All cascade types (sync, streaming, multi-turn) and all endpoints work.
MITM interception captures real token usage from Google's API.
## Current State
## Implementation
The proxy currently talks to the **real** LS spawned by Antigravity.
This is risky — a bad cascade or proxy bug can disrupt the coding conversation.
**Module:** `src/standalone.rs`
## What Works
The proxy spawns a standalone LS as a child process:
- Standalone LS starts fine with custom init metadata via stdin protobuf
- Connects to the main extension server (`-extension_server_port`)
- Accepts cascade requests (returns cascadeId)
- With `detect_and_use_proxy = ENABLED` (field 34 = 1), honors `HTTPS_PROXY`
1. Discovers `extension_server_port` and `csrf_token` from the real LS (via `/proc/PID/cmdline`)
2. Picks a random free port
3. Builds init metadata protobuf (via `proto::build_init_metadata()`)
4. Spawns the LS binary with correct args and env vars
5. Feeds init metadata via stdin, then closes it
6. Waits for TCP readiness (retry loop)
7. Kills the child on proxy shutdown (via `Drop`)
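Steps 1, 2, and 6 above can be sketched with the standard library alone. This is an illustrative sketch, not the code in `src/standalone.rs`: the function names are hypothetical, and the flag spelling passed to `cmdline_flag_value` is up to the caller (this document only shows `-extension_server_port` and `-server_port`).

```rust
use std::io;
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Find the value of `flag` in a NUL-separated `/proc/<PID>/cmdline` buffer.
/// Accepts both `-flag value` and `-flag=value` forms.
pub fn cmdline_flag_value(cmdline: &[u8], flag: &str) -> Option<String> {
    let mut args = cmdline
        .split(|b| *b == 0)
        .filter(|arg| !arg.is_empty())
        .map(|arg| String::from_utf8_lossy(arg).into_owned());
    while let Some(arg) = args.next() {
        if arg == flag {
            return args.next();
        }
        if let Some(value) = arg.strip_prefix(flag).and_then(|rest| rest.strip_prefix('=')) {
            return Some(value.to_string());
        }
    }
    None
}

/// Ask the OS for a free loopback port by binding to port 0.
/// The listener is dropped on return, so another process could still grab
/// the port before the LS binds it.
pub fn pick_free_port() -> io::Result<u16> {
    let listener = TcpListener::bind(("127.0.0.1", 0))?;
    Ok(listener.local_addr()?.port())
}

/// Retry a TCP connect until it succeeds or `timeout` elapses.
pub fn wait_for_tcp(addr: SocketAddr, timeout: Duration) -> bool {
    let deadline = Instant::now() + timeout;
    while Instant::now() < deadline {
        if TcpStream::connect_timeout(&addr, Duration::from_millis(200)).is_ok() {
            return true;
        }
        sleep(Duration::from_millis(100));
    }
    false
}
```

A caller would read `/proc/<PID>/cmdline` with `std::fs::read`, pass the bytes to `cmdline_flag_value`, and call `wait_for_tcp` on the chosen port after spawning the child.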
## What Doesn't Work
### UID Isolation (MITM mode)
- **Cascades silently fail** — the LS accepts the request but never processes it
- No planner invocation, no upstream API call, no logs beyond startup
- 9 lines of log after 40s wait
- Main LS logs show zero trace of the standalone's cascade
When `scripts/mitm-redirect.sh install` has been run:
## Suspected Blockers (investigate in order)
1. The `antigravity-ls` system user exists
2. iptables redirects that UID's port-443 traffic → MITM proxy port
3. The proxy spawns the LS via `sudo -n -u antigravity-ls`
4. Environment variables (`SSL_CERT_FILE`, etc.) are passed via `/usr/bin/env`
5. A combined CA bundle (system CAs + MITM CA) is written to `/tmp/antigravity-mitm-combined-ca.pem`
6. Only the standalone LS traffic is intercepted — no impact on other software
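Items 3 to 5 above can be sketched as two small helpers: one that writes the combined CA bundle and one that builds the `sudo` command line. The function names and the example binary path are hypothetical; the real logic lives in `src/standalone.rs`. Passing variables as `/usr/bin/env KEY=VALUE` arguments is presumably needed because `sudo` resets the environment by default.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Concatenate the system CA bundle and the MITM CA into one PEM file.
/// A newline is inserted if the system bundle does not end with one, so the
/// two PEM blocks never run together.
pub fn write_combined_ca(system_bundle: &Path, mitm_ca: &Path, out: &Path) -> io::Result<()> {
    let mut pem = fs::read(system_bundle)?;
    if !pem.ends_with(b"\n") {
        pem.push(b'\n');
    }
    pem.extend_from_slice(&fs::read(mitm_ca)?);
    fs::write(out, pem)
}

/// Build the argv for `sudo -n -u <user> /usr/bin/env KEY=VALUE... <binary> <args...>`.
pub fn sudo_env_argv(user: &str, env: &[(&str, &str)], binary: &str, args: &[&str]) -> Vec<String> {
    let mut argv: Vec<String> = vec![
        "sudo".to_string(),
        "-n".to_string(), // non-interactive: fail instead of prompting for a password
        "-u".to_string(),
        user.to_string(),
        "/usr/bin/env".to_string(),
    ];
    argv.extend(env.iter().map(|(key, value)| format!("{}={}", key, value)));
    argv.push(binary.to_string());
    argv.extend(args.iter().map(|arg| arg.to_string()));
    argv
}
```

The first element of the returned vector would go to `std::process::Command::new` and the rest to `.args(...)`.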
1. **Auth context** — standalone may not receive OAuth token from extension server
- Check: does the standalone's `GetUserStatus` return valid auth?
- The extension server might only share tokens with the "primary" LS
2. **Unleash feature flags** — cascade processing gated by flags the standalone doesn't fetch
- The standalone connects to Unleash via the proxy, but might not get the right flags
- Check: compare Unleash responses between main and standalone
3. **Workspace indexing** — planner might require indexed workspace state
- The standalone's workspace (`/tmp/antigravity-standalone`) is empty
- Try: point it at a real workspace with actual files
4. **Extension server coupling** — cascade might need the extension to "drive" it
- The chat panel in the extension might send additional RPCs to progress the cascade
- Check: trace what RPCs the extension sends after StartCascade
## Investigation Plan
## Usage
```bash
# 1. Launch with max verbosity
echo "$METADATA" | base64 -d | \
timeout 90 "$LS_BIN" \
-v 5 \
-server_port 42200 \
... > /tmp/standalone-verbose.log 2>&1 &
# Setup (one-time, requires sudo)
sudo ./scripts/mitm-redirect.sh install
# 2. Check auth status
curl -sk "https://127.0.0.1:42200/exa.language_server_pb.LanguageServerService/GetUserStatus" \
-H "Content-Type: application/json" \
-H "x-codeium-csrf-token: $CSRF" \
-d '{}'
# Run
RUST_LOG=info ./target/release/antigravity-proxy --standalone
# 3. Send cascade and watch logs in real-time
tail -f /tmp/standalone-verbose.log &
curl -sk "https://127.0.0.1:42200/.../StartCascade" ...
# 4. Compare Unleash flags
# Main LS unleash vs standalone unleash
# Check intercepted usage
curl -s http://localhost:8741/v1/usage | jq .
```
## Root Cause of Original Failure
The bash script (`scripts/standalone-ls.sh`) used `MODEL_PLACEHOLDER_M3` — an
unassigned/invalid model enum. The LS silently drops cascades with unknown models.
**Fix:** Use correct model enums (M18=Flash, M26=Opus4.6) via the proxy's
byte-exact protobuf encoder.
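Because the LS drops unknown models silently, one cheap guard is to validate the placeholder before sending a cascade. The sketch below uses only the mappings listed in this document; the function name is hypothetical and the check is an illustration, not something the proxy is confirmed to do.

```rust
/// Map a model placeholder to the display name listed in this document.
/// Returns `None` for anything not in the table (e.g. `MODEL_PLACEHOLDER_M3`),
/// so callers can reject the request instead of letting the LS drop it.
pub fn model_display_name(placeholder: &str) -> Option<&'static str> {
    match placeholder {
        "MODEL_PLACEHOLDER_M18" => Some("Gemini 3 Flash"),
        "MODEL_PLACEHOLDER_M8" => Some("Gemini 3 Pro (High)"),
        "MODEL_PLACEHOLDER_M7" => Some("Gemini 3 Pro (Low)"),
        "MODEL_PLACEHOLDER_M26" => Some("Claude Opus 4.6"),
        "MODEL_PLACEHOLDER_M12" => Some("Claude Opus 4.5"),
        _ => None,
    }
}
```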
## Key Technical Details
- Init metadata protobuf field 34 = `detect_and_use_proxy` (enum: 0=UNSPECIFIED, 1=ENABLED, 2=DISABLED)
- Init metadata protobuf field 34 = `detect_and_use_proxy` (1=ENABLED)
- Model IDs: M18=Flash, M8=Pro-High, M7=Pro-Low, M26=Opus4.6, M12=Opus4.5
- LS binary: `/usr/share/antigravity/resources/app/extensions/antigravity/bin/language_server_linux_x64`
- API endpoint: `daily-cloudcode-pa.googleapis.com/v1internal:streamGenerateContent?alt=sse`
- SSE response format: `{"response": {"usageMetadata": {"promptTokenCount", "candidatesTokenCount", "thoughtsTokenCount"}, "modelVersion": "..."}}`
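As an illustration of the SSE format above, the following sketch pulls the three usage counters out of a single `data:` line. It deliberately uses a naive string scan so it stays dependency-free; the type and function names are hypothetical, and the actual parser in `src/mitm/intercept.rs` may work quite differently.

```rust
/// Usage counters carried in `usageMetadata` (absent fields stay `None`).
#[derive(Debug, Default, PartialEq, Eq)]
pub struct UsageMetadata {
    pub prompt_tokens: Option<u64>,
    pub candidates_tokens: Option<u64>,
    pub thoughts_tokens: Option<u64>,
}

/// Naive scan for `"key": <digits>`; stands in for a real JSON parser.
fn find_u64(json: &str, key: &str) -> Option<u64> {
    let needle = format!("\"{}\"", key);
    let start = json.find(needle.as_str())? + needle.len();
    let after_colon = json[start..].trim_start().strip_prefix(':')?.trim_start();
    let digits: String = after_colon
        .chars()
        .take_while(|c| c.is_ascii_digit())
        .collect();
    digits.parse().ok()
}

/// Parse one SSE line; returns `None` unless it is a `data:` line that
/// contains a `usageMetadata` object.
pub fn parse_usage_line(line: &str) -> Option<UsageMetadata> {
    let payload = line.strip_prefix("data:")?.trim_start();
    if !payload.contains("\"usageMetadata\"") {
        return None;
    }
    Some(UsageMetadata {
        prompt_tokens: find_u64(payload, "promptTokenCount"),
        candidates_tokens: find_u64(payload, "candidatesTokenCount"),
        thoughts_tokens: find_u64(payload, "thoughtsTokenCount"),
    })
}
```

A production parser would more likely deserialize the payload with a JSON library such as `serde_json`, which also handles escaping and nested keys correctly.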
## New Leads (from binary analysis)
## Test Results (2026-02-14)
- **`GetUnleashData`** — LS method to fetch Unleash flags directly. Could compare
main vs standalone to check if flags differ.
- **`GetStaticExperimentStatus`** / `SetBaseExperiments` / `UpdateDevExperiments`:
experiment management. Standalone might be missing experiment overrides.
- **`FetchAdminControls`** — admin-level controls that might gate cascade execution.
- **`LoadCodeAssist`** — initialization step that might be required before cascades work.
- **`GetUserStatus` vs `GetUserMemories`** — check if standalone has auth context
by calling both.
→ See `docs/ls-binary-analysis.md` for full RPC method catalog.
| Endpoint | Result |
| --------------------------------- | ------------------------- |
| `GET /health` | ✅ |
| `GET /v1/models` | ✅ 5 models |
| `GET /v1/sessions` | ✅ |
| `GET /v1/quota` | ✅ real plan/credits |
| `GET /v1/usage` | ✅ real MITM tokens |
| `POST /v1/responses` (sync) | ✅ |
| `POST /v1/responses` (stream) | ✅ SSE events |
| `POST /v1/responses` (multi-turn) | ✅ context preserved |
| `POST /v1/chat/completions` | ✅ |
| MITM interception | ✅ TLS decrypt + parse |
| MITM usage capture | ✅ per-model token counts |
| UID isolation | ✅ no side effects |