feat: MITM interception for standalone LS with UID isolation

- Spawn standalone LS as dedicated 'antigravity-ls' user via sudo
- UID-scoped iptables redirect (port 443 → MITM proxy) via mitm-redirect.sh
- Combined CA bundle (system CAs + MITM CA) for Go TLS trust
- Transparent TLS interception with chunked response detection
- Google SSE parser for streamGenerateContent usage extraction
- Timeouts on all MITM operations (TLS handshake, upstream, idle)
- Forward response data immediately (no buffering)
- Per-model token usage capture (input, output, thinking)
- Update docs and known issues to reflect resolved TLS blocker
Nikketryhard
2026-02-14 17:50:12 -06:00
parent 6842bfeaa5
commit d4de436856
10 changed files with 1156 additions and 478 deletions


@@ -1,87 +1,78 @@
# Standalone LS for Proxy Isolation
## Goal
## Status: ✅ FULLY IMPLEMENTED (incl. MITM interception)
Route ALL proxy traffic through a standalone LS instance instead of the real one,
so development/testing/proxying never interferes with active coding sessions.
The standalone LS is fully working via `--standalone` flag on the proxy.
All cascade types (sync, streaming, multi-turn) and all endpoints work.
MITM interception captures real token usage from Google's API.
## Current State
## Implementation
The proxy currently talks to the **real** LS spawned by Antigravity.
This is risky — a bad cascade or proxy bug can disrupt the coding conversation.
**Module:** `src/standalone.rs`
## What Works
The proxy spawns a standalone LS as a child process:
- Standalone LS starts fine with custom init metadata via stdin protobuf
- Connects to the main extension server (`-extension_server_port`)
- Accepts cascade requests (returns cascadeId)
- With `detect_and_use_proxy = ENABLED` (field 34 = 2), honors `HTTPS_PROXY`
1. Discovers `extension_server_port` and `csrf_token` from the real LS (via `/proc/PID/cmdline`)
2. Picks a random free port
3. Builds init metadata protobuf (via `proto::build_init_metadata()`)
4. Spawns the LS binary with correct args and env vars
5. Feeds init metadata via stdin, then closes it
6. Waits for TCP readiness (retry loop)
7. Kills the child on proxy shutdown (via `Drop`)
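Steps 2 and 6 can be sketched with the standard library alone. This is a minimal illustration, not the code in `src/standalone.rs`: the function names `pick_free_port` and `wait_for_tcp` and the timeout values are assumptions made for the example.

```rust
use std::io;
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::thread;
use std::time::{Duration, Instant};

/// Ask the OS for an unused port by binding to port 0, then release it.
/// There is a small race: another process could claim the port before
/// the spawned LS binds it.
fn pick_free_port() -> io::Result<u16> {
    let listener = TcpListener::bind(("127.0.0.1", 0))?;
    Ok(listener.local_addr()?.port())
}

/// Poll until something accepts TCP connections on `port`,
/// or give up once `deadline` has elapsed.
fn wait_for_tcp(port: u16, deadline: Duration) -> bool {
    let addr = SocketAddr::from(([127, 0, 0, 1], port));
    let start = Instant::now();
    while start.elapsed() < deadline {
        if TcpStream::connect_timeout(&addr, Duration::from_millis(200)).is_ok() {
            return true;
        }
        // Hypothetical back-off between attempts.
        thread::sleep(Duration::from_millis(100));
    }
    false
}
```

A caller would pick the port, pass it to the LS as `-server_port`, and only start forwarding requests once `wait_for_tcp` returns `true`.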
## What Doesn't Work
### UID Isolation (MITM mode)
- **Cascades silently fail** — the LS accepts the request but never processes it
- No planner invocation, no upstream API call, no logs beyond startup
- 9 lines of log after 40s wait
- Main LS logs show zero trace of the standalone's cascade
When `scripts/mitm-redirect.sh install` has been run:
## Suspected Blockers (investigate in order)
1. The `antigravity-ls` system user exists
2. iptables redirects that UID's port-443 traffic → MITM proxy port
3. The proxy spawns the LS via `sudo -n -u antigravity-ls`
4. Environment variables (`SSL_CERT_FILE`, etc.) are passed via `/usr/bin/env`
5. A combined CA bundle (system CAs + MITM CA) is written to `/tmp/antigravity-mitm-combined-ca.pem`
6. Only the standalone LS traffic is intercepted — no impact on other software
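Steps 3 to 5 can be sketched as two small helpers: one assembles the `sudo -n -u antigravity-ls /usr/bin/env KEY=VAL ...` command line, the other concatenates the system CA bundle with the MITM CA. The helper names, their signatures, and the input paths are hypothetical. Routing variables through `/usr/bin/env` is presumably needed because sudo's default `env_reset` policy drops most of the caller's environment.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Build argv for: sudo -n -u <user> /usr/bin/env KEY=VAL... <ls_bin> <ls_args...>
/// `-n` makes sudo fail instead of prompting for a password.
fn sudo_argv(user: &str, env: &[(&str, &str)], ls_bin: &str, ls_args: &[&str]) -> Vec<String> {
    let mut argv: Vec<String> = vec![
        "sudo".into(),
        "-n".into(),
        "-u".into(),
        user.into(),
        "/usr/bin/env".into(),
    ];
    // Each variable becomes a KEY=VAL argument interpreted by env.
    argv.extend(env.iter().map(|(k, v)| format!("{k}={v}")));
    argv.push(ls_bin.into());
    argv.extend(ls_args.iter().map(|a| a.to_string()));
    argv
}

/// Write `system_bundle` followed by `mitm_ca` to `out` as one PEM file.
fn write_combined_ca(system_bundle: &Path, mitm_ca: &Path, out: &Path) -> io::Result<()> {
    let mut pem = fs::read(system_bundle)?;
    // Keep the PEM blocks on separate lines if the first file lacks a trailing newline.
    if !pem.ends_with(b"\n") {
        pem.push(b'\n');
    }
    pem.extend(fs::read(mitm_ca)?);
    fs::write(out, pem)
}
```

The resulting vector can be handed to `std::process::Command::new(&argv[0]).args(&argv[1..])`, with `SSL_CERT_FILE` pointing at the file produced by `write_combined_ca`.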
1. **Auth context** — standalone may not receive OAuth token from extension server
- Check: does the standalone's `GetUserStatus` return valid auth?
- The extension server might only share tokens with the "primary" LS
2. **Unleash feature flags** — cascade processing gated by flags the standalone doesn't fetch
- The standalone connects to Unleash via the proxy, but might not get the right flags
- Check: compare Unleash responses between main and standalone
3. **Workspace indexing** — planner might require indexed workspace state
- The standalone's workspace (`/tmp/antigravity-standalone`) is empty
- Try: point it at a real workspace with actual files
4. **Extension server coupling** — cascade might need the extension to "drive" it
- The chat panel in the extension might send additional RPCs to progress the cascade
- Check: trace what RPCs the extension sends after StartCascade
## Investigation Plan
## Usage
```bash
# 1. Launch with max verbosity
echo "$METADATA" | base64 -d | \
timeout 90 "$LS_BIN" \
-v 5 \
-server_port 42200 \
... > /tmp/standalone-verbose.log 2>&1 &
# Setup (one-time, requires sudo)
sudo ./scripts/mitm-redirect.sh install
# 2. Check auth status
curl -sk "https://127.0.0.1:42200/exa.language_server_pb.LanguageServerService/GetUserStatus" \
-H "Content-Type: application/json" \
-H "x-codeium-csrf-token: $CSRF" \
-d '{}'
# Run
RUST_LOG=info ./target/release/antigravity-proxy --standalone
# 3. Send cascade and watch logs in real-time
tail -f /tmp/standalone-verbose.log &
curl -sk "https://127.0.0.1:42200/.../StartCascade" ...
# 4. Compare Unleash flags
# Main LS unleash vs standalone unleash
# Check intercepted usage
curl -s http://localhost:8741/v1/usage | jq .
```
## Root Cause of Original Failure
The bash script (`scripts/standalone-ls.sh`) used `MODEL_PLACEHOLDER_M3` — an
unassigned/invalid model enum. The LS silently drops cascades with unknown models.
**Fix:** Use correct model enums (M18=Flash, M26=Opus4.6) via the proxy's
byte-exact protobuf encoder.
## Key Technical Details
- Init metadata protobuf field 34 = `detect_and_use_proxy` (enum: 0=UNSPECIFIED, 1=ENABLED, 2=DISABLED)
- Init metadata protobuf field 34 = `detect_and_use_proxy` (1=ENABLED)
- Model IDs: M18=Flash, M8=Pro-High, M7=Pro-Low, M26=Opus4.6, M12=Opus4.5
- LS binary: `/usr/share/antigravity/resources/app/extensions/antigravity/bin/language_server_linux_x64`
- API endpoint: `daily-cloudcode-pa.googleapis.com/v1internal:streamGenerateContent?alt=sse`
- SSE response format: `{"response": {"usageMetadata": {"promptTokenCount", "candidatesTokenCount", "thoughtsTokenCount"}, "modelVersion": "..."}}`
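To illustrate how usage could be pulled out of that SSE format, here is a dependency-free sketch. It is not the proxy's actual parser: `Usage`, `find_u64`, and `usage_from_sse_line` are hypothetical names, the mapping of `promptTokenCount`, `candidatesTokenCount`, and `thoughtsTokenCount` to input, output, and thinking tokens is an assumption based on the field names, and the key scan is deliberately naive. Real code would more likely use a JSON library.

```rust
/// Token counts taken from one `usageMetadata` object.
#[derive(Debug, Default, PartialEq)]
struct Usage {
    input: u64,
    output: u64,
    thinking: u64,
}

/// Naive scan for `"key": <digits>`. This is NOT a JSON parser: it takes the
/// first occurrence of the key anywhere in the text and ignores nesting.
fn find_u64(json: &str, key: &str) -> Option<u64> {
    let needle = format!("\"{key}\"");
    let after_key = &json[json.find(&needle)? + needle.len()..];
    let after_colon = after_key.trim_start().strip_prefix(':')?.trim_start();
    let end = after_colon
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(after_colon.len());
    after_colon[..end].parse().ok()
}

/// Return usage for an SSE `data:` line that carries a prompt token count.
/// Missing output or thinking counts default to 0.
fn usage_from_sse_line(line: &str) -> Option<Usage> {
    let payload = line.strip_prefix("data:")?.trim();
    Some(Usage {
        input: find_u64(payload, "promptTokenCount")?,
        output: find_u64(payload, "candidatesTokenCount").unwrap_or(0),
        thinking: find_u64(payload, "thoughtsTokenCount").unwrap_or(0),
    })
}
```

Lines that are not `data:` events, or that carry no `promptTokenCount`, yield `None`, so a caller can feed every line of the stream through `usage_from_sse_line` and keep the most recent `Some` value.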
## New Leads (from binary analysis)
## Test Results (2026-02-14)
- **`GetUnleashData`** — LS method to fetch Unleash flags directly. Could compare
main vs standalone to check if flags differ.
- **`GetStaticExperimentStatus`** / `SetBaseExperiments` / `UpdateDevExperiments`
experiment management. Standalone might be missing experiment overrides.
- **`FetchAdminControls`** — admin-level controls that might gate cascade execution.
- **`LoadCodeAssist`** — initialization step that might be required before cascades work.
- **`GetUserStatus` vs `GetUserMemories`** — check if standalone has auth context
by calling both.
→ See `docs/ls-binary-analysis.md` for full RPC method catalog.
| Endpoint | Result |
| --------------------------------- | ------------------------- |
| `GET /health` | ✅ |
| `GET /v1/models` | ✅ 5 models |
| `GET /v1/sessions` | ✅ |
| `GET /v1/quota` | ✅ real plan/credits |
| `GET /v1/usage` | ✅ real MITM tokens |
| `POST /v1/responses` (sync) | ✅ |
| `POST /v1/responses` (stream) | ✅ SSE events |
| `POST /v1/responses` (multi-turn) | ✅ context preserved |
| `POST /v1/chat/completions` | ✅ |
| MITM interception | ✅ TLS decrypt + parse |
| MITM usage capture | ✅ per-model token counts |
| UID isolation | ✅ no side effects |