chore: rewrite KNOWN_ISSUES with investigation verdicts and confidence levels

Nikketryhard
2026-02-14 16:02:01 -06:00
parent 05ae6b8652
commit f3fd203a53


@@ -1,97 +1,129 @@
# Known Issues & Future Work
---
## Medium
### 1. Cascade Correlation Is Heuristic
**File:** `src/mitm/intercept.rs` → `extract_cascade_hint()`
The MITM proxy matches intercepted API traffic to cascade IDs by scanning for `metadata.user_id` or `workspace_id` in the request body. If neither is found, it stores under `_latest`. Since `take_usage()` no longer falls back to `_latest`, unidentified requests will have **no MITM usage data at all**.
**Fix:** Investigate the actual request body format the LS sends for better correlation keys. Alternatively, use timing-based correlation (match MITM capture timestamp to cascade polling window).
All fixable issues from the original report have been resolved. The remaining
items require either architectural changes, new features, or deep investigation
of the Go language server binary.
---
### 2. Request Modification Not Implemented
## 🔴 Blockers (Require Deep Investigation)
**File:** `src/mitm/proxy.rs` → `modify_requests: false`
The `MitmConfig.modify_requests` flag exists and is plumbed through, but no actual modification logic is implemented. The flag is hardcoded to `false`.
**Fix:** When needed, implement request body mutation in `handle_http_over_tls()` — parse JSON, modify, reserialize, update `Content-Length`.
---
### 3. Polling-Based Cascade Updates vs Streaming RPC
**File:** `src/api/polling.rs`
We poll `GetCascadeTrajectorySteps` on a timer to check for new cascade output. The LS has a `StreamCascadeReactiveUpdates` streaming gRPC method that pushes updates in real-time. Our polling approach works but adds latency and unnecessary requests.
**Impact:** Functional but suboptimal. The streaming approach would give lower latency and less LS load, but requires maintaining a long-lived gRPC stream and handling reconnection.
**See:** `docs/ls-binary-analysis.md` → gRPC Services → LanguageServerService
---
### 4. No BYOK Model Routing
**File:** `src/api/models.rs`
The LS supports BYOK (Bring Your Own Key) variants for Claude and OpenAI models (e.g., `MODEL_CLAUDE_4_SONNET_BYOK`, `MODEL_OPENAI_COMPATIBLE`). Our proxy only exposes the 5 built-in placeholder models. Users with BYOK keys can't use them through the proxy.
**Fix:** Add a mechanism to register BYOK models at runtime (e.g., via a config file or API endpoint). The BYOK model IDs and their proto enum numbers are documented in `docs/ls-binary-analysis.md`.
---
## 🟢 Low
### 5. No Integration Tests for MITM Module
The MITM module has unit tests for protobuf decoding and intercept parsing, but no integration tests that verify:
- TLS interception end-to-end with the generated CA
- Full HTTP/1.1 request/response cycle through the proxy
- gRPC (HTTP/2) request/response cycle through `h2_handler`
- Store recording and retrieval under concurrency
- Wrapper script install/uninstall lifecycle
---
## Blockers
### 6. LS Go LLM Client Ignores System TLS Trust Store
### 1. LS Go LLM Client Ignores System TLS Trust Store
**File:** `docs/mitm-interception-status.md`
The LS binary is a Go program whose HTTP client for LLM API calls uses a custom `tls.Config` that does **not** trust system CAs or honor `SSL_CERT_FILE`. This means our MITM proxy's generated CA cert is rejected even when properly installed system-wide.
The LS binary's Go HTTP client for LLM API calls uses a custom `tls.Config` that
does **not** trust system CAs or honor `SSL_CERT_FILE`. Our MITM proxy can route
traffic but not decrypt it.
The extension patch (`detectAndUseProxy=1`) only makes the LS honor `HTTPS_PROXY` for routing — it doesn't fix CA trust. Without this, the MITM proxy can route but not decrypt LLM traffic.
**Investigation status:** Every practical approach tried so far has failed:
**Potential fixes:**
- iptables REDIRECT → redirect loop + broke all HTTPS traffic
- DNS redirect → same TLS trust failure
- LD_PRELOAD → Go doesn't use libc for syscalls
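- SSLKEYLOGFILE → Go's `crypto/tls` ignores the environment variable; key logging only works if the program itself sets `tls.Config.KeyLogWriter`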
- Binary patching the Go TLS verification (hard, breaks on updates)
- Full standalone LS control (in progress, see issue #7)
- Network namespace + iptables redirect (eliminates HTTPS_PROXY need but doesn't fix TLS trust)
- eBPF/ptrace to inject certs at runtime (complex)
**Remaining options (untried):**
- Binary patching Go TLS verification (fragile, breaks on updates)
- Full standalone LS control (see issue #2)
- eBPF/ptrace syscall interception (complex)
- Network namespace isolation (complex setup)
**Confidence: <30%** — all easy paths exhausted. Requires reverse engineering the Go binary's TLS setup.
**See:** `docs/mitm-interception-status.md` for full analysis
---
### 7. Standalone LS Cascades Silently Fail
### 2. Standalone LS Cascades Silently Fail
**File:** `docs/standalone-ls-todo.md`
When running a standalone LS instance (outside of Antigravity), cascades start but produce no output. The LS accepts `StartCascade` RPCs without error, but the cascade never progresses.
A standalone LS (running outside Antigravity) accepts `StartCascade` RPCs without
error, but the cascade never progresses and produces no output.
**Suspected blockers:**
- Missing auth context (OAuth token not properly propagated)
- Unleash feature flags differ between main and standalone instances (`GetUnleashData` returns different flags)
- `LoadCodeAssist` / `OnboardUser` initialization steps may be required
- Extension server callbacks (`WriteCascadeEdit`, `ExecuteCommand`, etc.) have no handler
- Missing auth context (OAuth token propagation)
- Different Unleash feature flags between main and standalone instances
- Missing initialization steps (`LoadCodeAssist`, `OnboardUser`)
- Missing extension server callbacks (`WriteCascadeEdit`, `ExecuteCommand`)
**Confidence: <30%** — too many unknowns. Needs systematic debugging with the standalone LS.
**See:** `docs/standalone-ls-todo.md` for investigation plan
---
## Medium (Architecture / Future Work)
### 3. Cascade Correlation Is Heuristic
**File:** `src/mitm/intercept.rs` → `extract_cascade_hint()`
The MITM proxy matches intercepted API traffic to cascade IDs heuristically:
- HTTP/1.1 path: scans JSON body for `metadata.user_id` or `workspace_id`
- gRPC/H2 path: recursively searches proto fields for UUID strings
If neither method finds a match, usage is stored under `_latest` but never
consumed (since `take_usage()` requires exact cascade ID match).
**Confidence: <50%** — can't test without working MITM interception (blocked by issue #1). The heuristic is reasonable but unverified against real traffic.
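
The storage behavior described above can be illustrated with a minimal sketch.
It is not the code in `src/mitm`: the `Usage` and `UsageStore` types, the
`record` method, and the `looks_like_uuid` helper are hypothetical names chosen
for illustration. Only `take_usage()` and the `_latest` key come from this
document. The sketch shows why usage recorded without a hint is never consumed:
the lookup requires an exact cascade ID.

```rust
use std::collections::HashMap;

/// Hypothetical usage record; the real struct in `src/mitm` may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct Usage {
    pub input_tokens: u64,
    pub output_tokens: u64,
}

/// Key used when no cascade hint could be extracted from the request.
pub const LATEST_KEY: &str = "_latest";

/// Hypothetical store keyed by cascade ID.
#[derive(Default)]
pub struct UsageStore {
    by_cascade: HashMap<String, Usage>,
}

impl UsageStore {
    /// Record usage under the extracted hint, or under `_latest` if there is none.
    pub fn record(&mut self, hint: Option<&str>, usage: Usage) {
        let key = hint.unwrap_or(LATEST_KEY).to_string();
        self.by_cascade.insert(key, usage);
    }

    /// Remove and return usage for an exact key match.
    /// There is no fallback to `_latest`, so unidentified usage stays unread.
    pub fn take_usage(&mut self, cascade_id: &str) -> Option<Usage> {
        self.by_cascade.remove(cascade_id)
    }
}

/// Returns true if `s` has the canonical 8-4-4-4-12 hexadecimal UUID shape.
/// A recursive proto field search could use a check like this on string fields.
pub fn looks_like_uuid(s: &str) -> bool {
    let bytes = s.as_bytes();
    if bytes.len() != 36 {
        return false;
    }
    bytes.iter().enumerate().all(|(i, b)| match i {
        8 | 13 | 18 | 23 => *b == b'-',
        _ => b.is_ascii_hexdigit(),
    })
}
```

A shape-only UUID check accepts any UUID-like string in the payload, not just
cascade IDs, which is one reason the heuristic needs verification against real
traffic.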
---
### 4. Request Modification Not Implemented
**File:** `src/mitm/proxy.rs` → `modify_requests: bool`
The `MitmConfig.modify_requests` flag is plumbed through the entire call chain
but hardcoded to `false`. No modification logic exists. This is intentional
scaffolding for future use.
**Status:** Not a bug — reserved for potential request mutation features.
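
If request mutation is ever implemented, any change to the body length has to
be reflected in the `Content-Length` header. The following is a minimal sketch
of that one step for HTTP/1.1 only. The function name `rebuild_request` and its
signature are assumptions made for illustration, not part of the existing
proxy, and the sketch does not handle `Transfer-Encoding: chunked` or HTTP/2.

```rust
/// Rebuild an HTTP/1.1 request from its head (request line plus headers)
/// and a replacement body, writing a corrected `Content-Length`.
/// Hypothetical helper: not part of the current proxy code.
pub fn rebuild_request(head: &str, new_body: &[u8]) -> Vec<u8> {
    let mut out = String::new();
    let mut wrote_len = false;

    for line in head.split("\r\n").filter(|l| !l.is_empty()) {
        // Header names are case-insensitive, so compare without regard to case.
        let is_len = line
            .split_once(':')
            .map(|(name, _)| name.trim().eq_ignore_ascii_case("content-length"))
            .unwrap_or(false);

        if is_len {
            // Replace the first Content-Length header and drop any duplicates.
            if !wrote_len {
                out.push_str(&format!("Content-Length: {}\r\n", new_body.len()));
                wrote_len = true;
            }
        } else {
            out.push_str(line);
            out.push_str("\r\n");
        }
    }

    // Add the header if the original request did not carry one.
    if !wrote_len {
        out.push_str(&format!("Content-Length: {}\r\n", new_body.len()));
    }

    // A blank line terminates the header section, then the body follows.
    out.push_str("\r\n");
    let mut bytes = out.into_bytes();
    bytes.extend_from_slice(new_body);
    bytes
}
```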
---
### 5. Polling-Based Cascade Updates vs Streaming RPC
**File:** `src/api/polling.rs`
We poll `GetCascadeTrajectorySteps` on a timer. The LS has a
`StreamCascadeReactiveUpdates` streaming gRPC method that pushes updates
in real-time. Polling works but adds latency.
**Status:** Functional but suboptimal. Switching to streaming requires
implementing a gRPC streaming client with reconnection handling. Not blocking.
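
One piece of the reconnection handling mentioned above is deciding how long to
wait before reopening a dropped stream. The sketch below shows a capped
exponential backoff using only the standard library. The function name
`reconnect_delay` and its parameters are assumptions for illustration; it does
not touch any gRPC API and says nothing about how the LS behaves on reconnect.

```rust
use std::time::Duration;

/// Delay before reconnect attempt number `attempt` (0-based):
/// `base * 2^attempt`, capped at `max`. Hypothetical helper.
pub fn reconnect_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    // Saturate instead of overflowing for very large attempt counts.
    let factor = 2u32.saturating_pow(attempt);
    // `checked_mul` returns None on overflow; treat that as "use the cap".
    base.checked_mul(factor).map_or(max, |d| d.min(max))
}
```

With a 500 ms base and a 30 s cap, the first attempts wait 0.5 s, 1 s, 2 s, and
4 s, and every attempt from the seventh onward waits the full 30 s.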
---
### 6. No BYOK Model Routing
**File:** `src/api/models.rs`
The LS supports BYOK (Bring Your Own Key) models (e.g., `MODEL_CLAUDE_4_SONNET_BYOK`,
`MODEL_OPENAI_COMPATIBLE`). Our proxy only exposes the 5 built-in placeholder
models.
**Status:** Feature request. Would need a runtime model registration mechanism.
Proto enum numbers are documented in `docs/ls-binary-analysis.md`.
---
## 🟢 Low
### 7. No Integration Tests for MITM Module
Unit tests cover protobuf decoding and intercept parsing (17 tests pass), but
no integration tests for:
- TLS interception end-to-end with the generated CA
- Full HTTP/1.1 request/response cycle through the proxy
- gRPC (HTTP/2) request/response cycle through `h2_handler`
- Store recording and retrieval under concurrency
**Status:** The MITM can't intercept real traffic anyway (blocked by issue #1),
so integration tests would be somewhat hypothetical. Worth adding when the TLS
blocker is resolved.