fix: resolve cascade correlation, update KNOWN_ISSUES

- MitmStore: added active_cascade_id field with set/get/clear methods
- record_usage() now falls back to active_cascade_id when the heuristic
  cascade hint is absent (fixes usage always going to _latest)
- All three API handlers set active cascade before send_message
- KNOWN_ISSUES: moved 3 issues to resolved:
  - Request modification (already true, was stale entry)
  - Cascade correlation (fixed via active_cascade_id)
  - Progressive thinking streaming (fixed via MITM bypass)
Nikketryhard
2026-02-15 01:10:34 -06:00
parent b3af73cebd
commit 981fb3b18d
5 changed files with 59 additions and 26 deletions


@@ -2,6 +2,8 @@
All critical blockers have been resolved. Standalone LS with MITM interception
is fully working. Reactive streaming is implemented with polling fallback.
All three API endpoints (Responses, Completions, Gemini) now bypass the LS
when custom tools are active, reading directly from MitmStore.
---
@@ -45,43 +47,48 @@ content (see `docs/panel-stream-investigation.md`).
thinking text. The panel reactive component uses a workspace-scoped ID, not
cascade IDs. See `docs/panel-stream-investigation.md`.
---
### ~~Request Modification Not Implemented~~
## 🟡 Medium (Architecture / Future Work)
**Status: SOLVED (2026-02-15)**
### 1. Cascade Correlation Is Heuristic
`MitmConfig.modify_requests` is now `true` by default. Used for:
**File:** `src/mitm/intercept.rs`, `extract_cascade_hint()`
- Tool/function call injection into LS requests (Gemini `functionDeclarations`)
- Tool result injection as `functionResponse` parts
- LS bypass when custom tools are active (response captured directly from MITM)
The MITM proxy matches intercepted API traffic to cascade IDs heuristically.
Currently all intercepted usage is stored under `_latest` because the Google
SSE request body is empty (`content_length=0` — the LS sends the request body
via chunked encoding that isn't captured in the hint extractor).
### ~~Cascade Correlation Is Heuristic~~
**Impact:** Usage shows up in `/v1/usage` aggregate stats but isn't correlated
to specific cascades. Not blocking — aggregate usage is the primary use case.
**Status: SOLVED (2026-02-15)**
---
Previously, MITM usage was keyed under `_latest` because `extract_cascade_hint()`
couldn't parse the chunked-encoded Google SSE request body.
### 2. Request Modification Not Implemented
**Fix:** API handlers now call `mitm_store.set_active_cascade(cascade_id)` before
sending messages. `record_usage()` falls back to this active cascade ID when the
heuristic hint is absent, properly correlating usage to cascades.
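The fallback order described in this fix can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the `Usage` struct, its fields, the `get_active_cascade`, `clear_active_cascade`, and `usage_for` names, the `record_usage` signature, and the final fallback to `_latest` when neither a hint nor an active cascade exists are all assumptions made for the example.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical usage record; the real fields are not shown in this commit.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Usage {
    pub input_tokens: u64,
    pub output_tokens: u64,
}

/// Simplified stand-in for MitmStore, reduced to the parts relevant here.
#[derive(Default)]
pub struct MitmStore {
    active_cascade_id: Mutex<Option<String>>,
    usage: Mutex<HashMap<String, Usage>>,
}

impl MitmStore {
    /// Called by an API handler before it sends a message.
    pub fn set_active_cascade(&self, id: &str) {
        *self.active_cascade_id.lock().unwrap() = Some(id.to_string());
    }

    pub fn get_active_cascade(&self) -> Option<String> {
        self.active_cascade_id.lock().unwrap().clone()
    }

    pub fn clear_active_cascade(&self) {
        *self.active_cascade_id.lock().unwrap() = None;
    }

    /// Picks the key in this order: heuristic hint, active cascade, `_latest`
    /// (the last step is an assumption). Returns the key that was used.
    pub fn record_usage(&self, hint: Option<&str>, usage: Usage) -> String {
        let key = hint
            .map(str::to_string)
            .or_else(|| self.get_active_cascade())
            .unwrap_or_else(|| "_latest".to_string());
        let mut map = self.usage.lock().unwrap();
        let entry = map.entry(key.clone()).or_default();
        entry.input_tokens += usage.input_tokens;
        entry.output_tokens += usage.output_tokens;
        key
    }

    /// Accumulated usage for one cascade, if any was recorded.
    pub fn usage_for(&self, id: &str) -> Option<Usage> {
        self.usage.lock().unwrap().get(id).cloned()
    }
}
```

With this shape, a handler that calls `set_active_cascade("c1")` before sending gets its usage recorded under `c1` even when the hint extractor returns nothing, while an explicit hint still takes precedence.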
**File:** `src/mitm/proxy.rs`, `modify_requests: bool`
### ~~Progressive Thinking Streaming~~
The `MitmConfig.modify_requests` flag is plumbed through but hardcoded to `false`.
Reserved for future request mutation features (e.g., injecting custom system
prompts, modifying model selection).
**Status: SOLVED (2026-02-15)**
The MITM proxy now captures `thinking_text` from `StreamingAccumulator` into
`MitmStore` as SSE chunks arrive. The Responses API streaming handler reads
thinking deltas from MitmStore and emits `response.reasoning_summary_text.delta`
events in real-time. This works for both Google (`thought: true` parts) and
Anthropic (`thinking_delta`) formats.
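One way a streaming handler could turn the accumulated `thinking_text` into incremental deltas is to remember how many bytes it has already emitted. The sketch below is illustrative only: `ThinkingCursor` and `next_delta` are hypothetical names, and it assumes the accumulated text only ever grows by appending.

```rust
/// Hypothetical cursor that tracks how much thinking text was already emitted.
#[derive(Default)]
pub struct ThinkingCursor {
    /// Byte offset into the accumulated text that has been sent so far.
    sent: usize,
}

impl ThinkingCursor {
    /// Returns the not-yet-emitted suffix of `accumulated`, or `None` when
    /// there is nothing new. `str::get` keeps this panic-free if the offset
    /// is out of range or not on a character boundary.
    pub fn next_delta<'a>(&mut self, accumulated: &'a str) -> Option<&'a str> {
        let delta = accumulated.get(self.sent..)?;
        if delta.is_empty() {
            return None;
        }
        self.sent = accumulated.len();
        Some(delta)
    }
}
```

Each `Some(delta)` would then be the payload of one `response.reasoning_summary_text.delta` event; serialization of the event itself is left out here.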
---
## 🟢 Low
### 3. MITM Integration Tests
### 1. MITM Integration Tests
Unit tests cover protobuf decoding and intercept parsing (18 tests pass).
Integration tests for the full MITM pipeline (TLS interception, response
parsing, usage recording) would be valuable now that interception works.
### 4. MITM for Main Antigravity Session
### 2. MITM for Main Antigravity Session
The current MITM only works for the standalone LS (default mode).
Intercepting the main Antigravity session's LS is harder because:
@@ -92,10 +99,3 @@ Intercepting the main Antigravity session's LS is harder because:
`HTTPS_PROXY` unless `detect_and_use_proxy` is ENABLED via init metadata
**Workaround:** Use standalone mode (default) for all proxy traffic.
### 5. Progressive Thinking Streaming
For extended-thinking models (Opus), thinking text may arrive progressively
across multiple reactive diffs. Currently thinking is captured atomically via
polling. Progressive streaming would require parsing reactive diff field numbers
to extract incremental thinking deltas. See `docs/panel-stream-investigation.md`.