diff --git a/KNOWN_ISSUES.md b/KNOWN_ISSUES.md
index 08f59b9..24c2c64 100644
--- a/KNOWN_ISSUES.md
+++ b/KNOWN_ISSUES.md
@@ -2,6 +2,8 @@
 
 All critical blockers have been resolved. Standalone LS with MITM interception
 is fully working. Reactive streaming is implemented with polling fallback.
+All three API endpoints (Responses, Completions, Gemini) now bypass the LS
+when custom tools are active, reading directly from MitmStore.
 
 ---
 
@@ -45,43 +47,48 @@ content (see `docs/panel-stream-investigation.md`).
 thinking text. The panel reactive component uses a workspace-scoped ID, not
 cascade IDs. See `docs/panel-stream-investigation.md`.
 
----
+### ~~Request Modification Not Implemented~~
 
-## 🟡 Medium (Architecture / Future Work)
+**Status: SOLVED (2026-02-15)**
 
-### 1. Cascade Correlation Is Heuristic
+`MitmConfig.modify_requests` is now `true` by default. Used for:
 
-**File:** `src/mitm/intercept.rs` — `extract_cascade_hint()`
+- Tool/function call injection into LS requests (Gemini `functionDeclarations`)
+- Tool result injection as `functionResponse` parts
+- LS bypass when custom tools are active (response captured directly from MITM)
 
-The MITM proxy matches intercepted API traffic to cascade IDs heuristically.
-Currently all intercepted usage is stored under `_latest` because the Google
-SSE request body is empty (`content_length=0` — the LS sends the request body
-via chunked encoding that isn't captured in the hint extractor).
+### ~~Cascade Correlation Is Heuristic~~
 
-**Impact:** Usage shows up in `/v1/usage` aggregate stats but isn't correlated
-to specific cascades. Not blocking — aggregate usage is the primary use case.
+**Status: SOLVED (2026-02-15)**
 
----
+Previously, MITM usage was keyed under `_latest` because `extract_cascade_hint()`
+couldn't parse the chunked-encoded Google SSE request body.
 
-### 2. Request Modification Not Implemented
+**Fix:** API handlers now call `mitm_store.set_active_cascade(&cascade_id)` before
+sending a message. `record_usage()` falls back to this active cascade ID when the
+heuristic hint is absent. The store holds a single active ID (last writer wins), so with overlapping requests this fallback can still attribute usage to the wrong cascade.
 
-**File:** `src/mitm/proxy.rs` — `modify_requests: bool`
+### ~~Progressive Thinking Streaming~~
 
-The `MitmConfig.modify_requests` flag is plumbed through but hardcoded to `false`.
-Reserved for future request mutation features (e.g., injecting custom system
-prompts, modifying model selection).
+**Status: SOLVED (2026-02-15)**
+
+The MITM proxy now captures `thinking_text` from `StreamingAccumulator` into
+`MitmStore` as SSE chunks arrive. The Responses API streaming handler reads
+thinking deltas from MitmStore and emits `response.reasoning_summary_text.delta`
+events in real time. This works for both Google (`thought: true` parts) and
+Anthropic (`thinking_delta`) formats.
 
 ---
 
 ## 🟢 Low
 
-### 3. MITM Integration Tests
+### 1. MITM Integration Tests
 
 Unit tests cover protobuf decoding and intercept parsing (18 tests pass).
 Integration tests for the full MITM pipeline (TLS interception, response
 parsing, usage recording) would be valuable now that interception works.
 
-### 4. MITM for Main Antigravity Session
+### 2. MITM for Main Antigravity Session
 
 The current MITM only works for the standalone LS (default mode).
 Intercepting the main Antigravity session's LS is harder because:
@@ -92,10 +99,3 @@ Intercepting the main Antigravity session's LS is harder because:
   `HTTPS_PROXY` unless `detect_and_use_proxy` is ENABLED via init metadata
 
 **Workaround:** Use standalone mode (default) for all proxy traffic.
-
-### 5. Progressive Thinking Streaming
-
-For extended-thinking models (Opus), thinking text may arrive progressively
-across multiple reactive diffs. Currently thinking is captured atomically via
-polling. Progressive streaming would require parsing reactive diff field numbers
-to extract incremental thinking deltas. See `docs/panel-stream-investigation.md`.
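The Progressive Thinking Streaming entry in the KNOWN_ISSUES.md change above says the Responses handler reads thinking deltas from `MitmStore` as SSE chunks arrive. This diff does not show how each delta is derived. The following is a minimal sketch, assuming the store exposes the accumulated `thinking_text` as one append-only string that the handler re-reads on each tick and diffs against what it has already sent. `ThinkingCursor` and `next_delta` are hypothetical names. The store's existing `take_thinking_text()` removes the text rather than peeking at it, so the real handler may work differently.

```rust
/// Hypothetical cursor over the accumulated thinking text held in the store.
/// Each call returns only the part not emitted yet, which a handler would wrap
/// in a `response.reasoning_summary_text.delta` event.
#[derive(Default)]
pub struct ThinkingCursor {
    /// Byte length of the accumulated text already emitted.
    emitted: usize,
}

impl ThinkingCursor {
    pub fn new() -> Self {
        Self::default()
    }

    /// Returns the new suffix of `accumulated`, or `None` if nothing new has
    /// arrived. Assumes the buffer only grows by appending, so the previously
    /// emitted length is always a valid char boundary.
    pub fn next_delta(&mut self, accumulated: &str) -> Option<String> {
        // `str::get` yields `None` (instead of panicking) if `emitted` is past
        // the end or not on a char boundary, e.g. after the buffer was reset.
        let delta = accumulated.get(self.emitted..)?;
        if delta.is_empty() {
            return None;
        }
        self.emitted = accumulated.len();
        Some(delta.to_string())
    }
}
```

Because the cursor only remembers a byte offset, a buffer reused across requests could yield a misleading suffix. Creating a fresh cursor per request avoids that.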
diff --git a/src/api/completions.rs b/src/api/completions.rs
index 3d4bfa1..5669790 100644
--- a/src/api/completions.rs
+++ b/src/api/completions.rs
@@ -207,6 +207,7 @@ pub(crate) async fn handle_completions(
     };
 
     // Send message
+    state.mitm_store.set_active_cascade(&cascade_id).await;
     match state
         .backend
         .send_message(&cascade_id, &user_text, model.model_enum)
diff --git a/src/api/gemini.rs b/src/api/gemini.rs
index 48ebd3a..04de4ba 100644
--- a/src/api/gemini.rs
+++ b/src/api/gemini.rs
@@ -155,6 +155,7 @@ pub(crate) async fn handle_gemini(
     };
 
     // Send message
+    state.mitm_store.set_active_cascade(&cascade_id).await;
     match state
         .backend
         .send_message(&cascade_id, &user_text, model.model_enum)
diff --git a/src/api/responses.rs b/src/api/responses.rs
index 24b7af4..ed229d6 100644
--- a/src/api/responses.rs
+++ b/src/api/responses.rs
@@ -278,6 +278,7 @@ pub(crate) async fn handle_responses(
     };
 
     // Send message
+    state.mitm_store.set_active_cascade(&cascade_id).await;
     match state
         .backend
         .send_message(&cascade_id, &user_text, model.model_enum)
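According to the KNOWN_ISSUES.md summary, when custom tools are active all three handlers read the response directly from `MitmStore` instead of polling LS steps. The store keeps this in `captured_response_text` and a `response_complete` flag. The following is a minimal sketch of that hand-off, assuming the MITM side writes the text and then sets the flag, while the handler polls the flag with a timeout. `CapturedResponse`, `finish` and `wait_and_take` are hypothetical names. The real store uses an async `RwLock` (awaited with `.read().await`); `std::sync` types keep the sketch self-contained.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, RwLock};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical std-only stand-in for the store fields used by the LS bypass.
#[derive(Clone, Default)]
pub struct CapturedResponse {
    text: Arc<RwLock<Option<String>>>,
    complete: Arc<AtomicBool>,
}

impl CapturedResponse {
    /// MITM side: store the final text, then mark the response complete.
    pub fn finish(&self, text: &str) {
        *self.text.write().unwrap() = Some(text.to_string());
        // Release pairs with the Acquire load in `wait_and_take`, so a handler
        // that sees `true` also sees the text written above.
        self.complete.store(true, Ordering::Release);
    }

    /// Handler side: poll until the flag is set or `timeout` elapses,
    /// then take (and clear) the captured text.
    pub fn wait_and_take(&self, timeout: Duration, poll: Duration) -> Option<String> {
        let deadline = Instant::now() + timeout;
        while !self.complete.load(Ordering::Acquire) {
            if Instant::now() >= deadline {
                return None;
            }
            thread::sleep(poll);
        }
        // Reset the flag for the next request before taking the text.
        self.complete.store(false, Ordering::Release);
        self.text.write().unwrap().take()
    }
}
```

Polling with a timeout keeps a stalled upstream stream from hanging the handler. A notification primitive could replace the sleep loop at the cost of more moving parts.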
diff --git a/src/mitm/store.rs b/src/mitm/store.rs
index a4dcaa4..8acb736 100644
--- a/src/mitm/store.rs
+++ b/src/mitm/store.rs
@@ -89,6 +89,11 @@ pub struct MitmStore {
     /// Last captured function calls (for conversation history rewriting).
     last_function_calls: Arc<RwLock<Vec<CapturedFunctionCall>>>,
 
+    // ── Cascade correlation ──────────────────────────────────────────────
+    /// Active cascade ID set by the API layer before sending a message.
+    /// Used by the MITM proxy to correlate intercepted traffic to cascades.
+    active_cascade_id: Arc<RwLock<Option<String>>>,
+
     // ── Direct response capture (bypasses LS) ────────────────────────────
     /// Captured response text from MITM when custom tools are active.
     /// The completions/responses handler reads this instead of polling LS steps.
@@ -135,6 +140,7 @@ impl MitmStore {
             pending_tool_results: Arc::new(RwLock::new(Vec::new())),
             call_id_to_name: Arc::new(RwLock::new(HashMap::new())),
             last_function_calls: Arc::new(RwLock::new(Vec::new())),
+            active_cascade_id: Arc::new(RwLock::new(None)),
             captured_response_text: Arc::new(RwLock::new(None)),
             captured_thinking_text: Arc::new(RwLock::new(None)),
             response_complete: Arc::new(AtomicBool::new(false)),
@@ -186,7 +192,13 @@ impl MitmStore {
         //   Call 2: thinking summary text (thinking_output_tokens == 0, response_text has the summary)
         //
         // When Call 2 arrives, we merge its response_text as thinking_text into Call 1's usage.
-        let key = cascade_id.map(|s| s.to_string()).unwrap_or_else(|| "_latest".to_string());
+        let key = if let Some(cid) = cascade_id {
+            cid.to_string()
+        } else if let Some(active) = self.active_cascade_id.read().await.as_ref() {
+            active.clone()
+        } else {
+            "_latest".to_string()
+        };
         let mut latest = self.latest_usage.write().await;
 
         if let Some(existing) = latest.get_mut(&key) {
@@ -436,4 +448,22 @@ impl MitmStore {
     pub async fn take_thinking_text(&self) -> Option<String> {
         self.captured_thinking_text.write().await.take()
     }
+
+    // ── Cascade correlation ──────────────────────────────────────────────
+
+    /// Set the active cascade ID (called by API handlers before sending a message).
+    /// The MITM proxy will use this to correlate intercepted traffic.
+    pub async fn set_active_cascade(&self, cascade_id: &str) {
+        *self.active_cascade_id.write().await = Some(cascade_id.to_string());
+    }
+
+    /// Get the active cascade ID.
+    pub async fn get_active_cascade(&self) -> Option<String> {
+        self.active_cascade_id.read().await.clone()
+    }
+
+    /// Clear the active cascade ID (intended to be called once the response is complete).
+    pub async fn clear_active_cascade(&self) {
+        *self.active_cascade_id.write().await = None;
+    }
 }
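The key selection added to `record_usage()` above tries three sources in order: an explicit cascade hint, then the active cascade set by the API handler, then the `_latest` bucket. The following is a minimal std-only sketch of that order and of the single-slot behaviour. `ActiveCascade` and `usage_key` are hypothetical names, and `std::sync::RwLock` stands in for the store's async lock.

```rust
use std::sync::RwLock;

/// Hypothetical std-only model of the `active_cascade_id` slot in MitmStore.
#[derive(Default)]
pub struct ActiveCascade {
    slot: RwLock<Option<String>>,
}

impl ActiveCascade {
    /// Mirrors `set_active_cascade`: overwrites whatever was there before.
    pub fn set(&self, cascade_id: &str) {
        *self.slot.write().unwrap() = Some(cascade_id.to_string());
    }

    /// Mirrors `clear_active_cascade`.
    pub fn clear(&self) {
        *self.slot.write().unwrap() = None;
    }

    /// Mirrors the key selection in `record_usage()`: an explicit hint wins,
    /// then the active cascade, then the `_latest` bucket.
    pub fn usage_key(&self, hint: Option<&str>) -> String {
        if let Some(cid) = hint {
            return cid.to_string();
        }
        self.slot
            .read()
            .unwrap()
            .clone()
            .unwrap_or_else(|| "_latest".to_string())
    }
}
```

Because the slot holds one ID, the last `set` wins. If two requests overlap, usage without a hint is keyed to whichever cascade was set most recently. Nothing in this diff calls `clear_active_cascade()`, so after a response completes the slot keeps its last ID. Later unhinted traffic is therefore keyed to that cascade rather than falling back to `_latest`.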