- Subscribe to StreamCascadeReactiveUpdates for real-time cascade state diffs
- Fall back to timer-based polling if streaming RPC unavailable
- Remove StreamCascadePanelReactiveUpdates code (dead end, only has plan_status/user_settings)
- Remove debug diff file-saving code
- Add stream_reactive_rpc() helper to backend
Streaming poll: 800-1200ms → 150-250ms (5x faster)
Sync poll: 1000-1800ms → 200-400ms (4x faster)
Verified via STEP_DUMP instrumentation that the LS updates
plannerResponse.response incrementally during GENERATING status,
so faster polling yields smoother progressive text delivery.
Also restructured streaming to emit reasoning events first
when thinking content is detected in LS steps before response text.
Adds proper streaming SSE events for reasoning content:
- response.output_item.added (reasoning)
- response.reasoning_summary_part.added
- response.reasoning_summary_text.delta
- response.reasoning_summary_text.done
- response.reasoning_summary_part.done
- response.output_item.done (reasoning)
These are emitted before the message events, matching the format
that OpenAI-compatible clients expect for displaying thinking content.
The LS makes two Google API calls for thinking models. Call 2 (thinking
summary) may not have arrived by the time usage_from_poll runs after
Call 1 (response). Now we peek first, and if thinking tokens exist but
text is missing, wait up to 1s for the merge to happen.
Also adds peek_usage method to MitmStore for non-consuming reads.
The LS strips thinking/reasoning text from plannerResponse steps —
only the thinkingSignature (opaque verification blob) is preserved.
The actual thinking text flows through the MITM proxy in the raw
Google SSE response (parts with thought: true) and Anthropic SSE
(thinking_delta content blocks).
Changes:
- StreamingAccumulator now accumulates thinking text from SSE events
- ApiUsage gains thinking_text: Option<String>
- usage_from_poll returns (Usage, Option<thinking_text>)
- Thinking text priority: MITM-captured > LS-extracted (fallback)
- Reasoning output item now populated from real API data
- Removed debug dump code
Thinking content was previously returned as non-standard top-level
fields (thinking, thinking_duration). Now follows the official OpenAI
Responses API format:
- Reasoning appears as a 'type: reasoning' item in the output array
with summary[].text containing the thinking content
- Message item follows after the reasoning item
- thinking_signature kept as proxy extension (internal multi-turn data)
- Removed ResponseOutput/OutputContent structs in favor of
serde_json::Value for polymorphic output items
When the MITM can't extract a cascade ID from the intercepted request
(Content-Length: 0 / chunked encoding), usage is stored under '_latest'.
Now usage_from_poll and completions try the exact cascade_id first,
then fall back to '_latest' so MITM-captured tokens are actually used.