# Sync All Endpoints + Latency + Thinking Streaming

## Phase 1: Sync Responses API (`/v1/responses`) with LS bypass

Current state:
- `handle_responses_stream` (lines 529-859) polls LS steps for text
- Doesn't use MitmStore bypass at all
- Still suffers from LS multi-turn overhead when tools are active

Fix:
- Add MITM bypass path (same as completions): check MitmStore for text + function calls
- For function calls: emit `response.output_item.added` (function_call type) + done events
- For text: stream from MitmStore `captured_response_text` + `response_complete`

## Phase 2: Sync Gemini endpoint (`/v1/gemini`) with LS bypass

Current state:
- `handle_gemini` (lines 57-236) uses `poll_for_response` then checks MitmStore
- Already checks `take_any_function_calls()` after polling
- But `poll_for_response` still goes through LS steps

Fix:
- When tools are active, poll MitmStore directly instead of `poll_for_response`

## Phase 3: Latency improvements

- Reduce poll intervals across all handlers
- Add MITM store thinking_text capture for real-time streaming

## Phase 4: Real-time thinking streaming investigation

Current state:
- Google SSE includes `thought: true` parts with thinking text
- `streaming_acc.thinking_text` accumulates this
- Currently only used for final usage stats, not streamed in real-time

Investigation needed:
- The MITM intercept already captures thinking_text per-chunk
- Need to store thinking_text updates in MitmStore incrementally
- Responses handler can then stream thinking deltas in real-time
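
One possible shape for the incremental thinking_text storage described in Phase 4 is sketched below. This is only an illustration under stated assumptions, not the existing MitmStore code: it assumes the project is written in Rust, and `ThinkingStore`, `append_thinking`, `mark_complete`, `is_complete`, and `take_delta` are hypothetical names. The intercept side appends each `thought: true` chunk as it arrives, and a handler keeps its own cursor so that every poll returns only the text it has not yet streamed.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the thinking-text portion of MitmStore.
#[derive(Default)]
struct ThinkingBuffer {
    /// All thinking text captured so far for the current response.
    text: String,
    /// Set once the upstream response has finished.
    complete: bool,
}

/// Cheaply cloneable handle shared by the intercept and the handlers.
#[derive(Clone, Default)]
struct ThinkingStore {
    inner: Arc<Mutex<ThinkingBuffer>>,
}

impl ThinkingStore {
    /// Intercept side: append one captured thinking chunk.
    fn append_thinking(&self, chunk: &str) {
        let mut buf = self.inner.lock().unwrap();
        buf.text.push_str(chunk);
    }

    /// Intercept side: signal that no more thinking text will arrive.
    fn mark_complete(&self) {
        self.inner.lock().unwrap().complete = true;
    }

    /// Handler side: whether the upstream response has finished.
    fn is_complete(&self) -> bool {
        self.inner.lock().unwrap().complete
    }

    /// Handler side: return the text appended since `cursor` and advance it.
    /// `cursor` is a byte offset that always equals an earlier `text.len()`,
    /// so it sits on a UTF-8 character boundary and slicing cannot panic.
    fn take_delta(&self, cursor: &mut usize) -> Option<String> {
        let buf = self.inner.lock().unwrap();
        if *cursor >= buf.text.len() {
            return None;
        }
        let delta = buf.text[*cursor..].to_string();
        *cursor = buf.text.len();
        Some(delta)
    }
}
```

In this sketch a handler's poll loop would call `take_delta` on each tick, emit any returned string as a thinking delta, and stop once `is_complete()` is true and `take_delta` returns `None`. Keeping the cursor on the handler side rather than in the store means several readers can stream the same response without consuming each other's text.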