# Sync All Endpoints + Latency + Thinking Streaming

## Phase 1: Sync Responses API (`/v1/responses`) with LS bypass

Current state:

- `handle_responses_stream` (lines 529-859) polls LS steps for text
- Doesn't use MitmStore bypass at all
- Still suffers from LS multi-turn overhead when tools are active

Fix:

- Add a MITM bypass path (same as completions): check MitmStore for text and function calls
- For function calls: emit `response.output_item.added` (function_call type) followed by the matching done events
- For text: stream from MitmStore `captured_response_text` until `response_complete` is set
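The bypass path above could look roughly like the following. This is a minimal sketch, not the handler's actual code: `MitmSnapshot`, `StreamCursor`, `bypass_events`, and `is_finished` are hypothetical names, function calls are reduced to their names, and the choice to emit function calls before any text is an assumption. The event names `response.output_item.done` and `response.output_text.delta` come from the Responses streaming format rather than from this plan.

```rust
/// Hypothetical snapshot of what the MITM layer has captured for one request.
#[derive(Default)]
struct MitmSnapshot {
    captured_response_text: String,
    response_complete: bool,
    function_calls: Vec<String>, // function names only, for brevity
}

/// Tracks how many bytes of captured text were already sent to the client.
#[derive(Default)]
struct StreamCursor {
    sent_bytes: usize,
}

/// Returns (event name, payload) pairs to emit for one poll tick.
fn bypass_events(snap: &MitmSnapshot, cursor: &mut StreamCursor) -> Vec<(String, String)> {
    let mut events = Vec::new();

    // Assumption: when function calls are present they are emitted instead of text.
    if !snap.function_calls.is_empty() {
        for name in &snap.function_calls {
            events.push(("response.output_item.added".to_string(), name.clone()));
            events.push(("response.output_item.done".to_string(), name.clone()));
        }
        return events;
    }

    // Emit only the unsent suffix. This assumes the captured text is append-only,
    // so `sent_bytes` always falls on a UTF-8 character boundary.
    if snap.captured_response_text.len() > cursor.sent_bytes {
        let delta = &snap.captured_response_text[cursor.sent_bytes..];
        events.push(("response.output_text.delta".to_string(), delta.to_string()));
        cursor.sent_bytes = snap.captured_response_text.len();
    }
    events
}

/// The stream can close once the upstream response is complete and fully flushed.
fn is_finished(snap: &MitmSnapshot, cursor: &StreamCursor) -> bool {
    snap.response_complete && cursor.sent_bytes == snap.captured_response_text.len()
}
```

Keeping the cursor outside the snapshot means the store never needs to know how many clients are reading or how far each one has progressed.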

## Phase 2: Sync Gemini endpoint (`/v1/gemini`) with LS bypass

Current state:

- `handle_gemini` (lines 57-236) uses `poll_for_response` then checks MitmStore
- Already checks `take_any_function_calls()` after polling
- But `poll_for_response` still goes through LS steps

Fix:

- When tools are active, poll MitmStore directly instead of `poll_for_response`
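As a rough illustration of polling the store directly, the sketch below loops on a stand-in store until function calls appear or a deadline passes. `FakeMitmStore`, `push_call`, and `poll_store_directly` are hypothetical; only the method name `take_any_function_calls` is taken from the plan, and its real signature may differ. The sketch uses blocking `std::thread::sleep` to stay dependency-free, whereas the real handler is presumably async.

```rust
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::{Duration, Instant};

/// Stand-in for the real MitmStore; it only models pending function calls.
#[derive(Clone, Default)]
struct FakeMitmStore {
    calls: Arc<Mutex<Vec<String>>>,
}

impl FakeMitmStore {
    /// Simulates the MITM intercept recording a captured function call.
    fn push_call(&self, name: &str) {
        self.calls.lock().unwrap().push(name.to_string());
    }

    /// Drains and returns any captured calls (name mirrors the plan above).
    fn take_any_function_calls(&self) -> Vec<String> {
        std::mem::take(&mut *self.calls.lock().unwrap())
    }
}

/// Polls the store directly, skipping the LS step polling entirely.
/// Returns `None` if nothing was captured before `timeout` elapsed.
fn poll_store_directly(
    store: &FakeMitmStore,
    interval: Duration,
    timeout: Duration,
) -> Option<Vec<String>> {
    let deadline = Instant::now() + timeout;
    loop {
        let calls = store.take_any_function_calls();
        if !calls.is_empty() {
            return Some(calls);
        }
        if Instant::now() >= deadline {
            return None;
        }
        thread::sleep(interval);
    }
}
```

The `interval` parameter is also where the Phase 3 poll-interval reduction would apply in a loop of this shape.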

## Phase 3: Latency improvements

- Reduce poll intervals across all handlers
- Add `thinking_text` capture to MitmStore for real-time streaming

## Phase 4: Real-time thinking streaming investigation

Current state:

- Google SSE includes `thought: true` parts with thinking text
- `streaming_acc.thinking_text` accumulates this
- Currently only used for final usage stats, not streamed in real time

Investigation needed:

- The MITM intercept already captures `thinking_text` per chunk
- Need to store `thinking_text` updates in MitmStore incrementally
- The Responses handler can then stream thinking deltas in real time
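One possible shape for the incremental storage is sketched below, assuming an append-only shared buffer that the intercept writes to and the handler reads from. `ThinkingBuffer`, `append`, and `delta_since` are hypothetical names, not existing MitmStore methods.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical append-only buffer shared between the intercept and a handler.
#[derive(Clone, Default)]
struct ThinkingBuffer {
    text: Arc<Mutex<String>>,
}

impl ThinkingBuffer {
    /// Called by the intercept for each SSE part flagged `thought: true`.
    fn append(&self, chunk: &str) {
        self.text.lock().unwrap().push_str(chunk);
    }

    /// Returns the text appended since `cursor` and advances the cursor.
    /// Returns `None` when there is nothing new to stream.
    fn delta_since(&self, cursor: &mut usize) -> Option<String> {
        let text = self.text.lock().unwrap();
        if text.len() <= *cursor {
            return None;
        }
        // Safe to slice: the cursor is always a previous length of an
        // append-only string, so it sits on a UTF-8 character boundary.
        let delta = text[*cursor..].to_string();
        *cursor = text.len();
        Some(delta)
    }
}
```

On each poll tick the handler would call `delta_since` with its own cursor and forward any returned text as a thinking delta. Several chunks that arrive between ticks are merged into a single delta.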