zerogravity/.gemini/plans/sync-and-latency.md
Nikketryhard b3af73cebd feat: sync all endpoints with MITM LS bypass + real-time thinking streaming
- Responses API (streaming): MITM bypass path polls MitmStore directly
  when custom tools are active, skipping LS step polling entirely.
  Streams thinking text deltas in real-time as they arrive from the MITM.
  Handles function calls, text response, and thinking/reasoning events.

- Responses API (sync): Same MITM bypass for non-streaming responses.
  Polls MitmStore for function calls or completed text before falling
  back to LS path.

- Gemini endpoint: MITM bypass polls MitmStore directly for tool call
  responses, eliminating LS overhead.

- MitmStore: Added captured_thinking_text field with set/peek/take methods
  for real-time thinking text capture from MITM SSE.

- MITM proxy: Now captures both thinking_text and response_text from
  StreamingAccumulator into MitmStore when bypass mode is active.
2026-02-15 01:03:39 -06:00


Sync All Endpoints + Latency + Thinking Streaming

Phase 1: Sync Responses API (/v1/responses) with LS bypass

Current state:

  • handle_responses_stream (lines 529-859) polls LS steps for text
  • Doesn't use MitmStore bypass at all
  • Still suffers from LS multi-turn overhead when tools are active

Fix:

  • Add a MITM bypass path (same as completions): check MitmStore for text and function calls
  • For function calls: emit response.output_item.added (function_call type) + done events
  • For text: stream from MitmStore captured_response_text + response_complete
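
As a rough sketch of the event mapping described in the fix above: the types `Captured` and `ResponseEvent` and the function `events_for` are hypothetical names introduced only for illustration, not the project's actual API. The event name `response.output_item.added` comes from the plan; the other event names are assumed from the OpenAI Responses API streaming format, and payload serialization is left out.

```rust
/// What the bypass path found in the store (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub enum Captured {
    FunctionCall { name: String, arguments: String },
    Text { text: String, complete: bool },
}

/// Events the Responses stream handler would emit (illustrative subset).
#[derive(Debug, Clone, PartialEq)]
pub enum ResponseEvent {
    OutputItemAdded { name: String },
    OutputItemDone { name: String, arguments: String },
    OutputTextDelta { delta: String },
    Completed,
}

impl ResponseEvent {
    /// SSE event name. Only the first is named in the plan; the rest are assumed.
    pub fn name(&self) -> &'static str {
        match self {
            ResponseEvent::OutputItemAdded { .. } => "response.output_item.added",
            ResponseEvent::OutputItemDone { .. } => "response.output_item.done",
            ResponseEvent::OutputTextDelta { .. } => "response.output_text.delta",
            ResponseEvent::Completed => "response.completed",
        }
    }
}

/// Map one captured item to the events to emit, in order.
pub fn events_for(captured: &Captured) -> Vec<ResponseEvent> {
    match captured {
        // A function call produces an "added" event followed by a "done" event.
        Captured::FunctionCall { name, arguments } => vec![
            ResponseEvent::OutputItemAdded { name: name.clone() },
            ResponseEvent::OutputItemDone {
                name: name.clone(),
                arguments: arguments.clone(),
            },
        ],
        // Text is streamed as a delta; completion closes the response.
        Captured::Text { text, complete } => {
            let mut out = Vec::new();
            if !text.is_empty() {
                out.push(ResponseEvent::OutputTextDelta { delta: text.clone() });
            }
            if *complete {
                out.push(ResponseEvent::Completed);
            }
            out
        }
    }
}
```

Keeping the mapping as a pure function, separate from the polling loop, would make it easy to test without a running proxy.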

Phase 2: Sync Gemini endpoint (/v1/gemini) with LS bypass

Current state:

  • handle_gemini (lines 57-236) uses poll_for_response then checks MitmStore
  • Already checks take_any_function_calls() after polling
  • But poll_for_response still goes through LS steps

Fix:

  • When tools are active, poll MitmStore directly instead of poll_for_response
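
A minimal sketch of polling the store directly, assuming a blocking loop for simplicity (the real handlers are presumably async). `poll_until` is a hypothetical helper; in practice the `check` closure would wrap a store call such as `take_any_function_calls()`.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Call `check` repeatedly until it yields a value or `timeout` elapses.
/// Returns `None` on timeout so the caller can fall back to another path.
pub fn poll_until<T>(
    mut check: impl FnMut() -> Option<T>,
    interval: Duration,
    timeout: Duration,
) -> Option<T> {
    let deadline = Instant::now() + timeout;
    loop {
        // Check first so an already-captured result returns without sleeping.
        if let Some(value) = check() {
            return Some(value);
        }
        if Instant::now() >= deadline {
            return None;
        }
        sleep(interval);
    }
}
```

A `None` result leaves room for the handler to fall back to the existing `poll_for_response` path when the store never receives a tool call.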

Phase 3: Latency improvements

  • Reduce poll intervals across all handlers
  • Add thinking_text capture to MitmStore for real-time streaming

Phase 4: Real-time thinking streaming investigation

Current state:

  • Google SSE includes thought: true parts with thinking text
  • streaming_acc.thinking_text accumulates this
  • Currently only used for final usage stats, not streamed in real-time
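
To illustrate the accumulation described above, here is a simplified sketch. The `Part` struct and the `push_part` method are assumptions made for the example; only the `thought` flag and the `thinking_text` / `response_text` field names come from the source, and JSON parsing of the SSE chunks is omitted.

```rust
/// One content part from a streamed chunk; `thought` mirrors the
/// `thought: true` flag on thinking parts.
pub struct Part {
    pub text: String,
    pub thought: bool,
}

/// Simplified stand-in for the proxy's streaming accumulator.
#[derive(Default)]
pub struct StreamingAccumulator {
    pub thinking_text: String,
    pub response_text: String,
}

impl StreamingAccumulator {
    /// Route a part's text to the thinking or response buffer.
    pub fn push_part(&mut self, part: &Part) {
        if part.thought {
            self.thinking_text.push_str(&part.text);
        } else {
            self.response_text.push_str(&part.text);
        }
    }
}
```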

Investigation needed:

  • The MITM intercept already captures thinking_text per-chunk
  • Need to store thinking_text updates in MitmStore incrementally
  • Responses handler can then stream thinking deltas in real-time
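
One possible shape for the incremental approach above, offered as a sketch rather than the actual implementation: the MITM side overwrites a shared snapshot of the accumulated thinking text, and the handler keeps a cursor so it emits only the unsent suffix. The `MitmStore` here is a reduced stand-in; the method names and `ThinkingCursor` are hypothetical, and the sketch assumes the thinking text is append-only.

```rust
use std::sync::Mutex;

/// Reduced, hypothetical stand-in for the project's MitmStore.
#[derive(Default)]
pub struct MitmStore {
    captured_thinking_text: Mutex<Option<String>>,
}

impl MitmStore {
    /// Overwrite the snapshot with the latest accumulated thinking text.
    pub fn set_thinking_text(&self, text: String) {
        *self.captured_thinking_text.lock().unwrap() = Some(text);
    }

    /// Read the snapshot without consuming it.
    pub fn peek_thinking_text(&self) -> Option<String> {
        self.captured_thinking_text.lock().unwrap().clone()
    }

    /// Remove and return the snapshot, leaving the slot empty.
    pub fn take_thinking_text(&self) -> Option<String> {
        self.captured_thinking_text.lock().unwrap().take()
    }
}

/// Tracks how many bytes of the thinking text were already streamed.
#[derive(Default)]
pub struct ThinkingCursor {
    sent_bytes: usize,
}

impl ThinkingCursor {
    /// Given the latest full snapshot, return only the unsent suffix.
    /// Assumes the snapshot only ever grows by appending.
    pub fn next_delta<'a>(&mut self, snapshot: &'a str) -> Option<&'a str> {
        if snapshot.len() <= self.sent_bytes {
            return None;
        }
        // `get` returns None rather than panicking on a bad char boundary.
        let delta = snapshot.get(self.sent_bytes..)?;
        self.sent_bytes = snapshot.len();
        Some(delta)
    }
}
```

The handler would call `peek_thinking_text()` on each poll tick and pass the result to `next_delta`, using `take_thinking_text()` only once the response is complete.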