feat: sync all endpoints with MITM LS bypass + real-time thinking streaming
- Responses API (streaming): MITM bypass path polls MitmStore directly when custom tools are active, skipping LS step polling entirely. Streams thinking text deltas in real-time as they arrive from the MITM. Handles function calls, text response, and thinking/reasoning events.
- Responses API (sync): Same MITM bypass for non-streaming responses. Polls MitmStore for function calls or completed text before falling back to LS path.
- Gemini endpoint: MITM bypass polls MitmStore directly for tool call responses, eliminating LS overhead.
- MitmStore: Added captured_thinking_text field with set/peek/take methods for real-time thinking text capture from MITM SSE.
- MITM proxy: Now captures both thinking_text and response_text from StreamingAccumulator into MitmStore when bypass mode is active.
46
.gemini/plans/sync-and-latency.md
Normal file
@@ -0,0 +1,46 @@
# Sync All Endpoints + Latency + Thinking Streaming
## Phase 1: Sync Responses API (`/v1/responses`) with LS bypass
Current state:
- `handle_responses_stream` (lines 529-859) polls LS steps for text
- Doesn't use MitmStore bypass at all
- Still suffers from LS multi-turn overhead when tools are active
Fix:
- Add MITM bypass path (same as completions) — check MitmStore for text + function calls
- For function calls: emit `response.output_item.added` (function_call type) + done events
- For text: stream from MitmStore `captured_response_text` + `response_complete`
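
The event sequence described in the fix above could be sketched roughly as follows. This is a minimal illustration, not the handler's actual code: the `Captured` enum and `events_for` function are hypothetical names, and the events beyond `response.output_item.added` are assumed from the public Responses API streaming event names.

```rust
/// Hypothetical snapshot of what the bypass path might read from MitmStore.
/// (The real store fields and types are not shown in this plan.)
#[derive(Debug, Clone, PartialEq)]
pub enum Captured {
    /// A tool call captured from the MITM stream.
    FunctionCall { name: String, arguments: String },
    /// New response text, plus whether `response_complete` was set.
    Text { delta: String, complete: bool },
}

/// Returns, in order, the SSE event names a handler might emit for one snapshot.
pub fn events_for(captured: &Captured) -> Vec<&'static str> {
    let mut events = Vec::new();
    match captured {
        Captured::FunctionCall { .. } => {
            // Function calls: announce the item, finish its arguments, close it.
            events.push("response.output_item.added");
            events.push("response.function_call_arguments.done");
            events.push("response.output_item.done");
        }
        Captured::Text { delta, complete } => {
            // Only emit a delta when there is new text to send.
            if !delta.is_empty() {
                events.push("response.output_text.delta");
            }
            // Once the store reports completion, close out the response.
            if *complete {
                events.push("response.output_text.done");
                events.push("response.completed");
            }
        }
    }
    events
}
```

Keeping the mapping in a pure function like this would make the event ordering testable without a running proxy; the actual handler presumably builds full JSON payloads rather than bare event names.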
## Phase 2: Sync Gemini endpoint (`/v1/gemini`) with LS bypass
Current state:
- `handle_gemini` (lines 57-236) uses `poll_for_response` then checks MitmStore
- Already checks `take_any_function_calls()` after polling
- But `poll_for_response` still goes through LS steps
Fix:
- When tools are active, poll MitmStore directly instead of `poll_for_response`
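
Polling the store directly might look something like the sketch below. It assumes a synchronous, std-only setting for brevity (the real handlers are likely async), and `poll_until` is a hypothetical helper; the closure stands in for a call such as `take_any_function_calls()`.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical helper: call `take` every `interval` until it yields a value
/// or `timeout` elapses. Returns `None` on timeout so the caller can fall
/// back to the slower LS path.
pub fn poll_until<T>(
    mut take: impl FnMut() -> Option<T>,
    interval: Duration,
    timeout: Duration,
) -> Option<T> {
    let deadline = Instant::now() + timeout;
    loop {
        // Check the store first so an already-captured value returns at once.
        if let Some(value) = take() {
            return Some(value);
        }
        if Instant::now() >= deadline {
            return None;
        }
        sleep(interval);
    }
}
```

A caller would pass a closure that drains the store, for example `poll_until(|| store.take_any_function_calls(), interval, timeout)`, and run `poll_for_response` only when the result is `None`. The interval argument is also where a poll-interval reduction would be applied.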
## Phase 3: Latency improvements
- Reduce poll intervals across all handlers
- Add MITM store thinking_text capture for real-time streaming
## Phase 4: Real-time thinking streaming investigation
Current state:
- Google SSE includes `thought: true` parts with thinking text
- `streaming_acc.thinking_text` accumulates this
- Currently only used for final usage stats, not streamed in real-time
Investigation needed:
- The MITM intercept already captures thinking_text per-chunk
- Need to store thinking_text updates in MitmStore incrementally
- Responses handler can then stream thinking deltas in real-time
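
One way the incremental capture could work is sketched below, assuming the proxy overwrites the stored text with the full accumulated thinking text on each chunk and the handler remembers how many bytes it has already streamed. `ThinkingStore`, its method names, and `thinking_delta` are hypothetical stand-ins; the real MitmStore likely uses async locks rather than `std::sync::Mutex`.

```rust
use std::sync::Mutex;

/// Hypothetical slice of MitmStore holding only the thinking text.
#[derive(Default)]
pub struct ThinkingStore {
    captured_thinking_text: Mutex<Option<String>>,
}

impl ThinkingStore {
    /// Overwrite the stored text with the latest accumulated thinking text.
    pub fn set_thinking_text(&self, text: &str) {
        *self.captured_thinking_text.lock().unwrap() = Some(text.to_string());
    }

    /// Read the current text without clearing it (used while streaming).
    pub fn peek_thinking_text(&self) -> Option<String> {
        self.captured_thinking_text.lock().unwrap().clone()
    }

    /// Read and clear the text (used once the response is finished).
    pub fn take_thinking_text(&self) -> Option<String> {
        self.captured_thinking_text.lock().unwrap().take()
    }
}

/// Returns the part of `full` that has not been streamed yet, given the number
/// of bytes already sent. Yields "" if the offset is past the end or does not
/// fall on a UTF-8 character boundary.
pub fn thinking_delta(full: &str, sent_bytes: usize) -> &str {
    full.get(sent_bytes..).unwrap_or("")
}
```

On each poll the handler would call `peek_thinking_text`, emit `thinking_delta(&text, sent)` if it is non-empty, and advance `sent` to `text.len()`. Tracking a byte offset on the reader side keeps the store itself simple, at the cost of cloning the full text on every peek.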