feat: forward generation params via MITM + add usageMetadata to Gemini
- Add GenerationParams struct to MitmStore for temperature, top_p, top_k, max_output_tokens, stop_sequences, frequency/presence_penalty
- MITM modify_request injects params into request.generationConfig
- All 3 endpoints (Completions, Responses, Gemini) store client params
- Add usageMetadata to Gemini sync responses (promptTokenCount, candidatesTokenCount, totalTokenCount, thoughtsTokenCount)
- Add generation param fields to GeminiRequest (temperature, topP, etc.)
- Completions stream_options.include_usage emits final usage chunk
- Completions reasoning_tokens in completion_tokens_details
- Update endpoint gap analysis doc (all high-priority gaps resolved)
@@ -72,11 +72,27 @@ heuristic hint is absent, properly correlating usage to cascades.
**Status: SOLVED (2026-02-15)**

The MITM proxy now captures `thinking_text` from `StreamingAccumulator` into
`MitmStore` as SSE chunks arrive. The Responses API streaming handler reads
thinking deltas from `MitmStore` and emits `response.reasoning_summary_text.delta`
events in real time. This works for both the Google (`thought: true` parts) and
Anthropic (`thinking_delta`) formats.
Thinking text now streams progressively as delta events. The implementation:

1. **LS cascade steps**: `plannerResponse.thinking` (field 3) grows progressively
   as the LS receives data. For Opus 4.6, thinking text builds up word by word
   over ~1-2s. For Gemini Flash, thinking arrives in 1-2 larger chunks.
2. **Delta tracking**: `last_thinking_len` tracks the previously emitted length.
   Each poll compares the current thinking length against it and emits only the
   new characters as `response.reasoning_summary_text.delta` events.
3. **Lifecycle**: Structure events (`output_item.added`, `summary_part.added`)
   emit on first thinking appearance. `done` events emit when response text
   first appears (indicating that the thinking phase has completed).
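
The delta tracking and lifecycle steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual handler: `ThinkingTracker`, `Event`, and `poll` are hypothetical names, and the sketch assumes the thinking text only ever grows by appending and that `last_thinking_len` is measured in bytes.

```rust
/// Hypothetical event type; variants mirror the event names mentioned above.
#[derive(Debug, PartialEq)]
enum Event {
    OutputItemAdded,
    SummaryPartAdded,
    ReasoningDelta(String),
    ReasoningDone,
}

/// Hypothetical per-response state kept between polls.
#[derive(Default)]
struct ThinkingTracker {
    last_thinking_len: usize, // bytes of thinking text already emitted
    started: bool,            // structure events already sent
    done: bool,               // done event already sent
}

impl ThinkingTracker {
    /// Called once per poll with the full thinking text and response text seen so far.
    fn poll(&mut self, thinking: &str, response_text: &str) -> Vec<Event> {
        let mut events = Vec::new();

        if !self.done && thinking.len() > self.last_thinking_len {
            // Structure events go out once, on the first appearance of thinking text.
            if !self.started {
                self.started = true;
                events.push(Event::OutputItemAdded);
                events.push(Event::SummaryPartAdded);
            }
            // Emit only the suffix that has not been sent yet. `get` returns None
            // if the offset is not on a char boundary, in which case we wait for
            // the next poll instead of panicking.
            if let Some(delta) = thinking.get(self.last_thinking_len..) {
                events.push(Event::ReasoningDelta(delta.to_string()));
                self.last_thinking_len = thinking.len();
            }
        }

        // The first response text marks the end of the thinking phase.
        if self.started && !self.done && !response_text.is_empty() {
            self.done = true;
            events.push(Event::ReasoningDone);
        }

        events
    }
}
```

Keeping only a length rather than the previously emitted text keeps each poll cheap, at the cost of assuming the upstream thinking text is never rewritten in place.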

**Verified with Opus 4.6:** (2026-02-15 13:22 UTC)

```
delta_len=24 "The user is asking about"
delta_len=61 " the Collatz conjecture..."
delta_len=5 " This"
delta_len=10 " is a pure"
... (11 progressive deltas over ~850ms)
```

---