feat: capture thinking text via MITM dual-call merge

The LS makes TWO separate Google API calls for thinking models:

  Call 1: response + thinking token count (no thinking text)
  Call 2: thinking summary text (no thinking tokens)

Each hits a different StreamingAccumulator, so we:

  1. Capture response_text in StreamingAccumulator (non-thinking parts)
  2. In MitmStore::record_usage, detect when Call 2 arrives for a
     cascade that already has thinking tokens from Call 1
  3. Merge Call 2's response_text as thinking_text on Call 1's usage

Also injects includeThoughts into Google API requests via MITM modify
to ensure thinking text is available in SSE responses.
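
The merge steps described in the message can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this commit: `Usage` and `Store` are hypothetical, simplified stand-ins for the real usage record and `MitmStore`, the per-cascade `HashMap` layout is assumed, and the rule used to recognise Call 2 (zero thinking tokens plus a non-empty `response_text`) is a guess at the detection logic.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified usage record; the real struct carries more fields.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Usage {
    pub thinking_output_tokens: u64,
    pub thinking_text: Option<String>,
    pub response_text: Option<String>,
}

/// Hypothetical stand-in for MitmStore, keyed by cascade ID (assumed layout).
#[derive(Default)]
pub struct Store {
    by_cascade: HashMap<String, Usage>,
}

impl Store {
    /// Records usage for a cascade. If the cascade already holds thinking
    /// tokens from Call 1 but no thinking text, and the incoming usage looks
    /// like Call 2, the incoming response_text is stored as thinking_text.
    pub fn record_usage(&mut self, cascade_id: &str, incoming: Usage) {
        if let Some(existing) = self.by_cascade.get_mut(cascade_id) {
            // Call 1 was seen: it has thinking tokens but no thinking text yet.
            let call1_waiting =
                existing.thinking_output_tokens > 0 && existing.thinking_text.is_none();
            // Assumed shape of Call 2: text only, no thinking tokens.
            let looks_like_call2 =
                incoming.thinking_output_tokens == 0 && incoming.response_text.is_some();
            if call1_waiting && looks_like_call2 {
                existing.thinking_text = incoming.response_text;
                return;
            }
        }
        // First call for this cascade, or nothing to merge: store as-is.
        self.by_cascade.insert(cascade_id.to_string(), incoming);
    }

    /// Returns the usage recorded for a cascade, if any.
    pub fn get(&self, cascade_id: &str) -> Option<&Usage> {
        self.by_cascade.get(cascade_id)
    }
}
```

In this sketch the Call 1 record keeps its own token counts and `response_text`; only `thinking_text` is filled in from Call 2, so the two calls collapse into a single usage entry per cascade.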
@@ -80,6 +80,7 @@ impl GrpcUsage {
    output_tokens: self.output_tokens,
    thinking_output_tokens: self.thinking_output_tokens,
    thinking_text: None, // gRPC proto doesn't carry thinking text
    response_text: None,
    response_output_tokens: self.response_output_tokens,
    cache_creation_input_tokens: self.cache_write_tokens,
    cache_read_input_tokens: self.cache_read_tokens,