feat: capture thinking text via MITM dual-call merge

The LS makes TWO separate Google API calls for thinking models:
  Call 1: response + thinking token count (no thinking text)
  Call 2: thinking summary text (no thinking tokens)

Each hits a different StreamingAccumulator, so we:
  1. Capture response_text in StreamingAccumulator (non-thinking parts)
  2. In MitmStore::record_usage, detect when Call 2 arrives for a cascade
     that already has thinking tokens from Call 1
  3. Merge Call 2's response_text as thinking_text on Call 1's usage

Also injects includeThoughts into Google API requests via MITM modify
to ensure thinking text is available in SSE responses.
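
Steps 2 and 3 above can be sketched roughly as follows. This is a minimal illustration, not the code in this commit: the `Usage` and `Store` types, the `pending` map, the `get` helper, and the exact detection condition are all assumptions made for the example. Only the field names `thinking_output_tokens`, `thinking_text`, and `response_text` come from the diff.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified usage record. Field names follow the diff;
/// the real struct carries more fields.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Usage {
    pub thinking_output_tokens: u64,
    pub thinking_text: Option<String>,
    pub response_text: Option<String>,
}

/// Hypothetical stand-in for MitmStore: one pending usage per cascade id.
#[derive(Default)]
pub struct Store {
    pending: HashMap<String, Usage>,
}

impl Store {
    /// Records a usage for `cascade_id`. If an earlier usage already has
    /// thinking tokens but no thinking text, and the incoming usage has
    /// text but no thinking tokens, the incoming call is treated as
    /// "Call 2" and its response_text becomes the earlier thinking_text.
    pub fn record_usage(&mut self, cascade_id: &str, incoming: Usage) {
        if let Some(existing) = self.pending.get_mut(cascade_id) {
            let is_call2 = existing.thinking_output_tokens > 0
                && existing.thinking_text.is_none()
                && incoming.thinking_output_tokens == 0
                && incoming.response_text.is_some();
            if is_call2 {
                // Merge: Call 2's text is the thinking summary for Call 1.
                existing.thinking_text = incoming.response_text;
                return;
            }
        }
        // First call for this cascade, or not a merge candidate: store as-is.
        self.pending.insert(cascade_id.to_string(), incoming);
    }

    /// Returns the usage currently stored for `cascade_id`, if any.
    pub fn get(&self, cascade_id: &str) -> Option<&Usage> {
        self.pending.get(cascade_id)
    }
}
```

In this sketch the merged record keeps Call 1's token counts and response text, and only gains thinking_text from Call 2. A usage that does not match the Call 2 pattern simply replaces whatever was stored for that cascade.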
2026-02-14 19:49:15 -06:00
parent 905d55beb5
commit 34b9553484
4 changed files with 92 additions and 3 deletions
--- a/src/mitm/proto.rs
+++ b/src/mitm/proto.rs
@@ -80,6 +80,7 @@ impl GrpcUsage {
            output_tokens: self.output_tokens,
            thinking_output_tokens: self.thinking_output_tokens,
            thinking_text: None, // gRPC proto doesn't carry thinking text
+            response_text: None,
            response_output_tokens: self.response_output_tokens,
            cache_creation_input_tokens: self.cache_write_tokens,
            cache_read_input_tokens: self.cache_read_tokens,