feat: capture thinking text via MITM dual-call merge

The LS makes TWO separate Google API calls for thinking models:
  Call 1: response + thinking token count (no thinking text)
  Call 2: thinking summary text (no thinking token count)

Each hits a different StreamingAccumulator, so we:
1. Capture response_text in StreamingAccumulator (non-thinking parts)
2. In MitmStore::record_usage, detect when Call 2 arrives for a
   cascade that already has thinking tokens from Call 1
3. Merge Call 2's response_text as thinking_text on Call 1's usage
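
A minimal sketch of the merge in steps 1-3, under stated assumptions:
the `Usage` struct is reduced to three fields taken from the diff, and
`Store`, `by_cascade`, and the string cascade id are hypothetical
stand-ins for MitmStore's real layout, which this commit message does
not show. The "is this Call 2?" heuristic is likewise an assumption.

```rust
use std::collections::HashMap;

/// Simplified usage record. Field names follow the diff; everything
/// else about the real struct is omitted.
#[derive(Debug, Clone, Default, PartialEq)]
struct Usage {
    thinking_output_tokens: u64,
    thinking_text: Option<String>,
    response_text: Option<String>,
}

/// Hypothetical stand-in for MitmStore, keyed by cascade id.
#[derive(Default)]
struct Store {
    by_cascade: HashMap<String, Usage>,
}

impl Store {
    /// Records usage for a cascade. If the cascade already holds
    /// thinking tokens without thinking text (Call 1) and the incoming
    /// record carries text but no thinking tokens (Call 2), the
    /// incoming response_text is merged in as thinking_text.
    fn record_usage(&mut self, cascade_id: &str, incoming: Usage) {
        if let Some(existing) = self.by_cascade.get_mut(cascade_id) {
            let is_call_2 = existing.thinking_output_tokens > 0
                && existing.thinking_text.is_none()
                && incoming.thinking_output_tokens == 0
                && incoming.response_text.is_some();
            if is_call_2 {
                existing.thinking_text = incoming.response_text;
                return;
            }
        }
        // First call for this cascade, or not a Call 2 shape: store as-is.
        self.by_cascade.insert(cascade_id.to_string(), incoming);
    }
}
```

Recording a Call 1 usage followed by a Call 2 usage for the same cascade
leaves one entry whose token count and response_text come from Call 1
and whose thinking_text comes from Call 2.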

Also injects includeThoughts into Google API requests via MITM
request modification, so that thinking text is available in SSE
responses.
Nikketryhard
2026-02-14 19:49:15 -06:00
parent 905d55beb5
commit 34b9553484
4 changed files with 92 additions and 3 deletions


@@ -80,6 +80,7 @@ impl GrpcUsage {
    output_tokens: self.output_tokens,
    thinking_output_tokens: self.thinking_output_tokens,
    thinking_text: None, // gRPC proto doesn't carry thinking text
    response_text: None,
    response_output_tokens: self.response_output_tokens,
    cache_creation_input_tokens: self.cache_write_tokens,
    cache_read_input_tokens: self.cache_read_tokens,