Root cause: errors from Google were being swallowed, replaced with
placeholders like 'Google API returned HTTP 400' or '[Timeout waiting
for response]', or silently converted to fake 'incomplete' responses.
Changes across all endpoints (/v1/chat/completions, /v1/responses,
/v1/gemini, /v1/search):
Error message fidelity:
- UpstreamError message now includes Google's status prefix: [STATUS] msg
- Falls back to raw body if JSON parsing fails (protobuf, HTML, etc.)
- ErrorDetail gains optional code and param fields
Timeout handling:
- poll_for_response returns UpstreamError(504, DEADLINE_EXCEEDED) on timeout
instead of '[Timeout waiting for AI response]' placeholder text
- Streaming timeouts emit proper error events, not fake content
- Sync bypass timeouts return 504 Gateway Timeout, not 200 incomplete
Missing error checks added:
- responses.rs sync bypass: added upstream_error check in polling loop
- gemini.rs sync bypass: added upstream_error check in polling loop
- gemini.rs streaming: added upstream_error check in polling loop
(was completely missing — errors only handled in sync path)
DRY helpers:
- upstream_error_message(): shared exact message extraction
- upstream_error_type(): shared Google→OpenAI error type mapping
- All streaming handlers use these instead of inline formatting
Move the in-flight blocking check to the top of the LLM request flow,
BEFORE request modification. This catches follow-ups on ALL connections
(the LS opens multiple parallel TLS connections). Only the very first
modified request reaches Google — all others get fake STOP responses.
Previously, each new connection independently allowed one request
through before blocking, letting 4-5 requests leak per turn.
When Google returns an error (400, 429, 500, etc.), the MITM proxy now
captures it and the API handlers return it immediately instead of
hanging until timeout.
- UpstreamError struct stored in MitmStore
- MITM proxy parses Google error JSON (message + status)
- Polling handler checks for upstream errors each cycle
- Streaming handlers emit response.failed / SSE error events
- Error status mapped to OpenAI-style types (invalid_request_error,
rate_limit_error, authentication_error, server_error, etc.)
- All handlers clear stale errors at request start