Introduces src/platform.rs with OS detection and env var overrides.
All hardcoded Linux paths replaced with Platform::detect() across
8 source files. Key changes:
- New Platform struct with 11 fields (all overridable via env vars)
- /proc/ access gated to Linux (#[cfg(target_os = "linux")])
- pgrep/pkill patterns broadened for cross-platform LS discovery
- sec-ch-ua-platform header now dynamic per OS
- Token, traces, config, CA cert paths use platform module
- LD_PRELOAD DNS redirect gated to Linux only
- Setup scripts for Linux (systemd) and macOS (launchd)
- find_ls_binary_path has cross-platform stubs
All 46 tests pass, cargo check clean.
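A minimal sketch of the detect-then-override pattern described above. It is not the real struct. `Platform` here has only three of the 11 fields. The `ZG_CONFIG_DIR` variable name and the `zerogravity` config directory are hypothetical. The env lookup is injected so the override can be tested without mutating the process environment.

```rust
use std::path::PathBuf;

/// Trimmed-down, hypothetical Platform: three fields stand in for the real 11.
#[derive(Debug, Clone, PartialEq)]
pub struct Platform {
    pub os: &'static str,          // std::env::consts::OS, e.g. "linux", "macos"
    pub config_dir: PathBuf,       // overridable via ZG_CONFIG_DIR (hypothetical name)
    pub ua_platform: &'static str, // value for the sec-ch-ua-platform header
}

impl Platform {
    /// Detect using the real process environment.
    pub fn detect() -> Self {
        Self::detect_with(|key| std::env::var(key).ok())
    }

    /// Same logic with an injectable env lookup, so overrides are testable.
    pub fn detect_with(env: impl Fn(&str) -> Option<String>) -> Self {
        let os = std::env::consts::OS;
        let ua_platform = match os {
            "macos" => "\"macOS\"",
            "windows" => "\"Windows\"",
            _ => "\"Linux\"",
        };
        let home = env("HOME").map(PathBuf::from).unwrap_or_default();
        // OS-specific default first; the env var override wins if it is set.
        let default_config = match os {
            "macos" => home.join("Library/Application Support/zerogravity"),
            _ => home.join(".config/zerogravity"),
        };
        let config_dir = env("ZG_CONFIG_DIR").map(PathBuf::from).unwrap_or(default_config);
        Platform { os, config_dir, ua_platform }
    }
}
```

In this sketch, every other path field would follow the same `env(..).unwrap_or(default)` shape. That shape keeps "all overridable via env vars" a one-line rule per field.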
- Delete handle_gemini handler (identical to handle_gemini_v1beta)
- Remove /v1/gemini route from router
- Update root handler service name to zerogravity
- Clean all doc references
Replace /v1/gemini with proper Gemini API paths:
- POST /v1beta/models/{model}:generateContent (sync)
- POST /v1beta/models/{model}:streamGenerateContent (streaming)
The model is extracted from the URL path, using an axum wildcard
catch-all because axum route syntax does not support colons inside path segments.
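With a catch-all route, the handler receives a tail such as `gemini-2.5-pro:generateContent` and must split it itself. The exact wildcard syntax depends on the axum version. Below is a sketch of that split, assuming a hypothetical `parse_model_action` helper and `GeminiMethod` enum. It tolerates a leading slash in case the capture includes one.

```rust
/// Which Gemini method the path tail asked for.
#[derive(Debug, PartialEq)]
pub enum GeminiMethod {
    Generate,       // :generateContent
    StreamGenerate, // :streamGenerateContent
}

/// Split a wildcard tail like "gemini-2.5-pro:generateContent" into
/// (model, method). Splits on the last ':' so the method suffix is unambiguous.
pub fn parse_model_action(tail: &str) -> Option<(String, GeminiMethod)> {
    let tail = tail.trim_start_matches('/');
    let (model, action) = tail.rsplit_once(':')?;
    if model.is_empty() {
        return None;
    }
    let method = match action {
        "generateContent" => GeminiMethod::Generate,
        "streamGenerateContent" => GeminiMethod::StreamGenerate,
        _ => return None, // unknown action: let the caller return 404
    };
    Some((model.to_string(), method))
}
```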
The Gemini endpoint now accepts responseMimeType and responseSchema
fields and injects them into Google's generationConfig via the MITM
proxy. Both snake_case and camelCase aliases are supported.
The logs command used journalctl -f (follow), which never exits.
Split into three commands:
- logs [N]: show last N lines and exit (default 30)
- logs-follow [N]: tail + follow (old behavior)
- logs-all: full dump
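Below is a sketch of how the three subcommands might map onto journalctl arguments. It uses only standard journalctl flags: `-u`, `-n`, `-f` and `--no-pager`. The `LogsCmd` enum and the unit name passed in are hypothetical. The macOS (launchd) path would need a different log source and is not shown.

```rust
/// The three log subcommands described above.
pub enum LogsCmd {
    Tail(u32),   // logs [N]: last N lines, then exit (default 30)
    Follow(u32), // logs-follow [N]: last N lines, then keep following
    All,         // logs-all: full dump for the unit
}

/// Build journalctl arguments for a systemd unit (unit name is an assumption).
pub fn journalctl_args(unit: &str, cmd: &LogsCmd) -> Vec<String> {
    let mut args = vec!["-u".to_string(), unit.to_string(), "--no-pager".to_string()];
    match cmd {
        LogsCmd::Tail(n) => args.extend(["-n".to_string(), n.to_string()]),
        // Only this variant passes -f, so only it blocks until interrupted.
        LogsCmd::Follow(n) => args.extend(["-n".to_string(), n.to_string(), "-f".to_string()]),
        LogsCmd::All => {} // no -n: print everything for the unit
    }
    args
}
```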
Root cause: errors from Google were being swallowed, replaced with
placeholders like 'Google API returned HTTP 400' or '[Timeout waiting
for response]', or silently converted to fake 'incomplete' responses.
Changes across all endpoints (/v1/chat/completions, /v1/responses,
/v1/gemini, /v1/search):
Error message fidelity:
- UpstreamError message now includes Google's status prefix: [STATUS] msg
- Falls back to raw body if JSON parsing fails (protobuf, HTML, etc.)
- ErrorDetail gains optional code and param fields
Timeout handling:
- poll_for_response returns UpstreamError(504, DEADLINE_EXCEEDED) on timeout
instead of '[Timeout waiting for AI response]' placeholder text
- Streaming timeouts emit proper error events, not fake content
- Sync bypass timeouts return 504 Gateway Timeout, not 200 incomplete
Missing error checks added:
- responses.rs sync bypass: added upstream_error check in polling loop
- gemini.rs sync bypass: added upstream_error check in polling loop
- gemini.rs streaming: added upstream_error check in polling loop
(previously missing entirely; errors were only handled in the sync path)
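The timeout and per-cycle error-check behavior above can be sketched as a synchronous polling loop. The handlers are presumably async, where `tokio::time::sleep` would replace `thread::sleep`. The closure `check` is a hypothetical stand-in for reading MitmStore, and the `UpstreamError` fields are simplified.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, PartialEq)]
pub struct UpstreamError {
    pub http_status: u16,
    pub status: String, // Google status string, e.g. "DEADLINE_EXCEEDED"
    pub message: String,
}

/// Poll until a response, an upstream error, or the deadline.
/// `check` returns Ok(Some(text)) when done, Ok(None) while pending,
/// and Err(e) if the MITM proxy recorded an upstream error.
pub fn poll_for_response(
    mut check: impl FnMut() -> Result<Option<String>, UpstreamError>,
    timeout: Duration,
    interval: Duration,
) -> Result<String, UpstreamError> {
    let deadline = Instant::now() + timeout;
    loop {
        // The error check runs on every cycle, so errors surface immediately.
        if let Some(text) = check()? {
            return Ok(text);
        }
        if Instant::now() >= deadline {
            // A real 504 error instead of placeholder text in a 200 response.
            return Err(UpstreamError {
                http_status: 504,
                status: "DEADLINE_EXCEEDED".to_string(),
                message: format!("no upstream response after {}s", timeout.as_secs()),
            });
        }
        std::thread::sleep(interval);
    }
}
```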
DRY helpers:
- upstream_error_message(): shared exact message extraction
- upstream_error_type(): shared Google→OpenAI error type mapping
- All streaming handlers use these instead of inline formatting
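Here is a sketch of what the two shared helpers might look like, assuming the parsed Google status and message are stored as optional fields next to the raw body. The type names are the ones listed later in this log. The HTTP-code-to-type table itself is an assumption.

```rust
/// Simplified error record: parsed fields are None if the body was not JSON.
pub struct UpstreamError {
    pub http_status: u16,
    pub google_status: Option<String>, // e.g. "INVALID_ARGUMENT"
    pub message: Option<String>,
    pub raw_body: String, // kept verbatim for protobuf, HTML, etc.
}

/// "[STATUS] msg" when Google's JSON parsed; otherwise fall back to the raw body.
pub fn upstream_error_message(e: &UpstreamError) -> String {
    match (&e.google_status, &e.message) {
        (Some(status), Some(msg)) => format!("[{status}] {msg}"),
        (None, Some(msg)) => msg.clone(),
        _ => e.raw_body.clone(),
    }
}

/// Map an HTTP status to an OpenAI-style error type (mapping is assumed).
pub fn upstream_error_type(http_status: u16) -> &'static str {
    match http_status {
        401 | 403 => "authentication_error",
        429 => "rate_limit_error",
        400..=499 => "invalid_request_error",
        _ => "server_error",
    }
}
```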
Root cause: proxy.rs eagerly pushed tool rounds via push_tool_round_calls
when intercepting Google's functionCall response. These stale rounds leaked
into LS follow-up requests, producing malformed history that Google timed
out on (60s 'no upstream response').
Changes:
- Remove push_tool_round_calls from proxy.rs response interception
- proxy.rs: use get_tool_rounds (non-destructive) instead of take_tool_rounds
so accumulated rounds persist across multiple LS requests per cascade
- responses.rs/gemini.rs: build rounds via a take+push+set pattern; each
handler accumulates its own rounds from get_last_function_calls + results
- completions.rs: unchanged (set_tool_rounds replaces from messages)
- clear_tools: also clears tool_rounds to prevent stale data between sessions
- store.rs: add get_tool_rounds (non-destructive clone) method
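The difference between the destructive and non-destructive accessors, and the handler-side take+push+set cycle, can be sketched as follows. The `ToolRound` fields are simplified to strings, `record_round` is a hypothetical helper name, and the store is reduced to a single `Mutex`.

```rust
use std::sync::Mutex;

#[derive(Debug, Clone, PartialEq)]
pub struct ToolRound {
    pub calls: Vec<String>,   // function call names (simplified)
    pub results: Vec<String>, // matching tool results (simplified)
}

#[derive(Default)]
pub struct MitmStore {
    tool_rounds: Mutex<Vec<ToolRound>>,
}

impl MitmStore {
    /// Destructive: leaves the store empty.
    pub fn take_tool_rounds(&self) -> Vec<ToolRound> {
        std::mem::take(&mut *self.tool_rounds.lock().unwrap())
    }
    /// Non-destructive clone: every LS request in a cascade sees all rounds.
    pub fn get_tool_rounds(&self) -> Vec<ToolRound> {
        self.tool_rounds.lock().unwrap().clone()
    }
    pub fn set_tool_rounds(&self, rounds: Vec<ToolRound>) {
        *self.tool_rounds.lock().unwrap() = rounds;
    }
}

/// Handler side: take + push + set, appending this turn's round.
pub fn record_round(store: &MitmStore, round: ToolRound) {
    let mut rounds = store.take_tool_rounds();
    rounds.push(round);
    store.set_tool_rounds(rounds);
}
```

In this sketch the lock is released between take and set, so a concurrent `get_tool_rounds` could briefly observe an empty list. Holding a single guard across the update would close that window.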
- proxy.rs: call push_tool_round_calls alongside set_last_function_calls
when Google responds with a functionCall, accumulating rounds
- responses.rs: attach_tool_round_results to pair tool results with
the correct round instead of flat add_tool_result
- gemini.rs: same attach_tool_round_results integration
- store.rs: add push_tool_round_calls and attach_tool_round_results
methods for cross-request round accumulation
- Legacy add_tool_result kept for backward compat alongside new path
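The sketch below shows how round accumulation and result pairing might fit together. Each round reserves one result slot per call, and a result fills the first open slot with a matching name, searching the newest round first. `RoundLog` and the newest-first matching rule are assumptions, not the store's actual layout.

```rust
#[derive(Debug, Clone, Default, PartialEq)]
pub struct ToolRound {
    pub calls: Vec<String>,           // function names Google asked for
    pub results: Vec<Option<String>>, // one slot per call, filled later
}

#[derive(Default)]
pub struct RoundLog {
    rounds: Vec<ToolRound>,
}

impl RoundLog {
    /// On a functionCall response: open a new round with empty result slots.
    pub fn push_tool_round_calls(&mut self, calls: Vec<String>) {
        let results = vec![None; calls.len()];
        self.rounds.push(ToolRound { calls, results });
    }

    /// Pair a tool result with an unfilled call of the same name, newest round
    /// first. Returns false if no open slot matches.
    pub fn attach_tool_round_results(&mut self, name: &str, output: String) -> bool {
        for round in self.rounds.iter_mut().rev() {
            for (i, call) in round.calls.iter().enumerate() {
                if call == name && round.results[i].is_none() {
                    round.results[i] = Some(output);
                    return true;
                }
            }
        }
        false
    }
}
```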
- Add ToolRound struct to pair function calls with results per-round
- Replace single-match history rewrite (broke after first round) with
multi-round loop that rewrites ALL placeholder model turns
- Fix tool result name fallback: use positional index instead of always
picking the first call
- Set is_complete for any finishReason (FUNCTION_CALL, MAX_TOKENS, etc.),
not just STOP; otherwise the response_complete flag was never set
- Legacy fallback: responses.rs path (single-round via last_calls +
pending_results) still works when tool_rounds is empty
- Add tests: multi-round rewrite, single-round legacy, no-op, and
FUNCTION_CALL/MAX_TOKENS finishReason handling
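This sketch covers two of the small fixes above in isolation: the positional name fallback and the "any finishReason completes" rule. Both function names are hypothetical. The multi-round history rewrite itself is not shown.

```rust
/// Name for the i-th tool result when the client did not send one: use the
/// call at the same position, and fall back to the first call only if the
/// index is out of range. (Previously this always picked the first call.)
pub fn result_name<'a>(calls: &[&'a str], index: usize) -> Option<&'a str> {
    calls.get(index).or_else(|| calls.first()).copied()
}

/// Any finishReason ends the response, not only "STOP".
pub fn is_complete(finish_reason: Option<&str>) -> bool {
    matches!(finish_reason, Some(r) if !r.is_empty())
}
```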
- store.rs: record_function_call now falls back to active_cascade_id
(matching record_usage behavior) instead of a blind _latest fallback
- store.rs: add cascade-aware take_function_calls(cascade_id) method
with priority: exact match → active cascade → _latest → any key
- completions.rs: extract tool_calls from assistant messages and tool
results from tool messages, storing them for MITM injection. This was
the ROOT CAUSE: the completions handler stored tool definitions but
never extracted tool results, so modify_request couldn't rewrite the
LS conversation history with proper functionCall/functionResponse parts
- responses.rs: use cascade-aware take_function_calls for consistency
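The lookup priority for take_function_calls (exact match, then active cascade, then `_latest`, then any key) can be sketched over a plain `HashMap`. `CallStore` and the string-only call type are simplifications for illustration.

```rust
use std::collections::HashMap;

/// Function calls keyed by cascade id; "_latest" is the fallback bucket.
#[derive(Default)]
pub struct CallStore {
    pub calls: HashMap<String, Vec<String>>,
    pub active_cascade_id: Option<String>,
}

impl CallStore {
    /// Remove and return calls, trying in order:
    /// exact match -> active cascade -> "_latest" -> any remaining key.
    pub fn take_function_calls(&mut self, cascade_id: &str) -> Option<Vec<String>> {
        let mut candidates = vec![cascade_id.to_string()];
        if let Some(active) = &self.active_cascade_id {
            candidates.push(active.clone());
        }
        candidates.push("_latest".to_string());
        for key in candidates {
            if let Some(calls) = self.calls.remove(&key) {
                return Some(calls);
            }
        }
        // Last resort: any key (HashMap iteration order is unspecified).
        let any = self.calls.keys().next().cloned()?;
        self.calls.remove(&any)
    }
}
```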
Without this, request_in_flight stayed true after tool call streaming,
blocking all subsequent turns until the next completions handler
happened to clear it first.
Move the in-flight blocking check to the top of the LLM request flow,
BEFORE request modification. This catches follow-ups on ALL connections
(the LS opens multiple parallel TLS connections). Only the very first
modified request reaches Google; all others get fake STOP responses.
Previously, each new connection independently allowed one request
through before blocking, letting 4-5 requests leak per turn.
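One way to guarantee that exactly one request per turn gets through, even across parallel connections, is an atomic compare-and-swap at the top of the request flow. The log does not say how request_in_flight is implemented. The `InFlight` wrapper and the `compare_exchange` choice below are assumptions. A separate load followed by a store would let two connections both see `false`.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// True while one modified LLM request is with Google.
pub struct InFlight(AtomicBool);

impl InFlight {
    pub const fn new() -> Self {
        InFlight(AtomicBool::new(false))
    }

    /// Checked before request modification. Exactly one caller wins the
    /// false -> true transition and forwards; the rest get a fake STOP.
    pub fn try_acquire(&self) -> bool {
        self.0
            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
    }

    /// Must run when the turn ends (including after tool-call streaming),
    /// or every later turn stays blocked.
    pub fn release(&self) {
        self.0.store(false, Ordering::Release);
    }
}
```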
- Add request_in_flight flag to MitmStore, set immediately when the first
LLM request is forwarded with custom tools active
- Block ALL subsequent LS requests (agentic loop + internal flash-lite)
with fake SSE responses instead of waiting for response_complete
- Fix function call deduplication: drain() accumulator after storing
to prevent 3x duplicate tool calls across SSE chunks
- Clear all stale state (response, thinking, function calls, errors)
at the start of each streaming request
- Handle response_complete with no content (thoughtSignature-only)
gracefully, with a timeout instead of an infinite hang
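The deduplication fix can be illustrated with `Vec::drain`. Moving the accumulated calls out empties the buffer, so a later flush cannot store the same calls again. This sketch assumes one flush per SSE chunk, and `CallAccumulator` is a hypothetical name.

```rust
/// Function calls parsed from SSE chunks accumulate here until stored.
#[derive(Default)]
pub struct CallAccumulator {
    pending: Vec<String>,
}

impl CallAccumulator {
    pub fn on_chunk(&mut self, calls_in_chunk: &[&str]) {
        self.pending.extend(calls_in_chunk.iter().map(|s| s.to_string()));
    }

    /// Move everything accumulated so far into `stored`, leaving the buffer
    /// empty. A clone here would re-store the same calls on every flush.
    pub fn flush_into(&mut self, stored: &mut Vec<String>) {
        stored.extend(self.pending.drain(..));
    }
}
```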
Hook getaddrinfo() via LD_PRELOAD to redirect Google API domain
resolution to 127.0.0.1, combined with a port-modified endpoint URL.
This makes the LS connect directly to the local MITM proxy for ALL
API calls, including those from the CodeAssistClient, which has Proxy:nil hardcoded.
Architecture:
LS → DNS: googleapis.com → 127.0.0.1 (hooked via getaddrinfo)
→ Connect: 127.0.0.1:MITM_PORT (from -cloud_code_endpoint)
→ MITM proxy intercepts transparent TLS via SNI
→ Forward to real Google API
Key findings from investigation:
- Go uses raw syscalls for connect() (NOT hookable via LD_PRELOAD)
- Go uses libc getaddrinfo() for DNS (hookable via CGO path)
- dns_redirect.so is compiled from embedded C source on first run
- No iptables, no sudo, no CAP_NET_BIND_SERVICE needed
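On the Rust side, the redirect comes down to how the LS process is launched. The sketch below builds, but does not spawn, a `std::process::Command` with `LD_PRELOAD` set (Linux only) and the endpoint flag pointing at the MITM port. The binary and `.so` paths and the hostname are caller-supplied assumptions. The `-flag=value` form follows Go's flag package conventions.

```rust
use std::path::Path;
use std::process::Command;

/// Build the LS command with the DNS hook preloaded (Linux only).
pub fn ls_command(ls_binary: &Path, so_path: &Path, api_host: &str, mitm_port: u16) -> Command {
    let mut cmd = Command::new(ls_binary);
    if cfg!(target_os = "linux") {
        // dns_redirect.so's getaddrinfo() resolves Google API hosts to 127.0.0.1.
        cmd.env("LD_PRELOAD", so_path);
    }
    // Keep the real hostname (for SNI) but point the port at the MITM proxy.
    cmd.arg(format!("-cloud_code_endpoint=https://{api_host}:{mitm_port}"));
    cmd
}
```

Only the port changes in the URL. The hostname stays intact, so the TLS SNI still names the real Google host, which the MITM proxy uses to pick its upstream.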
When Google returns an error (400, 429, 500, etc.), the MITM proxy now
captures it and the API handlers return it immediately instead of
hanging until timeout.
- UpstreamError struct stored in MitmStore
- MITM proxy parses Google error JSON (message + status)
- Polling handler checks for upstream errors each cycle
- Streaming handlers emit response.failed / SSE error events
- Error status mapped to OpenAI-style types (invalid_request_error,
rate_limit_error, authentication_error, server_error, etc.)
- All handlers clear stale errors at request start
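The sketch below formats a streaming error as a single SSE frame. The `event:`/`data:` framing follows the SSE format. The JSON payload shape (`{"error":{"type":..,"message":..}}`) is a simplified assumption, not the exact `response.failed` schema. A minimal escaper is included so the block needs no JSON crate.

```rust
/// Minimal JSON string escaping: quotes, backslashes, control characters.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// One SSE frame carrying an OpenAI-style error object (payload shape assumed).
pub fn sse_error_event(event: &str, error_type: &str, message: &str) -> String {
    format!(
        "event: {event}\ndata: {{\"error\":{{\"type\":\"{}\",\"message\":\"{}\"}}}}\n\n",
        json_escape(error_type),
        json_escape(message)
    )
}
```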
When input is [{type: 'input_image', ...}, {type: 'input_text', text: '...'}],
the code was looking for items with role: 'user', which don't exist in flat
content arrays. It now extracts text from input_text items first,
falling back to role-based messages only if no flat text is found.
Also adds debug header dump for MITM request forwarding.
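Here is the extraction order from the fix above, sketched over a simplified enum rather than the parsed JSON the handler actually works with. `InputItem` is hypothetical. Joining several text items with newlines and picking the last user message are both assumptions.

```rust
/// Simplified Responses API input item (hypothetical).
pub enum InputItem {
    InputText(String),
    InputImage { image_url: String },
    Message { role: String, text: String },
}

/// Prefer flat input_text items; only if there are none, fall back to the
/// last message with role == "user".
pub fn extract_user_text(items: &[InputItem]) -> Option<String> {
    let flat: Vec<&str> = items
        .iter()
        .filter_map(|it| match it {
            InputItem::InputText(t) => Some(t.as_str()),
            _ => None,
        })
        .collect();
    if !flat.is_empty() {
        return Some(flat.join("\n"));
    }
    items.iter().rev().find_map(|it| match it {
        InputItem::Message { role, text } if role == "user" => Some(text.clone()),
        _ => None,
    })
}
```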
The MITM modifier kept original HTTP headers (including Content-Length)
when replacing the body. When injecting a ~200KB image into a ~66KB
request, Google read only Content-Length bytes and then hung waiting
for a follow-up request that never came.
Now we regex-replace the Content-Length header value to match the actual
rechunked body size after modification.
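A regex is one way to do this. The sketch below swaps in a line-based rewrite so it needs only std. It replaces the value of any `Content-Length` header, matched case-insensitively as HTTP requires, and leaves every other line untouched. The function name is hypothetical.

```rust
/// Rewrite Content-Length in a raw HTTP/1.1 header block ("\r\n"-separated)
/// so it matches the modified body length.
pub fn fix_content_length(headers: &str, new_body_len: usize) -> String {
    headers
        .split("\r\n")
        .map(|line| match line.split_once(':') {
            Some((name, _)) if name.trim().eq_ignore_ascii_case("content-length") => {
                format!("{}: {}", name, new_body_len)
            }
            _ => line.to_string(), // request line, other headers, blank terminator
        })
        .collect::<Vec<_>>()
        .join("\r\n")
}
```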
The LS silently ignores the 'images' field from our
SendUserCascadeMessageRequest proto — it never forwards image data
to Google's API.
New approach: store the image in MitmStore, then the MITM request
modifier injects it as 'inlineData' directly into the last user
message's parts array in the Google API JSON request.
Flow:
Client → Proxy (decode base64) → MitmStore.set_pending_image()
LS → Google API → MITM intercepts → inject inlineData part
→ Google receives image + text together
This works for all three API endpoints (responses, completions,
gemini).
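The injection step in the flow above can be sketched on simplified Rust types rather than the Google API JSON the modifier actually edits. `Part`, `Content` and `inject_pending_image` are hypothetical names. Taking the pending image out of its `Option` is one assumed way to ensure it is injected at most once.

```rust
/// Simplified Gemini content part.
#[derive(Debug, PartialEq)]
pub enum Part {
    Text(String),
    InlineData { mime_type: String, data: String }, // data is base64
}

#[derive(Debug, PartialEq)]
pub struct Content {
    pub role: String, // "user" or "model"
    pub parts: Vec<Part>,
}

/// Append the pending image to the last user turn's parts.
/// Returns true if an image was injected.
pub fn inject_pending_image(contents: &mut [Content], pending: &mut Option<(String, String)>) -> bool {
    let Some(last_user) = contents.iter_mut().rev().find(|c| c.role == "user") else {
        return false; // no user turn: leave the image pending
    };
    let Some((mime_type, data)) = pending.take() else {
        return false;
    };
    last_user.parts.push(Part::InlineData { mime_type, data });
    true
}
```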
SendUserCascadeMessageRequest proto field layout (from JS bundle analysis):
- Field 6 is 'images' (repeated ImageData) at the REQUEST level
- NOT a Blob sub-message inside ChatMessage (field 2)
ImageData proto uses base64_data (field 1) + mime_type (field 2),
not raw bytes. The LS was silently ignoring our ChatMessage blob
because the field structure didn't match.
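To make the field layout concrete, the sketch below hand-encodes the bytes with protobuf wire format. Both string fields and the nested message use wire type 2 (length-delimited), so the tags are `(field << 3) | 2`. Real code would more likely use generated types. This only shows what field 6 and ImageData's fields 1 and 2 look like on the wire. The helper names are hypothetical.

```rust
/// Protobuf base-128 varint.
fn put_varint(buf: &mut Vec<u8>, mut v: u64) {
    while v >= 0x80 {
        buf.push((v as u8 & 0x7f) | 0x80);
        v >>= 7;
    }
    buf.push(v as u8);
}

/// Length-delimited field (wire type 2): tag, length, bytes.
fn put_bytes(buf: &mut Vec<u8>, field: u32, bytes: &[u8]) {
    put_varint(buf, ((field as u64) << 3) | 2);
    put_varint(buf, bytes.len() as u64);
    buf.extend_from_slice(bytes);
}

/// ImageData { base64_data = 1; mime_type = 2 }, both strings.
pub fn encode_image_data(base64_data: &str, mime_type: &str) -> Vec<u8> {
    let mut msg = Vec::new();
    put_bytes(&mut msg, 1, base64_data.as_bytes());
    put_bytes(&mut msg, 2, mime_type.as_bytes());
    msg
}

/// Append one entry of the repeated `images` field (6) to an encoded
/// SendUserCascadeMessageRequest; repeated fields are just repeated tags.
pub fn append_image(request: &mut Vec<u8>, base64_data: &str, mime_type: &str) {
    let image = encode_image_data(base64_data, mime_type);
    put_bytes(request, 6, &image);
}
```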
Also protect MITM modifier from stripping messages containing
inlineData (image parts in Google API JSON).