When OpenCode sends follow-up messages with tool results, include
the full conversation (user message, assistant tool calls, and tool
results) in the text sent to the model. Previously only the user
message was extracted, so the model never saw tool results and kept
calling the same tool in an infinite loop.
Also add tool_calls and tool_call_id fields to CompletionMessage.
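Below is a minimal sketch of the flattening. It assumes a simplified CompletionMessage; the real struct's other fields are not shown. The bracketed role-tag layout is also hypothetical, because the commit does not say how turns are rendered. The point is that assistant tool calls and tool results both end up in the prompt text.

```rust
// Hypothetical, simplified mirrors of the proxy's message types.
#[derive(Debug, Clone)]
pub struct ToolCall {
    pub id: String,
    pub name: String,
    pub arguments: String, // JSON-encoded arguments, as in OpenAI's format
}

#[derive(Debug, Clone)]
pub struct CompletionMessage {
    pub role: String,
    pub content: Option<String>,
    pub tool_calls: Option<Vec<ToolCall>>,
    pub tool_call_id: Option<String>,
}

/// Renders the whole conversation (not just the last user turn) so the
/// model can see which tools it already called and what they returned.
pub fn flatten_conversation(messages: &[CompletionMessage]) -> String {
    let mut out = String::new();
    for m in messages {
        match m.role.as_str() {
            "tool" => {
                let id = m.tool_call_id.as_deref().unwrap_or("unknown");
                let body = m.content.as_deref().unwrap_or("");
                out.push_str(&format!("[tool result {id}]\n{body}\n\n"));
            }
            role => {
                // Skip empty content (assistant turns that only carry tool calls).
                if let Some(text) = m.content.as_deref().filter(|t| !t.is_empty()) {
                    out.push_str(&format!("[{role}]\n{text}\n\n"));
                }
                for call in m.tool_calls.iter().flatten() {
                    out.push_str(&format!(
                        "[{role} tool call {}] {}({})\n\n",
                        call.id, call.name, call.arguments
                    ));
                }
            }
        }
    }
    out.trim_end().to_string()
}
```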
Check for MITM-captured function calls BEFORE emitting text in the
streaming handler. This prevents the dummy 'Tool call completed'
placeholder (sent to the LS) from leaking to OpenCode, which was
confusing it into infinite loops.
Also removes the duplicate function call storage at the end of the
response loop, since calls are now stored immediately when detected.
Previously, captured function calls were only stored in MitmStore
after the response loop ended, while the completions handler polls
take_any_function_calls() during streaming. This race meant the
handler could poll while MitmStore was still empty.
Now function calls are stored immediately when parse_streaming_chunk
detects them, in both the initial body and body chunk paths.
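The last two fixes can be sketched together using only std. `Part`, `handle_chunk`, `poll_step`, and the store layout are hypothetical stand-ins. Only `take_any_function_calls()` and the 'Tool call completed' placeholder come from the log. The sketch shows two things:

- The parser writes to the store as soon as it sees a call.
- The poll step drains the store before it looks at any LS text.

```rust
use std::sync::Mutex;

#[derive(Debug, Clone, PartialEq)]
pub struct FunctionCall {
    pub name: String,
    pub args: String,
}

/// One part of a parsed upstream chunk (simplified, hypothetical).
pub enum Part {
    Text(String),
    FunctionCall(FunctionCall),
}

#[derive(Default)]
pub struct MitmStore {
    calls: Mutex<Vec<FunctionCall>>,
}

impl MitmStore {
    pub fn store_function_call(&self, call: FunctionCall) {
        self.calls.lock().unwrap().push(call);
    }
    /// Drains every captured call; returns an empty Vec if none yet.
    pub fn take_any_function_calls(&self) -> Vec<FunctionCall> {
        std::mem::take(&mut *self.calls.lock().unwrap())
    }
}

/// MITM side: store each call the moment its chunk is parsed,
/// instead of waiting for the response loop to finish.
pub fn handle_chunk(store: &MitmStore, parts: Vec<Part>) -> String {
    let mut text = String::new();
    for part in parts {
        match part {
            Part::Text(t) => text.push_str(&t),
            Part::FunctionCall(fc) => store.store_function_call(fc),
        }
    }
    text
}

#[derive(Debug, PartialEq)]
pub enum Emit {
    ToolCalls(Vec<FunctionCall>),
    Text(String),
    Nothing,
}

/// Completions side: captured calls are checked BEFORE any LS text is
/// emitted, so a placeholder like "Tool call completed" is never sent
/// to the client while real calls are waiting.
pub fn poll_step(store: &MitmStore, ls_text: &str) -> Emit {
    let calls = store.take_any_function_calls();
    if !calls.is_empty() {
        Emit::ToolCalls(calls)
    } else if !ls_text.is_empty() {
        Emit::Text(ls_text.to_string())
    } else {
        Emit::Nothing
    }
}
```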
Google's Gemini API rejects $schema, additionalProperties, $ref,
$defs, default, examples, and title in tool parameter schemas, but
OpenCode/MCP tools include these standard JSON Schema fields. They are
now stripped recursively during OpenAI→Gemini tool conversion.
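Below is a sketch of the recursive strip. It uses a tiny hypothetical `Json` enum so it needs only std; the proxy more likely walks serde_json::Value. Two assumptions are worth flagging:

- **Parameter names are not keywords.** Keys directly under `properties` are parameter names. A naive recursive filter would delete a parameter that happens to be called `title` or `default`, so the sketch does not filter at that level.
- **`$ref` targets are lost.** Dropping `$ref`/`$defs` also discards whatever the reference pointed to. A fuller converter might inline it instead.

```rust
use std::collections::BTreeMap;

/// Minimal JSON value (hypothetical; stands in for serde_json::Value).
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Null,
    Bool(bool),
    Num(f64),
    Str(String),
    Arr(Vec<Json>),
    Obj(BTreeMap<String, Json>),
}

/// Schema keywords the log lists as rejected by Gemini.
const UNSUPPORTED: &[&str] = &[
    "$schema", "additionalProperties", "$ref", "$defs", "default", "examples", "title",
];

/// Recursively removes unsupported keywords from a JSON Schema.
pub fn strip_unsupported(schema: &mut Json) {
    match schema {
        Json::Obj(map) => {
            map.retain(|k, _| !UNSUPPORTED.contains(&k.as_str()));
            for (key, value) in map.iter_mut() {
                if key.as_str() == "properties" {
                    // Keys here are parameter names: keep them all and
                    // clean each parameter's own schema instead.
                    if let Json::Obj(props) = value {
                        for prop_schema in props.values_mut() {
                            strip_unsupported(prop_schema);
                        }
                    }
                } else {
                    strip_unsupported(value);
                }
            }
        }
        Json::Arr(items) => items.iter_mut().for_each(strip_unsupported),
        _ => {}
    }
}
```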
When the MITM detects a functionCall in Google's response AND custom
tools are active, send a forged clean text response to the LS instead
of the real one. This prevents the LS from seeing function calls for
tools it doesn't manage, eliminating the retry loop entirely.
The real function call data is captured in MitmStore and returned to
the client (OpenCode) through the completions handler.
Also removes the complex chunked-encoding response rewriting approach
in favor of this simpler forge-and-break strategy.
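Below is a sketch of the forge-and-break decision. `Forward`, `decide`, and `forged_text_event` are hypothetical names. The forged payload shape is also an assumption: a Gemini-style SSE `data:` event with one text part and finishReason STOP. The log only says that a clean text response replaces the real one.

```rust
/// What the MITM forwards to the LS for an upstream response.
#[derive(Debug, PartialEq)]
pub enum Forward {
    /// Pass the real upstream bytes through unchanged.
    Real,
    /// Send this forged event to the LS and stop reading upstream.
    ForgeAndBreak(String),
}

/// Builds a forged Gemini-style SSE event carrying only plain text
/// (payload shape is an assumption, not taken from the proxy).
pub fn forged_text_event(text: &str) -> String {
    let escaped = text
        .replace('\\', "\\\\")
        .replace('"', "\\\"")
        .replace('\n', "\\n");
    format!(
        "data: {{\"candidates\":[{{\"content\":{{\"role\":\"model\",\"parts\":[{{\"text\":\"{escaped}\"}}]}},\"finishReason\":\"STOP\"}}]}}\r\n\r\n"
    )
}

pub fn decide(upstream_has_function_call: bool, custom_tools_active: bool) -> Forward {
    if upstream_has_function_call && custom_tools_active {
        // The real call is captured in MitmStore elsewhere; the LS sees a
        // plain text turn, so it has no unknown function call to retry.
        Forward::ForgeAndBreak(forged_text_event("Tool call completed"))
    } else {
        Forward::Real
    }
}
```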
The function call stripping was only happening when no custom tools
were present. But even with custom tools injected, the LS history
contains functionCall/functionResponse parts for LS-internal tools
that we stripped, causing MALFORMED_FUNCTION_CALL. Now always strip
regardless of custom tools presence.
- Accept tools and tool_choice fields in CompletionRequest
- Convert OpenAI tools to Gemini format and store in MitmStore
- Detect MITM-captured function calls in streaming poll loop
- Emit tool_calls delta chunks in OpenAI streaming format
- Finish with 'tool_calls' reason instead of 'stop' when tools used
- Only clear tools when request has none (prevents stale state leak)
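Below is a sketch of the streaming side. It assumes each captured call already carries a generated call ID and JSON-encoded arguments. It emits one OpenAI-style `chat.completion.chunk` whose delta holds `tool_calls`, with arguments sent as a JSON string, and then picks the final finish reason. `CapturedCall` and the helper names are hypothetical. The id/created/model fields of a real chunk are omitted for brevity.

```rust
/// A function call captured by the MITM (hypothetical shape).
pub struct CapturedCall {
    pub call_id: String,
    pub name: String,
    pub args_json: String, // already-serialized JSON arguments
}

/// Minimal escaping for embedding text in a JSON string literal.
fn esc(s: &str) -> String {
    s.replace('\\', "\\\\").replace('"', "\\\"").replace('\n', "\\n")
}

/// One SSE chunk carrying all captured calls as tool_calls deltas.
pub fn tool_calls_delta_chunk(calls: &[CapturedCall]) -> String {
    let items: Vec<String> = calls
        .iter()
        .enumerate()
        .map(|(i, c)| {
            format!(
                "{{\"index\":{},\"id\":\"{}\",\"type\":\"function\",\"function\":{{\"name\":\"{}\",\"arguments\":\"{}\"}}}}",
                i,
                esc(&c.call_id),
                esc(&c.name),
                esc(&c.args_json)
            )
        })
        .collect();
    format!(
        "data: {{\"object\":\"chat.completion.chunk\",\"choices\":[{{\"index\":0,\"delta\":{{\"tool_calls\":[{}]}},\"finish_reason\":null}}]}}\n\n",
        items.join(",")
    )
}

/// Final finish_reason: "tool_calls" when calls were emitted, else "stop".
pub fn finish_reason(tool_calls_emitted: bool) -> &'static str {
    if tool_calls_emitted { "tool_calls" } else { "stop" }
}
```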
Root cause: after stripping LS tool definitions, two things remained:
1. toolConfig with mode=VALIDATED (forces function calling even with
empty tools array)
2. The model's training/identity context, causing it to attempt function
calls in text
Fix:
- Remove empty tools array and toolConfig when no custom tools injected
- Strip functionCall/functionResponse parts from conversation history
- Append explicit 'no tools available' instruction to system prompt
- Remove debug dump code
When LS tools are stripped from the request but the conversation history
still contains functionCall/functionResponse parts referencing those
tools, Google returns MALFORMED_FUNCTION_CALL and the LS retries in an
infinite loop, causing the request to hang forever.
Now after stripping LS tools and confirming no custom tools are injected,
we also strip all functionCall/functionResponse parts from the history
and remove any messages that become empty as a result.
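Below is a sketch of the request cleanup from the last two entries. It uses a simplified, hypothetical request model; real Gemini requests carry JSON `contents`, `tools`, `toolConfig`, and `systemInstruction`. What runs when:

- **History stripping** runs unconditionally, matching the 'always strip' change described earlier in this log.
- **Tools/toolConfig removal and the extra system note** apply only when nothing custom was injected.

The note's wording is a placeholder, since the log does not quote it.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Part {
    Text(String),
    FunctionCall(String),     // function name only, simplified
    FunctionResponse(String), // function name only, simplified
}

#[derive(Debug, Clone, PartialEq)]
pub struct Content {
    pub role: String,
    pub parts: Vec<Part>,
}

#[derive(Debug, Default)]
pub struct GenerateRequest {
    pub tools: Option<Vec<String>>,  // function declarations, simplified
    pub tool_config: Option<String>, // e.g. "VALIDATED"
    pub contents: Vec<Content>,
    pub system_instruction: String,
}

/// Placeholder wording (hypothetical).
pub const NO_TOOLS_NOTE: &str = "No tools are available. Answer in plain text.";

pub fn sanitize(req: &mut GenerateRequest, custom_tools_injected: bool) {
    if !custom_tools_injected {
        // An empty tools array plus toolConfig mode=VALIDATED still
        // pushes the model toward function calling, so drop both.
        req.tools = None;
        req.tool_config = None;
        req.system_instruction.push_str("\n\n");
        req.system_instruction.push_str(NO_TOOLS_NOTE);
    }
    // History parts referencing stripped LS tools trigger
    // MALFORMED_FUNCTION_CALL, so keep only text parts...
    for c in &mut req.contents {
        c.parts.retain(|p| matches!(p, Part::Text(_)));
    }
    // ...and drop messages left with no parts at all.
    req.contents.retain(|c| !c.parts.is_empty());
}
```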
Tool definitions stored in MitmStore from /v1/responses requests were
persisting and getting injected into /v1/chat/completions requests.
This caused Gemini to return functionCalls instead of text, and since
the completions handler has no function call handling logic, it would
poll forever waiting for text that never came.
Fix: clear active_tools, active_tool_config, and has_active_function_call
at the start of handle_completions. Also add clear_active_function_call()
method to MitmStore.
- store.rs: Add tool context storage (active tools, tool config, pending
tool results, call_id mapping, last function calls for history rewrite)
- types.rs: Add tools/tool_choice fields to ResponsesRequest, add
build_function_call_output helper for OpenAI function_call output items
- modify.rs: Replace hardcoded get_weather with dynamic ToolContext
injection. Add openai_tools_to_gemini and openai_tool_choice_to_gemini
converters. Add conversation history rewriting for tool result turns
(replaces fake 'Tool call completed' model turn with real functionCall,
injects functionResponse before last user turn)
- proxy.rs: Build ToolContext from MitmStore before calling modify_request.
Save last_function_calls for history rewriting on subsequent turns
- responses.rs: Store client tools in MitmStore before LS call. Detect
function_call_output in input array for tool result submission. Return
captured functionCalls as OpenAI function_call output items with
generated call_ids and stringified arguments
- gemini.rs: New Gemini-native endpoint (POST /v1/gemini) with zero
format translation. Accepts functionDeclarations directly, returns
functionCall in Gemini format directly
- mod.rs: Wire /v1/gemini route, bump version to 3.3.0
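Below is a sketch of what openai_tool_choice_to_gemini might map. It assumes Gemini's `functionCallingConfig` modes AUTO, ANY, and NONE, plus `allowedFunctionNames`. The enum and struct are hypothetical simplifications of the JSON on both sides. The object form of OpenAI's tool_choice is reduced to just the function name.

```rust
/// OpenAI `tool_choice`, simplified (hypothetical representation).
#[derive(Debug, Clone, PartialEq)]
pub enum OpenAiToolChoice {
    None,
    Auto,
    Required,
    Function(String), // {"type":"function","function":{"name":...}}
}

impl OpenAiToolChoice {
    /// Parses the string forms; the object form is handled elsewhere.
    pub fn from_str_value(s: &str) -> Option<Self> {
        match s {
            "none" => Some(Self::None),
            "auto" => Some(Self::Auto),
            "required" => Some(Self::Required),
            _ => None,
        }
    }
}

/// Gemini `toolConfig.functionCallingConfig`, simplified.
#[derive(Debug, Clone, PartialEq)]
pub struct FunctionCallingConfig {
    pub mode: &'static str, // "AUTO" | "ANY" | "NONE"
    pub allowed_function_names: Vec<String>,
}

pub fn tool_choice_to_gemini(choice: &OpenAiToolChoice) -> FunctionCallingConfig {
    let (mode, allowed) = match choice {
        OpenAiToolChoice::None => ("NONE", vec![]),
        OpenAiToolChoice::Auto => ("AUTO", vec![]),
        OpenAiToolChoice::Required => ("ANY", vec![]),
        // Forcing one function = ANY, restricted to that single name.
        OpenAiToolChoice::Function(name) => ("ANY", vec![name.clone()]),
    };
    FunctionCallingConfig { mode, allowed_function_names: allowed }
}
```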
When MITM strips LS tools and injects custom tools:
- Google returns functionCall → captured in MitmStore
- Follow-up LS requests are blocked with fake SSE response
- Proxy consumes captured calls and clears the flag
- Result: 1 real Google API call instead of 5+ per tool call
Flow: Client → Proxy → LS → MITM(inject tool) → Google
Google returns functionCall → MITM captures it
LS tries follow-up → MITM blocks (fake response)
Proxy reads captured functionCall → returns to client
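The block/consume flow above can be sketched as a flag kept in the store. Apart from the `has_active_function_call` flag, the method names are hypothetical. The fake SSE payload is an assumed Gemini-style empty text event.

```rust
use std::sync::Mutex;

#[derive(Debug, Clone, PartialEq)]
pub struct FunctionCall {
    pub name: String,
    pub args: String,
}

#[derive(Default)]
struct Inner {
    calls: Vec<FunctionCall>,
    has_active_function_call: bool,
}

#[derive(Default)]
pub struct MitmStore {
    inner: Mutex<Inner>,
}

/// Canned reply for blocked LS follow-ups (payload shape is an assumption).
pub const FAKE_SSE: &str = "data: {\"candidates\":[{\"content\":{\"role\":\"model\",\"parts\":[{\"text\":\"\"}]},\"finishReason\":\"STOP\"}]}\r\n\r\n";

impl MitmStore {
    /// MITM: Google returned a functionCall, so capture it and raise the flag.
    pub fn capture(&self, call: FunctionCall) {
        let mut g = self.inner.lock().unwrap();
        g.calls.push(call);
        g.has_active_function_call = true;
    }

    /// MITM: an LS request arrives. While the flag is up, answer locally
    /// with the fake response instead of forwarding it to Google.
    pub fn intercept_ls_request(&self) -> Option<&'static str> {
        if self.inner.lock().unwrap().has_active_function_call {
            Some(FAKE_SSE)
        } else {
            None
        }
    }

    /// Proxy: consume the captured calls and clear the flag.
    pub fn take_function_calls(&self) -> Vec<FunctionCall> {
        let mut g = self.inner.lock().unwrap();
        g.has_active_function_call = false;
        std::mem::take(&mut g.calls)
    }
}
```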
With tools present, LS enters full agentic mode doing multi-turn
tool calls (file searches, terminal commands, etc.). A simple
weather question caused 40+ Google API calls in 120s before timeout.
Tool stripping is required to maintain single-turn behavior.
- Add proxyctl CLI script for systemd service management
- Add systemd user service file for background operation
- Fix standalone LS kill: properly track real LS PID via pgrep
and use sudo kill for cross-user cleanup on shutdown
- Remove deprecated scripts (dns-redirect, iptables-redirect,
mitm-wrapper, standalone-ls, parse-snapshot)
- Disable tool stripping in MITM for tool call investigation
- Update GEMINI.md with CLI tools documentation
- Add panel-stream-investigation.md documenting dead end
- Update KNOWN_ISSUES: move polling and panel stream to resolved
- Update GEMINI.md with standalone LS section and new MITM setup
- Fix standalone-ls-todo to reflect default mode
- Subscribe to StreamCascadeReactiveUpdates for real-time cascade state diffs
- Fall back to timer-based polling if streaming RPC unavailable
- Remove StreamCascadePanelReactiveUpdates code (dead end, only has plan_status/user_settings)
- Remove debug diff file-saving code
- Add stream_reactive_rpc() helper to backend
Streaming poll: 800-1200ms → 150-250ms (5x faster)
Sync poll: 1000-1800ms → 200-400ms (4x faster)
Verified via STEP_DUMP instrumentation that the LS updates
plannerResponse.response incrementally during GENERATING status,
so faster polling yields smoother progressive text delivery.
Also restructured streaming to emit reasoning events first
when thinking content is detected in LS steps before response text.
Adds proper streaming SSE events for reasoning content:
- response.output_item.added (reasoning)
- response.reasoning_summary_part.added
- response.reasoning_summary_text.delta
- response.reasoning_summary_text.done
- response.reasoning_summary_part.done
- response.output_item.done (reasoning)
These are emitted before the message events, matching the format
that OpenAI-compatible clients expect for displaying thinking content.
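Below is a sketch of the emission order. It returns (event type, payload) pairs instead of writing SSE frames.

- **Event names.** The reasoning event names are the ones listed above. The message-side names (response.content_part.added, response.output_text.delta, and so on) are the standard Responses API ones, assumed to match what the proxy emits.
- **Payloads.** The payload holds the item type for output_item events and the text for delta/done events; it is empty otherwise.

```rust
/// Event sequence for an optional reasoning item followed by a message.
pub fn stream_events(thinking_chunks: &[&str], text_chunks: &[&str]) -> Vec<(&'static str, String)> {
    let mut ev = Vec::new();
    if !thinking_chunks.is_empty() {
        // Reasoning item first, so clients can render thinking before the answer.
        ev.push(("response.output_item.added", "reasoning".to_string()));
        ev.push(("response.reasoning_summary_part.added", String::new()));
        for c in thinking_chunks {
            ev.push(("response.reasoning_summary_text.delta", c.to_string()));
        }
        ev.push(("response.reasoning_summary_text.done", thinking_chunks.concat()));
        ev.push(("response.reasoning_summary_part.done", String::new()));
        ev.push(("response.output_item.done", "reasoning".to_string()));
    }
    ev.push(("response.output_item.added", "message".to_string()));
    ev.push(("response.content_part.added", String::new()));
    for c in text_chunks {
        ev.push(("response.output_text.delta", c.to_string()));
    }
    ev.push(("response.output_text.done", text_chunks.concat()));
    ev.push(("response.content_part.done", String::new()));
    ev.push(("response.output_item.done", "message".to_string()));
    ev
}
```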
The LS makes two Google API calls for thinking models. Call 2 (thinking
summary) may not have arrived by the time usage_from_poll runs after
Call 1 (response). Now we peek first, and if thinking tokens exist but
text is missing, wait up to 1s for the merge to happen.
Also adds peek_usage method to MitmStore for non-consuming reads.
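Below is a sketch of the peek-then-wait logic using std threads. The store layout, the 50 ms retry interval, and `usage_after_merge` are assumptions. Only the peek/consume split and the 1 s bound come from the log.

```rust
use std::sync::Mutex;
use std::thread;
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Default, PartialEq)]
pub struct ApiUsage {
    pub thinking_tokens: u64,
    pub thinking_text: Option<String>,
}

#[derive(Default)]
pub struct MitmStore {
    usage: Mutex<Option<ApiUsage>>,
}

impl MitmStore {
    /// Non-consuming read (the role of peek_usage).
    pub fn peek_usage(&self) -> Option<ApiUsage> {
        self.usage.lock().unwrap().clone()
    }
    /// Consuming read.
    pub fn take_usage(&self) -> Option<ApiUsage> {
        self.usage.lock().unwrap().take()
    }
    pub fn record(&self, u: ApiUsage) {
        *self.usage.lock().unwrap() = Some(u);
    }
}

/// If Call 1's usage shows thinking tokens but no thinking text yet,
/// wait (bounded by `max_wait`) for Call 2's merge before consuming.
pub fn usage_after_merge(store: &MitmStore, max_wait: Duration) -> Option<ApiUsage> {
    let deadline = Instant::now() + max_wait;
    loop {
        match store.peek_usage() {
            Some(u) if u.thinking_tokens > 0
                && u.thinking_text.is_none()
                && Instant::now() < deadline =>
            {
                thread::sleep(Duration::from_millis(50));
            }
            // Merged, no thinking at all, or timed out: consume what is there.
            _ => return store.take_usage(),
        }
    }
}
```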
The LS makes TWO separate Google API calls for thinking models:
Call 1: response + thinking token count (no thinking text)
Call 2: thinking summary text (no thinking tokens)
Each hits a different StreamingAccumulator, so we:
1. Capture response_text in StreamingAccumulator (non-thinking parts)
2. In MitmStore::record_usage, detect when Call 2 arrives for a
cascade that already has thinking tokens from Call 1
3. Merge Call 2's response_text as thinking_text on Call 1's usage
Also injects includeThoughts into Google API requests via MITM
modify to ensure thinking text is available in SSE responses.
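Below is a sketch of the merge in record_usage. It assumes Call 2 can be recognized as a usage record with no thinking tokens of its own. It must also arrive for a cascade that already has thinking tokens but no thinking text. Field names are hypothetical, and Call 1's token counts are kept as-is.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Default, PartialEq)]
pub struct ApiUsage {
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub thinking_tokens: u64,
    pub response_text: Option<String>,
    pub thinking_text: Option<String>,
}

#[derive(Default)]
pub struct MitmStore {
    by_cascade: HashMap<String, ApiUsage>,
}

impl MitmStore {
    pub fn record_usage(&mut self, cascade_id: &str, incoming: ApiUsage) {
        if let Some(existing) = self.by_cascade.get_mut(cascade_id) {
            // Looks like Call 2: its "response" text is really the
            // thinking summary for Call 1, so merge instead of replacing.
            if existing.thinking_tokens > 0
                && incoming.thinking_tokens == 0
                && existing.thinking_text.is_none()
            {
                existing.thinking_text = incoming.response_text;
                return;
            }
        }
        self.by_cascade.insert(cascade_id.to_string(), incoming);
    }

    pub fn get(&self, cascade_id: &str) -> Option<&ApiUsage> {
        self.by_cascade.get(cascade_id)
    }
}
```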
The LS strips thinking/reasoning text from plannerResponse steps —
only the thinkingSignature (opaque verification blob) is preserved.
The actual thinking text flows through the MITM proxy in the raw
Google SSE response (parts with thought: true) and Anthropic SSE
(thinking_delta content blocks).
Changes:
- StreamingAccumulator now accumulates thinking text from SSE events
- ApiUsage gains thinking_text: Option<String>
- usage_from_poll returns (Usage, Option<thinking_text>)
- Thinking text priority: MITM-captured > LS-extracted (fallback)
- Reasoning output item now populated from real API data
- Removed debug dump code
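Below is a sketch of the accumulator and the priority rule. The per-provider methods are hypothetical. They assume, as described above, that Gemini marks thinking parts with `thought: true` and that Anthropic streams thinking as `thinking_delta` deltas.

```rust
#[derive(Default, Debug)]
pub struct StreamingAccumulator {
    pub response_text: String,
    pub thinking_text: String,
}

impl StreamingAccumulator {
    /// Gemini part: `thought: true` means thinking, anything else is response text.
    pub fn push_gemini_part(&mut self, text: &str, thought: bool) {
        if thought {
            self.thinking_text.push_str(text)
        } else {
            self.response_text.push_str(text)
        }
    }

    /// Anthropic content_block_delta, keyed by its delta type.
    pub fn push_anthropic_delta(&mut self, delta_type: &str, text: &str) {
        match delta_type {
            "thinking_delta" => self.thinking_text.push_str(text),
            "text_delta" => self.response_text.push_str(text),
            _ => {} // e.g. signature_delta is not text
        }
    }

    /// Thinking text, or None if none was seen (maps to ApiUsage::thinking_text).
    pub fn thinking(&self) -> Option<String> {
        (!self.thinking_text.is_empty()).then(|| self.thinking_text.clone())
    }
}

/// MITM-captured thinking wins; LS-extracted text is only the fallback.
pub fn pick_thinking(mitm: Option<String>, ls: Option<String>) -> Option<String> {
    mitm.or(ls)
}
```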
Thinking content was previously returned as non-standard top-level
fields (thinking, thinking_duration). Now follows the official OpenAI
Responses API format:
- Reasoning appears as a 'type: reasoning' item in the output array
with summary[].text containing the thinking content
- Message item follows after the reasoning item
- thinking_signature kept as proxy extension (internal multi-turn data)
- Removed ResponseOutput/OutputContent structs in favor of
serde_json::Value for polymorphic output items
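Below is a sketch of the resulting output array. It is abbreviated: the id and status fields that real Responses API items carry are omitted, as is the proxy's thinking_signature extension. It builds JSON by hand only to stay std-only; the proxy itself uses serde_json::Value.

```rust
/// Escapes a string for use inside a JSON string literal.
fn esc(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for ch in s.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Output array: optional reasoning item first, then the message item.
pub fn build_output(thinking: Option<&str>, text: &str) -> String {
    let mut items = Vec::new();
    if let Some(t) = thinking {
        items.push(format!(
            "{{\"type\":\"reasoning\",\"summary\":[{{\"type\":\"summary_text\",\"text\":\"{}\"}}]}}",
            esc(t)
        ));
    }
    items.push(format!(
        "{{\"type\":\"message\",\"role\":\"assistant\",\"content\":[{{\"type\":\"output_text\",\"text\":\"{}\"}}]}}",
        esc(text)
    ));
    format!("[{}]", items.join(","))
}
```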
Tools are only needed by the Antigravity webview for tool-call UI.
Our proxy doesn't need them — the model generates text responses fine
without tool definitions. Stripping all 20 tools saves ~15KB per request.
- Only set HTTPS_PROXY/HTTP_PROXY when iptables UID isolation is NOT
available. With iptables, double-proxying caused profile picture
fetches to fail with 'lookup http' DNS errors.
- Fix is_agent detection: handle JSON with spaces after colons
("requestType": "agent" vs "requestType":"agent")
- Suppress wrapper-not-installed warning in standalone mode
- Show 'iptables (standalone)' in banner instead of 'not installed'
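Below is a sketch of whitespace-tolerant agent detection. This `is_agent` is a hypothetical stand-in that scans text rather than parsing JSON. Because of that, it would also match the key if it appeared inside some other string value.

```rust
/// True if `body` sets "requestType" to "agent", allowing any whitespace
/// before and after the colon (a text scan, not a JSON parse).
pub fn is_agent(body: &str) -> bool {
    const KEY: &str = "\"requestType\"";
    let mut rest = body;
    while let Some(pos) = rest.find(KEY) {
        let after = rest[pos + KEY.len()..].trim_start();
        if let Some(value) = after.strip_prefix(':') {
            // The closing quote stops values like "agentic" from matching.
            if value.trim_start().starts_with("\"agent\"") {
                return true;
            }
        }
        rest = &rest[pos + KEY.len()..];
    }
    false
}
```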
When the MITM can't extract a cascade ID from the intercepted request
(Content-Length: 0 / chunked encoding), usage is stored under '_latest'.
Now usage_from_poll and completions try the exact cascade_id first,
then fall back to '_latest' so MITM-captured tokens are actually used.
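Below is a sketch of the lookup order, generic over the usage type. The log does not say whether the real code removes or peeks the entry; this version consumes it.

```rust
use std::collections::HashMap;

/// Slot used when the MITM could not extract a cascade ID.
pub const LATEST: &str = "_latest";

/// Exact cascade ID first, then the "_latest" fallback slot.
pub fn take_usage_with_fallback<U>(store: &mut HashMap<String, U>, cascade_id: &str) -> Option<U> {
    store.remove(cascade_id).or_else(|| store.remove(LATEST))
}
```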
- Spawn standalone LS as dedicated 'antigravity-ls' user via sudo
- UID-scoped iptables redirect (port 443 → MITM proxy) via mitm-redirect.sh
- Combined CA bundle (system CAs + MITM CA) for Go TLS trust
- Transparent TLS interception with chunked response detection
- Google SSE parser for streamGenerateContent usage extraction
- Timeouts on all MITM operations (TLS handshake, upstream, idle)
- Forward response data immediately (no buffering)
- Per-model token usage capture (input, output, thinking)
- Update docs and known issues to reflect resolved TLS blocker
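Below is a sketch of the SSE side of the parser.

- **Buffering.** TLS reads can split an event anywhere, so the parser buffers partial lines. It only yields a payload once the blank line ending an event arrives.
- **`int_field`.** This is a naive, hypothetical lookup for usageMetadata counters such as promptTokenCount and thoughtsTokenCount. Real code would parse the JSON.
- **Line endings.** Lone-CR line endings are not handled.

```rust
/// Incrementally splits a decoded SSE stream into `data:` payloads.
#[derive(Default)]
pub struct SseParser {
    buf: String,       // bytes of an incomplete line carried between chunks
    data: Vec<String>, // data lines of the event currently being built
}

impl SseParser {
    /// Feeds one chunk; returns the payload of every event it completes.
    pub fn feed(&mut self, chunk: &str) -> Vec<String> {
        self.buf.push_str(chunk);
        let mut events = Vec::new();
        while let Some(nl) = self.buf.find('\n') {
            let line: String = self.buf.drain(..=nl).collect();
            let line = line.trim_end_matches(|c: char| c == '\r' || c == '\n');
            if line.is_empty() {
                // A blank line ends the event; multi-line data joins with '\n'.
                if !self.data.is_empty() {
                    events.push(self.data.join("\n"));
                    self.data.clear();
                }
            } else if let Some(v) = line.strip_prefix("data:") {
                // Per SSE rules, drop a single leading space after the colon.
                self.data.push(v.strip_prefix(' ').unwrap_or(v).to_string());
            }
            // Other fields (event:, id:, retry:) and ":" comments are ignored.
        }
        events
    }
}

/// Naive integer lookup such as "thoughtsTokenCount" (hypothetical helper).
pub fn int_field(json: &str, name: &str) -> Option<u64> {
    let key = format!("\"{name}\"");
    let after = json[json.find(&key)? + key.len()..]
        .trim_start()
        .strip_prefix(':')?
        .trim_start();
    let digits: String = after.chars().take_while(|c| c.is_ascii_digit()).collect();
    digits.parse().ok()
}
```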