Commit Graph

55 Commits

Author SHA1 Message Date
Nikketryhard
4f08b994c7 fix: include tool results in conversation context
When OpenCode sends follow-up messages with tool results, include
the full conversation (user message, assistant tool calls, and tool
results) in the text sent to the model. Previously only the user
message was extracted, causing the model to never see tool results
and call the same tool repeatedly in an infinite loop.

Also add tool_calls and tool_call_id fields to CompletionMessage.
2026-02-15 00:42:43 -06:00
Nikketryhard
5d4125fa0d fix: suppress dummy text from tool call responses
Check for MITM-captured function calls BEFORE emitting text in the
streaming handler. This prevents the dummy 'Tool call completed'
placeholder (sent to the LS) from leaking to OpenCode, which was
confusing it into infinite loops.

Also removes the duplicate function call storage at the end of the
response loop, since function calls are now stored immediately when
detected.
2026-02-15 00:37:39 -06:00
Nikketryhard
502318acec fix: store function calls in MitmStore immediately on detection
Previously, captured function calls were only stored in MitmStore
after the response loop ended. The completions handler polls
take_any_function_calls() during streaming, creating a race condition
where the MitmStore was empty.

Now function calls are stored immediately when parse_streaming_chunk
detects them, in both the initial body and body chunk paths.
2026-02-15 00:28:40 -06:00
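Note on 502318acec: the race described above can be illustrated with a minimal sketch. Only the names MitmStore and take_any_function_calls come from the commit message; the FunctionCall struct, the store_function_call method, and the on_chunk hook are hypothetical stand-ins, and the real store is likely richer than a mutex-guarded vector.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical captured call; the real struct is not shown in the log.
#[derive(Debug, Clone, PartialEq)]
pub struct FunctionCall {
    pub name: String,
    pub args_json: String,
}

/// Hypothetical stand-in for MitmStore: a shared, mutex-guarded buffer.
#[derive(Clone, Default)]
pub struct MitmStore {
    pending: Arc<Mutex<Vec<FunctionCall>>>,
}

impl MitmStore {
    /// Store a call the moment it is detected (assumed method name).
    pub fn store_function_call(&self, call: FunctionCall) {
        self.pending.lock().unwrap().push(call);
    }

    /// Drain everything captured so far (name taken from the commit message).
    pub fn take_any_function_calls(&self) -> Vec<FunctionCall> {
        std::mem::take(&mut *self.pending.lock().unwrap())
    }
}

/// Assumed per-chunk hook: a detected call is pushed to the store right away,
/// so a handler polling mid-stream can see it before the response loop ends.
pub fn on_chunk(store: &MitmStore, detected: Option<FunctionCall>) {
    if let Some(call) = detected {
        store.store_function_call(call);
    }
}
```

With storage deferred to the end of the response loop, a poll made during streaming would find the buffer empty; pushing inside the per-chunk hook closes that window.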
Nikketryhard
40c6379ca1 fix: strip $schema and unsupported JSON Schema fields from tool params
Google's Gemini API rejects $schema, additionalProperties, $ref,
$defs, default, examples, and title in tool parameter schemas.
OpenCode/MCP tools include these standard JSON Schema fields.
Now recursively stripped during OpenAI→Gemini tool conversion.
2026-02-15 00:18:32 -06:00
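Note on 40c6379ca1: a recursive strip of this kind might look roughly like the sketch below. The Json enum is a tiny hypothetical stand-in for a real JSON value type (the project most likely uses serde_json::Value, mentioned elsewhere in this log), and strip_unsupported is an assumed name; only the list of rejected keys comes from the commit message.

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for a JSON value (assumption, for a std-only example).
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Null,
    Bool(bool),
    Num(f64),
    Str(String),
    Arr(Vec<Json>),
    Obj(BTreeMap<String, Json>),
}

/// Keys the commit message lists as rejected in tool parameter schemas.
const UNSUPPORTED: [&str; 7] = [
    "$schema",
    "additionalProperties",
    "$ref",
    "$defs",
    "default",
    "examples",
    "title",
];

/// Remove unsupported keys from every object, descending into
/// nested objects and arrays.
pub fn strip_unsupported(value: &mut Json) {
    match value {
        Json::Obj(map) => {
            map.retain(|key, _| !UNSUPPORTED.contains(&key.as_str()));
            for child in map.values_mut() {
                strip_unsupported(child);
            }
        }
        Json::Arr(items) => {
            for item in items.iter_mut() {
                strip_unsupported(item);
            }
        }
        _ => {}
    }
}
```

One caveat of this naive version: it would also drop a tool parameter that happens to be named title or default under properties, because it does not distinguish schema keywords from property names. The log does not say whether the project handles that case.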
Nikketryhard
7c44729ace fix: forge dummy STOP response to LS on functionCall capture
When the MITM detects a functionCall in Google's response AND custom
tools are active, send a forged clean text response to the LS instead
of the real one. This prevents the LS from seeing function calls for
tools it doesn't manage, eliminating the retry loop entirely.

The real function call data is captured in MitmStore and returned to
the client (OpenCode) through the completions handler.

Also removes the complex chunked-encoding response rewriting approach
in favor of this simpler forge-and-break strategy.
2026-02-15 00:15:00 -06:00
Nikketryhard
19ff784cae fix: always strip old functionCall/functionResponse from LS history
The function call stripping was only happening when no custom tools
were present. But even with custom tools injected, the LS history
contains functionCall/functionResponse parts for LS-internal tools
that we stripped, causing MALFORMED_FUNCTION_CALL. Now always strip
regardless of custom tools presence.
2026-02-14 23:59:13 -06:00
Nikketryhard
3303ce38de feat: add tool call support to chat completions endpoint
- Accept tools and tool_choice fields in CompletionRequest
- Convert OpenAI tools to Gemini format and store in MitmStore
- Detect MITM-captured function calls in streaming poll loop
- Emit tool_calls delta chunks in OpenAI streaming format
- Finish with 'tool_calls' reason instead of 'stop' when tools used
- Only clear tools when request has none (prevents stale state leak)
2026-02-14 23:47:23 -06:00
Nikketryhard
19090b79f0 fix: prevent MALFORMED_FUNCTION_CALL infinite retry loop
Root cause: after stripping LS tool definitions, two things remained:
1. toolConfig with mode=VALIDATED (forces function calling even with
   empty tools array)
2. Model's training/identity context causing it to attempt function
   calls in text

Fix:
- Remove empty tools array and toolConfig when no custom tools injected
- Strip functionCall/functionResponse parts from conversation history
- Append explicit 'no tools available' instruction to system prompt
- Remove debug dump code
2026-02-14 23:31:26 -06:00
Nikketryhard
a52d1bf475 fix: strip functionCall/functionResponse from history when no tools
When LS tools are stripped from the request but the conversation history
still contains functionCall/functionResponse parts referencing those
tools, Google returns MALFORMED_FUNCTION_CALL and the LS retries in an
infinite loop, causing the request to hang forever.

Now after stripping LS tools and confirming no custom tools are injected,
we also strip all functionCall/functionResponse parts from the history
and remove any messages that become empty as a result.
2026-02-14 23:19:28 -06:00
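Note on a52d1bf475: the two-step cleanup (drop function parts, then drop messages left empty) can be sketched as follows. The Part and Message types are simplified, hypothetical models of Gemini-style conversation turns, and strip_function_parts is an assumed name.

```rust
/// Simplified model of one part of a conversation turn (assumption).
#[derive(Debug, Clone, PartialEq)]
pub enum Part {
    Text(String),
    FunctionCall { name: String },
    FunctionResponse { name: String },
}

/// Simplified model of one conversation turn (assumption).
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: String,
    pub parts: Vec<Part>,
}

/// Remove functionCall/functionResponse parts from every message,
/// then remove messages that have no parts left.
pub fn strip_function_parts(history: &mut Vec<Message>) {
    for msg in history.iter_mut() {
        msg.parts.retain(|part| matches!(part, Part::Text(_)));
    }
    history.retain(|msg| !msg.parts.is_empty());
}
```

The second retain matters: a turn consisting only of a functionCall would otherwise survive as an empty message.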
Nikketryhard
7e16a7b892 fix: clear stale tool state in completions handler to prevent hang
Tool definitions stored in MitmStore from /v1/responses requests were
persisting and getting injected into /v1/chat/completions requests.
This caused Gemini to return functionCalls instead of text, and since
the completions handler has no function call handling logic, it would
poll forever waiting for text that never came.

Fix: clear active_tools, active_tool_config, and has_active_function_call
at the start of handle_completions. Also add clear_active_function_call()
method to MitmStore.
2026-02-14 23:10:45 -06:00
Nikketryhard
786987116b feat: full tool call support (OpenAI + Gemini endpoints)
- store.rs: Add tool context storage (active tools, tool config, pending
  tool results, call_id mapping, last function calls for history rewrite)
- types.rs: Add tools/tool_choice fields to ResponsesRequest, add
  build_function_call_output helper for OpenAI function_call output items
- modify.rs: Replace hardcoded get_weather with dynamic ToolContext
  injection. Add openai_tools_to_gemini and openai_tool_choice_to_gemini
  converters. Add conversation history rewriting for tool result turns
  (replaces fake 'Tool call completed' model turn with real functionCall,
  injects functionResponse before last user turn)
- proxy.rs: Build ToolContext from MitmStore before calling modify_request.
  Save last_function_calls for history rewriting on subsequent turns
- responses.rs: Store client tools in MitmStore before LS call. Detect
  function_call_output in input array for tool result submission. Return
  captured functionCalls as OpenAI function_call output items with
  generated call_ids and stringified arguments
- gemini.rs: New Gemini-native endpoint (POST /v1/gemini) with zero
  format translation. Accepts functionDeclarations directly, returns
  functionCall in Gemini format directly
- mod.rs: Wire /v1/gemini route, bump version to 3.3.0
2026-02-14 22:56:44 -06:00
Nikketryhard
8455aa674f feat: capture function calls from Google + block follow-up quota waste
When MITM strips LS tools and injects custom tools:
- Google returns functionCall → captured in MitmStore
- Follow-up LS requests are blocked with fake SSE response
- Proxy consumes captured calls and clears the flag
- Result: 1 real Google API call instead of 5+ per tool call

Flow: Client → Proxy → LS → MITM(inject tool) → Google
      Google returns functionCall → MITM captures it
      LS tries follow-up → MITM blocks (fake response)
      Proxy reads captured functionCall → returns to client
2026-02-14 22:37:28 -06:00
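Note on 8455aa674f: the capture-then-block flow above can be modelled as a small state machine. This is a sketch under assumptions: the flag name has_active_function_call appears later in this log, but the Decision enum and the capture, on_ls_request, and take_calls methods are hypothetical, and captured calls are kept as raw strings for brevity.

```rust
/// What the MITM does with the next LS request (hypothetical enum).
#[derive(Debug, PartialEq)]
pub enum Decision {
    Forward,
    BlockWithFakeSse,
}

/// Hypothetical slice of MitmStore covering only this flow.
#[derive(Default)]
pub struct MitmStore {
    captured: Vec<String>,
    has_active_function_call: bool,
}

impl MitmStore {
    /// MITM side: Google returned a functionCall, so record it and set the flag.
    pub fn capture(&mut self, call_json: String) {
        self.captured.push(call_json);
        self.has_active_function_call = true;
    }

    /// MITM side: follow-up LS requests are blocked while the flag is set.
    pub fn on_ls_request(&self) -> Decision {
        if self.has_active_function_call {
            Decision::BlockWithFakeSse
        } else {
            Decision::Forward
        }
    }

    /// Proxy side: consume the captured calls and clear the flag.
    pub fn take_calls(&mut self) -> Vec<String> {
        self.has_active_function_call = false;
        std::mem::take(&mut self.captured)
    }
}
```

Between capture and take_calls every LS follow-up is answered locally, which is how the flow reaches one real Google API call per tool call.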
Nikketryhard
146be139a2 fix: re-enable tool stripping after testing
With tools present, LS enters full agentic mode doing multi-turn
tool calls (file searches, terminal commands, etc.). A simple
weather question caused 40+ Google API calls in 120s before timeout.
Tool stripping is required to maintain single-turn behavior.
2026-02-14 22:18:02 -06:00
Nikketryhard
3e3af85798 feat: add proxyctl daemon manager, fix standalone LS cleanup
- Add proxyctl CLI script for systemd service management
- Add systemd user service file for background operation
- Fix standalone LS kill: properly track real LS PID via pgrep
  and use sudo kill for cross-user cleanup on shutdown
- Remove deprecated scripts (dns-redirect, iptables-redirect,
  mitm-wrapper, standalone-ls, parse-snapshot)
- Disable tool stripping in MITM for tool call investigation
- Update GEMINI.md with CLI tools documentation
2026-02-14 22:14:00 -06:00
Nikketryhard
f64f007421 fix: reduce GetCascadeTrajectory log spam from debug to trace
2026-02-14 21:43:36 -06:00
Nikketryhard
940786c57f docs: update standalone LS, MITM, and panel stream investigation
- Add panel-stream-investigation.md documenting dead end
- Update KNOWN_ISSUES: move polling and panel stream to resolved
- Update GEMINI.md with standalone LS section and new MITM setup
- Fix standalone-ls-todo to reflect default mode
2026-02-14 21:40:35 -06:00
Nikketryhard
b965be3f60 feat: add reactive streaming and remove dead panel stream code
- Subscribe to StreamCascadeReactiveUpdates for real-time cascade state diffs
- Fall back to timer-based polling if streaming RPC unavailable
- Remove StreamCascadePanelReactiveUpdates code (dead end, only has plan_status/user_settings)
- Remove debug diff file-saving code
- Add stream_reactive_rpc() helper to backend
2026-02-14 21:39:04 -06:00
Nikketryhard
3d7a7f492b fix: reduce poll intervals for smoother streaming
Streaming poll: 800-1200ms → 150-250ms (5x faster)
Sync poll: 1000-1800ms → 200-400ms (4.5-5x faster)

Verified via STEP_DUMP instrumentation that the LS updates
plannerResponse.response incrementally during GENERATING status,
so faster polling yields smoother progressive text delivery.

Also restructured streaming to emit reasoning events first
when thinking content is detected in LS steps before response text.
2026-02-14 20:34:37 -06:00
Nikketryhard
b1a089d21d feat: emit streaming reasoning events per OpenAI spec
Adds proper streaming SSE events for reasoning content:
- response.output_item.added (reasoning)
- response.reasoning_summary_part.added
- response.reasoning_summary_text.delta
- response.reasoning_summary_text.done
- response.reasoning_summary_part.done
- response.output_item.done (reasoning)

These are emitted before the message events, matching the format
that OpenAI-compatible clients expect for displaying thinking content.
2026-02-14 19:57:52 -06:00
Nikketryhard
5c1f4c77d9 fix: add retry logic for MITM thinking text merge race condition
The LS makes two Google API calls for thinking models. Call 2 (thinking
summary) may not have arrived by the time usage_from_poll runs after
Call 1 (response). Now we peek first, and if thinking tokens exist but
text is missing, wait up to 1s for the merge to happen.

Also adds peek_usage method to MitmStore for non-consuming reads.
2026-02-14 19:54:37 -06:00
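Note on 5c1f4c77d9: the peek-then-wait logic might be sketched as below. ApiUsage and its thinking_text field are named elsewhere in this log; the thinking_tokens field, the wait_for_thinking_text function, and the polling step are assumptions. The real handler is presumably async, so the blocking sleep here is only for a self-contained example.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Reduced, partly hypothetical view of the usage record.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct ApiUsage {
    pub thinking_tokens: u64,
    pub thinking_text: Option<String>,
}

/// Peek repeatedly until thinking text has been merged or `max_wait` elapses.
/// `peek` is a non-consuming read, standing in for MitmStore::peek_usage.
pub fn wait_for_thinking_text<F>(
    mut peek: F,
    max_wait: Duration,
    step: Duration,
) -> Option<ApiUsage>
where
    F: FnMut() -> Option<ApiUsage>,
{
    let deadline = Instant::now() + max_wait;
    loop {
        let usage = peek()?;
        // The merge is pending when tokens were counted but no text arrived yet.
        let merge_pending = usage.thinking_tokens > 0 && usage.thinking_text.is_none();
        if !merge_pending || Instant::now() >= deadline {
            return Some(usage);
        }
        thread::sleep(step);
    }
}
```

Calling it with a max_wait of one second matches the bound in the commit message; non-thinking responses return on the first peek because no merge is pending.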
Nikketryhard
34b9553484 feat: capture thinking text via MITM dual-call merge
The LS makes TWO separate Google API calls for thinking models:
  Call 1: response + thinking token count (no thinking text)
  Call 2: thinking summary text (no thinking tokens)

Each hits a different StreamingAccumulator, so we:
1. Capture response_text in StreamingAccumulator (non-thinking parts)
2. In MitmStore::record_usage, detect when Call 2 arrives for a
   cascade that already has thinking tokens from Call 1
3. Merge Call 2's response_text as thinking_text on Call 1's usage

Also injects includeThoughts into Google API requests via MITM
modify to ensure thinking text is available in SSE responses.
2026-02-14 19:49:15 -06:00
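Note on 34b9553484: the merge in step 2 and step 3 could look roughly like this. MitmStore::record_usage, peek_usage, response_text, and thinking_text are names from this log; the struct layouts, the thinking_tokens field, and the exact merge condition are assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Reduced, partly hypothetical usage record.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct ApiUsage {
    pub thinking_tokens: u64,
    pub response_text: Option<String>,
    pub thinking_text: Option<String>,
}

/// Hypothetical slice of MitmStore keyed by cascade ID.
#[derive(Default)]
pub struct MitmStore {
    usage: HashMap<String, ApiUsage>,
}

impl MitmStore {
    /// Record usage for a cascade. If Call 1 already stored thinking tokens
    /// without text, and the incoming record has no thinking tokens (Call 2),
    /// attach its response_text as thinking_text instead of overwriting.
    pub fn record_usage(&mut self, cascade_id: &str, incoming: ApiUsage) {
        let call_one_waiting = matches!(
            self.usage.get(cascade_id),
            Some(existing) if existing.thinking_tokens > 0 && existing.thinking_text.is_none()
        );
        if call_one_waiting && incoming.thinking_tokens == 0 {
            if let Some(existing) = self.usage.get_mut(cascade_id) {
                existing.thinking_text = incoming.response_text;
            }
        } else {
            self.usage.insert(cascade_id.to_string(), incoming);
        }
    }

    /// Non-consuming read.
    pub fn peek_usage(&self, cascade_id: &str) -> Option<&ApiUsage> {
        self.usage.get(cascade_id)
    }
}
```

This sketch assumes Call 1 is recorded before Call 2; the retry logic added in 5c1f4c77d9 covers a reader that arrives before the merge has happened.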
Nikketryhard
905d55beb5 feat: capture thinking text from MITM-intercepted API responses
The LS strips thinking/reasoning text from plannerResponse steps —
only the thinkingSignature (opaque verification blob) is preserved.
The actual thinking text flows through the MITM proxy in the raw
Google SSE response (parts with thought: true) and Anthropic SSE
(thinking_delta content blocks).

Changes:
- StreamingAccumulator now accumulates thinking text from SSE events
- ApiUsage gains thinking_text: Option<String>
- usage_from_poll returns (Usage, Option<thinking_text>)
- Thinking text priority: MITM-captured > LS-extracted (fallback)
- Reasoning output item now populated from real API data
- Removed debug dump code
2026-02-14 19:30:09 -06:00
Nikketryhard
19dc920872 fix: return thinking as reasoning output item per OpenAI spec
Thinking content was previously returned as non-standard top-level
fields (thinking, thinking_duration). Now follows the official OpenAI
Responses API format:

- Reasoning appears as a 'type: reasoning' item in the output array
  with summary[].text containing the thinking content
- Message item follows after the reasoning item
- thinking_signature kept as proxy extension (internal multi-turn data)
- Removed ResponseOutput/OutputContent structs in favor of
  serde_json::Value for polymorphic output items
2026-02-14 19:16:12 -06:00
Nikketryhard
7c4e781900 feat: aggressive request stripping — keep only identity + conversation
Strip everything from intercepted LLM requests except:
- <identity> section in system instruction
- Actual conversation turns (user messages + model responses)

Removed: tool_calling, web_app_dev, knowledge_discovery,
persistent_context, skills, ephemeral_message, communication_style,
user_information, user_rules, MEMORY, workflows, mcp_servers,
conversation_summaries, ADDITIONAL_METADATA, Step Id prefixes.

Expected reduction: ~92% (63KB → ~5KB for simple requests).
2026-02-14 19:05:49 -06:00
Nikketryhard
1a7c81e5f9 feat: strip ALL tools from intercepted requests by default
Tools are only needed by the Antigravity webview for tool-call UI.
Our proxy doesn't need them — the model generates text responses fine
without tool definitions. Stripping all 20 tools saves ~15KB per request.
2026-02-14 18:53:38 -06:00
Nikketryhard
89a8422291 fix: suppress profile picture warn, ensure release binary rebuilds
2026-02-14 18:50:37 -06:00
Nikketryhard
e678ec655b fix: standalone MITM — remove HTTPS_PROXY with iptables, fix is_agent detection
- Only set HTTPS_PROXY/HTTP_PROXY when iptables UID isolation is NOT
  available. With iptables, double-proxying caused profile picture
  fetches to fail with 'lookup http' DNS errors.
- Fix is_agent detection: handle JSON with spaces after colons
  ("requestType": "agent" vs "requestType":"agent")
- Suppress wrapper-not-installed warning in standalone mode
- Show 'iptables (standalone)' in banner instead of 'not installed'
2026-02-14 18:47:38 -06:00
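Note on e678ec655b: a whitespace-tolerant check for the requestType field might look like the sketch below. The function name is_agent_request and the plain string scan are assumptions; the project may instead parse the JSON or use a regex.

```rust
/// Return true when the body contains "requestType" followed by a colon and
/// the value "agent", with optional whitespace on either side of the colon.
pub fn is_agent_request(body: &str) -> bool {
    const KEY: &str = "\"requestType\"";
    let mut rest = body;
    while let Some(pos) = rest.find(KEY) {
        // KEY is ASCII, so this slice always starts on a char boundary.
        let after_key = rest[pos + KEY.len()..].trim_start();
        if let Some(after_colon) = after_key.strip_prefix(':') {
            if after_colon.trim_start().starts_with("\"agent\"") {
                return true;
            }
        }
        rest = &rest[pos + KEY.len()..];
    }
    false
}
```

Both spellings quoted in the commit message, with and without the space after the colon, are accepted by this version.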
Nikketryhard
f0c2574c88 feat: MITM request modification — strip bloat from LLM API requests
Intercepts streamGenerateContent requests and trims:
- System instruction: strips web_application_development, knowledge_discovery,
  persistent_context, skills sections (~18KB saved)
- Content messages: strips empty user_rules, workflows boilerplate,
  conversation summaries (~4.5KB saved)
- Tools: keeps 12 essential coding tools, strips 8 non-essential
  (browser_subagent, generate_image, search_web, etc. ~6KB saved)

Total: ~55% reduction in request size while keeping identity, user info,
and all coding-relevant tools intact. Only modifies 'agent' type requests;
checkpoint requests pass through unmodified.

Also:
- Standalone mode is now the default (use --no-standalone to attach to
  existing LS)
- Enable request modification by default
- Add mold linker, sccache, nextest config (8 thread cap)
- Add .cargo/config.toml and .config/nextest.toml
2026-02-14 18:35:07 -06:00
Nikketryhard
061b08fc8f fix: cascade correlation — fallback to _latest MITM usage
When the MITM can't extract a cascade ID from the intercepted request
(Content-Length: 0 / chunked encoding), usage is stored under '_latest'.
Now usage_from_poll and completions try the exact cascade_id first,
then fall back to '_latest' so MITM-captured tokens are actually used.
2026-02-14 18:10:04 -06:00
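Note on 061b08fc8f: the exact-then-fallback lookup can be sketched in a few lines. The '_latest' key comes from the commit message; the lookup_usage function and the use of a plain HashMap are assumptions.

```rust
use std::collections::HashMap;

/// Key under which usage is stored when no cascade ID could be extracted.
pub const LATEST_KEY: &str = "_latest";

/// Try the exact cascade ID first, then fall back to the '_latest' entry.
pub fn lookup_usage<'a, V>(store: &'a HashMap<String, V>, cascade_id: &str) -> Option<&'a V> {
    store.get(cascade_id).or_else(|| store.get(LATEST_KEY))
}
```

The fallback is only as accurate as the assumption that the most recent uncorrelated usage belongs to the cascade being polled, which may not hold with concurrent requests.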
Nikketryhard
ca36ab0631 chore: clean up MITM logs and add Google SSE tests
- Demote non-LLM request logs to debug (only streamGenerateContent at info)
- Demote non-streaming response headers to debug
- Add 5 Google SSE parser tests (single event, multi-event accumulation,
  chunked framing, completion detection, no-thinking-tokens)
- Fix unused variable warning in proxy.rs
2026-02-14 17:55:17 -06:00
Nikketryhard
d4de436856 feat: MITM interception for standalone LS with UID isolation
- Spawn standalone LS as dedicated 'antigravity-ls' user via sudo
- UID-scoped iptables redirect (port 443 → MITM proxy) via mitm-redirect.sh
- Combined CA bundle (system CAs + MITM CA) for Go TLS trust
- Transparent TLS interception with chunked response detection
- Google SSE parser for streamGenerateContent usage extraction
- Timeouts on all MITM operations (TLS handshake, upstream, idle)
- Forward response data immediately (no buffering)
- Per-model token usage capture (input, output, thinking)
- Update docs and known issues to reflect resolved TLS blocker
2026-02-14 17:50:12 -06:00
Nikketryhard
6842bfeaa5 chore: clean up code — remove dead code, stale allows, eprintln→tracing, remove volatile data from docs
2026-02-14 16:11:34 -06:00
Nikketryhard
2e2d90bdb9 chore: remove BYOK issue — out of scope
2026-02-14 16:07:00 -06:00
Nikketryhard
f3fd203a53 chore: rewrite KNOWN_ISSUES with investigation verdicts and confidence levels
2026-02-14 16:02:01 -06:00
Nikketryhard
05ae6b8652 chore: clean up KNOWN_ISSUES — remove fixed items, renumber
2026-02-14 15:58:52 -06:00
Nikketryhard
2f53485821 fix(#4,#5,#7): remove dead cost field, fix stale fallback paths, mark quota as implemented
2026-02-14 15:55:11 -06:00
Nikketryhard
2ccc4b46f8 fix(#4): remove dead total_cost_usd field; map model enums to readable names
2026-02-14 15:54:03 -06:00
Nikketryhard
dd7b12a97d fix(#2): cap domain cert cache at 64 entries
2026-02-14 15:49:39 -06:00
Nikketryhard
b89d26cc68 fix(#10): use robust regex for extension detectAndUseProxy patch
2026-02-14 15:49:05 -06:00
Nikketryhard
9f5d6e15cc docs: add 6 new known issues from binary analysis session
2026-02-14 15:46:10 -06:00
Nikketryhard
95cb65f1ae docs: complete tool catalog, trajectory types, and browser automation details
2026-02-14 04:22:13 -06:00
Nikketryhard
7f5a0f51d3 docs: enrich module docs with binary analysis cross-references
2026-02-14 04:20:57 -06:00
Nikketryhard
932214fd95 docs: comprehensive LS binary reverse engineering with model enum mapping
2026-02-14 04:19:48 -06:00
Nikketryhard
edad784bcd refactor: extract GrpcUsage::into_api_usage to DRY up h2_handler
2026-02-14 04:13:46 -06:00
Nikketryhard
686f5820d6 refactor: extract ResponseData struct to eliminate 18-arg build_response_object
2026-02-14 04:09:41 -06:00
Nikketryhard
901cd3d2e3 fix: resolve clippy warnings (matches!, map_or, redundant guard, unnecessary allocations)
2026-02-14 04:06:18 -06:00
Nikketryhard
725bdb4e9a chore: add snapshot CLI binary and lib re-export
2026-02-14 04:04:47 -06:00
Nikketryhard
ee6fce12a7 fix: suppress unused direction field warning in snapshot
2026-02-14 04:04:35 -06:00
Nikketryhard
de9be0d564 docs: update README with MITM setup and extension patch instructions
2026-02-14 04:03:25 -06:00
Nikketryhard
9cf7bb75d2 docs: add MITM interception research and redirect scripts
2026-02-14 04:03:22 -06:00