
25 Commits

Author SHA1 Message Date
Nikketryhard
38b4130c55 feat: Implement request generation counter and state management to prevent stale data and unblock Language Server for follow-up requests. 2026-02-16 16:21:52 -06:00
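A request generation counter of the kind this commit describes can be sketched as follows. This is a minimal illustration, assuming that each new client request bumps a counter and that late writes tagged with an older generation are dropped. `RequestState`, `begin_request`, `store_response` and `take_response` are hypothetical names, not the project's actual MitmStore API.

```rust
use std::sync::Mutex;

// Hypothetical state shared by request handlers (not the real MitmStore).
#[derive(Default)]
struct Inner {
    generation: u64,
    response: Option<String>,
}

#[derive(Default)]
struct RequestState {
    inner: Mutex<Inner>,
}

impl RequestState {
    /// Start a new client request: bump the generation and clear old data.
    fn begin_request(&self) -> u64 {
        let mut inner = self.inner.lock().unwrap();
        inner.generation += 1;
        inner.response = None;
        inner.generation
    }

    /// Accept data only if it was produced for the current generation.
    /// The check happens under the same lock that begin_request takes.
    fn store_response(&self, generation: u64, text: &str) -> bool {
        let mut inner = self.inner.lock().unwrap();
        if inner.generation != generation {
            return false; // late write from an earlier request: drop it
        }
        inner.response = Some(text.to_string());
        true
    }

    fn take_response(&self) -> Option<String> {
        self.inner.lock().unwrap().response.take()
    }
}

fn main() {
    let state = RequestState::default();
    let first = state.begin_request();
    let second = state.begin_request();
    // A slow reply tagged with the first generation is discarded.
    assert!(!state.store_response(first, "stale reply"));
    assert!(state.store_response(second, "fresh reply"));
    println!("{:?}", state.take_response()); // Some("fresh reply")
}
```

Keeping the counter and the data behind one lock means a stale writer can never slip in between the generation check and the write.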
Nikketryhard
3fdd0368a0 fix: block ALL LS follow-up requests across connections
Move the in-flight blocking check to the top of the LLM request flow,
BEFORE request modification. This catches follow-ups on ALL connections
(the LS opens multiple parallel TLS connections). Only the very first
modified request reaches Google — all others get fake STOP responses.

Previously, each new connection independently allowed one request
through before blocking, letting 4-5 requests leak per turn.
2026-02-16 00:57:33 -06:00
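The cross-connection blocking described above can be illustrated with a single atomic flag shared by every intercepted connection and checked before any request modification. This is a sketch under assumptions: `gate` and `Decision` are hypothetical names, and the exact shape of the fake STOP SSE event is assumed, not taken from the project.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// What the interceptor does with an intercepted LLM request (hypothetical).
#[derive(Debug, PartialEq)]
enum Decision {
    Forward,          // modify and send upstream to Google
    FakeStop(String), // answer locally with a canned SSE body
}

// Assumed shape of a minimal "empty reply, finishReason STOP" SSE event.
const FAKE_STOP_SSE: &str = "data: {\"candidates\":[{\"content\":{\"role\":\"model\",\"parts\":[{\"text\":\"\"}]},\"finishReason\":\"STOP\"}]}\n\n";

/// Runs at the very top of the request flow. compare_exchange lets exactly
/// one caller flip the flag from false to true; every other caller, on any
/// connection, gets the fake STOP response.
fn gate(in_flight: &AtomicBool) -> Decision {
    match in_flight.compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire) {
        Ok(_) => Decision::Forward,
        Err(_) => Decision::FakeStop(FAKE_STOP_SSE.to_string()),
    }
}

fn main() {
    let in_flight = Arc::new(AtomicBool::new(false));
    // Simulate five parallel TLS connections racing within the same turn.
    let handles: Vec<_> = (0..5)
        .map(|_| {
            let flag = Arc::clone(&in_flight);
            thread::spawn(move || gate(&flag))
        })
        .collect();
    let forwarded = handles
        .into_iter()
        .map(|h| h.join().unwrap())
        .filter(|d| *d == Decision::Forward)
        .count();
    println!("forwarded upstream: {forwarded}"); // 1
    in_flight.store(false, Ordering::Release); // reset when the turn ends
}
```

Because the flag is global rather than per connection, a newly opened connection cannot "spend" its own first request, which was the leak this commit fixes.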
Nikketryhard
a8f3c8915f fix: block ALL LS follow-up requests, deduplicate function calls
- Add request_in_flight flag to MitmStore, set immediately when first
  LLM request is forwarded with custom tools active
- Block ALL subsequent LS requests (agentic loop + internal flash-lite)
  with fake SSE responses instead of waiting for response_complete
- Fix function call deduplication: drain() accumulator after storing
  to prevent 3x duplicate tool calls across SSE chunks
- Clear all stale state (response, thinking, function calls, errors)
  at the start of each streaming request
- Handle response_complete with no content (thoughtSignature-only)
  gracefully with timeout instead of infinite hang
2026-02-16 00:51:56 -06:00
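The deduplication fix relies on `Vec::drain(..)` moving items out of the accumulator rather than copying them. The sketch below contrasts the two behaviours. `Accumulator`, `Store` and `FunctionCall` are simplified, hypothetical stand-ins for the project's streaming accumulator and MitmStore.

```rust
#[derive(Debug, Clone, PartialEq)]
struct FunctionCall {
    name: String,
    args_json: String,
}

#[derive(Default)]
struct Accumulator {
    pending_calls: Vec<FunctionCall>, // calls parsed from SSE chunks so far
}

#[derive(Default)]
struct Store {
    function_calls: Vec<FunctionCall>,
}

impl Store {
    /// Buggy variant: copies the accumulator, so a call seen in an earlier
    /// chunk is stored again after every later chunk.
    fn store_cloned(&mut self, acc: &Accumulator) {
        self.function_calls.extend(acc.pending_calls.iter().cloned());
    }

    /// Fixed variant: drain() empties the accumulator, so each call is stored once.
    fn store_drained(&mut self, acc: &mut Accumulator) {
        self.function_calls.extend(acc.pending_calls.drain(..));
    }
}

fn main() {
    let call = FunctionCall { name: "read_file".into(), args_json: r#"{"path":"a.rs"}"#.into() };
    let (mut acc_a, mut acc_b) = (Accumulator::default(), Accumulator::default());
    acc_a.pending_calls.push(call.clone());
    acc_b.pending_calls.push(call);
    let (mut buggy, mut fixed) = (Store::default(), Store::default());
    // The call arrives in chunk 1; chunks 2 and 3 carry no new calls.
    for _chunk in 0..3 {
        buggy.store_cloned(&acc_a);
        fixed.store_drained(&mut acc_b);
    }
    println!("buggy: {}, fixed: {}", buggy.function_calls.len(), fixed.function_calls.len()); // buggy: 3, fixed: 1
}
```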
Nikketryhard
4e4d8e9474 chore: code cleanup and documentation overhaul
- Remove debug header dump from MITM proxy (was temp debugging code)
- Suppress dead_code warnings for intentional OpenAI compat fields
- Rewrite README with styled mermaid architecture diagrams, full
  feature listing, usage examples, and CLI reference
- Update endpoint-gap-analysis: images implemented, audio only stretch
- Update mitm-interception-status: add request modification and error
  capture components
- Update standalone-ls-todo: add new endpoints to test results
- Zero compiler warnings
2026-02-15 18:27:53 -06:00
Nikketryhard
2882f7cce2 feat: propagate Google upstream errors to client
When Google returns an error (400, 429, 500, etc.), the MITM proxy now
captures it and the API handlers return it immediately instead of
hanging until timeout.

- UpstreamError struct stored in MitmStore
- MITM proxy parses Google error JSON (message + status)
- Polling handler checks for upstream errors each cycle
- Streaming handlers emit response.failed / SSE error events
- Error status mapped to OpenAI-style types (invalid_request_error,
  rate_limit_error, authentication_error, server_error, etc.)
- All handlers clear stale errors at request start
2026-02-15 18:19:38 -06:00
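The status-to-type mapping mentioned in this commit might look roughly like the function below. The table is an assumption: the commit names only the OpenAI-style target types, so which Google RPC statuses land in which bucket, and the fallback on the HTTP code, are illustrative choices. `UpstreamError` here is a simplified stand-in with assumed fields.

```rust
/// Simplified stand-in for the UpstreamError kept in MitmStore (fields assumed).
#[derive(Debug, Clone)]
struct UpstreamError {
    http_code: u16,
    status: String, // e.g. "RESOURCE_EXHAUSTED" from Google's error JSON
    message: String,
}

/// Map a Google RPC status, falling back to the HTTP code, to an
/// OpenAI-style error `type`. The exact table is illustrative.
fn openai_error_type(err: &UpstreamError) -> &'static str {
    match err.status.as_str() {
        "INVALID_ARGUMENT" | "FAILED_PRECONDITION" | "NOT_FOUND" | "OUT_OF_RANGE" => {
            "invalid_request_error"
        }
        "RESOURCE_EXHAUSTED" => "rate_limit_error",
        "UNAUTHENTICATED" | "PERMISSION_DENIED" => "authentication_error",
        _ => match err.http_code {
            400 | 404 => "invalid_request_error",
            401 | 403 => "authentication_error",
            429 => "rate_limit_error",
            _ => "server_error",
        },
    }
}

fn main() {
    let err = UpstreamError {
        http_code: 429,
        status: "RESOURCE_EXHAUSTED".into(),
        message: "Quota exceeded".into(),
    };
    // Demo only: the message is not JSON-escaped here.
    println!(
        r#"{{"error":{{"message":"{}","type":"{}"}}}}"#,
        err.message,
        openai_error_type(&err)
    );
}
```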
Nikketryhard
371c57bab0 fix: parse flat content arrays in Responses API input
When input is [{type: 'input_image', ...}, {type: 'input_text', text: '...'}],
the code looked for items with role: 'user', which don't exist in flat
content arrays. It now extracts text directly from input_text items first,
falling back to role-based messages only if no flat text is found.

Also adds debug header dump for MITM request forwarding.
2026-02-15 18:10:03 -06:00
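The "flat text first, role-based fallback" lookup can be sketched over a simplified model of the Responses API `input` array. This assumes the input has already been parsed into a hypothetical `InputItem` enum; the real code works on JSON. Joining multiple `input_text` items with a newline is also an assumption.

```rust
/// Simplified, hypothetical model of Responses API `input` items.
enum InputItem {
    InputText { text: String },
    InputImage { image_url: String },
    Message { role: String, content: String },
}

/// Prefer text from flat `input_text` items; fall back to the last
/// role-based user message only if no flat text exists.
fn extract_user_text(items: &[InputItem]) -> Option<String> {
    let flat: Vec<&str> = items
        .iter()
        .filter_map(|item| match item {
            InputItem::InputText { text } => Some(text.as_str()),
            _ => None,
        })
        .collect();
    if !flat.is_empty() {
        return Some(flat.join("\n"));
    }
    items.iter().rev().find_map(|item| match item {
        InputItem::Message { role, content } if role == "user" => Some(content.clone()),
        _ => None,
    })
}

fn main() {
    // The shape from the commit: an image item followed by a text item, no roles.
    let flat = vec![
        InputItem::InputImage { image_url: "data:image/png;base64,AAAA".into() },
        InputItem::InputText { text: "What is in this image?".into() },
    ];
    println!("{:?}", extract_user_text(&flat)); // Some("What is in this image?")
}
```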
Nikketryhard
1a6bfa5b53 fix: update Content-Length header when MITM modifies request body
The MITM modifier kept original HTTP headers (including Content-Length)
when replacing the body. When injecting a ~200KB image into a ~66KB
request, Google read only the original Content-Length bytes and then hung
waiting for a new request that never came.

Now we regex-replace the Content-Length header value to match the actual
rechunked body size after modification.
2026-02-15 18:02:13 -06:00
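The header fix amounts to rewriting one header value after the body changes size. The commit uses a regex; the sketch below swaps in plain string handling so it needs only the standard library. It assumes a raw HTTP/1.1 header block with CRLF line endings. `fix_content_length` is a hypothetical name, and the request line and host are placeholders.

```rust
/// Rewrite the Content-Length value in a raw HTTP/1.1 header block so it
/// matches the modified body. Header names are matched case-insensitively;
/// the request line and all other headers pass through unchanged.
fn fix_content_length(head: &str, body_len: usize) -> String {
    head.split("\r\n")
        .map(|line| match line.split_once(':') {
            Some((name, _)) if name.trim().eq_ignore_ascii_case("content-length") => {
                format!("{}: {}", name, body_len)
            }
            _ => line.to_string(),
        })
        .collect::<Vec<_>>()
        .join("\r\n")
}

fn main() {
    // Placeholder request line and host, not the real upstream endpoint.
    let head = "POST /hypothetical/path HTTP/1.1\r\nHost: example.com\r\nContent-Length: 66000\r\n\r\n";
    let body = vec![b'x'; 266_000]; // body grew after an image was injected
    print!("{}", fix_content_length(head, body.len()));
}
```

Without this step, the server stops reading at the stale length and treats the remaining bytes as the start of another request, which matches the hang described above.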
Nikketryhard
89bea030cc feat: inject images via MITM layer instead of relying on LS
The LS silently ignores the 'images' field from our
SendUserCascadeMessageRequest proto — it never forwards image data
to Google's API.

New approach: store the image in MitmStore, then the MITM request
modifier injects it as 'inlineData' directly into the last user
message's parts array in the Google API JSON request.

Flow:
  Client → Proxy (decode base64) → MitmStore.set_pending_image()
  LS → Google API → MITM intercepts → inject inlineData part
  → Google receives image + text together

This works for all three API endpoints (responses, completions,
gemini).
2026-02-15 17:57:32 -06:00
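The injection step above can be modelled with simplified structs for Gemini `contents`. This is a sketch, assuming the request has already been parsed into the hypothetical `Content`/`Part` types below; the real modifier edits the JSON body. Appending the image after the existing parts of the last user message is also an assumption.

```rust
/// Simplified model of a Gemini content part.
#[derive(Debug, PartialEq)]
enum Part {
    Text(String),
    InlineData { mime_type: String, data: String }, // data is base64
}

struct Content {
    role: String,
    parts: Vec<Part>,
}

/// Append the pending image as an inlineData part on the LAST user message.
/// Returns false if there is no user message to attach it to.
fn inject_image(contents: &mut [Content], mime_type: &str, base64_data: &str) -> bool {
    match contents.iter_mut().rev().find(|c| c.role == "user") {
        Some(msg) => {
            msg.parts.push(Part::InlineData {
                mime_type: mime_type.into(),
                data: base64_data.into(),
            });
            true
        }
        None => false,
    }
}

fn main() {
    let mut contents = vec![
        Content { role: "user".into(), parts: vec![Part::Text("hi".into())] },
        Content { role: "model".into(), parts: vec![Part::Text("hello".into())] },
        Content { role: "user".into(), parts: vec![Part::Text("describe this".into())] },
    ];
    inject_image(&mut contents, "image/png", "iVBORw0KGgo=");
    println!("{:?}", contents[2].parts); // text part followed by the InlineData part
}
```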
Nikketryhard
ca9f808ee3 feat: completions API improvements, gemini endpoint, response types 2026-02-15 17:08:53 -06:00
Nikketryhard
b1bd57ab5e feat: forward generation params via MITM + add usageMetadata to Gemini
- Add GenerationParams struct to MitmStore for temperature, top_p,
  top_k, max_output_tokens, stop_sequences, frequency/presence_penalty
- MITM modify_request injects params into request.generationConfig
- All 3 endpoints (Completions, Responses, Gemini) store client params
- Add usageMetadata to Gemini sync responses (promptTokenCount,
  candidatesTokenCount, totalTokenCount, thoughtsTokenCount)
- Add generation param fields to GeminiRequest (temperature, topP, etc.)
- Completions stream_options.include_usage emits final usage chunk
- Completions reasoning_tokens in completion_tokens_details
- Update endpoint gap analysis doc (all high-priority gaps resolved)
2026-02-15 14:23:05 -06:00
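Forwarding generation params means overlaying only the fields the client actually set onto the request's `generationConfig`, using Gemini's camelCase keys. The sketch below assumes `generationConfig` is represented as a map from key to a JSON literal string, which avoids a serde dependency. The `GenerationParams` fields mirror the commit's list, but the struct itself is hypothetical. Quoting stop sequences with `{:?}` only approximates JSON escaping.

```rust
use std::collections::BTreeMap;

/// Client-supplied sampling params (fields follow the commit; struct is hypothetical).
#[derive(Default)]
struct GenerationParams {
    temperature: Option<f64>,
    top_p: Option<f64>,
    top_k: Option<u32>,
    max_output_tokens: Option<u32>,
    stop_sequences: Option<Vec<String>>,
    frequency_penalty: Option<f64>,
    presence_penalty: Option<f64>,
}

/// Overlay only the params the client set; keys already present but not
/// overridden are left alone. Values are stored as JSON literals.
fn apply_params(p: &GenerationParams, config: &mut BTreeMap<String, String>) {
    let mut set = |key: &str, value: Option<String>| {
        if let Some(v) = value {
            config.insert(key.to_string(), v);
        }
    };
    set("temperature", p.temperature.map(|v| v.to_string()));
    set("topP", p.top_p.map(|v| v.to_string()));
    set("topK", p.top_k.map(|v| v.to_string()));
    set("maxOutputTokens", p.max_output_tokens.map(|v| v.to_string()));
    set(
        "stopSequences",
        p.stop_sequences.as_ref().map(|s| {
            // {:?} quoting approximates JSON string escaping for simple inputs.
            let quoted: Vec<String> = s.iter().map(|x| format!("{:?}", x)).collect();
            format!("[{}]", quoted.join(","))
        }),
    );
    set("frequencyPenalty", p.frequency_penalty.map(|v| v.to_string()));
    set("presencePenalty", p.presence_penalty.map(|v| v.to_string()));
}

fn main() {
    let mut config = BTreeMap::new();
    config.insert("temperature".to_string(), "1".to_string()); // value already in the request
    let params = GenerationParams {
        temperature: Some(0.2),
        max_output_tokens: Some(1024),
        ..Default::default()
    };
    apply_params(&params, &mut config);
    println!("{:?}", config); // {"maxOutputTokens": "1024", "temperature": "0.2"}
}
```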
Nikketryhard
b3af73cebd feat: sync all endpoints with MITM LS bypass + real-time thinking streaming
- Responses API (streaming): MITM bypass path polls MitmStore directly
  when custom tools are active, skipping LS step polling entirely.
  Streams thinking text deltas in real-time as they arrive from the MITM.
  Handles function calls, text response, and thinking/reasoning events.

- Responses API (sync): Same MITM bypass for non-streaming responses.
  Polls MitmStore for function calls or completed text before falling
  back to LS path.

- Gemini endpoint: MITM bypass polls MitmStore directly for tool call
  responses, eliminating LS overhead.

- MitmStore: Added captured_thinking_text field with set/peek/take methods
  for real-time thinking text capture from MITM SSE.

- MITM proxy: Now captures both thinking_text and response_text from
  StreamingAccumulator into MitmStore when bypass mode is active.
2026-02-15 01:03:39 -06:00
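One way to get real-time thinking deltas from set/peek/take semantics is to have the MITM side overwrite the slot with the full text accumulated so far, while the handler peeks and emits only the unseen suffix. This is a sketch under assumptions: `ThinkingSlot` and `next_delta` are hypothetical names, and the slot is assumed to only grow within a request.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for the captured_thinking_text slot in MitmStore.
#[derive(Default)]
struct ThinkingSlot {
    text: Mutex<String>,
}

impl ThinkingSlot {
    /// MITM side: store the accumulator's full thinking text so far.
    fn set(&self, full: &str) {
        *self.text.lock().unwrap() = full.to_string();
    }
    /// Handler side: read without clearing.
    fn peek(&self) -> String {
        self.text.lock().unwrap().clone()
    }
    /// Handler side, at the end of the request: read and clear.
    fn take(&self) -> String {
        std::mem::take(&mut *self.text.lock().unwrap())
    }
}

/// Return the part of `full` not yet streamed and advance `sent` (a byte
/// offset). `get` returns None instead of panicking on a bad offset.
fn next_delta(full: &str, sent: &mut usize) -> Option<String> {
    if full.len() <= *sent {
        return None;
    }
    let delta = full.get(*sent..)?.to_string();
    *sent = full.len();
    Some(delta)
}

fn main() {
    let slot = ThinkingSlot::default();
    let mut sent = 0;
    for partial in ["Let me", "Let me check", "Let me check the file."] {
        slot.set(partial); // MITM updates as SSE chunks arrive
        if let Some(d) = next_delta(&slot.peek(), &mut sent) {
            print!("[{d}]"); // a handler would emit a thinking delta event here
        }
    }
    println!();
    let _final_text = slot.take(); // clears the slot for the next request
}
```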
Nikketryhard
50b53097bc fix: bypass LS entirely when custom tools are active
When custom tools are set, don't forward ANY response from Google
to the LS. Instead, capture text and function calls directly into
MitmStore. The completions handler reads from MitmStore.

This eliminates the LS multi-turn loop (5 requests, 30+ seconds)
that occurred because the LS kept processing responses internally.
Tool calls now return in ~1.3s instead of timing out.
2026-02-15 00:54:40 -06:00
Nikketryhard
5d4125fa0d fix: suppress dummy text from tool call responses
Check for MITM-captured function calls BEFORE emitting text in the
streaming handler. This prevents the dummy 'Tool call completed'
placeholder (sent to the LS) from leaking to OpenCode, which was
confusing it into infinite loops.

Also removes duplicate function call storage at end of response loop
since they're now stored immediately when detected.
2026-02-15 00:37:39 -06:00
Nikketryhard
502318acec fix: store function calls in MitmStore immediately on detection
Previously, captured function calls were only stored in MitmStore
after the response loop ended. The completions handler polls
take_any_function_calls() during streaming, so it could poll MitmStore
before the calls were stored and find it empty (a race condition).

Now function calls are stored immediately when parse_streaming_chunk
detects them, in both the initial body and body chunk paths.
2026-02-15 00:28:40 -06:00
Nikketryhard
7c44729ace fix: forge dummy STOP response to LS on functionCall capture
When the MITM detects a functionCall in Google's response AND custom
tools are active, send a forged clean text response to the LS instead
of the real one. This prevents the LS from seeing function calls for
tools it doesn't manage, eliminating the retry loop entirely.

The real function call data is captured in MitmStore and returned to
the client (OpenCode) through the completions handler.

Also removes the complex chunked-encoding response rewriting approach
in favor of this simpler forge-and-break strategy.
2026-02-15 00:15:00 -06:00
Nikketryhard
19ff784cae fix: always strip old functionCall/functionResponse from LS history
The function call stripping was only happening when no custom tools
were present. But even with custom tools injected, the LS history
contains functionCall/functionResponse parts for LS-internal tools
that we stripped, causing MALFORMED_FUNCTION_CALL. Now always strip
regardless of custom tools presence.
2026-02-14 23:59:13 -06:00
Nikketryhard
786987116b feat: full tool call support (OpenAI + Gemini endpoints)
- store.rs: Add tool context storage (active tools, tool config, pending
  tool results, call_id mapping, last function calls for history rewrite)
- types.rs: Add tools/tool_choice fields to ResponsesRequest, add
  build_function_call_output helper for OpenAI function_call output items
- modify.rs: Replace hardcoded get_weather with dynamic ToolContext
  injection. Add openai_tools_to_gemini and openai_tool_choice_to_gemini
  converters. Add conversation history rewriting for tool result turns
  (replaces fake 'Tool call completed' model turn with real functionCall,
  injects functionResponse before last user turn)
- proxy.rs: Build ToolContext from MitmStore before calling modify_request.
  Save last_function_calls for history rewriting on subsequent turns
- responses.rs: Store client tools in MitmStore before LS call. Detect
  function_call_output in input array for tool result submission. Return
  captured functionCalls as OpenAI function_call output items with
  generated call_ids and stringified arguments
- gemini.rs: New Gemini-native endpoint (POST /v1/gemini) with zero
  format translation. Accepts functionDeclarations directly, returns
  functionCall in Gemini format directly
- mod.rs: Wire /v1/gemini route, bump version to 3.3.0
2026-02-14 22:56:44 -06:00
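A converter in the spirit of `openai_tool_choice_to_gemini` could map OpenAI's `tool_choice` onto Gemini's `toolConfig.functionCallingConfig`, which takes a `mode` (`AUTO`, `ANY` or `NONE`) and an optional `allowedFunctionNames` list. This sketch assumes `tool_choice` has already been parsed into the hypothetical `ToolChoice` enum below. It is not the project's implementation.

```rust
/// Simplified, hypothetical model of OpenAI `tool_choice`.
enum ToolChoice {
    Auto,          // "auto"
    NoCalls,       // "none"
    Required,      // "required"
    Named(String), // {"type":"function","function":{"name":...}}
}

/// Gemini `toolConfig.functionCallingConfig`: a mode plus an optional allowlist.
#[derive(Debug, PartialEq)]
struct FunctionCallingConfig {
    mode: &'static str,
    allowed_function_names: Vec<String>,
}

fn tool_choice_to_gemini(choice: &ToolChoice) -> FunctionCallingConfig {
    let cfg = |mode, names: Vec<String>| FunctionCallingConfig { mode, allowed_function_names: names };
    match choice {
        ToolChoice::Auto => cfg("AUTO", vec![]),
        ToolChoice::NoCalls => cfg("NONE", vec![]),
        ToolChoice::Required => cfg("ANY", vec![]),
        // Forcing one specific function: ANY, restricted to that name.
        ToolChoice::Named(name) => cfg("ANY", vec![name.clone()]),
    }
}

fn main() {
    let forced = tool_choice_to_gemini(&ToolChoice::Named("get_weather".into()));
    println!("{:?}", forced);
}
```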
Nikketryhard
8455aa674f feat: capture function calls from Google + block follow-up quota waste
When MITM strips LS tools and injects custom tools:
- Google returns functionCall → captured in MitmStore
- Follow-up LS requests are blocked with fake SSE response
- Proxy consumes captured calls and clears the flag
- Result: 1 real Google API call instead of 5+ per tool call

Flow: Client → Proxy → LS → MITM(inject tool) → Google
      Google returns functionCall → MITM captures it
      LS tries follow-up → MITM blocks (fake response)
      Proxy reads captured functionCall → returns to client
2026-02-14 22:37:28 -06:00
Nikketryhard
e678ec655b fix: standalone MITM — remove HTTPS_PROXY with iptables, fix is_agent detection
- Only set HTTPS_PROXY/HTTP_PROXY when iptables UID isolation is NOT
  available. With iptables, double-proxying caused profile picture
  fetches to fail with 'lookup http' DNS errors.
- Fix is_agent detection: handle JSON with spaces after colons
  ("requestType": "agent" vs "requestType":"agent")
- Suppress wrapper-not-installed warning in standalone mode
- Show 'iptables (standalone)' in banner instead of 'not installed'
2026-02-14 18:47:38 -06:00
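The whitespace-tolerant `is_agent` check can be sketched without a JSON parser by scanning for the key and skipping optional whitespace around the colon. A real JSON parser would be more robust. This stdlib-only version is illustrative, and the `"checkpoint"` value in the usage example is an assumption.

```rust
/// True if the JSON text contains `"requestType"` followed by optional
/// whitespace, a colon, optional whitespace, and the string `"agent"`.
fn is_agent(body: &str) -> bool {
    const KEY: &str = "\"requestType\"";
    let mut rest = body;
    while let Some(pos) = rest.find(KEY) {
        let after_key = &rest[pos + KEY.len()..];
        if let Some(value) = after_key.trim_start().strip_prefix(':') {
            // The closing quote keeps values like "agentic" from matching.
            if value.trim_start().starts_with("\"agent\"") {
                return true;
            }
        }
        rest = after_key; // keep scanning in case the key appears again
    }
    false
}

fn main() {
    for body in [
        r#"{"requestType":"agent"}"#,
        r#"{"requestType": "agent"}"#,
        r#"{"requestType": "checkpoint"}"#,
    ] {
        println!("{body} -> {}", is_agent(body));
    }
}
```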
Nikketryhard
f0c2574c88 feat: MITM request modification — strip bloat from LLM API requests
Intercepts streamGenerateContent requests and trims:
- System instruction: strips web_application_development, knowledge_discovery,
  persistent_context, skills sections (~18KB saved)
- Content messages: strips empty user_rules, workflows boilerplate,
  conversation summaries (~4.5KB saved)
- Tools: keeps 12 essential coding tools, strips 8 non-essential
  (browser_subagent, generate_image, search_web, etc. ~6KB saved)

Total: ~55% reduction in request size while keeping identity, user info,
and all coding-relevant tools intact. Only modifies 'agent' type requests,
checkpoint requests pass through unmodified.

Also:
- Standalone mode is now the default (use --no-standalone to attach to
  existing LS)
- Enable request modification by default
- Add mold linker, sccache, nextest config (8 thread cap)
- Add .cargo/config.toml and .config/nextest.toml
2026-02-14 18:35:07 -06:00
Nikketryhard
ca36ab0631 chore: clean up MITM logs and add Google SSE tests
- Demote non-LLM request logs to debug (only streamGenerateContent at info)
- Demote non-streaming response headers to debug
- Add 5 Google SSE parser tests (single event, multi-event accumulation,
  chunked framing, completion detection, no-thinking-tokens)
- Fix unused variable warning in proxy.rs
2026-02-14 17:55:17 -06:00
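The "chunked framing" case that these SSE tests cover means that one event can be split across network reads. A minimal framer buffers input until it sees a blank line, then collects the `data:` lines of each complete event. This is a sketch, not the project's parser: `SseBuffer` is a hypothetical name, and CRLF is normalized to LF over the whole buffer so that a `\r\n` split across two chunks is still handled.

```rust
/// Minimal SSE framer: feed arbitrary chunks, get back the `data:` payloads
/// of every event completed so far. Partial events stay buffered.
#[derive(Default)]
struct SseBuffer {
    buf: String,
}

impl SseBuffer {
    fn push(&mut self, chunk: &str) -> Vec<String> {
        self.buf.push_str(chunk);
        // Normalize the whole buffer so a CRLF split across chunks is caught.
        self.buf = self.buf.replace("\r\n", "\n");
        let mut events = Vec::new();
        while let Some(end) = self.buf.find("\n\n") {
            let raw: String = self.buf.drain(..end + 2).collect();
            let data: Vec<&str> = raw
                .lines()
                .filter_map(|l| l.strip_prefix("data:"))
                .map(|d| d.strip_prefix(' ').unwrap_or(d))
                .collect();
            if !data.is_empty() {
                // Multiple data lines in one event are joined with newlines.
                events.push(data.join("\n"));
            }
        }
        events
    }
}

fn main() {
    let mut sse = SseBuffer::default();
    // One JSON event split across two reads, then a second complete event.
    for chunk in ["data: {\"usage\":", "1}\n\ndata: {\"done\":true}\n\n"] {
        for event in sse.push(chunk) {
            println!("event: {event}");
        }
    }
}
```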
Nikketryhard
d4de436856 feat: MITM interception for standalone LS with UID isolation
- Spawn standalone LS as dedicated 'antigravity-ls' user via sudo
- UID-scoped iptables redirect (port 443 → MITM proxy) via mitm-redirect.sh
- Combined CA bundle (system CAs + MITM CA) for Go TLS trust
- Transparent TLS interception with chunked response detection
- Google SSE parser for streamGenerateContent usage extraction
- Timeouts on all MITM operations (TLS handshake, upstream, idle)
- Forward response data immediately (no buffering)
- Per-model token usage capture (input, output, thinking)
- Update docs and known issues to reflect resolved TLS blocker
2026-02-14 17:50:12 -06:00
Nikketryhard
901cd3d2e3 fix: resolve clippy warnings (matches!, map_or, redundant guard, unnecessary allocations) 2026-02-14 04:06:18 -06:00
Nikketryhard
4fa8775b61 feat: transparent proxy mode with SNI extraction and DNS bypass for upstream 2026-02-14 04:03:19 -06:00
Nikketryhard
d5e7f09225 feat: initial commit — antigravity proxy with MITM, standalone LS, and snapshot tooling 2026-02-14 02:24:35 -06:00