Commit Graph

26 Commits

Author SHA1 Message Date
Nikketryhard
22177a28a1 chore: fix all clippy warnings and add Cargo.toml metadata 2026-02-18 02:50:47 -06:00
Nikketryhard
ad0aa1556c feat: Add LICENSE file and refactor MITM response handling and tracing. 2026-02-18 02:43:05 -06:00
Nikketryhard
48674f65da refactor: decompose large functions and remove dead code
- Decompose modify_request() into 7 single-responsibility helpers
- Decompose handle_http_over_tls(): extract read_full_request, dispatch_stream_events
- Promote connect_upstream/resolve_upstream to module-level functions
- Split standalone.rs (1238 lines) into 4 submodules:
  standalone/mod.rs, spawn.rs, discovery.rs, stub.rs
- Extract proto wire primitives into proto/wire.rs
- Remove 6 dead MitmStore methods
- Remove dead SessionResult, DEFAULT_SESSION, get_or_create
- Remove dead decode_varint_at, extract_conversation_id
- Clean all unused imports across 10 files
- Suppress structural dead_code warnings on deserialization fields

Warnings: 20 -> 0. All 43 tests pass.
2026-02-17 22:27:26 -06:00
Nikketryhard
eb4c846b24 feat: match CLIProxyAPI system instruction pattern
Replace custom IGNORE/no-tools messages with CLIProxyAPI-style
multi-part system instruction: part[0] = identity text,
part[1] = Please ignore following [ignore]...[/ignore].
2026-02-16 21:46:52 -06:00
Nikketryhard
39381a4dfe fix: multi-round tool history rewrite and finishReason handling
- Add ToolRound struct to pair function calls with results per-round
- Replace single-match history rewrite (broke after first round) with
  multi-round loop that rewrites ALL placeholder model turns
- Fix tool result name fallback: use positional index instead of always
  picking the first call
- Set is_complete for any finishReason (FUNCTION_CALL, MAX_TOKENS, etc.),
  not just STOP, so the response_complete flag is always set
- Legacy fallback: responses.rs path (single-round via last_calls +
  pending_results) still works when tool_rounds is empty
- Add tests: multi-round rewrite, single-round legacy, no-op, and
  FUNCTION_CALL/MAX_TOKENS finishReason handling
2026-02-16 19:05:37 -06:00
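The positional name fallback described in 39381a4dfe could look roughly like the sketch below. The log names the ToolRound struct but does not show its fields, so the field layout, the FunctionCall and ToolResult types, and the resolve_result_names helper are all assumptions made for illustration.

```rust
/// Hypothetical shape of a captured function call.
#[derive(Debug, Clone, PartialEq)]
pub struct FunctionCall {
    pub name: String,
    pub args_json: String,
}

/// Hypothetical tool result; `name` is None when the client omitted it.
#[derive(Debug, Clone, PartialEq)]
pub struct ToolResult {
    pub name: Option<String>,
    pub output: String,
}

/// Assumed layout: one round pairs the calls of a model turn with
/// the results the client sent back for that turn.
#[derive(Debug, Clone, PartialEq)]
pub struct ToolRound {
    pub calls: Vec<FunctionCall>,
    pub results: Vec<ToolResult>,
}

/// Resolve a name for every result. A missing name falls back to the
/// call at the same position, not to the first call of the round.
pub fn resolve_result_names(round: &ToolRound) -> Vec<Option<String>> {
    round
        .results
        .iter()
        .enumerate()
        .map(|(index, result)| {
            result
                .name
                .clone()
                .or_else(|| round.calls.get(index).map(|call| call.name.clone()))
        })
        .collect()
}
```

With two unnamed results in a round of two calls, the second result resolves to the second call's name, which is the case the old "always pick the first call" behavior got wrong.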
Nikketryhard
38b4130c55 feat: Implement request generation counter and state management to prevent stale data and unblock Language Server for follow-up requests. 2026-02-16 16:21:52 -06:00
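A request generation counter of the kind 38b4130c55 mentions might be sketched as follows. The commit does not show the implementation, so GenerationGate and its methods are hypothetical names; the idea is only that each new client request bumps a counter and late-arriving data tagged with an older number can be recognized as stale.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical helper; the real state lives in MitmStore and is not
/// shown in the log.
pub struct GenerationGate {
    current: AtomicU64,
}

impl GenerationGate {
    pub fn new() -> Self {
        Self { current: AtomicU64::new(0) }
    }

    /// Start a new client request and return its generation number.
    pub fn begin_request(&self) -> u64 {
        // fetch_add returns the previous value, so add 1 for the new one.
        self.current.fetch_add(1, Ordering::SeqCst) + 1
    }

    /// True only if `generation` belongs to the most recent request.
    pub fn is_current(&self, generation: u64) -> bool {
        self.current.load(Ordering::SeqCst) == generation
    }
}
```

A writer that captured its generation at the start of a stream would check is_current before storing anything, and drop the data when a newer request has started in the meantime.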
Nikketryhard
3fdd0368a0 fix: block ALL LS follow-up requests across connections
Move the in-flight blocking check to the top of the LLM request flow,
BEFORE request modification. This catches follow-ups on ALL connections
(the LS opens multiple parallel TLS connections). Only the very first
modified request reaches Google — all others get fake STOP responses.

Previously, each new connection independently allowed one request
through before blocking, letting 4-5 requests leak per turn.
2026-02-16 00:57:33 -06:00
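The cross-connection check in 3fdd0368a0 can be illustrated with a single shared flag that is tested and set in one atomic step. This is a minimal sketch, assuming the flag is shared by all connection handlers; InFlight, Decision, and admit are hypothetical names, not the project's API.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// What the MITM layer should do with an intercepted LLM request.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Decision {
    Forward,
    BlockWithFakeStop,
}

/// Hypothetical wrapper around the request_in_flight flag.
pub struct InFlight {
    flag: AtomicBool,
}

impl InFlight {
    pub fn new() -> Self {
        Self { flag: AtomicBool::new(false) }
    }

    /// Called before any request modification. `swap` sets the flag and
    /// returns its previous value atomically, so exactly one caller sees
    /// `false` even when several connections race.
    pub fn admit(&self) -> Decision {
        if self.flag.swap(true, Ordering::SeqCst) {
            Decision::BlockWithFakeStop
        } else {
            Decision::Forward
        }
    }

    /// Called once the proxy has consumed the response for this turn.
    pub fn clear(&self) {
        self.flag.store(false, Ordering::SeqCst);
    }
}
```

Because the test-and-set happens once per request and not once per connection, parallel TLS connections cannot each let one request through.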
Nikketryhard
a8f3c8915f fix: block ALL LS follow-up requests, deduplicate function calls
- Add request_in_flight flag to MitmStore, set immediately when first
  LLM request is forwarded with custom tools active
- Block ALL subsequent LS requests (agentic loop + internal flash-lite)
  with fake SSE responses instead of waiting for response_complete
- Fix function call deduplication: drain() accumulator after storing
  to prevent 3x duplicate tool calls across SSE chunks
- Clear all stale state (response, thinking, function calls, errors)
  at the start of each streaming request
- Handle response_complete with no content (thoughtSignature-only)
  gracefully with timeout instead of infinite hang
2026-02-16 00:51:56 -06:00
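The deduplication fix in a8f3c8915f relies on draining the accumulator once its calls are stored. A minimal sketch of that pattern, with hypothetical Accumulator, Store, and FunctionCall types standing in for the real ones:

```rust
/// Hypothetical captured function call.
#[derive(Debug, Clone, PartialEq)]
pub struct FunctionCall {
    pub name: String,
    pub args_json: String,
}

/// Stand-in for the per-stream accumulator.
#[derive(Default)]
pub struct Accumulator {
    pub pending: Vec<FunctionCall>,
}

/// Stand-in for the shared store that the proxy reads from.
#[derive(Default)]
pub struct Store {
    pub captured: Vec<FunctionCall>,
}

impl Accumulator {
    pub fn push(&mut self, call: FunctionCall) {
        self.pending.push(call);
    }

    /// Move pending calls into the store. `drain(..)` empties `pending`,
    /// so flushing again on a later SSE chunk stores nothing twice.
    pub fn flush_into(&mut self, store: &mut Store) {
        store.captured.extend(self.pending.drain(..));
    }
}
```

If flush_into copied the calls with a clone and left `pending` intact, a flush on each of three SSE chunks would store the same call three times, which matches the 3x duplication the commit describes.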
Nikketryhard
89bea030cc feat: inject images via MITM layer instead of relying on LS
The LS silently ignores the 'images' field from our
SendUserCascadeMessageRequest proto — it never forwards image data
to Google's API.

New approach: store the image in MitmStore, then the MITM request
modifier injects it as 'inlineData' directly into the last user
message's parts array in the Google API JSON request.

Flow:
  Client → Proxy (decode base64) → MitmStore.set_pending_image()
  LS → Google API → MITM intercepts → inject inlineData part
  → Google receives image + text together

This works for all three API endpoints (responses, completions,
gemini).
2026-02-15 17:57:32 -06:00
Nikketryhard
0a33c1b706 fix: send images as top-level ImageData field, not ChatMessage blob
SendUserCascadeMessageRequest proto field layout (from JS bundle analysis):
- Field 6 is 'images' (repeated ImageData) at the REQUEST level
- NOT a Blob sub-message inside ChatMessage (field 2)

ImageData proto uses base64_data (field 1) + mime_type (field 2),
not raw bytes. The LS was silently ignoring our ChatMessage blob
because the field structure didn't match.

Also protect MITM modifier from stripping messages containing
inlineData (image parts in Google API JSON).
2026-02-15 17:46:41 -06:00
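The field layout from 0a33c1b706 can be made concrete with hand-rolled protobuf wire encoding. The field numbers (6 for images, 1 for base64_data, 2 for mime_type) come from the commit message; the helper names are hypothetical and this is a sketch of the standard wire format, not the project's proto/wire.rs.

```rust
/// Append `value` as a protobuf base-128 varint.
fn put_varint(buf: &mut Vec<u8>, mut value: u64) {
    while value >= 0x80 {
        // Low 7 bits with the continuation bit set.
        buf.push(((value & 0x7f) as u8) | 0x80);
        value >>= 7;
    }
    buf.push(value as u8);
}

/// Append a length-delimited field (wire type 2): tag, length, payload.
fn put_len_delimited(buf: &mut Vec<u8>, field: u32, payload: &[u8]) {
    put_varint(buf, u64::from((field << 3) | 2));
    put_varint(buf, payload.len() as u64);
    buf.extend_from_slice(payload);
}

/// Encode one ImageData message: base64_data = 1, mime_type = 2.
pub fn encode_image_data(base64_data: &str, mime_type: &str) -> Vec<u8> {
    let mut message = Vec::new();
    put_len_delimited(&mut message, 1, base64_data.as_bytes());
    put_len_delimited(&mut message, 2, mime_type.as_bytes());
    message
}

/// Append an image to an encoded SendUserCascadeMessageRequest as
/// field 6 at the request level (repeated fields are simply appended).
pub fn append_image(request: &mut Vec<u8>, base64_data: &str, mime_type: &str) {
    let image = encode_image_data(base64_data, mime_type);
    put_len_delimited(request, 6, &image);
}
```

The image goes in as a sibling of the chat message at the top level of the request, carrying base64 text, not raw bytes nested inside ChatMessage.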
Nikketryhard
afa96b88a5 chore: remove broken googleSearch grounding and /v1/search endpoint 2026-02-15 17:08:46 -06:00
Nikketryhard
b1bd57ab5e feat: forward generation params via MITM + add usageMetadata to Gemini
- Add GenerationParams struct to MitmStore for temperature, top_p,
  top_k, max_output_tokens, stop_sequences, frequency/presence_penalty
- MITM modify_request injects params into request.generationConfig
- All 3 endpoints (Completions, Responses, Gemini) store client params
- Add usageMetadata to Gemini sync responses (promptTokenCount,
  candidatesTokenCount, totalTokenCount, thoughtsTokenCount)
- Add generation param fields to GeminiRequest (temperature, topP, etc.)
- Completions stream_options.include_usage emits final usage chunk
- Completions reasoning_tokens in completion_tokens_details
- Update endpoint gap analysis doc (all high-priority gaps resolved)
2026-02-15 14:23:05 -06:00
Nikketryhard
735c3e357d chore: clean up dead code, fix broken test
- Remove unused methods: append_response_text, clear_response,
  has_pending_function_calls, take_function_calls
- Add #[allow(dead_code)] for intentionally kept future-use methods
  and response modification helpers
- Remove unused now_unix import from gemini.rs
- Fix test_modify_strips_all_tools: tools key is removed entirely
  when no custom tools provided, not left as empty array
- Zero warnings, 32 tests passing
2026-02-15 01:14:51 -06:00
Nikketryhard
40c6379ca1 fix: strip $schema and unsupported JSON Schema fields from tool params
Google's Gemini API rejects $schema, additionalProperties, $ref,
$defs, default, examples, and title in tool parameter schemas.
OpenCode/MCP tools include these standard JSON Schema fields.
Now recursively stripped during OpenAI→Gemini tool conversion.
2026-02-15 00:18:32 -06:00
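The recursive stripping in 40c6379ca1 might be sketched as below. To stay dependency-free the sketch uses a small hand-written Json enum where the project presumably uses a JSON library; the enum, the UNSUPPORTED constant, and strip_unsupported are illustrative names, and only the list of rejected keys is taken from the commit message.

```rust
use std::collections::BTreeMap;

/// Minimal JSON value, defined here only to keep the sketch self-contained.
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Null,
    Bool(bool),
    Num(f64),
    Str(String),
    Arr(Vec<Json>),
    Obj(BTreeMap<String, Json>),
}

/// Keys the commit message lists as rejected by the Gemini API.
const UNSUPPORTED: [&str; 7] = [
    "$schema",
    "additionalProperties",
    "$ref",
    "$defs",
    "default",
    "examples",
    "title",
];

/// Remove unsupported keys from every object, at any nesting depth.
pub fn strip_unsupported(value: &mut Json) {
    match value {
        Json::Obj(map) => {
            map.retain(|key, _| !UNSUPPORTED.contains(&key.as_str()));
            for child in map.values_mut() {
                strip_unsupported(child);
            }
        }
        Json::Arr(items) => {
            for item in items.iter_mut() {
                strip_unsupported(item);
            }
        }
        // Scalars carry no keys, so there is nothing to strip.
        _ => {}
    }
}
```

One limitation of this naive key match: a tool parameter that is itself named "title" or "default" under "properties" would also be removed. The log does not say whether the real converter handles that case.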
Nikketryhard
7c44729ace fix: forge dummy STOP response to LS on functionCall capture
When the MITM detects a functionCall in Google's response AND custom
tools are active, send a forged clean text response to the LS instead
of the real one. This prevents the LS from seeing function calls for
tools it doesn't manage, eliminating the retry loop entirely.

The real function call data is captured in MitmStore and returned to
the client (OpenCode) through the completions handler.

Also removes the complex chunked-encoding response rewriting approach
in favor of this simpler forge-and-break strategy.
2026-02-15 00:15:00 -06:00
Nikketryhard
19ff784cae fix: always strip old functionCall/functionResponse from LS history
The function call stripping was only happening when no custom tools
were present. But even with custom tools injected, the LS history
contains functionCall/functionResponse parts for LS-internal tools
that we stripped, causing MALFORMED_FUNCTION_CALL. Now always strip
regardless of custom tools presence.
2026-02-14 23:59:13 -06:00
Nikketryhard
19090b79f0 fix: prevent MALFORMED_FUNCTION_CALL infinite retry loop
Root cause: after stripping LS tool definitions, two things remained:
1. toolConfig with mode=VALIDATED (forces function calling even with
   empty tools array)
2. Model's training/identity context causing it to attempt function
   calls in text

Fix:
- Remove empty tools array and toolConfig when no custom tools injected
- Strip functionCall/functionResponse parts from conversation history
- Append explicit 'no tools available' instruction to system prompt
- Remove debug dump code
2026-02-14 23:31:26 -06:00
Nikketryhard
a52d1bf475 fix: strip functionCall/functionResponse from history when no tools
When LS tools are stripped from the request but the conversation history
still contains functionCall/functionResponse parts referencing those
tools, Google returns MALFORMED_FUNCTION_CALL and the LS retries in an
infinite loop, causing the request to hang forever.

Now after stripping LS tools and confirming no custom tools are injected,
we also strip all functionCall/functionResponse parts from the history
and remove any messages that become empty as a result.
2026-02-14 23:19:28 -06:00
Nikketryhard
786987116b feat: full tool call support (OpenAI + Gemini endpoints)
- store.rs: Add tool context storage (active tools, tool config, pending
  tool results, call_id mapping, last function calls for history rewrite)
- types.rs: Add tools/tool_choice fields to ResponsesRequest, add
  build_function_call_output helper for OpenAI function_call output items
- modify.rs: Replace hardcoded get_weather with dynamic ToolContext
  injection. Add openai_tools_to_gemini and openai_tool_choice_to_gemini
  converters. Add conversation history rewriting for tool result turns
  (replaces fake 'Tool call completed' model turn with real functionCall,
  injects functionResponse before last user turn)
- proxy.rs: Build ToolContext from MitmStore before calling modify_request.
  Save last_function_calls for history rewriting on subsequent turns
- responses.rs: Store client tools in MitmStore before LS call. Detect
  function_call_output in input array for tool result submission. Return
  captured functionCalls as OpenAI function_call output items with
  generated call_ids and stringified arguments
- gemini.rs: New Gemini-native endpoint (POST /v1/gemini) with zero
  format translation. Accepts functionDeclarations directly, returns
  functionCall in Gemini format directly
- mod.rs: Wire /v1/gemini route, bump version to 3.3.0
2026-02-14 22:56:44 -06:00
Nikketryhard
8455aa674f feat: capture function calls from Google + block follow-up quota waste
When MITM strips LS tools and injects custom tools:
- Google returns functionCall → captured in MitmStore
- Follow-up LS requests are blocked with fake SSE response
- Proxy consumes captured calls and clears the flag
- Result: 1 real Google API call instead of 5+ per tool call

Flow: Client → Proxy → LS → MITM(inject tool) → Google
      Google returns functionCall → MITM captures it
      LS tries follow-up → MITM blocks (fake response)
      Proxy reads captured functionCall → returns to client
2026-02-14 22:37:28 -06:00
Nikketryhard
146be139a2 fix: re-enable tool stripping after testing
With tools present, LS enters full agentic mode doing multi-turn
tool calls (file searches, terminal commands, etc.). A simple
weather question caused 40+ Google API calls in 120s before timeout.
Tool stripping is required to maintain single-turn behavior.
2026-02-14 22:18:02 -06:00
Nikketryhard
3e3af85798 feat: add proxyctl daemon manager, fix standalone LS cleanup
- Add proxyctl CLI script for systemd service management
- Add systemd user service file for background operation
- Fix standalone LS kill: properly track real LS PID via pgrep
  and use sudo kill for cross-user cleanup on shutdown
- Remove deprecated scripts (dns-redirect, iptables-redirect,
  mitm-wrapper, standalone-ls, parse-snapshot)
- Disable tool stripping in MITM for tool call investigation
- Update GEMINI.md with CLI tools documentation
2026-02-14 22:14:00 -06:00
Nikketryhard
34b9553484 feat: capture thinking text via MITM dual-call merge
The LS makes TWO separate Google API calls for thinking models:
  Call 1: response + thinking token count (no thinking text)
  Call 2: thinking summary text (no thinking tokens)

Each hits a different StreamingAccumulator, so we:
1. Capture response_text in StreamingAccumulator (non-thinking parts)
2. In MitmStore::record_usage, detect when Call 2 arrives for a
   cascade that already has thinking tokens from Call 1
3. Merge Call 2's response_text as thinking_text on Call 1's usage

Also injects includeThoughts into Google API requests via MITM
modify to ensure thinking text is available in SSE responses.
2026-02-14 19:49:15 -06:00
Nikketryhard
7c4e781900 feat: aggressive request stripping — keep only identity + conversation
Strip everything from intercepted LLM requests except:
- <identity> section in system instruction
- Actual conversation turns (user messages + model responses)

Removed: tool_calling, web_app_dev, knowledge_discovery,
persistent_context, skills, ephemeral_message, communication_style,
user_information, user_rules, MEMORY, workflows, mcp_servers,
conversation_summaries, ADDITIONAL_METADATA, Step Id prefixes.

Expected reduction: ~92% (63KB → ~5KB for simple requests).
2026-02-14 19:05:49 -06:00
Nikketryhard
1a7c81e5f9 feat: strip ALL tools from intercepted requests by default
Tools are only needed by the Antigravity webview for tool-call UI.
Our proxy doesn't need them — the model generates text responses fine
without tool definitions. Stripping all 20 tools saves ~15KB per request.
2026-02-14 18:53:38 -06:00
Nikketryhard
f0c2574c88 feat: MITM request modification — strip bloat from LLM API requests
Intercepts streamGenerateContent requests and trims:
- System instruction: strips web_application_development, knowledge_discovery,
  persistent_context, skills sections (~18KB saved)
- Content messages: strips empty user_rules, workflows boilerplate,
  conversation summaries (~4.5KB saved)
- Tools: keeps 12 essential coding tools, strips 8 non-essential
  (browser_subagent, generate_image, search_web, etc. ~6KB saved)

Total: ~55% reduction in request size while keeping identity, user info,
and all coding-relevant tools intact. Only modifies 'agent' type requests,
checkpoint requests pass through unmodified.

Also:
- Standalone mode is now the default (use --no-standalone to attach to
  existing LS)
- Enable request modification by default
- Add mold linker, sccache, nextest config (8 thread cap)
- Add .cargo/config.toml and .config/nextest.toml
2026-02-14 18:35:07 -06:00