32 Commits

Author SHA1 Message Date
Nikketryhard
22177a28a1 chore: fix all clippy warnings and add Cargo.toml metadata 2026-02-18 02:50:47 -06:00
Nikketryhard
ad0aa1556c feat: Add LICENSE file and refactor MITM response handling and tracing. 2026-02-18 02:43:05 -06:00
Nikketryhard
00587fcce8 feat: rebrand to ZeroGravity, replace proxyctl with zg Rust binary
Phase 1 - Rename:
- Crate: antigravity-proxy -> zerogravity
- Env: ANTIGRAVITY_OAUTH_TOKEN -> ZEROGRAVITY_TOKEN
- Paths: ~/.config/antigravity-proxy -> ~/.config/zerogravity
- Paths: /tmp/antigravity-* -> /tmp/zerogravity-*
- User: antigravity-ls -> zerogravity-ls
- Service: antigravity-proxy -> zerogravity

Phase 2 - zg daemon manager:
- New Rust binary src/bin/zg.rs replaces scripts/proxyctl bash
- Commands: start, stop, restart, rebuild, status, logs, test, health
- Auto-resolves project dir from binary location
- All commands exit immediately (safe for agent fast-bash)
2026-02-18 01:54:54 -06:00
Nikketryhard
28d3296c87 fix: gemini route, usage capture, search timeout, and trace finalization
- Add missing /v1/gemini POST route and handler
- Capture MitmEvent::Usage in gemini sync/streaming handlers
- Add retry counter (max 3) to search handler to prevent hang
- Add trace finalization at all gemini_sync channel exit points
- Fix UpstreamError trace outcome label
- Add timeout trace with error recording
- Dispatch Usage before ResponseComplete in SSE flush
2026-02-18 01:31:18 -06:00
Nikketryhard
48674f65da refactor: decompose large functions and remove dead code
- Decompose modify_request() into 7 single-responsibility helpers
- Decompose handle_http_over_tls(): extract read_full_request, dispatch_stream_events
- Promote connect_upstream/resolve_upstream to module-level functions
- Split standalone.rs (1238 lines) into 4 submodules:
  standalone/mod.rs, spawn.rs, discovery.rs, stub.rs
- Extract proto wire primitives into proto/wire.rs
- Remove 6 dead MitmStore methods
- Remove dead SessionResult, DEFAULT_SESSION, get_or_create
- Remove dead decode_varint_at, extract_conversation_id
- Clean all unused imports across 10 files
- Suppress structural dead_code warnings on deserialization fields

Warnings: 20 -> 0. All 43 tests pass.
2026-02-17 22:27:26 -06:00
Nikketryhard
637fbc0e54 refactor: endpoint parity and proxy improvements
Mixed changes from recent sessions: endpoint feature parity
improvements, proxy bug fixes, and store cleanup.
2026-02-16 21:47:00 -06:00
Nikketryhard
a47c572e48 fix: forward Google's exact error messages to client
Root cause: errors from Google were being swallowed, replaced with
placeholders like 'Google API returned HTTP 400' or '[Timeout waiting
for response]', or silently converted to fake 'incomplete' responses.

Changes across all endpoints (/v1/chat/completions, /v1/responses,
/v1/gemini, /v1/search):

Error message fidelity:
- UpstreamError message now includes Google's status prefix: [STATUS] msg
- Falls back to raw body if JSON parsing fails (protobuf, HTML, etc.)
- ErrorDetail gains optional code and param fields

Timeout handling:
- poll_for_response returns UpstreamError(504, DEADLINE_EXCEEDED) on timeout
  instead of '[Timeout waiting for AI response]' placeholder text
- Streaming timeouts emit proper error events, not fake content
- Sync bypass timeouts return 504 Gateway Timeout, not 200 incomplete

Missing error checks added:
- responses.rs sync bypass: added upstream_error check in polling loop
- gemini.rs sync bypass: added upstream_error check in polling loop
- gemini.rs streaming: added upstream_error check in polling loop
  (was completely missing — errors only handled in sync path)

DRY helpers:
- upstream_error_message(): shared exact message extraction
- upstream_error_type(): shared Google→OpenAI error type mapping
- All streaming handlers use these instead of inline formatting
2026-02-16 19:30:32 -06:00
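A minimal sketch of the error-fidelity rules described in a47c572e48. The function names come from the commit message; their signatures, the `ParsedError` struct, and the exact status-to-type table are assumptions made for illustration, not the project's actual code.

```rust
/// Hypothetical shape of a Google error body after JSON parsing succeeded.
pub struct ParsedError {
    pub status: String,  // e.g. "INVALID_ARGUMENT"
    pub message: String, // Google's own wording, passed through untouched
}

/// Build the client-facing message: "[STATUS] msg" when the body parsed,
/// otherwise the raw body unchanged (protobuf, HTML, etc.).
pub fn upstream_error_message(parsed: Option<&ParsedError>, raw_body: &str) -> String {
    match parsed {
        Some(e) => format!("[{}] {}", e.status, e.message),
        None => raw_body.to_string(),
    }
}

/// Map an upstream HTTP status code to an OpenAI-style error type.
/// Assumed table, based on the type names listed in commit 2882f7cce2.
pub fn upstream_error_type(http_status: u16) -> &'static str {
    match http_status {
        400 => "invalid_request_error",
        401 | 403 => "authentication_error",
        429 => "rate_limit_error",
        _ => "server_error",
    }
}
```

Under these assumptions, a parsed 400 would surface as `[INVALID_ARGUMENT] ...` with type `invalid_request_error`, while an unparseable HTML body would be forwarded verbatim.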
Nikketryhard
ba96534ead fix: prevent tool_rounds cross-cascade contamination causing hangs
Root cause: proxy.rs eagerly pushed tool rounds via push_tool_round_calls
when intercepting Google's functionCall response. These stale rounds leaked
into LS follow-up requests, producing malformed history that Google timed
out on (60s 'no upstream response').

Changes:
- Remove push_tool_round_calls from proxy.rs response interception
- proxy.rs: use get_tool_rounds (non-destructive) instead of take_tool_rounds
  so accumulated rounds persist across multiple LS requests per cascade
- responses.rs/gemini.rs: build rounds via take+push+set pattern — each
  handler accumulates its own rounds from get_last_function_calls + results
- completions.rs: unchanged (set_tool_rounds replaces from messages)
- clear_tools: also clears tool_rounds to prevent stale data between sessions
- store.rs: add get_tool_rounds (non-destructive clone) method
2026-02-16 19:21:03 -06:00
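The difference between the destructive and non-destructive accessors in ba96534ead can be sketched as below. The method names `get_tool_rounds`, `take_tool_rounds`, and `set_tool_rounds` appear in the commit message; the `ToolRound` and `RoundStore` types, the `push_round` helper, and the use of a `Mutex<Vec<_>>` are assumptions for illustration only.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for one round of tool calls and their results.
#[derive(Clone, Debug, PartialEq)]
pub struct ToolRound {
    pub calls: Vec<String>,
    pub results: Vec<String>,
}

/// Minimal store holding only the tool-round history.
#[derive(Default)]
pub struct RoundStore {
    tool_rounds: Mutex<Vec<ToolRound>>,
}

impl RoundStore {
    /// Non-destructive read: the caller gets a clone, the store keeps its rounds.
    pub fn get_tool_rounds(&self) -> Vec<ToolRound> {
        self.tool_rounds.lock().unwrap().clone()
    }

    /// Destructive read: the store is left empty afterwards.
    pub fn take_tool_rounds(&self) -> Vec<ToolRound> {
        std::mem::take(&mut *self.tool_rounds.lock().unwrap())
    }

    /// Replace the stored history wholesale.
    pub fn set_tool_rounds(&self, rounds: Vec<ToolRound>) {
        *self.tool_rounds.lock().unwrap() = rounds;
    }

    /// The take+push+set pattern: append one round to the existing history.
    pub fn push_round(&self, round: ToolRound) {
        let mut rounds = self.take_tool_rounds();
        rounds.push(round);
        self.set_tool_rounds(rounds);
    }
}
```

In this sketch the lock is released between the take and the set, so `push_round` is not atomic; a real store would likely hold the lock for the whole update.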
Nikketryhard
32f02d6456 fix: extend multi-round tool history to responses and gemini endpoints
- proxy.rs: push_tool_round_calls alongside set_last_function_calls
  when Google responds with functionCall — accumulates rounds
- responses.rs: attach_tool_round_results to pair tool results with
  the correct round instead of flat add_tool_result
- gemini.rs: same attach_tool_round_results integration
- store.rs: add push_tool_round_calls and attach_tool_round_results
  methods for cross-request round accumulation
- Legacy add_tool_result kept for backward compat alongside new path
2026-02-16 19:11:38 -06:00
Nikketryhard
6bda2ecafa fix: tool call race conditions and missing completions tool result extraction
- store.rs: record_function_call now falls back to active_cascade_id
  (matching record_usage behavior) instead of blind _latest fallback
- store.rs: add cascade-aware take_function_calls(cascade_id) method
  with priority: exact match → active cascade → _latest → any key
- completions.rs: extract tool_calls from assistant messages and tool
  results from tool messages, storing them for MITM injection. This was
  the ROOT CAUSE — the completions handler stored tool definitions but
  never extracted tool results, so modify_request couldn't rewrite the
  LS conversation history with proper functionCall/functionResponse
- responses.rs: use cascade-aware take_function_calls for consistency
2026-02-16 18:43:16 -06:00
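The lookup priority named in 6bda2ecafa (exact match, then active cascade, then `_latest`, then any key) might look roughly like this. The free-function signature and the `HashMap<String, Vec<String>>` representation are assumptions; the real method lives on the store and its types are not shown in the log.

```rust
use std::collections::HashMap;

/// Key used when no cascade ID could be determined (named in the commit log).
const LATEST_KEY: &str = "_latest";

/// Remove and return captured function calls, trying in order:
/// exact cascade match, the active cascade, "_latest", then any remaining key.
pub fn take_function_calls(
    calls: &mut HashMap<String, Vec<String>>,
    cascade_id: Option<&str>,
    active_cascade_id: Option<&str>,
) -> Option<Vec<String>> {
    let candidates = [cascade_id, active_cascade_id, Some(LATEST_KEY)];
    for key in candidates.iter().flatten() {
        if let Some(found) = calls.remove(*key) {
            return Some(found);
        }
    }
    // Last resort: take whichever entry is left, if any.
    let any_key = calls.keys().next().cloned()?;
    calls.remove(&any_key)
}
```

Each successful lookup removes the entry, so a second call with the same ID falls through to the next tier rather than returning stale calls.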
Nikketryhard
3fdd0368a0 fix: block ALL LS follow-up requests across connections
Move the in-flight blocking check to the top of the LLM request flow,
BEFORE request modification. This catches follow-ups on ALL connections
(the LS opens multiple parallel TLS connections). Only the very first
modified request reaches Google — all others get fake STOP responses.

Previously, each new connection independently allowed one request
through before blocking, letting 4-5 requests leak per turn.
2026-02-16 00:57:33 -06:00
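One way to get the "only the very first request reaches Google" behavior from 3fdd0368a0 is a single gate shared by all connections. This is a sketch under assumptions: the `InFlightGate` type and its methods are hypothetical, and the log does not say how the actual check is implemented.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Shared across every TLS connection the LS opens.
pub struct InFlightGate {
    in_flight: AtomicBool,
}

impl InFlightGate {
    pub const fn new() -> Self {
        Self { in_flight: AtomicBool::new(false) }
    }

    /// Returns true for exactly one caller until `release` is called.
    /// Callers that get false would answer with a fake STOP response.
    pub fn try_acquire(&self) -> bool {
        self.in_flight
            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
    }

    /// Called once the turn is finished so the next turn can go upstream.
    pub fn release(&self) {
        self.in_flight.store(false, Ordering::Release);
    }
}
```

Because the check is a single atomic compare-and-swap performed before any request modification, parallel connections cannot each let one request through, which is the leak the commit describes.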
Nikketryhard
2882f7cce2 feat: propagate Google upstream errors to client
When Google returns an error (400, 429, 500, etc.), the MITM proxy now
captures it and the API handlers return it immediately instead of
hanging until timeout.

- UpstreamError struct stored in MitmStore
- MITM proxy parses Google error JSON (message + status)
- Polling handler checks for upstream errors each cycle
- Streaming handlers emit response.failed / SSE error events
- Error status mapped to OpenAI-style types (invalid_request_error,
  rate_limit_error, authentication_error, server_error, etc.)
- All handlers clear stale errors at request start
2026-02-15 18:19:38 -06:00
Nikketryhard
371c57bab0 fix: parse flat content arrays in Responses API input
When input is [{type: 'input_image', ...}, {type: 'input_text', text: '...'}],
the code was looking for items with role: 'user' which don't exist in flat
content arrays. Now extracts text from input_text items directly first,
falling back to role-based messages only if no flat text found.

Also adds debug header dump for MITM request forwarding.
2026-02-15 18:10:03 -06:00
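The extraction order from 371c57bab0 (flat `input_text` items first, role-based messages as fallback) can be illustrated with a simplified model. The `InputItem` enum, the `extract_user_text` name, joining multiple texts with a newline, and picking the last user message are all assumptions; the real handler works on JSON values.

```rust
/// Hypothetical, simplified model of items in a Responses API `input` array.
pub enum InputItem {
    InputText { text: String },
    InputImage { image_url: String },
    Message { role: String, text: String },
}

/// Prefer text from flat `input_text` items; fall back to the last user
/// message only when no flat text is present.
pub fn extract_user_text(items: &[InputItem]) -> Option<String> {
    let flat: Vec<&str> = items
        .iter()
        .filter_map(|item| match item {
            InputItem::InputText { text } => Some(text.as_str()),
            _ => None,
        })
        .collect();
    if !flat.is_empty() {
        return Some(flat.join("\n"));
    }
    items.iter().rev().find_map(|item| match item {
        InputItem::Message { role, text } if role == "user" => Some(text.clone()),
        _ => None,
    })
}
```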
Nikketryhard
89bea030cc feat: inject images via MITM layer instead of relying on LS
The LS silently ignores the 'images' field from our
SendUserCascadeMessageRequest proto — it never forwards image data
to Google's API.

New approach: store the image in MitmStore, then the MITM request
modifier injects it as 'inlineData' directly into the last user
message's parts array in the Google API JSON request.

Flow:
  Client → Proxy (decode base64) → MitmStore.set_pending_image()
  LS → Google API → MITM intercepts → inject inlineData part
  → Google receives image + text together

This works for all three API endpoints (responses, completions,
gemini).
2026-02-15 17:57:32 -06:00
Nikketryhard
976c44fdd4 feat: add image support across all endpoints (responses, completions, gemini) 2026-02-15 17:25:33 -06:00
Nikketryhard
ca9f808ee3 feat: completions API improvements, gemini endpoint, response types 2026-02-15 17:08:53 -06:00
Nikketryhard
b1bd57ab5e feat: forward generation params via MITM + add usageMetadata to Gemini
- Add GenerationParams struct to MitmStore for temperature, top_p,
  top_k, max_output_tokens, stop_sequences, frequency/presence_penalty
- MITM modify_request injects params into request.generationConfig
- All 3 endpoints (Completions, Responses, Gemini) store client params
- Add usageMetadata to Gemini sync responses (promptTokenCount,
  candidatesTokenCount, totalTokenCount, thoughtsTokenCount)
- Add generation param fields to GeminiRequest (temperature, topP, etc.)
- Completions stream_options.include_usage emits final usage chunk
- Completions reasoning_tokens in completion_tokens_details
- Update endpoint gap analysis doc (all high-priority gaps resolved)
2026-02-15 14:23:05 -06:00
Nikketryhard
981fb3b18d fix: resolve cascade correlation, update KNOWN_ISSUES
- MitmStore: added active_cascade_id field with set/get/clear methods
- record_usage() now falls back to active_cascade_id when the heuristic
  cascade hint is absent (fixes usage always going to _latest)
- All three API handlers set active cascade before send_message
- KNOWN_ISSUES: moved 3 issues to resolved:
  - Request modification (already true, was stale entry)
  - Cascade correlation (fixed via active_cascade_id)
  - Progressive thinking streaming (fixed via MITM bypass)
2026-02-15 01:10:34 -06:00
Nikketryhard
b3af73cebd feat: sync all endpoints with MITM LS bypass + real-time thinking streaming
- Responses API (streaming): MITM bypass path polls MitmStore directly
  when custom tools are active, skipping LS step polling entirely.
  Streams thinking text deltas in real-time as they arrive from the MITM.
  Handles function calls, text response, and thinking/reasoning events.

- Responses API (sync): Same MITM bypass for non-streaming responses.
  Polls MitmStore for function calls or completed text before falling
  back to LS path.

- Gemini endpoint: MITM bypass polls MitmStore directly for tool call
  responses, eliminating LS overhead.

- MitmStore: Added captured_thinking_text field with set/peek/take methods
  for real-time thinking text capture from MITM SSE.

- MITM proxy: Now captures both thinking_text and response_text from
  StreamingAccumulator into MitmStore when bypass mode is active.
2026-02-15 01:03:39 -06:00
Nikketryhard
786987116b feat: full tool call support (OpenAI + Gemini endpoints)
- store.rs: Add tool context storage (active tools, tool config, pending
  tool results, call_id mapping, last function calls for history rewrite)
- types.rs: Add tools/tool_choice fields to ResponsesRequest, add
  build_function_call_output helper for OpenAI function_call output items
- modify.rs: Replace hardcoded get_weather with dynamic ToolContext
  injection. Add openai_tools_to_gemini and openai_tool_choice_to_gemini
  converters. Add conversation history rewriting for tool result turns
  (replaces fake 'Tool call completed' model turn with real functionCall,
  injects functionResponse before last user turn)
- proxy.rs: Build ToolContext from MitmStore before calling modify_request.
  Save last_function_calls for history rewriting on subsequent turns
- responses.rs: Store client tools in MitmStore before LS call. Detect
  function_call_output in input array for tool result submission. Return
  captured functionCalls as OpenAI function_call output items with
  generated call_ids and stringified arguments
- gemini.rs: New Gemini-native endpoint (POST /v1/gemini) with zero
  format translation. Accepts functionDeclarations directly, returns
  functionCall in Gemini format directly
- mod.rs: Wire /v1/gemini route, bump version to 3.3.0
2026-02-14 22:56:44 -06:00
Nikketryhard
8455aa674f feat: capture function calls from Google + block follow-up quota waste
When MITM strips LS tools and injects custom tools:
- Google returns functionCall → captured in MitmStore
- Follow-up LS requests are blocked with fake SSE response
- Proxy consumes captured calls and clears the flag
- Result: 1 real Google API call instead of 5+ per tool call

Flow: Client → Proxy → LS → MITM(inject tool) → Google
      Google returns functionCall → MITM captures it
      LS tries follow-up → MITM blocks (fake response)
      Proxy reads captured functionCall → returns to client
2026-02-14 22:37:28 -06:00
Nikketryhard
b965be3f60 feat: add reactive streaming and remove dead panel stream code
- Subscribe to StreamCascadeReactiveUpdates for real-time cascade state diffs
- Fall back to timer-based polling if streaming RPC unavailable
- Remove StreamCascadePanelReactiveUpdates code (dead end, only has plan_status/user_settings)
- Remove debug diff file-saving code
- Add stream_reactive_rpc() helper to backend
2026-02-14 21:39:04 -06:00
Nikketryhard
3d7a7f492b fix: reduce poll intervals for smoother streaming
Streaming poll: 800-1200ms → 150-250ms (5x faster)
Sync poll: 1000-1800ms → 200-400ms (about 4.5-5x faster)

Verified via STEP_DUMP instrumentation that the LS updates
plannerResponse.response incrementally during GENERATING status,
so faster polling yields smoother progressive text delivery.

Also restructured streaming to emit reasoning events first
when thinking content is detected in LS steps before response text.
2026-02-14 20:34:37 -06:00
Nikketryhard
b1a089d21d feat: emit streaming reasoning events per OpenAI spec
Adds proper streaming SSE events for reasoning content:
- response.output_item.added (reasoning)
- response.reasoning_summary_part.added
- response.reasoning_summary_text.delta
- response.reasoning_summary_text.done
- response.reasoning_summary_part.done
- response.output_item.done (reasoning)

These are emitted before the message events, matching the format
that OpenAI-compatible clients expect for displaying thinking content.
2026-02-14 19:57:52 -06:00
Nikketryhard
5c1f4c77d9 fix: add retry logic for MITM thinking text merge race condition
The LS makes two Google API calls for thinking models. Call 2 (thinking
summary) may not have arrived by the time usage_from_poll runs after
Call 1 (response). Now we peek first, and if thinking tokens exist but
text is missing, wait up to 1s for the merge to happen.

Also adds peek_usage method to MitmStore for non-consuming reads.
2026-02-14 19:54:37 -06:00
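The peek-then-wait logic from 5c1f4c77d9 can be sketched as a bounded retry loop. The `UsageSnapshot` type, the `wait_for_thinking_text` name, and the closure-based peek are assumptions; the blocking `sleep` is a simplification, since the actual handlers are presumably async.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical snapshot returned by a non-consuming peek.
#[derive(Clone, Debug, PartialEq)]
pub struct UsageSnapshot {
    pub thinking_tokens: u64,
    pub thinking_text: Option<String>,
}

/// Peek repeatedly until the thinking text has been merged or `max_wait`
/// elapses. Returns the last snapshot seen, which may still lack the text.
pub fn wait_for_thinking_text<F>(
    mut peek: F,
    max_wait: Duration,
    interval: Duration,
) -> Option<UsageSnapshot>
where
    F: FnMut() -> Option<UsageSnapshot>,
{
    let deadline = Instant::now() + max_wait;
    loop {
        let snapshot = peek();
        // Only wait when thinking tokens were counted but the text is absent.
        let needs_merge = matches!(
            &snapshot,
            Some(s) if s.thinking_tokens > 0 && s.thinking_text.is_none()
        );
        if !needs_merge || Instant::now() >= deadline {
            return snapshot;
        }
        sleep(interval);
    }
}
```

With `max_wait` set to one second this matches the "wait up to 1s" bound in the commit message; responses without thinking tokens return on the first peek with no delay.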
Nikketryhard
905d55beb5 feat: capture thinking text from MITM-intercepted API responses
The LS strips thinking/reasoning text from plannerResponse steps —
only the thinkingSignature (opaque verification blob) is preserved.
The actual thinking text flows through the MITM proxy in the raw
Google SSE response (parts with thought: true) and Anthropic SSE
(thinking_delta content blocks).

Changes:
- StreamingAccumulator now accumulates thinking text from SSE events
- ApiUsage gains thinking_text: Option<String>
- usage_from_poll returns (Usage, Option<thinking_text>)
- Thinking text priority: MITM-captured > LS-extracted (fallback)
- Reasoning output item now populated from real API data
- Removed debug dump code
2026-02-14 19:30:09 -06:00
Nikketryhard
19dc920872 fix: return thinking as reasoning output item per OpenAI spec
Thinking content was previously returned as non-standard top-level
fields (thinking, thinking_duration). Now follows the official OpenAI
Responses API format:

- Reasoning appears as a 'type: reasoning' item in the output array
  with summary[].text containing the thinking content
- Message item follows after the reasoning item
- thinking_signature kept as proxy extension (internal multi-turn data)
- Removed ResponseOutput/OutputContent structs in favor of
  serde_json::Value for polymorphic output items
2026-02-14 19:16:12 -06:00
Nikketryhard
061b08fc8f fix: cascade correlation — fallback to _latest MITM usage
When the MITM can't extract a cascade ID from the intercepted request
(Content-Length: 0 / chunked encoding), usage is stored under '_latest'.
Now usage_from_poll and completions try the exact cascade_id first,
then fall back to '_latest' so MITM-captured tokens are actually used.
2026-02-14 18:10:04 -06:00
Nikketryhard
6842bfeaa5 chore: clean up code — remove dead code, stale allows, eprintln→tracing, remove volatile data from docs 2026-02-14 16:11:34 -06:00
Nikketryhard
686f5820d6 refactor: extract ResponseData struct to eliminate 18-arg build_response_object 2026-02-14 04:09:41 -06:00
Nikketryhard
901cd3d2e3 fix: resolve clippy warnings (matches!, map_or, redundant guard, unnecessary allocations) 2026-02-14 04:06:18 -06:00
Nikketryhard
d5e7f09225 feat: initial commit — antigravity proxy with MITM, standalone LS, and snapshot tooling 2026-02-14 02:24:35 -06:00