From ea12127acb41417fd76247fcbdc0bec014ea1704 Mon Sep 17 00:00:00 2001
From: Nikketryhard
Date: Wed, 18 Feb 2026 03:33:47 -0600
Subject: [PATCH] chore: remove outdated planning documents and the known issues file.

---
 .gemini/plans/sync-and-latency.md          |  46 ----
 .gemini/plans/tool-calls-implementation.md | 292 ---------------------
 .gitignore                                 |   4 +
 KNOWN_ISSUES.md                            | 117 ---------
 4 files changed, 4 insertions(+), 455 deletions(-)
 delete mode 100644 .gemini/plans/sync-and-latency.md
 delete mode 100644 .gemini/plans/tool-calls-implementation.md
 delete mode 100644 KNOWN_ISSUES.md

diff --git a/.gemini/plans/sync-and-latency.md b/.gemini/plans/sync-and-latency.md
deleted file mode 100644
index d3b7c68..0000000
--- a/.gemini/plans/sync-and-latency.md
+++ /dev/null
@@ -1,46 +0,0 @@
-# Sync All Endpoints + Latency + Thinking Streaming
-
-## Phase 1: Sync Responses API (`/v1/responses`) with LS bypass
-
-Current state:
-
-- `handle_responses_stream` (line 529-859) polls LS steps for text
-- Doesn't use MitmStore bypass at all
-- Still suffers from LS multi-turn overhead when tools are active
-
-Fix:
-
-- Add MITM bypass path (same as completions) — check MitmStore for text + function calls
-- For function calls: emit `response.output_item.added` (function_call type) + done events
-- For text: stream from MitmStore `captured_response_text` + `response_complete`
-
-## Phase 2: Sync Gemini endpoint (`/v1/gemini`) with LS bypass
-
-Current state:
-
-- `handle_gemini` (line 57-236) uses `poll_for_response` then checks MitmStore
-- Already checks `take_any_function_calls()` after polling
-- But `poll_for_response` still goes through LS steps
-
-Fix:
-
-- When tools are active, poll MitmStore directly instead of `poll_for_response`
-
-## Phase 3: Latency improvements
-
-- Reduce poll intervals across all handlers
-- Add MITM store thinking_text capture for real-time streaming
-
-## Phase 4: Real-time thinking streaming investigation
-
-Current state:
-
-- Google SSE includes `thought: true` parts with thinking text
-- `streaming_acc.thinking_text` accumulates this
-- Currently only used for final usage stats, not streamed in real-time
-
-Investigation needed:
-
-- The MITM intercept already captures thinking_text per-chunk
-- Need to store thinking_text updates in MitmStore incrementally
-- Responses handler can then stream thinking deltas in real-time
diff --git a/.gemini/plans/tool-calls-implementation.md b/.gemini/plans/tool-calls-implementation.md
deleted file mode 100644
index a3f7beb..0000000
--- a/.gemini/plans/tool-calls-implementation.md
+++ /dev/null
@@ -1,292 +0,0 @@
-# Tool Call Implementation Plan
-
-## Overview
-
-Add full tool call support to the Antigravity proxy. Primary endpoint is OpenAI Responses API (`/v1/responses`), with a Gemini-native backup endpoint (`/v1/gemini`). Tools are stored per-session, all `tool_choice` modes supported, parallel tool calls supported.
-
-## Data Flow
-
-```
-┌─────────┐      ┌───────────┐      ┌────┐      ┌──────┐      ┌────────┐
-│ Client  │─────▶│   Proxy   │─────▶│ LS │─────▶│ MITM │─────▶│ Google │
-│ (openai) │     │  (axum)   │      │    │      │      │      │        │
-│         │◀─────│           │◀─────│    │◀─────│      │◀─────│        │
-└─────────┘      └───────────┘      └────┘      └──────┘      └────────┘
-     │                │                              │               │
-     │ tools (OAI)    │ store tools (Gemini fmt)     │ inject        │
-     │───────────────▶│────────────▶ MitmStore ─────▶│ tools         │
-     │                │                              │──────────────▶│
-     │                │                              │               │
-     │                │                              │ functionCall  │
-     │                │◀──── capture ───────────────│◀──────────────│
-     │ tool_calls     │                              │ block follow  │
-     │◀───────────────│                              │ ups           │
-     │                │                              │               │
-     │ tool result    │ store result                 │ inject        │
-     │───────────────▶│────────────▶ MitmStore ─────▶│ fn response   │
-     │                │                              │──────────────▶│
-     │ final text     │                              │               │
-     │◀───────────────│◀────────────────────────────│◀──────────────│
-```
-
-## Format Differences
-
-### Tool Definitions
-
-| Aspect | OpenAI | Gemini |
-| ------------ | -------------------------------------- | ---------------------------------- |
-| Wrapper | `{"type":"function","function":{...}}` | `{"functionDeclarations":[{...}]}` |
-| Type strings | lowercase: `"object"`, `"string"` | UPPERCASE: `"OBJECT"`, `"STRING"` |
-| Parameters | JSON Schema subset | Same schema, uppercase types |
-
-### Tool Choice
-
-| OpenAI | Gemini toolConfig |
-| --------------------------------------------- | ----------------------------------------------------------------------- |
-| `"auto"` | `{"functionCallingConfig":{"mode":"AUTO"}}` |
-| `"required"` | `{"functionCallingConfig":{"mode":"ANY"}}` |
-| `"none"` | `{"functionCallingConfig":{"mode":"NONE"}}` |
-| `{"type":"function","function":{"name":"X"}}` | `{"functionCallingConfig":{"mode":"ANY","allowedFunctionNames":["X"]}}` |
-
-### Tool Call Response
-
-| OpenAI (what we return) | Gemini (what Google returns) |
-| -------------------------------------------------------------------------------------------------- | --------------------------------------------------------------- |
-| `output: [{"type":"function_call","call_id":"call_xxx","name":"get_weather","arguments":"{...}"}]` | `parts: [{"functionCall":{"name":"get_weather","args":{...}}}]` |
-
-### Tool Result Submission
-
-| OpenAI (what client sends) | Gemini (what we inject into Google request) |
-| -------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------- |
-| `input: [{"type":"function_call_output","call_id":"call_xxx","output":"{...}"}]` | `contents: [{role:"model",parts:[{functionCall:...}]},{role:"user",parts:[{functionResponse:{name:"...",response:{...}}}]}]` |
-
----
-
-## Implementation Phases
-
-### Phase 1: Store Infrastructure (`store.rs`)
-
-Add to `MitmStore`:
-
-```rust
-/// Active tool definitions (Gemini format) for MITM injection.
-active_tools: Arc>>>,
-/// Active tool config (Gemini toolConfig format).
-active_tool_config: Arc>>,
-/// Pending tool results for MITM to inject as functionResponse.
-pending_tool_results: Arc>>,
-/// Mapping call_id → function name for tool result routing.
-call_id_to_name: Arc>>,
-/// Last captured function calls (for conversation history rewriting).
-last_function_calls: Arc>>,
-```
-
-New types:
-
-```rust
-pub struct PendingToolResult {
-    pub name: String,
-    pub result: serde_json::Value,
-}
-```
-
-New methods:
-
-- `set_tools(tools)` / `get_tools()` / `clear_tools()`
-- `set_tool_config(config)` / `get_tool_config()`
-- `add_tool_result(result)` / `take_tool_results()`
-- `register_call_id(call_id, name)` / `lookup_call_id(call_id)`
-- `set_last_function_calls(calls)` / `get_last_function_calls()`
-
-### Phase 2: Request Types (`types.rs`)
-
-Add to `ResponsesRequest`:
-
-```rust
-#[serde(default)]
-pub tools: Option>,
-#[serde(default)]
-pub tool_choice: Option,
-```
-
-New output builder:
-
-```rust
-pub fn build_function_call_output(call_id: &str, name: &str, arguments: &str) -> Value
-```
-
-### Phase 3: Format Conversion + Dynamic Injection (`modify.rs`)
-
-New public struct:
-
-```rust
-pub struct ToolContext {
-    pub tools: Option>, // Gemini functionDeclarations
-    pub tool_config: Option, // Gemini toolConfig
-    pub pending_results: Vec, // Tool results to inject
-    pub last_calls: Vec, // For history rewriting
-}
-```
-
-New conversion functions:
-
-```rust
-pub fn openai_tools_to_gemini(tools: &[Value]) -> Vec // OAI → Gemini format
-pub fn openai_tool_choice_to_gemini(choice: &Value) -> Value // OAI → Gemini toolConfig
-fn uppercase_types(val: Value) -> Value // Recursive type case fix
-```
-
-Change `modify_request` signature:
-
-```rust
-pub fn modify_request(body: &[u8], tool_ctx: Option<&ToolContext>) -> Option>
-```
-
-Tool injection logic:
-
-1. Strip all LS tools (existing)
-2. If `tool_ctx.tools` provided → inject as Gemini `functionDeclarations`
-3. If `tool_ctx.tool_config` provided → inject as `toolConfig`
-4. If `tool_ctx.pending_results` not empty → rewrite conversation history:
-   - Find model turn with "Tool call completed" → replace with `functionCall` parts
-   - Find last user turn → prepend `functionResponse` part
-
-### Phase 4: MITM Plumbing (`proxy.rs`)
-
-In `handle_http_over_tls`, before calling `modify_request`:
-
-1. Read `get_tools()`, `get_tool_config()`, `take_tool_results()`, `get_last_function_calls()` from store
-2. Build `ToolContext`
-3. Pass to `modify_request(body, tool_ctx)`
-
-After response capture:
-
-1. Save captured function calls as `last_function_calls` (for future history rewriting)
-
-### Phase 5: API Handler (`responses.rs`)
-
-#### Request handling (in `handle_responses`):
-
-1. If `body.tools` provided:
-   - Convert OpenAI → Gemini format via `openai_tools_to_gemini()`
-   - Store in `MitmStore` via `set_tools()`
-2. If `body.tool_choice` provided:
-   - Convert via `openai_tool_choice_to_gemini()`
-   - Store in `MitmStore` via `set_tool_config()`
-3. Check `body.input` for `function_call_output` items:
-   - If found: look up `call_id` → function name via `lookup_call_id()`
-   - Store as `PendingToolResult` via `add_tool_result()`
-   - Extract any accompanying text (or use placeholder)
-
-#### Response handling (in `handle_responses_sync` / `handle_responses_stream`):
-
-After polling completes:
-
-1. Check `take_any_function_calls()` for captured tool calls
-2. If captured:
-   - Generate `call_id` for each (e.g., `"call_" + random`)
-   - Register `call_id → name` mapping via `register_call_id()`
-   - Build `function_call` output items via `build_function_call_output()`
-   - Return these INSTEAD of the text message output
-3. If no tool calls: existing text response behavior
-
-### Phase 6: Gemini-Native Endpoint (`gemini.rs` + `mod.rs`)
-
-New file `src/api/gemini.rs` with handler `handle_gemini`:
-
-- Accepts tools in Gemini `functionDeclarations` format directly (no conversion)
-- Accepts `toolConfig` directly
-- Returns `functionCall` in Gemini format directly
-- Same cascade/session management as responses.rs
-- Much simpler — no format translation
-
-Route: `POST /v1/gemini` in `mod.rs`
-
----
-
-## File Change Summary
-
-| File | Changes | Complexity |
-| ---------------------- | ----------------------------------------------------------------------- | ---------- |
-| `src/mitm/store.rs` | Add tool context storage (5 new fields, ~10 methods) | Medium |
-| `src/api/types.rs` | Add `tools`/`tool_choice` to request, add output builder | Low |
-| `src/mitm/modify.rs` | `ToolContext`, format conversion, dynamic injection, history rewrite | High |
-| `src/mitm/proxy.rs` | Read store → build ToolContext → pass to modify | Low |
-| `src/api/responses.rs` | Store tools, detect tool results in input, return function_call outputs | High |
-| `src/api/gemini.rs` | New file — Gemini-native endpoint (passthrough) | Medium |
-| `src/api/mod.rs` | Add route + module declaration | Low |
-
-## Implementation Order
-
-1. `store.rs` — foundation, no dependencies
-2. `types.rs` — request/response types
-3. `modify.rs` — format conversion + injection (depends on store types)
-4. `proxy.rs` — plumbing (depends on modify signature)
-5. Build + verify compilation
-6. `responses.rs` — handler changes (depends on all above)
-7. Build + test with `get_weather` request
-8. `gemini.rs` + `mod.rs` — Gemini endpoint
-9. Build + test with Gemini format
-10. Tool result flow test (multi-turn)
-
-## Testing Strategy
-
-### Test 1: Basic tool call (sync)
-
-```bash
-curl -s http://localhost:8741/v1/responses -H "Content-Type: application/json" -d '{
-  "model": "gemini-3-flash",
-  "input": "What is the weather in Tokyo?",
-  "tools": [{"type":"function","function":{"name":"get_weather","description":"Get weather","parameters":{"type":"object","properties":{"city":{"type":"string"}},"required":["city"]}}}],
-  "tool_choice": "auto",
-  "conversation": "tool-test",
-  "stream": false
-}'
-# Expected: output contains function_call with name=get_weather, arguments={"city":"Tokyo"}
-```
-
-### Test 2: Tool result submission (multi-turn)
-
-```bash
-curl -s http://localhost:8741/v1/responses -H "Content-Type: application/json" -d '{
-  "model": "gemini-3-flash",
-  "input": [{"type":"function_call_output","call_id":"call_xxx","output":"{\"temp\":72,\"unit\":\"F\"}"}],
-  "conversation": "tool-test",
-  "stream": false
-}'
-# Expected: output contains text response using the tool result
-```
-
-### Test 3: Gemini-native endpoint
-
-```bash
-curl -s http://localhost:8741/v1/gemini -H "Content-Type: application/json" -d '{
-  "model": "gemini-3-flash",
-  "input": "What is the weather in Tokyo?",
-  "tools": [{"functionDeclarations":[{"name":"get_weather","description":"Get weather","parameters":{"type":"OBJECT","properties":{"city":{"type":"STRING"}},"required":["city"]}}]}],
-  "conversation": "gemini-tool-test",
-  "stream": false
-}'
-# Expected: response contains functionCall in Gemini format
-```
-
-### Test 4: No tools (regression)
-
-```bash
-curl -s http://localhost:8741/v1/responses -H "Content-Type: application/json" -d '{
-  "model": "gemini-3-flash",
-  "input": "What is 2+2?",
-  "stream": false
-}'
-# Expected: normal text response, no tool call behavior
-```
-
-## Risks & Mitigations
-
-| Risk | Impact | Mitigation |
-| ---------------------------------------------------------------- | ------ | ------------------------------------------------------------------------- |
-| History rewriting breaks conversation | High | Only rewrite when pending_results non-empty; keep original as fallback |
-| LS times out waiting for Google response during tool result turn | Medium | Increase timeout for tool result turns |
-| Multiple parallel tool calls create race conditions | Medium | AtomicBool + sequential processing already handles this |
-| `modify_request` test breakage | Low | Update existing tests for new signature |
-| Global tool storage conflicts across concurrent requests | Medium | Not an issue — LS processes one request at a time (single cascade active) |
diff --git a/.gitignore b/.gitignore
index 2ff7704..3f116bb 100644
--- a/.gitignore
+++ b/.gitignore
@@ -7,3 +7,7 @@
 !README.txt
 test_output.json
 captured-request-*.json
+
+# Agent artifacts
+.gemini/plans/
+KNOWN_ISSUES.md
diff --git a/KNOWN_ISSUES.md b/KNOWN_ISSUES.md
deleted file mode 100644
index f76de83..0000000
--- a/KNOWN_ISSUES.md
+++ /dev/null
@@ -1,117 +0,0 @@
-# Known Issues & Future Work
-
-All critical blockers have been resolved. Standalone LS with MITM interception
-is fully working. Reactive streaming is implemented with polling fallback.
-All three API endpoints (Responses, Completions, Gemini) now bypass the LS
-when custom tools are active, reading directly from MitmStore.
-
----
-
-## ✅ Resolved
-
-### ~~LS Go LLM Client Ignores System TLS Trust Store~~
-
-**Status: SOLVED (2026-02-14)**
-
-Previously the #1 blocker. The standalone LS (`--standalone` flag, now default)
-routes all LLM API traffic through the MITM proxy with full decryption.
-
-**Solution:**
-
-1. **UID-scoped iptables** — `scripts/mitm-redirect.sh` creates an `antigravity-ls`
-   system user. iptables redirects only that UID's port-443 traffic → MITM port.
-2. **Combined CA bundle** — The Go client honors `SSL_CERT_FILE` when set on
-   the standalone process. A combined bundle (system CAs + MITM CA) is written
-   to `/tmp/antigravity-mitm-combined-ca.pem`.
-3. **`sudo -u` spawning** — The proxy spawns the LS as the `antigravity-ls` user,
-   so only the standalone LS traffic is intercepted. No impact on other software.
-4. **Google SSE parsing** — MITM parses `streamGenerateContent?alt=sse` responses
-   and extracts `promptTokenCount`, `candidatesTokenCount`, `thoughtsTokenCount`.
-
-**Verified:** `/v1/usage` returns per-model token usage from intercepted traffic.
-
-### ~~Polling-Based Cascade Updates~~
-
-**Status: SOLVED (2026-02-14)**
-
-`StreamCascadeReactiveUpdates` is now used for real-time cascade state
-notifications. Falls back to timer-based polling if the streaming RPC is
-unavailable. Reactive diffs also carry progressive response text and thinking
-content (see `docs/panel-stream-investigation.md`).
-
-### ~~StreamCascadePanelReactiveUpdates — Dead End~~
-
-**Status: INVESTIGATED & CLOSED (2026-02-14)**
-
-`CascadePanelState` only contains `plan_status` and `user_settings` — not
-thinking text. The panel reactive component uses a workspace-scoped ID, not
-cascade IDs. See `docs/panel-stream-investigation.md`.
-
-### ~~Request Modification Not Implemented~~
-
-**Status: SOLVED (2026-02-15)**
-
-`MitmConfig.modify_requests` is now `true` by default. Used for:
-
-- Tool/function call injection into LS requests (Gemini `functionDeclarations`)
-- Tool result injection as `functionResponse` parts
-- LS bypass when custom tools are active (response captured directly from MITM)
-
-### ~~Cascade Correlation Is Heuristic~~
-
-**Status: SOLVED (2026-02-15)**
-
-Previously, MITM usage was keyed under `_latest` because `extract_cascade_hint()`
-couldn't parse the chunked-encoded Google SSE request body.
-
-**Fix:** API handlers now call `mitm_store.set_active_cascade(cascade_id)` before
-sending messages. `record_usage()` falls back to this active cascade ID when the
-heuristic hint is absent, properly correlating usage to cascades.
-
-### ~~Progressive Thinking Streaming~~
-
-**Status: SOLVED (2026-02-15)**
-
-Thinking text now streams progressively as delta events. The implementation:
-
-1. **LS cascade steps** — `plannerResponse.thinking` (field 3) grows progressively
-   as the LS receives data. For Opus 4.6, thinking text builds up word-by-word
-   over ~1-2s. For Gemini Flash, thinking arrives in 1-2 larger chunks.
-2. **Delta tracking** — `last_thinking_len` tracks the previously emitted length.
-   Each poll compares current thinking length and emits only the new characters
-   as `response.reasoning_summary_text.delta` events.
-3. **Lifecycle** — Structure events (`output_item.added`, `summary_part.added`)
-   emit on first thinking appearance. `done` events emit when response text
-   first appears (indicating thinking phase completed).
-
-**Verified with Opus 4.6:** (2026-02-15 13:22 UTC)
-
-```
-delta_len=24 "The user is asking about"
-delta_len=61 " the Collatz conjecture..."
-delta_len=5 " This"
-delta_len=10 " is a pure"
-... (11 progressive deltas over ~850ms)
-```
-
----
-
-## 🟢 Low
-
-### 1. MITM Integration Tests
-
-Unit tests cover protobuf decoding and intercept parsing (18 tests pass).
-Integration tests for the full MITM pipeline (TLS interception, response
-parsing, usage recording) would be valuable now that interception works.
-
-### 2. MITM for Main Antigravity Session
-
-The current MITM only works for the standalone LS (default mode).
-Intercepting the main Antigravity session's LS is harder because:
-
-- The main LS is managed by the Antigravity app, not by us
-- UID-scoped iptables can't target it without affecting all user traffic
-- The `mitm-wrapper.sh` approach sets env vars but the LLM client ignores
-  `HTTPS_PROXY` unless `detect_and_use_proxy` is ENABLED via init metadata
-
-**Workaround:** Use standalone mode (default) for all proxy traffic.