chore: remove outdated planning documents and the known issues file.

2026-02-18 03:33:47 -06:00
parent 7577e28229
commit ea12127acb
4 changed files with 4 additions and 455 deletions
--- a/.gemini/plans/sync-and-latency.md
+++ b/.gemini/plans/sync-and-latency.md
@@ -1,46 +0,0 @@
 # Sync All Endpoints + Latency + Thinking Streaming
 ## Phase 1: Sync Responses API (`/v1/responses`) with LS bypass
 Current state:
 - `handle_responses_stream` (lines 529-859) polls LS steps for text
 - Doesn't use MitmStore bypass at all
 - Still suffers from LS multi-turn overhead when tools are active
 Fix:
 - Add MITM bypass path (same as completions) — check MitmStore for text + function calls
 - For function calls: emit `response.output_item.added` (function_call type) + done events
 - For text: stream from MitmStore `captured_response_text` + `response_complete`
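A minimal sketch of the function-call branch on the wire. It assumes the Responses stream uses standard SSE `event:`/`data:` framing. The helpers (`sse_frame`, `function_call_events`, `json_escape`) are hypothetical, and the payloads are trimmed to the fields discussed in this plan. Because the MITM capture already holds the complete arguments, the sketch puts them in both events instead of streaming argument deltas.

```rust
/// Escape a string for use inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Frame one Server-Sent Events message: `event:` line, `data:` line, blank line.
fn sse_frame(event: &str, data: &str) -> String {
    format!("event: {event}\ndata: {data}\n\n")
}

/// The added/done pair for one function call read from MitmStore (hypothetical shape).
fn function_call_events(output_index: usize, call_id: &str, name: &str, arguments: &str) -> Vec<String> {
    let item = |status: &str| {
        format!(
            r#"{{"type":"function_call","call_id":"{}","name":"{}","arguments":"{}","status":"{}"}}"#,
            json_escape(call_id),
            json_escape(name),
            json_escape(arguments),
            status
        )
    };
    ["response.output_item.added", "response.output_item.done"]
        .iter()
        .zip(["in_progress", "completed"])
        .map(|(event, status)| {
            let data = format!(
                r#"{{"type":"{}","output_index":{},"item":{}}}"#,
                event,
                output_index,
                item(status)
            );
            sse_frame(event, &data)
        })
        .collect()
}
```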
 ## Phase 2: Sync Gemini endpoint (`/v1/gemini`) with LS bypass
 Current state:
 - `handle_gemini` (lines 57-236) uses `poll_for_response`, then checks MitmStore
 - Already checks `take_any_function_calls()` after polling
 - But `poll_for_response` still goes through LS steps
 Fix:
 - When tools are active, poll MitmStore directly instead of `poll_for_response`
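A minimal sketch of "poll MitmStore directly". It assumes the store exposes a non-blocking getter such as `take_any_function_calls()`, modeled here as the `check` closure. `poll_until` is a hypothetical helper. The real handlers are async (axum), so an actual version would await `tokio::time::sleep`; this sketch uses the blocking `std::thread::sleep` to stay std-only.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Call `check` every `interval` until it yields a value or `timeout` elapses.
/// `check` stands in for a MitmStore read; None means "nothing captured yet".
fn poll_until<T>(
    mut check: impl FnMut() -> Option<T>,
    interval: Duration,
    timeout: Duration,
) -> Option<T> {
    let deadline = Instant::now() + timeout;
    loop {
        if let Some(value) = check() {
            return Some(value);
        }
        if Instant::now() >= deadline {
            return None;
        }
        thread::sleep(interval);
    }
}
```

A shorter `interval` lowers latency at the cost of more store reads, which is the trade-off Phase 3 refers to.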
 ## Phase 3: Latency improvements
 - Reduce poll intervals across all handlers
 - Capture `thinking_text` in the MITM store for real-time streaming
 ## Phase 4: Real-time thinking streaming investigation
 Current state:
 - Google SSE includes `thought: true` parts with thinking text
 - `streaming_acc.thinking_text` accumulates this
 - Currently only used for final usage stats, not streamed in real-time
 Investigation needed:
 - The MITM intercept already captures thinking_text per-chunk
 - Need to store thinking_text updates in MitmStore incrementally
 - Responses handler can then stream thinking deltas in real-time
--- a/.gemini/plans/tool-calls-implementation.md
+++ b/.gemini/plans/tool-calls-implementation.md
@@ -1,292 +0,0 @@
 # Tool Call Implementation Plan
 ## Overview
 Add full tool call support to the Antigravity proxy. The primary endpoint is the OpenAI Responses API (`/v1/responses`), with a Gemini-native backup endpoint (`/v1/gemini`). Tools are stored per session, all `tool_choice` modes are supported, and parallel tool calls are supported.
 ## Data Flow
 ```
 ┌──────────┐      ┌───────────┐      ┌────┐      ┌──────┐      ┌────────┐
 │  Client  │─────▶│  Proxy    │─────▶│ LS │─────▶│ MITM │─────▶│ Google │
 │ (openai) │      │ (axum)    │      │    │      │      │      │        │
 │          │◀─────│           │◀─────│    │◀─────│      │◀─────│        │
 └──────────┘      └───────────┘      └────┘      └──────┘      └────────┘
     │                │                             │              │
     │  tools (OAI)   │  store tools (Gemini fmt)   │  inject      │
     │───────────────▶│───────────▶ MitmStore ─────▶│  tools       │
     │                │                             │─────────────▶│
     │                │                             │              │
     │                │                             │ functionCall │
     │                │◀──── capture ───────────────│◀─────────────│
     │  tool_calls    │                             │ block follow │
     │◀───────────────│                             │  ups         │
     │                │                             │              │
     │  tool result   │  store result               │  inject      │
     │───────────────▶│───────────▶ MitmStore ─────▶│ fn response  │
     │                │                             │─────────────▶│
     │  final text    │                             │              │
     │◀───────────────│◀────────────────────────────│◀─────────────│
 ```
 ## Format Differences
 ### Tool Definitions
 | Aspect       | OpenAI                                 | Gemini                             |
 | ------------ | -------------------------------------- | ---------------------------------- |
 | Wrapper      | `{"type":"function","function":{...}}` | `{"functionDeclarations":[{...}]}` |
 | Type strings | lowercase: `"object"`, `"string"`      | UPPERCASE: `"OBJECT"`, `"STRING"`  |
 | Parameters   | JSON Schema subset                     | Same schema, uppercase types       |
 ### Tool Choice
 | OpenAI                                        | Gemini toolConfig                                                       |
 | --------------------------------------------- | ----------------------------------------------------------------------- |
 | `"auto"`                                      | `{"functionCallingConfig":{"mode":"AUTO"}}`                             |
 | `"required"`                                  | `{"functionCallingConfig":{"mode":"ANY"}}`                              |
 | `"none"`                                      | `{"functionCallingConfig":{"mode":"NONE"}}`                             |
 | `{"type":"function","function":{"name":"X"}}` | `{"functionCallingConfig":{"mode":"ANY","allowedFunctionNames":["X"]}}` |
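The tool choice table above as a std-only Rust sketch. It models `tool_choice` as a hypothetical `ToolChoice` enum rather than the `serde_json::Value` the plan passes around, and renders `toolConfig` as JSON text. Function names are inserted without escaping, on the assumption that they are plain identifiers.

```rust
/// OpenAI `tool_choice`, reduced to the four forms in the table.
#[derive(Debug, Clone, PartialEq)]
enum ToolChoice {
    Auto,
    Required,
    None,
    Function(String),
}

/// Parse the three string forms; the object form
/// `{"type":"function","function":{"name":"X"}}` would be parsed separately into `Function`.
fn parse_tool_choice(s: &str) -> Option<ToolChoice> {
    match s {
        "auto" => Some(ToolChoice::Auto),
        "required" => Some(ToolChoice::Required),
        "none" => Some(ToolChoice::None),
        _ => None,
    }
}

/// Render the matching Gemini `toolConfig` as JSON text.
fn tool_choice_to_gemini(choice: &ToolChoice) -> String {
    let mode = match choice {
        ToolChoice::Auto => "AUTO",
        ToolChoice::Required | ToolChoice::Function(_) => "ANY",
        ToolChoice::None => "NONE",
    };
    match choice {
        // Forcing one named function is ANY mode restricted to that name.
        ToolChoice::Function(name) => format!(
            r#"{{"functionCallingConfig":{{"mode":"{mode}","allowedFunctionNames":["{name}"]}}}}"#
        ),
        _ => format!(r#"{{"functionCallingConfig":{{"mode":"{mode}"}}}}"#),
    }
}
```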
 ### Tool Call Response
 | OpenAI (what we return)                                                                            | Gemini (what Google returns)                                    |
 | -------------------------------------------------------------------------------------------------- | --------------------------------------------------------------- |
 | `output: [{"type":"function_call","call_id":"call_xxx","name":"get_weather","arguments":"{...}"}]` | `parts: [{"functionCall":{"name":"get_weather","args":{...}}}]` |
 ### Tool Result Submission
 | OpenAI (what client sends)                                                       | Gemini (what we inject into Google request)                                                                                  |
 | -------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------- |
 | `input: [{"type":"function_call_output","call_id":"call_xxx","output":"{...}"}]` | `contents: [{role:"model",parts:[{functionCall:...}]},{role:"user",parts:[{functionResponse:{name:"...",response:{...}}}]}]` |
 ---
 ## Implementation Phases
 ### Phase 1: Store Infrastructure (`store.rs`)
 Add to `MitmStore`:
 ```rust
 /// Active tool definitions (Gemini format) for MITM injection.
 active_tools: Arc<RwLock<Option<Vec<Value>>>>,
 /// Active tool config (Gemini toolConfig format).
 active_tool_config: Arc<RwLock<Option<Value>>>,
 /// Pending tool results for MITM to inject as functionResponse.
 pending_tool_results: Arc<RwLock<Vec<PendingToolResult>>>,
 /// Mapping call_id → function name for tool result routing.
 call_id_to_name: Arc<RwLock<HashMap<String, String>>>,
 /// Last captured function calls (for conversation history rewriting).
 last_function_calls: Arc<RwLock<Vec<CapturedFunctionCall>>>,
 ```
 New types:
 ```rust
 pub struct PendingToolResult {
    pub name: String,
    pub result: serde_json::Value,
 }
 ```
 New methods:
 - `set_tools(tools)` / `get_tools()` / `clear_tools()`
 - `set_tool_config(config)` / `get_tool_config()`
 - `add_tool_result(result)` / `take_tool_results()`
 - `register_call_id(call_id, name)` / `lookup_call_id(call_id)`
 - `set_last_function_calls(calls)` / `get_last_function_calls()`
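A minimal sketch of the result and `call_id` methods. It assumes `std::sync::RwLock` (the plan does not say whether `MitmStore` uses std or tokio locks) and stores the tool result as JSON text instead of `serde_json::Value`. `ToolStore` is a hypothetical stand-in for the relevant slice of `MitmStore`.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// A tool result waiting to be injected as a Gemini `functionResponse` part.
#[derive(Debug, Clone, PartialEq)]
pub struct PendingToolResult {
    pub name: String,
    /// JSON text here; the plan stores a `serde_json::Value`.
    pub result: String,
}

/// Hypothetical subset of `MitmStore` with shared, cloneable handles.
#[derive(Default, Clone)]
pub struct ToolStore {
    pending_tool_results: Arc<RwLock<Vec<PendingToolResult>>>,
    call_id_to_name: Arc<RwLock<HashMap<String, String>>>,
}

impl ToolStore {
    pub fn add_tool_result(&self, result: PendingToolResult) {
        self.pending_tool_results.write().unwrap().push(result);
    }

    /// Drain all pending results, leaving the queue empty.
    pub fn take_tool_results(&self) -> Vec<PendingToolResult> {
        std::mem::take(&mut *self.pending_tool_results.write().unwrap())
    }

    pub fn register_call_id(&self, call_id: &str, name: &str) {
        self.call_id_to_name
            .write()
            .unwrap()
            .insert(call_id.to_string(), name.to_string());
    }

    pub fn lookup_call_id(&self, call_id: &str) -> Option<String> {
        self.call_id_to_name.read().unwrap().get(call_id).cloned()
    }
}
```

Draining with `std::mem::take` means each result is handed to exactly one outgoing request, which fits the `take_` naming in the plan.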
 ### Phase 2: Request Types (`types.rs`)
 Add to `ResponsesRequest`:
 ```rust
 #[serde(default)]
 pub tools: Option<Vec<serde_json::Value>>,
 #[serde(default)]
 pub tool_choice: Option<serde_json::Value>,
 ```
 New output builder:
 ```rust
 pub fn build_function_call_output(call_id: &str, name: &str, arguments: &str) -> Value
 ```
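A dependency-free sketch of `build_function_call_output` that returns JSON text rather than the plan's `serde_json::Value`. The detail that matters from the Tool Call Response table is that `arguments` is itself a JSON-encoded string, so it must be escaped, not embedded as an object. `json_escape` is a hypothetical helper.

```rust
/// Escape a string for use inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build one Responses API `function_call` output item as JSON text.
pub fn build_function_call_output(call_id: &str, name: &str, arguments: &str) -> String {
    format!(
        r#"{{"type":"function_call","call_id":"{}","name":"{}","arguments":"{}"}}"#,
        json_escape(call_id),
        json_escape(name),
        json_escape(arguments)
    )
}
```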
 ### Phase 3: Format Conversion + Dynamic Injection (`modify.rs`)
 New public struct:
 ```rust
 pub struct ToolContext {
    pub tools: Option<Vec<Value>>,          // Gemini functionDeclarations
    pub tool_config: Option<Value>,         // Gemini toolConfig
    pub pending_results: Vec<PendingToolResult>,  // Tool results to inject
    pub last_calls: Vec<CapturedFunctionCall>,    // For history rewriting
 }
 ```
 New conversion functions:
 ```rust
 pub fn openai_tools_to_gemini(tools: &[Value]) -> Vec<Value>     // OAI → Gemini format
 pub fn openai_tool_choice_to_gemini(choice: &Value) -> Value     // OAI → Gemini toolConfig
 fn uppercase_types(val: Value) -> Value                          // Recursive type case fix
 ```
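A sketch of `uppercase_types` and `openai_tools_to_gemini`. To keep it runnable with std only, a tiny hypothetical `Json` enum stands in for `serde_json::Value`. The sketch unwraps each `function` object, uppercases every string stored under a `"type"` key, and wraps the results in a single `functionDeclarations` tool. It does not validate the outer `"type":"function"` field.

```rust
/// Minimal JSON tree standing in for `serde_json::Value` (hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Null,
    Bool(bool),
    Num(f64),
    Str(String),
    Arr(Vec<Json>),
    Obj(Vec<(String, Json)>),
}

/// Recursively uppercase every string value stored under a `"type"` key.
fn uppercase_types(val: Json) -> Json {
    match val {
        Json::Obj(fields) => Json::Obj(
            fields
                .into_iter()
                .map(|(k, v)| {
                    let v = match v {
                        Json::Str(s) if k == "type" => Json::Str(s.to_uppercase()),
                        other => uppercase_types(other),
                    };
                    (k, v)
                })
                .collect(),
        ),
        Json::Arr(items) => Json::Arr(items.into_iter().map(uppercase_types).collect()),
        other => other,
    }
}

/// `[{"type":"function","function":{...}}, ...]` → `[{"functionDeclarations":[{...}, ...]}]`
fn openai_tools_to_gemini(tools: &[Json]) -> Vec<Json> {
    let decls: Vec<Json> = tools
        .iter()
        .filter_map(|tool| match tool {
            Json::Obj(fields) => fields
                .iter()
                .find(|(k, _)| k == "function")
                .map(|(_, f)| uppercase_types(f.clone())),
            _ => None,
        })
        .collect();
    vec![Json::Obj(vec![("functionDeclarations".to_string(), Json::Arr(decls))])]
}
```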
 Change `modify_request` signature:
 ```rust
 pub fn modify_request(body: &[u8], tool_ctx: Option<&ToolContext>) -> Option<Vec<u8>>
 ```
 Tool injection logic:
 1. Strip all LS tools (existing)
 2. If `tool_ctx.tools` provided → inject as Gemini `functionDeclarations`
 3. If `tool_ctx.tool_config` provided → inject as `toolConfig`
 4. If `tool_ctx.pending_results` not empty → rewrite conversation history:
   - Find model turn with "Tool call completed" → replace with `functionCall` parts
   - Find last user turn → prepend `functionResponse` part
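Step 4 as a sketch over a simplified, hypothetical model of Gemini `contents` (`Content`, `Part`), with arguments and responses kept as JSON text. In line with the "keep original as fallback" mitigation in the risks table, it locates both turns before touching anything and leaves the history unchanged if either is missing. It also assumes the user turn must come after the model turn it answers.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Part {
    Text(String),
    FunctionCall { name: String, args: String },
    FunctionResponse { name: String, response: String },
}

#[derive(Debug, Clone, PartialEq)]
struct Content {
    role: String,
    parts: Vec<Part>,
}

struct CapturedFunctionCall {
    name: String,
    args: String,
}

struct PendingToolResult {
    name: String,
    result: String,
}

/// Text the LS leaves in the model turn after a tool call (per the plan).
const LS_PLACEHOLDER: &str = "Tool call completed";

/// Rewrite history in place; returns false and changes nothing if the expected turns are missing.
fn rewrite_history(
    contents: &mut [Content],
    last_calls: &[CapturedFunctionCall],
    pending: &[PendingToolResult],
) -> bool {
    if pending.is_empty() || last_calls.is_empty() {
        return false;
    }
    // Locate both turns first so a partial rewrite never happens.
    let model_idx = match contents.iter().rposition(|c| {
        c.role == "model"
            && c.parts.iter().any(|p| matches!(p, Part::Text(t) if t.contains(LS_PLACEHOLDER)))
    }) {
        Some(i) => i,
        None => return false,
    };
    let user_idx = match contents.iter().rposition(|c| c.role == "user") {
        Some(i) if i > model_idx => i,
        _ => return false,
    };
    // Model turn: replace the placeholder text with the real functionCall parts.
    contents[model_idx].parts = last_calls
        .iter()
        .map(|c| Part::FunctionCall { name: c.name.clone(), args: c.args.clone() })
        .collect();
    // Last user turn: prepend one functionResponse part per pending result.
    let mut parts: Vec<Part> = pending
        .iter()
        .map(|r| Part::FunctionResponse { name: r.name.clone(), response: r.result.clone() })
        .collect();
    parts.append(&mut contents[user_idx].parts);
    contents[user_idx].parts = parts;
    true
}
```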
 ### Phase 4: MITM Plumbing (`proxy.rs`)
 In `handle_http_over_tls`, before calling `modify_request`:
 1. Read `get_tools()`, `get_tool_config()`, `take_tool_results()`, `get_last_function_calls()` from store
 2. Build `ToolContext`
 3. Pass to `modify_request(body, tool_ctx)`
 After response capture:
 1. Save captured function calls as `last_function_calls` (for future history rewriting)
 ### Phase 5: API Handler (`responses.rs`)
 #### Request handling (in `handle_responses`):
 1. If `body.tools` provided:
   - Convert OpenAI → Gemini format via `openai_tools_to_gemini()`
   - Store in `MitmStore` via `set_tools()`
 2. If `body.tool_choice` provided:
   - Convert via `openai_tool_choice_to_gemini()`
   - Store in `MitmStore` via `set_tool_config()`
 3. Check `body.input` for `function_call_output` items:
   - If found: look up `call_id` → function name via `lookup_call_id()`
   - Store as `PendingToolResult` via `add_tool_result()`
   - Extract any accompanying text (or use a placeholder)
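Step 3 sketched with std only. `InputItem` is a hypothetical, already-parsed view of the `input` array, and `PLACEHOLDER_TEXT` is a made-up value because the plan leaves the placeholder unspecified. Rejecting an unknown `call_id` with an error is a design choice of this sketch, not something the plan states.

```rust
use std::collections::HashMap;

/// Hypothetical, already-parsed view of one Responses API `input` item.
enum InputItem {
    Message { text: String },
    FunctionCallOutput { call_id: String, output: String },
}

#[derive(Debug, PartialEq)]
struct PendingToolResult {
    name: String,
    result: String,
}

/// Made-up placeholder; the plan does not say what text to send.
const PLACEHOLDER_TEXT: &str = "Continue.";

/// Route each `function_call_output` to its function name via the call_id map
/// and gather any accompanying text. Err carries the first unknown call_id.
fn collect_tool_results(
    items: &[InputItem],
    call_ids: &HashMap<String, String>,
) -> Result<(Vec<PendingToolResult>, String), String> {
    let mut results = Vec::new();
    let mut texts = Vec::new();
    for item in items {
        match item {
            InputItem::Message { text } => texts.push(text.clone()),
            InputItem::FunctionCallOutput { call_id, output } => {
                let name = call_ids.get(call_id).ok_or_else(|| call_id.clone())?;
                results.push(PendingToolResult { name: name.clone(), result: output.clone() });
            }
        }
    }
    let text = if texts.is_empty() { PLACEHOLDER_TEXT.to_string() } else { texts.join("\n") };
    Ok((results, text))
}
```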
 #### Response handling (in `handle_responses_sync` / `handle_responses_stream`):
 After polling completes:
 1. Check `take_any_function_calls()` for captured tool calls
 2. If captured:
   - Generate `call_id` for each (e.g., `"call_" + random`)
   - Register `call_id → name` mapping via `register_call_id()`
   - Build `function_call` output items via `build_function_call_output()`
   - Return these INSTEAD of the text message output
 3. If no tool calls: existing text response behavior
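A sketch of generating `call_` ids and registering them (step 2 above). To avoid a dependency it draws unpredictable bits from std's randomly keyed `RandomState`; a real implementation might prefer the `rand` or `uuid` crates. `CapturedFunctionCall` and `assign_call_ids` are hypothetical shapes, and the plain `HashMap` stands in for `register_call_id()`.

```rust
use std::collections::hash_map::RandomState;
use std::collections::HashMap;
use std::hash::{BuildHasher, Hasher};

/// A function call captured by the MITM (name plus JSON-encoded args).
struct CapturedFunctionCall {
    name: String,
    args: String,
}

/// `call_` followed by 16 hex digits from a freshly keyed SipHash state.
fn new_call_id() -> String {
    let bits = RandomState::new().build_hasher().finish();
    format!("call_{bits:016x}")
}

/// Assign ids, record call_id → name, and return (call_id, name, args)
/// triples ready to become `function_call` output items.
fn assign_call_ids(
    calls: Vec<CapturedFunctionCall>,
    registry: &mut HashMap<String, String>,
) -> Vec<(String, String, String)> {
    calls
        .into_iter()
        .map(|c| {
            let id = new_call_id();
            registry.insert(id.clone(), c.name.clone());
            (id, c.name, c.args)
        })
        .collect()
}
```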
 ### Phase 6: Gemini-Native Endpoint (`gemini.rs` + `mod.rs`)
 New file `src/api/gemini.rs` with handler `handle_gemini`:
 - Accepts tools in Gemini `functionDeclarations` format directly (no conversion)
 - Accepts `toolConfig` directly
 - Returns `functionCall` in Gemini format directly
 - Same cascade/session management as responses.rs
 - Much simpler — no format translation
 Route: `POST /v1/gemini` in `mod.rs`
 ---
 ## File Change Summary
 | File                   | Changes                                                                 | Complexity |
 | ---------------------- | ----------------------------------------------------------------------- | ---------- |
 | `src/mitm/store.rs`    | Add tool context storage (5 new fields, ~10 methods)                    | Medium     |
 | `src/api/types.rs`     | Add `tools`/`tool_choice` to request, add output builder                | Low        |
 | `src/mitm/modify.rs`   | `ToolContext`, format conversion, dynamic injection, history rewrite    | High       |
 | `src/mitm/proxy.rs`    | Read store → build ToolContext → pass to modify                         | Low        |
 | `src/api/responses.rs` | Store tools, detect tool results in input, return function_call outputs | High       |
 | `src/api/gemini.rs`    | New file — Gemini-native endpoint (passthrough)                         | Medium     |
 | `src/api/mod.rs`       | Add route + module declaration                                          | Low        |
 ## Implementation Order
 1. `store.rs` — foundation, no dependencies
 2. `types.rs` — request/response types
 3. `modify.rs` — format conversion + injection (depends on store types)
 4. `proxy.rs` — plumbing (depends on modify signature)
 5. Build + verify compilation
 6. `responses.rs` — handler changes (depends on all above)
 7. Build + test with `get_weather` request
 8. `gemini.rs` + `mod.rs` — Gemini endpoint
 9. Build + test with Gemini format
 10. Tool result flow test (multi-turn)
 ## Testing Strategy
 ### Test 1: Basic tool call (sync)
 ```bash
 curl -s http://localhost:8741/v1/responses -H "Content-Type: application/json" -d '{
  "model": "gemini-3-flash",
  "input": "What is the weather in Tokyo?",
  "tools": [{"type":"function","function":{"name":"get_weather","description":"Get weather","parameters":{"type":"object","properties":{"city":{"type":"string"}},"required":["city"]}}}],
  "tool_choice": "auto",
  "conversation": "tool-test",
  "stream": false
 }'
 # Expected: output contains function_call with name=get_weather, arguments={"city":"Tokyo"}
 ```
 ### Test 2: Tool result submission (multi-turn)
 ```bash
 curl -s http://localhost:8741/v1/responses -H "Content-Type: application/json" -d '{
  "model": "gemini-3-flash",
  "input": [{"type":"function_call_output","call_id":"call_xxx","output":"{\"temp\":72,\"unit\":\"F\"}"}],
  "conversation": "tool-test",
  "stream": false
 }'
 # Expected: output contains text response using the tool result
 ```
 ### Test 3: Gemini-native endpoint
 ```bash
 curl -s http://localhost:8741/v1/gemini -H "Content-Type: application/json" -d '{
  "model": "gemini-3-flash",
  "input": "What is the weather in Tokyo?",
  "tools": [{"functionDeclarations":[{"name":"get_weather","description":"Get weather","parameters":{"type":"OBJECT","properties":{"city":{"type":"STRING"}},"required":["city"]}}]}],
  "conversation": "gemini-tool-test",
  "stream": false
 }'
 # Expected: response contains functionCall in Gemini format
 ```
 ### Test 4: No tools (regression)
 ```bash
 curl -s http://localhost:8741/v1/responses -H "Content-Type: application/json" -d '{
  "model": "gemini-3-flash",
  "input": "What is 2+2?",
  "stream": false
 }'
 # Expected: normal text response, no tool call behavior
 ```
 ## Risks & Mitigations
 | Risk                                                             | Impact | Mitigation                                                                |
 | ---------------------------------------------------------------- | ------ | ------------------------------------------------------------------------- |
 | History rewriting breaks conversation                            | High   | Only rewrite when pending_results non-empty; keep original as fallback    |
 | LS times out waiting for Google response during tool result turn | Medium | Increase timeout for tool result turns                                    |
 | Multiple parallel tool calls create race conditions              | Medium | AtomicBool + sequential processing already handles this                   |
 | `modify_request` test breakage                                   | Low    | Update existing tests for new signature                                   |
 | Global tool storage conflicts across concurrent requests         | Medium | Not an issue — LS processes one request at a time (single cascade active) |
--- a/.gitignore
+++ b/.gitignore
@@ -7,3 +7,7 @@
 !README.txt
 test_output.json
 captured-request-*.json
 # Agent artifacts
 .gemini/plans/
 KNOWN_ISSUES.md
--- a/KNOWN_ISSUES.md
+++ b/KNOWN_ISSUES.md
@@ -1,117 +0,0 @@
 # Known Issues & Future Work
 All critical blockers have been resolved. Standalone LS with MITM interception
 is fully working. Reactive streaming is implemented with polling fallback.
 All three API endpoints (Responses, Completions, Gemini) now bypass the LS
 when custom tools are active, reading directly from MitmStore.
 ---
 ## ✅ Resolved
 ### ~~LS Go LLM Client Ignores System TLS Trust Store~~
 **Status: SOLVED (2026-02-14)**
 Previously the #1 blocker. The standalone LS (`--standalone` flag, now default)
 routes all LLM API traffic through the MITM proxy with full decryption.
 **Solution:**
 1. **UID-scoped iptables** — `scripts/mitm-redirect.sh` creates an `antigravity-ls`
   system user. iptables redirects only that UID's port-443 traffic → MITM port.
 2. **Combined CA bundle** — The Go client honors `SSL_CERT_FILE` when set on
   the standalone process. A combined bundle (system CAs + MITM CA) is written
   to `/tmp/antigravity-mitm-combined-ca.pem`.
 3. **`sudo -u` spawning** — The proxy spawns the LS as the `antigravity-ls` user,
   so only the standalone LS traffic is intercepted. No impact on other software.
 4. **Google SSE parsing** — MITM parses `streamGenerateContent?alt=sse` responses
   and extracts `promptTokenCount`, `candidatesTokenCount`, `thoughtsTokenCount`.
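Steps 2 and 3 as a sketch: concatenate the system bundle and the MITM CA into one PEM file, then build the `sudo -u antigravity-ls` command. The system bundle location and the LS binary path are assumptions (both vary by distro and install). Passing `SSL_CERT_FILE` through `env` is one option chosen here because sudo typically resets the environment; the project's actual spawning code may do this differently.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::process::Command;

/// Concatenate the system CA bundle and the MITM CA into one PEM file,
/// e.g. /tmp/antigravity-mitm-combined-ca.pem.
fn write_combined_bundle(system_bundle: &Path, mitm_ca: &Path, out: &Path) -> io::Result<()> {
    let mut pem = fs::read_to_string(system_bundle)?;
    // Keep the last system certificate and the MITM CA on separate lines.
    if !pem.ends_with('\n') {
        pem.push('\n');
    }
    pem.push_str(&fs::read_to_string(mitm_ca)?);
    fs::write(out, pem)
}

/// Build (but do not spawn) the LS command running as `antigravity-ls`,
/// with `SSL_CERT_FILE` set via `env` so it survives sudo's env reset.
fn ls_command(ls_binary: &str, bundle: &Path) -> Command {
    let mut cmd = Command::new("sudo");
    cmd.arg("-u")
        .arg("antigravity-ls")
        .arg("env")
        .arg(format!("SSL_CERT_FILE={}", bundle.display()))
        .arg(ls_binary);
    cmd
}
```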
 **Verified:** `/v1/usage` returns per-model token usage from intercepted traffic.
 ### ~~Polling-Based Cascade Updates~~
 **Status: SOLVED (2026-02-14)**
 `StreamCascadeReactiveUpdates` is now used for real-time cascade state
 notifications. Falls back to timer-based polling if the streaming RPC is
 unavailable. Reactive diffs also carry progressive response text and thinking
 content (see `docs/panel-stream-investigation.md`).
 ### ~~StreamCascadePanelReactiveUpdates — Dead End~~
 **Status: INVESTIGATED & CLOSED (2026-02-14)**
 `CascadePanelState` only contains `plan_status` and `user_settings` — not
 thinking text. The panel reactive component uses a workspace-scoped ID, not
 cascade IDs. See `docs/panel-stream-investigation.md`.
 ### ~~Request Modification Not Implemented~~
 **Status: SOLVED (2026-02-15)**
 `MitmConfig.modify_requests` is now `true` by default. Used for:
 - Tool/function call injection into LS requests (Gemini `functionDeclarations`)
 - Tool result injection as `functionResponse` parts
 - LS bypass when custom tools are active (response captured directly from MITM)
 ### ~~Cascade Correlation Is Heuristic~~
 **Status: SOLVED (2026-02-15)**
 Previously, MITM usage was keyed under `_latest` because `extract_cascade_hint()`
 couldn't parse the chunked-encoded Google SSE request body.
 **Fix:** API handlers now call `mitm_store.set_active_cascade(cascade_id)` before
 sending messages. `record_usage()` falls back to this active cascade ID when the
 heuristic hint is absent, properly correlating usage to cascades.
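A sketch of that fallback order. It assumes `std::sync::RwLock` and collapses the per-model token counts into a single number. `UsageStore` is hypothetical, and keeping `_latest` as the last resort when there is neither a hint nor an active cascade is an assumption carried over from the earlier behavior described above.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

#[derive(Default)]
struct UsageStore {
    active_cascade: RwLock<Option<String>>,
    usage: RwLock<HashMap<String, u64>>,
}

impl UsageStore {
    /// Called by API handlers before sending a message.
    fn set_active_cascade(&self, id: &str) {
        *self.active_cascade.write().unwrap() = Some(id.to_string());
    }

    /// Key by the heuristic hint if present, else the active cascade, else `_latest`.
    fn record_usage(&self, hint: Option<&str>, tokens: u64) {
        let active = self.active_cascade.read().unwrap().clone();
        let key = hint
            .map(str::to_string)
            .or(active)
            .unwrap_or_else(|| "_latest".to_string());
        *self.usage.write().unwrap().entry(key).or_insert(0) += tokens;
    }
}
```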
 ### ~~Progressive Thinking Streaming~~
 **Status: SOLVED (2026-02-15)**
 Thinking text now streams progressively as delta events. The implementation:
 1. **LS cascade steps** — `plannerResponse.thinking` (field 3) grows progressively
   as the LS receives data. For Opus 4.6, thinking text builds up word-by-word
   over ~1-2s. For Gemini Flash, thinking arrives in 1-2 larger chunks.
 2. **Delta tracking** — `last_thinking_len` tracks the previously emitted length.
   Each poll compares current thinking length and emits only the new characters
   as `response.reasoning_summary_text.delta` events.
 3. **Lifecycle** — Structure events (`output_item.added`, `summary_part.added`)
   emit on first thinking appearance. `done` events emit when response text
   first appears (indicating thinking phase completed).
 **Verified with Opus 4.6:** (2026-02-15 13:22 UTC)
 ```
 delta_len=24  "The user is asking about"
 delta_len=61  " the Collatz conjecture..."
 delta_len=5   " This"
 delta_len=10  " is a pure"
 ... (11 progressive deltas over ~850ms)
 ```
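The delta tracking in step 2 reduces to a small pure function. This sketch assumes `last_thinking_len` is a byte length and that the thinking text only ever grows; `thinking_delta` is a hypothetical name. Using `str::get` keeps it from panicking if the offset ever lands past the end or off a character boundary, in which case it simply emits nothing.

```rust
/// Return the text appended since the last poll and advance `last_len`.
/// None means there is nothing new to emit as a reasoning delta.
fn thinking_delta<'a>(current: &'a str, last_len: &mut usize) -> Option<&'a str> {
    let delta = current.get(*last_len..)?;
    if delta.is_empty() {
        return None;
    }
    *last_len = current.len();
    Some(delta)
}
```

Fed the first two prefixes from the log above, it yields `"The user is asking about"` (24 bytes) and then only the newly appended suffix.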
 ---
 ## 🟢 Low
 ### 1. MITM Integration Tests
 Unit tests cover protobuf decoding and intercept parsing (18 tests pass).
 Integration tests for the full MITM pipeline (TLS interception, response
 parsing, usage recording) would be valuable now that interception works.
 ### 2. MITM for Main Antigravity Session
 The current MITM only works for the standalone LS (default mode).
 Intercepting the main Antigravity session's LS is harder because:
 - The main LS is managed by the Antigravity app, not by us
 - UID-scoped iptables can't target it without affecting all user traffic
 - The `mitm-wrapper.sh` approach sets env vars but the LLM client ignores
  `HTTPS_PROXY` unless `detect_and_use_proxy` is ENABLED via init metadata
 **Workaround:** Use standalone mode (default) for all proxy traffic.