chore: remove outdated planning documents and the known issues file.
This commit is contained in:
@@ -1,46 +0,0 @@
# Sync All Endpoints + Latency + Thinking Streaming

## Phase 1: Sync Responses API (`/v1/responses`) with LS bypass

Current state:

- `handle_responses_stream` (lines 529-859) polls LS steps for text
- Does not use the MitmStore bypass at all
- Still suffers from LS multi-turn overhead when tools are active

Fix:

- Add a MITM bypass path (same as completions): check MitmStore for text + function calls
- For function calls: emit `response.output_item.added` (function_call type) + done events
- For text: stream from MitmStore `captured_response_text` + `response_complete`

## Phase 2: Sync Gemini endpoint (`/v1/gemini`) with LS bypass

Current state:

- `handle_gemini` (lines 57-236) uses `poll_for_response`, then checks MitmStore
- Already checks `take_any_function_calls()` after polling
- But `poll_for_response` still goes through LS steps

Fix:

- When tools are active, poll MitmStore directly instead of calling `poll_for_response`

## Phase 3: Latency improvements

- Reduce poll intervals across all handlers
- Add MITM store thinking_text capture for real-time streaming

## Phase 4: Real-time thinking streaming investigation

Current state:

- Google SSE includes `thought: true` parts with thinking text
- `streaming_acc.thinking_text` accumulates this
- Currently only used for final usage stats, not streamed in real-time

Investigation needed:

- The MITM intercept already captures thinking_text per-chunk
- Need to store thinking_text updates in MitmStore incrementally
- Responses handler can then stream thinking deltas in real-time
@@ -1,292 +0,0 @@
# Tool Call Implementation Plan

## Overview

Add full tool call support to the Antigravity proxy. The primary endpoint is the OpenAI Responses API (`/v1/responses`), with a Gemini-native backup endpoint (`/v1/gemini`). Tools are stored per-session; all `tool_choice` modes and parallel tool calls are supported.

## Data Flow

```
┌──────────┐      ┌───────────┐      ┌────┐      ┌──────┐      ┌────────┐
│  Client  │─────▶│   Proxy   │─────▶│ LS │─────▶│ MITM │─────▶│ Google │
│ (openai) │      │  (axum)   │      │    │      │      │      │        │
│          │◀─────│           │◀─────│    │◀─────│      │◀─────│        │
└──────────┘      └───────────┘      └────┘      └──────┘      └────────┘
      │                │                              │               │
      │  tools (OAI)   │   store tools (Gemini fmt)   │    inject     │
      │───────────────▶│────────────▶ MitmStore ─────▶│    tools      │
      │                │                              │──────────────▶│
      │                │                              │               │
      │                │                              │ functionCall  │
      │                │◀──── capture ────────────────│◀──────────────│
      │   tool_calls   │                              │ block follow  │
      │◀───────────────│                              │      ups      │
      │                │                              │               │
      │  tool result   │         store result         │    inject     │
      │───────────────▶│────────────▶ MitmStore ─────▶│  fn response  │
      │                │                              │──────────────▶│
      │   final text   │                              │               │
      │◀───────────────│◀─────────────────────────────│◀──────────────│
```

## Format Differences

### Tool Definitions

| Aspect       | OpenAI                                 | Gemini                             |
| ------------ | -------------------------------------- | ---------------------------------- |
| Wrapper      | `{"type":"function","function":{...}}` | `{"functionDeclarations":[{...}]}` |
| Type strings | lowercase: `"object"`, `"string"`      | UPPERCASE: `"OBJECT"`, `"STRING"`  |
| Parameters   | JSON Schema subset                     | Same schema, uppercase types       |

### Tool Choice

| OpenAI                                        | Gemini toolConfig                                                       |
| --------------------------------------------- | ----------------------------------------------------------------------- |
| `"auto"`                                      | `{"functionCallingConfig":{"mode":"AUTO"}}`                             |
| `"required"`                                  | `{"functionCallingConfig":{"mode":"ANY"}}`                              |
| `"none"`                                      | `{"functionCallingConfig":{"mode":"NONE"}}`                             |
| `{"type":"function","function":{"name":"X"}}` | `{"functionCallingConfig":{"mode":"ANY","allowedFunctionNames":["X"]}}` |
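
The tool choice mapping in the table above can be sketched in a few lines. This is a minimal, std-only illustration and not the proxy's actual code: the `ToolChoice` enum, the `FunctionCallingConfig` struct, and `to_gemini` are hypothetical names, and the JSON is rendered by hand on the assumption that function names need no escaping (the plan itself works on `serde_json::Value`).

```rust
/// Hypothetical, simplified view of an OpenAI `tool_choice` value.
#[derive(Debug, Clone, PartialEq)]
pub enum ToolChoice {
    Auto,             // "auto"
    Required,         // "required"
    Disabled,         // "none"
    Function(String), // {"type":"function","function":{"name":"X"}}
}

/// Fields of a Gemini `functionCallingConfig` (illustrative struct, not a real API type).
#[derive(Debug, Clone, PartialEq)]
pub struct FunctionCallingConfig {
    pub mode: &'static str,
    pub allowed_function_names: Vec<String>,
}

/// Maps each OpenAI mode to the Gemini mode listed in the table.
pub fn to_gemini(choice: &ToolChoice) -> FunctionCallingConfig {
    let (mode, names) = match choice {
        ToolChoice::Auto => ("AUTO", vec![]),
        ToolChoice::Required => ("ANY", vec![]),
        ToolChoice::Disabled => ("NONE", vec![]),
        // A named function becomes mode ANY restricted to that one name.
        ToolChoice::Function(name) => ("ANY", vec![name.clone()]),
    };
    FunctionCallingConfig { mode, allowed_function_names: names }
}

impl FunctionCallingConfig {
    /// Renders the `toolConfig` JSON. Assumes names contain no characters that need escaping.
    pub fn to_json(&self) -> String {
        if self.allowed_function_names.is_empty() {
            format!(r#"{{"functionCallingConfig":{{"mode":"{}"}}}}"#, self.mode)
        } else {
            let names: Vec<String> = self
                .allowed_function_names
                .iter()
                .map(|n| format!("\"{}\"", n))
                .collect();
            format!(
                r#"{{"functionCallingConfig":{{"mode":"{}","allowedFunctionNames":[{}]}}}}"#,
                self.mode,
                names.join(",")
            )
        }
    }
}
```

Calling `to_gemini(&ToolChoice::Function("X".to_string())).to_json()` yields the last row of the table.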

### Tool Call Response

| OpenAI (what we return)                                                                            | Gemini (what Google returns)                                    |
| -------------------------------------------------------------------------------------------------- | --------------------------------------------------------------- |
| `output: [{"type":"function_call","call_id":"call_xxx","name":"get_weather","arguments":"{...}"}]` | `parts: [{"functionCall":{"name":"get_weather","args":{...}}}]` |

### Tool Result Submission

| OpenAI (what client sends)                                                       | Gemini (what we inject into Google request)                                                                                  |
| -------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------- |
| `input: [{"type":"function_call_output","call_id":"call_xxx","output":"{...}"}]` | `contents: [{role:"model",parts:[{functionCall:...}]},{role:"user",parts:[{functionResponse:{name:"...",response:{...}}}]}]` |

---

## Implementation Phases

### Phase 1: Store Infrastructure (`store.rs`)

Add to `MitmStore`:

```rust
/// Active tool definitions (Gemini format) for MITM injection.
active_tools: Arc<RwLock<Option<Vec<Value>>>>,
/// Active tool config (Gemini toolConfig format).
active_tool_config: Arc<RwLock<Option<Value>>>,
/// Pending tool results for MITM to inject as functionResponse.
pending_tool_results: Arc<RwLock<Vec<PendingToolResult>>>,
/// Mapping call_id → function name for tool result routing.
call_id_to_name: Arc<RwLock<HashMap<String, String>>>,
/// Last captured function calls (for conversation history rewriting).
last_function_calls: Arc<RwLock<Vec<CapturedFunctionCall>>>,
```

New types:

```rust
pub struct PendingToolResult {
    pub name: String,
    pub result: serde_json::Value,
}
```

New methods:

- `set_tools(tools)` / `get_tools()` / `clear_tools()`
- `set_tool_config(config)` / `get_tool_config()`
- `add_tool_result(result)` / `take_tool_results()`
- `register_call_id(call_id, name)` / `lookup_call_id(call_id)`
- `set_last_function_calls(calls)` / `get_last_function_calls()`
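
As a rough illustration of how two of these method pairs might behave, here is a std-only sketch. It is not the real `MitmStore`: the `ToolStore` name is hypothetical, it uses `std::sync::RwLock` (the project may well use an async lock instead), and `result` is held as a raw JSON string where the plan specifies `serde_json::Value`.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

#[derive(Debug, Clone, PartialEq)]
pub struct PendingToolResult {
    pub name: String,
    /// Assumption: raw JSON text stands in for `serde_json::Value`.
    pub result: String,
}

/// Hypothetical stand-in for the tool-related part of `MitmStore`.
#[derive(Clone, Default)]
pub struct ToolStore {
    pending_tool_results: Arc<RwLock<Vec<PendingToolResult>>>,
    call_id_to_name: Arc<RwLock<HashMap<String, String>>>,
}

impl ToolStore {
    /// Queues a tool result for the next intercepted request.
    pub fn add_tool_result(&self, result: PendingToolResult) {
        self.pending_tool_results.write().unwrap().push(result);
    }

    /// Drains the queue, so each result is handed out exactly once.
    pub fn take_tool_results(&self) -> Vec<PendingToolResult> {
        std::mem::take(&mut *self.pending_tool_results.write().unwrap())
    }

    /// Remembers which function a generated `call_id` belongs to.
    pub fn register_call_id(&self, call_id: &str, name: &str) {
        self.call_id_to_name
            .write()
            .unwrap()
            .insert(call_id.to_string(), name.to_string());
    }

    /// Resolves a `call_id` from a `function_call_output` item back to its function name.
    pub fn lookup_call_id(&self, call_id: &str) -> Option<String> {
        self.call_id_to_name.read().unwrap().get(call_id).cloned()
    }
}
```

The "take" semantics matter here: because `take_tool_results()` empties the queue, a result cannot be injected twice if the MITM sees a second request before the handler stores anything new.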

### Phase 2: Request Types (`types.rs`)

Add to `ResponsesRequest`:

```rust
#[serde(default)]
pub tools: Option<Vec<serde_json::Value>>,
#[serde(default)]
pub tool_choice: Option<serde_json::Value>,
```

New output builder:

```rust
pub fn build_function_call_output(call_id: &str, name: &str, arguments: &str) -> Value
```

### Phase 3: Format Conversion + Dynamic Injection (`modify.rs`)

New public struct:

```rust
pub struct ToolContext {
    pub tools: Option<Vec<Value>>,               // Gemini functionDeclarations
    pub tool_config: Option<Value>,              // Gemini toolConfig
    pub pending_results: Vec<PendingToolResult>, // Tool results to inject
    pub last_calls: Vec<CapturedFunctionCall>,   // For history rewriting
}
```

New conversion functions:

```rust
pub fn openai_tools_to_gemini(tools: &[Value]) -> Vec<Value>  // OAI → Gemini format
pub fn openai_tool_choice_to_gemini(choice: &Value) -> Value  // OAI → Gemini toolConfig
fn uppercase_types(val: Value) -> Value                       // Recursive type case fix
```
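
The recursive type case fix could look roughly like the sketch below. To stay self-contained it defines a tiny hypothetical `Json` enum instead of `serde_json::Value`, so the signature differs from the one planned above. It also assumes that any string stored under a `"type"` key is a schema type name.

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for a JSON value (hypothetical; the plan uses `serde_json::Value`).
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Str(String),
    Array(Vec<Json>),
    Object(BTreeMap<String, Json>),
    Other, // numbers, booleans and null are irrelevant to this sketch
}

/// Walks the schema and uppercases every string stored under a `"type"` key.
pub fn uppercase_types(val: Json) -> Json {
    match val {
        Json::Object(map) => Json::Object(
            map.into_iter()
                .map(|(key, child)| {
                    let child = match (key.as_str(), child) {
                        // "string" -> "STRING", "object" -> "OBJECT", ...
                        ("type", Json::Str(s)) => Json::Str(s.to_uppercase()),
                        // Anything else (including a property named "type") is recursed into.
                        (_, other) => uppercase_types(other),
                    };
                    (key, child)
                })
                .collect(),
        ),
        Json::Array(items) => Json::Array(items.into_iter().map(uppercase_types).collect()),
        other => other,
    }
}
```

Note that a property that happens to be called `type` holds an object, not a string, so it is recursed into rather than uppercased.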

Change `modify_request` signature:

```rust
pub fn modify_request(body: &[u8], tool_ctx: Option<&ToolContext>) -> Option<Vec<u8>>
```

Tool injection logic:

1. Strip all LS tools (existing)
2. If `tool_ctx.tools` provided → inject as Gemini `functionDeclarations`
3. If `tool_ctx.tool_config` provided → inject as `toolConfig`
4. If `tool_ctx.pending_results` not empty → rewrite conversation history:
   - Find model turn with "Tool call completed" → replace with `functionCall` parts
   - Find last user turn → prepend `functionResponse` part

### Phase 4: MITM Plumbing (`proxy.rs`)

In `handle_http_over_tls`, before calling `modify_request`:

1. Read `get_tools()`, `get_tool_config()`, `take_tool_results()`, `get_last_function_calls()` from store
2. Build `ToolContext`
3. Pass to `modify_request(body, tool_ctx)`

After response capture:

1. Save captured function calls as `last_function_calls` (for future history rewriting)

### Phase 5: API Handler (`responses.rs`)

#### Request handling (in `handle_responses`):

1. If `body.tools` provided:
   - Convert OpenAI → Gemini format via `openai_tools_to_gemini()`
   - Store in `MitmStore` via `set_tools()`
2. If `body.tool_choice` provided:
   - Convert via `openai_tool_choice_to_gemini()`
   - Store in `MitmStore` via `set_tool_config()`
3. Check `body.input` for `function_call_output` items:
   - If found: look up `call_id` → function name via `lookup_call_id()`
   - Store as `PendingToolResult` via `add_tool_result()`
   - Extract any accompanying text (or use placeholder)

#### Response handling (in `handle_responses_sync` / `handle_responses_stream`):

After polling completes:

1. Check `take_any_function_calls()` for captured tool calls
2. If captured:
   - Generate `call_id` for each (e.g., `"call_" + random`)
   - Register `call_id → name` mapping via `register_call_id()`
   - Build `function_call` output items via `build_function_call_output()`
   - Return these INSTEAD of the text message output
3. If no tool calls: existing text response behavior
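
The response-handling steps above can be sketched as follows. This is only an illustration under stated assumptions: it uses a process-wide counter in place of the random suffix the plan suggests (the standard library has no random number generator), `new_call_id` and `escape_json` are hypothetical helpers, and the builder returns a JSON string where the planned `build_function_call_output` returns a `Value`.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Assumption: a monotonically increasing counter stands in for the random part of the id.
static NEXT_CALL: AtomicU64 = AtomicU64::new(1);

/// Generates a unique id of the form `call_<16 hex digits>`.
pub fn new_call_id() -> String {
    format!("call_{:016x}", NEXT_CALL.fetch_add(1, Ordering::Relaxed))
}

/// Escapes the characters that would break a JSON string literal (simplified).
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            other => out.push(other),
        }
    }
    out
}

/// Builds one `function_call` output item. `arguments` is the JSON text of the
/// Gemini `args` object; it is embedded as an escaped string, as in the
/// Tool Call Response table.
pub fn build_function_call_output(call_id: &str, name: &str, arguments: &str) -> String {
    format!(
        r#"{{"type":"function_call","call_id":"{}","name":"{}","arguments":"{}"}}"#,
        call_id,
        name,
        escape_json(arguments)
    )
}
```

A handler following the plan would call `new_call_id()` once per captured call, pass the id and name to `register_call_id()`, and collect the built items into the `output` array.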

### Phase 6: Gemini-Native Endpoint (`gemini.rs` + `mod.rs`)

New file `src/api/gemini.rs` with handler `handle_gemini`:

- Accepts tools in Gemini `functionDeclarations` format directly (no conversion)
- Accepts `toolConfig` directly
- Returns `functionCall` in Gemini format directly
- Same cascade/session management as responses.rs
- Much simpler: no format translation

Route: `POST /v1/gemini` in `mod.rs`

---

## File Change Summary

| File                   | Changes                                                                 | Complexity |
| ---------------------- | ----------------------------------------------------------------------- | ---------- |
| `src/mitm/store.rs`    | Add tool context storage (5 new fields, ~10 methods)                    | Medium     |
| `src/api/types.rs`     | Add `tools`/`tool_choice` to request, add output builder                | Low        |
| `src/mitm/modify.rs`   | `ToolContext`, format conversion, dynamic injection, history rewrite    | High       |
| `src/mitm/proxy.rs`    | Read store → build ToolContext → pass to modify                         | Low        |
| `src/api/responses.rs` | Store tools, detect tool results in input, return function_call outputs | High       |
| `src/api/gemini.rs`    | New file: Gemini-native endpoint (passthrough)                          | Medium     |
| `src/api/mod.rs`       | Add route + module declaration                                          | Low        |

## Implementation Order

1. `store.rs`: foundation, no dependencies
2. `types.rs`: request/response types
3. `modify.rs`: format conversion + injection (depends on store types)
4. `proxy.rs`: plumbing (depends on modify signature)
5. Build + verify compilation
6. `responses.rs`: handler changes (depends on all above)
7. Build + test with `get_weather` request
8. `gemini.rs` + `mod.rs`: Gemini endpoint
9. Build + test with Gemini format
10. Tool result flow test (multi-turn)

## Testing Strategy

### Test 1: Basic tool call (sync)

```bash
curl -s http://localhost:8741/v1/responses -H "Content-Type: application/json" -d '{
  "model": "gemini-3-flash",
  "input": "What is the weather in Tokyo?",
  "tools": [{"type":"function","function":{"name":"get_weather","description":"Get weather","parameters":{"type":"object","properties":{"city":{"type":"string"}},"required":["city"]}}}],
  "tool_choice": "auto",
  "conversation": "tool-test",
  "stream": false
}'
# Expected: output contains function_call with name=get_weather, arguments={"city":"Tokyo"}
```

### Test 2: Tool result submission (multi-turn)

```bash
curl -s http://localhost:8741/v1/responses -H "Content-Type: application/json" -d '{
  "model": "gemini-3-flash",
  "input": [{"type":"function_call_output","call_id":"call_xxx","output":"{\"temp\":72,\"unit\":\"F\"}"}],
  "conversation": "tool-test",
  "stream": false
}'
# Expected: output contains text response using the tool result
```

### Test 3: Gemini-native endpoint

```bash
curl -s http://localhost:8741/v1/gemini -H "Content-Type: application/json" -d '{
  "model": "gemini-3-flash",
  "input": "What is the weather in Tokyo?",
  "tools": [{"functionDeclarations":[{"name":"get_weather","description":"Get weather","parameters":{"type":"OBJECT","properties":{"city":{"type":"STRING"}},"required":["city"]}}]}],
  "conversation": "gemini-tool-test",
  "stream": false
}'
# Expected: response contains functionCall in Gemini format
```

### Test 4: No tools (regression)

```bash
curl -s http://localhost:8741/v1/responses -H "Content-Type: application/json" -d '{
  "model": "gemini-3-flash",
  "input": "What is 2+2?",
  "stream": false
}'
# Expected: normal text response, no tool call behavior
```

## Risks & Mitigations

| Risk                                                             | Impact | Mitigation                                                                |
| ---------------------------------------------------------------- | ------ | ------------------------------------------------------------------------- |
| History rewriting breaks conversation                            | High   | Only rewrite when pending_results non-empty; keep original as fallback    |
| LS times out waiting for Google response during tool result turn | Medium | Increase timeout for tool result turns                                    |
| Multiple parallel tool calls create race conditions              | Medium | AtomicBool + sequential processing already handles this                   |
| `modify_request` test breakage                                   | Low    | Update existing tests for new signature                                   |
| Global tool storage conflicts across concurrent requests         | Medium | Not an issue: LS processes one request at a time (single cascade active)  |
4
.gitignore
vendored
@@ -7,3 +7,7 @@
!README.txt
test_output.json
captured-request-*.json

# Agent artifacts
.gemini/plans/
KNOWN_ISSUES.md
117
KNOWN_ISSUES.md
@@ -1,117 +0,0 @@
# Known Issues & Future Work

All critical blockers have been resolved. Standalone LS with MITM interception
is fully working. Reactive streaming is implemented with polling fallback.
All three API endpoints (Responses, Completions, Gemini) now bypass the LS
when custom tools are active, reading directly from MitmStore.

---

## ✅ Resolved

### ~~LS Go LLM Client Ignores System TLS Trust Store~~

**Status: SOLVED (2026-02-14)**

Previously the #1 blocker. The standalone LS (`--standalone` flag, now default)
routes all LLM API traffic through the MITM proxy with full decryption.

**Solution:**

1. **UID-scoped iptables**: `scripts/mitm-redirect.sh` creates an `antigravity-ls`
   system user. iptables redirects only that UID's port-443 traffic → MITM port.
2. **Combined CA bundle**: The Go client honors `SSL_CERT_FILE` when set on
   the standalone process. A combined bundle (system CAs + MITM CA) is written
   to `/tmp/antigravity-mitm-combined-ca.pem`.
3. **`sudo -u` spawning**: The proxy spawns the LS as the `antigravity-ls` user,
   so only the standalone LS traffic is intercepted. No impact on other software.
4. **Google SSE parsing**: MITM parses `streamGenerateContent?alt=sse` responses
   and extracts `promptTokenCount`, `candidatesTokenCount`, `thoughtsTokenCount`.

**Verified:** `/v1/usage` returns per-model token usage from intercepted traffic.

### ~~Polling-Based Cascade Updates~~

**Status: SOLVED (2026-02-14)**

`StreamCascadeReactiveUpdates` is now used for real-time cascade state
notifications. Falls back to timer-based polling if the streaming RPC is
unavailable. Reactive diffs also carry progressive response text and thinking
content (see `docs/panel-stream-investigation.md`).

### ~~StreamCascadePanelReactiveUpdates: Dead End~~

**Status: INVESTIGATED & CLOSED (2026-02-14)**

`CascadePanelState` only contains `plan_status` and `user_settings`, not
thinking text. The panel reactive component uses a workspace-scoped ID, not
cascade IDs. See `docs/panel-stream-investigation.md`.

### ~~Request Modification Not Implemented~~

**Status: SOLVED (2026-02-15)**

`MitmConfig.modify_requests` is now `true` by default. Used for:

- Tool/function call injection into LS requests (Gemini `functionDeclarations`)
- Tool result injection as `functionResponse` parts
- LS bypass when custom tools are active (response captured directly from MITM)

### ~~Cascade Correlation Is Heuristic~~

**Status: SOLVED (2026-02-15)**

Previously, MITM usage was keyed under `_latest` because `extract_cascade_hint()`
couldn't parse the chunked-encoded Google SSE request body.

**Fix:** API handlers now call `mitm_store.set_active_cascade(cascade_id)` before
sending messages. `record_usage()` falls back to this active cascade ID when the
heuristic hint is absent, properly correlating usage to cascades.
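
The fallback order described in the fix might be sketched like this. The `UsageStore` type, its fields, and the token counter are hypothetical. Keeping `_latest` as the last resort when neither a hint nor an active cascade exists is also an assumption, since the text does not say what happens in that case.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified usage store (the real one lives in `MitmStore`).
#[derive(Default)]
pub struct UsageStore {
    active_cascade: Option<String>,
    usage_by_cascade: HashMap<String, u64>,
}

impl UsageStore {
    /// Called by an API handler before it sends a message.
    pub fn set_active_cascade(&mut self, cascade_id: &str) {
        self.active_cascade = Some(cascade_id.to_string());
    }

    /// Records token usage and returns the key it was stored under.
    /// Order: heuristic hint, then active cascade, then `_latest` (assumed last resort).
    pub fn record_usage(&mut self, hint: Option<&str>, tokens: u64) -> String {
        let key = hint
            .map(str::to_string)
            .or_else(|| self.active_cascade.clone())
            .unwrap_or_else(|| "_latest".to_string());
        *self.usage_by_cascade.entry(key.clone()).or_insert(0) += tokens;
        key
    }

    /// Total tokens recorded for one cascade key.
    pub fn usage_for(&self, key: &str) -> u64 {
        self.usage_by_cascade.get(key).copied().unwrap_or(0)
    }
}
```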

### ~~Progressive Thinking Streaming~~

**Status: SOLVED (2026-02-15)**

Thinking text now streams progressively as delta events. The implementation:

1. **LS cascade steps**: `plannerResponse.thinking` (field 3) grows progressively
   as the LS receives data. For Opus 4.6, thinking text builds up word-by-word
   over ~1-2s. For Gemini Flash, thinking arrives in 1-2 larger chunks.
2. **Delta tracking**: `last_thinking_len` tracks the previously emitted length.
   Each poll compares current thinking length and emits only the new characters
   as `response.reasoning_summary_text.delta` events.
3. **Lifecycle**: Structure events (`output_item.added`, `summary_part.added`)
   emit on first thinking appearance. `done` events emit when response text
   first appears (indicating thinking phase completed).
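
The delta-tracking step can be illustrated with a small sketch. It assumes the thinking text only ever grows by appending and that `last_thinking_len` is a byte length. The `ThinkingDeltaTracker` type is hypothetical, and wrapping the returned slice into a `response.reasoning_summary_text.delta` event is left out.

```rust
/// Hypothetical helper that remembers how much thinking text was already emitted.
#[derive(Debug, Default)]
pub struct ThinkingDeltaTracker {
    /// Byte length of the thinking text emitted so far.
    last_thinking_len: usize,
}

impl ThinkingDeltaTracker {
    /// Returns only the suffix that has not been emitted yet,
    /// or `None` when the poll brought nothing new.
    pub fn next_delta<'a>(&mut self, current: &'a str) -> Option<&'a str> {
        if current.len() <= self.last_thinking_len {
            return None;
        }
        // `get` yields None instead of panicking if the offset is not a char
        // boundary, which can only happen if the text was not append-only.
        let delta = current.get(self.last_thinking_len..)?;
        self.last_thinking_len = current.len();
        Some(delta)
    }
}
```

Each poll passes the full thinking text seen so far; the caller emits whatever `next_delta` returns and skips the poll when it returns `None`.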

**Verified with Opus 4.6:** (2026-02-15 13:22 UTC)

```
delta_len=24 "The user is asking about"
delta_len=61 " the Collatz conjecture..."
delta_len=5 " This"
delta_len=10 " is a pure"
... (11 progressive deltas over ~850ms)
```

---

## 🟢 Low

### 1. MITM Integration Tests

Unit tests cover protobuf decoding and intercept parsing (18 tests pass).
Integration tests for the full MITM pipeline (TLS interception, response
parsing, usage recording) would be valuable now that interception works.

### 2. MITM for Main Antigravity Session

The current MITM only works for the standalone LS (default mode).
Intercepting the main Antigravity session's LS is harder because:

- The main LS is managed by the Antigravity app, not by us
- UID-scoped iptables can't target it without affecting all user traffic
- The `mitm-wrapper.sh` approach sets env vars but the LLM client ignores
  `HTTPS_PROXY` unless `detect_and_use_proxy` is ENABLED via init metadata

**Workaround:** Use standalone mode (default) for all proxy traffic.