Files
zerogravity/.gemini/plans/tool-calls-implementation.md
Nikketryhard 786987116b feat: full tool call support (OpenAI + Gemini endpoints)
- store.rs: Add tool context storage (active tools, tool config, pending
  tool results, call_id mapping, last function calls for history rewrite)
- types.rs: Add tools/tool_choice fields to ResponsesRequest, add
  build_function_call_output helper for OpenAI function_call output items
- modify.rs: Replace hardcoded get_weather with dynamic ToolContext
  injection. Add openai_tools_to_gemini and openai_tool_choice_to_gemini
  converters. Add conversation history rewriting for tool result turns
  (replaces fake 'Tool call completed' model turn with real functionCall,
  injects functionResponse before last user turn)
- proxy.rs: Build ToolContext from MitmStore before calling modify_request.
  Save last_function_calls for history rewriting on subsequent turns
- responses.rs: Store client tools in MitmStore before LS call. Detect
  function_call_output in input array for tool result submission. Return
  captured functionCalls as OpenAI function_call output items with
  generated call_ids and stringified arguments
- gemini.rs: New Gemini-native endpoint (POST /v1/gemini) with zero
  format translation. Accepts functionDeclarations directly, returns
  functionCall in Gemini format directly
- mod.rs: Wire /v1/gemini route, bump version to 3.3.0
2026-02-14 22:56:44 -06:00

293 lines
14 KiB
Markdown

# Tool Call Implementation Plan
## Overview
Add full tool call support to the Antigravity proxy. Primary endpoint is OpenAI Responses API (`/v1/responses`), with a Gemini-native backup endpoint (`/v1/gemini`). Tools are stored per-session, all `tool_choice` modes supported, parallel tool calls supported.
## Data Flow
```
┌─────────┐ ┌───────────┐ ┌────┐ ┌──────┐ ┌────────┐
│ Client │─────▶│ Proxy │─────▶│ LS │─────▶│ MITM │─────▶│ Google │
│ (openai) │ │ (axum) │ │ │ │ │ │ │
│ │◀─────│ │◀─────│ │◀─────│ │◀─────│ │
└─────────┘ └───────────┘ └────┘ └──────┘ └────────┘
│ │ │ │
│ tools (OAI) │ store tools (Gemini fmt) │ inject │
│───────────────▶│────────────▶ MitmStore ─────▶│ tools │
│ │ │──────────────▶│
│ │ │ │
│ │ │ functionCall │
│ │◀──── capture ───────────────│◀──────────────│
│ tool_calls │ │ block follow │
│◀───────────────│ │ ups │
│ │ │ │
│ tool result │ store result │ inject │
│───────────────▶│────────────▶ MitmStore ─────▶│ fn response │
│ │ │──────────────▶│
│ final text │ │ │
│◀───────────────│◀────────────────────────────│◀──────────────│
```
## Format Differences
### Tool Definitions
| Aspect | OpenAI | Gemini |
| ------------ | -------------------------------------- | ---------------------------------- |
| Wrapper | `{"type":"function","function":{...}}` | `{"functionDeclarations":[{...}]}` |
| Type strings | lowercase: `"object"`, `"string"` | UPPERCASE: `"OBJECT"`, `"STRING"` |
| Parameters | JSON Schema subset | Same schema, uppercase types |
### Tool Choice
| OpenAI | Gemini toolConfig |
| --------------------------------------------- | ----------------------------------------------------------------------- |
| `"auto"` | `{"functionCallingConfig":{"mode":"AUTO"}}` |
| `"required"` | `{"functionCallingConfig":{"mode":"ANY"}}` |
| `"none"` | `{"functionCallingConfig":{"mode":"NONE"}}` |
| `{"type":"function","function":{"name":"X"}}` | `{"functionCallingConfig":{"mode":"ANY","allowedFunctionNames":["X"]}}` |
### Tool Call Response
| OpenAI (what we return) | Gemini (what Google returns) |
| -------------------------------------------------------------------------------------------------- | --------------------------------------------------------------- |
| `output: [{"type":"function_call","call_id":"call_xxx","name":"get_weather","arguments":"{...}"}]` | `parts: [{"functionCall":{"name":"get_weather","args":{...}}}]` |
### Tool Result Submission
| OpenAI (what client sends) | Gemini (what we inject into Google request) |
| -------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------- |
| `input: [{"type":"function_call_output","call_id":"call_xxx","output":"{...}"}]` | `contents: [{role:"model",parts:[{functionCall:...}]},{role:"user",parts:[{functionResponse:{name:"...",response:{...}}}]}]` |
---
## Implementation Phases
### Phase 1: Store Infrastructure (`store.rs`)
Add to `MitmStore`:
```rust
/// Active tool definitions (Gemini format) for MITM injection.
active_tools: Arc<RwLock<Option<Vec<Value>>>>,
/// Active tool config (Gemini toolConfig format).
active_tool_config: Arc<RwLock<Option<Value>>>,
/// Pending tool results for MITM to inject as functionResponse.
pending_tool_results: Arc<RwLock<Vec<PendingToolResult>>>,
/// Mapping call_id → function name for tool result routing.
call_id_to_name: Arc<RwLock<HashMap<String, String>>>,
/// Last captured function calls (for conversation history rewriting).
last_function_calls: Arc<RwLock<Vec<CapturedFunctionCall>>>,
```
New types:
```rust
pub struct PendingToolResult {
pub name: String,
pub result: serde_json::Value,
}
```
New methods:
- `set_tools(tools)` / `get_tools()` / `clear_tools()`
- `set_tool_config(config)` / `get_tool_config()`
- `add_tool_result(result)` / `take_tool_results()`
- `register_call_id(call_id, name)` / `lookup_call_id(call_id)`
- `set_last_function_calls(calls)` / `get_last_function_calls()`
### Phase 2: Request Types (`types.rs`)
Add to `ResponsesRequest`:
```rust
#[serde(default)]
pub tools: Option<Vec<serde_json::Value>>,
#[serde(default)]
pub tool_choice: Option<serde_json::Value>,
```
New output builder:
```rust
pub fn build_function_call_output(call_id: &str, name: &str, arguments: &str) -> Value
```
### Phase 3: Format Conversion + Dynamic Injection (`modify.rs`)
New public struct:
```rust
pub struct ToolContext {
pub tools: Option<Vec<Value>>, // Gemini functionDeclarations
pub tool_config: Option<Value>, // Gemini toolConfig
pub pending_results: Vec<PendingToolResult>, // Tool results to inject
pub last_calls: Vec<CapturedFunctionCall>, // For history rewriting
}
```
New conversion functions:
```rust
pub fn openai_tools_to_gemini(tools: &[Value]) -> Vec<Value> // OAI → Gemini format
pub fn openai_tool_choice_to_gemini(choice: &Value) -> Value // OAI → Gemini toolConfig
fn uppercase_types(val: Value) -> Value // Recursive type case fix
```
Change `modify_request` signature:
```rust
pub fn modify_request(body: &[u8], tool_ctx: Option<&ToolContext>) -> Option<Vec<u8>>
```
Tool injection logic:
1. Strip all LS tools (existing)
2. If `tool_ctx.tools` provided → inject as Gemini `functionDeclarations`
3. If `tool_ctx.tool_config` provided → inject as `toolConfig`
4. If `tool_ctx.pending_results` not empty → rewrite conversation history:
- Find model turn with "Tool call completed" → replace with `functionCall` parts
- Find last user turn → prepend `functionResponse` part
### Phase 4: MITM Plumbing (`proxy.rs`)
In `handle_http_over_tls`, before calling `modify_request`:
1. Read `get_tools()`, `get_tool_config()`, `take_tool_results()`, `get_last_function_calls()` from store
2. Build `ToolContext`
3. Pass to `modify_request(body, tool_ctx)`
After response capture:
1. Save captured function calls as `last_function_calls` (for future history rewriting)
### Phase 5: API Handler (`responses.rs`)
#### Request handling (in `handle_responses`):
1. If `body.tools` provided:
- Convert OpenAI → Gemini format via `openai_tools_to_gemini()`
- Store in `MitmStore` via `set_tools()`
2. If `body.tool_choice` provided:
- Convert via `openai_tool_choice_to_gemini()`
- Store in `MitmStore` via `set_tool_config()`
3. Check `body.input` for `function_call_output` items:
- If found: look up `call_id` → function name via `lookup_call_id()`
- Store as `PendingToolResult` via `add_tool_result()`
- Extract any accompanying text (or use placeholder)
#### Response handling (in `handle_responses_sync` / `handle_responses_stream`):
After polling completes:
1. Check `take_any_function_calls()` for captured tool calls
2. If captured:
- Generate `call_id` for each (e.g., `"call_" + random`)
- Register `call_id → name` mapping via `register_call_id()`
- Build `function_call` output items via `build_function_call_output()`
- Return these INSTEAD of the text message output
3. If no tool calls: existing text response behavior
### Phase 6: Gemini-Native Endpoint (`gemini.rs` + `mod.rs`)
New file `src/api/gemini.rs` with handler `handle_gemini`:
- Accepts tools in Gemini `functionDeclarations` format directly (no conversion)
- Accepts `toolConfig` directly
- Returns `functionCall` in Gemini format directly
- Same cascade/session management as responses.rs
- Much simpler — no format translation
Route: `POST /v1/gemini` in `mod.rs`
---
## File Change Summary
| File | Changes | Complexity |
| ---------------------- | ----------------------------------------------------------------------- | ---------- |
| `src/mitm/store.rs` | Add tool context storage (5 new fields, ~10 methods) | Medium |
| `src/api/types.rs` | Add `tools`/`tool_choice` to request, add output builder | Low |
| `src/mitm/modify.rs` | `ToolContext`, format conversion, dynamic injection, history rewrite | High |
| `src/mitm/proxy.rs` | Read store → build ToolContext → pass to modify | Low |
| `src/api/responses.rs` | Store tools, detect tool results in input, return function_call outputs | High |
| `src/api/gemini.rs` | New file — Gemini-native endpoint (passthrough) | Medium |
| `src/api/mod.rs` | Add route + module declaration | Low |
## Implementation Order
1. `store.rs` — foundation, no dependencies
2. `types.rs` — request/response types
3. `modify.rs` — format conversion + injection (depends on store types)
4. `proxy.rs` — plumbing (depends on modify signature)
5. Build + verify compilation
6. `responses.rs` — handler changes (depends on all above)
7. Build + test with `get_weather` request
8. `gemini.rs` + `mod.rs` — Gemini endpoint
9. Build + test with Gemini format
10. Tool result flow test (multi-turn)
## Testing Strategy
### Test 1: Basic tool call (sync)
```bash
curl -s http://localhost:8741/v1/responses -H "Content-Type: application/json" -d '{
"model": "gemini-3-flash",
"input": "What is the weather in Tokyo?",
"tools": [{"type":"function","function":{"name":"get_weather","description":"Get weather","parameters":{"type":"object","properties":{"city":{"type":"string"}},"required":["city"]}}}],
"tool_choice": "auto",
"conversation": "tool-test",
"stream": false
}'
# Expected: output contains function_call with name=get_weather, arguments={"city":"Tokyo"}
```
### Test 2: Tool result submission (multi-turn)
```bash
curl -s http://localhost:8741/v1/responses -H "Content-Type: application/json" -d '{
"model": "gemini-3-flash",
"input": [{"type":"function_call_output","call_id":"call_xxx","output":"{\"temp\":72,\"unit\":\"F\"}"}],
"conversation": "tool-test",
"stream": false
}'
# Expected: output contains text response using the tool result
```
### Test 3: Gemini-native endpoint
```bash
curl -s http://localhost:8741/v1/gemini -H "Content-Type: application/json" -d '{
"model": "gemini-3-flash",
"input": "What is the weather in Tokyo?",
"tools": [{"functionDeclarations":[{"name":"get_weather","description":"Get weather","parameters":{"type":"OBJECT","properties":{"city":{"type":"STRING"}},"required":["city"]}}]}],
"conversation": "gemini-tool-test",
"stream": false
}'
# Expected: response contains functionCall in Gemini format
```
### Test 4: No tools (regression)
```bash
curl -s http://localhost:8741/v1/responses -H "Content-Type: application/json" -d '{
"model": "gemini-3-flash",
"input": "What is 2+2?",
"stream": false
}'
# Expected: normal text response, no tool call behavior
```
## Risks & Mitigations
| Risk | Impact | Mitigation |
| ---------------------------------------------------------------- | ------ | ------------------------------------------------------------------------- |
| History rewriting breaks conversation | High | Only rewrite when pending_results non-empty; keep original as fallback |
| LS times out waiting for Google response during tool result turn | Medium | Increase timeout for tool result turns |
| Multiple parallel tool calls create race conditions | Medium | AtomicBool + sequential processing already handles this |
| `modify_request` test breakage | Low | Update existing tests for new signature |
| Global tool storage conflicts across concurrent requests | Medium | Not an issue — LS processes one request at a time (single cascade active) |