# Endpoint Gap Analysis

> **Updated:** 2026-02-15
> **Sources:** [OpenAI Chat Completions API](https://platform.openai.com/docs/api-reference/chat/create), [OpenAI Responses API](https://platform.openai.com/docs/api-reference/responses), [Gemini Thinking Mode](https://ai.google.dev/gemini-api/docs/thinking-mode), proxy source code
> **Method:** Full source audit cross-referenced against context7 OpenAI API docs

---

## What's Implemented

### All Endpoints

- ✅ Sync + streaming modes
- ✅ Model selection + validation
- ✅ OAuth auth check
- ✅ Timeout control
- ✅ Tool definitions, tool choice, tool results (OpenAI → Gemini auto-conversion)
- ✅ MITM bypass path for custom tools
- ✅ Thinking/reasoning in both sync and streaming
- ✅ Generation params forwarded via MITM (`temperature`, `top_p`, `top_k`, `max_output_tokens`, `stop_sequences`, `frequency_penalty`, `presence_penalty`)
- ✅ `reasoning_effort` / `thinkingLevel`: forwarded as `generationConfig.thinkingConfig.thinkingLevel`
- ✅ `response_format: {type: "json_object"}`: injected as `responseMimeType: "application/json"`
- ✅ Google Search grounding: `web_search: true` (Completions), `tools: [{type: "web_search_preview"}]` (Responses), `google_search: true` (Gemini)
- ✅ `/v1/search` endpoint: dedicated web search via Google Search grounding, returns structured results + citations

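The parameter forwarding listed above can be pictured as a small translation step from OpenAI-style names to Gemini `generationConfig` keys. The sketch below is illustrative only: the `OpenAiParams` struct and the `to_generation_config` function are hypothetical names, not the proxy's actual types, and only a few of the listed parameters are shown.

```rust
/// Hypothetical subset of OpenAI-style request parameters (not the proxy's real type).
#[derive(Default)]
struct OpenAiParams {
    temperature: Option<f64>,
    top_p: Option<f64>,
    max_output_tokens: Option<u32>,
    /// True when the client sent `response_format: {type: "json_object"}`.
    json_object: bool,
}

/// Build (key, JSON literal) pairs for a Gemini-style `generationConfig`.
/// Only parameters the client actually set are emitted.
fn to_generation_config(p: &OpenAiParams) -> Vec<(&'static str, String)> {
    let mut out = Vec::new();
    if let Some(t) = p.temperature {
        out.push(("temperature", t.to_string()));
    }
    if let Some(tp) = p.top_p {
        out.push(("topP", tp.to_string()));
    }
    if let Some(m) = p.max_output_tokens {
        out.push(("maxOutputTokens", m.to_string()));
    }
    if p.json_object {
        // JSON mode is expressed as a MIME type on the Gemini side.
        out.push(("responseMimeType", "\"application/json\"".to_string()));
    }
    out
}
```

Returning only the keys that were set leaves upstream defaults untouched for everything else, which seems the least surprising behavior for a pass-through layer.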
### Reasoning Effort → Thinking Level Mapping

| OpenAI `reasoning_effort` | Google `thinkingLevel` | Gemini 3 Pro | Gemini 3 Flash |
| :-----------------------: | :--------------------: | :----------: | :------------: |
| `"low"` | `"low"` | ✅ | ✅ |
| `"medium"` | `"medium"` | ❌ | ✅ |
| `"high"` | `"high"` | ✅ (default) | ✅ (default) |
| (none) | `"minimal"` | ❌ | ✅ |

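As a rough illustration, the mapping table can be encoded as two small lookups: one from `reasoning_effort` to `thinkingLevel`, and one for per-model support. The `Model` enum and all function names are hypothetical, and returning `None` for an unsupported combination is an assumption; the source does not say how the proxy handles that case.

```rust
/// Hypothetical model identifiers for the two columns of the table.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Model {
    Gemini3Pro,
    Gemini3Flash,
}

/// Map an OpenAI `reasoning_effort` value to a Google `thinkingLevel`.
fn effort_to_thinking_level(effort: &str) -> Option<&'static str> {
    match effort {
        "low" => Some("low"),
        "medium" => Some("medium"),
        "high" => Some("high"),
        _ => None,
    }
}

/// Whether the table marks a `thinkingLevel` as supported on a model.
fn model_supports(level: &str, model: Model) -> bool {
    match (level, model) {
        ("low", _) | ("high", _) => true,
        ("medium", Model::Gemini3Flash) | ("minimal", Model::Gemini3Flash) => true,
        _ => false,
    }
}

/// Resolve the level to forward. A missing effort falls back to "high",
/// the default in the table. Unsupported combinations yield None (assumption).
fn resolve_thinking_level(effort: Option<&str>, model: Model) -> Option<&'static str> {
    let level = match effort {
        Some(e) => effort_to_thinking_level(e)?,
        None => "high",
    };
    if model_supports(level, model) {
        Some(level)
    } else {
        None
    }
}
```

Keeping the support check separate from the name mapping means `"minimal"`, which has no OpenAI counterpart, can still be validated when it arrives through the Gemini-native `thinkingLevel` field.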
### Completions-Specific

- ✅ `stream_options.include_usage`: final chunk with usage before `[DONE]`
- ✅ `completion_tokens_details.reasoning_tokens`: thinking token count
- ✅ `prompt_tokens_details.cached_tokens`: cache read tokens
- ✅ `temperature`, `top_p`, `max_tokens`, `max_completion_tokens`, `frequency_penalty`, `presence_penalty`
- ✅ `reasoning_effort`
- ✅ `stop`: string or array, forwarded as `generationConfig.stopSequences`
- ✅ `response_format: {type: "json_object"}`: injects `responseMimeType`
- ✅ `response_format: {type: "json_schema", json_schema: {...}}`: injects `responseMimeType` + `responseSchema` via MITM
- ✅ `n` (multiple choices): fires N parallel cascades, collects into `choices[]` (sync only, capped at 5)
- ✅ `conversation`: session ID for multi-turn cascade reuse (custom extension)
- ✅ `reasoning_content`: thinking text in assistant message
- ✅ `system_fingerprint`: `fp_<version>` in sync + all streaming chunks
- ✅ `service_tier`: `"default"` in sync + all streaming chunks
- ✅ `logprobs: null` in every choice (sync + streaming)
- ✅ `metadata`: accepted in request, ignored
- ✅ `finish_reason`: correctly maps Google's `MAX_TOKENS`→`"length"`, `SAFETY`→`"content_filter"`, etc.
- ✅ Full `messages[]` history: all user, assistant, system, tool messages forwarded

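The `n` behavior in the list above (parallel cascades collected into `choices[]`, capped at 5) could look roughly like the following fan-out. This is a sketch under stated assumptions: `run_cascade` is a hypothetical stand-in for one upstream request, `Choice` is a simplified shape, and treating `n = 0` as 1 is a guess rather than documented behavior.

```rust
use std::thread;

/// Cap taken from the "capped at 5" note in the list above.
const MAX_CHOICES: usize = 5;

/// Simplified choice entry; the real response shape has more fields.
struct Choice {
    index: usize,
    content: String,
}

/// Hypothetical stand-in for one upstream request ("cascade").
fn run_cascade(prompt: &str, index: usize) -> String {
    format!("reply {index} to: {prompt}")
}

/// Run up to MAX_CHOICES cascades in parallel and collect them in index order.
fn collect_choices(prompt: &str, n: usize) -> Vec<Choice> {
    // Assumption: n = 0 is treated as 1; values above the cap are clamped.
    let n = n.clamp(1, MAX_CHOICES);
    thread::scope(|s| {
        // Spawn every cascade first so they run concurrently.
        let handles: Vec<_> = (0..n)
            .map(|i| s.spawn(move || run_cascade(prompt, i)))
            .collect();
        // Join in spawn order so `index` matches the position in `choices[]`.
        handles
            .into_iter()
            .enumerate()
            .map(|(index, h)| Choice {
                index,
                content: h.join().expect("cascade thread panicked"),
            })
            .collect()
    })
}
```

Joining in spawn order keeps `choices[i].index == i` regardless of which cascade finishes first.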
### Responses-Specific

- ✅ Full streaming event set (all `response.*` events including reasoning summary)
- ✅ `temperature`, `top_p`, `max_output_tokens`
- ✅ `reasoning_effort`: echoed from client request
- ✅ `thinking_signature` for multi-turn thinking chains
- ✅ `instructions`, `metadata`, `user`: echoed in response
- ✅ Usage with MITM-intercepted real tokens
- ✅ `max_tool_calls`: limits tool calls returned per response
- ✅ `conversation`: session reuse
- ✅ `previous_response_id`, `store`, `parallel_tool_calls`, `truncation`, `text.format`, `tool_choice`: echoed
- ✅ `tools`: echoed from client request (was previously always `[]`)
- ✅ `text.format`: `{format: {type: "json_schema", ...}}` injects `responseMimeType` + `responseSchema` via MITM, echoed in response

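To make the `max_tool_calls` item concrete, here is a minimal sketch of limiting the tool calls returned in one response. The `ToolCall` struct and `limit_tool_calls` function are hypothetical, and keeping the first calls in their original order is an assumption; the source only says the count is limited.

```rust
/// Simplified tool call; the real structure carries more fields.
#[derive(Debug, Clone, PartialEq)]
struct ToolCall {
    name: String,
    arguments: String,
}

/// Keep at most `max_tool_calls` entries, preserving order.
/// `None` means the client set no limit.
fn limit_tool_calls(mut calls: Vec<ToolCall>, max_tool_calls: Option<usize>) -> Vec<ToolCall> {
    if let Some(max) = max_tool_calls {
        calls.truncate(max);
    }
    calls
}
```

`Vec::truncate` is a no-op when the limit exceeds the current length, so no separate bounds check is needed.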
### Gemini-Specific

- ✅ Native tool format (no conversion needed)
- ✅ `usageMetadata` in sync **and streaming** responses
- ✅ `temperature`, `topP`, `topK`, `maxOutputTokens`, `stopSequences`
- ✅ `thinkingLevel`
- ✅ Session/conversation reuse
- ✅ Array/multipart `input`: strings, string arrays, `{text: "..."}` object arrays

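The three accepted `input` shapes can be normalized into one list of text segments before forwarding. The sketch below assumes hypothetical `Input` and `TextPart` types; how the proxy actually joins or forwards the segments is not stated in the source.

```rust
/// One `{text: "..."}` object from an array-style `input` (hypothetical type).
struct TextPart {
    text: String,
}

/// The three input shapes named in the list above.
enum Input {
    Text(String),
    Texts(Vec<String>),
    Parts(Vec<TextPart>),
}

/// Flatten any accepted shape into an ordered list of text segments.
fn flatten_input(input: Input) -> Vec<String> {
    match input {
        Input::Text(s) => vec![s],
        Input::Texts(v) => v,
        Input::Parts(p) => p.into_iter().map(|part| part.text).collect(),
    }
}
```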

---

## Fixed Bugs

| # | Bug | Fix |
| --- | --- | --- |
| B1 | Messages history dropped | `extract_chat_input` now calls `build_conversation_with_tools` with ALL messages, so full multi-turn via `messages[]` works. |
| B2 | `finish_reason` never `"length"` | `google_to_openai_finish_reason()` helper maps `MAX_TOKENS`→`"length"`, `SAFETY`/`RECITATION`/etc→`"content_filter"`. Applied to all paths. |
| B3 | `reasoning` always null | `build_response_object` now echoes client's `reasoning_effort` from `RequestParams`. |
| B4 | `tool_choice` always `"auto"` | Changed from `&'static str` to `serde_json::Value`. Echoes whatever the client sent. |
| B5 | `tools` always `[]` | Echoes the client's tools array in the response. |
| B7 | `temperature`/`top_p` wrong | Already defaults to `1.0` via `unwrap_or(1.0)`. This was a false positive; no fix needed. |

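For B2, a helper along the lines of `google_to_openai_finish_reason()` might look like the sketch below. The real signature is not shown in the source, so the `&str` input and output are assumptions, as is the fallback to `"stop"` for `STOP` and any value not listed in the table.

```rust
/// Sketch of a Google → OpenAI finish reason mapping.
/// Only MAX_TOKENS, SAFETY, and RECITATION come from the table above;
/// the "stop" fallback for every other value is an assumption.
fn google_to_openai_finish_reason(reason: &str) -> &'static str {
    match reason {
        "MAX_TOKENS" => "length",
        "SAFETY" | "RECITATION" => "content_filter",
        _ => "stop",
    }
}
```

Centralizing the mapping in one function is what lets the fix apply uniformly to the sync and streaming paths.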
### Acceptable / Won't Fix

| # | Bug | Status |
| --- | --- | --- |
| B6 | `Usage::estimate` fake tokens as fallback | Only triggers on timeout/error paths. The `len/4` heuristic is reasonable for timeouts where output tokens = 0. |

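The `len/4` fallback from B6 amounts to roughly one token per four units of text. The sketch below uses a hypothetical `Usage` struct with OpenAI-style field names and counts bytes via `str::len`; whether the proxy counts bytes or characters is not stated, so that choice is an assumption.

```rust
/// Hypothetical usage record; field names follow the OpenAI usage shape.
#[derive(Debug, PartialEq)]
struct Usage {
    prompt_tokens: usize,
    completion_tokens: usize,
    total_tokens: usize,
}

impl Usage {
    /// Rough fallback estimate: about one token per four bytes (assumption).
    /// On a timeout the output is empty, so completion_tokens comes out as 0.
    fn estimate(prompt: &str, output: &str) -> Usage {
        let prompt_tokens = prompt.len() / 4;
        let completion_tokens = output.len() / 4;
        Usage {
            prompt_tokens,
            completion_tokens,
            total_tokens: prompt_tokens + completion_tokens,
        }
    }
}
```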

---

## TODO: New Features

### Trivial (all done ✅)

All trivial response shape fixes have been implemented.

### Medium (schema injection via MITM, all done ✅)

All structured output features have been implemented.

### Hard (new features)

| # | Gap | API | Notes |
| --- | --- | --- | --- |
| 7 | **`parallel_tool_calls`** | Both | Accept param, echo in response. Can't enforce server-side. |

### Stretch (research needed)

| # | Gap | API | Notes |
| --- | --- | --- | --- |
| 12 | **Image/audio modalities** | Both | LS `sendMessage` is text-only. Need to reverse-engineer proto format for binary payloads. Gemini 3 supports vision natively. |


---

## Won't Implement

| # | Gap | Reason |
| --- | --- | --- |
| 9 | `prediction` (Predicted Output) | Inference-level speculative decoding optimization. No Gemini equivalent. |
| 10 | `logprobs` / `top_logprobs` | Gemini never exposes token-level log probabilities. |