Endpoint Gap Analysis
Updated: 2026-02-15
Sources: OpenAI Chat Completions API, OpenAI Responses API, Gemini Thinking Mode, proxy source code
Method: Full source audit cross-referenced against context7 OpenAI API docs
What's Implemented
All Endpoints
- ✅ Sync + streaming modes
- ✅ Model selection + validation
- ✅ OAuth auth check
- ✅ Timeout control
- ✅ Tool definitions, tool choice, tool results (OpenAI → Gemini auto-conversion)
- ✅ MITM bypass path for custom tools
- ✅ Thinking/reasoning in both sync and streaming
- ✅ Generation params forwarded via MITM (temperature, top_p, top_k, max_output_tokens, stop_sequences, frequency_penalty, presence_penalty)
- ✅ reasoning_effort / thinkingLevel: forwarded as generationConfig.thinkingConfig.thinkingLevel
- ✅ response_format: {type: "json_object"} is injected as responseMimeType: "application/json"
- ✅ Google Search grounding: web_search: true (Completions), tools: [{type: "web_search_preview"}] (Responses), google_search: true (Gemini)
- ✅ /v1/search endpoint: dedicated web search via Google Search grounding; returns structured results and citations
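The generation-parameter forwarding listed above amounts to renaming OpenAI's snake_case parameters to Gemini's camelCase `generationConfig` fields. The Rust sketch below illustrates that rename under stated assumptions: `OpenAiParams`, `quote`, and `to_generation_config` are hypothetical names rather than the proxy's types, and the output is a list of (generationConfig path, JSON text) pairs instead of a real JSON object so the example needs only the standard library.

```rust
/// Hypothetical subset of an OpenAI-style request (illustrative, not the proxy's type).
#[derive(Default)]
pub struct OpenAiParams {
    pub temperature: Option<f64>,
    pub top_p: Option<f64>,
    pub top_k: Option<u32>,
    pub max_output_tokens: Option<u32>,
    pub stop_sequences: Vec<String>,
    pub frequency_penalty: Option<f64>,
    pub presence_penalty: Option<f64>,
    pub reasoning_effort: Option<String>,
    /// true when the client sent response_format: {type: "json_object"}
    pub json_object: bool,
}

/// Quotes a string for JSON output. Debug formatting matches JSON for plain
/// ASCII text; a real implementation should use a JSON serializer instead.
fn quote(s: &str) -> String {
    format!("{:?}", s)
}

/// Renames each parameter the client actually set to its Gemini
/// generationConfig path; unset parameters are omitted entirely.
pub fn to_generation_config(p: &OpenAiParams) -> Vec<(&'static str, String)> {
    let mut out = Vec::new();
    if let Some(v) = p.temperature {
        out.push(("temperature", v.to_string()));
    }
    if let Some(v) = p.top_p {
        out.push(("topP", v.to_string()));
    }
    if let Some(v) = p.top_k {
        out.push(("topK", v.to_string()));
    }
    if let Some(v) = p.max_output_tokens {
        out.push(("maxOutputTokens", v.to_string()));
    }
    if !p.stop_sequences.is_empty() {
        let items: Vec<String> = p.stop_sequences.iter().map(|s| quote(s)).collect();
        out.push(("stopSequences", format!("[{}]", items.join(","))));
    }
    if let Some(v) = p.frequency_penalty {
        out.push(("frequencyPenalty", v.to_string()));
    }
    if let Some(v) = p.presence_penalty {
        out.push(("presencePenalty", v.to_string()));
    }
    if let Some(level) = &p.reasoning_effort {
        // Assumes the effort value was already validated against the mapping table.
        out.push(("thinkingConfig.thinkingLevel", quote(level)));
    }
    if p.json_object {
        out.push(("responseMimeType", quote("application/json")));
    }
    out
}
```

Omitting unset parameters (rather than sending defaults) lets Gemini apply its own model defaults, which is the behavior a pass-through proxy usually wants.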
Reasoning Effort → Thinking Level Mapping
| OpenAI reasoning_effort | Google thinkingLevel | Gemini 3 Pro | Gemini 3 Flash |
|---|---|---|---|
| "low" | "low" | ✅ | ✅ |
| "medium" | "medium" | ❌ | ✅ |
| "high" | "high" | ✅ (default) | ✅ (default) |
| (none) | "minimal" | ❌ | ✅ |
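As a sketch of how the table above could be enforced, the hypothetical function below maps a client's reasoning_effort to a thinkingLevel and rejects combinations marked ❌. Treating a missing effort as "high" follows the table's defaults; returning an error instead of silently downgrading "medium" on Gemini 3 Pro is an assumption, as are the `Model` enum and the function name.

```rust
/// Gemini 3 model families from the mapping table (hypothetical enum).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Model {
    Gemini3Pro,
    Gemini3Flash,
}

/// Maps an OpenAI reasoning_effort to a Gemini thinkingLevel.
/// `None` means the client sent no effort, so the table default ("high") applies.
pub fn thinking_level(effort: Option<&str>, model: Model) -> Result<&'static str, String> {
    let level = match effort {
        None => "high",
        Some("low") => "low",
        Some("medium") => "medium",
        Some("high") => "high",
        // "minimal" has no OpenAI reasoning_effort source in the table.
        Some(other) => return Err(format!("unsupported reasoning_effort: {other}")),
    };
    // The table marks "medium" as unsupported on Gemini 3 Pro.
    if model == Model::Gemini3Pro && level == "medium" {
        return Err("thinkingLevel \"medium\" is not supported by Gemini 3 Pro".to_string());
    }
    Ok(level)
}
```

Failing loudly makes an unsupported combination visible to the client; a proxy that prefers leniency could instead fall back to the nearest supported level.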
Completions-Specific
- ✅ stream_options.include_usage: a final chunk carrying usage is sent before [DONE]
- ✅ completion_tokens_details.reasoning_tokens: thinking token count
- ✅ prompt_tokens_details.cached_tokens: cache-read tokens
- ✅ temperature, top_p, max_tokens, max_completion_tokens, frequency_penalty, presence_penalty
- ✅ reasoning_effort
- ✅ stop: string or array, forwarded as generationConfig.stopSequences
- ✅ response_format: {type: "json_object"} injects responseMimeType
- ✅ response_format: {type: "json_schema", json_schema: {...}} injects responseMimeType + responseSchema via MITM
- ✅ n (multiple choices): fires N parallel cascades and collects them into choices[] (sync only, capped at 5)
- ✅ conversation: session ID for multi-turn cascade reuse (custom extension)
- ✅ reasoning_content: thinking text in the assistant message
- ✅ system_fingerprint: fp_<version> in sync responses and all streaming chunks
- ✅ service_tier: "default" in sync responses and all streaming chunks
- ✅ logprobs: null is set in every choice (sync + streaming)
- ✅ metadata: accepted in the request, ignored
- ✅ finish_reason: maps Google's MAX_TOKENS→"length", SAFETY→"content_filter", etc.
- ✅ Full messages[] history: all user, assistant, system, and tool messages forwarded
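Two of the Completions parameters above need light normalization before reaching Gemini: stop may be a single string or an array while stopSequences is always a list, and n is capped at 5. A minimal sketch, assuming a hypothetical `Stop` enum standing in for the deserialized field; `stop_sequences`, `cascade_count`, and `MAX_CHOICES` are illustrative names, and clamping n = 0 up to 1 is an assumption.

```rust
/// Hypothetical stand-in for the deserialized `stop` field, which OpenAI
/// allows to be either a single string or an array of strings.
pub enum Stop {
    One(String),
    Many(Vec<String>),
}

/// Normalizes `stop` into the list form used by generationConfig.stopSequences.
pub fn stop_sequences(stop: Option<Stop>) -> Vec<String> {
    match stop {
        None => Vec::new(),
        Some(Stop::One(s)) => vec![s],
        Some(Stop::Many(v)) => v,
    }
}

/// Upper bound on parallel cascades fired for `n` (sync requests only).
pub const MAX_CHOICES: u32 = 5;

/// Number of cascades for a sync request: defaults to 1 and is clamped
/// into 1..=MAX_CHOICES (clamping 0 up to 1 is an assumption).
pub fn cascade_count(n: Option<u32>) -> u32 {
    n.unwrap_or(1).clamp(1, MAX_CHOICES)
}
```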
Responses-Specific
- ✅ Full streaming event set (all response.* events, including reasoning summary)
- ✅ temperature, top_p, max_output_tokens
- ✅ reasoning_effort: echoed from the client request
- ✅ thinking_signature for multi-turn thinking chains
- ✅ instructions, metadata, user: echoed in the response
- ✅ Usage with MITM-intercepted real tokens
- ✅ max_tool_calls: limits tool calls returned per response
- ✅ conversation: session reuse
- ✅ previous_response_id, store, parallel_tool_calls, truncation, text.format, tool_choice: echoed
- ✅ tools: echoed from the client request (previously always [])
- ✅ text: {format: {type: "json_schema", ...}} injects responseMimeType + responseSchema via MITM; text.format is echoed in the response
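The max_tool_calls limit described for the Responses endpoint can be pictured as a truncation of the model's tool calls. A minimal sketch, assuming calls are kept in the order the model emitted them and the excess is simply dropped; `ToolCall` and `limit_tool_calls` are hypothetical names, not the proxy's.

```rust
/// A tool call as returned to the client (illustrative fields only).
#[derive(Debug, Clone, PartialEq)]
pub struct ToolCall {
    pub name: String,
    pub arguments: String,
}

/// Keeps at most `max_tool_calls` calls, preserving emission order.
/// `None` means the client set no limit.
pub fn limit_tool_calls(mut calls: Vec<ToolCall>, max_tool_calls: Option<usize>) -> Vec<ToolCall> {
    if let Some(max) = max_tool_calls {
        // truncate is a no-op when the vector is already short enough.
        calls.truncate(max);
    }
    calls
}
```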
Gemini-Specific
- ✅ Native tool format (no conversion needed)
- ✅ usageMetadata in sync and streaming responses
- ✅ temperature, topP, topK, maxOutputTokens, stopSequences
- ✅ thinkingLevel
- ✅ Session/conversation reuse
- ✅ Array/multipart input: strings, string arrays, {text: "..."} object arrays
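The three accepted input shapes listed above can be modeled as one enum and flattened into a single prompt before it is sent upstream. A minimal sketch: `GeminiInput`, `TextPart`, and `flatten_input` are hypothetical names, and joining parts with "\n" is an assumption; the proxy may join them differently.

```rust
/// A {text: "..."} object from a multipart input array.
pub struct TextPart {
    pub text: String,
}

/// The three input shapes accepted by the Gemini-style endpoint.
pub enum GeminiInput {
    Text(String),         // "hello"
    Strings(Vec<String>), // ["hello", "world"]
    Parts(Vec<TextPart>), // [{text: "hello"}, {text: "world"}]
}

/// Flattens any accepted shape into one prompt string (newline-joined by assumption).
pub fn flatten_input(input: GeminiInput) -> String {
    match input {
        GeminiInput::Text(s) => s,
        GeminiInput::Strings(v) => v.join("\n"),
        GeminiInput::Parts(parts) => parts
            .into_iter()
            .map(|p| p.text)
            .collect::<Vec<_>>()
            .join("\n"),
    }
}
```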
Fixed Bugs
| # | Bug | Fix |
|---|---|---|
| B1 | Messages history dropped | extract_chat_input now calls build_conversation_with_tools with ALL messages, so full multi-turn via messages[] works. |
| B2 | finish_reason never "length" | google_to_openai_finish_reason() helper maps MAX_TOKENS→"length", SAFETY/RECITATION/etc→"content_filter". Applied to all paths. |
| B3 | reasoning always null | build_response_object now echoes the client's reasoning_effort from RequestParams. |
| B4 | tool_choice always "auto" | Changed from &'static str to serde_json::Value. Echoes whatever the client sent. |
| B5 | tools always [] | Echoes the client's tools array in the response. |
| B7 | temperature/top_p wrong | Already defaults to 1.0 via unwrap_or(1.0). False positive; no fix needed. |
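The B2 fix can be pictured as a single string-to-string mapping applied on every response path. This is a hypothetical sketch, not the proxy's google_to_openai_finish_reason(): the exact set of Google reasons routed to "content_filter" is an assumption based on Gemini's FinishReason values, and unrecognized reasons fall back to "stop".

```rust
/// Hypothetical mapping from a Gemini finishReason string to an OpenAI
/// finish_reason value, following the behavior described for B2.
pub fn map_finish_reason(google_reason: &str) -> &'static str {
    match google_reason {
        // Output was cut off by maxOutputTokens.
        "MAX_TOKENS" => "length",
        // Safety- and policy-related stops; this exact list is an assumption.
        "SAFETY" | "RECITATION" | "BLOCKLIST" | "PROHIBITED_CONTENT" | "SPII" => "content_filter",
        // STOP and anything unrecognized are reported as a normal stop.
        _ => "stop",
    }
}
```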
Acceptable / Won't Fix
| # | Bug | Status |
|---|---|---|
| B6 | Usage::estimate returns heuristic (not real) token counts as a fallback | Only triggers on timeout/error paths. The len/4 heuristic is reasonable there, since output tokens = 0. |
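The B6 fallback can be sketched as follows, assuming "len" means byte length (what `str::len()` returns in Rust), so non-ASCII prompts yield somewhat higher estimates than a character count would. `Usage` and `estimate_usage` here are illustrative, not the proxy's actual `Usage::estimate`.

```rust
/// Illustrative usage shape using OpenAI field names.
#[derive(Debug, PartialEq)]
pub struct Usage {
    pub prompt_tokens: u32,
    pub completion_tokens: u32,
    pub total_tokens: u32,
}

/// Timeout/error fallback: roughly 4 bytes of text per token (integer division),
/// and 0 completion tokens because no output was produced.
pub fn estimate_usage(prompt: &str) -> Usage {
    let prompt_tokens = (prompt.len() / 4) as u32;
    Usage {
        prompt_tokens,
        completion_tokens: 0,
        total_tokens: prompt_tokens,
    }
}
```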
TODO — New Features
Trivial (all done ✅)
All trivial response shape fixes have been implemented.
Medium (schema injection via MITM) — all done ✅
All structured output features have been implemented.
Hard (new features)
| # | Gap | API | Notes |
|---|---|---|---|
| 7 | parallel_tool_calls | Both | Accept param, echo in response. Can't enforce server-side. |
Stretch (research needed)
| # | Gap | API | Notes |
|---|---|---|---|
| 12 | Image/audio modalities | Both | LS sendMessage is text-only. Need to reverse-engineer proto format for binary payloads. Gemini 3 supports vision natively. |
Won't Implement
| # | Gap | Reason |
|---|---|---|
| 9 | prediction (Predicted Output) | Inference-level speculative decoding optimization. No Gemini equivalent. |
| 10 | logprobs / top_logprobs | Gemini never exposes token-level log probabilities. |