Endpoint Gap Analysis

Updated: 2026-02-15
Sources: OpenAI Chat Completions API, OpenAI Responses API, Gemini Thinking Mode, proxy source code
Method: Full source audit cross-referenced against context7 OpenAI API docs

What's Implemented

All Endpoints

✅ Sync + streaming modes
✅ Model selection + validation
✅ OAuth auth check
✅ Timeout control
✅ Tool definitions, tool choice, tool results (OpenAI → Gemini auto-conversion)
✅ MITM bypass path for custom tools
✅ Thinking/reasoning in both sync and streaming
✅ Generation params forwarded via MITM (temperature, top_p, top_k, max_output_tokens, stop_sequences, frequency_penalty, presence_penalty)
✅ reasoning_effort / thinkingLevel — forwarded as generationConfig.thinkingConfig.thinkingLevel
✅ response_format: {type: "json_object"} — injected as responseMimeType: "application/json"
✅ Google Search grounding — web_search: true (Completions), tools: [{type: "web_search_preview"}] (Responses), google_search: true (Gemini)
✅ /v1/search endpoint — dedicated web search via Google Search grounding, returns structured results + citations
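
The `response_format` handling listed above (and the `json_schema` variant listed under Completions-Specific) can be pictured as a small translation step. This is a minimal sketch, not the proxy's code: `ResponseFormat`, `GenerationConfigPatch`, and `patch_for` are hypothetical names, the schema is carried as raw JSON text to avoid a JSON dependency, and using `application/json` for the `json_schema` case is an assumption.

```rust
/// Hypothetical model of the OpenAI `response_format` parameter.
enum ResponseFormat {
    Text,
    JsonObject,
    /// Raw JSON Schema text supplied by the client.
    JsonSchema { schema: String },
}

/// Hypothetical set of fields to inject into Gemini's `generationConfig`.
#[derive(Debug, Default, PartialEq)]
struct GenerationConfigPatch {
    response_mime_type: Option<String>, // becomes `responseMimeType`
    response_schema: Option<String>,    // becomes `responseSchema`
}

/// Translates the client's `response_format` into the fields to inject.
fn patch_for(format: &ResponseFormat) -> GenerationConfigPatch {
    match format {
        // Plain text needs no injection.
        ResponseFormat::Text => GenerationConfigPatch::default(),
        ResponseFormat::JsonObject => GenerationConfigPatch {
            response_mime_type: Some("application/json".to_string()),
            response_schema: None,
        },
        // Assumption: the schema case uses the same MIME type plus the schema.
        ResponseFormat::JsonSchema { schema } => GenerationConfigPatch {
            response_mime_type: Some("application/json".to_string()),
            response_schema: Some(schema.clone()),
        },
    }
}
```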

Reasoning Effort → Thinking Level Mapping

| OpenAI `reasoning_effort` | Google `thinkingLevel` | Gemini 3 Pro | Gemini 3 Flash |
| --- | --- | --- | --- |
| `"low"` | `"low"` | ✅ | ✅ |
| `"medium"` | `"medium"` | ❌ | ✅ |
| `"high"` | `"high"` | ✅ (default) | ✅ (default) |
| (none) | `"minimal"` | ❌ | ✅ |
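
The mapping table can be read as a lookup keyed on the requested level and the target model. The sketch below is illustrative only: the `Model` enum and `thinking_level` function are hypothetical names, and what the proxy does with an unsupported combination (reject, or fall back to the default) is not stated here, so the function returns `None` and leaves that choice to the caller.

```rust
/// Hypothetical model identifiers used only for this sketch.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Model {
    Gemini3Pro,
    Gemini3Flash,
}

/// Returns the `thinkingLevel` to forward, or `None` when the table marks
/// the combination as unsupported or the requested level is unknown.
fn thinking_level(model: Model, requested: &str) -> Option<&'static str> {
    match (requested, model) {
        // `low` and `high` are supported on both models.
        ("low", _) => Some("low"),
        ("high", _) => Some("high"),
        // `medium` and `minimal` are marked as Flash-only in the table.
        ("medium", Model::Gemini3Flash) => Some("medium"),
        ("minimal", Model::Gemini3Flash) => Some("minimal"),
        _ => None,
    }
}
```

For example, `thinking_level(Model::Gemini3Pro, "medium")` yields `None`, matching the ❌ in the Gemini 3 Pro column.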

Completions-Specific

✅ stream_options.include_usage — final chunk with usage before [DONE]
✅ completion_tokens_details.reasoning_tokens — thinking token count
✅ prompt_tokens_details.cached_tokens — cache read tokens
✅ temperature, top_p, max_tokens, max_completion_tokens, frequency_penalty, presence_penalty
✅ reasoning_effort
✅ stop — string or array, forwarded as generationConfig.stopSequences
✅ response_format: {type: "json_object"} — injects responseMimeType
✅ response_format: {type: "json_schema", json_schema: {...}} — injects responseMimeType + responseSchema via MITM
✅ n (multiple choices) — fires N parallel cascades, collects into choices[] (sync only, capped at 5)
✅ conversation — session ID for multi-turn cascade reuse (custom extension)
✅ reasoning_content — thinking text in assistant message
✅ system_fingerprint — fp_<version> in sync + all streaming chunks
✅ service_tier — "default" in sync + all streaming chunks
✅ logprobs: null — in every choice (sync + streaming)
✅ metadata — accepted in request, ignored
✅ finish_reason — correctly maps Google's MAX_TOKENS→"length", SAFETY→"content_filter", etc.
✅ Full messages[] history — all user, assistant, system, tool messages forwarded
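
Two of the items above, `stop` and `n`, amount to small normalization steps before the request is forwarded. The following is a rough sketch under stated assumptions: `Stop`, `to_stop_sequences`, `MAX_CHOICES`, and `effective_n` are hypothetical names, and clamping an oversized `n` (rather than rejecting the request) is an assumption about how the cap of 5 is applied.

```rust
/// Hypothetical representation of the OpenAI `stop` parameter,
/// which may be a single string or an array of strings.
enum Stop {
    One(String),
    Many(Vec<String>),
}

/// Normalizes `stop` into the list shape of `generationConfig.stopSequences`.
fn to_stop_sequences(stop: Option<Stop>) -> Vec<String> {
    match stop {
        None => Vec::new(),
        Some(Stop::One(s)) => vec![s],
        Some(Stop::Many(v)) => v,
    }
}

/// Upper bound on parallel cascades, taken from the "capped at 5" note.
const MAX_CHOICES: u32 = 5;

/// Resolves the number of cascades to run. A missing `n` defaults to 1,
/// and out-of-range values are clamped (an assumption of this sketch).
fn effective_n(n: Option<u32>) -> u32 {
    n.unwrap_or(1).clamp(1, MAX_CHOICES)
}
```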

Responses-Specific

✅ Full streaming event set (all response.* events including reasoning summary)
✅ temperature, top_p, max_output_tokens
✅ reasoning_effort — echoed from client request
✅ thinking_signature for multi-turn thinking chains
✅ instructions, metadata, user — echoed in response
✅ Usage with MITM-intercepted real tokens
✅ max_tool_calls — limits tool calls returned per response
✅ conversation — session reuse
✅ previous_response_id, store, parallel_tool_calls, truncation, text.format, tool_choice — echoed
✅ tools — echoed from client request (was previously always [])
✅ text.format — {format: {type: "json_schema", ...}} injects responseMimeType + responseSchema via MITM, echoed in response
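
As an illustration of the `max_tool_calls` item above, limiting the tool calls returned can be as simple as truncating the list. This sketch assumes the first calls are kept and the rest dropped, which the list does not confirm; `ToolCall` and `limit_tool_calls` are hypothetical names.

```rust
/// Hypothetical, simplified tool call record.
#[derive(Clone, Debug, PartialEq)]
struct ToolCall {
    name: String,
    arguments: String, // JSON-encoded arguments
}

/// Keeps at most `max_tool_calls` entries, preserving their order.
/// With no limit set, the list is returned unchanged.
fn limit_tool_calls(mut calls: Vec<ToolCall>, max_tool_calls: Option<usize>) -> Vec<ToolCall> {
    if let Some(max) = max_tool_calls {
        calls.truncate(max);
    }
    calls
}
```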

Gemini-Specific

✅ Native tool format (no conversion needed)
✅ usageMetadata in sync and streaming responses
✅ temperature, topP, topK, maxOutputTokens, stopSequences
✅ thinkingLevel
✅ Session/conversation reuse
✅ Array/multipart input — strings, string arrays, {text: "..."} object arrays
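
The three accepted input shapes can be collapsed into one text prompt before it is sent on. This is a sketch only: `Input`, `TextPart`, and `flatten_input` are hypothetical names, and joining parts with a newline is an assumption, since the separator is not documented here.

```rust
/// Hypothetical `{text: "..."}` object from a multipart input array.
struct TextPart {
    text: String,
}

/// Hypothetical model of the accepted input shapes.
enum Input {
    Text(String),
    Strings(Vec<String>),
    Parts(Vec<TextPart>),
}

/// Flattens any accepted shape into a single prompt string.
/// Assumption: multiple parts are joined with a newline.
fn flatten_input(input: Input) -> String {
    match input {
        Input::Text(s) => s,
        Input::Strings(items) => items.join("\n"),
        Input::Parts(parts) => parts
            .into_iter()
            .map(|part| part.text)
            .collect::<Vec<String>>()
            .join("\n"),
    }
}
```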

Fixed Bugs

| # | Bug | Fix |
| --- | --- | --- |
| B1 | Messages history dropped | `extract_chat_input` now calls `build_conversation_with_tools` with ALL messages, so full multi-turn via `messages[]` works. |
| B2 | `finish_reason` never `"length"` | `google_to_openai_finish_reason()` helper maps `MAX_TOKENS`→`"length"`, `SAFETY`/`RECITATION`/etc→`"content_filter"`. Applied to all paths. |
| B3 | `reasoning` always null | `build_response_object` now echoes client's `reasoning_effort` from `RequestParams`. |
| B4 | `tool_choice` always `"auto"` | Changed from `&'static str` to `serde_json::Value`. Echoes whatever the client sent. |
| B5 | `tools` always `[]` | Echoes the client's tools array in the response. |
| B7 | `temperature`/`top_p` wrong | No fix needed: both already default to `1.0` via `unwrap_or(1.0)`, so the report was a false positive. |
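
The B2 fix can be pictured as a single match over Google's finish reason. The function name comes from the table, but the signature and body below are a guess at its shape, not the proxy's actual code. Only the reasons named in the table are mapped; the reasons behind "etc" are left out, and treating everything else as `"stop"` is an assumption of this sketch.

```rust
/// Sketch of a Google-to-OpenAI finish reason mapping.
/// Assumption: the reason arrives as its string name.
fn google_to_openai_finish_reason(google_reason: &str) -> &'static str {
    match google_reason {
        // Output hit the token limit.
        "MAX_TOKENS" => "length",
        // Reasons named in the bug table; others may map here too.
        "SAFETY" | "RECITATION" => "content_filter",
        // Assumption: a normal stop and any unlisted reason become "stop".
        _ => "stop",
    }
}
```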

Acceptable / Won't Fix

| # | Bug | Status |
| --- | --- | --- |
| B6 | `Usage::estimate` fake tokens as fallback | Only triggers on timeout/error paths, where no real token counts are available. The `len/4` heuristic is reasonable for timeouts, where output tokens = 0. |
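
For reference, a `len/4` estimate looks roughly like the sketch below. The `Usage` fields follow the OpenAI usage object, but the `estimate` signature is assumed, and measuring length in bytes (Rust's `str::len`) rather than characters is also an assumption.

```rust
/// Simplified usage record, modeled on the OpenAI `usage` object.
#[derive(Debug, PartialEq)]
struct Usage {
    prompt_tokens: u32,
    completion_tokens: u32,
    total_tokens: u32,
}

impl Usage {
    /// Fallback estimate: roughly one token per four bytes of text.
    /// Intended only for paths where real token counts are unavailable.
    fn estimate(prompt: &str, completion: &str) -> Self {
        let prompt_tokens = (prompt.len() / 4) as u32;
        let completion_tokens = (completion.len() / 4) as u32;
        Usage {
            prompt_tokens,
            completion_tokens,
            total_tokens: prompt_tokens + completion_tokens,
        }
    }
}
```

On a timeout with no output, the completion text is empty, so `completion_tokens` is 0 and only the prompt side is approximated.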

TODO — New Features

Trivial (all done ✅)

All trivial response shape fixes have been implemented.

Medium (schema injection via MITM) — all done ✅

All structured output features have been implemented.

Hard (new features)

| # | Gap | API | Notes |
| --- | --- | --- | --- |
| 7 | `parallel_tool_calls` | Both | Accept param, echo in response. Can't enforce server-side. |

Stretch (research needed)

| # | Gap | API | Notes |
| --- | --- | --- | --- |
| 12 | Image/audio modalities | Both | LS `sendMessage` is text-only. Need to reverse-engineer proto format for binary payloads. Gemini 3 supports vision natively. |

Won't Implement

| # | Gap | Reason |
| --- | --- | --- |
| 9 | `prediction` (Predicted Output) | Inference-level speculative decoding optimization. No Gemini equivalent. |
| 10 | `logprobs` / `top_logprobs` | Gemini never exposes token-level log probabilities. |
