# Endpoint Gap Analysis

> **Updated:** 2026-02-15
> **Sources:** [OpenAI Chat Completions API](https://platform.openai.com/docs/api-reference/chat/create), [OpenAI Responses API](https://platform.openai.com/docs/api-reference/responses), [Gemini Thinking Mode](https://ai.google.dev/gemini-api/docs/thinking-mode), proxy source code
> **Method:** Full source audit cross-referenced against context7 OpenAI API docs

---

## What's Implemented

### All Endpoints

- ✅ Sync + streaming modes
- ✅ Model selection + validation
- ✅ OAuth auth check
- ✅ Timeout control
- ✅ Tool definitions, tool choice, tool results (OpenAI → Gemini auto-conversion)
- ✅ MITM bypass path for custom tools
- ✅ Thinking/reasoning in both sync and streaming
- ✅ Generation params forwarded via MITM (`temperature`, `top_p`, `top_k`, `max_output_tokens`, `stop_sequences`, `frequency_penalty`, `presence_penalty`)
- ✅ `reasoning_effort` / `thinkingLevel`: forwarded as `generationConfig.thinkingConfig.thinkingLevel`
- ✅ `response_format: {type: "json_object"}`: injected as `responseMimeType: "application/json"`
- ✅ Google Search grounding: `web_search: true` (Completions), `tools: [{type: "web_search_preview"}]` (Responses), `google_search: true` (Gemini)
- ✅ `/v1/search` endpoint: dedicated web search via Google Search grounding, returns structured results + citations
- ✅ Image uploads: `input_image` / `image_url` with base64 data URIs, injected via MITM as `inlineData`
- ✅ Upstream error propagation: Google API errors (400, 429, 500) are returned to the client immediately instead of hanging

### Reasoning Effort → Thinking Level Mapping

| OpenAI `reasoning_effort` | Google `thinkingLevel` | Gemini 3 Pro | Gemini 3 Flash |
| :-----------------------: | :--------------------: | :----------: | :------------: |
| `"low"`                   | `"low"`                | ✅           | ✅             |
| `"medium"`                | `"medium"`             | ❌           | ✅             |
| `"high"`                  | `"high"`               | ✅ (default) | ✅ (default)   |
| n/a                       | `"minimal"`            | ❌           | ✅             |

### Completions-Specific

- ✅ `stream_options.include_usage`: final chunk with usage before `[DONE]`
- ✅ `completion_tokens_details.reasoning_tokens`: thinking token count
- ✅ `prompt_tokens_details.cached_tokens`: cache read tokens
- ✅ `temperature`, `top_p`, `max_tokens`, `max_completion_tokens`, `frequency_penalty`, `presence_penalty`
- ✅ `reasoning_effort`
- ✅ `stop`: string or array, forwarded as `generationConfig.stopSequences`
- ✅ `response_format: {type: "json_object"}`: injects `responseMimeType`
- ✅ `response_format: {type: "json_schema", json_schema: {...}}`: injects `responseMimeType` + `responseSchema` via MITM
- ✅ `n` (multiple choices): fires N parallel cascades, collects into `choices[]` (sync only, capped at 5)
- ✅ `conversation`: session ID for multi-turn cascade reuse (custom extension)
- ✅ `reasoning_content`: thinking text in assistant message
- ✅ `system_fingerprint`: `fp_` in sync + all streaming chunks
- ✅ `service_tier`: `"default"` in sync + all streaming chunks
- ✅ `logprobs: null`: in every choice (sync + streaming)
- ✅ `metadata`: accepted in request, ignored
- ✅ `finish_reason`: correctly maps Google's `MAX_TOKENS`→`"length"`, `SAFETY`→`"content_filter"`, etc.
- ✅ Full `messages[]` history: all user, assistant, system, tool messages forwarded

### Responses-Specific

- ✅ Full streaming event set (all `response.*` events including reasoning summary)
- ✅ `temperature`, `top_p`, `max_output_tokens`
- ✅ `reasoning_effort`: echoed from client request
- ✅ `thinking_signature` for multi-turn thinking chains
- ✅ `instructions`, `metadata`, `user`: echoed in response
- ✅ Usage with MITM-intercepted real tokens
- ✅ `max_tool_calls`: limits tool calls returned per response
- ✅ `conversation`: session reuse
- ✅ `previous_response_id`, `store`, `parallel_tool_calls`, `truncation`, `text.format`, `tool_choice`: echoed
- ✅ `tools`: echoed from client request (was previously always `[]`)
- ✅ `text.format`: `{format: {type: "json_schema", ...}}` injects `responseMimeType` + `responseSchema` via MITM, echoed in response

### Gemini-Specific

- ✅ Native tool format (no conversion needed)
- ✅ `usageMetadata` in sync **and streaming** responses
- ✅ `temperature`, `topP`, `topK`, `maxOutputTokens`, `stopSequences`
- ✅ `thinkingLevel`
- ✅ Session/conversation reuse
- ✅ Array/multipart `input`: strings, string arrays, `{text: "..."}` object arrays

---

## Fixed Bugs

| #   | Bug                              | Fix |
| --- | -------------------------------- | --- |
| B1  | Messages history dropped         | `extract_chat_input` now calls `build_conversation_with_tools` with ALL messages, so full multi-turn via `messages[]` works. |
| B2  | `finish_reason` never `"length"` | `google_to_openai_finish_reason()` helper maps `MAX_TOKENS`→`"length"`, `SAFETY`/`RECITATION`/etc.→`"content_filter"`. Applied to all paths. |
| B3  | `reasoning` always null          | `build_response_object` now echoes client's `reasoning_effort` from `RequestParams`. |
| B4  | `tool_choice` always `"auto"`    | Changed from `&'static str` to `serde_json::Value`. Echoes whatever the client sent. |
| B5  | `tools` always `[]`              | Echoes the client's tools array in the response. |
| B7  | `temperature`/`top_p` wrong      | Already defaults to `1.0` via `unwrap_or(1.0)`. Was a false positive; no fix needed. |

### Acceptable / Won't Fix

| #   | Bug                                       | Status |
| --- | ----------------------------------------- | ------ |
| B6  | `Usage::estimate` fake tokens as fallback | Only triggers on timeout/error paths. Heuristic `len/4` is reasonable for timeouts where output tokens = 0. |

---

## TODO: New Features

### Trivial (all done ✅)

All trivial response shape fixes have been implemented.

### Medium (schema injection via MITM): all done ✅

All structured output features have been implemented.

### Hard (new features)

| #   | Gap                       | API  | Notes |
| --- | ------------------------- | ---- | ----- |
| 7   | **`parallel_tool_calls`** | Both | Accept param, echo in response. Can't enforce server-side. |

### Stretch (research needed)

| #   | Gap             | API  | Notes |
| --- | --------------- | ---- | ----- |
| 12  | **Audio input** | Both | Audio modalities not yet supported. Vision/images work via MITM. |

---

## Won't Implement

| #   | Gap                             | Reason |
| --- | ------------------------------- | ------ |
| 9   | `prediction` (Predicted Output) | Inference-level speculative decoding optimization. No Gemini equivalent. |
| 10  | `logprobs` / `top_logprobs`     | Gemini never exposes token-level log probabilities. |
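
For reference, the `finish_reason` mapping described in bug B2 might look roughly like the sketch below. Only the function name and the `MAX_TOKENS`, `SAFETY`, and `RECITATION` arms come from this document. The signature, the `STOP` arm, and the fallback arm are assumptions made for illustration and may not match the proxy's actual code.

```rust
/// Illustrative sketch of the B2 helper. The real signature and the
/// complete set of Google finish reasons it handles are assumptions.
pub fn google_to_openai_finish_reason(google_reason: &str) -> &'static str {
    match google_reason {
        // Output was cut off by the token limit (from the B2 description).
        "MAX_TOKENS" => "length",
        // Blocked or filtered output (from the B2 description; the
        // document's "etc." implies further reasons not listed here).
        "SAFETY" | "RECITATION" => "content_filter",
        // Assumption: a normal stop maps to OpenAI's "stop".
        "STOP" => "stop",
        // Assumption: unrecognized reasons fall back to "stop".
        _ => "stop",
    }
}
```

A caller on either the sync or the streaming path would pass Google's `finishReason` string through this helper before writing `finish_reason` into each choice.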
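
The Reasoning Effort → Thinking Level table can also be read as a lookup function. The sketch below encodes only the support matrix from that table. The `ModelFamily` enum and the `supported_thinking_level` function are hypothetical names, and how the proxy actually reacts to an unsupported level (reject, downgrade, or forward as-is) is not stated in this document.

```rust
/// Hypothetical model grouping, mirroring the two columns of the table.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ModelFamily {
    Gemini3Pro,
    Gemini3Flash,
}

/// Returns the `thinkingLevel` to forward, or `None` when the table marks
/// the level as unsupported for that model. What to do with `None` is left
/// to the caller; the proxy's real behavior is not documented here.
pub fn supported_thinking_level(level: &str, model: ModelFamily) -> Option<&'static str> {
    match (level, model) {
        // "low" and "high" are supported on both models.
        ("low", _) => Some("low"),
        ("high", _) => Some("high"),
        // "medium" and "minimal" are Flash-only per the table.
        ("medium", ModelFamily::Gemini3Flash) => Some("medium"),
        ("minimal", ModelFamily::Gemini3Flash) => Some("minimal"),
        // Everything else is unsupported or unknown.
        _ => None,
    }
}
```

Under these assumptions, `"medium"` on Gemini 3 Pro yields `None`, which matches the ❌ in the table, while the same level on Gemini 3 Flash is forwarded unchanged.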