- Add GenerationParams struct to MitmStore for temperature, top_p, top_k, max_output_tokens, stop_sequences, frequency/presence_penalty - MITM modify_request injects params into request.generationConfig - All 3 endpoints (Completions, Responses, Gemini) store client params - Add usageMetadata to Gemini sync responses (promptTokenCount, candidatesTokenCount, totalTokenCount, thoughtsTokenCount) - Add generation param fields to GeminiRequest (temperature, topP, etc.) - Completions stream_options.include_usage emits final usage chunk - Completions reasoning_tokens in completion_tokens_details - Update endpoint gap analysis doc (all high-priority gaps resolved)
118 lines
4.6 KiB
Markdown
118 lines
4.6 KiB
Markdown
# Known Issues & Future Work
|
|
|
|
All critical blockers have been resolved. Standalone LS with MITM interception
|
|
is fully working. Reactive streaming is implemented with polling fallback.
|
|
All three API endpoints (Responses, Completions, Gemini) now bypass the LS
|
|
when custom tools are active, reading directly from MitmStore.
|
|
|
|
---
|
|
|
|
## ✅ Resolved
|
|
|
|
### ~~LS Go LLM Client Ignores System TLS Trust Store~~
|
|
|
|
**Status: SOLVED (2026-02-14)**
|
|
|
|
Previously the #1 blocker. The standalone LS (`--standalone` flag, now default)
|
|
routes all LLM API traffic through the MITM proxy with full decryption.
|
|
|
|
**Solution:**
|
|
|
|
1. **UID-scoped iptables** — `scripts/mitm-redirect.sh` creates an `antigravity-ls`
|
|
system user. iptables redirects only that UID's port-443 traffic → MITM port.
|
|
2. **Combined CA bundle** — The Go client honors `SSL_CERT_FILE` when set on
|
|
the standalone process. A combined bundle (system CAs + MITM CA) is written
|
|
to `/tmp/antigravity-mitm-combined-ca.pem`.
|
|
3. **`sudo -u` spawning** — The proxy spawns the LS as the `antigravity-ls` user,
|
|
so only the standalone LS traffic is intercepted. No impact on other software.
|
|
4. **Google SSE parsing** — MITM parses `streamGenerateContent?alt=sse` responses
|
|
and extracts `promptTokenCount`, `candidatesTokenCount`, `thoughtsTokenCount`.
|
|
|
|
**Verified:** `/v1/usage` returns per-model token usage from intercepted traffic.
|
|
|
|
### ~~Polling-Based Cascade Updates~~
|
|
|
|
**Status: SOLVED (2026-02-14)**
|
|
|
|
`StreamCascadeReactiveUpdates` is now used for real-time cascade state
|
|
notifications. Falls back to timer-based polling if the streaming RPC is
|
|
unavailable. Reactive diffs also carry progressive response text and thinking
|
|
content (see `docs/panel-stream-investigation.md`).
|
|
|
|
### ~~StreamCascadePanelReactiveUpdates — Dead End~~
|
|
|
|
**Status: INVESTIGATED & CLOSED (2026-02-14)**
|
|
|
|
`CascadePanelState` only contains `plan_status` and `user_settings` — not
|
|
thinking text. The panel reactive component uses a workspace-scoped ID, not
|
|
cascade IDs. See `docs/panel-stream-investigation.md`.
|
|
|
|
### ~~Request Modification Not Implemented~~
|
|
|
|
**Status: SOLVED (2026-02-15)**
|
|
|
|
`MitmConfig.modify_requests` is now `true` by default. Used for:
|
|
|
|
- Tool/function call injection into LS requests (Gemini `functionDeclarations`)
|
|
- Tool result injection as `functionResponse` parts
|
|
- LS bypass when custom tools are active (response captured directly from MITM)
|
|
|
|
### ~~Cascade Correlation Is Heuristic~~
|
|
|
|
**Status: SOLVED (2026-02-15)**
|
|
|
|
Previously, MITM usage was keyed under `_latest` because `extract_cascade_hint()`
|
|
couldn't parse the chunked-encoded Google SSE request body.
|
|
|
|
**Fix:** API handlers now call `mitm_store.set_active_cascade(cascade_id)` before
|
|
sending messages. `record_usage()` falls back to this active cascade ID when the
|
|
heuristic hint is absent, properly correlating usage to cascades.
|
|
|
|
### ~~Progressive Thinking Streaming~~
|
|
|
|
**Status: SOLVED (2026-02-15)**
|
|
|
|
Thinking text now streams progressively as delta events. The implementation:
|
|
|
|
1. **LS cascade steps** — `plannerResponse.thinking` (field 3) grows progressively
|
|
as the LS receives data. For Opus 4.6, thinking text builds up word-by-word
|
|
over ~1-2s. For Gemini Flash, thinking arrives in 1-2 larger chunks.
|
|
2. **Delta tracking** — `last_thinking_len` tracks the previously emitted length.
|
|
Each poll compares current thinking length and emits only the new characters
|
|
as `response.reasoning_summary_text.delta` events.
|
|
3. **Lifecycle** — Structure events (`output_item.added`, `summary_part.added`)
|
|
emit on first thinking appearance. `done` events emit when response text
|
|
first appears (indicating thinking phase completed).
|
|
|
|
**Verified with Opus 4.6:** (2026-02-15 13:22 UTC)
|
|
|
|
```
|
|
delta_len=24 "The user is asking about"
|
|
delta_len=61 " the Collatz conjecture..."
|
|
delta_len=5 " This"
|
|
delta_len=10 " is a pure"
|
|
... (11 progressive deltas over ~850ms)
|
|
```
|
|
|
|
---
|
|
|
|
## 🟢 Low
|
|
|
|
### 1. MITM Integration Tests
|
|
|
|
Unit tests cover protobuf decoding and intercept parsing (18 tests pass).
|
|
Integration tests for the full MITM pipeline (TLS interception, response
|
|
parsing, usage recording) would be valuable now that interception works.
|
|
|
|
### 2. MITM for Main Antigravity Session
|
|
|
|
The current MITM only works for the standalone LS (default mode).
|
|
Intercepting the main Antigravity session's LS is harder because:
|
|
|
|
- The main LS is managed by the Antigravity app, not by us
|
|
- UID-scoped iptables can't target it without affecting all user traffic
|
|
- The `mitm-wrapper.sh` approach sets env vars but the LLM client ignores
|
|
`HTTPS_PROXY` unless `detect_and_use_proxy` is ENABLED via init metadata
|
|
|
|
**Workaround:** Use standalone mode (default) for all proxy traffic.
|