Nikketryhard b1bd57ab5e feat: forward generation params via MITM + add usageMetadata to Gemini

- Add GenerationParams struct to MitmStore for temperature, top_p,
  top_k, max_output_tokens, stop_sequences, frequency/presence_penalty
- MITM modify_request injects params into request.generationConfig
- All 3 endpoints (Completions, Responses, Gemini) store client params
- Add usageMetadata to Gemini sync responses (promptTokenCount,
  candidatesTokenCount, totalTokenCount, thoughtsTokenCount)
- Add generation param fields to GeminiRequest (temperature, topP, etc.)
- Completions stream_options.include_usage emits final usage chunk
- Completions reasoning_tokens in completion_tokens_details
- Update endpoint gap analysis doc (all high-priority gaps resolved)

2026-02-15 14:23:05 -06:00

Known Issues & Future Work

All critical blockers have been resolved. Standalone LS with MITM interception is fully working. Reactive streaming is implemented with polling fallback. All three API endpoints (Responses, Completions, Gemini) now bypass the LS when custom tools are active, reading directly from MitmStore.

✅ Resolved

LS Go LLM Client Ignores System TLS Trust Store

Status: SOLVED (2026-02-14)

This was previously the #1 blocker. The standalone LS (--standalone flag, now the default) routes all LLM API traffic through the MITM proxy, which decrypts it in full.

Solution:

UID-scoped iptables — scripts/mitm-redirect.sh creates an antigravity-ls system user. iptables redirects only that UID's port-443 traffic → MITM port.
Combined CA bundle — The Go client honors SSL_CERT_FILE when set on the standalone process. A combined bundle (system CAs + MITM CA) is written to /tmp/antigravity-mitm-combined-ca.pem.
sudo -u spawning — The proxy spawns the LS as the antigravity-ls user, so only the standalone LS traffic is intercepted. No impact on other software.
Google SSE parsing — MITM parses streamGenerateContent?alt=sse responses and extracts promptTokenCount, candidatesTokenCount, thoughtsTokenCount.
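
The bundle and spawning steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names `combine_ca_bundles` and `standalone_ls_command` and the LS binary path are hypothetical. It assumes `sudo` resets the environment (the common default), which is why the variable is passed through `env` rather than set on the parent process.

```rust
use std::process::Command;

/// Concatenate the system CA bundle and the MITM CA certificate into one PEM.
/// A newline is inserted if the first bundle lacks a trailing one, so the
/// END and BEGIN markers never share a line.
fn combine_ca_bundles(system_pem: &str, mitm_ca_pem: &str) -> String {
    let mut combined = String::with_capacity(system_pem.len() + mitm_ca_pem.len() + 1);
    combined.push_str(system_pem);
    if !system_pem.is_empty() && !system_pem.ends_with('\n') {
        combined.push('\n');
    }
    combined.push_str(mitm_ca_pem);
    combined
}

/// Build (but do not run) the command that starts the LS as a dedicated user.
/// `ls_binary` is a placeholder path; `env VAR=value` carries SSL_CERT_FILE
/// across the sudo boundary.
fn standalone_ls_command(run_as: &str, ca_bundle: &str, ls_binary: &str) -> Command {
    let mut cmd = Command::new("sudo");
    cmd.arg("-u")
        .arg(run_as)
        .arg("env")
        .arg(format!("SSL_CERT_FILE={}", ca_bundle))
        .arg(ls_binary);
    cmd
}
```

Because the process runs under its own UID, the iptables owner match and the trust-store override both apply to the LS alone.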

Verified: /v1/usage returns per-model token usage from intercepted traffic.
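
To illustrate the SSE parsing step, here is a deliberately naive sketch that pulls the three token counts out of `data:` lines. The `Usage` struct and both function names are hypothetical, and the string scan stands in for a real JSON parser. It also assumes that the last chunk carrying `usageMetadata` holds the final counts.

```rust
#[derive(Debug, Default, PartialEq)]
struct Usage {
    prompt_tokens: u64,
    candidates_tokens: u64,
    thoughts_tokens: u64,
}

/// Find `"key": <digits>` in a JSON fragment and parse the digits.
/// Naive on purpose: it does not handle nesting or escaped strings.
fn find_u64(json: &str, key: &str) -> Option<u64> {
    let needle = format!("\"{}\"", key);
    let start = json.find(&needle)? + needle.len();
    let value = json[start..].trim_start().strip_prefix(':')?.trim_start();
    let digits: String = value.chars().take_while(|c| c.is_ascii_digit()).collect();
    digits.parse().ok()
}

/// Scan an SSE body and keep the counts from the last chunk that
/// contains a `usageMetadata` object. Missing fields default to 0.
fn usage_from_sse(body: &str) -> Option<Usage> {
    let mut latest = None;
    for line in body.lines() {
        let payload = match line.strip_prefix("data:") {
            Some(rest) => rest.trim_start(),
            None => continue,
        };
        if !payload.contains("\"usageMetadata\"") {
            continue;
        }
        latest = Some(Usage {
            prompt_tokens: find_u64(payload, "promptTokenCount").unwrap_or(0),
            candidates_tokens: find_u64(payload, "candidatesTokenCount").unwrap_or(0),
            thoughts_tokens: find_u64(payload, "thoughtsTokenCount").unwrap_or(0),
        });
    }
    latest
}
```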

Polling-Based Cascade Updates

Status: SOLVED (2026-02-14)

StreamCascadeReactiveUpdates is now used for real-time cascade state notifications, with a fallback to timer-based polling if the streaming RPC is unavailable. Reactive diffs also carry progressive response text and thinking content (see docs/panel-stream-investigation.md).
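
The stream-first, poll-on-failure behavior can be pictured with the sketch below. It is an assumption-laden simplification: synchronous closures stand in for the streaming RPC and the polling call, and `Source` and `next_update` are hypothetical names, not identifiers from the codebase.

```rust
/// Where an update came from.
#[derive(Debug, PartialEq)]
enum Source {
    Reactive,
    Polling,
}

/// Try the reactive stream first. If it is absent or returns an error,
/// drop it and fall back to polling for this and all later calls.
fn next_update<T, E, S, P>(stream: &mut Option<S>, poll: P) -> (Source, T)
where
    S: FnMut() -> Result<T, E>,
    P: FnOnce() -> T,
{
    let streamed = match stream.as_mut() {
        Some(recv) => recv().ok(),
        None => None,
    };
    match streamed {
        Some(update) => (Source::Reactive, update),
        None => {
            *stream = None; // mark the streaming RPC as unavailable
            (Source::Polling, poll())
        }
    }
}
```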

StreamCascadePanelReactiveUpdates — Dead End

Status: INVESTIGATED & CLOSED (2026-02-14)

CascadePanelState only contains plan_status and user_settings — not thinking text. The panel reactive component uses a workspace-scoped ID, not cascade IDs. See docs/panel-stream-investigation.md.

Request Modification Not Implemented

Status: SOLVED (2026-02-15)

MitmConfig.modify_requests is now true by default. Used for:

Tool/function call injection into LS requests (Gemini functionDeclarations)
Tool result injection as functionResponse parts
LS bypass when custom tools are active (response captured directly from MITM)

Cascade Correlation Is Heuristic

Status: SOLVED (2026-02-15)

Previously, MITM usage was keyed under _latest because extract_cascade_hint() couldn't parse the chunked-encoded Google SSE request body.

Fix: API handlers now call mitm_store.set_active_cascade(cascade_id) before sending messages. record_usage() falls back to this active cascade ID when the heuristic hint is absent, properly correlating usage to cascades.
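
A minimal sketch of that fallback order is shown here. Only the names `set_active_cascade`, `record_usage`, and `_latest` come from the text above. The fields, signatures, and the choice to keep `_latest` as a last resort are assumptions for illustration, and the real store likely uses shared, synchronized state.

```rust
use std::collections::HashMap;

/// Key used when no cascade can be identified at all (assumed last resort).
const LATEST_KEY: &str = "_latest";

#[derive(Default)]
struct MitmStore {
    active_cascade: Option<String>,
    /// Total tokens recorded per cascade ID (simplified to a single counter).
    usage_by_cascade: HashMap<String, u64>,
}

impl MitmStore {
    /// Called by an API handler before it sends a message.
    fn set_active_cascade(&mut self, cascade_id: &str) {
        self.active_cascade = Some(cascade_id.to_string());
    }

    /// Prefer the heuristic hint, then the active cascade, then `_latest`.
    fn record_usage(&mut self, hint: Option<&str>, total_tokens: u64) {
        let key = hint
            .or(self.active_cascade.as_deref())
            .unwrap_or(LATEST_KEY)
            .to_string();
        *self.usage_by_cascade.entry(key).or_insert(0) += total_tokens;
    }
}
```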

Progressive Thinking Streaming

Status: SOLVED (2026-02-15)

Thinking text now streams progressively as delta events. The implementation:

LS cascade steps — plannerResponse.thinking (field 3) grows progressively as the LS receives data. For Opus 4.6, thinking text builds up word-by-word over ~1-2s. For Gemini Flash, thinking arrives in 1-2 larger chunks.
Delta tracking — last_thinking_len tracks the previously emitted length. Each poll compares current thinking length and emits only the new characters as response.reasoning_summary_text.delta events.
Lifecycle — Structure events (output_item.added, summary_part.added) emit on first thinking appearance. done events emit when response text first appears (indicating thinking phase completed).
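
The delta-tracking step can be sketched as below. `last_thinking_len` is the name used above. The `ThinkingTracker` type and `next_delta` method are hypothetical, and the sketch assumes the thinking text only ever grows by appending.

```rust
#[derive(Default)]
struct ThinkingTracker {
    /// Byte length of the thinking text already emitted.
    last_thinking_len: usize,
}

impl ThinkingTracker {
    /// Return only the newly appended text, or None if nothing new arrived.
    /// `str::get` returns None instead of panicking if the stored offset is
    /// not a character boundary in the new text.
    fn next_delta<'a>(&mut self, thinking: &'a str) -> Option<&'a str> {
        if thinking.len() <= self.last_thinking_len {
            return None;
        }
        let delta = thinking.get(self.last_thinking_len..)?;
        self.last_thinking_len = thinking.len();
        Some(delta)
    }
}
```

Each returned slice would be the payload of one response.reasoning_summary_text.delta event, so a poll that sees no growth emits nothing.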

Verified with Opus 4.6 (2026-02-15 13:22 UTC):

delta_len=24  "The user is asking about"
delta_len=61  " the Collatz conjecture..."
delta_len=5   " This"
delta_len=10  " is a pure"
... (11 progressive deltas over ~850ms)

🟢 Low

1. MITM Integration Tests

Unit tests cover protobuf decoding and intercept parsing (18 tests pass). Integration tests for the full MITM pipeline (TLS interception, response parsing, usage recording) would be valuable now that interception works.

2. MITM for Main Antigravity Session

The current MITM only works for the standalone LS (default mode). Intercepting the main Antigravity session's LS is harder because:

The main LS is managed by the Antigravity app, not by us
UID-scoped iptables can't target it without affecting all user traffic
The mitm-wrapper.sh approach sets env vars, but the LLM client ignores HTTPS_PROXY unless detect_and_use_proxy is ENABLED via init metadata

Workaround: Use standalone mode (default) for all proxy traffic.
