fix: use hardcoded default base URL for report_url fallback

The SCHWAB_MCP_BASE_URL env var may not be set in all deployment environments. Default to https://schwab-mcp.ext.ben.io so the report_url enrichment works out of the box.
fix: resolve report_url=None for blob-URL tickers and fix dataclass serialization
2026-05-21 14:55:42 +00:00 · 2026-05-21 14:46:08 +00:00 · 2026-05-20 15:09:37 -05:00 · 2026-05-20 13:36:43 -05:00 · 2026-05-04 14:36:52 +00:00 · 2026-05-04 14:31:01 +00:00
5 changed files with 277 additions and 19 deletions
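
The base-URL fallback described in the first commit message can be sketched in isolation as below. This is a minimal illustration, not the committed code: `DEFAULT_BASE_URL` and `build_report_url` are hypothetical names, and the `env` parameter exists only to make the function easy to exercise without touching the real environment. Unlike the diff's `os.getenv(name, default)`, the sketch uses `or`, so an empty-string value also falls back to the default; that is a choice made here, not the behaviour in `server.py`.

```python
import os
from typing import Mapping, Optional

# Hypothetical constant; the value mirrors the default named in the commit message.
DEFAULT_BASE_URL = "https://schwab-mcp.ext.ben.io"


def build_report_url(ticker: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the cached-PDF URL for a ticker, preferring the env override."""
    env = os.environ if env is None else env
    # `or` treats an unset AND an empty variable as "not configured".
    base = (env.get("SCHWAB_MCP_BASE_URL") or DEFAULT_BASE_URL).rstrip("/")
    return f"{base}/reports/{ticker.upper()}/pdf"


default_url = build_report_url("pep", {})
override_url = build_report_url("br", {"SCHWAB_MCP_BASE_URL": "http://localhost:8000/"})
```

Stripping the trailing slash before joining avoids a double slash when the override is written as `http://localhost:8000/`.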
--- a/.gitea/workflows/build.yaml
+++ b/.gitea/workflows/build.yaml
@@ -5,6 +5,7 @@ on:
    branches:
      - main
      - master
+  workflow_dispatch:

 jobs:
  build:
@@ -15,12 +16,16 @@ jobs:
      - name: Checkout
        uses: actions/checkout@v3

-      - name: Checkout schwab-scraper
-        uses: actions/checkout@v3
-        with:
-          repository: b3nw/schwab-scraper
-          path: vendor/schwab-scraper
-          token: ${{ secrets.CR_PAT }}
+      - name: Clone schwab-scraper
+        env:
+          CLONE_TOKEN: ${{ secrets.CRT_READ_ONLY }}
+        run: |
+          mkdir -p vendor
+          git clone --depth=1 --branch main \
+            "https://x-access-token:${CLONE_TOKEN}@gitea.ext.ben.io/b3nw/schwab-scraper.git" \
+            vendor/schwab-scraper
+          git -C vendor/schwab-scraper rev-parse HEAD > vendor/schwab-scraper-commit.txt
+          echo "${{ gitea.sha }}" > vendor/mcp-server-commit.txt

      - name: Login to Gitea Container Registry
        uses: docker/login-action@v2
--- a/3
+++ b/3
@@ -17,6 +17,9 @@ RUN uv venv && \
    uv pip install --upgrade playwright && \
    rm -rf /tmp/schwab-scraper

+COPY vendor/schwab-scraper-commit.txt /app/schwab-scraper-commit.txt
+COPY vendor/mcp-server-commit.txt /app/mcp-server-commit.txt
+
 COPY . .

 FROM python:3.12-slim-bookworm
--- a/compose.yaml
+++ b/compose.yaml
@@ -21,7 +21,8 @@ services:
          memory: 128M
          cpus: '0.1'
    environment:
-      - SCHWAB_PLAYWRIGHT_URL=ws://schwab-browser:3000/playwright/chromium
+      - SCHWAB_PLAYWRIGHT_URL=ws://browser.local.ben.io:3000/playwright/chromium?timeout=300000
+      - SCHWAB_MCP_BASE_URL=https://schwab-mcp.ext.ben.io
      - PORT=8000
    volumes:
      - ./cookies.json:/app/cookies.json
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,6 +1,6 @@
 [project]
 name = "schwab-mcp-custom"
-version = "0.2.0"
+version = "0.2.1"
 description = "MCP server wrapping schwab-scraper"
 readme = "README.md"
 requires-python = ">=3.12"
--- a/server.py
+++ b/server.py
@@ -1,16 +1,141 @@
+import dataclasses
+import io
 import json
 import logging
 import os
+import sys
 import time
+from contextlib import contextmanager
 from typing import Optional, Any, Tuple

 from fastmcp import FastMCP
 from starlette.applications import Starlette
-from starlette.responses import JSONResponse
+from starlette.responses import JSONResponse, Response
 from starlette.routing import Route, Mount
 import uvicorn

 import schwab_scraper.unified_api as api
+from schwab_scraper.storage.cache import read_cached_pdf
+
+
+# ---------------------------------------------------------------------------
+# Configure logging so it actually reaches stderr (visible in docker logs).
+# The scraper and MCP libraries log extensively but don't set up handlers
+# when imported as a module, so messages are silently dropped.
+# ---------------------------------------------------------------------------
+logging.basicConfig(
+    level=logging.INFO,
+    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
+    stream=sys.stderr,
+)
+
+# Ensure the scraper logger propagates to our root handler
+_scraper_logger = logging.getLogger("schwab_scraper")
+_scraper_logger.setLevel(logging.DEBUG if os.getenv("SCHWAB_DEBUG", "").lower() in ("1", "true") else logging.INFO)
+_scraper_logger.propagate = True
+
+_startup_logger = logging.getLogger("schwab_mcp_custom")
+
+
+def _read_commit_file(path: str) -> str | None:
+    try:
+        with open(path) as f:
+            return f.read().strip() or None
+    except FileNotFoundError:
+        return None
+
+
+_scraper_commit = _read_commit_file(
+    os.path.join(os.path.dirname(__file__), "schwab-scraper-commit.txt")
+)
+_mcp_commit = _read_commit_file(
+    os.path.join(os.path.dirname(__file__), "mcp-server-commit.txt")
+)
+
+if _scraper_commit:
+    _startup_logger.info("schwab-scraper commit: %s", _scraper_commit)
+else:
+    _startup_logger.info("schwab-scraper commit: (not available)")
+
+if _mcp_commit:
+    _startup_logger.info("mcp-server commit: %s", _mcp_commit)
+else:
+    _startup_logger.info("mcp-server commit: (not available)")
+
+try:
+    from importlib.metadata import version as _pkg_version
+
+    _startup_logger.info("schwab-scraper package version: %s", _pkg_version("schwab-scraper"))
+except Exception:
+    _startup_logger.info("schwab-scraper package version: (unknown)")
+
+_DEFAULT_BASE_URL = "https://schwab-mcp.ext.ben.io"
+
+
+# ---------------------------------------------------------------------------
+# Log capture helper — captures scraper logs to a string buffer AND tees
+# them to stderr so they remain visible in docker logs.
+# ---------------------------------------------------------------------------
+class _TeeHandler(logging.StreamHandler):
+    """Handler that copies every record to a secondary (StringIO) buffer."""
+
+    def __init__(self, stream, extra_buf: io.StringIO, level=logging.NOTSET):
+        super().__init__(stream)
+        self.extra_buf = extra_buf
+        self.tee_level = level
+
+    def emit(self, record):
+        super().emit(record)
+        if record.levelno >= self.tee_level:
+            try:
+                msg = self.format(record)
+                self.extra_buf.write(msg + "\n")
+                self.extra_buf.flush()
+            except Exception:
+                pass
+
+
+@contextmanager
+def capture_logs(logger_name: str = "schwab_scraper", level: int = logging.DEBUG):
+    """
+    Context manager that captures log output to a string buffer
+    while still writing to stderr (docker-visible).
+
+    Yields the buffer so callers can read captured logs after the block.
+    """
+    logger = logging.getLogger(logger_name)
+    old_level = logger.level
+    if old_level > level:
+        logger.setLevel(level)
+
+    buf = io.StringIO()
+    handler = _TeeHandler(sys.stderr, buf, level=level)
+    handler.setLevel(level)
+    handler.setFormatter(logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s"))
+    logger.addHandler(handler)
+
+    # Also lower the root logger level; this only affects loggers that inherit their level from root
+    root_old_level = logging.getLogger().level
+    if root_old_level > level:
+        logging.getLogger().setLevel(level)
+
+    try:
+        yield buf
+    finally:
+        logger.removeHandler(handler)
+        if old_level != logger.level:
+            logger.setLevel(old_level)
+        if root_old_level != logging.getLogger().level:
+            logging.getLogger().setLevel(root_old_level)
+
+
+def _enrich_with_logs(result: dict, log_buffer: io.StringIO, debug: bool) -> dict:
+    """Attach captured logs to a result dict when debug=True or on error."""
+    logs = log_buffer.getvalue()
+    if logs and (debug or not result.get("success", False)):
+        result["logs"] = logs
+    return result
+

 # ---------------------------------------------------------------------------
 # Monkey-patch mcp.shared.session.RequestResponder to work around a
@@ -115,18 +240,25 @@ login_manager = LoginManager()
 mcp = FastMCP("SchwabScraper")


+def _json_default(obj: Any) -> Any:
+    """JSON fallback handler that converts dataclasses to dicts before str()."""
+    if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
+        return dataclasses.asdict(obj)
+    return str(obj)
+
+
 def serialize(obj: Any) -> str:
    """Safely serialize Pydantic models or dataclasses to JSON string."""
    if hasattr(obj, "model_dump_json"):
        return obj.model_dump_json()
    elif hasattr(obj, "model_dump"):
-        return json.dumps(obj.model_dump(), default=str)
+        return json.dumps(obj.model_dump(), default=_json_default)
    elif isinstance(obj, list):
        return json.dumps([
            o.model_dump() if hasattr(o, "model_dump") else o
            for o in obj
-        ], default=str)
-    return json.dumps(obj, default=str)
+        ], default=_json_default)
+    return json.dumps(obj, default=_json_default)


 # ---------------------------------------------------------------------------
@@ -177,9 +309,71 @@ async def login(
            "data": None,
        })

-    result = await api.login(username=username, password=password, debug=debug)
-    success = result.get("success", False)
-    login_manager.record_attempt(success)
+    mcp_logger = logging.getLogger("schwab_mcp_custom")
+    mcp_logger.info("=== LOGIN TOOL CALLED ===")
+    mcp_logger.info(f"debug={debug}, username_provided={bool(username)}, password_provided={bool(password)}")
+
+    # Diagnostic: if credentials not provided, show what config path would be used
+    if not username or not password:
+        from schwab_scraper.core.config import get_config_path
+        config_path = get_config_path()
+        config_exists = os.path.exists(config_path)
+        mcp_logger.info(f"Config fallback: path={config_path}, exists={config_exists}")
+
+    with capture_logs(level=logging.DEBUG if debug else logging.INFO) as log_buf:
+        mcp_logger.info("capture_logs context entered")
+        if debug:
+            mcp_logger.info("DEBUG MODE ENABLED — verbose logging active")
+
+        # api.login does not exist in unified_api; call the underlying scraper directly
+        from schwab_scraper.browser.auth import login_to_schwab
+        from schwab_scraper.core.config import get_schwab_credentials, load_config
+
+        if not username or not password:
+            config = load_config()
+            username, password = get_schwab_credentials(config)
+
+        if not username or not password:
+            result = {
+                "success": False,
+                "error": "Username and password are required (or set in config.json)",
+                "error_type": "AUTHENTICATION",
+                "retryable": False,
+                "data": None,
+            }
+        else:
+            try:
+                cookies = await login_to_schwab(username, password)
+                if cookies:
+                    result = {
+                        "success": True,
+                        "data": {"cookies_count": len(cookies)},
+                        "error": None,
+                        "error_type": None,
+                        "retryable": False,
+                    }
+                else:
+                    result = {
+                        "success": False,
+                        "error": "Login failed — no cookies returned. Check credentials or 2FA status.",
+                        "error_type": "AUTHENTICATION",
+                        "retryable": True,
+                        "data": None,
+                    }
+            except Exception as exc:
+                result = {
+                    "success": False,
+                    "error": str(exc),
+                    "error_type": "UNKNOWN",
+                    "retryable": True,
+                    "data": None,
+                }
+
+        success = result.get("success", False)
+        login_manager.record_attempt(success)
+        mcp_logger.info(f"login completed — success={success}")
+        result = _enrich_with_logs(result, log_buf, debug)
+    mcp_logger.info("capture_logs context exited, returning result")
    return serialize(result)


@@ -190,7 +384,9 @@ async def refresh_session(debug: bool = False) -> str:
    Args:
        debug: Enable debug logging
    """
-    result = await api.refresh_session(debug=debug)
+    with capture_logs(level=logging.DEBUG if debug else logging.INFO) as log_buf:
+        result = await api.refresh_session(debug=debug)
+        result = _enrich_with_logs(result, log_buf, debug)
    return serialize(result)


@@ -272,6 +468,20 @@ async def get_morningstar_data(ticker: str, debug: bool = False) -> str:
        debug: Enable debug logging
    """
    result = await api.get_morningstar_data(ticker, debug=debug)
+
+    # When the scraper used blob URLs (modern Schwab web components), report_url
+    # is None even though the PDF was downloaded and parsed successfully.  Point
+    # callers at the MCP server's cached-PDF endpoint instead.
+    if (
+        isinstance(result, dict)
+        and result.get("success")
+        and result.get("data") is not None
+    ):
+        data = result["data"]
+        if hasattr(data, "report_url") and data.report_url is None and data.source is not None:
+            base = os.getenv("SCHWAB_MCP_BASE_URL", _DEFAULT_BASE_URL).rstrip("/")
+            data.report_url = f"{base}/reports/{ticker.upper()}/pdf"
+
    return serialize(result)


@@ -284,9 +494,31 @@ async def upload_cookies(cookies_json: str) -> str:
    """
    try:
        cookies = json.loads(cookies_json)
-        with open("cookies.json", "w") as f:
-            json.dump(cookies, f)
-        return json.dumps({"status": "success", "message": "cookies.json updated successfully"})
+
+        # Some browser extensions wrap cookies in an object (e.g. {"cookies": [...]})
+        if isinstance(cookies, dict):
+            if "cookies" in cookies:
+                cookies = cookies["cookies"]
+            else:
+                return json.dumps({
+                    "status": "error",
+                    "message": "Expected a list of cookies or an object with a 'cookies' key",
+                })
+
+        if not isinstance(cookies, list):
+            return json.dumps({
+                "status": "error",
+                "message": f"Expected a list of cookies, got {type(cookies).__name__}",
+            })
+
+        from schwab_scraper.core.config import get_cookies_path
+        cookies_path = get_cookies_path()
+        with open(cookies_path, "w") as f:
+            json.dump(cookies, f, indent=2)
+        return json.dumps({
+            "status": "success",
+            "message": f"{cookies_path} updated with {len(cookies)} cookies",
+        })
    except Exception as e:
        return json.dumps({"status": "error", "message": str(e)})

@@ -320,10 +552,27 @@ async def health(request):
    return JSONResponse({"status": "ok"})


+async def serve_report_pdf(request):
+    """Serve a cached Morningstar report PDF by ticker."""
+    ticker = request.path_params["ticker"].upper()
+    pdf_bytes = read_cached_pdf(ticker)
+    if not pdf_bytes:
+        return JSONResponse(
+            {"error": f"No cached report for {ticker}. Call get_morningstar_data first."},
+            status_code=404,
+        )
+    return Response(
+        pdf_bytes,
+        media_type="application/pdf",
+        headers={"Content-Disposition": f'inline; filename="{ticker}_morningstar.pdf"'},
+    )
+
+
 mcp_app = mcp.http_app()
 app = Starlette(
    routes=[
        Route("/health", health),
+        Route("/reports/{ticker}/pdf", serve_report_pdf),
        Mount("/", app=mcp_app),
    ],
    lifespan=mcp_app.lifespan,
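
The effect of the `_json_default` change in the diff above can be seen with a small standalone comparison. The `Report` dataclass and the envelope below are hypothetical stand-ins for the scraper's real payload types; only `dataclasses.is_dataclass`, `dataclasses.asdict`, and the `default=` hook of `json.dumps` are standard-library behaviour.

```python
import dataclasses
import json
from typing import Any, Optional


@dataclasses.dataclass
class Report:  # hypothetical payload type, for illustration only
    ticker: str
    report_url: Optional[str] = None


def json_default(obj: Any) -> Any:
    """Convert dataclass instances to dicts; fall back to str() otherwise."""
    # is_dataclass() is also true for the class itself, so exclude types.
    if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
        return dataclasses.asdict(obj)
    return str(obj)


envelope = {"success": True, "data": Report("PEP")}

# default=str turns the dataclass into its Python repr, embedded as a string.
as_repr = json.loads(json.dumps(envelope, default=str))

# The dataclass-aware fallback yields a nested JSON object instead.
as_object = json.loads(json.dumps(envelope, default=json_default))
```

With `default=str`, a consumer receives `data` as an opaque string and cannot read `report_url`; with the dataclass-aware fallback, `data` is a regular object with `ticker` and `report_url` keys.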
Author	SHA1	Message	Date
b3nw	b06fc47d29	fix: use hardcoded default base URL for report_url fallback [CI: build (push) successful in 43s] The SCHWAB_MCP_BASE_URL env var may not be set in all deployment environments. Default to https://schwab-mcp.ext.ben.io so the report_url enrichment works out of the box.	2026-05-21 14:55:42 +00:00
b3nw	27d1e2be10	fix: resolve report_url=None for blob-URL tickers and fix dataclass serialization [CI: build (push) successful in 1m4s] When Schwab uses modern blob URLs (increasingly common), find_report() returns __CLICK_TO_OPEN__ and the scraper skips storing a report_url even though the PDF downloads and parses successfully. This caused agents to see report_url=None for tickers like PEP/BR/DPZ/MSCI/BMI. Changes: - Fix serialize() to use dataclasses.asdict() instead of str() for dataclass payloads, producing proper JSON objects instead of Python repr strings - Add /reports/{ticker}/pdf endpoint to serve cached Morningstar PDFs - Enrich report_url with the MCP's own PDF endpoint when blob URLs were used and the report was successfully downloaded - Add SCHWAB_MCP_BASE_URL env var to compose for self-referential URLs	2026-05-21 14:46:08 +00:00
b3nw	0e048a1e08	ci: keep schwab-mcp rebuilds dispatch-only [CI: build (push) successful in 40s]	2026-05-20 15:09:37 -05:00
b3nw	7550e39add	ci: add manual and scheduled image rebuilds [CI: build (push) successful in 1m7s]	2026-05-20 13:36:43 -05:00
b3nw	cc1226defe	fix(ci): use git -C instead of cd to avoid breaking working directory [CI: build (push) successful in 42s] The previous cd vendor/schwab-scraper caused the subsequent echo to write mcp-server-commit.txt into the wrong path, which made the CI build fail.	2026-05-04 14:36:52 +00:00
b3nw	4982b7d09f	feat: log schwab-scraper and mcp-server commit SHAs at container startup [CI: build (push) failing after 39s] Bake commit SHAs into the Docker image via CI and log them on server startup so it's easy to verify which version of schwab-scraper is running.	2026-05-04 14:31:01 +00:00
b3nw	8c196b7f65	fix(server): repair login tool and harden upload_cookies [CI: build (push) successful in 38s] - login tool was calling api.login() which did not exist in unified_api, causing AttributeError on every invocation. Now calls login_to_schwab directly with proper credential fallback to config.json. - upload_cookies hardcoded 'cookies.json' instead of get_cookies_path(), and did not handle wrapped export formats ({cookies: [...]}). Both fixed. - Result envelopes now match the standard {success, data, error, error_type, retryable} shape used by other tools.	2026-04-28 04:15:18 +00:00
b3nw	9f799ee264	feat(logging): trace credential source and config path in login tool [CI: build (push) successful in 39s] Add diagnostic logging to the MCP login tool handler: - Log whether username/password were provided explicitly - If falling back to config, log the resolved config path and whether it exists - This complements upstream scraper v0.6.18 credential diagnostics Bumps version to 0.2.1.	2026-04-28 02:52:09 +00:00
b3nw	d28b9d32f6	test(option-a): point SCHWAB_PLAYWRIGHT_URL to CLI's browserless endpoint [CI: build (push) successful in 38s] Temporarily switch from the local schwab-browser sidecar to the browserless endpoint used by the working CLI (browser.local.ben.io). This tests whether /assert 403 is caused by browser environment drift.	2026-04-28 02:39:20 +00:00
b3nw	f51e61b8d7	fix(logging): configure stderr logging + tee capture, add debug confirmation [CI: build (push) successful in 37s] - Set up logging.basicConfig() at module load so scraper logs reach stderr (visible in docker logs instead of silently dropped) - Replace StringIO-only capture with TeeHandler that writes to BOTH stderr and the StringIO buffer, so logs remain visible in docker while also being returned in tool responses - Add explicit 'LOGIN TOOL CALLED' and 'DEBUG MODE ENABLED' log lines at the start of the login tool so users can verify logging is active	2026-04-28 02:16:31 +00:00
b3nw	1999392df7	fix(mcp): capture scraper logs and return them in tool responses [CI: build (push) successful in 38s] Scraper debug output goes to stderr which is invisible in MCP stdio mode. Add capture_logs context manager that attaches a StringIO handler to the schwab_scraper logger during tool execution, then includes captured logs in the response envelope when debug=True or on failure. Applied to login() and refresh_session() which are the critical paths for authentication diagnostics.	2026-04-28 02:04:58 +00:00
b3nw	0c23b0e261	fix(ci): use CRT_READ_ONLY for cross-repo clone [CI: build (push) successful in 41s] actions/checkout@v3's Basic auth header pattern fails with 403 when accessing a different private repository. Switch to a plain git clone with the CRT_READ_ONLY token embedded in the HTTPS URL.	2026-04-28 01:40:42 +00:00
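
The tee-style log capture described in the logging commits above (write to stderr and to an in-memory buffer at once) can be sketched as follows. This is a simplified illustration under assumptions, not the code from `server.py`: `TeeHandler`, `capture`, and the `demo_scraper` logger name are hypothetical, and the sketch omits the per-record level filter and the error swallowing that the diff's `_TeeHandler` includes.

```python
import io
import logging
import sys
from contextlib import contextmanager


class TeeHandler(logging.StreamHandler):
    """Write each record to a stream and copy the formatted line to a buffer."""

    def __init__(self, stream, buffer: io.StringIO):
        super().__init__(stream)
        self.buffer = buffer

    def emit(self, record: logging.LogRecord) -> None:
        super().emit(record)  # normal stream output (stderr here)
        self.buffer.write(self.format(record) + "\n")


@contextmanager
def capture(logger_name: str, level: int = logging.INFO):
    """Temporarily attach a TeeHandler and yield the buffer it fills."""
    logger = logging.getLogger(logger_name)
    old_level = logger.level
    logger.setLevel(level)
    buf = io.StringIO()
    handler = TeeHandler(sys.stderr, buf)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    try:
        yield buf
    finally:
        # Always detach the handler and restore the level, even on error.
        logger.removeHandler(handler)
        logger.setLevel(old_level)


with capture("demo_scraper") as buf:
    logging.getLogger("demo_scraper").info("session refreshed")

captured = buf.getvalue()
```

Because the handler is removed in the `finally` block, repeated tool calls do not accumulate handlers, and the buffer can still be read after the `with` block exits.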