BYOK reality check: 5 things that break in Open Design today
We promised BYOK as first-class. Five open bug threads from this week — Gemini, DeepSeek, OpenCode, Windows — show where the seams are still rough, and what to use until each fix lands.
We’ve been telling people Open Design is BYOK from the ground up. That’s still true. The seed post on the BYOK design workflow walks through the working path — point the daemon at any OpenAI-compatible endpoint, paste your key, you’re done. For the majority of setups, that’s the whole story, and it stays the whole story.
But “BYOK” isn’t a single feature. It’s a contract that reaches into the chat composer, the finalize endpoint, the model picker, the CLI launch path, and the analytics layer. Each of those is a place where the contract can break — and right now several of them are open issues in our public tracker, reported by users in the last 48 hours.
We could have written the seed post and stopped there. Instead, here’s the honesty pass: the threads that came in this week, what breaks, why it breaks, what to do today, and which PR (or roadmap slot) is fixing it. None of these are hidden. They’re filed, labelled, and linked below — and we’d rather you read them from us than discover them mid-deck on a Friday.
The promise vs the bug list
The framing matters, because the easy misread is “BYOK is half-baked.” It isn’t. None of the five below are “BYOK doesn’t work” bugs. Every one of them lives at the boundary between an adapter we own — the OpenAI-compatible layer, the model picker, the finalize path — and one we don’t: the upstream provider’s CLI, their packaging choices, or the host platform’s process model.
That boundary is where reality lives in any open-source CLI orchestrator. We don’t run inference, we don’t ship a forked CLI for each provider, and we don’t wrap everything in a proxy that smooths the edges (and quietly taxes your tokens). The price of that posture is that when a provider’s CLI changes shape, or Windows enforces a limit macOS doesn’t, the seam shows. This week, five of those seams showed at once.
Here are all five, in the order they came in.
Gemini gets lost on the way to “Finish Design”
Issue #1619 — bug, open
Symptom
BYOK is configured for Gemini. The Settings test connection succeeds. The model picker returns Gemini models. Regular chat works — you can hold a full conversation against your own Gemini key with no trouble. But the moment the user hits Finish Design, the daemon throws an Anthropic-shaped error, as if it suddenly forgot which provider it was talking to.
Why it happens
The maintainer reply on the thread confirms it: regular API-mode chat honors the selected Gemini BYOK provider end to end, but Finish Design has not yet been generalized beyond the Anthropic-compatible finalize path. Everything else routes through the provider-aware proxy that knows how to speak each upstream’s dialect. Finish Design still goes through a hardcoded Anthropic finalize endpoint left over from an earlier release — so a Gemini response arriving in a non-Anthropic shape trips the parser.
Workaround
Route Gemini through OpenRouter under the Anthropic-compatible provider slot. The Finish Design path then sees an Anthropic-shaped response coming back from OpenRouter’s shim and finalizes correctly. It’s an extra hop, and you’re paying OpenRouter’s routing rather than calling Gemini directly, but it’s stable today and it covers the one path that’s broken without touching the chat path that already works.
Who’s fixing it
The Finish Design generalization is now on the BYOK roadmap as a P1. No PR yet — this is the next thing the daemon team picks up, and it’s the only one of the five that’s a defect in code we fully own rather than a boundary mismatch.
Gemini 3 Flash dies on Windows before the prompt lands
Issue #1611 — bug, open
Symptom
Gemini 3 Flash Preview fails inside Open Design on Windows with stdin: write EOF after about 1.5 seconds — before the prompt ever reaches the model. Gemini 3 Pro works fine in the exact same install. And the direct Gemini CLI (gemini --model gemini-3-flash-preview ...) succeeds when GEMINI_CLI_TRUST_WORKSPACE=true is set. So it’s not the key, not the account, and not the CLI in isolation.
Why it happens
The diagnosis took two passes, which is worth showing because it’s a good example of how these get untangled. The first read of the reporter’s screenshot looked like an upstream 429 RESOURCE_EXHAUSTED quota error. After a clean PowerShell repro that wrote OD_GEMINI_3_FLASH_OK to stdout, the picture changed: the model is reachable, the CLI is healthy, the failure is on the Open Design → Gemini CLI launch path specifically, and it’s specific to the Flash variant on Windows. Pro takes the same launch path and survives; Flash doesn’t.
Workaround
Select Gemini 3 Pro Preview in the model picker. It runs through the same launch path and works. Separately — and this bit caught more time than the bug itself — check ~/.gemini/hooks/. A slow gsd-check-update.js hook (Hook execution error: Hook timed out after 60000ms) was adding roughly 104s of overhead to every run in this user’s case, completely independent of the Flash failure. Clean your Gemini hooks regardless; a stalled update-check hook will make any agent feel broken.
Who’s fixing it
Flagged as Flash-specific and OD-side. Investigation is in progress on the daemon’s stdin write path — the write EOF says the child’s stdin closed before the daemon finished writing the prompt, so the fix lives in how that particular variant is spawned.
DeepSeek TUI has a 30 KB prompt ceiling on Windows
Issue #1610 — bug, open
Symptom
On the DeepSeek wrapper v0.8.33 in a Windows packaged build, a long composed prompt fails our pre-flight guard with 81397 > 30000 bytes. The user did nothing wrong — they just composed a prompt rich enough (system context, design system, references) to cross 30 KB.
Why it happens
That guard is intentional, and it’s protecting you from a worse error. The DeepSeek TUI adapter currently sends the prompt as a positional command-line argument — argv-bound — and Windows caps the total command line well below where macOS and Linux do. Without the pre-flight check, the same prompt would fail later, deeper in the spawn, with a far less useful ENAMETOOLONG error and no hint that the cause was prompt size. So we fail early and name the number. The mismatch the issue exposes is in the docs: the high-level guidance implies Windows long-prompt fallbacks apply broadly, but the DeepSeek TUI path doesn’t have one yet — its transport is still argv, not stdin or a prompt file.
Workaround
If you’re on Windows with the DeepSeek TUI adapter, keep the composed prompt under 30 KB, or switch to a stdin-based adapter — Claude Code, Codex, and OpenCode all take their prompt over stdin and have no comparable ceiling. On macOS and Linux this issue does not bite at all; the argv limit there is high enough that real-world prompts don’t reach it.
Who’s fixing it
The right fix is a stdin or prompt-file transport for the DeepSeek TUI adapter, which removes the argv ceiling entirely and brings it in line with the stdin adapters. It’s tracked on the adapter team’s queue.
The OpenCode local-CLI test times out before the model warms up
Issue #1603 — bug, priority:p0, open
Symptom
In Settings → BYOK → OpenCode, the connection test reliably times out at 45 seconds. The strange part: if the user first opens OpenCode Desktop’s terminal and attaches a local LLM there, the same Open Design test then succeeds on the next try.
Why it happens
That “open the Desktop terminal first” detail is the whole clue. Open Design doesn’t attach to a running OpenCode Desktop session. For a Settings smoke test, the daemon spawns its own fresh OpenCode CLI subprocess and waits for an ok reply. With a cold local model — one that hasn’t been loaded into memory yet — that first reply can take longer than the 45-second budget, because the model is being read off disk and warmed before it can answer anything. Opening the Desktop terminal and letting it answer one prompt warms the model in the OS cache in a way the daemon’s fresh subprocess can then immediately benefit from. So it isn’t really an OpenCode bug; it’s a cold-start timing assumption that’s wrong for local models.
Workaround
Before testing OpenCode in Open Design, open OpenCode Desktop, attach your local LLM, and let it answer one prompt. Then run the OD connection test — the model is warm and the reply lands inside the budget. As of v0.7.0, the connection-test budget is also configurable, so if your local model is genuinely slow to load you can bump the window instead of warming it by hand.
Who’s fixing it
The daemon-side fix is a longer or configurable warmup window specifically for local-model adapters, so a cold local model doesn’t get judged on the same clock as a hosted API. It’s tracked at p0 — the highest priority of the five, because local-model users are exactly the audience BYOK is meant to serve.
The packaged web app refuses to load over plain HTTP
Issue #1620 — bug, open
Symptom
Slightly different bug, same family. The reporter is running the packaged web app on a LAN IP over plain HTTP, and the page throws on load — it never reaches a usable state.
Why it happens
After PR #1428, the analytics provider and the PDF export nonce started calling crypto.randomUUID() directly, bypassing the tiered helper introduced in PR #900 that falls back gracefully when the secure crypto API isn’t available. Chromium does not expose crypto.randomUUID in non-secure contexts — and a bare LAN IP over plain HTTP is, by Chromium’s definition, not a secure context. So the direct call throws at load time, and the page goes down with it. It isn’t strictly a BYOK bug, but it bites the exact same audience: people running their own infrastructure, often air-gapped, often over plain HTTP because standing up a cert for an internal tool isn’t worth the friction.
Workaround
Serve the web app over HTTPS or over localhost. Both satisfy Chromium’s secure-context requirement — localhost is treated as secure even without a cert — and the page loads normally. For a quick internal setup, localhost is the zero-cost path; for LAN access, a self-signed cert over HTTPS is the durable one.
Who’s fixing it
PR #1621 routes the remaining call sites back through the tiered UUID helper from PR #900, so the secure-context fallback applies everywhere again instead of only where it was already wired. It’s open and under review.
What this actually says about BYOK in Open Design
Read the list as a contract map, not a quality verdict. Four of these five issues sit at adapter boundaries — Gemini’s CLI launch path, DeepSeek’s argv-bound CLI transport, OpenCode’s cold-start launch model, the host platform’s secure-context rules. The fifth, the Finish Design one, is at our own finalize endpoint, where we hardcoded an Anthropic-shaped response a release ago and haven’t generalized it yet. That one’s on us; the other four are the tax you pay for respecting tools you didn’t build.
And that’s the structural point. Every BYOK system that isn’t a rebadged proxy ends up here. You either own the inference — and lose BYOK, because now you’re the one buying tokens and marking them up — or you respect the upstream tools and inherit their edges: their CLIs, their packaging quirks, the platform limits they each handle differently. We picked the second posture deliberately, and we’d pick it again. The cost is weeks that look like this one, where the daemon and adapter teams filed work across five surfaces in two days.
The trade is still right. A working setup on Claude Code, Codex, Cursor, Gemini Pro on macOS, and DeepSeek on Linux — the matrix that covers roughly 90% of our actual users — runs cleanly today, with no proxy tax and no margin on your tokens. The five threads above are what the other 10% of the matrix looks like in mid-May 2026: named, filed, and each with a fix in flight. Honest seams beat a smooth surface that hides where the bill goes.
What to use today (matrix)
This is the practical version of the section above — the same five seams, mapped onto what’s safe to reach for right now. A ✓ means the path works as-is; an ✗ links the issue that’s blocking it, with the workaround in the relevant section.
| Provider | macOS | Linux | Windows | Finish Design path |
|---|---|---|---|---|
| Claude Code (Sonnet / Opus) | ✓ | ✓ | ✓ | native |
| Codex | ✓ | ✓ | ✓ | native |
| Cursor (BYOK) | ✓ | ✓ | ✓ | native |
| Gemini 3 Pro Preview | ✓ | ✓ | ✓ | OpenRouter shim (#1619) |
| Gemini 3 Flash Preview | ✓ | ✓ | ✗ (#1611) | OpenRouter shim (#1619) |
| DeepSeek (API) | ✓ | ✓ | ✓ | OpenRouter shim |
| DeepSeek TUI (long prompts) | ✓ | ✓ | ✗ (#1610) | OpenRouter shim |
| OpenCode (local model) | ✓ | ✓ | ✓ (warm first, #1603) | n/a |
Two reads of this table. If your stack is in the all-✓ block — Claude Code, Codex, Cursor, or Gemini Pro — you’re on the clean path and nothing above changes your day. If you’re on one of the ✗ rows, the matching section has the workaround that gets you running today while the linked fix lands. Either way, subscribe to the BYOK label on the tracker if you want a notification when a row flips from ✗ to ✓.
What to do next
Open Design’s skills library is the working layer underneath all of this — the file-driven contracts the BYOK adapter feeds into once the connection is healthy. The seams above are about getting bytes from your key to the model and back; the skills are what the model actually does with them. If you want to see what a skill consumes from the model and what it doesn’t care about — which is also why these adapter edges don’t change the output, only whether you reach it — that directory is the right place to start.
Related reading
- BYOK design workflow: run Claude, Codex, or Qwen on your own key — the original BYOK explainer, and the working path the five seams above sit at the edge of
- Why we built Open Design as a skill layer, not a product — why we respect upstream tools instead of wrapping them in a proxy, which is the whole reason these boundaries exist
- 31 skills, 72 systems — how the library works — what BYOK actually feeds into once the connection is healthy