BYOK design workflow: run Claude, Codex, or Qwen on your own key
Most AI design tools quietly add a margin to every token you spend. Open Design takes the opposite stance — bring your own model key, pay the provider directly, and keep full control of where inference runs. Here's how the BYOK layer actually works.
If you’ve used a hosted AI design product in 2026, you’ve probably noticed the bill creeping up. A subscription on top of a per-seat charge, layered on top of an inference markup that nobody publishes. The math is opaque on purpose.
Open Design doesn’t run inference. We don’t have a margin on tokens. The entire workflow is built around bring-your-own-key (BYOK) — you point the daemon at any OpenAI-compatible endpoint, paste your own API key, and you’re done.
This post explains why we made that choice, how it works under the hood, and what it actually changes in your day-to-day workflow. If you want the bigger philosophical argument behind it, why we built Open Design as a skill layer, not a product is the companion piece — this one is the hands-on version.
What “BYOK” really means here
There are two definitions of BYOK floating around the AI tooling space, and they’re not the same thing:
- Surface BYOK — the tool lets you paste a key, but still routes inference through their servers, logs your prompts, and may apply rate limits.
- Real BYOK — the tool calls the model provider directly from your machine (or your infrastructure). Your prompts never touch the vendor’s servers. The vendor takes no margin.
Open Design is the second kind. The daemon makes HTTP calls to whichever endpoint you configure, with your key, from your machine. We don’t proxy. We don’t log. We don’t see your prompts.
Where the call actually goes
When the daemon picks up a job, it composes the prompt — pulling in the relevant SKILL.md and DESIGN.md files for the task — and then makes a single HTTP request to the base URL you set. The response streams back to your machine, the agent writes the artifact to disk, and that’s the whole loop. There is no Open Design server in the path. The same daemon that discovers your skills also owns the network call, which is why “where does this run?” is a setting and not a sales conversation.
The OpenAI-compatible adapter
Most AI inference endpoints in 2026 speak the OpenAI Chat Completions API. We use that as the lowest-common-denominator protocol. If your provider speaks it (and almost all of them do), you’re supported by default — no plugin, no per-provider integration to wait on.
Providers you can point it at
| Provider | Typical base URL shape | Good for |
|---|---|---|
| OpenAI | https://api.openai.com/v1 | gpt-image-2, gpt-5.x, strongest general passes |
| Anthropic | OpenAI compat shim, or the dedicated Claude adapter | taste-heavy refinement, long briefs |
| DeepSeek | https://api.deepseek.com/v1 | cost-efficient long-context drafting |
| Groq | provider base URL | low-latency draft cycles |
| OpenRouter | https://openrouter.ai/api/v1 | any frontier model, one billing relationship |
| Self-hosted vLLM / TGI / Ollama | your own host, e.g. http://localhost:11434/v1 | fully local, client-confidential work |
| Qwen / Kimi / Hermes | provider base URL | regional models with OAI-compatible endpoints |
The list isn’t a hard-coded allowlist — it’s just where people commonly land. Anything that answers the Chat Completions shape works.
Two fields, then restart
Configuration is two fields:
OPENAI_BASE_URL=https://api.deepseek.com/v1
OPENAI_API_KEY=sk-…
Drop them in .env.local, restart the daemon, and you’re on a different model. Switching to a local Ollama box for a sensitive project is the same two lines:
OPENAI_BASE_URL=http://localhost:11434/v1
OPENAI_API_KEY=ollama
There’s no model registry to update, no account to re-link, no migration. The key and the endpoint are the entire surface.
Why this matters for design work
Design workflows have a specific cost shape that hosted-inference products are bad at:
- Iteration is the unit of work. A real design pass means 30–50 prompt cycles, not three. Hosted plans throttle hard at the 50-cycle mark.
- Long context is the norm. A serious brief involves brand documents, prior work, system specs, and reference imagery. That context blows past the token budgets in hosted UIs.
- Model choice should be ad-hoc. Some passes want a fast cheap model. Some want the strongest available. Some want a local model for sensitive content. A hosted product picks one for you.
BYOK fixes all three. You pay per token, you choose the model, you don’t get throttled.
Iteration stops being rationed
This is the one that quietly changes how you work. When every extra cycle is metered against a plan, you start self-censoring — you take the third draft because the fourth feels expensive. On BYOK the marginal cost of one more pass is a few cents at the model provider, so the decision goes back to being about the work, not the meter. The third draft is usually where the design gets good; a tool that taxes iteration is taxing the exact step that matters.
What about cost?
A common worry: “If I’m paying directly, won’t it be more expensive?”
In practice, no. Here’s a typical day of design work in our internal usage:
| Task | Tokens | Provider | Cost |
|---|---|---|---|
| Brief intake (3 docs) | 30K input | Claude Sonnet | $0.09 |
| First draft pass | 80K input + 20K output | Claude Sonnet | $0.54 |
| 5 iteration cycles | 250K input + 80K output | Claude Sonnet | $1.95 |
| Final polish | 50K input + 30K output | Claude Opus (one pass) | $1.35 |
| Day total | ~$3.93 |
That’s a deck, two landing variants, and a brand exploration. The hosted equivalent — assuming a $30/month “creator” plan with overage charges — would run about $50 for the same work, give you fewer iterations, and lock you to one model.
If you want to go cheaper, swap Claude Sonnet for DeepSeek V3.2 and the day drops under $1. The point isn’t that one model is right — it’s that the price/quality dial is in your hands rather than baked into a subscription tier.
Privacy and compliance
There’s a second reason BYOK matters: the prompts contain your client’s brand.
Hosted inference means routing brand documents, unannounced product names, internal pricing, and pre-launch creative through a third party’s servers. Most companies have an opinion about that. Some have a contract about it.
With BYOK, the prompt round-trip is between your laptop and the model provider you’ve already vetted (or self-hosted). Open Design is not in the loop. We have no log to subpoena, no breach surface to leak from, no audit gap to explain.
What “no log” buys you in practice
For agency work, regulated industries, or anything pre-launch, this is the only stance that holds up. If a security review asks “where do our brand assets go?”, the answer is “to the model provider in our contract, and nowhere else” — not “to a vendor dashboard we don’t control.” Self-hosting an Ollama or vLLM endpoint tightens it further: the bytes never leave your network at all. This is the same trade-off explored in the BYOK reality check, which is honest about where the rough edges still are — local models and frontier models are not interchangeable on taste, and you own the prompt-injection surface yourself.
How to switch providers mid-project
One of the underrated benefits of BYOK is provider arbitrage during a project:
- Drafting — use a cheap model (DeepSeek V3.2, Qwen 3) on the question form and first iteration
- Refinement — switch to Claude Sonnet or GPT-5 for the middle passes where taste matters
- Sensitive content — swap to a local Ollama model for client-confidential prompts
- Final polish — burn one pass on the strongest model available (Opus, GPT-5 Pro)
In Open Design, switching is editing two lines in .env.local. There’s no migration, no re-onboarding, no plan upgrade.
A worked routing for one brief
Concretely, a single landing-page brief might run like this. For the draft and first iterations — cheap and fast — point at a low-cost provider:
OPENAI_BASE_URL=https://api.deepseek.com/v1
OPENAI_API_KEY=sk-…
Then, for the passes where taste decides the outcome, switch to a stronger model (via the OpenAI-compat shim):
OPENAI_BASE_URL=https://api.anthropic.com/v1 # via the compat shim
OPENAI_API_KEY=sk-ant-…
Same skills, same design system on disk, same artifacts — only the engine behind the workflow changed. Because skills and systems are just files (SKILL.md and DESIGN.md), nothing about your setup is tied to a particular model. This is what owning the workflow actually means: the tool gets out of the way, and the model is a parameter you change as the brief demands.
Try it
Clone the repo, set OPENAI_BASE_URL and OPENAI_API_KEY in .env.local, run pnpm tools-dev. The daemon will use whatever endpoint you point it at, with whatever model you pay for, on whatever schedule you want.
That’s the entire BYOK story. There’s no special tier, no upgrade flow, no billing relationship with us. You pay the model provider, you keep your keys, you keep your prompts. We provide the layer.
Related reading
- Why we built Open Design as a skill layer, not a product — the bet behind shipping a thin layer instead of a hosted app
- The BYOK reality check: 5 things that break — the honest trade-offs and rough edges of bringing your own key
- 31 skills, 72 systems: how the Open Design library works — the
SKILL.md/DESIGN.mdfiles that stay constant no matter which model you run