case study · 04

Dealer Tools

Tool calls before tool calls had a name.

While working at ACME Autos we shipped one of the first AI-enabled customer support agents, before agents were a thing. Tool calling was hand-coded, pre-harnesses, pre-SDKs, pre-MCP. The case study below runs a hardened, period-styled version. Talk to it. Try to break it. (You're welcome to. It won't.)

Dealer Tools illustration

~ what shipped ~

Three numbers that matter.

metric · 01

Pre-spec tools

Function calling shipped at OpenAI in mid-2023. The dealer-chat work was already in production using a hand-rolled XML-tag protocol that did the same job: structured tool descriptions, structured argument parsing, structured tool results. When the official spec landed, migration was a renaming exercise.

metric · 02

1 tool, hardened

Inventory lookup. Single function, narrow contract, well-tested. The first principle of useful tool calling: fewer tools, sharper boundaries, more reliable behavior. The demo on this page exposes exactly that one tool, scoped to tongue-in-cheek answers.

metric · 03

Adversarial-tested

The live demo is hardened against prompt injection, code-extraction, off-topic abuse, role-play takeovers, and rate-limit attacks. Per-IP minute and hour caps, per-message length caps, session token budget, refusal patterns for known abuse shapes. Try to make it break character.

The 2023 build

ACME Autos (a major auto-dealer platform) wanted AI in the consumer-facing chat. The market opening was real: a customer browsing inventory at 11pm has no salesperson available. The risk was just as real: one bad answer in front of a brand puts the dealer in an awkward seat.

OpenAI didn't ship structured tool calling until later that year, so we built it ourselves. A small protocol of XML-tagged tool descriptions in the system prompt, structured argument parsing on the response, and a tool runner that executed inventory lookups against the dealer's actual stock feed. When the OpenAI spec landed, the migration to the official format was effectively a rename.

The discipline that mattered

What made it work wasn't the model. The discipline was: one tool, narrow boundary, ruthless refusal posture for everything else. Wide-open chat surfaces became liabilities; chat surfaces with one well-scoped tool became products.

The demo, hardened

Below is a faithful re-implementation of that pattern, on a 2026 stack (small modern model, modern hosting), with the same posture. Talk to it. Ask about a car, ask for a trade-in valuation, try to convince it to write you a Python script, try to override its system prompt. The refusal patterns and the rate limits are real defenses, not theater.

ACME AUTOS · LIVE CHAT v1.0 · 2023

online

ACME-CHAT-1.0

Welcome to the live demo of the dealer chat I built back in 2023. Hand-rolled tool calling, GPT-3.5 under the hood (this version runs on a small modern model, but the shape is the same). Try a make/model, or ask about your trade-in.

per-IP rate limits · prompt-injection guards · session token cap 0 / 800

What's running under it

↳ build for the brand, not for the lab. The one-tool surface beats the everything-tool surface every time the boss reads the chat logs.

~ on the workbench ~

The tooling.

~ counterfactual ~

What would have been worse.

Without the constrained tool-calling discipline: a chat surface attached to a brand-name dealer site becomes a free LLM playground. Customers ask for code samples, recipe ideas, political opinions, and competitors' inventory. The brand wears every one of those answers. The wide model gets narrowed by careful prompt design, deterministic tool boundaries, and a refusal posture that stays in voice. The site becomes a dealer chat that talks about cars and only cars.

~ got something like this on the bench? ~

Pull the cord.

Start the conversation

/ dealer-tools / built by hand / shipped to a working URL /