Coding Agent Concept Cars

by Martin Monperrus

Concept cars are prototypes built by automakers not to sell, but to explore design directions — to ask “what if?” without committing to mass production. The coding agents I’ve been building fore: each one pushes one idea to its logical extreme, for fun and profit.

The Meta-Circular Agent (Feb 2026)

A meta-circular evaluator is a Lisp interpreter written in Lisp — a program that can process its own source. This agent does the same thing for coding agents. The setup: a specification describes a minimal coding agent. Then ask a coding agent to implement it. The result is a working coding agent. Then give that agent the same specification and ask it to implement itself. It succeeds.

The agent reimplements itself from its own spec. Meta-circularity achieved.

The Seed Agent (June 2026)

A coding agent normally starts with a fixed set of tools: read, write, execute. The seed agent starts with exactly one: create_tool. Its only capability on turn one is to write new Python functions and register them into its own live session.

Given the task “explore the agentknit package and count its public functions,” it first creates a find_module_path tool, then a list_directory tool, then a count_file_lines tool — and only then starts doing the actual work.

The insight: tool creation is itself a tool. An agent that can extend itself needs no prebuilt scaffolding. One meta-tool is enough to reach any toolset. See the trace.

The Self-Reflect Agent

Every agent session is a trajectory: a sequence of tool calls, outputs, and reasoning steps. This agent runs its task, then reads its own trajectory using trajectoriz-cli, reflects on what went wrong or could be tighter, and rewrites its own system prompt. The improved prompt is saved to disk and used in the next session.

The idea is a closed loop: task → trace → reflect → better prompt → repeat. The agent is its own prompt engineer.

The Remote Control Agent

Most coding agents operate locally. This one operates over SSH. Its tools — read_file, write_file, execute_shell — each open an SSH connection, run a command on a remote host, and return the output. The model has no idea it’s not local.

The design question it explores: can you separate “where the model runs” from “where the code runs”? The answer is yes, cleanly. The agent uses a DeepSeek model via an OpenCode endpoint, controls a machine at KTH, and the only thing connecting them is three SSH-wrapper tools.

The Task Difficulty Agent

This agent has exactly two tools: read_file and ask_question. It reads the repository, asks the user clarifying questions, and outputs a single adjective — trivial, easy, moderate, hard, expert — describing how hard the task is.

The purpose: route cheap tasks to cheap models. A task rated “trivial” doesn’t need Claude Opus. A well-calibrated difficulty signal saves money at scale.

The constraint (two tools, one output) is deliberate. Constraints force clarity about what the agent actually needs to know.

The A/B Testing Agent

Two agents, one prompt, two git worktrees, run in parallel. When both finish, you see a side-by-side diff and pick the better solution. Results are logged to classifier_results.jsonl.

The use case: you have a hypothesis that model A handles refactoring better than model B. You run 50 prompts through both, aggregate the wins, and measure alignment with a reference (Claude’s judgment). It’s model evaluation by doing, not by benchmark.

The Async Agent

Standard coding agents are synchronous: call a tool, wait, continue. This agent’s execute_shell_command returns immediately with a tool_exec_id and file paths for stdout/stderr. A separate query_tool_exec tool polls for completion.

This matters for long-running tasks — compilation, test suites, deployments. The model can issue multiple commands in flight, check on them, interleave reasoning. It’s the difference between blocking I/O and async I/O, applied to agent tool calls.

The model also has a plan_shell_command(when) tool that schedules a command for N minutes in the future — a primitive cron, inside the agent.

The Second-Guess Agent

Before every shell command executes, this agent waits two seconds. That pause is not a bug; it’s the design. The operator — human or supervisor LLM — has two seconds to hit Ctrl-C.

Two seconds is roughly the inference time for a small supervisor model to classify the pending command as safe or dangerous. The agent is built for a world where every exec call is observable and cancellable before damage is done.

The Slash Agent

Slash commands (/model, /clear, /usage, /help) are normally operator controls: the human types them. This agent exposes them as structured tool calls the LLM can invoke directly.

The model can switch its own model mid-session, check its own token usage, and clear context when it decides the conversation is getting too long. It is its own session manager.

The Probing Agent

Before any of the above agents can run against an arbitrary model, someone needs to know how that model delivers tool calls — native JSON schema, XML tags, inline JSON, or something else. The probing agent figures this out automatically, caches the result as agent_spec_<model>.json, and shares it across all future sessions.

One probe, one cached spec, any model. It’s the adapter layer that makes all the other agents model-agnostic.


None of these are production systems. They’re design sketches. The seed agent is impractical for most tasks (why start with one tool if you can start with ten?). The second-guess agent adds latency by design. The self-reflect agent assumes the trajectory is useful signal, which isn’t always true.

But concept cars don’t need to be practical. They need to be clear about one idea. Each of these is.

The common substrate is agentknit, a minimal Python library for building tool-calling agents. The probes run against 20+ models across 5 providers without code changes.