12 min read

AI Development

Why Prompt Engineering Doesn’t Scale (And What Does)

Every developer has a prompt graveyard. A folder of one-off requests that worked once and never again. Here’s why prompt engineering is a dead end—and what actually scales.

You’ve got a prompt that works.

Maybe it took you an hour to get right. Maybe you found it on Twitter. Maybe you refined it over weeks until Claude or Cursor finally produces code you don’t hate.

And then you start a new project.

Or a teammate asks how you got such good results.

Or you switch from Cursor to Claude Code.

And suddenly that perfect prompt is useless. You’re starting over. Again.


The Prompt Graveyard

Every developer I know has one. A notes file, a Notion page, a folder of .md files filled with prompts that worked once. Prompts for generating tests. Prompts for code review. Prompts for that specific thing you do with TypeScript.

The graveyard grows. The prompts rot.

They rot because prompts are ephemeral. They assume context that no longer exists. They reference patterns you’ve since abandoned. They were tuned for a specific model version that’s now outdated.

You don’t maintain prompts. You abandon them and write new ones.

This is the dirty secret of “prompt engineering”: it’s not engineering at all. It’s guessing. It’s trial and error with no version control, no tests, no way to know if it still works.


“Just Write Better Prompts” Doesn’t Work

I’ve heard this advice a hundred times. Be more specific. Add examples. Use system prompts. Chain your requests.

Fine. But then what?

You write a better prompt. It works. Now:

  • How do you share it with your team?
  • How do you use it across projects?
  • How do you use it in Cursor and Claude Code and Gemini?
  • How do you update it when your stack changes?
  • How do you know when it stops working?

You can’t. Because prompts aren’t infrastructure. They’re text you paste into a chat box.

The “prompt engineering” discourse treats AI like a magic box you need to speak to correctly. Say the right words, get good code. Say the wrong words, get garbage.

But that’s not how we build software. We don’t rely on incantations. We build systems.


The Missing Abstraction

Think about how we work with any other tool in our stack.

ESLint has configuration files. TypeScript has tsconfig.json. Prettier has .prettierrc. These aren’t prompts—they’re infrastructure. They’re declarative. They’re version-controlled. They work the same way every time.
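
A .prettierrc, for example, is a handful of JSON keys that produce the same output on every machine, in every editor:

```json
{
  "semi": false,
  "singleQuote": true,
  "printWidth": 100
}
```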

AI coding assistants have… what? A text box. Maybe a system prompt buried in settings. Maybe some custom instructions you set up once and forgot about.

We’re treating AI like a novelty instead of a tool. And that’s why it feels unreliable.

What’s missing is a layer between “paste a prompt” and “hope for the best.” A way to define:

  1. What you want done — not as a one-off request, but as a repeatable workflow
  2. Who should do it — specialized behaviors for different types of work
  3. What they should know — context about your stack, your patterns, your standards

These aren’t prompts. They’re abstractions. And they need names.


Commands, Agents, and Skills

Here’s the mental model that actually scales:

Commands are workflows you trigger explicitly. Not “write me a test” but /test—a defined process that knows how to write tests for your stack, run them, and fix failures. Same command, same behavior, every time.
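
What does that look like on disk? As a sketch, a file like commands/test.md (the frontmatter and stack details here are illustrative, not any one tool’s exact schema):

```markdown
---
description: Write, run, and fix tests for the changed code
---

1. Identify the files changed on this branch.
2. Write or update tests using Vitest and the helpers in test/helpers/.
3. Run `npm test`. Fix failures. Repeat until green.
```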

Agents are specialized personas. A Planner that breaks down features. An Engineer that writes production code. A Debugger that traces root causes. Each focused on what they’re best at, with appropriate tools and constraints.
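
Agents can be sketched the same way. A hypothetical Debugger, as a file (frontmatter fields invented for illustration):

```markdown
---
name: debugger
description: Traces failures to root causes. Does not write features.
tools: read, grep, run-tests
---

Reproduce the failure first. Bisect to the smallest failing case.
Propose a fix only once you can state the root cause in one sentence.
```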

Skills are framework knowledge that activates automatically. Working in a Next.js app? The agent knows about Server Components, the app directory, and your data fetching patterns—without you explaining it every time.
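
Written down, a skill is just the knowledge you’re tired of re-explaining. A sketch for that Next.js case (the conventions are examples, swap in your own):

```markdown
---
name: nextjs-app-router
description: Conventions for Next.js apps using the app/ directory
---

- Default to Server Components. Add "use client" only for interactivity.
- Fetch data on the server, not in useEffect.
- Route handlers live in app/api/*/route.ts.
```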

This isn’t prompt engineering. It’s system design.

Commands are like CLI tools. Agents are like team roles. Skills are like documentation that’s always in context.

And critically: they’re all just files. Markdown files you can edit, version control, and share.


AI as Part of the Codebase

Here’s what changes when you treat AI configuration as infrastructure:

It’s portable. Same commands work across projects. Start a new repo, initialize your config, and /build works the same way it did yesterday.

It’s shareable. Commit your AI config to git. Now the whole team has the same workflows. No more “what prompt do you use for code review?”

It’s maintainable. When your patterns change, update the files. When a new model comes out, test your commands against it. When something breaks, you can actually debug it.

It’s composable. Commands can reference agents. Agents can load skills. Skills can be combined. You’re building a system, not collecting one-liners.
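
Composition here is nothing fancier than files referencing files. A hypothetical /build command that ties the earlier sketches together:

```markdown
---
description: Plan and implement a feature end to end
---

1. Hand the request to the planner agent. Get back a task list.
2. Run the engineer agent on each task, with the nextjs-app-router
   skill loaded.
3. Finish by running /test on the changed files.
```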

This is what I mean by “AI as part of the codebase.” Not prompts you paste into a chat. Configuration that lives alongside your code, evolves with your code, and works like every other tool in your stack.


The Fragmentation Problem

There’s one more thing that makes this hard: every AI tool does it differently.

Cursor wants rules in .cursor/rules/ with YAML frontmatter. Claude Code wants slash commands in .claude/commands/. Gemini wants TOML files. They’re all slightly different formats, different conventions, different capabilities.
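
Concretely, the same testing workflow ends up written three ways (simplified; exact schemas vary by tool and version):

```
# .cursor/rules/testing.mdc (YAML frontmatter + markdown)
---
description: How tests are written here
alwaysApply: false
---
Use Vitest. Run `npm test` after changes and fix failures.

# .claude/commands/test.md (plain markdown; the filename becomes /test)
Write or update tests for the changed files, run them, fix failures.

# .gemini/commands/test.toml (TOML)
description = "Write and run tests"
prompt = "Write or update tests for the changed files, then run them."
```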

So even if you build this system—commands, agents, skills—you build it three times. Or you pick one tool and lock yourself in.

This is the problem I’ve been working on with Dotpack. One source of truth that compiles to any format:

.dotpack/                     .cursor/rules/
├── commands/          →      .claude/
├── agents/                   .gemini/
└── skills/

Edit once. Generate for every tool. Switch tools without losing your system.

But honestly, the abstraction matters more than the implementation. Commands, agents, and skills will scale whether you use Dotpack or build your own. The point is to stop thinking in prompts and start thinking in systems.


Stop Optimizing Sentences. Start Designing Systems.

Prompt engineering is a dead end. You’ll spend forever tweaking words, and you’ll never escape the copy-paste-pray cycle.

What scales is structure:

  • Commands that encode your workflows
  • Agents that specialize in different work
  • Skills that carry your stack knowledge

Build this system once. Use it everywhere. Maintain it like code.

That’s not prompt engineering. That’s engineering.


I’m building Dotpack to make this practical—preconfigured commands, agents, and skills that work across AI tools. If you’re tired of the prompt graveyard, check it out.