I caught myself doing it again last week. Pasting the same instructions into Claude for the third time that morning. “Here’s my voice. Here’s what I don’t sound like. Here are examples. Now write a tweet.”
Every time, the output was different. Not because the AI was bad — because I was giving it a different version of the instructions each time. Copy-pasting from notes. Paraphrasing from memory. Forgetting a constraint I’d figured out two weeks ago.
I tweet about speed and reps. But I was wasting both re-explaining the same stuff to an AI that has no memory between sessions.
So I built a system.
What I kept re-doing
The pattern was obvious once I stopped to look.
Every time I wanted a tweet drafted, I’d re-explain my voice. Blog post reviewed? Re-describe my audience. Code reviewed? Re-list the things I always check for.
Like hiring someone new every morning and onboarding them from scratch. Except worse — at least a new hire would remember yesterday.
I had a dozen things I regularly asked agents to do. Each one required context I’d already figured out but never written down in a way the agent could use consistently. The knowledge existed — scattered across chat histories, notes, and my head.
The playbooks
Instead of re-prompting every time, I’d write down the instructions once and let the agent read them before starting.
I call them skills. Each one is a folder with a file that tells the agent what to do, how to think about the work, and what good output looks like. Some skills have scripts — things that need to be exactly right every time, like fetching data or authenticating with an API. Some have workflows — multi-step processes where the agent needs to exercise judgment at each step.
Scripts handle what should never vary. Workflows handle what requires interpretation. The skill file itself is the playbook the agent reads before doing anything.
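To make that concrete, here's a minimal sketch of how an agent-side loader might assemble a playbook before a run. The layout and file names follow the folder structure shown below, but the function itself is illustrative, not part of any framework.

```python
from pathlib import Path

def load_skill(skill_dir: str) -> str:
    """Assemble the context an agent reads before starting a task.

    Reads the playbook (SKILL.md) plus any stable reference files
    under data/. Names and layout here are illustrative.
    """
    root = Path(skill_dir)
    parts = [(root / "SKILL.md").read_text()]
    data_dir = root / "data"
    if data_dir.is_dir():
        for f in sorted(data_dir.glob("*.md")):
            parts.append(f"## {f.stem}\n{f.read_text()}")
    return "\n\n".join(parts)
```

The point isn't the code; it's that the agent reads the same files every time, so the instructions stop drifting between sessions.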
Right now I have 13. They cover writing tweets, monitoring tweet performance, reviewing code, building new skills. The Twitter skill alone has voice guidelines, patterns, examples, a brainstorm pipeline, a compose workflow, and a monitoring workflow that runs on a schedule.
The folder structure is dead simple:
```
skills/
  twitter/
    SKILL.md
    scripts/
    workflows/
    data/
  blog-writer/
    SKILL.md
    workflows/
    data/
  code-review/
    SKILL.md
  ...
```
Nothing fancy. No framework. Just files that agents read.
What actually changed
The difference wasn’t subtle.
Before: tweet drafts were inconsistent. Some sounded like me. Some sounded like a LinkedIn thought leader. Some were technically fine but missed the vibe entirely. I’d spend as much time editing the output as writing from scratch.
After: the agent reads my voice file, my patterns, my examples, and my anti-patterns before it writes anything. It knows I never say “delve” or “leverage.” It knows I use fragments. It knows my audience is founders, not developers. The drafts aren’t perfect, but they start in the right neighborhood.
The monitoring skill runs four times a day without me touching it. Checks how recent tweets performed, pulls the data, stores it. That’s a script — no judgment needed, just execution. I used to do this manually, maybe once a week when I remembered.
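A scheduled script like that is the simplest kind of skill component. Here's a hedged sketch of the shape, with a stand-in `fetch_metrics` in place of the real API call (the actual endpoint and auth would live in the skill's scripts folder):

```python
import json
import sqlite3
import time

def fetch_metrics(tweet_id: str) -> dict:
    # Hypothetical stand-in for the real API call.
    return {"tweet_id": tweet_id, "impressions": 0, "likes": 0}

def record_metrics(db_path: str, tweet_ids: list[str]) -> None:
    """Pull current numbers for each tweet and append a timestamped row.

    No judgment involved: pure execution, which is why this is a
    script rather than a workflow. Cron runs it four times a day.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metrics "
        "(tweet_id TEXT, fetched_at INTEGER, payload TEXT)"
    )
    now = int(time.time())
    for tid in tweet_ids:
        payload = json.dumps(fetch_metrics(tid))
        conn.execute("INSERT INTO metrics VALUES (?, ?, ?)", (tid, now, payload))
    conn.commit()
    conn.close()
```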
The biggest surprise was the meta-skill. I built a skill that builds other skills. It encodes the conventions — what goes in a SKILL.md, how to structure workflows vs. scripts, what the anti-patterns are. New skills used to take me an hour of fiddling. Now they take minutes.
What I learned building this
After a couple months of iteration:
Hard gates beat soft suggestions. When a skill says “STOP if the voice check fails,” the agent stops. When it says “consider checking the voice,” the agent skips it half the time. Anything critical gets a hard gate. Everything else is guidance.
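Encoded in a workflow, a hard gate is just an unconditional stop. A sketch, with a toy `voice_check` standing in for the real one:

```python
class GateFailed(Exception):
    """Raised when a hard gate fails; the workflow stops here."""

def voice_check(draft: str) -> list[str]:
    # Hypothetical checker: returns a list of violations.
    banned = ("delve", "leverage")
    return [w for w in banned if w in draft.lower()]

def compose_step(draft: str) -> str:
    violations = voice_check(draft)
    if violations:
        # Hard gate: STOP, don't suggest. The agent cannot proceed.
        raise GateFailed(f"voice check failed: {violations}")
    return draft
```

A soft suggestion would log the violations and keep going; the gate makes skipping impossible.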
Fewer instructions beat more. I kept adding rules and constraints. Compliance dropped. The agent would follow the first 20 instructions carefully and ignore the rest. Now I keep each skill under 50 instructions and make every line compete for its spot.
Separate what changes from what’s stable. Voice guidelines don’t change week to week. But the list of recent tweets the agent should reference changes constantly. Stable context goes in data files. Dynamic context goes in scripts that fetch fresh data. That split made everything click.
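That split is easy to picture in code. A sketch, assuming a `voice.md` data file for the stable half and a fetch callable standing in for the fresh-data script:

```python
from pathlib import Path

def build_context(skill_dir: str, fetch_recent) -> str:
    """Combine stable and dynamic context for one agent run.

    Stable: voice guidelines that rarely change, kept in data files.
    Dynamic: recent tweets, fetched fresh each run via a callable
    (a stand-in for the skill's fetch script).
    """
    stable = (Path(skill_dir) / "data" / "voice.md").read_text()
    recent = "\n".join(f"- {t}" for t in fetch_recent())
    return f"{stable}\n\n## Recent tweets\n{recent}"
```

Editing the file changes every future run; the fetch keeps the fast-moving half from ever going stale.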
The system drifts. Skills get stale. I’ve rewritten the Twitter skill multiple times; the current version bears little resemblance to the first. You maintain skills like you maintain anything else: use them, notice where they break, fix them.
The real problem no one talks about
At a YC dinner recently, a founder mentioned running 14 AI agents. No one blinked. The conversation wasn’t about whether to use agents — it was about how to make them reliable.
Getting an AI to do something once is easy. Getting it to do the same thing consistently, the way you want it, without re-explaining everything — that’s the hard part.
My system isn’t the answer for everyone. But the underlying idea — write down what you know, structure it so an agent can use it, iterate on it like a product — that’s what I wish someone had told me six months ago.
I was treating every AI interaction like a one-off conversation. The moment I started treating them like employees who need playbooks, everything got better.
The system is still rough. Still changing. But for the first time, my agents feel like they’re working for me instead of the other way around.