Tag: sdd

  • The Verification Phase Nobody Builds

    Tonight I pushed rfd_method public. 16 files. MIT license. A methodology repo that came out of shipping real projects under real constraints — day job, narrow windows, coding agents that fabricate results.

    That’s the moment. Not a launch. A formalization of something that already existed.

    The surprise is what’s already out there. GitHub Spec Kit has 106K stars. OpenSpec has 52K. Both handle the spec phase — the planning, the architecture, the decision records. Neither handles verification. The stop rules, the certified test floor, the proof standard. That gap is where projects die.

    The struggle is the discipline of not trusting your own tools. Coding agents don’t read the terminal — they predict what the terminal probably says. They’ll tell you 565 tests are passing when 75 are failing. They’ll tell you the deployment succeeded when Tower is still running last month’s commit. Building a verification layer means accepting that the agent will lie to you confidently, and designing the system so the lie gets caught before it ships.

    What I’ve learned: a spec without a verification phase is a wish. The floor metric is what makes the methodology real. 604 tests passing on the dev machine means nothing if Tower is running development mode with a $1.00 budget cap. Raw terminal output and device screenshots only. Never agent summaries. That’s the proof standard that turns a directive into a shipped feature.

    rfd_method is live at github.com/rfd62794/rfd_method. The methodology that runs every project in the stack — and the verification phase that keeps it honest.

  • The spec is load-bearing

    In March 2025 I wrote a Python script that logged into a call center portal, watched dialing servers, and swapped underperforming lists automatically. It worked. I made it better in May. I made it better again in June. By June 24th I had the most capable version I’d ever built — a single file, about 1,400 lines, handling six servers, two campaign types, cooldown enforcement, stagnation detection, escalation logic.

    Three iterations. All single file. All named by date.

    March19_MetricsLower.py
    May5_MetricsLower.py
    June24_ResetUpgrade.py

    They’re still sitting in the archive folder of the repo that replaced them. I kept them because they’re the lineage. Each one is the proof that the next one was possible.

    The June version worked well enough that adjacent problems started pulling at it. I needed to extract CSV data from the portal. I built a tool. I needed to import files back in. Another tool. Lists needed creating from a master sheet. Another tool. DNC numbers needed scrubbing across every server simultaneously. A predictive performance forecaster needed a web app. Call recordings needed extracting.

    Each one was a weekend. Each one solved a real problem. None of them felt like sprawl while I was building them.

    A year after March I had seven private repos all touching the same portal, the same credentials, the same campaigns. None of them shared infrastructure. None of them talked to each other. If the portal changed a login flow I had seven places to fix it.

    I hadn’t built a mess. I’d built seven good tools that became a mess the moment I tried to think about them together.

    The moment I saw it clearly was when I tried to connect the predictive performance forecaster to the balancer. The forecaster needed to read what the balancer knew — live metrics, list history, server state — and surface it as a web dashboard. To do that I had to wire two repos that had never been designed to connect. The data models didn’t match. The assumptions buried in each codebase contradicted each other. What should have been an integration was a negotiation.

    That’s when I stopped building and started writing.

    Not code. A spec. Where does each piece live. What does each piece own. What is the balancer responsible for and what is it forbidden from doing. What does shared infrastructure look like when seven separate tools finally have to be one system.

    The spec took longer than any of the individual tools had taken. Nothing shipped while I was writing it. It felt like the wrong use of time.

    TeleseroAdmin2026 started from that spec. The balancer is still the core — the same logic that ran in June, now with 262 passing tests and proper module boundaries. The other pieces are finding their places around it with shared config, shared login, shared infrastructure. One place to fix things when the portal changes.

    The three archive files are still there. March, May, June. I look at them occasionally. They’re good code. They just had no structure underneath them to survive being part of something larger.

    That’s what a spec actually does. It’s not documentation. It’s not process for its own sake. It’s the thing that lets a system grow without collapsing — the load-bearing layer that the code rests on.

    Build without it and you end up with seven good tools and a negotiation where an integration should be.

    I’m also working toward a certification that puts formal language around what I figured out the wrong way across a year of dated single files. The spec isn’t the thing you write after the system works. It’s the thing that makes the system survivable.

    March Robert would not be able to comprehend the June 2026 Admin Suite that holds his archive.

  • I Built a CLI to Replace Expensive AI Directive Generation

    I Built a CLI to Replace Expensive AI Directive Generation

    The friction point is simple to describe. Claude designs the architecture. Windsurf builds it. The directive that connects them — structured, scoped, phase-gated — gets written by me, by hand, every single time.

    I’m the middleware. I built a tool to replace myself. It didn’t quite work.


    OpenAgent started from a real observation: the same context was being re-explained in every session. I had architectural patterns, stop rules, test floor conventions — and every new Windsurf session, the agent had no idea any of it existed. The directive was the missing connective tissue. Write it well and the agent stays on scope. Write it badly and the agent invents its own architecture.

    So I built a CLI that reads the codebase, understands the structure, and generates directives shaped to my development style. The breakthrough was SOUL.md — eight questions about how I actually work. That profile gets embedded in every directive. When OpenAgent generates something, it references the right conventions, names the right stop conditions. It sounds like something I’d write.

    It’s on PyPI as openagent-directive. v0.2.2. 103 passing tests.


    Here’s the part I didn’t put in the README: I still do the same manual cycle.

    The tool works. The directives it generates are useful. But I’m still the one handing them to Windsurf, watching the session, course-correcting when it goes sideways. The friction I wanted to eliminate is still there because the real blocker isn’t the directive — it’s that there’s no coding agent that lives outside an IDE.

    Windsurf, Cursor, Copilot — they all require a human in the seat. The autonomous loop I wanted, where OpenAgent feeds a directive to a coding agent that executes independently, reports back, and waits for the next one, doesn’t exist yet in any reliable form. The IDE-bound constraint kills the automation before it starts.

    I built a correct solution to the wrong layer of the problem.


    The pivot I keep thinking about: OpenAgent as an MCP tool. Not a CLI that generates directives for humans to hand off, but a codebase intelligence layer that a coding agent can query directly. What files are in scope? What’s the test floor? What patterns does this codebase use? An agent with access to that context doesn’t need a human to write the directive — it can construct its own.

    That version of OpenAgent is waiting on the ecosystem. When a capable coding agent exists that can operate outside an IDE, receive a structured task, execute against a real codebase, and return proof — OpenAgent is already positioned to be the interpreter it needs.

    For now it’s a CLI on PyPI that saves me twenty minutes per directive and reminds me that some problems can’t be fully solved until the infrastructure around them catches up.

    The friction is still there. The tool is ready when it isn’t.