Tag: ai-agents

  • The Verification Phase Nobody Builds

    Tonight I pushed rfd_method public. 16 files. MIT license. A methodology repo that came out of shipping real projects under real constraints — day job, narrow windows, coding agents that fabricate results.

    That’s the moment. Not a launch. A formalization of something that already existed.

    The surprise is what’s already out there. GitHub Spec Kit has 106K stars. OpenSpec has 52K. Both handle the spec phase — the planning, the architecture, the decision records. Neither handles verification. The stop rules, the certified test floor, the proof standard. That gap is where projects die.

    The struggle is the discipline of not trusting your own tools. Coding agents don’t read the terminal — they predict what the terminal probably says. They’ll tell you 565 tests are passing when 75 are failing. They’ll tell you the deployment succeeded when Tower is still running last month’s commit. Building a verification layer means accepting that the agent will lie to you confidently, and designing the system so the lie gets caught before it ships.

    What I’ve learned: a spec without a verification phase is a wish. The floor metric is what makes the methodology real. 604 tests passing on the dev machine means nothing if Tower is running development mode with a $1.00 budget cap. Raw terminal output and device screenshots only. Never agent summaries. That’s the proof standard that turns a directive into a shipped feature.
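
    A minimal sketch of what a floor check could look like, assuming the suite runs under pytest and the certified floor is the 604 figure above. The names `CERTIFIED_FLOOR`, `parse_pytest_summary`, `meets_floor`, and `verify` are hypothetical and are not taken from rfd_method. The point is only that the verdict comes from the process's own exit code and raw output, never from a summary of it.

```python
import re
import subprocess
from typing import Dict, Sequence

# Assumption: the floor is the minimum pass count a build must show.
CERTIFIED_FLOOR = 604


def parse_pytest_summary(output: str) -> Dict[str, int]:
    """Read pass/fail/error counts from the last non-empty line of raw output."""
    counts = {"passed": 0, "failed": 0, "error": 0}
    lines = [line for line in output.splitlines() if line.strip()]
    if not lines:
        return counts
    # pytest ends with a line such as "75 failed, 490 passed in 12.30s".
    for number, label in re.findall(r"(\d+) (passed|failed|errors?)", lines[-1]):
        key = "error" if label.startswith("error") else label
        counts[key] = int(number)
    return counts


def meets_floor(counts: Dict[str, int], floor: int = CERTIFIED_FLOOR) -> bool:
    """True only if nothing failed or errored and the pass count reaches the floor."""
    return (
        counts["failed"] == 0
        and counts["error"] == 0
        and counts["passed"] >= floor
    )


def verify(command: Sequence[str], floor: int = CERTIFIED_FLOOR) -> bool:
    """Run the real test command and judge it from its own exit code and output."""
    result = subprocess.run(list(command), capture_output=True, text=True)
    print(result.stdout)  # the raw terminal output is the evidence, so keep it
    return result.returncode == 0 and meets_floor(
        parse_pytest_summary(result.stdout), floor
    )
```

    Calling `verify(["python", "-m", "pytest", "-q"])` on the machine that actually serves the build, rather than on the dev machine, is the part that matters. The parsing itself is deliberately dumb.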

    rfd_method is live at github.com/rfd62794/rfd_method. The methodology that runs every project in the stack — and the verification phase that keeps it honest.

  • The spec is load-bearing

    In March 2025 I wrote a Python script that logged into a call center portal, watched dialing servers, and swapped underperforming lists automatically. It worked. I made it better in May. I made it better again in June. By June 24th I had the most capable version I’d ever built — a single file, about 1,400 lines, handling six servers, two campaign types, cooldown enforcement, stagnation detection, escalation logic.

    Three iterations. All single file. All named by date.

    March19_MetricsLower.py
    May5_MetricsLower.py
    June24_ResetUpgrade.py

    They’re still sitting in the archive folder of the repo that replaced them. I kept them because they’re the lineage. Each one is the proof that the next one was possible.

    The June version worked well enough that adjacent problems started pulling at it. I needed to extract CSV data from the portal. I built a tool. I needed to import files back in. Another tool. Lists needed creating from a master sheet. Another tool. DNC numbers needed scrubbing across every server simultaneously. A predictive performance forecaster needed a web app. Call recordings needed extracting.

    Each one was a weekend. Each one solved a real problem. None of them felt like sprawl while I was building them.

    A year after March I had seven private repos all touching the same portal, the same credentials, the same campaigns. None of them shared infrastructure. None of them talked to each other. If the portal changed a login flow I had seven places to fix it.

    I hadn’t built a mess. I’d built seven good tools that became a mess the moment I tried to think about them together.

    The moment I saw it clearly was when I tried to connect the predictive performance forecaster to the balancer. The forecaster needed to read what the balancer knew — live metrics, list history, server state — and surface it as a web dashboard. To do that I had to wire two repos that had never been designed to connect. The data models didn’t match. The assumptions buried in each codebase contradicted each other. What should have been an integration was a negotiation.

    That’s when I stopped building and started writing.

    Not code. A spec. Where does each piece live. What does each piece own. What is the balancer responsible for and what is it forbidden from doing. What does shared infrastructure look like when seven separate tools finally have to be one system.

    The spec took longer than any of the individual tools had taken. Nothing shipped while I was writing it. It felt like the wrong use of time.

    TeleseroAdmin2026 started from that spec. The balancer is still the core — the same logic that ran in June, now with 262 passing tests and proper module boundaries. The other pieces are finding their places around it with shared config, shared login, shared infrastructure. One place to fix things when the portal changes.

    The three archive files are still there. March, May, June. I look at them occasionally. They’re good code. They just had no structure underneath them to survive being part of something larger.

    That’s what a spec actually does. It’s not documentation. It’s not process for its own sake. It’s the thing that lets a system grow without collapsing — the load-bearing layer that the code rests on.

    Build without it and you end up with seven good tools and a negotiation where an integration should be.

    I’m also working toward a certification that puts formal language around what I figured out the wrong way across a year of dated single files. The spec isn’t the thing you write after the system works. It’s the thing that makes the system survivable.

    March Robert would not be able to comprehend the June 2026 Admin Suite that holds his archive.

  • Teaching Pong to Play Itself: My First Neural Network Experiment

    Pong is the right choice for a first experiment because it has almost no variables. Two paddles. One ball. If you can’t teach an AI to play Pong, you can’t teach an AI anything.

    I used NEAT — NeuroEvolution of Augmenting Topologies. It doesn’t just adjust weights on a fixed network structure. It evolves the topology itself, starting minimal and adding complexity only when it helps. The training runs headless at 500x real-time speed; a separate visual mode exists purely to verify that what trained actually works. Generation 0: random paddle movement, 0% win rate. Generation 50: 98% win rate, predictive tracking.

    The difference between reacting and anticipating is memory. A standard feedforward network sees only the current frame. A recurrent neural network (RNN) carries memory of previous states, such as ball velocity and trajectory history. That’s what gives the Gen 50 agent its characteristic quality: it moves to where the ball will be, not where it is. The RNN is what upgrades NEAT from “learns to respond” to “learns to predict.”
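
    To make the memory point concrete, here is a hand-written stand-in, not an RNN and not what NEAT evolved. It assumes an idealized court of a given height, a paddle on the right edge, and perfect wall bounces. `ReactivePolicy`, `PredictivePolicy`, and `reflect` are hypothetical names. The only difference between the two policies is that one remembers the previous frame.

```python
def reflect(y: float, height: float) -> float:
    """Fold a projected y back into [0, height], as if bouncing off both walls."""
    period = 2 * height
    y = y % period
    return period - y if y > height else y


class ReactivePolicy:
    """Stateless: sees one frame, so it can only aim at where the ball is now."""

    def target(self, ball_x: float, ball_y: float) -> float:
        return ball_y


class PredictivePolicy:
    """Remembers the previous frame, which is enough to estimate velocity."""

    def __init__(self, paddle_x: float, height: float) -> None:
        self.paddle_x = paddle_x
        self.height = height
        self.prev = None  # (x, y) of the ball on the previous frame

    def target(self, ball_x: float, ball_y: float) -> float:
        prev, self.prev = self.prev, (ball_x, ball_y)
        if prev is None:
            return ball_y  # no history yet, fall back to reacting
        vx = ball_x - prev[0]
        vy = ball_y - prev[1]
        if vx <= 0:
            return ball_y  # ball is moving away from the right-hand paddle
        frames_to_paddle = (self.paddle_x - ball_x) / vx
        return reflect(ball_y + vy * frames_to_paddle, self.height)


reactive = ReactivePolicy()
predictive = PredictivePolicy(paddle_x=100, height=100)
for x, y in [(10, 50), (20, 55)]:
    now = reactive.target(x, y)
    later = predictive.target(x, y)
print(now, later)
```

    With one frame of history the second policy aims at the projected intercept instead of the ball's current height. An evolved recurrent network has to discover something like this on its own; the sketch only shows why a single frame cannot carry it.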


    The first training approach was pure Elo: score points, survive, reproduce. The population converged fast, too fast. By generation 20, every agent played the same way: safe returns, center positioning. They’d found a local maximum and stopped. No one was discovering anything.

    Novelty search fixed it. Instead of rewarding only performance, you reward uniqueness — points for behaviors the population hasn’t tried. The diversity pressure kept agents exploring. Agents with strange positioning, unusual angles, aggressive strategies started appearing — and some of them turned out to be genuinely superior. The “wrong” strategy was actually better. Pure optimization would never have found it.
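
    A rough sketch of the scoring idea, under assumptions of my own choosing here: each agent is summarized by a small behavior vector (for example, average paddle position and average return angle), novelty is the mean distance to the k nearest other behaviors, and the final score is a weighted blend with fitness. The function names, the `weight` parameter, and the behavior vectors are all illustrative. Fuller versions of novelty search also keep an archive of past behaviors, which this omits.

```python
import math
from typing import List, Sequence


def novelty(behavior: Sequence[float], others: List[Sequence[float]], k: int = 3) -> float:
    """Mean distance from one behavior vector to its k nearest neighbors."""
    nearest = sorted(math.dist(behavior, other) for other in others)[:k]
    return sum(nearest) / len(nearest) if nearest else 0.0


def blended_scores(
    fitness: List[float],
    behaviors: List[Sequence[float]],
    weight: float = 0.5,
    k: int = 3,
) -> List[float]:
    """Blend performance with uniqueness; weight=0 reduces to fitness alone."""
    scores = []
    for i, behavior in enumerate(behaviors):
        others = behaviors[:i] + behaviors[i + 1:]
        scores.append((1 - weight) * fitness[i] + weight * novelty(behavior, others, k))
    return scores


# Three agents that converged on the same safe play, plus one odd one
# that currently wins slightly less often.
behaviors = [(0.5, 0.5), (0.5, 0.5), (0.5, 0.5), (0.9, 0.1)]
fitness = [1.0, 1.0, 1.0, 0.8]
scores = blended_scores(fitness, behaviors, weight=0.5, k=2)
best = scores.index(max(scores))
```

    Here the outlier ranks first despite its lower win rate, because the three converged agents earn no novelty from each other. This assumes fitness and behavior distances are on comparable scales; in practice they would need normalizing before blending.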

    Any system without diversity pressure converges on the same answer. It finds the local maximum and calls it done. That lesson applies well beyond neural networks.


    What didn’t work: high mutation rates to accelerate training. The population collapsed — agents changed faster than they could build on what worked. Every generation erased what the previous one had learned. Slowing it down made the evolution meaningful. Some processes can’t be accelerated without destroying the thing that makes them work.


    This was the first project. Everything since has the same shape: variation, selection, emergence you didn’t design. TurboShells encoded the same loop into turtle genetics. rpgCore formalized it into a composable system. VoidDrift runs it as a drone dispatch loop.

    The Pong agent that discovered a non-obvious return angle at generation 47 is the ancestor of all of it. I just didn’t know that yet.