Category: AI & Automation

  • The Verification Phase Nobody Builds

    Tonight I pushed rfd_method public. 16 files. MIT license. A methodology repo that came out of shipping real projects under real constraints — day job, narrow windows, coding agents that fabricate results.

    That’s the moment. Not a launch. A formalization of something that already existed.

    The surprise is what’s already out there. GitHub Spec Kit has 106K stars. OpenSpec has 52K. Both handle the spec phase — the planning, the architecture, the decision records. Neither handles verification. The stop rules, the certified test floor, the proof standard. That gap is where projects die.

    The struggle is the discipline of not trusting your own tools. Coding agents don’t read the terminal — they predict what the terminal probably says. They’ll tell you 565 tests are passing when 75 are failing. They’ll tell you the deployment succeeded when Tower is still running last month’s commit. Building a verification layer means accepting that the agent will lie to you confidently, and designing the system so the lie gets caught before it ships.

    What I’ve learned: a spec without a verification phase is a wish. The floor metric is what makes the methodology real. 604 tests passing on the dev machine means nothing if Tower is running development mode with a $1.00 budget cap. Raw terminal output and device screenshots only. Never agent summaries. That’s the proof standard that turns a directive into a shipped feature.
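
    A minimal sketch of that floor check, assuming the suite runs under pytest and the floor is a plain passing-test count. The names `TEST_FLOOR`, `parse_pytest_summary`, and `meets_floor` are illustrative and are not taken from rfd_method. The point is that the verdict comes from parsing the raw terminal text, never from an agent's description of it.

```python
import re
import subprocess
import sys

# Hypothetical floor value; the certified floor is whatever the project last proved.
TEST_FLOOR = 604


def run_suite() -> str:
    """Run pytest in a subprocess and return its raw stdout, unedited."""
    result = subprocess.run(
        [sys.executable, "-m", "pytest", "-q"],
        capture_output=True,
        text=True,
    )
    return result.stdout


def parse_pytest_summary(raw_output: str) -> dict:
    """Read pass/fail/error counts from the last pytest summary line."""
    counts = {"passed": 0, "failed": 0, "error": 0}
    # pytest ends a run with a line such as "75 failed, 565 passed in 12.34s".
    summary = [ln for ln in raw_output.splitlines() if re.search(r"\bin [\d.]+s", ln)]
    if not summary:
        raise ValueError("no pytest summary line found; refusing to guess")
    for number, label in re.findall(r"(\d+) (passed|failed|errors?)", summary[-1]):
        key = "error" if label.startswith("error") else label
        counts[key] = int(number)
    return counts


def meets_floor(raw_output: str, floor: int = TEST_FLOOR) -> bool:
    """True only if nothing failed and the passing count is at or above the floor."""
    counts = parse_pytest_summary(raw_output)
    return counts["failed"] == 0 and counts["error"] == 0 and counts["passed"] >= floor
```

    Calling `meets_floor(run_suite())` on the target machine gives a yes or no that an agent summary cannot override. A missing summary line raises instead of returning a default, so a truncated log fails loudly.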

    rfd_method is live at github.com/rfd62794/rfd_method. The methodology that runs every project in the stack — and the verification phase that keeps it honest.

  • The spec is load-bearing

    In March 2025 I wrote a Python script that logged into a call center portal, watched dialing servers, and swapped underperforming lists automatically. It worked. I made it better in May. I made it better again in June. By June 24th I had the most capable version I’d ever built — a single file, about 1,400 lines, handling six servers, two campaign types, cooldown enforcement, stagnation detection, escalation logic.

    Three iterations. All single file. All named by date.

    March19_MetricsLower.py
    May5_MetricsLower.py
    June24_ResetUpgrade.py

    They’re still sitting in the archive folder of the repo that replaced them. I kept them because they’re the lineage. Each one is the proof that the next one was possible.

    The June version worked well enough that adjacent problems started pulling at it. I needed to extract CSV data from the portal. I built a tool. I needed to import files back in. Another tool. Lists needed creating from a master sheet. Another tool. DNC numbers needed scrubbing across every server simultaneously. A predictive performance forecaster needed a web app. Call recordings needed extracting.

    Each one was a weekend. Each one solved a real problem. None of them felt like sprawl while I was building them.

    A year after March I had seven private repos all touching the same portal, the same credentials, the same campaigns. None of them shared infrastructure. None of them talked to each other. If the portal changed a login flow I had seven places to fix it.

    I hadn’t built a mess. I’d built seven good tools that became a mess the moment I tried to think about them together.

    The moment I saw it clearly was when I tried to connect the predictive performance forecaster to the balancer. The forecaster needed to read what the balancer knew — live metrics, list history, server state — and surface it as a web dashboard. To do that I had to wire two repos that had never been designed to connect. The data models didn’t match. The assumptions buried in each codebase contradicted each other. What should have been an integration was a negotiation.

    That’s when I stopped building and started writing.

    Not code. A spec. Where does each piece live. What does each piece own. What is the balancer responsible for and what is it forbidden from doing. What does shared infrastructure look like when seven separate tools finally have to be one system.

    The spec took longer than any of the individual tools had taken. Nothing shipped while I was writing it. It felt like the wrong use of time.

    TeleseroAdmin2026 started from that spec. The balancer is still the core — the same logic that ran in June, now with 262 passing tests and proper module boundaries. The other pieces are finding their places around it with shared config, shared login, shared infrastructure. One place to fix things when the portal changes.
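
    A rough sketch of what "one place to fix things" can look like, under heavy assumptions: `PortalConfig`, `PortalSession`, `Balancer`, and `CsvExporter` are hypothetical names, the login body is a placeholder, and none of this is TeleseroAdmin2026's actual code. It only shows the shape: every tool receives the same session object instead of carrying its own login flow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortalConfig:
    """Shared settings, defined once for every tool (hypothetical fields)."""
    base_url: str
    username: str
    servers: tuple


class PortalSession:
    """The only class that knows how the portal's login flow works."""

    def __init__(self, config: PortalConfig):
        self.config = config
        self.logged_in = False

    def login(self) -> "PortalSession":
        # Placeholder: the real login steps would live here and nowhere else.
        self.logged_in = True
        return self


class Balancer:
    """Owns list swapping; borrows the session rather than logging in itself."""

    def __init__(self, session: PortalSession):
        self.session = session

    def servers_in_scope(self) -> list:
        return list(self.session.config.servers)


class CsvExporter:
    """Owns data extraction; shares the same session and config."""

    def __init__(self, session: PortalSession):
        self.session = session

    def export_url(self, server: str) -> str:
        return f"{self.session.config.base_url}/{server}/export.csv"
```

    If the portal changes its login flow, only `PortalSession.login` changes. The tools built on top of it do not need to know.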

    The three archive files are still there. March, May, June. I look at them occasionally. They’re good code. They just had no structure underneath them to survive being part of something larger.

    That’s what a spec actually does. It’s not documentation. It’s not process for its own sake. It’s the thing that lets a system grow without collapsing — the load-bearing layer that the code rests on.

    Build without it and you end up with seven good tools and a negotiation where an integration should be.

    I’m also working toward a certification that puts formal language around what I figured out the wrong way across a year of dated single files. The spec isn’t the thing you write after the system works. It’s the thing that makes the system survivable.

    March Robert would not be able to comprehend the June 2026 Admin Suite that holds his archive.

  • I Built a CLI to Replace Expensive AI Directive Generation

    The friction point is simple to describe. Claude designs the architecture. Windsurf builds it. The directive that connects them — structured, scoped, phase-gated — gets written by me, by hand, every single time.

    I’m the middleware. I built a tool to replace myself. It didn’t quite work.


    OpenAgent started from a real observation: the same context was being re-explained in every session. I had architectural patterns, stop rules, test floor conventions — and every new Windsurf session, the agent had no idea any of it existed. The directive was the missing connective tissue. Write it well and the agent stays on scope. Write it badly and the agent invents its own architecture.

    So I built a CLI that reads the codebase, understands the structure, and generates directives shaped to my development style. The breakthrough was SOUL.md — eight questions about how I actually work. That profile gets embedded in every directive. When OpenAgent generates something, it references the right conventions, names the right stop conditions. It sounds like something I’d write.
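
    A minimal sketch of the embedding step, assuming the profile has already been parsed into a dictionary. The keys `stop_rules` and `conventions`, the example values, and the function `build_directive` are all hypothetical; the real SOUL.md questions and OpenAgent's internals are not shown here.

```python
# Hypothetical profile, standing in for answers parsed from SOUL.md.
PROFILE = {
    "stop_rules": [
        "Stop and report if any existing test fails.",
        "Stop before creating files outside the listed scope.",
    ],
    "conventions": [
        "One module per responsibility.",
        "Tests live next to the code they cover.",
    ],
}


def build_directive(profile: dict, task: str, files_in_scope: list, test_floor: int) -> str:
    """Render a scoped directive with the working-style profile embedded."""
    lines = [
        f"DIRECTIVE: {task}",
        "",
        "SCOPE (do not touch files outside this list):",
        *[f"  - {path}" for path in files_in_scope],
        "",
        f"TEST FLOOR: at least {test_floor} passing tests, zero failures.",
        "",
        "STOP RULES:",
        *[f"  - {rule}" for rule in profile.get("stop_rules", [])],
        "",
        "CONVENTIONS:",
        *[f"  - {item}" for item in profile.get("conventions", [])],
        "",
        "PROOF: paste raw terminal output. No summaries.",
    ]
    return "\n".join(lines)
```

    The profile is passed in rather than hard-coded into the template, so the same generator can produce directives for a different working style by swapping one dictionary.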

    It’s on PyPI as openagent-directive. v0.2.2. 103 passing tests.


    Here’s the part I didn’t put in the README: I still do the same manual cycle.

    The tool works. The directives it generates are useful. But I’m still the one handing them to Windsurf, watching the session, course-correcting when it goes sideways. The friction I wanted to eliminate is still there because the real blocker isn’t the directive — it’s that there’s no coding agent that lives outside an IDE.

    Windsurf, Cursor, Copilot — they all require a human in the seat. The autonomous loop I wanted, where OpenAgent feeds a directive to a coding agent that executes independently, reports back, and waits for the next one, doesn’t exist yet in any reliable form. The IDE-bound constraint kills the automation before it starts.

    I built a correct solution to the wrong layer of the problem.


    The pivot I keep thinking about: OpenAgent as an MCP tool. Not a CLI that generates directives for humans to hand off, but a codebase intelligence layer that a coding agent can query directly. What files are in scope? What’s the test floor? What patterns does this codebase use? An agent with access to that context doesn’t need a human to write the directive — it can construct its own.
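
    A sketch of that query surface in plain Python, assuming the codebase has already been indexed. `CodebaseContext` and its question keys are hypothetical, and no MCP library is used here; wiring this into an actual MCP server is left out because that interface is not described in this post.

```python
from dataclasses import dataclass, field


@dataclass
class CodebaseContext:
    """Pre-indexed facts about a codebase that an agent can ask for by name."""
    files: list
    test_floor: int
    patterns: list = field(default_factory=list)

    def query(self, question: str):
        """Answer one of a fixed set of questions; reject anything else."""
        handlers = {
            "files_in_scope": lambda: sorted(self.files),
            "test_floor": lambda: self.test_floor,
            "patterns": lambda: list(self.patterns),
        }
        if question not in handlers:
            raise KeyError(f"unknown question: {question}")
        return handlers[question]()
```

    The fixed set of questions is deliberate. An agent that can only ask about scope, floor, and patterns gets the same constraints a hand-written directive would have given it.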

    That version of OpenAgent is waiting on the ecosystem. When a capable coding agent exists that can operate outside an IDE, receive a structured task, execute against a real codebase, and return proof — OpenAgent is already positioned to be the interpreter it needs.

    For now it’s a CLI on PyPI that saves me twenty minutes per directive and reminds me that some problems can’t be fully solved until the infrastructure around them catches up.

    The friction is still there. The tool is ready for when it isn’t.

  • The Hybrid Engine: Rust Performance, Python Agility

    The problem with DeFi trading bots is speed. The problem with fast code is that it’s expensive to change.

    A pure-Rust bot wins the race to the block — compiled, deterministic, fast. When the market shifts and your strategy needs to change, you recompile. Overnight. While your edge evaporates.

    A pure-Python bot iterates in minutes. It also loses to anything compiled. In a system where the difference between capturing an arbitrage and missing it is milliseconds, interpreted code is a structural disadvantage.

    I needed both. So I built a bridge.


    The hybrid architecture splits responsibility at the right seam. The Rust core handles everything where latency matters: WebSocket connections, memory-safe transaction signing, packet serialization. Compiled, stable, rarely touched. The Python strategy layer sits above it, communicating through a lightweight interface. When the trading logic changes — when a pattern emerges, when a parameter needs tuning, when a strategy turns unprofitable — you change the Python. No recompile. The execution layer keeps running.

    Decoupling execution from intelligence meant iteration speed became unconstrained by compilation time. A new strategy at midnight, tested by 2am, discarded by morning without touching Rust.
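
    The decoupling can be sketched in a few lines. This is a Python stand-in only: `ExecutionCore` plays the role of the compiled Rust core, and the tick fields and strategy functions are hypothetical. What it shows is that the core holds a reference to a strategy and keeps running while that reference is replaced.

```python
from typing import Callable, Optional

# A tick is a plain dict here, e.g. {"pair": "SOL/USDC", "bid": 100.0, "ask": 100.2}.
Strategy = Callable[[dict], Optional[str]]


class ExecutionCore:
    """Stand-in for the compiled core: forwards ticks, records orders."""

    def __init__(self, strategy: Strategy):
        self._strategy = strategy
        self.orders = []

    def swap_strategy(self, strategy: Strategy) -> None:
        """Replace the decision logic without restarting the core."""
        self._strategy = strategy

    def on_tick(self, tick: dict) -> None:
        decision = self._strategy(tick)
        if decision is not None:
            self.orders.append((tick["pair"], decision))


def buy_tight_spreads(tick: dict) -> Optional[str]:
    """Hypothetical strategy: act only when the bid/ask gap is small."""
    return "buy" if tick["ask"] - tick["bid"] < 0.5 else None


def stand_down(tick: dict) -> Optional[str]:
    """Hypothetical strategy: never trade."""
    return None
```

    The core never inspects what the strategy does. It only knows the call signature, which is the entire contract between the two layers.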


    The bridge itself was the hard part. Any interface between two languages has a seam, and seams are where bugs live. Getting data structures consistent on both sides — ensuring what Rust serializes is exactly what Python expects — required more care than either side alone. When something went wrong, it could be Rust, Python, or the interface between them. You learn to test both sides independently before trusting the combination.
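
    One way to test the seam from the Python side is to pin the wire layout in a single place and reject anything that does not match it. The layout below (little-endian u64 slot, f64 price, u32 pool id) is an assumption made for illustration, not PhantomArbiter's actual format.

```python
import struct

# Assumed layout: "<" = little-endian with no padding, Q = u64, d = f64, I = u32.
QUOTE = struct.Struct("<QdI")


def encode_quote(slot: int, price: float, pool_id: int) -> bytes:
    """Produce the bytes the other side is expected to send (useful in tests)."""
    return QUOTE.pack(slot, price, pool_id)


def decode_quote(payload: bytes) -> dict:
    """Decode one quote, failing loudly if the size does not match the layout."""
    if len(payload) != QUOTE.size:
        raise ValueError(f"expected {QUOTE.size} bytes, got {len(payload)}")
    slot, price, pool_id = QUOTE.unpack(payload)
    return {"slot": slot, "price": price, "pool_id": pool_id}
```

    A round-trip test on this side, paired with a matching one on the Rust side against the same fixture bytes, lets each half be checked before the combination is trusted.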


    PhantomArbiter ran 400 live trades on Solana in 2025. The architecture worked. The margin didn’t scale — the arbitrage windows were narrower in practice than in theory, and at volume the economics didn’t justify the infrastructure.

    But the pattern was correct. Compile what doesn’t change. Script what does. The intelligence layer should be easy to replace. The execution layer should be hard to break.

    That principle didn’t stay in trading. It’s in every complex system I’ve built since — MCP tools handle execution, the model handles intelligence, and the interface between them is where the design lives.

    I didn’t keep trading. I kept the pattern.

  • Teaching Pong to Play Itself: My First Neural Network Experiment

    Pong is the right choice for a first experiment because it has almost no variables. Two paddles. One ball. If you can’t teach an AI to play Pong, you can’t teach an AI anything.

    I used NEAT — NeuroEvolution of Augmenting Topologies. It doesn’t just adjust weights on a fixed network structure. It evolves the topology itself, starting minimal and adding complexity only when it helps. The training runs headless at 500x real-time speed; a separate visual mode exists purely to verify that what trained actually works. Generation 0: random paddle movement, 0% win rate. Generation 50: 98% win rate, predictive tracking.

    The difference between reacting and anticipating is memory. Standard feedforward networks see only the current frame. Recurrent neural networks (RNNs) carry memory of previous states, which lets them infer ball velocity and trajectory history. That’s what gives the Gen 50 agent its characteristic quality: it moves to where the ball will be, not where it is. The RNN is what upgrades NEAT from “learns to respond” to “learns to predict.”
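
    To see why a single frame is not enough, here is the prediction written out by hand. This is not what the evolved network computes internally; it is an illustration, under the assumption of a constant-speed ball and perfectly reflective top and bottom walls, that velocity needs two frames and that the intercept follows from it. The function name and coordinate convention are hypothetical.

```python
def predict_intercept_y(prev, curr, paddle_x, height):
    """Predict the ball's y when it reaches paddle_x.

    prev and curr are (x, y) ball positions on two consecutive frames.
    Returns None when the ball is not travelling toward the paddle.
    """
    vx = curr[0] - prev[0]
    vy = curr[1] - prev[1]
    if vx == 0:
        return None
    frames = (paddle_x - curr[0]) / vx
    if frames < 0:
        return None  # ball is moving away from this paddle
    raw_y = curr[1] + vy * frames
    # Fold the straight-line path back into [0, height] to account for wall bounces.
    period = 2 * height
    folded = raw_y % period
    return folded if folded <= height else period - folded
```

    With only `curr` available, `vx` and `vy` cannot be computed at all. That is the information a memoryless network lacks.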


    The first training approach was pure Elo. Score points, survive, reproduce. The population converged fast, too fast. By generation 20, every agent played the same way. Safe returns, center positioning. They’d found a local maximum and stopped. No one was discovering anything.

    Novelty search fixed it. Instead of rewarding only performance, you reward uniqueness — points for behaviors the population hasn’t tried. The diversity pressure kept agents exploring. Agents with strange positioning, unusual angles, aggressive strategies started appearing — and some of them turned out to be genuinely superior. The “wrong” strategy was actually better. Pure optimization would never have found it.
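
    A minimal sketch of a novelty score, using the common formulation of mean distance to the k nearest neighbours in a behaviour space. The behaviour descriptor (a tuple of numbers per agent), the value of `k`, and the blending weight are assumptions; the post does not say how behaviours were measured or combined with performance.

```python
import math


def novelty(behavior, others, k=3):
    """Mean Euclidean distance from one behaviour to its k nearest neighbours."""
    distances = sorted(math.dist(behavior, other) for other in others)
    nearest = distances[:k]
    return sum(nearest) / len(nearest) if nearest else 0.0


def blended_score(fitness, novelty_value, weight=0.5):
    """Mix performance and novelty; weight=0 is pure performance."""
    return (1 - weight) * fitness + weight * novelty_value
```

    An agent that behaves like everyone else scores near zero on novelty no matter how well it plays. With a nonzero weight, an outlier can survive long enough for its strategy to be tested.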

    Any system without diversity pressure converges on the same answer. It finds the local maximum and calls it done. That lesson applies well beyond neural networks.


    What didn’t work: high mutation rates to accelerate training. The population collapsed — agents changed faster than they could build on what worked. Every generation erased what the previous one had learned. Slowing it down made the evolution meaningful. Some processes can’t be accelerated without destroying the thing that makes them work.


    This was the first project. Everything since has the same shape: variation, selection, emergence you didn’t design. TurboShells encoded the same loop into turtle genetics. rpgCore formalized it into a composable system. VoidDrift runs it as a drone dispatch loop.

    The Pong agent that discovered a non-obvious return angle at generation 47 is the ancestor of all of it. I just didn’t know that yet.

  • Solana Arbitrage: What I Learned From 400 Trades (And $4 in Losses)

    I built PhantomArbiter to detect and execute arbitrage on Solana. After 400 live trades across 3 months, I lost $4. Here’s what went wrong — and why the technology actually worked.


    The Setup

    Detect price divergence across Solana DEXes (Jupiter, Raydium, Orca, Meteora). Execute buy-low / sell-high atomically via JITO bundles for MEV protection.
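
    The detection step can be sketched as follows. It is a simplification: each venue is reduced to a single price for the same pair, with no order-book depth, fees, or network calls, and the function name is hypothetical.

```python
def best_spread(quotes: dict):
    """Find the cheapest and dearest venue for one pair.

    quotes maps venue name to price. Returns None with fewer than two venues.
    """
    if len(quotes) < 2:
        return None
    buy_venue = min(quotes, key=quotes.get)
    sell_venue = max(quotes, key=quotes.get)
    buy_price = quotes[buy_venue]
    sell_price = quotes[sell_venue]
    return {
        "buy": buy_venue,
        "sell": sell_venue,
        # Gross spread as a percentage of the buy price, before any costs.
        "spread_pct": (sell_price - buy_price) / buy_price * 100,
    }
```

    The spread this returns is gross. Whether any of it survives depends on latency, slippage, and fees.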

    $500 initial capital. Real money. 3 months. 400 completed trades. Net result: -$4.23.


    Why It Failed

    RPC Latency

    Solana targets a new block roughly every 400ms. The system detects an opportunity, but by the time the bundle is submitted, 2-3 blocks have passed. The spread that looked profitable is now break-even or negative.

    Local detection: 10ms. RPC call: 50ms. Signature submission: 100ms. Next block: 400ms. Too slow.

    Professional MEV bots use validator infrastructure — direct connections, guaranteed inclusion. I used public RPC. Not competitive.

    Network Congestion

    Solana’s network is unpredictable. Sometimes transactions confirm in 1 block. Sometimes 10. When you arbitrage on 1-2% margins, network variance turns winning trades into losing ones. My math said $2.50 profit. Slippage ate it before execution landed.

    Bundle Fees

    JITO bundles cost ~0.00005 SOL per transaction — $0.002-0.004 per trade. 400 trades, ~$1-1.50 in fees. The average arbitrage spread before costs was around $1.50. After slippage, fees, and MEV tax, nothing was left.
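
    The arithmetic behind those numbers, worked through with the figures quoted in this post. At 400 trades the per-trade fee range comes to $0.80-1.60, which brackets the rounded ~$1-1.50 above, and the -$4.23 net result averages out to about one cent lost per trade. The helper names are illustrative.

```python
TRADES = 400
FEE_LOW_USD = 0.002   # per-trade bundle fee, low end (from the text)
FEE_HIGH_USD = 0.004  # per-trade bundle fee, high end (from the text)
NET_RESULT_USD = -4.23


def total_fees(trades: int, fee_per_trade: float) -> float:
    """Total bundle fees over a run of trades."""
    return trades * fee_per_trade


def average_net_per_trade(net_result: float, trades: int) -> float:
    """Average profit or loss per trade after all costs."""
    return net_result / trades


def net_per_trade(gross_spread: float, slippage: float, fee: float) -> float:
    """What is left of one trade's gross spread after slippage and fees."""
    return gross_spread - slippage - fee
```

    The fee line is the smallest cost here. For the average to land near minus one cent, slippage and related costs had to absorb essentially the entire gross spread.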


    Why It Actually Worked

    The system architecture was sound. 400 trades without a crash:

    Zero transaction failures. Zero contract bugs. Zero memory leaks. Stable WebSocket price feeds for 24/7 uptime.

    The software worked perfectly. The economic model didn’t. That’s an important distinction.

    Arbitrage at retail scale on Solana isn’t viable right now — not because the code is broken, but because professional operations have better infrastructure, lower fees, larger capital to absorb slippage, and faster access to the same opportunities. The edge isn’t available at the level I was operating.


    What I Kept

    The Rust/Python hybrid architecture — Rust handling the execution layer, Python handling strategy logic — transferred into other work. The execution core doesn’t know what it’s trading. The strategy layer doesn’t know how fast the core is running. That decoupling is the right design regardless of what market it’s applied to.

    The code got archived. The pattern didn’t.

    PhantomArbiter traded live markets, handled real network conditions, survived real slippage, and lost money honestly. Most trading systems are backtested and overfitted, profitable in theory but brittle in practice, or they don’t exist at all. A system that ran 400 live trades and lost four dollars is actually a reasonable outcome. It proved what it needed to prove.

    I didn’t keep trading. I kept the blueprint.