Teaching Pong to Play Itself: My First Neural Network Experiment
Pong is the right choice for a first experiment because it has almost no variables. Two paddles. One ball. If you can’t teach an AI to play Pong, you can’t teach an AI anything.
I used NEAT — NeuroEvolution of Augmenting Topologies. It doesn’t just adjust weights on a fixed network structure. It evolves the topology itself, starting minimal and adding complexity only when it helps. The training runs headless at 500x real-time speed; a separate visual mode exists purely to verify that what trained actually works. Generation 0: random paddle movement, 0% win rate. Generation 50: 98% win rate, predictive tracking.
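To make "evolves the topology itself" concrete, here is a minimal sketch of NEAT's two structural mutations in plain Python. It is not the code used in this project: the names (`Genome`, `ConnectionGene`, `mutate_add_node`, `mutate_add_connection`) and the three-input, one-output layout are assumptions made for illustration, and innovation numbers and speciation are left out entirely.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ConnectionGene:
    src: int
    dst: int
    weight: float
    enabled: bool = True


@dataclass
class Genome:
    num_inputs: int
    num_outputs: int
    nodes: list = field(default_factory=list)
    connections: list = field(default_factory=list)


def minimal_genome(num_inputs, num_outputs, rng):
    """Start minimal: every input wired straight to every output, no hidden nodes."""
    nodes = list(range(num_inputs + num_outputs))
    connections = [
        ConnectionGene(i, num_inputs + o, rng.uniform(-1.0, 1.0))
        for i in range(num_inputs)
        for o in range(num_outputs)
    ]
    return Genome(num_inputs, num_outputs, nodes, connections)


def mutate_add_node(genome, rng):
    """Split one enabled connection A->B into A->new->B."""
    enabled = [c for c in genome.connections if c.enabled]
    if not enabled:
        return False
    old = rng.choice(enabled)
    old.enabled = False
    new_id = max(genome.nodes) + 1
    genome.nodes.append(new_id)
    # Weight 1.0 into the new node and the old weight out of it,
    # so the new structure starts out close to the old behavior.
    genome.connections.append(ConnectionGene(old.src, new_id, 1.0))
    genome.connections.append(ConnectionGene(new_id, old.dst, old.weight))
    return True


def mutate_add_connection(genome, rng):
    """Connect two nodes that are not yet connected (inputs are never a destination)."""
    existing = {(c.src, c.dst) for c in genome.connections}
    candidates = [
        (s, d)
        for s in genome.nodes
        for d in genome.nodes
        if d >= genome.num_inputs and (s, d) not in existing
    ]
    if not candidates:
        return False
    src, dst = rng.choice(candidates)
    genome.connections.append(ConnectionGene(src, dst, rng.uniform(-1.0, 1.0)))
    return True


# Hypothetical Pong inputs: paddle y, ball x, ball y. One output: paddle direction.
rng = random.Random(0)
genome = minimal_genome(3, 1, rng)
mutate_add_node(genome, rng)
mutate_add_connection(genome, rng)
```

Because `mutate_add_connection` allows a node to connect back to itself or to an earlier node, the same mutation can also produce the recurrent links that become important later in this post.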
The difference between reacting and anticipating is memory. A standard feedforward network sees only the current frame, so unless velocity is fed in as an input, it cannot tell which way the ball is moving. Recurrent neural networks (RNNs) carry state from previous frames, which lets them infer ball velocity and trajectory. That’s what gives the Gen 50 agent its characteristic quality: it moves to where the ball will be, not where it is. Recurrence is what upgrades the NEAT agent from “learns to respond” to “learns to predict.”
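A toy illustration of why state matters, assuming the only input is the ball’s vertical position in pixels. `RecurrentTracker` is a hand-written stand-in for what a recurrent connection could learn to do, not the evolved network, and it ignores wall bounces.

```python
def feedforward_view(ball_y):
    """Stateless: the same position always yields the same target."""
    return ball_y


class RecurrentTracker:
    """Remembers the previous observation so it can estimate velocity."""

    def __init__(self):
        self.prev_y = None

    def predict(self, ball_y, frames_ahead):
        # Velocity is the change since the last frame (zero on the first frame).
        velocity = 0 if self.prev_y is None else ball_y - self.prev_y
        self.prev_y = ball_y
        return ball_y + velocity * frames_ahead


# Two trajectories that pass through the same point in opposite directions.
rising = [40, 50]
falling = [60, 50]

up, down = RecurrentTracker(), RecurrentTracker()
for y in rising:
    up_target = up.predict(y, frames_ahead=5)
for y in falling:
    down_target = down.predict(y, frames_ahead=5)
```

At `ball_y = 50` the stateless function returns the same target for both trajectories, while the two trackers disagree because they remember where the ball came from.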
The first training approach was pure Elo. Score points, survive, reproduce. The population converged fast, too fast. By generation 20, every agent played the same way. Safe returns, center positioning. They’d found a local maximum and stopped. No one was discovering anything.
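For reference, this is the standard Elo update after a single match. The post does not say how ratings were actually computed here, so treat the K-factor of 32 and the function names as assumptions.

```python
def expected_score(rating_a, rating_b):
    """Probability-like expectation that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update_elo(rating_a, rating_b, score_a, k=32):
    """Return both new ratings; score_a is 1 for a win, 0.5 for a draw, 0 for a loss."""
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


winner, loser = update_elo(1500, 1500, score_a=1)
```

Selection driven only by a number like this rewards whatever wins right now, which is consistent with the early convergence described above.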
Novelty search fixed it. Instead of rewarding only performance, you reward uniqueness — points for behaviors the population hasn’t tried. The diversity pressure kept agents exploring. Agents with strange positioning, unusual angles, aggressive strategies started appearing — and some of them turned out to be genuinely superior. The “wrong” strategy was actually better. Pure optimization would never have found it.
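One common way to score uniqueness is the mean distance to the k nearest neighbors in a behavior space. The sketch below assumes a hypothetical two-number behavior descriptor (average paddle position, average return angle) and a simple weighted blend with fitness. Neither detail comes from the original project.

```python
import math


def novelty(behavior, others, k=3):
    """Mean Euclidean distance from `behavior` to its k nearest neighbors."""
    nearest = sorted(math.dist(behavior, other) for other in others)[:k]
    return sum(nearest) / len(nearest) if nearest else 0.0


def combined_score(fitness, novelty_score, novelty_weight=0.5):
    """Blend performance with uniqueness; the weight is an arbitrary choice."""
    return (1.0 - novelty_weight) * fitness + novelty_weight * novelty_score


# Hypothetical descriptors: (mean paddle y, mean return angle), both normalized.
crowd = [(0.5, 0.0), (0.5, 0.1), (0.6, 0.0), (0.4, 0.0)]
outlier = (0.9, 0.8)

crowd_member_novelty = novelty(crowd[0], crowd[1:] + [outlier])
outlier_novelty = novelty(outlier, crowd)
```

An agent that plays like everyone else scores low on novelty even if it wins, while the outlier gets credit for being different, which is the pressure that keeps odd strategies alive long enough to be tested.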
Any system without diversity pressure converges on the same answer. It finds the local maximum and calls it done. That lesson applies well beyond neural networks.
What didn’t work: high mutation rates to accelerate training. The population collapsed — agents changed faster than they could build on what worked. Every generation erased what the previous one had learned. Slowing it down made the evolution meaningful. Some processes can’t be accelerated without destroying the thing that makes them work.
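A back-of-the-envelope way to see why: if each gene mutates independently with probability p, a genome with n genes passes to the next generation unchanged with probability (1 - p)^n. The rates and genome size below are illustrative assumptions, not measurements from this project.

```python
def survival_probability(per_gene_rate, num_genes):
    """Chance that every gene in a genome is copied without mutation."""
    return (1.0 - per_gene_rate) ** num_genes


# Hypothetical 30-gene genome at a low and a high per-gene mutation rate.
low_rate = survival_probability(0.02, 30)
high_rate = survival_probability(0.30, 30)
```

With these numbers, a little over half of the offspring keep a working genome intact at the low rate, while at the high rate almost none do, so there is nothing stable left for selection to build on.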
This was the first project. Everything since has the same shape: variation, selection, emergence you didn’t design. TurboShells encoded the same loop into turtle genetics. rpgCore formalized it into a composable system. VoidDrift runs it as a drone dispatch loop.
The Pong agent that discovered a non-obvious return angle at generation 47 is the ancestor of all of it. I just didn’t know that yet.