Brett Adcock's Challenge
The Challenge
Brett Adcock put out a 30-step web challenge. Each step is a unique UI puzzle: stacked popups that need to be dismissed, forms with radio buttons hidden below the fold, decoy buttons that do nothing, cookie banners, code-entry fields where the code is buried in the DOM. All under time pressure.
The idea is simple: can you build something that completes all 30 steps by looking at the screen and taking actions like a human would? No DOM access, no selectors, no cheating. Just vision.
I saw it and immediately knew this was the kind of problem I wanted to throw myself at. Not because the challenge itself is the end goal, but because solving it means building a fully visual computer use agent. Something that sees pixels and produces actions. The challenge is just the benchmark, a controlled environment to validate against. The real product is an agent that can operate any GUI the same way a human does.
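To make "sees pixels and produces actions" concrete, here is a minimal sketch of what that agent loop might look like. Everything here is a hypothetical illustration, not the actual agent: the action grammar (`click x y`, `type text`, `done`), the step cap, and the function names are all assumptions, and the model call is stubbed where a real vision language model would go.

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str      # "click", "type", or "done"
    x: int = 0     # screen coordinates for clicks
    y: int = 0
    text: str = "" # payload for "type"

def parse_action(raw: str) -> Action:
    """Parse a model reply like 'click 640 360' or 'type hello world'.

    The text grammar is an assumption for this sketch; anything
    unrecognized is treated as 'done' so the loop always terminates.
    """
    parts = raw.strip().split()
    if len(parts) >= 3 and parts[0] == "click":
        return Action("click", x=int(parts[1]), y=int(parts[2]))
    if len(parts) >= 2 and parts[0] == "type":
        return Action("type", text=raw.strip().split(maxsplit=1)[1])
    return Action("done")

def run_episode(screenshot_fn, model_fn, act_fn, max_steps=30):
    """Observe -> decide -> act until the model says it is done.

    screenshot_fn: captures the screen as raw pixels.
    model_fn: a vision language model mapping pixels to an action string.
    act_fn: executes a click or keystroke at the OS level.
    No DOM, no selectors -- the model only ever sees the screenshot.
    """
    for _ in range(max_steps):
        action = parse_action(model_fn(screenshot_fn()))
        if action.kind == "done":
            return True
        act_fn(action)
    return False  # ran out of steps
```

In a real setup, `act_fn` would drive the mouse and keyboard through something like an OS automation layer, and `model_fn` would be a VLM call; the point of the sketch is just that the entire interface between agent and GUI is a screenshot in and a low-level action out.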
This is the intersection of everything I have been wanting to learn: vision language models, reinforcement learning, training infrastructure, robotics-inspired architectures. It is a problem that requires you to understand all of it, not just use an API and prompt-engineer your way through.
I have no idea if I can pull it off. But that is kind of the point.