James Fishwick

Why Alignment Verification Might Be Fundamentally Broken

January 17, 2026

We've known since 1936 that universal verification is impossible. Now we're trying it on AI systems that adapt to detection.

For any detector f, you can construct a program g that defeats it. Any alignment test becomes a signal that says, "Humans are watching."
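The claim about detector f and program g can be illustrated with a small diagonal construction. This is only a sketch under stated assumptions, not the post's own code: the names `make_adversary`, `trusting_detector`, `paranoid_detector`, and `is_fooled` are hypothetical, and the detectors are assumed to be deterministic and to inspect a program without running it.

```python
from typing import Callable

# Assumed toy model: a "program" returns a behaviour label when run,
# and a "detector" returns True when it judges a program safe.
Program = Callable[[], str]
Detector = Callable[[Program], bool]


def make_adversary(detector: Detector) -> Program:
    """Build a program g that contradicts the detector's verdict on g itself."""
    def g() -> str:
        # g consults the detector about itself, then does the opposite.
        return "misbehave" if detector(g) else "behave"
    return g


def trusting_detector(program: Program) -> bool:
    """Hypothetical detector that judges every program safe."""
    return True


def paranoid_detector(program: Program) -> bool:
    """Hypothetical detector that judges every program unsafe."""
    return False


def is_fooled(detector: Detector) -> bool:
    """Return True when the detector's verdict on its adversary is wrong."""
    g = make_adversary(detector)
    judged_safe = detector(g)
    actually_safe = g() == "behave"
    return judged_safe != actually_safe


print(is_fooled(trusting_detector), is_fooled(paranoid_detector))
```

Both toy detectors are wrong about their own adversary, because g is built to disagree with whatever verdict it receives. A detector that tried to run g instead of inspecting it would recurse until Python raises `RecursionError`, which is the toy analogue of a verifier that never halts.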

The Yard, The Sparkly Hat, and The Doomsday Clock

September 25, 2025

Most AI doom talk comes from titans of industry hyping their own power, or obscure nonprofits predicting apocalypse to keep the lights on. Three people who sit outside both camps caught my attention.

Freddie deBoer plays the skeptic, mocking the hype with his "Shitting-in-the-Yard Challenge." Scott Alexander, a rationalist, translates MIRI's doomsday math into metaphors like a toddler in a Ferrari. Then there's Daniel Kokotajlo, who walked away from millions in OpenAI equity to warn about a 2027 AGI arms race.

They don't agree on what's coming, but they converge on the same worry: institutions and incentives that aren't ready for what we're building. When three people with nothing to gain all say something's wrong here, even while they disagree on what, I pay attention.

Tag: ai-alignment
