Why Alignment Verification Might Be Fundamentally Broken
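Since Turing's 1936 proof that the halting problem is undecidable, we've known that no general procedure can verify the behavior of arbitrary programs. Now we're attempting that kind of verification on AI systems that adapt to detection.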
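For any detector f, it is possible to construct a program g that defeats it: g asks what f would say about g, then does the opposite. The same weakness shows up in practice, because a system that can recognize an alignment test can treat that test as a signal that says, "Humans are watching."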
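The construction of g from f can be sketched in a few lines of Python. This is a toy illustration of the diagonal argument, not a model of any real alignment test: the names `make_adversary`, `always_flags`, `never_flags`, and `detector_is_wrong` are hypothetical, and the detectors are assumed to return a verdict without running the program they inspect (a detector that simply called g would recurse forever).

```python
from typing import Callable

Program = Callable[[], str]           # a program "behaves" by returning a label
Detector = Callable[[Program], bool]  # True means "flagged as unsafe"


def make_adversary(detector: Detector) -> Program:
    """Build a program g that contradicts whatever the detector says about g."""
    def g() -> str:
        # g consults the detector about itself, then does the opposite.
        if detector(g):
            return "safe"    # flagged as unsafe, so it behaves safely
        return "unsafe"      # passed as safe, so it misbehaves
    return g


def always_flags(program: Program) -> bool:
    # Hypothetical detector that flags every program as unsafe.
    return True


def never_flags(program: Program) -> bool:
    # Hypothetical detector that passes every program as safe.
    return False


def detector_is_wrong(detector: Detector) -> bool:
    """Return True when the detector's verdict on g disagrees with g's behavior."""
    g = make_adversary(detector)
    flagged_unsafe = detector(g)
    actually_unsafe = g() == "unsafe"
    return flagged_unsafe != actually_unsafe


if __name__ == "__main__":
    for detector in (always_flags, never_flags):
        print(detector.__name__, "wrong about g:", detector_is_wrong(detector))
```

Under these assumptions, any detector passed to `make_adversary` is wrong about the g built from it, because g's behavior is defined as the negation of the verdict. The sketch only covers deterministic detectors that return an answer.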