Peter Damerow’s work on the history of cognition makes the problem precise. Galileo couldn’t have derived his inclined-plane results purely from first principles, not because he lacked intelligence, but because the cognitive tools required to think that thought didn’t yet exist. They had to be built through material engagement with the apparatus. The concepts weren’t waiting to be discovered by a sufficiently smart mind. They were partly constituted by the physical practice that generated them.
The Einstein Test assumes the opposite: that reasoning is substrate-independent, that a system with enough raw intelligence can derive the results on its own. But if Damerow is right, the concepts themselves carry the history of their construction. Einstein’s key moves (reconceiving simultaneity, absorbing Mach’s critique of absolute space, working with the geometry that Riemann built) weren’t retrieved from some Platonic shelf. They were the residue of embodied, instrumental, social practice accumulated over decades.
Cut the knowledge off at 1911 and you’ve removed the labels. The loops that generated the concepts are still in the training data.
Which means a system that passes the Einstein Test hasn’t demonstrated reasoning from first principles. It’s demonstrated something more like very deep interpolation over a conceptual graph—impressive, possibly useful, but not what the test was supposed to show. A positive result wouldn’t confirm AGI. It would confirm that we designed the wrong test.