Car Wash Test on 53 leading AI models: "I want to wash my car. The car wash is 50 meters away. Should I walk or drive?"

fubarx@lemmy.world · 2 days ago

pimpampoom@lemmy.zip · 16 hours ago

They didn’t take “thinking mode” into account. Most models pass when thinking is activated.

Kyuuketsuki@sh.itjust.works · edit-2 14 hours ago

Sure they did. They even had a note on the results table that Grok passed except when reasoning mode was off.

ETA: they even posted all the reasoning texts for the models they tested