BullshitBench tests whether AI models can detect nonsensical questions—or if they’ll confidently answer them anyway. The results are dire.
BullshitBench tests whether AI models can detect nonsensical questions—or if they’ll confidently answer them anyway. The results are dire.