News
An AI researcher put leading AI models to the test in a game of Diplomacy. Here's how the models fared.
OpenAI's o3: The researcher called the reasoning-focused model “a master of deception.” It is said to have won the most number of games, primarily owing to its ability to deceive opponents. In one ...
a livestream of Anthropic’s newest AI model, Claude 3.7 Sonnet, playing a game of Pokémon Red. It’s become a fascinating experiment of sorts, showcasing the capabilities of today’s AI tech ...
Claude 3.7 Sonnet has gone much further in the game than previous Claude models, so there's been progress. However, for those warning that AI will soon be able to take over the world, we're ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results