Grok 4: xAI’s New Model Posted a Surprisingly Weak Result in a Test!

TECH NEWS – xAI’s Grok 4 appears to be built to shine in AI performance tests, but it struggles with dynamic, strategic challenges. Grok 4 recently placed only fifth in the multi-agent Step Race benchmark, which uses New York Times Connections puzzles to evaluate how different AI models perform. Even Gemini 2.5 Flash did better than Grok 4!


Given Grok 4’s high scores on various standardized benchmarks, one might suspect that the model was tuned to game them, in other words, overfitted to perform well on the tests. Overfitting occurs when a model memorizes its training data instead of learning the underlying patterns in the dataset, so it scores well on familiar material but generalizes poorly to new problems.

This does not mean, however, that xAI’s Grok 4 is not a useful model. After all, its reasoning capabilities have improved dramatically. It outperforms almost all other models at identifying coding errors and bugs, and people are also using the large language model (LLM) to write game code and then transfer it to Cursor. Still, the model is not as capable as Elon Musk would have us believe. It’s worth looking at the prediction-market platform Kalshi, where bets on Grok 4 have attracted only moderate stakes so far.

Meanwhile, the Financial Times recently reported that xAI, the parent company of X (formerly Twitter), is targeting a $200 billion valuation in an upcoming funding round. Notably, xAI raised $300 million in June through a secondary share sale and another $10 billion in early July. SpaceX is also reportedly investing $2 billion in xAI as part of a recent $5 billion funding round. (How is it legal for Musk to invest in himself anyway?)

Finally, it seems that Elon Musk is paving the way for Tesla to take a stake in xAI, putting an end to the “hot potato” game of funding between the various Musk-linked entities.

Source: WCCFTech, GitHub

theGeek has been here since 2019.
