Elon Musk’s AI venture, xAI, has announced that its latest large language model (LLM), Grok-3, has outperformed top AI models, including ChatGPT, Gemini, and DeepSeek, in a blind evaluation test. According to xAI’s internal analysis, Grok-3 has set a new record score on LMArena, a community-driven AI evaluation platform.
Grok-3 Achieves Record Scores in AI Evaluation
During a livestream on X (formerly Twitter) on Feb. 18, Musk and the xAI team introduced Grok-3, revealing that an early version of the model, codenamed “chocolate,” had been tested on LMArena. The platform, which ranks AI models through blind tests, recorded over a million votes from users comparing chatbot responses.
This is it: The world’s smartest AI, Grok 3, now available for free (until our servers melt).
Try Grok 3 now: https://t.co/Tj0afLoxEz
X Premium+ and SuperGrok users will have increased access to Grok 3, in addition to early access to advanced features like Voice Mode pic.twitter.com/YgKavSCiWr
— xAI (@xai) February 20, 2025
Grok-3 reportedly outperformed OpenAI’s o3-mini and o1, DeepSeek-R1, and Google’s Gemini 2.0 Flash Thinking by at least 10 points in key areas, including math, science, and coding. It also led across multiple performance categories, such as:
- Style control
- Complex prompts and multi-turn responses
- Creative writing and instruction following
- Coding and mathematical problem-solving
The model reached a milestone score of 1400 on LMArena, with Musk stating that it continues to improve.
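For readers wondering how blind head-to-head votes turn into a single number like 1400, here is a minimal sketch that assumes an Elo-style rating. That assumption is ours for illustration; the article does not describe LMArena's exact scoring method, and the function names, the 400-point scale, and the `k` step size below are hypothetical choices, not anything confirmed by xAI or LMArena.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Chance that model A's answer is preferred over model B's,
    under a standard Elo model (assumed here, not confirmed by the source)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, a_won: bool, k: float = 4.0):
    """Apply one blind vote and return the new (rating_a, rating_b).

    k is a hypothetical step size: larger values react faster to each vote.
    """
    exp_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    delta = k * (score_a - exp_a)
    # Whatever A gains, B loses, so the total rating pool stays constant.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Illustrative ratings only; these are not real leaderboard values.
    print(round(expected_score(1400, 1300), 2))
    print(update(1400.0, 1400.0, a_won=True))
```

Under this model, a 100-point lead corresponds to winning roughly 64% of head-to-head votes, which is why a rating gap says more about voter preference rates than about absolute capability.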
Skepticism Surrounding Grok-3’s Ranking
While xAI is celebrating its new AI model’s dominance, LMArena has not independently verified whether Grok-3’s ranking represents a true breakthrough over its competitors. Questions remain about possible external influences, such as audience demographics or biases in the voting process.
Additionally, controversy emerged within xAI when an engineer, Benjamin De Kraker, resigned on Feb. 12 after refusing to delete an X post in which he had ranked Grok-3 lower than ChatGPT.
The ranking currently (my opinion), for code:
ChatGPT o1-pro
o1
o3-mini
(all kind of tied)
Grok 3 (expected, tbd)
Claude 3.5 Sonnet
DeepSeek
GPT-4o
Grok 2
Gemini 2.0 Pro Series (might be higher, will probably move up)
— Benjamin De Kraker (@BenjaminDEKR) February 8, 2025
De Kraker explained that he was given an ultimatum: retract his opinion or face termination. He ultimately chose to leave the company.
Part of me will forever be inside Grok
Way, wayyyy up inside
— Benjamin De Kraker (@BenjaminDEKR) February 16, 2025
Beyond AI benchmarks, Musk revealed xAI’s ambitious plans to integrate Grok into Tesla’s Optimus humanoid robots, aiming to send them on SpaceX’s upcoming Mars mission by the end of 2026. He highlighted that the next optimal Earth-Mars transit window falls in November 2026, presenting a critical opportunity for advancing robotic exploration.
“If all goes well, SpaceX will send Starship rockets to Mars with Optimus robots and Grok,” Musk stated, underscoring his long-term vision for AI-powered automation in space.
