Elon Musk’s AI venture, xAI, has announced that its latest large language model (LLM), Grok-3, has outperformed top AI models, including ChatGPT, Gemini, and DeepSeek, in a blind evaluation test. According to xAI’s internal analysis, Grok-3 has set a new record score on LMArena, a community-driven AI evaluation platform.

Grok-3 Achieves Record Scores in AI Evaluation

During a livestream on X (formerly Twitter) on Feb. 18, Musk and the xAI team introduced Grok-3, revealing that an early version of the model, codenamed “chocolate,” had been tested on LMArena. The platform, which ranks AI models through blind tests, recorded over a million votes from users comparing chatbot responses.
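To make the idea of ranking by blind votes concrete, here is a minimal sketch of how pairwise votes can be turned into a leaderboard score. It uses the classic Elo update as an assumption; the article does not describe LMArena's actual scoring method, and the model names, starting rating of 1000, and K-factor of 4 are all hypothetical values chosen for illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A is preferred over model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, outcome_a: float, k: float = 4.0):
    """Return both ratings after one blind vote.

    outcome_a is 1.0 if A was preferred, 0.0 if B was preferred, 0.5 for a tie.
    The K-factor of 4 is an illustrative assumption, not a platform setting.
    """
    delta = k * (outcome_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Hypothetical models and votes: (model A, model B, outcome for A).
ratings = {"model_x": 1000.0, "model_y": 1000.0}
votes = [
    ("model_x", "model_y", 1.0),
    ("model_x", "model_y", 1.0),
    ("model_x", "model_y", 0.0),
]

for a, b, outcome in votes:
    ratings[a], ratings[b] = update(ratings[a], ratings[b], outcome)

# Print the resulting leaderboard, highest rating first.
for name, score in sorted(ratings.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.1f}")
```

Under this scheme a score is only meaningful relative to other models on the same leaderboard, which is why who votes and which prompts they submit can influence the final ranking.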

Grok-3 reportedly outperformed OpenAI's o3-mini and o1, DeepSeek-R1, and Google's Gemini 2.0 Flash Thinking by at least 10 points in key areas, including math, science, and coding. In addition, it led across multiple performance categories, such as:

  • Style control
  • Complex prompts and multi-turn responses
  • Creative writing and instruction following
  • Coding and mathematical problem-solving

The model reached a milestone LMArena score of 1400, with Musk stating that it continues to improve.

Skepticism Surrounding Grok-3’s Ranking

While xAI is celebrating its new AI model’s dominance, LMArena has not independently verified whether Grok-3’s ranking represents a true breakthrough over its competitors. Questions remain about possible external influences, such as audience demographics or biases in the voting process.

Additionally, controversy emerged within xAI when an engineer, Benjamin DeKraker, resigned on Feb. 12 after refusing to delete an X post in which he had ranked Grok-3 lower than ChatGPT.

DeKraker explained that he was given an ultimatum to retract his opinion or face termination, and he chose to leave the company.

Beyond AI benchmarks, Musk revealed xAI’s ambitious plans to integrate Grok into Tesla’s Optimus humanoid robots, aiming to send them on SpaceX’s upcoming Mars mission by the end of 2026. He highlighted that the next optimal Earth-Mars transit window falls in November 2026, presenting a critical opportunity for advancing robotic exploration.

“If all goes well, SpaceX will send Starship rockets to Mars with Optimus robots and Grok,” Musk stated, underscoring his long-term vision for AI-powered automation in space.

