Elon Musk announces Grok-1.5, nearing GPT-4 level performance

2024/03/29 Innoverview

Mere weeks after open-sourcing Grok-1, Elon Musk’s xAI has announced an upgraded version of its proprietary large language model (LLM) — Grok-1.5.

Set to release next week, Grok-1.5 brings enhanced reasoning and problem-solving capabilities and closes in on the performance of known open and closed LLMs, including OpenAI’s GPT-4 and Anthropic’s Claude 3. It is also capable of processing long contexts but remains behind Gemini 1.5 Pro’s context window of up to 1 million tokens.

Musk noted that Grok-1.5 will power xAI’s ChatGPT-challenging chatbot on the X platform, while Grok-2, the successor of the new model, is still in the training phase. He said the next version should be able to “exceed current AI on all metrics” but did not share specifics of when it might become available.

Should be available on X next week.

Grok 2 should exceed current AI on all metrics. In training now. 

— Elon Musk (@elonmusk) March 29, 2024

What does Grok-1.5 bring to the table?

xAI announced Grok-1 last November, saying that the AI has been modeled after “The Hitchhiker’s Guide to the Galaxy” and can answer almost anything to assist humanity in its quest for understanding and knowledge – regardless of background or political views. On benchmarks such as GSM8K, HumanEval and MMLU, shared by xAI, Grok-1 outperformed Llama-2-70B and GPT-3.5.

Now, with the release of Grok-1.5, the company is building on that work, delivering significant improvements over the previous model across all major benchmarks, including those covering coding and math tasks.

“In our tests, Grok-1.5 achieved a 50.6% score on the MATH benchmark and a 90% score on the GSM8K benchmark, two math benchmarks covering a wide range of grade school to high school competition problems. Additionally, it scored 74.1% on the HumanEval benchmark, which evaluates code generation and problem-solving abilities,” xAI noted in a blog post.

On the MMLU benchmark, which evaluates AI models’ language understanding capabilities across diverse tasks, the new model scored 81.3%, beating Grok-1’s 73% by a significant margin. 

Beyond this, xAI also confirmed that Grok-1.5 has a context window of up to 128,000 tokens (tokens are whole units or fragments of words, images, videos, audio or code). This allows the model to take in and process vast amounts of information in one go, 16 times as much as Grok-1, making it more suitable for analyzing, summarizing and extracting information from long documents. It can also handle longer and more complex prompts while maintaining its instruction-following capability.
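As a quick sanity check on the context-window figures, the short Python sketch below works backward from the two numbers quoted above (128,000 tokens and a 16-fold increase) to the Grok-1 window they imply. The variable names are illustrative, and because vendors often quote rounded token counts, the result should be read as approximate rather than as an official specification.

```python
# Figures as quoted in the article; treat both as rounded values.
GROK_1_5_CONTEXT_TOKENS = 128_000
CONTEXT_MULTIPLIER = 16  # "16 times" the Grok-1 context length

# Context window implied for Grok-1 by the two figures above.
implied_grok_1_context = GROK_1_5_CONTEXT_TOKENS // CONTEXT_MULTIPLIER

print(f"Implied Grok-1 context window: ~{implied_grok_1_context:,} tokens")
```

In other words, the quoted numbers imply a Grok-1 window of roughly 8,000 tokens.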

Closing in on OpenAI and Anthropic

With enhanced reasoning and problem-solving capabilities, Grok-1.5 not only outperforms its predecessor on benchmarks but also closes in on popular open and closed-source models, including Gemini 1.5 Pro, GPT-4 and Claude 3.

For instance, on MMLU, Grok-1.5’s score of 81.3% beats the recently introduced Mistral Large but falls behind Gemini 1.5 Pro (83.7%), GPT-4 (86.4%, as of March 2023), and Claude 3 Opus (86.8%). A similar gap was noted on the GSM8K benchmark, with the xAI model sitting just behind the offerings from Google, OpenAI and Anthropic.
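To make the MMLU gaps concrete, the following Python sketch tabulates only the scores quoted in this article and computes each model's distance from Grok-1.5. The dictionary and the `gap_to` helper are illustrative names, not part of any xAI tooling, and Mistral Large is left out because the article gives no score for it.

```python
# MMLU scores (percent) exactly as quoted in the article.
mmlu_scores = {
    "Grok-1": 73.0,
    "Grok-1.5": 81.3,
    "Gemini 1.5 Pro": 83.7,
    "GPT-4": 86.4,          # as of March 2023, per the article
    "Claude 3 Opus": 86.8,
}


def gap_to(model, baseline="Grok-1.5"):
    """Return model's score minus the baseline's, in percentage points."""
    return round(mmlu_scores[model] - mmlu_scores[baseline], 1)


# List models from highest to lowest score with their gap to Grok-1.5.
for name in sorted(mmlu_scores, key=mmlu_scores.get, reverse=True):
    print(f"{name:15s} {mmlu_scores[name]:5.1f}  gap vs Grok-1.5: {gap_to(name):+.1f}")
```

On these figures, Grok-1.5 trails Gemini 1.5 Pro by 2.4 points and Claude 3 Opus by 5.5 points, while leading Grok-1 by 8.3 points.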

Notably, the only benchmark where Grok-1.5 seemed to have an edge was HumanEval, where it outperformed all models except Claude 3 Opus. xAI expects to continue these improvements and deliver further performance gains with Grok-2, which, according to Musk, should exceed current AI on all metrics. The model is being trained at present.

Brian Roemmele, a tech consultant, said that based on his work with Grok-1, Grok-2 “will be one of the most powerful LLM AI platforms when it is released. It will surpass OpenAI on just about every metric.”

Based on my research of open source Grok-1, I am confident in saying that Grok-2 will be one of the most powerful LLM AI platforms when it is released. It will surpass OpenAI on just about every metric.

— Brian Roemmele (@BrianRoemmele) March 29, 2024

Availability of Grok-1.5

As for Grok-1.5, xAI plans to start deployment next week. The company says that the model will initially become available to early testers and those already using the Grok chatbot on the X platform (formerly Twitter) – with real-time access to all posts on the platform. The rollout will be phased, with the company improving the model and introducing several new features – probably including a new unhinged fun mode – while gradually making it available to a wider set of users.

Grok has normal mode and fun mode. Tonight, we decided to add an unhinged fun mode. It is next-level

— Elon Musk (@elonmusk) March 27, 2024

When Musk made Grok available on X, it was seen as a move to drive up adoption for both Grok and X. He started by making the AI available as part of the platform’s ‘Premium+’ subscription priced at $16 per month. However, just a few days ago, the billionaire shared that the chatbot will also be enabled for all Premium subscribers paying $8 per month. In another update, he also confirmed that accounts with a certain number of verified subscriber followers will get Premium and Premium+ subscription benefits, including Grok, for free.

(Copyright: VentureBeat Elon Musk announces Grok-1.5, nearing GPT-4 level performance | VentureBeat)