Alibaba launches maths-specific AI models said to outperform LLMs from OpenAI, Google

“Over the past year, we have dedicated significant efforts to researching and enhancing the reasoning capabilities of large language models, with a particular focus on their ability to solve arithmetic and mathematical problems,” the Qwen team, part of Alibaba’s cloud computing unit, said in a post published on developer platform GitHub on Thursday. Alibaba owns the South China Morning Post.

The latest LLMs – the technology underpinning generative AI services like ChatGPT – were built on the Qwen2 LLMs released by Alibaba in June and come in three sizes based on their parameter counts. Parameters, a machine-learning term for the variables an AI system learns during training, help determine how a given prompt yields the desired output.
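As a rough illustration of what "parameter count" means, the sketch below counts the trainable numbers (weights and biases) in a toy fully connected network; the function name and network sizes are hypothetical, chosen only to make the arithmetic concrete:

```python
def count_parameters(layer_sizes):
    """Count weights + biases in a simple fully connected network.

    A dense layer mapping n_in inputs to n_out outputs has an
    n_in x n_out weight matrix plus n_out biases.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # weight matrix entries
        total += n_out         # bias vector entries
    return total

# A single layer from 4 inputs to 3 outputs: 4*3 + 3 = 15 parameters.
print(count_parameters([4, 3]))        # 15
# A toy two-layer network; models like Qwen2-Math-72B-Instruct have
# on the order of 72 billion such parameters.
print(count_parameters([512, 256, 10]))  # 133898
```

The "72B" in a model's name refers to this total count, which is why it serves as a shorthand for a model's scale.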


The model with the largest parameter count, Qwen2-Math-72B-Instruct, outperformed leading US-developed LLMs in maths benchmarks, according to the Qwen team's post. Those included OpenAI's GPT-4o, Anthropic's Claude 3.5 Sonnet, Google's Gemini 1.5 Pro and Meta Platforms' Llama-3.1-405B.

“We hope that Qwen2-Math can contribute to the community for solving complex mathematical problems,” the post said.

The family of Tongyi Qianwen, also known as Qwen, large language models from Alibaba Group Holding's cloud computing unit, now includes maths-specific LLMs. Photo: Shutterstock

The Qwen2-Math AI models were tested on both English and Chinese maths benchmarks, according to the post. These included GSM8K, a data set of 8,500 high-quality linguistically diverse grade school maths problems; OlympiadBench, a high-level bilingual multimodal scientific benchmark; and the gaokao, the mainland’s daunting university entrance examination.
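To give a sense of how a benchmark like GSM8K is typically scored, the sketch below checks a model's free-form solution against a reference answer by comparing final numbers. This is a hypothetical illustration, not the Qwen team's evaluation code; the `#### <number>` answer format follows the published GSM8K convention, and the function names are my own:

```python
import re

def extract_final_number(text):
    """Return the last number in a solution string, or None."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return float(matches[-1]) if matches else None

def is_correct(model_output, gold_solution):
    """Mark a model answer correct if its final number matches the gold one."""
    return extract_final_number(model_output) == extract_final_number(gold_solution)

# GSM8K reference solutions end with "#### <answer>".
gold = "Natalia sold 48/2 = 24 clips in May. 48 + 24 = 72. #### 72"
print(is_correct("She sold 72 clips in total.", gold))  # True
print(is_correct("The answer is 48.", gold))            # False
```

Accuracy on such benchmarks is simply the fraction of problems where this check passes, which is what makes scores across different models directly comparable.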

The Qwen team said the new models still have some limitations owing to their "English-only support". The team plans to release bilingual models shortly, with multilingual LLMs also in the development pipeline.

Alibaba’s maths-specific models further burnish the Hangzhou-based company’s AI credentials after its Qwen-72B-Instruct LLM recently led the world’s top 10 open-source model rankings.

Tongyi Qianwen has been open to third-party developers for over a year. Open source gives public access to a program's source code, allowing third-party software developers to modify or share its design, fix problems or scale up its capabilities.

The lofty recognition given to Alibaba Group Holding's family of large language models shows the company's rapid progress in artificial intelligence. Photo: Shutterstock

In July, Qwen2-72B-Instruct came in just behind GPT-4o and Claude 3.5 Sonnet in LLM rankings from SuperClue, a benchmarking platform that evaluates models on metrics such as calculation, logical reasoning, coding and text comprehension.

The gap between Chinese and US AI models appears to be narrowing, according to SuperClue, which said the mainland has made significant progress in advancing domestic LLMs in the first half of this year.

A separate test published in July by LMSYS – an AI model research organisation supported by the University of California, Berkeley – saw Qwen2-72B ranked 20th, while proprietary models from OpenAI, Anthropic and Google took most of the top-10 slots.

This article originally appeared in the South China Morning Post (SCMP), the most authoritative voice reporting on China and Asia for more than a century. For more SCMP stories, please explore the SCMP app or visit the SCMP’s Facebook and Twitter pages. Copyright © 2024 South China Morning Post Publishers Ltd. All rights reserved.

