Cohere releases low-cost AI model that uses fewer chips

Cohere co-founder Nick Frosst said the company was able to achieve its results by focusing on building models that will be useful for business customers, as opposed to technology that can do everything and anything. Christopher Katsarov/The Globe and Mail

Artificial intelligence company Cohere Inc. has released a low-cost AI model that it says was built with fewer computational resources than similar offerings from its competitors, some of whom are spending billions of dollars on data centres and chips to support development.

The latest large language model from the Toronto-based company was built for less than US$30-million. Other companies are spending orders of magnitude more than that. Anthropic chief executive Dario Amodei has said that advanced LLMs can cost US$100-million to train, with the costs rapidly rising.

Evaluations provided by Cohere show the model, called Command A, is on par with or better than leading models from U.S.-based OpenAI and Chinese company DeepSeek on some tasks, such as coding, answering technical questions and customer service assistance.

Cohere co-founder Nick Frosst said the company was able to achieve its results by focusing on building models that will be useful for business customers, as opposed to technology that can do everything and anything. “We’re training it to be good at the things that our customers want. By being focused on that, we’ve been able to be significantly more efficient than the other players,” he said.

Companies such as OpenAI and Anthropic are trying to develop artificial general intelligence, or AGI, a loosely defined term that refers to systems that are smarter than humans. To get there, these companies believe more computational power is needed. “The people who are saying AI is getting bigger and bigger are the people constantly saying they’re around the corner from AGI,” Mr. Frosst said. “That’s not our focus, nor is that my scientific belief.”

Cohere was founded in 2019 and builds LLMs that can produce and interpret text and can also be used to automate mundane corporate tasks. Generative AI took off in late 2022, when OpenAI released ChatGPT. Since then, the corporate world has become fixated on adopting the technology to capture productivity gains.

Some AI companies have an endless appetite for graphics processing units, the pricey computer chips that power AI models and applications. Elon Musk’s xAI, for example, built a facility consisting of 100,000 GPUs with plans to double that number. OpenAI, Oracle and others are investing some US$500-billion to build a massive AI supercomputer called Stargate.

In contrast, Cohere has access to around 8,500 GPUs, according to Mr. Frosst, who is proud to tout the company’s efficiency. “My mantra these days has been ROI not AGI,” he said. The company also does not have consumer-facing applications such as ChatGPT, which require a lot of processing power.

Cohere used just 2,000 GPUs in the first phase of building Command A. Customers who want to deploy the model on their own computing infrastructure can do so with only two GPUs. Other models can require up to 32.

In January, DeepSeek caused a panicked sell-off in tech stocks after it released details on its generative AI models. The company said it had used just over 2,000 GPUs to build one of its models at a cost of only US$5.6-million, raising questions about the huge sums of money spent by competitors. The details left many industry players in disbelief, with some speculating that DeepSeek could have as many as 50,000 chips that it was not disclosing.

The training costs revealed by DeepSeek might not be the full picture. Building an AI model can take multiple attempts to get right, and DeepSeek’s price tag could refer only to the last try. “It’s not like you just start the process and you’re done. There’s a lot of potential issues on the way,” Gennady Pekhimenko, CEO of machine learning efficiency company CentML, previously told The Globe and Mail.

The US$30-million cost for Command A captures the entire training period. “That’s all the work that went into making it,” Mr. Frosst said.

The real test for success, however, will be whether businesses pay to use it. Cohere, which is also focused on making sure its models are fluent in multiple languages, has found interest outside of North America. It has already developed a Japanese-language LLM with Fujitsu, and recently partnered with LG CNS, the technology services unit of the South Korean conglomerate.
