A New Chip Cluster Will Make Massive AI Models Possible

When it comes to the neural networks that power today’s artificial intelligence, sometimes the bigger they are, the smarter they are, too. Recent leaps in machine understanding of language, for example, have hinged on building some of the most enormous AI models ever and stuffing them with huge gobs of text. A new cluster of computer chips could now help these networks grow to almost unimaginable size—and show whether going ever larger may unlock further AI advances, not only in language understanding, but perhaps also in areas like robotics and computer vision.

Cerebras Systems, a startup that has already built the world’s largest computer chip, has now developed technology that lets a cluster of those chips run AI models that are more than a hundred times bigger than the most gargantuan ones around today.

Cerebras says it can now run a neural network with 120 trillion connections, mathematical simulations of the interplay between biological neurons and synapses. The largest AI models in existence today have about a trillion connections, and cost many millions of dollars to build and train. But Cerebras says its hardware will run calculations in about a 50th of the time of existing hardware. Its chip cluster, along with power and cooling requirements, presumably still won’t come cheap, but Cerberas at least claims its tech will be substantially more efficient.

Courtesy of Cerebras

“We built it with synthetic parameters,” says Andrew Feldman, founder and CEO of Cerebras, who will present details of the tech at a chip conference this week. “So we know we can, but we haven’t trained a model, because we’re infrastructure builders and, well, there is no model yet” of that size, he adds.

Today, most AI programs are trained using GPUs, a type of chip originally designed for generating computer graphics but also well-suited for the parallel processing neural networks require. Large AI models are essentially divided up across dozens or hundreds of GPUs, connected using high-speed wiring.

GPU’s still make sense for AI, but as models get larger and companies look for an edge more specialized designs may find their niches. Recent advances and commercial interest has sparked a Cambrian explosion in new chip designs specialized for AI. The Cerebras chip is an intriguing part of that evolution. While normal semiconductor designers split a wafer into pieces to make individual chips, Cerebras packs in much more computational power by using the entire thing, having its many computational units, or cores, talk to each other more efficiently. A GPU typically has a few hundred cores, but Cerebras’s latest chip, called the Wafer Scale Engine Two (WSE-2), has 850,000 of them.

The design can run a big neural network more efficiently than banks of GPUs wired together. But manufacturing and running the chip is a challenge, requiring new methods for etching silicon features, a design that includes redundancies to account for manufacturing flaws, and a novel water system to keep the giant chip chilled.

To build a cluster of WSE-2 chips capable of running AI models of record size, Cerebras had to solve another engineering challenge: how to get data in and out of the chip efficiently. Regular chips have their own memory on-board, but Cerebras developed an off-chip memory box called MemoryX. The company also created software that allows a neural network to be partially stored in that off-chip memory, with only the computations shuttled over to the silicon chip. And it built a hardware and software system called SwarmX that wires everything together.

Photograph: Cerebras

“They can improve the scalability of training to huge dimensions beyond what anybody is doing today,” says Mike Demler, a senior analyst with The Linley Group and a senior editor of The Microprocessor Report.

Demler says it isn’t yet clear how much of a market there will be for the cluster, especially since some potential customers are already designing their own, more specialized chips in-house. He adds that the real performance of the chip, in terms of speed, efficiency, and cost, are as yet unclear. Cerebras hasn’t published any benchmark results so far.

“There’s a lot of impressive engineering in the new MemoryX and SwarmX technology,” Demler says. “But just like the processor, this is highly specialized stuff; it only makes sense for training the very largest models.”

Cerebras’s chips have so far been adopted by labs that need supercomputing power. Early customers include Argonne National Labs, Lawrence Livermore National Lab, pharma companies including GlaxoSmithKline and AstraZeneca, and what Feldman describes as “military intelligence” organizations.

This shows that the Cerebras chip can be used for more than just powering neural networks; the computations these labs run involve similarly massive parallel mathematical operations. “And they’re always thirsty for more compute power,” says Demler, who adds that the chip could conceivably become important for the future of supercomputing.

David Kanter, an analyst with Real World Technologies and executive director of MLCommons, an organization that measures the performance of different AI algorithms and hardware, says he sees a future market for much bigger AI models generally. “I generally tend to believe in data-centric ML, so we want larger datasets that enable building larger models with more parameters,” Kanter says.

According to Feldman, Cerebras plans to expand by targeting a nascent market for massive natural language processing AI algorithms. He says the company has talked to engineers at OpenAI, an AI company in San Francisco that has pioneered use of massive neural networks for language learning as well as robotics and game-playing.

The latest of OpenAI’s algorithms, called GPT-3, can handle language in surprisingly cogent ways, ginning up news articles on a given topic or summarizing content coherently, or even writing computer code, although it is also prone to fits of misunderstanding, misinformation, and occasional misogyny. The neural network behind GPT-3 has around 160 billion parameters.

“In talking to OpenAI, GPT-4 will be about 100 trillion parameters,” Feldman says. “That won’t be ready for several years.”

OpenAI has made GPT-3 accessible to developers and startups via an API, but the company faces increasing competition from startups developing similar language tools. One of the founders of OpenAI, Sam Altman, is an investor in Cerebras. “I certainly think we can make much more progress on current hardware,” Altman says. “But it would be great if Cerebras’s hardware were even more capable.”

Building a model the size of GPT-3 produced some surprising results. Asked whether a version of GPT that’s 100 times larger would necessarily be smarter— perhaps demonstrating fewer errors or a greater understanding of common sense—Altman says it’s hard to be sure but he’s “optimistic.”

Such advances may be at least a few years away. Nearer term, Cerebras is hoping that enough companies will see a need for hardware designed to supersize all sorts of AI models.

More Great WIRED Stories

Read More

Will Knight