Stability AI’s first release, the text-to-image model Stable Diffusion, worked as well as—if not better than—closed equivalents such as Google’s Imagen and OpenAI’s DALL-E. Not only was it free to use, but it also ran on a good home computer. Stable Diffusion did more than any other model to spark the explosion of open-source development around image-making AI last year.
This time, though, Mostaque wants to manage expectations: StableLM does not come close to matching GPT-4. “There’s still a lot of work that needs to be done,” he says. “It’s not like Stable Diffusion, where immediately you have something that’s super usable. Language models are harder to train.”
Another issue is that models are harder to train the bigger they get. That’s not just down to the cost of computing power. The training process breaks down more often with bigger models and needs to be restarted, making those models even more expensive to build.
In practice there is an upper limit to the number of parameters that most groups can afford to train, says Biderman. This is because large models must be trained across multiple different GPUs, and wiring all that hardware together is complicated. “Successfully training models at that scale is a very new field of high-performance computing research,” she says.
The exact number changes as the tech advances, but right now Biderman puts that ceiling roughly in the range of 6 to 10 billion parameters. (In comparison, GPT-3 has 175 billion parameters; LLaMA has 65 billion.) It’s not an exact correlation, but in general, larger models tend to perform much better.
Biderman expects the flurry of activity around open-source large language models to continue. But it will be centered on extending or adapting a few existing pretrained models rather than pushing the fundamental technology forward. “There’s only a handful of organizations that have pretrained these models, and I anticipate it staying that way for the near future,” she says.
That’s why many open-source models are built on top of LLaMA, which was trained from scratch by Meta AI, or releases from EleutherAI, a nonprofit that is unique in its contribution to open-source technology. Biderman says she knows of only one other group like it—and that’s in China.
EleutherAI got its start thanks to OpenAI. Rewind to 2020 and the San Francisco–based firm had just put out a hot new model. “GPT-3 was a big change for a lot of people in how they thought about large-scale AI,” says Biderman. “It’s often credited as an intellectual paradigm shift in terms of what people expect of these models.”