Image Credit: Andriy Onufriyenko/Getty Images
There’s no shortage of groundbreaking technology underpinning generative AI, but one key innovation is diffusion models. Inspired by concepts from thermodynamics, diffusion models have piqued public interest, quickly displacing generative adversarial networks (GANs) as the go-to method for AI-based image generation.
These models learn by corrupting their training data with incrementally added noise and then determining how to reverse this noising process in order to recover the original image. After being trained, diffusion models can use these denoising methods to generate new “clean” data from random input. Popular text-to-image generators such as DALL-E 2, Imagen and Midjourney all use diffusion models. Another key entrant in this category is Stability AI, the startup behind the Stable Diffusion model, a powerful, free and open-source text-to-image generator that launched in August 2022.
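The noising half of that process can be sketched in a few lines of Python. This is a minimal illustration that assumes a DDPM-style formulation with a linear noise schedule; the article does not say which schedule any of these products use, and the function names, the 1,000-step count and the schedule endpoints are illustrative assumptions rather than Stable Diffusion's actual settings.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Noise variances that grow linearly from beta_start to beta_end (assumed values)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_signal(betas):
    """Running product of (1 - beta): the share of the original signal left after each step."""
    kept, out = 1.0, []
    for beta in betas:
        kept *= 1.0 - beta
        out.append(kept)
    return out


def add_noise(x0, t, alpha_bars, rng):
    """Corrupt x0 directly to step t: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * noise."""
    signal = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [signal * x + sigma * n for x, n in zip(x0, noise)]
    return xt, noise


rng = random.Random(0)
betas = linear_beta_schedule(1000)
alpha_bars = cumulative_signal(betas)

pixels = [0.8, -0.3, 0.5, 0.1]  # a toy four-value "image"
noisy, noise = add_noise(pixels, 999, alpha_bars, rng)
print("signal left at the final step:", alpha_bars[-1])
print("noisy sample:", noisy)
```

In a setup like this, a neural network would be trained to predict `noise` from `noisy` and the step index. Generation then starts from pure random values and removes the predicted noise step by step.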
Founded in 2020 by Emad Mostaque, Stability AI claims to be the world’s first community-driven, open-source artificial intelligence (AI) company, and says it aims to address the lack of “organization” within the open-source AI community.
“AI promises to solve some of humanity’s biggest challenges. But we will only realize this potential if the technology is open and accessible to all,” said Mostaque. “Stability AI puts the power back into the hands of developer communities and opens the door for groundbreaking new applications. An independent entity in this space supporting these communities can create real value and change.”
The company recently announced $101 million in funding. The oversubscribed round was led by Coatue, Lightspeed Venture Partners and O’Shaughnessy Ventures LLC. In a statement, Stability AI said that it will use the funding to accelerate the development of open-source AI models for image, language, audio, video, 3D and more, for consumer and enterprise use cases globally.
Stable Diffusion is truly ‘open’
Like most of its counterparts, Stable Diffusion aims to enable billions of people to create stunning art instantly. The model builds on the CompVis and Runway teams’ widely used latent diffusion model, and draws on insights from the conditional diffusion models of Katherine Crowson, Stability AI’s lead generative AI developer, as well as DALL-E 2 by OpenAI, Imagen by Google Brain, and many others.
The core model was trained on LAION-Aesthetics, a subset of LAION-5B. The subset was created with a new CLIP-based model that filtered LAION-5B by how “beautiful” each image was, using ratings from Stable Diffusion’s alpha testers. On consumer GPUs, Stable Diffusion uses less than 10 GB of VRAM to generate 512 x 512-pixel images in a matter of seconds. This enables researchers and, eventually, the general public to run the model under a variety of conditions, democratizing image generation.
The model was trained on Stability AI’s 4,000 A100 Ezra-1 AI ultracluster. The company has been testing the model at scale with more than 10,000 beta testers creating 1.7 million images a day.
The emphasis on open source distinguishes Stable Diffusion from other AI art generators. Stability AI has made public all of the details of its AI model, including the model’s weights, which anyone can access and use. Unlike DALL-E or Midjourney, Stable Diffusion places no filters or limits on what it can generate, which includes violent, pornographic, racist or otherwise harmful content.
“The open way that Stable Diffusion’s image generation model was released, allowing users to run it on their own machines, not just via API, has made it a landmark event for AI,” said Andrew Ng, Ph.D., a globally recognized leader in AI. He is founder and CEO of both DeepLearning.AI and Landing AI.
Since launching, Stable Diffusion has been downloaded and licensed by more than 200,000 developers globally.
Turning imagination into reality with DreamStudio
Stability AI also offers a consumer-facing product, DreamStudio, which the company describes as “a new suite of generative media tools engineered to grant everyone the power of limitless imagination and the effortless ease of visual expression through a combination of natural language processing and revolutionary input controls for accelerated creativity.” The product currently has a million registered users from more than 50 countries who have collectively created more than 170 million images.
While Stability AI has made the Stable Diffusion model open source, the DreamStudio website is a paid service designed to let anyone access these creative tools without software installation, coding knowledge or a heavy-duty local GPU. All new users get a one-time bonus of 200 free DreamStudio credits. At default settings, each image costs one credit. Depending on the image resolution and step count users choose, the cost at non-default settings can go as low as 0.2 credits per image or as high as 28.2 credits per image. The adjustable settings are size, CFG scale, seed, steps and image count. Once the free credits run out, users need to buy more. Generated images are always saved in history, and users can integrate them with their existing applications through the API.
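To put those figures in perspective, the short Python sketch below works out how many images the 200 free credits cover at the three per-image costs quoted above. The function name is hypothetical, and the calculation assumes every image in a batch is generated at the same fixed cost. DreamStudio's actual pricing formula is not given here, so this is arithmetic on the published endpoints only.

```python
from fractions import Fraction

FREE_CREDITS = 200  # one-time bonus for new users, as quoted in the article


def images_for_credits(credits, cost_per_image):
    """Whole images a credit balance covers at a fixed per-image cost.

    Costs are passed as strings so Fraction keeps values such as 0.2 exact;
    plain floats would round 0.2 slightly upward and undercount by one image.
    """
    return Fraction(credits) // Fraction(cost_per_image)


# Per-image costs quoted in the article: cheapest, default and most expensive.
for label, cost in [("cheapest", "0.2"), ("default", "1"), ("most expensive", "28.2")]:
    print(f"{label}: {images_for_credits(FREE_CREDITS, cost)} images")
```

At these costs, the free allowance stretches from 1,000 images at the cheapest settings to 200 at the defaults and just seven at the most expensive.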
The fuzzy future
While Stability AI’s business strategy remains fuzzy, Mostaque said in a recent interview with ML enthusiast and YouTuber Yannic Kilcher that he is already in talks with “governments and large organizations” to offer Stable Diffusion’s tech. “We’ve negotiated a large number of deals, so we’ll be profitable at the door, compared to large corporations that lose most of their money,” he added.
“At Coatue, we believe that open-source AI technologies have the power to unlock human creativity and achieve a broader good,” explained Sri Viswanath, general partner at Coatue. “Stability AI is a big idea that dreams beyond the immediate applications of AI. We are excited to be part of Stability AI’s journey, and we look forward to seeing what the world creates with Stability AI’s technology.”
VentureBeat’s mission is to be a digital town square for technical decision-makers to gain knowledge about transformative enterprise technology and transact.