Artificial intelligence that creates realistic three-dimensional images could be run on a laptop and make it faster and easier to create animated films
22 June 2022
By Alex Wilkins
Artificial intelligence models could soon be used to instantly create or edit near-photorealistic three-dimensional scenes on a laptop. The tools could help artists working on games and CGI in films or be used to create hyperrealistic avatars.
AIs have been able to produce realistic 2D images for some time, but 3D scenes have proved to be trickier due to the sheer computing power required.
Now, Eric Ryan Chan at Stanford University in California and his colleagues have created an AI model, EG3D, that can generate random images of faces and other objects in high resolution together with an underlying geometric structure.
“It’s among the first [3D models] to achieve rendering quality approaching photorealism,” says Chan. “On top of that, it generates finely detailed 3D shapes and it’s fast enough to run in real time on a laptop.”
EG3D and its predecessors use a type of machine learning called a generative adversarial network (GAN) to produce images. These systems pit two neural networks against each other: one generates images and the other judges whether they look real. The process is repeated many times until the results are realistic.
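As a rough illustration of that adversarial back-and-forth, here is a toy sketch in Python. It is not EG3D or any published model: the "images" are single numbers, the generator and the judging network each have just two parameters, and the function name `train_toy_gan`, the target distribution and the learning rate are all assumptions made up for this example.

```python
import math
import random


def sigmoid(t):
    # Clamp the input so math.exp cannot overflow.
    t = max(-60.0, min(60.0, t))
    return 1.0 / (1.0 + math.exp(-t))


def train_toy_gan(steps=2000, lr=0.01, seed=0):
    """Toy 1D GAN (illustrative only, hypothetical names and settings).

    Generator:     G(z) = a * z + b
    Discriminator: D(x) = sigmoid(w * x + c), its guess that x is real
    "Real" data:   numbers drawn from a normal distribution centred on 4
    """
    rng = random.Random(seed)
    a, b = 1.0, 0.0
    w, c = 0.1, 0.0
    for _ in range(steps):
        x_real = rng.gauss(4.0, 1.0)
        z = rng.gauss(0.0, 1.0)
        x_fake = a * z + b

        # Discriminator step: raise log D(real) + log(1 - D(fake)).
        d_real = sigmoid(w * x_real + c)
        d_fake = sigmoid(w * x_fake + c)
        w += lr * ((1.0 - d_real) * x_real - d_fake * x_fake)
        c += lr * ((1.0 - d_real) - d_fake)

        # Generator step: raise log D(fake), i.e. try to fool the judge.
        d_fake = sigmoid(w * x_fake + c)
        grad_x = (1.0 - d_fake) * w
        a += lr * grad_x * z
        b += lr * grad_x
    return (a, b), (w, c)


generator, discriminator = train_toy_gan()
print("generator (a, b):", generator)
print("discriminator (w, c):", discriminator)
```

Each pass through the loop updates the judge first and the generator second. Real systems use the same alternation, but with deep networks and images in place of single numbers.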
Chan’s team used features from existing high-resolution 2D GANs and added a component that can convert these images for 3D space. “By breaking down the architecture into two pieces… we solve two problems at once: computational efficiency and backwards compatibility with existing architectures,” says Chan.
However, while models like EG3D can produce near-photorealistic 3D images, these can be difficult to edit in design software. Although the result is an image we can see, how the GANs actually produce it is a mystery.
Another new model could help here. Yong Jae Lee at the University of Wisconsin-Madison and his colleagues have created a machine learning model called GiraffeHD, which tries to extract features of a 3D image that can be manipulated.
“If you’re trying to generate an image of a car, you might want to have control over the type of car,” says Lee. The model could also let you determine the car’s shape and colour, as well as the background or scenery in which it is situated.
GiraffeHD is trained on millions of images of a specific type, such as a car, and looks for latent factors – hidden features in the image that correspond to categories, such as car shape, colour or camera angle. “The way our system is designed enables the model to learn to generate these images in a way where these different factors become separate, like controllable variables,” says Lee.
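To picture what "controllable variables" means in practice, consider the toy sketch below. It is not GiraffeHD's code: the `CarLatents` class, the `toy_generate` function and the rules inside it are hypothetical stand-ins, and a real model learns its factors from data as vectors of numbers, not hand-written rules. The point is only that each factor can be changed while the others stay fixed.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CarLatents:
    """Hypothetical latent factors for a generated car image."""
    shape: float
    colour: float
    camera_angle: float


def toy_generate(latents):
    """Stand-in generator: each output attribute depends on one factor only."""
    return {
        "body": "estate" if latents.shape > 0 else "hatchback",
        "paint": "red" if latents.colour > 0 else "blue",
        "view_degrees": round(latents.camera_angle * 90),
    }


original = CarLatents(shape=0.7, colour=-0.2, camera_angle=0.5)
recoloured = replace(original, colour=0.9)  # change colour, keep the rest

print(toy_generate(original))
print(toy_generate(recoloured))
```

In this sketch, only the paint changes between the two printed results, because the colour factor was the only one edited.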
These controllable features could eventually be used to edit 3D-generated images, letting users adjust precise details of a scene.
Details of these models are being revealed at the Computer Vision and Pattern Recognition conference in New Orleans, Louisiana, this week.
EG3D and GiraffeHD are part of a wider move towards using AIs to create 3D images, says Ivor Simpson at the University of Sussex, UK. However, there are still problems to iron out in terms of wider applicability and algorithmic bias. “They can be limited by the data you put in,” says Simpson. “If a model is trained on faces, then if someone has a very different face structure which it’s never seen before, it might not generalise that well.”