NVIDIA’s GauGAN2 artificial intelligence (AI) can now generate an appropriate photorealistic image from a simple written phrase. The deep learning model can produce distinct scenes from just three or four words.
GauGAN is NVIDIA’s AI application that in 2019 turned simple doodles into photorealistic masterpieces, technology that eventually became the NVIDIA Canvas app earlier this year. Now NVIDIA has pushed the AI even further: it needs only a brief description to generate a “photo.”
NVIDIA says that the deep learning model behind GauGAN2 gives everyone the opportunity to create beautiful scenes, and it is now easier than ever. Users can simply enter a phrase like “sunset at a beach” and the AI will generate the scene in real time as each word is added. Adding an adjective, as in “sunset at a rocky beach,” or swapping “sunset” for “afternoon” or “rainy day,” instantly changes the picture, thanks to the generative adversarial networks (GANs) that power the model.
“At the touch of a button, users can generate a segmentation map, a high-level outline that shows the location of objects in the scene,” says NVIDIA. “From there, they can switch to drawing, adjusting the scene with rough sketches using labels like sky, tree, rock and river, so the smart paintbrush can incorporate these doodles into stunning images.”
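The segmentation-map idea can be illustrated with a toy sketch (the labels, colors, and layout here are arbitrary stand-ins, not NVIDIA’s code): a grid of label strings plays the role of the map, and each label is rendered to a flat color, a crude analogue of the learned textures a GAN would paint in.

```python
# Toy illustration of a segmentation map: a grid of region labels
# that a model like GauGAN2 would turn into photorealistic texture.
# Labels and colors are arbitrary stand-ins, not NVIDIA's.

LABEL_COLORS = {
    "sky":   (135, 206, 235),
    "tree":  (34, 139, 34),
    "rock":  (112, 128, 144),
    "river": (70, 130, 180),
}

def make_segmentation_map(width, height):
    """Build a trivial label grid: sky on top, tree and rock bands
    in the middle, river at the bottom."""
    seg = []
    for y in range(height):
        if y < height // 2:
            row = ["sky"] * width
        elif y < 3 * height // 4:
            row = ["tree"] * (width // 2) + ["rock"] * (width - width // 2)
        else:
            row = ["river"] * width
        seg.append(row)
    return seg

def render_flat(seg):
    """Replace each label with its flat RGB color -- the crude
    analogue of what the GAN does with learned, realistic texture."""
    return [[LABEL_COLORS[label] for label in row] for row in seg]

seg = make_segmentation_map(8, 8)
img = render_flat(seg)
print(seg[0][0], img[0][0])  # sky (135, 206, 235)
```

In the real demo the map is drawn (or auto-generated from the text prompt) rather than hard-coded, and the “rendering” step is the GAN itself.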
NVIDIA says the demo is one of the first to combine multiple modalities within a single GAN. GauGAN2 combines segmentation mapping, inpainting, and text-to-image generation in a single model, which NVIDIA says makes it a powerful tool that allows users to create photorealistic art with a mix of words and drawings. The goal is to make it faster and easier to turn an artist’s vision into a high-quality AI-generated image. NVIDIA says that compared to other state-of-the-art models built specifically for text-to-image or segmentation map-to-image applications, GauGAN2 produces greater variety and higher-quality images.
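The multimodal idea can be sketched as a single entry point that accepts any mix of conditioning inputs. This is a conceptual mock, not NVIDIA’s actual interface: a real model would fuse these inputs into one conditioning signal and emit pixels, while this sketch only reports which modalities would be combined.

```python
# Conceptual sketch of a multimodal generator interface in the
# spirit of GauGAN2: one entry point accepts any mix of a text
# prompt, a segmentation map, and a partial image for inpainting.
# Mock code for illustration only, not the real model.

def generate(text=None, seg_map=None, partial_image=None):
    """Collect whichever conditioning inputs were supplied and
    report how they would be combined. A real GAN would fuse them
    into one latent conditioning signal and generate an image."""
    modalities = []
    if text is not None:
        modalities.append(f"text:{text!r}")
    if seg_map is not None:
        modalities.append("segmentation map")
    if partial_image is not None:
        modalities.append("inpainting target")
    if not modalities:
        raise ValueError("at least one conditioning input is required")
    return " + ".join(modalities)

print(generate(text="sunset at a rocky beach"))
print(generate(text="snowy mountains", seg_map=[["sky"]]))
```

The design point is that earlier tools handled each of these tasks with a separate model, whereas GauGAN2 accepts them all through one network.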
“Instead of having to draw out every element of an imagined scene, users can enter a short sentence to quickly generate the key features and theme of an image, such as a snow-capped mountain range,” says NVIDIA. “This starting point can then be customized with sketches to make a particular mountain higher or add a few trees to the foreground, or clouds in the sky.”
While the realistic imagery is probably the most impressive, GauGAN2 is not limited to that kind of output. Artists can also use the demo to depict otherworldly, fictional landscapes. NVIDIA showed a scene resembling Tatooine, the fictional desert planet from Star Wars, in which the desert was first generated by the model and a second sun was added afterward.
“It’s an iterative process, where each word the user enters in the text box adds more to the AI-created image.”
The text-to-image feature can be tested on the NVIDIA AI Demos page, where anyone can try creating custom scenes with text prompts and then adjust them with quick sketches to produce more refined results.