A Brief Explanation Of StyleGAN
The pace of progress in GANs (generative adversarial networks) over the past few years has been remarkable: in roughly seven short years, the field has gone from blurry black-and-white 64×64 renders to near picture-perfect 1024×1024 images.
It all starts with the dataset. The model is very particular about the parameters of its images. First, the resolution must be a power of 2 — for example 128×128, 512×512, or 1024×1024. Let’s say you want to generate faces using StyleGAN. The neural network needs many images to reference while training, as it has to learn not only the facial features but the texture of the skin and everything else that goes into a realistic photo.
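As a quick sanity check before training, you can verify the power-of-two requirement in a few lines. This is an illustrative helper of my own, not part of any StyleGAN tooling; `nearest_power_of_two` suggests a safe crop/resize target for an odd-sized photo:

```python
def is_power_of_two(n: int) -> bool:
    """True when n is 1, 2, 4, 8, ... (exactly one bit set)."""
    return n > 0 and (n & (n - 1)) == 0

def nearest_power_of_two(n: int) -> int:
    """Largest power of two <= n, e.g. a safe resize target for a dataset image."""
    if n < 1:
        raise ValueError("resolution must be positive")
    return 1 << (n.bit_length() - 1)

print(is_power_of_two(1024))       # True: a valid StyleGAN resolution
print(nearest_power_of_two(1000))  # 512: round an odd-sized photo down
```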
When it comes to training a model, a popular comparison is the game of Cops & Robbers. The robber, who for the sake of comparison is an identity counterfeiter, tries to create fake IDs, while the cop tries to catch them. The generator network (the robber) produces an image, and the discriminator network (the cop) examines it and judges whether it is real. Whichever network loses the round tweaks its parameters slightly, aiming to do better next time. A tweak isn’t always successful, which is why you may see the occasional drop in quality, only for it to bounce back in later iterations. This process repeats thousands of times until the user stops the training. Once training is complete, we can ask the robber to generate images for whatever purpose we choose.
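The cops-and-robbers loop can be sketched as a toy, a minimal sketch of my own rather than a real GAN: each network here is a single tunable number standing in for millions of weights, the "images" are just values near a target mean, and the update rules are simple nudges rather than gradient descent. The round-by-round structure — whoever loses adjusts — is the same:

```python
import random

random.seed(0)   # reproducible toy run
REAL_MEAN = 4.0  # the "dataset": real samples cluster around this value

def real_sample() -> float:
    return random.gauss(REAL_MEAN, 0.5)

class Generator:
    """The robber: one parameter standing in for millions of weights."""
    def __init__(self) -> None:
        self.mean = 0.0  # starts far from the real distribution
    def sample(self) -> float:
        return random.gauss(self.mean, 0.5)

class Discriminator:
    """The cop: calls anything above its threshold real."""
    def __init__(self) -> None:
        self.threshold = 0.0
    def is_real(self, x: float) -> bool:
        return x > self.threshold

gen, disc = Generator(), Discriminator()
for _ in range(2000):
    fake, real = gen.sample(), real_sample()
    if disc.is_real(fake):
        # the cop was fooled: move the threshold toward separating real from fake
        disc.threshold += 0.05 * ((fake + real) / 2 - disc.threshold)
    else:
        # the robber was caught: nudge its output toward what the cop accepts
        gen.mean += 0.05 * (disc.threshold - gen.mean)

print(round(gen.mean, 2))  # ends close to REAL_MEAN
```

By the end, the generator's output distribution sits near the real one — the adversarial pressure alone pushed it there, with neither side ever seeing the other's parameters.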
With the original GAN, this game of Cops and Robbers was the extent of the process: two simple neural networks (the discriminator and the generator) combined into one adversarial system. But researchers were not satisfied, and in 2015 Alec Radford and his collaborators improved on the original GAN with DCGAN, in which the individual networks became deeper convolutional neural networks. Because convolutional layers operate on local patches of an image, the networks could focus on more specific features such as edges and textures, allowing a more detailed final image. A series of other variants followed from this evolutionary leap, such as CoGAN, ArtGAN, and DiscoGAN, each created to improve on or specialize the original design.
There was one major problem with these newer GANs, however. The generator’s goal is for its images to fool the discriminator, not necessarily to be high quality. If the generator happened to produce an image more polished than anything the discriminator was used to, the discriminator could flag it as fake simply for standing out. So the generator got a little tricky, as robbers do: it learned to exploit the discriminator’s blind spots, producing images that look noticeably off to the human eye but that the discriminator classifies as real. Obviously this hurts the final output of the model, even though the generator technically reached its goal.
This issue was addressed in a major way in 2018 with NVIDIA’s revolutionary StyleGAN, which takes a progressive approach to generating images. Training starts with small, simple networks at a very low resolution, such as 4×4, for roughly 100 iterations, during which the discriminator has trouble telling real from fake simply because everything is so coarse. The process then gradually grows the complexity of the networks and the pixel size, stage by stage, until it reaches the full resolution. On top of that, StyleGAN exposes “style” inputs, so we can steer characteristics of the generated image, such as blonde hair, giving us more say in what the network produces.
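The growth schedule above can be sketched in a few lines. This is illustrative only: real progressive-growing schedules are measured in total images shown to the networks, and the flat 100 iterations per stage just echoes the figure mentioned here:

```python
def progressive_schedule(start: int = 4, final: int = 1024,
                         iters_per_stage: int = 100):
    """Yield (resolution, iterations) pairs, doubling resolution each stage."""
    res = start
    while res <= final:
        yield res, iters_per_stage
        res *= 2

for res, iters in progressive_schedule():
    print(f"train at {res}x{res} for ~{iters} iterations")
```

Running it shows nine stages, from 4×4 up through 1024×1024 — each one inheriting what the smaller networks already learned about coarse structure before adding finer detail.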
That was 2018; now it’s 2021 and things have kept improving, though the 2018 StyleGAN still serves as the base, which speaks to its quality. With StyleGAN2-ADA, the discriminator is trained with adaptive data augmentation, which keeps it from memorizing small datasets and makes it, well, more discriminating. This increases the quality of the final generations. Training has also become fast and efficient enough that individuals can realistically train a model themselves.
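The "adaptive" part can be sketched as a tiny control loop, loosely following the overfitting heuristic in the StyleGAN2-ADA paper; the function name, the 0.6 target, and the step size are illustrative choices of mine, not the paper's exact constants:

```python
def update_ada_p(p: float, r_t: float,
                 target: float = 0.6, step: float = 0.005) -> float:
    """One adjustment of the augmentation probability p.

    r_t is an overfitting signal (in StyleGAN2-ADA, derived from the sign of
    the discriminator's outputs on real images): a high r_t means the
    discriminator is too confident on its training set, so we raise p and
    show it more augmented images; a low r_t means we can ease off.
    """
    p += step if r_t > target else -step
    return min(max(p, 0.0), 1.0)  # keep p a valid probability
```

Called periodically during training, this keeps the discriminator challenged on small datasets without letting the augmentations leak into the generated images.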
If you want to generate art yourself using the latest version of this GAN technology, you can check out my tutorial here.