The StyleGAN model presented by NVIDIA’s research lab is an incredible demonstration of the capabilities of Generative Adversarial Networks. The image below presents some of the awe-inspiring images generated with this architecture!

As Deep Learning researchers, we want to understand how this model works and what recent advancements led to this amazing result. This blog post will cover how the StyleGAN model builds on NVIDIA’s original Progressively Growing GAN with the Adaptive Instance Normalization used in state-of-the-art Neural Style Transfer.

Before explaining the StyleGAN architecture, we will begin by understanding the Adaptive Instance Normalization (AdaIN) layer. This technique was presented by Huang et al. in their paper, “Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization”. The following equations are for Batch Normalization, Instance Normalization, and Adaptive Instance Normalization:
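In standard notation, with μ and σ denoting mean and standard deviation, the three operations can be written as:

```latex
\text{BN}(x) = \gamma \, \frac{x - \mu_{\text{batch}}(x)}{\sigma_{\text{batch}}(x)} + \beta
\qquad
\text{IN}(x) = \gamma \, \frac{x - \mu(x)}{\sigma(x)} + \beta
\qquad
\text{AdaIN}(x, y) = \sigma(y) \, \frac{x - \mu(x)}{\sigma(x)} + \mu(y)
```

For Batch Normalization the statistics are pooled over the whole batch, for Instance Normalization over a single instance, and AdaIN replaces the learned gamma and beta with statistics computed from a second input y.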

Batch Normalization is one of the central ideas in training Deep Neural Networks. This operation normalizes intermediate feature activations to zero mean and unit variance, and then re-scales and re-shifts them with learned gamma and beta parameters. The gamma parameter sets the standard deviation of the resulting activations, and the beta parameter sets their mean.

Batch normalization computes its normalization statistics, the mean and standard deviation, from batches of training samples. For example, you would typically train a Deep Network by giving it something like 32 or 64 images at a time, and batch normalization would normalize activations using the statistics of that entire batch. In contrast, instance normalization computes the mean and standard deviation from a single image. Hopefully this makes clear how Instance Normalization differs from Batch Normalization.
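As a concrete illustration (a minimal NumPy sketch with made-up activation shapes), the only difference between the two is the set of axes over which the statistics are computed:

```python
import numpy as np

# Toy activations: a batch of 4 images, 3 channels, 8x8 spatial (N, C, H, W).
x = np.random.randn(4, 3, 8, 8)

# Batch norm statistics: one mean/std per channel, pooled over the whole
# batch and all spatial positions (axes N, H, W).
bn_mean = x.mean(axis=(0, 2, 3))   # shape (3,)
bn_std = x.std(axis=(0, 2, 3))

# Instance norm statistics: one mean/std per channel *per image*, pooled
# over spatial positions only (axes H, W).
in_mean = x.mean(axis=(2, 3))      # shape (4, 3)
in_std = x.std(axis=(2, 3))

print(bn_mean.shape, in_mean.shape)  # (3,) (4, 3)
```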

Going further than this is the idea of Adaptive Instance Normalization. This is a popular mechanism from Neural Style Transfer. Neural Style Transfer in its original form combines content and style features through an iterative optimization process. This optimization process combines features through the use of a Gram Matrix. Adaptive Instance Normalization presents another way to combine features, via the gamma and beta normalization parameters. Exploring exactly how this works is a subject of future work. The Adaptive Instance Normalization equation is shown once again below:
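```latex
\text{AdaIN}(x, y) = \sigma(y) \, \frac{x - \mu(x)}{\sigma(x)} + \mu(y)
```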

In Style Transfer, the x and y pair of Adaptive Instance Normalization refers to the content and style images. In the StyleGAN model, y is derived from the latent input vector z. However, z is first passed through a series of fully-connected layers to derive a better representation, w. The motivation for this is presented later in this post. The image below depicts the entire StyleGAN architecture; we describe the individual components throughout the article:

The Style-Based Generator starts the upsampling convolution process from a learned constant value, meaning that each sample starts from the exact same tensor. Variation between images is introduced via the gamma and beta parameters of Adaptive Instance Normalization, controlled by the latent vector z (the y parameters of AdaIN are derived from its mapped representation w). To add further variation, per-pixel Gaussian noise is added to the intermediate features, broadcast across feature maps with learned per-channel scaling factors.
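A minimal NumPy sketch of this flow (shapes and style values are made up; in the real model the style mean/std come from learned affine transformations of w, and the per-channel noise scales are learned rather than random):

```python
import numpy as np

def adain(x, style_mean, style_std, eps=1e-5):
    """Replace each channel's per-image statistics with style-derived ones."""
    mu = x.mean(axis=(2, 3), keepdims=True)    # per-image, per-channel mean
    sigma = x.std(axis=(2, 3), keepdims=True)  # per-image, per-channel std
    normalized = (x - mu) / (sigma + eps)
    return style_std[:, :, None, None] * normalized + style_mean[:, :, None, None]

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8, 16, 16))   # intermediate features (N, C, H, W)
style_mean = rng.normal(size=(2, 8))  # stand-ins for the AdaIN y parameters
style_std = np.abs(rng.normal(size=(2, 8)))

# Single-channel Gaussian noise, added to every feature map with a
# (here random, normally learned) per-channel scale.
noise = rng.normal(size=(2, 1, 16, 16))
noise_scale = rng.normal(size=(1, 8, 1, 1))
out = adain(x + noise_scale * noise, style_mean, style_std)
print(out.shape)  # (2, 8, 16, 16)
```

After the AdaIN step, each channel of the output has (approximately) the style-derived mean and standard deviation, regardless of the input statistics.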

The mapping network used is a very intriguing detail of the StyleGAN. The network parameterizes the mapping from z to w, from which learned affine transformations produce the AdaIN gamma and beta parameters (also frequently referred to as gains and biases in the context of Batch Normalization). It is also interesting to see how deep this mapping network is, consisting of 8 Fully-Connected layers from z to w!
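Structurally, the mapping network is just a deep MLP. A sketch with untrained, randomly initialized weights (dimensions follow the paper’s 512-dimensional z and w; the initialization scale here is arbitrary):

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def mapping_network(z, weights, biases):
    """Map latent z to intermediate latent w through 8 FC layers."""
    h = z
    for W, b in zip(weights, biases):
        h = leaky_relu(h @ W + b)
    return h

dim = 512  # StyleGAN uses 512-dimensional z and w
rng = np.random.default_rng(0)
weights = [rng.normal(0.0, 0.02, size=(dim, dim)) for _ in range(8)]
biases = [np.zeros(dim) for _ in range(8)]

z = rng.normal(size=(1, dim))
w = mapping_network(z, weights, biases)
print(w.shape)  # (1, 512)
```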

Another interesting mechanism they use is style-mixing. This refers to using one latent code z1 for the styles up to a certain point of the synthesis network and then switching to another latent code z2 for the remaining layers. Looking at the diagram, this can be seen as using z1 to derive the first two AdaIN gain and bias parameters, and then using z2 to derive the last two.
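In code, style mixing amounts to swapping which intermediate latent feeds each synthesis block. A minimal sketch (the block count and crossover point are illustrative):

```python
def mix_styles(w1, w2, num_blocks, crossover):
    """Feed w1 to the first `crossover` style inputs and w2 to the rest."""
    return [w1 if i < crossover else w2 for i in range(num_blocks)]

# With 4 AdaIN style inputs, mixing after the first two:
styles = mix_styles("w1", "w2", num_blocks=4, crossover=2)
print(styles)  # ['w1', 'w1', 'w2', 'w2']
```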

The results of the StyleGAN model are not only impressive for their incredible image quality, but also for their control over the latent space. The following image depicts the amazing latent space interpolation this model is capable of:

In addition to the latent space interpolation capabilities, varying the injected noise allows for interesting variations in the resulting images as well. Note the subtle changes in small details with changes to the noise vector illustrated below:

The authors of StyleGAN are very determined to advance how we think of and measure diversity in generated images. They present two very interesting strategies for calculating the quality of latent space interpolation: Perceptual Path length and Linear separability.

Perceptual Path Length refers to the idea of using a pre-trained network to measure the perceptual distance between images. In a well-trained, smooth latent space, small changes to the latent code z should not result in dramatically different images. The distance is measured perceptually, meaning images are passed through a pre-trained network and the distance is taken between their intermediate activations, as opposed to some kind of Euclidean or Manhattan distance between images in pixel space. The following table depicts how the mechanisms added to the StyleGAN model improve this Perceptual Path Length metric:
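A simplified sketch of the idea (the real metric interpolates between pairs of latents and uses VGG-based perceptual features; here `generate` and `features` are toy stand-ins so the sketch runs, and the latent is perturbed directly rather than along an interpolation path):

```python
import numpy as np

def perceptual_path_length(generate, features, latents, eps=1e-4):
    """Average feature-space distance between images from nearby latents,
    scaled by 1/eps^2 as in the paper's definition."""
    dists = []
    for z in latents:
        a = features(generate(z))
        b = features(generate(z + eps))  # small perturbation of the latent
        dists.append(np.sum((a - b) ** 2) / eps ** 2)
    return float(np.mean(dists))

rng = np.random.default_rng(0)
G = rng.normal(size=(16, 8))             # toy linear "generator"
latents = [rng.normal(size=8) for _ in range(4)]
ppl = perceptual_path_length(lambda z: G @ z, lambda img: img, latents)
print(ppl >= 0.0)  # True
```

A smoother generator changes its output less under these small latent perturbations, so a lower score indicates a better-behaved latent space.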

# Concluding Thoughts from Henry AI Labs

The StyleGAN model is one of the most interesting Generative Adversarial Network architectures out there. We are very happy to provide an intuitive explanation of this, and hope that it motivates further exploration from the Deep Learning research community. We are looking at testing these components in a smaller network generating CIFAR-10 data. As discussed in another article on Henry AI Labs, CIFAR-10 data has very high variance compared to facial images. We are looking to combat this with the Class Splitting GAN model to focus on subsets of the dataset. We also highly recommend checking out the interpolation video of StyleGAN provided by the authors of this paper. Thanks for reading and checking out Henryailabs.com!

A Style-Based Generator Architecture for Generative Adversarial Networks. Tero Karras, Samuli Laine, Timo Aila