The Splitting GAN model is one of the most underrated contributions to GAN research! Conditional GANs set the scene on fire by incorporating additional class labels as input to the generator and discriminator. This was shown to greatly stabilize training and improve the overall quality of generated images. It suggests that some kind of solution to GAN instability exists in the data space: incorporating class labels explicitly defines separations in the decision space. The question many works such as AC-GAN, cGANs with a projection discriminator, and text-to-image GANs seek to answer is: how can we get more out of conditional information?
The Splitting GAN model explores a very interesting idea for extending class labeling, and it works in both the unsupervised and supervised settings. The fundamental idea is to cluster images based on the intermediate activations taken just before the classification layer of the discriminator.
To explain this further: a discriminator is structured as a typical Convolutional Neural Network. It progressively downsamples the image into low-resolution feature maps, often flattening these features into a vector and passing it through Dense / Fully-Connected layers. The final classification layer has a single logit, the probability prediction of whether the image is real or fake. The layer just before it, however, is a vector of activations of some dimension, say 128 x 1. These penultimate activations are the ones taken and used to form the clusters in this study.
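To make the shapes concrete, here is a minimal sketch of this idea in numpy. The architecture, weight shapes, and activation function are all hypothetical stand-ins for a real convolutional discriminator; the point is only that the penultimate layer yields a 128-dimensional feature vector alongside the single real/fake logit.

```python
import numpy as np

rng = np.random.default_rng(0)

def discriminator_features(image, w_feat, w_out):
    """Toy discriminator forward pass: flatten the image, project it
    to a 128-d penultimate feature vector, then to a single logit.
    (A real discriminator would use conv layers; this is a sketch.)"""
    x = image.reshape(-1)            # flatten H x W x C
    features = np.tanh(x @ w_feat)   # 128-d penultimate activations
    logit = features @ w_out         # single real/fake logit
    return features, logit

# hypothetical 32x32x3 image and random weights
image = rng.standard_normal((32, 32, 3))
w_feat = rng.standard_normal((32 * 32 * 3, 128)) * 0.01
w_out = rng.standard_normal(128) * 0.1

features, logit = discriminator_features(image, w_feat, w_out)
print(features.shape)  # (128,)
```

It is this `features` vector, not the final logit, that the Splitting GAN collects for each image.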
The image below further depicts this idea:
The intermediate activations from the discriminator are taken after 20 epochs of training. These activations are then clustered to form new class labels. The image below shows how, in the supervised setting, CIFAR-10 images from one class such as ‘horse’ are further divided into two sub-classes:
In the image above, images on the right and left represent sub-classes found by clustering the discriminator’s intermediate representations.
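The clustering step can be sketched as follows. The paper's exact clustering procedure aside, a simple k-means over the 128-d penultimate activations illustrates how one class splits into two sub-classes; the activations here are synthetic (two shifted Gaussian modes standing in for, say, two visual modes of ‘horse’).

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(points, k, iters=20):
    """Minimal k-means: assign each activation vector to the nearest
    of k centroids, then recompute centroids as cluster means."""
    centroids = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None] - centroids[None], axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    return labels

# hypothetical 128-d discriminator activations for one class,
# drawn from two well-separated modes (100 images each)
acts = np.concatenate([
    rng.standard_normal((100, 128)) + 2.0,
    rng.standard_normal((100, 128)) - 2.0,
])
sub_labels = kmeans(acts, k=2)
print(np.bincount(sub_labels))
```

Each resulting sub-label then replaces the original class label as the conditioning signal for the next phase of training.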
The researchers in the study implement this using the AC-GAN conditional extension, together with the improved Wasserstein-GAN (WGAN-GP) loss function and ResNet layers in the model architecture. The images below show the Inception score results on CIFAR-10 in the supervised and unsupervised settings, as well as on STL-10 images downsampled to 48 x 48.
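For intuition on the AC-GAN side of this setup, the sketch below computes a toy per-sample discriminator objective: an adversarial real/fake term plus an auxiliary class cross-entropy over the (sub-)class labels. Note the hedges: the paper pairs the auxiliary term with a WGAN-GP adversarial loss, whereas this sketch uses a plain BCE term to stay self-contained, and the 20 class logits (10 CIFAR-10 classes split in two) and logit values are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def acgan_d_loss(rf_logit, class_logits, is_real, label):
    """Toy per-sample AC-GAN discriminator objective:
    BCE real/fake term + auxiliary class cross-entropy.
    (The paper uses a WGAN-GP adversarial term instead of BCE.)"""
    p_real = sigmoid(rf_logit)
    adv = -np.log(p_real) if is_real else -np.log(1.0 - p_real)
    aux = -np.log(softmax(class_logits)[label])
    return adv + aux

# hypothetical logits for a real image of sub-class 3, with
# 20 sub-classes and an uninformative (uniform) class head
loss = acgan_d_loss(rf_logit=1.2, class_logits=np.zeros(20),
                    is_real=True, label=3)
print(round(loss, 3))  # 3.259
```

Splitting a class only changes the auxiliary term: the class head simply grows to cover the new sub-labels.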
Concluding thoughts from Henry AI Labs
We are looking at ways to improve GAN training, and conditional GANs are one of the most interesting phenomena in GAN training. The Splitting GAN model presents a very interesting extension to this idea. We are additionally interested in how this is implemented: for example, rather than taking the intermediate activations from the discriminator during training, the clusters could be formed by an auxiliary classification model or some kind of auto-encoder. This is a foundational idea with many opportunities to be extended.

Class-Splitting Generative Adversarial Networks. Guillermo L. Grinblat, Lucas C. Uzal, Pablo M. Granitto