Generative Adversarial Networks (referred to as GANs throughout this survey) are one of the most interesting ideas in recent machine learning research, and the primary research focus of Henry AI Labs. This survey will introduce you to the timeline of GAN development, along with quick descriptions of some of the most interesting papers published on the subject.
The high-level idea behind GANs can be quickly described via the cop versus counterfeiter anecdote. Imagine a counterfeiter trying to produce money such that a cop cannot tell whether it is real or fake. Each round, the counterfeiter makes some money and the cop sends back a signal as to which money looks the most real. This continues until eventually the counterfeiter is able to fool the cop.
Already this anecdote helps shed light on some of the primary issues that plague GAN training. When the counterfeiter starts out, the fake money is very bad and the cop may not classify any of it as real. As a result, the counterfeiter never picks up any signal from the cop and cannot learn how to generate better money. This mirrors the GAN training problem in which the discriminator loss quickly converges to 0 and the generator stops receiving a useful gradient.
Another GAN problem that can be described with this anecdote is the issue of mode collapse. Mode collapse describes the phenomenon in which the counterfeiter produces the same fake money every time. Stepping a little outside of our analogy, we can imagine designing counterfeit money as searching a very high-dimensional space with many local maxima. The cop may be fooled by a certain kind of fake money, which causes the counterfeiter to get stuck in that local maximum and eventually produce the same fake money over and over again.
The ideal counterfeiter, trained through this process, should exhibit some variability in the fake money samples at each iteration. This is perhaps the biggest weakness of the analogy: the counterfeiter is only after one type of money that fools the cop, and there is no additional utility to the counterfeiter in producing diverse money so long as it fools the cop. Stepping back to Generative Adversarial Networks themselves, we frequently want them to learn a distribution of data that looks real, rather than a single real-looking data point.
The original paper from Goodfellow et al. establishes the GAN framework. It also presents the concept of a 'non-saturating' generator loss, which gives the generator a stronger gradient early in training, when the discriminator confidently rejects its samples.
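To make the non-saturating loss concrete, here is a minimal numpy sketch (not the paper's implementation; function names are ours) comparing the gradients of the original minimax generator loss and the non-saturating alternative when the discriminator assigns a fake sample a very low probability of being real:

```python
import numpy as np

def saturating_g_loss(d_fake):
    """Original minimax generator loss term: log(1 - D(G(z)))."""
    return np.log(1.0 - d_fake)

def non_saturating_g_loss(d_fake):
    """Non-saturating alternative: -log(D(G(z)))."""
    return -np.log(d_fake)

# Gradients with respect to d_fake = D(G(z)):
#   d/dx  log(1 - x) = -1 / (1 - x)  -> small magnitude when x is near 0
#   d/dx -log(x)     = -1 / x        -> large magnitude when x is near 0
d_fake = 0.01  # discriminator confidently rejects the fake sample
grad_saturating = -1.0 / (1.0 - d_fake)   # weak signal, roughly -1
grad_non_saturating = -1.0 / d_fake       # strong signal, roughly -100
```

Both losses push the generator in the same direction, but the non-saturating form delivers a much larger gradient exactly when the generator is losing, which is why it is the practical default.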
The DCGAN model presented by Radford et al. is the next logical step to take with your understanding of GANs. This model builds on the original GAN formulation most notably by adding upsampling convolutional layers in the generator, and additionally offers many heuristics, such as the use of Leaky ReLU activations, to help stabilize training.
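As a quick illustration of one of those heuristics, here is a minimal numpy sketch of the Leaky ReLU activation (the slope value 0.2 is the one used in the DCGAN discriminator):

```python
import numpy as np

def leaky_relu(x, alpha=0.2):
    """Leaky ReLU: passes a small slope (alpha) for negative inputs
    instead of zeroing them out, so gradients keep flowing through
    units that a plain ReLU would kill."""
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 1.0])
out = leaky_relu(x)  # negative inputs are scaled by alpha, positives pass through
```

The small negative slope matters in the discriminator in particular, since dead units there would starve the generator of gradient signal.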
Salimans et al. continue the discussion of the DCGAN model and offer more improvements to the internal architecture of this model. Most notably, they use tricks such as feature matching, virtual batch normalization, and one-sided label smoothing.
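One-sided label smoothing is the simplest of these tricks to show. A minimal sketch (function name is ours): the discriminator's targets for real images are softened from 1.0 to 0.9, while the targets for fake images are left at 0.0 (hence "one-sided"):

```python
import numpy as np

def smooth_real_labels(n, smooth=0.9):
    """One-sided label smoothing: real targets become `smooth` (e.g. 0.9)
    instead of 1.0, discouraging the discriminator from producing
    extremely confident outputs."""
    return np.full(n, smooth)

real_targets = smooth_real_labels(4)  # [0.9, 0.9, 0.9, 0.9]
fake_targets = np.zeros(4)            # fake targets stay at 0.0
```

Smoothing only the real side keeps the discriminator from being pushed toward extreme logits on real data without rewarding it for assigning probability mass to fake samples.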
Diverging from this discussion, Mirza presents the conditional GAN framework. This technique dramatically improved the quality of GAN outputs, as well as training stability. Mirza presents a fairly naive way of conditioning GANs, and the mechanisms by which GAN conditioning is implemented remain a frequent topic in subsequent GAN research.
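The naive conditioning scheme amounts to concatenating the class label onto the generator's input. A minimal numpy sketch (function name and dimensions are ours, for illustration):

```python
import numpy as np

def conditional_input(z, label, num_classes):
    """Naive conditional-GAN style conditioning: append a one-hot class
    label to the noise vector z before it enters the generator."""
    one_hot = np.zeros(num_classes)
    one_hot[label] = 1.0
    return np.concatenate([z, one_hot])

z = np.random.randn(100)  # latent noise vector
g_in = conditional_input(z, label=3, num_classes=10)
print(g_in.shape)  # (110,)
```

The discriminator receives the same label information alongside the image, so both networks learn a label-dependent mapping rather than a single unconditional distribution.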
Following the conditional GAN, the AC-GAN model builds on this by adding an additional task to the discriminator based on the auxiliary information. The discriminator is no longer tasked with classifying images as just real or fake, but also with classifying the label they belong to, as in a traditional image classifier. A related idea appears in the InfoGAN paper, which tasks the discriminator with recovering a structured portion of the latent code. Another interesting idea along these lines is the Multiple Generator model. In this model, there is a many-to-one mapping of generators to discriminators, and the discriminator predicts whether an image is real or, if fake, which generator produced it. This idea is intended to force diversity in the generated data.
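The AC-GAN discriminator objective can be sketched as two summed cross-entropy terms: a real/fake (source) term and an auxiliary classification term over the class labels. A minimal numpy sketch, with hypothetical names and hand-picked probabilities for illustration:

```python
import numpy as np

def ac_gan_d_loss(real_fake_prob, class_probs, is_real, class_label):
    """AC-GAN style discriminator objective: a real/fake source term
    plus an auxiliary class-prediction term.

    real_fake_prob: discriminator's probability that the image is real
    class_probs:    discriminator's softmax over class labels
    """
    source_loss = -np.log(real_fake_prob if is_real else 1.0 - real_fake_prob)
    class_loss = -np.log(class_probs[class_label])  # auxiliary classifier term
    return source_loss + class_loss

# A real image of class 2; the discriminator is fairly confident on both heads.
loss = ac_gan_d_loss(0.8, np.array([0.1, 0.1, 0.7, 0.1]), is_real=True, class_label=2)
```

Because the class term applies to both real and generated images, the generator is pushed to produce samples that are classifiable, not just plausible.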
Orthogonal to the discussion of the internal architecture of DCGANs or the design of conditioning in conditional GANs is the use of multi-scale architectures. This idea is most popularly demonstrated in the Progressive Growing of GANs paper. That paper produces high-resolution images of faces which continue to serve as a demonstration of the potential of GANs. Other papers that use this approach include the Laplacian Pyramid GAN and StackGAN. Multi-scale models refer to the general idea of breaking the task of generating high-resolution images into more tractable sub-problems: for example, first generating 8x8 images, then 16x16, and so on up to 1024x1024.
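The doubling schedule described above can be sketched in a few lines of Python (the function name and starting resolution follow the 8x8 example in the text, not any particular paper's configuration):

```python
def resolution_schedule(start=8, final=1024):
    """Multi-scale training schedule: double the output resolution at
    each stage until the final resolution is reached."""
    schedule = []
    res = start
    while res <= final:
        schedule.append(res)
        res *= 2
    return schedule

print(resolution_schedule())  # [8, 16, 32, 64, 128, 256, 512, 1024]
```

Each stage trains the networks at one resolution before layers for the next resolution are introduced, so the hard 1024x1024 problem is only ever attempted after the coarser structure has been learned.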