Super-Resolution is one of the most interesting applications of Deep Learning. Super-resolution refers to the problem of taking a low-resolution image, for example 64 x 64 pixels, and upsampling it to a higher resolution such as 256 x 256. The upsampling is done in such a way that it adds plausible detail to the image rather than simply enlarging the existing pixels.
One application of this is in Graphics Engines. Graphics Engines are currently constrained by the large amount of data that virtual environments require. Super-resolution offers the opportunity to store smaller images in memory and then use a model to infer the corresponding high-resolution objects. A 64 x 64 image, for instance, holds 16 times fewer pixels than its 256 x 256 counterpart. Thus, devices such as mobile phones or virtual reality headsets would have to store much less data to recreate the virtual environments.
Naively, this is done through interpolation. The simplest form is nearest neighbor interpolation, where each high-resolution pixel copies the value of the closest low-resolution pixel. More elaborate interpolation schemes slide an n x m kernel over the image and infer the high-resolution neighbors from the local statistics inside the kernel. There are many ideas in between these techniques and the use of super-resolution CNNs and GANs, but they are outside the scope of this survey.
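To make the naive baseline concrete, here is a minimal sketch of nearest neighbor upsampling in NumPy. The function name `nearest_neighbor_upsample` and the toy 2 x 2 input are illustrative assumptions, and only integer scale factors are handled.

```python
import numpy as np

def nearest_neighbor_upsample(image, scale):
    """Upsample a 2D image by an integer factor by repeating each pixel."""
    # Repeat every row `scale` times, then every column `scale` times,
    # so each input pixel becomes a scale x scale block in the output.
    return np.repeat(np.repeat(image, scale, axis=0), scale, axis=1)

low_res = np.array([[1, 2],
                    [3, 4]])
high_res = nearest_neighbor_upsample(low_res, scale=2)
print(high_res)
```

Each input pixel simply becomes a 2 x 2 block of identical values. The image gets larger, but no new detail is added, which is exactly the limitation that learned super-resolution models try to overcome.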
In their paper, Learning a Deep Convolutional Network for Image Super-Resolution, Dong et al. show how a Convolutional Neural Network can be trained to learn a mapping from low to high resolution. This is done by breaking images up into smaller patches (16 x 16 crops, for example) and learning the low to high resolution mapping on these patches. The mapping is computed through a series of convolutions, and training follows many of the same principles as regular image classification models. The model is trained by minimizing the pixel-wise loss between the ground truth high-resolution patch and the predicted patch.
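The patch-based training signal described above can be sketched roughly as follows. This is not the authors' code: the helper names `extract_patches` and `pixel_wise_loss` are hypothetical, the patches are non-overlapping for simplicity, mean squared error is assumed as the pixel-wise loss, and the "prediction" is a fake model output (the ground truth shifted by a constant) used only to exercise the loss.

```python
import numpy as np

def extract_patches(image, patch_size):
    """Split a 2D image into non-overlapping square patches."""
    height, width = image.shape
    patches = []
    for top in range(0, height - patch_size + 1, patch_size):
        for left in range(0, width - patch_size + 1, patch_size):
            patches.append(image[top:top + patch_size, left:left + patch_size])
    return np.stack(patches)

def pixel_wise_loss(predicted, target):
    """Mean squared error over all pixels (assumed pixel-wise loss)."""
    return float(np.mean((predicted - target) ** 2))

rng = np.random.default_rng(0)
ground_truth = rng.random((64, 64))          # stand-in high-resolution image
patches = extract_patches(ground_truth, 16)  # 16 patches of 16 x 16 pixels

# Fake model output: every pixel is off by 0.1, so the loss is about 0.1 ** 2.
prediction = patches + 0.1
loss = pixel_wise_loss(prediction, patches)
print(patches.shape, loss)
```

In a real training loop the prediction would come from the convolutional network applied to the low-resolution version of each patch, and the loss would be backpropagated to update the network's filters.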
This idea worked fairly well, but results improved dramatically with two additional loss terms: feature and adversarial losses. The idea of feature losses was largely popularized by the paper Perceptual Losses for Real-Time Style Transfer and Super-Resolution, by Johnson et al. The idea is to pass both the predicted and the ground truth high-resolution patches through a pre-trained neural network and compute the loss between their intermediate features rather than between their raw pixels. This is called the ‘perceptual loss’ because it is based on the learned representations of an advanced image classification model.
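The structure of a feature loss can be illustrated with a small sketch. A real implementation would take intermediate activations from a pre-trained image classification network. To keep the example self-contained, the hypothetical `stand_in_features` function below uses simple horizontal and vertical pixel differences instead, which is an assumption made purely for illustration and not the method from the paper.

```python
import numpy as np

def stand_in_features(patch):
    """Hypothetical stand-in for a pre-trained network's intermediate features.

    Returns horizontal and vertical pixel differences (simple edge responses).
    """
    horizontal = patch[:, 1:] - patch[:, :-1]
    vertical = patch[1:, :] - patch[:-1, :]
    return horizontal, vertical

def feature_loss(predicted, target, feature_fn=stand_in_features):
    """Mean squared error between feature maps instead of raw pixels."""
    predicted_maps = feature_fn(predicted)
    target_maps = feature_fn(target)
    per_map = [np.mean((p - t) ** 2) for p, t in zip(predicted_maps, target_maps)]
    return float(np.mean(per_map))

rng = np.random.default_rng(0)
target = rng.random((16, 16))

# A uniform brightness shift changes every pixel but leaves the edges intact.
brighter = target + 0.2
# Independent noise changes the edge structure itself.
noisy = target + rng.normal(0.0, 0.2, size=target.shape)

loss_brighter = feature_loss(brighter, target)
loss_noisy = feature_loss(noisy, target)
print(loss_brighter, loss_noisy)
```

With these stand-in features, the brightness-shifted patch scores a loss of essentially zero even though every pixel differs from the target, while the noisy patch is penalized. Features from a trained network capture far richer structure, but the principle is the same: the loss compares what the features respond to, not the raw pixel values.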
In addition to perceptual / feature losses, the adversarial loss used in Generative Adversarial Networks had a massive impact on the development of Super-Resolution models. The paper, Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network, from Ledig et al. is one of the best demonstrations of this idea. Intuitively, GANs are well suited to the task of image super-resolution. GANs solve the problem of generating data that is indistinguishable from the training data. In super-resolution, this means that the resulting patch should be indistinguishable from real high-resolution patches. This helps with a difficulty common to many Deep Learning problems and especially evident in super-resolution: there is more than one correct answer. For example, there are many different ways to upsample an image patch to a high-resolution counterpart. The pixel-wise loss function completely fails to account for this.
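A small numeric sketch shows why the pixel-wise loss struggles when several answers are valid. The two candidate patches below are hypothetical: they represent the same sharp edge at two slightly different positions, both assumed to be equally plausible reconstructions of one low-resolution input. Mean squared error is again assumed as the pixel-wise loss.

```python
import numpy as np

def pixel_wise_loss(predicted, target):
    """Mean squared error over all pixels (assumed pixel-wise loss)."""
    return float(np.mean((predicted - target) ** 2))

# Two hypothetical, equally plausible high-resolution patches:
# a sharp vertical edge starting at column 2 or at column 1.
candidate_a = np.zeros((4, 4))
candidate_a[:, 2:] = 1.0
candidate_b = np.zeros((4, 4))
candidate_b[:, 1:] = 1.0

# The pixel-wise average of the two: a blurred edge matching neither candidate.
blurry_average = (candidate_a + candidate_b) / 2.0

def expected_loss(prediction):
    """Average pixel-wise loss when either candidate may be the ground truth."""
    return 0.5 * (pixel_wise_loss(prediction, candidate_a)
                  + pixel_wise_loss(prediction, candidate_b))

loss_sharp_a = expected_loss(candidate_a)
loss_sharp_b = expected_loss(candidate_b)
loss_blurry = expected_loss(blurry_average)
print(loss_sharp_a, loss_sharp_b, loss_blurry)  # 0.125 0.125 0.0625
```

The blurry average achieves the lowest expected pixel-wise loss even though it is the only one of the three predictions that is not a sharp, realistic patch. An adversarial loss pushes in the opposite direction, since a discriminator trained on real high-resolution patches can learn to reject such blurry outputs.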