What are Data Augmentations?
Data Augmentations refer to a set of transformations that alter the training data such that invariances are baked into the dataset, rather than needing to be learned by a Neural Net. Image data is the easiest way to gain an intuition for Data Augmentations.
Say you want an image classifier to distinguish cats from dogs. If your training data only contains cats that are perfectly upright and centered in the frame, your classifier will be biased toward that orientation. Say the classifier sits in a pet feeder that identifies the type of pet before dispensing food. Sometimes the cat will approach the camera from a different angle, and we don’t want a slight translation or rotation to cause the classifier to accidentally serve the cat dog food.
Thus, to avoid biasing the classifier towards particular orientations, you would augment the training images of cats to include left and right translations, rotations, and changes in brightness.
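The idea above can be sketched in a few lines of NumPy. This is a minimal, illustrative pipeline (random translation, horizontal flip, and brightness scaling); real training code would typically use a library such as torchvision or albumentations, and would pad rather than wrap when translating.

```python
import numpy as np

def random_augment(img, rng):
    """Apply a random small translation, flip, and brightness change
    to a grayscale image with values in [0, 1]. Illustrative sketch only."""
    # Random horizontal translation of up to 2 pixels (wraps around;
    # a real pipeline would pad instead of wrapping).
    shift = rng.integers(-2, 3)
    img = np.roll(img, shift, axis=1)
    # Random horizontal flip with probability 0.5.
    if rng.random() < 0.5:
        img = img[:, ::-1]
    # Random brightness scaling in [0.8, 1.2], clipped back to [0, 1].
    img = np.clip(img * rng.uniform(0.8, 1.2), 0.0, 1.0)
    return img

rng = np.random.default_rng(0)
cat = rng.random((32, 32))          # stand-in for a 32x32 grayscale image
augmented = random_augment(cat, rng)
print(augmented.shape)              # shape is preserved: (32, 32)
```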
Data augmentation has been used since AlexNet set the Computer Vision world on fire in 2012 by using Convolutional Neural Nets for ImageNet classification.
One problem with implementing data augmentation is that augmentations that work well for one dataset may not translate to another. The simplest example is horizontal flipping on CIFAR-10 compared to MNIST. On CIFAR-10, horizontal flipping is label-preserving: a mirrored boat, horse, or cat is still a valid example of its class, so the flip encodes a useful mirror-symmetry invariance. In MNIST, however, a horizontally flipped digit is no longer a valid example of its label.
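A tiny example makes the dataset dependence concrete: flipping an asymmetric glyph produces a genuinely different image, so the flipped sample no longer matches its original label.

```python
import numpy as np

# A tiny asymmetric "digit". Horizontally flipping it produces a
# different image, so the flip is not label-preserving here.
digit = np.array([[1, 1, 0],
                  [0, 1, 0],
                  [1, 1, 0]])
flipped = digit[:, ::-1]
print(np.array_equal(digit, flipped))  # False
```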
Thus, it is an important research challenge to develop a way of searching for data augmentations that can be used with any dataset.
Meta-learning generally refers to the practice of using an auxiliary neural net, or another form of search algorithm, to find the hyperparameters of another neural network. In this case, a Recurrent Neural Network trained with Proximal Policy Optimization chooses the data augmentations used to train a Convolutional Neural Network on the task of Image Classification. The Recurrent Network is trained to choose the Data Augmentations that will result in the best Image Classifier.
The Meta-Learner searches through a discrete set of 16 augmentation functions, 10 magnitudes, and 11 probabilities. In the end, the Meta-Learner finds a policy consisting of 5 sub-policies. Each sub-policy contains two operations, and each operation specifies an augmentation function, the probability of applying it, and the magnitude to apply it with. During training, a sub-policy is sampled for each image, and each of its operations is applied with its associated probability.
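As a rough sketch, here is what sampling a policy from this discrete space looks like. The operation names follow the AutoAugment paper, but the sampling code itself is illustrative, not the authors' implementation (their controller samples these choices with an RNN rather than uniformly at random).

```python
import random

random.seed(0)

# The 16 image operations searched over in the AutoAugment paper.
OPS = ["ShearX", "ShearY", "TranslateX", "TranslateY", "Rotate",
       "AutoContrast", "Invert", "Equalize", "Solarize", "Posterize",
       "Contrast", "Color", "Brightness", "Sharpness", "Cutout",
       "SamplePairing"]
PROBS = [i / 10 for i in range(11)]   # 11 uniformly spaced probabilities
MAGNITUDES = list(range(10))          # 10 discrete magnitude bins

def sample_sub_policy():
    """One sub-policy = two (operation, probability, magnitude) triples."""
    return [(random.choice(OPS), random.choice(PROBS),
             random.choice(MAGNITUDES)) for _ in range(2)]

policy = [sample_sub_policy() for _ in range(5)]  # a policy = 5 sub-policies
print(len(policy), len(policy[0]))
```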
AutoAugment is a Meta Learning strategy presented by researchers at Google Brain. This paper is a continuation of a series of interesting Meta Learning Papers headlined by Barret Zoph and Quoc Le. These papers include Neural Architecture Search and Searching for Activation Functions.
Defining the Search Space
AutoAugment searches through a discrete search space defined as follows:
The algorithm will find 5 policies and each policy contains 5 sub-policies as depicted above.
Each sub-policy consists of 2 out of 16 potential image processing functions to be applied in sequence such as (ShearX, Invert, Solarize, …)
Each operation contains 2 additional parameters, a probability of applying the processing function, and a magnitude to apply the augmentation with.
There are 16 processing functions, 11 uniformly spaced probabilities, and 10 uniformly spaced magnitudes for each processing function. Thus the search space for one sub-policy is (16 × 10 × 11)² and the search space for the entire policy consisting of 5 sub-policies is (16 × 10 × 11)¹⁰.
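The arithmetic is easy to verify. Each operation is one of 16 × 10 × 11 = 1,760 (function, magnitude, probability) combinations, and a full policy makes ten such choices:

```python
# Checking the search-space arithmetic from the text.
per_operation = 16 * 10 * 11          # 1,760 choices per operation
per_sub_policy = per_operation ** 2   # two operations per sub-policy
full_policy = per_operation ** 10     # five sub-policies = ten operations
print(per_sub_policy)                 # 3097600
print(full_policy)                    # roughly 2.9 x 10^32
```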
Using RL to search for Augmentations
Naturally, a search space of size (16 × 10 × 11)¹⁰ makes finding a policy with exhaustive search completely infeasible. Thus, AutoAugment uses a Reinforcement Learning-based search to find an augmentation policy, following a Meta Learning strategy very similar to papers such as Neural Architecture Search and Searching for Activation Functions. A controller Recurrent Network predicts the augmentation policy through 30 softmax decisions (5 sub-policies × 2 operations × 3 parameters each), and the predicted policy is used to train a child network. In this case, the child network is a WideResNet-40-2 trained for 120 epochs on a reduced dataset. The authors note that, when time-constrained, it is better to train for more epochs on a reduced dataset than to use a larger dataset.
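The controller–child loop can be sketched schematically. In this hedged sketch both the RNN controller and the child-network training are replaced by stubs (the controller by uniform sampling, the 120-epoch WideResNet run by a toy reward), and the PPO update is reduced to best-so-far tracking, so it behaves as random search; it only shows the shape of the loop, not the authors' method.

```python
import random

random.seed(0)

def controller_sample():
    """Stub for the RNN controller: sample one (op, prob, magnitude)
    triple per decision, 10 operations = 5 sub-policies x 2 operations."""
    ops = ["ShearX", "Rotate", "Invert", "Solarize"]  # subset, for brevity
    return [(random.choice(ops),
             random.choice([i / 10 for i in range(11)]),
             random.randrange(10)) for _ in range(10)]

def train_child_and_evaluate(policy):
    """Stub for training the child WideResNet-40-2 and returning its
    validation accuracy. Toy reward, purely illustrative: prefers
    mid-range application probabilities."""
    return sum(1 - abs(p - 0.5) for _, p, _ in policy) / len(policy)

best_reward, best_policy = -1.0, None
for step in range(100):        # a real search samples thousands of policies
    policy = controller_sample()
    reward = train_child_and_evaluate(policy)
    # Here PPO would update the controller with `reward`; this sketch
    # just keeps the best policy seen so far.
    if reward > best_reward:
        best_reward, best_policy = reward, policy
print(round(best_reward, 3))
```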
Concluding thoughts from Henry AI Labs
AutoAugment is a very promising strategy for Meta-Learning data augmentations. As mentioned in other articles featured on Henry AI Labs such as “Searching for Activation Functions”, we are interested in seeing how search algorithms such as RL can be used to improve the design of Deep Neural Networks. All of these search algorithms, from Data Augmentations to Activation Functions, are limited by the constraints of the designed search space. In this example, the AutoAugment search can only choose between 16 augmentations and only has a discrete set of magnitudes and probabilities. At Henry AI Labs, we are trying to see how we can extend this algorithm to run with fewer constraints, while still running in a reasonable time. Thanks for reading and checking out Henry AI Labs!