Henry AI Labs

Medical Image Augmentation

Generative Adversarial Networks (GANs) have many exciting applications, including image editing, text-to-image synthesis, and super-resolution. One of the most practical applications of GANs is to combat problems with limited datasets. Training Deep Neural Networks requires enormous amounts of data. Unfortunately, some application domains do not have access to big datasets because of the manual labor and expert labeling needed to build them. Medical imaging is a prime example of this.

This post will cover the experimental results from Frid-Adar et al. in their paper, “GAN-based Synthetic Medical Image Augmentation for increased CNN Performance in Liver Lesion Classification”. The article is organized as follows: First, we will describe the Liver Lesion data, then we will look at the GAN architectures tested (DCGAN versus ACGAN), and finally we will look at the performance results found and other interesting ideas presented in the paper.

This image shows where the Liver Lesion ROI images are derived from. Expert radiologists identify the regions that should be classified with the CNN model. These regions are divided into three groups for these experiments. These groups are depicted in the image below:

The plot above depicts the different lesion categories. It is interesting to inspect the dataset and try to gain an intuition for what kind of visual features you might expect to be useful. For example, the ‘Cysts’ images appear to be characterized by dense, dark pixels.

The image below shows how the CNN is trained to learn these visual features:

The Convolutional Neural Network architecture used for classification is shown above. They use a very small network compared to architectures such as ResNet, DenseNet, or VGG. Even with this network, which is fairly small by Deep Learning standards, they suffer dramatically from overfitting on their dataset.

Shifting our focus to the dataset and data augmentation, the first interesting detail is the exact size and distribution of their dataset. They experiment with 182 liver lesion images belonging to three classes, which contain 53, 64, and 65 images respectively. Because accuracy alone can be skewed by the class distribution, they also report sensitivity and specificity. The image below provides a quick reminder of the sensitivity and specificity metrics used to evaluate classifiers:
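The same definitions can be written out as a short sketch. This is a minimal one-vs-rest computation for a three-class problem, not the authors' evaluation code. The confusion matrix below is made up for illustration (only its row sums of 53, 64, and 65 come from the post), and the function name is hypothetical.

```python
def per_class_sensitivity_specificity(confusion):
    """confusion[i][j] = number of samples of true class i predicted as class j.

    Returns one (sensitivity, specificity) pair per class, treating each
    class in turn as the positive class and all others as negative.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    results = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                        # class k predicted as something else
        fp = sum(confusion[i][k] for i in range(n)) - tp   # other classes predicted as k
        tn = total - tp - fn - fp
        sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
        specificity = tn / (tn + fp) if (tn + fp) else 0.0
        results.append((sensitivity, specificity))
    return results


# Hypothetical confusion matrix; the values are invented for illustration.
# Row sums match the class sizes quoted in the post: 53, 64, 65.
example_confusion = [
    [50, 2, 1],
    [4, 52, 8],
    [3, 10, 52],
]

metrics = per_class_sensitivity_specificity(example_confusion)
for class_index, (sens, spec) in enumerate(metrics):
    print(f"class {class_index}: sensitivity={sens:.3f} specificity={spec:.3f}")
```

Sensitivity is the fraction of a class's images that are correctly recovered, and specificity is the fraction of the remaining images that are correctly rejected, so the two together say more than a single accuracy number.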

First, the researchers look into using classic augmentations to enhance their dataset. These include rotations, flips, translations, and scaling. They report producing 480 augmented images for each image in their original training dataset, a number they settled on by experimenting with different augmented dataset sizes.
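To make the idea concrete, here is a minimal sketch of classic augmentation on a tiny image stored as a nested list. It is not the paper's pipeline: rotations are restricted to multiples of 90 degrees, scaling is left out, the single one-pixel shift is an arbitrary choice, and all function names are hypothetical.

```python
def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def rotate_90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def translate(img, dx, dy, fill=0):
    """Shift by dx columns to the right and dy rows down, padding with `fill`."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = img[y][x]
    return out


def augment(img):
    """Combine 4 rotations x 2 flip states x 2 shift states into 16 variants."""
    variants = []
    rotated = img
    for _ in range(4):
        for candidate in (rotated, flip_horizontal(rotated)):
            variants.append(candidate)
            variants.append(translate(candidate, 1, 0))
        rotated = rotate_90(rotated)
    return variants


roi = [[1, 2],
       [3, 4]]  # stand-in for a lesion ROI
augmented = augment(roi)
print(len(augmented), "augmented images from one original")
```

Every variant keeps the label of the image it came from, which is how a small set of labeled ROIs can be multiplied into a much larger training set.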

The image below depicts how classical augmentations transform images, showing the results of rotation, flipping, translation, and scaling:

The use of classical augmentations takes their classification accuracy from 57% to 78.6%. To get the next performance boost, they turn to adding data generated from a Generative Adversarial Network.

They test two very popular GAN models, DCGAN and ACGAN. The DCGAN is the most straightforward model to understand, improving on the original GAN design by incorporating upsampling convolutional layers, along with other details such as Batch Normalization and Leaky ReLU activations with the slope set to 0.2. The ACGAN model is more complex to implement, but still fairly intuitive to understand. It builds on the conditional GAN idea, in which a class label is supplied as an auxiliary input. For example, the generator receives a random noise vector together with a one-hot encoded class label. The ACGAN takes the conditional input one step further, tasking the discriminator with classifying the label of the image in addition to the real versus fake task. The image below describes these models:
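The pieces described above can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the architecture from the paper: the noise dimension, the function names, and the example probabilities are assumptions, and a real implementation would use a deep learning framework with convolutional layers.

```python
import math
import random

NUM_CLASSES = 3   # three lesion classes, as in the post
NOISE_DIM = 8     # hypothetical size, chosen only for illustration


def leaky_relu(x, slope=0.2):
    """Leaky ReLU with the 0.2 negative slope mentioned in the text."""
    return x if x > 0 else slope * x


def one_hot(label, num_classes=NUM_CLASSES):
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def generator_input(label, rng, noise_dim=NOISE_DIM):
    """ACGAN-style generator input: a noise vector concatenated with a one-hot label."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return noise + one_hot(label)


def acgan_discriminator_loss(p_real, class_probs, is_real, true_label):
    """Sum of the two discriminator objectives for a single image.

    p_real      -- predicted probability that the image is real
    class_probs -- predicted probability for each class
    """
    eps = 1e-12  # avoids log(0)
    source_loss = -math.log((p_real if is_real else 1.0 - p_real) + eps)
    class_loss = -math.log(class_probs[true_label] + eps)
    return source_loss + class_loss


rng = random.Random(0)
z = generator_input(label=1, rng=rng)
loss = acgan_discriminator_loss(
    p_real=0.9, class_probs=[0.1, 0.8, 0.1], is_real=True, true_label=1
)
print(len(z), round(loss, 4))
```

The second term in the loss is what separates the ACGAN from a plain DCGAN: the discriminator is penalized for misclassifying the lesion type as well as for confusing real and generated images.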

Each DCGAN is trained on images from a single class only. For example, the training set for one DCGAN would consist solely of ‘cyst’ images. Frid-Adar et al. find much better results using the DCGAN-generated data. This is shown in the plot below:
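Preparing the data for this per-class setup amounts to grouping the labeled images by class before any GAN training starts. The sketch below assumes the dataset is a list of (image, label) pairs; the image names, the labels other than ‘cyst’, and the function name are placeholders.

```python
from collections import defaultdict


def split_by_class(samples):
    """Group (image, label) pairs into one image list per class label."""
    by_class = defaultdict(list)
    for image, label in samples:
        by_class[label].append(image)
    return dict(by_class)


# Placeholder dataset: strings stand in for image arrays, and "class_b" and
# "class_c" are hypothetical names for the two other lesion classes.
samples = [
    ("img_a", "cyst"),
    ("img_b", "class_b"),
    ("img_c", "cyst"),
    ("img_d", "class_c"),
]

per_class = split_by_class(samples)
for label, images in per_class.items():
    # One unconditional DCGAN would be trained on each of these lists, and
    # every image it generates would inherit that list's label.
    print(label, len(images))
```

Because each generator only ever sees one class, its outputs can be labeled automatically and added straight to the classifier's training set.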

The third method, ACGAN discriminator, refers to using the discriminator as a classifier. The ACGAN is trained by having the discriminator label the images as real or fake, as well as classifying the label of the image. It is interesting to see the discriminator separated and repurposed this way. Unfortunately, it doesn’t outperform the more traditional method of training a CNN from scratch with all the data. However, it is a promising idea.

In conclusion, adding the DCGAN data improves the classifier from 78.6% to 85.7%. The authors also present an interesting t-SNE visualization to show how adding GAN-generated data strengthens the separability of the three Liver Lesion classes.

This is a very interesting contribution for studies like this. The plot on the left shows the data before adding GAN-generated samples. There is a substantial overlap between the blue and green classes which seems to be dramatically alleviated with the plot on the right. I would be additionally interested in seeing this plot where the GAN-samples are colored differently from the original data. This could be an interesting marker of the distribution learned by the generator and how it manifests itself into a classifier’s feature space.

They also present how expert radiologists perform on identifying GAN-generated data from real data:

The experts are still able to beat random guessing at 62.5% and 58.6%. However, with all the research and advancements in GANs, it seems unlikely that this trend will continue.

To wrap up this post, the following image provides a great high-level overview of the experiment we have just covered:

Concluding thoughts from Henry AI Labs

With our interest in Generative Adversarial Networks, naturally we are looking at all the ways they can be applied to real problems. This is a very interesting example of how GANs can be used to enhance datasets. These enhanced datasets can be used to train the next generation of high-performance Deep Learning models, since Deep Networks are data-hungry and will usually perform better when given access to larger datasets. Learning with limited data is generally a very interesting topic in Deep Learning, and it is interesting to see how GANs can help combat this problem. Thanks for reading and checking out Henryailabs.com!

GAN-based Synthetic Medical Image Augmentation for increased CNN Performance in Liver Lesion Classification. Maayan Frid-Adar, Idit Diamant, Eyal Klang, Michal Amitai, Jacob Goldberger, Hayit Greenspan
