Henry AI Labs

Computer Vision

Computer Vision is one of the primary applications of Deep Learning. Recall that the primary idea of Deep Learning is to use Deep Neural Networks to learn representations from raw data. Image data is one of the best examples of ‘raw data’. Images are typically stored as height x width x 3 arrays: one height x width matrix for each of the red, green, and blue (RGB) color channels. Higher-resolution images have larger height and width values, and lower-resolution images have smaller ones.
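The sketch below makes the height x width x 3 layout concrete. It stores an RGB image as nested Python lists indexed as image[row][column][channel]. The helper names (make_image, shape, num_values) are illustrative. Pure Python keeps the example dependency-free, although real pipelines typically use array libraries.

```python
def make_image(height, width, fill=(0, 0, 0)):
    """Create a height x width x 3 image where every pixel has the same RGB value."""
    # list(fill) builds a separate list per pixel, so pixels can be edited independently
    return [[list(fill) for _ in range(width)] for _ in range(height)]

def shape(image):
    """Return (height, width, channels) for a nested-list image."""
    return (len(image), len(image[0]), len(image[0][0]))

def num_values(image):
    """Total number of stored values: height * width * channels."""
    h, w, c = shape(image)
    return h * w * c

low_res = make_image(32, 32)
high_res = make_image(256, 256)
low_res[0][0] = [255, 0, 0]  # set the top-left pixel to pure red

print(shape(low_res), num_values(low_res))    # (32, 32, 3) 3072
print(shape(high_res), num_values(high_res))  # (256, 256, 3) 196608
```

Doubling both the height and the width quadruples the number of stored values. This is why resolution has such a direct effect on memory use.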

Image data is ubiquitous in software applications such as the vision systems for robotic navigation and self-driving cars, processing photos uploaded to Facebook and other social media sites, images of products on Amazon, and many more. Much of the information in our world can be represented visually, which makes computer vision, the task of teaching algorithms to interpret the visual world, very interesting.

Deep Learning excels at processing image data compared to traditional Machine Learning approaches. Traditional approaches relied on hand-engineered features describing images, such as color histograms and optical flow. Deep Learning instead takes raw image data as input and learns to convert it into features that can be used for tasks such as classification and regression. Broadly put, Deep Learning applications in Computer Vision fall into three categories: image recognition, object localization, and semantic segmentation. There are other applications of Computer Vision as well, such as super-resolution and image editing, which will be discussed in this survey.
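For contrast with learned features, here is a minimal sketch of one of the hand-engineered features mentioned above: a per-channel color histogram. The function name and the default of 4 bins per channel are illustrative assumptions. The point is that the feature is a fixed recipe written by a person rather than something learned from data.

```python
def color_histogram(image, bins=4):
    """Per-channel histogram of 8-bit RGB values, normalized by pixel count.

    image: nested lists indexed [row][column][channel], values in 0..255.
    Returns a flat feature vector of length 3 * bins; each channel's bins sum to 1.
    """
    hist = [[0] * bins for _ in range(3)]
    num_pixels = 0
    for row in image:
        for pixel in row:
            num_pixels += 1
            for channel, value in enumerate(pixel):
                # Map 0..255 onto bin indices 0..bins-1
                hist[channel][value * bins // 256] += 1
    # Normalize so images of different sizes give comparable features
    return [count / num_pixels for channel_hist in hist for count in channel_hist]

# A 2 x 2 image: top row pure red, bottom row pure blue
image = [
    [[255, 0, 0], [255, 0, 0]],
    [[0, 0, 255], [0, 0, 255]],
]
features = color_histogram(image)
print(len(features))  # 12 (3 channels x 4 bins)
print(features)       # [0.5, 0.0, 0.0, 0.5, 1.0, 0.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.5]
```

The histogram discards all spatial layout. Two images with very different content but similar color distributions therefore produce nearly identical features. This is one limitation that learned representations avoid.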

Image classification is the best way to get started with Deep Learning in Computer Vision. It is the task of learning representations from raw image data and then using these representations to assign each image one of a set of labels, for example classifying an image as a cat or a dog.
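The last step of a classifier can be sketched as follows, assuming the network has already produced one raw score (logit) per label. A softmax turns the scores into probabilities, and the predicted label is the most probable one. The logit values below are made up for illustration and are not output from a real model.

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(logits, labels):
    """Return the most probable label and the full probability distribution."""
    probs = softmax(logits)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs

labels = ["cat", "dog"]
logits = [2.0, 0.5]  # hypothetical scores from a trained network
label, probs = classify(logits, labels)
print(label)                         # cat
print([round(p, 3) for p in probs])  # [0.818, 0.182]
```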

For image classification, two of the most popular datasets for benchmarking new ideas are CIFAR-10 and ImageNet. CIFAR-10 consists of 50,000 training and 10,000 testing images belonging to 10 categories such as horse, bird, or car. One issue with CIFAR-10 is that the images are very small (32 x 32 x 3). This small size is practically useful, since the images easily fit into memory and allow training with larger batch sizes. However, it is worth asking whether this low resolution sacrifices some of the information in the image. One interesting area of future work that we are looking at with Henry AI Labs is using super-resolution techniques on these images to see if higher-resolution images are easier to classify.
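A quick back-of-the-envelope calculation shows why CIFAR-10 fits in memory so easily. This sketch assumes raw 8-bit pixels (1 byte per value), or 32-bit floats after normalization, and it ignores storage overhead. The 128 x 128 case is a hypothetical 4x super-resolved version of the training set, not an existing dataset.

```python
def dataset_bytes(num_images, height, width, channels, bytes_per_value=1):
    """Raw memory needed to hold a dataset of equally sized images.

    bytes_per_value=1 models 8-bit (uint8) pixels; 4 models float32 values.
    Ignores container and file-format overhead.
    """
    return num_images * height * width * channels * bytes_per_value

cifar_uint8 = dataset_bytes(50_000, 32, 32, 3)
cifar_float32 = dataset_bytes(50_000, 32, 32, 3, bytes_per_value=4)
# Hypothetical: the same training set super-resolved 4x to 128 x 128
upscaled_uint8 = dataset_bytes(50_000, 128, 128, 3)

print(cifar_uint8 / 1e6)     # 153.6 (MB)
print(cifar_float32 / 1e6)   # 614.4 (MB)
print(upscaled_uint8 / 1e6)  # 2457.6 (MB)
```

Upsampling by 4x in each spatial dimension multiplies the memory footprint by 16. This is worth keeping in mind when weighing the super-resolution idea against the convenience of small images.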

On the other hand, ImageNet is a dataset containing about 1.2 million images belonging to 1,000 different classes. One practical problem when working with ImageNet is deciding how to resize the images, since they do not all have the same dimensions.
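One common convention for handling ImageNet's varying sizes works in two steps. First, resize each image so its shorter side is 256 pixels, preserving the aspect ratio. Then take a 224 x 224 center crop. The sketch below only computes the target dimensions and the crop box, and the actual pixel resampling would be done by an image library. The 256/224 values are an assumption: they are a widely used default, not the only choice. The box uses the same (left, upper, right, lower) order that Pillow's Image.crop expects.

```python
def resize_shorter_side(width, height, target=256):
    """Scale (width, height) so the shorter side equals `target`, keeping aspect ratio."""
    scale = target / min(width, height)
    return round(width * scale), round(height * scale)

def center_crop_box(width, height, crop=224):
    """Return (left, top, right, bottom) of a crop x crop box centered in the image."""
    if crop > width or crop > height:
        raise ValueError("crop size is larger than the image")
    left = (width - crop) // 2
    top = (height - crop) // 2
    return left, top, left + crop, top + crop

# A hypothetical 500 x 375 (width x height) photo
w, h = resize_shorter_side(500, 375)
print((w, h))                 # (341, 256)
print(center_crop_box(w, h))  # (58, 16, 282, 240)
```

Another option is to stretch every image directly to a fixed size. That is simpler, but it distorts the aspect ratio of objects in the image.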

There are many anecdotes of image classifiers being useful on real-world problems. One very famous example is the story of a farmer using an image classifier built with TensorFlow to identify different types of cucumbers. One way we have used image classifiers at Henry AI Labs is to build a basketball highlight reel generator. This generator filters video clips based on the presence of a rim in the image: the classifier labels each image as either 'rim' or 'no rim', and those labels determine how the videos are edited.
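The paragraph above does not specify how the classifier's outputs become clips. One plausible approach is sketched below, assuming a hypothetical list of per-frame 'rim' probabilities. Consecutive frames at or above a threshold are grouped into segments, and very short runs are dropped as likely noise. The function name, the 0.5 threshold, and the 3-frame minimum are illustrative choices, not details of our generator.

```python
def rim_segments(rim_probs, threshold=0.5, min_length=3):
    """Group consecutive frames classified as 'rim' into (start, end) frame ranges.

    rim_probs: hypothetical per-frame probability of the 'rim' class.
    Returns half-open ranges [start, end) that are at least `min_length` frames long.
    """
    segments = []
    start = None
    for i, p in enumerate(rim_probs):
        if p >= threshold and start is None:
            start = i  # a run of 'rim' frames begins
        elif p < threshold and start is not None:
            if i - start >= min_length:
                segments.append((start, i))
            start = None
    # Close a run that lasts until the final frame
    if start is not None and len(rim_probs) - start >= min_length:
        segments.append((start, len(rim_probs)))
    return segments

probs = [0.1, 0.9, 0.8, 0.95, 0.2, 0.7, 0.1, 0.6, 0.9, 0.99, 0.8]
print(rim_segments(probs))  # [(1, 4), (7, 11)]
```

The single 'rim' frame at index 5 is discarded because it is shorter than min_length. This keeps isolated misclassifications out of the highlight reel.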

Building your first image classifier will help you get familiar with the basic syntax of building a convolutional neural network. The next steps toward improving classification accuracy likely involve either data augmentation or a more advanced model such as a ResNet, DenseNet, or Inception Network.
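The core operation inside a convolutional layer can be sketched in pure Python for a single channel and a single filter. A horizontal flip is included as a simple example of data augmentation. This is a simplified illustration. The kernel here is hand-picked, whereas a CNN learns its kernel values during training. Framework layers such as PyTorch's torch.nn.Conv2d or Keras's Conv2D also handle multiple channels, padding, and strides.

```python
def conv2d(image, kernel):
    """Naive single-channel 2D convolution with no padding and stride 1.

    Like most deep learning libraries, this computes cross-correlation
    (the kernel is not flipped). Output size is (H - kH + 1) x (W - kW + 1).
    """
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(h - kh + 1):
        row = []
        for j in range(w - kw + 1):
            # Weighted sum of the kH x kW patch under the kernel
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out

def horizontal_flip(image):
    """A simple data augmentation: mirror each row left to right."""
    return [list(reversed(row)) for row in image]

image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
edge_kernel = [[-1, 1]]  # responds to left-to-right increases in intensity
print(conv2d(image, edge_kernel))                   # [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
print(conv2d(horizontal_flip(image), edge_kernel))  # [[0, -1, 0], [0, -1, 0], [0, -1, 0]]
```

Flipping the input changes the filter's response, so the flipped image is a genuinely different training example. This is what makes flips useful as augmentation for classes whose label does not depend on left-right orientation.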

Interesting Papers in Computer Vision

AlexNet is an essential paper to read for understanding the hype around Deep Convolutional Networks and their application to image classification. Other interesting papers include:
R-CNNs
Fully Convolutional Networks for Semantic Segmentation
ResNet
DenseNet
Inception Network
