The quest to improve the performance of deep convolutional networks has been headlined by architectures that enable training deeper networks. The most famous of these are ResNets, which use skip connections to propagate information from earlier layers forward through the network. Traditional CNNs struggle to train very deep networks because the gradient vanishes as it is repeatedly backpropagated through the layers.
This article will present another architecture enabling the training of deep CNNs, known as DenseNet. The model is named for the dense connectivity pattern that distinguishes the architecture. The picture below is a great way to begin to understand the DenseNet CNN:
As shown in the picture above, all of the feature maps from the previous layers are used as input to the next layer. This is different from ResNets, where a skip connection adds the output of an earlier layer to that of a later layer through an identity mapping. In DenseNets, the feature maps are concatenated as they are, without any identity-mapping summation. To quantify this connectivity pattern: a traditional CNN with L layers has L connections, one between each consecutive pair of layers, whereas a Dense Block with L layers contains L(L+1) / 2 connections, because every layer receives the feature maps of all preceding layers as input.
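The connection count follows directly from summing the inputs each layer receives; a minimal sketch (the function name is my own, not from the paper):

```python
def densenet_connections(num_layers):
    """Connections in a Dense Block with `num_layers` layers.

    Layer l receives the outputs of all l preceding feature maps
    (counting the block input), so the total is 1 + 2 + ... + L.
    """
    return num_layers * (num_layers + 1) // 2

print(densenet_connections(5))  # 15 connections, vs. 5 in a plain 5-layer CNN
```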
The function below further summarizes the idea of DenseNet connectivity:
This function depicts that the input to the l-th layer is the concatenation of all of the preceding feature maps: x_l = H_l([x_0, x_1, ..., x_(l-1)]).
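This connectivity can be sketched in a few lines of NumPy. Here `H` is only a stand-in for the real composite function (in the paper, BN, ReLU, then a 3x3 convolution); the point is the concatenation pattern and the resulting channel growth:

```python
import numpy as np

def dense_block(x, num_layers=4, growth_rate=12):
    """Sketch of DenseNet connectivity on a channels-first tensor (c, h, w).

    Each layer consumes the concatenation of ALL earlier feature maps and
    contributes `growth_rate` new channels to the running collection.
    """
    features = [x]  # x_0, then x_1, x_2, ... appended as we go
    for _ in range(num_layers):
        inp = np.concatenate(features, axis=0)  # [x_0, ..., x_(l-1)]
        # Stand-in for H_l: any map producing growth_rate channels.
        new = np.repeat(inp.mean(axis=0, keepdims=True), growth_rate, axis=0)
        features.append(new)                    # x_l joins the state
    return np.concatenate(features, axis=0)

x0 = np.zeros((16, 32, 32))  # 16 input channels, 32x32 spatial
out = dense_block(x0)
print(out.shape)             # (64, 32, 32): 16 + 4 * 12 channels
```

Note how the output depth is the input depth plus `num_layers * growth_rate`; this is exactly why the growth rate hyperparameter controls how quickly the feature maps deepen.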
An intuition for why DenseNets are more parameter efficient
The authors of DenseNet present an interesting description of neural networks as being ‘algorithms with state’. Each sequential computational unit takes the state, modifies it in some way, and passes it on to later layers. At first it seems counter-intuitive that DenseNet’s additional connectivity would make it more parameter efficient. However, when a network is framed as a series of state-modifying functions, each layer of a traditional network must spend parameters learning to preserve the important characteristics of the state from layer to layer. This preservation takes up many of the parameters in a deep neural network. Because every DenseNet layer has direct access to all preceding feature maps, the model does not need to learn these preserving mappings and is thus more efficient.
Returning to the technical details of the DenseNet implementation: the convolutions within a Dense Block do not change the spatial dimensions of the feature maps. This is done to facilitate concatenating the previous feature maps into the input of the next layer. If the convolutions were to change the spatial dimensions, the resulting tensors of mismatched sizes could not be concatenated along the channel axis, and something like the identity mappings used in ResNets would be necessary to reconcile them.
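A quick NumPy illustration of why the spatial dimensions must be preserved: concatenation along the channel axis only works when height and width match exactly.

```python
import numpy as np

# Feature maps are concatenated along the channel axis (axis 0 here),
# so their spatial dimensions must match exactly.
a = np.zeros((16, 32, 32))  # 16 channels, 32x32
b = np.zeros((12, 32, 32))  # 12 new channels, same spatial size
ab = np.concatenate([a, b], axis=0)
print(ab.shape)             # (28, 32, 32)

# A layer that halved the resolution would break the pattern:
mismatched = np.zeros((12, 16, 16))
try:
    np.concatenate([a, mismatched], axis=0)
    spatial_mismatch_ok = True
except ValueError:
    spatial_mismatch_ok = False  # concatenation fails on mismatched h x w
```

This is why the 3x3 convolutions inside a Dense Block are padded so that the height and width of their outputs equal those of their inputs.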
Repeatedly concatenating all of the feature maps from the previous layers can result in an input with a very large depth. For this reason, the DenseNet model adds intermediate 1x1 convolutions designed to reduce the depth of the feature maps, pictured below:
The picture above shows the overall architecture of the DenseNet CNN model. A Dense Block consists of a series of convolutions with dense connections as has been described in this article. A dense block outputs a feature map with a very large depth (i.e. for a feature map h x w x c, c is very large). To reduce the depth of the feature map, a 1x1 convolution is performed. Spatial resolutions are then halved through a 2x2 average pooling operation.
The picture above gives a more complete view of exactly how the model in the original DenseNet paper is implemented.
The following picture shows how the DenseNet model outperforms the ResNet model on the CIFAR-10 classification task:
Aside from the DenseNet results, we can see an interesting performance increase in ResNets as the depth of the network grows from 110 layers to 1001 (6.41% error rate down to 4.62%). It is also interesting to observe that the ResNets perform dramatically better with data augmentation than without it, whereas the DenseNets do not suffer nearly as much of a degradation without augmentation. Finally, the most striking result in this chart: the DenseNet significantly outperforms the ResNet, achieving a much lower error rate of 3.6% with data augmentation. It is also worth noting that the DenseNet achieving this result does so with 15.3 million parameters.
Concluding Thoughts from Henry AI Labs
The DenseNet is a very interesting contribution in a line of work extending the CNN architecture. The intuitive argument framing neural networks as algorithms with a global state is compelling. At Henry AI Labs, we are looking into how DenseNet connectivity can be incorporated into GAN architectures, particularly the generator component, and how meta-learning algorithms could be extended to discover connectivity patterns such as this. Thank you for reading and checking out Henry AI Labs!