Time for a showdown! Let's assemble the layers, bring in our model solvers, and train the CNN we implemented from scratch on the ever-popular MNIST dataset to see how well we can do.
Thanks to active research, we are equipped with far more optimization algorithms than just vanilla Gradient Descent. Let's discuss two further refinements of Gradient Descent: Momentum and Adaptive Learning Rates.
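To make the update rules concrete before we get there, here is a minimal NumPy sketch of a momentum step and an RMSProp-style adaptive step. The function names, the `config` dictionary, and the toy quadratic objective are illustrative assumptions, not the solver interface we will actually build.

```python
import numpy as np

def sgd_momentum(w, dw, config):
    """One momentum update: keep a running velocity and step along it."""
    v = config.get('velocity', np.zeros_like(w))
    v = config['momentum'] * v - config['learning_rate'] * dw  # accumulate velocity
    w = w + v                                                  # move along the velocity
    config['velocity'] = v
    return w, config

def rmsprop(w, dw, config):
    """Adaptive learning rate: scale the step by a running RMS of past gradients."""
    cache = config.get('cache', np.zeros_like(w))
    cache = config['decay_rate'] * cache + (1 - config['decay_rate']) * dw ** 2
    w = w - config['learning_rate'] * dw / (np.sqrt(cache) + config['eps'])
    config['cache'] = cache
    return w, config

# Toy usage: minimize f(w) = 0.5 * ||w||^2, whose gradient is simply w.
w = np.array([5.0, -3.0])
cfg = {'learning_rate': 0.1, 'momentum': 0.9}
for _ in range(100):
    w, cfg = sgd_momentum(w, w, cfg)   # dw = w for this toy objective
print(w)                               # close to [0, 0]
```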
Let's dig a little deeper into how we convert the output of our CNN into probabilities (Softmax) and the loss measure that guides our optimization (Cross Entropy).
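As a preview, here is a minimal NumPy sketch of a numerically stable Softmax combined with Cross Entropy, returning both the loss and its gradient with respect to the raw scores. The function name and array shapes are assumptions for illustration.

```python
import numpy as np

def softmax_cross_entropy(scores, y):
    """Softmax + cross-entropy loss and gradient.

    scores: (N, C) raw class scores, y: (N,) integer labels.
    """
    # Shift by the row-wise max so exp() never overflows.
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    probs = exp_scores / exp_scores.sum(axis=1, keepdims=True)

    N = scores.shape[0]
    loss = -np.log(probs[np.arange(N), y]).mean()   # average negative log-likelihood

    dscores = probs.copy()
    dscores[np.arange(N), y] -= 1                   # gradient: p - one_hot(y)
    dscores /= N
    return loss, dscores

# Quick check on random scores for 4 samples and 3 classes.
scores = np.random.randn(4, 3)
y = np.array([0, 2, 1, 2])
loss, dscores = softmax_cross_entropy(scores, y)
print(loss, dscores.shape)
```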
Overfitting has always been the enemy of generalization. Dropout is a simple yet very effective way to regularize networks by reducing co-adaptation between neurons. Discussion and implementation follow.
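For a taste of how little code this takes, here is a minimal sketch of inverted dropout in NumPy: activations are randomly zeroed at train time and pre-scaled by the keep probability so that test time is a no-op. The function names and the keep probability `p` are illustrative assumptions.

```python
import numpy as np

def dropout_forward(x, p=0.5, train=True):
    """Inverted dropout: randomly zero activations during training.

    p is the probability of keeping a unit; dividing by p at train time
    means no rescaling is needed at test time.
    """
    if train:
        mask = (np.random.rand(*x.shape) < p) / p   # keep-mask, pre-scaled
        out = x * mask
    else:
        mask = None
        out = x                                     # test time: identity
    return out, mask

def dropout_backward(dout, mask):
    """Backprop is just the same mask applied to the upstream gradient."""
    return dout * mask

x = np.random.randn(2, 5)
out, mask = dropout_forward(x, p=0.5, train=True)
dx = dropout_backward(np.ones_like(out), mask)
print(out, dx, sep="\n")
```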
Batch Normalization is a recent technique that relaxes how carefully we need to initialize the network, permits higher learning rates, and lets us train very deep networks. Very promising! Let's derive the math for the forward and backward passes step by step by hand and implement the BatchNorm layer!
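As a preview of where the derivation lands, here is a minimal NumPy sketch of a training-mode BatchNorm forward pass and the simplified closed-form backward pass. The running averages needed at test time are omitted, and the names `gamma`, `beta`, and `cache` are assumptions for illustration.

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then scale and shift."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)    # zero mean, unit variance per feature
    out = gamma * x_hat + beta               # learnable scale and shift
    cache = (x_hat, gamma, var, eps)
    return out, cache

def batchnorm_backward(dout, cache):
    """Backward pass using the simplified closed-form gradient."""
    x_hat, gamma, var, eps = cache
    N = dout.shape[0]
    dbeta = dout.sum(axis=0)
    dgamma = (dout * x_hat).sum(axis=0)
    # The chain-rule terms collapse into a single expression for dx.
    dx = (gamma / (N * np.sqrt(var + eps))) * (N * dout - dbeta - x_hat * dgamma)
    return dx, dgamma, dbeta

x = np.random.randn(8, 4)
gamma, beta = np.ones(4), np.zeros(4)
out, cache = batchnorm_forward(x, gamma, beta)
dx, dgamma, dbeta = batchnorm_backward(np.random.randn(8, 4), cache)
print(out.mean(axis=0), out.std(axis=0))     # roughly 0 and 1 per feature
```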
Pooling layers are an important building block of CNNs. They summarize the activation maps and, by shrinking the spatial dimensions, keep the computation and the number of downstream parameters low. Time to implement Maxpool!
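Here is a minimal NumPy sketch of a maxpool forward pass over strided windows, assuming an input of shape `(N, C, H, W)` that divides evenly into pooling windows; the function name and defaults are illustrative, not the layer API we will build.

```python
import numpy as np

def maxpool_forward(x, pool=2, stride=2):
    """Max pooling: keep only the strongest activation in each window."""
    N, C, H, W = x.shape
    H_out = (H - pool) // stride + 1
    W_out = (W - pool) // stride + 1
    out = np.zeros((N, C, H_out, W_out))
    for i in range(H_out):
        for j in range(W_out):
            window = x[:, :, i * stride:i * stride + pool,
                             j * stride:j * stride + pool]
            out[:, :, i, j] = window.max(axis=(2, 3))   # max over the spatial window
    return out

x = np.arange(16, dtype=float).reshape(1, 1, 4, 4)
print(maxpool_forward(x))    # 2x2 map: [[5, 7], [13, 15]]
```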
What makes a CNN special is, of course, the convolution layers. Inspired by how the visual cortex in animals works, these layers extract features independently of where they occur in the image. Let's derive the math and implement our own Conv layer!
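As a preview, here is a naive, loop-based NumPy sketch of the convolution forward pass (technically cross-correlation, as in most deep learning code), assuming inputs of shape `(N, C, H, W)` and filters of shape `(F, C, HH, WW)`. The names and defaults are illustrative assumptions, and a real implementation would vectorize these loops.

```python
import numpy as np

def conv_forward(x, w, b, stride=1, pad=1):
    """Slide each filter over the padded input and take dot products.

    x: (N, C, H, W) input, w: (F, C, HH, WW) filters, b: (F,) biases.
    """
    N, C, H, W = x.shape
    F, _, HH, WW = w.shape
    x_pad = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)))
    H_out = (H + 2 * pad - HH) // stride + 1
    W_out = (W + 2 * pad - WW) // stride + 1
    out = np.zeros((N, F, H_out, W_out))
    for n in range(N):
        for f in range(F):
            for i in range(H_out):
                for j in range(W_out):
                    window = x_pad[n, :, i * stride:i * stride + HH,
                                          j * stride:j * stride + WW]
                    # The same filter is applied at every location: shared weights.
                    out[n, f, i, j] = np.sum(window * w[f]) + b[f]
    return out

x = np.random.randn(2, 3, 8, 8)      # 2 RGB images of size 8x8
w = np.random.randn(4, 3, 3, 3)      # 4 filters of size 3x3
b = np.zeros(4)
print(conv_forward(x, w, b).shape)   # (2, 4, 8, 8) with stride 1, pad 1
```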
Convolutional Neural Networks revolutionized Computer Vision, beat the World Champion at Go, and made deep learning happen. Let's examine the core ideas behind these amazing CNNs: Local Receptive Fields, Shared Weights, Pooling, and ReLU.