## Implementing Convolutional Neural Networks

A series of posts to understand the concepts and mathematics behind Convolutional Neural Networks and to implement your own CNN in Python and NumPy. The complete source code can be found here: https://github.com/parasdahal/deepnet.

## Introduction to Convolutional Neural Networks

Convolutional Neural Networks revolutionized Computer Vision, helped beat a world champion at Go and drove much of the deep learning boom. Let's examine the core ideas behind these amazing CNNs: Local Receptive Fields, Shared Weights, Pooling and ReLU.
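
Of these four ideas, ReLU is the simplest to show in code. Here is a minimal NumPy sketch; the names `relu_forward` and `relu_backward` are hypothetical (not taken from deepnet), and it assumes the common convention that the gradient at exactly zero is 0:

```python
import numpy as np

def relu_forward(x):
    """ReLU: pass positive values through unchanged, zero out the rest."""
    return np.maximum(0.0, x)

def relu_backward(dout, x):
    # The local gradient is 1 where the input was positive and 0 elsewhere
    # (including x == 0, by the convention assumed here).
    return dout * (x > 0)

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu_forward(x))
print(relu_backward(np.ones_like(x), x))
```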

## Convolution Layer - The core idea behind CNNs

What makes CNNs special is, of course, the Convolution Layers. Inspired by how the visual cortex in animals works, these layers extract features regardless of where they occur in the image. Let's derive the math and implement our own Conv Layer!
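
To make local receptive fields and shared weights concrete, here is a minimal NumPy sketch of a single-channel convolution with no padding (strictly a cross-correlation, which is what most CNN code computes). The function `conv2d_naive` and its arguments are hypothetical illustrations, not the deepnet API; a practical layer would vectorize the loops and handle batches and multiple channels:

```python
import numpy as np

def conv2d_naive(image, kernel, stride=1):
    """Valid (no padding) 2D cross-correlation of one (H, W) image with one filter."""
    H, W = image.shape
    kH, kW = kernel.shape
    out_h = (H - kH) // stride + 1
    out_w = (W - kW) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Each output looks only at a small local patch (its receptive field),
            # and the same kernel weights are reused at every position (shared weights).
            patch = image[i * stride:i * stride + kH, j * stride:j * stride + kW]
            out[i, j] = np.sum(patch * kernel)
    return out

# A tiny vertical-edge detector applied to an image whose right side is bright.
img = np.zeros((5, 5))
img[:, 2:] = 1.0
k = np.array([[-1.0, 1.0]])
print(conv2d_naive(img, k))
```

The edge response shows up in the same column of every row because one set of weights is applied everywhere, so the feature is detected wherever it occurs.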

## Maxpool Layer - Summarizing the output of Convolution Layer

Pooling layers are an important building block of CNNs. They summarize the activation maps and shrink their spatial size, which keeps the number of parameters and the amount of computation in later layers low. Time to implement Maxpool!
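
A minimal NumPy sketch of 2x2 max pooling with stride 2 on a single activation map, assuming even height and width; `maxpool2x2` and `maxpool2x2_backward` are hypothetical names. The forward pass keeps the largest value in each block, and the backward pass routes each upstream gradient back to the position that produced that maximum:

```python
import numpy as np

def maxpool2x2(x):
    """Non-overlapping 2x2 max pooling (stride 2) over a single (H, W) map."""
    H, W = x.shape
    assert H % 2 == 0 and W % 2 == 0, "this sketch assumes even height and width"
    # View the map as a grid of 2x2 blocks, then keep the largest value in each block.
    blocks = x.reshape(H // 2, 2, W // 2, 2)
    return blocks.max(axis=(1, 3))

def maxpool2x2_backward(dout, x):
    """Send each upstream gradient to the max position(s) of its 2x2 block."""
    H, W = x.shape
    blocks = x.reshape(H // 2, 2, W // 2, 2)
    # Mask of the maximum in each block; ties all receive the gradient in this sketch.
    mask = blocks == blocks.max(axis=(1, 3), keepdims=True)
    dx = mask * dout[:, None, :, None]
    return dx.reshape(H, W)

x = np.arange(16, dtype=float).reshape(4, 4)
print(maxpool2x2(x))
print(maxpool2x2_backward(np.ones((2, 2)), x))
```

Pooling itself has no learnable parameters; the savings come from halving each spatial dimension of the input that later layers see.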

## BatchNorm Layer - Understanding and reducing Internal Covariate Shift

Batch Normalization is a relatively new technique that makes the network less sensitive to weight initialization, allows higher learning rates and lets us train very deep networks. Very promising! Let's derive the math for the forward and backward passes step by step by hand and implement the BatchNorm layer!
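
As a reference point for that derivation, here is a minimal NumPy sketch of the training-mode forward and backward passes for a 2D input of shape (N, D). The function names and cache layout are assumptions; the backward pass uses one common compact form of the gradient rather than a step-by-step computational graph, and the running statistics needed at test time are omitted:

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    """Training-mode BatchNorm for x of shape (N, D)."""
    mu = x.mean(axis=0)                    # per-feature mean over the mini-batch
    var = x.var(axis=0)                    # per-feature (biased) variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalize to zero mean, unit variance
    out = gamma * x_hat + beta             # learnable scale and shift
    cache = (x_hat, var, gamma, eps)       # values reused by the backward pass
    return out, cache

def batchnorm_backward(dout, cache):
    """Gradients w.r.t. x, gamma and beta, given the upstream gradient dout."""
    x_hat, var, gamma, eps = cache
    N = dout.shape[0]
    dbeta = dout.sum(axis=0)
    dgamma = (dout * x_hat).sum(axis=0)
    dx_hat = dout * gamma
    inv_std = 1.0 / np.sqrt(var + eps)
    # Chain rule through x_hat, the batch mean and the batch variance, collapsed.
    dx = inv_std / N * (N * dx_hat - dx_hat.sum(axis=0)
                        - x_hat * (dx_hat * x_hat).sum(axis=0))
    return dx, dgamma, dbeta

x = np.random.default_rng(0).normal(loc=3.0, scale=2.0, size=(8, 4))
out, cache = batchnorm_forward(x, np.ones(4), np.zeros(4))
print(out.mean(axis=0), out.std(axis=0))
```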

## Dropout Layer - The unconventional regularization technique

Overfitting has always been the enemy of generalization. Dropout is a very simple yet very effective way to regularize networks by reducing co-adaptation between neurons. More discussion and the implementation follow.
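
A minimal NumPy sketch of inverted dropout, one common variant that rescales the kept units during training so the layer is simply the identity at test time (the original formulation instead scales the weights at test time). `dropout_forward`, `dropout_backward` and `p_drop` (the probability of dropping a unit) are hypothetical names:

```python
import numpy as np

def dropout_forward(x, p_drop=0.5, train=True, rng=None):
    """Inverted dropout: zero each unit with probability p_drop, rescale the survivors."""
    if not train or p_drop == 0.0:
        return x, None  # at test time the layer passes inputs through unchanged
    rng = np.random.default_rng() if rng is None else rng
    # Dividing by the keep probability preserves the expected activation,
    # so no rescaling is needed at test time.
    mask = (rng.random(x.shape) >= p_drop) / (1.0 - p_drop)
    return x * mask, mask

def dropout_backward(dout, mask):
    # Gradients flow only through the kept units, with the same scaling.
    return dout if mask is None else dout * mask

x = np.ones((2, 6))
out, mask = dropout_forward(x, p_drop=0.5, rng=np.random.default_rng(0))
print(out)
```

Because a different random subset of units is dropped on every forward pass, no neuron can rely on specific partners being present.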

## Classification and Loss Evaluation - Softmax and Cross Entropy Loss

Let's dig a little deeper into how we convert the output of our CNN into probabilities (Softmax) and into the loss measure that guides our optimization (Cross Entropy).
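
A minimal NumPy sketch of a numerically stable softmax and the mean cross-entropy loss over a mini-batch, assuming integer class labels. The function names are hypothetical; the gradient uses the well-known simplification that, for softmax followed by cross-entropy, the derivative with respect to the scores is the predicted probabilities minus the one-hot targets:

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax for scores of shape (N, C)."""
    # Subtracting the row max avoids overflow in exp without changing the result.
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(scores, labels):
    """Mean cross-entropy loss and its gradient w.r.t. the scores."""
    N = scores.shape[0]
    probs = softmax(scores)
    # Negative log-probability of the correct class, averaged over the batch.
    loss = -np.mean(np.log(probs[np.arange(N), labels]))
    # d(loss)/d(scores) = (probs - one_hot(labels)) / N
    grad = probs.copy()
    grad[np.arange(N), labels] -= 1.0
    grad /= N
    return loss, grad

scores = np.array([[2.0, 1.0, 0.1]])
loss, grad = cross_entropy(scores, np.array([0]))
print(softmax(scores), loss, grad)
```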

## Solving the model - SGD, Momentum and Adaptive Learning Rate

Thanks to active research, we have many more optimization algorithms at our disposal than just vanilla Gradient Descent. Let's discuss two refinements of Gradient Descent: Momentum and Adaptive Learning Rate.
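
A minimal NumPy sketch of both update rules on a toy quadratic. For the adaptive learning rate it uses RMSProp, one common choice that the post may or may not be using; the function names and hyperparameter values (`lr`, `mu`, `decay`) are assumptions for illustration:

```python
import numpy as np

def sgd_momentum_step(w, grad, velocity, lr=0.01, mu=0.9):
    """Momentum: keep a decaying running direction and step along it."""
    velocity = mu * velocity - lr * grad
    return w + velocity, velocity

def rmsprop_step(w, grad, cache, lr=0.01, decay=0.9, eps=1e-8):
    """RMSProp: scale each parameter's step by a running RMS of its gradients."""
    cache = decay * cache + (1 - decay) * grad ** 2
    return w - lr * grad / (np.sqrt(cache) + eps), cache

# Minimize f(w) = 0.5 * ||w||^2, whose gradient is simply w.
w_m = np.array([5.0, -3.0]); v = np.zeros_like(w_m)
w_r = np.array([5.0, -3.0]); c = np.zeros_like(w_r)
for _ in range(500):
    w_m, v = sgd_momentum_step(w_m, w_m, v, lr=0.05)
    w_r, c = rmsprop_step(w_r, w_r, c, lr=0.05)
print(w_m, w_r)
```

With a fixed `lr`, RMSProp's normalized steps keep it hovering in a small band around the minimum rather than settling exactly, while momentum on this well-conditioned problem converges to essentially zero.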

## Putting it all together and Classifying MNIST dataset

Time for the showdown! Let's assemble the layers, bring in our model solvers, train the CNN we implemented from scratch on the ever-popular MNIST dataset and see how well we can do.