Building a Convolutional Neural Network from Scratch for MNIST Digit Recognition
A deep dive into constructing a CNN from first principles for handwritten digit recognition using the MNIST dataset, with educational code and step-by-step explanations.
25/04/2024
Introduction
In the world of machine learning, convolutional neural networks (CNNs) have revolutionized how we approach image classification tasks. This project, cnn-from-scratch, is a hands-on implementation that guides you through building a CNN from the ground up, specifically for recognizing handwritten digits using the MNIST dataset. The goal is to provide an educational experience that demystifies the inner workings of CNNs and neural networks in general.
Overview
This project, cnn-from-scratch, is an educational implementation of a Convolutional Neural Network (CNN) designed to recognize handwritten digits from images in the MNIST dataset. The repository demonstrates how to build a neural network from the ground up, focusing on understanding and constructing the essential building blocks without relying on high-level deep learning frameworks.
- Main Language: Jupyter Notebook (Python)
- License: MIT
- Topics: Convolutional Neural Network, Digit Recognition, MNIST, Machine Learning, Deep Learning
What is MNIST?
MNIST is a well-known dataset containing 60,000 training and 10,000 testing grayscale images of handwritten digits (0-9), each sized 28x28 pixels. It is commonly used for benchmarking image classification algorithms and neural networks. More information about MNIST can be found on Wikipedia.
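To make the dataset's shape concrete, here is a small sketch using a synthetic array standing in for a batch of MNIST images (the actual project loads the real data via keras.datasets; the preprocessing shown is a common convention, not necessarily the notebook's exact code):

```python
import numpy as np

# Synthetic stand-in for a batch of MNIST images: pixel values in [0, 255],
# shape (batch, 28, 28), with integer labels in 0-9.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, 28, 28)).astype(np.float64)
labels = np.array([3, 1, 4, 1])

# Typical preprocessing: scale pixels to [0, 1] and one-hot encode the labels.
images /= 255.0
one_hot = np.eye(10)[labels]

print(images.shape)   # (4, 28, 28)
print(one_hot.shape)  # (4, 10)
```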
Project Goal
The primary goal is to build a simple artificial neural network that can predict the digit shown in each image from the MNIST dataset. This is achieved by constructing key neural network components manually, which helps users understand the inner workings of neural networks.
How It Works
Importing Libraries
The project utilizes the following Python libraries:
- `numpy` for linear algebra computations
- `matplotlib` for visualizing images, loss, and accuracy
- `dill` for saving and loading the model state
- `tqdm` for progress bars
- `keras.datasets` for loading the MNIST dataset
Model Structure
Base Layer Class
A core element is the `BaseLayer` class, which acts as the parent for all neural network layers. It includes:
- `forward` method: handles the forward pass, computing the output for a given input.
- `backpropagation` method: handles the backward pass, computing gradients and updating weights as needed.

This modular approach allows extending the network with various types of layers (convolutional, activation, pooling, etc.) by inheriting from `BaseLayer`.
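A minimal sketch of this layer hierarchy might look as follows. The method names (`forward`, `backpropagation`) follow the article; the exact signatures and the `Dense` example layer are assumptions for illustration, not the notebook's code:

```python
import numpy as np

class BaseLayer:
    """Parent class for all layers: each subclass implements a forward
    pass and a backward pass that returns the gradient w.r.t. its input."""
    def forward(self, x):
        raise NotImplementedError

    def backpropagation(self, output_gradient, learning_rate):
        raise NotImplementedError

class Dense(BaseLayer):
    """A fully connected layer inheriting from BaseLayer."""
    def __init__(self, n_in, n_out):
        self.weights = np.random.randn(n_out, n_in) * 0.01
        self.bias = np.zeros((n_out, 1))

    def forward(self, x):
        self.x = x                      # cache input for the backward pass
        return self.weights @ x + self.bias

    def backpropagation(self, output_gradient, learning_rate):
        # Gradients w.r.t. parameters and input
        weights_gradient = output_gradient @ self.x.T
        input_gradient = self.weights.T @ output_gradient
        # In-place gradient-descent update
        self.weights -= learning_rate * weights_gradient
        self.bias -= learning_rate * output_gradient
        return input_gradient
```

Because every layer exposes the same two methods, a network is just a list of layers: the forward pass chains `forward` calls, and training chains `backpropagation` calls in reverse order.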
Saving and Loading Models
The model's state can be saved and loaded using `dill`, making it easy to persist training progress and resume later.
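The save/load pattern is a short sketch. `dill` shares `pickle`'s `dump`/`load` interface, so the example below uses the standard-library `pickle` to stay self-contained; swapping in `import dill as pickle` matches the project's choice (dill can serialize a wider range of objects, such as lambdas):

```python
import pickle  # the project uses dill, which shares pickle's dump/load API
import os
import tempfile

# Stand-in for the model state (the real state would hold layer weights, etc.)
model = {"weights": [1.0, 2.0], "epoch": 5}

path = os.path.join(tempfile.mkdtemp(), "checkpoint.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)          # persist training progress

with open(path, "rb") as f:
    restored = pickle.load(f)      # resume later

print(restored == model)  # True
```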
Training and Evaluation
- The model is trained on the MNIST dataset, learning to map pixel values to digit classes (0-9).
- The notebook visualizes training progress, including loss and accuracy curves, to help users understand model performance.
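The train-and-track pattern behind those curves can be sketched in a few lines. This is not the notebook's training code; it is a minimal softmax-classifier loop on synthetic MNIST-shaped data, recording loss and accuracy per epoch the way the notebook records them for plotting:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny synthetic stand-in for flattened MNIST: 64 samples, 784 pixels, 10 classes
X = rng.normal(size=(64, 784))
y = rng.integers(0, 10, size=64)
Y = np.eye(10)[y]                     # one-hot targets

W = np.zeros((784, 10))
b = np.zeros(10)
losses, accuracies = [], []

for epoch in range(20):
    # Forward pass: linear scores -> softmax probabilities
    logits = X @ W + b
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs = exp / exp.sum(axis=1, keepdims=True)

    # Record cross-entropy loss and accuracy for the curves
    losses.append(-np.mean(np.sum(Y * np.log(probs + 1e-12), axis=1)))
    accuracies.append(np.mean(probs.argmax(axis=1) == y))

    # Backward pass: softmax + cross-entropy gives gradient probs - Y
    grad = (probs - Y) / len(X)
    W -= 0.05 * (X.T @ grad)
    b -= 0.05 * grad.sum(axis=0)

print(losses[0] > losses[-1])  # loss decreases over training: True
```

Plotting `losses` and `accuracies` against epoch (e.g. with `matplotlib.pyplot.plot`) yields the kind of training curves the notebook shows.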
Notable Features
- Didactic Approach: The code is written to be educational, with clear separation between different neural network components.
- Manual Implementation: Core neural network operations are implemented from scratch, providing insight into how deep learning works under the hood.
- Visualization: Plots of training loss and accuracy are included for better interpretability.
- Persistence: The ability to save and load models aids in experimentation and reproducibility.
Design and Code Structure
- The main code is contained within a Jupyter Notebook (`digit-recognition.ipynb`), making it easy to follow and experiment interactively.
- The project is organized with reusable classes for layers, facilitating extension and customization.
- The repository includes checkpoints and sample outputs for reference.
Conclusion
cnn-from-scratch is an excellent resource for anyone wanting to learn how convolutional neural networks operate at a low level. By walking through this project, you can gain a deeper appreciation of the mathematics and programming required to build modern image classifiers.