Dive into the world of Convolutional Neural Networks (CNNs) by building your own image classifier using PyTorch. This blog explores the steps involved, referencing the tiesen243/cnn repository on GitHub.
Tags: CNN, PyTorch, Image Classification, Deep Learning
Introduction
Convolutional Neural Networks (CNNs) are a powerful type of deep learning architecture that excel at image recognition and classification tasks. In this blog, we'll embark on a hands-on journey to build a CNN model using PyTorch, a popular deep learning framework.
We'll be referencing the tiesen243/cnn repository on GitHub as a guide. This repository provides a basic framework for building a CNN model, allowing us to understand the core concepts without getting bogged down in complex details.
Getting Started
Prerequisites
Before we dive into building our image classifier, make sure you have the following prerequisites installed on your system:
Anaconda or Miniconda
Nvidia GPU (optional but recommended for faster training) with CUDA support. You can check if your GPU has CUDA support by running nvidia-smi in the terminal.
Setting Up the Environment
To get started, we'll create a new conda environment with packages from the environment.yml file provided in the repository. Run the following commands in your terminal:
```bash
conda env create -f environment.yml
conda activate ml
```
This will set up a new conda environment named ml with all the necessary packages for our project.
Dataset Preparation
For this project, we'll be using the MNIST dataset, a popular dataset of handwritten digits available through PyTorch's torchvision package. The dataset consists of 60,000 training images and 10,000 test images.
We can load the MNIST dataset using PyTorch's torchvision library and create data loaders for training and testing as follows:
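The repository's loader code isn't reproduced here, so the following is a minimal sketch using the standard torchvision API, assuming a batch size of 64 and the same (0.5, 0.5) normalization the web UI uses later in this post:

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms
from tqdm import tqdm

# Use the GPU if CUDA is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Convert images to tensors and normalize them
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,)),
])

train_data = datasets.MNIST(root="data", train=True, download=True, transform=transform)
test_data = datasets.MNIST(root="data", train=False, download=True, transform=transform)

train_loader = DataLoader(train_data, batch_size=64, shuffle=True)
test_loader = DataLoader(test_data, batch_size=64, shuffle=False)
```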
Now that we have our dataset ready, let's build our CNN model. We'll define a simple CNN architecture with two convolutional layers followed by two fully connected layers. Here's the code snippet for building the model:
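Here are the constructor and helper methods (the complete class appears under "Putting It All Together" below); the summary helper is assumed to come from the torchsummary package:

```python
from torchsummary import summary  # assumed source of the summary helper

class CNN(nn.Module):
    def __init__(self, layers: list[nn.Module]):
        super().__init__()
        self.history = []
        self.layers = nn.ModuleList(layers)
        self.to(device)

    def add(self, layer: nn.Module):
        self.layers.append(layer)

    def summary(self, input_shape, batch_size):
        summary(self, input_shape, batch_size)
```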
In this __init__ method, we define the layers of our CNN model and move the model to the GPU if CUDA is available.
history stores the training history: a (train loss, validation loss, validation accuracy) tuple per epoch.
layers is an nn.ModuleList holding the model's layers, initialized from the list passed to the constructor.
The add method lets us append layers to the model dynamically.
Finally, the summary method provides a summary of the model architecture. For example, we can call model.summary((1, 28, 28), 64) to get a summary of the model with input shape (1, 28, 28) and batch size 64.
```python
def forward(self, x):
    for layer in self.layers:
        x = layer(x)
    return x
```
In this method, we pass the input x through each layer of the model sequentially to get the final output. This is the core functionality of our CNN model.
Define the configuration for the CNN model:
```python
class CNN(nn.Module):
    def config(self, loss: nn.Module, optimizer: optim.Optimizer):
        if not isinstance(loss, nn.Module):
            raise TypeError("loss must be a torch.nn.Module instance")
        if not isinstance(optimizer, optim.Optimizer):
            raise TypeError("optimizer must be a torch.optim.Optimizer instance")
        self.criterion = loss
        self.optimizer = optimizer
```
This method allows us to configure the loss function and optimizer for our model. We can set the loss function and optimizer using this method before training the model.
loss is the loss function used for training the model (e.g., CrossEntropyLoss, MSELoss). For classification tasks, we typically use nn.CrossEntropyLoss.
optimizer is the optimization algorithm used to update the model parameters during training (e.g., SGD, Adam). For example: model.config(nn.CrossEntropyLoss(), optim.Adam(model.parameters(), lr=0.001)).
Define the training loop:
```python
class CNN(nn.Module):
    def fit(self, train_loader: DataLoader, epochs: int = 10, verbose: bool = True):
        for epoch in range(epochs):
            self.train()

            # Split the train set into train and validation set
            train_set, val_set = random_split(train_loader.dataset, [50000, 10000])
            train_set = DataLoader(train_set, batch_size=64, shuffle=True)
            val_set = DataLoader(val_set, batch_size=64, shuffle=True)

            loss_list = []
            for images, labels in tqdm(train_set, desc=f"Epoch {epoch+1}/{epochs}"):
                images, labels = images.to(device), labels.to(device)

                self.optimizer.zero_grad()
                outputs = self(images)
                loss = self.criterion(outputs, labels)
                loss_list.append(loss.item())
                loss.backward()
                self.optimizer.step()

            with torch.no_grad():
                self.eval()
                total = 0
                accuracy = 0
                val_loss = []
                for images, labels in val_set:
                    images, labels = images.to(device), labels.to(device)
                    outputs = self(images)

                    total += labels.size(0)
                    predicted = torch.argmax(outputs, dim=1)
                    accuracy += (predicted == labels).sum().item()
                    val_loss.append(self.criterion(outputs, labels).item())

            # Calculate the mean loss and accuracy
            mean_val_loss = sum(val_loss) / len(val_loss)
            mean_val_acc = 100 * (accuracy / total)
            loss = sum(loss_list) / len(loss_list)
            self.history.append((loss, mean_val_loss, mean_val_acc))

            if verbose:
                print(
                    f"Loss: {loss:.4f}, Val Loss: {mean_val_loss:.4f}, Val Accuracy: {mean_val_acc:.2f}%"
                )

        return self.history
```
This method trains the model on the training set for a specified number of epochs. It also calculates the validation loss and accuracy after each epoch.
In each epoch, we iterate over the training set, compute the loss, backpropagate the gradients, and update the model parameters using the optimizer. We also evaluate the model on the validation set to monitor its performance.
First, we set the model to training mode using self.train(). Then, we split the training set into a new training set and a validation set using random_split. We create new data loaders for the training and validation sets.
Next, we iterate over the training set using tqdm to display a progress bar. We move the images and labels to the GPU if available. We zero out the gradients using self.optimizer.zero_grad().
We pass the images through the model to get the outputs and calculate the loss using the specified loss function. We append the loss to the loss_list for tracking.
We backpropagate the loss and update the model parameters using self.optimizer.step().
After each epoch's training pass, we evaluate the model on the validation set, calculating the validation loss and accuracy by comparing the predicted labels with the ground truth labels.
Finally, we calculate the mean loss and accuracy for the epoch and store them in the self.history list. We print the loss, validation loss, and validation accuracy if verbose is set to True.
The method returns the training history, which contains the training loss, validation loss, and validation accuracy for each epoch, e.g., [(1.6583, 1.4880, 97.40), (1.4849, 1.4832, 97.85), ...].
Define the prediction method:
```python
class CNN(nn.Module):
    def predict(self, x):
        predicted = []
        with torch.no_grad():
            self.eval()
            for images, _ in x:
                images = images.to(device)
                outputs = self(images)
                predicted.append(torch.argmax(outputs, 1))
        return predicted
```
This method makes predictions with the trained model: it iterates over a data loader, passes each batch of images through the model, and returns a list of predicted-label tensors, one per batch.
Putting It All Together:
```python
class CNN(nn.Module):
    def __init__(self, layers: list[nn.Module]):
        super().__init__()
        self.history = []
        self.layers = nn.ModuleList(layers)
        self.to(device)

    def add(self, layer: nn.Module):
        self.layers.append(layer)

    def forward(self, x: torch.Tensor):
        for layer in self.layers:
            x = layer(x)
        return x

    def summary(self, input_shape, batch_size):
        summary(self, input_shape, batch_size)

    def config(self, loss: nn.Module, optimizer: optim.Optimizer):
        if not isinstance(loss, nn.Module):
            raise TypeError("loss must be a torch.nn.Module instance")
        if not isinstance(optimizer, optim.Optimizer):
            raise TypeError("optimizer must be a torch.optim.Optimizer instance")
        self.criterion = loss
        self.optimizer = optimizer

    def fit(self, train_loader: DataLoader, epochs: int = 10, verbose: bool = True):
        for epoch in range(epochs):
            self.train()

            # Split the train set into train and validation set
            train_set, val_set = random_split(train_loader.dataset, [50000, 10000])
            train_set = DataLoader(train_set, batch_size=64, shuffle=True)
            val_set = DataLoader(val_set, batch_size=64, shuffle=True)

            loss_list = []
            for images, labels in tqdm(train_set, desc=f"Epoch {epoch+1}/{epochs}"):
                images, labels = images.to(device), labels.to(device)

                self.optimizer.zero_grad()
                outputs = self(images)
                loss = self.criterion(outputs, labels)
                loss_list.append(loss.item())
                loss.backward()
                self.optimizer.step()

            with torch.no_grad():
                self.eval()
                total = 0
                accuracy = 0
                val_loss = []
                for images, labels in val_set:
                    images, labels = images.to(device), labels.to(device)
                    outputs = self(images)

                    total += labels.size(0)
                    predicted = torch.argmax(outputs, dim=1)
                    accuracy += (predicted == labels).sum().item()
                    val_loss.append(self.criterion(outputs, labels).item())

            # Calculate the mean loss and accuracy
            mean_val_loss = sum(val_loss) / len(val_loss)
            mean_val_acc = 100 * (accuracy / total)
            loss = sum(loss_list) / len(loss_list)
            self.history.append((loss, mean_val_loss, mean_val_acc))

            if verbose:
                print(
                    f"Loss: {loss:.4f}, Val Loss: {mean_val_loss:.4f}, Val Accuracy: {mean_val_acc:.2f}%"
                )

        return self.history

    def predict(self, x):
        predicted = []
        with torch.no_grad():
            self.eval()
            for images, _ in x:
                images = images.to(device)
                outputs = self(images)
                predicted.append(torch.argmax(outputs, 1))
        return predicted
```
Training the Model
Now that we have defined our CNN model, we can train it on the MNIST dataset. Here's how you can train the model:
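The repository's exact snippet isn't reproduced here, so the following is a sketch matching the layer-by-layer description below; the Flatten layer and the ReLU between the two fully connected layers are my additions to make the shapes line up:

```python
layers = [
    nn.Conv2d(1, 32, kernel_size=3, padding=1),   # first convolutional layer
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=3, padding=1),  # second convolutional layer
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),        # 28x28 -> 14x14
    nn.Flatten(),                                 # assumed: flatten for the fully connected layers
    nn.Linear(64 * 14 * 14, 128),                 # first fully connected layer
    nn.ReLU(),                                    # assumed activation between the FC layers
    nn.Linear(128, 10),                           # second fully connected layer (10 classes)
    nn.Softmax(dim=1),                            # class probabilities
]
model = CNN(layers)
model.summary((1, 28, 28), 64)
```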
This code snippet defines the layers for the CNN model. We use two convolutional layers followed by ReLU activation functions, max-pooling, and fully connected layers. The model architecture is summarized using the summary method.
The first convolutional layer has 1 input channel, 32 output channels, a kernel size of 3, and padding of 1.
The second convolutional layer has 32 input channels, 64 output channels, a kernel size of 3, and padding of 1.
The max-pooling layer has a kernel size of 2 and a stride of 2.
The first fully connected layer has 64 × 14 × 14 input features and 128 output features.
The second fully connected layer has 128 input features and 10 output features (corresponding to the 10 classes in the MNIST dataset).
The softmax layer is used to compute the class probabilities.
Next, we configure the model with the loss function and optimizer:
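Using the config method defined earlier:

```python
model.config(nn.CrossEntropyLoss(), optim.Adam(model.parameters(), lr=0.001))
```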
We use the CrossEntropyLoss as the loss function and the Adam optimizer with a learning rate of 0.001.
Finally, we train the model on the MNIST dataset:
```python
# Train the model
history = model.fit(train_loader, epochs=10)
```
This code snippet trains the model on the training set for 10 epochs. The training loop prints the training loss, validation loss, and validation accuracy after each epoch.
Results:
```
Epoch 1/10: 100%|██████████| 782/782 [00:05<00:00, 137.83it/s]
Loss: 1.6583, Val Loss: 1.4880, Val Accuracy: 97.40%
Epoch 2/10: 100%|██████████| 782/782 [00:05<00:00, 140.73it/s]
Loss: 1.4849, Val Loss: 1.4832, Val Accuracy: 97.85%
Epoch 3/10: 100%|██████████| 782/782 [00:05<00:00, 132.55it/s]
Loss: 1.4787, Val Loss: 1.4797, Val Accuracy: 98.15%
Epoch 4/10: 100%|██████████| 782/782 [00:05<00:00, 142.57it/s]
Loss: 1.4766, Val Loss: 1.4770, Val Accuracy: 98.45%
Epoch 5/10: 100%|██████████| 782/782 [00:05<00:00, 135.03it/s]
Loss: 1.4744, Val Loss: 1.4749, Val Accuracy: 98.62%
Epoch 6/10: 100%|██████████| 782/782 [00:05<00:00, 142.32it/s]
Loss: 1.4734, Val Loss: 1.4752, Val Accuracy: 98.59%
Epoch 7/10: 100%|██████████| 782/782 [00:05<00:00, 148.41it/s]
Loss: 1.4723, Val Loss: 1.4731, Val Accuracy: 98.84%
Epoch 8/10: 100%|██████████| 782/782 [00:05<00:00, 139.67it/s]
Loss: 1.4717, Val Loss: 1.4701, Val Accuracy: 99.11%
Epoch 9/10: 100%|██████████| 782/782 [00:05<00:00, 134.47it/s]
Loss: 1.4712, Val Loss: 1.4710, Val Accuracy: 99.03%
Epoch 10/10: 100%|██████████| 782/782 [00:05<00:00, 138.27it/s]
Loss: 1.4702, Val Loss: 1.4725, Val Accuracy: 98.90%
```
As you can see, the model needs only 2-3 epochs to reach 98% accuracy. I trained for 10 epochs to make the trend easier to read; you can use fewer epochs to speed up training.
Then, you can plot the training history to visualize the training and validation loss and accuracy:
```python
import matplotlib.pyplot as plt

# Plot the training history
plt.figure(figsize=(10, 5))

plt.subplot(1, 2, 1)
plt.plot([x[0] for x in history], label="Train Loss")
plt.plot([x[1] for x in history], label="Val Loss")
plt.xlabel("Epoch")
plt.legend()

plt.subplot(1, 2, 2)
plt.plot([x[2] for x in history], label="Val Accuracy")
plt.xlabel("Epoch")
plt.legend()

plt.show()
```
This code snippet plots the training and validation loss on the left and the validation accuracy on the right. You can visualize how the model performs during training.
Testing the Model
After training the model, we can evaluate its performance on the test set. Here's how you can test the model:
```python
x = [images for images, _ in test_loader]
y_true = [labels for _, labels in test_loader]
y_pred = model.predict(test_loader)
```
This code snippet gets the images and labels from the test loader and makes predictions using the trained model. We compare the predicted labels with the ground truth labels to evaluate the model's performance.
Next, we calculate the accuracy of the model on the test set:
```python
total = 0
correct = 0
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

for true, pred in zip(y_true, y_pred):
    true, pred = true.to(device), pred.to(device)
    total += len(true)
    correct += (true == pred).sum().item()

accuracy = 100 * (correct / total)
print(f"Test Accuracy: {accuracy:.2f}%")  # Test Accuracy: 98.58%
```
Finally, we plot a random selection of test images along with their true and predicted labels:
```python
_, axs = plt.subplots(2, 5, figsize=(15, 6))
for i in range(10):
    random_idx = torch.randint(0, 64, (1, 2)).squeeze()
    axs[i // 5, i % 5].imshow(x[random_idx[0]][random_idx[1]].squeeze(), cmap="gray")
    color = "green" if y_true[random_idx[0]][random_idx[1]] == y_pred[random_idx[0]][random_idx[1]] else "red"
    axs[i // 5, i % 5].set_title(
        f"True: {y_true[random_idx[0]][random_idx[1]]}, Pred: {y_pred[random_idx[0]][random_idx[1]]}",
        color=color,
    )
    axs[i // 5, i % 5].axis("off")
plt.show()
```
This code snippet plots a grid of 10 randomly chosen test images along with their true and predicted labels. Titles are shown in green for correct predictions and in red for incorrect ones.
Save and Load the Model
You can save the trained model to a file and load it later for inference. Here's how you can save and load the model:
```python
torch.save(model.state_dict(), "mnist_cnn.pth")
```
This code snippet saves the model's state dictionary to a file named mnist_cnn.pth.
To load the model for inference, you can use the following code:
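A minimal sketch, assuming the same CNN class and the layers list from the training section:

```python
# Rebuild the same architecture, then load the trained weights
model = CNN(layers)
model.load_state_dict(torch.load("mnist_cnn.pth"))
model.eval()  # set to evaluation mode for inference
```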
This code snippet loads the model architecture and the trained weights from the file mnist_cnn.pth. The model is then set to evaluation mode for inference.
Web UI for Image Classification
You can create a simple web UI to interact with the image classifier. Here's a basic example using Flask:
app.py
```python
# Import libraries
import cv2
import flask
import flask_cors
import numpy as np
from cnn import CNN
from torch import nn, load
from torchvision import transforms

# Load the model (in the previous section)
model = ...

# Initialize the Flask app
app = flask.Flask(__name__)
flask_cors.CORS(app)

def preprocess_image(image):
    preprocess = transforms.Compose(
        [
            transforms.ToTensor(),
            transforms.Normalize((0.5,), (0.5,)),
        ]
    )
    image = cv2.imdecode(np.frombuffer(image.read(), np.uint8), cv2.IMREAD_GRAYSCALE)  # Read the image
    image = cv2.normalize(image, None, alpha=0, beta=255, norm_type=cv2.NORM_MINMAX)  # Normalize the image
    image = cv2.resize(image, (28, 28))  # Resize the image to 28x28
    return preprocess(image).unsqueeze(0).to("cuda")

@app.route("/predict", methods=["POST"])
def predict():
    image = flask.request.files.get("image")
    image = preprocess_image(image)
    prediction = model(image).argmax().item()
    return flask.jsonify({"prediction": prediction})

@app.route("/")
def index():
    return flask.render_template("index.html")

if __name__ == "__main__":
    app.run()
```
Then, create an HTML template for the web UI. I use Tailwind CSS for styling, so make sure to include the Tailwind CSS CDN in your HTML file.
Finally, create a JavaScript file to handle the interactions with the web UI and make predictions using the model. In this example, I use Vue.js for the frontend logic:
This JavaScript file sets up a canvas for drawing digits, captures the drawn image, sends it to the Flask server for prediction, and displays the predicted digit on the web UI. The predict method posts the drawn image to the Flask server, the clear method clears the canvas and resets the prediction, and the initDrawings method initializes the canvas and wires up the mouse events for drawing. The mounted lifecycle hook calls initDrawings when the Vue app is mounted on the #app element in the HTML template.
You can run the Flask app by executing the following command in your terminal:
```bash
python src/app.py
```
This will start the Flask app, and you can access it in your browser at http://localhost:5000.
Conclusion
In this blog, we've explored the process of building a Convolutional Neural Network (CNN) image classifier using PyTorch. We've covered the steps involved in defining the CNN model, training it on the MNIST dataset, and evaluating its performance on the test set.
We've also demonstrated how to save and load the trained model for future use and create a simple web UI for interacting with the image classifier.
I hope this blog has provided you with a solid foundation for building your own image classifier using CNNs and PyTorch. Feel free to experiment with different model architectures, datasets, and hyperparameters to further enhance your understanding of CNNs and deep learning.
For more details, you can refer to the tiesen243/cnn repository on GitHub.