DeepMultiNet: Neural Networks for Multi-Class Classification

AI Generated image: Neural Networks for Multi-Class Classification of digits

Featured Project: DeepMultiNet – Neural Networks for Multi-Class Classification

This project demonstrates how neural networks can be effectively trained and evaluated for multi-class classification using handwritten digit recognition as a real-world example.

🛠️ Tools

Python: Programming language for implementing the neural network.
NumPy & SciPy: For data preprocessing and numerical computations.
Matplotlib: To visualize digits, training accuracy, and misclassified outputs.
Jupyter Notebook: For interactive experimentation and display of predictions.

🎯 Goals

To build a basic neural network capable of handling multi-class classification tasks.
To explore model accuracy through predictions vs. actual label comparisons.
To understand challenges in classification by analyzing misclassifications.

🌟 Impacts

Strengthens understanding of neural network fundamentals for classification tasks.
Provides a visual and intuitive grasp of model performance and error analysis.
Lays the groundwork for expanding into deeper and more complex architectures.

Introduction
Loading and Preprocessing Data: Multi-class Classification & Dataset
Visualizing Handwritten Digits
Model Prediction and Training Accuracy Evaluation
Challenges in Classification: Analyzing the Misclassified Digits
Exploring Pre-Trained Model Weights for Handwritten Digit Recognition
Predictions vs. Actual Labels: Analyzing Neural Network Accuracy
Acknowledgments

Introduction

DeepMultiNet is a project focused on using neural networks for accurate and scalable multi-class classification. It explores the use of deep learning architectures to classify data into multiple categories, employing techniques such as softmax activation, one-vs-all strategies, and optimization algorithms like gradient descent.

Key components of DeepMultiNet include:

Feature Engineering & Normalization: Preprocessing data to improve model performance.
Model Architecture Design: Implementing fully connected neural networks for classification.
Training & Optimization: Utilizing backpropagation, regularization, and adaptive learning rate methods.
Evaluation & Performance Metrics: Assessing accuracy, precision, recall, and F1-score to ensure robust classification.

DeepMultiNet aims to enhance classification accuracy across diverse datasets, making it a valuable model for real-world applications in image recognition, text categorization, and medical diagnostics.

Code:

        
            %matplotlib inline
            import numpy as np
            import matplotlib.pyplot as plt
            import pandas as pd
            import scipy.io #Used to load the OCTAVE *.mat files
            import matplotlib.cm as cm #Used to display images in a specific colormap
            import random #To pick random images to display
            from scipy.special import expit #Vectorized sigmoid function

            datafile = r'd:\mlprojects\data\ex3data1.mat'
            mat = scipy.io.loadmat( datafile )
            X, y = mat['X'], mat['y']
            #Insert a column of 1's to X as usual
            X = np.insert(X,0,1,axis=1)
            print("'y' shape: %s. Unique elements in y: %s"%(mat['y'].shape,np.unique(mat['y'])))
            print("'X' shape: %s. X[0] shape: %s"%(X.shape,X[0].shape))
            #X is 5000 images. Each image is a row. Each image has 400 pixels unrolled (20x20)
            #y is a classification for each image. 1-10, where "10" is the handwritten "0"

Output:

    
        'y' shape: (5000, 1). Unique elements in y: [ 1  2  3  4  5  6  7  8  9 10]
        'X' shape: (5000, 401). X[0] shape: (401,)

Loading and Preprocessing Data: Multi-class Classification & Dataset

The dataset used in this project consists of handwritten digit images stored in a .mat file, which is loaded using scipy.io.loadmat. Each image is represented as a 20 × 20 grayscale matrix, flattened into a 1×400 feature vector. The dataset consists of 5,000 samples, each labeled with a digit from 1 to 10 (where "10" represents the digit "0").

After loading the data:

Feature Matrix (X): A (5000, 400) matrix where each row represents a flattened image of size 20×20 pixels.
Class Labels (y): A (5000, 1) vector containing labels from 1 to 10, indicating the digit in the image.

To facilitate model training, a bias term (1) is inserted as the first column in X, increasing its dimension to (5000, 401). This allows the neural network model to learn an intercept term for classification.

Interpreting the Output:

Unique elements in y: The dataset contains 10 unique classes (digits 1–9 and 10 as ‘0’).
Shape of y: (5000, 1), confirming that there are 5,000 labeled examples.
Shape of X: (5000, 401), where 401 features include 400 pixel values + 1 bias term.
Shape of X[0]: (401,), confirming that each row contains one image representation.
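
As a minimal sketch of the bias-column step and the label convention, here is the same idea on a small made-up array instead of the real dataset. The names X_small and y_small are hypothetical and are used only for illustration.

```python
import numpy as np

# Toy stand-in for the real data: 3 "images" with 4 pixels each (made-up values)
X_small = np.arange(12, dtype=float).reshape(3, 4)
y_small = np.array([[10], [1], [2]])  # label 10 stands for the digit 0

# Prepend a column of ones so the model can learn an intercept term
X_with_bias = np.insert(X_small, 0, 1, axis=1)

# Map the stored labels (1..10) to the digits they represent (10 -> 0)
digits = y_small.ravel() % 10

print(X_with_bias.shape)  # (3, 5)
print(digits)             # [0 1 2]
```

The same np.insert call turns the real (5000, 400) matrix into (5000, 401).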

Visualizing Handwritten Digits

Run the following code in Jupyter Notebook cells:

                
                    from PIL import Image  # Pillow, used to build the final image

                    def getDatumImg(row):
                        """
                        Function that is handed a single np array with shape (401,)
                        (bias term followed by 400 pixels), creates a 20x20 image
                        array from the pixels, and returns it
                        """
                        width, height = 20, 20
                        square = row[1:].reshape(width,height)
                        return square.T

                    def displayData(indices_to_display=None):
                        """
                        Function that picks 100 random rows from X, creates a 20x20 image from each,
                        then stitches them together into a 10x10 grid of images, and shows it.
                        """
                        width, height = 20, 20
                        nrows, ncols = 10, 10
                        if not indices_to_display:
                            indices_to_display = random.sample(range(X.shape[0]), nrows * ncols)

                        big_picture = np.zeros((height * nrows, width * ncols))

                        irow, icol = 0, 0
                        for idx in indices_to_display:
                            if icol == ncols:
                                irow += 1
                                icol = 0
                            iimg = getDatumImg(X[idx])
                            big_picture[irow * height : irow * height + iimg.shape[0], icol * width : icol * width + iimg.shape[1]] = iimg
                            icol += 1

                        fig = plt.figure(figsize=(6, 6))
                        big_picture = (big_picture - big_picture.min()) / (big_picture.max() - big_picture.min()) * 255  # Normalize to 0-255
                        img = Image.fromarray(big_picture.astype(np.uint8))  # Convert to 8-bit image
                        plt.imshow(img, cmap=cm.Greys_r)
                        plt.axis("off")  # Hide axes for better visualization
                        plt.show()

                    displayData()

Running the code displays the handwritten digit images.

Image 1: Handwritten digits

To better understand the dataset, we visualize a subset of 100 randomly selected handwritten digits using the function displayData(). This function extracts images from the dataset, reconstructs them into 20 × 20 grayscale images, and arranges them into a 10 × 10 grid for display.

How the Code Works

1. Extracting Individual Images:

The function getDatumImg(row) reshapes a flattened 1×400 vector into a 20 × 20 grayscale image and transposes it for correct orientation.

2. Generating a Composite Image:

The displayData() function selects 100 random images from the dataset.
It reconstructs each image and places them into a 10 × 10 grid.
The final image is normalized and displayed using Matplotlib and PIL (Pillow).
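
The reshape-then-transpose step in getDatumImg can be illustrated on a tiny example. This sketch assumes the pixels were unrolled column by column, as MATLAB/Octave does when flattening a matrix; the array named original is made up for illustration.

```python
import numpy as np

# Hypothetical 2x3 "image" whose pixels were unrolled column by column
# (column-major order, an assumption about how the dataset was written)
original = np.array([[1, 2, 3],
                     [4, 5, 6]])
flattened = original.flatten(order="F")

# Row-major reshape followed by a transpose, as in getDatumImg
height, width = original.shape
restored = flattened.reshape(width, height).T

print(flattened)                           # [1 4 2 5 3 6]
print(np.array_equal(restored, original))  # True
```

Without the transpose, the digits would appear mirrored along the diagonal.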

Interpreting the Output

The displayed grid contains 100 handwritten digits from the dataset.
Each digit represents a class label from 1 to 10 (where 10 corresponds to digit '0').
This visualization helps in verifying the dataset's quality and variety, ensuring that the neural network is trained on diverse handwritten styles.

This visualization is a crucial step in the DeepMultiNet project, as it provides insight into the dataset and helps in debugging potential preprocessing issues.

                
                    from scipy import optimize

                    #Hypothesis function and cost function for logistic regression
                    #Shape comments assume 1-D mytheta and myy, as passed by buildTheta
                    def h(mytheta,myX): #Logistic hypothesis function
                        return expit(np.dot(myX,mytheta))

                    #A compactly written cost function, inspired by subokita:
                    def computeCost(mytheta,myX,myy,mylambda = 0.):
                        m = myX.shape[0] #5000
                        myh = h(mytheta,myX) #shape: (5000,)
                        term1 = np.log( myh ).dot( -myy.T ) #scalar
                        term2 = np.log( 1.0 - myh ).dot( 1 - myy.T ) #scalar
                        left_hand = (term1 - term2) / m #scalar
                        right_hand = mytheta.T.dot( mytheta ) * mylambda / (2*m) #scalar
                        return left_hand + right_hand #scalar

                    def costGradient(mytheta,myX,myy,mylambda = 0.):
                        m = myX.shape[0]
                        #myy is 1-D here, so the transpose is a no-op
                        beta = h(mytheta,myX)-myy.T #shape: (5000,)

                        #regularization skips the first element in theta
                        regterm = mytheta[1:]*(mylambda/m) #shape: (400,)

                        grad = (1./m)*np.dot(myX.T,beta) #shape: (401,)
                        grad[1:] = grad[1:] + regterm
                        return grad #shape: (401,)

                    def optimizeTheta(mytheta,myX,myy,mylambda=0.):
                        result = optimize.fmin_cg(computeCost, fprime=costGradient, x0=mytheta, \
                                                args=(myX, myy, mylambda), maxiter=200, disp=False,\
                                                full_output=True)
                                                # maxiter was first set to 50, then raised to 200
                                                # iterations to give the optimizer room to converge.
                        return result[0], result[1]

                    def buildTheta():
                        """
                        Function that determines an optimized theta for each class
                        and returns a Theta function where each row corresponds
                        to the learned logistic regression params for one class
                        """
                        mylambda = 0.
                        initial_theta = np.zeros((X.shape[1],1)).reshape(-1)
                        Theta = np.zeros((10,X.shape[1]))
                        for i in range(10):
                            iclass = i if i else 10 #class "10" corresponds to handwritten zero
                            print("Optimizing for handwritten number %d..."%i)
                            logic_Y = np.array([1 if x == iclass else 0 for x in y])#.reshape((X.shape[0],1))
                            itheta, imincost = optimizeTheta(initial_theta,X,logic_Y,mylambda)
                            Theta[i,:] = itheta
                        print("Done!")
                        return Theta

                    Theta = buildTheta()

Output after running the code above:

    
        Optimizing for handwritten number 0...
        Optimizing for handwritten number 1...
        Optimizing for handwritten number 2...
        Optimizing for handwritten number 3...
        Optimizing for handwritten number 4...
        Optimizing for handwritten number 5...
        Optimizing for handwritten number 6...
        Optimizing for handwritten number 7...
        Optimizing for handwritten number 8...
        Optimizing for handwritten number 9...
        Done!

Description of the code above:

In this part of the project, the goal was to implement multi-class classification for handwritten digits using logistic regression. The dataset was trained to recognize digits from 0 to 9, where each class represents a different digit. The main components of this implementation include the logistic hypothesis function, cost function, cost gradient, and optimization techniques to train the model effectively.

The hypothesis function (h) computes the predicted probability of the input belonging to the class of interest. The cost function (computeCost) measures the error in the predictions, while the cost gradient (costGradient) tells the optimizer how to update the model parameters to reduce this error. Both functions accept a regularization parameter (mylambda) to limit overfitting, although buildTheta sets it to 0, so no regularization is applied in this run.
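
One way to gain confidence in a cost/gradient pair like this is a numerical gradient check. The sketch below is not the article's code: it uses its own cost and gradient functions on made-up data, and it assumes the bias term is left out of the penalty in both the cost and the gradient. It relies on scipy.optimize.check_grad, which compares an analytic gradient with a finite-difference estimate.

```python
import numpy as np
from scipy.optimize import check_grad
from scipy.special import expit

def cost(theta, X, y, lam=0.0):
    """Regularized logistic regression cost; the bias theta[0] is not penalized."""
    m = X.shape[0]
    h = expit(X @ theta)
    data_term = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    reg_term = lam / (2 * m) * np.sum(theta[1:] ** 2)
    return data_term + reg_term

def gradient(theta, X, y, lam=0.0):
    """Analytic gradient of cost() with respect to theta."""
    m = X.shape[0]
    grad = X.T @ (expit(X @ theta) - y) / m
    grad[1:] += lam / m * theta[1:]
    return grad

# Small synthetic problem (made-up data, not the digit dataset)
rng = np.random.default_rng(0)
X_toy = np.insert(rng.normal(size=(50, 3)), 0, 1, axis=1)
y_toy = (rng.random(50) > 0.5).astype(float)
theta0 = rng.normal(scale=0.1, size=4)

# check_grad returns the norm of the difference between the analytic
# gradient and a finite-difference estimate; it should be close to zero
error = check_grad(cost, gradient, theta0, X_toy, y_toy, 1.0)
print(f"gradient check error: {error:.2e}")
```

A large error here would point to a mismatch between the cost and its gradient, which is worth ruling out before handing both to an optimizer.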

For the optimization process, we used the nonlinear conjugate gradient method (optimize.fmin_cg), which typically needs fewer iterations than standard gradient descent for this problem, particularly with a large number of features.

The model was trained for 10 classes, with each class corresponding to a digit (0-9). We improved the performance of the model by increasing the maximum number of optimizer iterations (maxiter) from 50 to 200, which allowed the model to converge better.

Output Description:

The code iterates through all 10 digits (0-9) and optimizes the model parameters (theta) for each digit using logistic regression. For each digit class, a binary classification problem is solved where the model learns to predict whether the image belongs to that specific digit or not. The optimization runs for all 10 classes, as indicated by the printed output above.

This output confirms that the model has learned the parameters for all 10 classes, and the optimization process was completed successfully.
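
The binary targets for each one-vs-all problem can also be built without a Python loop. This is a minimal sketch on made-up labels; the helper name one_vs_all_targets is hypothetical and does not appear in the code above.

```python
import numpy as np

# Made-up label column in the dataset's format: values 1..10, where 10 means "0"
y_demo = np.array([[10], [3], [10], [7], [3]])

def one_vs_all_targets(labels, digit):
    """Return a 0/1 vector marking which rows show `digit` (0 is stored as 10)."""
    stored_class = digit if digit else 10
    return (labels.ravel() == stored_class).astype(int)

print(one_vs_all_targets(y_demo, 0))  # [1 0 1 0 0]
print(one_vs_all_targets(y_demo, 3))  # [0 1 0 0 1]
```

Each such vector plays the role of logic_Y in buildTheta for one digit.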

Model Prediction and Training Accuracy Evaluation

                
                    def predictOneVsAll(myTheta, myrow):
                        """
                        Function that computes a hypothesis for an individual image (row in X)
                        and returns the predicted integer corresponding to the handwritten image
                        """
                        classes = [10] + list(range(1,10))  # Convert range to a list
                        hypots  = [0] * len(classes)

                        # Compute a hypothesis for each possible outcome
                        for i in range(len(classes)):
                            hypots[i] = h(myTheta[i], myrow)

                        return classes[np.argmax(np.array(hypots))]  

                    # Measure the training set accuracy
                    n_correct, n_total = 0., 0.
                    incorrect_indices = []
                    for irow in range(X.shape[0]):
                        n_total += 1
                        if predictOneVsAll(Theta,X[irow]) == y[irow]:
                            n_correct += 1
                        else: incorrect_indices.append(irow)
                    print("Training set accuracy: %0.1f%%"%(100*(n_correct/n_total)))

Output after running the code above:

            
                Training set accuracy: 97.2%

Explanations:

We implemented a One-vs-All Logistic Regression Classifier for recognizing handwritten digits. After training the model using logistic regression with conjugate gradient optimization, we evaluated its performance on the training dataset.

Using our trained model, we classified each handwritten digit and compared it with the actual label. The model achieved an impressive 97.2% training accuracy, demonstrating its effectiveness in recognizing handwritten digits. However, misclassification still occurs in some cases, particularly with poorly written digits.
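
The per-row prediction loop can be written as a single matrix product. The following is a sketch under the same conventions as above (row i of Theta belongs to digit i, and digit 0 is stored as label 10); predict_all and the tiny demo arrays are made up for illustration and are not the trained model.

```python
import numpy as np
from scipy.special import expit

def predict_all(Theta, X):
    """Predict a stored label (1..10) for every row of X in one matrix product.

    Row i of Theta holds the parameters for digit i; digit 0 is stored as 10.
    """
    probabilities = expit(X @ Theta.T)          # shape: (n_samples, 10)
    digits = np.argmax(probabilities, axis=1)   # values 0..9
    return np.where(digits == 0, 10, digits)

# Tiny made-up check: 10 one-hot "images" plus a bias column, and a Theta
# that scores row i highest for class i
X_demo = np.insert(np.eye(10), 0, 1, axis=1)               # shape: (10, 11)
Theta_demo = np.insert(np.eye(10) * 5.0, 0, -2.0, axis=1)  # shape: (10, 11)
y_demo = np.array([10, 1, 2, 3, 4, 5, 6, 7, 8, 9]).reshape(-1, 1)

predictions = predict_all(Theta_demo, X_demo)
accuracy = np.mean(predictions == y_demo.ravel())
print(f"accuracy: {accuracy:.1%}")  # accuracy: 100.0%
```

The indices of the misclassified rows can then be recovered with np.flatnonzero(predictions != y_demo.ravel()).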

Challenges in Classification: Analyzing the Misclassified Digits

        
            #Display the misclassified digits in grids of up to 100 images each.
            #Empty slices are skipped, because displayData() falls back to
            #random digits when it receives an empty list.
            for start in range(0, len(incorrect_indices), 100):
                displayData(incorrect_indices[start:start + 100])

Output after running the code above:

        
        Displayed 76 handwritten images in 10 X 10 image grids

Image 2: The 76 misclassified, poorly written handwritten digits.

The 76 digits above are the handwritten digits that our one-vs-all classification model predicted incorrectly.

Possible reasons why the model struggled with these samples include:

Poor contrast or unclear handwriting in the dataset.
Insufficient training data for certain digits.
The model struggling with certain digit shapes (e.g., distinguishing between 3 and 8).
The need for further improvements like data augmentation or convolutional neural networks (CNNs).

These misclassified samples are useful input for error analysis.
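
A common starting point for error analysis is a confusion matrix, which counts how often each actual digit was predicted as each other digit. The sketch below uses made-up labels purely for illustration; it does not report results from the model above, and the helper name confusion_matrix is an assumption.

```python
import numpy as np

def confusion_matrix(actual, predicted, n_classes=10):
    """Count how often each actual digit (row) was predicted as each digit (column)."""
    counts = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(counts, (actual, predicted), 1)
    return counts

# Made-up digits 0..9 for illustration (not results from the model above)
actual = np.array([3, 3, 8, 8, 8, 1])
predicted = np.array([3, 8, 8, 3, 8, 1])

cm = confusion_matrix(actual, predicted)
print(cm[3, 8], cm[8, 3])  # 1 1
```

Large off-diagonal entries show which pairs of digits the model confuses most often.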

Exploring Pre-Trained Model Weights for Handwritten Digit Recognition

        
            #We have been provided with a set of network parameters (Θ(1),Θ(2))
            #that were already trained for us. These are stored in ex3weights.mat
            datafile = r'd:\mlprojects\data\ex3weights.mat'
            #datafile = 'data/ex3weights.mat'
            mat = scipy.io.loadmat( datafile )
            Theta1, Theta2 = mat['Theta1'], mat['Theta2']
            print("Theta1 has shape:",Theta1.shape)
            print("Theta2 has shape:",Theta2.shape)

Output:

            Theta1 has shape: (25, 401)
            Theta2 has shape: (10, 26)

The output represents the dimensions of the trained neural network parameters stored in ex3weights.mat. The shapes of Theta1 and Theta2 indicate the structure of a two-layer neural network used for multi-class classification.

Theta1 (25, 401)

This corresponds to the weight matrix between the input layer and the hidden layer.
There are 25 hidden units, and each unit has 401 parameters (400 input features + 1 bias term).

Theta2 (10, 26)

This corresponds to the weight matrix between the hidden layer and the output layer.
There are 10 output classes (digits 0-9), and each unit has 26 parameters (25 hidden units + 1 bias term).

We loaded pre-trained neural network parameters from a .mat file and verified their dimensions. This confirms that the network is structured with:

400 input features (likely representing pixel values from a 20x20 image).
25 hidden units in a single hidden layer.
10 output units, each corresponding to a digit (0-9).

This step is crucial in implementing forward propagation to predict handwritten digits using a trained model.
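
The layer sizes can be read directly off the weight shapes. This is a small sketch using random stand-in matrices with the shapes reported above, not the trained weights; the names ending in _demo are hypothetical.

```python
import numpy as np

# Random stand-ins with the shapes reported above (not the trained weights)
rng = np.random.default_rng(0)
Theta1_demo = rng.normal(size=(25, 401))
Theta2_demo = rng.normal(size=(10, 26))

# Each matrix has one row per unit in the next layer and one column per
# unit in the previous layer, plus one column for the bias
n_inputs = Theta1_demo.shape[1] - 1
n_hidden = Theta1_demo.shape[0]
n_outputs = Theta2_demo.shape[0]
assert Theta2_demo.shape[1] == n_hidden + 1

n_params = Theta1_demo.size + Theta2_demo.size
print(n_inputs, n_hidden, n_outputs, n_params)  # 400 25 10 10285
```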

Predictions vs. Actual Labels: Analyzing Neural Network Accuracy

Run the following code:

        
            import matplotlib.pyplot as plt
            import numpy as np
            import random
            from scipy.special import expit  # Vectorized sigmoid used in propagateForward

            def propagateForward(row,Thetas):
                """
                Function that given a list of Thetas, propagates the
                Row of features forwards, assuming the features already
                include the bias unit in the input layer, and the 
                Thetas need the bias unit added to features between each layer
                """
                features = row
                for i in range(len(Thetas)):
                    Theta = Thetas[i]
                    z = Theta.dot(features)
                    a = expit(z)
                    if i == len(Thetas)-1:
                        return a
                    a = np.insert(a,0,1) #Add the bias unit
                    features = a

            def predictNN(row, Thetas):
                """
                Function that takes a row of features, propagates them through the
                NN, and returns the predicted integer that was hand written
                """
                classes = list(range(1,10)) + [10]  # Convert range to a list before concatenation
                output = propagateForward(row, Thetas)
                return classes[np.argmax(np.array(output))]

            # Assumed definition: the pre-trained weights, in layer order
            myThetas = [Theta1, Theta2]

            # Pick some of the images misclassified by the one-vs-all model
            # and display them with the neural network's prediction
            for x in range(5):
                i = random.choice(incorrect_indices)
                fig = plt.figure(figsize=(3, 3))

                # Remove bias term (first column) before reshaping
                img_array = X[i][1:].reshape(20, 20).T  # Skip first element, then reshape

                plt.imshow(img_array, cmap="gray")
                plt.axis("off")  # Hide axes

                predicted_val = int(predictNN(X[i], myThetas))
                actual_val = int(y[i].item())  # Extract the scalar label

                # The digit 0 is stored as class 10; convert both labels for display
                predicted_val = predicted_val % 10
                actual_val = actual_val % 10

                # Display prediction and actual label
                fig.suptitle(f"Predicted: {predicted_val}, Actual: {actual_val}", 
                            fontsize=14, fontweight="bold")

                plt.show()

Output images and prediction results:

In 6 trials, the network predicted 4 digits correctly and 2 incorrectly, as shown in the image below:

Predicted handwritten digits

Predicted results:

Handwritten digit "2", correctly predicted as "2"

Handwritten digit "8", correctly predicted as "8"

Handwritten digit "1", correctly predicted as "1"

Handwritten digit "3", wrongly predicted as "9"

Handwritten digit "2", wrongly predicted as "0"

Handwritten digit "9", correctly predicted as "9"

Correct predictions: 4 of 6 (about 66.7%)

Explanations:

The code performs forward propagation through a neural network model to classify handwritten digits. The results show:

Generated images of handwritten digits from the dataset.
Predictions for each digit using the trained neural network.
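Of the 6 displayed digits, 4 (about 66.7%) were correctly classified and 2 (about 33.3%) were misclassified. Because the samples are drawn from incorrect_indices, the digits the one-vs-all model got wrong, this figure describes only these hard cases and not the network's overall accuracy.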

Breakdown of the Code Execution:

1. propagateForward(row, Thetas)

This function performs forward propagation through multiple layers of the neural network.
Uses the sigmoid activation function (expit()).
Prepends a bias unit to each hidden layer's activations before passing them to the next layer.

2. predictNN(row, Thetas)

Calls propagateForward() to compute the network’s final output.
Determines the predicted digit by selecting the index of the highest probability (np.argmax()).
Returns the class label (1 to 10); class 10 is converted to digit 0 when the result is displayed.

3. Displaying Misclassified Digits (Cell 16)

Randomly selects incorrectly classified images from incorrect_indices.
Reshapes the flattened input (20×20 pixels) to display as an image.
Shows each image alongside predicted and actual labels for analysis.
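
The row-by-row propagation described above can also be applied to a whole batch at once. The sketch below is a hypothetical batch variant (forward_batch is not part of the code above) and uses random stand-ins for the data and weights, so the printed digits carry no meaning beyond checking shapes and the label mapping.

```python
import numpy as np
from scipy.special import expit

def forward_batch(X, thetas):
    """Propagate every row of X (bias column already included) through the network."""
    activations = X
    for index, theta in enumerate(thetas):
        activations = expit(activations @ theta.T)
        if index < len(thetas) - 1:
            # Prepend the bias unit before feeding the next layer
            activations = np.insert(activations, 0, 1, axis=1)
    return activations

# Random stand-ins for the data and weights (for shape checking only)
rng = np.random.default_rng(1)
X_demo = np.insert(rng.random((5, 400)), 0, 1, axis=1)
thetas_demo = [rng.normal(size=(25, 401)), rng.normal(size=(10, 26))]

outputs = forward_batch(X_demo, thetas_demo)  # shape: (5, 10)
labels = np.argmax(outputs, axis=1) + 1       # stored labels 1..10
digits = labels % 10                          # label 10 -> digit 0
print(outputs.shape, digits)
```

With the real X and [Theta1, Theta2], comparing labels against y.ravel() would give the network's accuracy over the full training set.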

Performance of our Model Classifications:

This section evaluates the model’s classification performance by displaying misclassified images and their predictions.

Key points are:

The model uses neural network forward propagation to classify handwritten digits.
Predictions are compared with actual labels; 4 of the 6 displayed samples (about 66.7%) were predicted correctly.
Misclassified images provide insights into potential areas for improvement.
Possible reasons for misclassification include similar-looking digits, poor image quality, and insufficient training data for certain digits.
Further improvements could involve data augmentation or using CNNs for feature extraction.

Project Conclusion:

In this project, we initially trained the model with a maximum of 50 optimizer iterations and observed satisfactory results. To further improve the prediction accuracy, we increased the limit to 200 iterations. Despite these enhancements, the model’s performance is still affected by the challenges posed by poor handwriting in the dataset, such as difficulties in distinguishing between similar digits like 7 and 2. This limitation suggests that further improvements are necessary. As part of our ongoing work in Deep Learning, we plan to enhance the model’s ability to handle such challenges using techniques like data augmentation and Convolutional Neural Networks (CNNs). We will explore these techniques in our Deep Learning projects and expect them to improve accuracy and robustness in digit classification.

Acknowledgments

I sincerely thank Prof. Andrew Ng (DeepLearning.AI, Stanford University) for his inspiring courses that laid the foundation for this project.