AI Generated image: Anomaly detection & Recommender Systems
This project explores how Gaussian-based models detect outliers in data and how collaborative filtering powers personalized recommender systems, all within a machine learning framework.
In today's data-driven world, businesses and industries rely on artificial intelligence (AI) to detect unusual patterns and provide personalized recommendations. This project explores two critical machine learning techniques: Anomaly Detection and Recommender Systems.
We implement Gaussian models for anomaly detection, identifying outliers in data such as fraudulent transactions or system failures. Additionally, we build a collaborative filtering-based recommender system, improving personalized suggestions for users in domains like e-commerce, streaming services, and content platforms.
Anomaly detection is the process of identifying rare events or outliers in datasets that do not conform to expected patterns.
We begin by loading the dataset, which contains network performance data (Latency and Throughput). The dataset is stored in a .mat file and is loaded using SciPy.
import scipy.io
datafile = r'd:\mlprojects\data\ex8data1.mat'
mat = scipy.io.loadmat(datafile)
X = mat['X'] # Training data
ycv = mat['yval'] # Cross-validation labels
Xcv = mat['Xval'] # Cross-validation data
Dataset File:
The dataset is stored in ex8data1.mat, a MATLAB/Octave file that contains sample network latency and throughput data. The file is loaded using scipy.io.loadmat(datafile).
Variables Loaded:
X: The training set, a 2D NumPy array where each row represents a network measurement (Latency & Throughput).
ycv: Cross-validation labels, used to evaluate model performance.
Xcv: The cross-validation set, used for validation.
Scatter Plot of the Dataset
The following function plots the dataset. Each point represents a network measurement with Latency (X-axis) and Throughput (Y-axis).
import matplotlib.pyplot as plt

def plotData(myX, newFig=False):
    if newFig:
        plt.figure(figsize=(8,6))
    plt.plot(myX[:,0], myX[:,1], 'b+')
    plt.xlabel('Latency [ms]', fontsize=16)
    plt.ylabel('Throughput [mb/s]', fontsize=16)
    plt.grid(True)
    plt.show()

plotData(X)
The output is shown below:
Image 1: Anomaly Detection scatter graph
Explanation:
myX[:,0] represents the Latency (x-axis).
myX[:,1] represents the Throughput (y-axis).
Computing the Gaussian Probability Density
In this step, we define the function gaus() to compute the probability density for given data points based on Gaussian distribution parameters (mean and variance).
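The density being computed is the standard multivariate Gaussian with mean μ and covariance Σ (when mysig2 is a vector, Σ is simply the diagonal matrix of per-feature variances):
\[ p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} \, |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right) \]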
Code Explanation
Python code:
import numpy as np

def gaus(myX, mymu, mysig2):
    """
    Compute the Gaussian probability density for the feature matrix `myX`,
    using mean (`mymu`) and variance (`mysig2`).
    If `mysig2` is a vector, it is converted into a diagonal covariance matrix.
    """
    m = myX.shape[0]  # Number of data points
    n = myX.shape[1]  # Number of features
    # Convert variance to a diagonal covariance matrix if needed
    if np.ndim(mysig2) == 1:
        mysig2 = np.diag(mysig2)
    # Compute normalization factor
    norm = 1. / (np.power((2 * np.pi), n / 2) * np.sqrt(np.linalg.det(mysig2)))
    # Compute exponent term for Gaussian distribution
    myinv = np.linalg.inv(mysig2)  # Inverse of covariance matrix
    myexp = np.zeros((m, 1))       # Initialize exponent term
    for irow in range(m):          # Iterate over all data points
        xrow = myX[irow]           # Extract single data point
        myexp[irow] = np.exp(-0.5 * ((xrow - mymu).T).dot(myinv).dot(xrow - mymu))
    return norm * myexp  # Return computed Gaussian probability
Key Points
If mysig2 is a 1-D vector of per-feature variances, it is promoted to a diagonal covariance matrix, so the same function covers both the univariate and multivariate cases.
The normalization factor uses the determinant of the covariance matrix, and the exponent uses its inverse, following the multivariate Gaussian density above.
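For larger datasets the per-row Python loop can become slow. A vectorized sketch of the same computation, offered as an alternative rather than the article's implementation (it should produce the same output shape as gaus above):

def gausVectorized(myX, mymu, mysig2):
    # Same multivariate Gaussian density as gaus(), without the Python loop
    if np.ndim(mysig2) == 1:
        mysig2 = np.diag(mysig2)
    n = myX.shape[1]
    norm = 1. / (np.power(2 * np.pi, n / 2) * np.sqrt(np.linalg.det(mysig2)))
    diffs = myX - mymu                                              # (m, n) deviations from the mean
    quad = np.sum(diffs.dot(np.linalg.inv(mysig2)) * diffs, axis=1)  # row-wise quadratic form
    return (norm * np.exp(-0.5 * quad)).reshape(-1, 1)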
Estimating Parameters for Gaussian Distribution
To fit our data into a Gaussian distribution, we estimate the mean (μ) and variance (σ²).
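Concretely, for an m x n dataset the per-feature parameters and the full covariance matrix (the form the multivariate option below computes) are:
\[ \mu_j = \frac{1}{m} \sum_{i=1}^{m} x_j^{(i)}, \qquad \sigma_j^2 = \frac{1}{m} \sum_{i=1}^{m} \left( x_j^{(i)} - \mu_j \right)^2, \qquad \Sigma = \frac{1}{m} (X - \mu)^T (X - \mu) \]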
Python code:
def getGaussianParams(myX, useMultivariate=True):
    """
    Given a dataset `myX` (m x n), compute the mean (μ) and variance (σ²) for each feature.
    It can return:
    - Univariate Gaussian (separate variance for each feature)
    - Multivariate Gaussian (full covariance matrix)
    """
    m = myX.shape[0]           # Number of examples
    mu = np.mean(myX, axis=0)  # Mean of each feature
    if not useMultivariate:    # Univariate Gaussian
        sigma2 = np.sum(np.square(myX - mu), axis=0) / float(m)  # Variance of each feature
    else:                      # Multivariate Gaussian
        sigma2 = ((myX - mu).T.dot(myX - mu)) / float(m)         # Full covariance matrix
    return mu, sigma2
Key Points
With useMultivariate=False the function returns one variance per feature (features treated as independent); with useMultivariate=True it returns the full covariance matrix, which also captures correlations between features.
Python:
mu, sig2 = getGaussianParams(X, useMultivariate=True)
getGaussianParams() estimates the mean and covariance matrix of dataset X.
(mu, sig2) is later used to compute the Gaussian probabilities.
To visualize how the Gaussian distribution fits our dataset, we plot probability contours.
Python code:
def plotContours(mymu, mysigma2, newFig=False, useMultivariate=True):
    delta = 0.5  # Grid step size
    myx = np.arange(0, 30, delta)
    myy = np.arange(0, 30, delta)
    meshx, meshy = np.meshgrid(myx, myy)
    # Compute Gaussian probabilities for each point in the grid
    coord_list = [entry.ravel() for entry in (meshx, meshy)]
    points = np.vstack(coord_list).T
    myz = gaus(points, mymu, mysigma2)
    # Reshape probabilities to match the grid
    myz = myz.reshape((myy.shape[0], myx.shape[0]))
    if newFig:  # Optionally create a new figure
        plt.figure(figsize=(6, 4))
    # Define contour levels
    cont_levels = [10**exp for exp in range(-20, 0, 3)]
    # Plot contour lines
    mycont = plt.contour(meshx, meshy, myz, levels=cont_levels)
    plt.title('Gaussian Contours', fontsize=16)
Key Points
The function evaluates gaus() on a grid of (Latency, Throughput) values from 0 to 30 and draws contour lines at logarithmically spaced probability levels (10⁻²⁰ up to 10⁻²), so both the dense center and the sparse tails of the distribution are visible.
Combining Scatter Plot and Contour Plot
Now, we combine scatter and contour plots to visualize our dataset.
Python code:
# Create a figure and axis first
fig, ax = plt.subplots(figsize=(8,6))
# Scatter plot
ax.plot(X[:,0], X[:,1], 'b+', label="Data points")
ax.set_xlabel('Latency [ms]', fontsize=16)
ax.set_ylabel('Throughput [mb/s]', fontsize=16)
# Gaussian contours for single-variable Gaussian
useMV = False
plotContours(*getGaussianParams(X, useMV), newFig=False, useMultivariate=useMV)
plt.title('Scatter Plot with Gaussian Contours (Single-variable)')
plt.legend()
plt.grid(True)
plt.show() # Show first combined plot
# Create another figure for multivariate Gaussian
fig, ax = plt.subplots(figsize=(8,6))
# Scatter plot
ax.plot(X[:,0], X[:,1], 'b+', label="Data points")
ax.set_xlabel('Latency [ms]', fontsize=16)
ax.set_ylabel('Throughput [mb/s]', fontsize=16)
# Gaussian contours for multivariate Gaussian
useMV = True
plotContours(*getGaussianParams(X, useMV), newFig=False, useMultivariate=useMV)
plt.title('Scatter Plot with Gaussian Contours (Multivariate)')
plt.legend()
plt.grid(True)
plt.show() # Show second combined plot
After running the above code, the output is a Gaussian contour graph; the single-variable version is shown first.
Single-variable Gaussian Contour Plot
Image 2: Gaussian Contour graph single variable
The first image represents a scatter plot with Gaussian contours generated assuming that the features (Latency and Throughput) are independent. The probability distribution for each feature is computed separately, ignoring any potential correlation between them. As a result, the contours appear as axis-aligned concentric ellipses.
Multivariate Gaussian Contour Plot
Image 3: Gaussian Contour graph multivariate
The second image represents a scatter plot with Gaussian contours generated using a multivariate Gaussian distribution. Here, the relationship between the features is taken into account, meaning the covariance between Latency and Throughput is considered. This leads to a more realistic representation of the data distribution, potentially causing the contours to tilt or elongate based on the actual covariance.
Single-variable Gaussian: each feature is modeled independently, so the contours remain axis-aligned ellipses.
Multivariate Gaussian: the covariance between features is modeled, so the contours can tilt and stretch to follow correlations in the data.
Conclusion
While the single-variable Gaussian assumption is useful for simpler models, the multivariate Gaussian approach provides a more accurate representation of real-world data distributions, especially when features are correlated.
Compute F1 Score
def computeF1(predVec, trueVec):
    """
    F1 = 2 * (P*R)/(P+R)
    where P is precision, R is recall.
    Precision = "of all predicted y=1, what fraction had true y=1?"
    Recall = "of all true y=1, what fraction were predicted y=1?"
    Note: predVec and trueVec should be boolean vectors of the same shape.
    """
    P, R = 0., 0.
    if float(np.sum(predVec)):
        P = np.sum([int(trueVec[x]) for x in range(predVec.shape[0])
                    if predVec[x]]) / float(np.sum(predVec))
    if float(np.sum(trueVec)):
        R = np.sum([int(predVec[x]) for x in range(trueVec.shape[0])
                    if trueVec[x]]) / float(np.sum(trueVec))
    return 2*P*R/(P+R) if (P+R) else 0
The F1 score balances precision and recall for anomaly detection. It is calculated as:
\[ F1 = 2 \times \frac{P \times R}{P + R} \]
where P (precision) is the fraction of predicted anomalies that are true anomalies, and R (recall) is the fraction of true anomalies that were detected.
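As a quick sanity check, computeF1 can be exercised on small boolean vectors (toy values chosen purely for illustration):

# Toy example: 2 predicted positives, 1 of them correct; 2 true positives, 1 detected
pred = np.array([True, True, False, False])
true = np.array([True, False, True, False])
# Precision = 0.5, Recall = 0.5, so F1 = 0.5
print(computeF1(pred, true))   # expected output: 0.5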
Select Best Threshold
def selectThreshold(myycv, mypCVs):
    """
    Select the best epsilon value from the CV set by looping over
    possible epsilon values and computing the F1 score for each.
    """
    # Make a list of possible epsilon values
    nsteps = 1000
    epses = np.linspace(np.min(mypCVs), np.max(mypCVs), nsteps)
    # Compute the F1 score for each epsilon value, and store the best
    # F1 score (and corresponding best epsilon)
    bestF1, bestEps = 0, 0
    trueVec = (myycv == 1).flatten()
    for eps in epses:
        predVec = mypCVs < eps
        thisF1 = computeF1(predVec, trueVec)
        if thisF1 > bestF1:
            bestF1 = thisF1
            bestEps = eps
    print("Best F1 is %f, best eps is %0.4g." % (bestF1, bestEps))
    return bestF1, bestEps
This function loops through 1000 possible thresholds (epsilon values) and selects the best one by maximizing the F1 score.
The output of this process is:
Best F1 Score: 0.875
Best Epsilon: 9.075e-05
Compute Probabilities and Threshold
pCVs = gaus(Xcv, mu, sig2)
bestF1, bestEps = selectThreshold(ycv,pCVs)
The selected threshold is applied to these probabilities to detect anomalies.
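Applying the same threshold to the training set flags the anomalous points. A minimal sketch of that step (the plotAnomalies function below does this internally; the variable names ps and anomaly_mask here are illustrative):

# Flag training points whose density falls below the selected threshold
ps = gaus(X, mu, sig2)                   # densities of the training points
anomaly_mask = (ps < bestEps).flatten()  # boolean mask of anomalies
print('Number of anomalies in X:', int(anomaly_mask.sum()))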
Plot Anomalies
def plotAnomalies(myX, mybestEps, newFig=False, useMultivariate=True):
    ps = gaus(myX, *getGaussianParams(myX, useMultivariate))
    anoms = np.array([myX[x] for x in range(myX.shape[0]) if ps[x] < mybestEps])
    if newFig:
        fig, ax = plt.subplots(figsize=(6, 4))
    else:
        ax = plt.gca()  # Use the existing axes
    ax.scatter(anoms[:,0], anoms[:,1], s=80, facecolors='none', edgecolors='r')
The function marks data points as anomalies if their probability is below the selected threshold. These anomalies are displayed as red circles on the graph.
plt.figure(figsize=(6, 4)) # Create a single figure
plotData(X, newFig=False) # Ensure newFig=False
plotContours(mu, sig2, newFig=False, useMultivariate=True)
plotAnomalies(X, bestEps, newFig=False, useMultivariate=True)
plt.show() # Ensure everything is drawn on the same figure
Image 4: Gaussian Contour anomaly detection graph
The final plot combines the scatter plot of the network data, the multivariate Gaussian contours, and red circles marking the points flagged as anomalies.
In this analysis, we used a Gaussian-based anomaly detection model to identify unusual data points in a dataset. The process involved estimating the Gaussian parameters from the training data, computing probabilities for the cross-validation set, selecting the threshold that maximizes the F1 score, and flagging points whose probability falls below that threshold.
High-Dimensional Dataset
The dataset used in this anomaly detection example is loaded from an external MATLAB file. The feature set Xpart2 consists of 1000 examples with 11 features each.
Python code:
datafile = r'd:\mlprojects\data\ex8data2.mat'
mat = scipy.io.loadmat(datafile)
Xpart2 = mat['X']
ycvpart2 = mat['yval']
Xcvpart2 = mat['Xval']
print('Xpart2 shape is ', Xpart2.shape)
Output:
Xpart2 shape is (1000, 11)
The dataset is loaded using scipy.io.loadmat, and the training data (Xpart2), cross-validation data (Xcvpart2), and labels (ycvpart2) are extracted.
Next, the Gaussian parameters (mean and variance) for anomaly detection are estimated, and probability distributions are computed.
Python code:
mu, sig2 = getGaussianParams(Xpart2, useMultivariate=False)
ps = gaus(Xpart2, mu, sig2)
psCV = gaus(Xcvpart2, mu, sig2)
# Using Gaussian parameters from the full training set,
# figure out the p-value for each point in the CV set
pCVs = gaus(Xcvpart2, mu, sig2)
# Select the best threshold for anomaly detection
bestF1, bestEps = selectThreshold(ycvpart2, pCVs)
anoms = [Xpart2[x] for x in range(Xpart2.shape[0]) if ps[x] < bestEps]
print('# of anomalies found: ', len(anoms))
Explanation
getGaussianParams(Xpart2, useMultivariate=False): computes the mean and variance of each feature, assuming independent Gaussian distributions.
gaus(Xpart2, mu, sig2): computes probability densities for each point in the training and validation sets.
selectThreshold(ycvpart2, pCVs): determines the best threshold bestEps by maximizing the F1 score on the cross-validation set.
Training points whose probability falls below bestEps are counted as anomalies.
Output:
Best F1 is 0.615385, best eps is 1.379e-18.
# of anomalies found: 117
The best F1 score of 0.615 is achieved with a threshold of 1.379e-18, resulting in 117 detected anomalies.
Recommender systems enhance user experience by predicting preferences based on past interactions. They are widely used in applications such as movie recommendations, e-commerce, and music streaming, and are typically built using collaborative filtering or content-based filtering techniques.
The dataset used in this recommender system consists of movie ratings provided by users: a ratings matrix Y (1682 movies by 943 users) and a binary indicator matrix R that marks which movies each user has rated.
For example, the average rating for the movie Toy Story is computed as follows:
print('Average rating for movie 1 (Toy Story): %0.2f' %
      np.mean([Y[0][x] for x in range(Y.shape[1]) if R[0][x]]))
The following code was used to generate the heatmap:
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(6, 6*(1682./943.)))
plt.imshow(Y, aspect='auto', cmap='viridis')
plt.colorbar()
plt.ylabel('Movies (%d)' % nm, fontsize=20)
plt.xlabel('Users (%d)' % nu, fontsize=20)
plt.title('Movie Ratings Heatmap')
plt.show()
After running the above codes, the heat map is drawn as below:
Image 5: Movie recommender rating heat map
To visualize the ratings matrix, a heatmap is generated where each row corresponds to a movie, each column to a user, and the color encodes the rating value (unrated entries appear as the darkest color).
This visualization helps in understanding rating patterns, spotting popular movies, and detecting sparsity in the dataset.
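To put a number on that sparsity, one can measure the fraction of (movie, user) pairs that actually carry a rating. A minimal sketch using the R indicator matrix (loaded alongside Y from the movies dataset):

# Fraction of entries in the ratings matrix that are actually filled in
density = R.sum() / float(R.size)
print('Rated entries: %.1f%% of the matrix' % (100 * density))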
Collaborative filtering is a machine learning technique used to build recommendation systems by leveraging user-item interactions. It predicts a user's preference for an item based on past behaviors of similar users. This is widely applied in platforms like Netflix and Amazon.
We start by loading the movie rating data and reducing the dataset for faster computation.
import scipy.io
datafile = r'd:\mlprojects\data\ex8_movieParams.mat'
mat = scipy.io.loadmat(datafile)
X = mat['X']
Theta = mat['Theta']
nu = int(mat['num_users'])
nm = int(mat['num_movies'])
nf = int(mat['num_features'])
# Reducing dataset size for faster execution
nu = 4; nm = 5; nf = 3
X = X[:nm,:nf]
Theta = Theta[:nu,:nf]
Y = Y[:nm,:nu]
R = R[:nm,:nu]
Explanation: The dataset is loaded, and the matrices for movie features (X) and user preferences (Theta) are extracted. The dataset is then reduced to a smaller size for efficiency.
To use optimization algorithms, we need to flatten and reshape our matrices.
import numpy as np

def flattenParams(myX, myTheta):
    return np.concatenate((myX.flatten(), myTheta.flatten()))

def reshapeParams(flattened_XandTheta, mynm, mynu, mynf):
    reX = flattened_XandTheta[:int(mynm*mynf)].reshape((mynm, mynf))
    reTheta = flattened_XandTheta[int(mynm*mynf):].reshape((mynu, mynf))
    return reX, reTheta
Explanation:
The flattenParams function combines the X and Theta matrices into a single vector for optimization, while reshapeParams reconstructs them back to their original shapes.
def cofiCostFunc(myparams, myY, myR, mynu, mynm, mynf, mylambda=0.0):
    myX, myTheta = reshapeParams(myparams, mynm, mynu, mynf)
    term1 = np.multiply(myX.dot(myTheta.T), myR)
    cost = 0.5 * np.sum(np.square(term1 - myY))
    # Regularization
    cost += (mylambda / 2.0) * np.sum(np.square(myTheta))
    cost += (mylambda / 2.0) * np.sum(np.square(myX))
    return cost
Explanation:
This function calculates the cost for collaborative filtering by comparing predicted ratings (X * Theta.T) to the actual ratings in Y. The cost function also includes regularization terms to prevent overfitting.
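In equation form, the regularized cost implemented above is:
\[ J(X, \Theta) = \frac{1}{2} \sum_{(i,j):\, R(i,j)=1} \left( (\theta^{(j)})^T x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{j,k} \left( \theta_k^{(j)} \right)^2 + \frac{\lambda}{2} \sum_{i,k} \left( x_k^{(i)} \right)^2 \]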
print('Cost with nu = 4, nm = 5, nf = 3 is %0.2f.' %
cofiCostFunc(flattenParams(X,Theta), Y, R, nu, nm, nf))
print('Cost with lambda = 1.5 is %0.2f.' %
cofiCostFunc(flattenParams(X,Theta), Y, R, nu, nm, nf, mylambda=1.5))
Expected Output:
Cost with nu = 4, nm = 5, nf = 3 is 22.22.
Cost with lambda = 1.5 is 31.34.
Explanation:
The cost is computed with and without regularization. A higher lambda value increases the cost due to added penalties on complexity.
def cofiGrad(myparams, myY, myR, mynu, mynm, mynf, mylambda=0.0):
    myX, myTheta = reshapeParams(myparams, mynm, mynu, mynf)
    term1 = np.multiply(myX.dot(myTheta.T), myR) - myY
    Xgrad = term1.dot(myTheta) + mylambda * myX
    Thetagrad = term1.T.dot(myX) + mylambda * myTheta
    return flattenParams(Xgrad, Thetagrad)
Explanation: This function calculates the gradients for X and Theta, which are necessary for optimizing our cost function. Regularization terms are added to prevent overfitting.
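Written out, the gradients computed above are:
\[ \frac{\partial J}{\partial x_k^{(i)}} = \sum_{j:\, R(i,j)=1} \left( (\theta^{(j)})^T x^{(i)} - y^{(i,j)} \right) \theta_k^{(j)} + \lambda x_k^{(i)} \]
\[ \frac{\partial J}{\partial \theta_k^{(j)}} = \sum_{i:\, R(i,j)=1} \left( (\theta^{(j)})^T x^{(i)} - y^{(i,j)} \right) x_k^{(i)} + \lambda \theta_k^{(j)} \]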
Collaborative filtering is a technique used in recommendation systems that predicts user preferences based on past interactions. Instead of relying on explicit attributes of items, it uses matrix factorization to analyze user-item interactions.
The primary idea is to factorize the user-item rating matrix into two lower-dimensional matrices: a movie-feature matrix X and a user-preference matrix Theta, so that their product approximates the observed ratings.
The algorithm optimizes these matrices to minimize the difference between predicted and actual ratings using gradient descent. Regularization is applied to prevent overfitting.
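As an illustration of that optimization loop, here is a minimal plain gradient-descent sketch built on the cofiCostFunc and cofiGrad functions above. The learning rate and iteration count are arbitrary choices for illustration, not values from this project, which trains with fmin_cg instead:

# Minimal gradient-descent sketch (illustrative only; the actual training uses fmin_cg)
params = flattenParams(X, Theta)
alpha = 0.001          # learning rate (assumed value)
for it in range(200):  # iteration count (assumed value)
    params = params - alpha * cofiGrad(params, Y, R, nu, nm, nf, mylambda=1.5)
print('Final cost:', cofiCostFunc(params, Y, R, nu, nm, nf, mylambda=1.5))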
Gradient Checking
Verifies computed gradients using numerical approximation.
print("Checking gradient with lambda = 0...")
checkGradient(flattenParams(X,Theta),Y,R,nu,nm,nf)
print("\nChecking gradient with lambda = 1.5...")
checkGradient(flattenParams(X,Theta),Y,R,nu,nm,nf,mylambda = 1.5)
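The checkGradient function itself is not shown here; a minimal sketch of how such a numerical check is commonly implemented (the function body below is an assumption, built only on cofiCostFunc and cofiGrad defined above):

def checkGradient(myparams, myY, myR, mynu, mynm, mynf, mylambda=0.):
    # Compare analytic gradients against a central finite-difference approximation
    # at a handful of randomly chosen parameter indices.
    eps = 1e-4
    grad = cofiGrad(myparams, myY, myR, mynu, mynm, mynf, mylambda)
    for idx in np.random.choice(myparams.size, 5, replace=False):
        bump = np.zeros(myparams.size)
        bump[idx] = eps
        numgrad = (cofiCostFunc(myparams + bump, myY, myR, mynu, mynm, mynf, mylambda) -
                   cofiCostFunc(myparams - bump, myY, myR, mynu, mynm, mynf, mylambda)) / (2 * eps)
        print('%d: analytic %0.6f vs numerical %0.6f' % (idx, grad[idx], numgrad))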
Loading Movie Data and User Ratings
movies = []
with open(r'd:\mlprojects\data\movie_ids.txt') as f:  # Movie list
    for line in f:
        movies.append(' '.join(line.strip('\n').split(' ')[1:]))
datafile = r'd:\mlprojects\data\ex8_movies.mat'
mat = scipy.io.loadmat(datafile)
Y = mat['Y']
R = mat['R']
nf = 10  # Use 10 features
Reads the movie list from movie_ids.txt and loads the dataset ex8_movies.mat, containing:
Y: Movie ratings by users
R: Binary mask (1 if rated, 0 if not)
Adding User Ratings
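The construction of my_ratings is not shown here; it is an (nm, 1) column vector holding this user's own ratings, zero for unrated movies, and must be defined before the hstack step below. A minimal sketch (indices other than Toy Story's index 0 are placeholders, not the actual movies rated in this project):

# Hypothetical sketch of building my_ratings
my_ratings = np.zeros((Y.shape[0], 1))
my_ratings[0] = 4    # Toy Story (1995) is movie index 0
my_ratings[97] = 2   # placeholder index for another rated movie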
# Add ratings to the Y matrix, and the relevant row to the R matrix
myR_row = my_ratings > 0
Y = np.hstack((Y,my_ratings))
R = np.hstack((R,myR_row))
nm, nu = Y.shape
Appends the custom ratings to Y and updates the dataset.
Normalizing Ratings
def normalizeRatings(myY, myR):
    # The mean only counts movies that were actually rated
    Ymean = np.sum(myY, axis=1) / np.sum(myR, axis=1)
    Ymean = Ymean.reshape((Ymean.shape[0], 1))
    return myY - Ymean, Ymean

Ynorm, Ymean = normalizeRatings(Y, R)
Ensures that movies with no ratings are properly adjusted by subtracting the mean rating.
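Note that the mean divides by the per-movie rating count, so a movie nobody has rated would cause a division by zero. A slightly guarded variant (an assumption, not part of this project's code) clamps the count:

def normalizeRatingsSafe(myY, myR):
    # Same idea, but avoid dividing by zero for movies with no ratings at all
    counts = np.maximum(np.sum(myR, axis=1), 1)
    Ymean = (np.sum(myY, axis=1) / counts).reshape((-1, 1))
    return myY - Ymean, Ymean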
Train the Model, Extract the Trained Parameters, and Predict
# TRAIN
import scipy.optimize

X = np.random.rand(nm, nf)
Theta = np.random.rand(nu, nf)
myflat = flattenParams(X, Theta)
# Regularization parameter of 10 is used (as in the homework assignment)
mylambda = 10.
# Train the actual model with fmin_cg
result = scipy.optimize.fmin_cg(cofiCostFunc, x0=myflat, fprime=cofiGrad,
                                args=(Y, R, nu, nm, nf, mylambda),
                                maxiter=50, disp=True, full_output=True)
# EXTRACT
resX, resTheta = reshapeParams(result[0], nm, nu, nf)
#PREDICT
prediction_matrix = resX.dot(resTheta.T)
my_predictions = prediction_matrix[:,-1] + Ymean.flatten()
Uses fmin_cg to optimize the X and Theta matrices.
Predictions are computed from the product of X and Theta.T, with the per-movie mean rating added back.
Sorting and Displaying Top Recommendations
# Sort my predictions from highest to lowest
pred_idxs_sorted = np.argsort(my_predictions)
pred_idxs_sorted[:] = pred_idxs_sorted[::-1]
print("Top recommendations for you:")
for i in range(10):
    print('Predicting rating %0.1f for movie %s.' %
          (my_predictions[pred_idxs_sorted[i]], movies[pred_idxs_sorted[i]]))
print("\nOriginal ratings provided:")
for i in range(len(my_ratings)):
    if my_ratings[i] > 0:
        print('Rated %d for movie %s.' % (my_ratings[i], movies[i]))
Output: the movies with the highest predicted ratings are sorted and displayed as recommendations, as shown below:
Top recommendations:
Predicting rating 8.3 for movie Shawshank Redemption, The (1994).
Predicting rating 8.3 for movie Star Wars (1977).
Predicting rating 8.2 for movie Schindler's List (1993).
Predicting rating 8.2 for movie Titanic (1997).
Predicting rating 8.2 for movie Raiders of the Lost Ark (1981).
Predicting rating 8.2 for movie Wrong Trousers, The (1993).
Predicting rating 8.1 for movie Close Shave, A (1995).
Predicting rating 8.1 for movie Usual Suspects, The (1995).
Predicting rating 8.0 for movie Casablanca (1942).
Predicting rating 8.0 for movie Good Will Hunting (1997).
Original ratings provided:
Rated 4 for movie Toy Story (1995).
Rated 3 for movie Twelve Monkeys (1995).
Rated 5 for movie Usual Suspects, The (1995).
Rated 4 for movie Outbreak (1995).
Rated 5 for movie Shawshank Redemption, The (1994).
Rated 3 for movie While You Were Sleeping (1995).
Rated 5 for movie Forrest Gump (1994).
Rated 2 for movie Silence of the Lambs, The (1991).
Rated 4 for movie Alien (1979).
Rated 5 for movie Die Hard 2 (1990).
Rated 5 for movie Sphere (1998).
In this project, we explored two powerful machine learning techniques: Anomaly Detection
and Collaborative Filtering for Recommender Systems. Through Anomaly Detection,
we identified rare or suspicious patterns in data using Gaussian distribution-based probability estimations,
making it valuable for fraud detection, network security, and quality control. On the other hand,
Collaborative Filtering allowed us to build personalized recommendation systems by leveraging
user preferences and item similarities, demonstrating its effectiveness in applications like movie
recommendations, e-commerce, and content curation.
These concepts form a strong foundation in machine learning and data science. While our implementation
covered essential techniques, further enhancements can be achieved through deep learning-based
anomaly detection and hybrid recommendation systems. As we move forward,
integrating these advancements can significantly improve accuracy, scalability, and real-world
applicability.
This concludes our study on Anomaly Detection and Recommender Systems. Stay tuned for future projects
exploring more advanced AI and ML concepts! 🚀
1. "Anomaly Detection in Machine Learning" - Research Papers
2. "Recommender Systems: Algorithms and Applications" - AI Journal
I sincerely thank Prof. Andrew Ng (DeepLearning.AI, Stanford University) for his inspiring courses that laid the foundation for this project.