Decoding Human Emotions: The Power of AI in Facial Emotion Detection
This project demonstrates custom facial emotion detection using a YOLO model trained on image datasets collected from various sources. Leveraging TensorFlow, OpenCV, and Python, the model accurately classifies emotions across multiple categories.
AI Generated image: The Power of AI in Facial Emotion Detection
Facial emotion detection is an innovative application of artificial intelligence and computer vision that interprets human emotions by analyzing facial expressions. It uses deep learning algorithms to recognize and classify emotions such as Angry, Disgust, Fear, Happy, Neutral, Sad, and Surprise from facial images or videos.
This technology has revolutionized sectors such as customer service, healthcare, marketing, and security by providing valuable insights into human behavior. By identifying emotional states in real-time, businesses can enhance user experiences, offer personalized interactions, and monitor well-being.
Advances in convolutional neural networks (CNNs) and deep learning frameworks like TensorFlow and PyTorch have significantly improved the accuracy and reliability of facial emotion detection systems. As these technologies continue to evolve, they contribute to the growing capabilities of AI in understanding and responding to human emotions.
Facial emotion detection is a remarkable application of artificial intelligence (AI) that leverages computer vision and deep learning algorithms to interpret human emotions from facial expressions. Understanding how these systems work requires exploring both the psychological foundation of facial expressions and the technological advancements in AI.
Facial expressions are a universal language of human emotions, conveyed through intricate movements of facial muscles. According to research by psychologist Paul Ekman, there are seven universally recognized facial expressions; in this project they correspond to the categories Angry, Disgust, Fear, Happy, Neutral, Sad, and Surprise.
These expressions provide critical cues that AI models analyze to determine emotional states.
AI models interpret facial expressions through a series of systematic processes. The key steps include:
1. Face detection: The system uses algorithms like YOLOv12 (You Only Look Once) to detect and extract faces from images or video frames.
2. Facial feature analysis: Once a face is detected, AI algorithms analyze key facial landmarks such as the eyes, eyebrows, nose, and mouth using landmark detection and feature extraction techniques.
3. Emotion classification: With the extracted facial features, the AI model applies a pre-trained neural network to classify the expression into one of the seven categories. Models like YOLOv12 and EfficientNet are often fine-tuned for this purpose.
Here is a simple code example demonstrating how to load a YOLOv12 model and perform facial emotion detection:
from ultralytics import YOLO
# Load the pre-trained YOLO model
model = YOLO('yolov12.pt')
# Perform detection on an image
results = model('path_to_image.jpg')
# Display results
results[0].show()
By understanding both the science and technology behind emotion detection, we can better appreciate how AI is transforming human-computer interactions and advancing applications in fields like mental health, customer service, and security monitoring.
Facial emotion detection is revolutionizing how humans interact with machines. By analyzing facial expressions, AI models can interpret human emotions, enhancing user experience and creating personalized interactions. This technology is making significant strides in various fields, offering improved services and smarter systems.
Emotion detection aids in mental health monitoring by recognizing signs of depression, anxiety, or stress. Therapists can use this technology to gather insights into a patient’s emotional well-being during virtual sessions.
Businesses use AI-driven emotion analysis to assess customer satisfaction. Call centers and chatbots equipped with emotion detection can respond empathetically, providing more effective support.
In e-learning environments, emotion detection helps educators gauge student engagement and understanding. By identifying confusion or boredom, educators can adjust lessons for better comprehension.
Brands analyze customer reactions to advertisements using emotion detection. Understanding emotional responses enables marketers to create campaigns that resonate better with their target audience.
Emotion detection enhances security systems by identifying suspicious behavior through facial expressions. It’s widely used in airports and other high-security areas to monitor crowd behavior.
Emotion detection is an essential component of human-computer interaction, offering invaluable insights into human emotions. Its applications across healthcare, education, customer service, and security demonstrate its potential to transform industries. By incorporating emotion detection, companies can enhance user experiences and build smarter, more empathetic AI systems.
Facial landmarks are key points on a face that represent important anatomical locations, such as the eyes, nose, mouth, and jawline. These landmarks are used by AI systems to map facial expressions and identify emotions.
AI models utilize deep learning techniques, particularly convolutional neural networks (CNNs), to detect and localize facial landmarks. Once the facial landmarks are detected, AI systems compare the detected patterns against labeled datasets of facial expressions to classify the emotion, as illustrated by the code below:
import cv2
import dlib
# Load pre-trained model
face_detector = dlib.get_frontal_face_detector()
landmark_predictor = dlib.shape_predictor('shape_predictor_68_face_landmarks.dat')
# Load image
image = cv2.imread('face_image.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Detect faces
faces = face_detector(gray)
for face in faces:
    landmarks = landmark_predictor(gray, face)
    for n in range(68):
        x, y = landmarks.part(n).x, landmarks.part(n).y
        cv2.circle(image, (x, y), 1, (0, 255, 0), -1)
cv2.imshow('Facial Landmarks', image)
cv2.waitKey(0)
cv2.destroyAllWindows()
In this example, the code uses the Dlib library to detect facial landmarks and visualize them on an image.
Facial landmark detection and expression mapping play a critical role in recognizing emotions. By accurately analyzing facial movements and muscle patterns, AI systems can interpret human emotions effectively. This technology has vast applications in fields such as healthcare, automotive safety, and entertainment.
CNNs are widely used for image classification tasks. They consist of convolutional layers that extract features from images, followed by pooling and fully connected layers for classification.
A simple CNN model might include layers like:
Conv2D(filters=64, kernel_size=(3,3), activation='relu', input_shape=(48, 48, 1))
MaxPooling2D(pool_size=(2,2))
Flatten()
Dense(7, activation='softmax')
VGG models are deep CNN architectures with small 3x3 convolutional filters. They are effective for image recognition tasks and can be fine-tuned for facial emotion detection.
VGG16 pre-trained on ImageNet can be used with transfer learning:
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Flatten, Dense
from tensorflow.keras.models import Model

base_model = VGG16(weights='imagenet', include_top=False, input_shape=(48, 48, 3))
x = Flatten()(base_model.output)
x = Dense(7, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=x)
ResNet uses skip connections to solve the vanishing gradient problem. It's particularly effective for deeper models and is often used for emotion recognition with complex datasets.
Using ResNet50 for facial emotion detection:
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import GlobalAveragePooling2D, Dense
from tensorflow.keras.models import Model

base_model = ResNet50(weights='imagenet', include_top=False, input_shape=(48, 48, 3))
x = GlobalAveragePooling2D()(base_model.output)
x = Dense(7, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=x)
| Criteria | CNN | VGG | ResNet |
|---|---|---|---|
| Model Depth | Shallow | 16-19 layers | 50-152 layers |
| Parameter Count | Few | ~138M | ~25M (ResNet-50) |
| Accuracy | Moderate | High | Very High |
| Training Time | Fast | Slower | Efficient with residuals |
Choosing the right model depends on your dataset size, computational resources, and accuracy needs. For smaller datasets, CNNs are a practical choice, while VGG or ResNet models provide better accuracy for larger datasets.
Transfer learning involves taking a pre-trained model, often built on large datasets like ImageNet, and fine-tuning it for a specific task. In emotion detection, transfer learning can significantly reduce training time and improve accuracy.
Here’s a step-by-step example of how to implement transfer learning using VGG16 in TensorFlow for facial emotion recognition.
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.applications import VGG16
base_model = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base_model.trainable = False # Freeze the convolutional base
x = layers.Flatten()(base_model.output)
x = layers.Dense(256, activation='relu')(x)
x = layers.Dropout(0.5)(x)
out = layers.Dense(7, activation='softmax')(x) # 7 classes of emotions
model = keras.Model(inputs=base_model.input, outputs=out)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# Assuming train_data and val_data are preprocessed datasets
model.fit(train_data, epochs=20, validation_data=val_data)
Using transfer learning with models like VGG16 or ResNet can yield highly accurate results for facial emotion detection. Experimenting with hyperparameters, data augmentation, and fine-tuning the base model can further enhance performance.
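For example, one common fine-tuning step, sketched below under the assumption that the VGG16-based model from the previous example has already been trained with its base frozen, is to unfreeze the last convolutional block and continue training with a much lower learning rate:
# Sketch: fine-tune the top of the VGG16 base after the initial training phase
base_model.trainable = True
for layer in base_model.layers:
    # Unfreeze only the last convolutional block (block5_*) of VGG16
    layer.trainable = layer.name.startswith('block5')

# Recompile with a low learning rate so the pre-trained weights change only slightly
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-5),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Continue training for a few additional epochs
model.fit(train_data, epochs=5, validation_data=val_data)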
Choosing the right architecture is key for effective emotion detection. Common choices include Convolutional Neural Networks (CNNs) and Residual Networks (ResNet). These models are designed to extract complex features from images.
Using a simple CNN for emotion detection with layers like convolution, pooling, and dense layers.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    Conv2D(64, (3, 3), activation='relu', input_shape=(48, 48, 1)),
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dense(7, activation='softmax')
])
Hyperparameters such as learning rate, batch size, and number of epochs significantly affect model performance. Tools like GridSearchCV or Optuna can help automate the tuning process.
Using a learning rate scheduler to dynamically adjust the learning rate during training:
def scheduler(epoch, lr):
    return lr * 0.95 if epoch % 10 == 0 else lr
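To apply this schedule during training, the function can be wrapped in Keras's LearningRateScheduler callback; a minimal sketch, assuming model, train_data, and val_data are defined as in the earlier examples:
from tensorflow import keras

# Sketch: pass the scheduler function to model.fit via a callback
lr_callback = keras.callbacks.LearningRateScheduler(scheduler)
model.fit(train_data, epochs=50, validation_data=val_data, callbacks=[lr_callback])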
Optimizers like Adam, SGD, or RMSprop are used to minimize the loss function. Regularization techniques like dropout and batch normalization prevent overfitting.
from tensorflow.keras.optimizers import Adam

model.compile(optimizer=Adam(learning_rate=0.001), loss='categorical_crossentropy', metrics=['accuracy'])
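As an illustration of those regularization techniques, here is a minimal sketch of a small CNN that combines batch normalization and dropout; the layer sizes are arbitrary choices for illustration, not values taken from this project:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, BatchNormalization, MaxPooling2D, Flatten, Dense, Dropout
from tensorflow.keras.optimizers import Adam

# Sketch: batch normalization stabilizes activations; dropout randomly disables units during training
model = Sequential([
    Conv2D(64, (3, 3), activation='relu', input_shape=(48, 48, 1)),
    BatchNormalization(),
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(7, activation='softmax')
])
model.compile(optimizer=Adam(learning_rate=0.001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])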
By carefully designing the architecture, tuning hyperparameters, and applying effective optimization techniques, you can achieve high accuracy in facial emotion detection tasks. Evaluate your model using validation data and iteratively improve it for the best results.
YOLOv12 (You Only Look Once version 12) is a state-of-the-art object detection model designed for real-time applications. It excels in speed and accuracy, making it an ideal choice for facial emotion detection. By employing a single neural network to process images, YOLOv12 identifies multiple objects and their locations within milliseconds.
Facial emotion detection requires rapid and precise analysis of expressions. YOLOv12's architecture is optimized for low-latency performance, ensuring quick and reliable predictions even on devices with limited computational power.
Imagine a virtual assistant that uses a camera to monitor a user's emotional state during a conversation. Using YOLOv12, the assistant can detect facial emotions in real-time, adjusting its tone or responses accordingly. This enhances the user experience by making interactions more human-like.
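A minimal sketch of such a real-time loop with OpenCV and the Ultralytics API is shown below; the weights path 'best.pt' is an assumption, and in practice you would point it to the best.pt file produced by the training described later in this article:
import cv2
from ultralytics import YOLO

# Sketch: real-time facial emotion detection from a webcam feed
model = YOLO('best.pt')    # assumed path to trained emotion-detection weights
cap = cv2.VideoCapture(0)  # open the default webcam

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = model(frame)             # run detection on the current frame
    annotated = results[0].plot()      # draw boxes and emotion labels on the frame
    cv2.imshow('Facial emotion detection', annotated)
    if cv2.waitKey(1) & 0xFF == ord('q'):  # press 'q' to quit
        break

cap.release()
cv2.destroyAllWindows()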
With its exceptional speed, accuracy, and efficiency, YOLOv12 offers a powerful solution for facial emotion detection applications. Its ability to analyze facial expressions in real-time opens up a wide range of possibilities in healthcare, customer service, and human-computer interaction.
Google Colab is a free and easy-to-use platform that supports GPU acceleration. Follow these steps to set up your environment:
If you prefer to run the project on your local machine, follow these steps:
python --version
pip install virtualenv
virtualenv yolov12_env
yolov12_env\Scripts\activate        # Windows
source yolov12_env/bin/activate     # macOS/Linux
After setting up the environment, install the following libraries:
!pip install ultralytics torch torchvision numpy opencv-python
Explanation: ultralytics provides the YOLO training and inference API, torch and torchvision supply the deep learning backend, numpy handles array operations, and opencv-python is used for image and video processing.
Verify installations using the following commands:
import torch
import ultralytics
import numpy as np
import cv2
print("Torch version:", torch.__version__)
print("Ultralytics version:", ultralytics.__version__)
print("Numpy version:", np.__version__)
print("OpenCV version:", cv2.__version__)
Once all the libraries are installed and verified, you're ready to proceed with model training and emotion detection!
Roboflow is a popular platform for creating and managing custom datasets for machine learning tasks. It offers a simple way to download datasets using API links, which can be integrated directly into your machine learning project. Here's how you can download and load a facial emotion dataset from Roboflow.
Roboflow provides an easy way to download datasets directly using the API. You need to install the roboflow Python library and then use the API to get your dataset. First, install the roboflow library using pip if it's not already installed:
pip install roboflow
Once the Roboflow library is installed, you can use it to fetch your dataset. Here's a sample code to demonstrate how you can download the dataset:
from roboflow import Roboflow
# Initialize Roboflow
rf = Roboflow(api_key="your_roboflow_api_key") # Replace with your API key
# Get the project and version
project = rf.workspace().project("your_project_name") # Replace with your project name
dataset = project.version(1).download("yolov5") # "yolov5" indicates YOLO format, other formats are also available
# Verify that the dataset is downloaded
print("Dataset downloaded to:", dataset.location)
After downloading the dataset, you can load it into your project. The dataset will typically be in YOLO format, with train/, valid/, and test/ folders containing the labeled images and a .txt annotation file for each image. For example:
train/: contains the training images
valid/: contains the validation images
test/: contains the test images
Here's how you can load the dataset in your project:
import os
import cv2
import numpy as np
# Define the paths
train_path = "path/to/train/images/"
labels_path = "path/to/train/labels/"
# Load one of the images and its label
image_path = os.path.join(train_path, "image1.jpg")
label_path = os.path.join(labels_path, "image1.txt")
# Load image
image = cv2.imread(image_path)
# Load label file
with open(label_path, "r") as file:
labels = file.readlines()
# Print the image and label information
print(f"Image: {image_path}, Labels: {labels}")
Once you’ve downloaded and processed the dataset, you can now use it for training your model (e.g., YOLOv5, YOLOv12, or any other model) by pointing the model to the dataset’s directory.
Here’s how you might pass the dataset to a YOLOv12 model:
from ultralytics import YOLO
# Load your YOLOv12 model
model = YOLO('yolov12.yaml')
# Train the model with the Roboflow dataset
model.train(data="path/to/roboflow_dataset/data.yaml", epochs=30, batch=8)
The data.yaml file contains important information like the number of classes, class names, and paths to the training, validation, and test datasets.
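A minimal sketch of what such a data.yaml might look like for this project; the relative paths and the class order are assumptions and must match your downloaded dataset:
# data.yaml (illustrative example; adjust paths and class order to your dataset)
train: ../train/images
val: ../valid/images
test: ../test/images

nc: 7  # number of emotion classes
names: ['Angry', 'Disgust', 'Fear', 'Happy', 'Neutral', 'Sad', 'Surprise']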
Roboflow makes it incredibly easy to download and load custom datasets for various machine learning tasks, including facial emotion detection. By leveraging their API and proper dataset integration, you can quickly set up your models and train them with high-quality labeled data.
The facial emotion dataset, collected from the Roboflow platform, serves as the foundation for training and evaluating the YOLOv12 model. The dataset is meticulously annotated and labeled to ensure accurate detection of facial emotions in images.
A well-organized directory structure is maintained for seamless model training, typically resembling:
/dataset
│
├── images
│ ├── train
│ ├── val
│ └── test
├── labels
│ ├── train
│ ├── val
│ └── test
└── data.yaml
The dataset is systematically organized into three primary parts, each playing a crucial role in the model development process:
images/: the training, validation, and test images.
labels/: the annotation files (.txt files) containing class labels and the coordinates of the bounding boxes, one file per image.
data.yaml: contains paths to the datasets, class names, and other configuration details required for YOLOv12 training.
With a properly prepared dataset, the YOLOv12 model can be efficiently trained to detect facial emotions.
Facial emotion file structure.
Human facial emotion images are stored in the /train/images folder.
Annotated .txt files accompany each image, with the same name as the image.
The downloaded dataset contains 1,227 images and 1,227 .txt files, distributed across the train, valid, and test splits.
Contents of the data.yaml file.
The data.yaml file lists the 7 emotion classes: Angry, Disgust, Fear, Happy, Neutral, Sad, and Surprise.
Uploaded the facial_emotion_detection folder to Google Drive.
The uploaded facial_emotion_detection folder contains the test, train, and val folders along with the data.yaml file.
Setting up Google Colab.
Rename the Google Colab notebook to match the project (e.g., facial_emotion_detection.ipynb).
Google Colab offers free access to GPUs like the NVIDIA T4, which is a powerful hardware accelerator specifically designed for deep learning tasks. Using a GPU instead of a CPU can dramatically improve the speed and efficiency of your machine learning workflows.
Use Runtime → Change runtime type → GPU to connect a GPU in Colab.
For deep learning projects, using a T4 GPU on Google Colab is a game-changer. It significantly reduces training time, makes it practical to train on larger datasets, and provides a seamless experience for machine learning experiments. While CPUs are suitable for small-scale tasks, they are inefficient for advanced AI projects.
Click on the Edit menu in Colab.
Click on Notebook settings.
Select T4 GPU and save.
Check whether a GPU is available.
GPU is available: True
GPU name: Tesla T4
Output indicates that GPU is available.
CODES to check GPU availability:
import torch
print("GPU is available:", torch.cuda.is_available())
print("GPU name:", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "No GPU found")
Mount Google drive
Codes:
from google.colab import drive
drive.mount('/content/drive')
Clone the YOLOv12 GitHub repository in a Google Colab cell.
Created a yolov12 folder in Google Drive.
Codes:
#Mount Google drive to colab cell
from google.colab import drive
drive.mount('/content/drive')
%cd /content/drive/MyDrive/AI_demos
#Clone yolov12 Repo
!git clone https://github.com/sunsmarterjie/yolov12.git
#Go to newly created yolov12 folder
%cd yolov12
Copy the installation commands from the yolov12 GitHub repo.
CODES:
#Codes to install
wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.3/flash_attn-2.7.3+cu11torch2.2cxx11abiFALSE-cp311-cp311-linux_x86_64.whl
conda create -n yolov12 python=3.11
conda activate yolov12
!pip install -r requirements.txt
!pip install -e .
We only need to run lines 1, 4, and 5 above in Colab notebook cells. However, as we are working in the Google Colab environment with Google Drive, the commands in lines 2 and 3 are not needed; those two lines are only required when working on a local computer in a separate conda environment.
Installed the first command copied from the yolov12 repo.
The display shows that flash_attn was installed.
Installed the fourth command copied from the yolov12 repo.
Installed the fifth command copied from the yolov12 repo.
Installed Ultralytics.
CODES:
#Ultralytics installation
!pip install ultralytics
Installed Ultralytics
YOLOV12 Clone Command:
!git clone https://github.com/sunsmarterjie/yolov12.git
%cd yolov12
Copied the training code from the yolov12 repo.
Modified the training code as per our requirements.
We have modified the code as below:
OUR MODIFIED CODES:
from ultralytics import YOLO
model = YOLO('/content/drive/MyDrive/AI_demos/yolov12/yolov12/ultralytics/cfg/models/v12/yolov12.yaml')
# Train the YOLOv12 model on the facial emotion dataset
results = model.train(
data='/content/drive/My Drive/AI_demos/face_emotion_detection/data.yaml',
epochs=80, # Reduced to 80 epochs from the suggested 600 to save time
batch=16, # Reduce batch size (Try 16, 8, or even 4)
imgsz=640,
scale=0.5,
mosaic=1.0,
mixup=0.0,
copy_paste=0.1,
device="0",
)
The display shows the directory locations where the created last.pt and best.pt weight files are saved.
Training for 80 epochs took 0.797 hours, so the suggested 600 epochs would take more than 6 hours.
To reduce training time, we set epochs = 80.
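Optionally, before testing on individual images, the saved best.pt can be evaluated on the validation split to obtain mAP metrics. A minimal sketch, assuming the same weight and data.yaml paths used above:
from ultralytics import YOLO

# Sketch: evaluate the trained weights on the validation split defined in data.yaml
model = YOLO('/content/yolov12/runs/detect/train/weights/best.pt')
metrics = model.val(data='/content/drive/My Drive/AI_demos/face_emotion_detection/data.yaml')
print("mAP50:", metrics.box.map50)  # mean average precision at IoU threshold 0.5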
Test the trained YOLO model to detect facial emotions on an image.
CODES:
import os
from ultralytics import YOLO

# Load the trained YOLO model using the correct path
model_path = '/content/yolov12/runs/detect/train/weights/best.pt'
print("File exists:", os.path.exists(model_path))

# Load the model and run inference
if os.path.exists(model_path):
    model = YOLO(model_path)
    results = model('/content/drive/MyDrive/AI_demos/face_emotion_images/face2.png')
    results[0].show()
else:
    print("Model file not found!")
Similarly, we have tested emotion detection on the following images:
| Serial | Images Before Emotion Detection | Images After Emotion Detection | Results |
|---|---|---|---|
| 1 | ![]() | ![]() | Correct |
| 2 | ![]() | ![]() | Correct |
| 3 | ![]() | ![]() | Correct |
| 4 | ![]() | ![]() | Correct |
| 5 | ![]() | ![]() | Wrong |
| 6 | ![]() | ![]() | Correct |
| 7 | ![]() | ![]() | Correct |
| 8 | ![]() | ![]() | Wrong |
Note:
Using an online GPU, after training the YOLOv12 model on the custom dataset, it correctly detected human facial emotions in 75% of the test cases (6 of the 8 images above). This limitation is primarily due to insufficient training: while 600 epochs were recommended for optimal performance, we completed only 80 epochs to reduce training time. The 80 epochs took approximately 0.797 hours, so running the full 600 epochs would require an estimated 6 hours on the GPU.
Without a GPU, training on a normal CPU would take far longer: about 13 hours for 80 epochs, and therefore roughly 100 hours (about 4 days) for the full 600 epochs.
Please use the online NVIDIA T4 GPU in Google Colab instead of a local CPU to reduce the training time of YOLOv12 models on custom datasets.
The YOLOv12 training on a custom dataset of facial emotions was completed. The model detected most facial emotions correctly, but misclassified some cases due to insufficient training epochs. Further training with the recommended number of epochs or additional dataset augmentation may enhance facial emotion detection accuracy.