Computer Vision Project

Group (4)
March 2025

Project Overview

Developed and trained an end-to-end neural network for classifying 20 object classes in colour images

This project uses the PASCAL VOC 2009 dataset, which consists of colour images covering 20 object classes grouped into categories (e.g. animal: bird, cat, ...; vehicle: aeroplane, bicycle, ...). We developed and trained an end-to-end neural network to classify these objects, built on a fine-tuned version of MobileNetV2.
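Because a single image can contain several of the 20 classes at once, each image's labels can be represented as a 20-dimensional multi-hot target vector, which is what a per-class sigmoid output is trained against. The sketch below assumes the standard VOC class names; the helper `to_multi_hot` is hypothetical and not taken from the project code.

```python
# The 20 PASCAL VOC object classes, in the usual alphabetical order.
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow",
    "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]
CLASS_INDEX = {name: i for i, name in enumerate(VOC_CLASSES)}


def to_multi_hot(labels):
    """Encode the set of class names present in one image as a 20-dim 0/1 vector."""
    vec = [0.0] * len(VOC_CLASSES)
    for name in labels:
        vec[CLASS_INDEX[name]] = 1.0
    return vec


# An image containing a person and a dog gets 1.0 at both class positions.
target = to_multi_hot(["person", "dog"])
```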

Base Model

  • MobileNetV2, pre-trained on ImageNet, serves as our feature extractor. Its lightweight architecture, built from depthwise-separable inverted residual blocks, provides strong pre-trained features at a low computational cost.

Custom Classification Head

  • Global Average Pooling layer to reduce spatial dimensions
  • Dense layer (512 units) with ReLU activation
  • Dropout layer (0.3) for regularization
  • Dense layer (256 units) with ReLU activation
  • Final Dense layer (20 units, one per class) with sigmoid activation for multi-label classification
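The base model and classification head described above can be sketched in Keras as follows. This is a minimal sketch, not the project's code: the 224x224 input size, the `Rescaling` layer used to map pixels to the [-1, 1] range MobileNetV2 expects, and the `build_model` helper are assumptions; the layer sizes, dropout rate and sigmoid output follow the description.

```python
import tensorflow as tf

NUM_CLASSES = 20


def build_model(input_shape=(224, 224, 3), weights="imagenet"):
    """Hypothetical builder: frozen MobileNetV2 backbone plus the custom head."""
    # Backbone without its ImageNet classifier; frozen for the first training phase.
    base = tf.keras.applications.MobileNetV2(
        input_shape=input_shape, include_top=False, weights=weights
    )
    base.trainable = False

    inputs = tf.keras.Input(shape=input_shape)
    # Assumed preprocessing: scale raw [0, 255] pixels to [-1, 1].
    x = tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1.0)(inputs)
    # training=False keeps the backbone's BatchNorm layers in inference mode.
    x = base(x, training=False)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)  # (H, W, C) -> (C,)
    x = tf.keras.layers.Dense(512, activation="relu")(x)
    x = tf.keras.layers.Dropout(0.3)(x)
    x = tf.keras.layers.Dense(256, activation="relu")(x)
    # One independent sigmoid per class, so several classes can be "on" at once.
    outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="sigmoid")(x)
    return tf.keras.Model(inputs, outputs), base
```

Using sigmoid rather than softmax means each class probability is predicted independently, which suits VOC images that often contain more than one object class.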

Training Strategy

  • Two-phase training approach:
    • Initial training with frozen base model
    • Fine-tuning phase with the last 5 blocks of MobileNetV2 unfrozen
  • Learning rate halved for fine-tuning (50% of the initial rate)
  • Smaller batch size during fine-tuning for more stable updates
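The two-phase strategy above can be sketched as follows, assuming the backbone is Keras's `MobileNetV2`, whose inverted residual blocks are named `block_1_...` to `block_16_...`. Reading "last 5 blocks" as blocks 12 to 16 (plus the final 1x1 `Conv_1` layer that follows them) is an assumption, as are the phase-1 learning rate, both batch sizes, the Adam optimizer, and the helper names `unfreeze_last_blocks` and `recompile_for_fine_tuning`.

```python
import tensorflow as tf

# Hypothetical phase-1 values; the project does not state its actual numbers.
PHASE1_LR, PHASE1_BATCH = 1e-3, 64
# Phase 2: learning rate halved as described; the smaller batch size is assumed.
PHASE2_LR, PHASE2_BATCH = PHASE1_LR * 0.5, 32


def unfreeze_last_blocks(base, n_blocks=5, last_block=16):
    """Make the last `n_blocks` inverted residual blocks of a Keras MobileNetV2 trainable.

    Every layer from the first layer of block (last_block - n_blocks + 1) onwards,
    including the final "Conv_1" layer, is unfrozen; all earlier layers stay frozen.
    """
    first = last_block - n_blocks + 1  # blocks 12..16 when n_blocks=5
    start = next(
        i for i, layer in enumerate(base.layers)
        if layer.name.startswith(f"block_{first}_")
    )
    base.trainable = True  # re-enable the whole backbone, then re-freeze the early part
    for layer in base.layers[:start]:
        layer.trainable = False


def recompile_for_fine_tuning(model):
    # Keras only picks up trainability changes after the model is recompiled.
    # Plain binary cross-entropy is used here for brevity.
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=PHASE2_LR),
        loss="binary_crossentropy",
    )
```

The halved learning rate limits how far the newly unfrozen pre-trained weights move away from their ImageNet solution in the early fine-tuning steps.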

Advanced Features

  • A class-weighted loss function was implemented to counter the severe class imbalance between the 20 classes.
  • A data augmentation pipeline was used to improve model robustness and reduce overfitting on the limited training set.
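The project does not say how its class weights were derived. A common choice, sketched here in NumPy as an assumption, is to weight each class's positive term by the ratio of negative to positive examples, so that rare classes contribute as much to the loss as frequent ones. The helper names `positive_class_weights` and `weighted_bce` are hypothetical.

```python
import numpy as np


def positive_class_weights(y_true):
    """Per-class weight = #negatives / #positives, from an (N, C) multi-hot label matrix."""
    pos = y_true.sum(axis=0)
    neg = y_true.shape[0] - pos
    return neg / np.maximum(pos, 1.0)  # avoid dividing by zero for classes with no positives


def weighted_bce(y_true, y_prob, pos_weight, eps=1e-7):
    """Mean binary cross-entropy with the positive term of each class scaled by pos_weight."""
    p = np.clip(y_prob, eps, 1.0 - eps)  # keep log() finite
    loss = -(pos_weight * y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))
    return loss.mean()
```

With all weights set to 1 this reduces to ordinary binary cross-entropy; in Keras the same expression would be written as a custom loss function on tensors.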

Results

Our final system achieved a mean per-image F1 score of 0.83, indicating good overall multi-label performance. The images below show two correct predictions from our model.

[Figure: examples of correct predictions]
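The averaging behind the reported F1 score is not spelled out. One reading consistent with "per-image" is to compute F1 between the predicted and true label sets of each image and then average over images. The NumPy sketch below assumes that definition and a 0.5 threshold on the sigmoid outputs; `mean_per_image_f1` is a hypothetical helper.

```python
import numpy as np


def mean_per_image_f1(y_true, y_prob, threshold=0.5):
    """Average over images of the F1 score between predicted and true label sets."""
    y_pred = (y_prob >= threshold).astype(float)
    tp = (y_pred * y_true).sum(axis=1)          # correctly predicted labels per image
    fp = (y_pred * (1 - y_true)).sum(axis=1)    # spurious labels per image
    fn = ((1 - y_pred) * y_true).sum(axis=1)    # missed labels per image
    denom = 2 * tp + fp + fn
    # Convention choice: an image with no true and no predicted labels scores 1.0.
    f1 = np.where(denom > 0, 2 * tp / np.maximum(denom, 1), 1.0)
    return f1.mean()
```

For example, an image labelled {cat, dog} for which the model predicts only {cat} has one true positive and one false negative, giving F1 = 2/3 for that image.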

The following examples illustrate some of the model's incorrect predictions, highlighting the kinds of challenges and limitations that reduced overall performance. In some of these cases the ground truth itself is open to criticism: is it fair to penalise the model when objects are so heavily occluded?

[Figure: examples of incorrect predictions]