Computer Vision Project

Group (4)

March 2025

Project Overview

Developed and trained an end-to-end neural network for classifying 20 object classes in colour images

This project uses the PASCAL VOC 2009 dataset, which consists of colour images spanning 20 object classes (e.g. animals: bird, cat, ...; vehicles: aeroplane, bicycle, ...). We developed and trained an end-to-end neural network to classify these objects using a fine-tuned version of MobileNetV2.

Base Model

MobileNetV2, pre-trained on ImageNet, serves as our feature extractor. Its lightweight architecture keeps training and inference inexpensive while still supplying strong general-purpose image features for our classification task.

Custom Classification Head

Global Average Pooling layer to reduce spatial dimensions
Dense layer (512 units) with ReLU activation
Dropout layer (0.3) for regularization
Dense layer (256 units) with ReLU activation
Final Dense layer with sigmoid activation for multi-label classification
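
The sigmoid output layer scores each class independently, so one image can receive several labels at once (a softmax would force the classes to compete). The following is a minimal sketch of how such outputs could be decoded, not the project's actual code: the four class names are a hypothetical subset of the 20 classes, and the 0.5 threshold is an assumption.

```python
import math

# Hypothetical subset of the 20 VOC class names, for illustration only.
CLASS_NAMES = ["aeroplane", "bicycle", "bird", "cat"]


def sigmoid(z):
    """Map a raw score (logit) to an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def decode_multilabel(logits, class_names, threshold=0.5):
    """Return every class whose sigmoid probability meets the threshold.

    The threshold of 0.5 is an assumed default, not a value from the project.
    """
    probs = [sigmoid(z) for z in logits]
    return [name for name, p in zip(class_names, probs) if p >= threshold]


# Two objects can be present at once: both 'bicycle' and 'cat' pass the threshold.
labels = decode_multilabel([-2.0, 1.5, -0.3, 3.0], CLASS_NAMES)
print(labels)  # ['bicycle', 'cat']
```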

Training Strategy

Two-phase training approach:

Initial training with frozen base model
Fine-tuning phase with last 5 blocks of MobileNetV2 unfrozen

Learning rate reduction during fine-tuning (50% of initial rate)
Batch size reduction during fine-tuning for better stability
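
The two phases above can be sketched as a pair of configurations plus a rule for choosing which layers to unfreeze. This is an illustrative sketch under stated assumptions: the initial learning rate (1e-3), initial batch size (32), and the halving of the batch size are placeholder values not given in this document, and the layer names assume the Keras MobileNetV2 naming scheme (block_1_... to block_16_...). Only the 50% learning rate and the five unfrozen blocks come from the description above.

```python
import re
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PhaseConfig:
    learning_rate: float
    batch_size: int
    unfrozen_blocks: int  # number of trailing MobileNetV2 blocks left trainable


def fine_tune_config(initial, unfrozen_blocks=5, batch_divisor=2):
    """Derive the fine-tuning phase: 50% learning rate, smaller batches.

    batch_divisor=2 is an assumption; the document only says the batch
    size was reduced.
    """
    return replace(
        initial,
        learning_rate=initial.learning_rate * 0.5,
        batch_size=max(1, initial.batch_size // batch_divisor),
        unfrozen_blocks=unfrozen_blocks,
    )


BLOCK_RE = re.compile(r"^block_(\d+)_")


def trainable_layers(layer_names, unfrozen_blocks, total_blocks=16):
    """Select layers belonging to the last `unfrozen_blocks` numbered blocks.

    Layers that do not match the block_<n>_ pattern stay frozen in this sketch.
    """
    first_unfrozen = total_blocks - unfrozen_blocks + 1
    selected = []
    for name in layer_names:
        match = BLOCK_RE.match(name)
        if match and int(match.group(1)) >= first_unfrozen:
            selected.append(name)
    return selected


# Placeholder phase-1 values (assumed, not from the project).
phase1 = PhaseConfig(learning_rate=1e-3, batch_size=32, unfrozen_blocks=0)
phase2 = fine_tune_config(phase1)

names = ["expanded_conv_project", "block_11_add", "block_12_expand",
         "block_16_project", "Conv_1"]
unfrozen = trainable_layers(names, phase2.unfrozen_blocks)
print(phase2)
print(unfrozen)  # ['block_12_expand', 'block_16_project']
```

Keeping the base model frozen first lets the randomly initialised head settle before any pre-trained weights are updated; the lower learning rate in the second phase then limits how far those weights drift.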

Advanced Features

A class-weighted loss function was implemented to handle the severe class imbalance.
A data augmentation pipeline was used to improve model robustness and reduce overfitting on the limited training dataset.
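
To illustrate the class-weighted loss, here is a minimal sketch of a weighted binary cross-entropy. The exact weighting scheme used in the project is not stated, so the negatives-to-positives ratio below is an assumption, as are the function names and the toy labels.

```python
import math


def positive_class_weights(label_rows):
    """Weight each class by (#negatives / #positives) over the training labels.

    Rare classes get weights above 1, common classes below 1.
    This ratio is an assumed scheme, not necessarily the project's.
    """
    n = len(label_rows)
    num_classes = len(label_rows[0])
    weights = []
    for c in range(num_classes):
        pos = sum(row[c] for row in label_rows)
        weights.append((n - pos) / pos if pos else 1.0)
    return weights


def weighted_bce(y_true, y_prob, pos_weights, eps=1e-7):
    """Binary cross-entropy averaged over classes, with positives up-weighted."""
    total = 0.0
    for t, p, w in zip(y_true, y_prob, pos_weights):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(w * t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)


# Toy labels: class 0 appears in 3 of 4 images, class 1 in only 1 of 4.
toy_labels = [[1, 0], [1, 0], [1, 0], [0, 1]]
weights = positive_class_weights(toy_labels)

# Missing the rare class costs more under the weighted loss than the plain one.
weighted = weighted_bce([0, 1], [0.1, 0.1], weights)
unweighted = weighted_bce([0, 1], [0.1, 0.1], [1.0, 1.0])
print(weights, weighted > unweighted)
```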

Results

Our final system achieved a mean F1 score of 0.83, reflecting good overall per-image multi-label performance. The images below show two good predictions from our model.
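
For reference, a per-image mean F1 can be computed as sketched below. This assumes the score is calculated for each image over its label set and then averaged across images; the averaging scheme actually used is not spelled out here, and the toy labels are illustrative, not project data.

```python
def f1_per_image(y_true, y_pred):
    """F1 between one image's true and predicted binary label vectors."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    # Choice made for this sketch: score 0.0 when there are no true
    # and no predicted labels at all.
    return 2 * tp / denom if denom else 0.0


def mean_f1(true_rows, pred_rows):
    """Average the per-image F1 over all images."""
    scores = [f1_per_image(t, p) for t, p in zip(true_rows, pred_rows)]
    return sum(scores) / len(scores)


# Toy example: image 1 has one false positive (F1 = 0.8), image 2 is exact (F1 = 1.0).
truth = [[1, 0, 1, 0], [0, 1, 0, 0]]
preds = [[1, 1, 1, 0], [0, 1, 0, 0]]
score = mean_f1(truth, preds)
print(round(score, 2))  # 0.9
```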

The following examples illustrate some of the model's incorrect predictions, highlighting the types of challenges and limitations that impacted overall performance. In some of these cases the ground truth itself is open to criticism, raising questions such as "Is it fair to penalise the model when objects are so heavily obstructed?"