Given an image, generate a suitable caption describing the image using CNN and LSTM RNN cells. The project was a part of an academic course in team of 3, under CS 535 Deep Learning.
January 2020 – March 2020
Python, Numpy, Pandas, PyTorch, SkLearn, MatplotLib
🗯️ The Problem
Given an image, the model should generate a suitable caption describing the content and actions in the image. There has been several implementations in parallel to this problem; this project aims to solve by using CNN plus RNN model.
🎯 Architecture
The architecture of the project is inspired from the implementation described in this paper. The paper illustrates usage of blend of a pre-trained CNN and several LSTM RNN cells. Along with it, the project also implemented captioning using attention mechanism.
Following steps were performed over the completion of project:
Image Captioning Model
Training and Validation loss v/s TimeSteps
Perplexity v/s Timesteps
Encoder-Decoder architecture
Made with 🖤 by Gulshan