Project overview 📃

Given an image, generate a suitable caption describing the image using CNN and LSTM RNN cells. The project was a part of an academic course in team of 3, under CS 535 Deep Learning.

January 2020 – March 2020

Tech 👨🏻‍💻 ****

Python, Numpy, Pandas, PyTorch, SkLearn, MatplotLib

📕Source code:

Github

🗯️ The Problem

Given an image, the model should generate a suitable caption describing the content and actions in the image. There has been several implementations in parallel to this problem; this project aims to solve by using CNN plus RNN model.

🎯 Architecture

The architecture of the project is inspired from the implementation described in this paper. The paper illustrates usage of blend of a pre-trained CNN and several LSTM RNN cells. Along with it, the project also implemented captioning using attention mechanism.

Process:

Following steps were performed over the completion of project:

Some results during the training of the model:

                                                 Image Captioning Model

                                             Image Captioning Model

                            Training and Validation loss v/s TimeSteps 

                                                      Perplexity v/s Timesteps

                                           Encoder-Decoder architecture

                                                                                                                                               Made with 🖤 by Gulshan
Powered by Fruition