Image Captioning is the process of generating textual description of an image. In this project, I have implemented a Deep Learning Model inspired by this paper and this paper using COCO dataset by Microsoft and trained the network for nearly 10 hrs using GPU.
The architecture consists of:
- CNN based on the ResNet architecture encoder, which encodes the images into the embedded feature vectors