
Informative image captioning means describing an image fed to the model. This repo contains an end-to-end model for automatic image captioning using a CNN and an RNN.


arjavdongaonkar/Informative-Image-Captioning


Informative-Image-Captioning

Image captioning is the process of automatically generating a textual description of an image. It combines a CNN, which extracts features from the image, with natural language processing, which generates the caption.

All of the code is in the Jupyter notebook, which should make it easier to follow.

Dependencies

1. Keras 2.3.1
2. Tensorflow-gpu 2.2.0
3. tqdm
4. numpy
5. pandas
6. matplotlib
7. pickle
8. PIL
9. glob

Important: This code is implemented using Tensorflow-gpu, so you must have an Nvidia GPU and the corresponding drivers.
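Before running the notebook, it can help to confirm that TensorFlow actually sees a GPU. The snippet below is a minimal sketch, not part of the notebook: the helper name `available_gpus` is made up for illustration, and it returns an empty list when TensorFlow is not installed.

```python
def available_gpus():
    """Return the GPUs TensorFlow can see, or [] if TensorFlow is not installed."""
    try:
        import tensorflow as tf  # imported lazily so the check degrades gracefully
    except ImportError:
        return []
    # list_physical_devices is available in TensorFlow 2.1 and later
    return list(tf.config.list_physical_devices("GPU"))


gpus = available_gpus()
print(f"TensorFlow sees {len(gpus)} GPU(s)")
```

If this prints 0 on a machine with an Nvidia card, the driver or CUDA setup is the first thing to check.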

Dataset

I used the Flickr8k dataset (about 1 GB). MS-COCO and Flickr30k are other datasets that you can use.

Flickr8k is split as follows:

Training images: 6000
Validation images: 1000
Testing images: 1000

Each image has 5 captions describing it.
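Because every image comes with several captions, a first step is usually to group the captions by image. The sketch below assumes the caption file uses lines of the form `<image name>#<caption index><TAB><caption>`, which is how the Flickr8k token file is commonly distributed. The function name `load_captions`, the file names, and the placeholder caption are made up for illustration and are not taken from the notebook.

```python
from collections import defaultdict


def load_captions(lines):
    """Group captions by image name.

    Assumes each non-empty line looks like '<image>.jpg#<index>\\t<caption>'.
    """
    captions = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        image_id, caption = line.split("\t", 1)
        image_name = image_id.split("#")[0]  # drop the '#<index>' suffix
        captions[image_name].append(caption.strip())
    return dict(captions)


# Hypothetical sample lines; real file names in the dataset look different.
sample_lines = [
    "example_1.jpg#0\tA dog wading in the water with a ball in his mouth .",
    "example_1.jpg#1\tplaceholder second caption for the same image",
    "example_2.jpg#0\tA large white bird flies over water .",
]
captions_by_image = load_captions(sample_lines)
print(len(captions_by_image["example_1.jpg"]))  # number of captions for the first image
```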

Model

In image captioning, a CNN extracts features from an image, and those features are then fed into an RNN together with the captions. To extract the features, we use a model pretrained on ImageNet. I tried out VGG-16, ResNet-50, and InceptionV3. VGG-16 has almost 134 million parameters and its top-5 error on ImageNet is 7.3%. InceptionV3 has 21 million parameters and its top-5 error on ImageNet is 3.46%. For comparison, human top-5 error on ImageNet is 5.1%.
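One common way to feed image features and captions into an RNN together is to expand each caption into several training examples: the image feature plus the words seen so far, with the next word as the target. The sketch below illustrates that idea under stated assumptions. It is not confirmed to be how the notebook builds its training data, and `make_training_pairs`, the feature vector, and the token ids are all hypothetical.

```python
def make_training_pairs(feature, token_ids):
    """Expand one encoded caption into ((feature, prefix), next_token) pairs."""
    pairs = []
    for i in range(1, len(token_ids)):
        prefix = token_ids[:i]   # words the RNN has seen so far
        target = token_ids[i]    # word it should predict next
        pairs.append(((feature, prefix), target))
    return pairs


feature = [0.1, 0.7, 0.2]   # stand-in for a CNN feature vector
token_ids = [1, 5, 9, 2]    # hypothetical ids: start token, two words, end token
pairs = make_training_pairs(feature, token_ids)
for (_, prefix), target in pairs:
    print(prefix, "->", target)
```

A caption of n tokens yields n - 1 examples, so the number of training samples is much larger than the number of images.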

To build the model, the captions have to be passed through an embedding. I set the embedding size to 300. The image below shows the model that I used.

[image: model architecture]
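An embedding layer takes integer word ids as input, so captions are typically converted to fixed-length id sequences first. The following is a minimal pure-Python sketch of that preprocessing, using two captions from this README. The helper names, the maximum length of 14, and the choice to pad at the end are assumptions for illustration; the notebook may tokenize and pad differently.

```python
EMBEDDING_SIZE = 300  # embedding size stated above


def build_vocab(captions):
    """Map each lowercased word to an integer id; id 0 is reserved for padding."""
    words = sorted({w for c in captions for w in c.lower().split()})
    return {w: i for i, w in enumerate(words, start=1)}


def encode(caption, vocab, max_len):
    """Turn a caption into a list of ids, truncated or zero-padded to max_len."""
    ids = [vocab[w] for w in caption.lower().split() if w in vocab]
    ids = ids[:max_len]
    return ids + [0] * (max_len - len(ids))


captions = [
    "A large white bird flies over water .",
    "A dog wading in the water with a ball in his mouth .",
]
vocab = build_vocab(captions)
encoded = [encode(c, vocab, max_len=14) for c in captions]

# An embedding table has one row of EMBEDDING_SIZE weights per id (including padding).
embedding_weights = (len(vocab) + 1) * EMBEDDING_SIZE
print(len(vocab), embedding_weights)
```

With a real vocabulary of several thousand words, the embedding table alone accounts for a few million trainable weights.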

After training the model for 50 epochs with a batch size of 512, the accuracy reached 75% and the loss dropped to 0.911.

Results

Finally, here are some results that I got. The code that produces them is in the Jupyter notebook, and you can generate your own captions by adding some code at the end.
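Generating a caption for a new image usually means decoding one word at a time until an end token appears. The sketch below shows greedy decoding with a stand-in predictor in place of the trained model. `greedy_caption`, `toy_predict_next`, and the `<start>` and `<end>` token names are hypothetical; in the notebook the prediction step would call the trained network instead.

```python
def greedy_caption(predict_next, feature, start="<start>", end="<end>", max_len=20):
    """Build a caption by repeatedly appending the most likely next word."""
    words = [start]
    for _ in range(max_len):
        next_word = predict_next(feature, words)
        if next_word == end:
            break
        words.append(next_word)
    return " ".join(words[1:])  # drop the start token


def toy_predict_next(feature, words):
    """Stand-in for the trained model: replays a fixed word sequence."""
    script = ["a", "dog", "runs", "<end>"]
    return script[len(words) - 1]


print(greedy_caption(toy_predict_next, feature=None))
```

Greedy decoding is the simplest option; beam search keeps several candidate captions at each step and often reads better, at a higher compute cost.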

1. True caption: Three child soccer players go for the ball .

[image]

2. True caption: A dog wading in the water with a ball in his mouth .

[image]

3. True caption: A large white bird flies over water .

[image]

4. True caption: small dog running in the grass with a toy in its mouth .

[image]

5. True caption: The girls is jumping into the air on the beach .

[image]
