
Automatic Lip Reading using deep learning techniques

""Fifteen speakers (five men and ten women) positioned in the frustum of a MS Kinect sensor and utter ten times a set of ten words and ten phrases (see the table below). Each instance of the dataset consists of a synchronized sequence of color and depth images (both of 640x480 pixels). The MIRACL-VC1 dataset contains a total number of 3000 instances."" image image

We have limited the scope of the project to only predicting the words.
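For reference, gathering only the word utterances from the dataset might look like the sketch below. The folder names (a `MIRACL-VC1` root, a `words` subfolder per speaker, numbered word and instance directories) are assumptions based on the dataset description above, so adjust them to match the actual download.

```python
from pathlib import Path

# Sketch: collect only the word utterances (the scope of this project).
# Assumed layout: <speaker>/words/<word_id>/<instance_id>/color_*.jpg
DATASET_ROOT = Path("MIRACL-VC1")  # assumed dataset location

def list_word_instances(root: Path):
    """Yield (speaker, word_id, instance_dir) for every word utterance."""
    for speaker_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        words_dir = speaker_dir / "words"
        if not words_dir.is_dir():
            continue  # skip anything that is not a speaker folder
        for word_dir in sorted(words_dir.iterdir()):
            for instance_dir in sorted(word_dir.iterdir()):
                yield speaker_dir.name, word_dir.name, instance_dir

if __name__ == "__main__":
    instances = list(list_word_instances(DATASET_ROOT))
    # 15 speakers x 10 words x 10 repetitions = 1500 word instances expected
    print(f"Found {len(instances)} word instances")
```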

Modules

The main code is in the notebooks ./data_genertor.ipynb and ./architectures/3d_cnn.ipynb

data_generator.ipynb : crops the lips from the face images and stores the crops in the same folder structure as the original dataset.

Extracted features:
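A minimal sketch of one way this cropping step could be implemented, using dlib's 68-point facial landmarks (the mouth is covered by points 48-67) and OpenCV. The detector, the shape_predictor_68_face_landmarks.dat model file, the margin, and the 64x64 output size are assumptions rather than the exact settings used in the notebook.

```python
import cv2
import dlib
import numpy as np

# Assumed tools: dlib frontal face detector + 68-point landmark predictor.
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def crop_lips(image_path, margin=10, out_size=(64, 64)):
    """Return a resized crop of the mouth region, or None if no face is found."""
    img = cv2.imread(str(image_path))
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = detector(gray)
    if not faces:
        return None
    landmarks = predictor(gray, faces[0])
    # Landmarks 48-67 outline the mouth; take their bounding box plus a margin.
    pts = np.array([(landmarks.part(i).x, landmarks.part(i).y)
                    for i in range(48, 68)], dtype=np.int32)
    x, y, w, h = cv2.boundingRect(pts)
    crop = img[max(y - margin, 0):y + h + margin,
               max(x - margin, 0):x + w + margin]
    return cv2.resize(crop, out_size)
```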

training_model.ipynb : builds and trains the 3D CNN model used for word classification.

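A hedged sketch of what a simple 3D CNN word classifier of this kind could look like in Keras; the input shape (frames, height, width, channels), filter counts, and layer sizes are assumptions and may differ from the architecture defined in training_model.ipynb.

```python
from tensorflow.keras import layers, models

def build_3d_cnn(input_shape=(22, 64, 64, 3), num_classes=10):
    """Toy 3D CNN over a stack of lip-crop frames, predicting one of ten words."""
    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv3D(32, kernel_size=(3, 3, 3), activation="relu"),
        layers.MaxPooling3D(pool_size=(1, 2, 2)),
        layers.Conv3D(64, kernel_size=(3, 3, 3), activation="relu"),
        layers.MaxPooling3D(pool_size=(2, 2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])
```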

Results:


At epoch 45, the final epoch, the model's validation accuracy was 0.5850. This is expected, since the model is a simple 3D CNN with no temporal memory mechanism such as an RNN.
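For context, a training run like the 45-epoch one reported above could be set up roughly as follows; the optimizer, batch size, and validation split are assumptions, and the random arrays are placeholders for the real preprocessed lip-sequence tensors.

```python
import numpy as np
from tensorflow.keras.optimizers import Adam

model = build_3d_cnn()  # from the sketch above
model.compile(optimizer=Adam(1e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Placeholder data standing in for the preprocessed lip-crop sequences.
X = np.random.rand(40, 22, 64, 64, 3).astype("float32")
y = np.random.randint(0, 10, size=40)

history = model.fit(X, y, validation_split=0.2, epochs=45, batch_size=8)
print("final validation accuracy:", history.history["val_accuracy"][-1])
```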
