Fine-tune-Vision-Language-Model

This repository contains the implementation of the Vision-and-Language Transformer (ViLT) model fine-tuned for Visual Question Answering (VQA) tasks. The project is structured to be easy to set up and use, providing a streamlined approach for experimenting with different configurations and datasets.


Installation

1. Clone the repository:

git clone https://github.com/SJ9VRF/Fine-tune-Vision-Language-Model.git
cd Fine-tune-Vision-Language-Model

2. Install the dependencies:

pip install -r requirements.txt

3. Download the data: ensure that your data files are in the data/ directory, as specified in config/settings.py.

Usage

Training the Model

To train the model, run:

python train.py

This script will train the model using the configurations specified in config/settings.py.
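The training loop inside train.py might look like the following minimal sketch. All names and hyperparameters here are assumptions, and a tiny stand-in linear model replaces ViLT so the snippet runs without downloading weights or the VQA dataset; the real script would presumably load ViltForQuestionAnswering and the dataset configured in config/settings.py instead.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for the ViLT VQA model: any module that maps features to answer logits.
model = nn.Linear(16, 4)  # 16 dummy input features -> 4 dummy answer classes
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
loss_fn = nn.CrossEntropyLoss()

# Dummy data in place of the real (image, question, answer) examples.
features = torch.randn(32, 16)
labels = torch.randint(0, 4, (32,))
loader = DataLoader(TensorDataset(features, labels), batch_size=8, shuffle=True)

model.train()
for epoch in range(2):
    for batch_features, batch_labels in loader:
        optimizer.zero_grad()
        logits = model(batch_features)          # forward pass
        loss = loss_fn(logits, batch_labels)    # cross-entropy over answer classes
        loss.backward()                         # backpropagate
        optimizer.step()                        # update parameters
```

The actual script would also handle checkpoint saving and the device placement described under Configuration.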

Making Predictions

To perform inference with a pre-trained model, run:

python infer.py --image_path 'path/to/image.jpg' --question 'What is in the picture?'

This will load the trained model and output the top predictions for the specified image and question.
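As a sketch of the "top predictions" step, the helper below ranks answer candidates by their logit scores. It is an illustrative assumption about how infer.py might post-process model output (the function name and id2label mapping are hypothetical), not the repository's actual code.

```python
def top_k_answers(logits, id2label, k=5):
    """Return the k highest-scoring (answer, score) pairs.

    logits:   a flat sequence of scores, one per answer class
    id2label: maps each class index to its answer string
    """
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return [(id2label[i], logits[i]) for i in ranked[:k]]

# Example with toy scores for three candidate answers:
print(top_k_answers([0.1, 0.9, 0.5], {0: "cat", 1: "dog", 2: "bird"}, k=2))
# -> [('dog', 0.9), ('bird', 0.5)]
```

In the real pipeline the logits would come from the model's forward pass on the processed image and question, and id2label from the model's configuration.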

Configuration

Edit config/settings.py to modify paths, model parameters, and other settings like device configuration for GPU acceleration.
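A settings file of this kind might look like the sketch below. Every name and value here is an assumption for illustration, not the repository's actual configuration.

```python
# config/settings.py -- hypothetical sketch; real names and values may differ.
import torch

# Paths
DATA_DIR = "data/"
MODEL_SAVE_PATH = "checkpoints/vilt_vqa.pt"

# Model and training hyperparameters
PRETRAINED_MODEL = "dandelin/vilt-b32-mlm"  # assumed ViLT base checkpoint
BATCH_SIZE = 8
LEARNING_RATE = 5e-5
NUM_EPOCHS = 3

# Use GPU acceleration when available, otherwise fall back to CPU.
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
```

Keeping paths and hyperparameters in one module like this lets train.py and infer.py import a single source of truth.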
