A simple Gradio application integrated with Hugging Face multimodal models, supporting a visual question answering chatbot and other features
Updated Aug 16, 2024 - Python
PaliGemma is a cutting-edge open vision-language model (VLM) developed by Google. It is designed to understand and generate detailed insights from both images and text, making it a powerful tool for tasks such as image captioning, visual question answering, object detection, and object segmentation.
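The tasks listed above are typically selected through a short text prefix in the prompt. The following is a minimal sketch, not the repository's code: `build_prompt` and its task names are hypothetical, and the prefix strings (`caption`, `answer`, `detect`, `segment`) are assumed from the conventions commonly documented for PaliGemma's "mix" checkpoints, so they should be checked against the model card for the checkpoint in use.

```python
# Hypothetical helper: maps a task name to a PaliGemma-style text prompt.
# The prefix strings are an assumption based on the "mix" checkpoint
# conventions; verify them against the model card before relying on them.

def build_prompt(task: str, text: str = "", lang: str = "en") -> str:
    """Return the prompt string for one of the four tasks named above.

    task: "caption", "vqa", "detect", or "segment"
    text: the question (for "vqa") or the object name (for "detect"/"segment")
    lang: language code used by the captioning and question-answering prefixes
    """
    if task == "caption":
        return f"caption {lang}"
    if task == "vqa":
        if not text:
            raise ValueError("vqa needs a question")
        return f"answer {lang} {text}"
    if task in ("detect", "segment"):
        if not text:
            raise ValueError(f"{task} needs an object name")
        return f"{task} {text}"
    raise ValueError(f"unknown task: {task!r}")


if __name__ == "__main__":
    print(build_prompt("caption"))
    print(build_prompt("vqa", "What color is the car?"))
    print(build_prompt("detect", "cat"))
```

The resulting string would be passed to the model together with the image; how that is done depends on the inference library and is left out here.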
Conversational Image Recognition Chatbot