Marco Mistretta
- University of Florence - Media Integration and Communication Center, Florence, Italy
- Website: https://marcomistretta.github.io/
- ORCID: https://orcid.org/0009-0006-6630-6477
- Google Scholar: https://scholar.google.com/citations?hl=it&user=KMIb4eAAAAAJ
- LinkedIn: in/marco-mistretta-0b02a021a
- Twitter/X: @mistretta_marco
- marcomistre99
Stars
- Easy wrapper for inserting LoRA layers in CLIP (a minimal LoRA sketch follows this list).
- [ICLR 2025] Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion
- Mind the Gap: Understanding the Modality Gap in Multi-modal Contrastive Representation Learning
- [NeurIPS 2024 Best Paper] [GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ult…
- A smarter cd command. Supports all major shells.
- TensorZero creates a feedback loop for optimizing LLM applications, turning production data into smarter, faster, and cheaper models.
- [CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
- A framework to easily use 32 (and growing) different image matching methods
- PyTorch implementation of various Knowledge Distillation (KD) methods (a generic KD-loss sketch follows this list).
- Open source implementation of "Vision Transformers Need Registers"
- A beautiful portfolio Jekyll theme that works with GitHub Pages.
- [ECCV-W] Official repo for the paper "ComiCap: A VLMs pipeline for dense captioning of Comic Panels"
- 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
- Official PyTorch code for MANTRA - Memory Augmented Neural Trajectory Predictor (CVPR 2020)
- [ECCV 2024] ScanTalk: 3D Talking Heads from Unregistered Scans
- A light webserver for monitoring RAM and GPU usage on multiple servers.
- The official repo of the Comics Survey: "A missing piece in Vision and Language: A Survey on Comics Understanding"
- Pen and paper exercises in machine learning
- Code for training the Segment Anything Model (SAM) to predict frame polygons in comic books
- Code for the paper "Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters" (CVPR 2024)
- JAX reimplementation of the DeepMind paper "Genie: Generative Interactive Environments"
- Code for the paper "SuS-X: Training-Free Name-Only Transfer of Vision-Language Models" [ICCV'23]
- [ECCVW/TWYN 2024 - Best Workshop Paper] Are CLIP features all you need for Universal Synthetic Image Origin Attribution?
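Two of the starred projects above concern LoRA-style parameter-efficient fine-tuning: the CLIP LoRA wrapper and 🤗 PEFT. As a minimal sketch of the idea (not the wrapper's actual API), the snippet below injects LoRA adapters into a Hugging Face CLIPModel via the PEFT library; the rank, alpha, and dropout values are illustrative assumptions, and the "q_proj"/"v_proj" target names refer to transformers' CLIP attention projections.

```python
# Minimal sketch (assumed setup, not the starred wrapper's API):
# inject LoRA adapters into Hugging Face CLIP via the PEFT library.
from transformers import CLIPModel
from peft import LoraConfig, get_peft_model

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")

# LoRA on the attention query/value projections of both encoders;
# r, lora_alpha, and lora_dropout are illustrative defaults, not tuned values.
config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
)
lora_model = get_peft_model(model, config)
lora_model.print_trainable_parameters()  # only the low-rank A/B matrices train
```

Since the base weights stay frozen, fine-tuning touches only a fraction of a percent of CLIP's parameters, which is the whole appeal of dropping LoRA layers into CLIP.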
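The knowledge-distillation entry likewise names a well-defined technique. Below is a generic sketch of the classic soft-target distillation objective (Hinton et al., 2015), the kind of loss such repositories implement; the function name and the temperature/alpha defaults are illustrative, not taken from the starred code.

```python
# Generic sketch of the classic KD loss (Hinton et al., 2015); the
# function name and default hyperparameters are illustrative assumptions.
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Blend soft-target distillation with standard cross-entropy."""
    # KL between temperature-softened distributions; T*T rescales the
    # soft-loss gradients back to the same magnitude as the hard loss.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)  # usual hard-label term
    return alpha * soft + (1.0 - alpha) * hard
```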