(Unofficial) Building Hugging Face SmolLM, a blazingly fast and small language model, with a PyTorch implementation of grouped query attention (GQA)
transformer attention smol huggingface ml-efficiency llm grouped-query-attention smol-lm huggingface-smol-lm
Updated Jan 11, 2025 - Python