What is GGUF?
TL;DR
The model file format used by llama.cpp, and the de facto standard for running LLMs locally.
GGUF: Definition & Explanation
GGUF (GPT-Generated Unified Format) is the file format used by the llama.cpp project for AI models. Developed as the successor to the older GGML format, it packages model weights, tokenizer information, and metadata into a single self-contained file, so a model can be distributed and loaded without any companion files.

Its key practical advantage is efficient CPU-based inference: quantized GGUF models let LLMs run on ordinary PCs without a GPU (GPU offload is also supported when one is available). GGUF is the standard container for distributing quantized models, with a range of quantization levels such as Q4_K_M and Q5_K_M that trade model size against output quality.

The surrounding ecosystem is mature. Ollama uses llama.cpp as its backend and runs GGUF-format models directly, and Hugging Face hosts a large number of GGUF models, including popular open-source families such as Meta LLaMA, Mistral, and Gemma. GGUF has become the de facto standard for local LLM execution.
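The "single file" layout starts with a small fixed header: the magic bytes `GGUF`, a format version, a tensor count, and a metadata key-value count, followed by the metadata and tensor data. A minimal sketch of parsing that header, based on the published GGUF specification in the ggml repository (the synthetic header bytes below are illustrative, not taken from a real model):

```python
import struct

GGUF_MAGIC = b"GGUF"

def parse_gguf_header(data: bytes) -> dict:
    """Parse the fixed-size GGUF prelude: magic, version (uint32),
    tensor count (uint64), metadata KV count (uint64), little-endian."""
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", data, 0)
    if magic != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    return {
        "version": version,
        "tensor_count": n_tensors,
        "metadata_kv_count": n_kv,
    }

# Build a synthetic 24-byte header (version 3, 2 tensors, 5 metadata
# entries) so the example runs without downloading a real model file.
header = struct.pack("<4sIQQ", b"GGUF", 3, 2, 5)
parsed = parse_gguf_header(header)
```

In a real file the same 24 bytes would come from the start of a `.gguf` download; tools like llama.cpp read this prelude first to decide how to interpret the metadata and tensors that follow.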