What is GGUF (GPT-Generated Unified Format)?
TL;DR
The de facto standard file format for running LLMs locally. It is the native model format of llama.cpp.
GGUF (GPT-Generated Unified Format): Definition & Explanation
GGUF (GPT-Generated Unified Format) is the de facto standard model file format for running LLMs locally. It was developed within the llama.cpp project as the successor to the older GGML format. A GGUF file stores the model's metadata (architecture, quantization information, tokenizer settings, and so on) alongside the weights, so a single file contains everything needed to run the model. The format supports a range of quantization levels, including 4-bit, 5-bit, and 8-bit, and GGUF models can be run on the CPU alone, without a GPU. Local LLM tools such as LM Studio, Ollama, GPT4All, Jan, and KoboldCpp support GGUF out of the box, and many GGUF models are published on Hugging Face.
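To make the "metadata lives inside the file" idea concrete, here is a minimal sketch of reading the fixed-size header at the start of a GGUF file. It assumes the little-endian layout used by GGUF versions 2 and 3: a 4-byte magic `GGUF`, a 32-bit version, a 64-bit tensor count, and a 64-bit metadata key-value count. The names `GGUFHeader` and `read_gguf_header` are hypothetical, chosen for illustration; they are not part of llama.cpp or any library. The demo at the end builds a synthetic header in memory so the example runs without a model file.

```python
import io
import struct
from typing import BinaryIO, NamedTuple

GGUF_MAGIC = b"GGUF"
# Assumed layout (GGUF v2/v3, little-endian):
# 4-byte magic, uint32 version, uint64 tensor count, uint64 metadata KV count.
HEADER_FORMAT = "<4sIQQ"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 24 bytes


class GGUFHeader(NamedTuple):
    """Hypothetical container for the fixed-size header fields."""
    version: int
    tensor_count: int
    metadata_kv_count: int


def read_gguf_header(stream: BinaryIO) -> GGUFHeader:
    """Read and validate the fixed-size header from a binary stream."""
    raw = stream.read(HEADER_SIZE)
    if len(raw) < HEADER_SIZE:
        raise ValueError("stream is too short to contain a GGUF header")
    magic, version, tensor_count, kv_count = struct.unpack(HEADER_FORMAT, raw)
    if magic != GGUF_MAGIC:
        raise ValueError(f"not a GGUF file: unexpected magic {magic!r}")
    return GGUFHeader(version, tensor_count, kv_count)


# Synthetic header built in memory, so the sketch runs without a model file.
demo = io.BytesIO(struct.pack(HEADER_FORMAT, GGUF_MAGIC, 3, 2, 5))
print(read_gguf_header(demo))
```

For a real model, the same function could be called on a file opened with `open(path, "rb")`. The metadata key-value pairs and tensor descriptions follow the header and are not parsed by this sketch.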