What is Prompt Caching?
TL;DR
An LLM optimization technique that reuses cached computation for repeated prompt prefixes, reducing costs and response times.
Prompt Caching: Definition & Explanation
Prompt Caching is a technique that stores the intermediate computation results for repeated, identical prompt prefixes sent to LLMs, eliminating redundant processing to reduce cost and response time. Anthropic's prompt caching for Claude can cut the cost of cached input tokens by up to 90% by caching system prompts and large context blocks; OpenAI offers a similar capability. Prompt caching is particularly effective for use cases that repeatedly send a common system prompt or document context, such as RAG applications and chatbots.