Prompt engineering was about wording one message. Context engineering is about managing the entire context window as a scarce budget — what goes in, in what order, and what gets evicted. For a backend engineer, it's working-set management applied to an LLM.
A normal cache keyed on the exact request string is almost useless for LLM calls, because every paraphrase is a miss. Semantic caching keys on meaning instead — embed the query, search for a near-identical past question, and return its answer with no model call. Here's the architecture, the threshold problem that makes or breaks it, and real pgvector code.
How resolving redundant API calls and leveraging caching transformed a sluggish billing generation process into a performant operation.