
Kishore K Sharma


Tag: #cost-optimization

Views are my own and do not represent any current or past employer. All work shown was completed under appropriate confidentiality and IP terms.

© 2026 Kishore

1 piece

Jun 4, 2026 · 10 min read
Semantic Caching for LLMs: Cache on Meaning, Not on Strings
A normal cache keyed on the exact request string is almost useless for LLM calls, because every paraphrase is a miss. Semantic caching keys on meaning instead — embed the query, search for a near-identical past question, and return its answer with no model call. Here's the architecture, the threshold problem that makes or breaks it, and real pgvector code.
- #llm
- #caching
- #pgvector
- #redis
- #embeddings
- #cost-optimization
- #typescript
- #backend
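The lookup the summary describes (embed the query, find the nearest past question, and return its answer if it is close enough) can be sketched in a few lines of TypeScript. This is a minimal illustration under stated assumptions, not the article's pgvector code. It keeps entries in memory and scans them linearly where a real deployment would use a vector index. It takes a caller-supplied embedding function, scores with cosine similarity, and uses a default threshold of 0.95 that is an arbitrary placeholder. `SemanticCache`, `answerWithCache`, and `Embedder` are hypothetical names.

```typescript
// Minimal in-memory semantic cache sketch (hypothetical names).
// Assumptions: embeddings come from an injected function, similarity is
// cosine, entries are scanned linearly (no vector index), and the default
// threshold of 0.95 is illustrative, not a recommended value.

type Embedder = (text: string) => number[];

interface CacheEntry {
  embedding: number[];
  answer: string;
}

// Cosine similarity between two equal-length vectors; 0 if either is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

class SemanticCache {
  private entries: CacheEntry[] = [];
  private readonly embed: Embedder;
  private readonly threshold: number;

  constructor(embed: Embedder, threshold = 0.95) {
    this.embed = embed;
    this.threshold = threshold;
  }

  // Return the answer of the most similar stored query, but only if its
  // similarity clears the threshold; otherwise report a miss (undefined).
  lookup(query: string): string | undefined {
    const q = this.embed(query);
    let best: CacheEntry | undefined;
    let bestScore = -Infinity;
    for (const entry of this.entries) {
      const score = cosineSimilarity(q, entry.embedding);
      if (score > bestScore) {
        bestScore = score;
        best = entry;
      }
    }
    return best !== undefined && bestScore >= this.threshold ? best.answer : undefined;
  }

  // Remember a query's embedding together with the model's answer.
  store(query: string, answer: string): void {
    this.entries.push({ embedding: this.embed(query), answer });
  }
}

// Check the cache first; only on a miss call the model and cache its answer.
async function answerWithCache(
  cache: SemanticCache,
  query: string,
  callModel: (q: string) => Promise<string>,
): Promise<string> {
  const hit = cache.lookup(query);
  if (hit !== undefined) return hit;
  const answer = await callModel(query);
  cache.store(query, answer);
  return answer;
}
```

The threshold is the knob that separates a hit from a miss: lowering it lets more paraphrases reuse a cached answer, but also raises the chance of returning an answer to a question that only looks similar.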