Kishore K Sharma

/tag

#pgvector

Views are my own and do not represent any current or past employer. All work shown was completed under appropriate confidentiality and IP terms.

© 2026 Kishore · Built with restraint.

3 pieces

Jun 4, 2026 · 10 min read
Semantic Caching for LLMs: Cache on Meaning, Not on Strings
A normal cache keyed on the exact request string is almost useless for LLM calls, because every paraphrase is a miss. Semantic caching keys on meaning instead — embed the query, search for a near-identical past question, and return its answer with no model call. Here's the architecture, the threshold problem that makes or breaks it, and real pgvector code.
- #llm
- #caching
- #pgvector
- #redis
- #embeddings
- #cost-optimization
- #typescript
- #backend
Jun 2, 2026 · 11 min read
GraphRAG: When Vector Search Quietly Gives Up
Plain vector RAG can't answer multi-hop or 'across everything' questions, because the answer is spread across several chunks and no single chunk contains it. GraphRAG extracts a knowledge graph instead. Here's how it works, the honest cost, and how to start in Postgres without a graph database.
- #graphrag
- #rag
- #knowledge-graph
- #pgvector
- #postgres
- #ai
- #retrieval
- #backend
May 15, 2026 · 12 min read
RAG From a Backend Engineer's POV: It's a Data Pipeline, Not a Magic Trick
Retrieval-augmented generation has been wrapped in enough mystique to obscure that it's mostly an ETL problem. What the pipeline actually looks like, where the real engineering happens, and the failure modes that have nothing to do with the model.
- #rag
- #ai
- #backend
- #vector-db
- #pgvector
- #etl
- #architecture