
#backend

13 pieces

Jun 6, 2026 · 13 min read
Streaming LLM Responses: The Backend Engineer's Guide to Getting Tokens Out Fast
Why streaming an 8-second answer makes it feel fast even though it isn't, how to pick between long-poll, SSE, and WebSockets, and how to build a NestJS SSE endpoint that proxies a streaming LLM call — including the proxy-buffering, cancellation, and mid-stream-error gotchas that bite everyone the first time.
- #streaming
- #sse
- #nestjs
- #nodejs
- #llm
- #backend
- #spring
- #performance
Jun 4, 2026 · 10 min read
Semantic Caching for LLMs: Cache on Meaning, Not on Strings
A normal cache keyed on the exact request string is almost useless for LLM calls, because every paraphrase is a miss. Semantic caching keys on meaning instead — embed the query, search for a near-identical past question, and return its answer with no model call. Here's the architecture, the threshold problem that makes or breaks it, and real pgvector code.
- #llm
- #caching
- #pgvector
- #redis
- #embeddings
- #cost-optimization
- #typescript
- #backend
Jun 2, 2026 · 11 min read
GraphRAG: When Vector Search Quietly Gives Up
Plain vector RAG can't answer multi-hop or 'across everything' questions: the answer is spread across many chunks, and no single chunk contains it whole. GraphRAG extracts a knowledge graph instead. Here's how it works, the honest cost, and how to start in Postgres without a graph database.
- #graphrag
- #rag
- #knowledge-graph
- #pgvector
- #postgres
- #ai
- #retrieval
- #backend
May 31, 2026 · 11 min read
Just Use Postgres: One Database Until It Actually Hurts
A modest app somehow grew Postgres, Redis, RabbitMQ, Elasticsearch and a vector DB — five things to back up, secure and pay for. Most of that is now one Postgres. Here's the queue, vector, search and pub/sub SQL, and the honest signals for when to graduate.
- #postgres
- #backend
- #infrastructure
- #sql
- #architecture
- #redis
- #vector-search
May 29, 2026 · 10 min read
The Transactional Outbox: Publishing Events Without a Distributed Transaction
Your OrderService saves to Postgres and publishes to Kafka — two systems, no shared transaction. There is no safe order to do them in. The outbox pattern makes the write atomic and lets the broker catch up. Here's how, with the relay tradeoffs and the guarantees you actually get.
- #backend
- #microservices
- #patterns
- #kafka
- #postgresql
- #spring-boot
- #messaging
- #production
May 26, 2026 · 7 min read
Breaking Into AI Engineering: What a Backend Engineer Actually Needs to Learn (and What to Skip)
Half the 'how to become an AI engineer' advice tells you to start with linear algebra. After years on backend, here's the honest, narrower path — what to learn, what to skip, and what the job actually looks like in 2026.
- #career
- #ai
- #backend
- #learning
- #engineering-leadership
