/tag

#llm

4 pieces

Jun 8, 2026 · 9 min read
Context Engineering: Managing the Window Like a Cache, Not a Prompt
Prompt engineering was about wording one message. Context engineering is about managing the entire context window as a scarce budget — what goes in, in what order, and what gets evicted. For a backend engineer, it's working-set management applied to an LLM.
- #context-engineering
- #prompt-engineering
- #llm
- #rag
- #agents
- #caching
- #ai-engineering
- #typescript
Jun 6, 2026 · 13 min read
Streaming LLM Responses: The Backend Engineer's Guide to Getting Tokens Out Fast
Why streaming an 8-second answer makes it feel fast even though it isn't, how to pick between long-poll, SSE, and WebSockets, and how to build a NestJS SSE endpoint that proxies a streaming LLM call — including the proxy-buffering, cancellation, and mid-stream-error gotchas that bite everyone the first time.
- #streaming
- #sse
- #nestjs
- #nodejs
- #llm
- #backend
- #spring
- #performance
Jun 4, 2026 · 10 min read
Semantic Caching for LLMs: Cache on Meaning, Not on Strings
A normal cache keyed on the exact request string is almost useless for LLM calls, because every paraphrase is a miss. Semantic caching keys on meaning instead — embed the query, search for a near-identical past question, and return its answer with no model call. Here's the architecture, the threshold problem that makes or breaks it, and real pgvector code.
- #llm
- #caching
- #pgvector
- #redis
- #embeddings
- #cost-optimization
- #typescript
- #backend
May 17, 2026 · 11 min read
Spring AI as an LLM Gateway: One Service, Many Providers, No Vendor Lock-In
Treating LLM calls as just another upstream dependency. How to use Spring AI to build a multi-provider gateway with retries, circuit breakers, prompt versioning, and observability — the same hygiene you'd put around any external API.
- #spring-ai
- #java
- #spring-boot
- #llm
- #architecture
- #resilience
- #observability
