Java 21 Virtual Threads in Production: When They Help, When They Hurt, and the Subtle Footguns Nobody Warns You About
Virtual threads make Java look like Go for concurrency — until you hit a synchronized block, a ThreadLocal hot path, or a pool you forgot to remove. A working engineer's guide to what actually changes and what to leave alone.
- #java
- #spring-boot
- #concurrency
- #performance
- #backend
Virtual threads have been generally available since Java 21 shipped in September 2023, and by now most teams I work with are running them somewhere in production. The marketing arc is well-known: blocking I/O without blocking the kernel, millions of cheap threads, "Java has Go's concurrency story now." All true in the right shape of program.
What's underdiscussed is the failure shapes — the cases where adoption is a non-event, the cases where it's a regression, and the surprisingly small set of code patterns that have to be reworked or they will silently pin a virtual thread to its carrier and erase the benefit you migrated for.
This post is what I'd want a teammate to know before flipping a Spring Boot service to virtual threads. No hand-waving about "modern Java" — the actual mental model, the actual gotchas, the actual measurements that tell you it worked.
The one-paragraph mental model
A virtual thread is a Thread instance the JVM schedules onto a small pool of OS threads called carriers. When a virtual thread blocks on I/O via the standard library, the runtime unmounts it from its carrier and parks it on the I/O event; another virtual thread mounts in its place; the OS thread never sleeps. The effect is blocking code, non-blocking runtime. You write client.send() and repository.findById() and Thread.sleep() the way you always have, but you can now have a million of them in flight without exhausting the kernel's thread budget.
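To make the model concrete, here is a minimal sketch that starts a batch of virtual threads which each block in Thread.sleep(). The class and method names (VirtualSleepDemo, runSleepers) are made up for illustration; only Thread.startVirtualThread and Thread.sleep(Duration) are JDK APIs.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class VirtualSleepDemo {

    /** Starts `count` virtual threads that each block for `sleep`; returns how many finished. */
    static int runSleepers(int count, Duration sleep) {
        AtomicInteger finished = new AtomicInteger();
        List<Thread> threads = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            // Creates and starts one virtual thread per task (final API since Java 21).
            threads.add(Thread.startVirtualThread(() -> {
                try {
                    Thread.sleep(sleep); // blocks the virtual thread, not the carrier
                    finished.incrementAndGet();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining", e);
            }
        }
        return finished.get();
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int done = runSleepers(10_000, Duration.ofMillis(200));
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(done + " sleepers finished in " + elapsedMs + " ms");
    }
}
```

If the model holds, the wall-clock time should be far closer to one sleep interval than to the sum of all of them, because sleeping virtual threads do not occupy carriers. The exact number depends on your machine, so measure it instead of trusting a figure from a blog post.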
The promise is that you delete your reactive code, your CompletableFuture chains, your callbacks — and you keep the throughput. The reality is that the promise holds if your blocking calls go through the right APIs, which is most code but not all of it.
What "use virtual threads" actually means in Spring Boot
In Spring Boot 3.2+ running on Java 21, one config option flips the model:
spring.threads.virtual.enabled=true
That single property opts in the Tomcat/Jetty request executor, the @Async executor, and the scheduled-task executor. For most services this is enough: Tomcat now serves each request on a virtual thread, your downstream JDBC and HTTP calls block but the virtual thread unmounts, and your max in-flight requests effectively becomes "as many as you can fit in heap."
The remaining levers, when you want finer control:
- Executors.newVirtualThreadPerTaskExecutor(): for any place you'd previously have used a bounded thread pool for I/O-bound work.
- Thread.ofVirtual().start(runnable): for one-off spawns.
- The StructuredTaskScope API: for fan-out concurrency with cancellation semantics. (Still a preview API in Java 21, so it needs --enable-preview and its shape can change between releases.)
The mistake here is leaving stale ThreadPoolExecutor configs in place after enabling virtual threads. A bounded pool of 200 platform threads sitting in front of work that should run on virtual threads silently caps your throughput at 200. Search the codebase for Executors.new and ThreadPoolTaskExecutor after the migration — every match needs a decision.
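As a sketch of what one of those decisions can look like, the snippet below swaps a fixed pool for a virtual-thread-per-task executor. ExecutorMigrationDemo, fetch, and fetchAll are hypothetical names, and fetch is a stand-in for a real blocking downstream call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ExecutorMigrationDemo {

    /** Hypothetical stand-in for a blocking downstream call. */
    static String fetch(int id) throws InterruptedException {
        Thread.sleep(20);
        return "item-" + id;
    }

    static List<String> fetchAll(int count) {
        // Before: Executors.newFixedThreadPool(200) capped in-flight work at 200.
        // After: one fresh virtual thread per task, with no pool size to tune.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Callable<String>> tasks = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                final int id = i;
                tasks.add(() -> fetch(id));
            }
            List<String> results = new ArrayList<>(count);
            // invokeAll returns futures in the same order as the submitted tasks.
            for (Future<String> future : executor.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while fetching", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("fetch failed", e.getCause());
        }
    }

    public static void main(String[] args) {
        System.out.println(fetchAll(1_000).size() + " items fetched");
    }
}
```

The try-with-resources form works because ExecutorService has been AutoCloseable since Java 19; close() waits for submitted tasks to finish.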
The four pin points that kill the benefit
A virtual thread is "pinned" when it cannot unmount from its carrier. While pinned, the carrier OS thread is held; the runtime falls back to platform-thread economics for that work. A handful of pinned hot paths and your "we moved to virtual threads" benchmark looks like nothing changed.
The four, in descending order of how often I see them in real codebases. Strictly speaking only the first two are pinning; the last two erase the benefit by other routes:
1. synchronized blocks holding the monitor across I/O
synchronized (cache) {
    HttpResponse<String> r = httpClient.send(request, ofString()); // pins for the duration of the HTTP call
    cache.put(key, r);
}
The fix is to either narrow the synchronized region to only the mutation (do the I/O outside the lock) or replace synchronized with ReentrantLock, which does not pin. When a hot path looks suspicious, the cause is usually this one, so investigate synchronized first. OpenJDK fixed the synchronized-pinning case in JDK 24 (JEP 491); if you're on 21 to 23 you still need the discipline. Verify your runtime before assuming it's solved.
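Both fixes can be combined. The sketch below uses a ReentrantLock and also keeps the blocking call outside the critical section. NonPinningCache and its loader function are hypothetical; the loader stands in for the HTTP call.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

final class NonPinningCache {
    private final Map<String, String> cache = new HashMap<>();
    private final ReentrantLock lock = new ReentrantLock();
    private final Function<String, String> loader; // hypothetical blocking fetch

    NonPinningCache(Function<String, String> loader) {
        this.loader = loader;
    }

    String get(String key) {
        // Short critical section: only the map lookup happens under the lock.
        lock.lock();
        try {
            String hit = cache.get(key);
            if (hit != null) {
                return hit;
            }
        } finally {
            lock.unlock();
        }

        // The blocking call runs with no lock held, so nothing is held across I/O.
        String loaded = loader.apply(key);

        // Short critical section again: only the mutation happens under the lock.
        lock.lock();
        try {
            // Another thread may have loaded the same key meanwhile; keep the first value.
            String existing = cache.putIfAbsent(key, loaded);
            return existing != null ? existing : loaded;
        } finally {
            lock.unlock();
        }
    }
}
```

The trade-off in this shape is that two threads missing on the same key can both call the loader. If duplicate loads are expensive, a per-key lock or a caching library with load coalescing is the more appropriate tool.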
2. Native frames on the stack during the blocking call
JNI calls and a few legacy java.io paths still hold the carrier. The standard library java.net.HttpClient, JDBC (in modern drivers), Netty, and java.nio channels all unmount correctly. The cases that don't: FileInputStream / FileOutputStream on some platforms, legacy database drivers that haven't been updated, and, on JDK 21 to 23, anything that blocks in Object.wait().
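The diagnostic: run with -Djdk.tracePinnedThreads=full for a few minutes under load. The JVM prints a stack trace each time a virtual thread blocks while pinned. Read the traces, fix the cases. Don't leave the flag on in steady state, because it adds overhead, but use it during validation. On JDK 24 and later the flag was removed along with the synchronized fix; the jdk.VirtualThreadPinned JFR event is the replacement there.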
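If you want to see what a pin looks like before hunting for one in a real service, a tiny reproduction helps. The sketch below blocks inside a synchronized block on a virtual thread. PinReproduction and blockWhileHoldingMonitor are hypothetical names, and I'm assuming a JDK 21 to 23 runtime started with the flag above; on JDK 24 and later this pattern should no longer pin.

```java
final class PinReproduction {
    private static final Object MONITOR = new Object();

    /** Blocks in sleep() while holding a monitor on a virtual thread; returns true once it has finished. */
    static boolean blockWhileHoldingMonitor(long millis) {
        Thread worker = Thread.startVirtualThread(() -> {
            synchronized (MONITOR) {
                try {
                    // On JDK 21 to 23 the virtual thread cannot unmount here,
                    // so the carrier stays occupied for the whole sleep.
                    Thread.sleep(millis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining", e);
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("finished: " + blockWhileHoldingMonitor(100));
    }
}
```

Running it once with the flag and once after swapping the synchronized block for a ReentrantLock is a quick way to learn what to look for in the trace output from a real service.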
3. ThreadLocal hot paths
ThreadLocal works with virtual threads — but every virtual thread has its own copy. If you have a ThreadLocal<HeavyObject> and now run a million virtual threads, you have a million heavy objects. The pattern that was harmless at 200 threads becomes a heap problem at 200,000.
The fix: audit ThreadLocal usages. Anything that's effectively a per-thread cache of an expensive object should be replaced with either a real cache (sized, evictable) or, where the JDK supports it, ScopedValue — which is per-task, not per-thread, and doesn't have the multiplication problem.
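The multiplication is easy to demonstrate. This sketch counts how many times a ThreadLocal initializer runs when a batch of virtual threads each touch it once. ThreadLocalMultiplicationDemo and countInstances are hypothetical names, and the 64 KiB buffer stands in for whatever heavy object your code keeps per thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class ThreadLocalMultiplicationDemo {

    /** Returns how many buffers get allocated when `threadCount` virtual threads each read the ThreadLocal. */
    static int countInstances(int threadCount) {
        AtomicInteger created = new AtomicInteger();
        ThreadLocal<byte[]> buffer = ThreadLocal.withInitial(() -> {
            created.incrementAndGet();
            return new byte[64 * 1024]; // stand-in for an expensive per-thread object
        });

        List<Thread> threads = new ArrayList<>(threadCount);
        for (int i = 0; i < threadCount; i++) {
            threads.add(Thread.startVirtualThread(() -> {
                byte[] local = buffer.get(); // first access on this thread runs the initializer
                local[0] = 1;
            }));
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining", e);
            }
        }
        return created.get();
    }

    public static void main(String[] args) {
        System.out.println(countInstances(10_000) + " buffers allocated");
    }
}
```

With a 200-thread platform pool the same code would allocate at most 200 buffers over the life of the pool. With a thread per task it allocates one per task, which is the audit signal to look for.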
4. Pooled resources that you forgot scale by thread count
This isn't pinning, but it bites the same way. If your code does connectionPool.borrowConnection() on every request and you now have 100,000 concurrent requests, you need a 100,000-connection database or you need to gate concurrency upstream of the connection borrow.
The right shape: cap concurrency on the I/O resource (Semaphore around the DB call), not on the thread (which is now free). Virtual threads make the thread-pool concurrency cap obsolete; they don't make the database's connection limit obsolete.
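A minimal sketch of that shape, assuming a Semaphore sized to the connection pool. ResourceGateDemo, peakConcurrency, and dbPermits are hypothetical names, and the sleep stands in for the database call.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

final class ResourceGateDemo {

    /** Runs `requests` virtual-thread tasks through a gate of `permits`; returns the peak number inside the gate. */
    static int peakConcurrency(int requests, int permits) {
        Semaphore dbPermits = new Semaphore(permits); // sized to the connection pool
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();

        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < requests; i++) {
                executor.submit(() -> {
                    dbPermits.acquire(); // waiting here parks the virtual thread cheaply
                    try {
                        int now = inFlight.incrementAndGet();
                        peak.accumulateAndGet(now, Math::max);
                        Thread.sleep(5); // stand-in for the database call
                    } finally {
                        inFlight.decrementAndGet();
                        dbPermits.release();
                    }
                    return null; // makes the lambda a Callable, so checked exceptions are allowed
                });
            }
        } // close() waits for all submitted tasks to finish
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak concurrent DB calls: " + peakConcurrency(1_000, 10));
    }
}
```

The thread count is unbounded here, but the number of callers inside the gated section never exceeds the permit count, which is the property the database cares about.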
When virtual threads don't help
Honest list — virtual threads are not a free win for everything.
- CPU-bound work. A virtual thread doing matrix math doesn't unmount; it holds its carrier. You still want a bounded executor (ForkJoinPool, Executors.newWorkStealingPool) sized to cores for CPU work. Mixing them is fine; using virtual threads for compute is not.
- Programs with very low concurrency. A service handling 50 requests in flight on platform threads doesn't get faster on virtual threads. The benefit is at high concurrency. If your max-in-flight is well under your thread pool size, the migration is a no-op (which is fine, and also not a regression).
- Reactive codebases you're not going to rewrite. If you're on WebFlux/Reactor and the team understands it and the throughput is what you want, virtual threads aren't an automatic upgrade. The two models can coexist, but mixing reactive composition with virtual-thread-mounted blocking calls inside the same handler is a category error that produces hard-to-debug latency.
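- Tests that depend on Thread.currentThread().getName() or thread identity. Virtual threads have no name by default: getName() returns an empty string, and toString() gives something like VirtualThread[#42]/runnable@ForkJoinPool-1-worker-1. Tests keyed on thread-name patterns will need updating.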
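For the thread-name case, a short sketch shows the default behaviour and one way to restore predictable names. VirtualThreadNamingDemo and the "request-" prefix are hypothetical; Thread.ofVirtual(), name(prefix, start), and isVirtual() are JDK APIs.

```java
final class VirtualThreadNamingDemo {

    /** Name of a virtual thread created without an explicit name. */
    static String defaultName() {
        return Thread.ofVirtual().unstarted(() -> { }).getName();
    }

    /** Name of the first thread from a builder that uses a prefix plus a counter. */
    static String builderName() {
        return Thread.ofVirtual().name("request-", 0).unstarted(() -> { }).getName();
    }

    /** Checks the kind of thread directly instead of pattern-matching on its name. */
    static boolean isVirtual(Thread thread) {
        return thread.isVirtual();
    }

    public static void main(String[] args) {
        System.out.println("default name: '" + defaultName() + "'");
        System.out.println("builder name: '" + builderName() + "'");
        System.out.println("virtual? " + isVirtual(Thread.ofVirtual().unstarted(() -> { })));
    }
}
```

Where a test really needs to know what kind of thread it is on, asserting on Thread.currentThread().isVirtual() tends to age better than matching a name pattern.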
What to measure
The mistake I see most often: teams turn on virtual threads, deploy, and don't measure anything beyond "the app didn't crash." That's not enough. The signals that tell you whether the migration paid off:
- p99 latency at peak load. Should be equal or better. If worse, you've introduced a pin or a downstream bottleneck.
- Active in-flight requests at peak. Should be higher than your old thread pool ceiling. If it's not, your concurrency is being capped somewhere upstream: Tomcat config, a Semaphore, a connection pool. Find it.
- CPU utilisation per request. Should be unchanged. Higher means pinning is causing extra runtime churn.
- Pinned-thread events. Should be zero in steady state. If -Djdk.tracePinnedThreads=full produces output, you have unfinished work.
Set those up before the migration. Compare with the same load before and after. A go-live without a comparison benchmark is a vibes-based migration and you won't know what you bought.
The migration playbook I actually use
In order, slowly:
- Enable on a single low-stakes service first. Not the payment path. Something I/O-heavy but tolerant. Flip spring.threads.virtual.enabled=true, run with -Djdk.tracePinnedThreads=short, measure for a week.
- Read the pin traces. Fix the synchronized blocks first; they're usually the bulk of the volume.
- Audit and remove stale executor configs. grep -r "new ThreadPoolTaskExecutor\|newFixedThreadPool\|newCachedThreadPool" and decide each one. Most should be gone or replaced with newVirtualThreadPerTaskExecutor.
- Audit ThreadLocal usage. Anything heavy gets evaluated for replacement.
- Cap concurrency at the resource, not the thread. Semaphore around DB calls if you're worried, sized to the connection pool.
- Run the load test. Compare against the pre-migration baseline. If p99 didn't improve at high concurrency, find what's still capping you.
- Roll out to the next service. Repeat the diagnostic.
Skipping #1 — going straight to "flip the flag in prod everywhere" — is how teams end up rolling back. The flag is cheap to flip and cheap to unflip; the prod incident from doing it without a baseline is not cheap to recover from.
The single best diagnostic, again
If you read nothing else from this post, remember: -Djdk.tracePinnedThreads=full. It is the difference between guessing why virtual threads didn't help and knowing exactly which synchronized block to rewrite. Run it during validation on every service that flips the flag. The output is dense but the patterns are easy to spot — same stack trace repeating means same pin point, fix once, recheck.
Most teams who report "virtual threads were disappointing" did not run this flag. The ones who did fixed their pin points in an afternoon and got the throughput the marketing promised.
What this changes about how you write Java
Less than people expected.
You stop writing CompletableFuture chains for things you'd rather express linearly. You stop sizing thread pools for I/O work. You start thinking about resource concurrency caps instead of thread caps. You learn StructuredTaskScope for the cases where you used to write tangled CompletableFuture.allOf patterns.
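Here is roughly what "linear instead of chained" looks like for a two-call fan-out. Because StructuredTaskScope is still a preview API, this sketch deliberately uses a plain virtual-thread executor and Future instead, so it does not get the cancellation semantics the scope provides. LinearFanOutDemo, Page, loadUser, and loadOrders are hypothetical names standing in for real downstream calls.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class LinearFanOutDemo {

    record Page(String user, String orders) { }

    /** Hypothetical blocking call to a user service. */
    static String loadUser(int id) throws InterruptedException {
        Thread.sleep(30);
        return "user-" + id;
    }

    /** Hypothetical blocking call to an orders service. */
    static String loadOrders(int id) throws InterruptedException {
        Thread.sleep(30);
        return "orders-" + id;
    }

    static Page loadPage(int id) {
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            // Both calls start immediately, each on its own virtual thread.
            Future<String> user = executor.submit(() -> loadUser(id));
            Future<String> orders = executor.submit(() -> loadOrders(id));
            // Plain blocking get() calls replace the CompletableFuture.allOf(...).thenApply(...) chain.
            return new Page(user.get(), orders.get());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while loading page", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("downstream call failed", e.getCause());
        }
    }

    public static void main(String[] args) {
        System.out.println(loadPage(7));
    }
}
```

If one call fails here, the other still runs to completion before the executor closes. That gap is the case StructuredTaskScope is designed to cover once it is final on your JDK.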
The reactive style still exists, still has its place — but for the majority of request-response services with a database and a few HTTP calls, the simpler synchronous code on virtual threads is now the production-default. That's the actual change. The code looks like a 2010 Spring controller again, runs at 2025 throughput, and the only people who needed to learn anything new are the ones still writing reactive code by reflex.
That is, finally, the upgrade the language pitched in the JEP. The footnotes — pinning, ThreadLocal multiplication, resource caps — are the price of admission, and a small one.

Kishore K Sharma
Lead Full Stack Engineer | Java · Spring Boot · Distributed Systems · AWS | Building Scalable Cloud-Native Platforms