Backend Development📅 February 10, 2026· 17 min read

Java Virtual Threads (Project Loom) vs Node.js — Concurrency Models Compared for Backend Engineers

✍️

Stripe Systems Engineering

Backend concurrency is not a solved problem. It is a set of trade-offs that shift with every workload profile. Java 21 introduced virtual threads — lightweight threads managed by the JVM rather than the OS — and this fundamentally changes the calculus for Java backend engineers. Meanwhile, Node.js continues to serve millions of production systems with its single-threaded event loop.

This post dissects both models at the implementation level. No hand-waving. We will walk through memory layouts, scheduling mechanics, failure modes, and real benchmark data from a production healthcare integration service.

The Node.js Event Loop — What Actually Happens

Node.js runs JavaScript on a single thread. All user code executes in one call stack. Concurrency comes from non-blocking I/O and an event loop implemented by libuv.

The event loop has distinct phases, executed in order on each tick:

✓Timers — Executes callbacks scheduled by setTimeout() and setInterval() whose threshold has elapsed.
✓Pending callbacks — Executes I/O callbacks deferred from the previous tick (e.g., TCP error callbacks).
✓Idle/Prepare — Internal housekeeping. Not relevant to application code.
✓Poll — Retrieves new I/O events. Executes I/O-related callbacks (excluding timers, close callbacks, and setImmediate). If there are no timers scheduled, the loop blocks here waiting for I/O.
✓Check — Executes setImmediate() callbacks.
✓Close callbacks — Executes close event callbacks (e.g., socket.on('close', ...)).

Between each phase, Node.js drains the microtask queue — this is where resolved Promise callbacks and process.nextTick() callbacks execute. process.nextTick() always runs before other microtasks.

Here is how async/await maps onto this:

async function fetchUserOrders(userId) {
  // This line suspends the function and registers a microtask
  // for when the DB query completes
  const user = await db.query('SELECT * FROM users WHERE id = $1', [userId]);

  // When the I/O completes, libuv places the callback in the poll queue.
  // The resolved promise callback goes into the microtask queue.
  // This line runs in a subsequent event loop tick.
  const orders = await db.query(
    'SELECT * FROM orders WHERE user_id = $1', [user.id]
  );

  return { user, orders };
}

The critical constraint: the single thread must never block. Any synchronous CPU work longer than a few milliseconds degrades throughput for all concurrent connections. This is not a theoretical concern — a 50ms JSON parse of a large payload blocks 50ms of I/O processing for every other connected client.

Java Traditional Threading — The OS Thread Tax

Before virtual threads, Java's concurrency model was straightforward: one OS thread per concurrent task. The ExecutorService manages a pool of these threads:

ExecutorService pool = Executors.newFixedThreadPool(200);

for (int i = 0; i < 10_000; i++) {
    pool.submit(() -> {
        // Each task gets an OS thread from the pool
        var result = httpClient.send(request, HttpResponse.BodyHandlers.ofString());
        processResult(result.body());
    });
}

The problem is quantifiable. Each OS thread in a typical Linux/JVM configuration consumes:

✓Stack memory: ~1 MB default (-Xss1m). Even with -Xss256k, a thread still consumes 256 KB of committed memory.
✓Kernel data structures: ~8–16 KB per thread for the OS task struct, kernel stack, and page tables.
✓Context switch cost: 1–10 microseconds per switch on modern hardware. Under heavy contention, this becomes the dominant cost.

At 10,000 threads, you are consuming ~10 GB of stack memory alone. At 100,000 threads, most systems cannot even create them — Linux defaults to a maximum of ~32,000 threads per process (/proc/sys/kernel/threads-max), and the JVM will throw OutOfMemoryError well before that.

This is why Java servers historically used thread pools capped at 200–500 threads, which caps concurrent I/O operations at that same number.

Java Virtual Threads — How They Actually Work

Virtual threads are user-mode threads managed by the JVM. They are not OS threads. The JVM schedules virtual threads onto a small pool of OS threads called carrier threads (by default, one per CPU core, managed by a ForkJoinPool).

The mechanics:

✓Mounting: When a virtual thread is ready to run, the JVM scheduler mounts it onto a carrier thread. The virtual thread's continuation (its saved stack) is loaded, and execution resumes.
✓Unmounting: When a virtual thread performs a blocking operation (socket read, Thread.sleep(), Lock.lock()), the JVM unmounts it from the carrier thread. The continuation is saved to heap memory. The carrier thread is free to run another virtual thread.
✓Continuations: Each virtual thread has a continuation — a representation of its call stack stored on the heap. This is what makes them lightweight. The continuation is typically a few hundred bytes to a few KB, depending on stack depth.

// Creating virtual threads directly
Thread vt = Thread.ofVirtual().name("worker-", 0).start(() -> {
    // This runs on a virtual thread.
    // Blocking calls here unmount this thread from the carrier,
    // freeing the carrier for other virtual threads.
    var response = httpClient.send(request, HttpResponse.BodyHandlers.ofString());
    process(response);
});

// Using the executor — preferred for task submission
try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
    IntStream.range(0, 100_000).forEach(i -> {
        executor.submit(() -> {
            // Each task gets its own virtual thread.
            // 100K virtual threads is routine — this is the point.
            Thread.sleep(Duration.ofSeconds(1));
            return fetchFromDatabase(i);
        });
    });
}
// ExecutorService.close() waits for all tasks to complete

Pinning is the critical failure mode. A virtual thread gets pinned to its carrier thread when:

✓It is inside a synchronized block or method and performs a blocking operation.
✓It calls native code via JNI that blocks.

When pinned, the virtual thread cannot unmount. The carrier thread is stuck, reducing the pool of available carriers. We will revisit this in the database pooling section.

Memory Footprint — The Numbers

Consider a server handling 1,000,000 concurrent connections, each waiting on I/O:

Model	Memory per connection	Total for 1M connections
Java OS threads (1 MB stack)	~1 MB	~1 TB (impossible)
Java OS threads (256 KB stack)	~256 KB	~256 GB (impractical)
Java virtual threads	~1–5 KB (heap continuation)	~1–5 GB
Node.js event loop	~1–2 KB (connection state in libuv + JS object)	~1–2 GB

Virtual threads and Node.js converge on similar memory profiles for connection state. The difference is in how you write code against them. Node.js requires non-blocking APIs everywhere. Virtual threads let you write blocking code that behaves non-blockingly under the hood.

The JVM does carry a fixed overhead that Node.js does not: the JIT compiler, class metadata, garbage collector structures. A minimal Java process starts at ~50–100 MB. A minimal Node.js process starts at ~30–50 MB. This fixed cost is irrelevant at scale but matters for microservices on constrained containers.

CPU-Bound vs I/O-Bound — Where Each Model Breaks

Node.js and CPU-bound work

The event loop blocks on CPU work. There is no concurrent execution of JavaScript — period:

const crypto = require('crypto');

// This blocks the event loop for ~200ms on typical hardware.
// During this time, ZERO I/O callbacks are processed.
function hashPassword(password) {
  return crypto.pbkdf2Sync(password, 'salt', 100000, 64, 'sha512');
}

// Correct approach: offload to a worker thread
const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');

if (isMainThread) {
  function hashPasswordAsync(password) {
    return new Promise((resolve, reject) => {
      const worker = new Worker(__filename, { workerData: { password } });
      worker.on('message', resolve);
      worker.on('error', reject);
    });
  }
} else {
  const hash = crypto.pbkdf2Sync(workerData.password, 'salt', 100000, 64, 'sha512');
  parentPort.postMessage(hash.toString('hex'));
}

Worker threads help, but they have overhead: each spawns a new V8 isolate (~5–10 MB), and data transfer between threads requires serialization (structured clone or SharedArrayBuffer).

Virtual threads and CPU-bound work

Virtual threads do not help with CPU-bound tasks. They are scheduled cooperatively — a virtual thread that never blocks (because it is computing) never unmounts. It occupies a carrier thread for the entire computation:

// This virtual thread will monopolize a carrier thread
// for the entire duration of the computation.
Thread.ofVirtual().start(() -> {
    // Pure CPU work — no yield points
    BigInteger result = BigInteger.ONE;
    for (int i = 2; i <= 100_000; i++) {
        result = result.multiply(BigInteger.valueOf(i));
    }
});

For CPU-bound work, both models fall back to the same answer: use a pool of OS threads or worker threads sized to the number of CPU cores. Virtual threads are specifically designed for I/O-bound workloads where tasks spend most of their time waiting.

I/O-bound work — the sweet spot for virtual threads

This is where virtual threads shine. Code that reads like sequential blocking logic executes with the efficiency of asynchronous I/O:

// This looks blocking but runs efficiently on virtual threads.
// Each blocking call (getInputStream, readAllBytes) triggers an unmount.
String fetchUrl(URI uri) throws Exception {
    HttpURLConnection conn = (HttpURLConnection) uri.toURL().openConnection();
    conn.setRequestMethod("GET");
    try (InputStream is = conn.getInputStream()) {
        return new String(is.readAllBytes(), StandardCharsets.UTF_8);
    }
}

// Launch 50,000 concurrent HTTP fetches. Each gets a virtual thread.
try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
    List<Future<String>> futures = urls.stream()
        .map(url -> executor.submit(() -> fetchUrl(url)))
        .toList();

    for (var future : futures) {
        String body = future.get(); // Blocks this virtual thread, not a carrier
        process(body);
    }
}

The equivalent in Node.js requires Promise.all(), which is elegant for simple fan-out but becomes unwieldy for complex control flow with error handling, partial results, and timeouts:

const results = await Promise.all(
  urls.map(url =>
    fetch(url)
      .then(res => res.text())
      .catch(err => ({ error: err.message, url }))
  )
);

Structured Concurrency — Managing Concurrent Task Lifecycles

Java 21 introduced StructuredTaskScope (preview) to manage groups of concurrent tasks as a unit. This prevents the common problem of leaked threads and orphaned tasks.

ShutdownOnFailure — fail fast

If any subtask fails, cancel all remaining subtasks:

try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
    Subtask<User> userTask = scope.fork(() -> fetchUser(userId));
    Subtask<List<Order>> ordersTask = scope.fork(() -> fetchOrders(userId));
    Subtask<CreditScore> creditTask = scope.fork(() -> fetchCreditScore(userId));

    scope.join();           // Wait for all tasks to complete or one to fail
    scope.throwIfFailed();  // Propagate the first exception

    // All three succeeded — safe to access results
    return new UserProfile(
        userTask.get(),
        ordersTask.get(),
        creditTask.get()
    );
}
// Exiting the try-with-resources block cancels any incomplete subtasks.

ShutdownOnSuccess — first result wins

Return the first successful result, cancel the rest:

try (var scope = new StructuredTaskScope.ShutdownOnSuccess<String>()) {
    scope.fork(() -> fetchFromPrimaryDb(key));
    scope.fork(() -> fetchFromReplicaDb(key));
    scope.fork(() -> fetchFromCache(key));

    scope.join();
    return scope.result();  // Returns the first completed result
}

Comparison with Promise.all() and Promise.race()

Behavior	Java Structured Concurrency	Node.js
All succeed or fail fast	`ShutdownOnFailure`	`Promise.all()`
First success wins	`ShutdownOnSuccess`	`Promise.any()`
All settle (no short-circuit)	Custom scope	`Promise.allSettled()`
First to settle (success or failure)	Custom scope	`Promise.race()`

The critical difference: StructuredTaskScope guarantees that no forked task outlives the scope. When the scope closes, all subtasks are cancelled and joined. In Node.js, a rejected promise in Promise.all() does not cancel the other promises — those HTTP requests continue to completion, consuming resources. You need AbortController for cancellation, which requires explicit wiring:

async function fetchUserProfile(userId) {
  const controller = new AbortController();
  const { signal } = controller;

  try {
    const [user, orders, credit] = await Promise.all([
      fetch(`/users/${userId}`, { signal }).then(r => r.json()),
      fetch(`/orders?user=${userId}`, { signal }).then(r => r.json()),
      fetch(`/credit/${userId}`, { signal }).then(r => r.json()),
    ]);
    return { user, orders, credit };
  } catch (err) {
    controller.abort(); // Cancel remaining requests on failure
    throw err;
  }
}

Backpressure Handling

Backpressure is the mechanism that prevents a fast producer from overwhelming a slow consumer.

Node.js streams

Node.js has built-in backpressure via the streams API. A writable stream has a highWaterMark (default 16 KB for byte streams, 16 objects for object streams). When the internal buffer exceeds this mark, write() returns false, signaling the producer to pause:

const { Transform } = require('stream');

const transformer = new Transform({
  highWaterMark: 1024 * 64, // 64 KB buffer
  transform(chunk, encoding, callback) {
    const processed = expensiveTransform(chunk);
    callback(null, processed);
  }
});

// Pipe automatically handles backpressure between streams.
// If transformer is slow, readableSource pauses automatically.
readableSource.pipe(transformer).pipe(writableDestination);

Java reactive streams

Before virtual threads, Java addressed backpressure through reactive streams (Flow.Publisher, Flow.Subscriber). This works but forces a reactive programming model onto the entire call chain:

// Reactive backpressure — complex, infectious API style
subscription.request(10); // Pull-based: subscriber requests 10 items

Virtual threads with blocking queues

Virtual threads offer a simpler alternative: use a bounded BlockingQueue. The producer blocks when the queue is full; the consumer blocks when it is empty. Because these are virtual threads, blocking is cheap — the JVM unmounts the blocked thread and reclaims the carrier:

BlockingQueue<DataChunk> queue = new ArrayBlockingQueue<>(100);

// Producer virtual thread — blocks when queue is full
Thread.ofVirtual().start(() -> {
    for (DataChunk chunk : dataSource) {
        queue.put(chunk); // Blocks if queue has 100 items — unmounts, does not pin
    }
    queue.put(DataChunk.POISON_PILL);
});

// Consumer virtual thread — blocks when queue is empty
Thread.ofVirtual().start(() -> {
    while (true) {
        DataChunk chunk = queue.take(); // Blocks if queue is empty
        if (chunk == DataChunk.POISON_PILL) break;
        process(chunk);
    }
});

This is the classical producer-consumer pattern. Before virtual threads, this pattern consumed OS threads while blocked. Now it consumes a few kilobytes of heap.

The Database Connection Pooling Gotcha

This is the most common production issue with virtual threads. Many JDBC drivers and connection pools use synchronized blocks internally. When a virtual thread enters a synchronized block and then performs a blocking I/O operation (like sending a query to the database), it gets pinned to the carrier thread.

The problem

// Inside a typical JDBC driver or connection pool (simplified)
public class ConnectionPool {
    // synchronized causes pinning when blocking I/O happens inside
    public synchronized Connection getConnection() {
        while (availableConnections.isEmpty()) {
            wait(); // PINNED — this virtual thread cannot unmount
        }
        return availableConnections.remove(0);
    }
}

If you have 8 carrier threads (on an 8-core machine) and 8 virtual threads get pinned inside synchronized blocks, all carrier threads are occupied. The remaining virtual threads — even thousands of them — cannot run. The system is effectively deadlocked.

The fix

Replace synchronized with ReentrantLock:

public class ConnectionPool {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition available = lock.newCondition();

    public Connection getConnection() throws InterruptedException {
        lock.lock(); // Virtual thread parks here without pinning
        try {
            while (availableConnections.isEmpty()) {
                available.await(); // Unmounts cleanly — no pinning
            }
            return availableConnections.remove(0);
        } finally {
            lock.unlock();
        }
    }
}

ReentrantLock and its associated Condition are virtual-thread-friendly. When a virtual thread calls lock.lock() and the lock is held, or calls condition.await(), it parks and unmounts from the carrier thread cleanly.

Connection pool sizing

With traditional thread pools, you size the connection pool to match the thread pool. 200 threads → ~200 connections makes sense.

With virtual threads, you can have 100,000 concurrent tasks but your database cannot handle 100,000 connections. You still need a bounded connection pool — typically 20–50 connections for most relational databases. The virtual threads will park while waiting for a connection, which is fine. The key is using a pool implementation that does not pin (HikariCP 5.1+ and most modern pools have been updated).

Use the JVM flag -Djdk.tracePinnedThreads=short during development to detect pinning:

Thread[#42,ForkJoinPool-1-worker-3,5,CarrierThreads]
    java.base/java.lang.VirtualThread$VThreadContinuation.onPinned(VirtualThread.java:183)
    java.base/java.lang.VirtualThread.parkOnCarrierThread(VirtualThread.java:556)

Benchmarks

All benchmarks were run on identical hardware: AWS c6i.4xlarge (16 vCPU, 32 GB RAM, Amazon Linux 2023). Each scenario was run 5 times with a 30-second warmup; results are the median of 5 runs.

✓Java 21.0.2 (GraalVM CE), flags: -Xmx8g -Djdk.virtualThreadScheduler.parallelism=16
✓Node.js 20.11.0, flags: --max-old-space-size=8192
✓Load generator: wrk2 running on a separate c6i.4xlarge, connected via 25 Gbps network

HTTP Request Throughput (simple JSON response, ~500 bytes)

Concurrent connections	Java (virtual threads)	Java (platform threads, 200 pool)	Node.js (cluster, 16 workers)
1,000	142,000 req/s	138,000 req/s	127,000 req/s
10,000	139,000 req/s	91,000 req/s	124,000 req/s
50,000	131,000 req/s	43,000 req/s	118,000 req/s
100,000	122,000 req/s	OOM / crashed	108,000 req/s

Java platform threads collapse beyond 10K connections because the pool is saturated and queuing adds latency. Node.js maintains relatively consistent throughput because the event loop does not care how many connections are open — it processes events as they arrive.

Database Query Throughput (PostgreSQL, simple SELECT, connection pool of 40)

Concurrent tasks	Java (virtual threads)	Java (platform threads, 200 pool)	Node.js (pg pool, 40 conn)
100	18,200 q/s	18,400 q/s	16,800 q/s
1,000	17,900 q/s	17,100 q/s	16,500 q/s
10,000	17,400 q/s	14,200 q/s	15,900 q/s
50,000	16,800 q/s	6,300 q/s	15,100 q/s

Database throughput is bottlenecked by the connection pool and database itself. All three approaches converge near the database's limit at low concurrency. The difference emerges under load: virtual threads park efficiently while waiting for a connection, platform threads consume memory and context-switch overhead, and Node.js performs well here since database I/O is naturally async in the pg driver.

JSON Serialization (CPU-bound, 10 KB payload)

Metric	Java (Jackson)	Node.js (JSON.stringify)
Single-thread throughput	320,000 ops/s	145,000 ops/s
16-core parallel throughput	4,800,000 ops/s	2,100,000 ops/s (16 workers)
p99 latency (under load)	0.4 ms	1.1 ms

Java's JIT compiler produces more optimized machine code for tight loops. Node.js V8 JIT is excellent but operates under more constraints (dynamic typing, shape polymorphism). For CPU-bound serialization, Java consistently outperforms Node.js by 2–2.5x.

Decision Framework

Choose Java virtual threads when:

✓Your workload is I/O-bound with many concurrent blocking operations (database queries, HTTP calls, file I/O).
✓Your team has Java expertise and an existing Java ecosystem (Spring Boot 3.2+, Quarkus, Helidon).
✓You need fine-grained CPU-bound parallelism alongside I/O concurrency.
✓You are building services that need to handle 50K+ concurrent connections per instance.
✓You want to write straight-line blocking code without callback chains or reactive operators.

Choose Node.js when:

✓Your service is primarily an I/O proxy — receiving requests, calling other services, returning results — with minimal CPU processing.
✓Your team is JavaScript/TypeScript-native and shares a frontend codebase.
✓You are building real-time systems (WebSocket servers, chat, live dashboards) where the event-driven model is natural.
✓Startup time and cold-start latency matter (serverless, edge functions). Node.js starts in ~50 ms; a JVM needs 500 ms–2 seconds.
✓Your ecosystem is npm-centric, and rewriting in Java would mean abandoning well-tested libraries.

Neither is categorically better. The right choice depends on the workload profile, team capability, and existing infrastructure.

Case Study: Healthcare Data Aggregation Service

A healthcare client needed a service to aggregate patient records from 200+ hospital APIs in real time. Each aggregation request fans out to 15–40 hospital endpoints simultaneously, collects responses, normalizes the data into FHIR-compliant resources, and returns a unified patient record. The SLA required p99 latency under 2 seconds with sustained throughput of 500 aggregation requests per second.

Stripe Systems Engineering benchmarked two implementations head-to-head to make an evidence-based architecture decision.

Benchmark Setup

Hardware: 3x AWS c6i.4xlarge (16 vCPU, 32 GB RAM) behind an NLB. Simulated hospital APIs running on a separate fleet, introducing 50–200ms random latency per response.

Java implementation: Java 21.0.2 (GraalVM CE), Spring Boot 3.2.3 with virtual threads enabled (spring.threads.virtual.enabled=true). JVM flags: -Xmx12g -Xms12g -XX:+UseZGC -Djdk.virtualThreadScheduler.parallelism=16. HTTP client: java.net.http.HttpClient (supports virtual threads natively).

Node.js implementation: Node.js 20.11.0, Express 4.18 + undici HTTP client. Cluster mode with 16 workers. --max-old-space-size=12288.

Both implementations used the same normalization logic (ported between Java and TypeScript), the same PostgreSQL database for caching (HikariCP / pg pool, 40 connections each), and identical API endpoint contracts.

Java virtual threads implementation (core fan-out)

public PatientRecord aggregate(String patientId, List<HospitalEndpoint> endpoints)
        throws InterruptedException, ExecutionException {

    try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
        List<Subtask<HospitalResponse>> tasks = endpoints.stream()
            .map(endpoint -> scope.fork(() -> {
                HttpRequest req = HttpRequest.newBuilder()
                    .uri(endpoint.buildUri(patientId))
                    .header("Authorization", "Bearer " + endpoint.getToken())
                    .timeout(Duration.ofSeconds(5))
                    .build();

                HttpResponse<String> resp = httpClient.send(
                    req, HttpResponse.BodyHandlers.ofString()
                );

                if (resp.statusCode() != 200) {
                    throw new HospitalApiException(endpoint.name(), resp.statusCode());
                }
                return parseResponse(endpoint.format(), resp.body());
            }))
            .toList();

        scope.joinUntil(Instant.now().plusSeconds(8));
        scope.throwIfFailed();

        List<HospitalResponse> responses = tasks.stream()
            .map(Subtask::get)
            .toList();

        return fhirNormalizer.normalize(patientId, responses);
    }
}

Node.js implementation (core fan-out)

async function aggregate(patientId, endpoints) {
  const controller = new AbortController();
  const timeout = setTimeout(() => controller.abort(), 8000);

  try {
    const responses = await Promise.all(
      endpoints.map(async (endpoint) => {
        const url = endpoint.buildUri(patientId);
        const resp = await fetch(url, {
          headers: { Authorization: `Bearer ${endpoint.token}` },
          signal: controller.signal,
        });

        if (!resp.ok) {
          throw new HospitalApiError(endpoint.name, resp.status);
        }
        return parseResponse(endpoint.format, await resp.text());
      })
    );

    return fhirNormalizer.normalize(patientId, responses);
  } finally {
    clearTimeout(timeout);
  }
}

Results Under Load

Aggregation throughput (requests/sec, each request fans out to ~25 hospital APIs):

Concurrent aggregation requests	Java (virtual threads)	Node.js (cluster, 16 workers)
100	520 req/s	490 req/s
500	510 req/s	475 req/s
1,000	495 req/s	440 req/s
5,000	470 req/s	380 req/s
10,000	440 req/s	310 req/s

At 10K concurrent aggregation requests, each fanning out to ~25 endpoints, the system handles ~250,000 concurrent outbound HTTP connections. The Java implementation maintained 440 req/s; Node.js dropped to 310 req/s as garbage collection pressure from 250K in-flight promise chains increased GC pause times.

p99 latency (ms):

Concurrent aggregation requests	Java (virtual threads)	Node.js (cluster, 16 workers)
100	620 ms	640 ms
500	710 ms	780 ms
1,000	890 ms	1,100 ms
5,000	1,400 ms	2,800 ms
10,000	1,900 ms	4,200 ms

The latency divergence at scale is the most significant finding. At 5,000 concurrent requests, Node.js breached the 2-second SLA at p99. Java virtual threads stayed within SLA up to ~8,000 concurrent requests.

Memory consumption (RSS, steady state at 5,000 concurrent requests):

Metric	Java (virtual threads)	Node.js (16 workers)
RSS per instance	6.2 GB	8.4 GB (16 × ~525 MB)
GC pause p99	3.1 ms (ZGC)	48 ms (V8 major GC)
Virtual thread / promise count	~125,000	~125,000 per worker
Carrier threads / event loops	16	16

The GC pause difference is critical for tail latency. ZGC keeps p99 pauses under 10 ms regardless of heap size. V8's generational GC produces occasional major pauses of 30–80 ms under heavy allocation, which directly impacts p99 response latency.

Architecture Decision

Stripe Systems selected Java 21 virtual threads for this service based on three factors:

✓
SLA compliance at scale: The Java implementation met the 2-second p99 SLA at 3x the projected peak load (5,000 concurrent requests vs projected 1,500). The Node.js implementation breached SLA at 2x projected peak.
✓
Simpler error handling: StructuredTaskScope with joinUntil() provided timeout semantics and automatic cancellation of in-flight hospital API calls. The Node.js AbortController pattern required more boilerplate and was easier to get wrong — in early testing, we discovered several code paths where the abort signal was not propagated, leading to leaked connections.
✓
Operational predictability: ZGC's sub-10ms pause times meant tail latency was stable and predictable. V8 GC pauses introduced latency spikes that were harder to reason about under load.

The service has been in production for seven months, handling an average of 1,200 aggregation requests per second across 3 instances, with p99 latency at 820 ms.

Conclusion

Virtual threads do not make Java faster. They make Java capable of handling more concurrent I/O-bound work with fewer resources and simpler code. Node.js does not need virtual threads — its event loop already handles massive concurrency for I/O-bound workloads. The trade-offs are in programming model complexity, CPU-bound performance, and behavior under extreme concurrency.

Measure your workload. Benchmark on your hardware. Choose the model that fits the problem, not the one that fits the hype cycle.

Ready to discuss your project?

Get in Touch →

Related Services from Stripe Systems

Stripe Systems helps teams put the patterns covered in this article into production.

Backend Development

Server-side systems designed for correctness, observability, and horizontal scalability.

Learn more →

← Back to Blog

AI/MLFebruary 28, 2026

Agentic AI in the Enterprise: Designing Multi-Agent Systems with LangGraph and Tool Orchestration

The term "AI agent" has been diluted by marketing to the point where it describes everything from a chatbot with a system prompt to a fully autonomous multi-step reasoning system. For this discussi...

Software DevelopmentFebruary 10, 2026

Agile vs Waterfall — Choosing the Right Methodology for Your Project

The methodology debate in software development is older than most of the frameworks we argue about on the internet. Waterfall has been declared dead roughly once per year since the Agile Manifesto ...

Engineering CultureMarch 5, 2026

AI-Assisted Code Review at Scale: How We Cut Review Cycle Time by 60% Without Sacrificing Architecture Standards

Code review is the most important quality gate in a software team, and it is also the most common bottleneck. Every team has the same problem: senior engineers are the reviewers, they have their ow...

Engineering CultureFebruary 5, 2026

The AI-Augmented SDLC: How We've Embedded AI at Every Phase — From Requirements to Deployment

The phrase "AI-augmented SDLC" gets thrown around loosely. Vendors pitch it as "AI writes your code." That is not what it means in practice. What it actually means: at every phase of the developmen...

Quality AssuranceMarch 15, 2026

How AI Is Transforming Automated Testing — Unit Tests, Code Coverage, and E2E Integration

AI-assisted testing has moved from research papers into daily engineering workflows. Tools powered by large language models can generate test scaffolds, detect visual regressions, predict flaky tes...

AI/MLMarch 19, 2026

AI Code Review Agents: How We Built a Custom Pipeline That Catches Architecture Violations, Not Just Bugs

Generic AI code review tools are good at catching syntax errors, unused variables, and simple bugs. They are poor at catching architecture violations — the kind of issues that compound over months ...

Engineering CultureMarch 20, 2026

How Our Engineering Team Uses AI Tools Daily to Ship Faster, Catch More Bugs, and Write Better Code — A Practitioner's Honest Breakdown

AI tools are not magic. They do not replace engineers, they do not understand your codebase, and they will confidently generate code that compiles but violates your business rules. What they do — w...

Backend DevelopmentJanuary 15, 2026

API Gateway Patterns: BFF vs Aggregator vs Direct — Choosing for Your Stack

Every team building on microservices eventually hits the same question: how should clients talk to your backend? The answer is some form of API gateway — but which pattern you choose has lasting co...

Cloud ComputingFebruary 24, 2026

AWS Lambda Cold Starts — Root Causes, Benchmarks, and 7 Proven Mitigation Strategies

Every engineer who has operated a Lambda-based production service has encountered the cold start problem. The function responds in 12 milliseconds on the second invocation but takes 3.8 seconds on ...

Cloud ComputingFebruary 15, 2026

AWS vs Azure vs GCP for Startups in 2026 — An Honest Cost and Capability Breakdown

Most cloud comparison articles recycle the same vague advice: "AWS has the most services, Azure integrates with Microsoft, GCP is good for data." That is not useful when you are a startup founder s...

Mobile DevelopmentMarch 1, 2026

Choosing the Right Mobile Development Approach: Native vs Cross-Platform

One of the first and most important decisions in any mobile app project is choosing between native and cross-platform development. Each approach has distinct advantages, and the right choice depend...

DevOpsMarch 7, 2026

Building a Production-Grade CI/CD Pipeline for a Monorepo (GitHub Actions + Docker + Kubernetes)

Monorepos consolidate multiple services, shared libraries, and frontend applications into a single repository. This brings benefits — atomic cross-service changes, shared tooling, simplified depend...

Backend DevelopmentJanuary 29, 2026

Clean Architecture in .NET 8 — Structuring Enterprise Apps That Scale Without Rot

Software architecture is not about choosing the right framework. It is about deciding which parts of a system should be easy to change and which should be stable — then enforcing that decision stru...

Mobile DevelopmentJanuary 6, 2026

CLEAN Architecture in Flutter — BLoC vs Riverpod for State Management

Flutter gives you a rendering engine and a widget tree. It does not give you an architecture. That gap is where most projects accumulate the technical debt that slows them down six months after lau...

DevOpsFebruary 28, 2026

How DevOps and DevSecOps Integrate Into Enterprise Product Development From Day One

Most enterprise teams treat DevOps as something to bolt on after the application takes shape. Security gets deferred even further — relegated to a penetration test two weeks before launch. This seq...

DevOpsJanuary 23, 2026

Docker Image Hardening for Production — Distroless, Non-Root Users, and Layer Optimization

A default Docker image built from `node:18` or `python:3.11` ships with hundreds of packages you do not need in production — compilers, package managers, shells, debug utilities. Each unnecessary p...

Backend DevelopmentJanuary 18, 2026

Event-Driven Architecture with Kafka, NestJS, and Outbox Pattern — A Production Walkthrough

Most backend systems start as synchronous request-response services. A client sends a request, the server processes it, and returns a result. This model is simple to reason about, easy to debug, an...

Cloud ComputingMarch 5, 2026

FinOps in Practice: How We Cut a Client's AWS Bill by 40% Without Touching Their Codebase

Most organizations overspend on AWS by 25–35%. Not because their engineers are careless, but because cloud billing is structurally opaque. Pricing varies by region, instance family, tenancy, paymen...

Mobile DevelopmentJanuary 10, 2026

Flutter vs React Native in 2026 — A Deep Technical Comparison for Enterprise Apps

Cross-platform mobile development has converged on two serious contenders: Flutter and React Native. Both are production-ready for enterprise applications, but they make fundamentally different arc...

DevOpsMarch 13, 2026

GitOps with ArgoCD and Terraform: The Infrastructure Deployment Workflow That Eliminates Drift

Infrastructure drift — the divergence between what is declared in code and what is actually running — is the root cause of a large class of production incidents. GitOps addresses this by making Git...

DevSecOpsFebruary 18, 2026

Infrastructure as Code Security: Detecting Misconfigurations with Checkov and OPA Before Deployment

Cloud misconfigurations remain the most common cause of cloud security incidents. The 2024 Verizon Data Breach Investigations Report attributes 74% of cloud breaches to misconfiguration or misuse, ...

DevOpsJanuary 25, 2026

Kubernetes Multi-Tenancy Patterns — Namespace Isolation vs Virtual Clusters vs Separate Clusters

Multi-tenancy in Kubernetes is not a single problem — it is a spectrum of isolation requirements that vary based on trust boundaries, compliance mandates, and operational capacity. This post examin...

AI/MLJanuary 18, 2026

LLM Cost Optimization at Scale — Prompt Caching, Model Routing, and Batch Inference in Production

LLM API costs follow a simple formula: tokens consumed × price per token. At low volume, this is negligible. At production scale, it becomes a significant line item. A system processing 1 million r...

Frontend DevelopmentMarch 2, 2026

Micro-Frontend Architecture at Scale: Module Federation with React and Webpack 5

The pitch for micro-frontends is compelling: split a monolithic frontend into independently deployable units owned by autonomous teams. The reality is more nuanced. Module Federation, introduced in...

Software DevelopmentJanuary 9, 2026

Microservices vs Monolith — Making the Right Architecture Decision

The architecture decision between microservices and a monolith is not a technology choice — it is an organizational one. The right answer depends on your team size, your domain maturity, your opera...

Cloud ComputingMarch 22, 2026

Multi-Cloud Architecture: Avoiding Vendor Lock-in Without Sacrificing Performance

Multi-cloud is one of the most oversold ideas in infrastructure. The pitch is simple: run workloads across AWS, GCP, and Azure to avoid vendor lock-in, improve resilience, and negotiate better pric...

Backend DevelopmentFebruary 21, 2026

NestJS Microservices with gRPC — Architecture Patterns for High-Throughput APIs

REST and GraphQL dominate client-facing APIs for good reason: browser support, tooling maturity, and developer familiarity. But for service-to-service communication inside a cluster, gRPC offers me...

Staff AugmentationFebruary 27, 2026

Why an Offshore Development Centre (ODC) Beats a Distributed Freelance Model — And How Stripe Systems Sets One Up

Engineering leaders who need to extend capacity beyond their core team face a fundamental choice between two models: hire individual freelancers through marketplace platforms, or establish a dedica...

Frontend DevelopmentFebruary 4, 2026

Building Offline-First PWAs with Next.js, Service Workers, and IndexedDB

Most web applications treat offline support as an afterthought — a "no internet" screen with a sad dinosaur. Offline-first flips this: the app is designed to work without a network connection, and ...

Staff AugmentationFebruary 1, 2026

Beyond Cost Arbitrage: How Stripe Systems' Offshore Teams Deliver Senior-Level Architecture, Not Just Execution

The offshore development industry has a reputation problem, and it is largely self-inflicted. For two decades, the dominant sales pitch was cost arbitrage: "Get the same work done for 60% less." Th...

Staff AugmentationFebruary 10, 2026

How to Onboard an Augmented Team Without Losing Velocity — A 90-Day Playbook for Engineering Leads

The single biggest risk in staff augmentation is not cost, quality, or attrition. It is the velocity dip during onboarding. A team that goes from signing a contract to productive output in 4 weeks ...

Staff AugmentationMarch 15, 2026

Onshore vs Offshore vs Nearshore Augmentation — A Decision Framework for CTOs Beyond Just Cost

Most engineering leaders approach the onshore-vs-offshore decision with a spreadsheet containing hourly rates and a vague sense of "risk." That is insufficient. The actual decision involves at leas...

AI/MLMarch 10, 2026

Building Production-Ready RAG Pipelines — Chunking Strategies, Vector DBs, and Evaluation Frameworks

Retrieval-Augmented Generation (RAG) has become the default architecture for building LLM-powered applications over proprietary data. The core idea is straightforward: instead of fine-tuning a lang...

Engineering CultureMarch 25, 2026

Prompt Engineering for Software Teams: The Internal Playbook We Built to Maximize Developer Output with LLMs

Every developer on your team uses LLMs differently. One engineer writes "make me a login page" and gets generic boilerplate. Another writes a structured prompt with framework constraints, authentic...

Staff AugmentationJanuary 5, 2026

The Real ROI of Offshore vs Nearshore vs Onshore Augmentation — A Data-Driven Cost-Benefit Framework for Engineering Leaders

Every year, engineering leaders evaluate staff augmentation options by comparing hourly rates on a spreadsheet. Offshore at $40–55/hr, nearshore at $65–85/hr, onshore at $130–180/hr. The math looks...

Frontend DevelopmentMarch 16, 2026

Server Components vs Client Components in Next.js 14 — When to Use Which (And Why Most Teams Get It Wrong)

Most teams adopt the Next.js App Router and immediately add `"use client"` to every component that does anything interactive. Within a week, they've recreated a fully client-rendered SPA with extra...

Staff AugmentationFebruary 13, 2026

Setting Up an ODC in India: Legal, Compliance, HR, and Infrastructure — What CTOs and Founders Actually Need to Know

If you are a CTO or founder evaluating India for an Offshore Development Centre (ODC), you have probably encountered two types of advice: breathless marketing from outsourcing firms promising effor...

DevSecOpsMarch 10, 2026

Shifting Security Left: Integrating SAST, DAST, and Secret Scanning into Your CI/CD Pipeline

"Shift left" means running security checks earlier in the development lifecycle — during coding and code review rather than after deployment. The economic argument is straightforward: a vulnerabili...

DevSecOpsFebruary 20, 2026

SOC 2 Type II for Engineering Teams — What Developers Actually Need to Build and Change

SOC 2 Type II audits examine whether your security controls work consistently over a defined observation period — typically 6 to 12 months. Unlike Type I, which captures a point-in-time snapshot, T...

TechnologyJanuary 12, 2026

Staff Augmentation — A Practical Guide for Engineering Leaders

Staff augmentation is a staffing model where external engineers join your team on a contract basis, working under your technical leadership and within your existing processes. Unlike project outsou...

Frontend DevelopmentJanuary 26, 2026

State Management Showdown: Zustand vs Redux Toolkit vs Jotai for Large React Codebases

React 19 shipped server components, and with them came a reasonable question: do we still need client-side state management libraries? The answer is yes, but the reasoning has shifted. Server compo...

Software DevelopmentJanuary 3, 2026

Why Test-Driven Development Is Non-Negotiable in Our Engineering Process

Most teams agree that automated tests are valuable. Far fewer teams write the tests *before* the implementation. The gap between those two positions is where the majority of preventable defects live.

DevOpsFebruary 15, 2026

Terraform at Scale: Remote State, Workspaces, and Module Versioning for Multi-Team Environments

Terraform works well for a single team managing a handful of resources. It does not work well when five teams share a single state file containing 200+ resources. This post covers the specific prob...

Software DevelopmentMarch 15, 2026

Why Custom Software Development Matters for Growing Businesses

In today's competitive landscape, growing businesses face a critical decision: should they rely on off-the-shelf software or invest in custom-built solutions? While pre-built tools offer quick depl...

DevSecOpsJanuary 21, 2026

Zero-Trust API Security — mTLS, JWT Validation, and Rate Limiting in a Kubernetes-Native Stack

Zero-trust networking operates on a simple principle: no request is trusted based on its network origin. A request from inside your VPC receives the same scrutiny as a request from the public inter...

Cloud ComputingFebruary 7, 2026

Building a Zero-Trust Network on GCP with VPC Service Controls and Identity-Aware Proxy

Traditional network security operates on a simple assumption: traffic inside the firewall is trusted, traffic outside is not. This model fails in cloud environments for three reasons. First, there ...

Staff AugmentationApril 28, 2026

2026 Global Software Engineering Rate Benchmark — India vs US vs UK vs LATAM vs Eastern Europe

Most "offshoring rate" guides float a single dollar number per country and call it analysis. That number is almost always wrong — because it conflates raw salary with the fully-loaded cost of empl...

DevOpsApril 28, 2026

DevOps Maturity Benchmarks: What Top 1% Engineering Teams Do Differently in 2026

Most engineering organisations think they have a DevOps problem. They do not. They have a DevOps *belief* problem — they believe their CI/CD pipeline, weekly deploys, and a Datadog dashboard amou...

Backend Development📅 February 10, 2026· 17 min read

Java Virtual Threads (Project Loom) vs Node.js — Concurrency Models Compared for Backend Engineers

✍️

Stripe Systems Engineering

The Node.js Event Loop — What Actually Happens

Node.js runs JavaScript on a single thread. All user code executes in one call stack. Concurrency comes from non-blocking I/O and an event loop implemented by libuv.

The event loop has distinct phases, executed in order on each tick:

✓Timers — Executes callbacks scheduled by setTimeout() and setInterval() whose threshold has elapsed.
✓Pending callbacks — Executes I/O callbacks deferred from the previous tick (e.g., TCP error callbacks).
✓Idle/Prepare — Internal housekeeping. Not relevant to application code.
✓Poll — Retrieves new I/O events. Executes I/O-related callbacks (excluding timers, close callbacks, and setImmediate). If there are no timers scheduled, the loop blocks here waiting for I/O.
✓Check — Executes setImmediate() callbacks.
✓Close callbacks — Executes close event callbacks (e.g., socket.on('close', ...)).

Here is how async/await maps onto this:

async function fetchUserOrders(userId) {
  // This line suspends the function and registers a microtask
  // for when the DB query completes
  const user = await db.query('SELECT * FROM users WHERE id = $1', [userId]);

  // When the I/O completes, libuv places the callback in the poll queue.
  // The resolved promise callback goes into the microtask queue.
  // This line runs in a subsequent event loop tick.
  const orders = await db.query(
    'SELECT * FROM orders WHERE user_id = $1', [user.id]
  );

  return { user, orders };
}

Java Traditional Threading — The OS Thread Tax

Before virtual threads, Java's concurrency model was straightforward: one OS thread per concurrent task. The ExecutorService manages a pool of these threads:

ExecutorService pool = Executors.newFixedThreadPool(200);

for (int i = 0; i < 10_000; i++) {
    pool.submit(() -> {
        // Each task gets an OS thread from the pool
        var result = httpClient.send(request, HttpResponse.BodyHandlers.ofString());
        processResult(result.body());
    });
}

The problem is quantifiable. Each OS thread in a typical Linux/JVM configuration consumes:

✓Stack memory: ~1 MB default (-Xss1m). Even with -Xss256k, a thread still consumes 256 KB of committed memory.
✓Kernel data structures: ~8–16 KB per thread for the OS task struct, kernel stack, and page tables.
✓Context switch cost: 1–10 microseconds per switch on modern hardware. Under heavy contention, this becomes the dominant cost.

This is why Java servers historically used thread pools capped at 200–500 threads, which caps concurrent I/O operations at that same number.

Java Virtual Threads — How They Actually Work

The mechanics:

✓Mounting: When a virtual thread is ready to run, the JVM scheduler mounts it onto a carrier thread. The virtual thread's continuation (its saved stack) is loaded, and execution resumes.
✓Unmounting: When a virtual thread performs a blocking operation (socket read, Thread.sleep(), Lock.lock()), the JVM unmounts it from the carrier thread. The continuation is saved to heap memory. The carrier thread is free to run another virtual thread.
✓Continuations: Each virtual thread has a continuation — a representation of its call stack stored on the heap. This is what makes them lightweight. The continuation is typically a few hundred bytes to a few KB, depending on stack depth.

// Creating virtual threads directly
Thread vt = Thread.ofVirtual().name("worker-", 0).start(() -> {
    // This runs on a virtual thread.
    // Blocking calls here unmount this thread from the carrier,
    // freeing the carrier for other virtual threads.
    var response = httpClient.send(request, HttpResponse.BodyHandlers.ofString());
    process(response);
});

// Using the executor — preferred for task submission
try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
    IntStream.range(0, 100_000).forEach(i -> {
        executor.submit(() -> {
            // Each task gets its own virtual thread.
            // 100K virtual threads is routine — this is the point.
            Thread.sleep(Duration.ofSeconds(1));
            return fetchFromDatabase(i);
        });
    });
}
// ExecutorService.close() waits for all tasks to complete

Pinning is the critical failure mode. A virtual thread gets pinned to its carrier thread when:

✓It is inside a synchronized block or method and performs a blocking operation.
✓It calls native code via JNI that blocks.

When pinned, the virtual thread cannot unmount. The carrier thread is stuck, reducing the pool of available carriers. We will revisit this in the database pooling section.

Memory Footprint — The Numbers

Consider a server handling 1,000,000 concurrent connections, each waiting on I/O:

Model	Memory per connection	Total for 1M connections
Java OS threads (1 MB stack)	~1 MB	~1 TB (impossible)
Java OS threads (256 KB stack)	~256 KB	~256 GB (impractical)
Java virtual threads	~1–5 KB (heap continuation)	~1–5 GB
Node.js event loop	~1–2 KB (connection state in libuv + JS object)	~1–2 GB

CPU-Bound vs I/O-Bound — Where Each Model Breaks

Node.js and CPU-bound work

The event loop blocks on CPU work. There is no concurrent execution of JavaScript — period:

const crypto = require('crypto');

// This blocks the event loop for ~200ms on typical hardware.
// During this time, ZERO I/O callbacks are processed.
function hashPassword(password) {
  return crypto.pbkdf2Sync(password, 'salt', 100000, 64, 'sha512');
}

// Correct approach: offload to a worker thread
const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');

if (isMainThread) {
  function hashPasswordAsync(password) {
    return new Promise((resolve, reject) => {
      const worker = new Worker(__filename, { workerData: { password } });
      worker.on('message', resolve);
      worker.on('error', reject);
    });
  }
} else {
  const hash = crypto.pbkdf2Sync(workerData.password, 'salt', 100000, 64, 'sha512');
  parentPort.postMessage(hash.toString('hex'));
}

Worker threads help, but they have overhead: each spawns a new V8 isolate (~5–10 MB), and data transfer between threads requires serialization (structured clone or SharedArrayBuffer).

Virtual threads and CPU-bound work

// This virtual thread will monopolize a carrier thread
// for the entire duration of the computation.
Thread.ofVirtual().start(() -> {
    // Pure CPU work — no yield points
    BigInteger result = BigInteger.ONE;
    for (int i = 2; i <= 100_000; i++) {
        result = result.multiply(BigInteger.valueOf(i));
    }
});

I/O-bound work — the sweet spot for virtual threads

This is where virtual threads shine. Code that reads like sequential blocking logic executes with the efficiency of asynchronous I/O:

// This looks blocking but runs efficiently on virtual threads.
// Each blocking call (getInputStream, readAllBytes) triggers an unmount.
String fetchUrl(URI uri) throws Exception {
    HttpURLConnection conn = (HttpURLConnection) uri.toURL().openConnection();
    conn.setRequestMethod("GET");
    try (InputStream is = conn.getInputStream()) {
        return new String(is.readAllBytes(), StandardCharsets.UTF_8);
    }
}

// Launch 50,000 concurrent HTTP fetches. Each gets a virtual thread.
try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
    List<Future<String>> futures = urls.stream()
        .map(url -> executor.submit(() -> fetchUrl(url)))
        .toList();

    for (var future : futures) {
        String body = future.get(); // Blocks this virtual thread, not a carrier
        process(body);
    }
}

The equivalent in Node.js requires Promise.all(), which is elegant for simple fan-out but becomes unwieldy for complex control flow with error handling, partial results, and timeouts:

const results = await Promise.all(
  urls.map(url =>
    fetch(url)
      .then(res => res.text())
      .catch(err => ({ error: err.message, url }))
  )
);

Structured Concurrency — Managing Concurrent Task Lifecycles

Java 21 introduced StructuredTaskScope (preview) to manage groups of concurrent tasks as a unit. This prevents the common problem of leaked threads and orphaned tasks.

ShutdownOnFailure — fail fast

If any subtask fails, cancel all remaining subtasks:

try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
    Subtask<User> userTask = scope.fork(() -> fetchUser(userId));
    Subtask<List<Order>> ordersTask = scope.fork(() -> fetchOrders(userId));
    Subtask<CreditScore> creditTask = scope.fork(() -> fetchCreditScore(userId));

    scope.join();           // Wait for all tasks to complete or one to fail
    scope.throwIfFailed();  // Propagate the first exception

    // All three succeeded — safe to access results
    return new UserProfile(
        userTask.get(),
        ordersTask.get(),
        creditTask.get()
    );
}
// Exiting the try-with-resources block cancels any incomplete subtasks.

ShutdownOnSuccess — first result wins

Return the first successful result, cancel the rest:

try (var scope = new StructuredTaskScope.ShutdownOnSuccess<String>()) {
    scope.fork(() -> fetchFromPrimaryDb(key));
    scope.fork(() -> fetchFromReplicaDb(key));
    scope.fork(() -> fetchFromCache(key));

    scope.join();
    return scope.result();  // Returns the first completed result
}

Comparison with Promise.all() and Promise.race()

Behavior	Java Structured Concurrency	Node.js
All succeed or fail fast	`ShutdownOnFailure`	`Promise.all()`
First success wins	`ShutdownOnSuccess`	`Promise.any()`
All settle (no short-circuit)	Custom scope	`Promise.allSettled()`
First to settle (success or failure)	Custom scope	`Promise.race()`

async function fetchUserProfile(userId) {
  const controller = new AbortController();
  const { signal } = controller;

  try {
    const [user, orders, credit] = await Promise.all([
      fetch(`/users/${userId}`, { signal }).then(r => r.json()),
      fetch(`/orders?user=${userId}`, { signal }).then(r => r.json()),
      fetch(`/credit/${userId}`, { signal }).then(r => r.json()),
    ]);
    return { user, orders, credit };
  } catch (err) {
    controller.abort(); // Cancel remaining requests on failure
    throw err;
  }
}

Backpressure Handling

Backpressure is the mechanism that prevents a fast producer from overwhelming a slow consumer.

Node.js streams

const { Transform } = require('stream');

const transformer = new Transform({
  highWaterMark: 1024 * 64, // 64 KB buffer
  transform(chunk, encoding, callback) {
    const processed = expensiveTransform(chunk);
    callback(null, processed);
  }
});

// Pipe automatically handles backpressure between streams.
// If transformer is slow, readableSource pauses automatically.
readableSource.pipe(transformer).pipe(writableDestination);

Java reactive streams

Before virtual threads, Java addressed backpressure through reactive streams (Flow.Publisher, Flow.Subscriber). This works but forces a reactive programming model onto the entire call chain:

// Reactive backpressure — complex, infectious API style
subscription.request(10); // Pull-based: subscriber requests 10 items

Virtual threads with blocking queues

BlockingQueue<DataChunk> queue = new ArrayBlockingQueue<>(100);

// Producer virtual thread — blocks when queue is full
Thread.ofVirtual().start(() -> {
    for (DataChunk chunk : dataSource) {
        queue.put(chunk); // Blocks if queue has 100 items — unmounts, does not pin
    }
    queue.put(DataChunk.POISON_PILL);
});

// Consumer virtual thread — blocks when queue is empty
Thread.ofVirtual().start(() -> {
    while (true) {
        DataChunk chunk = queue.take(); // Blocks if queue is empty
        if (chunk == DataChunk.POISON_PILL) break;
        process(chunk);
    }
});

This is the classical producer-consumer pattern. Before virtual threads, this pattern consumed OS threads while blocked. Now it consumes a few kilobytes of heap.

The Database Connection Pooling Gotcha

The problem

// Inside a typical JDBC driver or connection pool (simplified)
public class ConnectionPool {
    // synchronized causes pinning when blocking I/O happens inside
    public synchronized Connection getConnection() {
        while (availableConnections.isEmpty()) {
            wait(); // PINNED — this virtual thread cannot unmount
        }
        return availableConnections.remove(0);
    }
}

The fix

Replace synchronized with ReentrantLock:

public class ConnectionPool {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition available = lock.newCondition();

    public Connection getConnection() throws InterruptedException {
        lock.lock(); // Virtual thread parks here without pinning
        try {
            while (availableConnections.isEmpty()) {
                available.await(); // Unmounts cleanly — no pinning
            }
            return availableConnections.remove(0);
        } finally {
            lock.unlock();
        }
    }
}

Connection pool sizing

With traditional thread pools, you size the connection pool to match the thread pool. 200 threads → ~200 connections makes sense.

Use the JVM flag -Djdk.tracePinnedThreads=short during development to detect pinning:

Thread[#42,ForkJoinPool-1-worker-3,5,CarrierThreads]
    java.base/java.lang.VirtualThread$VThreadContinuation.onPinned(VirtualThread.java:183)
    java.base/java.lang.VirtualThread.parkOnCarrierThread(VirtualThread.java:556)

Benchmarks

All benchmarks were run on identical hardware: AWS c6i.4xlarge (16 vCPU, 32 GB RAM, Amazon Linux 2023). Each scenario was run 5 times with a 30-second warmup; results are the median of 5 runs.

✓Java 21.0.2 (GraalVM CE), flags: -Xmx8g -Djdk.virtualThreadScheduler.parallelism=16
✓Node.js 20.11.0, flags: --max-old-space-size=8192
✓Load generator: wrk2 running on a separate c6i.4xlarge, connected via 25 Gbps network

HTTP Request Throughput (simple JSON response, ~500 bytes)

Concurrent connections	Java (virtual threads)	Java (platform threads, 200 pool)	Node.js (cluster, 16 workers)
1,000	142,000 req/s	138,000 req/s	127,000 req/s
10,000	139,000 req/s	91,000 req/s	124,000 req/s
50,000	131,000 req/s	43,000 req/s	118,000 req/s
100,000	122,000 req/s	OOM / crashed	108,000 req/s

Database Query Throughput (PostgreSQL, simple SELECT, connection pool of 40)

Concurrent tasks	Java (virtual threads)	Java (platform threads, 200 pool)	Node.js (pg pool, 40 conn)
100	18,200 q/s	18,400 q/s	16,800 q/s
1,000	17,900 q/s	17,100 q/s	16,500 q/s
10,000	17,400 q/s	14,200 q/s	15,900 q/s
50,000	16,800 q/s	6,300 q/s	15,100 q/s

JSON Serialization (CPU-bound, 10 KB payload)

Metric	Java (Jackson)	Node.js (JSON.stringify)
Single-thread throughput	320,000 ops/s	145,000 ops/s
16-core parallel throughput	4,800,000 ops/s	2,100,000 ops/s (16 workers)
p99 latency (under load)	0.4 ms	1.1 ms

Decision Framework

Choose Java virtual threads when:

✓Your workload is I/O-bound with many concurrent blocking operations (database queries, HTTP calls, file I/O).
✓Your team has Java expertise and an existing Java ecosystem (Spring Boot 3.2+, Quarkus, Helidon).
✓You need fine-grained CPU-bound parallelism alongside I/O concurrency.
✓You are building services that need to handle 50K+ concurrent connections per instance.
✓You want to write straight-line blocking code without callback chains or reactive operators.

Choose Node.js when:

✓Your service is primarily an I/O proxy — receiving requests, calling other services, returning results — with minimal CPU processing.
✓Your team is JavaScript/TypeScript-native and shares a frontend codebase.
✓You are building real-time systems (WebSocket servers, chat, live dashboards) where the event-driven model is natural.
✓Startup time and cold-start latency matter (serverless, edge functions). Node.js starts in ~50 ms; a JVM needs 500 ms–2 seconds.
✓Your ecosystem is npm-centric, and rewriting in Java would mean abandoning well-tested libraries.

Neither is categorically better. The right choice depends on the workload profile, team capability, and existing infrastructure.

Case Study: Healthcare Data Aggregation Service

Stripe Systems Engineering benchmarked two implementations head-to-head to make an evidence-based architecture decision.

Benchmark Setup

Hardware: 3x AWS c6i.4xlarge (16 vCPU, 32 GB RAM) behind an NLB. Simulated hospital APIs running on a separate fleet, introducing 50–200ms random latency per response.

Node.js implementation: Node.js 20.11.0, Express 4.18 + undici HTTP client. Cluster mode with 16 workers. --max-old-space-size=12288.

Java virtual threads implementation (core fan-out)

public PatientRecord aggregate(String patientId, List<HospitalEndpoint> endpoints)
        throws InterruptedException, ExecutionException {

    try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
        List<Subtask<HospitalResponse>> tasks = endpoints.stream()
            .map(endpoint -> scope.fork(() -> {
                HttpRequest req = HttpRequest.newBuilder()
                    .uri(endpoint.buildUri(patientId))
                    .header("Authorization", "Bearer " + endpoint.getToken())
                    .timeout(Duration.ofSeconds(5))
                    .build();

                HttpResponse<String> resp = httpClient.send(
                    req, HttpResponse.BodyHandlers.ofString()
                );

                if (resp.statusCode() != 200) {
                    throw new HospitalApiException(endpoint.name(), resp.statusCode());
                }
                return parseResponse(endpoint.format(), resp.body());
            }))
            .toList();

        scope.joinUntil(Instant.now().plusSeconds(8));
        scope.throwIfFailed();

        List<HospitalResponse> responses = tasks.stream()
            .map(Subtask::get)
            .toList();

        return fhirNormalizer.normalize(patientId, responses);
    }
}

Node.js implementation (core fan-out)

async function aggregate(patientId, endpoints) {
  const controller = new AbortController();
  const timeout = setTimeout(() => controller.abort(), 8000);

  try {
    const responses = await Promise.all(
      endpoints.map(async (endpoint) => {
        const url = endpoint.buildUri(patientId);
        const resp = await fetch(url, {
          headers: { Authorization: `Bearer ${endpoint.token}` },
          signal: controller.signal,
        });

        if (!resp.ok) {
          throw new HospitalApiError(endpoint.name, resp.status);
        }
        return parseResponse(endpoint.format, await resp.text());
      })
    );

    return fhirNormalizer.normalize(patientId, responses);
  } finally {
    clearTimeout(timeout);
  }
}

Results Under Load

Aggregation throughput (requests/sec, each request fans out to ~25 hospital APIs):

Concurrent aggregation requests	Java (virtual threads)	Node.js (cluster, 16 workers)
100	520 req/s	490 req/s
500	510 req/s	475 req/s
1,000	495 req/s	440 req/s
5,000	470 req/s	380 req/s
10,000	440 req/s	310 req/s

p99 latency (ms):

Concurrent aggregation requests	Java (virtual threads)	Node.js (cluster, 16 workers)
100	620 ms	640 ms
500	710 ms	780 ms
1,000	890 ms	1,100 ms
5,000	1,400 ms	2,800 ms
10,000	1,900 ms	4,200 ms

Memory consumption (RSS, steady state at 5,000 concurrent requests):

Metric	Java (virtual threads)	Node.js (16 workers)
RSS per instance	6.2 GB	8.4 GB (16 × ~525 MB)
GC pause p99	3.1 ms (ZGC)	48 ms (V8 major GC)
Virtual thread / promise count	~125,000	~125,000 per worker
Carrier threads / event loops	16	16

Architecture Decision

Stripe Systems selected Java 21 virtual threads for this service based on three factors:

✓
SLA compliance at scale: The Java implementation met the 2-second p99 SLA at 3x the projected peak load (5,000 concurrent requests vs projected 1,500). The Node.js implementation breached SLA at 2x projected peak.
✓
Simpler error handling: StructuredTaskScope with joinUntil() provided timeout semantics and automatic cancellation of in-flight hospital API calls. The Node.js AbortController pattern required more boilerplate and was easier to get wrong — in early testing, we discovered several code paths where the abort signal was not propagated, leading to leaked connections.
✓
Operational predictability: ZGC's sub-10ms pause times meant tail latency was stable and predictable. V8 GC pauses introduced latency spikes that were harder to reason about under load.

The service has been in production for seven months, handling an average of 1,200 aggregation requests per second across 3 instances, with p99 latency at 820 ms.

Conclusion

Measure your workload. Benchmark on your hardware. Choose the model that fits the problem, not the one that fits the hype cycle.

Ready to discuss your project?

Get in Touch →

Related Services from Stripe Systems

Stripe Systems helps teams put the patterns covered in this article into production.

Backend Development

Server-side systems designed for correctness, observability, and horizontal scalability.

Learn more →

← Back to Blog

AI/MLFebruary 28, 2026

Agentic AI in the Enterprise: Designing Multi-Agent Systems with LangGraph and Tool Orchestration

Software DevelopmentFebruary 10, 2026

Agile vs Waterfall — Choosing the Right Methodology for Your Project

Engineering CultureMarch 5, 2026

AI-Assisted Code Review at Scale: How We Cut Review Cycle Time by 60% Without Sacrificing Architecture Standards

Engineering CultureFebruary 5, 2026

The AI-Augmented SDLC: How We've Embedded AI at Every Phase — From Requirements to Deployment

Quality AssuranceMarch 15, 2026

How AI Is Transforming Automated Testing — Unit Tests, Code Coverage, and E2E Integration

AI/MLMarch 19, 2026

AI Code Review Agents: How We Built a Custom Pipeline That Catches Architecture Violations, Not Just Bugs

Engineering CultureMarch 20, 2026

How Our Engineering Team Uses AI Tools Daily to Ship Faster, Catch More Bugs, and Write Better Code — A Practitioner's Honest Breakdown

Backend DevelopmentJanuary 15, 2026

API Gateway Patterns: BFF vs Aggregator vs Direct — Choosing for Your Stack

Cloud ComputingFebruary 24, 2026

AWS Lambda Cold Starts — Root Causes, Benchmarks, and 7 Proven Mitigation Strategies

Cloud ComputingFebruary 15, 2026

AWS vs Azure vs GCP for Startups in 2026 — An Honest Cost and Capability Breakdown

Mobile DevelopmentMarch 1, 2026

Choosing the Right Mobile Development Approach: Native vs Cross-Platform

DevOpsMarch 7, 2026

Building a Production-Grade CI/CD Pipeline for a Monorepo (GitHub Actions + Docker + Kubernetes)

Backend DevelopmentJanuary 29, 2026

Clean Architecture in .NET 8 — Structuring Enterprise Apps That Scale Without Rot

Mobile DevelopmentJanuary 6, 2026

CLEAN Architecture in Flutter — BLoC vs Riverpod for State Management

DevOpsFebruary 28, 2026

How DevOps and DevSecOps Integrate Into Enterprise Product Development From Day One

DevOpsJanuary 23, 2026

Docker Image Hardening for Production — Distroless, Non-Root Users, and Layer Optimization

Backend DevelopmentJanuary 18, 2026

Event-Driven Architecture with Kafka, NestJS, and Outbox Pattern — A Production Walkthrough

Cloud ComputingMarch 5, 2026

FinOps in Practice: How We Cut a Client's AWS Bill by 40% Without Touching Their Codebase

Mobile DevelopmentJanuary 10, 2026

Flutter vs React Native in 2026 — A Deep Technical Comparison for Enterprise Apps

DevOpsMarch 13, 2026

GitOps with ArgoCD and Terraform: The Infrastructure Deployment Workflow That Eliminates Drift

DevSecOpsFebruary 18, 2026

Infrastructure as Code Security: Detecting Misconfigurations with Checkov and OPA Before Deployment

DevOpsJanuary 25, 2026

Kubernetes Multi-Tenancy Patterns — Namespace Isolation vs Virtual Clusters vs Separate Clusters

AI/MLJanuary 18, 2026

LLM Cost Optimization at Scale — Prompt Caching, Model Routing, and Batch Inference in Production

Frontend DevelopmentMarch 2, 2026

Micro-Frontend Architecture at Scale: Module Federation with React and Webpack 5

Software DevelopmentJanuary 9, 2026

Microservices vs Monolith — Making the Right Architecture Decision

Cloud ComputingMarch 22, 2026

Multi-Cloud Architecture: Avoiding Vendor Lock-in Without Sacrificing Performance

Backend DevelopmentFebruary 21, 2026

NestJS Microservices with gRPC — Architecture Patterns for High-Throughput APIs

Staff AugmentationFebruary 27, 2026

Why an Offshore Development Centre (ODC) Beats a Distributed Freelance Model — And How Stripe Systems Sets One Up

Frontend DevelopmentFebruary 4, 2026

Building Offline-First PWAs with Next.js, Service Workers, and IndexedDB

Staff AugmentationFebruary 1, 2026

Beyond Cost Arbitrage: How Stripe Systems' Offshore Teams Deliver Senior-Level Architecture, Not Just Execution

Staff AugmentationFebruary 10, 2026

How to Onboard an Augmented Team Without Losing Velocity — A 90-Day Playbook for Engineering Leads

Staff AugmentationMarch 15, 2026

Onshore vs Offshore vs Nearshore Augmentation — A Decision Framework for CTOs Beyond Just Cost

AI/MLMarch 10, 2026

Building Production-Ready RAG Pipelines — Chunking Strategies, Vector DBs, and Evaluation Frameworks

Engineering CultureMarch 25, 2026

Prompt Engineering for Software Teams: The Internal Playbook We Built to Maximize Developer Output with LLMs

Staff AugmentationJanuary 5, 2026

The Real ROI of Offshore vs Nearshore vs Onshore Augmentation — A Data-Driven Cost-Benefit Framework for Engineering Leaders

Frontend DevelopmentMarch 16, 2026

Server Components vs Client Components in Next.js 14 — When to Use Which (And Why Most Teams Get It Wrong)

Staff AugmentationFebruary 13, 2026

Setting Up an ODC in India: Legal, Compliance, HR, and Infrastructure — What CTOs and Founders Actually Need to Know

DevSecOpsMarch 10, 2026

Shifting Security Left: Integrating SAST, DAST, and Secret Scanning into Your CI/CD Pipeline

DevSecOpsFebruary 20, 2026

SOC 2 Type II for Engineering Teams — What Developers Actually Need to Build and Change

TechnologyJanuary 12, 2026

Staff Augmentation — A Practical Guide for Engineering Leaders

Frontend DevelopmentJanuary 26, 2026

State Management Showdown: Zustand vs Redux Toolkit vs Jotai for Large React Codebases

Software DevelopmentJanuary 3, 2026

Why Test-Driven Development Is Non-Negotiable in Our Engineering Process

Most teams agree that automated tests are valuable. Far fewer teams write the tests *before* the implementation. The gap between those two positions is where the majority of preventable defects live.

DevOpsFebruary 15, 2026

Terraform at Scale: Remote State, Workspaces, and Module Versioning for Multi-Team Environments

Software DevelopmentMarch 15, 2026

Why Custom Software Development Matters for Growing Businesses

DevSecOpsJanuary 21, 2026

Zero-Trust API Security — mTLS, JWT Validation, and Rate Limiting in a Kubernetes-Native Stack

Cloud ComputingFebruary 7, 2026

Building a Zero-Trust Network on GCP with VPC Service Controls and Identity-Aware Proxy

Staff AugmentationApril 28, 2026

2026 Global Software Engineering Rate Benchmark — India vs US vs UK vs LATAM vs Eastern Europe

DevOpsApril 28, 2026

Java Virtual Threads (Project Loom) vs Node.js — Concurrency Models Compared for Backend Engineers

The Node.js Event Loop — What Actually Happens

Java Traditional Threading — The OS Thread Tax

Java Virtual Threads — How They Actually Work

Memory Footprint — The Numbers

CPU-Bound vs I/O-Bound — Where Each Model Breaks

Node.js and CPU-bound work

Virtual threads and CPU-bound work

I/O-bound work — the sweet spot for virtual threads

Structured Concurrency — Managing Concurrent Task Lifecycles

ShutdownOnFailure — fail fast

ShutdownOnSuccess — first result wins

Comparison with Promise.all() and Promise.race()

Backpressure Handling

Node.js streams

Java reactive streams

Virtual threads with blocking queues

The Database Connection Pooling Gotcha

The problem

The fix

Connection pool sizing

Benchmarks

HTTP Request Throughput (simple JSON response, ~500 bytes)

Database Query Throughput (PostgreSQL, simple SELECT, connection pool of 40)

JSON Serialization (CPU-bound, 10 KB payload)

Decision Framework

Case Study: Healthcare Data Aggregation Service

Benchmark Setup

Java virtual threads implementation (core fan-out)

Node.js implementation (core fan-out)

Results Under Load

Architecture Decision

Conclusion

Related Services from Stripe Systems

Backend Development

More Articles

Agentic AI in the Enterprise: Designing Multi-Agent Systems with LangGraph and Tool Orchestration

Agile vs Waterfall — Choosing the Right Methodology for Your Project

AI-Assisted Code Review at Scale: How We Cut Review Cycle Time by 60% Without Sacrificing Architecture Standards

The AI-Augmented SDLC: How We've Embedded AI at Every Phase — From Requirements to Deployment

How AI Is Transforming Automated Testing — Unit Tests, Code Coverage, and E2E Integration

AI Code Review Agents: How We Built a Custom Pipeline That Catches Architecture Violations, Not Just Bugs

How Our Engineering Team Uses AI Tools Daily to Ship Faster, Catch More Bugs, and Write Better Code — A Practitioner's Honest Breakdown

API Gateway Patterns: BFF vs Aggregator vs Direct — Choosing for Your Stack

AWS Lambda Cold Starts — Root Causes, Benchmarks, and 7 Proven Mitigation Strategies

AWS vs Azure vs GCP for Startups in 2026 — An Honest Cost and Capability Breakdown

Choosing the Right Mobile Development Approach: Native vs Cross-Platform

Building a Production-Grade CI/CD Pipeline for a Monorepo (GitHub Actions + Docker + Kubernetes)

Clean Architecture in .NET 8 — Structuring Enterprise Apps That Scale Without Rot

CLEAN Architecture in Flutter — BLoC vs Riverpod for State Management

How DevOps and DevSecOps Integrate Into Enterprise Product Development From Day One

Docker Image Hardening for Production — Distroless, Non-Root Users, and Layer Optimization

Event-Driven Architecture with Kafka, NestJS, and Outbox Pattern — A Production Walkthrough

FinOps in Practice: How We Cut a Client's AWS Bill by 40% Without Touching Their Codebase

Flutter vs React Native in 2026 — A Deep Technical Comparison for Enterprise Apps

GitOps with ArgoCD and Terraform: The Infrastructure Deployment Workflow That Eliminates Drift

Infrastructure as Code Security: Detecting Misconfigurations with Checkov and OPA Before Deployment

Kubernetes Multi-Tenancy Patterns — Namespace Isolation vs Virtual Clusters vs Separate Clusters

LLM Cost Optimization at Scale — Prompt Caching, Model Routing, and Batch Inference in Production

Micro-Frontend Architecture at Scale: Module Federation with React and Webpack 5

Microservices vs Monolith — Making the Right Architecture Decision

Multi-Cloud Architecture: Avoiding Vendor Lock-in Without Sacrificing Performance

NestJS Microservices with gRPC — Architecture Patterns for High-Throughput APIs

Why an Offshore Development Centre (ODC) Beats a Distributed Freelance Model — And How Stripe Systems Sets One Up

Building Offline-First PWAs with Next.js, Service Workers, and IndexedDB

Beyond Cost Arbitrage: How Stripe Systems' Offshore Teams Deliver Senior-Level Architecture, Not Just Execution

How to Onboard an Augmented Team Without Losing Velocity — A 90-Day Playbook for Engineering Leads

Onshore vs Offshore vs Nearshore Augmentation — A Decision Framework for CTOs Beyond Just Cost

Building Production-Ready RAG Pipelines — Chunking Strategies, Vector DBs, and Evaluation Frameworks

Prompt Engineering for Software Teams: The Internal Playbook We Built to Maximize Developer Output with LLMs

The Real ROI of Offshore vs Nearshore vs Onshore Augmentation — A Data-Driven Cost-Benefit Framework for Engineering Leaders

Server Components vs Client Components in Next.js 14 — When to Use Which (And Why Most Teams Get It Wrong)

Setting Up an ODC in India: Legal, Compliance, HR, and Infrastructure — What CTOs and Founders Actually Need to Know

Shifting Security Left: Integrating SAST, DAST, and Secret Scanning into Your CI/CD Pipeline

SOC 2 Type II for Engineering Teams — What Developers Actually Need to Build and Change

Staff Augmentation — A Practical Guide for Engineering Leaders

State Management Showdown: Zustand vs Redux Toolkit vs Jotai for Large React Codebases

Why Test-Driven Development Is Non-Negotiable in Our Engineering Process

Terraform at Scale: Remote State, Workspaces, and Module Versioning for Multi-Team Environments

Why Custom Software Development Matters for Growing Businesses