Skip to main content
Stripe SystemsStripe Systems
Backend Development📅 February 10, 2026· 17 min read

Java Virtual Threads (Project Loom) vs Node.js — Concurrency Models Compared for Backend Engineers

✍️
Stripe Systems Engineering

Backend concurrency is not a solved problem. It is a set of trade-offs that shift with every workload profile. Java 21 introduced virtual threads — lightweight threads managed by the JVM rather than the OS — and this fundamentally changes the calculus for Java backend engineers. Meanwhile, Node.js continues to serve millions of production systems with its single-threaded event loop.

This post dissects both models at the implementation level. No hand-waving. We will walk through memory layouts, scheduling mechanics, failure modes, and real benchmark data from a production healthcare integration service.

The Node.js Event Loop — What Actually Happens

Node.js runs JavaScript on a single thread. All user code executes in one call stack. Concurrency comes from non-blocking I/O and an event loop implemented by libuv.

The event loop has distinct phases, executed in order on each tick:

  1. Timers — Executes callbacks scheduled by setTimeout() and setInterval() whose threshold has elapsed.
  2. Pending callbacks — Executes I/O callbacks deferred from the previous tick (e.g., TCP error callbacks).
  3. Idle/Prepare — Internal housekeeping. Not relevant to application code.
  4. Poll — Retrieves new I/O events. Executes I/O-related callbacks (excluding timers, close callbacks, and setImmediate). If there are no timers scheduled, the loop blocks here waiting for I/O.
  5. Check — Executes setImmediate() callbacks.
  6. Close callbacks — Executes close event callbacks (e.g., socket.on('close', ...)).

Between each phase, Node.js drains the microtask queue — this is where resolved Promise callbacks and process.nextTick() callbacks execute. process.nextTick() always runs before other microtasks.

Here is how async/await maps onto this:

async function fetchUserOrders(userId) {
  // This line suspends the function and registers a microtask
  // for when the DB query completes
  const user = await db.query('SELECT * FROM users WHERE id = $1', [userId]);

  // When the I/O completes, libuv places the callback in the poll queue.
  // The resolved promise callback goes into the microtask queue.
  // This line runs in a subsequent event loop tick.
  const orders = await db.query(
    'SELECT * FROM orders WHERE user_id = $1', [user.id]
  );

  return { user, orders };
}

The critical constraint: the single thread must never block. Any synchronous CPU work longer than a few milliseconds degrades throughput for all concurrent connections. This is not a theoretical concern — a 50ms JSON parse of a large payload blocks 50ms of I/O processing for every other connected client.

Java Traditional Threading — The OS Thread Tax

Before virtual threads, Java's concurrency model was straightforward: one OS thread per concurrent task. The ExecutorService manages a pool of these threads:

ExecutorService pool = Executors.newFixedThreadPool(200);

for (int i = 0; i < 10_000; i++) {
    pool.submit(() -> {
        // Each task gets an OS thread from the pool
        var result = httpClient.send(request, HttpResponse.BodyHandlers.ofString());
        processResult(result.body());
    });
}

The problem is quantifiable. Each OS thread in a typical Linux/JVM configuration consumes:

  • Stack memory: ~1 MB default (-Xss1m). Even with -Xss256k, a thread still consumes 256 KB of committed memory.
  • Kernel data structures: ~8–16 KB per thread for the OS task struct, kernel stack, and page tables.
  • Context switch cost: 1–10 microseconds per switch on modern hardware. Under heavy contention, this becomes the dominant cost.

At 10,000 threads, you are consuming ~10 GB of stack memory alone. At 100,000 threads, most systems cannot even create them — Linux defaults to a maximum of ~32,000 threads per process (/proc/sys/kernel/threads-max), and the JVM will throw OutOfMemoryError well before that.

This is why Java servers historically used thread pools capped at 200–500 threads, which caps concurrent I/O operations at that same number.

Java Virtual Threads — How They Actually Work

Virtual threads are user-mode threads managed by the JVM. They are not OS threads. The JVM schedules virtual threads onto a small pool of OS threads called carrier threads (by default, one per CPU core, managed by a ForkJoinPool).

The mechanics:

  1. Mounting: When a virtual thread is ready to run, the JVM scheduler mounts it onto a carrier thread. The virtual thread's continuation (its saved stack) is loaded, and execution resumes.
  2. Unmounting: When a virtual thread performs a blocking operation (socket read, Thread.sleep(), Lock.lock()), the JVM unmounts it from the carrier thread. The continuation is saved to heap memory. The carrier thread is free to run another virtual thread.
  3. Continuations: Each virtual thread has a continuation — a representation of its call stack stored on the heap. This is what makes them lightweight. The continuation is typically a few hundred bytes to a few KB, depending on stack depth.
// Creating virtual threads directly
Thread vt = Thread.ofVirtual().name("worker-", 0).start(() -> {
    // This runs on a virtual thread.
    // Blocking calls here unmount this thread from the carrier,
    // freeing the carrier for other virtual threads.
    var response = httpClient.send(request, HttpResponse.BodyHandlers.ofString());
    process(response);
});

// Using the executor — preferred for task submission
try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
    IntStream.range(0, 100_000).forEach(i -> {
        executor.submit(() -> {
            // Each task gets its own virtual thread.
            // 100K virtual threads is routine — this is the point.
            Thread.sleep(Duration.ofSeconds(1));
            return fetchFromDatabase(i);
        });
    });
}
// ExecutorService.close() waits for all tasks to complete

Pinning is the critical failure mode. A virtual thread gets pinned to its carrier thread when:

  • It is inside a synchronized block or method and performs a blocking operation.
  • It calls native code via JNI that blocks.

When pinned, the virtual thread cannot unmount. The carrier thread is stuck, reducing the pool of available carriers. We will revisit this in the database pooling section.

Memory Footprint — The Numbers

Consider a server handling 1,000,000 concurrent connections, each waiting on I/O:

ModelMemory per connectionTotal for 1M connections
Java OS threads (1 MB stack)~1 MB~1 TB (impossible)
Java OS threads (256 KB stack)~256 KB~256 GB (impractical)
Java virtual threads~1–5 KB (heap continuation)~1–5 GB
Node.js event loop~1–2 KB (connection state in libuv + JS object)~1–2 GB

Virtual threads and Node.js converge on similar memory profiles for connection state. The difference is in how you write code against them. Node.js requires non-blocking APIs everywhere. Virtual threads let you write blocking code that behaves non-blockingly under the hood.

The JVM does carry a fixed overhead that Node.js does not: the JIT compiler, class metadata, garbage collector structures. A minimal Java process starts at ~50–100 MB. A minimal Node.js process starts at ~30–50 MB. This fixed cost is irrelevant at scale but matters for microservices on constrained containers.

CPU-Bound vs I/O-Bound — Where Each Model Breaks

Node.js and CPU-bound work

The event loop blocks on CPU work. There is no concurrent execution of JavaScript — period:

const crypto = require('crypto');

// This blocks the event loop for ~200ms on typical hardware.
// During this time, ZERO I/O callbacks are processed.
function hashPassword(password) {
  return crypto.pbkdf2Sync(password, 'salt', 100000, 64, 'sha512');
}

// Correct approach: offload to a worker thread
const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');

if (isMainThread) {
  function hashPasswordAsync(password) {
    return new Promise((resolve, reject) => {
      const worker = new Worker(__filename, { workerData: { password } });
      worker.on('message', resolve);
      worker.on('error', reject);
    });
  }
} else {
  const hash = crypto.pbkdf2Sync(workerData.password, 'salt', 100000, 64, 'sha512');
  parentPort.postMessage(hash.toString('hex'));
}

Worker threads help, but they have overhead: each spawns a new V8 isolate (~5–10 MB), and data transfer between threads requires serialization (structured clone or SharedArrayBuffer).

Virtual threads and CPU-bound work

Virtual threads do not help with CPU-bound tasks. They are scheduled cooperatively — a virtual thread that never blocks (because it is computing) never unmounts. It occupies a carrier thread for the entire computation:

// This virtual thread will monopolize a carrier thread
// for the entire duration of the computation.
Thread.ofVirtual().start(() -> {
    // Pure CPU work — no yield points
    BigInteger result = BigInteger.ONE;
    for (int i = 2; i <= 100_000; i++) {
        result = result.multiply(BigInteger.valueOf(i));
    }
});

For CPU-bound work, both models fall back to the same answer: use a pool of OS threads or worker threads sized to the number of CPU cores. Virtual threads are specifically designed for I/O-bound workloads where tasks spend most of their time waiting.

I/O-bound work — the sweet spot for virtual threads

This is where virtual threads shine. Code that reads like sequential blocking logic executes with the efficiency of asynchronous I/O:

// This looks blocking but runs efficiently on virtual threads.
// Each blocking call (getInputStream, readAllBytes) triggers an unmount.
String fetchUrl(URI uri) throws Exception {
    HttpURLConnection conn = (HttpURLConnection) uri.toURL().openConnection();
    conn.setRequestMethod("GET");
    try (InputStream is = conn.getInputStream()) {
        return new String(is.readAllBytes(), StandardCharsets.UTF_8);
    }
}

// Launch 50,000 concurrent HTTP fetches. Each gets a virtual thread.
try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
    List<Future<String>> futures = urls.stream()
        .map(url -> executor.submit(() -> fetchUrl(url)))
        .toList();

    for (var future : futures) {
        String body = future.get(); // Blocks this virtual thread, not a carrier
        process(body);
    }
}

The equivalent in Node.js requires Promise.all(), which is elegant for simple fan-out but becomes unwieldy for complex control flow with error handling, partial results, and timeouts:

const results = await Promise.all(
  urls.map(url =>
    fetch(url)
      .then(res => res.text())
      .catch(err => ({ error: err.message, url }))
  )
);

Structured Concurrency — Managing Concurrent Task Lifecycles

Java 21 introduced StructuredTaskScope (preview) to manage groups of concurrent tasks as a unit. This prevents the common problem of leaked threads and orphaned tasks.

ShutdownOnFailure — fail fast

If any subtask fails, cancel all remaining subtasks:

try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
    Subtask<User> userTask = scope.fork(() -> fetchUser(userId));
    Subtask<List<Order>> ordersTask = scope.fork(() -> fetchOrders(userId));
    Subtask<CreditScore> creditTask = scope.fork(() -> fetchCreditScore(userId));

    scope.join();           // Wait for all tasks to complete or one to fail
    scope.throwIfFailed();  // Propagate the first exception

    // All three succeeded — safe to access results
    return new UserProfile(
        userTask.get(),
        ordersTask.get(),
        creditTask.get()
    );
}
// Exiting the try-with-resources block cancels any incomplete subtasks.

ShutdownOnSuccess — first result wins

Return the first successful result, cancel the rest:

try (var scope = new StructuredTaskScope.ShutdownOnSuccess<String>()) {
    scope.fork(() -> fetchFromPrimaryDb(key));
    scope.fork(() -> fetchFromReplicaDb(key));
    scope.fork(() -> fetchFromCache(key));

    scope.join();
    return scope.result();  // Returns the first completed result
}

Comparison with Promise.all() and Promise.race()

BehaviorJava Structured ConcurrencyNode.js
All succeed or fail fastShutdownOnFailurePromise.all()
First success winsShutdownOnSuccessPromise.any()
All settle (no short-circuit)Custom scopePromise.allSettled()
First to settle (success or failure)Custom scopePromise.race()

The critical difference: StructuredTaskScope guarantees that no forked task outlives the scope. When the scope closes, all subtasks are cancelled and joined. In Node.js, a rejected promise in Promise.all() does not cancel the other promises — those HTTP requests continue to completion, consuming resources. You need AbortController for cancellation, which requires explicit wiring:

async function fetchUserProfile(userId) {
  const controller = new AbortController();
  const { signal } = controller;

  try {
    const [user, orders, credit] = await Promise.all([
      fetch(`/users/${userId}`, { signal }).then(r => r.json()),
      fetch(`/orders?user=${userId}`, { signal }).then(r => r.json()),
      fetch(`/credit/${userId}`, { signal }).then(r => r.json()),
    ]);
    return { user, orders, credit };
  } catch (err) {
    controller.abort(); // Cancel remaining requests on failure
    throw err;
  }
}

Backpressure Handling

Backpressure is the mechanism that prevents a fast producer from overwhelming a slow consumer.

Node.js streams

Node.js has built-in backpressure via the streams API. A writable stream has a highWaterMark (default 16 KB for byte streams, 16 objects for object streams). When the internal buffer exceeds this mark, write() returns false, signaling the producer to pause:

const { Transform } = require('stream');

const transformer = new Transform({
  highWaterMark: 1024 * 64, // 64 KB buffer
  transform(chunk, encoding, callback) {
    const processed = expensiveTransform(chunk);
    callback(null, processed);
  }
});

// Pipe automatically handles backpressure between streams.
// If transformer is slow, readableSource pauses automatically.
readableSource.pipe(transformer).pipe(writableDestination);

Java reactive streams

Before virtual threads, Java addressed backpressure through reactive streams (Flow.Publisher, Flow.Subscriber). This works but forces a reactive programming model onto the entire call chain:

// Reactive backpressure — complex, infectious API style
subscription.request(10); // Pull-based: subscriber requests 10 items

Virtual threads with blocking queues

Virtual threads offer a simpler alternative: use a bounded BlockingQueue. The producer blocks when the queue is full; the consumer blocks when it is empty. Because these are virtual threads, blocking is cheap — the JVM unmounts the blocked thread and reclaims the carrier:

BlockingQueue<DataChunk> queue = new ArrayBlockingQueue<>(100);

// Producer virtual thread — blocks when queue is full
Thread.ofVirtual().start(() -> {
    for (DataChunk chunk : dataSource) {
        queue.put(chunk); // Blocks if queue has 100 items — unmounts, does not pin
    }
    queue.put(DataChunk.POISON_PILL);
});

// Consumer virtual thread — blocks when queue is empty
Thread.ofVirtual().start(() -> {
    while (true) {
        DataChunk chunk = queue.take(); // Blocks if queue is empty
        if (chunk == DataChunk.POISON_PILL) break;
        process(chunk);
    }
});

This is the classical producer-consumer pattern. Before virtual threads, this pattern consumed OS threads while blocked. Now it consumes a few kilobytes of heap.

The Database Connection Pooling Gotcha

This is the most common production issue with virtual threads. Many JDBC drivers and connection pools use synchronized blocks internally. When a virtual thread enters a synchronized block and then performs a blocking I/O operation (like sending a query to the database), it gets pinned to the carrier thread.

The problem

// Inside a typical JDBC driver or connection pool (simplified)
public class ConnectionPool {
    // synchronized causes pinning when blocking I/O happens inside
    public synchronized Connection getConnection() {
        while (availableConnections.isEmpty()) {
            wait(); // PINNED — this virtual thread cannot unmount
        }
        return availableConnections.remove(0);
    }
}

If you have 8 carrier threads (on an 8-core machine) and 8 virtual threads get pinned inside synchronized blocks, all carrier threads are occupied. The remaining virtual threads — even thousands of them — cannot run. The system is effectively deadlocked.

The fix

Replace synchronized with ReentrantLock:

public class ConnectionPool {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition available = lock.newCondition();

    public Connection getConnection() throws InterruptedException {
        lock.lock(); // Virtual thread parks here without pinning
        try {
            while (availableConnections.isEmpty()) {
                available.await(); // Unmounts cleanly — no pinning
            }
            return availableConnections.remove(0);
        } finally {
            lock.unlock();
        }
    }
}

ReentrantLock and its associated Condition are virtual-thread-friendly. When a virtual thread calls lock.lock() and the lock is held, or calls condition.await(), it parks and unmounts from the carrier thread cleanly.

Connection pool sizing

With traditional thread pools, you size the connection pool to match the thread pool. 200 threads → ~200 connections makes sense.

With virtual threads, you can have 100,000 concurrent tasks but your database cannot handle 100,000 connections. You still need a bounded connection pool — typically 20–50 connections for most relational databases. The virtual threads will park while waiting for a connection, which is fine. The key is using a pool implementation that does not pin (HikariCP 5.1+ and most modern pools have been updated).

Use the JVM flag -Djdk.tracePinnedThreads=short during development to detect pinning:

Thread[#42,ForkJoinPool-1-worker-3,5,CarrierThreads]
    java.base/java.lang.VirtualThread$VThreadContinuation.onPinned(VirtualThread.java:183)
    java.base/java.lang.VirtualThread.parkOnCarrierThread(VirtualThread.java:556)

Benchmarks

All benchmarks were run on identical hardware: AWS c6i.4xlarge (16 vCPU, 32 GB RAM, Amazon Linux 2023). Each scenario was run 5 times with a 30-second warmup; results are the median of 5 runs.

  • Java 21.0.2 (GraalVM CE), flags: -Xmx8g -Djdk.virtualThreadScheduler.parallelism=16
  • Node.js 20.11.0, flags: --max-old-space-size=8192
  • Load generator: wrk2 running on a separate c6i.4xlarge, connected via 25 Gbps network

HTTP Request Throughput (simple JSON response, ~500 bytes)

Concurrent connectionsJava (virtual threads)Java (platform threads, 200 pool)Node.js (cluster, 16 workers)
1,000142,000 req/s138,000 req/s127,000 req/s
10,000139,000 req/s91,000 req/s124,000 req/s
50,000131,000 req/s43,000 req/s118,000 req/s
100,000122,000 req/sOOM / crashed108,000 req/s

Java platform threads collapse beyond 10K connections because the pool is saturated and queuing adds latency. Node.js maintains relatively consistent throughput because the event loop does not care how many connections are open — it processes events as they arrive.

Database Query Throughput (PostgreSQL, simple SELECT, connection pool of 40)

Concurrent tasksJava (virtual threads)Java (platform threads, 200 pool)Node.js (pg pool, 40 conn)
10018,200 q/s18,400 q/s16,800 q/s
1,00017,900 q/s17,100 q/s16,500 q/s
10,00017,400 q/s14,200 q/s15,900 q/s
50,00016,800 q/s6,300 q/s15,100 q/s

Database throughput is bottlenecked by the connection pool and database itself. All three approaches converge near the database's limit at low concurrency. The difference emerges under load: virtual threads park efficiently while waiting for a connection, platform threads consume memory and context-switch overhead, and Node.js performs well here since database I/O is naturally async in the pg driver.

JSON Serialization (CPU-bound, 10 KB payload)

MetricJava (Jackson)Node.js (JSON.stringify)
Single-thread throughput320,000 ops/s145,000 ops/s
16-core parallel throughput4,800,000 ops/s2,100,000 ops/s (16 workers)
p99 latency (under load)0.4 ms1.1 ms

Java's JIT compiler produces more optimized machine code for tight loops. Node.js V8 JIT is excellent but operates under more constraints (dynamic typing, shape polymorphism). For CPU-bound serialization, Java consistently outperforms Node.js by 2–2.5x.

Decision Framework

Choose Java virtual threads when:

  • Your workload is I/O-bound with many concurrent blocking operations (database queries, HTTP calls, file I/O).
  • Your team has Java expertise and an existing Java ecosystem (Spring Boot 3.2+, Quarkus, Helidon).
  • You need fine-grained CPU-bound parallelism alongside I/O concurrency.
  • You are building services that need to handle 50K+ concurrent connections per instance.
  • You want to write straight-line blocking code without callback chains or reactive operators.

Choose Node.js when:

  • Your service is primarily an I/O proxy — receiving requests, calling other services, returning results — with minimal CPU processing.
  • Your team is JavaScript/TypeScript-native and shares a frontend codebase.
  • You are building real-time systems (WebSocket servers, chat, live dashboards) where the event-driven model is natural.
  • Startup time and cold-start latency matter (serverless, edge functions). Node.js starts in ~50 ms; a JVM needs 500 ms–2 seconds.
  • Your ecosystem is npm-centric, and rewriting in Java would mean abandoning well-tested libraries.

Neither is categorically better. The right choice depends on the workload profile, team capability, and existing infrastructure.

Case Study: Healthcare Data Aggregation Service

A healthcare client needed a service to aggregate patient records from 200+ hospital APIs in real time. Each aggregation request fans out to 15–40 hospital endpoints simultaneously, collects responses, normalizes the data into FHIR-compliant resources, and returns a unified patient record. The SLA required p99 latency under 2 seconds with sustained throughput of 500 aggregation requests per second.

Stripe Systems Engineering benchmarked two implementations head-to-head to make an evidence-based architecture decision.

Benchmark Setup

Hardware: 3x AWS c6i.4xlarge (16 vCPU, 32 GB RAM) behind an NLB. Simulated hospital APIs running on a separate fleet, introducing 50–200ms random latency per response.

Java implementation: Java 21.0.2 (GraalVM CE), Spring Boot 3.2.3 with virtual threads enabled (spring.threads.virtual.enabled=true). JVM flags: -Xmx12g -Xms12g -XX:+UseZGC -Djdk.virtualThreadScheduler.parallelism=16. HTTP client: java.net.http.HttpClient (supports virtual threads natively).

Node.js implementation: Node.js 20.11.0, Express 4.18 + undici HTTP client. Cluster mode with 16 workers. --max-old-space-size=12288.

Both implementations used the same normalization logic (ported between Java and TypeScript), the same PostgreSQL database for caching (HikariCP / pg pool, 40 connections each), and identical API endpoint contracts.

Java virtual threads implementation (core fan-out)

public PatientRecord aggregate(String patientId, List<HospitalEndpoint> endpoints)
        throws InterruptedException, ExecutionException {

    try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
        List<Subtask<HospitalResponse>> tasks = endpoints.stream()
            .map(endpoint -> scope.fork(() -> {
                HttpRequest req = HttpRequest.newBuilder()
                    .uri(endpoint.buildUri(patientId))
                    .header("Authorization", "Bearer " + endpoint.getToken())
                    .timeout(Duration.ofSeconds(5))
                    .build();

                HttpResponse<String> resp = httpClient.send(
                    req, HttpResponse.BodyHandlers.ofString()
                );

                if (resp.statusCode() != 200) {
                    throw new HospitalApiException(endpoint.name(), resp.statusCode());
                }
                return parseResponse(endpoint.format(), resp.body());
            }))
            .toList();

        scope.joinUntil(Instant.now().plusSeconds(8));
        scope.throwIfFailed();

        List<HospitalResponse> responses = tasks.stream()
            .map(Subtask::get)
            .toList();

        return fhirNormalizer.normalize(patientId, responses);
    }
}

Node.js implementation (core fan-out)

async function aggregate(patientId, endpoints) {
  const controller = new AbortController();
  const timeout = setTimeout(() => controller.abort(), 8000);

  try {
    const responses = await Promise.all(
      endpoints.map(async (endpoint) => {
        const url = endpoint.buildUri(patientId);
        const resp = await fetch(url, {
          headers: { Authorization: `Bearer ${endpoint.token}` },
          signal: controller.signal,
        });

        if (!resp.ok) {
          throw new HospitalApiError(endpoint.name, resp.status);
        }
        return parseResponse(endpoint.format, await resp.text());
      })
    );

    return fhirNormalizer.normalize(patientId, responses);
  } finally {
    clearTimeout(timeout);
  }
}

Results Under Load

Aggregation throughput (requests/sec, each request fans out to ~25 hospital APIs):

Concurrent aggregation requestsJava (virtual threads)Node.js (cluster, 16 workers)
100520 req/s490 req/s
500510 req/s475 req/s
1,000495 req/s440 req/s
5,000470 req/s380 req/s
10,000440 req/s310 req/s

At 10K concurrent aggregation requests, each fanning out to ~25 endpoints, the system handles ~250,000 concurrent outbound HTTP connections. The Java implementation maintained 440 req/s; Node.js dropped to 310 req/s as garbage collection pressure from 250K in-flight promise chains increased GC pause times.

p99 latency (ms):

Concurrent aggregation requestsJava (virtual threads)Node.js (cluster, 16 workers)
100620 ms640 ms
500710 ms780 ms
1,000890 ms1,100 ms
5,0001,400 ms2,800 ms
10,0001,900 ms4,200 ms

The latency divergence at scale is the most significant finding. At 5,000 concurrent requests, Node.js breached the 2-second SLA at p99. Java virtual threads stayed within SLA up to ~8,000 concurrent requests.

Memory consumption (RSS, steady state at 5,000 concurrent requests):

MetricJava (virtual threads)Node.js (16 workers)
RSS per instance6.2 GB8.4 GB (16 × ~525 MB)
GC pause p993.1 ms (ZGC)48 ms (V8 major GC)
Virtual thread / promise count~125,000~125,000 per worker
Carrier threads / event loops1616

The GC pause difference is critical for tail latency. ZGC keeps p99 pauses under 10 ms regardless of heap size. V8's generational GC produces occasional major pauses of 30–80 ms under heavy allocation, which directly impacts p99 response latency.

Architecture Decision

Stripe Systems selected Java 21 virtual threads for this service based on three factors:

  1. SLA compliance at scale: The Java implementation met the 2-second p99 SLA at 3x the projected peak load (5,000 concurrent requests vs projected 1,500). The Node.js implementation breached SLA at 2x projected peak.

  2. Simpler error handling: StructuredTaskScope with joinUntil() provided timeout semantics and automatic cancellation of in-flight hospital API calls. The Node.js AbortController pattern required more boilerplate and was easier to get wrong — in early testing, we discovered several code paths where the abort signal was not propagated, leading to leaked connections.

  3. Operational predictability: ZGC's sub-10ms pause times meant tail latency was stable and predictable. V8 GC pauses introduced latency spikes that were harder to reason about under load.

The service has been in production for seven months, handling an average of 1,200 aggregation requests per second across 3 instances, with p99 latency at 820 ms.

Conclusion

Virtual threads do not make Java faster. They make Java capable of handling more concurrent I/O-bound work with fewer resources and simpler code. Node.js does not need virtual threads — its event loop already handles massive concurrency for I/O-bound workloads. The trade-offs are in programming model complexity, CPU-bound performance, and behavior under extreme concurrency.

Measure your workload. Benchmark on your hardware. Choose the model that fits the problem, not the one that fits the hype cycle.

Ready to discuss your project?

Get in Touch →
← Back to Blog

More Articles