Stripe Systems
Backend Development · February 21, 2026 · 19 min read

NestJS Microservices with gRPC — Architecture Patterns for High-Throughput APIs

Stripe Systems Engineering

Why gRPC for Inter-Service Communication

REST and GraphQL dominate client-facing APIs for good reason: browser support, tooling maturity, and developer familiarity. But for service-to-service communication inside a cluster, gRPC offers measurable advantages that compound at scale.

Binary Serialization vs JSON

Protocol Buffers encode data in a compact binary format. A JSON payload carrying a shipment status update might look like this:

{
  "shipment_id": "SHP-2025-00482931",
  "status": "IN_TRANSIT",
  "location": {
    "latitude": 37.7749,
    "longitude": -122.4194
  },
  "timestamp": "2025-11-05T14:32:00Z",
  "carrier_code": "FEDX",
  "weight_kg": 12.5
}

That's roughly 220 bytes on the wire. The equivalent protobuf encoding is around 62 bytes. At 50,000 messages per minute, that 158-byte difference saves roughly 475 MB/hour of bandwidth. JSON also requires schema validation at runtime. Protobuf enforces the schema at compile time through generated code: field names and types are checked by the compiler, so a structurally malformed message never reaches the wire.

HTTP/2 Multiplexing

REST over HTTP/1.1 serializes requests on each connection: you either pay connection setup per request or contend on a keep-alive pool where each connection carries one in-flight request at a time (head-of-line blocking). gRPC runs on HTTP/2, which multiplexes multiple streams over a single TCP connection. This means:

  • No connection setup overhead per RPC call
  • Header compression via HPACK reduces repeated header bytes
  • Server push and flow control are built into the protocol

When gRPC Is Not the Right Choice

gRPC is a poor fit when:

  • Browser clients need direct access. gRPC-Web exists but adds a proxy layer and loses bidirectional streaming. Use REST or GraphQL for public APIs.
  • Human readability matters. Debugging protobuf requires tooling. JSON you can curl and read.
  • The team is small and services are few. The upfront cost of proto file management, code generation pipelines, and tooling setup only pays off with multiple services and high message volumes.
  • You need ad-hoc querying. GraphQL lets clients specify exactly the data shape they need. gRPC has fixed request/response shapes defined in the proto.

The decision is not REST vs gRPC — it's about picking the right tool for each communication boundary.


Protobuf Schema Design

Protobuf schemas are the contract between services. Getting the schema design right prevents painful migrations later.

Message Types and Service Definitions

syntax = "proto3";

package logistics.shipment.v1;

import "google/protobuf/timestamp.proto";
import "google/protobuf/empty.proto";

service ShipmentTrackingService {
  rpc GetShipment(GetShipmentRequest) returns (ShipmentResponse);
  rpc UpdateStatus(UpdateStatusRequest) returns (ShipmentResponse);
  rpc StreamUpdates(StreamUpdatesRequest) returns (stream ShipmentEvent);
  rpc BulkIngest(stream ShipmentEvent) returns (BulkIngestResponse);
}

message GetShipmentRequest {
  string shipment_id = 1;
}

message UpdateStatusRequest {
  string shipment_id = 1;
  ShipmentStatus status = 2;
  Location location = 3;
  string notes = 4;
}

message ShipmentResponse {
  string shipment_id = 1;
  ShipmentStatus status = 2;
  Location origin = 3;
  Location destination = 4;
  repeated ShipmentEvent history = 5;
  google.protobuf.Timestamp created_at = 6;
  google.protobuf.Timestamp updated_at = 7;
}

message ShipmentEvent {
  string event_id = 1;
  string shipment_id = 2;
  ShipmentStatus status = 3;
  Location location = 4;
  google.protobuf.Timestamp timestamp = 5;
  oneof detail {
    DelayInfo delay = 6;
    CustomsInfo customs = 7;
    DeliveryAttempt delivery_attempt = 8;
  }
}

message Location {
  double latitude = 1;
  double longitude = 2;
  string address = 3;
  string city = 4;
  string country_code = 5;
}

message DelayInfo {
  string reason = 1;
  int32 estimated_delay_minutes = 2;
}

message CustomsInfo {
  string declaration_id = 1;
  CustomsStatus customs_status = 2;
}

message DeliveryAttempt {
  int32 attempt_number = 1;
  string failure_reason = 2;
}

enum ShipmentStatus {
  SHIPMENT_STATUS_UNSPECIFIED = 0;
  CREATED = 1;
  PICKED_UP = 2;
  IN_TRANSIT = 3;
  OUT_FOR_DELIVERY = 4;
  DELIVERED = 5;
  FAILED_DELIVERY = 6;
  RETURNED = 7;
}

enum CustomsStatus {
  CUSTOMS_STATUS_UNSPECIFIED = 0;
  PENDING_CLEARANCE = 1;
  CLEARED = 2;
  HELD = 3;
}

message StreamUpdatesRequest {
  repeated string shipment_ids = 1;
  google.protobuf.Timestamp since = 2;
}

message BulkIngestResponse {
  int32 accepted = 1;
  int32 rejected = 2;
  repeated string failed_event_ids = 3;
}

Versioning Rules

Protobuf's backward compatibility hinges on field numbers:

  • Never reuse a field number. Once assigned, a field number is permanent. If you remove a field, mark it as reserved.
  • Never change a field's type. int32 and sint32 use different wire encodings, so swapping one for the other corrupts decoded values; even wire-compatible changes like int32 to int64 can silently truncate data in old consumers that still parse the field as int32.
  • Add new fields with new numbers. Old consumers will ignore unknown fields.
  • Use reserved to prevent accidental reuse:
message ShipmentResponse {
  reserved 8, 9;            // removed: eta_minutes, priority
  reserved "eta_minutes", "priority";
  // ...existing fields...
}
  • Use oneof for variant types instead of stuffing optional fields into a flat message. It makes the schema self-documenting and ensures only one variant is set.
  • Package with version suffixes (logistics.shipment.v1) so you can run v1 and v2 side by side during migration.

NestJS gRPC Transport Layer

Module Setup

Install the required packages:

npm install @nestjs/microservices @grpc/grpc-js @grpc/proto-loader

Configure the gRPC microservice in your bootstrap:

// src/main.ts
import { NestFactory } from '@nestjs/core';
import { MicroserviceOptions, Transport } from '@nestjs/microservices';
import { join } from 'path';
import { AppModule } from './app.module';

async function bootstrap() {
  const app = await NestFactory.createMicroservice<MicroserviceOptions>(
    AppModule,
    {
      transport: Transport.GRPC,
      options: {
        package: 'logistics.shipment.v1',
        protoPath: join(__dirname, '../proto/shipment.proto'),
        url: '0.0.0.0:50051',
        loader: {
          keepCase: true,
          longs: String,
          enums: String,
          defaults: true,
          oneofs: true,
        },
        channelOptions: {
          'grpc.max_receive_message_length': 4 * 1024 * 1024,
          'grpc.max_send_message_length': 4 * 1024 * 1024,
          'grpc.keepalive_time_ms': 10000,
          'grpc.keepalive_timeout_ms': 5000,
          'grpc.keepalive_permit_without_calls': 1,
        },
      },
    },
  );

  await app.listen();
  console.log('gRPC server listening on 0.0.0.0:50051');
}
bootstrap();

Controller with @GrpcMethod

// src/shipment/shipment.controller.ts
import { Controller } from '@nestjs/common';
import { GrpcMethod, GrpcStreamMethod } from '@nestjs/microservices';
import { Metadata, ServerUnaryCall } from '@grpc/grpc-js';
import { ShipmentService } from './shipment.service';
import {
  GetShipmentRequest,
  ShipmentResponse,
  UpdateStatusRequest,
} from '../generated/shipment';

@Controller()
export class ShipmentController {
  constructor(private readonly shipmentService: ShipmentService) {}

  @GrpcMethod('ShipmentTrackingService', 'GetShipment')
  async getShipment(
    data: GetShipmentRequest,
    metadata: Metadata,
    call: ServerUnaryCall<GetShipmentRequest, ShipmentResponse>,
  ): Promise<ShipmentResponse> {
    const tenantId = metadata.get('x-tenant-id')[0]?.toString();
    return this.shipmentService.findById(data.shipment_id, tenantId);
  }

  @GrpcMethod('ShipmentTrackingService', 'UpdateStatus')
  async updateStatus(
    data: UpdateStatusRequest,
    metadata: Metadata,
  ): Promise<ShipmentResponse> {
    return this.shipmentService.updateStatus(
      data.shipment_id,
      data.status,
      data.location,
      data.notes,
    );
  }
}

The third parameter of a @GrpcMethod handler gives you access to the raw gRPC call object, which is useful for setting response metadata or trailing metadata.

Client Setup

To call another gRPC service from NestJS, register the client in a module:

// src/shipment/shipment.module.ts
import { Module } from '@nestjs/common';
import { ClientsModule, Transport } from '@nestjs/microservices';
import { join } from 'path';
import { ShipmentController } from './shipment.controller';
import { ShipmentService } from './shipment.service';

@Module({
  imports: [
    ClientsModule.register([
      {
        name: 'ROUTING_PACKAGE',
        transport: Transport.GRPC,
        options: {
          package: 'logistics.routing.v1',
          protoPath: join(__dirname, '../../proto/routing.proto'),
          url: 'routing-service:50051',
          loader: {
            keepCase: true,
            longs: String,
            enums: String,
            defaults: true,
            oneofs: true,
          },
        },
      },
    ]),
  ],
  controllers: [ShipmentController],
  providers: [ShipmentService],
})
export class ShipmentModule {}

Then inject and use the client:

// src/shipment/shipment.service.ts
import { Inject, Injectable, OnModuleInit } from '@nestjs/common';
import { ClientGrpc } from '@nestjs/microservices';
import { Observable, firstValueFrom } from 'rxjs';
// Location and Route come from the generated routing types
import { Location, Route } from '../generated/routing';

interface RoutingService {
  CalculateRoute(data: { origin: Location; destination: Location }): Observable<Route>;
}

@Injectable()
export class ShipmentService implements OnModuleInit {
  private routingService: RoutingService;

  constructor(@Inject('ROUTING_PACKAGE') private client: ClientGrpc) {}

  onModuleInit() {
    this.routingService =
      this.client.getService<RoutingService>('RoutingService');
  }

  async calculateRoute(origin: Location, destination: Location): Promise<Route> {
    return firstValueFrom(
      this.routingService.CalculateRoute({ origin, destination }),
    );
  }
}

Note the onModuleInit hook. getService must be called after the client connection is established, not in the constructor.


Streaming Patterns

gRPC supports four communication patterns. Each serves a distinct purpose.

Unary RPC

Standard request-response. Use for simple lookups and mutations. Already shown above with GetShipment.

Server-Side Streaming

The server sends a stream of messages in response to a single client request. Ideal for real-time feeds and change notifications.

@GrpcMethod('ShipmentTrackingService', 'StreamUpdates')
streamUpdates(data: StreamUpdatesRequest): Observable<ShipmentEvent> {
  // For server streaming in NestJS, the handler receives the single
  // request and returns an Observable that emits multiple responses.
  // (@GrpcStreamMethod is for client-streaming and bidirectional RPCs,
  // where the request itself arrives as an Observable.)
  const requestedIds = data.shipment_ids;
  return new Observable<ShipmentEvent>((subscriber) => {
    // Subscribe to shipment events from your event bus
    const subscription = this.eventBus
      .pipe(
        filter((event) => requestedIds.includes(event.shipment_id)),
      )
      .subscribe({
        next: (event) => subscriber.next(event),
        error: (err) => subscriber.error(err),
      });

    return () => subscription.unsubscribe();
  });
}

A more practical implementation delegates to an async generator, converted to an Observable with RxJS from (which accepts async iterables):

@GrpcMethod('ShipmentTrackingService', 'StreamUpdates')
streamUpdates(data: StreamUpdatesRequest): Observable<ShipmentEvent> {
  return from(this.generateUpdates(data));
}

private async *generateUpdates(
  data: StreamUpdatesRequest,
): AsyncGenerator<ShipmentEvent> {
  const { shipment_ids, since } = data;

  // First, replay missed events
  const missed = await this.eventStore.getEventsSince(shipment_ids, since);
  for (const event of missed) {
    yield event;
  }

  // Then stream live events
  const channel = this.eventBus.subscribe(shipment_ids);
  try {
    for await (const event of channel) {
      yield event;
    }
  } finally {
    channel.unsubscribe();
  }
}

Client-Side Streaming

The client sends a stream of messages, and the server responds with a single message after the stream completes. Use for bulk uploads and batch ingestion.

@GrpcStreamMethod('ShipmentTrackingService', 'BulkIngest')
bulkIngest(
  messages: Observable<ShipmentEvent>,
): Observable<BulkIngestResponse> {
  // concatMap processes events one at a time and, unlike a bare async
  // callback inside subscribe(), guarantees every event has been handled
  // before the final counts are emitted.
  return messages.pipe(
    concatMap(async (event) => {
      try {
        await this.shipmentService.processEvent(event);
        return { ok: true, id: event.event_id };
      } catch {
        return { ok: false, id: event.event_id };
      }
    }),
    reduce(
      (acc, result) => {
        if (result.ok) {
          acc.accepted++;
        } else {
          acc.rejected++;
          acc.failed_event_ids.push(result.id);
        }
        return acc;
      },
      { accepted: 0, rejected: 0, failed_event_ids: [] as string[] },
    ),
  );
}

Bidirectional Streaming

Both client and server send streams concurrently. Use for real-time collaboration, live tracking dashboards, and chat-like protocols.

service LiveTrackingService {
  rpc TrackVehicles(stream VehiclePosition) returns (stream RouteUpdate);
}
@GrpcStreamMethod('LiveTrackingService', 'TrackVehicles')
trackVehicles(
  positions: Observable<VehiclePosition>,
): Observable<RouteUpdate> {
  // concatMap keeps evaluations ordered; an async callback inside
  // subscribe() would race and could emit updates out of order.
  return positions.pipe(
    concatMap(async (position) => {
      // Process incoming position, compute route deviation
      const update = await this.routeEngine.evaluate(position);
      return { position, update };
    }),
    filter(({ update }) => update.deviation_meters > 500),
    map(({ position, update }) => ({
      vehicle_id: position.vehicle_id,
      recalculated_route: update.new_route,
      reason: 'ROUTE_DEVIATION',
    })),
  );
}

Error Handling with gRPC Status Codes

gRPC defines 16 status codes. Mapping business errors to the correct status code is critical — clients depend on these codes for retry logic and error categorization.

gRPC Status          HTTP Equivalent   Use Case
─────────────────────────────────────────────────────────────────
OK                   200               Success
NOT_FOUND            404               Resource doesn't exist
ALREADY_EXISTS       409               Duplicate creation
INVALID_ARGUMENT     400               Validation failure
PERMISSION_DENIED    403               Authorization failure
UNAUTHENTICATED      401               Missing/invalid credentials
RESOURCE_EXHAUSTED   429               Rate limit exceeded
INTERNAL             500               Unexpected server error
UNAVAILABLE          503               Transient failure (retry safe)
DEADLINE_EXCEEDED    504               Timeout

Error Interceptor

// src/interceptors/grpc-error.interceptor.ts
import {
  CallHandler,
  ExecutionContext,
  Injectable,
  NestInterceptor,
} from '@nestjs/common';
import { RpcException } from '@nestjs/microservices';
import { Observable, catchError, throwError } from 'rxjs';
import { status as GrpcStatus } from '@grpc/grpc-js';

@Injectable()
export class GrpcErrorInterceptor implements NestInterceptor {
  intercept(context: ExecutionContext, next: CallHandler): Observable<any> {
    return next.handle().pipe(
      catchError((error) => {
        if (error instanceof RpcException) {
          return throwError(() => error);
        }

        const grpcError = this.mapToGrpcError(error);
        return throwError(() => new RpcException(grpcError));
      }),
    );
  }

  private mapToGrpcError(error: any): { code: number; message: string } {
    if (error.name === 'ValidationError') {
      return {
        code: GrpcStatus.INVALID_ARGUMENT,
        message: `Validation failed: ${error.message}`,
      };
    }

    if (error.name === 'EntityNotFoundError') {
      return {
        code: GrpcStatus.NOT_FOUND,
        message: error.message,
      };
    }

    if (error.code === 'ER_DUP_ENTRY' || error.code === '23505') {
      return {
        code: GrpcStatus.ALREADY_EXISTS,
        message: 'Resource already exists',
      };
    }

    if (error.name === 'UnauthorizedError') {
      return {
        code: GrpcStatus.UNAUTHENTICATED,
        message: 'Authentication required',
      };
    }

    // Default: don't leak internal details
    console.error('Unhandled gRPC error:', error);
    return {
      code: GrpcStatus.INTERNAL,
      message: 'Internal server error',
    };
  }
}

Apply it globally:

// src/app.module.ts
import { APP_INTERCEPTOR } from '@nestjs/core';

@Module({
  providers: [
    {
      provide: APP_INTERCEPTOR,
      useClass: GrpcErrorInterceptor,
    },
  ],
})
export class AppModule {}

One common mistake: throwing HTTP exceptions (NotFoundException, BadRequestException) from shared code. These don't translate to gRPC status codes automatically. Always throw RpcException with explicit gRPC status codes, or use an interceptor like the one above to catch and convert them.


Health Checking and Graceful Shutdown

gRPC Health Checking Protocol

gRPC defines a standard health checking protocol in grpc.health.v1. Load balancers and orchestrators (Kubernetes) use this to determine if a service can accept traffic.

// health.proto (standard — don't modify)
syntax = "proto3";

package grpc.health.v1;

service Health {
  rpc Check(HealthCheckRequest) returns (HealthCheckResponse);
  rpc Watch(HealthCheckRequest) returns (stream HealthCheckResponse);
}

message HealthCheckRequest {
  string service = 1;
}

message HealthCheckResponse {
  enum ServingStatus {
    UNKNOWN = 0;
    SERVING = 1;
    NOT_SERVING = 2;
    SERVICE_UNKNOWN = 3;
  }
  ServingStatus status = 1;
}

NestJS Implementation

// src/health/health.controller.ts
import { Controller } from '@nestjs/common';
import { GrpcMethod } from '@nestjs/microservices';

interface HealthCheckRequest {
  service: string;
}

interface HealthCheckResponse {
  status: 'UNKNOWN' | 'SERVING' | 'NOT_SERVING' | 'SERVICE_UNKNOWN';
}

@Controller()
export class HealthController {
  private isShuttingDown = false;
  private readonly serviceChecks = new Map<string, () => Promise<boolean>>();

  registerCheck(serviceName: string, check: () => Promise<boolean>) {
    this.serviceChecks.set(serviceName, check);
  }

  setShuttingDown() {
    this.isShuttingDown = true;
  }

  @GrpcMethod('Health', 'Check')
  async check(data: HealthCheckRequest): Promise<HealthCheckResponse> {
    if (this.isShuttingDown) {
      return { status: 'NOT_SERVING' };
    }

    if (data.service === '') {
      // Empty service name = overall server health
      return { status: 'SERVING' };
    }

    const check = this.serviceChecks.get(data.service);
    if (!check) {
      return { status: 'SERVICE_UNKNOWN' };
    }

    const healthy = await check();
    return { status: healthy ? 'SERVING' : 'NOT_SERVING' };
  }
}

Graceful Shutdown

// src/main.ts (additions)
async function bootstrap() {
  const app = await NestFactory.createMicroservice<MicroserviceOptions>(
    AppModule,
    { /* ...gRPC options */ },
  );

  const healthController = app.get(HealthController);

  const shutdown = async (signal: string) => {
    console.log(`Received ${signal}. Starting graceful shutdown...`);

    // 1. Mark as not serving — stops new requests from load balancer
    healthController.setShuttingDown();

    // 2. Give the load balancer time to observe NOT_SERVING and
    //    let in-flight requests finish
    await new Promise((resolve) => setTimeout(resolve, 10_000));

    // 3. Close the application
    await app.close();
    process.exit(0);
  };

  process.on('SIGTERM', () => shutdown('SIGTERM'));
  process.on('SIGINT', () => shutdown('SIGINT'));

  await app.listen();
}

The 10-second drain window is important. Kubernetes sends SIGTERM, then waits terminationGracePeriodSeconds (default 30s) before sending SIGKILL. You need enough time for the load balancer to detect the health check failure and stop routing traffic, plus time for in-flight requests to complete.


Load Balancing gRPC

This is where most teams get burned. Standard load balancing strategies that work for REST APIs break with gRPC.

The HTTP/2 Multiplexing Problem

With HTTP/1.1, each request uses a separate TCP connection (or is serialized on a keep-alive connection). An L4 load balancer distributes connections across backends, and since each connection carries one request at a time, load distributes evenly.

gRPC uses HTTP/2, which multiplexes all RPCs over a single long-lived TCP connection. An L4 load balancer sees one connection and routes all traffic to a single backend. Result: one server gets all the load while others sit idle.

Solutions

L7 Load Balancing (Envoy, Linkerd, Istio): These proxies understand HTTP/2 framing and can distribute individual RPCs across backends, even within a single client connection. Envoy is the most common choice.

# Envoy configuration snippet
clusters:
  - name: shipment_service
    type: STRICT_DNS
    lb_policy: ROUND_ROBIN
    http2_protocol_options: {}
    load_assignment:
      cluster_name: shipment_service
      endpoints:
        - lb_endpoints:
            - endpoint:
                address:
                  socket_address:
                    address: shipment-service
                    port_value: 50051
    health_checks:
      - grpc_health_check: {}
        timeout: 2s
        interval: 10s
        unhealthy_threshold: 3
        healthy_threshold: 2

Client-Side Load Balancing: The gRPC client resolves multiple backend addresses (via DNS or a service registry) and distributes calls itself. NestJS doesn't expose this directly, but you can configure the underlying @grpc/grpc-js channel:

// Using DNS-based client-side load balancing
options: {
  url: 'dns:///shipment-service:50051',
  channelOptions: {
    'grpc.service_config': JSON.stringify({
      loadBalancingConfig: [{ round_robin: {} }],
    }),
  },
}

The dns:/// scheme tells gRPC to resolve the hostname to multiple A records and balance across them. This works well in Kubernetes where a headless service returns all pod IPs.
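For reference, a headless Service is one with clusterIP: None. A minimal sketch, with names chosen to match the example above:

```yaml
# Kubernetes headless Service: DNS returns every pod IP instead of a
# single virtual IP, which is what client-side round_robin needs.
apiVersion: v1
kind: Service
metadata:
  name: shipment-service
spec:
  clusterIP: None          # headless: no virtual IP
  selector:
    app: shipment-service  # illustrative label
  ports:
    - name: grpc
      port: 50051
      targetPort: 50051
```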

Recommendation: Use L7 load balancing (Envoy) in production. Client-side balancing works but pushes complexity into every service. Envoy centralizes it and adds observability, retries, and circuit breaking.


Interceptors

Logging Interceptor

// src/interceptors/grpc-logging.interceptor.ts
import {
  CallHandler,
  ExecutionContext,
  Injectable,
  NestInterceptor,
  Logger,
} from '@nestjs/common';
import { Observable, tap } from 'rxjs';

@Injectable()
export class GrpcLoggingInterceptor implements NestInterceptor {
  private readonly logger = new Logger('gRPC');

  intercept(context: ExecutionContext, next: CallHandler): Observable<any> {
    const rpcContext = context.switchToRpc();
    const handler = context.getHandler().name;
    const className = context.getClass().name;
    const method = `${className}.${handler}`;
    const start = performance.now();

    return next.handle().pipe(
      tap({
        next: () => {
          const duration = (performance.now() - start).toFixed(2);
          this.logger.log(`${method} completed in ${duration}ms`);
        },
        error: (err) => {
          const duration = (performance.now() - start).toFixed(2);
          this.logger.error(`${method} failed in ${duration}ms: ${err.message}`);
        },
      }),
    );
  }
}

Auth Interceptor

// src/interceptors/grpc-auth.interceptor.ts
import {
  CallHandler,
  ExecutionContext,
  Injectable,
  NestInterceptor,
} from '@nestjs/common';
import { RpcException } from '@nestjs/microservices';
import { Observable } from 'rxjs';
import { Metadata, status as GrpcStatus } from '@grpc/grpc-js';
// Application-specific token verification provider (path illustrative)
import { TokenService } from '../auth/token.service';

@Injectable()
export class GrpcAuthInterceptor implements NestInterceptor {
  constructor(private readonly tokenService: TokenService) {}

  async intercept(
    context: ExecutionContext,
    next: CallHandler,
  ): Promise<Observable<any>> {
    const metadata: Metadata = context.switchToRpc().getContext();
    const authHeader = metadata.get('authorization')[0]?.toString();

    if (!authHeader) {
      throw new RpcException({
        code: GrpcStatus.UNAUTHENTICATED,
        message: 'Missing authorization metadata',
      });
    }

    const token = authHeader.replace('Bearer ', '');
    try {
      const claims = await this.tokenService.verify(token);
      // Attach claims to metadata for downstream handlers
      metadata.set('x-user-id', claims.sub);
      metadata.set('x-tenant-id', claims.tenantId);
    } catch {
      throw new RpcException({
        code: GrpcStatus.UNAUTHENTICATED,
        message: 'Invalid or expired token',
      });
    }

    return next.handle();
  }
}

OpenTelemetry Distributed Tracing

// src/interceptors/grpc-tracing.interceptor.ts
import {
  CallHandler,
  ExecutionContext,
  Injectable,
  NestInterceptor,
} from '@nestjs/common';
import { Observable, tap } from 'rxjs';
import { trace, SpanStatusCode, context, propagation } from '@opentelemetry/api';
import { Metadata } from '@grpc/grpc-js';

@Injectable()
export class GrpcTracingInterceptor implements NestInterceptor {
  private readonly tracer = trace.getTracer('grpc-server');

  intercept(ctx: ExecutionContext, next: CallHandler): Observable<any> {
    const metadata: Metadata = ctx.switchToRpc().getContext();
    const handler = ctx.getHandler().name;
    const service = ctx.getClass().name;

    // Extract trace context from incoming metadata
    const carrier: Record<string, string> = {};
    for (const [key, values] of Object.entries(metadata.getMap())) {
      carrier[key] = String(values);
    }
    const parentContext = propagation.extract(context.active(), carrier);

    return new Observable((subscriber) => {
      const subscription = context.with(parentContext, () => {
        const span = this.tracer.startSpan(`grpc.${service}/${handler}`);
        span.setAttribute('rpc.system', 'grpc');
        span.setAttribute('rpc.service', service);
        span.setAttribute('rpc.method', handler);

        return next
          .handle()
          .pipe(
            tap({
              next: (value) => {
                span.setStatus({ code: SpanStatusCode.OK });
                subscriber.next(value);
              },
              error: (err) => {
                span.setStatus({
                  code: SpanStatusCode.ERROR,
                  message: err.message,
                });
                span.recordException(err);
                span.end(); // end the span on the error path too, or it leaks
                subscriber.error(err);
              },
              complete: () => {
                span.end();
                subscriber.complete();
              },
            }),
          )
          .subscribe();
      });

      // Propagate unsubscription (e.g. client cancellation) to the source
      return () => subscription.unsubscribe();
    });
  }
}

This propagates trace context across service boundaries. When service A calls service B, the trace ID flows through gRPC metadata, giving you end-to-end distributed traces in Jaeger, Tempo, or your preferred backend.


Proto-First vs Code-First Development

Proto-First

You write .proto files first, then generate TypeScript types, service interfaces, and client stubs.

Advantages:

  • Single source of truth for the API contract
  • Language-agnostic — generate clients in Go, Python, Java from the same proto
  • Forces you to think about the API design before implementation
  • Supports a review process for API changes (proto files in version control)

Disadvantages:

  • Requires a code generation pipeline (protoc, ts-proto, buf)
  • Generated code can be verbose or hard to customize
  • Extra build step that developers need to run

Code-First

You write TypeScript interfaces and decorators, and the framework infers the proto definition.

Advantages:

  • Faster iteration in early development
  • No build step for proto generation
  • Stays within the TypeScript ecosystem

Disadvantages:

  • Locks you into one language ecosystem
  • Proto files (if generated) may not match what you'd design by hand
  • Harder to enforce contract-first API design across teams

Recommendation: Use proto-first for any system with more than two services or more than one team. The upfront cost of the generation pipeline pays for itself in API consistency and cross-language support. Use buf instead of raw protoc — it handles linting, breaking change detection, and code generation with a single tool.

# buf.gen.yaml
version: v2
plugins:
  - remote: buf.build/community/timostamm-protobuf-ts
    out: src/generated
    opt:
      - long_type_string
      - output_javascript

Connection Pooling and Channel Management

A gRPC "channel" is a logical connection to a target. Under the hood, it manages one or more HTTP/2 connections, handles name resolution, load balancing, and reconnection.

Key behaviors to understand:

  • Channels are expensive to create. They involve DNS resolution, TCP handshake, and TLS negotiation. Create one channel per target service and reuse it.
  • Channels handle reconnection automatically. If the underlying connection drops, the channel reconnects with exponential backoff.
  • Subchannels map to backends. If DNS returns three IPs, the channel creates three subchannels and balances across them.
  • Idle channels get closed. By default, gRPC closes channels after 30 minutes of inactivity. Configure grpc.client_idle_timeout_ms if your traffic is bursty.

In NestJS, the ClientGrpc proxy manages the channel lifecycle. Each ClientsModule.register entry creates one client proxy. To reuse a single channel across modules, register the client once in a shared module and export it, rather than registering the same name separately in each consuming module.

For high-throughput scenarios, a single HTTP/2 connection can bottleneck on stream concurrency: the server advertises a maximum number of concurrent streams per connection (commonly 100), and additional RPCs queue on the client until a stream frees up. You can raise the limit in the server's channel options:

channelOptions: {
  'grpc.max_concurrent_streams': 1000,
}

Or, for extreme throughput, create a pool of channels manually:

@Injectable()
export class GrpcChannelPool implements OnModuleInit {
  private channels: ClientGrpc[] = [];
  private index = 0;

  constructor(
    @Inject('SHIPMENT_PACKAGE_0') private c0: ClientGrpc,
    @Inject('SHIPMENT_PACKAGE_1') private c1: ClientGrpc,
    @Inject('SHIPMENT_PACKAGE_2') private c2: ClientGrpc,
  ) {}

  onModuleInit() {
    this.channels = [this.c0, this.c1, this.c2];
  }

  getService<T>(serviceName: string): T {
    const channel = this.channels[this.index % this.channels.length];
    this.index++;
    return channel.getService<T>(serviceName);
  }
}

This round-robins across three underlying HTTP/2 connections, tripling your stream concurrency capacity.


Case Study: Logistics Platform at Scale

The Problem

A logistics platform built by Stripe Systems processed shipment status updates from 200+ carrier integrations. The original architecture used REST (Express.js) for all inter-service communication. At 50,000 updates per minute, the system exhibited:

  • p99 latency: 180ms for status update propagation
  • Payload overhead: JSON payloads averaged 1.2 KB per update
  • Connection churn: Each REST call opened a new HTTP/1.1 connection or contended on a limited keep-alive pool
  • No streaming: Carrier adapters polled the tracking service every 5 seconds — 12 requests/minute per adapter, roughly 2,400 requests/minute across 200 adapters, even when there were no updates
  • Schema drift: Different services had subtly different JSON shapes for the same entities, causing intermittent parsing failures

The Migration

We replaced REST-based inter-service communication with gRPC over a 6-week period. The migration was incremental — both protocols ran in parallel during the transition.

Core Proto Definitions

The ShipmentTrackingService proto shown earlier in this article is the actual schema used. The critical design decisions:

  1. oneof detail in ShipmentEvent — Different event types carry different metadata. Rather than a flat message with 15 optional fields, the oneof forces each event to specify exactly one detail type. This eliminated an entire class of bugs where handlers assumed a field was present but it was only set for certain event types.

  2. repeated ShipmentEvent history in ShipmentResponse — Embedding the full event history in the response eliminated the common pattern of "get shipment, then get events" — two round trips replaced by one.

  3. Server streaming for StreamUpdates — Replaced the polling pattern entirely. Carrier adapters open a stream and receive events as they happen.

NestJS Module Setup

// src/app.module.ts
import { Module } from '@nestjs/common';
import { APP_INTERCEPTOR } from '@nestjs/core';
import { ShipmentModule } from './shipment/shipment.module';
import { HealthModule } from './health/health.module';
import { GrpcErrorInterceptor } from './interceptors/grpc-error.interceptor';
import { GrpcLoggingInterceptor } from './interceptors/grpc-logging.interceptor';
import { GrpcTracingInterceptor } from './interceptors/grpc-tracing.interceptor';

@Module({
  imports: [ShipmentModule, HealthModule],
  providers: [
    { provide: APP_INTERCEPTOR, useClass: GrpcTracingInterceptor },
    { provide: APP_INTERCEPTOR, useClass: GrpcLoggingInterceptor },
    { provide: APP_INTERCEPTOR, useClass: GrpcErrorInterceptor },
  ],
})
export class AppModule {}

Interceptor order matters. The tracing interceptor wraps the logging interceptor, which wraps the error interceptor. This means the trace span captures the full request lifecycle including error mapping.
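NestJS invokes `APP_INTERCEPTOR` providers in registration order, with the first-registered interceptor outermost. As a standalone illustration (this is a model of the nesting, not NestJS internals), the chain can be thought of as function composition:

```typescript
// Model of interceptor nesting: first-registered ends up outermost,
// exactly as in the AppModule provider list above.
type Handler = () => string;
type Interceptor = (next: Handler) => string;

const tracing: Interceptor = (next) => `trace(${next()})`;
const logging: Interceptor = (next) => `log(${next()})`;
const errors: Interceptor = (next) => `errmap(${next()})`;

function compose(interceptors: Interceptor[], handler: Handler): Handler {
  // Fold from the right so the first interceptor wraps everything else.
  return interceptors.reduceRight<Handler>(
    (next, icpt) => () => icpt(next),
    handler,
  );
}

const chain = compose([tracing, logging, errors], () => 'handler');
// chain() yields "trace(log(errmap(handler)))": the trace span observes
// the error mapping, not the other way around.
```

Reversing the provider order would put error mapping outside tracing, and mapped errors would escape the span.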

Results

After migrating all inter-service communication to gRPC:

Metric              REST (Before)      gRPC (After)       Improvement
─────────────────────────────────────────────────────────────────────
p50 latency         42ms               8ms                5.2x faster
p95 latency         110ms              16ms               6.9x faster
p99 latency         180ms              22ms               8.2x faster
Payload size (avg)  1,200 bytes        340 bytes          72% reduction
Connections/node    ~2,400             3 (multiplexed)    99.9% reduction
CPU (tracking svc)  72% avg            31% avg            57% reduction
Bandwidth           3.4 GB/hr          0.97 GB/hr         71% reduction
Error rate          0.12%              0.03%              75% reduction

Architecture Decisions and Rationale

Why not just optimize REST? We considered HTTP/2 + JSON with REST, which would have solved the connection multiplexing problem. But it wouldn't address serialization overhead, schema enforcement, or the lack of streaming. The migration cost was similar, so we chose the option with more long-term value.

Why NestJS over raw gRPC? The team was already using NestJS for the REST services. NestJS's microservice abstraction meant controllers, guards, interceptors, and dependency injection all worked the same way. The migration was mostly mechanical: swap @Get() and @Post() for @GrpcMethod(), update the transport configuration, and adjust error handling.

Why Envoy over client-side load balancing? With 14 services communicating via gRPC, client-side load balancing would require every service to implement health checking, retry logic, and circuit breaking. Envoy gave us all of that as infrastructure, plus mTLS termination, per-route rate limiting, and Prometheus metrics on every RPC call — without changing application code.

Why oneof instead of separate RPCs per event type? We considered having ReportDelay, ReportCustomsUpdate, and ReportDeliveryAttempt as separate RPCs. But this would have meant separate proto definitions, separate handlers, and separate client calls for what is fundamentally the same operation: recording an event against a shipment. The oneof pattern keeps a single RPC with type-safe variants.

Streaming migration strategy: We didn't switch to streaming everywhere on day one. The first phase replaced all unary REST calls with unary gRPC — same request-response pattern, different transport. Once that was stable, the second phase introduced server streaming for the carrier adapter polling use case. This two-phase approach limited the blast radius of each change.
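In NestJS, the server-streaming handler introduced in phase two returns a push-based stream (an RxJS Observable). The contract that replaced polling can be modeled without framework dependencies as an async iterator — the names below are hypothetical stand-ins, not the production service:

```typescript
// Dependency-free model of server streaming: the server pushes each
// event as it occurs, instead of the client asking every 5 seconds.
interface ShipmentEvent { shipmentId: string; status: string }

// Hypothetical in-memory source standing in for the tracking service.
async function* streamUpdates(
  events: ShipmentEvent[],
): AsyncGenerator<ShipmentEvent> {
  for (const event of events) {
    yield event; // delivered to the consumer as soon as it exists
  }
}

// A carrier-adapter-style consumer: reacts to pushed events, never polls.
async function collect(events: ShipmentEvent[]): Promise<string[]> {
  const seen: string[] = [];
  for await (const e of streamUpdates(events)) {
    seen.push(`${e.shipmentId}:${e.status}`);
  }
  return seen;
}
```

The unary phase-one calls kept the request-response shape of the REST endpoints they replaced; only this push-based shape required consumers to change their control flow.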

The latency improvement came from three compounding factors: binary serialization (smaller payloads to parse), HTTP/2 multiplexing (eliminated connection setup overhead), and streaming (eliminated polling intervals). No single factor would have achieved the 8x improvement at p99 — it was the combination that mattered.


Summary

gRPC with NestJS is a strong foundation for high-throughput service-to-service communication, but it requires deliberate engineering decisions around schema design, error handling, load balancing, and observability. The patterns in this article — proto-first development, proper status code mapping, L7 load balancing, interceptor chains, and health checking — are not optional niceties. They are the difference between a gRPC system that works in development and one that survives production traffic.

Start with proto-first schemas, get your error interceptor right on day one, and deploy behind Envoy. Everything else can be iterated on.
