Stripe Systems
Backend Development · January 15, 2026 · 21 min read

API Gateway Patterns: BFF vs Aggregator vs Direct — Choosing for Your Stack

Stripe Systems Engineering

Every team building on microservices eventually hits the same question: how should clients talk to your backend? The answer is some form of API gateway — but which pattern you choose has lasting consequences for performance, maintainability, and team velocity. This post breaks down three dominant patterns, compares gateway frameworks honestly, and walks through a real architecture we shipped for a multi-platform SaaS product.

What an API Gateway Actually Does

An API gateway sits between your clients and your backend services. Its responsibilities typically include:

  • Routing — mapping external URLs to internal service endpoints
  • Authentication and authorization — validating tokens before requests reach services
  • Rate limiting — protecting services from abuse and enforcing usage tiers
  • Request/response transformation — reshaping payloads, filtering fields, translating protocols
  • Observability — centralized logging, distributed tracing headers, metrics collection
  • SSL termination — handling TLS at the edge so internal traffic can run over plain HTTP
  • Load balancing — distributing traffic across service instances

A common confusion: gateways are not service meshes. A service mesh (Istio, Linkerd) handles service-to-service (east-west) traffic with sidecar proxies. A gateway handles client-to-service (north-south) traffic. They solve different problems and often coexist. The gateway is your public edge; the mesh manages internal communication.

Pattern 1: Direct-to-Microservice

The simplest approach — clients call services directly.

Client → Service A (users)
Client → Service B (orders)
Client → Service C (inventory)

Each service exposes its own API, handles its own authentication, and manages its own CORS configuration.

When It Works

This is perfectly fine when you have a small team (under 10 engineers), fewer than 5 services, and a single client type. The overhead of a gateway adds complexity that may not pay off early.

When It Breaks Down

As you scale, problems accumulate:

Auth duplication. Every service validates JWTs independently. When you need to change your token format or add a new auth provider, you touch every service. One team forgets to update, and you have a security gap.

CORS management. Each service needs CORS headers configured for every allowed origin. A new frontend deployment means touching N services.

Versioning sprawl. Service A is on v2, Service B is still on v1. Clients need to know which version to call for each service. Your SDK becomes a compatibility matrix.

Client coupling. Mobile clients end up knowing the internal topology of your backend. When you split a service, every client needs an update.

Observability gaps. Distributed tracing requires every service to propagate headers consistently. Without a central point, one service that drops trace context breaks your entire trace chain.

The direct pattern trades short-term simplicity for long-term operational debt. Most teams cross the threshold where a gateway becomes necessary around 6-8 services or when they add a second client type.

Pattern 2: Aggregator / Composition Gateway

The aggregator pattern places a single gateway in front of all services. This gateway handles cross-cutting concerns and — critically — can compose responses from multiple backend calls into a single client response.

Client → Gateway → Service A
                 → Service B
                 → Service C

The gateway fans out requests, merges responses, and returns a unified payload to the client.

NestJS Aggregator Implementation

Here is a practical aggregator that calls three services to build a dashboard response. This handles partial failures, timeouts, and response merging:

// dashboard.controller.ts
import { Controller, Get, Req, HttpException, HttpStatus } from '@nestjs/common';
import { HttpService } from '@nestjs/axios';
import { Request } from 'express';
import { firstValueFrom, timeout, catchError, of } from 'rxjs';

// Domain types, defined in shared packages in the real codebase
interface UserProfile { id: string; name: string; email: string }
interface Order { id: string; status: string; total: number }
interface Notification { id: string; title: string; read: boolean }

interface DashboardResponse {
  user: UserProfile | null;
  recentOrders: Order[] | null;
  notifications: Notification[] | null;
  _meta: {
    partial: boolean;
    failedSources: string[];
    responseTimeMs: number;
  };
}

@Controller('api/dashboard')
export class DashboardController {
  constructor(private readonly http: HttpService) {}

  @Get()
  async getDashboard(@Req() req: Request): Promise<DashboardResponse> {
    const start = Date.now();
    const token = req.headers['authorization'];
    const failedSources: string[] = [];

    const fetchWithFallback = async <T>(
      url: string,
      source: string,
      timeoutMs: number = 3000,
    ): Promise<T | null> => {
      try {
        const response = await firstValueFrom(
          this.http
            .get<T>(url, { headers: { authorization: token } })
            .pipe(
              timeout(timeoutMs),
              catchError((err) => {
                failedSources.push(source);
                console.error(`${source} failed: ${err.message}`);
                return of(null);
              }),
            ),
        );
        return response?.data ?? null;
      } catch {
        failedSources.push(source);
        return null;
      }
    };

    // Fan out requests in parallel
    const [user, recentOrders, notifications] = await Promise.all([
      fetchWithFallback<UserProfile>(
        'http://user-service:3001/api/profile',
        'user-service',
        2000,
      ),
      fetchWithFallback<Order[]>(
        'http://order-service:3002/api/orders?limit=5',
        'order-service',
        3000,
      ),
      fetchWithFallback<Notification[]>(
        'http://notification-service:3003/api/unread',
        'notification-service',
        1500,
      ),
    ]);

    // If the user service fails, the response is meaningless
    if (!user) {
      throw new HttpException(
        'Core service unavailable',
        HttpStatus.SERVICE_UNAVAILABLE,
      );
    }

    return {
      user,
      recentOrders,
      notifications,
      _meta: {
        partial: failedSources.length > 0,
        failedSources,
        responseTimeMs: Date.now() - start,
      },
    };
  }
}

Partial Failure Handling

The _meta field in the response is deliberate. Clients can inspect partial: true and failedSources to show degraded UI instead of a full error page. The notification service being down should not prevent a user from seeing their dashboard.

The timeout hierarchy matters too — the user service gets 2000ms (critical path), orders get 3000ms (can be stale), and notifications get 1500ms (low-priority, fail fast). These numbers should come from your p99 latency measurements, not guesswork.

Tradeoffs

The aggregator gateway centralizes cross-cutting logic, but it also centralizes risk. A bug in the gateway takes down everything. It also becomes a coordination bottleneck — if the orders team and notifications team both need gateway changes, they compete for merge priority. This is where the BFF pattern enters.

Pattern 3: Backend for Frontend (BFF)

BFF takes the aggregator idea and splits it by client type. Instead of one gateway serving all clients, you deploy separate gateways — each tailored to its consumer.

Web App     → Web BFF     → Services
Mobile App  → Mobile BFF  → Services
Partner API → Partner BFF → Services

Why Different Clients Need Different APIs

A web dashboard rendering a user profile page might need:

  • Full user profile with avatar URL, bio, social links
  • Last 20 orders with full line items and tracking URLs
  • 50 most recent notifications with HTML-formatted bodies

A mobile app showing the same screen needs:

  • User name, avatar thumbnail URL (not the full-res URL)
  • Last 5 orders with status and total only
  • 10 unread notification summaries (plain text, not HTML)

Serving the web payload to mobile wastes bandwidth, drains battery parsing unused fields, and increases render time. Serving the mobile payload to web gives an incomplete experience.

The Same Data Through Two BFFs

Web BFF response (~4.2 KB):

{
  "user": {
    "id": "usr_8a3f2",
    "name": "Dana Chen",
    "email": "[email protected]",
    "avatar": "https://cdn.example.com/avatars/usr_8a3f2/full.jpg",
    "bio": "Engineering lead focused on distributed systems.",
    "joinedAt": "2023-01-15T00:00:00Z",
    "socialLinks": {
      "github": "https://github.com/danachen",
      "linkedin": "https://linkedin.com/in/danachen"
    }
  },
  "recentOrders": [
    {
      "id": "ord_f91a",
      "status": "delivered",
      "total": 149.99,
      "currency": "USD",
      "items": [
        { "name": "Ergonomic Keyboard", "qty": 1, "price": 149.99 }
      ],
      "trackingUrl": "https://tracking.example.com/ord_f91a",
      "createdAt": "2025-07-28T14:30:00Z"
    }
  ],
  "notifications": [
    {
      "id": "ntf_3b2a",
      "title": "Order Delivered",
      "body": "<p>Your order <strong>ord_f91a</strong> has been delivered.</p>",
      "read": false,
      "createdAt": "2025-07-30T09:15:00Z"
    }
  ]
}

Mobile BFF response (~1.1 KB):

{
  "user": {
    "id": "usr_8a3f2",
    "name": "Dana Chen",
    "avatarThumb": "https://cdn.example.com/avatars/usr_8a3f2/thumb_64.jpg"
  },
  "recentOrders": [
    {
      "id": "ord_f91a",
      "status": "delivered",
      "total": "$149.99"
    }
  ],
  "unreadCount": 3,
  "notifications": [
    {
      "id": "ntf_3b2a",
      "title": "Order Delivered",
      "summary": "Your order ord_f91a has been delivered."
    }
  ]
}

The mobile BFF drops social links, full-resolution images, HTML notification bodies, and line item details. It pre-formats the currency string (the mobile app does not need to carry locale-aware currency formatting logic). The payload is 74% smaller.
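The web-to-mobile reshaping is a pure function inside the mobile BFF. A minimal sketch under assumed types: the `thumb_64` CDN convention mirrors the example URLs above, and only USD formatting is handled (a real BFF would cover more currencies):

```typescript
// Illustrative types; the real BFF would share these with its client.
interface WebUser { id: string; name: string; avatar: string; }
interface WebOrder { id: string; status: string; total: number; currency: string; }

interface MobileUser { id: string; name: string; avatarThumb: string; }
interface MobileOrder { id: string; status: string; total: string; }

// Derive the thumbnail URL from the full avatar URL (assumed CDN convention).
function thumbUrl(fullUrl: string): string {
  return fullUrl.replace(/\/full\.jpg$/, '/thumb_64.jpg');
}

// Pre-format currency server-side so the app skips locale logic.
function formatTotal(total: number, currency: string): string {
  const symbol = currency === 'USD' ? '$' : `${currency} `;
  return `${symbol}${total.toFixed(2)}`;
}

function toMobileUser(u: WebUser): MobileUser {
  return { id: u.id, name: u.name, avatarThumb: thumbUrl(u.avatar) };
}

function toMobileOrder(o: WebOrder): MobileOrder {
  return { id: o.id, status: o.status, total: formatTotal(o.total, o.currency) };
}
```

Because the transform is a pure function, it can be unit-tested without standing up any backend services.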

BFF Ownership

Each BFF should be owned by the team building its client. The web team owns the web BFF. The mobile team owns the mobile BFF. This aligns incentives — the people who know what the client needs control the API shaping layer. It follows Conway's Law intentionally rather than fighting it.

Gateway Frameworks: Honest Comparison

┌─────────────────┬───────────────────────┬──────────────────┬────────────────────────────┬────────────────────────────┬─────────────────────────────┐
│ Framework       │ Best For              │ Latency Overhead │ Auth Built-in              │ Customization              │ Pricing Model               │
├─────────────────┼───────────────────────┼──────────────────┼────────────────────────────┼────────────────────────────┼─────────────────────────────┤
│ Kong            │ Plugin-heavy setups   │ 1-3ms            │ Yes (plugins)              │ Lua/Go plugins             │ Open source + Enterprise    │
│ AWS API Gateway │ Serverless stacks     │ 5-15ms (cold)    │ Cognito/Lambda authorizers │ Limited to AWS transforms  │ Per-request ($3.50/million) │
│ Azure APIM      │ Enterprise/.NET shops │ 3-8ms            │ AAD, certificates          │ XML policies               │ Tiered (starts ~$0.80/hr)   │
│ NGINX           │ Raw throughput        │ <1ms             │ Module-based               │ C modules, Lua (OpenResty) │ Open source + NGINX Plus    │
│ Custom NestJS   │ Full control          │ 2-5ms            │ You build it               │ Unlimited                  │ Your infra cost             │
└─────────────────┴───────────────────────┴──────────────────┴────────────────────────────┴────────────────────────────┴─────────────────────────────┘

Kong has the richest plugin ecosystem (300+), but debugging Lua plugins in production is unpleasant. Use it when off-the-shelf plugins cover 80%+ of your needs.

AWS API Gateway is the right choice if you are already on Lambda. The per-request pricing looks cheap until you hit scale — 100M requests/month costs $350 before you count data transfer. HTTP APIs add less latency than REST APIs, but the overhead is still noticeable.

Azure APIM has powerful policy expressions (think XSLT for HTTP), but the XML policy language has a steep learning curve. The developer portal is genuinely useful for partner APIs.

NGINX is the performance baseline everything else is measured against. For pure reverse-proxy-with-auth, nothing beats it. But building complex aggregation logic in Lua or C modules is painful.

Custom NestJS (or Express, Fastify, Go) gives you the most flexibility. You write TypeScript, test it like any other service, and deploy it like any other service. The cost is that you own every feature — rate limiting, circuit breakers, caching. For BFF gateways, this is often the right choice because the gateway logic is inherently application-specific.

Authentication at the Gateway

The gateway is the natural place to verify identity before requests reach services. Three common strategies:

JWT Validation (Stateless, Fast)

The gateway validates the JWT signature and expiration without calling any external service. This adds sub-millisecond overhead.

// jwt-auth.middleware.ts
import { Injectable, NestMiddleware, UnauthorizedException } from '@nestjs/common';
import { Request, Response, NextFunction } from 'express';
import * as jose from 'jose';

@Injectable()
export class JwtAuthMiddleware implements NestMiddleware {
  private jwks: jose.JWTVerifyGetKey;

  constructor() {
    // Cache the JWKS endpoint. jose handles key rotation automatically.
    this.jwks = jose.createRemoteJWKSet(
      new URL('https://auth.example.com/.well-known/jwks.json'),
    );
  }

  async use(req: Request, res: Response, next: NextFunction) {
    const authHeader = req.headers.authorization;

    if (!authHeader?.startsWith('Bearer ')) {
      throw new UnauthorizedException('Missing bearer token');
    }

    try {
      const token = authHeader.slice(7);
      const { payload } = await jose.jwtVerify(token, this.jwks, {
        issuer: 'https://auth.example.com',
        audience: 'api.example.com',
      });

      // Attach decoded claims for downstream services
      req['user'] = {
        sub: payload.sub,
        email: payload.email,
        roles: payload.roles,
        orgId: payload.org_id,
      };

      // Propagate user context as headers for downstream services
      req.headers['x-user-id'] = payload.sub as string;
      req.headers['x-user-roles'] = (payload.roles as string[]).join(',');
      req.headers['x-org-id'] = payload.org_id as string;
    } catch (err) {
      if (err instanceof jose.errors.JWTExpired) {
        throw new UnauthorizedException('Token expired');
      }
      throw new UnauthorizedException('Invalid token');
    }

    next();
  }
}

Token propagation is the key detail. The gateway validates the JWT, extracts claims, and forwards them as trusted internal headers (x-user-id, x-user-roles). Downstream services trust these headers because internal traffic is network-restricted. They never need to re-validate the JWT.

OAuth2 Token Introspection (Accurate, Slower)

For tokens that can be revoked (opaque tokens, reference tokens), the gateway calls the authorization server's introspection endpoint. This adds 5-20ms per request but guarantees the token has not been revoked. Use this for high-security operations (financial transactions, admin actions). Cache introspection results for 30-60 seconds to reduce latency for repeated requests.
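A caching wrapper around introspection might look like the following sketch. The `IntrospectFn` callback stands in for the real HTTP call to the authorization server, and the 30-second TTL is an illustrative default:

```typescript
// Sketch: cache positive introspection results for a short TTL so that
// repeated requests with the same token skip the 5-20ms round trip.
type IntrospectFn = (token: string) => Promise<{ active: boolean }>;

class CachedIntrospector {
  private cache = new Map<string, { result: { active: boolean }; expiresAt: number }>();

  constructor(
    private readonly introspect: IntrospectFn,
    private readonly ttlMs = 30_000,
    private readonly now: () => number = Date.now,
  ) {}

  async check(token: string): Promise<{ active: boolean }> {
    const hit = this.cache.get(token);
    if (hit && hit.expiresAt > this.now()) return hit.result;

    const result = await this.introspect(token);
    // Only cache positive results; an inactive (revoked) token is
    // re-checked every time, so revocation takes at most one TTL to bite.
    if (result.active) {
      this.cache.set(token, { result, expiresAt: this.now() + this.ttlMs });
    }
    return result;
  }
}
```

In production the cache key should be a hash of the token rather than the raw token, and the map needs eviction; both are omitted here for brevity.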

API Keys (Simple, for Partners)

API keys are not authentication in the identity sense — they identify an integration. Use them for partner APIs where the "user" is an organization, not a person. Store hashed keys in a fast lookup store (Redis or DynamoDB), and map them to rate-limit tiers and permission scopes.
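A minimal hashed-key lookup sketch using Node's built-in crypto; the in-memory Map stands in for Redis or DynamoDB, and the record shape is illustrative:

```typescript
import { createHash } from 'node:crypto';

// Keys are stored hashed so a leaked store does not leak usable keys.
interface KeyRecord { orgId: string; tier: 'partner' | 'enterprise'; scopes: string[]; }

const keyStore = new Map<string, KeyRecord>();

function hashKey(apiKey: string): string {
  // Plain SHA-256 is acceptable here because API keys are long and
  // random; low-entropy passwords would need a salted KDF instead.
  return createHash('sha256').update(apiKey).digest('hex');
}

function registerKey(apiKey: string, record: KeyRecord): void {
  keyStore.set(hashKey(apiKey), record);
}

function authenticateKey(apiKey: string): KeyRecord | null {
  return keyStore.get(hashKey(apiKey)) ?? null;
}
```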

Rate Limiting Strategies

Rate limiting at the gateway protects your services from traffic spikes and enforces business-tier limits.

Algorithm Comparison

Token bucket — tokens refill at a fixed rate. Allows short bursts. Best for APIs where occasional spikes are acceptable.

Sliding window — counts requests in a rolling time window. More accurate than fixed window, avoids the boundary burst problem. Higher memory cost (stores individual timestamps or uses multiple counters).

Fixed window — counts requests in discrete time windows (e.g., per minute). Simple to implement but allows up to 2x the limit at window boundaries (a burst at :59 and :00).
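Of the three, the token bucket is the easiest to sketch. A minimal single-instance version; the injectable clock is for testability, not part of any production API:

```typescript
// Token bucket: capacity bounds the burst size, refillPerSec sets the
// steady-state rate. Tokens refill continuously, capped at capacity.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private readonly capacity: number,
    private readonly refillPerSec: number,
    private readonly now: () => number = Date.now,
  ) {
    this.tokens = capacity;
    this.lastRefill = now();
  }

  tryRemove(count = 1): boolean {
    const elapsed = (this.now() - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerSec);
    this.lastRefill = this.now();
    if (this.tokens >= count) {
      this.tokens -= count;
      return true;
    }
    return false;
  }
}
```

Setting capacity above the per-second refill rate is what permits the short bursts described above.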

Redis-Backed Sliding Window

For distributed deployments with multiple gateway instances, rate limiting state must be shared. Redis is the standard choice:

// rate-limiter.service.ts
import { Injectable } from '@nestjs/common';
import Redis from 'ioredis';

@Injectable()
export class RateLimiterService {
  private redis: Redis;

  constructor() {
    this.redis = new Redis({
      host: process.env.REDIS_HOST,
      port: 6379,
      enableReadyCheck: true,
    });
  }

  async checkLimit(
    key: string,
    maxRequests: number,
    windowMs: number,
  ): Promise<{ allowed: boolean; remaining: number; retryAfterMs: number }> {
    const now = Date.now();
    const windowStart = now - windowMs;

    // Lua script for atomic sliding window check
    const script = `
      local key = KEYS[1]
      local now = tonumber(ARGV[1])
      local window_start = tonumber(ARGV[2])
      local max_requests = tonumber(ARGV[3])
      local window_ms = tonumber(ARGV[4])

      -- Remove expired entries
      redis.call('ZREMRANGEBYSCORE', key, '-inf', window_start)

      -- Count current requests in window
      local current = redis.call('ZCARD', key)

      if current < max_requests then
        -- Add this request
        redis.call('ZADD', key, now, now .. '-' .. math.random(1000000))
        redis.call('PEXPIRE', key, window_ms)
        return {1, max_requests - current - 1, 0}
      else
        -- Get oldest entry to calculate retry-after
        local oldest = redis.call('ZRANGE', key, 0, 0, 'WITHSCORES')
        local retry_after = 0
        if #oldest > 0 then
          retry_after = tonumber(oldest[2]) + window_ms - now
        end
        return {0, 0, retry_after}
      end
    `;

    const result = (await this.redis.eval(
      script, 1, key, now, windowStart, maxRequests, windowMs,
    )) as number[];

    return {
      allowed: result[0] === 1,
      remaining: result[1],
      retryAfterMs: result[2],
    };
  }
}

The Lua script runs atomically on Redis — no race conditions between gateway instances. The sorted set stores each request with its timestamp as the score, making window sliding a matter of removing entries below the threshold.
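For a single gateway instance, or as a reference model when testing against the Lua script, the same logic fits in plain TypeScript. A sketch with an injectable clock; a plain array replaces the sorted set:

```typescript
// In-memory sliding window mirroring the Redis version: drop timestamps
// outside the window, count what remains, then admit or compute retry-after.
class LocalSlidingWindow {
  private timestamps = new Map<string, number[]>();

  constructor(private readonly now: () => number = Date.now) {}

  check(key: string, maxRequests: number, windowMs: number) {
    const t = this.now();
    const windowStart = t - windowMs;
    // Equivalent of ZREMRANGEBYSCORE: keep only entries inside the window.
    const entries = (this.timestamps.get(key) ?? []).filter((ts) => ts > windowStart);

    if (entries.length < maxRequests) {
      entries.push(t);
      this.timestamps.set(key, entries);
      return { allowed: true, remaining: maxRequests - entries.length, retryAfterMs: 0 };
    }
    this.timestamps.set(key, entries);
    // Oldest surviving entry determines when capacity frees up.
    return { allowed: false, remaining: 0, retryAfterMs: entries[0] + windowMs - t };
  }
}
```

Unlike the Redis version, this state is per-process, so it only enforces limits correctly behind a single instance (or as the local fallback discussed later).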

Tier-Based Configuration

Rate limits should reflect both the endpoint sensitivity and the client tier:

# rate-limits.yaml
tiers:
  free:
    global: 100/minute
    endpoints:
      /api/search: 20/minute
      /api/export: 5/hour
  pro:
    global: 1000/minute
    endpoints:
      /api/search: 200/minute
      /api/export: 50/hour
  enterprise:
    global: 10000/minute
    endpoints:
      /api/search: 2000/minute
      /api/export: 500/hour
  partner:
    global: 5000/minute
    endpoints:
      /api/bulk-import: 100/hour
      /api/webhooks/register: 10/minute
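A resolver for this config might look like the following sketch. The "count/unit" grammar for limit specs is an assumption read off the YAML above:

```typescript
// Resolve the effective limit for a (tier, endpoint) pair; endpoint
// entries override the tier's global limit.
interface TierLimits { global: string; endpoints: Record<string, string>; }

const WINDOW_MS: Record<string, number> = {
  second: 1_000,
  minute: 60_000,
  hour: 3_600_000,
};

function parseLimit(spec: string): { max: number; windowMs: number } {
  const [count, unit] = spec.split('/');
  const windowMs = WINDOW_MS[unit];
  if (!windowMs || Number.isNaN(Number(count))) {
    throw new Error(`Bad limit spec: ${spec}`);
  }
  return { max: Number(count), windowMs };
}

function resolveLimit(tier: TierLimits, endpoint: string): { max: number; windowMs: number } {
  return parseLimit(tier.endpoints[endpoint] ?? tier.global);
}
```

The resolved `{ max, windowMs }` pair is exactly the shape the sliding-window checker above expects.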

Request/Response Transformation

Gateways can reshape data between clients and services without changing either side.

Field filtering — strip internal fields (_internalScore, debugInfo) before returning to clients. Add fields for specific clients (mobile gets formattedPrice, web gets raw price + currency).

Protocol translation — accept REST from clients, proxy as gRPC to internal services. This is increasingly common as teams adopt gRPC internally but maintain REST externally.

Header injection — add correlation IDs, propagate trace context, inject client metadata for analytics.
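The field-filtering transform can be sketched as a recursive strip pass; the denylist names come from the examples above and are otherwise arbitrary:

```typescript
// Remove internal-only fields anywhere in a response payload before it
// leaves the gateway. The denylist is illustrative.
const INTERNAL_FIELDS = new Set(['_internalScore', 'debugInfo']);

function stripInternalFields<T>(value: T): T {
  if (Array.isArray(value)) {
    return value.map((v) => stripInternalFields(v)) as unknown as T;
  }
  if (value !== null && typeof value === 'object') {
    const out: Record<string, unknown> = {};
    for (const [k, v] of Object.entries(value as Record<string, unknown>)) {
      if (INTERNAL_FIELDS.has(k)) continue;
      out[k] = stripInternalFields(v);
    }
    return out as T;
  }
  return value;
}
```

In NestJS this would typically run inside a response interceptor so that every route gets it for free.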

GraphQL Federation as an Alternative

GraphQL federation (Apollo Federation, or the newer WunderGraph Cosmo) acts as a declarative gateway. Each service exposes a GraphQL subgraph; the federation router composes them into a single schema.

# User subgraph
type User @key(fields: "id") {
  id: ID!
  name: String!
  email: String!
}

# Order subgraph
type User @key(fields: "id") {
  id: ID!
  orders: [Order!]!
}

type Order {
  id: ID!
  total: Float!
  status: OrderStatus!
}

The federation router resolves { user { name orders { total } } } by calling the user subgraph for profile data and the order subgraph for orders, then merging them.

When GraphQL replaces BFF: If your clients are all web/JS-based and your teams are comfortable with GraphQL, federation can eliminate the need for BFF entirely. Clients request exactly the fields they need — mobile queries request fewer fields, web queries request more. The downside is that GraphQL adds its own complexity (N+1 query problems, query cost analysis, caching challenges). Do not adopt GraphQL just to avoid building a BFF. Adopt it if your data graph is genuinely relational and your clients have diverse data needs.

Circuit Breakers and Retries at the Gateway

When a downstream service degrades, the gateway should fail fast rather than pile up requests that will timeout.

Circuit Breaker States

  • Closed — requests flow normally. Failures are counted.
  • Open — requests are immediately rejected with 503. No traffic reaches the degraded service.
  • Half-open — a single probe request is allowed through. If it succeeds, the circuit closes. If it fails, it reopens.

// circuit-breaker.ts
class CircuitOpenError extends Error {}

interface CircuitBreakerConfig {
  failureThreshold: number;     // failures before opening
  resetTimeoutMs: number;       // how long to stay open
  halfOpenMaxAttempts: number;  // probes before closing
}

class CircuitBreaker {
  private state: 'closed' | 'open' | 'half-open' = 'closed';
  private failureCount = 0;
  private lastFailureTime = 0;
  private halfOpenAttempts = 0;

  constructor(
    private readonly name: string,
    private readonly config: CircuitBreakerConfig,
  ) {}

  async execute<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === 'open') {
      if (Date.now() - this.lastFailureTime > this.config.resetTimeoutMs) {
        this.state = 'half-open';
        this.halfOpenAttempts = 0;
      } else {
        throw new CircuitOpenError(
          `Circuit ${this.name} is open. Retry after ${this.config.resetTimeoutMs}ms.`,
        );
      }
    }

    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  private onSuccess() {
    if (this.state === 'half-open') {
      this.halfOpenAttempts++;
      if (this.halfOpenAttempts >= this.config.halfOpenMaxAttempts) {
        this.state = 'closed';
        this.failureCount = 0;
      }
    } else {
      this.failureCount = 0;
    }
  }

  private onFailure() {
    this.failureCount++;
    this.lastFailureTime = Date.now();
    if (this.failureCount >= this.config.failureThreshold) {
      this.state = 'open';
    }
  }
}

Retry Budgets and Amplification

Retries at the gateway are dangerous. If the user service has 3 replicas and each is slightly slow, a gateway that retries 3 times can turn 1 client request into 9 backend requests. This is retry amplification — the thing meant to improve reliability makes an outage worse.

Mitigations:

  • Retry budgets — allow retries only when the overall failure rate is below a threshold (e.g., retry only if fewer than 10% of recent requests have been retries). This caps the amplification factor.
  • Retry only on specific errors — retry on 502/503 (upstream unavailable), never on 500 (application bug) or 429 (rate limited).
  • No retries on non-idempotent operations — never retry a POST that creates a resource unless the API supports idempotency keys.
  • Exponential backoff with jitter — avoid synchronized retry storms across gateway instances.
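The status-code gating and jittered backoff can be combined into one retry helper. A sketch: the `UpstreamError` type and the injectable `sleep`/`rand` hooks are illustrative, not a real library API (production code would use `setTimeout` and `Math.random` directly):

```typescript
// Retry only on 502/503, with full-jitter exponential backoff.
class UpstreamError extends Error {
  constructor(public readonly status: number) {
    super(`Upstream returned ${status}`);
  }
}

const RETRYABLE = new Set([502, 503]);

async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 100,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
  rand: () => number = Math.random,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const retryable = err instanceof UpstreamError && RETRYABLE.has(err.status);
      // 500 (application bug) and 429 (rate limited) propagate immediately.
      if (!retryable || attempt >= maxAttempts) throw err;
      // Full jitter: uniform delay in [0, base * 2^attempt).
      await sleep(rand() * baseDelayMs * 2 ** attempt);
    }
  }
}
```

A retry budget would wrap this helper and refuse to enter the loop at all when the recent retry ratio exceeds the threshold.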

Performance Overhead

An API gateway adds a network hop. The real-world cost:

┌────────────────────────┬───────────────┬─────────────────────────────┐
│ Gateway Type           │ Added Latency │ Notes                       │
├────────────────────────┼───────────────┼─────────────────────────────┤
│ NGINX reverse proxy    │ 0.5-1ms       │ Kernel-level optimizations  │
│ Kong / Envoy           │ 1-3ms         │ Plugin chain adds overhead  │
│ Custom Node.js gateway │ 2-5ms         │ Depends on middleware depth │
│ AWS API Gateway (HTTP) │ 5-10ms        │ Managed service overhead    │
│ AWS API Gateway (REST) │ 10-30ms       │ More features, more latency │
└────────────────────────┴───────────────┴─────────────────────────────┘

When to Bypass the Gateway

Service-to-service (east-west) calls should not go through the API gateway. If the order service needs to call the user service, it calls it directly (or through a service mesh). Routing internal traffic through the gateway adds unnecessary latency, creates a bottleneck, and complicates routing rules.

Single Point of Failure Mitigation

The gateway is on the critical path for every request. If it goes down, everything is down.

  • Deploy at least 3 gateway instances behind a load balancer
  • Use health checks with aggressive failure detection (2-second intervals, 2 failures to remove)
  • Keep gateway logic minimal — the less code, the fewer failure modes
  • Implement graceful degradation — if Redis (rate limiting) is unavailable, fall back to local in-memory limits rather than rejecting all traffic
  • Run canary deployments — route 5% of traffic to a new gateway version before rolling it out
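The Redis fallback in particular can be sketched as a wrapper: try the shared check, and on infrastructure failure switch to a coarse per-instance fixed-window counter instead of rejecting traffic. Names and the fallback limit are illustrative:

```typescript
// Graceful degradation for rate limiting: shared (Redis-backed) check
// first, local in-memory fixed window if the shared store is unreachable.
type LimitCheck = (key: string) => Promise<boolean>;

class DegradableRateLimiter {
  private localCounts = new Map<string, { count: number; windowStart: number }>();

  constructor(
    private readonly sharedCheck: LimitCheck,
    private readonly localMax: number,
    private readonly windowMs: number,
    private readonly now: () => number = Date.now,
  ) {}

  async allow(key: string): Promise<boolean> {
    try {
      return await this.sharedCheck(key);
    } catch {
      // Redis unreachable: enforce a coarser limit per gateway instance.
      const t = this.now();
      const entry = this.localCounts.get(key);
      if (!entry || t - entry.windowStart >= this.windowMs) {
        this.localCounts.set(key, { count: 1, windowStart: t });
        return true;
      }
      entry.count++;
      return entry.count <= this.localMax;
    }
  }
}
```

The local limit should be set below the shared limit divided by the instance count, so that N degraded instances together still roughly respect the global budget.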

Decision Matrix

Use this table to match your situation to a pattern:

┌──────────────────────┬──────────────┬──────────────┬──────────────┐
│ Factor               │ Direct       │ Aggregator   │ BFF          │
├──────────────────────┼──────────────┼──────────────┼──────────────┤
│ Client types         │ 1            │ 1-2          │ 3+           │
│ Backend services     │ < 5          │ 5-15         │ Any          │
│ Team size            │ < 10         │ 10-30        │ 30+          │
│ Payload differences  │ None         │ Minor        │ Significant  │
│ Auth complexity      │ Simple       │ Moderate     │ Per-client   │
│ Operational maturity │ Low          │ Medium       │ High         │
│ Time to implement    │ None         │ Weeks        │ Months       │
│ Per-client optimize  │ No           │ Limited      │ Full control │
│ Team autonomy        │ High         │ Bottleneck   │ High         │
│ Conway's alignment   │ Monolith     │ Platform     │ Client teams │
└──────────────────────┴──────────────┴──────────────┴──────────────┘

Key heuristics:

  • If you have one client and a handful of services, start with direct access. Add a lightweight reverse proxy (NGINX, Caddy) for SSL and CORS.
  • If you are composing data from multiple services for a single client type, an aggregator gateway is sufficient. Keep it thin.
  • If different clients need fundamentally different representations of the same data, BFF is the right pattern. The implementation cost is real, but the alternative — a bloated "universal" API that satisfies nobody — is worse over time.
  • Conway's Law applies. If your org has distinct web, mobile, and platform teams, BFF mirrors that structure. If you have one full-stack team, an aggregator is more natural.

Case Study: Multi-Platform SaaS with BFF Architecture

A B2B SaaS company approached Stripe Systems to redesign their API layer. They had a web application, an iOS/Android mobile app, a partner REST API, and webhook consumers — all hitting a single monolithic API gateway. The backend consisted of 12 microservices covering user management, billing, analytics, document storage, notifications, audit logging, search, permissions, workflow engine, integrations, reporting, and real-time messaging.

The monolithic gateway had become a bottleneck. Mobile team changes to response payloads required coordinating with the web team. Partner API rate limiting was coupled with internal rate limiting. Average gateway latency had crept to 45ms because of accumulated middleware for every client type. Deployment frequency for the gateway had dropped to once per week because every deployment risked all client types.

Architecture

Stripe Systems designed a three-BFF architecture with a shared infrastructure layer:

                            ┌─────────────────────────┐
                            │     Load Balancer        │
                            │   (path-based routing)   │
                            └─────┬──────┬──────┬──────┘
                                  │      │      │
                    ┌─────────────┘      │      └──────────────┐
                    │                    │                      │
              ┌─────▼─────┐       ┌─────▼──────┐       ┌──────▼──────┐
              │  Web BFF   │       │ Mobile BFF │       │ Partner BFF │
              │  /web/*    │       │ /mobile/*  │       │ /partner/*  │
              │  NestJS    │       │  NestJS    │       │  NestJS     │
              │  Port 3010 │       │  Port 3020 │       │  Port 3030  │
              └─────┬──────┘       └─────┬──────┘       └──────┬──────┘
                    │                    │                      │
              ┌─────▼────────────────────▼──────────────────────▼──────┐
              │              Shared Service Mesh (Linkerd)              │
              ├────────┬─────────┬──────────┬──────────┬───────────────┤
              │ Users  │ Billing │Analytics │  Docs    │ Notifications │
              │ Perms  │ Search  │ Workflow │ Reports  │  Messaging    │
              │        │ Audit   │ Integr.  │          │               │
              └────────┴─────────┴──────────┴──────────┴───────────────┘

The load balancer routes by URL prefix. Each BFF runs as an independent NestJS service with its own deployment pipeline, owned by its respective client team.

A shared library (@internal/gateway-core) provides common middleware — JWT validation, distributed tracing, structured logging, circuit breakers — so each BFF does not reinvent these pieces.

Routing Configuration

Each BFF defines routes relevant to its client type:

// mobile-bff/src/app.module.ts — Mobile BFF routes
const routes: Routes = [
  {
    path: '/mobile/v1',
    children: [
      { path: '/auth',       module: MobileAuthModule },
      { path: '/dashboard',  module: MobileDashboardModule },
      { path: '/documents',  module: MobileDocumentsModule },
      { path: '/search',     module: MobileSearchModule },
    ],
  },
];

// partner-bff/src/app.module.ts — Partner BFF routes
const routes: Routes = [
  {
    path: '/partner/v1',
    children: [
      { path: '/users',      module: PartnerUsersModule },
      { path: '/documents',  module: PartnerDocumentsModule },
      { path: '/webhooks',   module: PartnerWebhooksModule },
      { path: '/bulk',       module: PartnerBulkOpsModule },
    ],
  },
];

The mobile BFF has a /dashboard route (aggregated view) that does not exist in the partner BFF. The partner BFF has /bulk and /webhooks routes that make no sense for mobile. Each BFF exposes only what its client needs.

Mobile Payload Optimization

The most measurable win was payload reduction. The mobile BFF aggressively trims, reshapes, and pre-computes data.

Document list — Web BFF response (~8.6 KB for 10 documents):

{
  "documents": [
    {
      "id": "doc_29fa",
      "title": "Q3 Revenue Analysis",
      "content_preview": "Revenue for Q3 showed a 12% increase...",
      "author": {
        "id": "usr_8a3f2",
        "name": "Dana Chen",
        "email": "[email protected]",
        "avatar": "https://cdn.example.com/avatars/usr_8a3f2/full.jpg",
        "department": "Finance"
      },
      "collaborators": [
        { "id": "usr_c1b3", "name": "Sam Park", "role": "editor" },
        { "id": "usr_d4e5", "name": "Alex Rivera", "role": "viewer" }
      ],
      "tags": ["finance", "quarterly", "revenue"],
      "permissions": { "canEdit": true, "canShare": true, "canDelete": false },
      "versions": 14,
      "currentVersion": "v14",
      "wordCount": 3420,
      "lastEditedAt": "2025-07-29T16:45:00Z",
      "createdAt": "2025-07-01T09:00:00Z",
      "attachments": [
        { "name": "chart.png", "size": 245000, "url": "https://cdn.example.com/..." }
      ]
    }
  ],
  "pagination": { "page": 1, "pageSize": 20, "total": 142 },
  "facets": {
    "tags": [{ "name": "finance", "count": 23 }, { "name": "engineering", "count": 45 }],
    "authors": [{ "id": "usr_8a3f2", "name": "Dana Chen", "count": 12 }]
  }
}

Document list — Mobile BFF response (~3.4 KB for 10 documents):

{
  "documents": [
    {
      "id": "doc_29fa",
      "title": "Q3 Revenue Analysis",
      "authorName": "Dana Chen",
      "authorThumb": "https://cdn.example.com/avatars/usr_8a3f2/thumb_32.jpg",
      "tags": ["finance", "quarterly"],
      "canEdit": true,
      "updatedAgo": "2d",
      "hasAttachments": true
    }
  ],
  "page": 1,
  "hasMore": true
}

Changes in the mobile BFF:

  • Flattened author object to two fields (authorName, authorThumb)
  • Removed collaborators, permissions detail, versions, wordCount, content_preview
  • Pre-computed updatedAgo (relative time string) server-side so the mobile app skips time-zone-aware date formatting
  • Replaced attachments array with a boolean hasAttachments
  • Replaced pagination object with a simple hasMore boolean (mobile uses infinite scroll, not page numbers)
  • Dropped facets entirely (mobile search UI does not support faceted navigation)
  • Limited tags to 2 per document instead of all tags

Result: 60% payload reduction on the document list endpoint. Across all mobile endpoints, the average reduction was 55%, which translated to measurable improvements in app performance on slower cellular connections.
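The trimming step above can be sketched as a pure mapping function in the mobile BFF. This is an illustrative sketch, not our production code: the type names, the `relativeTime` helper, and the avatar-to-thumbnail URL rewrite are all assumptions modeled on the JSON examples above.

```typescript
// Hypothetical shapes; field names follow the JSON examples above.
interface FullDocument {
  id: string;
  title: string;
  author: { name: string; avatar: string };
  tags: string[];
  permissions: { canEdit: boolean };
  lastEditedAt: string; // ISO 8601
  attachments: { name: string }[];
}

interface MobileDocument {
  id: string;
  title: string;
  authorName: string;
  authorThumb: string;
  tags: string[];
  canEdit: boolean;
  updatedAgo: string;
  hasAttachments: boolean;
}

// Render the relative-time string server-side so the client skips
// time-zone-aware date math entirely.
function relativeTime(iso: string, now: Date = new Date()): string {
  const ms = now.getTime() - new Date(iso).getTime();
  const days = Math.floor(ms / 86_400_000);
  if (days >= 1) return `${days}d`;
  const hours = Math.floor(ms / 3_600_000);
  return hours >= 1 ? `${hours}h` : `${Math.floor(ms / 60_000)}m`;
}

function toMobileDocument(doc: FullDocument, now?: Date): MobileDocument {
  return {
    id: doc.id,
    title: doc.title,
    authorName: doc.author.name,
    // Assumes thumbnails live alongside the full avatar on the CDN;
    // adjust the rewrite to your actual URL layout.
    authorThumb: doc.author.avatar.replace("/full.jpg", "/thumb_32.jpg"),
    tags: doc.tags.slice(0, 2), // cap at 2 tags per document
    canEdit: doc.permissions.canEdit,
    updatedAgo: relativeTime(doc.lastEditedAt, now),
    hasAttachments: doc.attachments.length > 0,
  };
}
```

Because the function is pure (given a fixed `now`), it is trivial to unit test, and the mobile contract can evolve without touching the upstream document service.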

Rate Limiting Per Tier

Each BFF enforces tier-appropriate limits. The partner BFF has the most granular controls:

┌────────────────┬────────────┬────────────┬─────────────┬────────────┐
│ Endpoint Group │ Free       │ Pro        │ Enterprise  │ Partner    │
├────────────────┼────────────┼────────────┼─────────────┼────────────┤
│ Read (list,get)│ 60/min     │ 600/min    │ 6000/min    │ 3000/min   │
│ Write (create) │ 20/min     │ 200/min    │ 2000/min    │ 1000/min   │
│ Search         │ 10/min     │ 100/min    │ 1000/min    │ 500/min    │
│ Export/Report  │ 2/hour     │ 20/hour    │ 200/hour    │ 50/hour    │
│ Bulk import    │ N/A        │ N/A        │ 50/hour     │ 100/hour   │
│ Webhook reg.   │ N/A        │ N/A        │ N/A         │ 10/min     │
│ Burst allowance│ 1.5x/10s   │ 2x/10s     │ 3x/10s      │ 2x/10s     │
└────────────────┴────────────┴────────────┴─────────────┴────────────┘

Partner clients get higher bulk import limits but lower read limits than enterprise users because their access patterns are batch-oriented rather than interactive. The burst allowance uses a token bucket on top of the sliding window — enterprise clients can briefly exceed their steady-state limit by 3x for 10-second windows, accommodating dashboard page loads that trigger parallel API calls.
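The two-layer check can be sketched roughly as follows: a sliding one-minute window enforces the steady-state limit, and a token bucket sized at `burstMultiplier` times the steady-state 10-second share caps short spikes. This is a simplified in-memory sketch under assumed semantics (class and parameter names are illustrative, not our production limiter, which runs against shared state in Redis).

```typescript
class TierLimiter {
  private windowEvents: number[] = []; // request timestamps (ms) in the last minute
  private tokens: number;
  private lastRefill: number;
  private readonly burstCapacity: number;
  private readonly refillPerMs: number;

  constructor(private perMinute: number, burstMultiplier: number) {
    // Steady-state share of a 10-second slice, scaled by the burst multiplier.
    this.burstCapacity = (perMinute / 6) * burstMultiplier;
    this.refillPerMs = perMinute / 60_000; // refill at the steady-state rate
    this.tokens = this.burstCapacity;
    this.lastRefill = Date.now();
  }

  allow(now: number = Date.now()): boolean {
    // Layer 1: sliding one-minute window enforces the steady-state limit.
    this.windowEvents = this.windowEvents.filter((t) => now - t < 60_000);
    if (this.windowEvents.length >= this.perMinute) return false;

    // Layer 2: token bucket caps how much can land in a short burst.
    this.tokens = Math.min(
      this.burstCapacity,
      this.tokens + (now - this.lastRefill) * this.refillPerMs,
    );
    this.lastRefill = now;
    if (this.tokens < 1) return false;

    this.tokens -= 1;
    this.windowEvents.push(now);
    return true;
  }
}
```

For the free tier (60/min reads, 1.5x burst) this yields a bucket of 15 tokens: a page load firing 15 parallel calls at once succeeds, but the 16th instantaneous call is throttled until tokens refill at one per second.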

Performance Results

Measured over 30 days after the BFF migration, compared to the 30 days before:

┌─────────────────────────┬────────────┬────────────┬───────────┐
│ Metric                  │ Before     │ After      │ Change    │
├─────────────────────────┼────────────┼────────────┼───────────┤
│ Web p50 latency         │ 42ms       │ 28ms       │ -33%      │
│ Web p99 latency         │ 310ms      │ 145ms      │ -53%      │
│ Mobile p50 latency      │ 48ms       │ 19ms       │ -60%      │
│ Mobile p99 latency      │ 520ms      │ 110ms      │ -79%      │
│ Partner p50 latency     │ 45ms       │ 31ms       │ -31%      │
│ Gateway deploy freq.    │ 1/week     │ 12/week    │ +12x      │
│ Gateway incidents/month │ 4.2        │ 0.8        │ -81%      │
│ Mobile payload (avg)    │ 8.1 KB     │ 3.2 KB     │ -60%      │
│ Cache hit rate          │ 22%        │ 61%        │ +177%     │
└─────────────────────────┴────────────┴────────────┴───────────┘

The mobile latency improvements were the most dramatic. The old gateway was serializing fields the mobile app never used, running middleware (HTML sanitization, facet computation) that only the web client needed, and returning payloads that the mobile app parsed and discarded 60% of. The mobile BFF eliminated all of that. The p99 improvement from 520ms to 110ms came from two factors: smaller payloads (fewer bytes over the wire) and removing unnecessary middleware from the mobile request path.

The cache hit rate jumped because each BFF caches at its own granularity. The web BFF caches full document responses for 30 seconds. The mobile BFF caches trimmed responses for 2 minutes (mobile users tolerate slightly stale data). The old unified gateway could only cache at the lowest common denominator.
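Per-BFF cache granularity amounts to each edge wrapping the same upstream fetch with its own TTL and its own cached shape. A minimal sketch, with hypothetical names (`TtlCache`, `getOrLoad`) standing in for whatever cache layer you actually use:

```typescript
// Illustrative in-memory TTL cache; a real BFF would likely back this
// with Redis or an HTTP cache, but the per-client TTL idea is the same.
class TtlCache<V> {
  private store = new Map<string, { value: V; expiresAt: number }>();

  constructor(private ttlMs: number) {}

  async getOrLoad(
    key: string,
    load: () => Promise<V>,
    now: number = Date.now(),
  ): Promise<V> {
    const hit = this.store.get(key);
    if (hit && hit.expiresAt > now) return hit.value; // fresh hit
    const value = await load(); // miss or expired: fetch upstream
    this.store.set(key, { value, expiresAt: now + this.ttlMs });
    return value;
  }
}

// Web BFF: full responses, short TTL (interactive users expect freshness).
const webCache = new TtlCache<object>(30_000);
// Mobile BFF: trimmed responses, longer TTL (slight staleness is acceptable).
const mobileCache = new TtlCache<object>(120_000);
```

A unified gateway has to pick one TTL for everyone, which in practice means the shortest one; splitting by client lets each BFF trade freshness for hit rate independently.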

Deploy frequency improved because each BFF is independently deployable. The mobile team ships without coordinating with the web team. A bad deployment to the partner BFF does not affect mobile or web users.

Conclusion

API gateway pattern selection is a structural decision that shapes how your teams work, how your services evolve, and how your clients perform. Start with the simplest pattern that works (often direct access or a thin reverse proxy), and move to aggregator or BFF patterns when the problems they solve become real in your system — not before. The BFF pattern carries real implementation and operational cost, but for multi-client platforms at scale, the payload optimization, team autonomy, and fault isolation it provides are difficult to achieve any other way.
