Stripe Systems
Backend Development · January 15, 2026 · 21 min read

API Gateway Patterns: BFF vs Aggregator vs Direct — Choosing for Your Stack

Stripe Systems Engineering

Every team building on microservices eventually hits the same question: how should clients talk to your backend? The answer is some form of API gateway — but which pattern you choose has lasting consequences for performance, maintainability, and team velocity. This post breaks down three dominant patterns, compares gateway frameworks honestly, and walks through a real architecture we shipped for a multi-platform SaaS product.

What an API Gateway Actually Does

An API gateway sits between your clients and your backend services. Its responsibilities typically include:

  • Routing — mapping external URLs to internal service endpoints
  • Authentication and authorization — validating tokens before requests reach services
  • Rate limiting — protecting services from abuse and enforcing usage tiers
  • Request/response transformation — reshaping payloads, filtering fields, translating protocols
  • Observability — centralized logging, distributed tracing headers, metrics collection
  • SSL termination — handling TLS at the edge so internal traffic can run over plain HTTP
  • Load balancing — distributing traffic across service instances

A common confusion: gateways are not service meshes. A service mesh (Istio, Linkerd) handles service-to-service (east-west) traffic with sidecar proxies. A gateway handles client-to-service (north-south) traffic. They solve different problems and often coexist. The gateway is your public edge; the mesh manages internal communication.

Pattern 1: Direct-to-Microservice

The simplest approach — clients call services directly.

Client → Service A (users)
Client → Service B (orders)
Client → Service C (inventory)

Each service exposes its own API, handles its own authentication, and manages its own CORS configuration.

When It Works

This is perfectly fine when you have a small team (under 10 engineers), fewer than 5 services, and a single client type. The overhead of a gateway adds complexity that may not pay off early.

When It Breaks Down

As you scale, problems accumulate:

Auth duplication. Every service validates JWTs independently. When you need to change your token format or add a new auth provider, you touch every service. One team forgets to update, and you have a security gap.

CORS management. Each service needs CORS headers configured for every allowed origin. A new frontend deployment means touching N services.

Versioning sprawl. Service A is on v2, Service B is still on v1. Clients need to know which version to call for each service. Your SDK becomes a compatibility matrix.

Client coupling. Mobile clients end up knowing the internal topology of your backend. When you split a service, every client needs an update.

Observability gaps. Distributed tracing requires every service to propagate headers consistently. Without a central point, one service that drops trace context breaks your entire trace chain.

The direct pattern trades short-term simplicity for long-term operational debt. Most teams cross the threshold where a gateway becomes necessary around 6-8 services or when they add a second client type.

Pattern 2: Aggregator / Composition Gateway

The aggregator pattern places a single gateway in front of all services. This gateway handles cross-cutting concerns and — critically — can compose responses from multiple backend calls into a single client response.

Client → Gateway → Service A
                 → Service B
                 → Service C

The gateway fans out requests, merges responses, and returns a unified payload to the client.

NestJS Aggregator Implementation

Here is a practical aggregator that calls three services to build a dashboard response. This handles partial failures, timeouts, and response merging:

// dashboard.controller.ts
import { Controller, Get, Req, HttpException, HttpStatus } from '@nestjs/common';
import { HttpService } from '@nestjs/axios';
import { Request } from 'express';
import { firstValueFrom, timeout, catchError, of } from 'rxjs';

// Domain types, defined in shared packages in the real codebase
interface UserProfile { id: string; name: string; email: string }
interface Order { id: string; status: string; total: number }
interface Notification { id: string; title: string; read: boolean }

interface DashboardResponse {
  user: UserProfile | null;
  recentOrders: Order[] | null;
  notifications: Notification[] | null;
  _meta: {
    partial: boolean;
    failedSources: string[];
    responseTimeMs: number;
  };
}

@Controller('api/dashboard')
export class DashboardController {
  constructor(private readonly http: HttpService) {}

  @Get()
  async getDashboard(@Req() req: Request): Promise<DashboardResponse> {
    const start = Date.now();
    const token = req.headers['authorization'];
    const failedSources: string[] = [];

    const fetchWithFallback = async <T>(
      url: string,
      source: string,
      timeoutMs: number = 3000,
    ): Promise<T | null> => {
      try {
        const response = await firstValueFrom(
          this.http
            .get<T>(url, { headers: { authorization: token } })
            .pipe(
              timeout(timeoutMs),
              catchError((err) => {
                failedSources.push(source);
                console.error(`${source} failed: ${err.message}`);
                return of(null);
              }),
            ),
        );
        return response?.data ?? null;
      } catch {
        failedSources.push(source);
        return null;
      }
    };

    // Fan out requests in parallel
    const [user, recentOrders, notifications] = await Promise.all([
      fetchWithFallback<UserProfile>(
        'http://user-service:3001/api/profile',
        'user-service',
        2000,
      ),
      fetchWithFallback<Order[]>(
        'http://order-service:3002/api/orders?limit=5',
        'order-service',
        3000,
      ),
      fetchWithFallback<Notification[]>(
        'http://notification-service:3003/api/unread',
        'notification-service',
        1500,
      ),
    ]);

    // If the user service fails, the response is meaningless
    if (!user) {
      throw new HttpException(
        'Core service unavailable',
        HttpStatus.SERVICE_UNAVAILABLE,
      );
    }

    return {
      user,
      recentOrders,
      notifications,
      _meta: {
        partial: failedSources.length > 0,
        failedSources,
        responseTimeMs: Date.now() - start,
      },
    };
  }
}

Partial Failure Handling

The _meta field in the response is deliberate. Clients can inspect partial: true and failedSources to show degraded UI instead of a full error page. The notification service being down should not prevent a user from seeing their dashboard.

The timeout hierarchy matters too — the user service gets 2000ms (critical path), orders get 3000ms (can be stale), and notifications get 1500ms (low-priority, fail fast). These numbers should come from your p99 latency measurements, not guesswork.

Tradeoffs

The aggregator gateway centralizes cross-cutting logic, but it also centralizes risk. A bug in the gateway takes down everything. It also becomes a coordination bottleneck — if the orders team and notifications team both need gateway changes, they compete for merge priority. This is where the BFF pattern enters.

Pattern 3: Backend for Frontend (BFF)

BFF takes the aggregator idea and splits it by client type. Instead of one gateway serving all clients, you deploy separate gateways — each tailored to its consumer.

Web App     → Web BFF     → Services
Mobile App  → Mobile BFF  → Services
Partner API → Partner BFF → Services

Why Different Clients Need Different APIs

A web dashboard rendering a user profile page might need:

  • Full user profile with avatar URL, bio, social links
  • Last 20 orders with full line items and tracking URLs
  • 50 most recent notifications with HTML-formatted bodies

A mobile app showing the same screen needs:

  • User name, avatar thumbnail URL (not the full-res URL)
  • Last 5 orders with status and total only
  • 10 unread notification summaries (plain text, not HTML)

Serving the web payload to mobile wastes bandwidth, drains battery parsing unused fields, and increases render time. Serving the mobile payload to web gives an incomplete experience.

The Same Data Through Two BFFs

Web BFF response (~4.2 KB):

{
  "user": {
    "id": "usr_8a3f2",
    "name": "Dana Chen",
    "email": "[email protected]",
    "avatar": "https://cdn.example.com/avatars/usr_8a3f2/full.jpg",
    "bio": "Engineering lead focused on distributed systems.",
    "joinedAt": "2023-01-15T00:00:00Z",
    "socialLinks": {
      "github": "https://github.com/danachen",
      "linkedin": "https://linkedin.com/in/danachen"
    }
  },
  "recentOrders": [
    {
      "id": "ord_f91a",
      "status": "delivered",
      "total": 149.99,
      "currency": "USD",
      "items": [
        { "name": "Ergonomic Keyboard", "qty": 1, "price": 149.99 }
      ],
      "trackingUrl": "https://tracking.example.com/ord_f91a",
      "createdAt": "2025-07-28T14:30:00Z"
    }
  ],
  "notifications": [
    {
      "id": "ntf_3b2a",
      "title": "Order Delivered",
      "body": "<p>Your order <strong>ord_f91a</strong> has been delivered.</p>",
      "read": false,
      "createdAt": "2025-07-30T09:15:00Z"
    }
  ]
}

Mobile BFF response (~1.1 KB):

{
  "user": {
    "id": "usr_8a3f2",
    "name": "Dana Chen",
    "avatarThumb": "https://cdn.example.com/avatars/usr_8a3f2/thumb_64.jpg"
  },
  "recentOrders": [
    {
      "id": "ord_f91a",
      "status": "delivered",
      "total": "$149.99"
    }
  ],
  "unreadCount": 3,
  "notifications": [
    {
      "id": "ntf_3b2a",
      "title": "Order Delivered",
      "summary": "Your order ord_f91a has been delivered."
    }
  ]
}

The mobile BFF drops social links, full-resolution images, HTML notification bodies, and line item details. It pre-formats the currency string (the mobile app does not need to carry locale-aware currency formatting logic). The payload is 74% smaller.
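The web-to-mobile reshaping is a pure function inside the mobile BFF. A minimal sketch under assumed types: the `thumb_64` CDN convention mirrors the example URLs above, and only USD formatting is handled (a real BFF would cover more currencies):

```typescript
// Illustrative types; the real BFF would share these with its client.
interface WebUser { id: string; name: string; avatar: string; }
interface WebOrder { id: string; status: string; total: number; currency: string; }

interface MobileUser { id: string; name: string; avatarThumb: string; }
interface MobileOrder { id: string; status: string; total: string; }

// Derive the thumbnail URL from the full avatar URL (assumed CDN convention).
function thumbUrl(fullUrl: string): string {
  return fullUrl.replace(/\/full\.jpg$/, '/thumb_64.jpg');
}

// Pre-format currency server-side so the app skips locale logic.
function formatTotal(total: number, currency: string): string {
  const symbol = currency === 'USD' ? '$' : `${currency} `;
  return `${symbol}${total.toFixed(2)}`;
}

function toMobileUser(u: WebUser): MobileUser {
  return { id: u.id, name: u.name, avatarThumb: thumbUrl(u.avatar) };
}

function toMobileOrder(o: WebOrder): MobileOrder {
  return { id: o.id, status: o.status, total: formatTotal(o.total, o.currency) };
}
```

Because the transform is a pure function, it can be unit-tested without standing up any backend services.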

BFF Ownership

Each BFF should be owned by the team building its client. The web team owns the web BFF. The mobile team owns the mobile BFF. This aligns incentives — the people who know what the client needs control the API shaping layer. It follows Conway's Law intentionally rather than fighting it.

Gateway Frameworks: Honest Comparison

┌─────────────────┬───────────────────────┬──────────────────┬────────────────────────────┬────────────────────────────┬─────────────────────────────┐
│ Framework       │ Best For              │ Latency Overhead │ Auth Built-in              │ Customization              │ Pricing Model               │
├─────────────────┼───────────────────────┼──────────────────┼────────────────────────────┼────────────────────────────┼─────────────────────────────┤
│ Kong            │ Plugin-heavy setups   │ 1-3ms            │ Yes (plugins)              │ Lua/Go plugins             │ Open source + Enterprise    │
│ AWS API Gateway │ Serverless stacks     │ 5-15ms (cold)    │ Cognito/Lambda authorizers │ Limited to AWS transforms  │ Per-request ($3.50/million) │
│ Azure APIM      │ Enterprise/.NET shops │ 3-8ms            │ AAD, certificates          │ XML policies               │ Tiered (starts ~$0.80/hr)   │
│ NGINX           │ Raw throughput        │ <1ms             │ Module-based               │ C modules, Lua (OpenResty) │ Open source + NGINX Plus    │
│ Custom NestJS   │ Full control          │ 2-5ms            │ You build it               │ Unlimited                  │ Your infra cost             │
└─────────────────┴───────────────────────┴──────────────────┴────────────────────────────┴────────────────────────────┴─────────────────────────────┘

Kong has the richest plugin ecosystem (300+), but debugging Lua plugins in production is unpleasant. Use it when off-the-shelf plugins cover 80%+ of your needs.

AWS API Gateway is the right choice if you are already on Lambda. The per-request pricing looks cheap until you hit scale — 100M requests/month costs $350 before you count data transfer. HTTP APIs add less latency than REST APIs, but the overhead is still noticeable.

Azure APIM has powerful policy expressions (think XSLT for HTTP), but the XML policy language has a steep learning curve. The developer portal is genuinely useful for partner APIs.

NGINX is the performance baseline everything else is measured against. For pure reverse-proxy-with-auth, nothing beats it. But building complex aggregation logic in Lua or C modules is painful.

Custom NestJS (or Express, Fastify, Go) gives you the most flexibility. You write TypeScript, test it like any other service, and deploy it like any other service. The cost is that you own every feature — rate limiting, circuit breakers, caching. For BFF gateways, this is often the right choice because the gateway logic is inherently application-specific.

Authentication at the Gateway

The gateway is the natural place to verify identity before requests reach services. Three common strategies:

JWT Validation (Stateless, Fast)

The gateway validates the JWT signature and expiration without calling any external service. This adds sub-millisecond overhead.

// jwt-auth.middleware.ts
import { Injectable, NestMiddleware, UnauthorizedException } from '@nestjs/common';
import { Request, Response, NextFunction } from 'express';
import * as jose from 'jose';

@Injectable()
export class JwtAuthMiddleware implements NestMiddleware {
  private jwks: jose.JWTVerifyGetKey;

  constructor() {
    // Cache the JWKS endpoint. jose handles key rotation automatically.
    this.jwks = jose.createRemoteJWKSet(
      new URL('https://auth.example.com/.well-known/jwks.json'),
    );
  }

  async use(req: Request, res: Response, next: NextFunction) {
    const authHeader = req.headers.authorization;

    if (!authHeader?.startsWith('Bearer ')) {
      throw new UnauthorizedException('Missing bearer token');
    }

    try {
      const token = authHeader.slice(7);
      const { payload } = await jose.jwtVerify(token, this.jwks, {
        issuer: 'https://auth.example.com',
        audience: 'api.example.com',
      });

      // Attach decoded claims for downstream services
      req['user'] = {
        sub: payload.sub,
        email: payload.email,
        roles: payload.roles,
        orgId: payload.org_id,
      };

      // Propagate user context as headers for downstream services
      req.headers['x-user-id'] = payload.sub as string;
      req.headers['x-user-roles'] = (payload.roles as string[]).join(',');
      req.headers['x-org-id'] = payload.org_id as string;
    } catch (err) {
      if (err instanceof jose.errors.JWTExpired) {
        throw new UnauthorizedException('Token expired');
      }
      throw new UnauthorizedException('Invalid token');
    }

    next();
  }
}

Token propagation is the key detail. The gateway validates the JWT, extracts claims, and forwards them as trusted internal headers (x-user-id, x-user-roles). Downstream services trust these headers because internal traffic is network-restricted. They never need to re-validate the JWT.

OAuth2 Token Introspection (Accurate, Slower)

For tokens that can be revoked (opaque tokens, reference tokens), the gateway calls the authorization server's introspection endpoint. This adds 5-20ms per request but guarantees the token has not been revoked. Use this for high-security operations (financial transactions, admin actions). Cache introspection results for 30-60 seconds to reduce latency for repeated requests.
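A caching wrapper around introspection might look like the following sketch. The `IntrospectFn` callback stands in for the real HTTP call to the authorization server, and the 30-second TTL is an illustrative default:

```typescript
// Sketch: cache positive introspection results for a short TTL so that
// repeated requests with the same token skip the 5-20ms round trip.
type IntrospectFn = (token: string) => Promise<{ active: boolean }>;

class CachedIntrospector {
  private cache = new Map<string, { result: { active: boolean }; expiresAt: number }>();

  constructor(
    private readonly introspect: IntrospectFn,
    private readonly ttlMs = 30_000,
    private readonly now: () => number = Date.now,
  ) {}

  async check(token: string): Promise<{ active: boolean }> {
    const hit = this.cache.get(token);
    if (hit && hit.expiresAt > this.now()) return hit.result;

    const result = await this.introspect(token);
    // Only cache positive results; an inactive (revoked) token is
    // re-checked every time, so revocation takes at most one TTL to bite.
    if (result.active) {
      this.cache.set(token, { result, expiresAt: this.now() + this.ttlMs });
    }
    return result;
  }
}
```

In production the cache key should be a hash of the token rather than the raw token, and the map needs eviction; both are omitted here for brevity.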

API Keys (Simple, for Partners)

API keys are not authentication in the identity sense — they identify an integration. Use them for partner APIs where the "user" is an organization, not a person. Store hashed keys in a fast lookup store (Redis or DynamoDB), and map them to rate-limit tiers and permission scopes.
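A minimal hashed-key lookup sketch using Node's built-in crypto; the in-memory Map stands in for Redis or DynamoDB, and the record shape is illustrative:

```typescript
import { createHash } from 'node:crypto';

// Keys are stored hashed so a leaked store does not leak usable keys.
interface KeyRecord { orgId: string; tier: 'partner' | 'enterprise'; scopes: string[]; }

const keyStore = new Map<string, KeyRecord>();

function hashKey(apiKey: string): string {
  // Plain SHA-256 is acceptable here because API keys are long and
  // random; low-entropy passwords would need a salted KDF instead.
  return createHash('sha256').update(apiKey).digest('hex');
}

function registerKey(apiKey: string, record: KeyRecord): void {
  keyStore.set(hashKey(apiKey), record);
}

function authenticateKey(apiKey: string): KeyRecord | null {
  return keyStore.get(hashKey(apiKey)) ?? null;
}
```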

Rate Limiting Strategies

Rate limiting at the gateway protects your services from traffic spikes and enforces business-tier limits.

Algorithm Comparison

Token bucket — tokens refill at a fixed rate. Allows short bursts. Best for APIs where occasional spikes are acceptable.

Sliding window — counts requests in a rolling time window. More accurate than fixed window, avoids the boundary burst problem. Higher memory cost (stores individual timestamps or uses multiple counters).

Fixed window — counts requests in discrete time windows (e.g., per minute). Simple to implement but allows up to 2x the limit at window boundaries (a burst at :59 and :00).
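Of the three, the token bucket is the easiest to sketch. A minimal single-instance version; the injectable clock is for testability, not part of any production API:

```typescript
// Token bucket: capacity bounds the burst size, refillPerSec sets the
// steady-state rate. Tokens refill continuously, capped at capacity.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private readonly capacity: number,
    private readonly refillPerSec: number,
    private readonly now: () => number = Date.now,
  ) {
    this.tokens = capacity;
    this.lastRefill = now();
  }

  tryRemove(count = 1): boolean {
    const elapsed = (this.now() - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerSec);
    this.lastRefill = this.now();
    if (this.tokens >= count) {
      this.tokens -= count;
      return true;
    }
    return false;
  }
}
```

Setting capacity above the per-second refill rate is what permits the short bursts described above.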

Redis-Backed Sliding Window

For distributed deployments with multiple gateway instances, rate limiting state must be shared. Redis is the standard choice:

// rate-limiter.service.ts
import { Injectable } from '@nestjs/common';
import Redis from 'ioredis';

@Injectable()
export class RateLimiterService {
  private redis: Redis;

  constructor() {
    this.redis = new Redis({
      host: process.env.REDIS_HOST,
      port: 6379,
      enableReadyCheck: true,
    });
  }

  async checkLimit(
    key: string,
    maxRequests: number,
    windowMs: number,
  ): Promise<{ allowed: boolean; remaining: number; retryAfterMs: number }> {
    const now = Date.now();
    const windowStart = now - windowMs;

    // Lua script for atomic sliding window check
    const script = `
      local key = KEYS[1]
      local now = tonumber(ARGV[1])
      local window_start = tonumber(ARGV[2])
      local max_requests = tonumber(ARGV[3])
      local window_ms = tonumber(ARGV[4])

      -- Remove expired entries
      redis.call('ZREMRANGEBYSCORE', key, '-inf', window_start)

      -- Count current requests in window
      local current = redis.call('ZCARD', key)

      if current < max_requests then
        -- Add this request
        redis.call('ZADD', key, now, now .. '-' .. math.random(1000000))
        redis.call('PEXPIRE', key, window_ms)
        return {1, max_requests - current - 1, 0}
      else
        -- Get oldest entry to calculate retry-after
        local oldest = redis.call('ZRANGE', key, 0, 0, 'WITHSCORES')
        local retry_after = 0
        if #oldest > 0 then
          retry_after = tonumber(oldest[2]) + window_ms - now
        end
        return {0, 0, retry_after}
      end
    `;

    const result = (await this.redis.eval(
      script, 1, key, now, windowStart, maxRequests, windowMs,
    )) as number[];

    return {
      allowed: result[0] === 1,
      remaining: result[1],
      retryAfterMs: result[2],
    };
  }
}

The Lua script runs atomically on Redis — no race conditions between gateway instances. The sorted set stores each request with its timestamp as the score, making window sliding a matter of removing entries below the threshold.
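For a single gateway instance, or as a reference model when testing against the Lua script, the same logic fits in plain TypeScript. A sketch with an injectable clock; a plain array replaces the sorted set:

```typescript
// In-memory sliding window mirroring the Redis version: drop timestamps
// outside the window, count what remains, then admit or compute retry-after.
class LocalSlidingWindow {
  private timestamps = new Map<string, number[]>();

  constructor(private readonly now: () => number = Date.now) {}

  check(key: string, maxRequests: number, windowMs: number) {
    const t = this.now();
    const windowStart = t - windowMs;
    // Equivalent of ZREMRANGEBYSCORE: keep only entries inside the window.
    const entries = (this.timestamps.get(key) ?? []).filter((ts) => ts > windowStart);

    if (entries.length < maxRequests) {
      entries.push(t);
      this.timestamps.set(key, entries);
      return { allowed: true, remaining: maxRequests - entries.length, retryAfterMs: 0 };
    }
    this.timestamps.set(key, entries);
    // Oldest surviving entry determines when capacity frees up.
    return { allowed: false, remaining: 0, retryAfterMs: entries[0] + windowMs - t };
  }
}
```

Unlike the Redis version, this state is per-process, so it only enforces limits correctly behind a single instance (or as the local fallback discussed later).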

Tier-Based Configuration

Rate limits should reflect both the endpoint sensitivity and the client tier:

# rate-limits.yaml
tiers:
  free:
    global: 100/minute
    endpoints:
      /api/search: 20/minute
      /api/export: 5/hour
  pro:
    global: 1000/minute
    endpoints:
      /api/search: 200/minute
      /api/export: 50/hour
  enterprise:
    global: 10000/minute
    endpoints:
      /api/search: 2000/minute
      /api/export: 500/hour
  partner:
    global: 5000/minute
    endpoints:
      /api/bulk-import: 100/hour
      /api/webhooks/register: 10/minute
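A resolver for this config might look like the following sketch. The "count/unit" grammar for limit specs is an assumption read off the YAML above:

```typescript
// Resolve the effective limit for a (tier, endpoint) pair; endpoint
// entries override the tier's global limit.
interface TierLimits { global: string; endpoints: Record<string, string>; }

const WINDOW_MS: Record<string, number> = {
  second: 1_000,
  minute: 60_000,
  hour: 3_600_000,
};

function parseLimit(spec: string): { max: number; windowMs: number } {
  const [count, unit] = spec.split('/');
  const windowMs = WINDOW_MS[unit];
  if (!windowMs || Number.isNaN(Number(count))) {
    throw new Error(`Bad limit spec: ${spec}`);
  }
  return { max: Number(count), windowMs };
}

function resolveLimit(tier: TierLimits, endpoint: string): { max: number; windowMs: number } {
  return parseLimit(tier.endpoints[endpoint] ?? tier.global);
}
```

The resolved `{ max, windowMs }` pair is exactly the shape the sliding-window checker above expects.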

Request/Response Transformation

Gateways can reshape data between clients and services without changing either side.

Field filtering — strip internal fields (_internalScore, debugInfo) before returning to clients. Add fields for specific clients (mobile gets formattedPrice, web gets raw price + currency).

Protocol translation — accept REST from clients, proxy as gRPC to internal services. This is increasingly common as teams adopt gRPC internally but maintain REST externally.

Header injection — add correlation IDs, propagate trace context, inject client metadata for analytics.
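The field-filtering transform can be sketched as a recursive strip pass; the denylist names come from the examples above and are otherwise arbitrary:

```typescript
// Remove internal-only fields anywhere in a response payload before it
// leaves the gateway. The denylist is illustrative.
const INTERNAL_FIELDS = new Set(['_internalScore', 'debugInfo']);

function stripInternalFields<T>(value: T): T {
  if (Array.isArray(value)) {
    return value.map((v) => stripInternalFields(v)) as unknown as T;
  }
  if (value !== null && typeof value === 'object') {
    const out: Record<string, unknown> = {};
    for (const [k, v] of Object.entries(value as Record<string, unknown>)) {
      if (INTERNAL_FIELDS.has(k)) continue;
      out[k] = stripInternalFields(v);
    }
    return out as T;
  }
  return value;
}
```

In NestJS this would typically run inside a response interceptor so that every route gets it for free.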

GraphQL Federation as an Alternative

GraphQL federation (Apollo Federation, or the newer WunderGraph Cosmo) acts as a declarative gateway. Each service exposes a GraphQL subgraph; the federation router composes them into a single schema.

# User subgraph
type User @key(fields: "id") {
  id: ID!
  name: String!
  email: String!
}

# Order subgraph
type User @key(fields: "id") {
  id: ID!
  orders: [Order!]!
}

type Order {
  id: ID!
  total: Float!
  status: OrderStatus!
}

The federation router resolves { user { name orders { total } } } by calling the user subgraph for profile data and the order subgraph for orders, then merging them.

When GraphQL replaces BFF: If your clients are all web/JS-based and your teams are comfortable with GraphQL, federation can eliminate the need for BFF entirely. Clients request exactly the fields they need — mobile queries request fewer fields, web queries request more. The downside is that GraphQL adds its own complexity (N+1 query problems, query cost analysis, caching challenges). Do not adopt GraphQL just to avoid building a BFF. Adopt it if your data graph is genuinely relational and your clients have diverse data needs.

Circuit Breakers and Retries at the Gateway

When a downstream service degrades, the gateway should fail fast rather than pile up requests that will timeout.

Circuit Breaker States

  • Closed — requests flow normally. Failures are counted.
  • Open — requests are immediately rejected with 503. No traffic reaches the degraded service.
  • Half-open — a single probe request is allowed through. If it succeeds, the circuit closes. If it fails, it reopens.

// circuit-breaker.ts
class CircuitOpenError extends Error {}

interface CircuitBreakerConfig {
  failureThreshold: number;     // failures before opening
  resetTimeoutMs: number;       // how long to stay open
  halfOpenMaxAttempts: number;  // probes before closing
}

class CircuitBreaker {
  private state: 'closed' | 'open' | 'half-open' = 'closed';
  private failureCount = 0;
  private lastFailureTime = 0;
  private halfOpenAttempts = 0;

  constructor(
    private readonly name: string,
    private readonly config: CircuitBreakerConfig,
  ) {}

  async execute<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === 'open') {
      if (Date.now() - this.lastFailureTime > this.config.resetTimeoutMs) {
        this.state = 'half-open';
        this.halfOpenAttempts = 0;
      } else {
        throw new CircuitOpenError(
          `Circuit ${this.name} is open. Retry after ${this.config.resetTimeoutMs}ms.`,
        );
      }
    }

    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  private onSuccess() {
    if (this.state === 'half-open') {
      this.halfOpenAttempts++;
      if (this.halfOpenAttempts >= this.config.halfOpenMaxAttempts) {
        this.state = 'closed';
        this.failureCount = 0;
      }
    } else {
      this.failureCount = 0;
    }
  }

  private onFailure() {
    this.failureCount++;
    this.lastFailureTime = Date.now();
    if (this.failureCount >= this.config.failureThreshold) {
      this.state = 'open';
    }
  }
}

Retry Budgets and Amplification

Retries at the gateway are dangerous. If the user service has 3 replicas and each is slightly slow, a gateway that retries 3 times can turn 1 client request into 9 backend requests. This is retry amplification — the thing meant to improve reliability makes an outage worse.

Mitigations:

  • Retry budgets — allow retries only when the overall failure rate is below a threshold (e.g., retry only if fewer than 10% of recent requests have been retries). This caps the amplification factor.
  • Retry only on specific errors — retry on 502/503 (upstream unavailable), never on 500 (application bug) or 429 (rate limited).
  • No retries on non-idempotent operations — never retry a POST that creates a resource unless the API supports idempotency keys.
  • Exponential backoff with jitter — avoid synchronized retry storms across gateway instances.
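The status-code gating and jittered backoff can be combined into one retry helper. A sketch: the `UpstreamError` type and the injectable `sleep`/`rand` hooks are illustrative, not a real library API (production code would use `setTimeout` and `Math.random` directly):

```typescript
// Retry only on 502/503, with full-jitter exponential backoff.
class UpstreamError extends Error {
  constructor(public readonly status: number) {
    super(`Upstream returned ${status}`);
  }
}

const RETRYABLE = new Set([502, 503]);

async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 100,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
  rand: () => number = Math.random,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const retryable = err instanceof UpstreamError && RETRYABLE.has(err.status);
      // 500 (application bug) and 429 (rate limited) propagate immediately.
      if (!retryable || attempt >= maxAttempts) throw err;
      // Full jitter: uniform delay in [0, base * 2^attempt).
      await sleep(rand() * baseDelayMs * 2 ** attempt);
    }
  }
}
```

A retry budget would wrap this helper and refuse to enter the loop at all when the recent retry ratio exceeds the threshold.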

Performance Overhead

An API gateway adds a network hop. The real-world cost:

┌────────────────────────┬───────────────┬─────────────────────────────┐
│ Gateway Type           │ Added Latency │ Notes                       │
├────────────────────────┼───────────────┼─────────────────────────────┤
│ NGINX reverse proxy    │ 0.5-1ms       │ Kernel-level optimizations  │
│ Kong / Envoy           │ 1-3ms         │ Plugin chain adds overhead  │
│ Custom Node.js gateway │ 2-5ms         │ Depends on middleware depth │
│ AWS API Gateway (HTTP) │ 5-10ms        │ Managed service overhead    │
│ AWS API Gateway (REST) │ 10-30ms       │ More features, more latency │
└────────────────────────┴───────────────┴─────────────────────────────┘

When to Bypass the Gateway

Service-to-service (east-west) calls should not go through the API gateway. If the order service needs to call the user service, it calls it directly (or through a service mesh). Routing internal traffic through the gateway adds unnecessary latency, creates a bottleneck, and complicates routing rules.

Single Point of Failure Mitigation

The gateway is on the critical path for every request. If it goes down, everything is down.

  • Deploy at least 3 gateway instances behind a load balancer
  • Use health checks with aggressive failure detection (2-second intervals, 2 failures to remove)
  • Keep gateway logic minimal — the less code, the fewer failure modes
  • Implement graceful degradation — if Redis (rate limiting) is unavailable, fall back to local in-memory limits rather than rejecting all traffic
  • Run canary deployments — route 5% of traffic to a new gateway version before rolling it out
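The Redis fallback in particular can be sketched as a wrapper: try the shared check, and on infrastructure failure switch to a coarse per-instance fixed-window counter instead of rejecting traffic. Names and the fallback limit are illustrative:

```typescript
// Graceful degradation for rate limiting: shared (Redis-backed) check
// first, local in-memory fixed window if the shared store is unreachable.
type LimitCheck = (key: string) => Promise<boolean>;

class DegradableRateLimiter {
  private localCounts = new Map<string, { count: number; windowStart: number }>();

  constructor(
    private readonly sharedCheck: LimitCheck,
    private readonly localMax: number,
    private readonly windowMs: number,
    private readonly now: () => number = Date.now,
  ) {}

  async allow(key: string): Promise<boolean> {
    try {
      return await this.sharedCheck(key);
    } catch {
      // Redis unreachable: enforce a coarser limit per gateway instance.
      const t = this.now();
      const entry = this.localCounts.get(key);
      if (!entry || t - entry.windowStart >= this.windowMs) {
        this.localCounts.set(key, { count: 1, windowStart: t });
        return true;
      }
      entry.count++;
      return entry.count <= this.localMax;
    }
  }
}
```

The local limit should be set below the shared limit divided by the instance count, so that N degraded instances together still roughly respect the global budget.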

Decision Matrix

Use this table to match your situation to a pattern:

┌──────────────────────┬──────────────┬──────────────┬──────────────┐
│ Factor               │ Direct       │ Aggregator   │ BFF          │
├──────────────────────┼──────────────┼──────────────┼──────────────┤
│ Client types         │ 1            │ 1-2          │ 3+           │
│ Backend services     │ < 5          │ 5-15         │ Any          │
│ Team size            │ < 10         │ 10-30        │ 30+          │
│ Payload differences  │ None         │ Minor        │ Significant  │
│ Auth complexity      │ Simple       │ Moderate     │ Per-client   │
│ Operational maturity │ Low          │ Medium       │ High         │
│ Time to implement    │ None         │ Weeks        │ Months       │
│ Per-client optimize  │ No           │ Limited      │ Full control │
│ Team autonomy        │ High         │ Bottleneck   │ High         │
│ Conway's alignment   │ Monolith     │ Platform     │ Client teams │
└──────────────────────┴──────────────┴──────────────┴──────────────┘

Key heuristics:

  • If you have one client and a handful of services, start with direct access. Add a lightweight reverse proxy (NGINX, Caddy) for SSL and CORS.
  • If you are composing data from multiple services for a single client type, an aggregator gateway is sufficient. Keep it thin.
  • If different clients need fundamentally different representations of the same data, BFF is the right pattern. The implementation cost is real, but the alternative — a bloated "universal" API that satisfies nobody — is worse over time.
  • Conway's Law applies. If your org has distinct web, mobile, and platform teams, BFF mirrors that structure. If you have one full-stack team, an aggregator is more natural.

Case Study: Multi-Platform SaaS with BFF Architecture

A B2B SaaS company approached Stripe Systems to redesign their API layer. They had a web application, an iOS/Android mobile app, a partner REST API, and webhook consumers — all hitting a single monolithic API gateway. The backend consisted of 12 microservices covering user management, billing, analytics, document storage, notifications, audit logging, search, permissions, workflow engine, integrations, reporting, and real-time messaging.

The monolithic gateway had become a bottleneck. Mobile team changes to response payloads required coordinating with the web team. Partner API rate limiting was coupled with internal rate limiting. Average gateway latency had crept to 45ms because of accumulated middleware for every client type. Deployment frequency for the gateway had dropped to once per week because every deployment risked all client types.

Architecture

Stripe Systems designed a three-BFF architecture with a shared infrastructure layer:

                            ┌─────────────────────────┐
                            │     Load Balancer        │
                            │   (path-based routing)   │
                            └─────┬──────┬──────┬──────┘
                                  │      │      │
                    ┌─────────────┘      │      └──────────────┐
                    │                    │                      │
              ┌─────▼─────┐       ┌─────▼──────┐       ┌──────▼──────┐
              │  Web BFF   │       │ Mobile BFF │       │ Partner BFF │
              │  /web/*    │       │ /mobile/*  │       │ /partner/*  │
              │  NestJS    │       │  NestJS    │       │  NestJS     │
              │  Port 3010 │       │  Port 3020 │       │  Port 3030  │
              └─────┬──────┘       └─────┬──────┘       └──────┬──────┘
                    │                    │                      │
              ┌─────▼────────────────────▼──────────────────────▼──────┐
              │              Shared Service Mesh (Linkerd)              │
              ├────────┬─────────┬──────────┬──────────┬───────────────┤
              │ Users  │ Billing │Analytics │  Docs    │ Notifications │
              │ Perms  │ Search  │ Workflow │ Reports  │  Messaging    │
              │        │ Audit   │ Integr.  │          │               │
              └────────┴─────────┴──────────┴──────────┴───────────────┘

The load balancer routes by URL prefix. Each BFF runs as an independent NestJS service with its own deployment pipeline, owned by its respective client team.

A shared library (@internal/gateway-core) provides common middleware — JWT validation, distributed tracing, structured logging, circuit breakers — so each BFF does not reinvent these pieces.

Routing Configuration

Each BFF defines routes relevant to its client type:

// mobile-bff/src/app.module.ts — Mobile BFF routes
const routes: Routes = [
  {
    path: '/mobile/v1',
    children: [
      { path: '/auth',       module: MobileAuthModule },
      { path: '/dashboard',  module: MobileDashboardModule },
      { path: '/documents',  module: MobileDocumentsModule },
      { path: '/search',     module: MobileSearchModule },
    ],
  },
];

// partner-bff/src/app.module.ts — Partner BFF routes
const routes: Routes = [
  {
    path: '/partner/v1',
    children: [
      { path: '/users',      module: PartnerUsersModule },
      { path: '/documents',  module: PartnerDocumentsModule },
      { path: '/webhooks',   module: PartnerWebhooksModule },
      { path: '/bulk',       module: PartnerBulkOpsModule },
    ],
  },
];

The mobile BFF has a /dashboard route (aggregated view) that does not exist in the partner BFF. The partner BFF has /bulk and /webhooks routes that make no sense for mobile. Each BFF exposes only what its client needs.

Mobile Payload Optimization

The most measurable win was payload reduction. The mobile BFF aggressively trims, reshapes, and pre-computes data.

Document list — Web BFF response (~8.6 KB for 10 documents):

{
  "documents": [
    {
      "id": "doc_29fa",
      "title": "Q3 Revenue Analysis",
      "content_preview": "Revenue for Q3 showed a 12% increase...",
      "author": {
        "id": "usr_8a3f2",
        "name": "Dana Chen",
        "email": "[email protected]",
        "avatar": "https://cdn.example.com/avatars/usr_8a3f2/full.jpg",
        "department": "Finance"
      },
      "collaborators": [
        { "id": "usr_c1b3", "name": "Sam Park", "role": "editor" },
        { "id": "usr_d4e5", "name": "Alex Rivera", "role": "viewer" }
      ],
      "tags": ["finance", "quarterly", "revenue"],
      "permissions": { "canEdit": true, "canShare": true, "canDelete": false },
      "versions": 14,
      "currentVersion": "v14",
      "wordCount": 3420,
      "lastEditedAt": "2025-07-29T16:45:00Z",
      "createdAt": "2025-07-01T09:00:00Z",
      "attachments": [
        { "name": "chart.png", "size": 245000, "url": "https://cdn.example.com/..." }
      ]
    }
  ],
  "pagination": { "page": 1, "pageSize": 20, "total": 142 },
  "facets": {
    "tags": [{ "name": "finance", "count": 23 }, { "name": "engineering", "count": 45 }],
    "authors": [{ "id": "usr_8a3f2", "name": "Dana Chen", "count": 12 }]
  }
}

Document list — Mobile BFF response (~3.4 KB for 10 documents):

{
  "documents": [
    {
      "id": "doc_29fa",
      "title": "Q3 Revenue Analysis",
      "authorName": "Dana Chen",
      "authorThumb": "https://cdn.example.com/avatars/usr_8a3f2/thumb_32.jpg",
      "tags": ["finance", "quarterly"],
      "canEdit": true,
      "updatedAgo": "2d",
      "hasAttachments": true
    }
  ],
  "page": 1,
  "hasMore": true
}

Changes in the mobile BFF:

  • Flattened author object to two fields (authorName, authorThumb)
  • Removed collaborators, permissions detail, versions, wordCount, content_preview
  • Pre-computed updatedAgo (relative time string) server-side so the mobile app skips time-zone-aware date formatting
  • Replaced attachments array with a boolean hasAttachments
  • Replaced pagination object with a simple hasMore boolean (mobile uses infinite scroll, not page numbers)
  • Dropped facets entirely (mobile search UI does not support faceted navigation)
  • Limited tags to 2 per document instead of all tags

Result: 60% payload reduction on the document list endpoint. Across all mobile endpoints, the average reduction was 55%, which translated to measurable improvements in app performance on slower cellular connections.
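The trimming step above can be sketched as a pure mapping function in the mobile BFF. This is an illustrative sketch, not our production code: the type names, the `relativeTime` helper, and the avatar-to-thumbnail URL rewrite are all assumptions modeled on the JSON examples above.

```typescript
// Hypothetical shapes; field names follow the JSON examples above.
interface FullDocument {
  id: string;
  title: string;
  author: { name: string; avatar: string };
  tags: string[];
  permissions: { canEdit: boolean };
  lastEditedAt: string; // ISO 8601
  attachments: { name: string }[];
}

interface MobileDocument {
  id: string;
  title: string;
  authorName: string;
  authorThumb: string;
  tags: string[];
  canEdit: boolean;
  updatedAgo: string;
  hasAttachments: boolean;
}

// Render the relative-time string server-side so the client skips
// time-zone-aware date math entirely.
function relativeTime(iso: string, now: Date = new Date()): string {
  const ms = now.getTime() - new Date(iso).getTime();
  const days = Math.floor(ms / 86_400_000);
  if (days >= 1) return `${days}d`;
  const hours = Math.floor(ms / 3_600_000);
  return hours >= 1 ? `${hours}h` : `${Math.floor(ms / 60_000)}m`;
}

function toMobileDocument(doc: FullDocument, now?: Date): MobileDocument {
  return {
    id: doc.id,
    title: doc.title,
    authorName: doc.author.name,
    // Assumes thumbnails live alongside the full avatar on the CDN;
    // adjust the rewrite to your actual URL layout.
    authorThumb: doc.author.avatar.replace("/full.jpg", "/thumb_32.jpg"),
    tags: doc.tags.slice(0, 2), // cap at 2 tags per document
    canEdit: doc.permissions.canEdit,
    updatedAgo: relativeTime(doc.lastEditedAt, now),
    hasAttachments: doc.attachments.length > 0,
  };
}
```

Because the function is pure (given a fixed `now`), it is trivial to unit test, and the mobile contract can evolve without touching the upstream document service.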

Rate Limiting Per Tier

Each BFF enforces tier-appropriate limits. The partner BFF has the most granular controls:

┌────────────────┬────────────┬────────────┬─────────────┬────────────┐
│ Endpoint Group │ Free       │ Pro        │ Enterprise  │ Partner    │
├────────────────┼────────────┼────────────┼─────────────┼────────────┤
│ Read (list,get)│ 60/min     │ 600/min    │ 6000/min    │ 3000/min   │
│ Write (create) │ 20/min     │ 200/min    │ 2000/min    │ 1000/min   │
│ Search         │ 10/min     │ 100/min    │ 1000/min    │ 500/min    │
│ Export/Report  │ 2/hour     │ 20/hour    │ 200/hour    │ 50/hour    │
│ Bulk import    │ N/A        │ N/A        │ 50/hour     │ 100/hour   │
│ Webhook reg.   │ N/A        │ N/A        │ N/A         │ 10/min     │
│ Burst allowance│ 1.5x/10s   │ 2x/10s     │ 3x/10s      │ 2x/10s     │
└────────────────┴────────────┴────────────┴─────────────┴────────────┘

Partner clients get higher bulk import limits but lower read limits than enterprise users because their access patterns are batch-oriented rather than interactive. The burst allowance uses a token bucket on top of the sliding window — enterprise clients can briefly exceed their steady-state limit by 3x for 10-second windows, accommodating dashboard page loads that trigger parallel API calls.
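The two-layer check can be sketched roughly as follows: a sliding one-minute window enforces the steady-state limit, and a token bucket sized at `burstMultiplier` times the steady-state 10-second share caps short spikes. This is a simplified in-memory sketch under assumed semantics (class and parameter names are illustrative, not our production limiter, which runs against shared state in Redis).

```typescript
class TierLimiter {
  private windowEvents: number[] = []; // request timestamps (ms) in the last minute
  private tokens: number;
  private lastRefill: number;
  private readonly burstCapacity: number;
  private readonly refillPerMs: number;

  constructor(private perMinute: number, burstMultiplier: number) {
    // Steady-state share of a 10-second slice, scaled by the burst multiplier.
    this.burstCapacity = (perMinute / 6) * burstMultiplier;
    this.refillPerMs = perMinute / 60_000; // refill at the steady-state rate
    this.tokens = this.burstCapacity;
    this.lastRefill = Date.now();
  }

  allow(now: number = Date.now()): boolean {
    // Layer 1: sliding one-minute window enforces the steady-state limit.
    this.windowEvents = this.windowEvents.filter((t) => now - t < 60_000);
    if (this.windowEvents.length >= this.perMinute) return false;

    // Layer 2: token bucket caps how much can land in a short burst.
    this.tokens = Math.min(
      this.burstCapacity,
      this.tokens + (now - this.lastRefill) * this.refillPerMs,
    );
    this.lastRefill = now;
    if (this.tokens < 1) return false;

    this.tokens -= 1;
    this.windowEvents.push(now);
    return true;
  }
}
```

For the free tier (60/min reads, 1.5x burst) this yields a bucket of 15 tokens: a page load firing 15 parallel calls at once succeeds, but the 16th instantaneous call is throttled until tokens refill at one per second.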

Performance Results

Measured over 30 days after the BFF migration, compared to the 30 days before:

┌─────────────────────────┬────────────┬────────────┬───────────┐
│ Metric                  │ Before     │ After      │ Change    │
├─────────────────────────┼────────────┼────────────┼───────────┤
│ Web p50 latency         │ 42ms       │ 28ms       │ -33%      │
│ Web p99 latency         │ 310ms      │ 145ms      │ -53%      │
│ Mobile p50 latency      │ 48ms       │ 19ms       │ -60%      │
│ Mobile p99 latency      │ 520ms      │ 110ms      │ -79%      │
│ Partner p50 latency     │ 45ms       │ 31ms       │ -31%      │
│ Gateway deploy freq.    │ 1/week     │ 12/week    │ +12x      │
│ Gateway incidents/month │ 4.2        │ 0.8        │ -81%      │
│ Mobile payload (avg)    │ 8.1 KB     │ 3.2 KB     │ -60%      │
│ Cache hit rate          │ 22%        │ 61%        │ +177%     │
└─────────────────────────┴────────────┴────────────┴───────────┘

The mobile latency improvements were the most dramatic. The old gateway was serializing fields the mobile app never used, running middleware (HTML sanitization, facet computation) that only the web client needed, and returning payloads that the mobile app parsed and discarded 60% of. The mobile BFF eliminated all of that. The p99 improvement from 520ms to 110ms came from two factors: smaller payloads (fewer bytes over the wire) and removing unnecessary middleware from the mobile request path.

The cache hit rate jumped because each BFF caches at its own granularity. The web BFF caches full document responses for 30 seconds. The mobile BFF caches trimmed responses for 2 minutes (mobile users tolerate slightly stale data). The old unified gateway could only cache at the lowest common denominator.
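Per-BFF cache granularity amounts to each edge wrapping the same upstream fetch with its own TTL and its own cached shape. A minimal sketch, with hypothetical names (`TtlCache`, `getOrLoad`) standing in for whatever cache layer you actually use:

```typescript
// Illustrative in-memory TTL cache; a real BFF would likely back this
// with Redis or an HTTP cache, but the per-client TTL idea is the same.
class TtlCache<V> {
  private store = new Map<string, { value: V; expiresAt: number }>();

  constructor(private ttlMs: number) {}

  async getOrLoad(
    key: string,
    load: () => Promise<V>,
    now: number = Date.now(),
  ): Promise<V> {
    const hit = this.store.get(key);
    if (hit && hit.expiresAt > now) return hit.value; // fresh hit
    const value = await load(); // miss or expired: fetch upstream
    this.store.set(key, { value, expiresAt: now + this.ttlMs });
    return value;
  }
}

// Web BFF: full responses, short TTL (interactive users expect freshness).
const webCache = new TtlCache<object>(30_000);
// Mobile BFF: trimmed responses, longer TTL (slight staleness is acceptable).
const mobileCache = new TtlCache<object>(120_000);
```

A unified gateway has to pick one TTL for everyone, which in practice means the shortest one; splitting by client lets each BFF trade freshness for hit rate independently.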

Deploy frequency improved because each BFF is independently deployable. The mobile team ships without coordinating with the web team. A bad deployment to the partner BFF does not affect mobile or web users.

Conclusion

API gateway pattern selection is a structural decision that shapes how your teams work, how your services evolve, and how your clients perform. Start with the simplest pattern that works (often direct access or a thin reverse proxy), and move to aggregator or BFF patterns when the problems they solve become real in your system — not before. The BFF pattern carries real implementation and operational cost, but for multi-client platforms at scale, the payload optimization, team autonomy, and fault isolation it provides are difficult to achieve any other way.
