Stripe Systems
DevSecOps · 📅 January 21, 2026 · 15 min read

Zero-Trust API Security — mTLS, JWT Validation, and Rate Limiting in a Kubernetes-Native Stack

Stripe Systems Engineering

Zero-trust networking operates on a simple principle: no request is trusted based on its network origin. A request from inside your VPC receives the same scrutiny as a request from the public internet. For APIs, this translates to verifying identity, validating authorization, enforcing rate limits, and inspecting payloads on every request — regardless of whether the caller is an external client, an internal microservice, or a batch job running in the same cluster.

This post covers the specific technologies and configurations required to implement zero-trust API security on Kubernetes: Istio service mesh for automatic mTLS, JWT validation at the ingress and mesh level, rate limiting with both local and global strategies, and OPA for fine-grained authorization decisions.

mTLS: Mutual Authentication Between Services

Standard TLS (what your browser uses for HTTPS) authenticates the server to the client: you verify that api.example.com is who it claims to be. Mutual TLS (mTLS) adds the reverse: the server also authenticates the client. Both parties present certificates, and both verify the other's identity.

In a microservice architecture, mTLS between services means:

  • Service A proves its identity to Service B when making a request
  • Service B proves its identity to Service A during the same TLS handshake
  • The communication channel is encrypted
  • No service can impersonate another without possessing its private key
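At the protocol level, the difference between one-way TLS and mTLS on the server side is a single setting: require and verify a client certificate. A minimal sketch with Python's standard library; the certificate paths are placeholders, not files from this setup:

```python
import ssl

def mtls_server_context(certfile=None, keyfile=None, client_ca=None):
    """Server-side context that refuses connections without a client cert."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    if certfile:
        ctx.load_cert_chain(certfile, keyfile)   # server proves its own identity
    if client_ca:
        ctx.load_verify_locations(client_ca)     # CA that signed client certs
    ctx.verify_mode = ssl.CERT_REQUIRED          # the "mutual" in mTLS
    return ctx
```

With verify_mode left at its default, the server would accept any certificate-free client: exactly the one-way TLS your browser performs.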

Istio Service Mesh for Automatic mTLS

Istio injects a sidecar proxy (Envoy) into every pod. These proxies handle mTLS automatically — application code doesn't need to manage certificates or TLS configuration.

Install Istio with strict mTLS by default:

istioctl install --set profile=default \
  --set meshConfig.defaultConfig.holdApplicationUntilProxyStarts=true

# Enable sidecar injection for your namespace
kubectl label namespace production istio-injection=enabled

Enforce strict mTLS across the mesh:

# peer-authentication.yaml
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system  # Mesh-wide policy
spec:
  mtls:
    mode: STRICT

With STRICT mode, any connection attempt without a valid mTLS certificate is rejected. This applies to all service-to-service communication within the mesh.

Per-service exceptions (when necessary):

Some services need to accept non-mTLS traffic — for example, a health check endpoint called by a load balancer that doesn't participate in the mesh:

apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: health-check-exception
  namespace: production
spec:
  selector:
    matchLabels:
      app: api-gateway
  mtls:
    mode: STRICT
  portLevelMtls:
    8080:
      mode: PERMISSIVE  # Allow non-mTLS on health check port only

Certificate Management with cert-manager

Istio manages its own certificate authority (built into istiod, historically known as Citadel) for mesh-internal mTLS. For external-facing TLS certificates, cert-manager automates certificate issuance and renewal:

# cert-manager ClusterIssuer with Let's Encrypt
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: [email protected]
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      - http01:
          ingress:
            class: istio

---
# Certificate for API domain
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: api-tls
  namespace: istio-system
spec:
  secretName: api-tls-cert
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
    - api.example.com
    - "*.api.example.com"
  renewBefore: 720h  # Renew 30 days before expiry

SPIFFE/SPIRE for Workload Identity

For environments requiring stronger workload identity than Istio's built-in CA, SPIFFE (Secure Production Identity Framework for Everyone) provides a standardized identity framework. SPIRE is the reference implementation.

Each workload receives a SPIFFE ID (e.g., spiffe://example.com/ns/production/sa/payment-service) and a short-lived X.509 certificate. Unlike Istio's Citadel, SPIRE supports multi-cluster and hybrid environments, and integrates with external identity providers.

# SPIRE ClusterSPIFFEID for a service
apiVersion: spire.spiffe.io/v1alpha1
kind: ClusterSPIFFEID
metadata:
  name: payment-service
spec:
  spiffeIDTemplate: "spiffe://{{ .TrustDomain }}/ns/{{ .PodMeta.Namespace }}/sa/{{ .PodSpec.ServiceAccountName }}"
  podSelector:
    matchLabels:
      app: payment-service
  namespaceSelector:
    matchLabels:
      environment: production
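The ClusterSPIFFEID template stamps each workload's Kubernetes identity into a URI. A consumer making decisions on these IDs can split them back apart; a minimal sketch, assuming the path shape produced by the template above:

```python
from urllib.parse import urlparse

def parse_spiffe_id(spiffe_id: str) -> dict:
    """Split a SPIFFE ID like spiffe://example.com/ns/production/sa/payment-service
    into its trust domain, namespace, and service account."""
    u = urlparse(spiffe_id)
    if u.scheme != "spiffe":
        raise ValueError(f"not a SPIFFE ID: {spiffe_id}")
    parts = u.path.strip("/").split("/")
    # Path shape produced by the template above: ns/<namespace>/sa/<serviceaccount>
    if len(parts) != 4 or parts[0] != "ns" or parts[2] != "sa":
        raise ValueError(f"unexpected SPIFFE path: {u.path}")
    return {"trust_domain": u.netloc,
            "namespace": parts[1],
            "service_account": parts[3]}
```

For example, `parse_spiffe_id("spiffe://example.com/ns/production/sa/payment-service")` yields the trust domain `example.com`, namespace `production`, and service account `payment-service`.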

JWT Validation at the Mesh Level

JSON Web Tokens (JWTs) carry identity and authorization claims. Validating JWTs at the Istio ingress gateway or sidecar proxy means your application code doesn't need to implement JWT verification — the mesh handles it before the request reaches your service.

Istio RequestAuthentication

RequestAuthentication tells Istio how to validate incoming JWTs:

# request-authentication.yaml
apiVersion: security.istio.io/v1
kind: RequestAuthentication
metadata:
  name: jwt-auth
  namespace: production
spec:
  selector:
    matchLabels:
      app: api-gateway
  jwtRules:
    - issuer: "https://auth.example.com/"
      jwksUri: "https://auth.example.com/.well-known/jwks.json"
      audiences:
        - "api.example.com"
      forwardOriginalToken: true
      outputPayloadToHeader: "x-jwt-payload"
    - issuer: "https://accounts.google.com"
      jwksUri: "https://www.googleapis.com/oauth2/v3/certs"
      audiences:
        - "api.example.com"

This configuration:

  • Validates tokens from two issuers (your auth service and Google)
  • Verifies the aud (audience) claim matches your API
  • Forwards the original token to the upstream service
  • Extracts the JWT payload into a header for downstream use

Important: RequestAuthentication only validates tokens that are present. It does not reject requests without tokens. To require authentication, pair it with an AuthorizationPolicy.

Istio AuthorizationPolicy

AuthorizationPolicy controls which requests are allowed based on JWT claims, source identity, or request attributes:

# authorization-policy.yaml
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: require-jwt
  namespace: production
spec:
  selector:
    matchLabels:
      app: api-gateway
  action: DENY
  rules:
    - from:
        - source:
            notRequestPrincipals: ["*"]
      to:
        - operation:
            notPaths: ["/health", "/ready", "/metrics"]

---
# Role-based access
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: admin-endpoints
  namespace: production
spec:
  selector:
    matchLabels:
      app: api-gateway
  action: ALLOW
  rules:
    - from:
        - source:
            requestPrincipals: ["https://auth.example.com/*"]
      to:
        - operation:
            paths: ["/admin/*"]
      when:
        - key: request.auth.claims[role]
          values: ["admin"]

The first policy denies any request without a valid JWT (except health/ready/metrics endpoints). The second policy restricts /admin/* endpoints to tokens containing "role": "admin".

Token Architecture

A robust token architecture separates short-lived access tokens from longer-lived refresh tokens:

| Token type    | Lifetime                      | Contains                               | Storage                                   |
|---------------|-------------------------------|----------------------------------------|-------------------------------------------|
| Access token  | 15–60 minutes                 | User ID, roles, permissions, tenant ID | Authorization header (Bearer)             |
| Refresh token | 7–30 days                     | User ID, token family ID               | HTTP-only secure cookie or secure storage |
| API key       | Long-lived (rotate quarterly) | Client ID, tier, rate limit config     | Authorization header                      |

Token scoping: access tokens should contain the minimum claims needed for authorization. A token for the billing API doesn't need permissions for the user management API. Scoped tokens limit the blast radius of a compromised token.

Audience validation: every token should specify its intended audience (aud claim), and every API should verify it. A token issued for billing.api.example.com should be rejected by users.api.example.com.
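Both checks reduce to a few lines of claim validation that each API performs after signature verification. A sketch; the claim names follow RFC 7519, and the host names are the illustrative ones from above:

```python
import time

def validate_claims(payload: dict, expected_audience: str) -> bool:
    """Reject expired tokens and tokens minted for a different API."""
    if payload.get("exp", 0) <= time.time():
        return False                      # expired
    aud = payload.get("aud")
    # aud may be a single string or a list of audiences
    audiences = aud if isinstance(aud, list) else [aud]
    return expected_audience in audiences

# A token scoped to the billing API is rejected by the users API:
billing_token = {"sub": "user-42", "aud": "billing.api.example.com",
                 "exp": time.time() + 900}
assert validate_claims(billing_token, "billing.api.example.com")
assert not validate_claims(billing_token, "users.api.example.com")
```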

Rate Limiting

Rate limiting prevents abuse, ensures fair resource allocation, and protects backend services from traffic spikes. In a Kubernetes-native stack, you have two options:

Local Rate Limiting (Envoy)

Local rate limiting runs in each Envoy sidecar independently. It's simple to configure and doesn't require external dependencies, but each pod maintains its own counter — 5 replicas with a 100 req/min limit allows 500 req/min total.
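Envoy's local limiter is a token bucket: a counter refilled at a fixed rate and decremented per request. A per-replica sketch whose parameters mirror the max_tokens / tokens_per_fill / fill_interval fields used below; timing is simplified to a caller-supplied clock:

```python
class LocalTokenBucket:
    """Per-replica token bucket mirroring Envoy's max_tokens /
    tokens_per_fill / fill_interval parameters."""

    def __init__(self, max_tokens, tokens_per_fill, fill_interval_s):
        self.max_tokens = max_tokens
        self.tokens_per_fill = tokens_per_fill
        self.fill_interval_s = fill_interval_s
        self.tokens = max_tokens
        self.last_fill = 0.0

    def allow(self, now: float) -> bool:
        # Refill in whole elapsed intervals, capped at max_tokens
        intervals = int((now - self.last_fill) // self.fill_interval_s)
        if intervals > 0:
            self.tokens = min(self.max_tokens,
                              self.tokens + intervals * self.tokens_per_fill)
            self.last_fill += intervals * self.fill_interval_s
        if self.tokens > 0:
            self.tokens -= 1
            return True
        return False

bucket = LocalTokenBucket(max_tokens=100, tokens_per_fill=100, fill_interval_s=60)
allowed = sum(bucket.allow(now=0) for _ in range(150))
# only the first 100 requests in the window get through on this replica
```

Five replicas each hold their own bucket, which is why the effective cluster-wide limit scales with the replica count.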

# envoy-filter-local-ratelimit.yaml
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: local-ratelimit
  namespace: production
spec:
  workloadSelector:
    labels:
      app: api-gateway
  configPatches:
    - applyTo: HTTP_FILTER
      match:
        context: SIDECAR_INBOUND
        listener:
          filterChain:
            filter:
              name: envoy.filters.network.http_connection_manager
      patch:
        operation: INSERT_BEFORE
        value:
          name: envoy.filters.http.local_ratelimit
          typed_config:
            "@type": type.googleapis.com/udpa.type.v1.TypedStruct
            type_url: type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
            value:
              stat_prefix: http_local_rate_limiter
              token_bucket:
                max_tokens: 100
                tokens_per_fill: 100
                fill_interval: 60s
              filter_enabled:
                runtime_key: local_rate_limit_enabled
                default_value:
                  numerator: 100
                  denominator: HUNDRED
              filter_enforced:
                runtime_key: local_rate_limit_enforced
                default_value:
                  numerator: 100
                  denominator: HUNDRED
              response_headers_to_add:
                - append_action: OVERWRITE_IF_EXISTS_OR_ADD
                  header:
                    key: x-local-rate-limit
                    value: "true"

Global Rate Limiting (External Service)

Global rate limiting uses a centralized service (typically Redis-backed) that all instances share. This provides accurate, cluster-wide rate limits.
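Conceptually, the shared service keeps one fixed-window counter per descriptor: every replica increments the same key, so the limit holds cluster-wide. A sketch with a dict standing in for Redis (the real ratelimit service uses INCR with a per-window TTL); the descriptor strings are illustrative:

```python
class GlobalRateLimiter:
    """Fixed-window limiter; `store` stands in for a shared Redis."""

    def __init__(self, limit, window_s, store=None):
        self.limit = limit
        self.window_s = window_s
        self.store = store if store is not None else {}

    def allow(self, descriptor: str, now: float) -> bool:
        # One counter per descriptor per window, shared by every replica
        window = int(now // self.window_s)
        key = (descriptor, window)
        self.store[key] = self.store.get(key, 0) + 1   # Redis: INCR + EXPIRE
        return self.store[key] <= self.limit

shared = {}
replica_a = GlobalRateLimiter(limit=60, window_s=60, store=shared)
replica_b = GlobalRateLimiter(limit=60, window_s=60, store=shared)

hits = [replica_a.allow("api_tier=free", 0) for _ in range(40)]
hits += [replica_b.allow("api_tier=free", 0) for _ in range(40)]
# Both replicas draw from the same counter: 60 of the 80 requests are allowed
```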

Rate limit service deployment:

# ratelimit-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ratelimit
  namespace: production
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ratelimit
  template:
    metadata:
      labels:
        app: ratelimit
    spec:
      containers:
        - name: ratelimit
          image: envoyproxy/ratelimit:master
          ports:
            - containerPort: 8081  # gRPC
          env:
            - name: REDIS_SOCKET_TYPE
              value: "tcp"
            - name: REDIS_URL
              value: "redis.production.svc.cluster.local:6379"
            - name: RUNTIME_ROOT
              value: "/data"
            - name: RUNTIME_SUBDIRECTORY
              value: "ratelimit"
            - name: USE_STATSD
              value: "false"
          volumeMounts:
            - name: config
              mountPath: /data/ratelimit/config
      volumes:
        - name: config
          configMap:
            name: ratelimit-config

Rate limit configuration per API tier:

# ratelimit-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ratelimit-config
  namespace: production
data:
  config.yaml: |
    domain: api-gateway
    descriptors:
      # Per-customer rate limits based on API tier
      - key: api_tier
        value: free
        rate_limit:
          unit: minute
          requests_per_unit: 60
        descriptors:
          - key: path
            value: "/api/v1/search"
            rate_limit:
              unit: minute
              requests_per_unit: 10

      - key: api_tier
        value: starter
        rate_limit:
          unit: minute
          requests_per_unit: 600
        descriptors:
          - key: path
            value: "/api/v1/search"
            rate_limit:
              unit: minute
              requests_per_unit: 100

      - key: api_tier
        value: enterprise
        rate_limit:
          unit: minute
          requests_per_unit: 6000
        descriptors:
          - key: path
            value: "/api/v1/search"
            rate_limit:
              unit: minute
              requests_per_unit: 1000

      # Global safety limit
      - key: generic_key
        value: default
        rate_limit:
          unit: second
          requests_per_unit: 5000

Istio EnvoyFilter to connect to the rate limit service:

# envoy-filter-global-ratelimit.yaml
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: global-ratelimit
  namespace: production
spec:
  workloadSelector:
    labels:
      app: api-gateway
  configPatches:
    - applyTo: HTTP_FILTER
      match:
        context: SIDECAR_INBOUND
        listener:
          filterChain:
            filter:
              name: envoy.filters.network.http_connection_manager
      patch:
        operation: INSERT_BEFORE
        value:
          name: envoy.filters.http.ratelimit
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit
            domain: api-gateway
            failure_mode_deny: false
            rate_limit_service:
              grpc_service:
                envoy_grpc:
                  cluster_name: rate_limit_cluster
              transport_api_version: V3
            request_type: external

Setting failure_mode_deny: false means that if the rate limit service is unavailable, requests are allowed through. This is a deliberate choice — a rate limiter outage shouldn't cause a complete API outage. Monitor the rate limit service availability separately.
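The filter above only wires up the rate limit client; Envoy still needs route-level rate limit actions that turn request attributes into the descriptors (api_tier, path) the service matches on. A hedged sketch of that companion patch; the vhost name, the service port, and the x-api-tier header (assumed to be populated upstream from a JWT claim) are illustrative assumptions, not values from this setup:

```yaml
# envoy-filter-ratelimit-actions.yaml (sketch)
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: ratelimit-actions
  namespace: production
spec:
  workloadSelector:
    labels:
      app: api-gateway
  configPatches:
    - applyTo: VIRTUAL_HOST
      match:
        context: SIDECAR_INBOUND
        routeConfiguration:
          vhost:
            name: "inbound|http|8080"   # assumption: service port 8080
      patch:
        operation: MERGE
        value:
          rate_limits:
            - actions:
                # Emits descriptor entries ("api_tier", <value>), ("path", <value>)
                - request_headers:
                    header_name: "x-api-tier"   # assumed to be set upstream
                    descriptor_key: "api_tier"
                - request_headers:
                    header_name: ":path"
                    descriptor_key: "path"
```

A cluster named rate_limit_cluster pointing at the ratelimit service's gRPC port must also be defined (for example via a CLUSTER patch or a ServiceEntry) for the filter to reach it.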

Request-Level Authorization with OPA

JWT claims handle identity and coarse-grained roles. For fine-grained authorization — "can this user access this specific resource?" — OPA provides a policy engine that evaluates complex rules without embedding authorization logic in application code.

OPA sidecar deployment pattern:

# deployment-with-opa.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-service
spec:
  template:
    spec:
      containers:
        - name: api-service
          image: api-service:v1.2.3
          ports:
            - containerPort: 8080

        - name: opa
          image: openpolicyagent/opa:latest-envoy
          ports:
            - containerPort: 9191  # Decision API
            - containerPort: 8181  # Management API
          args:
            - "run"
            - "--server"
            - "--addr=0.0.0.0:8181"
            - "--diagnostic-addr=0.0.0.0:8282"
            - "--set=plugins.envoy_ext_authz_grpc.addr=:9191"
            - "--set=plugins.envoy_ext_authz_grpc.path=envoy/authz/allow"
            - "--set=decision_logs.console=true"
            - "/policies"
          volumeMounts:
            - name: opa-policies
              mountPath: /policies
      volumes:
        - name: opa-policies
          configMap:
            name: opa-policies

OPA policy for tenant isolation:

# policies/tenant_isolation.rego
package envoy.authz

import rego.v1

default allow := false

allow if {
    is_valid_token
    is_authorized_for_resource
}

# Note: io.jwt.decode only parses claims; the signature was already verified
# upstream by the mesh's RequestAuthentication before the request reached OPA.
is_valid_token if {
    token := input.attributes.request.http.headers.authorization
    startswith(token, "Bearer ")
    jwt := substring(token, 7, -1)
    [_, payload, _] := io.jwt.decode(jwt)
    payload.exp > time.now_ns() / 1e9
}

# Extract tenant ID from JWT
tenant_id := tid if {
    token := input.attributes.request.http.headers.authorization
    jwt := substring(token, 7, -1)
    [_, payload, _] := io.jwt.decode(jwt)
    tid := payload.tenant_id
}

# Extract tenant ID from request path (e.g., /api/v1/tenants/{tenant_id}/resources)
path_tenant_id := ptid if {
    path := input.attributes.request.http.path
    parts := split(path, "/")
    parts[3] == "tenants"
    ptid := parts[4]
}

# Tenant can only access their own resources
is_authorized_for_resource if {
    path_tenant_id
    tenant_id == path_tenant_id
}

# Non-tenant-scoped paths are allowed for any authenticated user
is_authorized_for_resource if {
    not path_tenant_id
}

# Admin override — admins can access any tenant's resources
is_authorized_for_resource if {
    token := input.attributes.request.http.headers.authorization
    jwt := substring(token, 7, -1)
    [_, payload, _] := io.jwt.decode(jwt)
    "admin" in payload.roles
}

This policy enforces strict tenant isolation: a request to /api/v1/tenants/tenant-123/resources is only allowed if the JWT's tenant_id claim is tenant-123 — or if the caller has the admin role.
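For intuition, the decision tree the policy encodes can be restated in Python (a hypothetical mirror for illustration, not the deployed policy):

```python
def is_authorized(path: str, token_tenant: str, roles: list) -> bool:
    """Mirror of the Rego tenant-isolation rules above."""
    if "admin" in roles:                      # admin override
        return True
    parts = path.split("/")
    if "tenants" in parts:
        i = parts.index("tenants")
        path_tenant = parts[i + 1] if i + 1 < len(parts) else None
        return path_tenant == token_tenant    # strict tenant match
    return True                               # non-tenant-scoped path

assert is_authorized("/api/v1/tenants/tenant-123/resources", "tenant-123", [])
assert not is_authorized("/api/v1/tenants/tenant-123/resources", "tenant-456", [])
assert is_authorized("/api/v1/tenants/tenant-123/resources", "tenant-456", ["admin"])
```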

API Threat Protection

Beyond authentication and authorization, APIs need protection against malformed and malicious payloads.

Payload Validation with JSON Schema

Validate request bodies against a JSON schema before they reach your application:

# Istio EnvoyFilter for request body validation
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: payload-limits
  namespace: production
spec:
  workloadSelector:
    labels:
      app: api-gateway
  configPatches:
    - applyTo: HTTP_FILTER
      match:
        context: SIDECAR_INBOUND
      patch:
        operation: INSERT_BEFORE
        value:
          name: envoy.filters.http.buffer
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.filters.http.buffer.v3.Buffer
            max_request_bytes: 1048576  # 1MB max request size

For JSON schema validation, implement it at the API gateway level (e.g., Kong, Ambassador) or as middleware in your application:

// Express middleware for JSON schema validation
const Ajv = require('ajv');
const ajv = new Ajv({ allErrors: true, removeAdditional: true });

const createUserSchema = {
  type: 'object',
  required: ['email', 'name'],
  properties: {
    email: { type: 'string', format: 'email', maxLength: 254 },
    name: { type: 'string', minLength: 1, maxLength: 100 },
    role: { type: 'string', enum: ['viewer', 'editor', 'admin'] }
  },
  additionalProperties: false
};

function validateBody(schema) {
  const validate = ajv.compile(schema);
  return (req, res, next) => {
    if (!validate(req.body)) {
      return res.status(400).json({
        error: 'Validation failed',
        details: validate.errors.map(e => ({
          field: e.instancePath,
          message: e.message
        }))
      });
    }
    next();
  };
}

app.post('/api/v1/users', validateBody(createUserSchema), createUser);

Observability: Monitoring the Security Stack

A zero-trust stack generates a large volume of security-relevant telemetry. Structured monitoring is essential for detecting anomalies and debugging legitimate access issues.

Key Metrics to Track

# Prometheus rules for API security monitoring
groups:
  - name: api-security
    rules:
      - alert: HighAuthFailureRate
        expr: |
          sum(rate(istio_requests_total{response_code="401"}[5m])) by (destination_service)
          /
          sum(rate(istio_requests_total[5m])) by (destination_service)
          > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Auth failure rate above 10% for {{ $labels.destination_service }}"

      - alert: RateLimitExceeded
        expr: |
          sum(rate(istio_requests_total{response_code="429"}[5m])) by (source_principal)
          > 100
        for: 2m
        labels:
          severity: info
        annotations:
          summary: "Client {{ $labels.source_principal }} exceeding rate limits"

      - alert: OPADecisionLatencyHigh
        expr: |
          histogram_quantile(0.99, rate(opa_decision_duration_seconds_bucket[5m])) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "OPA decision latency p99 above 50ms"

      - alert: mTLSHandshakeFailures
        expr: |
          sum(rate(envoy_ssl_connection_error[5m])) by (pod) > 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "mTLS handshake failures on {{ $labels.pod }}"

Distributed Tracing for Auth Decisions

Include authorization decision metadata in distributed traces:

# Istio telemetry configuration
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: api-tracing
  namespace: production
spec:
  tracing:
    - providers:
        - name: jaeger
      randomSamplingPercentage: 10
      customTags:
        auth.principal:
          header:
            name: x-jwt-payload
        auth.tenant_id:
          header:
            name: x-tenant-id
        rate_limit.remaining:
          header:
            name: x-ratelimit-remaining

Defense in Depth: The Complete Stack

Each layer addresses specific threat vectors:

┌─────────────────────────────────────────────────────┐
│ Layer 1: Network (WAF / Cloud Load Balancer)        │
│  • DDoS protection                                  │
│  • Bot detection                                    │
│  • IP reputation filtering                          │
│  • Geographic restrictions                          │
├─────────────────────────────────────────────────────┤
│ Layer 2: Ingress (Istio Gateway)                    │
│  • TLS termination (external)                       │
│  • JWT validation                                   │
│  • Global rate limiting                             │
│  • Request size limits                              │
├─────────────────────────────────────────────────────┤
│ Layer 3: Mesh (Istio Sidecars)                      │
│  • mTLS between all services                        │
│  • Service-level AuthorizationPolicies              │
│  • Local rate limiting                              │
├─────────────────────────────────────────────────────┤
│ Layer 4: Application (OPA + Middleware)             │
│  • Tenant isolation                                 │
│  • Fine-grained resource authorization              │
│  • JSON schema validation                           │
│  • Business logic authorization                     │
├─────────────────────────────────────────────────────┤
│ Layer 5: Data (Encryption + Access Control)         │
│  • Encryption at rest (KMS)                         │
│  • Row-level security (database)                    │
│  • Field-level encryption for sensitive data        │
│  • Audit logging for data access                    │
└─────────────────────────────────────────────────────┘

A request that passes Layer 1 still faces JWT validation at Layer 2, mTLS verification at Layer 3, tenant isolation at Layer 4, and data-level access controls at Layer 5. Compromising any single layer doesn't grant unrestricted access.

Case Study: Multi-Tenant B2B API Platform

Background

A multi-tenant B2B API platform serving 200+ customers with three pricing tiers (Free, Starter, Enterprise) needed a comprehensive security stack. Customers ranged from individual developers on the free tier to financial institutions on enterprise plans processing thousands of requests per second. The Stripe Systems team designed and implemented the Kubernetes-native security architecture.

Requirements

  1. mTLS between all internal services — no plaintext traffic within the cluster
  2. JWT validation at ingress — reject unauthenticated requests before they reach application code
  3. Per-customer rate limits — different limits per pricing tier, per API endpoint
  4. Tenant isolation — tenant A cannot access tenant B's data, enforceable at the infrastructure level
  5. Audit trail — every API call logged with caller identity, tenant context, and authorization decision

Implementation

mTLS with Istio:

Strict mTLS across the production namespace. The PeerAuthentication policy rejected any non-mTLS connection:

apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: strict-mtls
  namespace: production
spec:
  mtls:
    mode: STRICT

Service-to-service communication verification ensured that only authorized services could call each other:

apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: payment-service-callers
  namespace: production
spec:
  selector:
    matchLabels:
      app: payment-service
  action: ALLOW
  rules:
    - from:
        - source:
            principals:
              - "cluster.local/ns/production/sa/order-service"
              - "cluster.local/ns/production/sa/billing-service"
      to:
        - operation:
            methods: ["POST", "GET"]
            paths: ["/api/v1/payments/*"]

Only order-service and billing-service could call payment-service. Any other service — even with valid mTLS — received a 403.

JWT Validation at Ingress:

apiVersion: security.istio.io/v1
kind: RequestAuthentication
metadata:
  name: api-jwt-auth
  namespace: production
spec:
  selector:
    matchLabels:
      istio: ingressgateway
  jwtRules:
    - issuer: "https://auth.platform.example.com/"
      jwksUri: "https://auth.platform.example.com/.well-known/jwks.json"
      audiences: ["api.platform.example.com"]
      forwardOriginalToken: true
      outputClaimsToHeaders:
        - header: "x-tenant-id"
          claim: "tenant_id"
        - header: "x-api-tier"
          claim: "tier"
        - header: "x-user-roles"
          claim: "roles"

---
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: require-auth
  namespace: production
spec:
  selector:
    matchLabels:
      istio: ingressgateway
  action: DENY
  rules:
    - from:
        - source:
            notRequestPrincipals: ["*"]
      to:
        - operation:
            notPaths:
              - "/health"
              - "/ready"
              - "/.well-known/*"
              - "/docs/*"

The RequestAuthentication extracted tenant_id, tier, and roles claims into headers, making them available to downstream services and the rate limiter without requiring each service to parse the JWT.

Per-Customer Rate Limiting:

The rate limit configuration provided different limits per tier and per endpoint:

# Rate limit config
domain: platform-api
descriptors:
  - key: api_tier
    value: free
    rate_limit:
      unit: minute
      requests_per_unit: 60
    descriptors:
      - key: api_endpoint
        value: search
        rate_limit:
          unit: minute
          requests_per_unit: 10
      - key: api_endpoint
        value: bulk_export
        rate_limit:
          unit: hour
          requests_per_unit: 5
      - key: api_endpoint
        value: webhook_register
        rate_limit:
          unit: day
          requests_per_unit: 10

  - key: api_tier
    value: starter
    rate_limit:
      unit: minute
      requests_per_unit: 600
    descriptors:
      - key: api_endpoint
        value: search
        rate_limit:
          unit: minute
          requests_per_unit: 100
      - key: api_endpoint
        value: bulk_export
        rate_limit:
          unit: hour
          requests_per_unit: 50
      - key: api_endpoint
        value: webhook_register
        rate_limit:
          unit: day
          requests_per_unit: 100

  - key: api_tier
    value: enterprise
    rate_limit:
      unit: minute
      requests_per_unit: 6000
    descriptors:
      - key: api_endpoint
        value: search
        rate_limit:
          unit: minute
          requests_per_unit: 1000
      - key: api_endpoint
        value: bulk_export
        rate_limit:
          unit: hour
          requests_per_unit: 500
      - key: api_endpoint
        value: webhook_register
        rate_limit:
          unit: day
          requests_per_unit: 1000

Rate limit headers were returned on every response:

X-RateLimit-Limit: 600
X-RateLimit-Remaining: 423
X-RateLimit-Reset: 1694188800

OPA Tenant Isolation:

The OPA policy ensured strict tenant isolation — a request authenticated as tenant A could not access resources belonging to tenant B:

package platform.authz

import rego.v1

default allow := false

# Allow request if tenant context matches
allow if {
    input.tenant_id != ""
    resource_tenant := extract_tenant_from_path(input.path)
    resource_tenant != ""
    input.tenant_id == resource_tenant
}

# Allow non-tenant-scoped endpoints
allow if {
    input.tenant_id != ""
    not is_tenant_scoped_path(input.path)
}

# Admin bypass for platform operators
allow if {
    "platform_admin" in input.roles
}

extract_tenant_from_path(path) := tenant if {
    parts := split(path, "/")
    some i
    parts[i] == "tenants"
    tenant := parts[i + 1]
}

extract_tenant_from_path(path) := "" if {
    parts := split(path, "/")
    not path_contains_tenants(parts)
}

path_contains_tenants(parts) if {
    some i
    parts[i] == "tenants"
}

is_tenant_scoped_path(path) if {
    contains(path, "/tenants/")
}

This policy ran on every request. OPA decision latency averaged 1.2ms at p50 and 4.8ms at p99 — acceptable overhead for the security guarantee.

Results

After deployment, the platform operated with the following security posture:

  • 100% mTLS coverage — verified by Istio telemetry showing zero plaintext connections
  • 0 cross-tenant data access incidents in the first 12 months of operation
  • 99.97% rate limiter availability — Redis cluster with automatic failover
  • Average auth overhead: 6.3ms per request (JWT validation + OPA decision + rate limit check)
  • 3 blocked unauthorized access attempts detected in the first quarter via auth failure alerting — all were misconfigured API clients, not attacks, but the detection mechanism proved operational

Architecture Decisions and Trade-offs

Why Istio over Linkerd: the platform needed JWT validation and rate limiting at the mesh level. Linkerd focuses on mTLS and observability; Istio provides the AuthorizationPolicy and EnvoyFilter extension points required for the full security stack.

Why OPA sidecar over centralized OPA: deploying OPA as a sidecar with each service eliminated a network hop for authorization decisions and removed a single point of failure. The trade-off was higher resource consumption (each pod runs an OPA container), but the latency and reliability improvements justified it.

Why Redis-backed global rate limiting over local: the free tier limit of 60 requests/minute needed to be accurate regardless of how many pod replicas served the traffic. Local rate limiting would allow 60 * N requests where N is the replica count.

Why failure_mode_deny: false on rate limiting: a rate limiter outage that blocks all API traffic is worse than temporarily allowing over-limit requests. The team monitored rate limiter availability separately and had alerts for when it was unavailable.

Summary

Zero-trust API security is a layered implementation, not a single technology. mTLS provides transport-level identity and encryption. JWT validation provides caller authentication. Rate limiting provides abuse prevention and fairness. OPA provides fine-grained authorization.

Each layer is independently valuable, and implementing them incrementally is practical: start with mTLS (Istio makes this straightforward), add JWT validation at the ingress, then add rate limiting and OPA as your authorization requirements grow.

The overhead — 5–10ms per request for the full stack — is acceptable for most API workloads and significantly cheaper than the cost of a security incident caused by trusting network boundaries that no longer exist.
