Stripe Systems
DevSecOps · 📅 January 21, 2026 · 15 min read

Zero-Trust API Security — mTLS, JWT Validation, and Rate Limiting in a Kubernetes-Native Stack

Stripe Systems Engineering

Zero-trust networking operates on a simple principle: no request is trusted based on its network origin. A request from inside your VPC receives the same scrutiny as a request from the public internet. For APIs, this translates to verifying identity, validating authorization, enforcing rate limits, and inspecting payloads on every request — regardless of whether the caller is an external client, an internal microservice, or a batch job running in the same cluster.

This post covers the specific technologies and configurations required to implement zero-trust API security on Kubernetes: Istio service mesh for automatic mTLS, JWT validation at the ingress and mesh level, rate limiting with both local and global strategies, and OPA for fine-grained authorization decisions.

mTLS: Mutual Authentication Between Services

Standard TLS (what your browser uses for HTTPS) authenticates the server to the client: you verify that api.example.com is who it claims to be. Mutual TLS (mTLS) adds the reverse: the server also authenticates the client. Both parties present certificates, and both verify the other's identity.

In a microservice architecture, mTLS between services means:

  • Service A proves its identity to Service B when making a request
  • Service B proves its identity to Service A during the same TLS handshake
  • The communication channel is encrypted
  • No service can impersonate another without possessing its private key
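At the protocol level, the difference between one-way TLS and mTLS on the server side is a single setting: require and verify a client certificate. A minimal sketch with Python's standard library; the certificate paths are placeholders, not files from this setup:

```python
import ssl

def mtls_server_context(certfile=None, keyfile=None, client_ca=None):
    """Server-side context that refuses connections without a client cert."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    if certfile:
        ctx.load_cert_chain(certfile, keyfile)   # server proves its own identity
    if client_ca:
        ctx.load_verify_locations(client_ca)     # CA that signed client certs
    ctx.verify_mode = ssl.CERT_REQUIRED          # the "mutual" in mTLS
    return ctx
```

With verify_mode left at its default, the server would accept any certificate-free client: exactly the one-way TLS your browser performs.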

Istio Service Mesh for Automatic mTLS

Istio injects a sidecar proxy (Envoy) into every pod. These proxies handle mTLS automatically — application code doesn't need to manage certificates or TLS configuration.

Install Istio with strict mTLS by default:

istioctl install --set profile=default \
  --set meshConfig.defaultConfig.holdApplicationUntilProxyStarts=true

# Enable sidecar injection for your namespace
kubectl label namespace production istio-injection=enabled

Enforce strict mTLS across the mesh:

# peer-authentication.yaml
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system  # Mesh-wide policy
spec:
  mtls:
    mode: STRICT

With STRICT mode, any connection attempt without a valid mTLS certificate is rejected. This applies to all service-to-service communication within the mesh.

Per-service exceptions (when necessary):

Some services need to accept non-mTLS traffic — for example, a health check endpoint called by a load balancer that doesn't participate in the mesh:

apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: health-check-exception
  namespace: production
spec:
  selector:
    matchLabels:
      app: api-gateway
  mtls:
    mode: STRICT
  portLevelMtls:
    8080:
      mode: PERMISSIVE  # Allow non-mTLS on health check port only

Certificate Management with cert-manager

Istio manages its own certificate authority (built into istiod, historically known as Citadel) for mesh-internal mTLS. For external-facing TLS certificates, cert-manager automates certificate issuance and renewal:

# cert-manager ClusterIssuer with Let's Encrypt
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: [email protected]
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      - http01:
          ingress:
            class: istio

---
# Certificate for API domain
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: api-tls
  namespace: istio-system
spec:
  secretName: api-tls-cert
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
    - api.example.com
    - "*.api.example.com"
  renewBefore: 720h  # Renew 30 days before expiry

SPIFFE/SPIRE for Workload Identity

For environments requiring stronger workload identity than Istio's built-in CA, SPIFFE (Secure Production Identity Framework for Everyone) provides a standardized identity framework. SPIRE is the reference implementation.

Each workload receives a SPIFFE ID (e.g., spiffe://example.com/ns/production/sa/payment-service) and a short-lived X.509 certificate. Unlike Istio's Citadel, SPIRE supports multi-cluster and hybrid environments, and integrates with external identity providers.

# SPIRE ClusterSPIFFEID for a service
apiVersion: spire.spiffe.io/v1alpha1
kind: ClusterSPIFFEID
metadata:
  name: payment-service
spec:
  spiffeIDTemplate: "spiffe://{{ .TrustDomain }}/ns/{{ .PodMeta.Namespace }}/sa/{{ .PodSpec.ServiceAccountName }}"
  podSelector:
    matchLabels:
      app: payment-service
  namespaceSelector:
    matchLabels:
      environment: production
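The ClusterSPIFFEID template stamps each workload's Kubernetes identity into a URI. A consumer making decisions on these IDs can split them back apart; a minimal sketch, assuming the path shape produced by the template above:

```python
from urllib.parse import urlparse

def parse_spiffe_id(spiffe_id: str) -> dict:
    """Split a SPIFFE ID like spiffe://example.com/ns/production/sa/payment-service
    into its trust domain, namespace, and service account."""
    u = urlparse(spiffe_id)
    if u.scheme != "spiffe":
        raise ValueError(f"not a SPIFFE ID: {spiffe_id}")
    parts = u.path.strip("/").split("/")
    # Path shape produced by the template above: ns/<namespace>/sa/<serviceaccount>
    if len(parts) != 4 or parts[0] != "ns" or parts[2] != "sa":
        raise ValueError(f"unexpected SPIFFE path: {u.path}")
    return {"trust_domain": u.netloc,
            "namespace": parts[1],
            "service_account": parts[3]}
```

For example, `parse_spiffe_id("spiffe://example.com/ns/production/sa/payment-service")` yields the trust domain `example.com`, namespace `production`, and service account `payment-service`.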

JWT Validation at the Mesh Level

JSON Web Tokens (JWTs) carry identity and authorization claims. Validating JWTs at the Istio ingress gateway or sidecar proxy means your application code doesn't need to implement JWT verification — the mesh handles it before the request reaches your service.

Istio RequestAuthentication

RequestAuthentication tells Istio how to validate incoming JWTs:

# request-authentication.yaml
apiVersion: security.istio.io/v1
kind: RequestAuthentication
metadata:
  name: jwt-auth
  namespace: production
spec:
  selector:
    matchLabels:
      app: api-gateway
  jwtRules:
    - issuer: "https://auth.example.com/"
      jwksUri: "https://auth.example.com/.well-known/jwks.json"
      audiences:
        - "api.example.com"
      forwardOriginalToken: true
      outputPayloadToHeader: "x-jwt-payload"
    - issuer: "https://accounts.google.com"
      jwksUri: "https://www.googleapis.com/oauth2/v3/certs"
      audiences:
        - "api.example.com"

This configuration:

  • Validates tokens from two issuers (your auth service and Google)
  • Verifies the aud (audience) claim matches your API
  • Forwards the original token to the upstream service
  • Extracts the JWT payload into a header for downstream use

Important: RequestAuthentication only validates tokens that are present. It does not reject requests without tokens. To require authentication, pair it with an AuthorizationPolicy.

Istio AuthorizationPolicy

AuthorizationPolicy controls which requests are allowed based on JWT claims, source identity, or request attributes:

# authorization-policy.yaml
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: require-jwt
  namespace: production
spec:
  selector:
    matchLabels:
      app: api-gateway
  action: DENY
  rules:
    - from:
        - source:
            notRequestPrincipals: ["*"]
      to:
        - operation:
            notPaths: ["/health", "/ready", "/metrics"]

---
# Role-based access
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: admin-endpoints
  namespace: production
spec:
  selector:
    matchLabels:
      app: api-gateway
  action: ALLOW
  rules:
    - from:
        - source:
            requestPrincipals: ["https://auth.example.com/*"]
      to:
        - operation:
            paths: ["/admin/*"]
      when:
        - key: request.auth.claims[role]
          values: ["admin"]

The first policy denies any request without a valid JWT (except health/ready/metrics endpoints). The second policy restricts /admin/* endpoints to tokens containing "role": "admin".

Token Architecture

A robust token architecture separates short-lived access tokens from longer-lived refresh tokens:

| Token type    | Lifetime                      | Contains                               | Storage                                   |
|---------------|-------------------------------|----------------------------------------|-------------------------------------------|
| Access token  | 15–60 minutes                 | User ID, roles, permissions, tenant ID | Authorization header (Bearer)             |
| Refresh token | 7–30 days                     | User ID, token family ID               | HTTP-only secure cookie or secure storage |
| API key       | Long-lived (rotate quarterly) | Client ID, tier, rate limit config     | Authorization header                      |

Token scoping: access tokens should contain the minimum claims needed for authorization. A token for the billing API doesn't need permissions for the user management API. Scoped tokens limit the blast radius of a compromised token.

Audience validation: every token should specify its intended audience (aud claim), and every API should verify it. A token issued for billing.api.example.com should be rejected by users.api.example.com.
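Both checks reduce to a few lines of claim validation that each API performs after signature verification. A sketch; the claim names follow RFC 7519, and the host names are the illustrative ones from above:

```python
import time

def validate_claims(payload: dict, expected_audience: str) -> bool:
    """Reject expired tokens and tokens minted for a different API."""
    if payload.get("exp", 0) <= time.time():
        return False                      # expired
    aud = payload.get("aud")
    # aud may be a single string or a list of audiences
    audiences = aud if isinstance(aud, list) else [aud]
    return expected_audience in audiences

# A token scoped to the billing API is rejected by the users API:
billing_token = {"sub": "user-42", "aud": "billing.api.example.com",
                 "exp": time.time() + 900}
assert validate_claims(billing_token, "billing.api.example.com")
assert not validate_claims(billing_token, "users.api.example.com")
```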

Rate Limiting

Rate limiting prevents abuse, ensures fair resource allocation, and protects backend services from traffic spikes. In a Kubernetes-native stack, you have two options:

Local Rate Limiting (Envoy)

Local rate limiting runs in each Envoy sidecar independently. It's simple to configure and doesn't require external dependencies, but each pod maintains its own counter — 5 replicas with a 100 req/min limit allows 500 req/min total.
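Envoy's local limiter is a token bucket: a counter refilled at a fixed rate and decremented per request. A per-replica sketch whose parameters mirror the max_tokens / tokens_per_fill / fill_interval fields used below; timing is simplified to a caller-supplied clock:

```python
class LocalTokenBucket:
    """Per-replica token bucket mirroring Envoy's max_tokens /
    tokens_per_fill / fill_interval parameters."""

    def __init__(self, max_tokens, tokens_per_fill, fill_interval_s):
        self.max_tokens = max_tokens
        self.tokens_per_fill = tokens_per_fill
        self.fill_interval_s = fill_interval_s
        self.tokens = max_tokens
        self.last_fill = 0.0

    def allow(self, now: float) -> bool:
        # Refill in whole elapsed intervals, capped at max_tokens
        intervals = int((now - self.last_fill) // self.fill_interval_s)
        if intervals > 0:
            self.tokens = min(self.max_tokens,
                              self.tokens + intervals * self.tokens_per_fill)
            self.last_fill += intervals * self.fill_interval_s
        if self.tokens > 0:
            self.tokens -= 1
            return True
        return False

bucket = LocalTokenBucket(max_tokens=100, tokens_per_fill=100, fill_interval_s=60)
allowed = sum(bucket.allow(now=0) for _ in range(150))
# only the first 100 requests in the window get through on this replica
```

Five replicas each hold their own bucket, which is why the effective cluster-wide limit scales with the replica count.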

# envoy-filter-local-ratelimit.yaml
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: local-ratelimit
  namespace: production
spec:
  workloadSelector:
    labels:
      app: api-gateway
  configPatches:
    - applyTo: HTTP_FILTER
      match:
        context: SIDECAR_INBOUND
        listener:
          filterChain:
            filter:
              name: envoy.filters.network.http_connection_manager
      patch:
        operation: INSERT_BEFORE
        value:
          name: envoy.filters.http.local_ratelimit
          typed_config:
            "@type": type.googleapis.com/udpa.type.v1.TypedStruct
            type_url: type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
            value:
              stat_prefix: http_local_rate_limiter
              token_bucket:
                max_tokens: 100
                tokens_per_fill: 100
                fill_interval: 60s
              filter_enabled:
                runtime_key: local_rate_limit_enabled
                default_value:
                  numerator: 100
                  denominator: HUNDRED
              filter_enforced:
                runtime_key: local_rate_limit_enforced
                default_value:
                  numerator: 100
                  denominator: HUNDRED
              response_headers_to_add:
                - append_action: OVERWRITE_IF_EXISTS_OR_ADD
                  header:
                    key: x-local-rate-limit
                    value: "true"

Global Rate Limiting (External Service)

Global rate limiting uses a centralized service (typically Redis-backed) that all instances share. This provides accurate, cluster-wide rate limits.
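Conceptually, the shared service keeps one fixed-window counter per descriptor: every replica increments the same key, so the limit holds cluster-wide. A sketch with a dict standing in for Redis (the real ratelimit service uses INCR with a per-window TTL); the descriptor strings are illustrative:

```python
class GlobalRateLimiter:
    """Fixed-window limiter; `store` stands in for a shared Redis."""

    def __init__(self, limit, window_s, store=None):
        self.limit = limit
        self.window_s = window_s
        self.store = store if store is not None else {}

    def allow(self, descriptor: str, now: float) -> bool:
        # One counter per descriptor per window, shared by every replica
        window = int(now // self.window_s)
        key = (descriptor, window)
        self.store[key] = self.store.get(key, 0) + 1   # Redis: INCR + EXPIRE
        return self.store[key] <= self.limit

shared = {}
replica_a = GlobalRateLimiter(limit=60, window_s=60, store=shared)
replica_b = GlobalRateLimiter(limit=60, window_s=60, store=shared)

hits = [replica_a.allow("api_tier=free", 0) for _ in range(40)]
hits += [replica_b.allow("api_tier=free", 0) for _ in range(40)]
# Both replicas draw from the same counter: 60 of the 80 requests are allowed
```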

Rate limit service deployment:

# ratelimit-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ratelimit
  namespace: production
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ratelimit
  template:
    metadata:
      labels:
        app: ratelimit
    spec:
      containers:
        - name: ratelimit
          image: envoyproxy/ratelimit:master
          ports:
            - containerPort: 8081  # gRPC
          env:
            - name: REDIS_SOCKET_TYPE
              value: "tcp"
            - name: REDIS_URL
              value: "redis.production.svc.cluster.local:6379"
            - name: RUNTIME_ROOT
              value: "/data"
            - name: RUNTIME_SUBDIRECTORY
              value: "ratelimit"
            - name: USE_STATSD
              value: "false"
          volumeMounts:
            - name: config
              mountPath: /data/ratelimit/config
      volumes:
        - name: config
          configMap:
            name: ratelimit-config

Rate limit configuration per API tier:

# ratelimit-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ratelimit-config
  namespace: production
data:
  config.yaml: |
    domain: api-gateway
    descriptors:
      # Per-customer rate limits based on API tier
      - key: api_tier
        value: free
        rate_limit:
          unit: minute
          requests_per_unit: 60
        descriptors:
          - key: path
            value: "/api/v1/search"
            rate_limit:
              unit: minute
              requests_per_unit: 10

      - key: api_tier
        value: starter
        rate_limit:
          unit: minute
          requests_per_unit: 600
        descriptors:
          - key: path
            value: "/api/v1/search"
            rate_limit:
              unit: minute
              requests_per_unit: 100

      - key: api_tier
        value: enterprise
        rate_limit:
          unit: minute
          requests_per_unit: 6000
        descriptors:
          - key: path
            value: "/api/v1/search"
            rate_limit:
              unit: minute
              requests_per_unit: 1000

      # Global safety limit
      - key: generic_key
        value: default
        rate_limit:
          unit: second
          requests_per_unit: 5000

Istio EnvoyFilter to connect to the rate limit service:

# envoy-filter-global-ratelimit.yaml
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: global-ratelimit
  namespace: production
spec:
  workloadSelector:
    labels:
      app: api-gateway
  configPatches:
    - applyTo: HTTP_FILTER
      match:
        context: SIDECAR_INBOUND
        listener:
          filterChain:
            filter:
              name: envoy.filters.network.http_connection_manager
      patch:
        operation: INSERT_BEFORE
        value:
          name: envoy.filters.http.ratelimit
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit
            domain: api-gateway
            failure_mode_deny: false
            rate_limit_service:
              grpc_service:
                envoy_grpc:
                  cluster_name: rate_limit_cluster
              transport_api_version: V3
            request_type: external

Setting failure_mode_deny: false means that if the rate limit service is unavailable, requests are allowed through. This is a deliberate choice — a rate limiter outage shouldn't cause a complete API outage. Monitor the rate limit service availability separately.
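The filter above only wires up the rate limit client; Envoy still needs route-level rate limit actions that turn request attributes into the descriptors (api_tier, path) the service matches on. A hedged sketch of that companion patch; the vhost name, the service port, and the x-api-tier header (assumed to be populated upstream from a JWT claim) are illustrative assumptions, not values from this setup:

```yaml
# envoy-filter-ratelimit-actions.yaml (sketch)
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: ratelimit-actions
  namespace: production
spec:
  workloadSelector:
    labels:
      app: api-gateway
  configPatches:
    - applyTo: VIRTUAL_HOST
      match:
        context: SIDECAR_INBOUND
        routeConfiguration:
          vhost:
            name: "inbound|http|8080"   # assumption: service port 8080
      patch:
        operation: MERGE
        value:
          rate_limits:
            - actions:
                # Emits descriptor entries ("api_tier", <value>), ("path", <value>)
                - request_headers:
                    header_name: "x-api-tier"   # assumed to be set upstream
                    descriptor_key: "api_tier"
                - request_headers:
                    header_name: ":path"
                    descriptor_key: "path"
```

A cluster named rate_limit_cluster pointing at the ratelimit service's gRPC port must also be defined (for example via a CLUSTER patch or a ServiceEntry) for the filter to reach it.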

Request-Level Authorization with OPA

JWT claims handle identity and coarse-grained roles. For fine-grained authorization — "can this user access this specific resource?" — OPA provides a policy engine that evaluates complex rules without embedding authorization logic in application code.

OPA sidecar deployment pattern:

# deployment-with-opa.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-service
spec:
  template:
    spec:
      containers:
        - name: api-service
          image: api-service:v1.2.3
          ports:
            - containerPort: 8080

        - name: opa
          image: openpolicyagent/opa:latest-envoy
          ports:
            - containerPort: 9191  # Decision API
            - containerPort: 8181  # Management API
          args:
            - "run"
            - "--server"
            - "--addr=0.0.0.0:8181"
            - "--diagnostic-addr=0.0.0.0:8282"
            - "--set=plugins.envoy_ext_authz_grpc.addr=:9191"
            - "--set=plugins.envoy_ext_authz_grpc.path=envoy/authz/allow"
            - "--set=decision_logs.console=true"
            - "/policies"
          volumeMounts:
            - name: opa-policies
              mountPath: /policies
      volumes:
        - name: opa-policies
          configMap:
            name: opa-policies

OPA policy for tenant isolation:

# policies/tenant_isolation.rego
package envoy.authz

import rego.v1

default allow := false

allow if {
    is_valid_token
    is_authorized_for_resource
}

# Note: io.jwt.decode only parses claims; the signature was already verified
# upstream by the mesh's RequestAuthentication before the request reached OPA.
is_valid_token if {
    token := input.attributes.request.http.headers.authorization
    startswith(token, "Bearer ")
    jwt := substring(token, 7, -1)
    [_, payload, _] := io.jwt.decode(jwt)
    payload.exp > time.now_ns() / 1e9
}

# Extract tenant ID from JWT
tenant_id := tid if {
    token := input.attributes.request.http.headers.authorization
    jwt := substring(token, 7, -1)
    [_, payload, _] := io.jwt.decode(jwt)
    tid := payload.tenant_id
}

# Extract tenant ID from request path (e.g., /api/v1/tenants/{tenant_id}/resources)
path_tenant_id := ptid if {
    path := input.attributes.request.http.path
    parts := split(path, "/")
    parts[3] == "tenants"
    ptid := parts[4]
}

# Tenant can only access their own resources
is_authorized_for_resource if {
    path_tenant_id
    tenant_id == path_tenant_id
}

# Non-tenant-scoped paths are allowed for any authenticated user
is_authorized_for_resource if {
    not path_tenant_id
}

# Admin override — admins can access any tenant's resources
is_authorized_for_resource if {
    token := input.attributes.request.http.headers.authorization
    jwt := substring(token, 7, -1)
    [_, payload, _] := io.jwt.decode(jwt)
    "admin" in payload.roles
}

This policy enforces strict tenant isolation: a request to /api/v1/tenants/tenant-123/resources is only allowed if the JWT's tenant_id claim is tenant-123 — or if the caller has the admin role.
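For intuition, the decision tree the policy encodes can be restated in Python (a hypothetical mirror for illustration, not the deployed policy):

```python
def is_authorized(path: str, token_tenant: str, roles: list) -> bool:
    """Mirror of the Rego tenant-isolation rules above."""
    if "admin" in roles:                      # admin override
        return True
    parts = path.split("/")
    if "tenants" in parts:
        i = parts.index("tenants")
        path_tenant = parts[i + 1] if i + 1 < len(parts) else None
        return path_tenant == token_tenant    # strict tenant match
    return True                               # non-tenant-scoped path

assert is_authorized("/api/v1/tenants/tenant-123/resources", "tenant-123", [])
assert not is_authorized("/api/v1/tenants/tenant-123/resources", "tenant-456", [])
assert is_authorized("/api/v1/tenants/tenant-123/resources", "tenant-456", ["admin"])
```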

API Threat Protection

Beyond authentication and authorization, APIs need protection against malformed and malicious payloads.

Payload Validation with JSON Schema

Validate request bodies against a JSON schema before they reach your application:

# Istio EnvoyFilter for request body validation
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: payload-limits
  namespace: production
spec:
  workloadSelector:
    labels:
      app: api-gateway
  configPatches:
    - applyTo: HTTP_FILTER
      match:
        context: SIDECAR_INBOUND
      patch:
        operation: INSERT_BEFORE
        value:
          name: envoy.filters.http.buffer
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.filters.http.buffer.v3.Buffer
            max_request_bytes: 1048576  # 1MB max request size

For JSON schema validation, implement it at the API gateway level (e.g., Kong, Ambassador) or as middleware in your application:

// Express middleware for JSON schema validation
const Ajv = require('ajv');
const ajv = new Ajv({ allErrors: true, removeAdditional: true });

const createUserSchema = {
  type: 'object',
  required: ['email', 'name'],
  properties: {
    email: { type: 'string', format: 'email', maxLength: 254 },
    name: { type: 'string', minLength: 1, maxLength: 100 },
    role: { type: 'string', enum: ['viewer', 'editor', 'admin'] }
  },
  additionalProperties: false
};

function validateBody(schema) {
  const validate = ajv.compile(schema);
  return (req, res, next) => {
    if (!validate(req.body)) {
      return res.status(400).json({
        error: 'Validation failed',
        details: validate.errors.map(e => ({
          field: e.instancePath,
          message: e.message
        }))
      });
    }
    next();
  };
}

app.post('/api/v1/users', validateBody(createUserSchema), createUser);

Observability: Monitoring the Security Stack

A zero-trust stack generates a large volume of security-relevant telemetry. Structured monitoring is essential for detecting anomalies and debugging legitimate access issues.

Key Metrics to Track

# Prometheus rules for API security monitoring
groups:
  - name: api-security
    rules:
      - alert: HighAuthFailureRate
        expr: |
          sum(rate(istio_requests_total{response_code="401"}[5m])) by (destination_service)
          /
          sum(rate(istio_requests_total[5m])) by (destination_service)
          > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Auth failure rate above 10% for {{ $labels.destination_service }}"

      - alert: RateLimitExceeded
        expr: |
          sum(rate(istio_requests_total{response_code="429"}[5m])) by (source_principal)
          > 100
        for: 2m
        labels:
          severity: info
        annotations:
          summary: "Client {{ $labels.source_principal }} exceeding rate limits"

      - alert: OPADecisionLatencyHigh
        expr: |
          histogram_quantile(0.99, rate(opa_decision_duration_seconds_bucket[5m])) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "OPA decision latency p99 above 50ms"

      - alert: mTLSHandshakeFailures
        expr: |
          sum(rate(envoy_ssl_connection_error[5m])) by (pod) > 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "mTLS handshake failures on {{ $labels.pod }}"

Distributed Tracing for Auth Decisions

Include authorization decision metadata in distributed traces:

# Istio telemetry configuration
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: api-tracing
  namespace: production
spec:
  tracing:
    - providers:
        - name: jaeger
      randomSamplingPercentage: 10
      customTags:
        auth.principal:
          header:
            name: x-jwt-payload
        auth.tenant_id:
          header:
            name: x-tenant-id
        rate_limit.remaining:
          header:
            name: x-ratelimit-remaining

Defense in Depth: The Complete Stack

Each layer addresses specific threat vectors:

┌─────────────────────────────────────────────────────┐
│ Layer 1: Network (WAF / Cloud Load Balancer)        │
│  • DDoS protection                                  │
│  • Bot detection                                    │
│  • IP reputation filtering                          │
│  • Geographic restrictions                          │
├─────────────────────────────────────────────────────┤
│ Layer 2: Ingress (Istio Gateway)                    │
│  • TLS termination (external)                       │
│  • JWT validation                                   │
│  • Global rate limiting                             │
│  • Request size limits                              │
├─────────────────────────────────────────────────────┤
│ Layer 3: Mesh (Istio Sidecars)                      │
│  • mTLS between all services                        │
│  • Service-level AuthorizationPolicies              │
│  • Local rate limiting                              │
├─────────────────────────────────────────────────────┤
│ Layer 4: Application (OPA + Middleware)             │
│  • Tenant isolation                                 │
│  • Fine-grained resource authorization              │
│  • JSON schema validation                           │
│  • Business logic authorization                     │
├─────────────────────────────────────────────────────┤
│ Layer 5: Data (Encryption + Access Control)         │
│  • Encryption at rest (KMS)                         │
│  • Row-level security (database)                    │
│  • Field-level encryption for sensitive data        │
│  • Audit logging for data access                    │
└─────────────────────────────────────────────────────┘

A request that passes Layer 1 still faces JWT validation at Layer 2, mTLS verification at Layer 3, tenant isolation at Layer 4, and data-level access controls at Layer 5. Compromising any single layer doesn't grant unrestricted access.

Case Study: Multi-Tenant B2B API Platform

Background

A multi-tenant B2B API platform serving 200+ customers with three pricing tiers (Free, Starter, Enterprise) needed a comprehensive security stack. Customers ranged from individual developers on the free tier to financial institutions on enterprise plans processing thousands of requests per second. The Stripe Systems team designed and implemented the Kubernetes-native security architecture.

Requirements

  1. mTLS between all internal services — no plaintext traffic within the cluster
  2. JWT validation at ingress — reject unauthenticated requests before they reach application code
  3. Per-customer rate limits — different limits per pricing tier, per API endpoint
  4. Tenant isolation — tenant A cannot access tenant B's data, enforceable at the infrastructure level
  5. Audit trail — every API call logged with caller identity, tenant context, and authorization decision

Implementation

mTLS with Istio:

Strict mTLS across the production namespace. The PeerAuthentication policy rejected any non-mTLS connection:

apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: strict-mtls
  namespace: production
spec:
  mtls:
    mode: STRICT

Service-to-service communication verification ensured that only authorized services could call each other:

apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: payment-service-callers
  namespace: production
spec:
  selector:
    matchLabels:
      app: payment-service
  action: ALLOW
  rules:
    - from:
        - source:
            principals:
              - "cluster.local/ns/production/sa/order-service"
              - "cluster.local/ns/production/sa/billing-service"
      to:
        - operation:
            methods: ["POST", "GET"]
            paths: ["/api/v1/payments/*"]

Only order-service and billing-service could call payment-service. Any other service — even with valid mTLS — received a 403.

JWT Validation at Ingress:

apiVersion: security.istio.io/v1
kind: RequestAuthentication
metadata:
  name: api-jwt-auth
  namespace: production
spec:
  selector:
    matchLabels:
      istio: ingressgateway
  jwtRules:
    - issuer: "https://auth.platform.example.com/"
      jwksUri: "https://auth.platform.example.com/.well-known/jwks.json"
      audiences: ["api.platform.example.com"]
      forwardOriginalToken: true
      outputClaimsToHeaders:
        - header: "x-tenant-id"
          claim: "tenant_id"
        - header: "x-api-tier"
          claim: "tier"
        - header: "x-user-roles"
          claim: "roles"

---
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: require-auth
  namespace: production
spec:
  selector:
    matchLabels:
      istio: ingressgateway
  action: DENY
  rules:
    - from:
        - source:
            notRequestPrincipals: ["*"]
      to:
        - operation:
            notPaths:
              - "/health"
              - "/ready"
              - "/.well-known/*"
              - "/docs/*"

The RequestAuthentication extracted tenant_id, tier, and roles claims into headers, making them available to downstream services and the rate limiter without requiring each service to parse the JWT.

Per-Customer Rate Limiting:

The rate limit configuration provided different limits per tier and per endpoint:

# Rate limit config
domain: platform-api
descriptors:
  - key: api_tier
    value: free
    rate_limit:
      unit: minute
      requests_per_unit: 60
    descriptors:
      - key: api_endpoint
        value: search
        rate_limit:
          unit: minute
          requests_per_unit: 10
      - key: api_endpoint
        value: bulk_export
        rate_limit:
          unit: hour
          requests_per_unit: 5
      - key: api_endpoint
        value: webhook_register
        rate_limit:
          unit: day
          requests_per_unit: 10

  - key: api_tier
    value: starter
    rate_limit:
      unit: minute
      requests_per_unit: 600
    descriptors:
      - key: api_endpoint
        value: search
        rate_limit:
          unit: minute
          requests_per_unit: 100
      - key: api_endpoint
        value: bulk_export
        rate_limit:
          unit: hour
          requests_per_unit: 50
      - key: api_endpoint
        value: webhook_register
        rate_limit:
          unit: day
          requests_per_unit: 100

  - key: api_tier
    value: enterprise
    rate_limit:
      unit: minute
      requests_per_unit: 6000
    descriptors:
      - key: api_endpoint
        value: search
        rate_limit:
          unit: minute
          requests_per_unit: 1000
      - key: api_endpoint
        value: bulk_export
        rate_limit:
          unit: hour
          requests_per_unit: 500
      - key: api_endpoint
        value: webhook_register
        rate_limit:
          unit: day
          requests_per_unit: 1000

Rate limit headers were returned on every response:

X-RateLimit-Limit: 600
X-RateLimit-Remaining: 423
X-RateLimit-Reset: 1694188800

OPA Tenant Isolation:

The OPA policy ensured strict tenant isolation — a request authenticated as tenant A could not access resources belonging to tenant B:

package platform.authz

import rego.v1

default allow := false

# Allow request if tenant context matches
allow if {
    input.tenant_id != ""
    resource_tenant := extract_tenant_from_path(input.path)
    resource_tenant != ""
    input.tenant_id == resource_tenant
}

# Allow non-tenant-scoped endpoints
allow if {
    input.tenant_id != ""
    not is_tenant_scoped_path(input.path)
}

# Admin bypass for platform operators
allow if {
    "platform_admin" in input.roles
}

extract_tenant_from_path(path) := tenant if {
    parts := split(path, "/")
    some i
    parts[i] == "tenants"
    tenant := parts[i + 1]
}

extract_tenant_from_path(path) := "" if {
    parts := split(path, "/")
    not path_contains_tenants(parts)
}

path_contains_tenants(parts) if {
    some i
    parts[i] == "tenants"
}

is_tenant_scoped_path(path) if {
    contains(path, "/tenants/")
}

This policy ran on every request. OPA decision latency averaged 1.2ms at p50 and 4.8ms at p99 — acceptable overhead for the security guarantee.

Results

After deployment, the platform operated with the following security posture:

  • 100% mTLS coverage — verified by Istio telemetry showing zero plaintext connections
  • 0 cross-tenant data access incidents in the first 12 months of operation
  • 99.97% rate limiter availability — Redis cluster with automatic failover
  • Average auth overhead: 6.3ms per request (JWT validation + OPA decision + rate limit check)
  • 3 blocked unauthorized access attempts detected in the first quarter via auth failure alerting — all were misconfigured API clients, not attacks, but the detection mechanism proved operational

Architecture Decisions and Trade-offs

Why Istio over Linkerd: the platform needed JWT validation and rate limiting at the mesh level. Linkerd focuses on mTLS and observability; Istio provides the AuthorizationPolicy and EnvoyFilter extension points required for the full security stack.

Why OPA sidecar over centralized OPA: deploying OPA as a sidecar with each service eliminated a network hop for authorization decisions and removed a single point of failure. The trade-off was higher resource consumption (each pod runs an OPA container), but the latency and reliability improvements justified it.

Why Redis-backed global rate limiting over local: the free tier limit of 60 requests/minute needed to be accurate regardless of how many pod replicas served the traffic. Local rate limiting would allow 60 * N requests where N is the replica count.

Why failure_mode_deny: false on rate limiting: a rate limiter outage that blocks all API traffic is worse than temporarily allowing over-limit requests. The team monitored rate limiter availability separately and had alerts for when it was unavailable.

Summary

Zero-trust API security is a layered implementation, not a single technology. mTLS provides transport-level identity and encryption. JWT validation provides caller authentication. Rate limiting provides abuse prevention and fairness. OPA provides fine-grained authorization.

Each layer is independently valuable, and implementing them incrementally is practical: start with mTLS (Istio makes this straightforward), add JWT validation at the ingress, then add rate limiting and OPA as your authorization requirements grow.

The overhead — 5–10ms per request for the full stack — is acceptable for most API workloads and significantly cheaper than the cost of a security incident caused by trusting network boundaries that no longer exist.
