Multi-cloud is one of the most oversold ideas in infrastructure. The pitch is simple: run workloads across AWS, GCP, and Azure to avoid vendor lock-in, improve resilience, and negotiate better pricing. The reality is that multi-cloud introduces operational complexity that most organizations underestimate — and the supposed benefits only materialize under specific conditions.
This post is a practical guide for engineering teams evaluating or implementing multi-cloud. We will cover when it actually makes sense, how to build portable infrastructure without sacrificing performance, and where the costs (both financial and operational) are real. Everything here is grounded in production experience.
When Multi-Cloud Makes Sense — and When It Doesn't
Before investing in multi-cloud, be honest about why you want it.
Legitimate reasons:
- ✓Regulatory data residency requirements. If you must store EU citizen data in EU-based infrastructure and Indian citizen data in India, and your primary cloud provider has limited regions in one jurisdiction, you may need a second provider.
- ✓Acquiring a company that runs on a different cloud. You inherit their infrastructure. Migration is expensive and risky. Running both for an extended period is a pragmatic choice.
- ✓Specific managed services with no equivalent. GCP's BigQuery for analytics, AWS's SageMaker for ML pipelines, Azure's Active Directory integration — sometimes one provider has a genuinely superior service for a specific workload.
- ✓Contractual or procurement constraints. Government contracts or enterprise clients may mandate specific providers.
Poor reasons:
- ✗ "Avoiding vendor lock-in" as an abstract goal. If you are a 50-person startup, the cost of building and maintaining cloud-agnostic abstractions far exceeds the hypothetical cost of migrating later.
- ✗ Negotiating better pricing. In practice, cloud pricing negotiations depend on committed spend. Splitting spend across providers weakens your negotiating position with each one.
- ✗ Disaster recovery. A well-architected multi-region deployment within a single cloud provider gives you comparable resilience at a fraction of the operational cost.
The honest assessment: multi-cloud adds 30-60% operational overhead in terms of tooling, team skills, and debugging complexity. Adopt it only when the business case is unambiguous.
Abstraction Layers: Terraform, Pulumi, and Crossplane
If you commit to multi-cloud, infrastructure-as-code abstraction is non-negotiable. You need a single workflow to provision resources across providers. The three practical options are Terraform, Pulumi, and Crossplane.
Terraform Provider Abstraction
Terraform handles multi-cloud through its provider model. You declare providers for each cloud, and modules abstract the differences:
```hcl
# providers.tf
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = var.aws_region
}

provider "google" {
  project = var.gcp_project
  region  = var.gcp_region
}
```
The key pattern is writing modules that accept a cloud_provider variable and dispatch to the correct resource type:
```hcl
# modules/database/main.tf
variable "cloud_provider" {
  type        = string
  description = "Target cloud: aws or gcp"
  validation {
    condition     = contains(["aws", "gcp"], var.cloud_provider)
    error_message = "Supported providers: aws, gcp."
  }
}

variable "db_name" {
  type = string
}

variable "db_tier" {
  type    = string
  default = "small"
}

variable "gcp_region" {
  type    = string
  default = ""
}

variable "vpc_id" {
  type        = string
  description = "Private network to attach the database to (GCP VPC self-link)"
  default     = ""
}

locals {
  aws_instance_class = {
    small  = "db.t3.medium"
    medium = "db.r6g.large"
    large  = "db.r6g.2xlarge"
  }
  gcp_tier = {
    small  = "db-custom-2-7680"
    medium = "db-custom-4-15360"
    large  = "db-custom-8-30720"
  }
}

resource "aws_db_instance" "postgres" {
  count               = var.cloud_provider == "aws" ? 1 : 0
  identifier          = var.db_name
  engine              = "postgres"
  engine_version      = "16.4"
  instance_class      = local.aws_instance_class[var.db_tier]
  allocated_storage   = 100
  storage_encrypted   = true
  publicly_accessible = false
  skip_final_snapshot = false
}

resource "google_sql_database_instance" "postgres" {
  count            = var.cloud_provider == "gcp" ? 1 : 0
  name             = var.db_name
  database_version = "POSTGRES_16"
  region           = var.gcp_region

  settings {
    tier              = local.gcp_tier[var.db_tier]
    availability_type = "REGIONAL"
    ip_configuration {
      ipv4_enabled    = false
      private_network = var.vpc_id
    }
  }
}

output "connection_string" {
  value = var.cloud_provider == "aws" ? aws_db_instance.postgres[0].endpoint : google_sql_database_instance.postgres[0].connection_name
}
```
This approach works but has a clear limitation: your modules grow linearly with each provider you support. For teams managing more than two providers, consider Crossplane.
Crossplane for Kubernetes-Native Abstraction
Crossplane runs as a Kubernetes operator and lets you define cloud resources as custom Kubernetes objects. The advantage is that teams already using Kubernetes get a consistent API surface:
```yaml
apiVersion: database.crossplane.io/v1alpha1
kind: PostgreSQLInstance
metadata:
  name: fintech-primary
spec:
  parameters:
    storageGB: 100
    version: "16"
  compositionSelector:
    matchLabels:
      provider: aws
      region: eu-west-1
  writeConnectionSecretToRef:
    name: db-credentials
```
Crossplane compositions map this to the correct provider-specific resource. The tradeoff is that you now depend on Kubernetes as your control plane — which is itself a significant operational commitment.
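To make that mapping concrete, here is a sketch of what a matching Composition could look like for the AWS case, using the Upbound AWS provider's RDS `Instance` kind. The names (`postgres-aws`, `XPostgreSQLInstance`) and the patch paths are illustrative assumptions, not taken from a real deployment:

```yaml
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: postgres-aws
  labels:
    provider: aws
    region: eu-west-1
spec:
  compositeTypeRef:
    apiVersion: database.crossplane.io/v1alpha1
    kind: XPostgreSQLInstance
  resources:
    - name: rds-instance
      base:
        apiVersion: rds.aws.upbound.io/v1beta1
        kind: Instance
        spec:
          forProvider:
            engine: postgres
            engineVersion: "16.4"
            instanceClass: db.t3.medium
            region: eu-west-1
      patches:
        # Copy the claim's storage request onto the RDS resource
        - fromFieldPath: spec.parameters.storageGB
          toFieldPath: spec.forProvider.allocatedStorage
```

A sibling Composition labeled `provider: gcp` would map the same claim to a Cloud SQL instance; the claim author never sees the difference.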
The Portable Services Stack
Not every managed service is portable. The pragmatic approach is to standardize on services that have near-equivalent managed offerings across clouds:
| Service | AWS | GCP | Azure | Portable Alternative |
|---|---|---|---|---|
| Relational DB | RDS PostgreSQL | Cloud SQL PostgreSQL | Azure Database for PostgreSQL | PostgreSQL (any provider) |
| Cache | ElastiCache Redis | Memorystore Redis | Azure Cache for Redis | Redis (any provider) |
| Message Queue | MSK (Kafka) | Managed Kafka (via Confluent) | Event Hubs (Kafka protocol) | Apache Kafka |
| Container Orchestration | EKS | GKE | AKS | Kubernetes (any provider) |
| Object Storage | S3 | Cloud Storage | Blob Storage | MinIO (self-managed) or S3-compatible API |
PostgreSQL, Redis, Kafka, and Kubernetes form the common denominator. If your application code talks only to these interfaces, the infrastructure layer becomes swappable. Avoid cloud-specific features like Aurora Serverless v2 or Cloud Spanner unless you have a concrete reason — they become anchors.
Object storage deserves special mention. AWS S3's API has become a de facto standard. Both GCP Cloud Storage and MinIO support S3-compatible endpoints. Write your application against the S3 API and you preserve portability even if the underlying storage changes.
Data Residency and Compliance
Data residency is the most common legitimate driver of multi-cloud adoption. Three regulatory frameworks matter for most organizations:
- ✓GDPR (EU): Personal data of EU residents must be processed under GDPR protections. While GDPR does not strictly require data to stay within the EU, Schrems II rulings make cross-border transfers legally complex. Hosting in EU regions is the simplest compliance path.
- ✓India's DPDP Act (2023): The Digital Personal Data Protection Act allows the Indian government to restrict cross-border data transfers to specific countries via notification. Financial data in particular faces pressure to remain within Indian borders.
- ✓Sector-specific rules: PCI DSS for payment data, HIPAA for healthcare, RBI guidelines for financial data in India — each adds constraints on where data can reside and how it can be replicated.
When your users span jurisdictions with conflicting residency requirements, multi-cloud (or at minimum multi-region within a single cloud) becomes architecturally necessary. The challenge is that most managed database services do not support cross-cloud replication natively, which pushes you toward either self-managed databases or application-level replication patterns.
DNS-Based Traffic Routing
DNS is the simplest mechanism for directing traffic to the correct cloud based on user geography or failover conditions. All three major providers offer weighted, geolocation, and health-check-based DNS routing.
A typical pattern uses a top-level domain with geographic routing:
```bash
# AWS Route53 CLI — create geolocation routing policy
aws route53 change-resource-record-sets \
  --hosted-zone-id Z1234567890 \
  --change-batch '{
    "Changes": [{
      "Action": "CREATE",
      "ResourceRecordSet": {
        "Name": "api.example.com",
        "Type": "A",
        "SetIdentifier": "eu-traffic",
        "GeoLocation": { "ContinentCode": "EU" },
        "AliasTarget": {
          "HostedZoneId": "Z2FDTNDATAQYW2",
          "DNSName": "eu-alb-1234567.eu-west-1.elb.amazonaws.com",
          "EvaluateTargetHealth": true
        }
      }
    }, {
      "Action": "CREATE",
      "ResourceRecordSet": {
        "Name": "api.example.com",
        "Type": "A",
        "SetIdentifier": "india-traffic",
        "GeoLocation": { "CountryCode": "IN" },
        "AliasTarget": {
          "HostedZoneId": "Z1BNKLHFG3H9OA",
          "DNSName": "gcp-proxy.asia-south1.example.com",
          "EvaluateTargetHealth": true
        }
      }
    }]
  }'
```
For cross-cloud setups, Route53 or Cloud DNS can point to load balancer endpoints in different providers. The critical detail is health checking — configure health checks against each cloud's endpoint so DNS automatically fails over if one becomes unavailable:
```hcl
# Terraform: Route53 health check for GCP endpoint
resource "aws_route53_health_check" "gcp_api" {
  fqdn              = "gcp-proxy.asia-south1.example.com"
  port              = 443
  type              = "HTTPS"
  resource_path     = "/healthz"
  failure_threshold = 3
  request_interval  = 10

  tags = {
    Name = "gcp-asia-south1-health"
  }
}
```
DNS TTLs matter here. Set TTLs to 60 seconds or less for failover records. Higher TTLs mean clients continue hitting a failed endpoint for the duration of the cached record.
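The interaction between health-check settings and TTL determines your worst-case failover window. A back-of-the-envelope sketch (the function and its name are ours, not from any SDK):

```go
package main

import "fmt"

// worstCaseFailoverSeconds estimates how long clients may keep hitting a
// dead endpoint: the health checker needs failureThreshold consecutive
// failed probes (one every requestInterval seconds) before DNS answers
// change, and resolvers may then cache the old record for up to ttl.
func worstCaseFailoverSeconds(requestInterval, failureThreshold, ttl int) int {
	return requestInterval*failureThreshold + ttl
}

func main() {
	// Values from the health check above (10s interval, threshold 3)
	// combined with a 60s TTL give roughly a 90-second worst case.
	fmt.Println(worstCaseFailoverSeconds(10, 3, 60)) // 90
}
```

With a 300-second TTL the same math gives a six-minute-plus window, which is why low TTLs on failover records are non-negotiable.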
Cross-Cloud Networking
Networking between clouds is where multi-cloud gets expensive and latency-sensitive. There are two approaches: VPN tunnels over the public internet and dedicated interconnects.
VPN Tunnels
A site-to-site VPN between AWS VPC and GCP VPC is the simplest option. Both providers support IPsec tunnels:
```hcl
# Terraform: AWS VPN Gateway
resource "aws_vpn_gateway" "main" {
  vpc_id = aws_vpc.primary.id
  tags   = { Name = "cross-cloud-vpn" }
}

resource "aws_customer_gateway" "gcp" {
  bgp_asn    = 65000
  ip_address = google_compute_address.vpn_static_ip.address
  type       = "ipsec.1"
  tags       = { Name = "gcp-customer-gateway" }
}

resource "aws_vpn_connection" "to_gcp" {
  vpn_gateway_id      = aws_vpn_gateway.main.id
  customer_gateway_id = aws_customer_gateway.gcp.id
  type                = "ipsec.1"
  static_routes_only  = false
  tags                = { Name = "aws-to-gcp-vpn" }
}
```
Typical latency over VPN between AWS eu-west-1 and GCP asia-south1 (Mumbai) is 120-160ms. Between regions in the same geography (e.g., AWS eu-west-1 and GCP europe-west1), expect 5-15ms. VPN throughput caps at roughly 1.25 Gbps per tunnel on AWS. Each Site-to-Site VPN connection provides two tunnels; for higher aggregate bandwidth, attach multiple connections to a Transit Gateway and spread traffic across them with ECMP.
Dedicated Interconnects
For production workloads with high bandwidth or strict latency requirements, dedicated interconnects are justified:
- ✓AWS Direct Connect: dedicated connections at 1, 10, or 100 Gbps through colocation facilities (hosted connections are available at lower speeds).
- ✓GCP Partner Interconnect: 50 Mbps to 50 Gbps through service provider partners.
Using a colocation provider like Equinix, you can terminate both AWS Direct Connect and GCP Partner Interconnect in the same facility and cross-connect them. This reduces inter-cloud latency to 1-5ms for same-metro connections and provides consistent bandwidth. The cost is significant — a 1 Gbps Direct Connect port runs approximately $0.30/hour ($220/month) plus data transfer, and GCP Partner Interconnect pricing varies by partner and capacity.
Latency Reference
| Route | VPN (typical) | Dedicated Interconnect |
|---|---|---|
| eu-west-1 ↔ europe-west1 (same metro) | 5-15ms | 1-3ms |
| eu-west-1 ↔ asia-south1 | 120-160ms | 100-130ms |
| us-east-1 ↔ us-central1 | 20-40ms | 10-20ms |
| Same cloud, same region | <1ms | N/A |
These numbers matter for database replication. Synchronous replication across 130ms of latency is impractical for transactional workloads.
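To see why, note that a synchronous commit cannot complete faster than one network round trip to the standby, which caps per-session commit throughput. A rough illustration (the function is ours, and it is an upper bound; disk flushes and consensus overhead push real numbers lower):

```go
package main

import "fmt"

// maxSyncCommitsPerSec is the ceiling on synchronous commits a single
// session can achieve when every commit must wait one round trip to a
// remote standby. It ignores disk flush time and protocol overhead,
// so real-world throughput is strictly lower.
func maxSyncCommitsPerSec(rttMillis float64) float64 {
	return 1000.0 / rttMillis
}

func main() {
	// At ~130ms RTT (eu-west-1 <-> asia-south1), a session tops out
	// below eight synchronous commits per second.
	fmt.Printf("%.1f commits/sec\n", maxSyncCommitsPerSec(130))
}
```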
Container Portability
Kubernetes provides the orchestration abstraction, but your container images and application code must cooperate. Three rules keep containers portable:
1. Build cloud-agnostic images. Do not bake cloud-specific SDKs or credentials into images. Use multi-stage Docker builds with a minimal runtime:
```dockerfile
# Build stage
FROM golang:1.23-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o /server ./cmd/server

# Runtime stage
FROM gcr.io/distroless/static-debian12:nonroot
COPY --from=builder /server /server
ENTRYPOINT ["/server"]
```
2. Use environment variables for cloud-specific configuration. Storage bucket names, database connection strings, and queue URLs should come from environment variables or Kubernetes ConfigMaps/Secrets — never hardcoded:
```yaml
# k8s deployment excerpt
env:
  - name: DATABASE_URL
    valueFrom:
      secretKeyRef:
        name: db-credentials
        key: connection_string
  - name: OBJECT_STORE_BUCKET
    valueFrom:
      configMapKeyRef:
        name: app-config
        key: storage_bucket
  - name: OBJECT_STORE_ENDPOINT
    valueFrom:
      configMapKeyRef:
        name: app-config
        key: storage_endpoint
```
3. Use interface adapters for cloud services. Define interfaces in your application code and implement adapters for each provider. This keeps business logic clean:
```go
// storage.go — interface definition
package storage

import (
	"context"
	"io"
)

type ObjectStore interface {
	Put(ctx context.Context, key string, reader io.Reader) error
	Get(ctx context.Context, key string) (io.ReadCloser, error)
	Delete(ctx context.Context, key string) error
}
```

```go
// storage_s3.go — AWS/S3-compatible implementation
package storage

import (
	"context"
	"io"

	"github.com/aws/aws-sdk-go-v2/service/s3"
)

type S3Store struct {
	client *s3.Client
	bucket string
}

func NewS3Store(client *s3.Client, bucket string) *S3Store {
	return &S3Store{client: client, bucket: bucket}
}

func (s *S3Store) Put(ctx context.Context, key string, reader io.Reader) error {
	_, err := s.client.PutObject(ctx, &s3.PutObjectInput{
		Bucket: &s.bucket,
		Key:    &key,
		Body:   reader,
	})
	return err
}

func (s *S3Store) Get(ctx context.Context, key string) (io.ReadCloser, error) {
	out, err := s.client.GetObject(ctx, &s3.GetObjectInput{
		Bucket: &s.bucket,
		Key:    &key,
	})
	if err != nil {
		return nil, err
	}
	return out.Body, nil
}

func (s *S3Store) Delete(ctx context.Context, key string) error {
	_, err := s.client.DeleteObject(ctx, &s3.DeleteObjectInput{
		Bucket: &s.bucket,
		Key:    &key,
	})
	return err
}
```

```go
// storage_gcs.go — GCP implementation
package storage

import (
	"context"
	"io"

	gcs "cloud.google.com/go/storage"
)

type GCSStore struct {
	client *gcs.Client
	bucket string
}

func NewGCSStore(client *gcs.Client, bucket string) *GCSStore {
	return &GCSStore{client: client, bucket: bucket}
}

func (g *GCSStore) Put(ctx context.Context, key string, reader io.Reader) error {
	w := g.client.Bucket(g.bucket).Object(key).NewWriter(ctx)
	if _, err := io.Copy(w, reader); err != nil {
		w.Close()
		return err
	}
	return w.Close()
}

func (g *GCSStore) Get(ctx context.Context, key string) (io.ReadCloser, error) {
	return g.client.Bucket(g.bucket).Object(key).NewReader(ctx)
}

func (g *GCSStore) Delete(ctx context.Context, key string) error {
	return g.client.Bucket(g.bucket).Object(key).Delete(ctx)
}
```
At application startup, read an environment variable to decide which adapter to instantiate. The rest of your codebase only ever interacts with the ObjectStore interface.
Database Replication Across Clouds
Cross-cloud database replication is the hardest problem in multi-cloud. Your options:
Option 1: PostgreSQL Logical Replication
PostgreSQL's built-in logical replication works across any two PostgreSQL instances, regardless of where they are hosted. You configure the primary (on AWS RDS, for example) as a publisher and the secondary (on GCP Cloud SQL) as a subscriber:
```sql
-- On the AWS RDS primary: create a publication
CREATE PUBLICATION fintech_pub FOR TABLE
  accounts, transactions, user_profiles;

-- On the GCP Cloud SQL replica: create a subscription
CREATE SUBSCRIPTION fintech_sub
  CONNECTION 'host=rds-primary.eu-west-1.rds.amazonaws.com
              port=5432
              dbname=fintech
              user=replication_user
              password=REDACTED
              sslmode=require'
  PUBLICATION fintech_pub;
```
Pros: Works with managed PostgreSQL on any cloud. No additional software. Selective table replication. Cons: Logical replication is asynchronous — the replica lags behind the primary. At 130ms inter-cloud latency, expect 200-500ms of replication lag under normal load, potentially seconds under heavy write throughput. DDL changes (schema migrations) are not replicated automatically.
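Because that lag is invisible until you measure it, monitor it from day one. On the publisher, the standard `pg_stat_replication` view exposes per-subscriber lag; the query below is a generic example, not specific to any cloud:

```sql
-- Run on the publisher: lag per logical-replication subscriber
SELECT application_name,
       write_lag,
       flush_lag,
       replay_lag,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS bytes_behind
FROM pg_stat_replication;
```

Alert on `bytes_behind` growing without bound: it is the earliest signal that the subscriber has stalled or the cross-cloud link is saturated.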
Option 2: CockroachDB
CockroachDB is a distributed SQL database that natively supports multi-region and multi-cloud deployments. You define locality for each node, and CockroachDB handles replication and consensus:
```sql
-- Configure locality-aware replication
ALTER DATABASE fintech SET PRIMARY REGION "eu-west-1";
ALTER DATABASE fintech ADD REGION "asia-south1";

-- Pin specific tables to regions for compliance
ALTER TABLE user_profiles_eu SET LOCALITY REGIONAL BY ROW;
ALTER TABLE user_profiles_in SET LOCALITY REGIONAL BY TABLE IN "asia-south1";
```
Pros: Strong consistency across regions. Automatic failover. SQL-compatible. Cons: Self-managed (no equivalent managed service across both AWS and GCP). Write latency is bounded by inter-region consensus: a write touching asia-south1 data coordinated from eu-west-1 pays at least one cross-region round trip (~130ms on this route, more under contention). Operational complexity is high.
Option 3: YugabyteDB
Similar to CockroachDB in design — a distributed PostgreSQL-compatible database. YugabyteDB offers a managed service (Yugabyte Aeon) that supports multi-cloud deployments:
Pros: PostgreSQL wire protocol compatibility (easier migration from standard PostgreSQL). Managed offering reduces operational burden. Cons: Same fundamental latency constraints as CockroachDB for cross-region writes. PostgreSQL compatibility is not 100% — some extensions and behaviors differ.
Recommendation
For read-heavy workloads where eventual consistency on the replica is acceptable, PostgreSQL logical replication is the simplest and most proven approach. For workloads requiring strong consistency across regions, CockroachDB or YugabyteDB are appropriate — but expect significantly higher operational investment and write latency.
Observability Across Clouds
Centralized observability is mandatory in multi-cloud. If your logs, metrics, and traces are split across CloudWatch and Google Cloud Logging, debugging cross-cloud issues becomes a nightmare.
The standard approach: OpenTelemetry collectors in each cloud, exporting to a centralized Grafana stack.
OpenTelemetry Collector Configuration
Deploy an OTel collector as a DaemonSet in each Kubernetes cluster:
```yaml
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  # Scrape Kubernetes metrics
  prometheus:
    config:
      scrape_configs:
        - job_name: 'k8s-pods'
          kubernetes_sd_configs:
            - role: pod
          relabel_configs:
            - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
              action: keep
              regex: true

processors:
  batch:
    timeout: 5s
    send_batch_size: 1024
  # Tag all telemetry with cloud provider and region
  resource:
    attributes:
      - key: cloud.provider
        value: "${CLOUD_PROVIDER}"
        action: upsert
      - key: cloud.region
        value: "${CLOUD_REGION}"
        action: upsert
      - key: deployment.environment
        value: "${ENVIRONMENT}"
        action: upsert
  # Filter noisy health check spans
  filter:
    spans:
      exclude:
        match_type: strict
        attributes:
          - key: http.target
            value: /healthz
          - key: http.target
            value: /readyz

exporters:
  otlphttp/traces:
    endpoint: https://tempo.observability.internal:4318
    tls:
      cert_file: /etc/otel/tls/client.crt
      key_file: /etc/otel/tls/client.key
  prometheusremotewrite:
    endpoint: https://mimir.observability.internal/api/v1/push
    tls:
      cert_file: /etc/otel/tls/client.crt
      key_file: /etc/otel/tls/client.key
  loki:
    endpoint: https://loki.observability.internal/loki/api/v1/push
    tls:
      cert_file: /etc/otel/tls/client.crt
      key_file: /etc/otel/tls/client.key

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [filter, resource, batch]
      exporters: [otlphttp/traces]
    metrics:
      receivers: [otlp, prometheus]
      processors: [resource, batch]
      exporters: [prometheusremotewrite]
    logs:
      receivers: [otlp]
      processors: [resource, batch]
      exporters: [loki]
```
The resource processor is critical — it stamps every piece of telemetry with cloud.provider and cloud.region, enabling you to filter and correlate across clouds in Grafana dashboards.
The backend stack — Grafana Tempo for traces, Mimir for metrics, Loki for logs — can run in either cloud or on dedicated infrastructure. For most teams, running it in the primary cloud and accepting the cross-cloud data transfer cost is simpler than self-hosting in a colocation facility.
The Cost of Multi-Cloud
Egress charges are the hidden tax of multi-cloud. Every byte that crosses from one cloud to another incurs data transfer fees from both sides.
Egress Pricing (as of 2026)
| Provider | First 1 TB/month | 1-10 TB/month | 10-50 TB/month |
|---|---|---|---|
| AWS (to internet) | $0.09/GB | $0.085/GB | $0.07/GB |
| GCP (to internet) | $0.12/GB | $0.11/GB | $0.08/GB |
| Azure (to internet) | $0.087/GB | $0.083/GB | $0.07/GB |
Inter-cloud traffic is billed as internet egress by both providers. If you replicate 500 GB of database WAL per month from AWS to GCP, you pay approximately $45 on the AWS side and $0 on the GCP ingress side (ingress is typically free). That sounds manageable, but add observability data (metrics, logs, traces), container image pulls, API traffic between services, and it accumulates quickly. A moderately active multi-cloud deployment can easily incur $2,000-5,000/month in cross-cloud egress alone.
Operational Cost
Beyond egress, account for:
- ✓Team skills. Your engineers need to be proficient in at least two cloud platforms. Training, certifications, and context-switching overhead are real.
- ✓Tooling divergence. CLI tools, IAM models, networking abstractions, and monitoring differ across providers. Even with Terraform, debugging provider-specific issues requires provider-specific knowledge.
- ✓Incident response. When a cross-cloud issue occurs, you need to correlate logs and metrics from two different platforms simultaneously. Mean time to resolution increases.
- ✓Security surface area. Two sets of IAM policies, two sets of network security rules, two sets of audit trails. Every additional cloud doubles your security review scope.
Decision Framework
Rather than treating multi-cloud as a binary choice, use a three-tier framework:
Tier 1: Single-Cloud-First
When: You have no regulatory requirement for multiple providers. Your team is small. You are optimizing for velocity.
Strategy: Pick one cloud. Use its managed services freely. Invest in multi-region within that cloud for resilience. Write clean interfaces in your application code (the adapter pattern described earlier) so that a future migration is possible but not premature.
Tier 2: Multi-Cloud-Ready
When: You anticipate regulatory requirements within 12-18 months. You are growing into new geographies. Your team is large enough to absorb some additional infrastructure complexity.
Strategy: Standardize on portable services (PostgreSQL, Redis, Kafka, Kubernetes). Use Terraform with provider abstraction from day one. Avoid cloud-specific managed services for core workloads. You are not actively running on multiple clouds, but your architecture could support it with 2-4 weeks of infrastructure work.
Tier 3: Active Multi-Cloud
When: You have a concrete, current business or regulatory requirement. You have a dedicated platform engineering team.
Strategy: Implement the full stack — cross-cloud networking, database replication, centralized observability, DNS-based traffic routing. Budget for the egress costs and operational overhead. Staff your platform team accordingly.
Most organizations should be at Tier 1 or Tier 2. Tier 3 is justified only with a clear and present need.
Case Study: Fintech Data Residency on AWS and GCP
A fintech company processing payments for merchants in both the EU and India faced a concrete data residency challenge. EU transaction data had to remain in EU-hosted infrastructure under GDPR. Indian transaction data needed to stay within India to satisfy RBI data localization guidelines. The primary workload — payment processing, merchant dashboards, reconciliation — ran on AWS in eu-west-1 (Ireland). However, AWS's Mumbai region (ap-south-1) alone did not satisfy the client's requirement for a geographically diverse Indian hosting option, and the Indian operations team preferred GCP's asia-south1 (Mumbai) for cost reasons and existing familiarity with BigQuery for analytics.
Stripe Systems designed and implemented the cross-cloud architecture. Here is how it was built.
Terraform Module Structure
```
infrastructure/
├── environments/
│   ├── eu-production/
│   │   ├── main.tf              # AWS eu-west-1 resources
│   │   ├── variables.tf
│   │   └── terraform.tfvars
│   └── india-production/
│       ├── main.tf              # GCP asia-south1 resources
│       ├── variables.tf
│       └── terraform.tfvars
├── modules/
│   ├── database/
│   │   ├── main.tf              # Provider-abstracted DB (shown earlier)
│   │   ├── variables.tf
│   │   └── outputs.tf
│   ├── kubernetes/
│   │   ├── aws-eks/
│   │   │   └── main.tf
│   │   └── gcp-gke/
│   │       └── main.tf
│   ├── networking/
│   │   ├── aws-vpc/
│   │   │   └── main.tf
│   │   ├── gcp-vpc/
│   │   │   └── main.tf
│   │   └── cross-cloud-vpn/
│   │       └── main.tf          # VPN tunnel between AWS and GCP
│   └── observability/
│       └── otel-collector/
│           ├── daemonset.yaml
│           └── config.yaml
└── shared/
    ├── dns/
    │   └── main.tf              # Route53 geolocation routing
    └── tls/
        └── main.tf              # Shared TLS certificates
```
Network Architecture
The two environments connect via a pair of IPsec VPN tunnels for redundancy:
```
┌──────────────────────────────┐            ┌──────────────────────────────┐
│        AWS eu-west-1         │            │        GCP asia-south1       │
│                              │            │                              │
│  EKS Cluster                 │    VPN     │                 GKE Cluster  │
│       │                      │   Tunnel   │                      │       │
│       ▼                      │◄──────────►│                      ▼       │
│  VPC 10.0.0.0/16             │  (IPsec)   │             VPC 10.1.0.0/16  │
│                              │            │                              │
│  ┌────────────────────┐      │  Logical   │      ┌────────────────────┐  │
│  │ RDS PostgreSQL     │      │Replication │      │ Cloud SQL PG       │  │
│  │ (Primary)          │──────┼────────────┼─────►│ (Read Replica)     │  │
│  └────────────────────┘      │            │      └────────────────────┘  │
└──────────────────────────────┘            └──────────────────────────────┘
               │                                           │
               ▼                                           ▼
     Route53 (EU traffic)                      Cloud DNS (India traffic)
               └─────────────────────┬─────────────────────┘
                                     ▼
                              api.example.com
                       (Geolocation-based routing)
```
The VPN tunnels use BGP for dynamic route advertisement. AWS VPC uses CIDR 10.0.0.0/16, GCP VPC uses 10.1.0.0/16 — non-overlapping ranges are essential.
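It is worth asserting non-overlap programmatically (in CI, before a tunnel ever comes up). Because CIDR blocks are power-of-two aligned, two blocks overlap exactly when one contains the other's network address; a minimal check with Go's standard library (the function name is ours):

```go
package main

import (
	"fmt"
	"net"
)

// cidrsOverlap reports whether two CIDR blocks share any addresses.
// CIDR blocks are aligned, so they overlap iff one contains the
// other's network address.
func cidrsOverlap(a, b string) (bool, error) {
	_, na, err := net.ParseCIDR(a)
	if err != nil {
		return false, err
	}
	_, nb, err := net.ParseCIDR(b)
	if err != nil {
		return false, err
	}
	return na.Contains(nb.IP) || nb.Contains(na.IP), nil
}

func main() {
	// The two VPC ranges from the diagram above.
	ok, _ := cidrsOverlap("10.0.0.0/16", "10.1.0.0/16")
	fmt.Println(ok) // false: safe to connect
}
```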
Database Replication
The primary PostgreSQL on AWS RDS (eu-west-1) publishes three tables to GCP Cloud SQL (asia-south1) via logical replication. The Cloud SQL instance serves read-only queries for the Indian merchant dashboard:
```sql
-- AWS RDS: Enable logical replication (requires parameter group change)
-- rds.logical_replication = 1 (set in RDS parameter group, requires reboot)

-- Create replication user with limited privileges
CREATE ROLE repl_user WITH REPLICATION LOGIN PASSWORD '...';
GRANT SELECT ON accounts, transactions, user_profiles TO repl_user;

-- Create publication. Row filters require PostgreSQL 15+, and the
-- WHERE clause applies per table, so repeat it for each one.
CREATE PUBLICATION india_read_pub FOR TABLE
  accounts      WHERE (region = 'IN'),
  transactions  WHERE (region = 'IN'),
  user_profiles WHERE (region = 'IN');

-- GCP Cloud SQL: Subscribe
-- Requires cloudsql.logical_decoding = on (set via instance flags)
CREATE SUBSCRIPTION india_read_sub
  CONNECTION 'host=primary.eu-west-1.rds.amazonaws.com
              port=5432
              dbname=fintech_prod
              user=repl_user
              password=...
              sslmode=verify-full
              sslrootcert=/etc/ssl/aws-rds-ca.pem'
  PUBLICATION india_read_pub;
```
The WHERE (region = 'IN') row filter on each published table ensures only Indian user data replicates to GCP, satisfying both GDPR (EU data stays in EU) and Indian data localization (Indian data is available in India).
Measured replication lag under production load averaged 350ms, with spikes to 1.2 seconds during batch reconciliation jobs. The Indian merchant dashboard displays a "data as of" timestamp, and the application code accounts for eventual consistency — read-after-write operations for Indian merchants route to the AWS primary via a dedicated API path when consistency is critical.
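The routing decision itself can be a small pure function. This is a simplified sketch of the pattern described above; the names and thresholds are illustrative, not from the production code:

```go
package main

import "fmt"

// routeToPrimary decides whether a read must be served by the AWS
// primary instead of the lagging GCP replica: always for
// read-after-write paths, otherwise only when observed replication
// lag exceeds the caller's staleness budget.
func routeToPrimary(readAfterWrite bool, lagMillis, maxStalenessMillis int) bool {
	if readAfterWrite {
		return true
	}
	return lagMillis > maxStalenessMillis
}

func main() {
	// Dashboard read, 350ms observed lag, 2s staleness budget: replica is fine.
	fmt.Println(routeToPrimary(false, 350, 2000)) // false
	// Merchant just wrote a record: force the primary.
	fmt.Println(routeToPrimary(true, 350, 2000)) // true
}
```

In practice the observed lag would come from a metric such as the publisher's `pg_stat_replication` view, refreshed on a short interval.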
Application-Level Cloud Abstraction
The application uses the adapter pattern to remain cloud-agnostic. Here is the storage interface and provider selection from the actual implementation:
```go
// cmd/server/main.go — provider selection at startup
func initObjectStore(cfg Config) (storage.ObjectStore, error) {
	switch cfg.CloudProvider {
	case "aws":
		awsCfg, err := awsconfig.LoadDefaultConfig(context.Background(),
			awsconfig.WithRegion(cfg.CloudRegion),
		)
		if err != nil {
			return nil, fmt.Errorf("loading AWS config: %w", err)
		}
		client := s3.NewFromConfig(awsCfg)
		return storage.NewS3Store(client, cfg.StorageBucket), nil
	case "gcp":
		client, err := gcs.NewClient(context.Background())
		if err != nil {
			return nil, fmt.Errorf("creating GCS client: %w", err)
		}
		return storage.NewGCSStore(client, cfg.StorageBucket), nil
	default:
		return nil, fmt.Errorf("unsupported cloud provider: %s", cfg.CloudProvider)
	}
}
```
The Config struct is populated entirely from environment variables. The same container image deploys to both EKS and GKE — only the ConfigMap and Secrets differ between environments.
Results
The architecture has been running in production for eight months. Key operational metrics:
- ✓Cross-cloud VPN uptime: 99.97% (two brief outages caused by GCP maintenance windows, automatic failover to secondary tunnel).
- ✓Replication lag (p99): 1.8 seconds. P50 is 350ms.
- ✓Egress cost: Approximately $1,200/month for replication traffic and cross-cloud API calls.
- ✓Deployment cadence: Same CI/CD pipeline deploys to both EKS and GKE. Engineers do not interact with cloud-specific tooling during normal development.
The team at Stripe Systems continues to maintain the infrastructure, with ongoing work to evaluate replacing the VPN tunnels with a dedicated interconnect as Indian traffic grows.
Conclusion
Multi-cloud is a tool, not a goal. It solves specific problems — data residency, regulatory compliance, inherited infrastructure — and introduces specific costs. The engineering challenge is building abstractions that provide portability without burying your team in unnecessary complexity.
The practical path for most teams: start single-cloud, write clean interfaces, standardize on portable services, and adopt multi-cloud only when a concrete requirement demands it. When that requirement arrives, the patterns described here — Terraform provider abstraction, logical replication, interface adapters, centralized observability — give you a proven foundation to build on.