Why Zero-Trust: Moving Beyond the Perimeter
Traditional network security operates on a simple assumption: traffic inside the firewall is trusted, traffic outside is not. This model fails in cloud environments for three reasons. First, there is no meaningful perimeter — workloads run across regions, projects, and managed services that share physical infrastructure with other tenants. Second, lateral movement after a single compromised credential can reach every resource on the internal network. Third, VPN-based access grants network-level trust to entire subnets rather than scoping access to individual applications.
Zero-trust inverts this model. Every request — whether it originates from an employee's laptop, a CI/CD pipeline, or a Kubernetes pod — must present a verified identity, satisfy context-aware access policies, and be authorized for the specific resource it is trying to reach. The network itself grants no implicit trust. In practical terms, this means:
- **Identity is the perimeter.** Access decisions are based on who (or what) is making the request, not where the request comes from.
- **Least-privilege by default.** Every principal gets the minimum permissions required, scoped to specific resources.
- **Continuous verification.** Session context (device posture, location, time) is re-evaluated throughout the session, not just checked at login.
- **Assume breach.** Design controls so that a compromised component cannot escalate to full environment access.
On Google Cloud Platform, this translates to a concrete set of services and configurations. The rest of this post walks through each one.
GCP's Zero-Trust Building Blocks
GCP provides several services that, when combined, implement a defense-in-depth zero-trust architecture:
| Layer | Service | Function |
|---|---|---|
| API-level data exfiltration prevention | VPC Service Controls | Service perimeters that restrict which projects and networks can access sensitive APIs |
| Application-level access | Identity-Aware Proxy (IAP) | Context-aware authentication/authorization for web apps and SSH without VPN |
| Workload identity | Workload Identity Federation | Federated tokens from external IdPs, eliminating long-lived service account keys |
| Network-level isolation | Private Google Access, VPC firewalls | No public IPs on VMs; API access over internal routes |
| Supply chain integrity | Binary Authorization | Admission control ensuring only signed, verified container images run in GKE |
| Edge protection | Cloud Armor | WAF rules, DDoS mitigation, and rate limiting at the global load balancer |
| Governance | Organization Policy constraints | Hard guardrails on resource locations, service usage, and sharing |
| Observability | Cloud Audit Logs, VPC Flow Logs, SCC | Complete audit trail of admin and data access events |
None of these services alone constitutes zero-trust. The architecture emerges from layering them together.
VPC Service Controls: Preventing Data Exfiltration at the API Layer
VPC Service Controls (VPC-SC) create a security boundary around Google Cloud services. Even if an attacker obtains valid IAM credentials, they cannot exfiltrate data to a project outside the perimeter. This is the single most important control for sensitive data on GCP, and it operates at a layer that IAM alone cannot cover.
A service perimeter defines which projects are "inside" and which Google API services are protected. Any API call that crosses the perimeter boundary — for example, copying a BigQuery table to an external project — is denied.
Creating a Perimeter with gcloud
# Create an access policy (one per organization)
gcloud access-context-manager policies create \
--organization=123456789012 \
--title="org-zero-trust-policy"
# Define an access level based on corporate IP ranges and device policy
gcloud access-context-manager levels create corp-trusted-access \
--policy=POLICY_ID \
--title="Corporate Trusted Access" \
--basic-level-spec=access-level-spec.yaml
# Create a service perimeter protecting BigQuery and Cloud Storage
gcloud access-context-manager perimeters create healthcare-data-perimeter \
--policy=POLICY_ID \
--title="Healthcare Data Perimeter" \
--resources="projects/12345,projects/67890" \
--restricted-services="bigquery.googleapis.com,storage.googleapis.com" \
--access-levels="accessPolicies/POLICY_ID/accessLevels/corp-trusted-access"
The access-level-spec.yaml defines which conditions allow perimeter traversal:
# access-level-spec.yaml
- ipSubnetworks:
    - "203.0.113.0/24"  # Corporate office IP range
    - "198.51.100.0/24" # VPN egress range
  devicePolicy:
    requireScreenlock: true
    osConstraints:
      - osType: DESKTOP_CHROME_OS
        minimumVersion: "13816.0.0"
      - osType: DESKTOP_MAC
      - osType: DESKTOP_WINDOWS
    allowedEncryptionStatuses:
      - ENCRYPTED
Terraform Configuration for VPC Service Controls
resource "google_access_context_manager_service_perimeter" "healthcare_perimeter" {
parent = "accessPolicies/${google_access_context_manager_access_policy.org_policy.name}"
name = "accessPolicies/${google_access_context_manager_access_policy.org_policy.name}/servicePerimeters/healthcare_data"
title = "Healthcare Data Perimeter"
status {
resources = [
"projects/${data.google_project.data_project.number}",
"projects/${data.google_project.analytics_project.number}",
]
restricted_services = [
"bigquery.googleapis.com",
"storage.googleapis.com",
"healthcare.googleapis.com",
]
access_levels = [
google_access_context_manager_access_level.corp_trusted.name,
]
# Allow CI/CD pipeline to write to storage from outside perimeter
ingress_policies {
ingress_from {
identity_type = "ANY_IDENTITY"
sources {
access_level = google_access_context_manager_access_level.cicd_access.name
}
}
ingress_to {
resources = ["projects/${data.google_project.data_project.number}"]
operations {
service_name = "storage.googleapis.com"
method_selectors {
method = "google.storage.objects.create"
}
}
}
}
# Allow BigQuery export only to internal project
egress_policies {
egress_from {
identity_type = "ANY_IDENTITY"
}
egress_to {
resources = ["projects/${data.google_project.reporting_project.number}"]
operations {
service_name = "bigquery.googleapis.com"
method_selectors {
method = "google.cloud.bigquery.v2.JobService.InsertJob"
}
}
}
}
}
}
Key point: Ingress and egress policies are method-level. You do not need to allow all operations — scope them to the exact API methods your workloads require.
Identity-Aware Proxy: Context-Aware Access Without VPN
IAP places an authentication and authorization layer in front of your applications. Users authenticate via Google Identity (or a configured IdP), and IAP evaluates access policies that can include device posture, IP address, and user group membership. The application itself never needs to implement authentication — it receives verified identity headers from IAP.
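The identity headers deserve one caveat: an application behind IAP should verify the signed JWT that IAP forwards, not just trust the plain `x-goog-authenticated-user-*` headers (which could be spoofed if traffic ever bypasses the proxy). A minimal Python sketch, assuming the `google-auth` library; the project and backend-service numbers are placeholders:

```python
# Sketch: verify the signed IAP assertion inside the application.
# Assumes google-auth is installed; imports are deferred so the
# audience helper is usable on its own.

IAP_CERTS_URL = "https://www.gstatic.com/iap/verify/public_key"

def iap_audience(project_number: str, backend_service_id: str) -> str:
    """Expected JWT audience for a backend service behind IAP."""
    return f"/projects/{project_number}/global/backendServices/{backend_service_id}"

def verify_iap_jwt(iap_jwt: str, audience: str) -> str:
    """Return the authenticated email, or raise if the token is invalid.

    IAP sends the signed assertion in the x-goog-iap-jwt-assertion header;
    never trust the unsigned x-goog-authenticated-user-* headers alone.
    """
    from google.auth.transport import requests as ga_requests
    from google.oauth2 import id_token

    claims = id_token.verify_token(
        iap_jwt,
        ga_requests.Request(),
        audience=audience,
        certs_url=IAP_CERTS_URL,
    )
    return claims["email"]
```

The audience string binds the token to one specific backend service, so a JWT minted for a different IAP-protected app is rejected.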
Enabling IAP for a GKE Service
For a service running behind a GKE Ingress with a BackendConfig:
# backend-config.yaml
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: iap-backend-config
  namespace: production
spec:
  iap:
    enabled: true
    oauthclientCredentials:
      secretName: iap-oauth-secret
Create the OAuth credentials and secret:
# Create OAuth consent screen and credentials in Cloud Console first,
# then store them as a Kubernetes secret
kubectl create secret generic iap-oauth-secret \
--namespace=production \
--from-literal=client_id=CLIENT_ID.apps.googleusercontent.com \
--from-literal=client_secret=CLIENT_SECRET
# Set IAM policy to allow specific group access through IAP
gcloud iap web add-iam-policy-binding \
--resource-type=backend-services \
--service=admin-dashboard-backend \
--member="group:[email protected]" \
--role="roles/iap.httpsResourceAccessAllowed"
IAP for SSH Access (Replacing VPN for VM Access)
IAP TCP forwarding allows SSH access to VMs that have no external IP addresses:
# SSH through IAP tunnel — no VPN, no public IP needed
gcloud compute ssh my-instance \
--zone=us-central1-a \
--tunnel-through-iap
# Forward a port through IAP (e.g., for database access)
gcloud compute start-iap-tunnel my-database-vm 5432 \
--local-host-port=localhost:5432 \
--zone=us-central1-a
The corresponding firewall rule allows IAP's IP range only:
resource "google_compute_firewall" "allow_iap_ssh" {
name = "allow-iap-ssh"
network = google_compute_network.main.id
allow {
protocol = "tcp"
ports = ["22"]
}
# IAP's IP range — the only source that can reach SSH
source_ranges = ["35.235.240.0/20"]
target_tags = ["iap-ssh"]
}
Block all other SSH ingress. The VM has no public IP and the only path in is through IAP, which enforces authentication and context-aware access policies before forwarding any traffic.
Workload Identity Federation: Eliminating Service Account Keys
Service account key files are the most common credential leak vector on GCP. Workload Identity Federation replaces them entirely by allowing external identity providers (GitHub Actions OIDC, AWS IAM, Azure AD, on-prem OIDC) to exchange tokens for short-lived GCP access tokens.
The flow:
1. The external workload obtains a token from its native IdP (e.g., GitHub Actions issues an OIDC token to every workflow run).
2. The token is exchanged via GCP's Security Token Service (STS) for a federated token.
3. The federated token impersonates a GCP service account.
4. The resulting access token is short-lived (one hour by default) and cannot be exported.
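Under the hood, the STS exchange in step 2 is a single POST to `https://sts.googleapis.com/v1/token`. Libraries like `google-auth` (and the `google-github-actions/auth` action below) build this request for you; the following sketch only shows the payload shape, with placeholder pool and provider names:

```python
# Sketch: the raw STS token-exchange request that federation libraries
# construct for you. Do not hand-roll this in production — use google-auth
# or the official GitHub action.

STS_ENDPOINT = "https://sts.googleapis.com/v1/token"

def sts_exchange_payload(project_number: str, pool_id: str,
                         provider_id: str, external_oidc_jwt: str) -> dict:
    """Body that trades an external OIDC token for a GCP federated token."""
    audience = (
        f"//iam.googleapis.com/projects/{project_number}/locations/global/"
        f"workloadIdentityPools/{pool_id}/providers/{provider_id}"
    )
    return {
        "grantType": "urn:ietf:params:oauth:grant-type:token-exchange",
        "audience": audience,
        "scope": "https://www.googleapis.com/auth/cloud-platform",
        "requestedTokenType": "urn:ietf:params:oauth:token-type:access_token",
        "subjectTokenType": "urn:ietf:params:oauth:token-type:jwt",
        "subjectToken": external_oidc_jwt,
    }
```

The federated token that comes back is then used against the IAM Credentials API to impersonate the target service account (step 3).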
Terraform Configuration for GitHub Actions Federation
resource "google_iam_workload_identity_pool" "github_pool" {
workload_identity_pool_id = "github-actions-pool"
display_name = "GitHub Actions Pool"
description = "Identity pool for GitHub Actions OIDC"
}
resource "google_iam_workload_identity_pool_provider" "github_provider" {
workload_identity_pool_id = google_iam_workload_identity_pool.github_pool.workload_identity_pool_id
workload_identity_pool_provider_id = "github-oidc-provider"
display_name = "GitHub OIDC"
attribute_mapping = {
"google.subject" = "assertion.sub"
"attribute.actor" = "assertion.actor"
"attribute.repository" = "assertion.repository"
"attribute.ref" = "assertion.ref"
}
# Restrict to your organization's repositories
attribute_condition = "assertion.repository_owner == 'your-github-org'"
oidc {
issuer_uri = "https://token.actions.githubusercontent.com"
}
}
resource "google_service_account_iam_binding" "github_deploy_binding" {
service_account_id = google_service_account.deploy_sa.name
role = "roles/iam.workloadIdentityUser"
members = [
"principalSet://iam.googleapis.com/${google_iam_workload_identity_pool.github_pool.name}/attribute.repository/your-github-org/your-repo",
]
}
In the GitHub Actions workflow:
# .github/workflows/deploy.yaml
jobs:
  deploy:
    permissions:
      contents: read
      id-token: write # Required for OIDC token
    steps:
      - uses: google-github-actions/auth@v2
        with:
          workload_identity_provider: "projects/PROJECT_NUM/locations/global/workloadIdentityPools/github-actions-pool/providers/github-oidc-provider"
          service_account: "[email protected]"
      - uses: google-github-actions/setup-gcloud@v2
      - run: gcloud run deploy my-service --image=...
After this configuration, delete every service account key in the project. Run this audit regularly:
# Find all user-managed service account keys
gcloud iam service-accounts list --format="value(email)" \
--project=my-project | while read sa; do
gcloud iam service-accounts keys list \
--iam-account="$sa" \
--managed-by=user \
--format="table(name,validAfterTime,validBeforeTime)"
done
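For keys that cannot be deleted immediately, track their age. A small sketch that flags user-managed keys older than 90 days, fed by the JSON output of the command above (`--format=json` instead of the table format); the threshold is an assumption, tune it to your policy:

```python
# Sketch: flag user-managed service account keys older than 90 days.
# Input is the JSON from:
#   gcloud iam service-accounts keys list --iam-account=SA \
#       --managed-by=user --format=json
import json
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(days=90)  # assumed rotation policy

def stale_keys(keys_json: str, now=None) -> list:
    """Return names of keys whose validAfterTime is older than MAX_AGE."""
    now = now or datetime.now(timezone.utc)
    stale = []
    for key in json.loads(keys_json):
        created = datetime.fromisoformat(
            key["validAfterTime"].replace("Z", "+00:00"))
        if now - created > MAX_AGE:
            stale.append(key["name"])
    return stale
```

Wire this into a scheduled job that alerts (or auto-disables keys) until the count reaches zero.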
Private Google Access: No Public IPs, No Exceptions
Private Google Access allows VM instances without external IP addresses to reach Google APIs and services. Combined with VPC Service Controls, this ensures that data never traverses the public internet.
Configure DNS to resolve Google API endpoints to the private or restricted VIP ranges:
# DNS zone for restricted.googleapis.com
resource "google_dns_managed_zone" "restricted_apis" {
name = "restricted-googleapis"
dns_name = "googleapis.com."
visibility = "private"
private_visibility_config {
networks {
network_url = google_compute_network.main.id
}
}
}
resource "google_dns_record_set" "restricted_api_cname" {
name = "*.googleapis.com."
managed_zone = google_dns_managed_zone.restricted_apis.name
type = "CNAME"
ttl = 300
rrdatas = ["restricted.googleapis.com."]
}
resource "google_dns_record_set" "restricted_api_a" {
name = "restricted.googleapis.com."
managed_zone = google_dns_managed_zone.restricted_apis.name
type = "A"
ttl = 300
rrdatas = [
"199.36.153.4",
"199.36.153.5",
"199.36.153.6",
"199.36.153.7",
]
}
Two VIP ranges are available:
- `private.googleapis.com` (199.36.153.8/30) — reaches all Google APIs; does not enforce VPC Service Controls.
- `restricted.googleapis.com` (199.36.153.4/30) — reaches only APIs that support VPC Service Controls; enforces perimeter policies.
For a zero-trust deployment, always use restricted.googleapis.com. It ensures that even if a workload attempts to call a Google API that is not within the service perimeter, the request is blocked at the network layer.
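A quick sanity check with the standard library confirms that the four A records configured above land inside the restricted VIP range rather than the private one:

```python
# Verify the DNS A records point at the restricted VIP range (199.36.153.4/30),
# not the private range (199.36.153.8/30) that bypasses VPC-SC enforcement.
import ipaddress

RESTRICTED_VIP = ipaddress.ip_network("199.36.153.4/30")
PRIVATE_VIP = ipaddress.ip_network("199.36.153.8/30")

# The four addresses from the google_dns_record_set above:
ADDRS = ["199.36.153.4", "199.36.153.5", "199.36.153.6", "199.36.153.7"]

def in_restricted_range(addr: str) -> bool:
    return ipaddress.ip_address(addr) in RESTRICTED_VIP
```

A misconfigured zone that resolves to 199.36.153.8–11 would silently route API calls around the perimeter, which is exactly the failure this check catches.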
Enable Private Google Access on the subnet:
gcloud compute networks subnets update my-subnet \
--region=us-central1 \
--enable-private-ip-google-access
Binary Authorization: Supply Chain Integrity for Containers
Binary Authorization is an admission controller for GKE that verifies container images are signed by trusted authorities before allowing them to run. This prevents deployment of tampered images, images from untrusted registries, or images that did not pass through your CI pipeline.
Setting Up an Attestor with Cloud KMS
# Create a Cloud KMS key for signing attestations
gcloud kms keyrings create build-attestors --location=global
gcloud kms keys create ci-pipeline-signer \
--keyring=build-attestors \
--location=global \
--purpose=asymmetric-signing \
--default-algorithm=ec-sign-p256-sha256
# Create a Container Analysis note (the attestor links to this)
cat > note.json << 'EOF'
{
  "attestation": {
    "hint": {
      "humanReadableName": "CI Pipeline Attestor"
    }
  }
}
EOF
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://containeranalysis.googleapis.com/v1/projects/PROJECT_ID/notes/ci-pipeline-attestor" \
-d @note.json
# Create the attestor
gcloud container binauthz attestors create ci-pipeline-attestor \
--attestation-authority-note=ci-pipeline-attestor \
--attestation-authority-note-project=PROJECT_ID
# Add the KMS key to the attestor
gcloud container binauthz attestors public-keys add \
--attestor=ci-pipeline-attestor \
--keyversion-project=PROJECT_ID \
--keyversion-location=global \
--keyversion-keyring=build-attestors \
--keyversion-key=ci-pipeline-signer \
--keyversion=1
Binary Authorization Policy
# binauthz-policy.yaml
defaultAdmissionRule:
  evaluationMode: REQUIRE_ATTESTATION
  enforcementMode: ENFORCED_BLOCK_AND_AUDIT_LOG
  requireAttestationsBy:
    - projects/PROJECT_ID/attestors/ci-pipeline-attestor
globalPolicyEvaluationMode: ENABLE
clusterAdmissionRules:
  us-central1-a.production-cluster:
    evaluationMode: REQUIRE_ATTESTATION
    enforcementMode: ENFORCED_BLOCK_AND_AUDIT_LOG
    requireAttestationsBy:
      - projects/PROJECT_ID/attestors/ci-pipeline-attestor
Terraform for Binary Authorization
resource "google_binary_authorization_policy" "policy" {
project = var.project_id
global_policy_evaluation_mode = "ENABLE"
default_admission_rule {
evaluation_mode = "REQUIRE_ATTESTATION"
enforcement_mode = "ENFORCED_BLOCK_AND_AUDIT_LOG"
require_attestations_by = [
google_binary_authorization_attestor.ci_pipeline.id,
]
}
cluster_admission_rules {
cluster = "us-central1-a.production-cluster"
evaluation_mode = "REQUIRE_ATTESTATION"
enforcement_mode = "ENFORCED_BLOCK_AND_AUDIT_LOG"
require_attestations_by = [
google_binary_authorization_attestor.ci_pipeline.id,
]
}
}
resource "google_binary_authorization_attestor" "ci_pipeline" {
name = "ci-pipeline-attestor"
project = var.project_id
attestation_authority_note {
note_reference = google_container_analysis_note.ci_attestor_note.name
public_keys {
id = data.google_kms_crypto_key_version.signer_version.id
pkix_public_key {
public_key_pem = data.google_kms_crypto_key_version.signer_version.public_key[0].pem
signature_algorithm = "ECDSA_P256_SHA256"
}
}
}
}
Sign images in your CI pipeline after successful tests and security scans:
# In CI pipeline — sign the image after all checks pass
IMAGE_DIGEST=$(gcloud artifacts docker images describe \
us-central1-docker.pkg.dev/PROJECT_ID/repo/my-app:${GIT_SHA} \
--format="value(image_summary.digest)")
gcloud container binauthz attestations sign-and-create \
--artifact-url="us-central1-docker.pkg.dev/PROJECT_ID/repo/my-app@${IMAGE_DIGEST}" \
--attestor="ci-pipeline-attestor" \
--attestor-project=PROJECT_ID \
--keyversion-project=PROJECT_ID \
--keyversion-location=global \
--keyversion-keyring=build-attestors \
--keyversion-key=ci-pipeline-signer \
--keyversion=1
Cloud Armor: WAF and DDoS Protection at the Edge
Cloud Armor policies attach to backend services on the global HTTP(S) load balancer. They evaluate before traffic reaches your application.
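One detail the policy below relies on: Cloud Armor evaluates rules in ascending priority order (lowest number first) and applies the first matching rule's action, so the geo-block at priority 500 fires before the WAF rules at 1000–1003 and the rate limit at 2000. A toy model of that semantics, for illustration only (the predicates are stand-ins, not real CEL expressions):

```python
# Illustration only: first-match-wins evaluation in ascending priority order,
# mirroring the policy defined below (geo-block 500, WAF 1000+, default allow).

def evaluate(rules, request) -> str:
    """rules: list of (priority, predicate, action). Returns the action taken."""
    for priority, matches, action in sorted(rules, key=lambda r: r[0]):
        if matches(request):
            return action
    return "allow"  # unreachable when a default rule at 2147483647 exists

toy_policy = [
    (500, lambda r: r.get("region") in {"KP", "IR"}, "deny(403)"),
    (1000, lambda r: "' OR 1=1" in r.get("query", ""), "deny(403)"),
    (2147483647, lambda r: True, "allow"),
]
```

A request from a blocked region is denied at priority 500 even if it would also match a WAF rule; keep your most specific blocks at the lowest priority numbers.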
Terraform Configuration
resource "google_compute_security_policy" "app_waf" {
name = "app-waf-policy"
# Default rule: allow
rule {
action = "allow"
priority = 2147483647
match {
versioned_expr = "SRC_IPS_V1"
config {
src_ip_ranges = ["*"]
}
}
description = "Default allow rule"
}
# Block OWASP Top 10: SQL injection
rule {
action = "deny(403)"
priority = 1000
match {
expr {
expression = "evaluatePreconfiguredExpr('sqli-v33-stable')"
}
}
description = "Block SQL injection"
}
# Block OWASP Top 10: XSS
rule {
action = "deny(403)"
priority = 1001
match {
expr {
expression = "evaluatePreconfiguredExpr('xss-v33-stable')"
}
}
description = "Block cross-site scripting"
}
# Block OWASP Top 10: Local file inclusion
rule {
action = "deny(403)"
priority = 1002
match {
expr {
expression = "evaluatePreconfiguredExpr('lfi-v33-stable')"
}
}
description = "Block local file inclusion"
}
# Block OWASP Top 10: Remote code execution
rule {
action = "deny(403)"
priority = 1003
match {
expr {
expression = "evaluatePreconfiguredExpr('rce-v33-stable')"
}
}
description = "Block remote code execution"
}
# Rate limiting: max 100 requests per minute per IP
rule {
action = "throttle"
priority = 2000
match {
versioned_expr = "SRC_IPS_V1"
config {
src_ip_ranges = ["*"]
}
}
rate_limit_options {
rate_limit_threshold {
count = 100
interval_sec = 60
}
conform_action = "allow"
exceed_action = "deny(429)"
enforce_on_key = "IP"
}
description = "Rate limit per IP"
}
# Geo-blocking: deny traffic from embargoed regions
rule {
action = "deny(403)"
priority = 500
match {
expr {
expression = "origin.region_code == 'KP' || origin.region_code == 'IR'"
}
}
description = "Block embargoed regions"
}
}
Attach the policy to a backend service:
gcloud compute backend-services update my-backend-service \
--security-policy=app-waf-policy \
--global
Organization Policy Constraints
Organization policies enforce guardrails that IAM cannot. They restrict what resources can be created, regardless of a principal's IAM permissions.
# Restrict resource creation to approved regions only
gcloud resource-manager org-policies set-policy \
--organization=123456789012 policy.yaml
# policy.yaml — restrict to US and EU regions
# (a list policy takes either allowedValues or deniedValues, not both;
#  the allowlist implicitly denies every other location, including Asia)
constraint: constraints/gcp.resourceLocations
listPolicy:
  allowedValues:
    - in:us-locations
    - in:eu-locations
Other critical constraints:
# Disable service account key creation entirely
gcloud resource-manager org-policies enable-enforce \
constraints/iam.disableServiceAccountKeyCreation \
--organization=123456789012
# Require OS Login for all compute instances
gcloud resource-manager org-policies enable-enforce \
constraints/compute.requireOsLogin \
--organization=123456789012
# Disable serial port access
gcloud resource-manager org-policies enable-enforce \
constraints/compute.disableSerialPortAccess \
--organization=123456789012
# Restrict VM external IPs (deny all)
gcloud resource-manager org-policies set-policy \
--organization=123456789012 deny-external-ip.yaml
# deny-external-ip.yaml
constraint: constraints/compute.vmExternalIpAccess
listPolicy:
  allValues: DENY
These constraints form the non-negotiable foundation. Even an Organization Admin cannot create resources that violate them without first modifying the policy — and that modification itself generates an audit log entry.
Audit Logging: Visibility Into Everything
Zero-trust requires comprehensive logging. On GCP, this means enabling Data Access audit logs (which are off by default for most services) and aggregating them for analysis.
Enable data access logs for all services:
# Enable data access audit logs for all services (the allServices entry
# covers everything; explicit entries shown for BigQuery and Cloud Storage)
gcloud projects set-iam-policy my-project <(
  gcloud projects get-iam-policy my-project --format=json | \
  python3 -c "
import json, sys
policy = json.load(sys.stdin)
policy.setdefault('auditConfigs', [])
for svc in ['bigquery.googleapis.com', 'storage.googleapis.com', 'allServices']:
    policy['auditConfigs'].append({
        'service': svc,
        'auditLogConfigs': [
            {'logType': 'ADMIN_READ'},
            {'logType': 'DATA_READ'},
            {'logType': 'DATA_WRITE'},
        ],
    })
json.dump(policy, sys.stdout)
")
Query audit logs for suspicious activity:
# Find all BigQuery data access events in the last 24 hours
gcloud logging read '
logName="projects/my-project/logs/cloudaudit.googleapis.com%2Fdata_access"
AND resource.type="bigquery_resource"
AND timestamp>="2025-10-07T00:00:00Z"
' --format="table(timestamp,protoPayload.authenticationInfo.principalEmail,protoPayload.methodName,protoPayload.resourceName)" \
--limit=50
# Find VPC Service Controls violations
gcloud logging read '
logName="projects/my-project/logs/cloudaudit.googleapis.com%2Fpolicy"
AND protoPayload.metadata.@type="type.googleapis.com/google.cloud.audit.VpcServiceControlAuditMetadata"
' --format=json --limit=20
# Find IAM policy changes
gcloud logging read '
logName="projects/my-project/logs/cloudaudit.googleapis.com%2Factivity"
AND protoPayload.methodName="SetIamPolicy"
' --format="table(timestamp,protoPayload.authenticationInfo.principalEmail,protoPayload.resourceName)" \
--limit=25
Enable VPC Flow Logs on every subnet to capture network-level traffic metadata:
resource "google_compute_subnetwork" "main" {
name = "main-subnet"
ip_cidr_range = "10.0.0.0/20"
region = "us-central1"
network = google_compute_network.main.id
log_config {
aggregation_interval = "INTERVAL_5_SEC"
flow_sampling = 1.0
metadata = "INCLUDE_ALL_METADATA"
filter_expr = "true"
}
}
Terraform Module Structure for Reproducible Zero-Trust
Rather than configuring each service ad hoc, structure your Terraform as composable modules:
modules/
├── zero-trust-foundation/
│ ├── main.tf # Org policies, audit log sinks
│ ├── variables.tf
│ └── outputs.tf
├── vpc-service-controls/
│ ├── main.tf # Perimeters, access levels, ingress/egress
│ ├── variables.tf
│ └── outputs.tf
├── iap/
│ ├── main.tf # IAP configuration, firewall rules, OAuth
│ ├── variables.tf
│ └── outputs.tf
├── workload-identity/
│ ├── main.tf # Identity pools, providers, SA bindings
│ ├── variables.tf
│ └── outputs.tf
├── binary-authorization/
│ ├── main.tf # Attestors, policies, KMS keys
│ ├── variables.tf
│ └── outputs.tf
├── cloud-armor/
│ ├── main.tf # Security policies, WAF rules
│ ├── variables.tf
│ └── outputs.tf
└── private-networking/
├── main.tf # VPC, subnets, DNS zones, NAT
├── variables.tf
└── outputs.tf
The root module composes them:
module "foundation" {
source = "./modules/zero-trust-foundation"
organization_id = var.organization_id
allowed_regions = ["us-locations", "eu-locations"]
}
module "networking" {
source = "./modules/private-networking"
project_id = var.project_id
region = var.region
}
module "vpc_sc" {
source = "./modules/vpc-service-controls"
access_policy_id = module.foundation.access_policy_id
protected_projects = [var.data_project_number]
restricted_services = [
"bigquery.googleapis.com",
"storage.googleapis.com",
"healthcare.googleapis.com",
]
}
module "iap" {
source = "./modules/iap"
project_id = var.project_id
network_id = module.networking.network_id
oauth_brand = var.oauth_brand
}
module "binary_auth" {
source = "./modules/binary-authorization"
project_id = var.project_id
attestor_kms_key = var.attestor_kms_key
gke_cluster_zones = var.gke_cluster_zones
}
module "cloud_armor" {
source = "./modules/cloud-armor"
rate_limit_per_ip = 100
blocked_regions = ["KP", "IR"]
enable_owasp_rules = true
}
Every module should output the resource IDs and names that downstream modules need. Version-pin your module sources in production.
Case Study: Healthcare SaaS Platform — Passing a HITRUST Audit
The Problem
A healthcare SaaS platform running on GCP needed to achieve HITRUST CSF certification. The platform processed Protected Health Information (PHI) in BigQuery and Cloud Storage, served an admin dashboard to internal staff, and ran containerized microservices on GKE. The initial audit readiness assessment flagged multiple control gaps:
- **Access Control (01.c, 01.v):** Overly broad IAM roles; several engineers had `roles/bigquery.admin` at the project level. No context-aware access controls.
- **Audit Logging (09.aa, 09.ad):** Data access audit logs were not enabled. No mechanism to detect data exfiltration attempts.
- **Network Security (09.m):** The admin dashboard was accessible via a public IP with IP-based allowlisting. VPN was the only access control for SSH.
- **System Integrity (10.h):** No controls on what container images could run in production; any image from any registry could be deployed.
The Implementation
Stripe Systems implemented a layered zero-trust architecture over a 10-week engagement. The following sections detail the exact configurations deployed.
VPC Service Controls Around PHI Data
The most critical control: a service perimeter around BigQuery datasets and Cloud Storage buckets containing PHI.
resource "google_access_context_manager_service_perimeter" "phi_perimeter" {
parent = "accessPolicies/${var.access_policy_id}"
name = "accessPolicies/${var.access_policy_id}/servicePerimeters/phi_data_perimeter"
title = "PHI Data Perimeter"
status {
resources = [
"projects/${var.phi_data_project_number}",
]
restricted_services = [
"bigquery.googleapis.com",
"storage.googleapis.com",
]
access_levels = [
"accessPolicies/${var.access_policy_id}/accessLevels/admin_corp_access",
]
vpc_accessible_services {
enable_restriction = true
allowed_services = [
"bigquery.googleapis.com",
"storage.googleapis.com",
"logging.googleapis.com",
]
}
ingress_policies {
ingress_from {
sources {
resource = "projects/${var.gke_project_number}"
}
identity_type = "ANY_IDENTITY"
}
ingress_to {
resources = ["*"]
operations {
service_name = "bigquery.googleapis.com"
method_selectors {
method = "google.cloud.bigquery.v2.JobService.InsertJob"
}
method_selectors {
method = "google.cloud.bigquery.v2.JobService.GetQueryResults"
}
}
}
}
}
}
This configuration means: even if an engineer with roles/bigquery.admin attempts to export a table to a personal project, the request is denied at the perimeter. The vpc_accessible_services block ensures that only explicitly listed services are reachable from within the perimeter's VPC network.
IAP for Admin Dashboard Access
The admin dashboard previously sat behind a public IP with an IP allowlist. Stripe Systems replaced this with IAP, eliminating the need for both the public IP and the corporate VPN.
# Enable IAP on the admin backend service
gcloud iap web enable \
--resource-type=backend-services \
--service=admin-dashboard-backend
# Grant access to the admin group only
gcloud iap web add-iam-policy-binding \
--resource-type=backend-services \
--service=admin-dashboard-backend \
--member="group:[email protected]" \
--role="roles/iap.httpsResourceAccessAllowed"
# Set up IAP access level requiring managed device
gcloud access-context-manager levels create admin-device-policy \
--policy=POLICY_ID \
--title="Admin Device Policy" \
--basic-level-spec=admin-access-level.yaml
# admin-access-level.yaml
- devicePolicy:
    requireScreenlock: true
    requireAdminApproval: true
    allowedEncryptionStatuses:
      - ENCRYPTED
    allowedDeviceManagementLevels:
      - COMPLETE
Terraform for the IAP-protected backend:
resource "google_iap_web_backend_service_iam_member" "admin_access" {
project = var.project_id
web_backend_service = google_compute_backend_service.admin_dashboard.name
role = "roles/iap.httpsResourceAccessAllowed"
member = "group:[email protected]"
condition {
title = "require-managed-device"
expression = "'accessPolicies/${var.access_policy_id}/accessLevels/admin-device-policy' in request.auth.access_levels"
}
}
Binary Authorization in GKE
Only images built by the CI pipeline, from the approved Artifact Registry, and signed after passing security scans, are allowed to run:
# binary-authorization-policy.yaml
defaultAdmissionRule:
  evaluationMode: REQUIRE_ATTESTATION
  enforcementMode: ENFORCED_BLOCK_AND_AUDIT_LOG
  requireAttestationsBy:
    - projects/healthcare-saas-prod/attestors/ci-pipeline-attestor
    - projects/healthcare-saas-prod/attestors/security-scan-attestor
globalPolicyEvaluationMode: ENABLE
clusterAdmissionRules:
  us-central1-a.phi-processing-cluster:
    evaluationMode: REQUIRE_ATTESTATION
    enforcementMode: ENFORCED_BLOCK_AND_AUDIT_LOG
    requireAttestationsBy:
      - projects/healthcare-saas-prod/attestors/ci-pipeline-attestor
      - projects/healthcare-saas-prod/attestors/security-scan-attestor
Two attestors are required: one from the CI build step (proving the image was built from the approved repository) and one from the security scanning step (proving the image passed vulnerability scanning). An image must have both attestations to be admitted.
resource "google_binary_authorization_policy" "phi_cluster_policy" {
project = var.project_id
global_policy_evaluation_mode = "ENABLE"
default_admission_rule {
evaluation_mode = "REQUIRE_ATTESTATION"
enforcement_mode = "ENFORCED_BLOCK_AND_AUDIT_LOG"
require_attestations_by = [
google_binary_authorization_attestor.ci_pipeline.id,
google_binary_authorization_attestor.security_scan.id,
]
}
cluster_admission_rules {
cluster = "${var.region}-a.phi-processing-cluster"
evaluation_mode = "REQUIRE_ATTESTATION"
enforcement_mode = "ENFORCED_BLOCK_AND_AUDIT_LOG"
require_attestations_by = [
google_binary_authorization_attestor.ci_pipeline.id,
google_binary_authorization_attestor.security_scan.id,
]
}
}
Network Architecture (Zero-Trust Boundaries)
The resulting architecture has three distinct trust boundaries:
┌─────────────────────────────────────────────────────────────────┐
│ GCP Organization │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Org Policies: No SA keys, No external IPs, US/EU only │ │
│ │ │ │
│ │ ┌─────────────────────────────────────────────────────┐ │ │
│ │ │ VPC Service Controls Perimeter (PHI Data) │ │ │
│ │ │ │ │ │
│ │ │ ┌──────────────┐ ┌────────────────────────────┐ │ │ │
│ │ │ │ BigQuery │ │ Cloud Storage (PHI) │ │ │ │
│ │ │ │ (PHI data) │ │ CMEK-encrypted buckets │ │ │ │
│ │ │ └──────────────┘ └────────────────────────────┘ │ │ │
│ │ │ ▲ ingress only from GKE project │ │ │
│ │ └───────┼─────────────────────────────────────────────┘ │ │
│ │ │ │ │
│ │ ┌───────┼──────────────────────────────────────────┐ │ │
│ │ │ GKE Project (Private Cluster) │ │ │
│ │ │ ┌────┴──────┐ ┌──────────────┐ │ │ │
│ │ │ │ App Pods │ │ Admin Dash │◄── IAP ◄── Users│ │ │
│ │ │ │ (Binary │ │ (IAP- │ (Identity + │ │ │
│ │ │ │ Auth'd) │ │ protected) │ Device check)│ │ │
│ │ │ └───────────┘ └──────────────┘ │ │ │
│ │ │ │ │ │ │
│ │ │ ▼ restricted.googleapis.com (private) │ │ │
│ │ │ Cloud Armor WAF ◄── Internet traffic │ │ │
│ │ └──────────────────────────────────────────────────┘ │ │
│ └───────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
- **Outer boundary:** Organization policies enforce hard constraints (no service account keys, no external IPs, restricted regions).
- **Middle boundary:** VPC Service Controls prevent data from leaving the PHI project, regardless of IAM permissions.
- **Inner boundary:** IAP authenticates every user request to the admin dashboard; Binary Authorization validates every container before admission; Cloud Armor filters malicious traffic at the edge.
Audit Results: Before and After
| HITRUST Control | Before | After |
|---|---|---|
| 01.c — Privilege Management | Broad project-level IAM roles; no context-aware access | Least-privilege roles; IAP with device posture and group-based access |
| 01.v — Information Access Restriction | IP allowlists only; no API-level exfiltration controls | VPC Service Controls perimeter blocks cross-project data movement |
| 09.aa — Audit Logging | Admin activity logs only (default); no data access logs | Full data access logging on all services; log sinks to locked CMEK-encrypted bucket |
| 09.ab — Monitoring System Use | Manual log review | Automated alerts on VPC-SC violations, IAM changes, and Binary Auth denials |
| 09.ad — Administrator and Operator Logs | No separation of admin audit trail | Cloud Audit Logs with immutable admin activity records; 400-day retention |
| 09.m — Network Controls | Flat VPC; public IPs on admin servers; VPN for SSH | Private cluster; no public IPs; IAP TCP forwarding for SSH; VPC Flow Logs |
| 10.h — Control of Operational Software | Any Docker image could be deployed | Binary Authorization with dual attestor (CI build + security scan) |
The HITRUST audit was completed with zero critical findings related to GCP infrastructure controls. Three informational findings were logged (documentation completeness items) and resolved within the remediation period.
Key Takeaways
Zero-trust on GCP is not a product you purchase — it is an architecture you build by layering identity-based access, API-level perimeters, supply chain verification, and comprehensive logging. The concrete steps:
1. Start with VPC Service Controls around your most sensitive data. This is the highest-impact control and the hardest to retrofit later.
2. Enable IAP for all internal tools and remove VPN dependencies for application access.
3. Eliminate service account keys with Workload Identity Federation, and set the `iam.disableServiceAccountKeyCreation` org policy.
4. Enforce Binary Authorization in GKE before deploying to production.
5. Enable data access audit logs on all services. The cost is marginal compared to the visibility gained.
6. Deploy Cloud Armor with OWASP rules on every external-facing load balancer.
7. Set organization policies as hard guardrails that individual project owners cannot override.
Each layer compensates for failures in other layers. That is the point of defense in depth: no single control needs to be perfect because the attacker must bypass all of them.