Monorepos consolidate multiple services, shared libraries, and frontend applications into a single repository. This brings real benefits: atomic cross-service changes, shared tooling, and simplified dependency management. But it makes CI/CD significantly harder, because a naive pipeline rebuilds and redeploys everything on every commit. This post covers how to build a CI/CD pipeline that builds only what changed, caches aggressively, and deploys safely to Kubernetes.
Why Monorepo, and the CI/CD Challenges It Creates
A monorepo works well when:
- Multiple services share libraries (e.g., a common auth module, shared protobuf definitions)
- Teams need to make atomic changes across service boundaries (e.g., updating an API contract and its consumer in one commit)
- You want unified tooling (one ESLint config, one Dockerfile template, one CI pipeline)
The CI/CD challenges:
- Build scope: Which services need rebuilding when `libs/auth/index.ts` changes? Every service that imports it.
- Test scope: Which integration tests should run? Only those covering affected services.
- Pipeline time: Without affected detection, a 12-service monorepo runs all 12 builds on every commit.
- Caching: Docker layer caches are invalidated differently per service. npm/pip caches are shared but can conflict.
- Deployment ordering: If Service A depends on Service B's new API, deploy B first.
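That last challenge, deployment ordering, falls out of the same dependency graph: a depth-first walk emits each service only after its dependencies. A minimal bash sketch (the service names and edges here are illustrative, not the full repo graph):

```bash
#!/bin/bash
# Sketch: topologically order services so that a service's dependencies
# deploy before the service itself. Edges below are illustrative.
set -euo pipefail

# "consumer depends-on providers"
declare -A DEPENDS_ON
DEPENDS_ON[order-api]="user-api payment-api"
DEPENDS_ON[gateway]="user-api order-api"
DEPENDS_ON[user-api]=""
DEPENDS_ON[payment-api]=""

VISITED=""
ORDER=()

visit() {
  local svc="$1"
  # Skip nodes we have already processed
  case " $VISITED " in *" $svc "*) return ;; esac
  VISITED="$VISITED $svc"
  for dep in ${DEPENDS_ON[$svc]}; do
    visit "$dep"
  done
  # Post-order append: all dependencies are already in ORDER
  ORDER+=("$svc")
}

for svc in "${!DEPENDS_ON[@]}"; do
  visit "$svc"
done

echo "${ORDER[@]}"
```

coreutils' `tsort` does the same job; the point is that the deploy order is derivable from the graph, not hand-maintained.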
Repository Structure
```
monorepo/
├── services/
│   ├── payment-api/
│   │   ├── src/
│   │   ├── tests/
│   │   ├── Dockerfile
│   │   ├── package.json
│   │   └── helm/
│   │       ├── Chart.yaml
│   │       ├── values.yaml
│   │       ├── values-dev.yaml
│   │       ├── values-staging.yaml
│   │       └── values-prod.yaml
│   ├── user-api/
│   ├── notification-service/
│   ├── order-api/
│   ├── inventory-api/
│   ├── search-service/
│   ├── analytics-worker/
│   └── gateway/
├── libs/
│   ├── auth/                  # Shared authentication library
│   │   ├── src/
│   │   ├── package.json
│   │   └── tsconfig.json
│   ├── database/              # Shared database utilities
│   └── logging/               # Structured logging library
├── frontend/
│   ├── src/
│   ├── Dockerfile
│   └── package.json
├── .github/
│   └── workflows/
│       ├── ci.yaml
│       └── deploy.yaml
├── scripts/
│   └── affected.sh
├── package.json               # Root workspace config
└── turbo.json                 # Turborepo configuration
```
Affected Detection
The core of monorepo CI/CD is determining which services changed. This requires understanding the dependency graph.
Dependency Map
Define which services depend on which libraries:
```json
{
  "payment-api": ["libs/auth", "libs/database", "libs/logging"],
  "user-api": ["libs/auth", "libs/database", "libs/logging"],
  "notification-service": ["libs/logging"],
  "order-api": ["libs/auth", "libs/database", "libs/logging"],
  "inventory-api": ["libs/database", "libs/logging"],
  "search-service": ["libs/logging"],
  "analytics-worker": ["libs/database", "libs/logging"],
  "gateway": ["libs/auth", "libs/logging"],
  "frontend": []
}
```
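One caveat: this map records direct dependencies only. If shared libraries ever import each other (say, a hypothetical `libs/auth` depending on `libs/logging`), the map should first be expanded to its transitive closure, or a lib-on-lib change will silently miss some services. A bash sketch of that expansion (the lib-on-lib edge is hypothetical):

```bash
#!/bin/bash
# Sketch: expand a direct-dependency map to its transitive closure via BFS.
# The lib-on-lib edge below (auth -> logging) is hypothetical.
set -euo pipefail

declare -A DIRECT
DIRECT[libs/auth]="libs/logging"   # hypothetical lib-on-lib dependency
DIRECT[payment-api]="libs/auth libs/database"
DIRECT[libs/logging]=""
DIRECT[libs/database]=""

closure() {  # print every transitive dependency of $1
  local -a queue=("$1")
  local -a seen=()
  while [ "${#queue[@]}" -gt 0 ]; do
    local node="${queue[0]}"
    queue=("${queue[@]:1}")
    for dep in ${DIRECT[$node]:-}; do
      case " ${seen[*]-} " in *" $dep "*) continue ;; esac
      seen+=("$dep")
      queue+=("$dep")
    done
  done
  echo "${seen[@]-}"
}

closure payment-api
# -> libs/auth libs/database libs/logging
```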
Affected Detection Script
```bash
#!/bin/bash
# scripts/affected.sh
# Determines which services are affected by changes between two refs.
set -euo pipefail

BASE_REF=${1:-"origin/main"}
HEAD_REF=${2:-"HEAD"}

# List of changed files (three dots = diff against the merge base)
CHANGED_FILES=$(git diff --name-only "$BASE_REF"..."$HEAD_REF")

# If the CI config or this script itself changed, rebuild everything
if echo "$CHANGED_FILES" | grep -q "^\.github/workflows/\|^scripts/"; then
  echo '["payment-api","user-api","notification-service","order-api","inventory-api","search-service","analytics-worker","gateway","frontend"]'
  exit 0
fi

# Dependency map (service -> space-separated dependency paths)
declare -A DEPS
DEPS[payment-api]="libs/auth libs/database libs/logging"
DEPS[user-api]="libs/auth libs/database libs/logging"
DEPS[notification-service]="libs/logging"
DEPS[order-api]="libs/auth libs/database libs/logging"
DEPS[inventory-api]="libs/database libs/logging"
DEPS[search-service]="libs/logging"
DEPS[analytics-worker]="libs/database libs/logging"
DEPS[gateway]="libs/auth libs/logging"
DEPS[frontend]=""

AFFECTED=()
for SERVICE in "${!DEPS[@]}"; do
  SERVICE_AFFECTED=false

  # Files in the service's own directory changed
  if echo "$CHANGED_FILES" | grep -q "^services/${SERVICE}/"; then
    SERVICE_AFFECTED=true
  fi

  # The frontend lives at the repo root, not under services/
  if [[ "$SERVICE" == "frontend" ]] && echo "$CHANGED_FILES" | grep -q "^frontend/"; then
    SERVICE_AFFECTED=true
  fi

  # A library the service depends on changed
  for DEP in ${DEPS[$SERVICE]}; do
    if echo "$CHANGED_FILES" | grep -q "^${DEP}/"; then
      SERVICE_AFFECTED=true
      break
    fi
  done

  if $SERVICE_AFFECTED; then
    AFFECTED+=("$SERVICE")
  fi
done

# Output a JSON array for the GitHub Actions matrix.
# Guard the empty case: printf over an empty array would yield [""] instead of [].
if [ ${#AFFECTED[@]} -eq 0 ]; then
  echo '[]'
else
  printf '%s\n' "${AFFECTED[@]}" | jq -R . | jq -s -c .
fi
```
Usage:
```bash
chmod +x scripts/affected.sh
./scripts/affected.sh origin/main HEAD
# Example output: ["payment-api","user-api","gateway"]
```
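The three-dot syntax in the script (`git diff "$BASE_REF"..."$HEAD_REF"`) is deliberate: it diffs against the merge base rather than between branch tips, so unrelated commits that landed on main after the branch point are not reported as changes. A throwaway-repo demo of the difference (all paths and commits are synthetic):

```bash
#!/bin/bash
# Demo: three-dot diff compares HEAD against the merge base with main,
# so unrelated changes merged to main are not flagged as "affected".
set -euo pipefail

REPO=$(mktemp -d)
cd "$REPO"
git init -q -b main
git config user.email ci@example.com && git config user.name ci

echo base > base.txt
git add . && git commit -qm "base"

git checkout -qb feature
mkdir -p services/payment-api
echo change > services/payment-api/app.txt
git add . && git commit -qm "feature change"

git checkout -q main
mkdir -p services/user-api
echo other > services/user-api/app.txt
git add . && git commit -qm "unrelated main change"

git checkout -q feature
# Three dots: only the feature branch's own change
git diff --name-only main...feature
# Two dots would also drag in the unrelated services/user-api change
```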
GitHub Actions Workflow
CI Pipeline
```yaml
# .github/workflows/ci.yaml
name: CI Pipeline

on:
  pull_request:
    branches: [main]
  push:
    branches: [main]

permissions:
  contents: read
  packages: write
  pull-requests: write

env:
  REGISTRY: ghcr.io
  IMAGE_PREFIX: ghcr.io/${{ github.repository_owner }}

jobs:
  detect-changes:
    runs-on: ubuntu-latest
    outputs:
      affected: ${{ steps.affected.outputs.services }}
      has_changes: ${{ steps.affected.outputs.has_changes }}
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # Full history for diff
      - name: Detect affected services
        id: affected
        run: |
          if [ "${{ github.event_name }}" = "pull_request" ]; then
            AFFECTED=$(./scripts/affected.sh origin/${{ github.base_ref }} HEAD)
          else
            AFFECTED=$(./scripts/affected.sh HEAD~1 HEAD)
          fi
          echo "services=${AFFECTED}" >> $GITHUB_OUTPUT
          if [ "$AFFECTED" = "[]" ]; then
            echo "has_changes=false" >> $GITHUB_OUTPUT
          else
            echo "has_changes=true" >> $GITHUB_OUTPUT
          fi
          echo "Affected services: ${AFFECTED}"

  lint-and-typecheck:
    needs: detect-changes
    if: needs.detect-changes.outputs.has_changes == 'true'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - run: npm ci
      - name: Lint affected services
        run: npx turbo lint --filter='...[${{ github.event.pull_request.base.sha || 'HEAD~1' }}]'
      - name: Type check affected services
        run: npx turbo typecheck --filter='...[${{ github.event.pull_request.base.sha || 'HEAD~1' }}]'

  test:
    needs: detect-changes
    if: needs.detect-changes.outputs.has_changes == 'true'
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        service: ${{ fromJson(needs.detect-changes.outputs.affected) }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - run: npm ci
      - name: Run unit tests for ${{ matrix.service }}
        run: |
          if [ "${{ matrix.service }}" = "frontend" ]; then
            cd frontend && npm test -- --coverage
          else
            cd services/${{ matrix.service }} && npm test -- --coverage
          fi
      - name: Upload coverage
        uses: actions/upload-artifact@v4
        with:
          name: coverage-${{ matrix.service }}
          path: |
            services/${{ matrix.service }}/coverage/
            frontend/coverage/
          retention-days: 7

  build-and-push:
    needs: [detect-changes, test, lint-and-typecheck]
    if: github.ref == 'refs/heads/main' && needs.detect-changes.outputs.has_changes == 'true'
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        service: ${{ fromJson(needs.detect-changes.outputs.affected) }}
    steps:
      - uses: actions/checkout@v4
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Log in to GitHub Container Registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Generate image metadata
        id: meta
        run: |
          SHA_SHORT=$(echo "${{ github.sha }}" | cut -c1-7)
          echo "sha_tag=${SHA_SHORT}" >> $GITHUB_OUTPUT
          echo "full_image=${{ env.IMAGE_PREFIX }}/${{ matrix.service }}" >> $GITHUB_OUTPUT
      - name: Build and push
        uses: docker/build-push-action@v5
        with:
          context: .
          file: ${{ matrix.service == 'frontend' && 'frontend/Dockerfile' || format('services/{0}/Dockerfile', matrix.service) }}
          push: true
          tags: |
            ${{ steps.meta.outputs.full_image }}:${{ steps.meta.outputs.sha_tag }}
            ${{ steps.meta.outputs.full_image }}:latest
          cache-from: type=gha,scope=${{ matrix.service }}
          cache-to: type=gha,scope=${{ matrix.service }},mode=max
          build-args: |
            SERVICE_NAME=${{ matrix.service }}

  security-scan:
    needs: [detect-changes, build-and-push]
    if: github.ref == 'refs/heads/main' && needs.detect-changes.outputs.has_changes == 'true'
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        service: ${{ fromJson(needs.detect-changes.outputs.affected) }}
    steps:
      - name: Run Trivy vulnerability scan
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: ${{ env.IMAGE_PREFIX }}/${{ matrix.service }}:latest
          format: 'sarif'
          output: 'trivy-results.sarif'
          severity: 'CRITICAL,HIGH'
          exit-code: '1'
      - name: Upload Trivy scan results
        uses: github/codeql-action/upload-sarif@v3
        if: always()
        with:
          sarif_file: 'trivy-results.sarif'

  generate-sbom:
    needs: [detect-changes, build-and-push]
    if: github.ref == 'refs/heads/main' && needs.detect-changes.outputs.has_changes == 'true'
    runs-on: ubuntu-latest
    strategy:
      matrix:
        service: ${{ fromJson(needs.detect-changes.outputs.affected) }}
    steps:
      - name: Generate SBOM with Syft
        uses: anchore/sbom-action@v0
        with:
          image: ${{ env.IMAGE_PREFIX }}/${{ matrix.service }}:latest
          format: spdx-json
          output-file: sbom-${{ matrix.service }}.spdx.json
      - name: Upload SBOM
        uses: actions/upload-artifact@v4
        with:
          name: sbom-${{ matrix.service }}
          path: sbom-${{ matrix.service }}.spdx.json
```
Key Design Decisions
- `fetch-depth: 0`: Required for git diffs between branches. Without full history, affected detection fails.
- `fail-fast: false`: If payment-api tests fail, user-api tests should still run; each service is independent.
- GHA cache for Docker: `cache-from: type=gha` uses the GitHub Actions cache for Docker layers. Each service gets its own cache scope, preventing cross-service cache pollution.
- Trivy with `exit-code: '1'`: The pipeline fails on critical/high CVEs. This is a hard gate: images with known critical vulnerabilities do not reach production.
Multi-Stage Dockerfiles
A well-structured Dockerfile minimizes image size and build time:
```dockerfile
# services/payment-api/Dockerfile

# Stage 1: Install dependencies (cached aggressively)
FROM node:20-alpine AS deps
WORKDIR /app
COPY package.json package-lock.json ./
COPY services/payment-api/package.json ./services/payment-api/
COPY libs/auth/package.json ./libs/auth/
COPY libs/database/package.json ./libs/database/
COPY libs/logging/package.json ./libs/logging/
RUN npm ci --workspace=services/payment-api --include-workspace-root

# Stage 2: Build (depends on source code changes)
FROM deps AS builder
COPY tsconfig.json ./
COPY libs/ ./libs/
COPY services/payment-api/ ./services/payment-api/
RUN npm run build --workspace=services/payment-api

# Stage 3: Production image (minimal runtime)
FROM gcr.io/distroless/nodejs20-debian12 AS runtime
WORKDIR /app
COPY --from=builder /app/services/payment-api/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/services/payment-api/node_modules ./services/payment-api/node_modules
EXPOSE 3000
USER nonroot:nonroot
CMD ["dist/index.js"]
```
Layer ordering matters: `package.json` files are copied before source code. When only source code changes, the `npm ci` layer stays cached; the dependency layer (often 200MB+) is rebuilt only when a `package.json` or the lockfile changes.
Distroless runtime: the final image contains only the Node.js runtime and the application. No shell, no package manager, no debugging tools. This dramatically reduces the attack surface; Trivy scans typically show zero critical CVEs for distroless images.
Docker Image Tagging Strategy
```shell
# Tags generated per build:

# 1. Git SHA (short): immutable, traceable to the exact commit
ghcr.io/org/payment-api:abc123f

# 2. latest: mutable, points to the most recent build
ghcr.io/org/payment-api:latest

# 3. Semantic version (for releases): set via git tag
ghcr.io/org/payment-api:v2.3.1
```
Never deploy `latest` to production. Use the Git SHA tag for deterministic deployments. The `latest` tag is useful for development environments that should always run the newest build.
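Deriving the SHA tag is deliberately trivial, which is what keeps deployments traceable: any running image maps back to exactly one commit. A helper mirroring the workflow's metadata step (the function names are ours):

```bash
#!/bin/bash
# Derive the immutable image reference from a full commit SHA,
# mirroring the workflow's `cut -c1-7` metadata step.
sha_tag() {
  echo "${1:0:7}"   # first 7 hex characters of the commit SHA
}

image_ref() {  # image_ref <service> <full-sha>
  echo "ghcr.io/org/$1:$(sha_tag "$2")"
}

image_ref payment-api "abc123f0123456789abcdef0123456789abcdef0"
# -> ghcr.io/org/payment-api:abc123f
```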
Integration Testing in a Monorepo
Unit tests run per-service, but integration tests often span service boundaries. The challenge: when user-api changes, should integration tests for order-api (which calls user-api) also run?
Reverse Dependency Graph
The affected detection script identifies forward dependencies (which services use a changed library). Integration tests require the reverse: which services depend on the changed service.
```bash
# Reverse dependency map: if service X changes, also test these consumers
declare -A REVERSE_DEPS
REVERSE_DEPS[user-api]="order-api gateway"
REVERSE_DEPS[payment-api]="order-api"
REVERSE_DEPS[inventory-api]="order-api search-service"
```
When user-api changes, run integration tests for order-api and gateway in addition to user-api's own tests. This catches breaking changes in service contracts before they reach production.
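Putting the two maps together, the integration-test set is the affected services plus their registered consumers, de-duplicated. A sketch of that expansion (the `test_set` helper is ours):

```bash
#!/bin/bash
# Sketch: expand an affected-service list with its registered consumers,
# de-duplicating, so contract changes are tested from both sides.
set -euo pipefail

declare -A REVERSE_DEPS
REVERSE_DEPS[user-api]="order-api gateway"
REVERSE_DEPS[payment-api]="order-api"
REVERSE_DEPS[inventory-api]="order-api search-service"

test_set() {  # usage: test_set <affected services...>
  local out="" svc t
  for svc in "$@"; do
    for t in $svc ${REVERSE_DEPS[$svc]:-}; do
      # Append only if not already present
      case " $out " in *" $t "*) ;; *) out="$out $t" ;; esac
    done
  done
  echo "${out# }"
}

test_set user-api
# -> user-api order-api gateway
```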
Test Execution Strategy
```yaml
integration-test:
  needs: [detect-changes, test]
  if: github.ref == 'refs/heads/main'
  runs-on: ubuntu-latest
  services:
    postgres:
      image: postgres:16-alpine
      env:
        POSTGRES_DB: test
        POSTGRES_PASSWORD: test
      ports:
        - 5432:5432
    redis:
      image: redis:7-alpine
      ports:
        - 6379:6379
  steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-node@v4
      with:
        node-version: '20'
        cache: 'npm'
    - run: npm ci
    - name: Run integration tests
      env:
        DATABASE_URL: postgres://postgres:test@localhost:5432/test
        REDIS_URL: redis://localhost:6379
      run: |
        AFFECTED='${{ needs.detect-changes.outputs.affected }}'
        for SERVICE in $(echo "$AFFECTED" | jq -r '.[]'); do
          echo "Running integration tests for ${SERVICE}"
          npm run test:integration --workspace="services/${SERVICE}" || exit 1
        done
```
GitHub Actions service containers spin up Postgres and Redis for the integration test job. Each service's integration tests run against these shared dependencies, validating database queries, cache interactions, and inter-service API calls.
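Service containers take a few seconds to start accepting connections, so test steps commonly gate on a TCP readiness probe first. A dependency-free helper using bash's `/dev/tcp` (the function name is ours):

```bash
#!/bin/bash
# Block until a TCP port accepts connections, or fail after a timeout.
# Uses bash's /dev/tcp, so no extra tools are needed on the runner.
wait_for_port() {
  local host="$1" port="$2" timeout="${3:-30}"
  local deadline=$((SECONDS + timeout))
  while [ "$SECONDS" -lt "$deadline" ]; do
    # The redirection runs in a subshell, so the fd closes on its own
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  echo "timed out waiting for ${host}:${port}" >&2
  return 1
}

# Example: wait_for_port localhost 5432 60 before running migrations/tests
```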
Helm Charts Per Service
Each service has its own Helm chart with per-environment values:
```yaml
# services/payment-api/helm/values.yaml (defaults)
replicaCount: 1

image:
  repository: ghcr.io/org/payment-api
  tag: "latest"
  pullPolicy: IfNotPresent

resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 512Mi

service:
  type: ClusterIP
  port: 3000

env:
  NODE_ENV: production
  LOG_LEVEL: info

ingress:
  enabled: false
```
```yaml
# services/payment-api/helm/values-prod.yaml
replicaCount: 3

image:
  tag: ""  # Set by the CI/CD pipeline

resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: "1"
    memory: "1Gi"

ingress:
  enabled: true
  className: nginx
  hosts:
    - host: payment-api.example.com
      paths:
        - path: /
          pathType: Prefix

autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70
```
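With this layering, the pipeline only needs to inject the image tag at deploy time; everything else lives in the checked-in values files. A manual deploy of the same chart would look roughly like this (release name, namespace, and tag are illustrative):

```shell
# Later -f files override earlier ones; --set wins over both
helm upgrade --install payment-api services/payment-api/helm \
  --namespace payment \
  -f services/payment-api/helm/values.yaml \
  -f services/payment-api/helm/values-prod.yaml \
  --set image.tag=abc123f
```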
Deployment with ArgoCD Image Updater
Rather than committing image tags back to Git (which creates noise), use ArgoCD Image Updater to watch the container registry and update deployments automatically:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payment-api
  namespace: argocd
  annotations:
    argocd-image-updater.argoproj.io/image-list: payment=ghcr.io/org/payment-api
    argocd-image-updater.argoproj.io/payment.update-strategy: newest-build
    argocd-image-updater.argoproj.io/payment.allow-tags: regexp:^[a-f0-9]{7}$
    # Tell Image Updater which Helm values hold the image name and tag
    argocd-image-updater.argoproj.io/payment.helm.image-name: image.repository
    argocd-image-updater.argoproj.io/payment.helm.image-tag: image.tag
    # "argocd" write-back updates the Application in-cluster, with no Git commit
    argocd-image-updater.argoproj.io/write-back-method: argocd
spec:
  source:
    repoURL: https://github.com/org/monorepo.git
    path: services/payment-api/helm
    helm:
      valueFiles:
        - values.yaml
        - values-prod.yaml
  destination:
    server: https://kubernetes.default.svc
    namespace: payment
```
The `allow-tags` regex ensures only Git SHA tags (exactly seven hex characters) are considered; `latest` and arbitrary strings are rejected.
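The same check is easy to reproduce locally when debugging why Image Updater skipped a tag:

```bash
#!/bin/bash
# Reproduce the allow-tags check locally: a deployable tag is exactly
# seven lowercase hex characters (a short Git SHA).
is_deployable_tag() {
  [[ "$1" =~ ^[a-f0-9]{7}$ ]]
}

for tag in abc123f latest v2.3.1 abc123f0; do
  if is_deployable_tag "$tag"; then
    echo "$tag: deployable"
  else
    echo "$tag: skipped"
  fi
done
```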
Pipeline Performance Optimization
Self-Hosted Runners with Spot Instances
GitHub-hosted runners are convenient but slow for Docker builds (no persistent cache, cold Docker daemon). Self-hosted runners on spot instances provide:
- Persistent Docker layer cache on instance storage
- Pre-pulled base images
- Faster network paths to private container registries
```yaml
# Runner deployment on Kubernetes using actions-runner-controller
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: monorepo-runners
spec:
  replicas: 4
  template:
    spec:
      repository: org/monorepo
      labels:
        - self-hosted
        - linux
        - monorepo
      dockerEnabled: true
      resources:
        limits:
          cpu: "4"
          memory: "8Gi"
      volumeMounts:
        - name: docker-cache
          mountPath: /var/lib/docker
      volumes:
        - name: docker-cache
          hostPath:
            path: /mnt/docker-cache
            type: DirectoryOrCreate
```
Caching Strategy Summary
| Cache Type | Mechanism | Scope | Invalidation |
|---|---|---|---|
| npm dependencies | actions/cache | Per-lockfile hash | When package-lock.json changes |
| TypeScript compilation | Turborepo remote cache | Per-file hash | When source files change |
| Docker layers | type=gha BuildKit cache | Per-service | When Dockerfile or COPY sources change |
| Base images | Self-hosted runner cache | Persistent | Manual pull of new versions |
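The npm row keys the cache on a hash of the lockfile, which is what `hashFiles('package-lock.json')` computes for `actions/cache`. The same idea in plain shell (the key format is ours):

```bash
#!/bin/bash
# Content-addressed cache key, analogous to actions/cache keyed on
# hashFiles('package-lock.json'): same lockfile bytes, same key.
set -euo pipefail

cache_key() {
  echo "npm-$(sha256sum "$1" | cut -c1-16)"
}

LOCK=$(mktemp)
printf '{ "lockfileVersion": 3 }\n' > "$LOCK"
cache_key "$LOCK"
```

Because the key is derived purely from file contents, any edit to the lockfile invalidates the cache, and an unchanged lockfile hits it, with no staleness heuristics involved.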
Case Study: 8-Service Monorepo Transformation
A monorepo containing 8 Node.js microservices, 3 shared libraries (auth, database, logging), and a React frontend was running CI/CD on GitHub Actions with a single workflow that built everything on every commit.
Before
- Pipeline structure: A single job ran sequential builds of all 9 artifacts
- Average pipeline time: 45 minutes
- Docker builds: No layer caching, a `node:18` base image (950MB), and `npm install` shipping devDependencies into the runtime image
- Testing: All tests ran on every commit, including slow integration tests
- Security scanning: None
- Deployment: Manual `kubectl apply` by the lead developer
Changes Implemented by Stripe Systems
1. Affected detection: The bash script above was implemented, with the dependency map maintained alongside package.json workspace configuration. Average commits affected 1-2 services.
2. Matrix builds: Affected services build in parallel. With 4 runners, 4 services build simultaneously.
3. Docker optimization:
- Multi-stage builds (deps, then build, then runtime)
- Distroless base images (950MB down to 89MB per image)
- BuildKit GHA caching (rebuilds only changed layers)
- Layer ordering (`package.json` before source code)
4. Test segmentation:
- Unit tests: Run on affected services only (matrix strategy)
- Integration tests: Run only when the service or its dependencies change
- E2E tests: Run on `main` branch pushes only (not on PRs, for speed)
5. Security gates: Trivy scan on every image, SBOM generation with Syft, secret scanning with gitleaks.
6. ArgoCD deployment: Image Updater watches GHCR and deploys new images automatically. No manual intervention.
Pipeline Timing Results
| Scenario | Before | After | Reduction |
|---|---|---|---|
| Single service change | 45 min | 6 min | 87% |
| Shared library change (all affected) | 45 min | 18 min | 60% |
| Frontend-only change | 45 min | 8 min | 82% |
| CI config change (full rebuild) | 45 min | 22 min | 51% |
| Weighted average | 45 min | 8 min | 82% |
GitHub Actions Workflow (Simplified Final Version)
The final CI workflow:
```yaml
name: CI

on:
  pull_request:
    branches: [main]
  push:
    branches: [main]

jobs:
  detect:
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.affected.outputs.services }}
      any: ${{ steps.affected.outputs.has_changes }}
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - id: affected
        run: |
          # PRs diff against the base branch; pushes to main diff against
          # the previous commit (main...HEAD would be empty on main itself)
          if [ "${{ github.event_name }}" = "pull_request" ]; then
            SERVICES=$(./scripts/affected.sh origin/${{ github.base_ref }} HEAD)
          else
            SERVICES=$(./scripts/affected.sh HEAD~1 HEAD)
          fi
          echo "services=${SERVICES}" >> $GITHUB_OUTPUT
          if [ "$SERVICES" != "[]" ]; then
            echo "has_changes=true" >> $GITHUB_OUTPUT
          else
            echo "has_changes=false" >> $GITHUB_OUTPUT
          fi

  ci:
    needs: detect
    if: needs.detect.outputs.any == 'true'
    runs-on: self-hosted
    strategy:
      fail-fast: false
      matrix:
        service: ${{ fromJson(needs.detect.outputs.matrix) }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - run: npm ci
      - run: npm test --workspace=${{ matrix.service == 'frontend' && 'frontend' || format('services/{0}', matrix.service) }}
      - uses: docker/setup-buildx-action@v3
      - id: tag
        run: echo "sha=$(echo '${{ github.sha }}' | cut -c1-7)" >> $GITHUB_OUTPUT
      - uses: docker/build-push-action@v5
        if: github.ref == 'refs/heads/main'
        with:
          context: .
          file: ${{ matrix.service == 'frontend' && 'frontend/Dockerfile' || format('services/{0}/Dockerfile', matrix.service) }}
          push: true
          # Short SHA tag matches the 7-hex allow-tags regex Image Updater watches
          tags: ghcr.io/org/${{ matrix.service }}:${{ steps.tag.outputs.sha }}
          cache-from: type=gha,scope=${{ matrix.service }}
          cache-to: type=gha,scope=${{ matrix.service }},mode=max
```
The Helm values structure allowed environment-specific overrides without duplicating entire chart configurations. Combined with ArgoCD Image Updater, the pipeline achieved continuous deployment: a merge to main triggered the build, the image was pushed, ArgoCD detected the new tag, and the deployment rolled out, all without human intervention.
The total effort to implement these changes was approximately 3 weeks, including migration of all Dockerfiles, CI workflows, and Helm charts. The ongoing maintenance burden is low because the affected detection script and dependency map are the only monorepo-specific components; everything else uses standard tooling.