DevOps · January 23, 2026 · 13 min read

Docker Image Hardening for Production — Distroless, Non-Root Users, and Layer Optimization

Stripe Systems Engineering

A default Docker image built from node:18 or python:3.11 ships with hundreds of packages you do not need in production — compilers, package managers, shells, debug utilities. Each unnecessary package is a potential CVE. This post covers the specific techniques for reducing attack surface, shrinking image size, and enforcing runtime security constraints.

Why Image Hardening Matters

Three concerns drive image hardening:

Attack surface: A container image with 400 installed packages has 400 packages worth of potential vulnerabilities. The node:18 image (Debian Bookworm-based) ships with apt, curl, wget, gcc, make, perl, and hundreds of libraries. An attacker who gains code execution inside the container has a full toolkit available.

CVE exposure: Every package in your image is scanned by vulnerability databases. More packages mean more CVE matches. Most of these CVEs are in packages your application never uses — but they still appear in compliance reports and trigger alerts.

Compliance: SOC 2, PCI DSS, and HIPAA require demonstrating that production systems minimize unnecessary software. An auditor looking at a 1.2GB image containing a C compiler will ask why.

Base Image Selection

Comparison

Base Image	Size	Package Manager	Shell	Packages	Use Case
`node:20`	~950MB	apt	bash	~400	Development only
`node:20-slim`	~200MB	apt	bash	~100	When you need apt
`node:20-alpine`	~130MB	apk	sh	~30	General production
`gcr.io/distroless/nodejs20`	~130MB	None	None	~10	Hardened production
`cgr.dev/chainguard/node`	~90MB	None (apk in -dev)	None	~5	Hardened production
`scratch`	0MB	None	None	0	Static binaries (Go, Rust)

Alpine

Alpine uses musl libc instead of glibc. This matters for:

✓ Node.js native modules: Packages with native bindings (e.g., bcrypt, sharp) may need to be compiled against musl. Use npm rebuild in the build stage.
✓ DNS resolution: musl's DNS resolver behaves differently from glibc. It does not support search directives in /etc/resolv.conf the same way. In Kubernetes, this can cause service discovery issues unless ndots is configured correctly in the pod spec.
✓ Performance: musl's malloc implementation is simpler than glibc's. For memory-intensive workloads, benchmark before committing.

FROM node:20-alpine AS builder
RUN apk add --no-cache python3 make g++  # For native modules
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
# Native modules are compiled against musl here
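To make the ndots point concrete, the sketch below models the lookup order described in resolv.conf(5): a name with fewer dots than ndots is tried against each search domain before it is tried as-is. The function name and the search list are illustrative assumptions, and the model covers only the documented glibc-style ordering; musl may issue or skip queries differently.

```python
from __future__ import annotations


def candidate_queries(name: str, search: list[str], ndots: int) -> list[str]:
    """Return the order in which a resolv.conf(5)-style resolver tries names."""
    if name.endswith("."):
        # A trailing dot marks the name as fully qualified: no search expansion.
        return [name.rstrip(".")]
    expanded = [f"{name}.{domain}" for domain in search]
    if name.count(".") >= ndots:
        # Enough dots: try the name as-is first, then the search domains.
        return [name, *expanded]
    # Too few dots: walk the search list first, then try the name as-is.
    return [*expanded, name]


# Hypothetical search list, similar to what Kubernetes writes for the "default" namespace.
SEARCH = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]

# With ndots=5, an external name with two dots goes through every search domain first.
for query in candidate_queries("api.example.com", SEARCH, ndots=5):
    print(query)
```

With the Kubernetes default of ndots:5, every external hostname costs several failed lookups before the real one, which is where resolver differences tend to surface.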

Distroless

Google's distroless images contain only the language runtime and its dependencies. No package manager, no shell, no ls, no cat. You cannot docker exec -it container sh into a distroless container — there is no shell.

What is included in gcr.io/distroless/nodejs20-debian12:

✓ Node.js 20 binary
✓ Required shared libraries (libc, libstdc++, etc.)
✓ CA certificates
✓ /etc/passwd with a nonroot user

What is NOT included:

✓ Shell (bash, sh)
✓ Package manager (apt, apk)
✓ Coreutils (ls, cat, cp, mv)
✓ curl, wget, netcat
✓ Compilers, interpreters

Chainguard Images

Chainguard provides hardened base images rebuilt nightly with the latest package versions. They claim zero known CVEs at build time.

# Chainguard Node.js image
FROM cgr.dev/chainguard/node:latest
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
CMD ["dist/index.js"]

Chainguard images are slightly smaller than Google distroless and are updated more frequently. The tradeoff: they are a third-party dependency with a commercial model (free tier is limited).

Scratch

For statically compiled binaries (Go with CGO_ENABLED=0, Rust):

FROM golang:1.22-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -ldflags="-s -w" -o /server ./cmd/server

FROM scratch
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=builder /server /server
USER 65534:65534
ENTRYPOINT ["/server"]

The resulting image contains only the server binary and the CA certificate bundle. Image size is typically 5-20MB.

Multi-Stage Builds

The key principle: build dependencies should never appear in the production image.

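# Stage 1: Install ALL dependencies (including devDependencies for build tools)
# Note: native modules compiled here link against musl. If any dependency ships
# native bindings, build on a glibc image (e.g. node:20-slim) so the binaries
# match the Debian-based runtime stage.
FROM node:20-alpine AS deps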
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci

# Stage 2: Build the application
FROM deps AS builder
COPY tsconfig.json ./
COPY src/ ./src/
RUN npm run build
# Prune devDependencies for the runtime stage
RUN npm prune --omit=dev

# Stage 3: Production runtime
FROM gcr.io/distroless/nodejs20-debian12 AS runtime
WORKDIR /app
# Copy only production artifacts
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/package.json ./

EXPOSE 3000
USER nonroot:nonroot
CMD ["dist/index.js"]

What stays in the builder (not in production):

✓ TypeScript compiler (typescript package)
✓ Build tools (webpack, esbuild, swc)
✓ Type definition packages (@types/*)
✓ Test frameworks (jest, vitest)
✓ Linters (eslint, prettier)
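One way to check that the items above really stayed behind is to compare devDependencies from package.json with what is installed in the node_modules tree that gets copied into the runtime stage. This is a minimal sketch; the function names are hypothetical and it only looks at top-level packages.

```python
from __future__ import annotations

from pathlib import Path


def installed_packages(node_modules: Path) -> set[str]:
    """Collect top-level package names, including scoped ones like @types/node."""
    names: set[str] = set()
    for entry in node_modules.iterdir():
        if entry.name.startswith(".") or not entry.is_dir():
            continue  # skip .bin, .package-lock.json and stray files
        if entry.name.startswith("@"):
            names.update(f"{entry.name}/{child.name}" for child in entry.iterdir())
        else:
            names.add(entry.name)
    return names


def leaked_dev_dependencies(manifest: dict, installed: set[str]) -> list[str]:
    """Return devDependencies from package.json that are still installed."""
    dev = manifest.get("devDependencies", {})
    return sorted(name for name in dev if name in installed)


# Inline example data (hypothetical): typescript survived pruning, @types/node did not.
manifest = {
    "dependencies": {"express": "^4.18.0"},
    "devDependencies": {"typescript": "^5.4.0", "@types/node": "^20.0.0"},
}
print(leaked_dev_dependencies(manifest, {"express", "typescript"}))
```

A hit is not always a leak: a devDependency that is also a transitive production dependency legitimately stays installed, so treat the output as a list to review rather than a hard failure.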

Non-Root Users

By default, Docker containers run as root (UID 0). If an attacker exploits a vulnerability in your application, they have root access inside the container. With certain misconfigurations (privileged mode, host PID namespace), this can escalate to root on the host.

Creating and Using a Non-Root User

# For Alpine-based images
FROM node:20-alpine
RUN addgroup -g 1001 -S appgroup && \
    adduser -u 1001 -S appuser -G appgroup
WORKDIR /app
COPY --chown=appuser:appgroup . .
USER appuser:appgroup
CMD ["node", "dist/index.js"]

# For Debian-based images
FROM node:20-slim
RUN groupadd -g 1001 appgroup && \
    useradd -u 1001 -g appgroup -m -s /bin/false appuser
WORKDIR /app
COPY --chown=appuser:appgroup . .
USER appuser:appgroup
CMD ["node", "dist/index.js"]

Common File Permission Issues

Problem: Application writes to /app/logs or /app/uploads at runtime, but these directories are owned by root.

# Create directories with correct ownership before switching user
RUN mkdir -p /app/logs /app/data && \
    chown -R appuser:appgroup /app/logs /app/data
USER appuser:appgroup

Problem: npm packages install global binaries to /usr/local/bin, which requires root.

Solution: Do not install global packages in the runtime image. Everything should be a local dependency in node_modules/.bin.

Problem: Application binds to port 80 or 443, which requires root or the NET_BIND_SERVICE capability.

Solution: Bind to a high port (3000, 8080) and use a Kubernetes Service or ingress controller for port mapping. There is no reason to run on privileged ports inside a container.

Distroless Already Provides Non-Root

Distroless images include a nonroot user (UID 65532):

FROM gcr.io/distroless/nodejs20-debian12
USER nonroot:nonroot
# That's it — the user already exists in the image
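A cheap guard against regressions is a static check that the final build stage sets a non-root USER, as the Dockerfiles above do. The sketch below is an assumption-laden simplification: the function names are hypothetical, it treats a missing USER as root, and it ignores the case where a stage is built FROM an earlier stage and inherits that stage's user.

```python
from __future__ import annotations


def final_stage_user(dockerfile: str) -> str | None:
    """Return the last USER set in the final build stage, or None if unset."""
    user = None
    for raw in dockerfile.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        keyword = parts[0].upper()
        if keyword == "FROM":
            # New stage: forget the USER of the previous stage (simplification).
            user = None
        elif keyword == "USER" and len(parts) > 1:
            user = parts[1].strip()
    return user


def runs_as_root(dockerfile: str) -> bool:
    """True when the final stage sets no USER or names root / UID 0."""
    user = final_stage_user(dockerfile)
    if user is None:
        return True  # assumption: the base image defaults to root
    return user.split(":", 1)[0] in ("root", "0")


UNHARDENED = 'FROM node:18\nWORKDIR /app\nCMD ["node", "src/index.js"]\n'
HARDENED = "FROM gcr.io/distroless/nodejs20-debian12\nUSER nonroot:nonroot\n"

print(runs_as_root(UNHARDENED), runs_as_root(HARDENED))
```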

Layer Optimization

COPY Order Matters

Docker caches layers. When a layer's input changes, that layer and all subsequent layers are rebuilt. Order your instructions from least-frequently-changing to most-frequently-changing:

# GOOD: Dependencies change less often than source code
COPY package.json package-lock.json ./
RUN npm ci
COPY src/ ./src/
RUN npm run build

# BAD: Any source code change invalidates the npm install cache
COPY . .
RUN npm ci
RUN npm run build
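The effect of instruction order can be sketched with a toy model of the layer cache, in which each layer's key is a hash of its parent key, the instruction text, and the files it copies. This is a simplification for illustration, not Docker's or BuildKit's actual key computation, and all names and file contents are hypothetical.

```python
from __future__ import annotations

import hashlib


def layer_keys(steps, files):
    """Chain a cache key per step from parent key + instruction + input contents."""
    keys = []
    parent = ""
    for instruction, inputs in steps:
        digest = hashlib.sha256()
        digest.update(parent.encode())
        digest.update(instruction.encode())
        for path in sorted(inputs):
            digest.update(path.encode())
            digest.update(files[path].encode())
        parent = digest.hexdigest()
        keys.append(parent)
    return keys


def rebuilt_steps(steps, before, after):
    """Return the instructions whose cache key changes between two builds."""
    old, new = layer_keys(steps, before), layer_keys(steps, after)
    return [instr for (instr, _), a, b in zip(steps, old, new) if a != b]


MANIFESTS = ["package.json", "package-lock.json"]
GOOD = [
    ("COPY package.json package-lock.json ./", MANIFESTS),
    ("RUN npm ci", []),
    ("COPY src/ ./src/", ["src/index.ts"]),
    ("RUN npm run build", []),
]
BAD = [
    ("COPY . .", MANIFESTS + ["src/index.ts"]),
    ("RUN npm ci", []),
    ("RUN npm run build", []),
]

BEFORE = {"package.json": "{}", "package-lock.json": "{}", "src/index.ts": "v1"}
AFTER = {**BEFORE, "src/index.ts": "v2"}  # only source code changed

print(rebuilt_steps(GOOD, BEFORE, AFTER))
print(rebuilt_steps(BAD, BEFORE, AFTER))
```

In the good ordering only the source copy and the build step are invalidated; in the bad ordering the npm ci layer is rebuilt as well, because its parent key changed.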

Combine RUN Statements

Each RUN instruction creates a layer. Combining related commands reduces layer count and keeps files that are created and then deleted (such as package caches) out of the image entirely:

# BAD: 3 layers. The apt cache from layer 1 persists in the image even
# though it's deleted in layer 3.
RUN apt-get update
RUN apt-get install -y curl
RUN rm -rf /var/lib/apt/lists/*

# GOOD: 1 layer. The apt cache is created and deleted in the same layer.
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl && \
    rm -rf /var/lib/apt/lists/*
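The size difference between the two variants follows from how layers add up: an image is the sum of its layers, and deleting a file in a later layer does not shrink the earlier one. A rough model, with hypothetical sizes in MB and invented function names:

```python
def layer_size(operations):
    """Size of one layer: files added in it, minus files removed in the same layer."""
    present = {}
    for action, path, size_mb in operations:
        if action == "add":
            present[path] = size_mb
        elif action == "rm":
            # Only files created in this same layer disappear from it. Deleting a
            # file from a lower layer records a whiteout and frees nothing.
            present.pop(path, None)
    return sum(present.values())


def image_size(layers):
    """Total image size is the sum of all layer sizes."""
    return sum(layer_size(operations) for operations in layers)


# Hypothetical sizes: 40 MB of apt package lists, 5 MB for curl.
SEPARATE_RUNS = [
    [("add", "/var/lib/apt/lists", 40)],
    [("add", "/usr/bin/curl", 5)],
    [("rm", "/var/lib/apt/lists", 0)],
]
COMBINED_RUN = [
    [
        ("add", "/var/lib/apt/lists", 40),
        ("add", "/usr/bin/curl", 5),
        ("rm", "/var/lib/apt/lists", 0),
    ],
]

print(image_size(SEPARATE_RUNS), image_size(COMBINED_RUN))
```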

.dockerignore

Prevent unnecessary files from entering the build context:

# .dockerignore
node_modules
.git
.github
*.md
docs/
tests/
coverage/
.env
.env.*
dist/
*.log
playwright-report/
test-results/
.vscode
.idea

Without .dockerignore, the entire directory (including node_modules, .git, and test artifacts) is sent to the Docker daemon as build context. For a typical Node.js project, this can be 500MB+.
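To estimate how much a .dockerignore saves, the sketch below sums file sizes in the build context after applying ignore patterns. It assumes a simplified matcher: patterns are rooted at the context root and compared segment by segment, and Docker's `**` wildcards and `!` exceptions are not handled. File names and sizes are made up.

```python
from __future__ import annotations

from fnmatch import fnmatchcase


def is_ignored(path: str, patterns: list[str]) -> bool:
    """True if a pattern matches the path or one of its parent directories."""
    parts = path.split("/")
    for pattern in patterns:
        pattern_parts = pattern.strip("/").split("/")
        if len(pattern_parts) <= len(parts) and all(
            fnmatchcase(part, pat) for part, pat in zip(parts, pattern_parts)
        ):
            return True
    return False


def context_size(files: dict[str, int], patterns: list[str]) -> int:
    """Total bytes sent to the daemon after applying the ignore patterns."""
    return sum(size for path, size in files.items() if not is_ignored(path, patterns))


# Hypothetical project tree: path -> size in bytes.
FILES = {
    "package.json": 1_000,
    "src/index.ts": 2_000,
    "README.md": 5_000,
    "node_modules/left-pad/index.js": 300_000_000,
    ".git/objects/pack-1.pack": 400_000_000,
}
PATTERNS = ["node_modules", ".git", "*.md"]

print(context_size(FILES, []), context_size(FILES, PATTERNS))
```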

Dependency Management

Pin Versions

# BAD: What version of curl is this? Will it change on next build?
RUN apk add curl

# GOOD: Pinned version. Reproducible builds.
RUN apk add --no-cache curl=8.5.0-r0

For Node.js dependencies, package-lock.json (used with npm ci) already ensures deterministic installs. For system packages, pin to specific versions.
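Pinning is easy to forget, so a small lint over the Dockerfile can help. The following sketch flags packages passed to `apk add` without an `=version` suffix. It handles only simple cases (commands chained with `&&`, backslash continuations) and the function name and flag list are assumptions for illustration.

```python
from __future__ import annotations

import shlex

# apk flags that consume the next token as their argument (not a package name).
FLAGS_WITH_ARGUMENT = {"-t", "--virtual", "-X", "--repository"}


def unpinned_apk_packages(dockerfile: str) -> list[str]:
    """Return packages passed to `apk add` without an `=version` pin."""
    # Join backslash-continued lines so each RUN instruction is one logical line.
    text = dockerfile.replace("\\\n", " ")
    unpinned = []
    for line in text.splitlines():
        line = line.strip()
        if not line.upper().startswith("RUN "):
            continue
        for command in line[4:].split("&&"):
            tokens = shlex.split(command, comments=True)
            if tokens[:2] != ["apk", "add"]:
                continue
            skip_next = False
            for token in tokens[2:]:
                if skip_next:
                    skip_next = False
                elif token in FLAGS_WITH_ARGUMENT:
                    skip_next = True
                elif not token.startswith("-") and "=" not in token:
                    unpinned.append(token)
    return unpinned


print(unpinned_apk_packages("RUN apk add curl\n"))
print(unpinned_apk_packages("RUN apk add --no-cache curl=8.5.0-r0\n"))
```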

Scanning with Trivy

# Scan an image
trivy image ghcr.io/org/payment-api:latest

# Scan with severity filter
trivy image --severity CRITICAL,HIGH ghcr.io/org/payment-api:latest

# Output as JSON for CI processing
trivy image --format json --output results.json ghcr.io/org/payment-api:latest

# Scan a Dockerfile (pre-build)
trivy config Dockerfile

Example Trivy output:

ghcr.io/org/payment-api:latest (debian 12.4)

Total: 0 (CRITICAL: 0, HIGH: 0)

Node.js (node_modules/package-lock.json)

Total: 2 (CRITICAL: 0, HIGH: 0, MEDIUM: 2)

┌──────────────┬───────────────┬──────────┬─────────┬──────────────────┐
│   Library    │ Vulnerability │ Severity │ Version │  Fixed Version   │
├──────────────┼───────────────┼──────────┼─────────┼──────────────────┤
│ semver       │ CVE-2022-xxxx │ MEDIUM   │ 7.3.7   │ 7.5.2            │
│ json5        │ CVE-2022-xxxx │ MEDIUM   │ 1.0.1   │ 1.0.2            │
└──────────────┴───────────────┴──────────┴─────────┴──────────────────┘

SBOM Generation with Syft

A Software Bill of Materials (SBOM) lists every component in your image:

# Generate SBOM in SPDX format
syft ghcr.io/org/payment-api:latest -o spdx-json > sbom.spdx.json

# Generate SBOM in CycloneDX format
syft ghcr.io/org/payment-api:latest -o cyclonedx-json > sbom.cdx.json

SBOMs enable downstream consumers to audit your dependencies without access to your source code. Some government contracts and enterprise procurement processes now require SBOMs.
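Once an SBOM exists, answering "do we ship package X, and in which version?" is a lookup. A minimal sketch for the SPDX JSON output, assuming the standard `packages` array with `name` and `versionInfo` fields; the function name and the sample document are hypothetical.

```python
from __future__ import annotations


def find_package(sbom: dict, name: str) -> list[str]:
    """Return every version of `name` listed in an SPDX JSON document."""
    return sorted(
        package.get("versionInfo", "unknown")
        for package in sbom.get("packages", [])
        if package.get("name") == name
    )


# Trimmed, made-up SPDX document. In practice, load sbom.spdx.json with json.load().
SBOM = {
    "spdxVersion": "SPDX-2.3",
    "packages": [
        {"name": "semver", "versionInfo": "7.3.7"},
        {"name": "semver", "versionInfo": "6.3.0"},
        {"name": "json5", "versionInfo": "1.0.1"},
    ],
}

print(find_package(SBOM, "semver"))
```

The same package can appear more than once because npm installs multiple versions side by side, which is why the function returns a list.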

Secrets in Docker Builds

The Wrong Way

# NEVER do this — the secret is baked into an image layer
COPY .env /app/.env
ENV DATABASE_URL=postgres://user:password@host/db
ARG NPM_TOKEN=abc123
RUN echo "//registry.npmjs.org/:_authToken=${NPM_TOKEN}" > .npmrc

Even if you delete the file in a later layer, it still exists in the earlier layer and can be recovered by unpacking the image layers (for example from docker save output). ENV values and ARG defaults are also visible in docker history and docker inspect.
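The patterns in the example above are regular enough to lint for. The sketch below flags ENV/ARG names that look like credentials, URLs with embedded credentials, and COPY/ADD of .env files. The regexes and the function name are illustrative assumptions, not an exhaustive secret scanner.

```python
from __future__ import annotations

import re

SECRET_NAME = re.compile(r"TOKEN|SECRET|PASSWORD|PASSWD|API_?KEY", re.IGNORECASE)
URL_WITH_CREDENTIALS = re.compile(r"://[^/\s:@]+:[^/\s@]+@")


def secret_smells(dockerfile: str) -> list[tuple[int, str]]:
    """Return (line number, reason) for instructions that likely bake in a secret."""
    findings = []
    for number, raw in enumerate(dockerfile.splitlines(), start=1):
        tokens = raw.split()
        if not tokens or tokens[0].startswith("#"):
            continue
        keyword, args = tokens[0].upper(), tokens[1:]
        if keyword in ("ENV", "ARG") and args:
            name = args[0].split("=", 1)[0]
            if SECRET_NAME.search(name):
                findings.append((number, f"{keyword} {name} looks like a credential"))
            elif URL_WITH_CREDENTIALS.search(" ".join(args)):
                findings.append((number, f"{keyword} {name} embeds credentials in a URL"))
        elif keyword in ("COPY", "ADD"):
            # Everything except flags and the final destination is a source path.
            for source in (arg for arg in args[:-1] if not arg.startswith("--")):
                base = source.rsplit("/", 1)[-1]
                if base == ".env" or base.startswith(".env."):
                    findings.append((number, f"{keyword} bakes {source} into a layer"))
    return findings


WRONG = """\
COPY .env /app/.env
ENV DATABASE_URL=postgres://user:password@host/db
ARG NPM_TOKEN=abc123
"""

for line_number, reason in secret_smells(WRONG):
    print(line_number, reason)
```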

BuildKit Secret Mounts

# syntax=docker/dockerfile:1
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
# Mount the secret at build time — it is NOT stored in any layer
RUN --mount=type=secret,id=npmrc,target=/app/.npmrc \
    npm ci

# Build with the secret
docker build --secret id=npmrc,src=$HOME/.npmrc -t payment-api .

The secret is mounted into the build container's filesystem during that specific RUN instruction. It is never written to a layer.

In CI

# GitHub Actions
- name: Build with secrets
  uses: docker/build-push-action@v5
  with:
    context: .
    push: true
    tags: ghcr.io/org/payment-api:latest
    secrets: |
      npmrc=${{ secrets.NPM_RC }}

Image Signing with Cosign

Image signing proves that an image was built by your CI system and has not been tampered with.

Keyless Signing with Sigstore

# Install cosign
go install github.com/sigstore/cosign/v2/cmd/cosign@latest

# Sign an image (keyless — uses OIDC identity)
cosign sign ghcr.io/org/payment-api:latest

# Verify a signature
cosign verify \
  --certificate-identity=https://github.com/org/repo/.github/workflows/ci.yaml@refs/heads/main \
  --certificate-oidc-issuer=https://token.actions.githubusercontent.com \
  ghcr.io/org/payment-api:latest

In CI (GitHub Actions):

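# Keyless signing needs job-level "permissions: id-token: write" for the OIDC token.
# Cosign v2 signs keylessly by default, so COSIGN_EXPERIMENTAL is no longer needed.
- name: Sign image with Cosign
  run: |
    cosign sign --yes ghcr.io/org/payment-api@${{ steps.build.outputs.digest }}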

Keyless signing uses your CI system's OIDC token as identity. No private keys to manage — the signature attests that the image was built by a specific GitHub Actions workflow.

Admission Control with Sigstore Policy Controller

Enforce that only signed images can run in your cluster:

apiVersion: policy.sigstore.dev/v1beta1
kind: ClusterImagePolicy
metadata:
  name: require-signed-images
spec:
  images:
    - glob: "ghcr.io/org/**"
  authorities:
    - keyless:
        url: https://fulcio.sigstore.dev
        identities:
          - issuer: https://token.actions.githubusercontent.com
            subject: https://github.com/org/repo/.github/workflows/ci.yaml@refs/heads/main

Runtime Security

Read-Only Filesystem

# Kubernetes pod spec
apiVersion: v1
kind: Pod
spec:
  containers:
    - name: payment-api
      image: ghcr.io/org/payment-api:abc123
      securityContext:
        readOnlyRootFilesystem: true
        runAsNonRoot: true
        runAsUser: 65532
        allowPrivilegeEscalation: false
        capabilities:
          drop:
            - ALL
      volumeMounts:
        - name: tmp
          mountPath: /tmp
        - name: logs
          mountPath: /app/logs
  volumes:
    - name: tmp
      emptyDir:
        sizeLimit: 100Mi
    - name: logs
      emptyDir:
        sizeLimit: 500Mi

readOnlyRootFilesystem: true makes the container's root filesystem read-only. Mount emptyDir volumes for directories that need writes (temp files, logs).
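The settings in the pod spec above can be verified mechanically, for example in a CI step that inspects rendered manifests. A minimal sketch, assuming the container spec has already been parsed into a dict (from YAML or JSON); the function name and the expected-values table are this example's own.

```python
from __future__ import annotations

# Security context fields and the values the pod spec above sets for them.
EXPECTED = {
    "readOnlyRootFilesystem": True,
    "runAsNonRoot": True,
    "allowPrivilegeEscalation": False,
}


def missing_hardening(container: dict) -> list[str]:
    """List the hardening settings a container spec does not set as expected."""
    context = container.get("securityContext", {})
    problems = [
        f"{field} should be {value}"
        for field, value in EXPECTED.items()
        if context.get(field) is not value
    ]
    if "ALL" not in context.get("capabilities", {}).get("drop", []):
        problems.append("capabilities.drop should include ALL")
    return problems


HARDENED = {
    "name": "payment-api",
    "securityContext": {
        "readOnlyRootFilesystem": True,
        "runAsNonRoot": True,
        "runAsUser": 65532,
        "allowPrivilegeEscalation": False,
        "capabilities": {"drop": ["ALL"]},
    },
}

print(missing_hardening(HARDENED))
print(missing_hardening({"name": "payment-api"}))
```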

Seccomp Profiles

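Restrict which system calls the container can make. The allowlist below is abbreviated for illustration; a real Node.js workload needs more syscalls than shown (execve, for one, to start the process), so build the list by tracing your own service: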

{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": ["SCMP_ARCH_X86_64"],
  "syscalls": [
    {
      "names": [
        "accept4", "bind", "clone", "close", "connect",
        "epoll_create1", "epoll_ctl", "epoll_wait",
        "exit", "exit_group", "fcntl", "fstat",
        "futex", "getpid", "getsockopt", "ioctl",
        "listen", "mmap", "mprotect", "munmap",
        "nanosleep", "openat", "pipe2", "read",
        "recvfrom", "rt_sigaction", "rt_sigprocmask",
        "sendto", "setsockopt", "socket", "write",
        "writev", "brk", "clock_gettime", "getuid",
        "getgid", "geteuid", "getegid"
      ],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}

Apply in the pod spec:

securityContext:
  seccompProfile:
    type: Localhost
    localhostProfile: profiles/node-api.json
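Before rolling out a profile, it helps to compare its allowlist with the syscalls your service was observed to make (for example from an strace run). The sketch below assumes an allowlist-style profile like the one above, where the default action denies; the function names and the shortened sample profile are hypothetical.

```python
from __future__ import annotations


def allowed_syscalls(profile: dict) -> set[str]:
    """Collect every syscall name covered by an SCMP_ACT_ALLOW rule."""
    allowed: set[str] = set()
    for rule in profile.get("syscalls", []):
        if rule.get("action") == "SCMP_ACT_ALLOW":
            allowed.update(rule.get("names", []))
    return allowed


def missing_syscalls(profile: dict, needed: list[str]) -> list[str]:
    """Return the needed syscalls that the allowlist does not cover."""
    return sorted(set(needed) - allowed_syscalls(profile))


# Shortened sample in the same shape as the profile above.
PROFILE = {
    "defaultAction": "SCMP_ACT_ERRNO",
    "architectures": ["SCMP_ARCH_X86_64"],
    "syscalls": [
        {"names": ["read", "write", "openat", "close"], "action": "SCMP_ACT_ALLOW"},
    ],
}

# Hypothetical list of syscalls observed while tracing the service.
OBSERVED = ["read", "write", "execve", "rt_sigreturn"]

print(missing_syscalls(PROFILE, OBSERVED))
```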

Scanning in CI: Full Integration

# .github/workflows/security.yaml
name: Security Scan

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
  schedule:
    - cron: '0 6 * * 1'  # Weekly scan of existing images

jobs:
  scan-image:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Build image
        run: docker build -t payment-api:scan .

      - name: Run Trivy vulnerability scan
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: payment-api:scan
          format: table
          exit-code: 1
          severity: CRITICAL,HIGH
          ignore-unfixed: true

      - name: Run Trivy for SARIF (always, for GitHub Security tab)
        uses: aquasecurity/trivy-action@master
        if: always()
        with:
          image-ref: payment-api:scan
          format: sarif
          output: trivy-results.sarif

      - name: Upload SARIF
        uses: github/codeql-action/upload-sarif@v3
        if: always()
        with:
          sarif_file: trivy-results.sarif

      - name: Check image size
        run: |
          SIZE=$(docker image inspect payment-api:scan --format='{{.Size}}')
          SIZE_MB=$((SIZE / 1024 / 1024))
          echo "Image size: ${SIZE_MB}MB"
          if [ "$SIZE_MB" -gt 200 ]; then
            echo "::error::Image size ${SIZE_MB}MB exceeds 200MB budget"
            exit 1
          fi

The ignore-unfixed: true flag is important: it prevents failing builds on CVEs that have no available fix. You cannot patch what has not been patched upstream.
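The gate that the severity filter and ignore-unfixed implement can be reproduced over Trivy's JSON report, which is useful when you want custom reporting on top of the pass/fail result. This sketch assumes the report layout of `trivy image --format json` (a `Results` list whose entries may carry a `Vulnerabilities` list with `Severity` and `FixedVersion` fields); the function name and sample data are illustrative.

```python
from __future__ import annotations


def blocking_findings(
    report: dict,
    severities: tuple[str, ...] = ("CRITICAL", "HIGH"),
    ignore_unfixed: bool = True,
) -> list[dict]:
    """Return the vulnerabilities that should fail the build."""
    blocking = []
    for result in report.get("Results") or []:
        # "Vulnerabilities" can be missing or null when a target is clean.
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") not in severities:
                continue
            if ignore_unfixed and not vuln.get("FixedVersion"):
                continue  # nothing to upgrade to yet
            blocking.append(vuln)
    return blocking


# Made-up report with placeholder IDs, shaped like Trivy's JSON output.
REPORT = {
    "Results": [
        {"Target": "payment-api:scan (debian 12.4)", "Vulnerabilities": None},
        {
            "Target": "package-lock.json",
            "Vulnerabilities": [
                {"VulnerabilityID": "EXAMPLE-1", "Severity": "HIGH", "FixedVersion": "2.0.1"},
                {"VulnerabilityID": "EXAMPLE-2", "Severity": "CRITICAL", "FixedVersion": ""},
                {"VulnerabilityID": "EXAMPLE-3", "Severity": "MEDIUM", "FixedVersion": "1.0.2"},
            ],
        },
    ]
}

print([vuln["VulnerabilityID"] for vuln in blocking_findings(REPORT)])
```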

Image Size Budgets

Track image size over time to prevent regression:

#!/bin/bash
# scripts/check-image-size.sh
IMAGE=$1
MAX_SIZE_MB=${2:-200}

SIZE_BYTES=$(docker image inspect "$IMAGE" --format='{{.Size}}')
SIZE_MB=$((SIZE_BYTES / 1024 / 1024))

echo "Image: $IMAGE"
echo "Size: ${SIZE_MB}MB"
echo "Budget: ${MAX_SIZE_MB}MB"

if [ "$SIZE_MB" -gt "$MAX_SIZE_MB" ]; then
  echo "FAIL: Image exceeds size budget by $((SIZE_MB - MAX_SIZE_MB))MB"
  exit 1
fi

echo "PASS: Image is within size budget"

Case Study: Hardening a Node.js API Service

A Node.js API service for a financial data platform had been running in production for 18 months with an unhardened Docker image.

Before

# Original Dockerfile
FROM node:18
WORKDIR /app
COPY . .
RUN npm install
EXPOSE 3000
CMD ["node", "src/index.js"]

Problems:

✓ Image size: 1.2GB (node:18 base + all dependencies including devDependencies)
✓ CVE count: 47 critical, 182 high (mostly in base image packages)
✓ Running as root: UID 0, so any code execution vulnerability gives root access
✓ Full toolkit available: bash, curl, wget, apt, gcc, all useful for attackers
✓ Secrets in history: .env file was COPY'd into the image in an earlier build iteration; the layer persisted
✓ No .dockerignore: Build context included .git (400MB), node_modules (300MB), test fixtures

After

The Stripe Systems engineering team rewrote the Dockerfile:

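# syntax=docker/dockerfile:1
# Hardened Dockerfile (the syntax directive must stay on the first line)

# Stage 1: Production dependencies only
# Assumes pure-JavaScript dependencies: native addons built on Alpine (musl)
# will not load in the glibc-based distroless runtime.
FROM node:20-alpine AS deps
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci --omit=dev && npm cache clean --force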

# Stage 2: Build
FROM node:20-alpine AS builder
WORKDIR /app
COPY package.json package-lock.json tsconfig.json ./
RUN npm ci
COPY src/ ./src/
RUN npm run build

# Stage 3: Production
FROM gcr.io/distroless/nodejs20-debian12

LABEL org.opencontainers.image.source="https://github.com/org/payment-api"
LABEL org.opencontainers.image.description="Payment API Service"

WORKDIR /app

# Copy only production dependencies and compiled output
COPY --from=deps /app/node_modules ./node_modules
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/package.json ./

EXPOSE 3000
USER nonroot:nonroot
CMD ["dist/index.js"]

With .dockerignore:

node_modules
.git
.github
tests/
coverage/
*.md
.env*
docker-compose*.yml
.vscode
.idea
playwright-report/
test-results/
# Do not ignore src/: the builder stage copies it (COPY src/ ./src/)
dist/

Results

Metric	Before	After	Change
Image size	1.2GB	89MB	-93%
Critical CVEs	47	0	-100%
High CVEs	182	0	-100%
Medium CVEs	not reported	2	(npm deps; fixed versions listed in the Trivy scan below)
Running as	root (UID 0)	nonroot (UID 65532)	Non-root
Shell available	Yes (bash)	No	Removed
Package manager	Yes (apt)	No	Removed
Build context size	1.1GB	2.3MB	-99.8%
Cold start (K8s)	2.1s	1.7s	-400ms

The cold start improvement comes from two factors: smaller image means faster pull from the container registry, and the distroless runtime has less filesystem to initialize.
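The percentages in the table can be reproduced from the raw before/after figures. A small sketch, assuming 1GB is counted as 1000MB; the helper name is this example's own.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Figures from the results table (sizes in MB, cold start in seconds).
print(round(percent_change(1200, 89)))      # image size
print(round(percent_change(1100, 2.3), 1))  # build context size
print(round(percent_change(2.1, 1.7)))      # cold start
```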

Trivy Scan Comparison

Before:

payment-api:before (debian 11.8)
Total: 229 (CRITICAL: 47, HIGH: 182)

┌──────────────┬────────────────┬──────────┬─────────────┐
│   Library    │ Vulnerability  │ Severity │   Status    │
├──────────────┼────────────────┼──────────┼─────────────┤
│ openssl      │ CVE-2023-xxxxx │ CRITICAL │ fixed       │
│ curl         │ CVE-2023-xxxxx │ CRITICAL │ fixed       │
│ glibc        │ CVE-2023-xxxxx │ HIGH     │ fixed       │
│ ... (226 more rows)                                    │
└──────────────┴────────────────┴──────────┴─────────────┘

After:

payment-api:after (distroless)
Total: 0 (CRITICAL: 0, HIGH: 0)

Node.js (package-lock.json)
Total: 2 (MEDIUM: 2)

┌──────────────┬────────────────┬──────────┬─────────┬──────────────┐
│   Library    │ Vulnerability  │ Severity │ Version │ Fixed Version│
├──────────────┼────────────────┼──────────┼─────────┼──────────────┤
│ semver       │ CVE-2022-25883 │ MEDIUM   │ 7.3.7   │ 7.5.2        │
│ json5        │ CVE-2022-46175 │ MEDIUM   │ 1.0.1   │ 1.0.2        │
└──────────────┴────────────────┴──────────┴─────────┴──────────────┘

CI Pipeline Integration

The final CI step for every build:

- name: Build and push
  id: build
  uses: docker/build-push-action@v5
  with:
    context: .
    push: true
    tags: ghcr.io/org/payment-api:${{ github.sha }}
    cache-from: type=gha
    cache-to: type=gha,mode=max

- name: Trivy scan
  uses: aquasecurity/trivy-action@master
  with:
    image-ref: ghcr.io/org/payment-api:${{ github.sha }}
    exit-code: 1
    severity: CRITICAL,HIGH
    ignore-unfixed: true

- name: Check image size budget
  run: |
    docker pull ghcr.io/org/payment-api:${{ github.sha }}
    SIZE=$(docker image inspect ghcr.io/org/payment-api:${{ github.sha }} --format='{{.Size}}')
    SIZE_MB=$((SIZE / 1024 / 1024))
    echo "Image size: ${SIZE_MB}MB"
    if [ "$SIZE_MB" -gt 150 ]; then
      echo "::error::Image size exceeds 150MB budget"
      exit 1
    fi

# Cosign v2 signs keylessly by default; COSIGN_EXPERIMENTAL is no longer needed.
- name: Sign image
  run: cosign sign --yes ghcr.io/org/payment-api@${{ steps.build.outputs.digest }}

- name: Generate SBOM
  uses: anchore/sbom-action@v0
  with:
    image: ghcr.io/org/payment-api:${{ github.sha }}
    format: spdx-json
    output-file: sbom.spdx.json

Every image that reaches production is: scanned for vulnerabilities (build fails on critical/high), checked against a size budget, signed with a verifiable identity, and accompanied by an SBOM. This is not security theater — each step addresses a specific threat. Scanning catches known vulnerabilities before deployment. Size budgets prevent accidental inclusion of build tools. Signing prevents deployment of tampered images. SBOMs enable rapid response when a new CVE is disclosed in a transitive dependency.

The total effort to harden the image and integrate scanning into CI was approximately 2 days of engineering work. The ongoing cost is near zero — the pipeline runs automatically, and alerts fire only when action is needed.

Ready to discuss your project?

Get in Touch →

Related Services from Stripe Systems

Stripe Systems helps teams put the patterns covered in this article into production.

DevOps

Infrastructure automation, CI/CD pipelines, and security practices integrated from project inception.

Learn more →

← Back to Blog

AI/MLFebruary 28, 2026

Agentic AI in the Enterprise: Designing Multi-Agent Systems with LangGraph and Tool Orchestration

The term "AI agent" has been diluted by marketing to the point where it describes everything from a chatbot with a system prompt to a fully autonomous multi-step reasoning system. For this discussi...

Software DevelopmentFebruary 10, 2026

Agile vs Waterfall — Choosing the Right Methodology for Your Project

The methodology debate in software development is older than most of the frameworks we argue about on the internet. Waterfall has been declared dead roughly once per year since the Agile Manifesto ...

Engineering CultureMarch 5, 2026

AI-Assisted Code Review at Scale: How We Cut Review Cycle Time by 60% Without Sacrificing Architecture Standards

Code review is the most important quality gate in a software team, and it is also the most common bottleneck. Every team has the same problem: senior engineers are the reviewers, they have their ow...

Engineering CultureFebruary 5, 2026

The AI-Augmented SDLC: How We've Embedded AI at Every Phase — From Requirements to Deployment

The phrase "AI-augmented SDLC" gets thrown around loosely. Vendors pitch it as "AI writes your code." That is not what it means in practice. What it actually means: at every phase of the developmen...

Quality AssuranceMarch 15, 2026

How AI Is Transforming Automated Testing — Unit Tests, Code Coverage, and E2E Integration

AI-assisted testing has moved from research papers into daily engineering workflows. Tools powered by large language models can generate test scaffolds, detect visual regressions, predict flaky tes...

AI/MLMarch 19, 2026

AI Code Review Agents: How We Built a Custom Pipeline That Catches Architecture Violations, Not Just Bugs

Generic AI code review tools are good at catching syntax errors, unused variables, and simple bugs. They are poor at catching architecture violations — the kind of issues that compound over months ...

Engineering CultureMarch 20, 2026

How Our Engineering Team Uses AI Tools Daily to Ship Faster, Catch More Bugs, and Write Better Code — A Practitioner's Honest Breakdown

AI tools are not magic. They do not replace engineers, they do not understand your codebase, and they will confidently generate code that compiles but violates your business rules. What they do — w...

Backend DevelopmentJanuary 15, 2026

API Gateway Patterns: BFF vs Aggregator vs Direct — Choosing for Your Stack

Every team building on microservices eventually hits the same question: how should clients talk to your backend? The answer is some form of API gateway — but which pattern you choose has lasting co...

Cloud ComputingFebruary 24, 2026

AWS Lambda Cold Starts — Root Causes, Benchmarks, and 7 Proven Mitigation Strategies

Every engineer who has operated a Lambda-based production service has encountered the cold start problem. The function responds in 12 milliseconds on the second invocation but takes 3.8 seconds on ...

Cloud ComputingFebruary 15, 2026

AWS vs Azure vs GCP for Startups in 2026 — An Honest Cost and Capability Breakdown

Most cloud comparison articles recycle the same vague advice: "AWS has the most services, Azure integrates with Microsoft, GCP is good for data." That is not useful when you are a startup founder s...

Mobile DevelopmentMarch 1, 2026

Choosing the Right Mobile Development Approach: Native vs Cross-Platform

One of the first and most important decisions in any mobile app project is choosing between native and cross-platform development. Each approach has distinct advantages, and the right choice depend...

DevOpsMarch 7, 2026

Building a Production-Grade CI/CD Pipeline for a Monorepo (GitHub Actions + Docker + Kubernetes)

Monorepos consolidate multiple services, shared libraries, and frontend applications into a single repository. This brings benefits — atomic cross-service changes, shared tooling, simplified depend...

Backend DevelopmentJanuary 29, 2026

Clean Architecture in .NET 8 — Structuring Enterprise Apps That Scale Without Rot

Software architecture is not about choosing the right framework. It is about deciding which parts of a system should be easy to change and which should be stable — then enforcing that decision stru...

Mobile DevelopmentJanuary 6, 2026

CLEAN Architecture in Flutter — BLoC vs Riverpod for State Management

Flutter gives you a rendering engine and a widget tree. It does not give you an architecture. That gap is where most projects accumulate the technical debt that slows them down six months after lau...

DevOpsFebruary 28, 2026

How DevOps and DevSecOps Integrate Into Enterprise Product Development From Day One

Most enterprise teams treat DevOps as something to bolt on after the application takes shape. Security gets deferred even further — relegated to a penetration test two weeks before launch. This seq...

Backend DevelopmentJanuary 18, 2026

Event-Driven Architecture with Kafka, NestJS, and Outbox Pattern — A Production Walkthrough

Most backend systems start as synchronous request-response services. A client sends a request, the server processes it, and returns a result. This model is simple to reason about, easy to debug, an...

Cloud ComputingMarch 5, 2026

FinOps in Practice: How We Cut a Client's AWS Bill by 40% Without Touching Their Codebase

Most organizations overspend on AWS by 25–35%. Not because their engineers are careless, but because cloud billing is structurally opaque. Pricing varies by region, instance family, tenancy, paymen...

Mobile DevelopmentJanuary 10, 2026

Flutter vs React Native in 2026 — A Deep Technical Comparison for Enterprise Apps

Cross-platform mobile development has converged on two serious contenders: Flutter and React Native. Both are production-ready for enterprise applications, but they make fundamentally different arc...

DevOpsMarch 13, 2026

GitOps with ArgoCD and Terraform: The Infrastructure Deployment Workflow That Eliminates Drift

Infrastructure drift — the divergence between what is declared in code and what is actually running — is the root cause of a large class of production incidents. GitOps addresses this by making Git...

DevSecOpsFebruary 18, 2026

Infrastructure as Code Security: Detecting Misconfigurations with Checkov and OPA Before Deployment

Cloud misconfigurations remain the most common cause of cloud security incidents. The 2024 Verizon Data Breach Investigations Report attributes 74% of cloud breaches to misconfiguration or misuse, ...

Backend DevelopmentFebruary 10, 2026

Java Virtual Threads (Project Loom) vs Node.js — Concurrency Models Compared for Backend Engineers

Backend concurrency is not a solved problem. It is a set of trade-offs that shift with every workload profile. Java 21 introduced virtual threads — lightweight threads managed by the JVM rather tha...

DevOpsJanuary 25, 2026

Kubernetes Multi-Tenancy Patterns — Namespace Isolation vs Virtual Clusters vs Separate Clusters

Multi-tenancy in Kubernetes is not a single problem — it is a spectrum of isolation requirements that vary based on trust boundaries, compliance mandates, and operational capacity. This post examin...

AI/MLJanuary 18, 2026

LLM Cost Optimization at Scale — Prompt Caching, Model Routing, and Batch Inference in Production

LLM API costs follow a simple formula: tokens consumed × price per token. At low volume, this is negligible. At production scale, it becomes a significant line item. A system processing 1 million r...

Frontend DevelopmentMarch 2, 2026

Micro-Frontend Architecture at Scale: Module Federation with React and Webpack 5

The pitch for micro-frontends is compelling: split a monolithic frontend into independently deployable units owned by autonomous teams. The reality is more nuanced. Module Federation, introduced in...


DevOps📅 January 23, 2026· 13 min read

Docker Image Hardening for Production — Distroless, Non-Root Users, and Layer Optimization

✍️

Stripe Systems Engineering

Why Image Hardening Matters

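Three concerns drive image hardening:

Attack surface: A container image with 400 installed packages has 400 packages worth of potential vulnerabilities. The node:18 image (Debian Bookworm-based) ships with apt, curl, wget, gcc, make, perl, and hundreds of libraries. An attacker who gains code execution inside the container has a full toolkit available.

CVE exposure: Scanners match every package in your image against vulnerability databases. More packages mean more CVE matches. Most of these CVEs are in packages your application never uses, but they still appear in compliance reports and trigger alerts.

Compliance: Audits against SOC 2, PCI DSS, and HIPAA commonly expect evidence that production systems minimize unnecessary software. An auditor looking at a 1.2GB image containing a C compiler will ask why.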

Base Image Selection

Comparison

Base Image	Size	Package Manager	Shell	Packages	Use Case
`node:20`	~950MB	apt	bash	~400	Development only
`node:20-slim`	~200MB	apt	bash	~100	When you need apt
`node:20-alpine`	~130MB	apk	sh	~30	General production
`gcr.io/distroless/nodejs20`	~130MB	None	None	~10	Hardened production
`cgr.dev/chainguard/node`	~90MB	None (apk in -dev)	None	~5	Hardened production
`scratch`	0MB	None	None	0	Static binaries (Go, Rust)

Alpine

Alpine uses musl libc instead of glibc. This matters for:

✓Node.js native modules: Packages with native bindings (e.g., bcrypt, sharp) may need to be compiled against musl. Use npm rebuild in the build stage.
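✓DNS resolution: musl's DNS resolver behaves differently from glibc's. It does not handle the search and ndots directives in /etc/resolv.conf the same way, it queries all configured nameservers in parallel, and versions before musl 1.2.4 do not fall back to TCP for large responses. In Kubernetes, this can cause service discovery issues unless ndots is configured correctly in the pod spec.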
✓Performance: musl's malloc implementation is simpler than glibc's. For memory-intensive workloads, benchmark before committing.
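To make the ndots point concrete, here is a minimal Python sketch of the name expansion a glibc-style stub resolver performs, as described in resolv.conf(5). It is a simplification under stated assumptions: the function name candidate_queries and the example search list are illustrative, and musl's resolver may not follow exactly the same order.

```python
def candidate_queries(name: str, search: list[str], ndots: int = 1) -> list[str]:
    """Return the order in which a glibc-style resolver tries names (simplified)."""
    if name.endswith("."):
        # A trailing dot marks the name as fully qualified: no search list.
        return [name]
    absolute = name + "."
    expanded = [f"{name}.{domain}." for domain in search]
    if name.count(".") >= ndots:
        # Enough dots: try the name as given first, then the search list.
        return [absolute] + expanded
    # Too few dots: walk the search list first, then the name as given.
    return expanded + [absolute]


# Typical search list for a pod in the "default" namespace (Kubernetes sets ndots:5).
k8s_search = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]
for query in candidate_queries("api.example.com", k8s_search, ndots=5):
    print(query)
```

With ndots set to 5, an external name such as api.example.com is tried against every search domain before the absolute name, which multiplies DNS lookups. A trailing dot skips the search list entirely.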

FROM node:20-alpine AS builder
RUN apk add --no-cache python3 make g++  # For native modules
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
# Native modules are compiled against musl here

Distroless

What is included in gcr.io/distroless/nodejs20-debian12:

✓Node.js 20 binary
✓Required shared libraries (libc, libstdc++, etc.)
✓CA certificates
✓/etc/passwd with a nonroot user

What is NOT included:

✓Shell (bash, sh)
✓Package manager (apt, apk)
✓Coreutils (ls, cat, cp, mv)
✓curl, wget, netcat
✓Compilers and interpreters other than Node.js

Chainguard Images

Chainguard provides hardened base images rebuilt nightly with the latest package versions. They claim zero known CVEs at build time.

# Chainguard Node.js image
FROM cgr.dev/chainguard/node:latest
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
CMD ["dist/index.js"]

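Chainguard images are slightly smaller than Google distroless and are updated more frequently. The tradeoff: they are a third-party dependency with a commercial model. At the time of writing, the free tier is limited to the latest tags, so pinning a specific version requires a subscription.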

Scratch

For statically compiled binaries (Go with CGO_ENABLED=0, Rust):

FROM golang:1.22-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -ldflags="-s -w" -o /server ./cmd/server

FROM scratch
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=builder /server /server
USER 65534:65534
ENTRYPOINT ["/server"]

The resulting image contains exactly one file (plus CA certs). Image size is typically 5-20MB.

Multi-Stage Builds

The key principle: build dependencies should never appear in the production image.

# Stage 1: Install ALL dependencies (including devDependencies for build tools)
FROM node:20-alpine AS deps
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci

# Stage 2: Build the application
FROM deps AS builder
COPY tsconfig.json ./
COPY src/ ./src/
RUN npm run build
# Prune devDependencies for the runtime stage
RUN npm prune --omit=dev

# Stage 3: Production runtime
FROM gcr.io/distroless/nodejs20-debian12 AS runtime
WORKDIR /app
# Copy only production artifacts
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/package.json ./

EXPOSE 3000
USER nonroot:nonroot
CMD ["dist/index.js"]

What stays in the builder (not in production):

✓TypeScript compiler (typescript package)
✓Build tools (webpack, esbuild, swc)
✓Type definition packages (@types/*)
✓Test frameworks (jest, vitest)
✓Linters (eslint, prettier)
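One way to check that the list above really stayed behind is to compare package.json against what is installed in the runtime stage. The sketch below is a hypothetical helper, not part of npm: leaked_dev_dependencies and its inputs are assumptions, and the installed set is expected to come from listing the top-level directories of node_modules in the final image.

```python
def leaked_dev_dependencies(package_json: dict, installed: set[str]) -> list[str]:
    """Return devDependencies that are present in the runtime node_modules."""
    runtime = set(package_json.get("dependencies", {}))
    # A package listed in both sections is a legitimate runtime dependency.
    dev_only = set(package_json.get("devDependencies", {})) - runtime
    return sorted(dev_only & installed)


pkg = {
    "dependencies": {"express": "^4.18.0"},
    "devDependencies": {"typescript": "^5.4.0", "jest": "^29.7.0"},
}
print(leaked_dev_dependencies(pkg, {"express", "typescript"}))
```

This can report false positives when a runtime dependency pulls in the same package transitively, so treat a non-empty result as a prompt to investigate rather than a hard failure.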

Non-Root Users

Creating and Using a Non-Root User

# For Alpine-based images
FROM node:20-alpine
RUN addgroup -g 1001 -S appgroup && \
    adduser -u 1001 -S appuser -G appgroup
WORKDIR /app
COPY --chown=appuser:appgroup . .
USER appuser:appgroup
CMD ["node", "dist/index.js"]

# For Debian-based images
FROM node:20-slim
RUN groupadd -g 1001 appgroup && \
    useradd -u 1001 -g appgroup -m -s /bin/false appuser
WORKDIR /app
COPY --chown=appuser:appgroup . .
USER appuser:appgroup
CMD ["node", "dist/index.js"]

Common File Permission Issues

Problem: Application writes to /app/logs or /app/uploads at runtime, but these directories are owned by root.

Solution: Create the directories with the correct ownership before switching users.

# Create directories with correct ownership before switching user
RUN mkdir -p /app/logs /app/uploads && \
    chown -R appuser:appgroup /app/logs /app/uploads
USER appuser:appgroup

Problem: npm packages install global binaries to /usr/local/bin, which requires root.

Solution: Do not install global packages in the runtime image. Everything should be a local dependency in node_modules/.bin.

Problem: Application binds to port 80 or 443, which requires root or the NET_BIND_SERVICE capability.

Solution: Bind to a high port (3000, 8080) and use a Kubernetes Service or ingress controller for port mapping. There is no reason to run on privileged ports inside a container.

Distroless Already Provides Non-Root

Distroless images include a nonroot user (UID 65532):

FROM gcr.io/distroless/nodejs20-debian12
USER nonroot:nonroot
# That's it — the user already exists in the image
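A quick lint can catch Dockerfiles that never switch away from root. The following is a rough sketch, assuming a simple Dockerfile without heredocs; final_stage_user and runs_as_root are hypothetical names. It cannot see the default user baked into a base image, so it conservatively treats a missing USER instruction as root.

```python
from typing import Optional


def final_stage_user(dockerfile: str) -> Optional[str]:
    """Return the last USER value set in the final build stage, if any."""
    user = None
    for raw in dockerfile.splitlines():
        parts = raw.strip().split(None, 1)
        if len(parts) < 2 or parts[0].startswith("#"):
            continue
        keyword = parts[0].upper()
        if keyword == "FROM":
            user = None  # a new stage starts with the base image's default user
        elif keyword == "USER":
            user = parts[1].strip()
    return user


def runs_as_root(dockerfile: str) -> bool:
    user = final_stage_user(dockerfile)
    if user is None:
        return True  # conservative assumption: base image defaults to root
    return user.split(":", 1)[0] in ("root", "0")


print(runs_as_root("FROM node:18\nWORKDIR /app\nCMD [\"node\", \"src/index.js\"]\n"))
```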

Layer Optimization

COPY Order Matters

Docker caches layers. When a layer's input changes, that layer and all subsequent layers are rebuilt. Order your instructions from least-frequently-changing to most-frequently-changing:

# GOOD: Dependencies change less often than source code
COPY package.json package-lock.json ./
RUN npm ci
COPY src/ ./src/
RUN npm run build

# BAD: Any source code change invalidates the npm install cache
COPY . .
RUN npm ci
RUN npm run build
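The cache behaviour behind this ordering can be modelled in a few lines. This is a toy model, not Docker's actual cache-key logic: rebuilt_steps is a hypothetical function, and each step simply declares which build-context files it reads.

```python
def rebuilt_steps(steps: list[tuple[str, set[str]]], changed: set[str]) -> list[str]:
    """Return the instructions that must re-run after the given files change."""
    rebuilt = []
    invalidated = False
    for instruction, inputs in steps:
        if inputs & changed:
            invalidated = True  # this layer's input changed
        if invalidated:
            rebuilt.append(instruction)  # every later layer is rebuilt too
    return rebuilt


good = [
    ("COPY package.json package-lock.json ./", {"package.json", "package-lock.json"}),
    ("RUN npm ci", set()),
    ("COPY src/ ./src/", {"src/index.ts"}),
    ("RUN npm run build", set()),
]
bad = [
    ("COPY . .", {"package.json", "package-lock.json", "src/index.ts"}),
    ("RUN npm ci", set()),
    ("RUN npm run build", set()),
]
print(rebuilt_steps(good, {"src/index.ts"}))
print(rebuilt_steps(bad, {"src/index.ts"}))
```

In the first ordering a source edit re-runs only the source copy and the build; in the second it also re-runs npm ci.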

Combine RUN Statements

Each RUN instruction creates a layer. Combining related commands reduces layer count and keeps files that are created and then deleted out of the image entirely:

# BAD: 3 layers. The apt cache from layer 1 persists in the image even
# though it's deleted in layer 3.
RUN apt-get update
RUN apt-get install -y curl
RUN rm -rf /var/lib/apt/lists/*

# GOOD: 1 layer. The apt cache is created and deleted in the same layer.
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl && \
    rm -rf /var/lib/apt/lists/*
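The size effect is simple arithmetic: an image is the sum of its layers, and a deletion in a later layer only hides a file rather than reclaiming its bytes. A toy Python model, with made-up sizes in MB and hypothetical function names, shows the difference:

```python
def image_size_mb(layers: list[dict[str, int]]) -> int:
    """Sum the bytes written by every layer; deletions (size 0) reclaim nothing."""
    return sum(sum(layer.values()) for layer in layers)


def squash(steps: list[dict[str, int]]) -> dict[str, int]:
    """Model running all steps in ONE layer: only the final state is stored."""
    merged: dict[str, int] = {}
    for step in steps:
        merged.update(step)
    return merged


# Sizes are illustrative assumptions, not measurements.
steps = [
    {"/var/lib/apt/lists": 40},  # apt-get update
    {"/usr/bin/curl": 5},        # apt-get install
    {"/var/lib/apt/lists": 0},   # rm -rf (a deletion marker, size 0)
]
print(image_size_mb(steps))            # three separate RUN instructions
print(image_size_mb([squash(steps)]))  # one combined RUN instruction
```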

.dockerignore

Prevent unnecessary files from entering the build context:

# .dockerignore
node_modules
.git
.github
*.md
docs/
tests/
coverage/
.env
.env.*
dist/
*.log
playwright-report/
test-results/
.vscode
.idea

Without .dockerignore, the entire directory (including node_modules, .git, and test artifacts) is sent to the Docker daemon as build context. For a typical Node.js project, this can be 500MB+.

Dependency Management

Pin Versions

# BAD: What version of curl is this? Will it change on next build?
RUN apk add curl

# GOOD: Pinned version. Reproducible builds.
RUN apk add --no-cache curl=8.5.0-r0

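For Node.js dependencies, package-lock.json (used with npm ci) already ensures deterministic installs. For system packages, pin to specific versions. Be aware that Alpine repositories generally keep only the current build of each package, so an exact pin fails the build once the package is updated; treat that failure as a prompt to review and bump the pin.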
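Unpinned system packages are easy to find mechanically. The sketch below is a deliberately small linter, assuming plain RUN apk add commands; unpinned_apk_packages is a hypothetical name, and the parser does not handle flags that take a separate value (such as --virtual NAME) or RUN --mount prefixes.

```python
import re


def unpinned_apk_packages(dockerfile: str) -> list[str]:
    """List packages passed to `apk add` without an =version pin."""
    # Join backslash-continued lines so each RUN is a single string.
    text = re.sub(r"\\\s*\n", " ", dockerfile)
    unpinned = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line.upper().startswith("RUN"):
            continue
        for command in re.split(r"&&|;", line[3:]):
            tokens = command.split()
            if tokens[:2] != ["apk", "add"]:
                continue
            for token in tokens[2:]:
                if token.startswith("-"):
                    continue  # flags such as --no-cache
                if "=" not in token:
                    unpinned.append(token)
    return unpinned


print(unpinned_apk_packages("RUN apk add curl\nRUN apk add --no-cache curl=8.5.0-r0\n"))
```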

Scanning with Trivy

# Scan an image
trivy image ghcr.io/org/payment-api:latest

# Scan with severity filter
trivy image --severity CRITICAL,HIGH ghcr.io/org/payment-api:latest

# Output as JSON for CI processing
trivy image --format json --output results.json ghcr.io/org/payment-api:latest

# Scan a Dockerfile (pre-build)
trivy config Dockerfile

Example Trivy output:

ghcr.io/org/payment-api:latest (debian 12.4)

Total: 0 (CRITICAL: 0, HIGH: 0)

Node.js (node_modules/package-lock.json)

Total: 2 (CRITICAL: 0, HIGH: 0, MEDIUM: 2)

┌──────────────┬───────────────┬──────────┬─────────┬──────────────────┐
│   Library    │ Vulnerability │ Severity │ Version │  Fixed Version   │
├──────────────┼───────────────┼──────────┼─────────┼──────────────────┤
│ semver       │ CVE-2022-xxxx │ MEDIUM   │ 7.3.7   │ 7.5.2            │
│ json5        │ CVE-2022-xxxx │ MEDIUM   │ 1.0.1   │ 1.0.2            │
└──────────────┴───────────────┴──────────┴─────────┴──────────────────┘

SBOM Generation with Syft

A Software Bill of Materials (SBOM) lists every component in your image:

# Generate SBOM in SPDX format
syft ghcr.io/org/payment-api:latest -o spdx-json > sbom.spdx.json

# Generate SBOM in CycloneDX format
syft ghcr.io/org/payment-api:latest -o cyclonedx-json > sbom.cdx.json

SBOMs enable downstream consumers to audit your dependencies without access to your source code. Some government contracts and enterprise procurement processes now require SBOMs.
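As an illustration of that kind of audit, the following sketch answers "which versions of package X do we ship?" from an SPDX JSON document. It relies on the packages, name, and versionInfo fields of SPDX 2.x JSON; the function name versions_of and the sample data are assumptions.

```python
def versions_of(spdx: dict, name: str) -> list[str]:
    """Return the distinct versions of a package recorded in an SPDX JSON SBOM."""
    return sorted({
        pkg.get("versionInfo", "unknown")
        for pkg in spdx.get("packages", [])
        if pkg.get("name") == name
    })


# Trimmed-down sample in the shape of sbom.spdx.json (illustrative data).
sample = {
    "packages": [
        {"name": "json5", "versionInfo": "1.0.1"},
        {"name": "json5", "versionInfo": "2.2.3"},
        {"name": "semver", "versionInfo": "7.3.7"},
    ]
}
print(versions_of(sample, "json5"))
```

Against a real file, load it first with json.load and pass the resulting dictionary to the same function.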

Secrets in Docker Builds

The Wrong Way

# NEVER do this — the secret is baked into an image layer
COPY .env /app/.env
ENV DATABASE_URL=postgres://user:password@host/db
ARG NPM_TOKEN=abc123
RUN echo "//registry.npmjs.org/:_authToken=${NPM_TOKEN}" > .npmrc

Even if you delete the file in a later layer, it still exists in the earlier layer. ENV and ARG values show up in docker history, and copied files can be recovered by exporting the image (docker save) and unpacking its layers.
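These mistakes follow recognizable patterns, so a pre-commit or CI check can flag them before an image is built. A minimal sketch, assuming a heuristic name list; secret_smells and the SECRET_NAME pattern are hypothetical and will miss secrets with unusual names.

```python
import re

# Heuristic list of secret-looking variable names (an assumption, extend as needed).
SECRET_NAME = re.compile(r"TOKEN|SECRET|PASSWORD|PASSWD|API_?KEY|DATABASE_URL", re.I)
DOTENV = re.compile(r"(^|[\s/])\.env(\.\S*)?(\s|$)")


def secret_smells(dockerfile: str) -> list[str]:
    """Flag hard-coded ENV/ARG secrets and COPY/ADD of .env files."""
    findings = []
    for number, raw in enumerate(dockerfile.splitlines(), start=1):
        line = raw.strip()
        parts = line.split(None, 1)
        if len(parts) < 2 or line.startswith("#"):
            continue
        keyword, rest = parts[0].upper(), parts[1]
        if keyword in ("ENV", "ARG") and "=" in rest:
            if SECRET_NAME.search(rest.split("=", 1)[0]):
                findings.append(f"line {number}: {keyword} sets a secret-looking value")
        elif keyword in ("COPY", "ADD") and DOTENV.search(rest):
            findings.append(f"line {number}: {keyword} copies a .env file")
    return findings


bad = """# NEVER do this
COPY .env /app/.env
ENV DATABASE_URL=postgres://user:password@host/db
ARG NPM_TOKEN=abc123
"""
for finding in secret_smells(bad):
    print(finding)
```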

BuildKit Secret Mounts

# syntax=docker/dockerfile:1
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
# Mount the secret at build time — it is NOT stored in any layer
RUN --mount=type=secret,id=npmrc,target=/app/.npmrc \
    npm ci

# Build with the secret
docker build --secret id=npmrc,src=$HOME/.npmrc -t payment-api .

The secret is mounted into the build container's filesystem during that specific RUN instruction. It is never written to a layer.

In CI

# GitHub Actions
- name: Build with secrets
  uses: docker/build-push-action@v5
  with:
    context: .
    push: true
    tags: ghcr.io/org/payment-api:latest
    secrets: |
      npmrc=${{ secrets.NPM_RC }}

Image Signing with Cosign

Image signing proves that an image was built by your CI system and has not been tampered with.

Keyless Signing with Sigstore

# Install cosign
go install github.com/sigstore/cosign/v2/cmd/cosign@latest

# Sign an image (keyless — uses OIDC identity)
cosign sign ghcr.io/org/payment-api:latest

# Verify a signature
cosign verify \
  --certificate-identity=https://github.com/org/repo/.github/workflows/ci.yaml@refs/heads/main \
  --certificate-oidc-issuer=https://token.actions.githubusercontent.com \
  ghcr.io/org/payment-api:latest

In CI (GitHub Actions):

- name: Sign image with Cosign
  run: |
    cosign sign --yes ghcr.io/org/payment-api@${{ steps.build.outputs.digest }}

Keyless signing uses your CI system's OIDC token as identity. There are no private keys to manage: the signature attests that the image was built by a specific GitHub Actions workflow. The job needs the id-token: write permission so that GitHub issues the OIDC token.
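The certificate identity string is easy to mistype, so it can help to build the verification command from its parts. The sketch below only assembles an argument list using the two flags shown above; cosign_verify_args is a hypothetical helper, and the insistence on a digest reference is a design choice of this sketch rather than a Cosign requirement.

```python
GITHUB_OIDC_ISSUER = "https://token.actions.githubusercontent.com"


def cosign_verify_args(image: str, org: str, repo: str, workflow: str, ref: str) -> list[str]:
    """Build the argv for `cosign verify` against a GitHub Actions identity."""
    if "@sha256:" not in image:
        # Tags are mutable; verifying a digest pins the exact image content.
        raise ValueError("verify by digest, not by mutable tag")
    identity = f"https://github.com/{org}/{repo}/.github/workflows/{workflow}@{ref}"
    return [
        "cosign", "verify",
        f"--certificate-identity={identity}",
        f"--certificate-oidc-issuer={GITHUB_OIDC_ISSUER}",
        image,
    ]


digest = "sha256:" + "a" * 64  # placeholder digest for illustration
print(" ".join(cosign_verify_args(f"ghcr.io/org/payment-api@{digest}",
                                  "org", "repo", "ci.yaml", "refs/heads/main")))
```

The returned list can be passed to subprocess.run with check=True in a deploy script.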

Admission Control with Sigstore Policy Controller

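Enforce that only signed images can run in your cluster. The policy controller only evaluates namespaces that opt in with the label policy.sigstore.dev/include: "true":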

apiVersion: policy.sigstore.dev/v1beta1
kind: ClusterImagePolicy
metadata:
  name: require-signed-images
spec:
  images:
    - glob: "ghcr.io/org/**"
  authorities:
    - keyless:
        url: https://fulcio.sigstore.dev
        identities:
          - issuer: https://token.actions.githubusercontent.com
            subject: https://github.com/org/repo/.github/workflows/ci.yaml@refs/heads/main

Runtime Security

Read-Only Filesystem

# Kubernetes pod spec
apiVersion: v1
kind: Pod
metadata:
  name: payment-api
spec:
  containers:
    - name: payment-api
      image: ghcr.io/org/payment-api:abc123
      securityContext:
        readOnlyRootFilesystem: true
        runAsNonRoot: true
        runAsUser: 65532
        allowPrivilegeEscalation: false
        capabilities:
          drop:
            - ALL
      volumeMounts:
        - name: tmp
          mountPath: /tmp
        - name: logs
          mountPath: /app/logs
  volumes:
    - name: tmp
      emptyDir:
        sizeLimit: 100Mi
    - name: logs
      emptyDir:
        sizeLimit: 500Mi

readOnlyRootFilesystem: true prevents writes to the container's root filesystem. Mount emptyDir volumes for directories that need writes (temp files, logs).
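The settings in this pod spec can be checked automatically, for example in a CI step that parses rendered manifests. A minimal sketch, assuming the container's securityContext has already been loaded into a dictionary; audit_security_context and REQUIRED are hypothetical names.

```python
# Expected values, taken from the pod spec above.
REQUIRED = {
    "readOnlyRootFilesystem": True,
    "runAsNonRoot": True,
    "allowPrivilegeEscalation": False,
}


def audit_security_context(ctx: dict) -> list[str]:
    """Return a finding for every hardening setting that is missing or wrong."""
    findings = [
        f"{key} should be {str(want).lower()}"
        for key, want in REQUIRED.items()
        if ctx.get(key) is not want
    ]
    dropped = (ctx.get("capabilities") or {}).get("drop") or []
    if "ALL" not in dropped:
        findings.append("capabilities.drop should include ALL")
    return findings


hardened = {
    "readOnlyRootFilesystem": True,
    "runAsNonRoot": True,
    "runAsUser": 65532,
    "allowPrivilegeEscalation": False,
    "capabilities": {"drop": ["ALL"]},
}
print(audit_security_context(hardened))
print(audit_security_context({"runAsNonRoot": True}))
```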

Seccomp Profiles

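Restrict which system calls the container can make. The allowlist below is illustrative rather than complete; derive the real list by tracing your own workload, because any syscall that is not listed fails at runtime: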

{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": ["SCMP_ARCH_X86_64"],
  "syscalls": [
    {
      "names": [
        "accept4", "bind", "clone", "close", "connect",
        "epoll_create1", "epoll_ctl", "epoll_wait",
        "exit", "exit_group", "fcntl", "fstat",
        "futex", "getpid", "getsockopt", "ioctl",
        "listen", "mmap", "mprotect", "munmap",
        "nanosleep", "openat", "pipe2", "read",
        "recvfrom", "rt_sigaction", "rt_sigprocmask",
        "sendto", "setsockopt", "socket", "write",
        "writev", "brk", "clock_gettime", "getuid",
        "getgid", "geteuid", "getegid"
      ],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}

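Apply in the pod spec (localhostProfile is a path relative to the kubelet's seccomp directory, /var/lib/kubelet/seccomp by default):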

securityContext:
  seccompProfile:
    type: Localhost
    localhostProfile: profiles/node-api.json
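Before rolling out a profile like this, it is worth comparing it with the syscalls the service actually makes. The sketch below assumes you have collected an observed set separately (for instance from a tracing run in staging); allowed_syscalls and missing_syscalls are hypothetical helper names that read the same JSON structure shown above.

```python
def allowed_syscalls(profile: dict) -> set[str]:
    """Collect every syscall name covered by an SCMP_ACT_ALLOW rule."""
    allowed: set[str] = set()
    for rule in profile.get("syscalls", []):
        if rule.get("action") == "SCMP_ACT_ALLOW":
            allowed.update(rule.get("names", []))
    return allowed


def missing_syscalls(profile: dict, observed: set[str]) -> list[str]:
    """Syscalls the workload used that the profile would reject."""
    return sorted(observed - allowed_syscalls(profile))


profile = {
    "defaultAction": "SCMP_ACT_ERRNO",
    "syscalls": [{"names": ["read", "write", "openat"], "action": "SCMP_ACT_ALLOW"}],
}
# The observed set is illustrative; gather the real one by tracing the process.
print(missing_syscalls(profile, {"read", "write", "execve", "madvise"}))
```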

Scanning in CI: Full Integration

# .github/workflows/security.yaml
name: Security Scan

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
  schedule:
    - cron: '0 6 * * 1'  # Weekly scan of existing images

jobs:
  scan-image:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write  # needed by the upload-sarif step
    steps:
      - uses: actions/checkout@v4

      - name: Build image
        run: docker build -t payment-api:scan .

      - name: Run Trivy vulnerability scan
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: payment-api:scan
          format: table
          exit-code: 1
          severity: CRITICAL,HIGH
          ignore-unfixed: true

      - name: Run Trivy for SARIF (always, for GitHub Security tab)
        uses: aquasecurity/trivy-action@master
        if: always()
        with:
          image-ref: payment-api:scan
          format: sarif
          output: trivy-results.sarif

      - name: Upload SARIF
        uses: github/codeql-action/upload-sarif@v3
        if: always()
        with:
          sarif_file: trivy-results.sarif

      - name: Check image size
        run: |
          SIZE=$(docker image inspect payment-api:scan --format='{{.Size}}')
          SIZE_MB=$((SIZE / 1024 / 1024))
          echo "Image size: ${SIZE_MB}MB"
          if [ "$SIZE_MB" -gt 200 ]; then
            echo "::error::Image size ${SIZE_MB}MB exceeds 200MB budget"
            exit 1
          fi

The ignore-unfixed: true flag is important: it prevents failing builds on CVEs that have no available fix. You cannot patch what has not been patched upstream.
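The gating logic can be reproduced against Trivy's JSON report when you need custom rules on top of the action. This sketch reads the Results, Vulnerabilities, Severity, and FixedVersion fields of the report; blocking_findings is a hypothetical function, and treating an empty FixedVersion as "unfixed" is an approximation of what the flag does.

```python
def blocking_findings(report: dict,
                      severities: tuple = ("CRITICAL", "HIGH"),
                      ignore_unfixed: bool = True) -> list[dict]:
    """Return the vulnerabilities that should fail the build."""
    blocking = []
    for result in report.get("Results") or []:
        # "Vulnerabilities" may be absent or null for clean targets.
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") not in severities:
                continue
            if ignore_unfixed and not vuln.get("FixedVersion"):
                continue  # nothing to upgrade to yet
            blocking.append(vuln)
    return blocking


# Illustrative report fragment, not real scan output.
report = {
    "Results": [
        {"Target": "os-packages", "Vulnerabilities": [
            {"PkgName": "libfoo", "Severity": "HIGH", "FixedVersion": ""},
            {"PkgName": "libbar", "Severity": "CRITICAL", "FixedVersion": "1.2.3"},
        ]},
        {"Target": "package-lock.json", "Vulnerabilities": [
            {"PkgName": "json5", "Severity": "MEDIUM", "FixedVersion": "1.0.2"},
        ]},
        {"Target": "clean-target"},
    ]
}
print([v["PkgName"] for v in blocking_findings(report)])
```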

Image Size Budgets

Track image size over time to prevent regression:

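#!/bin/bash
# scripts/check-image-size.sh
# Usage: check-image-size.sh IMAGE [MAX_SIZE_MB]
set -euo pipefail

IMAGE=${1:?usage: check-image-size.sh IMAGE [MAX_SIZE_MB]}
MAX_SIZE_MB=${2:-200}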

SIZE_BYTES=$(docker image inspect "$IMAGE" --format='{{.Size}}')
SIZE_MB=$((SIZE_BYTES / 1024 / 1024))

echo "Image: $IMAGE"
echo "Size: ${SIZE_MB}MB"
echo "Budget: ${MAX_SIZE_MB}MB"

if [ "$SIZE_MB" -gt "$MAX_SIZE_MB" ]; then
  echo "FAIL: Image exceeds size budget by $((SIZE_MB - MAX_SIZE_MB))MB"
  exit 1
fi

echo "PASS: Image is within size budget"
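A fixed budget catches large jumps but not slow growth. If sizes from previous builds are stored somewhere (an assumption; the script above does not record them), a relative check can flag regressions early. The function name size_regression and the 10% default tolerance are illustrative choices.

```python
def size_regression(history_mb: list[int], current_mb: int, tolerance: float = 0.10) -> bool:
    """True if the current image is more than `tolerance` larger than the last build."""
    if not history_mb:
        return False  # nothing to compare against yet
    previous = history_mb[-1]
    return current_mb > previous * (1 + tolerance)


print(size_regression([88, 89], 120))
print(size_regression([88, 89], 92))
```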

Case Study: Hardening a Node.js API Service

A Node.js API service for a financial data platform had been running in production for 18 months with an unhardened Docker image.

Before

# Original Dockerfile
FROM node:18
WORKDIR /app
COPY . .
RUN npm install
EXPOSE 3000
CMD ["node", "src/index.js"]

Problems:

✓Image size: 1.2GB (node:18 base + all dependencies including devDependencies)
✓CVE count: 47 critical, 182 high (mostly in base image packages)
✓Running as root: UID 0 — any code execution vulnerability gives root access
✓Full toolkit available: bash, curl, wget, apt, gcc — useful for attackers
✓Secrets in history: .env file was COPY'd into the image in an earlier build iteration; the layer persisted
✓No .dockerignore: Build context included .git (400MB), node_modules (300MB), test fixtures

After

The Stripe Systems engineering team rewrote the Dockerfile:

# syntax=docker/dockerfile:1
# Hardened Dockerfile

# Stage 1: Dependencies
FROM node:20-alpine AS deps
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci --omit=dev && npm cache clean --force

# Stage 2: Build
FROM node:20-alpine AS builder
WORKDIR /app
COPY package.json package-lock.json tsconfig.json ./
RUN npm ci
COPY src/ ./src/
RUN npm run build

# Stage 3: Production
FROM gcr.io/distroless/nodejs20-debian12

LABEL org.opencontainers.image.source="https://github.com/org/payment-api"
LABEL org.opencontainers.image.description="Payment API Service"

WORKDIR /app

# Copy only production dependencies and compiled output
COPY --from=deps /app/node_modules ./node_modules
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/package.json ./

EXPOSE 3000
USER nonroot:nonroot
CMD ["dist/index.js"]

With .dockerignore:

node_modules
.git
.github
tests/
coverage/
*.md
.env*
docker-compose*.yml
.vscode
.idea
playwright-report/
test-results/
# Local build output; dist/ is rebuilt inside the builder stage
dist/

Results

Metric	Before	After	Change
Image size	1.2GB	89MB	-93%
Critical CVEs	47	0	-100%
High CVEs	182	0	-100%
Medium CVEs	not reported	2	(npm deps; fixed versions listed in the Trivy output)
Running as	root (UID 0)	nonroot (UID 65532)	Non-root
Shell available	Yes (bash)	No	Removed
Package manager	Yes (apt)	No	Removed
Build context size	1.1GB	2.3MB	-99.8%
Cold start (K8s)	2.1s	1.7s	-400ms

The cold start improvement comes from two factors: smaller image means faster pull from the container registry, and the distroless runtime has less filesystem to initialize.
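The percentages in the results table can be reproduced with a one-line formula. The sketch assumes 1GB = 1000MB for the conversion; percent_change is just an illustrative helper.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


print(round(percent_change(1200, 89)))      # image size in MB
print(round(percent_change(1100, 2.3), 1))  # build context in MB
print(round(percent_change(2.1, 1.7)))      # cold start in seconds
```

The cold start row is reported in the table as an absolute difference (400ms), which corresponds to roughly a fifth of the original time.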

Trivy Scan Comparison

Before:

payment-api:before (debian 11.8)
Total: 229 (CRITICAL: 47, HIGH: 182)

┌──────────────┬────────────────┬──────────┬──────────────┐
│   Library    │ Vulnerability  │ Severity │    Status    │
├──────────────┼────────────────┼──────────┼──────────────┤
│ openssl      │ CVE-2023-xxxxx │ CRITICAL │ fixed        │
│ curl         │ CVE-2023-xxxxx │ CRITICAL │ fixed        │
│ glibc        │ CVE-2023-xxxxx │ HIGH     │ fixed        │
│ ... (226 more rows)                                     │
└──────────────┴────────────────┴──────────┴──────────────┘

After:

payment-api:after (distroless)
Total: 0 (CRITICAL: 0, HIGH: 0)

Node.js (package-lock.json)
Total: 2 (MEDIUM: 2)

┌──────────────┬────────────────┬──────────┬─────────┬──────────────┐
│   Library    │ Vulnerability  │ Severity │ Version │ Fixed Version│
├──────────────┼────────────────┼──────────┼─────────┼──────────────┤
│ semver       │ CVE-2022-25883 │ MEDIUM   │ 7.3.7   │ 7.5.2        │
│ json5        │ CVE-2022-46175 │ MEDIUM   │ 1.0.1   │ 1.0.2        │
└──────────────┴────────────────┴──────────┴─────────┴──────────────┘

CI Pipeline Integration

The final CI steps for every build:

- name: Build and push
  id: build
  uses: docker/build-push-action@v5
  with:
    context: .
    push: true
    tags: ghcr.io/org/payment-api:${{ github.sha }}
    cache-from: type=gha
    cache-to: type=gha,mode=max

- name: Trivy scan
  uses: aquasecurity/trivy-action@master
  with:
    image-ref: ghcr.io/org/payment-api:${{ github.sha }}
    exit-code: 1
    severity: CRITICAL,HIGH
    ignore-unfixed: true

- name: Check image size budget
  run: |
    docker pull ghcr.io/org/payment-api:${{ github.sha }}
    SIZE=$(docker image inspect ghcr.io/org/payment-api:${{ github.sha }} --format='{{.Size}}')
    SIZE_MB=$((SIZE / 1024 / 1024))
    echo "Image size: ${SIZE_MB}MB"
    if [ "$SIZE_MB" -gt 150 ]; then
      echo "::error::Image size exceeds 150MB budget"
      exit 1
    fi

- name: Sign image
  run: cosign sign --yes ghcr.io/org/payment-api@${{ steps.build.outputs.digest }}

- name: Generate SBOM
  uses: anchore/sbom-action@v0
  with:
    image: ghcr.io/org/payment-api:${{ github.sha }}
    format: spdx-json
    output-file: sbom.spdx.json



Docker Image Hardening for Production — Distroless, Non-Root Users, and Layer Optimization
