Most teams agree that automated tests are valuable. Far fewer teams write the tests before the implementation. The gap between those two positions is where the majority of preventable defects live.
We practice test-driven development across our engineering teams: in Flutter mobile applications, NestJS backend services, and infrastructure automation. This is not a theoretical preference. It is a deliberate process decision based on measurable outcomes: fewer production defects, more maintainable codebases, and faster onboarding for new engineers who can read the test suite as living documentation.
This post describes how TDD works in practice, where it helps the most, and where it does not apply.
The Red-Green-Refactor Cycle
TDD follows a tight loop: write a failing test, write the minimum code to make it pass, then refactor while keeping the test green. Each iteration is small, often under five minutes.
Here is a concrete example. Suppose we are building an authentication service that validates user credentials and returns a session token.
Step 1: Red – Write a Failing Test
Before writing any implementation, we write a test that describes the behavior we want:
test('returns a session token when credentials are valid', () async {
  final authService = AuthService(
    userRepository: FakeUserRepository(
      users: [User(email: '[email protected]', passwordHash: hashOf('correct-password'))],
    ),
    tokenGenerator: FakeTokenGenerator(fixedToken: 'abc-123'),
  );

  final result = await authService.authenticate(
    email: '[email protected]',
    password: 'correct-password',
  );

  expect(result, equals(AuthResult.success(token: 'abc-123')));
});
This test will not compile. AuthService, AuthResult, FakeUserRepository, and FakeTokenGenerator do not exist yet. That is the point. We have defined the interface before writing the implementation.
Step 2: Green – Make It Pass
Now we write the minimum code required to pass this test. We create the AuthService class, define the AuthResult type, and implement just enough logic to verify the password and return the token. No session management, no rate limiting, no audit logging; those are separate test cases.
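To make the Green step concrete, here is a minimal sketch of the service, written in TypeScript for brevity (the walkthrough's tests are in Dart, but the shape is identical). The interfaces and the hashOf helper are illustrative assumptions, not production code:

```typescript
// Illustrative Green-step sketch (assumed names; not production code).
// Just enough logic to satisfy the first test: look up the user,
// compare password hashes, and return a token.

type User = { email: string; passwordHash: string };

interface UserRepository {
  findByEmail(email: string): User | null;
}

interface TokenGenerator {
  generate(): string;
}

type AuthResult =
  | { kind: 'success'; token: string }
  | { kind: 'invalidCredentials' };

// Stand-in for a real password hash; any deterministic function works here.
const hashOf = (password: string): string => `hashed:${password}`;

class AuthService {
  constructor(
    private readonly users: UserRepository,
    private readonly tokens: TokenGenerator,
  ) {}

  authenticate(email: string, password: string): AuthResult {
    const user = this.users.findByEmail(email);
    if (user === null || user.passwordHash !== hashOf(password)) {
      return { kind: 'invalidCredentials' };
    }
    return { kind: 'success', token: this.tokens.generate() };
  }
}
```

Note what is missing on purpose: no sessions, no rate limiting, no logging. The next failing test will drive each of those.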
Step 3: Refactor
With a passing test, we can restructure the code with confidence. Maybe we extract the password verification into a dedicated PasswordVerifier class, or we change AuthResult from a class hierarchy to a sealed union. The test tells us immediately if the refactoring breaks anything.
We then write the next test, perhaps for invalid credentials:
test('returns failure when password does not match', () async {
  final authService = AuthService(
    userRepository: FakeUserRepository(
      users: [User(email: '[email protected]', passwordHash: hashOf('correct-password'))],
    ),
    tokenGenerator: FakeTokenGenerator(fixedToken: 'abc-123'),
  );

  final result = await authService.authenticate(
    email: '[email protected]',
    password: 'wrong-password',
  );

  expect(result, equals(AuthResult.invalidCredentials()));
});
Each test adds one new behavior. After ten or fifteen iterations, we have a fully specified authentication service with comprehensive test coverage, and every edge case is documented in the test file.
How TDD Drives Better API Design
Writing the test first forces you to become the caller before you become the implementer. This inversion is the single most important design benefit of TDD.
When you write the implementation first, you tend to expose whatever internal structure felt convenient during development. When you write the test first, you immediately encounter questions that shape the public API:
- What arguments does this function need? If the test setup requires passing eight parameters, the interface is too complex. You will simplify it before writing a single line of production code.
- What does the return type look like? Returning a raw Map<String, dynamic> feels fine during implementation. When you write expect(result.token, equals('abc-123')) in a test, you realize you need a typed response object.
- How does error handling work? Writing expect(() => service.authenticate(...), throwsA(isA<RateLimitException>())) forces you to decide on the error contract upfront rather than letting exceptions leak from implementation details.
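The return-type question can be made concrete. Here is a small TypeScript sketch of the shift the test forces; the names and values are hypothetical:

```typescript
// Hypothetical sketch: the return type a test-first workflow tends to produce.

// Implementation-first habit: a raw map with stringly-typed keys.
// Callers must remember that the token lives under 'token'.
const rawResult: Record<string, unknown> = { token: 'abc-123', expiresIn: 3600 };

// Writing the assertion first pushes you toward a typed result object
// that the test (and every later caller) can read directly.
class SessionResult {
  constructor(
    readonly token: string,
    readonly expiresInSeconds: number,
  ) {}
}

const result = new SessionResult('abc-123', 3600);
// The assertion now reads naturally and is checked by the compiler:
// expect(result.token).toEqual('abc-123');
```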
In our experience, services designed test-first have more consistent interfaces, fewer breaking changes during development, and clearer separation of concerns. The test acts as the first client of your API. If it is awkward to test, it will be awkward to use.
TDD at Different Test Layers
TDD is not limited to unit tests. The red-green-refactor cycle applies at multiple levels of the testing pyramid, though the cost and speed differ at each layer.
Unit Tests – Isolated Business Logic
Unit tests are where TDD delivers the most value per minute invested. They run in milliseconds, require no external dependencies, and give precise feedback about which behavior broke.
We use fakes and stubs (not mocks, where possible) to isolate the unit under test. In the authentication example above, FakeUserRepository is a simple in-memory implementation of the repository interface. This approach keeps tests fast and readable without coupling them to a mocking framework's API.
The target for unit tests is pure business logic: validation rules, state machines, data transformations, and domain calculations. If a class has no dependencies on databases, network calls, or file systems, it should be tested at this level.
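As an example of the kind of pure logic we target at this level, here is a hypothetical validation rule of the sort that gets test-driven one assertion at a time; the policy itself is invented for illustration:

```typescript
// Hypothetical validation rule: pure business logic with no I/O,
// no framework, and no external dependencies.

function validatePassword(password: string): string[] {
  const errors: string[] = [];
  if (password.length < 12) errors.push('too short');
  if (!/[0-9]/.test(password)) errors.push('needs a digit');
  if (!/[A-Z]/.test(password)) errors.push('needs an uppercase letter');
  return errors;
}
```

Because it touches nothing outside its arguments, hundreds of cases like this run in milliseconds on every change.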
Integration Tests – Boundary Verification
Integration tests verify that components interact correctly across boundaries: database queries return expected results, HTTP clients serialize requests properly, message queues deliver payloads in the right format.
TDD for integration tests follows the same cycle but with a longer feedback loop. We write the test first, describing the expected interaction, then implement the adapter or repository that makes it pass. These tests typically run against a real database (often a Docker container spun up in CI) rather than an in-memory fake.
For example, when building a PostgreSQL-backed user repository, we write:
it('returns null when the user does not exist', async () => {
  const repo = new PostgresUserRepository(testDataSource);

  const user = await repo.findByEmail('[email protected]');

  expect(user).toBeNull();
});
This test drives the actual SQL query implementation and catches issues like incorrect column mappings or missing null handling that unit tests with fakes would miss.
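For illustration, the repository that test might drive could look like the sketch below. The QueryRunner interface is an assumption standing in for the real data source; the point is the explicit null handling and column mapping the integration test pins down:

```typescript
// Illustrative repository sketch (assumed names). The query runner is
// injected so the persistence boundary stays explicit and swappable.

type Row = Record<string, unknown>;

interface QueryRunner {
  query(sql: string, params: unknown[]): Promise<Row[]>;
}

class PostgresUserRepository {
  constructor(private readonly db: QueryRunner) {}

  async findByEmail(email: string): Promise<{ id: string; email: string } | null> {
    const rows = await this.db.query(
      'SELECT id, email FROM users WHERE email = $1',
      [email],
    );
    // Explicit null handling: exactly the behavior the failing test demanded.
    if (rows.length === 0) return null;
    return { id: String(rows[0]['id']), email: String(rows[0]['email']) };
  }
}
```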
End-to-End Tests – User Journey Validation
E2E tests verify complete user workflows: a user signs up, receives a confirmation email, logs in, and sees their dashboard. These are the slowest and most expensive tests, so we write fewer of them and reserve them for critical paths.
TDD is still applicable here, but the cycle is longer. We define the expected user journey in a test, then build the features to make it pass. In Flutter, we use integration tests with IntegrationTestWidgetsFlutterBinding. For web applications, we use Playwright. The failing E2E test serves as the acceptance criterion for the feature.
Addressing Common Objections
"TDD Is Slower"
This is partially true in the short term. Empirical studies support it, and they also support the trade-off being worth it.
A study conducted across multiple teams at Microsoft and IBM, published in the journal Empirical Software Engineering (Nagappan et al., 2008), measured the impact of TDD on four product teams. The results showed a 40–90% reduction in pre-release defect density compared to similar projects that did not use TDD. The corresponding increase in development time was 15–35%.
The arithmetic favors TDD in any project that will be maintained for more than a few months. A 25% increase in initial development time is easily recovered when you are not spending weeks debugging integration failures, writing regression tests after the fact, or rewriting brittle code that nobody dares refactor.
The overhead also decreases with practice. Engineers who have written tests first for six months are measurably faster than they were at the start: the design thinking that TDD requires becomes habitual.
"TDD Doesn't Work for UI"
This objection conflates visual layout with UI behavior. TDD is not effective for tweaking padding and color values. It is effective for testing widget behavior, state transitions, and conditional rendering.
In Flutter, WidgetTester allows us to write tests that pump widgets, tap buttons, enter text, and verify the resulting widget tree:
testWidgets('shows error message when login fails', (tester) async {
  await tester.pumpWidget(
    MaterialApp(home: LoginScreen(authService: FakeFailingAuthService())),
  );

  await tester.enterText(find.byKey(Key('email-field')), '[email protected]');
  await tester.enterText(find.byKey(Key('password-field')), 'wrong');
  await tester.tap(find.byKey(Key('login-button')));
  await tester.pumpAndSettle();

  expect(find.text('Invalid credentials'), findsOneWidget);
});
This test drives the implementation of the LoginScreen widget. It does not care about fonts or spacing โ it verifies that the widget responds correctly to user input and displays the right feedback.
Golden tests (screenshot comparison) can catch visual regressions, but they are fragile across platforms and should not be the primary testing strategy. We use them sparingly for components where pixel-level accuracy matters, like charts or branded layouts.
"We'll Add Tests Later"
In our experience, this almost never happens. Code written without tests tends to be structurally untestable โ it mixes business logic with I/O, uses concrete dependencies instead of abstractions, and relies on global state.
Retrofitting tests onto untestable code requires refactoring the code first. But without existing tests, that refactoring is risky. This is the testing deadlock: you cannot add tests without refactoring, and you cannot refactor safely without tests. TDD avoids this trap entirely by ensuring every piece of code is testable from the start, because it was literally written to satisfy a test.
There is also a prioritization problem. When a feature ships without tests, the next task is always another feature, not writing tests for the last one. The test backlog grows monotonically. We have inherited codebases with zero test coverage that took months to stabilize; that time would have been saved if TDD had been the default from the beginning.
TDD in Flutter
Flutter's testing tools make TDD practical across the full widget tree.
Unit tests cover business logic, data models, and utility functions. These are plain Dart tests with no Flutter dependency.
Widget tests use WidgetTester to render individual widgets in isolation. They run on a headless test environment and execute in seconds. We test-drive every interactive widget: forms, lists, navigation flows, conditional UI. The test defines what the widget should render given specific inputs and interactions, then we build the widget to satisfy it.
Bloc tests use the bloc_test package to verify state management logic. The pattern maps directly to TDD:
blocTest<AuthBloc, AuthState>(
  'emits [loading, authenticated] when login succeeds',
  build: () => AuthBloc(authService: FakeSuccessAuthService()),
  act: (bloc) => bloc.add(LoginRequested(email: '[email protected]', password: 'pass')),
  expect: () => [AuthLoading(), Authenticated(token: 'abc-123')],
);
The blocTest helper makes the red-green-refactor cycle fast: define the expected state sequence, run the test (red), implement the bloc event handler (green), refactor the internal logic while keeping the state contract stable.
TDD in Backend Services
Our NestJS backend services follow the same discipline. NestJS's dependency injection system makes TDD straightforward: every controller and service receives its dependencies through the constructor, so substituting test doubles is trivial.
A typical controller test looks like this:
describe('UsersController', () => {
  let controller: UsersController;
  let usersService: jest.Mocked<UsersService>;

  beforeEach(async () => {
    const module = await Test.createTestingModule({
      controllers: [UsersController],
      providers: [
        { provide: UsersService, useValue: { findById: jest.fn() } },
      ],
    }).compile();

    controller = module.get(UsersController);
    usersService = module.get(UsersService);
  });

  it('returns a user when found', async () => {
    usersService.findById.mockResolvedValue({ id: '1', name: 'Alice' });

    const result = await controller.getUser('1');

    expect(result).toEqual({ id: '1', name: 'Alice' });
  });

  it('throws NotFoundException when user does not exist', async () => {
    usersService.findById.mockResolvedValue(null);

    await expect(controller.getUser('999')).rejects.toThrow(NotFoundException);
  });
});
We write these tests before implementing the controller methods. The test defines the HTTP contract (what the endpoint returns for valid input, what error it throws for missing resources), and the implementation follows.
For database access, we use the repository pattern to isolate persistence logic behind an interface. Repository implementations are tested with integration tests against a real PostgreSQL instance. Service and controller tests use in-memory fakes or mocks of the repository interface, keeping them fast and deterministic.
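A minimal sketch of that layering, with hypothetical names: the interface is what services depend on, and the in-memory fake is what their unit tests receive in place of the real database:

```typescript
// Hypothetical repository interface: services depend on this abstraction,
// never on a concrete database client.

interface OrderRepository {
  save(order: { id: string; total: number }): Promise<void>;
  findById(id: string): Promise<{ id: string; total: number } | null>;
}

// In-memory fake used by fast, deterministic service and controller tests.
class InMemoryOrderRepository implements OrderRepository {
  private readonly orders = new Map<string, { id: string; total: number }>();

  async save(order: { id: string; total: number }): Promise<void> {
    this.orders.set(order.id, order);
  }

  async findById(id: string): Promise<{ id: string; total: number } | null> {
    return this.orders.get(id) ?? null;
  }
}
```

The PostgreSQL implementation of the same interface is exercised separately by the integration tests described above.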
When TDD Is Less Effective
TDD is a tool, not a dogma. There are situations where writing tests first adds friction without proportional benefit.
Exploratory prototyping. When we are investigating whether an approach is feasible (experimenting with a new API, testing a library's behavior, or spiking a proof of concept), writing tests first slows down the exploration. We prototype without tests, then rewrite with TDD once the approach is validated. The key discipline is that prototype code does not ship. If it is worth building, it is worth building again with tests.
Pure visual layout. Adjusting CSS grid configurations, tweaking spacing values, or selecting color palettes are visual decisions verified by looking at the screen, not by assertions in code. Widget behavior is testable; widget aesthetics are not.
One-off scripts and migrations. A data migration script that runs once and is discarded does not benefit from the TDD cycle. We still test critical transformations manually, but investing in a full test suite for throwaway code is not a good use of time.
Third-party integration spikes. When integrating with a poorly documented external API, the first step is usually exploratory: sending requests, inspecting responses, and understanding the actual behavior. Once the API's behavior is understood, we write tests that codify those expectations and then build the integration adapter using TDD.
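A sketch of what "codifying expectations" can look like, with an invented payload shape standing in for whatever the exploration actually revealed:

```typescript
// Hypothetical adapter for an external API. During the spike we observed
// (in this invented scenario) that the rate arrives as a string under
// short keys; the parser codifies that observation.

interface ExchangeRate {
  currency: string;
  rate: number;
}

function parseRateResponse(body: string): ExchangeRate {
  const json = JSON.parse(body) as { cur?: string; val?: string };
  // Fail loudly if the external API changes shape under us.
  if (json.cur === undefined || json.val === undefined) {
    throw new Error('unexpected response shape');
  }
  return { currency: json.cur, rate: Number(json.val) };
}
```

A recorded response from the spike becomes the fixture for the first failing test, and the adapter is then built red-green-refactor like any other code.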
The Long-Term Return
The value of TDD compounds over time. At month one, you have a tested authentication service. At month twelve, you have a codebase where every module has a corresponding test file that documents its behavior, every edge case is captured in an assertion, and any engineer can refactor with confidence because the test suite will catch regressions in seconds.
The alternative, a codebase with sparse or absent test coverage, becomes progressively harder to change. Engineers lose confidence. Deployments become stressful. Bug fixes introduce new bugs. The cost of change increases until the team spends more time working around the code than working on the product.
TDD is not free. It requires discipline, practice, and organizational commitment. But in our experience, the engineering teams that practice TDD consistently ship more reliable software, onboard new developers faster, and maintain velocity over the long term. The initial investment in writing tests first pays for itself within the first few months of a project's life.