Most teams agree that automated tests are valuable. Far fewer teams write the tests before the implementation. The gap between those two positions is where the majority of preventable defects live.
We practice test-driven development across our engineering teams: in Flutter mobile applications, NestJS backend services, and infrastructure automation. This is not a theoretical preference. It is a deliberate process decision based on measurable outcomes: fewer production defects, more maintainable codebases, and faster onboarding for new engineers who can read the test suite as living documentation.
This post describes how TDD works in practice, where it helps the most, and where it does not apply.
The Red-Green-Refactor Cycle
TDD follows a tight loop: write a failing test, write the minimum code to make it pass, then refactor while keeping the test green. Each iteration is small, often under five minutes.
Here is a concrete example. Suppose we are building an authentication service that validates user credentials and returns a session token.
Step 1: Red – Write a Failing Test
Before writing any implementation, we write a test that describes the behavior we want:
test('returns a session token when credentials are valid', () async {
  final authService = AuthService(
    userRepository: FakeUserRepository(
      users: [User(email: '[email protected]', passwordHash: hashOf('correct-password'))],
    ),
    tokenGenerator: FakeTokenGenerator(fixedToken: 'abc-123'),
  );

  final result = await authService.authenticate(
    email: '[email protected]',
    password: 'correct-password',
  );

  expect(result, equals(AuthResult.success(token: 'abc-123')));
});
This test will not compile. AuthService, AuthResult, FakeUserRepository, and FakeTokenGenerator do not exist yet. That is the point. We have defined the interface before writing the implementation.
Step 2: Green – Make It Pass
Now we write the minimum code required to pass this test. We create the AuthService class, define the AuthResult type, and implement just enough logic to verify the password and return the token. No session management, no rate limiting, no audit logging; those are separate test cases.
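To make the Green step concrete, here is a minimal sketch of the service, written in TypeScript for brevity (the walkthrough's tests are in Dart, but the shape is identical). The interfaces and the hashOf helper are illustrative assumptions, not production code:

```typescript
// Illustrative Green-step sketch (assumed names; not production code).
// Just enough logic to satisfy the first test: look up the user,
// compare password hashes, and return a token.

type User = { email: string; passwordHash: string };

interface UserRepository {
  findByEmail(email: string): User | null;
}

interface TokenGenerator {
  generate(): string;
}

type AuthResult =
  | { kind: 'success'; token: string }
  | { kind: 'invalidCredentials' };

// Stand-in for a real password hash; any deterministic function works here.
const hashOf = (password: string): string => `hashed:${password}`;

class AuthService {
  constructor(
    private readonly users: UserRepository,
    private readonly tokens: TokenGenerator,
  ) {}

  authenticate(email: string, password: string): AuthResult {
    const user = this.users.findByEmail(email);
    if (user === null || user.passwordHash !== hashOf(password)) {
      return { kind: 'invalidCredentials' };
    }
    return { kind: 'success', token: this.tokens.generate() };
  }
}
```

Note what is missing on purpose: no sessions, no rate limiting, no logging. The next failing test will drive each of those.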
Step 3: Refactor
With a passing test, we can restructure the code with confidence. Maybe we extract the password verification into a dedicated PasswordVerifier class, or we change AuthResult from a class hierarchy to a sealed union. The test tells us immediately if the refactoring breaks anything.
We then write the next test, perhaps for invalid credentials:
test('returns failure when password does not match', () async {
  final authService = AuthService(
    userRepository: FakeUserRepository(
      users: [User(email: '[email protected]', passwordHash: hashOf('correct-password'))],
    ),
    tokenGenerator: FakeTokenGenerator(fixedToken: 'abc-123'),
  );

  final result = await authService.authenticate(
    email: '[email protected]',
    password: 'wrong-password',
  );

  expect(result, equals(AuthResult.invalidCredentials()));
});
Each test adds one new behavior. After ten or fifteen iterations, we have a fully specified authentication service with comprehensive test coverage, and every edge case is documented in the test file.
How TDD Drives Better API Design
Writing the test first forces you to become the caller before you become the implementer. This inversion is the single most important design benefit of TDD.
When you write the implementation first, you tend to expose whatever internal structure felt convenient during development. When you write the test first, you immediately encounter questions that shape the public API:
- What arguments does this function need? If the test setup requires passing eight parameters, the interface is too complex. You will simplify it before writing a single line of production code.
- What does the return type look like? Returning a raw Map<String, dynamic> feels fine during implementation. When you write expect(result.token, equals('abc-123')) in a test, you realize you need a typed response object.
- How does error handling work? Writing expect(() => service.authenticate(...), throwsA(isA<RateLimitException>())) forces you to decide on the error contract upfront rather than letting exceptions leak from implementation details.
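The return-type question can be made concrete. Here is a small TypeScript sketch of the shift the test forces; the names and values are hypothetical:

```typescript
// Hypothetical sketch: the return type a test-first workflow tends to produce.

// Implementation-first habit: a raw map with stringly-typed keys.
// Callers must remember that the token lives under 'token'.
const rawResult: Record<string, unknown> = { token: 'abc-123', expiresIn: 3600 };

// Writing the assertion first pushes you toward a typed result object
// that the test (and every later caller) can read directly.
class SessionResult {
  constructor(
    readonly token: string,
    readonly expiresInSeconds: number,
  ) {}
}

const result = new SessionResult('abc-123', 3600);
// The assertion now reads naturally and is checked by the compiler:
// expect(result.token).toEqual('abc-123');
```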
In our experience, services designed test-first have more consistent interfaces, fewer breaking changes during development, and clearer separation of concerns. The test acts as the first client of your API. If it is awkward to test, it will be awkward to use.
TDD at Different Test Layers
TDD is not limited to unit tests. The red-green-refactor cycle applies at multiple levels of the testing pyramid, though the cost and speed differ at each layer.
Unit Tests – Isolated Business Logic
Unit tests are where TDD delivers the most value per minute invested. They run in milliseconds, require no external dependencies, and give precise feedback about which behavior broke.
We use fakes and stubs (not mocks, where possible) to isolate the unit under test. In the authentication example above, FakeUserRepository is a simple in-memory implementation of the repository interface. This approach keeps tests fast and readable without coupling them to a mocking framework's API.
The target for unit tests is pure business logic: validation rules, state machines, data transformations, and domain calculations. If a class has no dependencies on databases, network calls, or file systems, it should be tested at this level.
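As an example of the kind of pure logic we target at this level, here is a hypothetical validation rule of the sort that gets test-driven one assertion at a time; the policy itself is invented for illustration:

```typescript
// Hypothetical validation rule: pure business logic with no I/O,
// no framework, and no external dependencies.

function validatePassword(password: string): string[] {
  const errors: string[] = [];
  if (password.length < 12) errors.push('too short');
  if (!/[0-9]/.test(password)) errors.push('needs a digit');
  if (!/[A-Z]/.test(password)) errors.push('needs an uppercase letter');
  return errors;
}
```

Because it touches nothing outside its arguments, hundreds of cases like this run in milliseconds on every change.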
Integration Tests – Boundary Verification
Integration tests verify that components interact correctly across boundaries: database queries return expected results, HTTP clients serialize requests properly, message queues deliver payloads in the right format.
TDD for integration tests follows the same cycle but with a longer feedback loop. We write the test first, describing the expected interaction, then implement the adapter or repository that makes it pass. These tests typically run against a real database (often a Docker container spun up in CI) rather than an in-memory fake.
For example, when building a PostgreSQL-backed user repository, we write:
it('returns null when the user does not exist', async () => {
  const repo = new PostgresUserRepository(testDataSource);

  const user = await repo.findByEmail('[email protected]');

  expect(user).toBeNull();
});
This test drives the actual SQL query implementation and catches issues like incorrect column mappings or missing null handling that unit tests with fakes would miss.
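For illustration, the repository that test might drive could look like the sketch below. The QueryRunner interface is an assumption standing in for the real data source; the point is the explicit null handling and column mapping the integration test pins down:

```typescript
// Illustrative repository sketch (assumed names). The query runner is
// injected so the persistence boundary stays explicit and swappable.

type Row = Record<string, unknown>;

interface QueryRunner {
  query(sql: string, params: unknown[]): Promise<Row[]>;
}

class PostgresUserRepository {
  constructor(private readonly db: QueryRunner) {}

  async findByEmail(email: string): Promise<{ id: string; email: string } | null> {
    const rows = await this.db.query(
      'SELECT id, email FROM users WHERE email = $1',
      [email],
    );
    // Explicit null handling: exactly the behavior the failing test demanded.
    if (rows.length === 0) return null;
    return { id: String(rows[0]['id']), email: String(rows[0]['email']) };
  }
}
```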
End-to-End Tests – User Journey Validation
E2E tests verify complete user workflows: a user signs up, receives a confirmation email, logs in, and sees their dashboard. These are the slowest and most expensive tests, so we write fewer of them and reserve them for critical paths.
TDD is still applicable here, but the cycle is longer. We define the expected user journey in a test, then build the features to make it pass. In Flutter, we use integration tests with IntegrationTestWidgetsFlutterBinding. For web applications, we use Playwright. The failing E2E test serves as the acceptance criterion for the feature.
Addressing Common Objections
"TDD Is Slower"
This is partially true in the short term. Empirical studies support it, and they also support the trade-off being worth it.
A study conducted across multiple teams at Microsoft and IBM, published in the journal Empirical Software Engineering (Nagappan et al., 2008), measured the impact of TDD on four product teams. The results showed a 40–90% reduction in pre-release defect density compared to similar projects that did not use TDD. The corresponding increase in development time was 15–35%.
The arithmetic favors TDD in any project that will be maintained for more than a few months. A 25% increase in initial development time is easily recovered when you are not spending weeks debugging integration failures, writing regression tests after the fact, or rewriting brittle code that nobody dares refactor.
The overhead also decreases with practice. Engineers who have written tests first for six months are measurably faster than they were at the start: the design thinking that TDD requires becomes habitual.
"TDD Doesn't Work for UI"
This objection conflates visual layout with UI behavior. TDD is not effective for tweaking padding and color values. It is effective for testing widget behavior, state transitions, and conditional rendering.
In Flutter, WidgetTester allows us to write tests that pump widgets, tap buttons, enter text, and verify the resulting widget tree:
testWidgets('shows error message when login fails', (tester) async {
  await tester.pumpWidget(
    MaterialApp(home: LoginScreen(authService: FakeFailingAuthService())),
  );

  await tester.enterText(find.byKey(Key('email-field')), '[email protected]');
  await tester.enterText(find.byKey(Key('password-field')), 'wrong');
  await tester.tap(find.byKey(Key('login-button')));
  await tester.pumpAndSettle();

  expect(find.text('Invalid credentials'), findsOneWidget);
});
This test drives the implementation of the LoginScreen widget. It does not care about fonts or spacing โ it verifies that the widget responds correctly to user input and displays the right feedback.
Golden tests (screenshot comparison) can catch visual regressions, but they are fragile across platforms and should not be the primary testing strategy. We use them sparingly for components where pixel-level accuracy matters, like charts or branded layouts.
"We'll Add Tests Later"
In our experience, this almost never happens. Code written without tests tends to be structurally untestable โ it mixes business logic with I/O, uses concrete dependencies instead of abstractions, and relies on global state.
Retrofitting tests onto untestable code requires refactoring the code first. But without existing tests, that refactoring is risky. This is the testing deadlock: you cannot add tests without refactoring, and you cannot refactor safely without tests. TDD avoids this trap entirely by ensuring every piece of code is testable from the start, because it was literally written to satisfy a test.
There is also a prioritization problem. When a feature ships without tests, the next task is always another feature, not writing tests for the last one. The test backlog grows monotonically. We have inherited codebases with zero test coverage that took months to stabilize; that time would have been saved if TDD had been the default from the beginning.
TDD in Flutter
Flutter's testing tools make TDD practical across the full widget tree.
Unit tests cover business logic, data models, and utility functions. These are plain Dart tests with no Flutter dependency.
Widget tests use WidgetTester to render individual widgets in isolation. They run on a headless test environment and execute in seconds. We test-drive every interactive widget: forms, lists, navigation flows, conditional UI. The test defines what the widget should render given specific inputs and interactions, then we build the widget to satisfy it.
Bloc tests use the bloc_test package to verify state management logic. The pattern maps directly to TDD:
blocTest<AuthBloc, AuthState>(
  'emits [loading, authenticated] when login succeeds',
  build: () => AuthBloc(authService: FakeSuccessAuthService()),
  act: (bloc) => bloc.add(LoginRequested(email: '[email protected]', password: 'pass')),
  expect: () => [AuthLoading(), Authenticated(token: 'abc-123')],
);
The blocTest helper makes the red-green-refactor cycle fast: define the expected state sequence, run the test (red), implement the bloc event handler (green), refactor the internal logic while keeping the state contract stable.
TDD in Backend Services
Our NestJS backend services follow the same discipline. NestJS's dependency injection system makes TDD straightforward: every controller and service receives its dependencies through the constructor, so substituting test doubles is trivial.
A typical controller test looks like this:
describe('UsersController', () => {
  let controller: UsersController;
  let usersService: jest.Mocked<UsersService>;

  beforeEach(async () => {
    const module = await Test.createTestingModule({
      controllers: [UsersController],
      providers: [
        { provide: UsersService, useValue: { findById: jest.fn() } },
      ],
    }).compile();

    controller = module.get(UsersController);
    usersService = module.get(UsersService);
  });

  it('returns a user when found', async () => {
    usersService.findById.mockResolvedValue({ id: '1', name: 'Alice' });

    const result = await controller.getUser('1');

    expect(result).toEqual({ id: '1', name: 'Alice' });
  });

  it('throws NotFoundException when user does not exist', async () => {
    usersService.findById.mockResolvedValue(null);

    await expect(controller.getUser('999')).rejects.toThrow(NotFoundException);
  });
});
We write these tests before implementing the controller methods. The test defines the HTTP contract (what the endpoint returns for valid input, what error it throws for missing resources), and the implementation follows.
For database access, we use the repository pattern to isolate persistence logic behind an interface. Repository implementations are tested with integration tests against a real PostgreSQL instance. Service and controller tests use in-memory fakes or mocks of the repository interface, keeping them fast and deterministic.
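A minimal sketch of that layering, with hypothetical names: the interface is what services depend on, and the in-memory fake is what their unit tests receive in place of the real database:

```typescript
// Hypothetical repository interface: services depend on this abstraction,
// never on a concrete database client.

interface OrderRepository {
  save(order: { id: string; total: number }): Promise<void>;
  findById(id: string): Promise<{ id: string; total: number } | null>;
}

// In-memory fake used by fast, deterministic service and controller tests.
class InMemoryOrderRepository implements OrderRepository {
  private readonly orders = new Map<string, { id: string; total: number }>();

  async save(order: { id: string; total: number }): Promise<void> {
    this.orders.set(order.id, order);
  }

  async findById(id: string): Promise<{ id: string; total: number } | null> {
    return this.orders.get(id) ?? null;
  }
}
```

The PostgreSQL implementation of the same interface is exercised separately by the integration tests described above.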
When TDD Is Less Effective
TDD is a tool, not a dogma. There are situations where writing tests first adds friction without proportional benefit.
Exploratory prototyping. When we are investigating whether an approach is feasible (experimenting with a new API, testing a library's behavior, or spiking a proof of concept), writing tests first slows down the exploration. We prototype without tests, then rewrite with TDD once the approach is validated. The key discipline is that prototype code does not ship. If it is worth building, it is worth building again with tests.
Pure visual layout. Adjusting CSS grid configurations, tweaking spacing values, or selecting color palettes are visual decisions verified by looking at the screen, not by assertions in code. Widget behavior is testable; widget aesthetics are not.
One-off scripts and migrations. A data migration script that runs once and is discarded does not benefit from the TDD cycle. We still test critical transformations manually, but investing in a full test suite for throwaway code is not a good use of time.
Third-party integration spikes. When integrating with a poorly documented external API, the first step is usually exploratory: sending requests, inspecting responses, and understanding the actual behavior. Once the API's behavior is understood, we write tests that codify those expectations and then build the integration adapter using TDD.
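A sketch of what "codifying expectations" can look like, with an invented payload shape standing in for whatever the exploration actually revealed:

```typescript
// Hypothetical adapter for an external API. During the spike we observed
// (in this invented scenario) that the rate arrives as a string under
// short keys; the parser codifies that observation.

interface ExchangeRate {
  currency: string;
  rate: number;
}

function parseRateResponse(body: string): ExchangeRate {
  const json = JSON.parse(body) as { cur?: string; val?: string };
  // Fail loudly if the external API changes shape under us.
  if (json.cur === undefined || json.val === undefined) {
    throw new Error('unexpected response shape');
  }
  return { currency: json.cur, rate: Number(json.val) };
}
```

A recorded response from the spike becomes the fixture for the first failing test, and the adapter is then built red-green-refactor like any other code.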
The Long-Term Return
The value of TDD compounds over time. At month one, you have a tested authentication service. At month twelve, you have a codebase where every module has a corresponding test file that documents its behavior, every edge case is captured in an assertion, and any engineer can refactor with confidence because the test suite will catch regressions in seconds.
The alternative, a codebase with sparse or absent test coverage, becomes progressively harder to change. Engineers lose confidence. Deployments become stressful. Bug fixes introduce new bugs. The cost of change increases until the team spends more time working around the code than working on the product.
TDD is not free. It requires discipline, practice, and organizational commitment. But in our experience, the engineering teams that practice TDD consistently ship more reliable software, onboard new developers faster, and maintain velocity over the long term. The initial investment in writing tests first pays for itself within the first few months of a project's life.