February 4, 2026 · Christian Barra · 9 min read

How to Reduce Flaky Tests: A Practical Guide for Engineering Teams

Eliminate flaky tests that undermine your CI/CD pipeline. Learn proven strategies to identify, fix, and prevent test flakiness in your codebase.

testing · reliability · automation · flaky-tests
[Image: Test stability dashboard showing improved reliability metrics]

Every engineering team has experienced the frustration. A test passes locally but fails in CI. The same test fails on Tuesday but passes on Wednesday. Engineers investigate for hours, find no real bug, and eventually just re-run the pipeline until it passes. Welcome to the world of flaky tests.

Test flakiness is more than an annoyance—it’s a serious threat to engineering productivity and software quality. When tests fail unpredictably, teams lose trust in their test suite. They start ignoring failures, missing real bugs hidden among the noise. CI pipelines become games of chance rather than reliable quality gates.

This guide provides practical strategies to identify, fix, and prevent flaky tests. Whether you’re dealing with a handful of unstable tests or a systemic flakiness problem, these approaches will help you restore trust in your testing infrastructure.

Understanding What Makes Tests Flaky

Before fixing flaky tests, you need to understand why they happen. Test flakiness occurs when a test produces different results on different runs despite no changes to the code under test. The causes fall into several categories.

Timing and Race Conditions

The most common source of flakiness is timing-related issues. Tests that depend on specific execution timing, sleep statements, or operations completing in a certain order will eventually fail when that timing varies.

Consider a UI test that clicks a button and immediately checks for a result. If the JavaScript handling that click runs asynchronously, the check might happen before the handler completes. The test passes when the system is fast and fails when it’s under load.

External Dependencies

Tests that rely on external services, databases, or network resources inherit the reliability of those dependencies. An API that responds slowly once per thousand requests will cause occasional test failures. A database connection that times out under load will flake.

Shared State

When tests share state—database records, file system resources, or global variables—they can interfere with each other. Test A might pass when run alone but fail when Test B runs first and modifies shared data. This creates execution-order dependencies that are difficult to diagnose.

Resource Constraints

Tests that work fine on a developer’s powerful laptop might flake on a resource-constrained CI runner. Memory limits, CPU contention, or disk I/O bottlenecks can cause timeouts or unexpected behavior that only manifests under specific conditions.

Environmental Differences

Differences between environments—time zones, locale settings, operating systems, library versions—can cause inconsistent behavior. A test that relies on date formatting might pass on one system and fail on another.
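As a concrete sketch of the problem: the same date renders differently depending on the machine's locale, so a test that asserts on locale-dependent formatting is only stable if the locale is pinned. (The `"C"` locale here is the portable baseline; the `%x` format is the locale-sensitive piece.)

```python
import datetime
import locale

# %x is locale-dependent: the same date renders as "02/04/26" in the C locale
# but "04.02.2026" under de_DE, so an unpinned test flakes across environments.
d = datetime.date(2026, 2, 4)
locale.setlocale(locale.LC_TIME, "C")  # pin the locale to make the assertion deterministic
assert d.strftime("%x") == "02/04/26"
```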

Identifying Flaky Tests in Your Suite

You can’t fix what you can’t measure. Building visibility into test flakiness is the first step toward addressing it.

Track Test Results Over Time

Collect historical data on every test execution. Track which tests pass, which fail, and how often. Tests with varying outcomes over recent runs without corresponding code changes are flaky. A test that fails 5% of the time is flaky even if it usually passes.

Many CI platforms provide this data, but you may need additional tooling for analysis. Create dashboards showing flakiness rates per test and trends over time.
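If your CI platform doesn't expose this directly, the aggregation is simple to build yourself. A minimal sketch, assuming run history is available as `(test_name, passed)` records:

```python
from collections import defaultdict

def flakiness_rates(history):
    """Compute per-test failure rates from (test_name, passed) run records."""
    totals = defaultdict(lambda: [0, 0])  # name -> [failures, total runs]
    for name, passed in history:
        totals[name][1] += 1
        if not passed:
            totals[name][0] += 1
    return {name: fails / runs for name, (fails, runs) in totals.items()}

# A test failing 1 run in 20 has a 5% flakiness rate -- flaky even though it usually passes.
history = [("test_login", True)] * 19 + [("test_login", False)]
assert flakiness_rates(history)["test_login"] == 0.05
```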

Implement Automatic Retry Detection

Configure your test runner to retry failed tests automatically, then flag tests that pass on retry. These tests are definitionally flaky—they produced different results on the same code. Retries can keep your pipeline moving, but track these occurrences for later investigation.
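Most runners have plugins for this (pytest-rerunfailures for pytest, for example), but the logic itself is small. A sketch of the classification, assuming a test callable that raises `AssertionError` on failure:

```python
def classify_with_retry(test_fn, retries=1):
    """Run a test; a failure followed by a pass on retry marks it flaky."""
    try:
        test_fn()
        return "passed"
    except AssertionError:
        pass
    for _ in range(retries):
        try:
            test_fn()
            return "flaky"  # same code, different result: definitionally flaky
        except AssertionError:
            pass
    return "failed"

# A test that fails its first run but passes on retry is flagged, not hidden.
calls = {"n": 0}
def sometimes_fails():
    calls["n"] += 1
    assert calls["n"] > 1  # fails on the first call only

assert classify_with_retry(sometimes_fails) == "flaky"
```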

Run Tests Multiple Times

For suspected flaky tests, run them repeatedly in isolation. Execute the same test 100 times. If it fails even once, you’ve confirmed flakiness. This approach is particularly useful when triaging reported flakiness.
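The stress loop is trivial to script (pytest users can reach for the pytest-repeat plugin instead). A sketch:

```python
def stress(test_fn, runs=100):
    """Run a test repeatedly on unchanged code; any failure confirms flakiness."""
    failures = 0
    for _ in range(runs):
        try:
            test_fn()
        except AssertionError:
            failures += 1
    return failures

def always_stable():
    assert 1 + 1 == 2

assert stress(always_stable) == 0  # a deterministic test never fails across 100 runs
```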

Monitor Failure Patterns

Look for patterns in when failures occur. Tests that only fail on certain CI runners might indicate resource constraints. Tests that fail at specific times might depend on time-zone-sensitive code. Tests that fail after other specific tests might have shared state issues.

Fixing Common Causes of Flakiness

Once you’ve identified flaky tests, systematic debugging helps resolve them efficiently.

Replace Sleep with Explicit Waits

Arbitrary sleep statements are a leading cause of test flakiness. A 2-second sleep might work on fast systems but fail on slower ones. If the operation sometimes takes 3 seconds, the test breaks.

Replace sleeps with explicit waits that poll for expected conditions:

// Flaky: arbitrary sleep
await page.click('#submit-button');
await sleep(2000);
expect(await page.getText('#result')).toBe('Success');

// Stable: explicit wait for condition
await page.click('#submit-button');
await page.waitForSelector('#result:has-text("Success")', {
  timeout: 10000,
});

Explicit waits return as soon as the condition is met, making tests both more reliable and faster on average.
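The same pattern applies outside the browser. When your framework doesn't provide a built-in wait, a small polling helper covers any condition — this is a generic sketch, not a specific library's API:

```python
import time

def wait_for(condition, timeout=10.0, interval=0.1):
    """Poll condition() until it returns a truthy value or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # return as soon as the condition holds
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        time.sleep(interval)

# Succeeds on the third poll instead of sleeping a fixed 2 seconds.
polls = iter([None, None, "Success"])
assert wait_for(lambda: next(polls), interval=0.01) == "Success"
```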

Isolate External Dependencies

For tests that must interact with external services, consider these approaches:

Mock external calls: Use test doubles that return controlled responses. This eliminates network variability and external service unreliability.

Use containers: Run dependencies locally in containers. A local PostgreSQL container is more reliable than a shared staging database.

Implement retry logic: When testing integration with genuinely external services, build retry logic into the test infrastructure. Allow a few retries with exponential backoff for network transients.
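A minimal sketch of that retry wrapper, assuming transient failures surface as `ConnectionError`:

```python
import time

def call_with_backoff(call, attempts=3, base_delay=0.5):
    """Retry a genuinely external call, backing off exponentially on transient errors."""
    for attempt in range(attempts):
        try:
            return call()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the real failure
            time.sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...

# A service that drops the first two requests still yields a passing test.
state = {"calls": 0}
def flaky_service():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("transient network blip")
    return "ok"

assert call_with_backoff(flaky_service, base_delay=0.01) == "ok"
```

Keep the retry budget small: masking more than a couple of transient failures hides genuine outages.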

Eliminate Shared State

Each test should set up its own data and not depend on state from other tests. Implement proper test isolation:

Use unique identifiers: Generate unique IDs for test data to avoid conflicts with parallel tests.

Clean up after tests: Remove created data in teardown steps, or use transaction rollback to restore database state.

Avoid global state: Don’t rely on module-level variables that persist across tests. Initialize state explicitly in each test.

# Problematic: relies on previous test state
def test_get_user():
    user = get_user_by_email("[email protected]")  # Created by previous test?
    assert user is not None

# Better: creates own data
def test_get_user():
    user = UserFactory.create(email="[email protected]")
    fetched = get_user_by_email(user.email)
    assert fetched.id == user.id

Handle Asynchronous Operations

Modern applications are highly asynchronous. Tests must account for this:

Wait for network activity to settle: Before asserting on page content, wait for pending API calls to complete.

Use async-aware assertions: Many testing frameworks offer assertions that retry until passing or timeout. Use these for checking conditions that might take time to become true.

Avoid timing assumptions: Don’t assume operations complete in any particular order unless the application explicitly guarantees it.
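If your framework lacks a built-in retrying assertion, the idea is easy to sketch yourself — keep re-evaluating the assertion until it passes or a deadline hits:

```python
import time

def assert_eventually(predicate, timeout=5.0, interval=0.05):
    """Retry an assertion until it passes or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            assert predicate()
            return
        except AssertionError:
            if time.monotonic() >= deadline:
                raise  # deadline passed: report the real failure
            time.sleep(interval)

# An async result that lands "eventually" is asserted without a fixed sleep.
results = iter([False, False, True])
assert_eventually(lambda: next(results), interval=0.01)
```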

Stabilize Time-Dependent Tests

Tests that depend on the current time are inherently unstable. A test that passes at 11 AM might fail at 11:59 AM if it crosses a date boundary.

Mock the clock: Use time-mocking utilities to control what time the application sees during tests. Libraries like freezegun (Python) or sinon (JavaScript) make this straightforward.

Avoid boundary conditions: If mocking isn’t practical, avoid testing near boundaries. Don’t test date logic at 11:59 PM or time zone logic at UTC offset boundaries.
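Libraries like freezegun do this by patching the clock globally; the underlying idea is just dependency injection. A stdlib-only sketch, where `is_end_of_day` stands in for any time-dependent application code:

```python
import datetime

def is_end_of_day(clock=datetime.datetime.now):
    """Hypothetical app helper: accept an injectable clock so tests control time."""
    now = clock()
    return now.hour == 23 and now.minute == 59

# Pin the clock to the boundary instead of hoping the test never runs at 11:59 PM.
frozen = lambda: datetime.datetime(2026, 2, 4, 23, 59, 30)
assert is_end_of_day(clock=frozen) is True
assert is_end_of_day(clock=lambda: datetime.datetime(2026, 2, 4, 12, 0)) is False
```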

Preventing Future Flakiness

Fixing existing flaky tests is essential, but preventing new ones saves more effort long-term.

Establish Coding Standards

Document and enforce patterns that produce stable tests. Prohibit arbitrary sleeps. Require explicit waits. Mandate test isolation. Code review should catch violations before they merge.

Quarantine Flaky Tests

When flakiness is discovered, immediately quarantine the affected test. Move it to a separate suite that runs but doesn’t block deployment. This prevents flaky tests from disrupting the pipeline while investigation continues.

Set time limits for quarantined tests. If a test isn’t fixed within a defined period, decide: fix it, delete it, or accept that it’s a monitoring test rather than a gate.

Invest in Test Infrastructure

Flakiness often stems from inadequate infrastructure. Consistent CI environments, sufficient resources, and reliable test data pipelines all reduce environmental causes of flakiness.

Consider dedicated test databases rather than shared staging environments. Use container-based CI runners for consistency. Ensure CI machines have resources comparable to development machines.

Build Flakiness Detection Into CI

Configure your CI system to detect and report flakiness automatically. Some approaches:

Auto-retry with tagging: Retry failed tests once, but tag tests that needed retry. Track these in dashboards.

Periodic stability runs: Regularly run your entire suite multiple times on unchanged code. Any failures indicate flakiness.

Trend alerting: Alert when flakiness metrics cross thresholds. Don’t wait until the problem is severe.


Advanced Strategies for Stubborn Flakiness

Some flakiness resists basic fixes. These advanced strategies address harder problems.

Reproduce with Deterministic Builds

When flakiness is hard to reproduce, focus on eliminating environmental variables. Use containerized builds with locked dependencies. Pin library versions exactly. Control environment variables. The goal is bit-for-bit identical execution environments.

Profile Resource Usage

Flakiness under load often indicates resource exhaustion. Profile test execution to identify memory leaks, connection pool exhaustion, or CPU spikes. Tools like memory profilers and connection monitors reveal issues invisible in normal execution.

Implement Chaos Testing

Deliberately introduce failures to find tests that can’t handle them. Slow down network calls, inject random delays, or kill processes unexpectedly. Tests that handle chaos gracefully are robust; tests that break reveal hidden fragility.

Consider AI-Powered Solutions

Modern AI testing tools can help identify and resolve flakiness:

Self-healing locators: AI can identify when element selectors fail due to minor UI changes and automatically update them.

Intelligent waits: AI can learn application behavior patterns and wait for the right conditions without explicit configuration.

Pattern detection: Machine learning can identify failure patterns humans might miss, suggesting root causes for investigation.


Measuring Progress

Reducing flakiness is a continuous process. Track metrics to ensure you’re making progress.

Flakiness Rate

Calculate the percentage of test runs that exhibit flaky behavior. A test with a 95% pass rate has a 5% flakiness rate. Track this aggregate across your suite and for individual tests.

Target 0% flakiness for critical paths. Accept slightly higher rates for tests that provide valuable coverage despite occasional failures. But any flakiness above 1-2% is typically unacceptable for blocking tests.

Mean Time to Detect Flakiness

How quickly do you identify new flaky tests? Faster detection means faster resolution and less pipeline disruption. Target detecting flakiness within a day of introduction.

Quarantine Duration

How long do tests stay quarantined? Extended quarantine suggests inadequate investment in fixes. Track average quarantine duration and set targets for resolution time.

Pipeline Success Rate

The ultimate metric: how often does your CI pipeline succeed without retries? Improving this number confirms that flakiness efforts are delivering value.

Building Team Commitment

Technical solutions only work with team commitment. Flakiness reduction requires sustained effort.

Make Flakiness Visible

Display flakiness metrics prominently. Weekly reports on worst offenders, dashboards in team areas, and trends in sprint retrospectives all keep attention focused.

Allocate Dedicated Time

Flaky test fixes compete with feature work. Without dedicated allocation, they’ll always lose. Reserve explicit time—perhaps 10% of each sprint—for test reliability improvements.

Celebrate Improvements

Acknowledge progress. When the pipeline hasn’t flaked in a week, recognize the achievement. When a chronically flaky test gets fixed, thank the engineer who resolved it.

Test flakiness is a solvable problem. With systematic identification, targeted fixes, and preventive practices, engineering teams can restore trust in their test suites. The investment pays dividends in faster pipelines, more confident releases, and engineers who spend time building rather than babysitting tests.

Dear Machines uses AI to detect and prevent test instability before it impacts your pipeline. If flaky tests are slowing your team, see how Dear Machines approaches test reliability.