Flaky Test Detector
Identify, diagnose, and eliminate flaky tests. Runs your test suite multiple times, isolates non-deterministic failures, categorizes root causes, and applies targeted fixes.
Pipeline
- Detect runner & configuration
- Harvest flaky candidates via repeated runs
- Analyze failure patterns to identify root cause
- Fix each flakiness category with proven strategies
- Verify fixes by re-running the detection loop
Phase 1: Detect runner and configuration
Read project setup:
```bash
cat package.json
```
Identify the test runner:
| Indicator | Runner | Base command |
|---|---|---|
| "vitest" in devDependencies | Vitest | npx vitest |
| "jest" in devDependencies | Jest | npx jest |
| "react-scripts" in devDependencies | react-scripts | npx react-scripts test |
| "@playwright/test" in devDependencies | Playwright | npx playwright test |
| "cypress" in devDependencies | Cypress | npx cypress run |
| "mocha" in devDependencies | Mocha | npx mocha |
| "ava" in devDependencies | AVA | npx ava |
Check existing retry/flaky config:
- Vitest: `retry` in `vitest.config.ts`
- Jest: there is no retry config key; look for `jest.retryTimes()` calls in setup files or test files
- Playwright: `retries` in `playwright.config.ts`
- Cypress: `retries` in `cypress.config.ts`
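The runner lookup from the indicator table can be scripted. The sketch below is a hypothetical helper, not part of any runner's tooling. It reads `package.json`, checks `dependencies` as well as `devDependencies` (an assumption beyond the table), and returns every match rather than the first, because projects often pair a unit runner with an E2E runner.

```typescript
import { existsSync, readFileSync } from "node:fs";

interface PackageJson {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

interface Runner {
  name: string;
  baseCommand: string;
}

// Package name -> runner, in the same order as the indicator table.
const RUNNERS: Array<[string, Runner]> = [
  ["vitest", { name: "Vitest", baseCommand: "npx vitest" }],
  ["jest", { name: "Jest", baseCommand: "npx jest" }],
  ["react-scripts", { name: "react-scripts", baseCommand: "npx react-scripts test" }],
  ["@playwright/test", { name: "Playwright", baseCommand: "npx playwright test" }],
  ["cypress", { name: "Cypress", baseCommand: "npx cypress run" }],
  ["mocha", { name: "Mocha", baseCommand: "npx mocha" }],
  ["ava", { name: "AVA", baseCommand: "npx ava" }],
];

// Returns every declared runner, e.g. both Vitest (unit) and Playwright (E2E).
function detectRunners(pkg: PackageJson): Runner[] {
  const declared = { ...pkg.dependencies, ...pkg.devDependencies };
  return RUNNERS.filter(([pkgName]) => pkgName in declared).map(([, runner]) => runner);
}

if (existsSync("package.json")) {
  const pkg = JSON.parse(readFileSync("package.json", "utf8")) as PackageJson;
  for (const runner of detectRunners(pkg)) {
    console.log(`${runner.name}: ${runner.baseCommand}`);
  }
}
```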
Phase 2: Harvest flaky candidates
Run the full suite multiple times and collect failures:
```bash
# Run 3+ times, log each result (`vitest run` runs once instead of entering watch mode)
for i in 1 2 3; do
  echo "=== RUN $i ==="
  npx vitest run --reporter=json 2>/dev/null | tee ".flaky-run-$i.json"
done
```
For a single quick run saved to its own file (for example, a spot-check before looping again):
```bash
npx vitest run --reporter=json 2>/dev/null | tee ".flaky-last-run.json"
```
For Jest:
```bash
for i in 1 2 3; do
  echo "=== RUN $i ==="
  npx jest --json 2>/dev/null | tee ".flaky-run-$i.json"
done
```
For Playwright:
```bash
for i in 1 2 3; do
  echo "=== RUN $i ==="
  npx playwright test --reporter=json 2>/dev/null | tee ".flaky-run-$i.json"
done
```
Collect test results across runs and identify:
- Always passes — stable, exclude from analysis
- Always fails — broken test, report separately
- Sometimes passes, sometimes fails — FLAKY candidate
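The three-way split can be computed from the saved run files. A minimal sketch, assuming the Jest-compatible JSON shape that `jest --json` and Vitest's `--reporter=json` emit (`testResults[].assertionResults[]` with `fullName` and `status`); Playwright's JSON report has a different structure and would need its own reader. The `classify` name and the `.flaky-run-N.json` pattern follow the loops above but are otherwise illustrative.

```typescript
import { readdirSync, readFileSync } from "node:fs";

// Minimal shape of the Jest-compatible JSON report.
interface AssertionResult {
  fullName: string;
  status: string; // "passed", "failed", "pending", "skipped", ...
}
interface JsonReport {
  testResults: Array<{ name: string; assertionResults: AssertionResult[] }>;
}

type Verdict = "stable" | "broken" | "flaky";

// Tallies passes and failures per test across runs; skipped tests are ignored.
function classify(runs: JsonReport[]): Map<string, Verdict> {
  const tally = new Map<string, { passed: number; failed: number }>();
  for (const run of runs) {
    for (const file of run.testResults) {
      for (const test of file.assertionResults) {
        if (test.status !== "passed" && test.status !== "failed") continue;
        const key = `${file.name} > ${test.fullName}`;
        const counts = tally.get(key) ?? { passed: 0, failed: 0 };
        if (test.status === "passed") counts.passed += 1;
        else counts.failed += 1;
        tally.set(key, counts);
      }
    }
  }
  const verdicts = new Map<string, Verdict>();
  for (const [key, { passed, failed }] of tally) {
    verdicts.set(key, failed === 0 ? "stable" : passed === 0 ? "broken" : "flaky");
  }
  return verdicts;
}

const runFiles = readdirSync(".").filter((f) => /^\.flaky-run-\d+\.json$/.test(f));
const runs = runFiles.map((f) => JSON.parse(readFileSync(f, "utf8")) as JsonReport);
for (const [test, verdict] of classify(runs)) {
  if (verdict !== "stable") console.log(`${verdict.toUpperCase()}\t${test}`);
}
```

Keying on file name plus full test name keeps identically named tests in different files apart.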
Phase 3: Analyze root cause
For each flaky candidate, categorize by symptom:
Category A: Async timing
Symptoms:
- Test passes locally but fails on CI (slower environments)
- Failure mentions a timeout, `setTimeout`, `setInterval`, or `requestAnimationFrame`
- DOM assertions fail intermittently in browser tests
Diagnosis:
```bash
# Grep the test file for async patterns
grep -n "setTimeout\|setInterval\|waitFor\|delay\|sleep\|requestAnimationFrame\|transition\|animation" <test-file>
```
Look for:
- Fixed timeouts (sleeps) instead of polling/retrying
- Missing `vi.waitFor` or `expect.poll` in Vitest
- Missing `waitFor`/`findBy*` in Testing Library
- Assertions that run before async operations complete
Category B: Shared mutable state
Symptoms:
- Failures depend on test order or grouping (running with `--shard` or `--bail 1` changes results)
- Tests pass in isolation but fail in the full suite
- Module-level variables, singletons, globals
Diagnosis:
```bash
# Check for shared state patterns (declarations with any identifier, plus global mutations)
grep -nE "(let|var|const) [A-Za-z_\$][A-Za-z0-9_\$]* *=|global(This)?\." <test-file> | head -20
```
Look for:
- Variables declared at module or `describe` scope and not reset between tests
- Mock state leaking across tests (`vi.fn()`/`jest.fn()` calls not cleared)
- Environment variable mutations without restore
- Filesystem or database state not cleaned up
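To see why order matters, here is a self-contained simulation in plain TypeScript (no test runner) of a module-level cache leaking between two tests. `getPrice`, the SKU, and the `runSuite` harness are hypothetical; the `cache.clear()` inside the loop plays the role of a `beforeEach` reset.

```typescript
// Module-level state shared by every test in the file: the source of the leak.
const cache = new Map<string, number>();

// Hypothetical code under test: memoizes a price in the shared cache.
function getPrice(sku: string, fallback: number): number {
  if (!cache.has(sku)) cache.set(sku, fallback);
  return cache.get(sku)!;
}

interface SimulatedTest {
  name: string;
  run: () => boolean; // true = assertion passed
}

const suite: SimulatedTest[] = [
  { name: "uses default price", run: () => getPrice("sku-1", 10) === 10 },
  { name: "uses promo price", run: () => getPrice("sku-1", 8) === 8 },
];

// Runs tests in the given order and returns the names of failing tests.
// resetBeforeEach mimics beforeEach(() => cache.clear()).
function runSuite(order: SimulatedTest[], resetBeforeEach: boolean): string[] {
  cache.clear(); // each suite run starts clean, like a fresh worker
  const failed: string[] = [];
  for (const t of order) {
    if (resetBeforeEach) cache.clear();
    if (!t.run()) failed.push(t.name);
  }
  return failed;
}

console.log(runSuite(suite, false)); // "uses promo price" fails
console.log(runSuite([...suite].reverse(), false)); // now "uses default price" fails
console.log(runSuite(suite, true)); // no failures, in either order
```

Each test passes on its own; which one fails depends only on which ran first, the signature of Category B.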
Category C: Test isolation
Symptoms:
- Test passes when its file runs alone (e.g. `npx vitest run <file>` or `npx jest <file>`) but fails in the full suite
- Test fails when specific other tests run before it
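Diagnosis: Run tests in default order, then in shuffled order, to confirm:
```bash
# Vitest: default order, then shuffled (add --sequence.seed=<n> to replay a specific order)
npx vitest run --reporter=json 2>/dev/null
npx vitest run --sequence.shuffle --reporter=json 2>/dev/null
# Jest 29.2+: shuffled order (add --seed=<n> to replay a specific order)
npx jest --randomize --json 2>/dev/null
```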
Look for:
- Missing `beforeEach`/`afterEach` cleanup
- Tests modifying the DOM or globals without restoring them
- Database/seeded data not isolated per test
- Mock implementations leaking (`vi.restoreAllMocks()`/`jest.restoreAllMocks()` missing)
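A shuffled run is only useful if a failing order can be replayed, which is why both runners accept a seed (Vitest `--sequence.seed`, Jest `--seed`). The sketch below illustrates the idea with a seeded PRNG (mulberry32) driving a Fisher-Yates shuffle. It is an illustration of seeded shuffling, not the algorithm either runner actually uses, and the test names are made up.

```typescript
// mulberry32: a small, well-known 32-bit seeded PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Fisher-Yates shuffle driven by the seeded generator: same seed, same order.
function seededShuffle<T>(items: readonly T[], seed: number): T[] {
  const rand = mulberry32(seed);
  const out = [...items];
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

const testNames = ["login", "logout", "profile", "settings"];
const seed = 1234; // the value you would pass back to the runner to replay an order
console.log(seed, seededShuffle(testNames, seed));
```

Record the seed of any shuffled run that fails; rerunning with it reproduces the exact order that exposed the leak.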
Category D: Environment sensitivity
Symptoms:
- Fails on CI only (never locally)
- Fails on certain OS/browser combinations
- Fails at specific times (timezone, date-dependent)
Diagnosis: Look for:
- Hardcoded time/date values
- Locale-specific formatting
- Timezone-dependent assertions
- Screen resolution, viewport size assumptions
- Node.js/Python version-specific behavior
- Network-dependent tests (API calls, rate limiting)
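A minimal illustration of timezone sensitivity, using hypothetical helper names. `dayOfMonthLocal` reads the machine's local timezone, so for `2024-01-15T23:30:00Z` it returns 15 on a UTC machine but 16 at UTC+1 or further east. The two pinned variants give the same answer on every machine.

```typescript
const TIMESTAMP = "2024-01-15T23:30:00Z";

// Flaky: depends on the TZ of whichever machine runs the test.
function dayOfMonthLocal(iso: string): number {
  return new Date(iso).getDate();
}

// Stable: reads the calendar day in UTC regardless of machine settings.
function dayOfMonthUtc(iso: string): number {
  return new Date(iso).getUTCDate();
}

// Stable: pins both locale and time zone instead of using machine defaults.
function formatDateStable(iso: string): string {
  return new Intl.DateTimeFormat("en-US", {
    timeZone: "UTC",
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
  }).format(new Date(iso));
}

console.log(dayOfMonthLocal(TIMESTAMP)); // varies with the machine's timezone
console.log(dayOfMonthUtc(TIMESTAMP)); // 15
console.log(formatDateStable(TIMESTAMP)); // 01/15/2024
```

Timestamps near midnight UTC are the ones that expose this; a test using a midday value can pass everywhere and still hide the bug.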
Category E: Race conditions
Symptoms:
- Failures involve concurrent operations, Web Workers, or parallel requests
- `Promise.all`, `Promise.race`, or `setTimeout` with a 0ms delay
- Event listeners on shared emitters
Diagnosis:
```bash
grep -n "Promise\.all\|Promise\.race\|Promise\.any\|Promise\.allSettled\|setTimeout\|setImmediate\|process\.nextTick" <test-file>
```
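The patterns this grep finds often trace back to a read-modify-write split by an `await`. In the hypothetical `Account` below, a single microtask yield makes the lost update deterministic; with real I/O the interleaving depends on latency, so the same bug passes or fails from run to run.

```typescript
// Hypothetical account whose deposit is a non-atomic read-modify-write.
class Account {
  balance = 0;

  async deposit(amount: number): Promise<void> {
    const current = this.balance; // read
    await Promise.resolve(); // yield, as a real DB or network call would
    this.balance = current + amount; // write, possibly based on a stale read
  }
}

async function depositTwice(concurrently: boolean): Promise<number> {
  const account = new Account();
  if (concurrently) {
    await Promise.all([account.deposit(1), account.deposit(1)]);
  } else {
    await account.deposit(1);
    await account.deposit(1);
  }
  return account.balance;
}

depositTwice(true).then((b) => console.log("concurrent:", b)); // concurrent: 1 (lost update)
depositTwice(false).then((b) => console.log("sequential:", b)); // sequential: 2
```

Running the operations sequentially in a test removes the flakiness; if production code really does call `deposit` concurrently, the lost update is a product bug rather than a test bug.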
Phase 4: Apply fixes
Apply fix based on the detected category:
Fix A: Async timing
Replace brittle timing patterns:
```diff
- await new Promise(r => setTimeout(r, 1000))
- expect(screen.getByText('Loaded')).toBeInTheDocument()
+ await waitFor(() => {
+   expect(screen.getByText('Loaded')).toBeInTheDocument()
+ }, { timeout: 5000, interval: 100 })
```
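For Vitest, use Testing Library's `waitFor` or the Vitest built-ins `expect.poll` and `vi.waitFor`:
```ts
import { waitFor } from '@testing-library/react'
// or Vitest built-ins:
import { expect, vi } from 'vitest'

await expect.poll(() => getStatus(), { timeout: 5000, interval: 100 }).toBe('done')
await vi.waitFor(() => expect(getStatus()).toBe('done'), { timeout: 5000, interval: 100 })
```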
For Playwright, use auto-waiting locators:
```ts
// Bad:
await page.waitForTimeout(2000)
expect(await page.textContent('.status')).toBe('done')
// Good:
await expect(page.locator('.status')).toHaveText('done', { timeout: 10000 })
```
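`waitFor`, `vi.waitFor`, `expect.poll`, and Playwright's auto-waiting assertions all rest on the same idea: retry a check at an interval until it passes or a deadline expires. A minimal sketch of that loop; `pollUntil` and its options are hypothetical and not any library's API.

```typescript
interface PollOptions {
  timeout: number; // total time budget in ms
  interval: number; // pause between attempts in ms
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Re-runs `check` until it stops throwing or the budget is spent, then
// rethrows the last error so the failure message stays meaningful.
async function pollUntil<T>(check: () => T | Promise<T>, { timeout, interval }: PollOptions): Promise<T> {
  const deadline = Date.now() + timeout;
  for (;;) {
    try {
      return await check();
    } catch (err) {
      if (Date.now() + interval > deadline) throw err;
      await sleep(interval);
    }
  }
}

// Hypothetical async state that settles after a variable delay (50-150 ms),
// which a fixed sleep would sometimes undershoot.
let jobStatus = "loading";
setTimeout(() => {
  jobStatus = "done";
}, 50 + Math.random() * 100);

pollUntil(
  () => {
    if (jobStatus !== "done") throw new Error(`status is ${jobStatus}`);
    return jobStatus;
  },
  { timeout: 1000, interval: 10 },
).then((s) => console.log(s)); // done
```

Polling succeeds as soon as the condition holds, so a generous timeout costs nothing on fast machines while still covering slow CI runners.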
Fix B: Shared mutable state
```diff
+ import { vi } from 'vitest'
+
+ beforeEach(() => {
+   vi.clearAllMocks()
+ })
+
+ afterEach(() => {
+   vi.restoreAllMocks()
+ })
```
Reset module-level state:
```ts
beforeEach(() => {
  // Reset counters, caches, singletons
  counter = 0
  cache.clear()
})
```
Isolate runtime configuration mutations:
```ts
// getRuntimeMode/setRuntimeMode stand in for your app's own config accessors
const ORIGINAL_RUNTIME_MODE = getRuntimeMode()
afterEach(() => {
  setRuntimeMode(ORIGINAL_RUNTIME_MODE)
})
```
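A generic form of the same save-and-restore pattern: capture the current values of whatever fields a test mutates and put them back afterwards, including deleting keys that did not exist before. `snapshotProps` and `appConfig` are hypothetical; in a Vitest or Jest file you would call the helper in `beforeEach` and invoke the returned function in `afterEach`. The same helper works on objects such as `process.env`.

```typescript
// Records whether each key existed and its value, and returns a function
// that restores them exactly (re-deleting keys that were absent).
function snapshotProps<T extends object, K extends keyof T>(target: T, keys: K[]): () => void {
  const saved = keys.map((key) => ({
    key,
    existed: Object.prototype.hasOwnProperty.call(target, key),
    value: target[key],
  }));
  return () => {
    for (const { key, existed, value } of saved) {
      if (existed) target[key] = value;
      else Reflect.deleteProperty(target, key);
    }
  };
}

// Hypothetical shared config object that tests mutate.
const appConfig: { mode: string; retries?: number } = { mode: "production" };

const restore = snapshotProps(appConfig, ["mode", "retries"]); // e.g. in beforeEach
appConfig.mode = "test"; // mutations made inside a test
appConfig.retries = 3;
restore(); // e.g. in afterEach
console.log(appConfig); // { mode: 'production' }
```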
Fix C: Test isolation
```diff
+ beforeEach(() => {
+   document.body.innerHTML = ''
+   localStorage.clear()
+ })
+
+ afterEach(() => {
+   vi.restoreAllMocks()
+   vi.useRealTimers()
+ })
```
For database-backed tests:
```ts
// Assumes `db` is a Knex instance (migrate/seed API)
beforeEach(async () => {
  await db.migrate.latest()
  await db.seed.run()
})
afterEach(async () => {
  await db.migrate.rollback()
})
```
Fix D: Environment sensitivity
Mock time-dependent code:
```ts
beforeEach(() => {
  vi.useFakeTimers()
  vi.setSystemTime(new Date('2024-01-15T12:00:00Z'))
})
afterEach(() => {
  vi.useRealTimers()
})
```
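An alternative that avoids patching global time altogether is to pass the clock in as a dependency. The sketch below is an assumption about how your code could be structured, not something Vitest provides: the hypothetical `isExpired` takes a `now` function that defaults to `Date.now`, and the test supplies a fixed one.

```typescript
type Clock = () => number; // milliseconds since the Unix epoch

// Hypothetical code under test: reads time through an injected clock,
// defaulting to the real one in production.
function isExpired(expiresAtIso: string, now: Clock = Date.now): boolean {
  return now() >= Date.parse(expiresAtIso);
}

// Fixed clock pinned to the same instant as the vi.setSystemTime() call above.
const fixedClock: Clock = () => Date.parse("2024-01-15T12:00:00Z");

console.log(isExpired("2024-01-15T11:59:59Z", fixedClock)); // true
console.log(isExpired("2024-01-15T12:00:01Z", fixedClock)); // false
```

Injection keeps the test independent of timer mocking, at the cost of threading the clock through the code; fake timers remain the simpler choice when the code under test also schedules `setTimeout` work.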
For locale-sensitive tests:
```ts
import { locale } from '../i18n'
beforeEach(() => {
  locale.set('en-US')
})
```
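For network-dependent tests, mock all external calls and undo global stubs after each test:
```ts
vi.mock('../api/client')
// or stub fetch per test; vi.unstubAllGlobals() restores the original
beforeEach(() => { vi.stubGlobal('fetch', vi.fn()) })
afterEach(() => { vi.unstubAllGlobals() })
```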
Fix E: Race conditions
Ensure sequential processing in tests:
```ts
// Instead of Promise.all, test sequentially
await operation1()
await operation2()
```
For event emitter tests:
```ts
const events: string[] = []
emitter.on('data', (d) => events.push(d))
emitter.emit('data', 'a')
emitter.emit('data', 'b')
await vi.waitFor(() => {
  expect(events).toEqual(['a', 'b'])
})
```
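With Node's built-in `EventEmitter`, `emit()` calls listeners synchronously in registration order, so an assertion can run right after the emits; polling is only needed when the event fires asynchronously. For that case, `once()` from `node:events` returns a promise for the next occurrence, which replaces a sleep. A sketch using only Node built-ins; the event names and payloads are illustrative.

```typescript
import { EventEmitter, once } from "node:events";

const emitter = new EventEmitter();
const events: string[] = [];
emitter.on("data", (d: string) => events.push(d));

// Synchronous emitter: listeners have already run when emit() returns.
emitter.emit("data", "a");
emitter.emit("data", "b");
console.log(events); // [ 'a', 'b' ]

// Asynchronous emission: await the event itself instead of sleeping for it.
async function waitForLateEvent(): Promise<string> {
  setTimeout(() => emitter.emit("late", "c"), 10);
  const [payload] = await once(emitter, "late");
  return payload as string;
}

waitForLateEvent().then((p) => console.log(p)); // c
```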
Phase 5: Add retry configuration
After fixing the root cause, add retry as a safety net. Prefer CI-specific config files or CI job commands with literal retry values instead of reading runtime environment variables inside the test config.
Vitest — vitest.ci.config.ts:
```ts
import { defineConfig, mergeConfig } from 'vitest/config'
import baseConfig from './vitest.config'

export default mergeConfig(baseConfig, defineConfig({
  test: {
    retry: 2,
    // Keep a fixed, unshuffled test order in CI
    // (seed only takes effect when shuffle is enabled):
    sequence: {
      seed: 123,
      shuffle: false,
    },
  },
}))
```
Run in CI with `npx vitest run --config vitest.ci.config.ts`.
Or per suite/test with the `retry` test option:
```ts
describe('flaky suite', { retry: 2 }, () => { /* ... */ })
it('flaky test', { retry: 2 }, async () => { /* ... */ })
```
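Jest: `jest.ci.config.ts` plus a setup file. Jest has no retry config key; retries come from `jest.retryTimes()` (default `jest-circus` runner), which must be called from a setup file or a test file:
```ts
// jest.ci.config.ts
import baseConfig from './jest.config'

export default {
  ...baseConfig,
  setupFilesAfterEnv: [...(baseConfig.setupFilesAfterEnv ?? []), '<rootDir>/jest.retry-setup.ts'],
}
```
```ts
// jest.retry-setup.ts (file name is illustrative)
import { jest } from '@jest/globals'
jest.retryTimes(2)
```
Run in CI with `npx jest --config jest.ci.config.ts`.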
Playwright — playwright.ci.config.ts:
```ts
import { defineConfig } from '@playwright/test'
import baseConfig from './playwright.config'

export default defineConfig({
  ...baseConfig,
  retries: 2,
})
```
Run in CI with `npx playwright test --config playwright.ci.config.ts`.
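Retries mask flakiness unless a pass-on-retry is recorded somewhere. The sketch below (a hypothetical `runWithRetry`, not any runner's API) models the semantics of the retry settings above, one initial attempt plus up to N reruns, and returns the attempt count so that a test which only passed on a rerun can still be reported as flaky.

```typescript
interface RetryResult<T> {
  value: T;
  attempts: number; // 1 means it passed on the first try
}

// Runs `fn` once, then retries up to `retries` more times on failure.
async function runWithRetry<T>(fn: () => Promise<T>, retries: number): Promise<RetryResult<T>> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= retries + 1; attempt++) {
    try {
      return { value: await fn(), attempts: attempt };
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

// Hypothetical flaky operation that fails on its first call only.
let calls = 0;
async function flakyOperation(): Promise<string> {
  calls += 1;
  if (calls === 1) throw new Error("transient failure");
  return "ok";
}

runWithRetry(flakyOperation, 2).then(({ value, attempts }) => {
  // attempts > 1 means the test only passed on a retry: log it as flaky.
  console.log(value, attempts > 1 ? `(flaky: passed on attempt ${attempts})` : "");
});
```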
Phase 6: Verify
Re-run the flaky detection loop:
```bash
for i in 1 2 3; do
  echo "=== RUN $i ==="
  npx vitest run --reporter=json 2>/dev/null | tee ".flaky-verify-$i.json"
done
```
Compare results across runs:
- Zero flaky tests — success, report fixed count
- Remaining flaky tests — iterate back to Phase 3 with deeper analysis
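Comparing the two rounds is set arithmetic over test IDs. A sketch with a hypothetical `compareRuns` helper, assuming you already have the IDs of tests judged flaky in the Phase 2 runs and in the verify runs (for example by tallying pass/fail per test across the JSON files). It also lists tests that became flaky only after the fixes, since a change to shared setup can destabilize a neighboring test.

```typescript
interface Comparison {
  fixed: string[]; // flaky before, stable now
  remaining: string[]; // still flaky: iterate back to Phase 3
  newlyFlaky: string[]; // flaky now but not before
}

function compareRuns(flakyBefore: Iterable<string>, flakyAfter: Iterable<string>): Comparison {
  const before = new Set(flakyBefore);
  const after = new Set(flakyAfter);
  return {
    fixed: [...before].filter((t) => !after.has(t)),
    remaining: [...before].filter((t) => after.has(t)),
    newlyFlaky: [...after].filter((t) => !before.has(t)),
  };
}

const result = compareRuns(
  ["auth.test.ts > logs in", "api.test.ts > refreshes token", "login.spec.ts > dashboard"],
  ["login.spec.ts > dashboard"],
);
console.log(`Fixed ${result.fixed.length}, remaining ${result.remaining.length}`); // Fixed 2, remaining 1
```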
FLAKY_REPORT.md generation
After fixing, generate a summary report:
```markdown
# Flaky Test Report
| Test | Runner | Category | Status | Fix applied |
|------|--------|----------|--------|-------------|
| `auth.test.ts` | Vitest | Async timing | Fixed | Replaced `setTimeout` with `waitFor` |
| `api.test.ts` | Jest | Shared state | Fixed | Added `beforeEach` cleanup |
| `login.spec.ts` | Playwright | Environment | Quarantined | Needs API mock setup |
**Stable rate:** 98/100 tests (98%)
**Next step:** Review remaining flaky tests manually
```
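The report table is mechanical to render. A sketch with a hypothetical `renderReport` function; the `ReportEntry` shape is an assumption that mirrors the table's columns, and the stable rate is computed from counts you supply. Pipes in free-text cells are escaped, and a blank line separates the table from the summary line so Markdown renderers do not read that line as an extra table row.

```typescript
interface ReportEntry {
  test: string;
  runner: string;
  category: string;
  status: string; // e.g. "Fixed" or "Quarantined"
  fix: string;
}

// Escapes pipes so free-text cells cannot break the table layout.
const cell = (text: string) => text.replace(/\|/g, "\\|");

function renderReport(entries: ReportEntry[], stable: number, total: number): string {
  const pct = total === 0 ? 0 : Math.round((stable / total) * 100);
  const rows = entries.map(
    (e) => `| \`${cell(e.test)}\` | ${cell(e.runner)} | ${cell(e.category)} | ${cell(e.status)} | ${cell(e.fix)} |`,
  );
  return [
    "# Flaky Test Report",
    "",
    "| Test | Runner | Category | Status | Fix applied |",
    "|------|--------|----------|--------|-------------|",
    ...rows,
    "",
    `**Stable rate:** ${stable}/${total} tests (${pct}%)`,
  ].join("\n");
}

console.log(
  renderReport(
    [{ test: "auth.test.ts", runner: "Vitest", category: "Async timing", status: "Fixed", fix: "Replaced `setTimeout` with `waitFor`" }],
    98,
    100,
  ),
);
```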
Example
User: "our CI is full of flaky tests, we use vitest"
Agent actions:
- Run `npx vitest run --reporter=json` 3 times in a loop
- Detect 4 tests that pass/fail inconsistently across runs
- For `auth.test.ts`: grep shows `setTimeout(500)` before an assertion
- Replace it with `waitFor`; `auth.test.ts` is now consistent across 3 runs
- For `api.test.ts`: an auth token stored in `localStorage` leaks between tests
- Add `beforeEach(() => localStorage.clear())`; the test is now stable
- Configure `vitest.ci.config.ts` with `retry: 2`
- Verify: all 4 formerly flaky tests pass consistently
- Show report

User: "playwright e2e tests flaky on CI"
Agent actions:
- Run `npx playwright test --reporter=json` 3 times; identify `login.spec.ts` as flaky
- Analyze: it uses `page.waitForTimeout(3000)`, an async timing issue
- Replace with `await expect(page.locator('.dashboard')).toBeVisible({ timeout: 15000 })`
- Set `retries: 2` in `playwright.ci.config.ts` for CI
- Also detect `user-profile.spec.ts`: environment sensitivity (timezone)
- Pin the timezone in test setup (e.g. `test.use({ timezoneId: 'UTC' })`)
- Verify: 3/3 runs pass for both tests