Flaky Test Detector

Identify, diagnose, and eliminate flaky tests. Runs your test suite multiple times, isolates non-deterministic failures, categorizes root causes, and applies targeted fixes.

Pipeline

Detect runner & configuration
Harvest flaky candidates via repeated runs
Analyze failure patterns to identify root cause
Fix each flakiness category with proven strategies
Add retry configuration as a CI safety net
Verify fixes by re-running the detection loop

Phase 1: Detect runner and configuration

Read project setup:

cat package.json

Identify the test runner:

Indicator	Runner	Base command
`"vitest"` in devDependencies	Vitest	`npx vitest`
`"jest"` in devDependencies	Jest	`npx jest`
`"react-scripts"` in devDependencies	react-scripts	`npx react-scripts test`
`"@playwright/test"` in devDependencies	Playwright	`npx playwright test`
`"cypress"` in devDependencies	Cypress	`npx cypress run`
`"mocha"` in devDependencies	Mocha	`npx mocha`
`"ava"` in devDependencies	AVA	`npx ava`

Check existing retry/flaky config:

Vitest: retry in vitest.config.ts
Jest: jest.retryTimes() in a setup file or test file (needs the jest-circus runner, the default in current Jest versions)
Playwright: retries in playwright.config.ts
Cypress: retries in cypress.config.ts

Phase 2: Harvest flaky candidates

Run the full suite multiple times and collect failures:

# Run 3+ times, log each result ("run" disables watch mode so each pass exits)
for i in 1 2 3; do
  echo "=== RUN $i ==="
  npx vitest run --reporter=json 2>/dev/null | tee ".flaky-run-$i.json"
done

For a quick single-pass check, run the suite once and compare it with the last known result:

npx vitest run --reporter=json 2>/dev/null | tee ".flaky-last-run.json"

For Jest:

for i in 1 2 3; do
  echo "=== RUN $i ==="
  npx jest --json 2>/dev/null | tee ".flaky-run-$i.json"
done

For Playwright:

for i in 1 2 3; do
  echo "=== RUN $i ==="
  npx playwright test --reporter=json 2>/dev/null | tee ".flaky-run-$i.json"
done

Collect test results across runs and identify:

Always passes — stable, exclude from analysis
Always fails — broken test, report separately
Sometimes passes, sometimes fails — FLAKY candidate
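The three buckets above can be computed mechanically once each run has been reduced to a map of test name to outcome. The sketch below assumes that reduction has already happened; the `RunResult` shape and the test names are hypothetical and are not part of any runner's API.

```typescript
// Outcome of one test in one run (assumed to be extracted from the JSON reports).
type Status = 'passed' | 'failed'

// One run: test name -> status. Hypothetical shape, not a runner API.
type RunResult = Record<string, Status>

type Verdict = 'stable' | 'broken' | 'flaky'

function classifyRuns(runs: RunResult[]): Map<string, Verdict> {
  // Collect every distinct status each test produced across all runs.
  const seen = new Map<string, Set<Status>>()
  for (const run of runs) {
    for (const name of Object.keys(run)) {
      const statuses = seen.get(name) ?? new Set<Status>()
      statuses.add(run[name])
      seen.set(name, statuses)
    }
  }

  const verdicts = new Map<string, Verdict>()
  seen.forEach((statuses, name) => {
    if (statuses.size > 1) verdicts.set(name, 'flaky') // mixed outcomes
    else if (statuses.has('failed')) verdicts.set(name, 'broken') // always fails
    else verdicts.set(name, 'stable') // always passes
  })
  return verdicts
}

// Three hypothetical runs of the same suite.
const verdicts = classifyRuns([
  { 'auth > logs in': 'passed', 'api > fetches user': 'failed', 'math > adds': 'passed' },
  { 'auth > logs in': 'failed', 'api > fetches user': 'failed', 'math > adds': 'passed' },
  { 'auth > logs in': 'passed', 'api > fetches user': 'failed', 'math > adds': 'passed' },
])
console.log(verdicts.get('auth > logs in')) // flaky
```

Three runs only catch tests that fail fairly often; a test that fails once in fifty runs will usually be classified as stable, so raising the run count is worth considering for suites with rare failures.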

Phase 3: Analyze root cause

For each flaky candidate, categorize by symptom:

Category A: Async timing

Symptoms:

Test passes locally but fails on CI (slower environments)
Failure mentions timeout, setTimeout, setInterval, requestAnimationFrame
DOM assertions fail intermittently in browser tests

Diagnosis:

# Grep the test file for async patterns
grep -n "setTimeout\|setInterval\|waitFor\|delay\|sleep\|requestAnimationFrame\|transition\|animation" <test-file>

Look for:

Fixed timeouts instead of polling/retrying
Missing waitFor or expect.poll in Vitest
Missing waitFor / findBy* in Testing Library
Assertions before async operations complete
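To make the difference between a fixed timeout and polling concrete, here is a minimal sketch of a polling helper. `pollUntil` is a hypothetical name written for illustration; in a real suite the equivalent would be `waitFor` or `expect.poll` rather than a hand-rolled loop.

```typescript
// Poll `check` until it returns true or `timeoutMs` elapses.
// Hypothetical helper for illustration only.
async function pollUntil(
  check: () => boolean,
  timeoutMs = 1000,
  intervalMs = 10,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs
  while (Date.now() < deadline) {
    if (check()) return true
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs))
  }
  return check() // one last attempt at the deadline
}

// Simulated async work that completes after a short delay.
let loaded = false
setTimeout(() => {
  loaded = true
}, 50)

// Returns as soon as the condition holds instead of sleeping a fixed time.
const pollResult: Promise<boolean> = pollUntil(() => loaded)
pollResult.then((ok) => console.log('loaded in time:', ok))
```

A fixed sleep has to be long enough for the slowest machine and wastes that time everywhere else; polling finishes as soon as the condition is true and only pays the full timeout on a genuine failure.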

Category B: Shared mutable state

Symptoms:

Failures depend on test order (running with --shard or --bail 1 changes the results)
Tests pass in isolation but fail in the full suite
Test or source files rely on module-level variables, singletons, or globals

Diagnosis:

# Check for shared state patterns
grep -n "let [a-z]* = \|var [a-z]* = \|const [a-z]* = \|global\." <test-file> | head -20

Look for:

Test-level variables not reset between tests
Mock state leaking across tests (vi.fn() / jest.fn() not cleared)
Environment variable mutations without restore
Filesystem or database state not cleaned up

Category C: Test isolation

Symptoms:

Test passes with --testPathPattern=<single-file> but fails in full suite
Test fails when specific other tests run before it

Diagnosis: Run tests in random order to confirm:

# Vitest: files one at a time in default order, then shuffled order
npx vitest run --no-file-parallelism --reporter=json 2>/dev/null
npx vitest run --sequence.shuffle --reporter=json 2>/dev/null

Look for:

Missing beforeEach / afterEach cleanup
Tests modifying DOM or globals without restoring
Database/seeded data not isolated per test
Mock implementations leaking (vi.restoreAllMocks() / jest.restoreAllMocks() missing)
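A random order is only useful for diagnosis if the failing order can be replayed. The sketch below shows the idea with a seeded shuffle: the same seed always yields the same order. It is an illustration of the principle, not how any runner shuffles internally, and the file names are hypothetical.

```typescript
// Small deterministic PRNG (mulberry32). Same seed -> same sequence.
function mulberry32(seed: number): () => number {
  let state = seed >>> 0
  return () => {
    state = (state + 0x6d2b79f5) >>> 0
    let t = state
    t = Math.imul(t ^ (t >>> 15), t | 1)
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61)
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296
  }
}

// Fisher-Yates shuffle driven by the seeded PRNG; does not mutate the input.
function shuffleWithSeed<T>(items: readonly T[], seed: number): T[] {
  const random = mulberry32(seed)
  const result = items.slice()
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1))
    const tmp = result[i]
    result[i] = result[j]
    result[j] = tmp
  }
  return result
}

// Hypothetical test files.
const testFiles = ['auth.test.ts', 'api.test.ts', 'cart.test.ts', 'user.test.ts']
const orderA = shuffleWithSeed(testFiles, 123)
const orderB = shuffleWithSeed(testFiles, 123)
console.log(orderA.join(',') === orderB.join(',')) // true: same seed, same order
```

When a shuffled run fails, recording the seed it used makes it possible to replay that exact order while bisecting which earlier test leaves state behind.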

Category D: Environment sensitivity

Symptoms:

Fails on CI only (never locally)
Fails on certain OS/browser combinations
Fails at specific times (timezone, date-dependent)

Diagnosis: Look for:

Hardcoded time/date values
Locale-specific formatting
Timezone-dependent assertions
Screen resolution, viewport size assumptions
Node.js/Python version-specific behavior
Network-dependent tests (API calls, rate limiting)
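A common source of timezone-dependent assertions is formatting a date without stating the locale and zone, which silently uses the host's settings. The sketch below pins both explicitly; `formatDay` is a hypothetical helper, and the instant was chosen so that the calendar day differs between zones.

```typescript
// 23:30 UTC on 15 January is already 16 January in Tokyo.
const instant = new Date('2024-01-15T23:30:00Z')

// Formats a date as MM/DD/YYYY with an explicit locale and time zone,
// so the result does not depend on the machine running the test.
function formatDay(date: Date, timeZone: string): string {
  return new Intl.DateTimeFormat('en-US', {
    timeZone,
    year: 'numeric',
    month: '2-digit',
    day: '2-digit',
  }).format(date)
}

console.log(formatDay(instant, 'UTC')) // 01/15/2024
console.log(formatDay(instant, 'Asia/Tokyo')) // 01/16/2024
```

An assertion against a call with no explicit zone would pass on a machine in one zone and fail on a machine in the other, which matches the "fails on CI only" symptom.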

Category E: Race conditions

Symptoms:

Failures involve concurrent operations, Web Workers, or parallel requests
Promise.all, Promise.race, setTimeout with 0ms delay
Event listeners on shared emitters

Diagnosis:

grep -n "Promise\.all\|Promise\.race\|Promise\.any\|Promise\.allSettled\|setTimeout\|setImmediate\|process\.nextTick" <test-file>
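One shape these races often take is a read-modify-write with an `await` in the middle. The sketch below reproduces a lost update deterministically; all names are hypothetical and `tick` merely stands in for real I/O.

```typescript
// Shared state touched by concurrent async operations.
let counter = 0

// Yields to the event loop, standing in for I/O such as a network call.
const tick = () => new Promise<void>((resolve) => setTimeout(resolve, 0))

// Read-modify-write with an await in the middle: not atomic.
async function incrementUnsafe(): Promise<void> {
  const current = counter
  await tick()
  counter = current + 1
}

async function demo(): Promise<{ concurrent: number; sequential: number }> {
  counter = 0
  await Promise.all([incrementUnsafe(), incrementUnsafe()])
  const concurrent = counter // both calls read 0, so one update is lost

  counter = 0
  await incrementUnsafe()
  await incrementUnsafe()
  const sequential = counter

  return { concurrent, sequential }
}

const raceDemo = demo()
raceDemo.then((r) => console.log(r)) // { concurrent: 1, sequential: 2 }
```

In real code the interleaving depends on I/O timing, so the lost update appears only on some runs, which is what makes this category intermittent.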

Phase 4: Apply fixes

Apply fix based on the detected category:

Fix A: Async timing

Replace brittle timing patterns:

- await new Promise(r => setTimeout(r, 1000))
- expect(screen.getByText('Loaded')).toBeInTheDocument()
+ await waitFor(() => {
+   expect(screen.getByText('Loaded')).toBeInTheDocument()
+ }, { timeout: 5000, interval: 100 })

For Vitest, use expect.poll or waitFor:

import { waitFor } from '@testing-library/react'
// or Vitest built-in:
await expect.poll(() => getStatus(), { timeout: 5000, interval: 100 }).toBe('done')

For Playwright, use auto-waiting locators:

// Bad:
await page.waitForTimeout(2000)
expect(await page.textContent('.status')).toBe('done')

// Good:
await expect(page.locator('.status')).toHaveText('done', { timeout: 10000 })

Fix B: Shared mutable state

+ import { afterEach, beforeEach, vi } from 'vitest'
+
+ beforeEach(() => {
+   vi.clearAllMocks()
+ })
+
+ afterEach(() => {
+   vi.restoreAllMocks()
+ })

Reset module-level state:

beforeEach(() => {
  // Reset counters, caches, singletons
  counter = 0
  cache.clear()
})
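The reset hook above is easier to keep complete when the module that owns the state also exposes a single reset function. A minimal sketch, assuming a hypothetical module with a counter and a cache (`nextId` and `resetForTests` are illustrative names):

```typescript
// Module-level state, as it might appear in the code under test (hypothetical).
const cache = new Map<string, number>()
let counter = 0

function nextId(prefix: string): string {
  counter += 1
  cache.set(prefix, counter)
  return `${prefix}-${counter}`
}

// Single place that restores every piece of module-level state.
// A beforeEach hook would call this.
function resetForTests(): void {
  counter = 0
  cache.clear()
}

// Without a reset, the second call sees state left behind by the first.
const first = nextId('user') // user-1
const leaked = nextId('user') // user-2
resetForTests()
const isolated = nextId('user') // user-1 again
console.log(first, leaked, isolated)
```

Keeping the reset next to the state means a newly added cache or counter is more likely to be reset too, instead of being forgotten in a distant test setup file.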

Isolate runtime configuration mutations:

const ORIGINAL_RUNTIME_MODE = getRuntimeMode()
afterEach(() => {
  setRuntimeMode(ORIGINAL_RUNTIME_MODE)
})

Fix C: Test isolation

+ beforeEach(() => {
+   document.body.innerHTML = ''
+   localStorage.clear()
+ })
+
+ afterEach(() => {
+   vi.restoreAllMocks()
+   vi.useRealTimers()
+ })

For database-backed tests:

beforeEach(async () => {
  await db.migrate.latest()
  await db.seed.run()
})

afterEach(async () => {
  await db.migrate.rollback()
})

Fix D: Environment sensitivity

Mock time-dependent code:

beforeEach(() => {
  vi.useFakeTimers()
  vi.setSystemTime(new Date('2024-01-15T12:00:00Z'))
})

afterEach(() => {
  vi.useRealTimers()
})

For locale-sensitive tests:

import { locale } from '../i18n'

beforeEach(() => {
  locale.set('en-US')
})

For network-dependent tests, mock all external calls:

vi.mock('../api/client')
// or stub fetch globally; call vi.unstubAllGlobals() in afterEach to restore it
vi.stubGlobal('fetch', vi.fn())

Fix E: Race conditions

Ensure sequential processing in tests:

// Instead of Promise.all, test sequentially
await operation1()
await operation2()

For event emitter tests:

const events: string[] = []
emitter.on('data', (d) => events.push(d))

emitter.emit('data', 'a')
emitter.emit('data', 'b')
await vi.waitFor(() => {
  expect(events).toEqual(['a', 'b'])
})

Phase 5: Add retry configuration

After fixing the root cause, add retry as a safety net. Prefer CI-specific config files or CI job commands with literal retry values instead of reading runtime environment variables inside the test config.

Vitest — vitest.ci.config.ts:

import { defineConfig, mergeConfig } from 'vitest/config'
import baseConfig from './vitest.config'

export default mergeConfig(
  baseConfig,
  defineConfig({
    test: {
      retry: 2,
      // Deterministic order: fixed seed, no shuffling
      sequence: {
        seed: 123,
        shuffle: false,
      },
    },
  }),
)

Run in CI with vitest --config vitest.ci.config.ts.

Or per test or suite with the retry test option:

describe('flaky file', { retry: 2 }, () => { ... })

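Jest — jest.ci.config.ts (uses the built-in jest.retryTimes(), which needs the jest-circus runner):

import baseConfig from './jest.config'

export default {
  ...baseConfig,
  // jest.retry.setup.ts is a hypothetical setup file containing: jest.retryTimes(2)
  setupFilesAfterEnv: [
    ...(baseConfig.setupFilesAfterEnv ?? []),
    '<rootDir>/jest.retry.setup.ts',
  ],
}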

Run in CI with jest --config jest.ci.config.ts.

Playwright — playwright.ci.config.ts:

import { defineConfig } from '@playwright/test'
import baseConfig from './playwright.config'

export default defineConfig({
  ...baseConfig,
  retries: 2,
})

Run in CI with playwright test --config playwright.ci.config.ts.

Phase 6: Verify

Re-run the flaky detection loop:

for i in 1 2 3; do
  echo "=== RUN $i ==="
  npx vitest run --reporter=json 2>/dev/null | tee ".flaky-verify-$i.json"
done

Compare results across runs:

Zero flaky tests — success, report fixed count
Remaining flaky tests — iterate back to Phase 3 with deeper analysis
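The comparison can be reduced to set arithmetic over the flaky test names found before and after the fixes. A minimal sketch, with `compareFlaky` and the test names as hypothetical illustrations:

```typescript
// Compares the flaky sets found before and after the fixes.
function compareFlaky(before: string[], after: string[]) {
  const stillFlaky = new Set(after)
  return {
    // Flaky before, consistent now.
    fixed: before.filter((name) => !stillFlaky.has(name)),
    // Flaky before and still flaky: go back to Phase 3.
    remaining: before.filter((name) => stillFlaky.has(name)),
    // Not flaky before but flaky now: likely introduced by a fix.
    introduced: after.filter((name) => before.indexOf(name) === -1),
  }
}

const outcome = compareFlaky(
  ['auth.test.ts', 'api.test.ts', 'login.spec.ts'],
  ['login.spec.ts', 'cart.test.ts'],
)
console.log(outcome.fixed.length) // 2
```

Tracking the `introduced` bucket separately guards against a fix that stabilizes one test while destabilizing another.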

FLAKY_REPORT.md generation

After fixing, generate a summary report:

# Flaky Test Report

| Test | Runner | Category | Status | Fix applied |
|------|--------|----------|--------|-------------|
| `auth.test.ts` | Vitest | Async timing | Fixed | Replaced `setTimeout` with `waitFor` |
| `api.test.ts` | Jest | Shared state | Fixed | Added `beforeEach` cleanup |
| `login.spec.ts` | Playwright | Environment | Quarantined | Needs API mock setup |

**Stable rate:** 98/100 tests (98%)
**Next step:** Review remaining flaky tests manually
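A report in this layout can be rendered from structured records. The sketch below is one possible way to do it; the `FlakyRecord` shape and `renderReport` are assumptions made for illustration, and writing the string to FLAKY_REPORT.md is left to the caller.

```typescript
// One row of the report (hypothetical shape mirroring the table columns).
interface FlakyRecord {
  test: string
  runner: string
  category: string
  status: 'Fixed' | 'Quarantined'
  fix: string
}

// Renders the markdown report body; does not touch the filesystem.
function renderReport(records: FlakyRecord[], stable: number, total: number): string {
  const rows = records.map(
    (r) => `| \`${r.test}\` | ${r.runner} | ${r.category} | ${r.status} | ${r.fix} |`,
  )
  // Guard against division by zero for an empty suite.
  const rate = total === 0 ? 0 : Math.round((stable / total) * 100)
  return [
    '# Flaky Test Report',
    '',
    '| Test | Runner | Category | Status | Fix applied |',
    '|------|--------|----------|--------|-------------|',
    ...rows,
    '',
    `**Stable rate:** ${stable}/${total} tests (${rate}%)`,
  ].join('\n')
}

const report = renderReport(
  [
    { test: 'auth.test.ts', runner: 'Vitest', category: 'Async timing', status: 'Fixed', fix: 'Replaced setTimeout with waitFor' },
    { test: 'login.spec.ts', runner: 'Playwright', category: 'Environment', status: 'Quarantined', fix: 'Needs API mock setup' },
  ],
  98,
  100,
)
console.log(report)
```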

Example

User: "our CI is full of flaky tests, we use vitest"

Agent actions:

Run npx vitest run --reporter=json 3 times in a loop
Detect 4 tests pass/fail inconsistently across runs
For auth.test.ts: grep shows setTimeout(500) before assertion
Replace with waitFor — auth.test.ts now consistent across 3 runs
For api.test.ts: auth token stored in localStorage leaks between tests
Add beforeEach(() => localStorage.clear()) — stable
Add vitest.ci.config.ts with retry: 2 for CI
Verify: all 4 formerly flaky tests pass consistently
Show report

User: "playwright e2e tests flaky on CI"

Agent actions:

Run npx playwright test --reporter=json 3 times, identify login.spec.ts flaky
Analyze: uses page.waitForTimeout(3000) — async timing issue
Replace with await expect(page.locator('.dashboard')).toBeVisible({ timeout: 15000 })
Set retries: 2 in playwright.ci.config.ts for CI
Also detect user-profile.spec.ts — environment sensitivity (timezone)
Mock timezone in test setup
Verify: 3/3 runs pass for both tests

sametcelikbicak/flaky-test-detector
