sametcelikbicak/flaky-test-detector

Detect, analyze, and eliminate flaky tests across any test runner. An AI agent skill that identifies non-deterministic test failures, categorizes root causes, and applies targeted fixes.

What is flaky-test-detector?

flaky-test-detector is a Claude Code agent skill that detects, analyzes, and eliminates flaky tests across any test runner. It identifies non-deterministic test failures, categorizes root causes, and applies targeted fixes.

Compatible with: Claude Code, Codex CLI, Cursor, OpenCode, Windsurf
npx skills add sametcelikbicak/flaky-test-detector


Documentation

Flaky Test Detector

Identify, diagnose, and eliminate flaky tests. Runs your test suite multiple times, isolates non-deterministic failures, categorizes root causes, and applies targeted fixes.

Pipeline

  1. Detect runner & configuration
  2. Harvest flaky candidates via repeated runs
  3. Analyze failure patterns to identify root cause
  4. Fix each flakiness category with proven strategies
  5. Add retry configuration as a CI safety net
  6. Verify fixes by re-running the detection loop

Phase 1: Detect runner and configuration

Read project setup:

cat package.json

Identify the test runner:

| Indicator | Runner | Base command |
|-----------|--------|--------------|
| "vitest" in devDependencies | Vitest | npx vitest |
| "jest" in devDependencies | Jest | npx jest |
| "react-scripts" in devDependencies | react-scripts | npx react-scripts test |
| "@playwright/test" in devDependencies | Playwright | npx playwright test |
| "cypress" in devDependencies | Cypress | npx cypress run |
| "mocha" in devDependencies | Mocha | npx mocha |
| "ava" in devDependencies | AVA | npx ava |

Check existing retry/flaky config:

  • Vitest: retry in vitest.config.ts
  • Jest: jest.retryTimes() in a setup file or test file (Jest has no retry option in jest.config.ts)
  • Playwright: retries in playwright.config.ts
  • Cypress: retries in cypress.config.ts
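The indicator table in this phase can be applied programmatically. The following is a minimal sketch, assuming the contents of package.json have already been parsed into an object; `detectRunner` and the `RUNNERS` list are hypothetical names used only for illustration. It returns every match, since a project may use one runner for unit tests and another for e2e tests.

```typescript
// Sketch only: maps devDependencies to the runners listed in the table.
interface PackageJson {
  devDependencies?: Record<string, string>;
}

interface RunnerInfo {
  runner: string;
  command: string;
}

// Order mirrors the indicator table.
const RUNNERS: Array<[string, RunnerInfo]> = [
  ["vitest", { runner: "Vitest", command: "npx vitest" }],
  ["jest", { runner: "Jest", command: "npx jest" }],
  ["react-scripts", { runner: "react-scripts", command: "npx react-scripts test" }],
  ["@playwright/test", { runner: "Playwright", command: "npx playwright test" }],
  ["cypress", { runner: "Cypress", command: "npx cypress run" }],
  ["mocha", { runner: "Mocha", command: "npx mocha" }],
  ["ava", { runner: "AVA", command: "npx ava" }],
];

// Returns all runners whose indicator package appears in devDependencies.
function detectRunner(pkg: PackageJson): RunnerInfo[] {
  const deps = pkg.devDependencies ?? {};
  return RUNNERS.filter(([indicator]) => indicator in deps).map(([, info]) => info);
}
```

An empty result means none of the listed indicators were found, in which case the runner has to be identified some other way (for example from the "scripts" section).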

Phase 2: Harvest flaky candidates

Run the full suite multiple times and collect failures:

# Run 3+ times, log each result
for i in 1 2 3; do
  echo "=== RUN $i ==="
  npx vitest run --reporter=json 2>/dev/null | tee ".flaky-run-$i.json"
done

For a quick flaky check on the last failed run:

npx vitest run --reporter=json 2>/dev/null | tee ".flaky-last-run.json"

For Jest:

for i in 1 2 3; do
  echo "=== RUN $i ==="
  npx jest --json 2>/dev/null | tee ".flaky-run-$i.json"
done

For Playwright:

for i in 1 2 3; do
  echo "=== RUN $i ==="
  npx playwright test --reporter=json 2>/dev/null | tee ".flaky-run-$i.json"
done

Collect test results across runs and identify:

  • Always passes — stable, exclude from analysis
  • Always fails — broken test, report separately
  • Sometimes passes, sometimes fails — FLAKY candidate
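The three buckets above can be computed mechanically once each run has been reduced to a pass/fail map. This is a minimal sketch under that assumption; the `RunResult` shape and the `classifyTests` name are hypothetical, and extracting the map from each runner's JSON report is left out because the report formats differ.

```typescript
type Verdict = "stable" | "broken" | "flaky";

// One entry per run: test name -> whether it passed in that run.
type RunResult = Record<string, boolean>;

function classifyTests(runs: RunResult[]): Record<string, Verdict> {
  // Collect every observed outcome per test name.
  const outcomes: Record<string, boolean[]> = {};
  for (const run of runs) {
    for (const testName of Object.keys(run)) {
      if (!outcomes[testName]) outcomes[testName] = [];
      outcomes[testName].push(run[testName]);
    }
  }

  const verdicts: Record<string, Verdict> = {};
  for (const testName of Object.keys(outcomes)) {
    const results = outcomes[testName];
    const passes = results.filter((passed) => passed).length;
    if (passes === results.length) verdicts[testName] = "stable"; // always passes
    else if (passes === 0) verdicts[testName] = "broken"; // always fails
    else verdicts[testName] = "flaky"; // mixed outcomes
  }
  return verdicts;
}
```

With only three runs, a test that fails rarely can still land in the "stable" bucket, so more runs give a more trustworthy classification.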

Phase 3: Analyze root cause

For each flaky candidate, categorize by symptom:

Category A: Async timing

Symptoms:

  • Test passes locally but fails on CI (slower environments)
  • Failure mentions timeout, setTimeout, setInterval, requestAnimationFrame
  • DOM assertions fail intermittently in browser tests

Diagnosis:

# Grep the test file for async patterns
grep -n "setTimeout\|setInterval\|waitFor\|delay\|sleep\|requestAnimationFrame\|transition\|animation" <test-file>

Look for:

  • Fixed timeouts instead of polling/retrying
  • Missing waitFor or expect.poll in Vitest
  • Missing waitFor / findBy* in Testing Library
  • Assertions before async operations complete
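When grep is not available, the same scan can be done in code. The sketch below reuses the pattern list from the grep command above; `findAsyncPatterns` is a hypothetical helper that takes the test file's source text as a string and reports matching lines with 1-based line numbers.

```typescript
// Same alternatives as the grep command; no "g" flag, so test() stays stateless.
const ASYNC_PATTERN =
  /setTimeout|setInterval|waitFor|delay|sleep|requestAnimationFrame|transition|animation/;

interface Hit {
  line: number; // 1-based, like grep -n
  text: string;
}

function findAsyncPatterns(source: string): Hit[] {
  return source
    .split("\n")
    .map((text, index) => ({ line: index + 1, text: text.trim() }))
    .filter((hit) => ASYNC_PATTERN.test(hit.text));
}
```

A hit is only a lead: a waitFor match is usually fine, while a bare setTimeout followed by an assertion is the pattern worth fixing.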

Category B: Shared mutable state

Symptoms:

  • Failures depend on test order (running with --shard or --bail 1 changes the results)
  • Tests pass in isolation but fail in full suite
  • Module-level variables, singletons, globals

Diagnosis:

# Check for shared state patterns
grep -n "let [a-z]* = \|var [a-z]* = \|const [a-z]* = \|global\." <test-file> | head -20

Look for:

  • Test-level variables not reset between tests
  • Mock state leaking across tests (vi.fn() / jest.fn() not cleared)
  • Environment variable mutations without restore
  • Filesystem or database state not cleaned up

Category C: Test isolation

Symptoms:

  • Test passes with --testPathPattern=<single-file> but fails in full suite
  • Test fails when specific other tests run before it

Diagnosis: Run the suite in its default order, then in shuffled order. Failures that appear only when shuffled confirm an order dependency:

# Vitest (default order, then shuffled)
npx vitest run --reporter=json 2>/dev/null
npx vitest run --sequence.shuffle --reporter=json 2>/dev/null

Look for:

  • Missing beforeEach / afterEach cleanup
  • Tests modifying DOM or globals without restoring
  • Database/seeded data not isolated per test
  • Mock implementations leaking (vi.restoreAllMocks() / jest.restoreAllMocks() missing)
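A shuffled run only helps diagnosis if the failing order can be replayed. The sketch below illustrates the idea of seed-based ordering with a small deterministic generator (mulberry32) and a Fisher-Yates shuffle. It is not the algorithm any particular runner uses, and `shuffleWithSeed` is a hypothetical name.

```typescript
// mulberry32: a tiny deterministic PRNG; the same seed yields the same sequence.
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Fisher-Yates shuffle on a copy, driven by the seeded generator.
function shuffleWithSeed<T>(items: readonly T[], seed: number): T[] {
  const result = items.slice();
  const random = mulberry32(seed);
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    const tmp = result[i];
    result[i] = result[j];
    result[j] = tmp;
  }
  return result;
}
```

Logging the seed next to each shuffled run makes it possible to rerun exactly the order that exposed the failure.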

Category D: Environment sensitivity

Symptoms:

  • Fails on CI only (never locally)
  • Fails on certain OS/browser combinations
  • Fails at specific times (timezone, date-dependent)

Diagnosis: Look for:

  • Hardcoded time/date values
  • Locale-specific formatting
  • Timezone-dependent assertions
  • Screen resolution, viewport size assumptions
  • Node.js/Python version-specific behavior
  • Network-dependent tests (API calls, rate limiting)
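As an illustration of locale and timezone sensitivity, the sketch below contrasts a formatter that inherits the machine's settings with one that pins both. The function names are hypothetical. The fragile variant can report a different calendar day for the same instant depending on where the test runs.

```typescript
// Fragile: the result depends on the host's locale and timezone.
function formatDateFragile(date: Date): string {
  return date.toLocaleDateString();
}

// Deterministic: locale and timezone are stated explicitly.
function formatDateStable(date: Date): string {
  return date.toLocaleDateString("en-US", { timeZone: "UTC" });
}

// 23:30 UTC is already the next day in timezones east of UTC.
const lateEvening = new Date("2024-01-15T23:30:00Z");
console.log(formatDateStable(lateEvening)); // "1/15/2024" regardless of host timezone
```

An assertion written against the fragile variant passes or fails depending on the CI machine's configuration, which matches the "fails on CI only" symptom.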

Category E: Race conditions

Symptoms:

  • Failures involve concurrent operations, Web Workers, or parallel requests
  • Promise.all, Promise.race, setTimeout with 0ms delay
  • Event listeners on shared emitters

Diagnosis:

grep -n "Promise\.all\|Promise\.race\|Promise\.any\|Promise\.allSettled\|setTimeout\|setImmediate\|process\.nextTick" <test-file>

Phase 4: Apply fixes

Apply fix based on the detected category:

Fix A: Async timing

Replace brittle timing patterns:

- await new Promise(r => setTimeout(r, 1000))
- expect(screen.getByText('Loaded')).toBeInTheDocument()
+ await waitFor(() => {
+   expect(screen.getByText('Loaded')).toBeInTheDocument()
+ }, { timeout: 5000, interval: 100 })
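To show why polling is more robust than a fixed sleep, here is a framework-free sketch of what a waitFor-style helper does. It is not Testing Library's implementation; `pollUntil` and `PollOptions` are hypothetical names. The check is retried until it stops throwing or the timeout elapses, so a fast environment finishes early and a slow one gets the full budget.

```typescript
interface PollOptions {
  timeout: number; // total budget in ms
  interval: number; // pause between attempts in ms
}

async function pollUntil<T>(check: () => T, options: PollOptions): Promise<T> {
  const deadline = Date.now() + options.timeout;
  let lastError: unknown;
  while (true) {
    try {
      return check(); // done as soon as the assertion stops throwing
    } catch (error) {
      lastError = error; // remember the most recent failure for reporting
    }
    if (Date.now() >= deadline) throw lastError;
    await new Promise<void>((resolve) => setTimeout(resolve, options.interval));
  }
}
```

A fixed 1000 ms sleep is both too long when the operation takes 50 ms and too short when CI needs 1500 ms; polling avoids both problems.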

For Vitest, use expect.poll or waitFor:

import { waitFor } from '@testing-library/react'
// or Vitest built-in:
await expect.poll(() => getStatus(), { timeout: 5000, interval: 100 }).toBe('done')

For Playwright, use auto-waiting locators:

// Bad:
await page.waitForTimeout(2000)
expect(await page.textContent('.status')).toBe('done')

// Good:
await expect(page.locator('.status')).toHaveText('done', { timeout: 10000 })

Fix B: Shared mutable state

+ import { afterEach, beforeEach, vi } from 'vitest'
+
+ beforeEach(() => {
+   vi.clearAllMocks()
+ })
+
+ afterEach(() => {
+   vi.restoreAllMocks()
+ })

Reset module-level state:

beforeEach(() => {
  // Reset counters, caches, singletons
  counter = 0
  cache.clear()
})
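The effect of a missing reset can be reproduced without a test runner. In this sketch two hypothetical "tests" share a module-level counter; `runInOrder` plays the role of the runner, and its `resetBetween` flag stands in for a beforeEach hook.

```typescript
// Module-level state shared by both tests.
let counter = 0;

function resetState(): void {
  counter = 0;
}

// Mutates shared state and expects to be the first writer.
function testIncrements(): boolean {
  counter += 1;
  return counter === 1;
}

// Assumes nobody has touched the counter.
function testStartsAtZero(): boolean {
  return counter === 0;
}

function runInOrder(tests: Array<() => boolean>, resetBetween: boolean): boolean[] {
  resetState(); // every experiment starts clean
  return tests.map((test) => {
    if (resetBetween) resetState(); // equivalent of beforeEach
    return test();
  });
}

console.log(runInOrder([testIncrements, testStartsAtZero], false)); // [ true, false ]
console.log(runInOrder([testStartsAtZero, testIncrements], false)); // [ true, true ]
console.log(runInOrder([testIncrements, testStartsAtZero], true)); // [ true, true ]
```

Without the reset, the outcome depends purely on order, which is the signature of Category B.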

Isolate runtime configuration mutations:

const ORIGINAL_RUNTIME_MODE = getRuntimeMode()
afterEach(() => {
  setRuntimeMode(ORIGINAL_RUNTIME_MODE)
})

Fix C: Test isolation

+ beforeEach(() => {
+   document.body.innerHTML = ''
+   localStorage.clear()
+ })
+
+ afterEach(() => {
+   vi.restoreAllMocks()
+   vi.useRealTimers()
+ })

For database-backed tests:

beforeEach(async () => {
  await db.migrate.latest()
  await db.seed.run()
})

afterEach(async () => {
  await db.migrate.rollback()
})

Fix D: Environment sensitivity

Mock time-dependent code:

beforeEach(() => {
  vi.useFakeTimers()
  vi.setSystemTime(new Date('2024-01-15T12:00:00Z'))
})

afterEach(() => {
  vi.useRealTimers()
})

For locale-sensitive tests:

import { locale } from '../i18n'

beforeEach(() => {
  locale.set('en-US')
})

For network-dependent tests, mock all external calls:

vi.mock('../api/client')
// or
global.fetch = vi.fn()

Fix E: Race conditions

Ensure sequential processing in tests:

// Instead of Promise.all, test sequentially
await operation1()
await operation2()
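The difference can be seen in a small sketch where two hypothetical operations append to a shared log. With Promise.all the log order follows completion time, which in real code depends on network and scheduler timing; awaiting one after the other fixes the order. The delays are illustrative assumptions.

```typescript
const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Appends its label to the shared log after the given delay.
async function record(log: string[], label: string, ms: number): Promise<void> {
  await sleep(ms);
  log.push(label);
}

async function concurrentOrder(): Promise<string[]> {
  const log: string[] = [];
  // Both start immediately; the log order follows completion, not call order.
  await Promise.all([record(log, "first", 20), record(log, "second", 1)]);
  return log;
}

async function sequentialOrder(): Promise<string[]> {
  const log: string[] = [];
  // The second operation starts only after the first has finished.
  await record(log, "first", 20);
  await record(log, "second", 1);
  return log;
}
```

If the code under test is meant to run concurrently, an alternative is to keep Promise.all and assert on order-independent properties (for example, sorted contents) instead of serializing the calls.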

For event emitter tests:

const events: string[] = []
emitter.on('data', (d) => events.push(d))

emitter.emit('data', 'a')
emitter.emit('data', 'b')
await vi.waitFor(() => {
  expect(events).toEqual(['a', 'b'])
})

Phase 5: Add retry configuration

After fixing the root cause, add retry as a safety net. Prefer CI-specific config files or CI job commands with literal retry values instead of reading runtime environment variables inside the test config.

Vitest (vitest.ci.config.ts):

import { defineConfig } from 'vitest/config'
import baseConfig from './vitest.config'

export default defineConfig({
  ...baseConfig,
  test: {
    ...baseConfig.test,
    retry: 2,
    // Sequential execution for flaky tests:
    sequence: {
      seed: 123,
      shuffle: false,
    },
  },
})

Run in CI with vitest --config vitest.ci.config.ts.

Or per test or suite with the retry option (options as the second argument need a recent Vitest version):

describe('flaky file', { retry: 2 }, () => { ... })

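Jest (jest.ci.config.ts; Jest has no retry option in its config, so retries come from jest.retryTimes() in a setup file and need the default jest-circus runner):

import baseConfig from './jest.config'

export default {
  ...baseConfig,
  // jest.retry.setup.ts (hypothetical file name) contains: jest.retryTimes(2)
  setupFilesAfterEnv: [
    ...(baseConfig.setupFilesAfterEnv ?? []),
    '<rootDir>/jest.retry.setup.ts',
  ],
}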

Run in CI with jest --config jest.ci.config.ts.

Playwright (playwright.ci.config.ts):

import { defineConfig } from '@playwright/test'
import baseConfig from './playwright.config'

export default defineConfig({
  ...baseConfig,
  retries: 2,
})

Run in CI with playwright test --config playwright.ci.config.ts.
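To make clear what a retry setting does, and why it should stay a safety net rather than a fix, here is a runner-independent sketch. `runWithRetry` and `RetryOutcome` are hypothetical names. A test that passes only on a later attempt is marked flaky so that it still shows up in reporting.

```typescript
interface RetryOutcome {
  passed: boolean;
  attempts: number;
  flaky: boolean; // passed, but only after at least one failed attempt
}

// Runs the test once plus up to `retries` additional attempts.
function runWithRetry(test: () => void, retries: number): RetryOutcome {
  for (let attempt = 1; attempt <= retries + 1; attempt++) {
    try {
      test();
      return { passed: true, attempts: attempt, flaky: attempt > 1 };
    } catch {
      // Failure is swallowed here; the loop decides whether to try again.
    }
  }
  return { passed: false, attempts: retries + 1, flaky: false };
}
```

With retries set to 2, a test that fails once and then passes reports green, so the underlying cause stays hidden unless the flaky flag is surfaced.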


Phase 6: Verify

Re-run the flaky detection loop:

for i in 1 2 3; do
  echo "=== RUN $i ==="
  npx vitest run --reporter=json 2>/dev/null | tee ".flaky-verify-$i.json"
done

Compare results across runs:

  • Zero flaky tests — success, report fixed count
  • Remaining flaky tests — iterate back to Phase 3 with deeper analysis
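The comparison can be sketched as a set difference between the flaky test names found in Phase 2 and those found in the verification runs. `compareFlakySets` is a hypothetical helper. It also reports tests that became flaky only after the fixes, which would indicate a regression.

```typescript
interface VerificationSummary {
  fixed: string[]; // flaky before, stable now
  remaining: string[]; // flaky before and still flaky
  newlyFlaky: string[]; // not flaky before, flaky now
}

function compareFlakySets(before: string[], after: string[]): VerificationSummary {
  const beforeSet = new Set(before);
  const afterSet = new Set(after);
  return {
    fixed: before.filter((testName) => !afterSet.has(testName)),
    remaining: before.filter((testName) => afterSet.has(testName)),
    newlyFlaky: after.filter((testName) => !beforeSet.has(testName)),
  };
}
```

An empty `remaining` and `newlyFlaky` corresponds to the "zero flaky tests" outcome; anything else goes back to Phase 3.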

FLAKY_REPORT.md generation

After fixing, generate a summary report:

# Flaky Test Report

| Test | Runner | Category | Status | Fix applied |
|------|--------|----------|--------|-------------|
| `auth.test.ts` | Vitest | Async timing | Fixed | Replaced `setTimeout` with `waitFor` |
| `api.test.ts` | Jest | Shared state | Fixed | Added `beforeEach` cleanup |
| `login.spec.ts` | Playwright | Environment | Quarantined | Needs API mock setup |

**Stable rate:** 98/100 tests (98%)
**Next step:** Review remaining flaky tests manually
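A report in the layout above could be rendered from structured results roughly as follows. This is a sketch; the `ReportRow` shape and `renderReport` name are assumptions, and writing the string to FLAKY_REPORT.md is left to the caller.

```typescript
interface ReportRow {
  test: string;
  runner: string;
  category: string;
  status: "Fixed" | "Quarantined";
  fix: string;
}

function renderReport(rows: ReportRow[], stable: number, total: number): string {
  const header = [
    "# Flaky Test Report",
    "",
    "| Test | Runner | Category | Status | Fix applied |",
    "|------|--------|----------|--------|-------------|",
  ];
  const body = rows.map(
    (row) => `| \`${row.test}\` | ${row.runner} | ${row.category} | ${row.status} | ${row.fix} |`
  );
  // Guard against division by zero for an empty suite.
  const rate = total === 0 ? 0 : Math.round((stable / total) * 100);
  const footer = ["", `**Stable rate:** ${stable}/${total} tests (${rate}%)`];
  return header.concat(body, footer).join("\n");
}
```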

Example

User: "our CI is full of flaky tests, we use vitest"

Agent actions:

  1. Run npx vitest run --reporter=json 3 times in a loop
  2. Detect 4 tests that pass/fail inconsistently across runs
  3. For auth.test.ts: grep shows setTimeout(500) before assertion
  4. Replace with waitFor; auth.test.ts is now consistent across 3 runs
  5. For api.test.ts: an auth token stored in localStorage leaks between tests
  6. Add beforeEach(() => localStorage.clear()); the test is now stable
  7. Configure vitest.ci.config.ts with retry: 2
  8. Verify: all 4 formerly flaky tests pass consistently
  9. Show report

User: "playwright e2e tests flaky on CI"

Agent actions:

  1. Run npx playwright test --reporter=json 3 times, identify login.spec.ts flaky
  2. Analyze: uses page.waitForTimeout(3000) — async timing issue
  3. Replace with await expect(page.locator('.dashboard')).toBeVisible({ timeout: 15000 })
  4. Set retries: 2 in playwright.ci.config.ts for CI
  5. Also detect user-profile.spec.ts — environment sensitivity (timezone)
  6. Mock timezone in test setup
  7. Verify: 3/3 runs pass for both tests
