Flakiness Analysis

To quantify how flaky a test is, focus on how often its result fluctuates between passing and failing, that is, how often it intermittently passes and fails without any apparent change in the underlying code or environment.

The flakiness measure rests on the following definitions:

  1. Fluctuations occur when a test result alternates between pass and fail in successive executions.

  2. The more frequent these alternations, the more “flaky” the test is considered to be.

When extracting data from the database, order each test's results by execution time and track the resulting sequence of outcomes (i.e. its pass/fail history) over a period of time.

Examples of test sequences:
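As a rough illustration of building the per-test history, the sketch below groups result rows by test and orders them by execution time. The row layout (test name, timestamp, status), the function name build_histories, and the sample rows are all hypothetical; the actual database schema is not described here.

```python
from collections import defaultdict


def build_histories(rows):
    """Group (test_name, executed_at, status) rows into per-test outcome lists.

    Assumption: executed_at is an ISO 8601 string, so sorting the strings
    also sorts the rows chronologically.
    """
    histories = defaultdict(list)
    for test_name, _executed_at, status in sorted(rows, key=lambda row: row[1]):
        histories[test_name].append(status)
    return dict(histories)


# Illustrative rows only; a real query would supply these.
rows = [
    ("test_a", "2024-01-01T10:00:00", "Pass"),
    ("test_b", "2024-01-01T10:00:05", "Pass"),
    ("test_a", "2024-01-03T10:00:00", "Pass"),
    ("test_a", "2024-01-02T10:00:00", "Fail"),
    ("test_b", "2024-01-02T10:00:05", "Pass"),
]

print(build_histories(rows))
```

Sorting before grouping matters: fluctuations are only meaningful when the outcomes are compared in the order the runs actually happened.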

  • Test A: Pass, Fail, Pass, Pass, Fail, Pass → This shows fluctuations. (flaky)

  • Test B: Pass, Pass, Pass → No fluctuations. (not flaky)

  • Test C: Fail, Fail, Fail → Also no fluctuations, though it has a consistent failure. (not flaky)
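The three sequences above can be checked with a short sketch that counts how many times consecutive results differ. The function names count_fluctuations and is_flaky are illustrative, and treating any non-zero count as "flaky" is an assumption that simply mirrors the labels in the examples.

```python
def count_fluctuations(outcomes):
    """Count how many times the result changes between successive runs."""
    return sum(1 for prev, curr in zip(outcomes, outcomes[1:]) if prev != curr)


def is_flaky(outcomes):
    """Assumption: a test is flaky if its result changed at least once."""
    return count_fluctuations(outcomes) > 0


test_a = ["Pass", "Fail", "Pass", "Pass", "Fail", "Pass"]
test_b = ["Pass", "Pass", "Pass"]
test_c = ["Fail", "Fail", "Fail"]

for name, outcomes in [("Test A", test_a), ("Test B", test_b), ("Test C", test_c)]:
    print(name, count_fluctuations(outcomes), is_flaky(outcomes))
```

Test A yields four fluctuations, while Test B and Test C yield none, so a consistently failing test is not counted as flaky under this definition.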

Calculating Flakiness Score

  1. Track the status changes of a particular test within its test suite across successive runs. For example:

    • First Run: Pass

    • Second Run: Fail

    • Third Run: Pass

    • Fourth Run: Pass

  2. Count the status changes. Here the total is 2 (Pass to Fail, then Fail to Pass).

  3. Compute the flakiness score as a percentage:

Flakiness Score = (Number of fluctuations / (Total Runs - 1)) * 100

So in this example,

Flakiness score = (2 / (4 - 1)) * 100 ≈ 66.7%

This means the test has a flakiness score of about 66.7%, indicating significant volatility in its behaviour.
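The worked example can be reproduced with a minimal sketch of the calculation. The function name flakiness_score is illustrative, and returning 0 for a history with fewer than two runs is an assumption made here to avoid dividing by zero; the text does not say how that case should be handled.

```python
def flakiness_score(outcomes):
    """Return the flakiness score as a percentage of possible status changes."""
    if len(outcomes) < 2:
        # Assumption: with fewer than two runs there is nothing to compare.
        return 0.0
    fluctuations = sum(
        1 for prev, curr in zip(outcomes, outcomes[1:]) if prev != curr
    )
    return fluctuations / (len(outcomes) - 1) * 100


runs = ["Pass", "Fail", "Pass", "Pass"]
print(round(flakiness_score(runs), 1))  # 66.7
```

Dividing by the number of runs minus one means a test that alternates on every run scores 100, and a test that never changes scores 0.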

The table below maps the flakiness score to a severity level:

Flakiness Score Range (%)    Severity          Colour
More than 75                 Critical          Red
50 to less than 75           Concerning        Orange
25 to less than 50           Moderate          Yellow
Above 0 to less than 25      Stable            Green
0                            No Flaky Tests    No Flaky Tests
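A minimal sketch of the severity lookup follows. The table leaves two cases open, so the code makes two labelled assumptions: a score of exactly 75 is treated as Critical, and a score of exactly 0 maps to "No Flaky Tests" (a score cannot be negative). The function name classify_flakiness is illustrative.

```python
def classify_flakiness(score):
    """Map a flakiness score (0 to 100) to a (severity, colour) pair."""
    if not 0 <= score <= 100:
        raise ValueError("flakiness score must be between 0 and 100")
    if score == 0:
        # Assumption: no fluctuations at all means no flaky behaviour.
        return ("No Flaky Tests", "No Flaky Tests")
    if score < 25:
        return ("Stable", "Green")
    if score < 50:
        return ("Moderate", "Yellow")
    if score < 75:
        return ("Concerning", "Orange")
    # Assumption: exactly 75 is grouped with "More than 75".
    return ("Critical", "Red")


print(classify_flakiness(66.7))
```

Checking the bands from the lowest upward keeps each comparison to a single upper bound, so the boundaries are stated once and cannot overlap.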