Flakiness Analysis
To quantify the flakiness of a test, focus on how often its result fluctuates between passing and failing, that is, how often the test intermittently passes and fails without any apparent change in the underlying code or environment.
To quantify a flaky test, the following definitions apply:
- A fluctuation occurs when a test result alternates between pass and fail in successive executions.
- The more frequent these alternations, the more “flaky” the test is considered to be.
So, when extracting results from the database in time order, track the sequence of outcomes (i.e. the pass/fail history) for each test over a period of time.
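As a rough illustration of the tracking step, the sketch below groups raw result rows into one time-ordered outcome list per test. The record shape `(test_name, executed_at, status)` and the function name `build_histories` are assumptions made for this example; the actual database schema is not described here.

```python
from collections import defaultdict
from datetime import datetime


def build_histories(records):
    """Group result rows into a time-ordered list of outcomes per test.

    Each record is assumed to be a (test_name, executed_at, status) tuple;
    this shape is hypothetical and stands in for whatever the database returns.
    """
    histories = defaultdict(list)
    # Sort by execution time so each history reflects the real run order.
    for test_name, executed_at, status in sorted(records, key=lambda r: r[1]):
        histories[test_name].append(status)
    return dict(histories)


# Rows deliberately out of order to show that sorting restores the sequence.
records = [
    ("Test A", datetime(2024, 1, 1, 9, 0), "Pass"),
    ("Test B", datetime(2024, 1, 1, 9, 5), "Pass"),
    ("Test A", datetime(2024, 1, 1, 8, 0), "Fail"),
]

histories = build_histories(records)
print(histories)
```

Sorting before grouping matters because fluctuations are defined over successive executions, so an out-of-order history would give a misleading count.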
Example test sequences:
- Test A: Pass, Fail, Pass, Pass, Fail, Pass → This shows fluctuations. (flaky)
- Test B: Pass, Pass, Pass → No fluctuations. (not flaky)
- Test C: Fail, Fail, Fail → Also no fluctuations, though it has a consistent failure. (not flaky)
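The three sequences above can be checked with a small sketch that counts how many times consecutive outcomes differ. The helper name `count_fluctuations` is hypothetical, and outcomes are assumed to be plain strings.

```python
def count_fluctuations(outcomes):
    """Count positions where an outcome differs from the one before it."""
    return sum(1 for prev, curr in zip(outcomes, outcomes[1:]) if prev != curr)


test_a = ["Pass", "Fail", "Pass", "Pass", "Fail", "Pass"]
test_b = ["Pass", "Pass", "Pass"]
test_c = ["Fail", "Fail", "Fail"]

print(count_fluctuations(test_a))  # 4
print(count_fluctuations(test_b))  # 0
print(count_fluctuations(test_c))  # 0
```

Test C illustrates that a consistently failing test has no fluctuations: it is broken, not flaky.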
Calculating Flakiness Score
- Track the status changes of a particular test within its test suite. For example:
  - First Run: Pass
  - Second Run: Fail
  - Third Run: Pass
  - Fourth Run: Pass
  - The total number of status changes is 2 (Pass → Fail, then Fail → Pass).
- The flakiness score, expressed as a percentage, is calculated as:

Flakiness Score = (Number of fluctuations / (Total Runs − 1)) × 100

So in this example,

Flakiness Score = (2 / (4 − 1)) × 100 ≈ 66.7%

This means the test has a flakiness score of about 66.7%, indicating significant volatility in its behaviour.
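A minimal sketch of this calculation follows. The function name `flakiness_score` is hypothetical, and returning 0 for a test with fewer than two runs is an assumption (the formula is undefined in that case because the denominator would be zero).

```python
def flakiness_score(outcomes):
    """Return fluctuations / (total runs - 1) as a percentage."""
    if len(outcomes) < 2:
        # Assumption: with fewer than two runs there is nothing to compare,
        # so the score is reported as 0 rather than dividing by zero.
        return 0.0
    fluctuations = sum(1 for a, b in zip(outcomes, outcomes[1:]) if a != b)
    return fluctuations / (len(outcomes) - 1) * 100


runs = ["Pass", "Fail", "Pass", "Pass"]
print(f"{flakiness_score(runs):.1f}%")  # 66.7%
```

Dividing by the number of runs minus one makes the score 100 when the result alternates on every consecutive run and 0 when it never changes.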
The severity of the volatility is defined by the table below:
| Flakiness Score Range | Severity | Colour |
|---|---|---|
| More than 75 | Critical | Red |
| 50 to 75 | Concerning | Orange |
| 25 to less than 50 | Moderate | Yellow |
| More than 0, less than 25 | Stable | Green |
| 0 | No Flaky Tests | No Flaky Tests |
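The table can be expressed as a simple lookup, sketched below. The function name `severity` is hypothetical. The handling of boundary values is also an assumption, since the table does not say which band an exact 25, 50, or 75 belongs to: here each band includes its lower bound, exactly 75 is treated as Concerning, and a score of 0 is treated as "No Flaky Tests".

```python
def severity(score):
    """Map a flakiness score (0 to 100) to a (severity, colour) pair.

    Boundary handling is an assumption; the source table leaves exact
    values of 25, 50 and 75 unspecified.
    """
    if score > 75:
        return ("Critical", "Red")
    if score >= 50:
        return ("Concerning", "Orange")
    if score >= 25:
        return ("Moderate", "Yellow")
    if score > 0:
        return ("Stable", "Green")
    return ("No Flaky Tests", "No Flaky Tests")


print(severity(200 / 3))  # ('Concerning', 'Orange')
print(severity(0))        # ('No Flaky Tests', 'No Flaky Tests')
```

With this mapping, the worked example's score of about 66.7% lands in the Concerning band.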