Replication package for the paper "On the Diagnosis of Flaky Job Failures: Understanding and Prioritizing Failure Categories", accepted at the 47th International Conference on Software Engineering (ICSE SEIP 2025).
This replication package includes Jupyter Notebooks and full analysis results of the notebooks to provide in-depth details of the analysis and foster replication.
It also includes the source code of the FlakeRanker CLI tool. This tool is an engineered version of the notebook scripts to facilitate reuse of our RFM prioritization approach, through automated labeling of flaky job failures with failure categories and prioritization of the categories using RFM modeling.
To conduct the study, we collected build job data from GitLab projects using the python-gitlab library. For confidentiality reasons, the data collected from TELUS projects are not included. However, we prepared a build job dataset collected from the open-source project Veloren to demonstrate the FlakeRanker CLI tool's functionalities.
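To illustrate how build job data can be pulled with python-gitlab, here is a minimal sketch. It is not the collection script used in the study: the function names, the selected job fields, and the parameters are assumptions made for this example. Only `job_to_row` runs offline; `iter_job_rows` needs network access and a valid access token.

```python
from typing import Any, Dict, Iterator

# Fields exposed by the GitLab Jobs API. Which fields the study actually
# kept is an assumption made for this sketch.
JOB_FIELDS = (
    "id", "name", "stage", "status",
    "created_at", "finished_at", "duration", "failure_reason",
)


def job_to_row(attributes: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten one job's attribute dict into a CSV-friendly row."""
    return {field: attributes.get(field) for field in JOB_FIELDS}


def iter_job_rows(url: str, token: str, project_id: str) -> Iterator[Dict[str, Any]]:
    """Yield one row per build job of a GitLab project (hypothetical helper)."""
    import gitlab  # python-gitlab, imported lazily so job_to_row works without it

    gl = gitlab.Gitlab(url, private_token=token)
    project = gl.projects.get(project_id)
    # iterator=True pages through the jobs lazily instead of loading them all
    for job in project.jobs.list(iterator=True):
        yield job_to_row(job.attributes)
```

The rows yielded by `iter_job_rows` can then be written to a CSV file with `csv.DictWriter`, using `JOB_FIELDS` as the header.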
We provide the Jupyter Notebooks used to answer the RQs. These notebooks are not directly executable and are provided for read-only purposes. To reuse our approach, please go to the section FlakeRanker CLI Tool for Reuse below.
RQ1-3. RFM Analysis
- RQ1. What are the main categories of flaky failures?
- RQ2. Which failure categories are the most costly?
- RQ3. How do the failure categories evolve over time?
RQ4. RFM Modeling and Prioritization
- RQ4. What are the priority flaky failure categories?
The notebooks/results/ directory contains additional research result materials, including:
- Full figures of categories evolution and related costs
- Computed RFM values and scores
- Scatter plots of Recency vs Frequency, Frequency vs Monetary, and Recency vs Monetary values
- Correlation matrix of RFM scores used for K-means clustering
- Clustering model dump and clustering results.
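As a rough illustration of what the computed RFM values represent, the sketch below derives Recency, Frequency, and Monetary values per failure category from a list of labeled flaky failures. The definitions used here (days since the last occurrence, number of occurrences, summed cost) follow the usual RFM convention and are an assumption, not the exact formulas of the notebooks. `FlakyFailure` and `compute_rfm` are hypothetical names.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, NamedTuple


class FlakyFailure(NamedTuple):
    category: str          # failure category label
    finished_at: datetime  # when the failed job finished
    cost: float            # cost of the failure, in an arbitrary unit (assumption)


def compute_rfm(failures: Iterable[FlakyFailure], now: datetime) -> Dict[str, Dict[str, float]]:
    """Compute per-category RFM values under the assumed definitions."""
    groups = defaultdict(list)
    for failure in failures:
        groups[failure.category].append(failure)

    rfm = {}
    for category, items in groups.items():
        last_seen = max(f.finished_at for f in items)
        rfm[category] = {
            "recency": (now - last_seen).days,       # days since last occurrence
            "frequency": len(items),                 # number of occurrences
            "monetary": sum(f.cost for f in items),  # total cost
        }
    return rfm


failures = [
    FlakyFailure("network", datetime(2024, 1, 1), 2.0),
    FlakyFailure("network", datetime(2024, 1, 10), 3.0),
    FlakyFailure("timeout", datetime(2024, 1, 5), 1.5),
]
rfm = compute_rfm(failures, now=datetime(2024, 1, 11))
print(rfm["network"])  # {'recency': 1, 'frequency': 2, 'monetary': 5.0}
```

Scores and clusters are then derived from values of this kind; that step is left to the notebooks and to the FlakeRanker ranker module.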
We provide two options for installing flakeranker.
Clone this repository. In the root directory, run the following command.
docker build --tag flakeranker --file docker/Dockerfile .
Install the flakeranker Python library.
pip install flakeranker
unzip example/data/veloren.zip -d example/data/
This extracts the jobs.csv and labeled_jobs.csv files into the example/data/veloren/ directory.
We recommend running the experiment using the already labeled dataset for faster execution. To do so, simply copy and run the following command depending on your installation choice.
To further test the labeling process on a clean dataset (which might take a while, about 34 min on Ubuntu 22.04 with 16 GB of RAM and a dual-core i7 at 2.80 GHz), simply replace labeled_jobs.csv with jobs.csv in the command.
Using the Docker Image
docker run \
-v ./example/data/veloren/labeled_jobs.csv:/opt/flakeranker/jobs.csv \
-v ./example/results/:/opt/flakeranker/ \
flakeranker run /opt/flakeranker/jobs.csv -o /opt/flakeranker/
Using the Python Package
flakeranker run ./example/data/veloren/labeled_jobs.csv -o ./example/results/
The FlakeRanker CLI writes the experiment results into the example/results/ directory as follows:
- labeled_jobs.csv: Labeled dataset of jobs produced by the labeler module.
- rfm_dataset.csv: RFM dataset of flaky job failure categories produced by the analyzer module.
- ranked_rfm_dataset.csv: Ranked RFM dataset including the scores, cluster, and pattern produced by the ranker module. Outlier categories are assigned to the cluster -1.
For more details on each FlakeRanker sub-command, please read the documentation, which is also available on the official page of the flakeranker Python package.
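Since outlier categories are assigned to the cluster -1 in the ranked dataset, a consumer of the results may want to separate them from the clustered categories. The following is a minimal sketch of that step. The column names `category` and `cluster` are assumptions made for this example, so check the header of the generated CSV file before relying on them.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple

Row = Dict[str, str]


def split_outliers(csv_file: TextIO, cluster_column: str = "cluster") -> Tuple[List[Row], List[Row]]:
    """Split ranked rows into (clustered, outliers), where outliers have cluster -1."""
    clustered, outliers = [], []
    for row in csv.DictReader(csv_file):
        # float() tolerates both "-1" and "-1.0" encodings of the cluster id
        if float(row[cluster_column]) == -1:
            outliers.append(row)
        else:
            clustered.append(row)
    return clustered, outliers


# Tiny in-memory stand-in for ranked_rfm_dataset.csv (made-up rows, assumed columns)
sample = io.StringIO("category,cluster\nnetwork,0\ntimeout,-1\nrunner,1\n")
clustered, outliers = split_outliers(sample)
print([row["category"] for row in outliers])  # ['timeout']
```

With the real output, the same function can be called on an open file, for example `open("example/results/ranked_rfm_dataset.csv", newline="")`.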
@inproceedings{aidasso_diagnosis_2025,
Author = {Aïdasso, Henri and Bordeleau, Francis and Tizghadam, Ali},
Title = {On the {Diagnosis} of {Flaky} {Job} {Failures}: {Understanding} and {Prioritizing} {Failure} {Categories}},
Year = {2025},
Booktitle = {Proceedings of 2025 {IEEE}/{ACM} 47th {International} {Conference} on {Software} {Engineering} ({ICSE})},
Pages = {To appear}
}