Artifact for On the Diagnosis of Flaky Job Failures

Replication package for the paper "On the Diagnosis of Flaky Job Failures: Understanding and Prioritizing Failure Categories", accepted at the 47th International Conference on Software Engineering (ICSE 2025), SEIP track.

Purpose

This replication package includes the Jupyter Notebooks used in the study, together with their full analysis results, to provide in-depth details of the analysis and foster replication.

It also includes the source code of the FlakeRanker CLI tool. This tool is an engineered version of the notebook scripts that facilitates reuse of our RFM prioritization approach: it automatically labels flaky job failures with failure categories, then prioritizes those categories using RFM modeling.
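To make the RFM idea concrete, here is a minimal sketch that computes Recency, Frequency, and Monetary values per failure category. It is not the FlakeRanker implementation: the record keys (`category`, `finished_at`, `cost`), the category names, and the exact definitions (recency as days since the last occurrence, frequency as the number of failures, monetary as the summed cost) are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime


def compute_rfm(failures, now):
    """Compute illustrative RFM values per failure category.

    `failures` is an iterable of dicts with the hypothetical keys
    'category' (str), 'finished_at' (datetime) and 'cost' (float).
    """
    grouped = defaultdict(list)
    for failure in failures:
        grouped[failure["category"]].append(failure)

    rfm = {}
    for category, items in grouped.items():
        last_seen = max(item["finished_at"] for item in items)
        rfm[category] = {
            # Assumed definition: whole days since the most recent failure.
            "recency": (now - last_seen).days,
            # Assumed definition: number of flaky failures in the category.
            "frequency": len(items),
            # Assumed definition: total cost attributed to the category.
            "monetary": sum(item["cost"] for item in items),
        }
    return rfm


# Hypothetical records; category names and costs are made up.
example_failures = [
    {"category": "network", "finished_at": datetime(2024, 1, 8), "cost": 2.0},
    {"category": "network", "finished_at": datetime(2024, 1, 9), "cost": 3.0},
    {"category": "timeout", "finished_at": datetime(2024, 1, 1), "cost": 10.0},
]
example_rfm = compute_rfm(example_failures, now=datetime(2024, 1, 10))
```

In this toy input, a category that failed recently and often ends up with a low recency and a high frequency, which is the kind of signal a prioritization step can then score and rank.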

Data

To conduct the study, we collected build job data from GitLab projects using the python-gitlab library. For confidentiality reasons, the data collected from TELUS projects are not included. However, we prepared a build job dataset collected from the open-source project Veloren to demonstrate the FlakeRanker CLI tool's functionalities.
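As a rough illustration of this kind of collection, the sketch below iterates over the jobs of a python-gitlab project object and writes a few job attributes to a CSV file. The selected fields, the helper names `collect_jobs` and `write_jobs_csv`, and the output layout are assumptions; they do not necessarily match the scripts used in the study or the jobs.csv layout expected by FlakeRanker.

```python
import csv

# Illustrative subset of GitLab job attributes (an assumption, not the
# column set used in the study).
FIELDS = ["id", "name", "stage", "status", "created_at", "finished_at", "duration"]


def collect_jobs(project, fields=FIELDS):
    """Yield one flat dict per job of a python-gitlab project object.

    `iterator=True` makes python-gitlab page through the results lazily
    instead of loading every job at once.
    """
    for job in project.jobs.list(iterator=True):
        attrs = job.attributes
        yield {field: attrs.get(field) for field in fields}


def write_jobs_csv(rows, path, fields=FIELDS):
    """Write the collected job rows to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)


# Usage sketch (requires `pip install python-gitlab` and network access;
# the token and project path are placeholders):
#   import gitlab
#   gl = gitlab.Gitlab("https://gitlab.com", private_token="<token>")
#   project = gl.projects.get("<group>/<project>")
#   write_jobs_csv(collect_jobs(project), "jobs.csv")
```

Keeping the collection function independent of the `gitlab` import makes it easy to try out with a stand-in project object before pointing it at a real GitLab instance.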

Available Study Replication Package

Content

We provide the Jupyter Notebooks used to answer the RQs. These notebooks are not directly executable and are provided for reading only. To reuse our approach, please go to the section FlakeRanker CLI Tool for Reuse below.

PQ. Data Labeling Process

RQ1-3. RFM Analysis

RQ1. What are the main categories of flaky failures?
RQ2. Which failure categories are the most costly?
RQ3. How do the failure categories evolve over time?

RQ4. RFM Modeling and Prioritization

RQ4. What are the priority flaky failure categories?

Additional Study Materials

The notebooks/results/ directory contains additional research materials, including:

Full figures of categories evolution and related costs
Computed RFM values and scores
Scatter plots of Recency vs Frequency, Frequency vs Monetary, and Recency vs Monetary values
Correlation matrix of RFM scores used for K-means clustering
Clustering model dump and clustering results

FlakeRanker CLI Tool for Reuse

⚙️ Installation

We provide two options for installing flakeranker.

1. Build Docker Image (recommended)

Clone this repository. In the root directory, run the following command.

docker build --tag flakeranker --file docker/Dockerfile .

2. Install Python Package

Install the flakeranker Python library.

pip install flakeranker

🚀 Quickstart Reuse Example

Unzip the example dataset

unzip example/data/veloren.zip -d example/data/

This extracts the jobs.csv and labeled_jobs.csv files into the example/data/veloren/ directory.

Run the experiment on the example dataset

We recommend running the experiment on the already labeled dataset for faster execution. To do so, copy and run the command below that matches your installation choice.

To also test the labeling process on the unlabeled dataset (which can take a while: about 34 min on an Ubuntu 22.04 machine with 16 GB of RAM and a dual-core i7 at 2.80 GHz), replace labeled_jobs.csv with jobs.csv in the command.

Using the Docker Image

docker run \
-v ./example/data/veloren/labeled_jobs.csv:/opt/flakeranker/jobs.csv \
-v ./example/results/:/opt/flakeranker/ \
flakeranker run /opt/flakeranker/jobs.csv -o /opt/flakeranker/

Using the Python Package

flakeranker run ./example/data/veloren/labeled_jobs.csv -o ./example/results/

The FlakeRanker CLI writes the experiment results to the example/results/ directory as follows:

labeled_jobs.csv: Labeled dataset of jobs produced by the labeler module.
rfm_dataset.csv: RFM dataset of flaky job failure categories produced by the analyzer module.
ranked_rfm_dataset.csv: Ranked RFM dataset including the scores, cluster, and pattern produced by the ranker module. Outlier categories are assigned to cluster -1.
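The ranked output can then be inspected with a few lines of Python. The sketch below groups categories by cluster and separates the outliers in cluster -1. The column names `cluster` and `category` are assumptions and may differ from the actual header of ranked_rfm_dataset.csv; adjust them after checking the file.

```python
import csv
from collections import defaultdict


def load_rows(path):
    """Read a CSV file into a list of dicts keyed by the header names."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def group_by_cluster(rows, cluster_field="cluster", category_field="category"):
    """Group category names by cluster id.

    Both field names are hypothetical defaults. Cluster ids are parsed
    through float() so that values such as "-1" and "-1.0" both map to -1.
    """
    clusters = defaultdict(list)
    for row in rows:
        cluster_id = int(float(row[cluster_field]))
        clusters[cluster_id].append(row[category_field])
    return dict(clusters)


# Usage sketch, assuming the quickstart above has been run:
#   groups = group_by_cluster(load_rows("example/results/ranked_rfm_dataset.csv"))
#   outliers = groups.get(-1, [])
```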

FlakeRanker Documentation

For more details on each FlakeRanker sub-command, please read the documentation, which is also available on the official page of the flakeranker Python package.

Citation

@inproceedings{aidasso_diagnosis_2025,
  Author = {Aïdasso, Henri and Bordeleau, Francis and Tizghadam, Ali},
  Title = {On the {Diagnosis} of {Flaky} {Job} {Failures}: {Understanding} and {Prioritizing} {Failure} {Categories}},
  Year = {2025},
  Booktitle = {Proceedings of 2025 {IEEE}/{ACM} 47th {International} {Conference} on {Software} {Engineering} ({ICSE})},
  Pages = {To appear}
}
