This repository contains the code, experiments, and datasets associated with the paper "Mitigating Model Drift in Developing Economies Using Synthetic Data and Outliers". The research focuses on stabilizing machine learning models in finance against distribution shifts and sudden macroeconomic shocks in developing economies by leveraging synthetic outliers.
- Notebooks and scripts for generating synthetic outliers, training models, and evaluating stability.
- Datasets:
  - Preprocessed open dataset: Lending Club.
  - Synthetic data, with and without outliers, generated by zGAN.
- Metrics:
  - Stabilization Score (SS) – measures the relative performance drop under shocks, normalized by covariate drift.
  - Stabilization Uplift (SU) – weight-adjusted metric for comparing two models pre- and post-shock.
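The exact definitions of SS and SU are given in the paper; purely as an illustration, the sketch below assumes SS is the relative AUC drop divided by a Population Stability Index (PSI) drift measure. The function names and the normalization are assumptions for this sketch, not the paper's formulas.

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index: a common scalar measure of covariate drift
    between a reference sample and a shifted sample of one feature."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0  # guard against a degenerate range
    def frac(sample, b):
        in_bin = sum(
            1 for x in sample
            if lo + b * width <= x < lo + (b + 1) * width or (b == bins - 1 and x == hi)
        )
        return max(in_bin / len(sample), 1e-6)  # floor to keep log() finite
    return sum(
        (frac(actual, b) - frac(expected, b)) * math.log(frac(actual, b) / frac(expected, b))
        for b in range(bins)
    )

def stabilization_score(auc_pre, auc_post, drift):
    """Illustrative SS: relative AUC drop after a shock, normalized by drift."""
    return ((auc_pre - auc_post) / auc_pre) / max(drift, 1e-6)
```

A PSI near zero means the feature distribution is unchanged; larger values indicate stronger drift, so the same AUC drop yields a smaller (better) score under heavy drift.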
- Stability metrics for model drift evaluation under shocks: introduces a two-level evaluation framework (SS and SU) to quantify model performance under sudden distribution shifts.
- Synthetic outliers for model drift mitigation: demonstrates that carefully generated synthetic outliers improve model stability when combined with real and synthetic data.
- Focus on developing economies: experiments are conducted on datasets from markets where macroeconomic shocks are frequent and unpredictable.
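In this repository the synthetic outliers come from zGAN, whose API is not shown here. As a generic stand-in only, the sketch below inflates a random fraction of tabular rows into the distribution tails; the function name, the multiplicative scheme, and all parameters are assumptions for illustration.

```python
import random

def inject_outliers(rows, frac=0.05, scale=5.0, seed=0):
    """Return a copy of `rows` (lists of floats) in which a random fraction of
    rows is inflated into the distribution tails to act as synthetic outliers."""
    rng = random.Random(seed)
    out = [list(r) for r in rows]
    k = max(1, int(frac * len(out)))          # how many rows become outliers
    for i in rng.sample(range(len(out)), k):  # pick distinct row indices
        out[i] = [x * scale * rng.choice([-1.0, 1.0]) for x in out[i]]
    return out
```

Training data augmented this way exposes the model to tail events before a real shock occurs, which is the intuition behind outlier-based stabilization.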
1. `pre-processing & research.ipynb` – data preprocessing, exploratory analysis, saving prepared datasets.
2. `experiments_on_open_data_catboost.ipynb` – experiments with CatBoost on open data.
3. `experiments_on_open_data_tabpfn.ipynb` – experiments with TabPFN on open data.
4. `experiments_on_data_stability_other_models.ipynb` – data stability and experiments with other models (LightGBM, NGBoost, TabNet, etc.).

- `artifacts/` – experiment results (csv files).
- `data/` – raw, synthetic, and preprocessed data.
- `src/` – utility scripts and modules.
- Install dependencies: `pip install -r src/requirements.txt`
- Run Jupyter Notebook: `jupyter notebook`, then open the desired notebook and follow the cell instructions.
- Open and synthetic datasets for evaluating uplift model stability.
- Example data paths:
  - Preprocessed: `/data/preprocessed/`
  - Synthetic: `/data/synthetic/`
- Comparison of various models (CatBoost, TabPFN, LightGBM, NGBoost, TabNet, HGBoosting, XGBoost, FT-Transformer).
- Analysis of model stability under data changes.
- Experiment results are saved as csv files in the `artifacts/` folder.
Authors:
Ilyas Varshavskiy¹, Bonu Boboeva¹, Shuhrat Khalilbekov¹, Azizjon Azimi¹, Sergey Shulgin¹, Akhlitdin Nizamitdinov¹, Haitz Sáez de Ocáriz Borde²,³
Affiliations:
¹ zypl.ai, ² University of Oxford, ³ University of Cambridge
Questions and suggestions: issues or pull requests are welcome!
NB: Python 3.11+ and Jupyter Notebook support are required.
This repository is licensed under the Creative Commons Attribution (CC BY) License.