
zypl-ai/stabilization_uplift


Mitigating Model Drift in Developing Economies Using Synthetic Data and Outliers

This repository contains the code, experiments, and datasets for the paper "Mitigating Model Drift in Developing Economies Using Synthetic Data and Outliers". The research focuses on stabilizing financial machine learning models in developing economies against distribution shifts and sudden macroeconomic shocks by leveraging synthetic outliers.

Repository Content

  • Notebooks and scripts for generating synthetic outliers, training models, and evaluating stability.
  • Datasets:
    • Preprocessed open dataset: Lending Club
    • Synthetic data with and without outliers generated by zGAN.
  • Metrics:
    • Stabilization Score (SS) – measures relative performance drop under shocks normalized by covariate drift.
    • Stabilization Uplift (SU) – weight-adjusted metric for comparing two models pre- and post-shock.
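The README does not spell out the SS formula. As a minimal sketch, assuming SS is the relative drop in a performance metric (for example ROC AUC) from pre-shock to post-shock data divided by a covariate drift score, and using the mean Population Stability Index (PSI) across features as a stand-in drift measure (PSI is an assumption here, not necessarily the paper's choice), the computation could look like this. The function names psi, covariate_drift, and stabilization_score are hypothetical.

```python
import numpy as np


def psi(pre: np.ndarray, post: np.ndarray, bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index of one feature, pre-shock vs. post-shock."""
    # Inner bin edges from pre-shock quantiles, giving roughly equal-mass bins.
    inner = np.quantile(pre, np.linspace(0.0, 1.0, bins + 1)[1:-1])
    # Bin index per value; values beyond the outer quantiles land in the edge bins.
    pre_counts = np.bincount(np.searchsorted(inner, pre, side="right"), minlength=bins)
    post_counts = np.bincount(np.searchsorted(inner, post, side="right"), minlength=bins)
    # Share of observations per bin, clipped so empty bins do not produce log(0).
    p = np.clip(pre_counts / len(pre), eps, None)
    q = np.clip(post_counts / len(post), eps, None)
    return float(np.sum((q - p) * np.log(q / p)))


def covariate_drift(X_pre: np.ndarray, X_post: np.ndarray) -> float:
    """Hypothetical drift summary: mean PSI over all feature columns."""
    return float(np.mean([psi(X_pre[:, j], X_post[:, j]) for j in range(X_pre.shape[1])]))


def stabilization_score(metric_pre: float, metric_post: float, drift: float,
                        eps: float = 1e-9) -> float:
    """Hypothetical SS: relative performance drop divided by covariate drift."""
    relative_drop = (metric_pre - metric_post) / metric_pre
    return relative_drop / (drift + eps)


# Toy usage: a simulated shock shifts only the first of three features.
rng = np.random.default_rng(0)
X_pre = rng.normal(0.0, 1.0, size=(5_000, 3))
X_post = X_pre + np.array([0.5, 0.0, 0.0])
drift = covariate_drift(X_pre, X_post)
ss = stabilization_score(metric_pre=0.80, metric_post=0.72, drift=drift)
print(f"drift={drift:.3f}, SS={ss:.3f}")
```

Under this assumed form, a lower SS means less performance loss per unit of drift; the eps terms only guard against division by zero and log(0).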

Key Contributions

  1. Stability metrics for model drift evaluation under shocks
    Introduces a two-level evaluation framework (SS and SU) to quantify model performance under sudden distribution shifts.

  2. Synthetic outliers for model drift mitigation
    Demonstrates that carefully generated synthetic outliers improve model stability when combined with real and synthetic data.

  3. Focus on developing economies
    Experiments are conducted on datasets from markets where macroeconomic shocks are frequent and unpredictable.
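The second contribution combines real rows, zGAN synthetic rows, and synthetic outliers into one training set. The sketch below shows one way to do that with pandas, assuming all three tables share the same columns and that outliers are sampled to a fixed share of the final set; build_training_set and outlier_share are hypothetical names, and the mixing ratios actually used in the notebooks may differ.

```python
import pandas as pd


def build_training_set(real: pd.DataFrame, synthetic: pd.DataFrame,
                       outliers: pd.DataFrame, outlier_share: float = 0.05,
                       seed: int = 0) -> pd.DataFrame:
    """Hypothetical mix: all real rows + all synthetic rows + sampled synthetic
    outliers, sized so outliers make up `outlier_share` of the result."""
    if not 0.0 <= outlier_share < 1.0:
        raise ValueError("outlier_share must be in [0, 1)")
    base = pd.concat(
        [real.assign(source="real"), synthetic.assign(source="synthetic")],
        ignore_index=True,
    )
    # n_out / (len(base) + n_out) = share  =>  n_out = share * len(base) / (1 - share)
    n_out = int(round(outlier_share * len(base) / (1.0 - outlier_share)))
    n_out = min(n_out, len(outliers))  # cannot sample more outliers than exist
    sampled = outliers.sample(n=n_out, random_state=seed).assign(source="synthetic_outlier")
    mixed = pd.concat([base, sampled], ignore_index=True)
    # Shuffle so the three sources are interleaved before training.
    return mixed.sample(frac=1.0, random_state=seed).reset_index(drop=True)
```

Keeping a source column makes it easy to evaluate the model separately on each data origin or to drop the outliers for an ablation run.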

Project Structure

  • 1 pre-processing & research.ipynb — data preprocessing, exploratory analysis, saving prepared datasets.
  • 2 experiments_on_open_data_catboost.ipynb — experiments with CatBoost on open data.
  • 3 experiments_on_open_data_tabpfn.ipynb — experiments with TabPFN on open data.
  • 4 experiments_on_data_stability_other_models.ipynb — data stability and experiments with other models (LightGBM, NGBoost, TabNet, etc.).
  • artifacts/ — experiment results (csv files).
  • data/ — raw, synthetic, and preprocessed data.
  • src/ — utility scripts and modules.

Quick Start

  1. Install dependencies:

    pip install -r src/requirements.txt
  2. Run Jupyter Notebook:

    jupyter notebook

    Open the desired notebook and follow the cell instructions.

Data Description

  • Open and synthetic datasets for evaluating model stability (SS) and Stabilization Uplift (SU).
  • Example data paths:
    • Preprocessed: data/preprocessed/
    • Synthetic: data/synthetic/

Experiments

Results

Experiment results are saved as csv files in the artifacts/ folder.

Contacts

Authors:

Ilyas Varshavskiy¹, Bonu Boboeva¹, Shuhrat Khalilbekov¹, Azizjon Azimi¹, Sergey Shulgin¹, Akhlitdin Nizamitdinov¹, Haitz Sáez de Ocáriz Borde²,³

Affiliations:

¹ zypl.ai, ² University of Oxford, ³ University of Cambridge

Questions and suggestions: issues or pull requests are welcome!


NB: Python 3.11+ and Jupyter Notebook are required.

License

This repository is licensed under the Creative Commons Attribution (CC BY) License.