Table of Contents
- The Project Overview
- The System Architecture
- The Inference
- Quick Start
- Tuning
- Deployment
- Package Management
- Data CI/CD Automation
- Contributing
- Troubleshooting
- Ref. Repository Structure
This project covers the development and deployment of a serverless machine learning system that recommends optimal retail prices to maximize product sales.
The system aims to help mid-sized retailers compete effectively with larger players.
The architecture establishes a scalable, serverless microservice using AWS Lambda, triggered by an API Gateway.
The prediction logic is fully containerized with Docker, and the image is stored in AWS ECR.
Trained models and features are centrally managed in S3, while ElastiCache (Redis) provides a low-latency caching layer for historical data and predictions.
This event-driven setup ensures automatic scaling and pay-per-use efficiency.
![Figure A. The system architecture (Created by Kuriko IWAI)](https://cdn.hashnode.com/res/hashnode/image/upload/v1760860401076/ae657214-fe63-4de0-a9ca-a1033bff2907.png)

Figure A. The system architecture (Created by Kuriko IWAI)
The infrastructure leverages the AWS ecosystem:
- Docker / AWS ECR as the microservice container: Packages the prediction logic and dependencies. AWS Lambda pulls the image from ECR for consistent, universal deployment.
- AWS API Gateway as the REST API endpoint: Routes external client-side UI requests (via a Flask application) to trigger the Lambda function (a minimal Flask forwarding sketch follows this list).
- AWS Lambda as inference: Executes the inference function, loading the container, models, and features to calculate price recommendations.
- AWS S3 as storage & feature store: Stores raw features, trained model artifacts, processors, and DVC metadata for ML Lineage.
- AWS ElastiCache with a Redis client as the caching layer: Stores cached analytical data and past price predictions to improve latency and resource efficiency.
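As a rough illustration of the client-side piece, the sketch below shows how a Flask route might forward a UI request to the API Gateway endpoint. The `/recommend` route, the payload field, and the `API_GATEWAY_URL` environment variable are assumptions for illustration, not the actual `app.py` implementation.

```python
# Hypothetical sketch: a Flask route forwarding UI requests to API Gateway.
# Route name, payload fields, and env var names are illustrative assumptions.
import os

import requests
from flask import Flask, jsonify, request

app = Flask(__name__)
API_GATEWAY_URL = os.getenv("API_GATEWAY_URL", "")  # e.g. https://<id>.execute-api.<region>.amazonaws.com/prod

@app.route("/recommend", methods=["POST"])
def recommend():
    payload = request.get_json()  # e.g. {"stockcode": "85123A"}
    resp = requests.post(f"{API_GATEWAY_URL}/recommend", json=payload, timeout=30)
    return jsonify(resp.json()), resp.status_code

if __name__ == "__main__":
    app.run(port=5002)
```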
A dedicated ML Lineage process is integrated using DVC (Data Version Control) and scheduled to run weekly by Prefect, an open-source workflow orchestrator.
- Lineage Scope (DVC): DVC tracks the entire lifecycle, including data (ETL/preprocessing), experiments (hyperparameter tuning/validation), and models/predictions (artifacts, metrics).
- Data Quality Gate: Models must pass stringent quality checks before being authorized to serve predictions:
  - Data Drift Tests: Handled by Evidently AI to identify shifts in data distribution.
  - Fairness Tests: Measure SHAP scores and other custom metrics to ensure the model operates without bias.
- Automation: Prefect triggers DVC weekly to check for updates in data or scripts and executes the full lineage process if changes are detected, ensuring continuous model freshness and quality (a minimal Prefect sketch follows this list).
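A minimal sketch of this weekly automation, not the project's actual `src/prefect_flows.py`: a Prefect flow shells out to DVC and is served on a cron schedule. The flow/deployment names and the cron expression are assumptions.

```python
# Illustrative sketch of the weekly lineage automation (not the actual src/prefect_flows.py).
# Flow/deployment names and the cron expression are assumptions.
import subprocess

from prefect import flow, task

@task
def dvc_repro() -> None:
    # Re-runs only the pipeline stages whose data or scripts changed.
    subprocess.run(["dvc", "repro"], check=True)

@task
def dvc_push() -> None:
    # Pushes updated artifacts (data, models, metrics) to the DVC remote (S3).
    subprocess.run(["dvc", "push"], check=True)

@flow(name="etl-pipeline")
def weekly_lineage() -> None:
    dvc_repro()
    dvc_push()

if __name__ == "__main__":
    # Serve the flow on a weekly schedule (Sunday midnight, as an example).
    weekly_lineage.serve(name="deploy-etl-pipeline", cron="0 0 * * 0")
```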
The infrastructure and model lifecycle are managed through a robust MLOps practice using a CI/CD pipeline integrated with GitHub.
- Code Lineage: Handled by GitHub, protected by branch rules and enforced pull request reviews.
- Source: A code commit to GitHub triggers a GitHub Actions workflow.
- Testing & Building: Automated GitHub Actions workflows run:
  - Test Phase: Runs PyTest (unit/integration tests), SAST (Static Application Security Testing), and SCA (Software Composition Analysis) for dependencies using Snyk.
  - Build Phase: If tests pass, AWS CodeBuild is triggered to build the Docker image and push it to ECR.
- Deployment: A human review phase is mandatory between build and deployment. After approval, another GitHub Actions workflow is manually triggered to deploy the updated Lambda function to staging or production.
![Figure B. The CI/CD pipeline (Created by Kuriko IWAI)](https://cdn.hashnode.com/res/hashnode/image/upload/v1760860467508/896750af-22d3-45ca-9fc1-25b96f77ab0b.png)

Figure B. The CI/CD pipeline (Created by Kuriko IWAI)
The process is designed for consistent, automated data and model management through MLOps tools. A price recommendation request flows through the system as follows:
- The client UI sends a price recommendation request via the Flask application.
- The request hits the API Gateway endpoint.
- API Gateway triggers the AWS Lambda function.
- Lambda loads the Docker container from ECR.
- The function retrieves the latest features and model artifacts from S3 and checks ElastiCache/Redis for cached data.
- The primary model performs inference on the logarithmically transformed quantity data and returns the optimal price recommendation (a minimal handler sketch follows this list).
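The sketch below walks through this request path inside the Lambda function. Bucket and key names, cache keys, the single-price input feature, and the revenue-maximizing price search are assumptions, not the project's actual handler.

```python
# Illustrative Lambda handler for the request path above (not the project's actual handler).
# Bucket/key names, cache keys, feature handling, and the price search are assumptions.
import json
import os

import boto3
import redis
import torch

s3 = boto3.client("s3")
cache = redis.Redis(host=os.getenv("REDIS_HOST", "localhost"), port=6379, ssl=True)

def handler(event, context):
    body = json.loads(event.get("body", "{}"))     # request forwarded by API Gateway
    stockcode = body["stockcode"]

    cached = cache.get(f"prediction:{stockcode}")  # serve a cached prediction if one exists
    if cached:
        return {"statusCode": 200, "body": cached.decode()}

    # Pull the latest production model artifact from S3 into Lambda's /tmp storage.
    s3.download_file(os.environ["MODEL_BUCKET"], f"production/dfn_best_{stockcode}.pth", "/tmp/model.pth")
    model = torch.load("/tmp/model.pth", map_location="cpu")  # assumes a fully pickled module
    model.eval()

    # Predict log-quantity over a grid of candidate prices and pick the revenue-maximizing one
    # (assumes a hypothetical single-feature model for brevity).
    candidate_prices = torch.linspace(1.0, 50.0, steps=100)
    with torch.no_grad():
        log_qty = model(candidate_prices.unsqueeze(1)).squeeze()
    revenue = candidate_prices * torch.expm1(log_qty)
    best_price = float(candidate_prices[int(torch.argmax(revenue))])

    response = json.dumps({"stockcode": stockcode, "recommended_price": round(best_price, 2)})
    cache.setex(f"prediction:{stockcode}", 3600, response)  # cache for one hour
    return {"statusCode": 200, "body": response}
```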
The system utilizes multiple machine learning models to ensure prediction redundancy and reliability. The primary mechanism involves predicting the quantity of a product sold at a given price point.
- Primary Model: A multi-layered feedforward network (PyTorch).
  - Role: Serves first-line predictions.
  - Tuning: Tuned via Optuna's Bayesian optimization (with a grid-search fallback).
- Backup Models: LightGBM, SVR, and Elastic Net (Scikit-learn).
  - Role: Prioritized backups used if the primary model fails, ensuring redundancy (a failover sketch follows this list).
  - Tuning: Tuned via the Scikit-Optimize framework.
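A minimal sketch of the failover logic described above; the artifact paths, serialization formats, and priority order details are illustrative assumptions.

```python
# Illustrative failover sketch: try the primary model first, then prioritized backups.
# Artifact paths and serialization formats are assumptions for illustration.
import logging

import joblib
import torch

logger = logging.getLogger(__name__)

def load_primary():
    # Primary: multi-layered feedforward network (PyTorch).
    return torch.load("models/production/dfn_best.pth", map_location="cpu")

def backup_loaders():
    # Prioritized backups, assumed to be serialized with joblib.
    for path in ["models/gbm/lgbm.joblib", "models/svr/svr.joblib", "models/en/elastic_net.joblib"]:
        yield path, (lambda p=path: joblib.load(p))

def load_model_with_failover():
    try:
        return load_primary()
    except Exception as exc:  # primary unavailable or corrupted
        logger.warning("Primary model failed: %s; falling back to backups.", exc)
    for path, loader in backup_loaders():
        try:
            return loader()
        except Exception as exc:
            logger.warning("Backup %s failed: %s", path, exc)
    raise RuntimeError("No model could be loaded.")
```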
Models are evaluated using metrics on both the transformed and the original data, where a lower value indicates better performance (a short evaluation sketch follows this list).
- For logged values: Mean Squared Error (MSE).
- For actual (original) values: Root Mean Squared Log Error (RMSLE) and Mean Absolute Error (MAE).
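The sketch below shows how these metrics might be computed with scikit-learn, assuming predictions are made in log space and inverted with `expm1` for the original-scale metrics.

```python
# Illustrative evaluation sketch: MSE on log-transformed values,
# RMSLE and MAE on the original (back-transformed) quantities.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, mean_squared_log_error

def evaluate(y_log_true: np.ndarray, y_log_pred: np.ndarray) -> dict:
    y_true = np.expm1(y_log_true)                      # back to the original quantity scale
    y_pred = np.clip(np.expm1(y_log_pred), 0, None)    # guard against negative quantities
    return {
        "mse_log": mean_squared_error(y_log_true, y_log_pred),
        "rmsle": float(np.sqrt(mean_squared_log_error(y_true, y_pred))),
        "mae": mean_absolute_error(y_true, y_pred),
    }

print(evaluate(np.log1p(np.array([10.0, 3.0, 25.0])), np.log1p(np.array([12.0, 2.0, 20.0]))))
```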
- Logarithmic Transformation (Data Preprocessing): Quantity data is log-transformed before training and prediction to achieve a denser data distribution. This is crucial for normalizing skewed data and reducing the influence of extreme values (outliers), enabling all models to learn underlying patterns more effectively.
- Model Diversity and Redundancy:
  - The system employs a hybrid approach combining a multi-layered feedforward network (deep learning) as the primary predictor with diverse traditional machine learning models (LightGBM, SVR, Elastic Net) as backups.
  - This multi-model inference strategy provides a failover mechanism, ensuring high availability by loading a prioritized backup model if the primary fails.
- Advanced Hyperparameter Optimization (a tuning sketch follows this list):
  - Bayesian optimization (Optuna) is utilized for the deep learning primary model, efficiently searching the hyperparameter space to find optimal settings (with a grid-search fallback available).
  - The backup Scikit-learn models are tuned using the Scikit-Optimize framework.
- Production Quality Gates: To keep the model reliable in a dynamic retail environment, the ML Lineage process incorporates the following quality checks:
  - Data Drift Testing (Evidently AI): Continuously identifies shifts in production data distributions that could compromise the model's generalization capabilities.
  - Fairness Testing: Validates that the model operates without unwanted bias across different features or segments before it is authorized to serve predictions.
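As a hedged sketch of the Optuna-based tuning for the primary network: the search space, network hyperparameters, and the stubbed training function below are assumptions; the real training loop lives in `src/model/`.

```python
# Illustrative Optuna sketch for the primary network's Bayesian optimization.
# The search space and the stubbed objective are assumptions.
import optuna

def train_and_validate(lr: float, hidden_units: int, dropout: float, n_layers: int) -> float:
    # Stub standing in for the real training loop; returns a fake validation
    # loss so the sketch runs end to end.
    return (lr - 1e-3) ** 2 + dropout / n_layers + 1.0 / hidden_units

def objective(trial: optuna.Trial) -> float:
    lr = trial.suggest_float("lr", 1e-5, 1e-2, log=True)
    hidden_units = trial.suggest_int("hidden_units", 32, 256)
    dropout = trial.suggest_float("dropout", 0.0, 0.5)
    n_layers = trial.suggest_int("n_layers", 1, 4)
    # In the real pipeline this would train the PyTorch network and
    # return the validation MSE on log-transformed quantities.
    return train_and_validate(lr, hidden_units, dropout, n_layers)

study = optuna.create_study(direction="minimize")  # TPE sampler (Bayesian optimization) by default
study.optimize(objective, n_trials=50)
print(study.best_params)
```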
Install uv.

For macOS:
```bash
brew install uv
```
For Ubuntu/Debian:
```bash
sudo apt-get install uv
```
Set up the virtual environment and install dependencies:
```bash
uv venv
source .venv/bin/activate
uv lock --upgrade
uv sync
```
or, with pip:
```bash
pip install -r requirements.txt
```
- AssertionError/module mismatch errors: set the default Python version using pyenv:
```bash
pyenv install 3.12.8
pyenv global 3.12.8  # optional: `pyenv global system` to get back to the system default version
uv python pin 3.12.8
echo 3.12.8 >> .python-version
```
Create a `.env` file in the project root and add secret variables following the `.env.sample` file.

Run the app:
```bash
uv run app.py --cache-clear
```
The API endpoint is available at http://localhost:5002.
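Once the app is running, a request could look like the following; the `/recommend` route and payload fields are hypothetical, so check `app.py` for the actual endpoints.

```python
# Hypothetical request against the local Flask app; the route and payload
# fields are assumptions -- see app.py for the actual endpoints.
import requests

resp = requests.post(
    "http://localhost:5002/recommend",
    json={"stockcode": "85123A"},
    timeout=30,
)
print(resp.status_code, resp.json())
```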
- The data_handling folder contains data-related scripts.
- After updating scripts, run:
```bash
uv run src/data_handling/main.py
```
- The retrain script loads the serialized model from the model store, retrains it with new data, and uploads the retrained model back to the model store:
```bash
uv run src/retrain.py
```
- The main script runs feature engineering and model tuning from scratch, and updates the instances saved in the model store and feature store in S3:
```bash
uv run src/main.py
```
- Before running the script, make sure to test the new script in a notebook.
- Run the main script for a stockcode to tune the model on training data for that specific stockcode:
```bash
uv run src/main_stockcode.py {STOCKCODE} --cache-clear
```
- Build and run the Docker image:
```bash
docker build -t <APP NAME> .
docker run -p 5002:5002 -e ENV=local <APP NAME> app.py
```
Replace `<APP NAME>` with an app name of your choice.
- Push the Docker image to AWS Elastic Container Registry (ECR):
```bash
# tag the image
docker tag <YOUR ECR NAME>:<YOUR ECR VERSION> <URI>.dkr.ecr.<REGION>.amazonaws.com/<ECR NAME>:<VERSION>
# push to the ECR
docker push <URI>.dkr.ecr.<REGION>.amazonaws.com/<ECR NAME>:<VERSION>
```
- Cache storage (ElastiCache) runs on the Redis engine.
- To test the connection locally:
```bash
redis-cli --tls -h clustercfg.{REDIS_CLUSTER}.cache.amazonaws.com -p 6379 -c
```
- To flush all caches (WITH CAUTION):
```bash
redis-cli -h clustercfg.{REDIS_CLUSTER}.cache.amazonaws.com -p 6379 --tls
# once connected, flush all data
FLUSHALL
# or flush a specific database (if using multiple databases)
FLUSHDB
```
- Add a package: `uv add <package>`
- Remove a package: `uv remove <package>`
- Run a command in the virtual environment: `uv run <command>`
- To completely refresh the environment:
```bash
rm -rf .venv
rm -rf uv.lock
uv cache clean
uv venv
source .venv/bin/activate
uv sync
```
- Run the DVC pipeline and push the updated data to the remote cache:
```bash
dvc repro
# add the updated lock file
git add dvc.lock
git commit -m 'updated'
git push
# push data to the DVC remote
dvc push
```
- Force-run all stages in the DVC pipeline, including stages without any updates:
```bash
dvc repro -f
```
- Run the DVC pipeline for a specific stockcode:
```bash
dvc repro etl_pipeline_stockcode -p stockcode={STOCKCODE}
dvc repro preprocess_stockcode -p stockcode={STOCKCODE}
```
- Train the model using data from the DVC pipeline:
```bash
uv run src/main_stockcode.py {STOCKCODE}
dvc add models/production/dfn_best_{STOCKCODE}.pth
dvc push
rm models/production/dfn_best_{STOCKCODE}.pth
```
- To check the cache status explicitly:
```bash
dvc data status --not-in-remote
```
- To edit the DVC pipeline, update `dvc.yaml` and `params.yaml` for parameter updates.
- Run the Prefect server locally:
```bash
uv run prefect server start
export PREFECT_API_URL="http://127.0.0.1:4200/api"
```
- Deploy the weekly DVC pipeline run (from the Docker container):
```bash
uv run src/prefect_flows.py
```
- Test-run the Prefect worker:
```bash
# add the user $USER to the docker group
sudo dscl . -append /Groups/docker GroupMembership $USER
prefect worker start --pool <YOUR-WORKER-POOL-NAME>
```
- Create a flow run for the deployment:
```bash
prefect deployment run 'etl-pipeline/deploy-etl-pipeline'
```
Ref.
- Prefect dashboard: http://127.0.0.1:4200/dashboard
- Prefect official documentation - deploy via Python
- Create your feature branch (`git checkout -b feature/your-amazing-feature`).
- Create a feature.
- Pull the latest version of the source code from the main branch (`git pull origin main`) and address conflicts, if any.
- Commit your changes (`git add .`, then `git commit -m 'Add your-amazing-feature'`).
- Push to the branch (`git push origin feature/your-amazing-feature`).
- Open a pull request.
- Flag `#REFINEME` for any improvement needed and `#FIXME` for any errors.
Pre-commit runs the hooks defined in the `.pre-commit-config.yaml` file before every commit.
To activate the hooks:
- Install pre-commit hooks:
```bash
uv run pre-commit install
```
- Run pre-commit checks manually:
```bash
uv run pre-commit run --all-files
```
Pre-commit hooks help maintain code quality by running checks for formatting, linting, and other issues before each commit.
- To skip pre-commit hooks:
```bash
git commit --no-verify -m "your-commit-message"
```
Common issues and solutions:
- API key errors: Ensure all API keys in the `.env` file are correct and up to date. Make sure to call `load_dotenv()` at the top of the Python file to apply the latest environment values (see the snippet after this list).
- Data warehouse connection issues: Check the logs in the AWS console (CloudWatch), and confirm that `.env` and the Lambda function's environment configuration are correct.
- Memory errors: If processing large datasets, you may need to increase the available memory for the Python process.
- Python quits unexpectedly: Check this Stack Overflow article.
- `reportMissingImports` error from Pyright after installing a package: This can occur when installing new libraries while VS Code is running. Open the command palette (Ctrl + Shift + P) and run the `Python: Restart Language Server` task.
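For the API key issue above, a minimal snippet of the `load_dotenv()` pattern (the variable name is a placeholder):

```python
# Load variables from the project-root .env file before reading any secrets.
import os

from dotenv import load_dotenv

load_dotenv()  # call this at the top of the entry-point script
API_KEY = os.getenv("SOME_API_KEY")  # placeholder variable name
```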
.
└── .venv/              [.gitignore]    # stores uv venv
│
└── .github/                            # infrastructure ci/cd
│
└── .dvc/                               # dvc folder - cache, tmp, config
│
└── data/               [dvc track]     # version tracked by dvc
└── preprocessors/      [dvc track]     # version tracked by dvc
└── models/                             # stores serialized model after training and tuning
│     └──dfn/                           # deep feedforward network
│     └──gbm/                           # light gbm
│     └──en/                            # elastic net
│     └──production/    [dvc track]     # models to be stored in S3 for production use
└── reports/            [dvc track]     # reports on data drift, shap values
└── metrics/            [dvc track]     # model evaluation metrics (mae, mse, rmsle)
|
└── notebooks/                          # stores experimentation notebooks
│
└── src/                                # core functions
│     └──_utils/                        # utility functions
│     └──data_handling/                 # functions to engineer features
│     └──model/                         # functions to train, tune, validate models
│     │     └── sklearn_model
│     │     └── torch_model
│     │     └── ...
│     └──main.py                        # main script to perform inference locally (without dvc repro)
│
└── app.py                              # flask application (API endpoints)
│
└── tests/                              # pytest scripts and config
└── pytest.ini
│
└── pyproject.toml                      # project config
│
└── .env                [.gitignore]    # environment variables
│
└── uv.lock                             # dependency locking
│
└── .python-version                     # python version locking (3.12)
│
└── Dockerfile.lambda.local             # docker config
└── Dockerfile.lambda.production
└── .dockerignore
└── requirements.txt
│
└── dvc.yaml                            # dvc pipeline config
└── params.yaml
└── .dvcignore
└── dvc.lock
│
└── .pre-commit-config.yaml             # pre-commit check config
└── .synk                               # Snyk (dependency and code scanning) config
All images and contents, unless otherwise noted, are by the author.