Skip to content

alan-turing-institute/ARC-MLFlow

Repository files navigation

ARC-MLFlow

Pre-requisites

brew install azure-cli
az login

Deployment

MLFlow Container Build

Azure Deployment

az login
cd azure-deployment
bash deploy.sh

Delete the Deployment (and all data!)

az group delete --name $RESOURCE_GROUP

Where $RESOURCE_GROUP is the name of the resource group where you deployed MLFlow on Azure.

Using MLFlow

Python Dependencies

uv sync

The main ones are:

  • mlflow: The Python library for interacting with a MLFlow server
  • psutil, "nvidia-ml-py: If you want to log system (CPU, GPU respectively) stats with your job
  • azure-storage-blob: If you want to log artifacts (files, e.g. models), as these are stored in an Azure blob.
  • hyperopt: Is the package MLFlow recommends for hyperparameter sweeps.

The rest of the dependencies in pyproject.toml are just for the examples.

MLFlow Environment Variables

export MLFLOW_TRACKING_URI="<TRACKING_URI>"

e.g.

export MLFLOW_TRACKING_URI="http://$(az container show --resource-group ${RESOURCE_GROUP} --name mlflow-aci --query ipAddress.fqdn -o tsv):5000/"

or mlflow.set_tracking_uri("http://0.0.0.0:5000")

Logging Artifacts

If you want your script to save artifacts (files, models etc.), the default location for the MLFlow server is an Azure blob. To be able to do this you must have the connection string for the Azure blob set as an environment variable where your script is running. You can get it as follows:

export AZURE_STORAGE_CONNECTION_STRING=$(az storage account show-connection-string \
    --name $STORAGE_ACCOUNT_NAME \
    --resource-group $RESOURCE_GROUP \
    --output tsv)

If you want to log an artifact locally instead, you should be able to do so by setting the artifact_location when creating the MLFlow experiment you are logging results to, e.g. mlflow.create_experiment("experiment_name", artifact_location="/your/local/path").

Examples

The scripts in mlflow-examples give a few examples of using MLFlow:

  • uv run mlflow-examples/hello.py: Basic logging of a parameter, metric, and artifact.
  • uv run mlflow-examples/train.py: Automated logging of metrics and models with the HuggingFace transformers Trainer
  • uv run mlflow-examples/sweep.py: A hyperparameter sweep.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages