Labspace - Fine-tuning

This Labspace provides a hands-on walkthrough of fine-tuning models using Docker Offload, Docker Model Runner, and Unsloth.

Learning objectives

This Labspace will teach you the following:

  • Use Docker Offload to fine-tune a model
  • Package and share the model on Docker Hub
  • Run the custom model with Docker Model Runner

Launch the Labspace

To launch the Labspace, run the following command:

docker compose -f oci://dockersamples/labspace-fine-tuning up -d

And then open your browser to http://localhost:3030.

Using the Docker Desktop extension

If you have the Labspace extension installed (if not, run docker extension install dockersamples/labspace-extension), you can also click this link to launch the Labspace.

Acknowledgements

This repository contains an example of fine-tuning a language model for PII (Personally Identifiable Information) masking using the Unsloth framework.

Special thanks to AI4Privacy for providing the comprehensive PII masking dataset that makes this fine-tuning example possible.

Dataset Source

The training dataset data/training_data.json used in this example was created from the AI4Privacy PII Masking 400k Dataset:

  • Original Dataset: ai4privacy/pii-masking-400k
  • Dataset Description: World's largest open dataset for privacy masking with 406,896 entries
  • Languages Supported: English, Italian, French, German, Dutch, Spanish
  • PII Classes: 17 different types of PII in the public dataset

Licensing and Attribution

Important License Notice

The original dataset is provided by AI4Privacy under a custom license with the following requirements:

  • Academic Use: Encouraged with proper citation
  • Commercial Use: Requires licensing from AI4Privacy
  • Attribution: This project uses data derived from the ai4privacy/pii-masking-400k dataset

Citation

If you use this dataset or derivatives in academic work, please cite:

@dataset{ai4privacy_pii_masking_400k,
  title={PII Masking 400k Dataset},
  author={AI4Privacy},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/datasets/ai4privacy/pii-masking-400k}
}

Data Preparation Process

The dataset was prepared using the prepare_pii_masking_for_unsloth.py script, which:

  1. Loads the original dataset from HuggingFace: ai4privacy/pii-masking-400k
  2. Transforms the data into a format suitable for fine-tuning with Unsloth
  3. Creates instruction-following pairs with two modes (sketched below):
    • Redaction mode: Maps source text to masked text with PII labels
    • Spans mode: Extracts PII spans with their positions and labels
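
To make the two modes concrete, here is a minimal sketch of what the generated instruction-following pairs might look like. The field names, prompt wording, and PII labels are illustrative assumptions, not the script's exact output format.

# Illustrative sketch only: field names, prompt wording, and labels are
# assumptions, not the exact output of prepare_pii_masking_for_unsloth.py.

source = "Contact Jane Doe at jane.doe@example.com."

# Redaction mode: the target is the source text with PII replaced by labels.
redaction_example = {
    "instruction": "Mask all PII in the following text.",
    "input": source,
    "output": "Contact [GIVENNAME] [SURNAME] at [EMAIL].",
}

# Spans mode: the target lists each PII span with its position and label.
spans_example = {
    "instruction": "List all PII spans in the following text.",
    "input": source,
    "output": [
        {"start": 8, "end": 12, "label": "GIVENNAME", "text": "Jane"},
        {"start": 13, "end": 16, "label": "SURNAME", "text": "Doe"},
        {"start": 20, "end": 40, "label": "EMAIL", "text": "jane.doe@example.com"},
    ],
}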

Using the Data Preparation Script

# Basic usage - creates redaction mode dataset
python prepare_pii_masking_for_unsloth.py --outdir data_out

# Filter by languages (e.g., English and Spanish)
python prepare_pii_masking_for_unsloth.py --langs en es --outdir data_out_en_es

# Filter by locales (e.g., US and GB)
python prepare_pii_masking_for_unsloth.py --locales US GB --outdir data_out_us_gb

# Use spans mode instead of redaction
python prepare_pii_masking_for_unsloth.py --mode spans --outdir spans_out

# Add custom EOS token
python prepare_pii_masking_for_unsloth.py --outdir data_out --eos "</s>"

# Sample a subset for testing
python prepare_pii_masking_for_unsloth.py --outdir data_out --sample_train 1000 --sample_val 200
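
After running the script, you can spot-check the prepared data with the Hugging Face datasets library. This is a minimal sketch; the JSONL file names under --outdir are assumptions about what the script writes, so adjust them to match its actual output.

from datasets import load_dataset

# Minimal sanity check of the prepared data. The file names below are
# assumptions about what prepare_pii_masking_for_unsloth.py writes into
# --outdir; adjust them to match its actual output.
ds = load_dataset(
    "json",
    data_files={"train": "data_out/train.jsonl",
                "validation": "data_out/val.jsonl"},
)
print(ds["train"][0])  # one instruction-following example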

Fine-Tuning Process

The finetune.py script demonstrates how to:

  1. Load a pre-trained model (Gemma-3-270M-IT)
  2. Prepare the dataset for instruction fine-tuning
  3. Configure LoRA adapters for efficient training
  4. Train the model using the SFTTrainer from TRL
  5. Save the fine-tuned model
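
Put together, the training loop might look roughly like the following. This is a hedged sketch, not the contents of finetune.py: the model identifier, LoRA hyperparameters, file paths, and dataset field name are assumptions, and the exact SFTTrainer arguments vary across TRL versions.

from unsloth import FastLanguageModel  # import unsloth first so its patches apply
from trl import SFTConfig, SFTTrainer
from datasets import load_dataset

# Sketch only -- model name, hyperparameters, and paths are assumptions,
# not the exact values used by finetune.py.

# 1. Load the pre-trained model and tokenizer.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-3-270m-it",  # assumed Unsloth checkpoint name
    max_seq_length=2048,
)

# 2. Attach LoRA adapters so only a small set of weights is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# 3. Load the prepared instruction-following dataset.
dataset = load_dataset("json", data_files="data/training_data.json", split="train")

# 4. Train with TRL's SFTTrainer (argument names vary across TRL versions).
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="outputs",
        per_device_train_batch_size=2,
        max_steps=60,
        learning_rate=2e-4,
        dataset_text_field="text",  # assumed name of the formatted text column
    ),
)
trainer.train()

# 5. Save the fine-tuned LoRA adapters.
model.save_pretrained("outputs/lora_model")
tokenizer.save_pretrained("outputs/lora_model")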
