FuzzAug

Official code release for the paper FuzzAug: Data Augmentation by Fuzzing for Neural Test Generation (EMNLP 2025 Findings).

TL;DR: FuzzAug is a coverage-guided data augmentation method that brings fuzzing's diversity and valid testing semantics to LLM-based unit test generation. It doubles the training data with diverse, semantically meaningful tests, which matters most for newer languages such as Rust, where training resources are relatively limited. With this augmentation, FuzzAug significantly outperforms baselines, showing that dynamic analysis priors can boost LLM-based unit test generation.
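To illustrate what "coverage-guided" means, the toy Python sketch below mutates inputs to a small hypothetical target and keeps an input only when it reaches a branch that no earlier input reached. The retained corpus is the kind of diverse input set that fuzzing contributes. The target, mutation operators, and all function names are illustrative assumptions; the actual pipeline runs cargo fuzz (libFuzzer) on real Rust fuzz targets.

```python
import random
from typing import Callable, List, Set


def toy_target(data: bytes) -> Set[str]:
    """Hypothetical fuzz target: returns the IDs of the branches `data` reaches."""
    covered = {"entry"}
    if len(data) > 0 and data[0] == ord("F"):
        covered.add("b1")
        if len(data) > 1 and data[1] == ord("U"):
            covered.add("b2")
            if len(data) > 2 and data[2] == ord("Z"):
                covered.add("b3")
    return covered


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Apply one random byte-level mutation: overwrite, insert, or delete a byte."""
    buf = bytearray(data)
    op = rng.randrange(3)
    if op == 0 and buf:
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    elif op == 1:
        buf.insert(rng.randrange(len(buf) + 1), rng.randrange(256))
    elif op == 2 and buf:
        del buf[rng.randrange(len(buf))]
    return bytes(buf)


def fuzz(target: Callable[[bytes], Set[str]], seeds: List[bytes],
         iterations: int, seed: int = 0) -> List[bytes]:
    """Keep a mutated input only if it reaches a branch no earlier input reached."""
    rng = random.Random(seed)
    corpus = list(seeds)
    seen: Set[str] = set()
    for s in corpus:
        seen |= target(s)
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        new_branches = target(candidate) - seen
        if new_branches:  # coverage feedback: this input found new behavior
            seen |= new_branches
            corpus.append(candidate)
    return corpus


corpus = fuzz(toy_target, [b"A"], iterations=20_000)
covered = set().union(*(toy_target(x) for x in corpus))
print(len(corpus), sorted(covered))
```

Because an input is kept only when it adds coverage, the corpus stays small while still exercising distinct paths. That property is what makes fuzzer outputs a useful source of varied test inputs.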

Repository Organization

| directory | purpose |
| --- | --- |
| `fuzz` | scripts for transforming fuzz targets and collecting inputs |
| `evaluation` | scripts for computing run-time evaluation metrics |
| `training` | scripts for model training and inference |
| `UniTSyn` | `UniTSyn` submodule, used to collect focal and source pairs |
| `tests` | unit tests for the core modules |

Setup

Python

We use Python 3.11. We recommend using uv to manage your Python dependencies.

cd FuzzAug
uv sync  # create a virtual environment and install dependencies
source .venv/bin/activate

UniTSyn

We depend on UniTSyn, which is already included as a submodule.

git submodule init
git submodule update

Then, please install the dependencies for UniTSyn:

uv pip install -r UniTSyn/requirements.txt

Environment Variables

The env.sh scripts of both this project and UniTSyn must be sourced:

cd UniTSyn && source ./scripts/env.sh && cd .. && source ./env.sh

Alternatively, run ./init.sh, which performs all of the steps above.

Installing Rust and Cargo Fuzz

Get Rust and rustup

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Switch to nightly (required by cargo fuzz)

rustup install nightly
rustup default nightly

Install cargo fuzz

cargo install cargo-fuzz

Install rust-fuzzer-gen, which converts fuzz targets into unit tests:

cargo install --git https://github.com/SecurityLab-UCD/rust-fuzzer-gen.git

Download Rust Repos

cd UniTSyn
python scripts/download_repos.py -r ../data/repo_meta/rust.txt --oroot ../data/repos --decompress=True --oauth=<your token>

Collecting Rust Fuzzing Data from Repos

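fuzz/collect_fuzz.py collects fuzzing data from the fuzz targets of the downloaded Rust repos. Run it from the repository root (the previous step leaves you inside UniTSyn, so cd .. first). The pipeline is as follows: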

transform: rewrite each fuzz_target! so that it prints its input to stdout, and extract a test template;
build: build the fuzz targets in each repo (cargo fuzz build);
fuzz: run each fuzz target in each repo (cargo fuzz run <target>);
testgen: substitute the collected inputs into the test template to produce test code.

python fuzz/collect_fuzz.py --repo_id data/repo_meta/rust.txt -p all
cd UniTSyn
mkdir -p data/focal data/source
python frontend/rust/collect_all.py --repo_id ../data/repo_meta/rust.txt --repo_root ../data/rust_repos --fuzz True
python main.py --language rust --repo_root ../data/rust_repos
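The testgen phase of fuzz/collect_fuzz.py can be pictured as template substitution. The Python sketch below assumes a hypothetical template in which `{NAME}` and `{INPUT}` are placeholders and `fuzz_body` stands in for the original body of the fuzz_target!. These names and the template format are illustrative assumptions, not the format that collect_fuzz.py or rust-fuzzer-gen actually emits.

```python
from typing import Iterable, List

# Hypothetical template: {NAME}, {INPUT}, and fuzz_body are placeholders chosen
# for this sketch only.
TEMPLATE = """#[test]
fn {NAME}() {
    let data: &[u8] = {INPUT};
    fuzz_body(data);
}
"""


def rust_byte_literal(data: bytes) -> str:
    """Render raw bytes as a Rust byte-string literal such as b"ab\\x00"."""
    parts = []
    for byte in data:
        if byte in (ord('"'), ord("\\")):
            parts.append("\\" + chr(byte))   # escape quote and backslash
        elif 0x20 <= byte < 0x7F:
            parts.append(chr(byte))          # printable ASCII as-is
        else:
            parts.append(f"\\x{byte:02x}")   # every other byte as \xHH
    return 'b"' + "".join(parts) + '"'


def render_tests(template: str, inputs: Iterable[bytes],
                 prefix: str = "fuzz_test") -> List[str]:
    """Substitute each fuzzer-collected input into the template, one test per input."""
    tests = []
    for i, data in enumerate(inputs):
        test = template.replace("{NAME}", f"{prefix}_{i}")
        test = test.replace("{INPUT}", rust_byte_literal(data))
        tests.append(test)
    return tests


for test in render_tests(TEMPLATE, [b"hi", b"\xff\x00"]):
    print(test)
```

Plain `str.replace` is used instead of `str.format` so that the braces in the Rust code do not need escaping.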

Coverage

To evaluate coverage of unit tests, the following dependencies are required:

cargo install grcov
rustup component add llvm-tools-preview
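Once grcov has produced a report, a line-coverage percentage can be computed from it. The sketch below assumes the report was written in lcov format (grcov can emit this with `-t lcov`) and parses the standard `SF:` and `DA:<line>,<hits>` records. The function name and the choice of line coverage as the metric are illustrative assumptions, not necessarily how the `evaluation` scripts compute coverage.

```python
from typing import Dict, Optional, Set, Tuple


def line_coverage(lcov_text: str) -> Tuple[int, int, float]:
    """Return (covered_lines, instrumented_lines, percent) from lcov text."""
    found: Dict[str, Set[int]] = {}   # file -> instrumented line numbers
    hit: Dict[str, Set[int]] = {}     # file -> line numbers executed at least once
    current: Optional[str] = None
    for raw in lcov_text.splitlines():
        line = raw.strip()
        if line.startswith("SF:"):            # start of a source-file record
            current = line[3:]
            found.setdefault(current, set())
            hit.setdefault(current, set())
        elif line.startswith("DA:") and current is not None:
            # DA:<line>,<hits>[,<checksum>]; the optional checksum is ignored
            fields = line[3:].split(",")
            lineno, count = int(fields[0]), int(fields[1])
            found[current].add(lineno)
            if count > 0:
                hit[current].add(lineno)
        elif line == "end_of_record":
            current = None
    total = sum(len(lines) for lines in found.values())
    covered = sum(len(lines) for lines in hit.values())
    percent = 100.0 * covered / total if total else 0.0
    return covered, total, percent


sample = """SF:src/lib.rs
DA:1,3
DA:2,0
DA:5,1
end_of_record
"""
print(line_coverage(sample))
```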

Model Training and Inference

Fine-tuning

python training/train.py \
    --dataset_path data/fuzz100.jsonl \
    --model_name "codellama/CodeLlama-7b-hf" \
    --run_name "fuzzcoder" \
    --max_steps=100 \
    --save_path saved_models/fuzzcoder \
    --lora True

Inference

python training/generate.py \
    --model_name "codellama/CodeLlama-7b-hf" \
    --checkpoint saved_models/fuzzcoder/checkpoint-100 \
    -i data/humaneval_rust.jsonl \
    -o generated.jsonl

Citing this Paper

@inproceedings{he2025fuzzaug,
    author = {He, Yifeng and Wang, Jicheng and Rong, Yuyang and Chen, Hao},
    title = {FuzzAug: Data Augmentation by Coverage-guided Fuzzing for Neural Test Generation},  
    booktitle = {Conference on Empirical Methods in Natural Language Processing},
    date = {2025-11-05/2025-11-09},
    address = {Suzhou, China},
}
