Official implementation of "Transferable Visual Adversarial Attacks for Proprietary Multimodal Large Language Models".
This repository contains the codebase used in our experiments.
Below are brief instructions for running our code. Detailed setup and usage instructions will be provided by Oct 30, 2025.
- Prepare the dataset: create a `data` folder under this repo and run the following Python snippet under the `data` folder:
```python
import os
import kagglehub

# Download the NIPS 2017 adversarial learning development set from Kaggle
path = kagglehub.dataset_download("google-brain/nips-2017-adversarial-learning-development-set")

# Move the downloaded files into ./nips2017_adv_dev/ under the data folder
os.system(f"mv {path} ./nips2017_adv_dev/")
```
- Extract features for the ImageNet validation set images.
First link the ImageNet dataset (only the validation set is needed) into the `data` folder (see the sketch after this command), and then run:

```bash
python3 utils/extract_feat.py --model_id 0
```
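A symbolic link avoids copying the full validation set. A minimal sketch, assuming your validation images live at /path/to/imagenet/val; both the source path and the link name `data/imagenet_val` are assumptions, so match whatever path `utils/extract_feat.py` expects:

```bash
# Symlink the ImageNet validation set into the data folder (adjust both paths)
ln -s /path/to/imagenet/val ./data/imagenet_val
```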
You can also use our extracted features shared on Google Drive: untar the file and put it under the `data` folder, for example as shown below.
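A one-line example of the untar step; the archive name `extracted_features.tar` is an assumption, so substitute the actual file name from the Drive link:

```bash
# Extract the shared feature archive into the data folder
tar -xvf extracted_features.tar -C ./data/
```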
- Optimize the attack:

```bash
bash run.sh
```
Generated attacks will be saved under results/saved_folder/, where "saved_folder" is specified in batch_attack.py (for example, s299_x9_eps8). Images whose filenames start with "ema_" are the final outputs.
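The "ema_" prefix presumably refers to an exponential moving average of the adversarial perturbation maintained across optimization steps, which smooths out per-step noise. A rough illustrative sketch of that idea, not the repository's actual implementation (`ema_update`, `decay`, `delta`, and `ema_delta` are all hypothetical names):

```python
import numpy as np

def ema_update(ema_delta: np.ndarray, delta: np.ndarray, decay: float = 0.99) -> np.ndarray:
    """Blend the current perturbation into a running exponential moving average."""
    return decay * ema_delta + (1.0 - decay) * delta

# Inside the attack loop, the averaged perturbation would be added to the
# clean image (and clipped to a valid range) to form the final "ema_" output:
# ema_image = np.clip(clean_image + ema_delta, 0, 255).astype(np.uint8)
```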
This Google Drive folder contains some images generated by our method.