Transferable Adversarial Attacks for Multimodal Large Language Models

Official implementation of Transferable Visual Adversarial Attacks for Proprietary Multimodal Large Language Models.

📄 Read the paper on arXiv

This repository contains the codebase used in our experiments.
Below are brief instructions for running the code. Detailed instructions for setup and usage will be provided by Oct 30, 2025.

Quick start

  1. Prepare the dataset: create a data folder under this repo and run the following Python snippet inside the data folder:
import os, kagglehub
path = kagglehub.dataset_download("google-brain/nips-2017-adversarial-learning-development-set")
os.system(f"mv {path} ./nips2017_adv_dev/")
  2. Extract features for the ImageNet validation set images. First, link the ImageNet dataset (only the validation set is needed) into the data folder, and then run:
python3 utils/extract_feat.py --model_id 0
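One way to create the link is a symlink from the repo root, for example with the sketch below. The source path and the link name data/imagenet_val are placeholders, not values taken from this repo; check utils/extract_feat.py for the path it actually expects.

# Minimal sketch: symlink the ImageNet validation set into the data folder.
# /path/to/imagenet/val and data/imagenet_val are placeholder paths.
import os
os.symlink("/path/to/imagenet/val", "data/imagenet_val")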

Alternatively, you can use our extracted features shared on Google Drive. Untar the file and place it under the data folder.
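For example, assuming the downloaded archive is named features.tar.gz (substitute the actual file name from the Google Drive link), it can be extracted from inside the data folder with:

# Extract the shared feature archive into the data folder.
# "features.tar.gz" is a placeholder for the downloaded file name.
import tarfile
with tarfile.open("features.tar.gz") as tar:
    tar.extractall(path="./")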

  3. Optimize the attack:
bash run.sh

Generated attack images are saved under results/saved_folder/, where saved_folder is specified in batch_attack.py (for example, s299_x9_eps8). Images whose file names start with "ema_" are the final outputs.
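For instance, with the example folder name above, the final outputs can be collected as follows (a minimal sketch; adjust the folder name to match your saved_folder):

# Gather the final adversarial images (file names starting with "ema_").
# "s299_x9_eps8" is only the example saved_folder mentioned above.
import glob
final_images = sorted(glob.glob("results/s299_x9_eps8/ema_*"))
print(f"{len(final_images)} final adversarial images")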

This Google Drive folder contains some images generated by our method.
