This is the repository for a group project for the course CSCE 689: Vision Foundation Models at Texas A&M University. The project itself is described in detail in the PDF report. The project was inspired by the paper "Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards". Furthermore, the scripts used for training, ddpo_aesthetic.py and ddpo_prominence.py, are both based on the example DDPO script featured in the TRL library. Finally, the code makes use of GroundingDINO.
The script ddpo_aesthetic.py is used to train a LoRA adapter with DDPO on the aesthetic reward with a color score (the color score can be disabled). Checkpoints are created periodically (every 10 epochs by default), and each checkpoint includes a LoRA adapter that can be used for image generation in basic_inference.py or full_eval.py. The final model after all epochs is saved to the final_model directory (you still only need the LoRA adapter to do inference). To see the average reward obtained on each epoch (which is useful for deciding which checkpoint to ultimately keep) and to save tensorboard logs, include the command line flag --do_logging. Assuming you use the default project_dir argument, the tensorboard logs containing the rewards will be found in save_aesthetic/logs/. To train under the same settings used to obtain the results in our report, simply use the default values for all of the command line arguments that go into the DDPOConfig.
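For reference, a typical aesthetic training run might look like the sketch below. Apart from --do_logging, which is described above, the exact invocation is an assumption, so check the script's --help output before running it.

```bash
# Train the aesthetic LoRA adapter with reward logging enabled
# (the default DDPOConfig values reproduce the settings used in the report).
python ddpo_aesthetic.py --do_logging

# Inspect the per-epoch rewards recorded in the tensorboard logs
tensorboard --logdir save_aesthetic/logs/
```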
The script ddpo_prominence.py is used to train on the prominence reward and functions similarly to ddpo_aesthetic.py; however, to use the prominence reward you must clone the GroundingDINO repository and install the groundingdino package from it. The script ddpo_prominence.py has one required argument: the path to the cloned repository. To train under the same settings used to obtain the results in our report, simply use the default values for all of the command line arguments that go into the DDPOConfig.
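A sketch of a prominence training run is shown below. How the GroundingDINO path is passed (positional versus a named flag), and whether --do_logging applies here as it does for ddpo_aesthetic.py, are assumptions; consult the script's argument parser for the exact interface.

```bash
# Train on the prominence reward, pointing the script at the cloned
# GroundingDINO repository (required argument; exact form may differ).
python ddpo_prominence.py /path/to/GroundingDINO --do_logging
```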
To test image generation using a trained LoRA adapter, run basic_inference.py and provide the path to the LoRA adapter. Make sure that the prompts.txt file generated by gen_prompt_dataset.py is in the same directory from which you run the script. The prompts used to generate the images come from prompts.txt, and you can specify how many to use with the num_prompts command line argument.
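For example, an inference run might look like the following; num_prompts is described above, but the name of the adapter-path argument and the placeholder paths are assumptions.

```bash
# Generate prompts.txt in the working directory, then sample images
# with a trained adapter (adapter-path flag name is a guess; see --help).
python gen_prompt_dataset.py
python basic_inference.py --lora_path path/to/lora_adapter --num_prompts 10
```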
To observe the effects of fine-tuning both qualitatively and quantitatively, run the script full_eval.py, which generates a number of images for each prompt (specified by the num_copies argument) and scores each image with both the aesthetic and prominence reward models. The average per-prompt rewards are logged separately for the train and test prompts and saved to a text file. If the save_imgs flag is used, the image with the highest reward for each prompt is saved in the folder corresponding to that reward; these folders can be found in the evaluation_images directory. Note that the full_eval script has two required arguments: a name used to identify the model that produced the results and the path to the cloned GroundingDINO repository. The optional lora_path argument specifies which adapter obtained from ddpo_aesthetic.py or ddpo_prominence.py to use; if none is provided, the evaluation is run on the vanilla Stable Diffusion v1.5 model.
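An example evaluation invocation is sketched below, under the assumption that the two required arguments are positional; lora_path, save_imgs, and num_copies are named above, but their exact spellings should be verified against the script, and the model name and paths here are hypothetical.

```bash
# Evaluate a fine-tuned adapter with both reward models and save the
# best image per prompt (omit --lora_path to evaluate vanilla SD v1.5).
python full_eval.py my_aesthetic_model /path/to/GroundingDINO \
    --lora_path path/to/lora_adapter --num_copies 4 --save_imgs
```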