This is the official repo for the paper: "MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced Reranking and Noise-injected Training".
- [2024-09-20]: To better reflect the generality of our proposed method, we have renamed it RagVL.
- [2024-08-05]: Code of RagVL (RagLLaVA) released.
- [2024-07-31]: Paper of RagVL (RagLLaVA) available online.
The libraries required to run RagVL are listed in requirements.txt. We recommend following LLaVA's setup instructions to configure your environment.
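A minimal setup sketch, assuming a conda-based workflow similar to LLaVA's (the environment name and Python version below are illustrative, not prescribed by the repo):

```bash
# Create and activate an environment, then install the pinned dependencies.
conda create -n ragvl python=3.10 -y
conda activate ragvl
pip install -r requirements.txt
```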
Before running RagVL, please:
- Download the datasets and checkpoints from Google Drive.
- Download the image files from WebQA and MultimodalQA.
- Unzip the file. Place `checkpoints/` and `datasets/` into `RagVL/`.
- Place `tasks/` into `RagVL/finetune/`.
- Place `MMQA_imgs/` and `train_img/` into `RagVL/finetune/tasks/`.
- Place `val_image/` into `RagVL/datasets/` (the resulting layout is sketched below).
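After these steps, the relevant part of the directory tree should look roughly like this:

```
RagVL/
├── checkpoints/
├── datasets/
│   └── val_image/
└── finetune/
    └── tasks/
        ├── MMQA_imgs/
        └── train_img/
```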
- Reranker
| Models | Global Batch Size | Epochs |
|---|---|---|
| LLaVA-v1.5-13B | 16 | 2 (WebQA) / 1 (others) |
| Qwen-VL-Chat | 16 | 2 (WebQA) / 1 (others) |
| mPLUG-Owl2 | 16 | 2 (WebQA) / 1 (others) |
| InternVL2-1B | 16 | 1 |
| InternVL2-2B | 16 | 1 |
- Generator
| Models | Global Batch Size | Epochs |
|---|---|---|
| LLaVA-v1.5-13B | 16 | 2 (WebQA) / 3 (MMQA) |
| InternVL2-1B | 16 | 1 |
| InternVL2-2B | 16 | 1 |
Apart from the two hyperparameters above, all other settings follow the defaults of the respective models.
To finetune LLaVA-v1.5-13B, Qwen-VL-Chat, and mPLUG-Owl2, find the corresponding finetune script in RagVL/finetune/scripts/.
To finetune InternVL2-1B and InternVL2-2B, find the corresponding finetune script in RagVL/internvl_chat/shell/internvl2.0/2nd_finetune.
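For example, a finetuning run might be launched as follows. This is only a sketch: the script names are hypothetical placeholders, so substitute the actual script for your model and task from the directories above, and keep the global batch size and epochs consistent with the tables.

```bash
cd RagVL

# LLaVA-v1.5-13B / Qwen-VL-Chat / mPLUG-Owl2 (hypothetical script name):
bash finetune/scripts/finetune_reranker_webqa.sh

# InternVL2-1B / InternVL2-2B (hypothetical script name):
bash internvl_chat/shell/internvl2.0/2nd_finetune/finetune_internvl2_1b.sh
```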
To evaluate RagVL on WebQA / MultimodalQA, you can use the following command:

```bash
# mmqa_pipeline.py takes the same arguments.
# --reranker_model  : select the reranker
# --generator_model : select the generator
# --filter          : select the adaptive threshold
# --clip_topk       : number of candidates retrieved first (20 by default)
python webqa_pipeline.py \
    --reranker_model caption_lora \
    --generator_model noise_injected_lora \
    --filter 0 \
    --clip_topk 20
```
To evaluate the oracle settings on WebQA / MultimodalQA, you can use the following command:

```bash
# mmqa_oracle.py takes the same arguments.
python webqa_oracle.py
```
If you find this work interesting or inspiring, you can cite us with:
```bibtex
@article{chen2024mllm,
  title={MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced Reranking and Noise-injected Training},
  author={Chen, Zhanpeng and Xu, Chengjin and Qi, Yiyan and Guo, Jian},
  journal={arXiv preprint arXiv:2407.21439},
  year={2024}
}
```

- LLaVA: Large Language and Vision Assistant
- Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
- mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
- InternVL: A Pioneering Open-Source Alternative to GPT-4o
- Visualized BGE: A universal multi-modal embedding model
- VCD: Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding
- CAL: Prioritizing Visual Correlation by Contrastive Alignment
