Albert Gong edited this page Jun 29, 2025 · 20 revisions

PhantomEval supports evaluation of RAG methods via FlashRAG and vLLM.

FlashRAG

FlashRAG should be installed automatically by pip install phantom-wiki[eval]. Once it is installed, follow these steps to generate a corpus, build a BM25 index, and run evaluation:

  1. Save the corpus in .jsonl format using the following command:
python examples/flashrag/save_as_jsonl.py --dataset DATASET --split_list SPLIT_LIST

Tip

SPLIT_LIST can be a single split name or a list of split names. If loading a dataset from a local directory (e.g., wiki-v1-easy), additionally pass in the --from_local flag.

For each split in SPLIT_LIST, two files are generated: indexes/<split>.jsonl, containing the corpus data, and dataset/<split>.jsonl, containing the QA pairs.

  2. Construct a BM25 index using the following command:
python -m flashrag.retriever.index_builder \
    --retrieval_method bm25 \
    --corpus_path indexes/SPLIT.jsonl \
    --bm25_backend bm25s \
    --save_dir indexes/SPLIT
  3. Run evaluation using phantom_eval, now specifying --index_path and --corpus_path:
python -m phantom_eval \
    --server SERVER -m MODEL_NAME --inf_vllm_offline \
    --dataset DATASET --split_list SPLIT \
    --method cot-rag \
    --retrieval_method bm25 \
    --index_path indexes/SPLIT/bm25 \
    --corpus_path indexes/SPLIT.jsonl \
    --output_dir OUTPUT_DIR
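
The three steps above can be chained per split in a small wrapper script. The following is a minimal sketch, not part of PhantomEval: the run and run_split helpers and the DRY_RUN switch are hypothetical names introduced here, and DATASET, SERVER, MODEL_NAME, OUTPUT_DIR, and the split names remain placeholders to fill in. It uses only the flags shown in the steps above and defaults to printing the commands rather than executing them.

```shell
# Sketch: run steps 1-3 for each split given on the command line.
# All upper-case defaults are placeholders; replace them or export real values.
set -eu

DATASET="${DATASET:-DATASET}"
SERVER="${SERVER:-SERVER}"
MODEL_NAME="${MODEL_NAME:-MODEL_NAME}"
OUTPUT_DIR="${OUTPUT_DIR:-OUTPUT_DIR}"
DRY_RUN="${DRY_RUN:-1}"   # hypothetical switch: 1 = only print, 0 = execute

# Print a command, then execute it unless DRY_RUN=1.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

# Save the corpus, build the BM25 index, and evaluate a single split.
run_split() {
    split="$1"
    run python examples/flashrag/save_as_jsonl.py \
        --dataset "$DATASET" --split_list "$split"
    run python -m flashrag.retriever.index_builder \
        --retrieval_method bm25 \
        --corpus_path "indexes/${split}.jsonl" \
        --bm25_backend bm25s \
        --save_dir "indexes/${split}"
    run python -m phantom_eval \
        --server "$SERVER" -m "$MODEL_NAME" --inf_vllm_offline \
        --dataset "$DATASET" --split_list "$split" \
        --method cot-rag \
        --retrieval_method bm25 \
        --index_path "indexes/${split}/bm25" \
        --corpus_path "indexes/${split}.jsonl" \
        --output_dir "$OUTPUT_DIR"
}

# Each positional argument is treated as one split name.
for split in "$@"; do
    run_split "$split"
done
```

Saved under a hypothetical name such as run_splits.sh, it could be previewed with sh run_splits.sh SPLIT_A SPLIT_B and executed for real with DRY_RUN=0 once the placeholders are set. Running one split per invocation avoids assuming how multiple names are passed to --split_list.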

Tip

We also provide BM25 indexes for all splits in the kilian-group/phantom-wiki-v1 dataset at https://huggingface.co/datasets/kilian-group/phantom-wiki-v1-index.
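
One possible way to fetch those prebuilt indexes is the huggingface-cli download command from the huggingface_hub package. The sketch below assumes that CLI is installed; the fetch_index helper, the FETCH switch, and the indexes-prebuilt target directory are hypothetical names chosen for illustration. The directory layout inside the downloaded repository is not described here, so inspect it before passing a path to --index_path.

```shell
# Sketch: download the prebuilt BM25 indexes from the Hugging Face Hub.
REPO_ID="kilian-group/phantom-wiki-v1-index"

# Print the download command for a target directory ($1);
# execute it only when FETCH=1 (hypothetical switch).
fetch_index() {
    set -- huggingface-cli download "$REPO_ID" \
        --repo-type dataset --local-dir "$1"
    echo "+ $*"
    if [ "${FETCH:-0}" = "1" ]; then
        "$@"
    fi
}

# indexes-prebuilt is an assumed directory name, not one used by PhantomEval.
fetch_index "${LOCAL_DIR:-indexes-prebuilt}"
```

With FETCH unset, the script only prints the command it would run, which makes it easy to check the repository name and target directory first.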


vLLM
