FACTOR

This repo contains the data and evaluation code for AI21 Labs' paper "Generating Benchmarks for Factuality Evaluation of Language Models".

Data

We include the following FACTOR benchmarks for evaluating the factuality of language models:

WIKI-FACTOR: Based on the Wikipedia section of The Pile's validation split. The benchmark consists of 2994 examples.
NEWS-FACTOR: Based on Reuters articles extracted from The RefinedWeb Dataset. The benchmark consists of 1036 examples.
EXPERT-FACTOR: Based on the validation and test splits of ExpertQA, a long-form question answering dataset. The benchmark consists of 236 examples.
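As a quick sanity check after obtaining the data, the sketch below counts the rows in a benchmark CSV and compares them with the sizes stated above. It assumes, hypothetically, that each data row of a FACTOR CSV is one example; `count_examples`, `check_benchmark`, and `EXPECTED_SIZES` are illustrative names, not part of the repo.

```python
import csv
from pathlib import Path

# Example counts stated in this README for each FACTOR benchmark.
EXPECTED_SIZES = {
    "WIKI-FACTOR": 2994,
    "NEWS-FACTOR": 1036,
    "EXPERT-FACTOR": 236,
}


def count_examples(csv_path: str) -> int:
    """Count data rows in a CSV file, excluding the header row.

    Assumes (hypothetically) one example per row. csv.DictReader handles
    quoted fields that span multiple lines, so such rows are counted once.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        return sum(1 for _ in csv.DictReader(f))


def check_benchmark(name: str, csv_path: str) -> bool:
    """Return True if the CSV has as many rows as this README states for `name`."""
    return count_examples(csv_path) == EXPECTED_SIZES[name]


if __name__ == "__main__":
    # ./data/wiki_factor.csv is the path used in this README's evaluation command.
    path = "./data/wiki_factor.csv"
    if Path(path).exists():
        n = count_examples(path)
        print(f"{path}: {n} rows (expected {EXPECTED_SIZES['WIKI-FACTOR']})")
```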

Evaluation

Setup

To install the required libraries, run:

pip install -r requirements.txt

If you need a PyTorch build that matches your CUDA version, install it before running the command above.
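The following sketch, which is not part of the repo, reports which PyTorch build is installed so you can confirm it matches your CUDA setup before installing the remaining requirements. It relies only on `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`; the helper name `torch_cuda_info` is hypothetical.

```python
import importlib.util


def torch_cuda_info():
    """Describe the installed PyTorch build, or return None if it is missing.

    torch.version.cuda is the CUDA version PyTorch was built with
    (None for CPU-only builds); cuda_available reports whether a GPU is usable.
    """
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    return {
        "torch": torch.__version__,
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    info = torch_cuda_info()
    if info is None:
        print("PyTorch is not installed; install a build matching your CUDA version first.")
    else:
        print(info)
```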

List of Language Models

In the paper, we report results for the following models (replace $MODEL_NAME in the evaluation command with one of these):

GPT-2: gpt2, gpt2-medium, gpt2-large, gpt2-xl
GPT-Neo and GPT-J: EleutherAI/gpt-neo-1.3B, EleutherAI/gpt-neo-2.7B, EleutherAI/gpt-j-6B
OPT: facebook/opt-125m, facebook/opt-350m, facebook/opt-1.3b, facebook/opt-2.7b, facebook/opt-6.7b, facebook/opt-13b, facebook/opt-30b, facebook/opt-66b
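To evaluate several of the models above in one go, one option is to generate the evaluation command for each model name. This is a minimal sketch, assuming the three flags shown in this README's evaluation command; the helper `build_command` and the per-model output folder naming (`results/<model name with '/' replaced by '_'>`) are hypothetical conventions, not something the repo prescribes.

```python
import shlex

# Model identifiers listed in this README, grouped by family.
MODELS = {
    "GPT-2": ["gpt2", "gpt2-medium", "gpt2-large", "gpt2-xl"],
    "GPT-Neo": ["EleutherAI/gpt-neo-1.3B", "EleutherAI/gpt-neo-2.7B", "EleutherAI/gpt-j-6B"],
    "OPT": [
        "facebook/opt-125m", "facebook/opt-350m", "facebook/opt-1.3b",
        "facebook/opt-2.7b", "facebook/opt-6.7b", "facebook/opt-13b",
        "facebook/opt-30b", "facebook/opt-66b",
    ],
}


def build_command(model_name, data_file="./data/wiki_factor.csv", output_root="results"):
    """Return the eval_factuality.py argument list for one model.

    The output folder layout is a hypothetical convention: '/' in the
    Hugging Face model name is replaced so each model gets a flat folder.
    """
    output_folder = f"{output_root}/{model_name.replace('/', '_')}"
    return [
        "python", "eval_factuality.py",
        "--data_file", data_file,
        "--output_folder", output_folder,
        "--model_name", model_name,
    ]


if __name__ == "__main__":
    for names in MODELS.values():
        for name in names:
            # Print rather than execute; pass the list to subprocess.run(...) to run it.
            print(shlex.join(build_command(name)))
```

Building argument lists instead of shell strings lets each command go straight to `subprocess.run` without quoting issues; `shlex.join` is used only for display.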

Evaluation Script

To evaluate a model on a FACTOR benchmark, run the following command, setting --data_file to the CSV file of the benchmark you want to use:

python eval_factuality.py \
--data_file ./data/wiki_factor.csv \
--output_folder $OUTPUT_DIR \
--model_name $MODEL_NAME
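As we understand the FACTOR setup from the paper, each example pairs a prefix with one factual completion and several non-factual variants, and a model counts as correct when it scores the factual completion highest. The sketch below illustrates that metric on precomputed per-token log-probabilities. It is not the code in `eval_factuality.py`; the example dict layout, the helper names `completion_score` and `factor_accuracy`, and the choice between summed and length-normalized log-probabilities are all assumptions.

```python
from typing import Dict, List, Sequence

# Hypothetical example layout (not the repo's format):
#   {"true": [per-token logprobs], "false": [[per-token logprobs], ...]}
Example = Dict[str, list]


def completion_score(token_logprobs: Sequence[float], normalize: bool = False) -> float:
    """Score a candidate completion from its per-token log-probabilities.

    normalize=False returns the summed log-likelihood; normalize=True returns
    the mean per-token log-probability. Which one the repo uses is an assumption.
    """
    total = sum(token_logprobs)
    return total / len(token_logprobs) if normalize else total


def factor_accuracy(examples: List[Example], normalize: bool = False) -> float:
    """Fraction of examples whose factual completion strictly outscores every
    non-factual one (ties count as errors)."""
    correct = 0
    for ex in examples:
        true_score = completion_score(ex["true"], normalize)
        false_scores = [completion_score(f, normalize) for f in ex["false"]]
        if all(true_score > s for s in false_scores):
            correct += 1
    return correct / len(examples)


if __name__ == "__main__":
    examples = [
        # Factual: -1.5 vs. -3.0 and -2.2 -> correct
        {"true": [-0.5, -1.0], "false": [[-2.0, -1.0], [-0.7, -1.5]]},
        # Factual: -3.0 vs. -1.0 and -4.0 -> wrong
        {"true": [-3.0], "false": [[-1.0], [-4.0]]},
    ]
    print(factor_accuracy(examples))  # 0.5

    # Length matters: summed -3.0 < -2.5, but mean -1.0 > -2.5.
    long_true = [{"true": [-1.0, -1.0, -1.0], "false": [[-2.5]]}]
    print(factor_accuracy(long_true), factor_accuracy(long_true, normalize=True))  # 0.0 1.0
```

The last example shows why the scoring choice matters: the factual completion is three tokens long, so its summed log-probability (-3.0) is lower than the one-token alternative (-2.5), while its per-token average (-1.0) is higher.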

License

wiki_factor, expert_factor and code: Released under the MIT license.
news_factor: The benchmark is derived from The RefinedWeb Dataset. The public extract is made available under an ODC-By 1.0 license; users should also abide by the CommonCrawl ToU: https://commoncrawl.org/terms-of-use/.

Citation

If you find our paper or code helpful, please cite our paper:

@article{muhlgay2023generating,
  title={Generating benchmarks for factuality evaluation of language models},
  author={Muhlgay, Dor and Ram, Ori and Magar, Inbal and Levine, Yoav and Ratner, Nir and Belinkov, Yonatan and Abend, Omri and Leyton-Brown, Kevin and Shashua, Amnon and Shoham, Yoav},
  journal={arXiv preprint arXiv:2307.06908},
  year={2023}
}
