
This repository contains the official implementation for the AAAI25 paper "From Words to Worth: Newborn Article Impact Prediction with LLM".


NAIP Logo

Framework for newborn article impact & quality estimation.


NAIP Framework Overview

The NAIP series uses fine-tuned LLMs to quickly predict the **impact** or **quality** of articles based on their internal content.
| Version | Input | Output | Model Weights | Homepage | Paper |
|---------|-------|--------|---------------|----------|-------|
| v1 | Title & Abstract | Impact Estimation (0–1) | Link | Link | AAAI 2025 |
| v2 | Title & Abstract | Quality Estimation | Link | Link | arXiv |

🚀 Update Log

  • 250930 – Introducing NAIPv2: extending the series with an emphasis on quality estimation.
  • 241210 - The paper has been accepted by AAAI 2025!
  • 241204 - Hugging Face Spaces support 🥰
    • We've set up an online demo on Hugging Face Spaces, so you can try it out without writing a single line of code!
  • 241126 - V1.0 We're thrilled to announce the end of Early Access and the official release of V1.0! ✨
    • The codebase is now more organized and easier to navigate! 🧹
    • Updated and streamlined README with detailed setup and usage instructions. 💡
    • Decoupled the dataset, added more LoRA adapter weight download links, and more! 🔄
    • Known issues: building the NAID dataset has not been tested on other machines and may cause problems. We plan to replace this function with a more powerful framework in another codebase.
  • 240808 - Early Access
    • We have released the Early Access version of our code!

Quick Deployment (for most researchers)

First, clone the repo and run the following commands in the console:

git clone https://github.com/ssocean/NAIP.git
cd NAIP
pip install -r requirements.txt
  • To try v1, please use demo_v1.py.
  • To try v2, please use demo_v2.py.
  • You may need to download the corresponding model weights.
  • When providing the title and abstract, please avoid line breaks, LaTeX symbols, or other special formatting.
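
Since the models expect plain, single-line text, a small pre-processing step can strip the formatting mentioned above. The helper below is our own sketch, not part of the repo; the regexes cover only the most common cases (inline math and simple LaTeX commands), so adapt them to your data:

```python
import re

def clean_text(text: str) -> str:
    """Best-effort cleanup of a title/abstract before feeding it to the model.

    Assumption: removing inline math, unwrapping simple LaTeX commands, and
    collapsing whitespace is enough; this helper does not ship with NAIP.
    """
    text = re.sub(r"\$[^$]*\$", " ", text)                  # drop inline math like $O(n)$
    text = re.sub(r"\\[a-zA-Z]+\{([^}]*)\}", r"\1", text)   # \emph{word} -> word
    text = re.sub(r"\s+", " ", text)                        # line breaks -> single spaces
    return text.strip()

print(clean_text("A Study of\n\\emph{Impact} with $O(n)$ cost"))
# -> "A Study of Impact with cost"
```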

Reproducing NAIPv1 (optional)

The following instructions are outdated. We are undergoing a major code refactoring. An updated version will be released after 2025.10.7.

Fine-tuning

For fine-tuning, you may manually modify 'xxxForSequenceClassification' in the transformers package (see llama_for_naip/NAIP_LLaMA.py for details), or follow the Hugging Face instructions for loading models with custom code instead.
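
Conceptually, the modification attaches a regression-style head to the LLM. The dependency-free sketch below illustrates the idea under the assumption that v1 maps a pooled hidden representation through a linear layer and a sigmoid to a score in (0, 1); the function name and numbers are ours, and the actual head lives in llama_for_naip/NAIP_LLaMA.py:

```python
import math

def impact_head(hidden, weights, bias=0.0):
    """Toy regression head: linear layer (hidden_size -> 1) plus sigmoid.

    Assumption: this mirrors the shape of the classification-head change,
    not the exact implementation in NAIP_LLaMA.py.
    """
    logit = sum(h * w for h, w in zip(hidden, weights)) + bias
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid keeps the score in (0, 1)

score = impact_head([0.2, -0.5, 1.0], [0.3, 0.1, 0.4])
print(f"impact score: {score:.3f}")
```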

Then, prepare a train.sh bash script like the one below:

DATA_PATH="ScImpactPredict/NAID/NAID_train_extrainfo.csv"
TEST_DATA_PATH="ScImpactPredict/NAID/NAID_test_extrainfo.csv"

OMP_NUM_THREADS=1 accelerate launch offcial_train.py \
    --total_epochs 5 \
    --learning_rate 1e-4 \
    --data_path $DATA_PATH \
    --test_data_path $TEST_DATA_PATH \
    --runs_dir official_runs/LLAMA3 \
    --checkpoint  path_to_huggingface_LLaMA3

Finally, run sh train.sh in the console and wait for training to finish.

Testing

Similar to fine-tuning, prepare test.sh as below:

python official_test.py \
 --data_path NAIP/NAID/NAID_test_extrainfo.csv \
 --weight_dir path_to_runs_dir

Then, run sh test.sh.

Reproducing NAIPv2 (optional)

Preliminary code and the dataset are released at ./v2_resource; detailed instructions will be released after 2025.10.7. 🚀 (Core team members are on vacation 🏖️)

🛠️ Technical Support

If you would like to conduct comparison experiments with NAIP but encounter difficulties in setting up the environment or reproducing the code, we provide free technical support.

Simply send us a .csv file containing the "title" and "abstract" fields, and we will return the prediction results to you.
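
Such a file can be produced with Python's standard csv module. The file name and rows below are illustrative; only the "title" and "abstract" columns matter:

```python
import csv

# Hypothetical papers; replace with your own data.
rows = [
    {"title": "From Words to Worth", "abstract": "We predict article impact with LLMs."},
    {"title": "Another Paper", "abstract": "A second example abstract."},
]

with open("papers.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["title", "abstract"])
    writer.writeheader()   # header row: title,abstract
    writer.writerows(rows)
```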

  • In urgent cases, results can be provided within one day.
  • This service is free of charge and intended to facilitate fair, reproducible comparisons in research.

📩 Please contact us via [[email protected]].

📚 Citation

If you find this work useful, please cite:

@article{Zhao2024NAIP,
  title={From Words to Worth: Newborn Article Impact Prediction with LLM},
  author={Penghai Zhao and Qinghua Xing and Kairan Dou and Jinyu Tian and Ying Tai and Jian Yang and Ming-Ming Cheng and Xiang Li},
  journal={ArXiv},
  year={2024},
  volume={abs/2408.03934},
  url={https://api.semanticscholar.org/CorpusID:271744831}
}
