A tourism dialogue system needs not only to chat with the user but also to understand the dialogue and extract the entities required to make a travel plan. This repository provides a tool that automatically extracts the entities critical to travel planning, such as the travel destination and activities.
We provide a web page for interacting with the entity extraction model. To use this interface, first clone this repository:

    git clone [email protected]:mynlp/TourismDialogue.git

Then start the web server with the following command:

    bash run_server.sh

The web page can then be accessed at:

    http://127.0.0.1:9200

In addition, we provide a script for extracting entities from the command line. To extract entities from your own data, replace the test file in test_seq2seq3 of run.sh with your own and run the following command:

    bash run.sh test

To train the model, first prepare the dataset by splitting it into train, dev, and test sets:
    bash run.sh split_dataset

Then run the extract2.py script to generate the training instances:

    cd dataset
    bash run.sh create_dataset

Finally, train the model by running train.py:

    bash run.sh train

To train the model on a customized dataset, prepare the dataset in the following format. Each utterance within a dialogue should look like this:
{
  "utterance": "えーとー、ちょっと紅葉が綺麗な所に行きたいんですけども。",
  "speaker": "customer",
  "annotation": [{
    "query": {
      "大ジャンル": "遊ぶ",
      "キーワード": "紅葉"
    }
  }]
}

The performance of the model is evaluated on the held-out test set of the tourism dialogue dataset. Three foundation models are evaluated, and two different decoding strategies are tested. The results are shown in the table below.
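
If you are preparing a customized dataset, it may help to check each utterance record against the format shown earlier before running the training commands. The following is only a minimal sketch: the field names ("utterance", "speaker", "annotation", "query") are taken from the example record in this README, while the function name check_utterance and the validation rules themselves are assumptions made for illustration and are not part of this repository.

```python
import json

# Key names copied from the example record in this README.
# The rules below are illustrative assumptions, not the repository's own checks.
REQUIRED_KEYS = ("utterance", "speaker", "annotation")


def check_utterance(record):
    """Return a list of problems found in one utterance record (empty if none)."""
    if not isinstance(record, dict):
        return ["record is not a JSON object"]
    problems = []
    for key in REQUIRED_KEYS:
        if key not in record:
            problems.append(f"missing key: {key}")
    if not isinstance(record.get("utterance", ""), str):
        problems.append("utterance is not a string")
    if not isinstance(record.get("speaker", ""), str):
        problems.append("speaker is not a string")
    annotation = record.get("annotation", [])
    if not isinstance(annotation, list):
        problems.append("annotation is not a list")
    else:
        # Assumption: every annotation item carries a "query" object,
        # as in the README example. An empty annotation list is accepted.
        for i, item in enumerate(annotation):
            if not isinstance(item, dict) or not isinstance(item.get("query"), dict):
                problems.append(f"annotation[{i}] has no 'query' object")
    return problems


# The example record from this README.
example = json.loads("""
{
  "utterance": "えーとー、ちょっと紅葉が綺麗な所に行きたいんですけども。",
  "speaker": "customer",
  "annotation": [{
    "query": {
      "大ジャンル": "遊ぶ",
      "キーワード": "紅葉"
    }
  }]
}
""")

print(check_utterance(example))
```

An empty list means that no problems were found under these assumed rules. If your data uses other annotation fields, adjust the checks accordingly.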
