pytod is a library which provides a simulated environment for the Schema-Guided Dialogue (SGD) dataset (Rastogi et al., 2019). It simulates SGD APIs, including database responses and API behavior, according to the complex policies inherent in the dataset, providing a resource for conversational tool-use and zero-shot end-to-end task-oriented dialogue research.
This repository accompanies the paper "PyTOD: Programmable Task-Oriented Dialogue with Execution Feedback".
In order to set up the necessary environment:
- While in the `pytod` root directory, run the following command:
pipenv install
NOTE: The pipenv environment has `pytod` installed in editable mode. Some changes, e.g. in `setup.cfg`, might require you to run `pipenv install` again. Alternatively, simply use `pipenv update` to add new dependencies to the virtual environment.
To activate the project's virtualenv, run `pipenv shell`. You can also run commands inside the virtual environment with `pipenv run`. Add `eval "$(_PIPENV_COMPLETE=zsh_source pipenv)"` to `~/.zshrc` to get shell completion.
- For contributing, install the developer requirements via
pipenv install --dev tox sphinx pre-commit pytest
- Install several [pre-commit] git hooks with:
pre-commit install # You might also want to run `pre-commit autoupdate`
and check out the configuration under `.pre-commit-config.yaml`. The `-n, --no-verify` flag of `git commit` can be used to deactivate pre-commit hooks temporarily.
- Download the Schema-Guided Dialogue (SGD) dataset [1]:
chmod +x scripts/fetch_data.sh && bash scripts/fetch_data.sh
This step should only be performed once upon installation. Run
chmod +x scripts/pytod_setup.sh && bash scripts/pytod_setup.sh
This applies various corrections to the data and builds the databases, API simulations, normalisation tables and so on.
You can convert the SGD training set dialogue files to PyTOD transcripts with the following command:
pipenv run python scripts/create_pytod_dialogues.py \
filters=v0.6.1 version=v0.9.1 interpreter=v0.5 \
interpreter/backend/actions=v0.3.4 \
interpreter.backend.show_values_samples_in_slot_filling_hints=false \
interpreter.backend.include_slot_value_references_in_confirmation_hints=false \
interpreter.backend.include_slot_value_references_in_alternative_hints=false \
interpreter.backend.multiple_confirmation_hints=true \
interpreter.user.reference_command_for_wildcard_carryover=true \
interpreter.user.value_carryover='natural_language' \
interpreter.user.user_task_retry_instruction_format=call \
interpreter.user.user_query_retry_instruction_format=call \
interpreter.copy_selected_entities_to_search=true \
interpreter.resolve_carry_over_to_entity=true \
split=train --config-name=v3
Set `split=dev` or `split=test` to process the development and test sets, respectively.
To print one or more PyTOD transcripts in human-readable format, use the `display_pytod` endpoint:
pipenv run display_pytod split=dev ids="['1_00000']" version=v0.9.1
The `ids` are the transcript IDs, which match the `dialogue_id` in the SGD `dialogue_*.json` files. This command will display a transcript as follows:
user: I want to make a restaurant reservation for 2 people at half past 11 in the morning.
x1 restaurants_2_reserve_restaurant(time = 'half past 11 in the morning', number_of_seats = 2)
x2 Hint('request value: restaurant_name')
x3 Hint('request value: location')
x4 say(x2, x3)
agent: What city do you want to dine in? Do you have a preferred restaurant?
user: Please find restaurants in San Jose. Can you try Sino?
x5 x1.location = 'San Jose'; x1.restaurant_name = 'Sino'
x6 Hint('ask the user to confirm: date')
x7 Hint('ask the user to confirm: number_of_seats')
x8 Hint('ask the user to confirm: restaurant_name')
x9 Hint('ask the user to confirm: time')
x10 Hint('ask the user to confirm: location')
x11 say(x6, x7, x8, x9, x10)
agent: Confirming: I will reserve a table for 2 people at Sino in San Jose. The reservation time is 11:30 am today.
user: Yes, thanks. What's their phone number?
x12 confirm(x1)
x13 perform(x1)
x14 say(x1.phone_number, x13)
agent: Your reservation has been made. Their phone number is 408-247-8880.
user: What's their address? Do they have vegetarian options on their menu?
x15 say(x1.has_vegetarian_options, x1.address)
agent: The street address is 377 Santana Row #1000. They have good vegetarian options.
user: Thanks very much.
x16 conversation_pause()
x17 Hint('ask the user if they require further assistance')
x18 say(x17)
agent: Is there anything else I can help you with?
user: No, that's all. Thanks.
x19 say()
agent: Have a great day.
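The IDs accepted by `ids` can be listed directly from the SGD files. The sketch below is not part of `pytod`. It assumes the standard SGD layout, in which each `dialogue_*.json` file holds a JSON list of dialogues with a `dialogue_id` field; the helper name `list_dialogue_ids` is hypothetical.

```python
import json
from pathlib import Path


def list_dialogue_ids(split_dir):
    """Collect the dialogue IDs from every dialogue_*.json file in a split directory."""
    ids = []
    for path in sorted(Path(split_dir).glob("dialogue_*.json")):
        with path.open(encoding="utf-8") as f:
            dialogues = json.load(f)  # assumption: each file is a JSON list of dialogues
        ids.extend(dialogue["dialogue_id"] for dialogue in dialogues)
    return ids
```

Calling the helper on the directory that holds one split of the downloaded data returns strings such as `'1_00000'`, which can then be passed to `display_pytod`.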
If you wish to finetune PyTOD, you may run the following commands to obtain:
- Action Parser Examples
pipenv run python scripts/create_pytod_text2text_examples.py \
split=train version=v0.9.1 patch_version=1 \
text2text_conversion=rendered_entities_resolved_values \
history_processor=rendered_entities \
multidomain_prompts=true \
debug=true
and set `split=dev` or `split=test` to process the other splits.
- Parser supervisor examples
pipenv run python scripts/create_nlu_text2text_examples.py \
split=train formatter.randomise_prompt_elements=true \
formatter.unk_slot_probability=0.5 \
formatter.similar_slot_probability=0.5 \
version=v0.9.1 patch_version=1
You should then finetune your model on both datasets.
The simulation environment is implemented in the `simulation` package. A service is defined as a collection of APIs. These are either search APIs, which provide interfaces to databases (e.g., `FindBus`), or APIs that allow users to execute transactions (e.g., `BuyBusTicket`). An SGD service is implemented as follows:
@register_command(service="Events_1")
class FindEvents(SearchCommand):
    category: SearchCommandArgument[str] = SearchCommandArgument()
    subcategory: SearchCommandArgument[str] = SearchCommandArgument()
    city_of_event: SearchCommandArgument[str] = SearchCommandArgument()
    date: SearchCommandArgument[str] = SearchCommandArgument()

    def __init__(self, dialogue_id: DialogueID):
        super().__init__(dialogue_id)


@register_command(service="Events_1")
class BuyEventTickets(ConfirmedCommand):
    event_name: ConfirmedCommandArgument[str] = ConfirmedCommandArgument()
    number_of_seats: ConfirmedCommandArgument[str] = ConfirmedCommandArgument()
    date: ConfirmedCommandArgument[str] = ConfirmedCommandArgument()
    city_of_event: ConfirmedCommandArgument[str] = ConfirmedCommandArgument()

    def __init__(self, dialogue_id: DialogueID):
        super().__init__(dialogue_id)
Here `SearchCommand` and `ConfirmedCommand` implement the SGD policy graph. In other words, when called, these objects return the system actions (e.g., `Hint` objects) which guide agents according to the SGD policy. `SearchCommandArgument` and `ConfirmedCommandArgument` are descriptors that post-process LM outputs into schema-compatible values (e.g., via type coercion) and can be configured to provide feedback.
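The descriptor mechanism can be illustrated with a minimal sketch. This is not the `pytod` implementation: `CoercedArgument`, `ReserveTable` and the `feedback` list are hypothetical names, and the only post-processing shown is type coercion with a feedback message when coercion fails.

```python
class CoercedArgument:
    """Hypothetical descriptor: coerces assigned values to a target type."""

    def __init__(self, target_type=str):
        self.target_type = target_type

    def __set_name__(self, owner, name):
        # Called at class creation; remembers the attribute name
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__.get(self.name)

    def __set__(self, instance, value):
        try:
            instance.__dict__[self.name] = self.target_type(value)
        except (TypeError, ValueError):
            # Keep any previous value and record feedback instead of raising
            instance.feedback.append(f"invalid value for {self.name}: {value!r}")


class ReserveTable:
    """Hypothetical command with one integer-valued argument."""

    number_of_seats = CoercedArgument(int)

    def __init__(self):
        self.feedback = []


cmd = ReserveTable()
cmd.number_of_seats = "2"    # string, as a language model might produce it
print(cmd.number_of_seats)   # 2
cmd.number_of_seats = "two"  # cannot be coerced to int
print(cmd.feedback)
```

Putting the conversion in a descriptor keeps the command classes declarative: each argument is a single class attribute, and validation runs on every assignment.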
The package contains:
- Simulations for the SGD transactional APIs (`api_driver.py`)
- A `mongoquery` database simulation (`database.py`) for all the SGD databases
- Code to parse entities from `json` dicts to `python` objects
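To give a feel for MongoDB-style queries over in-memory records, here is a minimal pure-Python sketch. It does not use `mongoquery` or `database.py`: the `matches` helper, the small operator subset and the sample records (one of them made up) are assumptions for illustration only.

```python
import operator

# Small subset of MongoDB-style operators supported by this sketch
_OPERATORS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$in": lambda value, operand: value in operand,
}


def matches(record, query):
    """Return True if `record` satisfies every condition in `query`."""
    for field, condition in query.items():
        value = record.get(field)
        if isinstance(condition, dict):
            # Operator form, e.g. {"location": {"$in": ["San Jose", "Oakland"]}}
            for op, operand in condition.items():
                if not _OPERATORS[op](value, operand):
                    return False
        elif value != condition:
            # Plain form, e.g. {"location": "San Jose"}, means equality
            return False
    return True


restaurants = [
    {"restaurant_name": "Sino", "location": "San Jose", "has_vegetarian_options": True},
    # Made-up record, for illustration only
    {"restaurant_name": "Example Grill", "location": "Oakland", "has_vegetarian_options": False},
]
results = [r for r in restaurants if matches(r, {"location": "San Jose"})]
print([r["restaurant_name"] for r in results])  # ['Sino']
```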
This package implements the PyTOD agent. The `execute_instructions` method takes as input a list of program statements generated by the action parser and returns a list of system actions that guide the agent according to the SGD policy.
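The kind of system actions returned can be sketched with a toy policy step. This is not the `pytod` execution engine, and the real SGD policy is considerably more involved: the function name `next_system_actions` and its arguments are hypothetical, and only the hint wording is taken from the example transcript.

```python
def next_system_actions(required_slots, slot_values, confirmed=False):
    """Toy policy step: request missing slots, then ask for confirmation, then perform."""
    missing = [slot for slot in required_slots if slot not in slot_values]
    if missing:
        return [f"Hint('request value: {slot}')" for slot in missing]
    if not confirmed:
        return [f"Hint('ask the user to confirm: {slot}')" for slot in required_slots]
    return ["perform"]


required = ["restaurant_name", "location", "time"]
state = {"time": "half past 11 in the morning"}
print(next_system_actions(required, state))
# ["Hint('request value: restaurant_name')", "Hint('request value: location')"]
```

Once every required slot is filled, the same call yields confirmation hints, and after the user confirms it yields the transaction itself, mirroring the request, confirm, perform progression in the transcript.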
We provide the `OfflineSessionExecutor` helper, which takes as input a PyTOD transcript and returns the conversation state in the format required by the official DSTC8 evaluator. The commands
pipenv run python scripts/evaluate_pytod.py split=dev version=v0.9.1
pipenv run python scripts/evaluate_pytod.py split=test version=v0.9.1
use it to execute all the ground truth PyTOD transcripts for the development and test sets.
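For orientation, the sketch below builds a dialogue state in the shape used by SGD annotations, where each frame carries a `state` with `active_intent`, `requested_slots` and `slot_values`. Whether `OfflineSessionExecutor` emits exactly this structure is an assumption, and `make_frame` is a hypothetical helper, not part of `pytod`.

```python
import json


def make_frame(service, active_intent, slot_values, requested_slots=()):
    """Build one SGD-style frame; slot values are stored as lists of strings."""
    return {
        "service": service,
        "state": {
            "active_intent": active_intent,
            "requested_slots": list(requested_slots),
            "slot_values": {slot: [str(value)] for slot, value in slot_values.items()},
        },
    }


# Values taken from the first user turn of the example transcript;
# the service and intent names are assumed from the command name.
frame = make_frame(
    "Restaurants_2",
    "ReserveRestaurant",
    {"time": "half past 11 in the morning", "number_of_seats": 2},
)
print(json.dumps(frame, indent=2))
```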
├── data
│ ├── external <- Data from third party sources.
│ ├── interim <- Intermediate data that has been transformed.
│ ├── processed <- The final, canonical data sets for modeling.
│ └── raw <- The original, immutable data dump.
├── docs <- Directory for Sphinx documentation in rst or md.
├── models <- Trained and serialized models, model predictions,
│ or model summaries.
├── notebooks <- Jupyter notebooks. Naming convention is a number (for
│ ordering), the creator's initials and a description,
│ e.g. `1.0-fw-initial-data-exploration`.
├── Pipfile <- For virtual environment management, contains abstract dependencies
├── Pipfile.lock <- Precise description of the dependency tree, enables recreating the environment elsewhere
├── pyproject.toml <- Build configuration. Don't change! Use `pip install -e .`
│ to install for development or to build `tox -e build`.
├── references <- Data dictionaries, manuals, and all other materials.
├── reports <- Generated analysis as HTML, PDF, LaTeX, etc.
│ └── figures <- Generated plots and figures for reports.
├── resources
│ └── lexical <- String equivalence maps, including semantically aware ones
│ └── mining_results <- outputs of data mining steps necessary to understand annotation patterns
│ └── sgd
│ └── index <- Tree splitting conversations according to the conversation structure
├── scripts <- Analysis and production scripts which import the actual `pytod` package.
│ └── data_mining <- Scripts for analysing annotation patterns
│ └── scratches <- quick & dirty ones
├── setup.cfg <- Declarative configuration of your project.
├── setup.py <- [DEPRECATED] Use `python setup.py develop` to install for
│ development or `python setup.py bdist_wheel` to build.
├── src
│ └── pytod
│ └── apps <- CLI entry points implementation
│ └── configs <- configurations library scripts and apps
│ └── apps <- endpoints configs
│ └── pytod <- PyTOD transcript generation configuration group
│ └── pytod_finetuning <- configs for converting data to text2text format for PyTOD finetuning
│ └── pytod_setup <- configs for scripts that prepare the simulation environment
│ └── evaluation <- DSTC8 evaluator code, adapted for PyTOD evaluation.
│ └── execution <- Implementation of the execution engine
│ └── interpreter <- PyTOD interpreter, converts SGD annotation to PyTOD transcripts
│ └── parser <- pydantic validators, used by the dialogue manager to ensure AP outputs are well-formed and syntactically correct
│ └── prompting <- formatter classes for converting transcripts from structured to text-to-text (source-target) format
│ └── simulation <- implements simulated environment for SGD, simulation of the policy graph
│ └── services <- concrete service implementations
│ └── toolbox <- Alternative SGD schema representation, used by the PyTOD interpreter
│ └── pytod_types <- `pydantic` classes, defining the data model used by the `pytod` interpreter
├── tests <- Unit tests which can be run with `pytest`.
├── .coveragerc <- Configuration for coverage reports of unit tests.
├── .isort.cfg <- Configuration for git hook that sorts imports.
└── .pre-commit-config.yaml <- Configuration of pre-commit git hooks.