
A simulated environment for the Schema-Guided Dialogue (SGD) dataset (Rastogi et al., 2019) accompanying "PyTOD: Programmable Task-Oriented Dialogue with Execution Feedback".


apple/ml-pytod


Project generated with PyScaffold

pytod

pytod is a library that provides a simulated environment for the Schema-Guided Dialogue (SGD) dataset (Rastogi et al., 2019). It simulates the SGD APIs, including database responses and API behavior, following the complex policies inherent in the dataset, and thus provides a resource for conversational tool-use and zero-shot end-to-end task-oriented dialogue research.

Accompanies the paper PyTOD: Programmable Task-Oriented Dialogue with Execution Feedback.

Installation

To set up the necessary environment:

  1. While in the pytod root directory, run the following command:
pipenv install

NOTE: The pipenv environment has pytod installed in editable mode. Some changes, e.g. in setup.cfg, might require you to run pipenv install again. Alternatively, simply use pipenv update to add new dependencies to the virtual environment.

To activate the project's virtualenv run

pipenv shell

and run individual commands inside the virtual environment with

pipenv run <command>

Add

eval "$(_PIPENV_COMPLETE=zsh_source pipenv)"

to ~/.zshrc to get shell completion.

  2. For contributing, install the developer requirements via
pipenv install --dev tox sphinx pre-commit pytest
  3. Install several pre-commit git hooks with:
    pre-commit install
    You might also want to run pre-commit autoupdate and check out the configuration in .pre-commit-config.yaml. The -n, --no-verify flag of git commit can be used to deactivate pre-commit hooks temporarily.

Downloading the data

  1. Download the Schema-Guided Dialogue (SGD) dataset (Rastogi et al., 2019):
chmod +x scripts/fetch_data.sh && bash scripts/fetch_data.sh

Setting up the pytod environment

This step should only be performed once upon installation. Run

chmod +x scripts/pytod_setup.sh && bash scripts/pytod_setup.sh

This will apply various corrections to the data, build databases and API simulations, normalisation tables and so on.

Converting SGD dialogue files to PyTOD transcripts

You can convert the SGD training-set dialogue files to PyTOD transcripts with the following command:

pipenv run python scripts/create_pytod_dialogues.py \
filters=v0.6.1 version=v0.9.1 interpreter=v0.5 \
interpreter/backend/actions=v0.3.4 \
interpreter.backend.show_values_samples_in_slot_filling_hints=false \
interpreter.backend.include_slot_value_references_in_confirmation_hints=false \
interpreter.backend.include_slot_value_references_in_alternative_hints=false \
interpreter.backend.multiple_confirmation_hints=true \
interpreter.user.reference_command_for_wildcard_carryover=true \
interpreter.user.value_carryover='natural_language' \
interpreter.user.user_task_retry_instruction_format=call \
interpreter.user.user_query_retry_instruction_format=call \
interpreter.copy_selected_entities_to_search=true \
interpreter.resolve_carry_over_to_entity=true \
split=train --config-name=v3

Set split=dev or split=test to process the development or test set, respectively.
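Because only split changes between the three runs, a small Python wrapper can build every invocation from one shared list of overrides. This is a sketch: build_command and run_all are hypothetical helpers, not part of pytod, and the overrides are copied verbatim from the command above.

```python
import shlex
import subprocess
from typing import List, Sequence

# Overrides copied from the training-set command above; only `split` changes.
# The shell strips the quotes around 'natural_language', so the value is
# passed unquoted here, where no shell is involved.
COMMON_OVERRIDES = [
    "filters=v0.6.1",
    "version=v0.9.1",
    "interpreter=v0.5",
    "interpreter/backend/actions=v0.3.4",
    "interpreter.backend.show_values_samples_in_slot_filling_hints=false",
    "interpreter.backend.include_slot_value_references_in_confirmation_hints=false",
    "interpreter.backend.include_slot_value_references_in_alternative_hints=false",
    "interpreter.backend.multiple_confirmation_hints=true",
    "interpreter.user.reference_command_for_wildcard_carryover=true",
    "interpreter.user.value_carryover=natural_language",
    "interpreter.user.user_task_retry_instruction_format=call",
    "interpreter.user.user_query_retry_instruction_format=call",
    "interpreter.copy_selected_entities_to_search=true",
    "interpreter.resolve_carry_over_to_entity=true",
]


def build_command(split: str) -> List[str]:
    """Return the argv list that converts one SGD split (hypothetical helper)."""
    if split not in ("train", "dev", "test"):
        raise ValueError(f"unknown split: {split!r}")
    return [
        "pipenv", "run", "python", "scripts/create_pytod_dialogues.py",
        *COMMON_OVERRIDES,
        f"split={split}",
        "--config-name=v3",
    ]


def run_all(splits: Sequence[str] = ("train", "dev", "test"), dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for split in splits:
        cmd = build_command(split)
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Calling run_all() only prints the commands; run_all(dry_run=False) from the pytod root directory would execute them one split at a time, stopping at the first failure because of check=True.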

To print one or more PyTOD transcripts in human-readable format, use the display_pytod endpoint:

pipenv run display_pytod split=dev ids="['1_00000']" version=v0.9.1

The ids are the transcript IDs, which match the dialogue_id field in the SGD dialogues_*.json files. This command displays the transcript as follows:

user: I want to make a restaurant reservation for 2 people at half past 11 in the morning.
x1 restaurants_2_reserve_restaurant(time = 'half past 11 in the morning',
   number_of_seats = 2)
x2 Hint('request value: restaurant_name')
x3 Hint('request value: location')
x4 say(x2, x3)
agent: What city do you want to dine in? Do you have a preferred restaurant?
user: Please find restaurants in San Jose. Can you try Sino?
x5 x1.location = 'San Jose'; x1.restaurant_name = 'Sino'
x6 Hint('ask the user to confirm: date')
x7 Hint('ask the user to confirm: number_of_seats')
x8 Hint('ask the user to confirm: restaurant_name')
x9 Hint('ask the user to confirm: time')
x10 Hint('ask the user to confirm: location')
x11 say(x6, x7, x8, x9, x10)
agent: Confirming: I will reserve a table for 2 people at Sino in San Jose. The reservation time is 11:30 am today.
user: Yes, thanks. What's their phone number?
x12 confirm(x1)
x13 perform(x1)
x14 say(x1.phone_number, x13)
agent: Your reservation has been made. Their phone number is 408-247-8880.
user: What's their address? Do they have vegetarian options on their menu?
x15 say(x1.has_vegetarian_options, x1.address)
agent: The street address is 377 Santana Row #1000. They have good vegetarian options.
user: Thanks very much.
x16 conversation_pause()
x17 Hint('ask the user if they require further assistance')
x18 say(x17)
agent: Is there anything else I can help you with?
user: No, that's all. Thanks.
x19 say()
agent: Have a great day.
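To pick IDs for display_pytod, the dialogue_id values can be read straight from the SGD files, which the dataset stores per split as dialogues_001.json, dialogues_002.json and so on, each holding a JSON list of dialogues. The sketch below assumes you pass the directory where one split was downloaded (its exact location is not stated here); list_dialogue_ids and display_ids_argument are hypothetical helpers.

```python
import glob
import json
import os
from typing import Dict, List


def list_dialogue_ids(split_dir: str) -> Dict[str, List[str]]:
    """Map each SGD dialogue file in split_dir to the dialogue_ids it contains."""
    ids_by_file: Dict[str, List[str]] = {}
    # The pattern matches the dialogue files and skips the split's schema.json.
    for path in sorted(glob.glob(os.path.join(split_dir, "dialogues_*.json"))):
        with open(path, encoding="utf-8") as f:
            dialogues = json.load(f)  # a JSON list of dialogue objects
        ids_by_file[os.path.basename(path)] = [d["dialogue_id"] for d in dialogues]
    return ids_by_file


def display_ids_argument(ids: List[str]) -> str:
    """Format IDs like the ids="[...]" override in the display_pytod example."""
    return 'ids="' + repr(list(ids)) + '"'
```

For example, display_ids_argument(["1_00000"]) yields the same ids="['1_00000']" argument used in the display_pytod command.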

Fine-tuning PyTOD

If you wish to fine-tune PyTOD, run the following commands to generate:

  1. Action parser examples
pipenv run python scripts/create_pytod_text2text_examples.py \
split=train version=v0.9.1 patch_version=1  \
text2text_conversion=rendered_entities_resolved_values \
history_processor=rendered_entities \
multidomain_prompts=true \
debug=true

Set split=dev or split=test to process the other splits.

  2. Parser supervisor examples
pipenv run python scripts/create_nlu_text2text_examples.py \
  split=train formatter.randomise_prompt_elements=true \
  formatter.unk_slot_probability=0.5 \
  formatter.similar_slot_probability=0.5 \
  version=v0.9.1 patch_version=1

You should then fine-tune your model on both datasets.
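One simple way to train a single model on both datasets is to tag each example with its task, concatenate the two sets and shuffle them with a fixed seed. This is a sketch that assumes each example has already been loaded as a dict with source and target strings; the real output format of the two scripts is not described here, and mix_datasets is a hypothetical helper.

```python
import random
from typing import Dict, List, Sequence

Example = Dict[str, str]  # assumed shape: {"source": ..., "target": ...}


def mix_datasets(
    action_parser: Sequence[Example],
    supervisor: Sequence[Example],
    seed: int = 0,
) -> List[Example]:
    """Tag each example with its task, concatenate both sets and shuffle them.

    A fixed seed keeps the mixture reproducible across runs.
    """
    tagged = [{**ex, "task": "action_parser"} for ex in action_parser]
    tagged += [{**ex, "task": "supervisor"} for ex in supervisor]
    random.Random(seed).shuffle(tagged)
    return tagged
```

Keeping the task tag makes it easy to report validation metrics separately for the action parser and the parser supervisor after training.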

pytod simulation

The simulation environment is implemented in the simulation package. A service is a collection of APIs, each of which is either a search API, which provides an interface to a database (e.g., FindBus), or a transactional API, which lets users execute transactions (e.g., BuyBusTicket). For example, the SGD Events_1 service is implemented as follows:

# Search API: exposes the Events_1 events database.
@register_command(service="Events_1")
class FindEvents(SearchCommand):
    category: SearchCommandArgument[str] = SearchCommandArgument()
    subcategory: SearchCommandArgument[str] = SearchCommandArgument()
    city_of_event: SearchCommandArgument[str] = SearchCommandArgument()
    date: SearchCommandArgument[str] = SearchCommandArgument()

    def __init__(self, dialogue_id: DialogueID):
        super().__init__(dialogue_id)


# Transactional API: executes a transaction (buying event tickets).
@register_command(service="Events_1")
class BuyEventTickets(ConfirmedCommand):
    event_name: ConfirmedCommandArgument[str] = ConfirmedCommandArgument()
    number_of_seats: ConfirmedCommandArgument[str] = ConfirmedCommandArgument()
    date: ConfirmedCommandArgument[str] = ConfirmedCommandArgument()
    city_of_event: ConfirmedCommandArgument[str] = ConfirmedCommandArgument()

    def __init__(self, dialogue_id: DialogueID):
        super().__init__(dialogue_id)

Here SearchCommand and ConfirmedCommand implement the SGD policy graph: when called, these objects return the system actions (e.g., Hints) that guide agents according to the SGD policy. SearchCommandArgument and ConfirmedCommandArgument are descriptors that post-process LM outputs into schema-compatible values (e.g., via type coercion) and can be configured to provide feedback.
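The descriptor mechanism can be illustrated with a plain Python descriptor that coerces each assigned value and records feedback instead of storing a value it cannot accept. This is a minimal sketch, not pytod's SearchCommandArgument or ConfirmedCommandArgument: CoercedArgument, TicketPurchaseSketch and the feedback list are hypothetical, and the allowed seat counts are illustrative rather than taken from the SGD schema.

```python
from typing import Any, Callable, Generic, List, Optional, Sequence, TypeVar

T = TypeVar("T")


class CoercedArgument(Generic[T]):
    """Hypothetical descriptor: coerces assigned values and records feedback."""

    def __init__(self, coerce: Callable[[Any], T], allowed: Optional[Sequence[T]] = None):
        self.coerce = coerce
        self.allowed = allowed

    def __set_name__(self, owner: type, name: str) -> None:
        self.name = name
        self.storage = "_" + name  # per-instance attribute holding the value

    def __get__(self, obj: Any, objtype: Optional[type] = None) -> Any:
        if obj is None:  # accessed on the class: return the descriptor itself
            return self
        return getattr(obj, self.storage, None)

    def __set__(self, obj: Any, value: Any) -> None:
        try:
            coerced = self.coerce(value)  # e.g. the LM emits 2, the schema wants "2"
        except (TypeError, ValueError):
            obj.feedback.append(f"{self.name}: cannot interpret {value!r}")
            return
        if self.allowed is not None and coerced not in self.allowed:
            obj.feedback.append(f"{self.name}: {coerced!r} is not one of {list(self.allowed)}")
            return  # keep the previous value
        setattr(obj, self.storage, coerced)


class TicketPurchaseSketch:
    # Illustrative allowed values only.
    number_of_seats: CoercedArgument[str] = CoercedArgument(str, allowed=[str(n) for n in range(1, 10)])
    city_of_event: CoercedArgument[str] = CoercedArgument(str)

    def __init__(self) -> None:
        self.feedback: List[str] = []
```

Assigning number_of_seats = 2 stores the string "2", mirroring the [str] type declared in BuyEventTickets, while an out-of-range value leaves the previous value in place and appends a feedback message that an agent could be shown.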

The package contains:

  • Simulations for the SGD transactional APIs (api_driver.py)
  • A mongoquery database simulation (database.py) for all the SGD databases
  • Code to parse entities from JSON dicts into Python objects
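The database simulation is described as a mongoquery simulation, i.e. records are filtered with Mongo-style query dicts. The stdlib sketch below only illustrates that idea; it is not the mongoquery library's API (which supports many more operators), and matches, find and the toy records in the usage note are hypothetical.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def matches(record: Record, query: Dict[str, Any]) -> bool:
    """True if record satisfies every field condition in a Mongo-style query.

    Supports plain equality plus the $in and $ne operators only.
    """
    for field_name, condition in query.items():
        value = record.get(field_name)
        if isinstance(condition, dict):
            for op, arg in condition.items():
                if op == "$in":
                    if value not in arg:
                        return False
                elif op == "$ne":
                    if value == arg:
                        return False
                else:
                    raise ValueError(f"unsupported operator: {op}")
        elif value != condition:
            return False
    return True


def find(records: List[Record], query: Dict[str, Any]) -> List[Record]:
    """Return the records matching query, preserving database order."""
    return [r for r in records if matches(r, query)]
```

With toy records such as [{"name": "A", "city": "San Jose"}, {"name": "B", "city": "Oakland"}], find(records, {"city": {"$in": ["San Jose", "Oakland"]}}) returns both, while {"city": "San Jose"} returns only the first.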

pytod execution

The execution package implements the PyTOD agent. Its execute_instructions method takes as input a list of program statements generated by the action parser and returns a list of system actions that guide the agent according to the SGD policy.
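The loop of applying statements and returning policy-driven system actions can be sketched with a toy transactional command whose simplified policy requests missing values, then asks for confirmation, then performs the transaction, echoing the Hint strings in the transcript above. This is not pytod's execute_instructions: ToyTransaction, execute_statements and the tuple-based statements are hypothetical, and the real SGD policy has more cases (defaults, alternatives, carry-over).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass
class ToyTransaction:
    """Hypothetical transactional command with a simplified policy."""

    required: List[str]
    values: Dict[str, str] = field(default_factory=dict)
    confirmed: bool = False

    def next_actions(self) -> List[str]:
        missing = [slot for slot in self.required if slot not in self.values]
        if missing:  # 1. collect every required value first
            return [f"request value: {slot}" for slot in missing]
        if not self.confirmed:  # 2. then ask the user to confirm all of them
            return [f"ask the user to confirm: {slot}" for slot in self.required]
        return ["perform"]  # 3. finally execute the transaction


def execute_statements(cmd: ToyTransaction, statements: Sequence[Tuple[str, ...]]) -> List[str]:
    """Apply toy statements in order and return the system actions for the resulting state."""
    for stmt in statements:
        if stmt[0] == "set":
            _, slot, value = stmt
            cmd.values[slot] = value
            cmd.confirmed = False  # in this toy, any change requires re-confirmation
        elif stmt[0] == "confirm":
            cmd.confirmed = True
        else:
            raise ValueError(f"unknown statement: {stmt!r}")
    return cmd.next_actions()
```

Feeding it ("set", "time", "11:30") for a command requiring restaurant_name, location and time returns two "request value" actions, much like x2 and x3 in the transcript.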

We provide the OfflineSessionExecutor helper, which takes a PyTOD transcript as input and returns the conversation state in the format required by the official DSTC8 evaluator. The commands

pipenv run python scripts/evaluate_pytod.py split=dev version=v0.9.1 
pipenv run python scripts/evaluate_pytod.py split=test version=v0.9.1 

use it to execute all the ground-truth PyTOD transcripts for the development and test sets.
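In the SGD data format read by the DSTC8 evaluator, each frame carries a state with active_intent, requested_slots and slot_values, where every slot maps to a list of string values and "NONE" marks the absence of an active intent. The sketch below only assembles that state dictionary for one frame; to_sgd_frame is a hypothetical helper, not the OfflineSessionExecutor API, and a full evaluator input contains further turn and frame fields.

```python
from typing import Any, Dict, Iterable, Mapping, Sequence, Union


def to_sgd_frame(
    service: str,
    slot_values: Mapping[str, Union[str, Sequence[str]]],
    active_intent: str = "NONE",
    requested_slots: Iterable[str] = (),
) -> Dict[str, Any]:
    """Build the state part of an SGD-style frame from tracked slot values.

    SGD stores each slot value as a list of strings, so single values are wrapped.
    """
    return {
        "service": service,
        "state": {
            "active_intent": active_intent,
            "requested_slots": list(requested_slots),
            "slot_values": {
                slot: [value] if isinstance(value, str) else list(value)
                for slot, value in slot_values.items()
            },
        },
    }
```

For the restaurant transcript above, a frame built with service "Restaurants_2", slot_values {"location": "San Jose"} and requested_slots ["phone_number"] would store the location as ["San Jose"].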

Project Organization

├── data
│   ├── external            <- Data from third party sources.
│   ├── interim             <- Intermediate data that has been transformed.
│   ├── processed           <- The final, canonical data sets for modeling.
│   └── raw                 <- The original, immutable data dump.
├── docs                    <- Directory for Sphinx documentation in rst or md.
├── models                  <- Trained and serialized models, model predictions,
│                              or model summaries.
├── notebooks               <- Jupyter notebooks. Naming convention is a number (for
│                              ordering), the creator's initials and a description,
│                              e.g. `1.0-fw-initial-data-exploration`.
├── Pipfile                 <- For virtual environment management, contains abstract dependencies
├── Pipfile.lock            <- Precise description of the dependency tree, enables recreating the environment elsewhere
├── pyproject.toml          <- Build configuration. Don't change! Use `pip install -e .`
│                              to install for development or `tox -e build` to build.
├── references              <- Data dictionaries, manuals, and all other materials.
├── reports                 <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures             <- Generated plots and figures for reports.
├── resources
│   └── lexical             <- String equivalence maps, including semantically aware ones
│      └── mining_results   <- outputs of data mining steps necessary to understand annotation patterns
│      └── sgd
│          └── index        <- Tree splitting conversations according to the conversation structure
├── scripts                 <- Analysis and production scripts which import the actual pytod package
│    └── data_mining        <- Scripts for analysing annotation patterns
│    └── scratches          <- quick & dirty ones
├── setup.cfg               <- Declarative configuration of your project.
├── setup.py                <- [DEPRECATED] Use `python setup.py develop` to install for
│                              development or `python setup.py bdist_wheel` to build.
├── src
│   └── pytod          
│       └── apps            <- CLI entry points implementation
│       └── configs         <- configurations for library scripts and apps
│           └── apps        <- endpoints configs 
│           └── pytod       <- PyTOD transcript generation configuration group
│           └── pytod_finetuning <- configs for converting data to text2text format for PyTOD finetuning
│           └── pytod_setup <- configs for scripts that prepare the simulation environment 
│       └── evaluation      <- DSTC8 evaluator code, adapted for PyTOD evaluation.
│       └── execution       <- Implementation of the execution engine 
│       └── interpreter     <- PyTOD interpreter, converts SGD annotation to PyTOD transcripts
│       └── parser          <- pydantic validators, used by the dialogue manager to ensure action parser (AP) outputs are well-formed and syntactically correct
│       └── prompting       <- formatter classes for converting transcripts from structured to text-to-text (source-target) format
│       └── simulation      <- implements simulated environment for SGD, simulation of the policy graph
│         └── services      <- concrete service implementations  
│       └── toolbox         <- Alternative SGD schema representation, used by the PyTOD interpreter 
│       └── pytod_types     <- `pydantic` classes, defining the data model used by the `pytod` interpreter
├── tests                   <- Unit tests which can be run with `pytest`.
├── .coveragerc             <- Configuration for coverage reports of unit tests.
├── .isort.cfg              <- Configuration for git hook that sorts imports.
└── .pre-commit-config.yaml <- Configuration of pre-commit git hooks.
