Update Table Understanding conversion to accept updated schema #190

@frreiss

Recent versions of Watson Discovery have changed the output format of the Table Understanding enrichment without documenting the change. The old column names are documented at https://cloud.ibm.com/docs/discovery-data?topic=discovery-data-understanding_tables#table-output-schema

Rough mapping from the new field names to the old (documented) ones:

new_name_to_old = {
    "row_min": "row_index_begin",
    "row_max": "row_index_end",
    "column_min": "column_index_begin",
    "column_max": "column_index_end",
    "cell_text": "text",
    "id": "cell_id"
}

Also, the "location" field at the top level of the table record now appears to be optional.

Our conversion to Pandas needs to be updated to cover both the old schema and the new schema.

I recommend that we first determine which schema is the canonical one and convert non-canonical schemas to the canonical one as a preprocessing step.
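
A minimal sketch of what such a preprocessing step could look like, assuming for illustration only that the old (documented) names are treated as canonical; that decision is still open. The helper names `canonicalize_cell` and `canonicalize_table` are hypothetical, and the cell-list keys (`body_cells`, `row_headers`, `column_headers`) are taken from the documented schema and may not match the new output exactly. Filling a missing `location` with `None` is likewise just one possible way to handle the now-optional field.

```python
# Mapping from this issue: new field name -> old (documented) field name.
NEW_NAME_TO_OLD = {
    "row_min": "row_index_begin",
    "row_max": "row_index_end",
    "column_min": "column_index_begin",
    "column_max": "column_index_end",
    "cell_text": "text",
    "id": "cell_id",
}

# Assumption: these keys hold the lists of cell records in a table record.
CELL_LIST_KEYS = ("body_cells", "row_headers", "column_headers")


def canonicalize_cell(cell):
    """Return a copy of `cell` with new-style keys renamed to the old names."""
    result = {}
    for key, value in cell.items():
        old_key = NEW_NAME_TO_OLD.get(key, key)
        # If both spellings are present, the value under the old name wins.
        if key != old_key and old_key in cell:
            continue
        result[old_key] = value
    return result


def canonicalize_table(table):
    """Return a shallow copy of `table` converted to the old schema."""
    result = dict(table)
    for list_key in CELL_LIST_KEYS:
        if list_key in result:
            result[list_key] = [canonicalize_cell(c) for c in result[list_key]]
    # "location" is optional in the new output; make its absence explicit
    # so that downstream code can rely on the key being present.
    result.setdefault("location", None)
    return result
```

Records that already use the old names pass through unchanged, so the existing Pandas conversion could call `canonicalize_table` on every table record without first checking which schema it uses.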
