Conversation

@ritch ritch commented Aug 13, 2025

No description provided.


return {}

def get_union_view(self, dataset_names):

Queries across multiple datasets should probably be strictly limited to "full dataset" plots.

This implementation combines the current ctx.view with the entire content of the other datasets, which is weird behavior. I think "multiple dataset" queries should do one of two things:

  1. Be strictly limited to "full dataset" queries, i.e., use ctx.dataset.add_stage() instead of ctx.view.add_stage()
  2. Apply the current view's filters to all datasets. For some views, this would be as simple as injecting the Mongo() stage as the first stage in ctx.view. However, for views that involve things like limit/skip/take, we'd need a version of the concat() stage that allows combining multiple datasets instead.

Querying across multiple datasets is a bit dubious because there is no guarantee that other datasets will have the correct field names/types to be queried by whatever filter you've built based on the current dataset's schema.

TL;DR: should we add guardrails here to ensure the user doesn't define an invalid plot? 🤔
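
To make the guardrail idea concrete, here is a minimal sketch of a pre-flight check that reports which selected datasets cannot serve a plot on a given field. Everything in it is an assumption for illustration: `find_schema_conflicts` is a hypothetical helper, and each schema is modeled as a plain dict of field path to type name rather than whatever the real schema objects look like.

```python
from typing import Dict

# Hypothetical stand-in for a dataset schema: field path -> type name.
Schema = Dict[str, str]


def find_schema_conflicts(
    field_path: str, expected_type: str, schemas: Dict[str, Schema]
) -> Dict[str, str]:
    """Return {dataset_name: reason} for each dataset that cannot serve the plot."""
    conflicts = {}
    for name, schema in schemas.items():
        actual = schema.get(field_path)
        if actual is None:
            # The field the plot was built on does not exist in this dataset
            conflicts[name] = f"missing field '{field_path}'"
        elif actual != expected_type:
            # The field exists, but with a type the plot was not built for
            conflicts[name] = f"'{field_path}' is {actual}, expected {expected_type}"
    return conflicts


# Illustrative schemas only; the field names follow the image/video example
# raised elsewhere in this review.
schemas = {
    "images": {"metadata.width": "int", "metadata.height": "int"},
    "videos": {"metadata.frame_width": "int", "metadata.frame_height": "int"},
}
conflicts = find_schema_conflicts("metadata.width", "int", schemas)
```

A non-empty result could be used to block the plot definition or to surface a warning listing the incompatible datasets. Which of those is right is a product decision this sketch does not settle.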

Option 2 is tirc

return [ctx.dataset]

dataset_names = item.selected_datasets
if "all" in item.selected_datasets:

ALL datasets??? 🤯🤯🤯

These plots will surely take a loooong time to generate when the user has many datasets. Are we sure we can recommend this as a usable option?

Another consideration: how likely is it that any given aggregation would actually be valid across all datasets? Even if you are plotting a default field like metadata, image and video datasets have different attributes (e.g., metadata.width for image datasets and metadata.frame_width for video datasets).
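
If "all" stays as an option, one way to bound the cost would be to cap how many datasets a single plot may span. The sketch below is only an assumption about how that could look: `resolve_dataset_names` and `max_datasets` are hypothetical names, and the default cap of 10 is arbitrary.

```python
from typing import List, Sequence


def resolve_dataset_names(
    selected: Sequence[str], available: Sequence[str], max_datasets: int = 10
) -> List[str]:
    """Expand "all" and refuse selections that span too many datasets."""
    if "all" in selected:
        names = list(available)
    else:
        # Silently drop names that no longer exist
        names = [name for name in selected if name in available]

    if len(names) > max_datasets:
        raise ValueError(
            f"Plot spans {len(names)} datasets, but at most {max_datasets} are allowed"
        )
    return names


names = resolve_dataset_names(["all"], ["images", "videos"])
```

Raising here forces the user to narrow the selection explicitly. A softer variant could truncate the list and show a warning instead.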

inputs = types.Object()

# Dataset selection tabs
dataset_mode_choices = types.TabsView()

I'd need to see it IRL to confirm, but I do think using tabs is a reasonable UX here 👍

@brimoor brimoor marked this pull request as draft September 29, 2025 17:05

3 participants