Advanced Benchmarking #43

jgyasu · 2025-07-01T10:19:02Z

Proposes dataset collections for simplifying benchmarking experiment reproducibility.

felipeangelimvieira · 2025-07-01T10:54:55Z

steps/25_dataset_collection/step.md

+    def __len__(self):
+        return len(self._collection.get_datasets()) + len(self._additional_datasets)
+
+    def get_collection_info(self):


Could this be part of a tag system for the dataset collection?

fkiraly · 2025-07-01T11:52:03Z

steps/25_dataset_collection/step.md

+        self.subset = subset
+        super().__init__()
+
+    def get_collection_name(self):


get_collection_name feels like it should be a tag, e.g., "info:name" : "TSC Bake-off 2017"
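A minimal sketch of the tag idea, assuming a class-level `_tags` dict and a lookup helper. The names `BaseCollectionSketch`, `BakeOffCollection`, and `get_class_tag` are hypothetical illustrations here, not the API proposed in the PR:

```python
class BaseCollectionSketch:
    """Hypothetical base class, for illustration only."""

    # class-level default tags; subclasses override individual entries
    _tags = {"info:name": None}

    @classmethod
    def get_class_tag(cls, tag_name, default=None):
        # walk the MRO from base to most-derived so subclass tags win
        collected = {}
        for klass in reversed(cls.__mro__):
            collected.update(vars(klass).get("_tags", {}))
        return collected.get(tag_name, default)


class BakeOffCollection(BaseCollectionSketch):
    # the name lives in metadata instead of a get_collection_name method
    _tags = {"info:name": "TSC Bake-off 2017"}


name = BakeOffCollection.get_class_tag("info:name")
```

With this shape, further metadata (the collection info asked about above, for instance) could be added as more tags without new getter methods.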

fkiraly · 2025-07-01T11:53:29Z

steps/25_dataset_collection/step.md

+    A collection of datasets following the strategy pattern.
+    """
+
+    _collections = {


this design is not very extensible!

Why: for adding a collection, you need to:

- add a new object
- add it to this register

Generally, designs requiring a registry are less extensible than designs using the strategy pattern only.
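To illustrate the point, here is a rough sketch of a strategy-only design, in which adding a collection means adding one class and nothing else. All names (`BaseDatasetCollection`, `SmallDemoCollection`, `run_benchmark`, and the dataset names) are assumptions made for this example:

```python
from abc import ABC, abstractmethod


class BaseDatasetCollection(ABC):
    """Hypothetical strategy interface for dataset collections."""

    @abstractmethod
    def get_datasets(self):
        """Return the list of dataset names in this collection."""


class SmallDemoCollection(BaseDatasetCollection):
    """A new collection is just a new subclass; no register to update."""

    def get_datasets(self):
        return ["dataset_a", "dataset_b"]


def run_benchmark(collection):
    # the consumer depends only on the interface, not on a lookup table
    return [f"ran on {name}" for name in collection.get_datasets()]


results = run_benchmark(SmallDemoCollection())
```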

fkiraly

Great proposal, I like the high-level ideas and design!

Some issues and comments I have:

- the DatasetCollection is a kind of register that also functions as an instance factory (via add). I feel this is a violation of the single responsibility principle, and of the strategy pattern as well. So I would go for something different.
- The use case of adding one more dataset is a nice one, I did not have it on the radar that clearly.
- the add method belongs to the builder pattern, and violates the dataclass pattern. There is a way to have both: instead of modifying self, the add method returns a newly constructed CustomCollection (or similar), which is a dataclass.
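The non-mutating `add` described in the last point could look roughly like the following. This is a sketch under assumptions: the `datasets` field and the use of a frozen dataclass are choices made for the example, and only the name `CustomCollection` comes from the comment above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomCollection:
    """Hypothetical immutable collection of dataset names."""

    datasets: tuple = ()

    def add(self, dataset):
        # return a new instance rather than mutating self
        return CustomCollection(datasets=self.datasets + (dataset,))

    def __len__(self):
        return len(self.datasets)


base = CustomCollection(("dataset_a",))
extended = base.add("dataset_b")  # base is left unchanged
```

Because `add` never touches `self`, a predefined collection can be shared safely while each experiment derives its own extended copy.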

"What if" ideas:

- how about collections that are not just for datasets, but anything? It could be estimators! Example: the TSC bake-off had a collection of datasets, and a collection of estimators as well.
- instead of using a registry in DatasetCollection, should we use craft from registry? This is a lookup entry point which gets class by name.
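As a rough stand-in for the "get class by name" idea, the sketch below resolves a class from its name by walking subclasses, so no hand-maintained dict is needed. It does not use or reproduce the behaviour of `craft`; `BaseCollection`, `BakeOffDatasets`, `all_subclasses`, and `lookup_by_name` are hypothetical names for this example only.

```python
class BaseCollection:
    """Hypothetical common base class for collections."""


class BakeOffDatasets(BaseCollection):
    """Example collection; defining the class is the only registration step."""


def all_subclasses(cls):
    # recursively gather every subclass, keyed by class name
    found = {}
    for sub in cls.__subclasses__():
        found[sub.__name__] = sub
        found.update(all_subclasses(sub))
    return found


def lookup_by_name(name):
    # raises KeyError if no subclass with that name has been defined
    return all_subclasses(BaseCollection)[name]


resolved = lookup_by_name("BakeOffDatasets")
```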

jgyasu · 2025-07-02T07:49:55Z

> how about collections that are not just for datasets, but anything? It could be estimators! Example: the TSC bake-off had a collection of datasets, and a collection of estimators as well.

I will address the other comments, but this is a nice idea. I thought about it too. I think we can just have a base class for collections, say, BaseCollection, and then extend it to different types of collections: EstimatorBaseCollection, DatasetBaseCollection.

If this sounds good then I will modify this proposal? @fkiraly

fkiraly · 2025-07-02T16:13:00Z

Sounds good, although I would only introduce subclasses if something implies they are needed, design-wise.
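One possible reading of this exchange, sketched under assumptions: a single generic collection class holds either datasets or estimators, and type-specific subclasses are deferred until a concrete need appears. The class name `Collection`, its fields, and the item names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Collection:
    """Hypothetical generic collection; items may be datasets or estimators."""

    name: str
    items: tuple = ()

    def __len__(self):
        return len(self.items)

    def __iter__(self):
        return iter(self.items)


# the same class serves both uses; no subclass is needed yet
datasets = Collection("demo datasets", ("dataset_a", "dataset_b"))
estimators = Collection("demo estimators", ("estimator_a",))
```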

… into dataset-collection

Dataset Collections

3f16340

felipeangelimvieira reviewed Jul 1, 2025

fkiraly reviewed Jul 1, 2025

fkiraly requested changes Jul 1, 2025

phoeenniixx moved this to PR in progress in May - Sep 2025 mentee projects Jul 2, 2025

phoeenniixx added this to May - Sep 2025 mentee projects Jul 2, 2025

Update step.md

1e07479

fkiraly assigned jgyasu Jul 15, 2025

jgyasu changed the title ~~Dataset Collections~~ Advanced Benchmarking Aug 15, 2025

jgyasu added 3 commits August 15, 2025 15:14

Merge branch 'main' into dataset-collection

baba557

Merge remote-tracking branch 'refs/remotes/origin/dataset-collection'…

e1679a5

… into dataset-collection

update

7f7240f
