-
Notifications
You must be signed in to change notification settings - Fork 30
Advanced Benchmarking #43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
def __len__(self): | ||
return len(self._collection.get_datasets()) + len(self._additional_datasets) | ||
|
||
def get_collection_info(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be part of a tag system for the dataset collection?
self.subset = subset | ||
super().__init__() | ||
|
||
def get_collection_name(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get_collection_name
feels like it should be a tag, e.g., "info:name" : "TSC Bake-off 2017
A collection of datasets following the strategy pattern. | ||
""" | ||
|
||
_collections = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this design is not very extensible!
Why: for adding a collection, you need to:
- add a new object
- add it to this register
Generally, designs requiring a registry are less extensible than designs using the strategy pattern only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great proposal, I like the high-level ideas and design!
Some issues and comments I have:
- the
DatasetCollection
is a kind of register, that also functions as a instance factory (viaadd
). I feel this is a violation of the single responsibility principle and the strategy pattern as well. So I would go for sth different. - The use case of adding one more dataset is a nice one, I did not have it on the radar that clearly.
- the
add
method belongs to the builder pattern, and violates the dataclass pattern. There is a way to have both: instead of modifyingself
, theadd
method returns a newly constructedCustomCollection
(or similar), which is a dataclass.
"What if" ideas:
- how about collections that are not just for datasets, but anything? It could be estimators! Example: the TSC bake-off had a collection of datasets, and a collection of estimators as well.
- instead of using a registry in
DatasetCollection
, should we usecraft
fromregistry
? This is a lookup entry point which gets class by name.
I will adress the other comments, but this is a nice idea. I thought about it too. I think we can just hace a base class for collections, say, If this sounds good then I will modify this proposal? @fkiraly |
Sounds good, although I would only introduce subclasses if something implies they are needed, design-wise. |
Proposes dataset collections for simplifying benchmarking experiment reproducibility.