Add runtime cluster/reducer registry #95

patcon · 2025-07-22T23:23:15Z

Until now, reducer and clusterer algorithms had to be added directly to the red-dwarf codebase, so trying a new algorithm in a pipeline meant editing red-dwarf itself.

Now the built-in algorithms, as well as any new ones, are managed in their own runtime registries.

In other words, the following is now possible (the same goes for clustering algorithms):

from reddwarf.utils.reducer.registry import register_reducer

@register_reducer('umap')
def make_umap(**kwargs):
    import umap

    # Set sensible defaults for UMAP
    defaults = {
        'n_components': 2,
        'n_neighbors': 15,
        'min_dist': 0.1,
        'metric': 'euclidean',
        'random_state': 42,
        'n_jobs': 1  # For reproducibility
    }

    # Override defaults with any provided kwargs
    defaults.update(kwargs)

    return umap.UMAP(**defaults)

from reddwarf.implementations.base import run_pipeline
from reddwarf.data_loader import Loader

# `votes` is assumed to be loaded beforehand (e.g. with Loader; see the runnable gist below).
result = run_pipeline(
    votes=votes,
    reducer="umap",  # Use our registered UMAP reducer
    reducer_kwargs={
        'n_neighbors': 20,
    },
    clusterer="hdbscan",  # Default hdbscan clustering
)

Runnable:
https://gist.github.com/patcon/33d849f528128fe21ee3fcedc0c2ac35#file-test-py-L6-L48

patcon added 12 commits July 22, 2025 15:44

Add barebones reducer registry for loading algos at runtime.

fb80185

Added improved documentation of reducer registry, including website.

c6d8881

Simplify reducer registry example a tiny bit.

45fb9f3

Added clusterer registry.

bb97463

Move BestPolisKMeans into sklearn section of codebase.

0b784cc

Re-organize the sklearn cluster transformers file.

63670b0

Remove find_best_kmeans() function in favour of estimator.

357ddef

Added estimators to docs website.

d2ecd10

Removed old run_kmeans() code paths.

b9c1afc

Remove typing feature that didn't work in 3.10.

8261901

Ensure init_centers_used_ shows up on BestPolisKMeans.

84522eb

Instead of copying attr from PolisKmeans to BestPolisKMeans meta-estimator, just past the best estimator.

6816a6b

patcon commented Jul 23, 2025

reddwarf/sklearn/cluster.py (outdated)

Revert nit.

455e05b
