
@patcon (Member) commented Jul 22, 2025

Until now, reducer and clusterer algorithms were hard-coded in red-dwarf, so trying a new algorithm in the pipeline meant editing red-dwarf's own source code. That made experimenting with new algorithms tedious.

Now, the built-in algorithms and any new ones are managed in their own runtime registries, so a new algorithm can be registered from outside red-dwarf.

In other words, this is now possible (the same goes for clustering algorithms):

```python
from reddwarf.utils.reducer.registry import register_reducer

@register_reducer('umap')
def make_umap(**kwargs):
    import umap  # imported lazily, so umap-learn is only needed when this reducer is used

    # Set sensible defaults for UMAP
    defaults = {
        'n_components': 2,
        'n_neighbors': 15,
        'min_dist': 0.1,
        'metric': 'euclidean',
        'random_state': 42,
        'n_jobs': 1,  # For reproducibility
    }

    # Override defaults with any provided kwargs
    defaults.update(kwargs)

    return umap.UMAP(**defaults)
```

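```python
from reddwarf.implementations.base import run_pipeline
from reddwarf.data_loader import Loader

# `votes` is assumed to be loaded already (the runnable gist linked below shows a full example)
result = run_pipeline(
    votes=votes,
    reducer="umap",              # Use our registered UMAP reducer
    reducer_kwargs={
        'n_neighbors': 20,       # Overrides make_umap's default of 15
    },
    clusterer="hdbscan",         # Default HDBSCAN clustering
)
```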

Runnable example: https://gist.github.com/patcon/33d849f528128fe21ee3fcedc0c2ac35#file-test-py-L6-L48
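As a rough mental model of what a runtime registry does, here is a hypothetical sketch. It is not red-dwarf's actual `reddwarf.utils.reducer.registry` code: a registry can simply be a dict from names to factory functions, filled by a decorator and queried when the pipeline receives a name such as `reducer="umap"` plus `reducer_kwargs`. The names `REDUCERS`, `CLUSTERERS`, `build`, `register_clusterer`, and the stand-in `fake-umap` / `fake-kmeans` factories are all invented for illustration. The factories return plain dicts instead of real estimators, so the sketch runs with only the standard library.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical registries: one dict per algorithm family, mapping a
# name to a factory function that builds a configured algorithm.
Factory = Callable[..., Any]
REDUCERS: Dict[str, Factory] = {}
CLUSTERERS: Dict[str, Factory] = {}


def _make_register(registry: Dict[str, Factory]) -> Callable[[str], Callable[[Factory], Factory]]:
    """Build a decorator factory that stores functions in `registry`."""
    def register(name: str) -> Callable[[Factory], Factory]:
        def decorator(factory: Factory) -> Factory:
            registry[name] = factory
            return factory  # unchanged, so it can still be called directly
        return decorator
    return register


register_reducer = _make_register(REDUCERS)
register_clusterer = _make_register(CLUSTERERS)


def build(registry: Dict[str, Factory], name: str, kwargs: Optional[Dict[str, Any]] = None) -> Any:
    """Look up `name` in `registry` and call its factory with `kwargs`."""
    if name not in registry:
        raise ValueError(f"Unknown algorithm {name!r}; registered: {sorted(registry)}")
    return registry[name](**(kwargs or {}))


# Stand-in factories that return plain dicts instead of real estimators.
@register_reducer("fake-umap")
def make_fake_umap(**kwargs: Any) -> Dict[str, Any]:
    defaults = {"n_components": 2, "n_neighbors": 15, "min_dist": 0.1}
    defaults.update(kwargs)  # caller's kwargs override the defaults
    return defaults


@register_clusterer("fake-kmeans")
def make_fake_kmeans(**kwargs: Any) -> Dict[str, Any]:
    return {"n_clusters": 3, **kwargs}


# Roughly what a pipeline might do with a reducer/clusterer name plus kwargs
reducer = build(REDUCERS, "fake-umap", {"n_neighbors": 20})
clusterer = build(CLUSTERERS, "fake-kmeans", {"n_clusters": 4})
print(reducer)    # {'n_components': 2, 'n_neighbors': 20, 'min_dist': 0.1}
print(clusterer)  # {'n_clusters': 4}
```

Because the decorator returns the factory unchanged, `make_fake_umap` stays callable on its own, and since lookup happens by name at run time, a new algorithm only has to be registered before the pipeline runs, with no change to the pipeline code itself.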
