SORTeD Rashomon Sets of Sparse Decision Trees: Scalable Anytime Enumeration

SORTD is a framework for Rashomon set of decision trees with totally ordered optimization tasks. Currently SORTD has the following tasks:

Cost-Complex Classification (Decision trees regularized with the number of leaf nodes)
Cost-Complex Regression (Regression trees regularized with the number of leaf nodes)

Paper

Elif Arslan, Jacobus G. M. van der Linden, Serge P. Hoogendoorn, Marco Rinaldi, Emir Demirović, “SORTeD Rashomon Sets of Sparse Decision Trees: Anytime Enumeration”, The Thirty-Ninth Annual Conference on Neural Information Processing Systems. 2025.

Python usage

Install from source using pip

To compile the pysortd Python package, navigate to the codebase directory and run the following command:

cd pysortd
pip install .

Example usage

pysortd can be used, for example, as follows:

from pysortd import SORTDClassifier
import pandas as pd

df = pd.read_csv('data/accuracy/monk2.csv',delimiter=" ", header=None)
X, y = df.iloc[:, 1:], df.iloc[:, 0]

model = SORTDClassifier("cost-complex-accuracy",max_depth = 3, verbose=True,cost_complexity=0.01,
                        use_rashomon_multiplier=True,rashomon_multiplier=0.1, max_num_trees=100)
model.fit(X,y)
rashomon_set_size = model.rashomon_set_size

C++ usage

Compiling

The code can be compiled on Windows or Linux by using cmake. For Windows users, cmake support can be installed as an extension of Visual Studio and then this repository can be imported as a CMake project.

For Linux users, they can use the following commands:

mkdir build
cd build
cmake ..
cmake --build .

The compiler must support the C++17 standard.

Running

After SORTD is built, the following command can be used (for example):

./SORTD -task cost-complex-accuracy -file ../data/accuracy/monk2.csv -max-depth 3 -cost-complexity 0.01 -use-rashomon-multiplier true -rashomon-multiplier 0.1 -max-num-trees 100

Run the program without any parameters to see a full list of the available parameters.

Applications

Currently, SORTD implements the following optimization tasks:

cost-complex-accuracy: SORTDClassifier minimizes the misclassification score plus the cost for adding a leaf node by the parameter cost_complexity.
cost-complex-regression: SORTDRegressor minimizes the sum of squared errors plus the cost for adding a leaf node by the parameter cost_complexity.

See examples/regression_example.py for a regression example.

Parameters

SORTD can be configured by the following parameters:

max_depth : The maximum depth of the trees. Note that a tree of depth zero has a single leaf node. A tree of depth one has one branching node and two leaf nodes.
cost_complexity: The cost of adding one more leaf node to the tree.
max_num_trees: The limit on the number of trees in the Rashomon set.
use_rashomon_multiplier: Enables or disables the use of the Rashomon multiplier. When disabled, the Rashomon bound is set to the root leaf solution value. In this case, max_num_trees should also be set.
rashomon_multiplier: The Rashomon multiplier value to bound the Rashomon set.
ignore_trivial_extensions: Enables or disables discarding trees with trivial extensions.
min_leaf_node_size : The minimum number of samples required in each leaf node.
time_limit : The run time limit in seconds. If the time limit is exceeded a possibly non-optimal tree is returned.
feature_ordering : The order in which the features are considered for branching. Default is "gini" which sorts the features by gini-impurity decrease. The alternative (and default for regression and survival analysis) is "in-order" which considers the feature in order of appearance.
use_branch_caching : Enables or disables the use of branch caching.
use_dataset_caching : Enables or disables the use of dataset caching.
verbose : Enables or disables verbose output.
random_seed : The random seed.

Miscellaneous

SORTD assumes features and cost-complex classification labels are binary.
The label is in the first column in all datasets.

Name		Name	Last commit message	Last commit date
Latest commit History 37 Commits
.github/workflows		.github/workflows
data		data
examples		examples
include		include
pysortd		pysortd
src		src
test		test
.gitignore		.gitignore
CMakeLists.txt		CMakeLists.txt
Dockerfile		Dockerfile
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
README.md		README.md
new optimization task.md		new optimization task.md
pyproject.toml		pyproject.toml
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

SORTeD Rashomon Sets of Sparse Decision Trees: Scalable Anytime Enumeration

Paper

Python usage

Install from source using pip

Example usage

C++ usage

Compiling

Running

Applications

Parameters

Miscellaneous

About

Uh oh!

Releases

Packages

Languages

License

ConSol-Lab/pysortd

Folders and files

Latest commit

History

Repository files navigation

SORTeD Rashomon Sets of Sparse Decision Trees: Scalable Anytime Enumeration

Paper

Python usage

Install from source using pip

Example usage

C++ usage

Compiling

Running

Applications

Parameters

Miscellaneous

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages