Skip to content

Conversation

finnbar
Copy link

@finnbar finnbar commented Mar 3, 2025

The main change in this PR is the introduction of rearrangements, as detailed in Section 4 of Graded monads and type-level programming for dependence analysis (Keating and Gale, 2021) and in the README for the rearrangements library that spawned from that section. In short, this provides a generic value-level function rDel for arbitrary type-level sorts and filters by building the output type-indexed structure using elements of the input one. Consider the following example code, using the Set data type from Data.Set:

set :: Set '[Int, Bool, ()]
set = Ext 3 (Ext True (Ext () Empty))

set' :: Set '[Bool, Int, ()]
set' = rDel list

rDel set will use a Bool from set, then an Int, and finally a (), to construct its result to match its expected type. Notably, the type of set' can be defined by a type family applied to the type of set. This means that a function of type e.g. Set xs -> Set (Sorted xs) can be defined with just rDel.

This matches many of the definitions in this library, so much of the complicated value-level redefinitions of type families have been rewritten to use rDel. However, there are a few notes about this:

  • nub, and definitions which rely on it, cannot use rDel. This is because nub prefers elements later in a given Set than those earlier, and rDel assumes the opposite. (For example, nub (Ext 3 (Ext 2 Empty)) keeps 2, but an rDel implementation would keep 3.) Furthermore, nub allows users to define custom merging behaviour which rDel does not support. This affects the definitions of asSet, asMap and union.
  • split may be less efficient now, but I cannot find evidence of this being the case: my concerns were because it now performs two passes through the input Set (two calls to rDel) rather than one more complex pass like the original definition. Since there is no obvious performance regression and the new code is cleaner, I think there's nothing to worry about. (Feel free to revert split if this is a concern!)
  • The real win with these changes is the significantly simplified definition of quicksort!

These changes are designed to preserve functionality without any breaking changes. All of the tests pass, and performance seems to be mostly unaffected from a quick comparison of test runtime.

This PR also updates the library to use GHC 9.8.4, which is the latest currently available in an LTS Stackage release (at time of writing, lts-23.10). Annoyingly, HLS doesn't yet support GHC 9.8.4, but hopefully it will in the near future!

It may be worth investigating the performance of this version of split: it has been replaced with two different rDel calls rather than running in a single pass. I am not worried about the performance of the other functions which have been replaced by rearrangements.
As in Set, it may be worth investigating the performance change for split.
This is because its original implementation uses nub, which keeps later elements rather than earlier elements in the list. rDel does the opposite. This means that the behaviour of asSet e.g. on a set containing two Int changes between the original implementation and an rDel one. This also preserves any merging behaviour defined by user-provided instances of Nub.

asMap would have the same issue, but it turns out that I (correctly) forgot to change it in the first place. Nice!
Also notes that definitions that rely on nub cannot be "shortcutted" with rDel.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant