willtebbutt (Collaborator) commented Sep 26, 2025

Please do not review -- need to see if CI still passes.

This seems to work nicely. Unlike #782, this won't actually change what code gets loaded, because ChainRules depends on GPUArraysCore. This is therefore essentially just some rather trivial housekeeping.

edit: this now also deals with some non-determinism in the benchmark suite where (I think) certain functions were being run a different number of times on each run and, as a result, sometimes causing errors. We shall see.

edit2: the hand-written rule benchmarks now run quite a bit faster, because I've reduced the amount of time spent on each benchmark case. It looks like the test suite passed on this run, suggesting that doing this hasn't massively increased the variability in how long things take to run.

github-actions (Contributor) commented

Mooncake.jl documentation for PR #785 is available at:
https://chalk-lab.github.io/Mooncake.jl/previews/PR785/

codecov bot commented Sep 26, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

github-actions bot (Contributor) commented Sep 26, 2025

Performance Ratio:
Ratio of the time taken to compute the gradient to the time taken to compute the function.
Warning: results are very approximate! See here for more context.

┌────────────────────────────┬──────────┬──────────┬─────────────┬─────────┬─────────────┬─────────┐
│                      Label │   Primal │ Mooncake │ MooncakeFwd │  Zygote │ ReverseDiff │  Enzyme │
│                     String │   String │   String │      String │  String │      String │  String │
├────────────────────────────┼──────────┼──────────┼─────────────┼─────────┼─────────────┼─────────┤
│                   sum_1000 │ 108.0 ns │     1.55 │        1.78 │    1.01 │        9.73 │    7.58 │
│                  _sum_1000 │  1.16 μs │     4.73 │        1.01 │  1200.0 │        23.3 │    1.46 │
│               sum_sin_1000 │  4.93 μs │     3.03 │        1.69 │    2.63 │        13.3 │    2.43 │
│              _sum_sin_1000 │   5.0 μs │     2.82 │        2.15 │   285.0 │        13.2 │    2.29 │
│                   kron_sum │ 507.0 μs │     31.6 │        2.75 │    6.15 │       191.0 │    6.62 │
│              kron_view_sum │ 576.0 μs │     28.0 │        2.83 │    7.52 │       155.0 │    5.16 │
│      naive_map_sin_cos_exp │  1.97 μs │     2.27 │        1.46 │ missing │        7.42 │    2.48 │
│            map_sin_cos_exp │   2.0 μs │     2.27 │         1.5 │    1.91 │        6.12 │    2.45 │
│      broadcast_sin_cos_exp │   2.1 μs │     2.12 │        1.44 │    2.51 │        1.55 │    2.36 │
│                 simple_mlp │ 177.0 μs │     6.17 │        2.92 │    2.16 │        13.7 │     3.8 │
│                     gp_lml │ 273.0 μs │     8.12 │        2.46 │    4.52 │     missing │    4.45 │
│ turing_broadcast_benchmark │  2.26 ms │     3.38 │        2.49 │ missing │        22.1 │ missing │
│         large_single_block │ 475.0 ns │     4.23 │        1.99 │  3720.0 │        22.6 │    1.89 │
└────────────────────────────┴──────────┴──────────┴─────────────┴─────────┴─────────────┴─────────┘

@willtebbutt willtebbutt marked this pull request as ready for review September 26, 2025 16:23
@willtebbutt willtebbutt requested a review from sunxd3 September 26, 2025 18:18
yebai (Member) commented Sep 28, 2025

To clarify, does this fix the non-determinism in gp_lml's performance?

sunxd3 (Collaborator) left a comment

LGTM

willtebbutt (Collaborator, Author) commented

> To clarify, does this fix the non-determinism in gp_lml's performance?

It will not. It should (hopefully) just stop the benchmark suite from occasionally erroring on potrf! calls.

@willtebbutt willtebbutt merged commit 4d30644 into main Oct 1, 2025
90 checks passed
@willtebbutt willtebbutt deleted the wct/gpuarrayscore-ext-only branch October 1, 2025 07:39