Skip to content

Conversation

tforest
Copy link
Collaborator

@tforest tforest commented Sep 9, 2025

Description

This is a WIP and replaces #2948 as a more specific PR regarding time-windows in AFS.

In fine, users will be able to use ts.allele_frequency_spectrum(time_windows=tw), with tw being a list of any non-overlapping and strictly increasing time intervals in [0, inf].

Copy link

codecov bot commented Sep 9, 2025

Codecov Report

❌ Patch coverage is 97.00000% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.80%. Comparing base (8b615e5) to head (33f990d).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
c/tskit/trees.c 94.33% 0 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3266      +/-   ##
==========================================
+ Coverage   89.75%   89.80%   +0.04%     
==========================================
  Files          29       29              
  Lines       32546    32602      +56     
  Branches     5962     5973      +11     
==========================================
+ Hits        29212    29278      +66     
+ Misses       1890     1884       -6     
+ Partials     1444     1440       -4     
Flag Coverage Δ
c-tests 86.85% <94.73%> (+0.08%) ⬆️
lwt-tests 80.38% <ø> (ø)
python-c-tests 88.25% <100.00%> (+0.01%) ⬆️
python-tests 98.79% <100.00%> (+<0.01%) ⬆️
python-tests-no-jit 32.95% <16.66%> (-0.08%) ⬇️
python-tests-numpy1 50.57% <16.66%> (-0.13%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
c/tskit/core.c 95.54% <100.00%> (+0.01%) ⬆️
c/tskit/core.h 100.00% <ø> (ø)
python/_tskitmodule.c 88.25% <100.00%> (+0.01%) ⬆️
python/tskit/trees.py 98.84% <100.00%> (+0.01%) ⬆️
c/tskit/trees.c 91.01% <94.33%> (+0.19%) ⬆️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@petrelharp
Copy link
Contributor

Note: Along the way, this also allows one to call

ts.allele_frequency_spectrum(sample_sets=[0,1,2])

to produce the same thing as

ts.allele_frequency_spectrum(sample_sets=[[0,1,2]])

This slightly breaks with the convention for other stats but we'd like to do it anyhow because:

  1. for other stats, where this would produce differently-shaped output, there is a dimension to drop (that corresponding to indexes) that the AFS does not have
  2. it's annoying/confusing to have to do the two sets of brackets, and the error message isn't very informative, so this could be a sticking point for newbies

@petrelharp
Copy link
Contributor

There's one line with partial coverage that I've confirmed using tsk_bug_assert( )s is being covered by the python tests:
Screenshot From 2025-09-16 12-56-51

I've added another C test that might satisfy this? But this is fine.

@petrelharp
Copy link
Contributor

The last thing to do is to put in some tests to compare versions of the method (eg decap-based and not) for a few different time windows.

# MIT License
#
# Copyright (c) 2018-2024 Tskit Developers
# Copyright (c) 2018-2025 Tskit Developers
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to note here that the Copyright notice is meant to only be updated when the file itself is changed. So, the copyright years should (strictly speaking) be just those in which significant changes were made to the file.

Not that it matters in practise and there's no need to back out of these changes if you don't want to, but I just wanted to clarify my understanding of the copyright notices.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah that makes sense, thank you for the note on that @jeromekelleher!

@petrelharp
Copy link
Contributor

I've run into a legit bug: running tests/test_tree_stats.py::TestBranchAlleleFrequencySpectrum::test_wright_fisher[no_deep_history] we encounter a situation where an element of the AFS is -1. The good news is that this is only in the C version; the 'naive' and 'python' versions both have a 0 in that spot. @tforest, might you have a look at this?

@tforest
Copy link
Collaborator Author

tforest commented Sep 17, 2025

I've run into a legit bug: running tests/test_tree_stats.py::TestBranchAlleleFrequencySpectrum::test_wright_fisher[no_deep_history] we encounter a situation where an element of the AFS is -1. The good news is that this is only in the C version; the 'naive' and 'python' versions both have a 0 in that spot. @tforest, might you have a look at this?

I'm having a look at this, thank you @petrelharp!

0.07┊ ┃ ┃ ┃ ┃ ┊ ┃ ┃ 4 ┊ ┃ ┃ 4 ┃ ┊
┊ ┃ ┃ ┃ ┃ ┊ ┃ ┃ ┏┻┓ ┊ ┃ ┃ ┏┻┓ ┃ ┊
0.00┊ 0 1 3 9 2 ┊ 0 1 2 3 9 ┊ 0 1 2 3 9 ┊ 0 1 2 3 ┊
0 2 7 10 100
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a non-sample leaf to this example, which we needed to get better coverage.

ret = "Time windows must start at zero and be strictly increasing. "
"(TSK_ERR_BAD_TIME_WINDOWS)";
break;
case TSK_ERR_BAD_TIME_WINDOWS_END:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we needed to disambiguate the various notions of "bad" as different stats have different requirements now


np.random.seed(5)

# Notes for refactoring:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are notes about what happens in this file, in particular for a near-future me in refactoring the tests here.

):
mode = "branch"

def generate_params(self, ts):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the main code testing the new time windows; this gets called on all the example tree sequences.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Note that it also determines which combinations of windows and sample sets get tested.)

@petrelharp
Copy link
Contributor

This is (if tests pass) ready to merge, we think? I've done some refactoring of the tests in test_tree_stats.py and have more in mind (before adding time windows to other stats functions), but will keep that for a different PR.

@petrelharp
Copy link
Contributor

Ah one more thing - perhaps @tforest can benchmark this to see if we've affected performance for non-time-windowed AFS by a lot?

Copy link
Member

@jeromekelleher jeromekelleher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I haven't gone through the logic, but the changes all seems solid modulo a few nitpicks.

c/tskit/trees.c Outdated
goto out;
}

if (windows[0] < 0.0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could combine these two into one windows[0] <= 0.0 condition?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why so we could

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or rather just !=

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, also I was just remembering that in fact times can be negative, so being < 0 is not necessarily an error

c/tskit/trees.c Outdated
* SOFTWARE.
*/

#include "tskit/core.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tskit includes should go after the std library ones

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway, isn't this already included already by trees.h ?

c/tskit/trees.c Outdated
goto out;
}
if (stat_site
&& tsk_memcmp(time_windows, default_time_windows, 2 * sizeof(const double))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took me a while to parse this - so it's saying that only the default time windows are supported for STAT_SITE, yeah? Might be easier to read and less fragile to say

// Site mode does not support time windows
if (stat_site && !(time_windows[0] == 0 && isinf(time_windows[1]))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah that would be MUCH better, thanks. I had thought there was some reason we couldn't do that, but it seems okay.

if (parse_windows(time_windows, &time_windows_array, &num_time_windows) != 0) {
goto out;
}
shape = PyMem_Malloc((num_sample_sets + 1 + 1) * sizeof(*shape));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The + 1 + 1s are confusing me - maybe just use + 2 and clarify why with a comment?

sample_sets = [sample_sets]
drop_dimension = True

if ll_method.__name__ != "allele_frequency_spectrum":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming this is temporary?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not temporary (so maybe there's a better way to do it) - this is because of #3266 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see we're doing this control flow based on introspecting the method name in a few places. This does smell quite badly I'm afraid

Can we refactor this by adding some parameters to __one_way_sample_set_stat which default to the standard behaviour, but we set to "False" when called from the top-level allele_frequency_spectrum function?

@petrelharp
Copy link
Contributor

Thanks for the nitpicks, @jeromekelleher! I've done all of them except the "I assume this is temporary" one.

@petrelharp
Copy link
Contributor

petrelharp commented Sep 18, 2025

Hmph: this error now happens on Windows:

../tskit/trees.c:3756:70: error: conversion from 'double' to 'float' may change value [-Werror=float-conversion]
 3756 |         if (stat_site && !(time_windows[0] == 0 && isinf(time_windows[1]))) {
      |                                                          ~~~~~~~~~~~~^~~

Might you be able to see about how to deal with this, @tforest? (note: if debugging this is difficult, then making a temporary branch and PR with most of the testing code removed could be a good way to go)

@jeromekelleher
Copy link
Member

Try 0.0 instead of 0

@petrelharp
Copy link
Contributor

(whoops - missed the stuff in the comment above that clarified the problem is that the argument to isinf( ) is double but it's wanting a float)

@tforest
Copy link
Collaborator Author

tforest commented Sep 18, 2025

Yes! I'm trying something @petrelharp :)

@petrelharp
Copy link
Contributor

I think the problem is the argument: for Windows it looks like isinf requires a float, not a double?

@tforest
Copy link
Collaborator Author

tforest commented Sep 18, 2025

That was it, yes. 👍

@petrelharp
Copy link
Contributor

Nice. I'm squashing & rebasing.

@jeromekelleher
Copy link
Member

I do think we should figure out a different way of dealing with dimensions etc than by introspecting the function pointer argument --- this is a really fragile and opaque way of doing control flow. Fine for this to be logged as an issue and done in a follow-up, but it it will bite eventually if we don't fix it.

@petrelharp
Copy link
Contributor

Gotcha. I moved that AFS-specific pre-processing step to allele_frequency_spectrum, which seems the right way to do it.

and testing version of time windows for general stat

test fixture cleanup

remove not-run tests

small issue with Tajimas D tests
@jeromekelleher jeromekelleher added this pull request to the merge queue Sep 20, 2025
Merged via the queue into tskit-dev:main with commit 44a23e2 Sep 20, 2025
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants