-
Notifications
You must be signed in to change notification settings - Fork 77
Time windows in AFS #3266
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Time windows in AFS #3266
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3266 +/- ##
==========================================
+ Coverage 89.75% 89.80% +0.04%
==========================================
Files 29 29
Lines 32546 32602 +56
Branches 5962 5973 +11
==========================================
+ Hits 29212 29278 +66
+ Misses 1890 1884 -6
+ Partials 1444 1440 -4
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
8a77af8
to
e32062e
Compare
Note: Along the way, this also allows one to call
to produce the same thing as
This slightly breaks with the convention for other stats but we'd like to do it anyhow because:
|
f2f1f71
to
fadb170
Compare
fadb170
to
0af7d01
Compare
The last thing to do is to put in some tests to compare versions of the method (eg decap-based and not) for a few different time windows. |
python/tests/test_util.py
Outdated
# MIT License | ||
# | ||
# Copyright (c) 2018-2024 Tskit Developers | ||
# Copyright (c) 2018-2025 Tskit Developers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to note here that the Copyright notice is meant to only be updated when the file itself is changed. So, the copyright years should (strictly speaking) be just those in which significant changes were made to the file.
Not that it matters in practise and there's no need to back out of these changes if you don't want to, but I just wanted to clarify my understanding of the copyright notices.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah that makes sense, thank you for the note on that @jeromekelleher!
I've run into a legit bug: running |
I'm having a look at this, thank you @petrelharp! |
dfaa653
to
d2e47c0
Compare
0.07┊ ┃ ┃ ┃ ┃ ┃ ┊ ┃ ┃ 4 ┃ ┊ ┃ ┃ 4 ┃ ┊ ┊ | ||
┊ ┃ ┃ ┃ ┃ ┃ ┊ ┃ ┃ ┏┻┓ ┃ ┊ ┃ ┃ ┏┻┓ ┃ ┊ ┊ | ||
0.00┊ 0 1 3 9 2 ┊ 0 1 2 3 9 ┊ 0 1 2 3 9 ┊ 0 1 2 3 ┊ | ||
0 2 7 10 100 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a non-sample leaf to this example, which we needed to get better coverage.
ret = "Time windows must start at zero and be strictly increasing. " | ||
"(TSK_ERR_BAD_TIME_WINDOWS)"; | ||
break; | ||
case TSK_ERR_BAD_TIME_WINDOWS_END: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we needed to disambiguate the various notions of "bad" as different stats have different requirements now
d2e47c0
to
051481e
Compare
|
||
np.random.seed(5) | ||
|
||
# Notes for refactoring: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are notes about what happens in this file, in particular for a near-future me in refactoring the tests here.
): | ||
mode = "branch" | ||
|
||
def generate_params(self, ts): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the main code testing the new time windows; this gets called on all the example tree sequences.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Note that it also determines which combinations of windows and sample sets get tested.)
This is (if tests pass) ready to merge, we think? I've done some refactoring of the tests in |
051481e
to
921c646
Compare
Ah one more thing - perhaps @tforest can benchmark this to see if we've affected performance for non-time-windowed AFS by a lot? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I haven't gone through the logic, but the changes all seems solid modulo a few nitpicks.
c/tskit/trees.c
Outdated
goto out; | ||
} | ||
|
||
if (windows[0] < 0.0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could combine these two into one windows[0] <= 0.0
condition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why so we could
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or rather just !=
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, also I was just remembering that in fact times can be negative, so being < 0
is not necessarily an error
c/tskit/trees.c
Outdated
* SOFTWARE. | ||
*/ | ||
|
||
#include "tskit/core.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tskit includes should go after the std library ones
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyway, isn't this already included already by trees.h ?
c/tskit/trees.c
Outdated
goto out; | ||
} | ||
if (stat_site | ||
&& tsk_memcmp(time_windows, default_time_windows, 2 * sizeof(const double)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Took me a while to parse this - so it's saying that only the default time windows are supported for STAT_SITE, yeah? Might be easier to read and less fragile to say
// Site mode does not support time windows
if (stat_site && !(time_windows[0] == 0 && isinf(time_windows[1]))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah that would be MUCH better, thanks. I had thought there was some reason we couldn't do that, but it seems okay.
python/_tskitmodule.c
Outdated
if (parse_windows(time_windows, &time_windows_array, &num_time_windows) != 0) { | ||
goto out; | ||
} | ||
shape = PyMem_Malloc((num_sample_sets + 1 + 1) * sizeof(*shape)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The + 1 + 1
s are confusing me - maybe just use + 2 and clarify why with a comment?
python/tskit/trees.py
Outdated
sample_sets = [sample_sets] | ||
drop_dimension = True | ||
|
||
if ll_method.__name__ != "allele_frequency_spectrum": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming this is temporary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not temporary (so maybe there's a better way to do it) - this is because of #3266 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see we're doing this control flow based on introspecting the method name in a few places. This does smell quite badly I'm afraid
Can we refactor this by adding some parameters to __one_way_sample_set_stat
which default to the standard behaviour, but we set to "False" when called from the top-level allele_frequency_spectrum
function?
Thanks for the nitpicks, @jeromekelleher! I've done all of them except the "I assume this is temporary" one. |
Hmph: this error now happens on Windows:
Might you be able to see about how to deal with this, @tforest? (note: if debugging this is difficult, then making a temporary branch and PR with most of the testing code removed could be a good way to go) |
Try 0.0 instead of 0 |
(whoops - missed the stuff in the comment above that clarified the problem is that the argument to |
Yes! I'm trying something @petrelharp :) |
I think the problem is the argument: for Windows it looks like isinf requires a float, not a double? |
That was it, yes. 👍 |
Nice. I'm squashing & rebasing. |
bb3eb02
to
11c8daf
Compare
I do think we should figure out a different way of dealing with dimensions etc than by introspecting the function pointer argument --- this is a really fragile and opaque way of doing control flow. Fine for this to be logged as an issue and done in a follow-up, but it it will bite eventually if we don't fix it. |
Gotcha. I moved that AFS-specific pre-processing step to |
and testing version of time windows for general stat test fixture cleanup remove not-run tests small issue with Tajimas D tests
8a8c32e
to
33f990d
Compare
Description
This is a WIP and replaces #2948 as a more specific PR regarding time-windows in AFS.
In fine, users will be able to use
ts.allele_frequency_spectrum(time_windows=tw)
, withtw
being a list of any non-overlapping and strictly increasing time intervals in[0, inf]
.