-
Notifications
You must be signed in to change notification settings - Fork 77
Fix genetic_relatedness to allow single sample set with self-comparisons #3235
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Fix genetic_relatedness to allow single sample set with self-comparisons #3235
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #3235 +/- ##
=======================================
Coverage 89.75% 89.75%
=======================================
Files 29 29
Lines 32545 32545
Branches 5962 5962
=======================================
Hits 29211 29211
Misses 1890 1890
Partials 1444 1444
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
89397d5
to
cd830f7
Compare
I think we'll need to wait for @petrelharp to get informed review on this one. |
Bumping this one for you @petrelharp |
Ignore my previous comment (now deleted); I was being dense. |
Let's see. First, it's quite easy to modify the checking code that produces the error. I went ahead and did that (see this commit: 57ad49f). However, one can compute 'self' comparisons for other statistics, for instance, this works:
or this
These are perfectly valid things to compute, just the thought was they are probably not telling someone what they actually want to know. So, if the intention behind the restriction for "at least k sample sets for a k-way stat" was to keep people from computing things they probably didn't want to compute, then we're not actually doing that. We could require indexes be unique, but I don't think we should actually do that - for instance, How's that sound, @andrewkern? |
4b8755f
to
7f48542
Compare
LMK how this looks @andrewkern; if you like it I'll squash these commits down. |
…sons Previously, genetic_relatedness would fail with TSK_ERR_INSUFFICIENT_SAMPLE_SETS when given a single sample set and indexes referring to that set, e.g.: ts.genetic_relatedness([[0]], indexes=[(0,0)]) This was because the C API requires at least k=2 sample sets for k-way statistics, even when indexes only reference a single set for self-comparison. The fix is more general: it removes the restriction that there be at least k sample sets for k-way stats, because this wasn't a useful or consistently applied restriction. Closes tskit-dev#3055
931ce7e
to
34d6826
Compare
Rebased; added to CHANGELOG; tests are (should be) passing. Ready to merge when you approve, @andrewkern . |
Description
This PR fixes issue #3055 where
genetic_relatedness
would fail when computing self-comparisons with a single sample set. As @petrelharp pointed out there is no reason this shouldn't be allowed for certain statistics, includinggenetic_relatedness
The Problem
Previously, calling
genetic_relatedness
with a single sample set and indexes referring to that set would raise an error:This occurred because the underlying C API validates that at least k=2 sample sets are provided for k-way statistics, even when the indexes only reference a subset of those sets.
This PR
The fix adds an
allow_self_comparisons
parameter to the internal__k_way_sample_set_stat
method, which is only set toTrue
forgenetic_relatedness
. When enabled:This ensures that:
genetic_relatedness
behavior is changedFurther, this PR sets the stage for allowing for other appropriate k-way stats to be computed with self comparisons by setting a flag.
Testing
Added comprehensive tests for all three computation modes (site, branch, node) to verify:
All existing tests continue to pass, but @petrelharp this would benefit from you specifically looking over what I've done to be sure you're okay with the way I've implemented
allow_self_comparisons
Fixes #3055
PR Checklist:
I'm not sure if I should touch the changelogs here?