Fix genetic_relatedness to allow single sample set with self-comparisons #3235

andrewkern · 2025-06-26T21:54:28Z

Description

This PR fixes issue #3055 where genetic_relatedness would fail when computing self-comparisons with a single sample set. As @petrelharp pointed out there is no reason this shouldn't be allowed for certain statistics, including genetic_relatedness

The Problem

Previously, calling genetic_relatedness with a single sample set and indexes referring to that set would raise an error:

ts.genetic_relatedness([[0]], indexes=[(0,0)])
# TSK_ERR_INSUFFICIENT_SAMPLE_SETS: Insufficient sample sets provided

This occurred because the underlying C API validates that at least k=2 sample sets are provided for k-way statistics, even when the indexes only reference a subset of those sets.

This PR

The fix adds an allow_self_comparisons parameter to the internal __k_way_sample_set_stat method, which is only set to True for genetic_relatedness. When enabled:

The method validates that all indexes reference valid sample sets
If fewer than k sample sets are provided, it pads the list with dummy sample sets to satisfy the C API requirement
The dummy sets are never accessed during computation since only the sets referenced by the indexes are used

This ensures that:

Only genetic_relatedness behavior is changed
Other statistics (like Fst) continue to enforce the minimum sample set requirement
The C API contract is satisfied without modifying the C code

Further, this PR sets the stage for allowing for other appropriate k-way stats to be computed with self comparisons by setting a flag.

Testing

Added comprehensive tests for all three computation modes (site, branch, node) to verify:

Single sample set with self-comparison works correctly
Multiple samples within a single set work correctly
The fix doesn't affect other statistics

All existing tests continue to pass, but @petrelharp this would benefit from you specifically looking over what I've done to be sure you're okay with the way I've implemented allow_self_comparisons

Fixes #3055

PR Checklist:

Tests that fully cover new/changed functionality.
Documentation including tutorial content if appropriate.
[] Changelogs, if there are API changes.

I'm not sure if I should touch the changelogs here?

codecov · 2025-06-26T21:59:28Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.75%. Comparing base (1d8f453) to head (34d6826).

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #3235   +/-   ##
=======================================
  Coverage   89.75%   89.75%           
=======================================
  Files          29       29           
  Lines       32545    32545           
  Branches     5962     5962           
=======================================
  Hits        29211    29211           
  Misses       1890     1890           
  Partials     1444     1444

Flag	Coverage Δ
c-tests	`86.76% <100.00%> (ø)`
lwt-tests	`80.38% <ø> (ø)`
python-c-tests	`88.24% <ø> (ø)`
python-tests	`98.79% <ø> (ø)`
python-tests-no-jit	`33.02% <ø> (ø)`
python-tests-numpy1	`50.71% <ø> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines	Coverage Δ
c/tskit/trees.c	`90.81% <100.00%> (ø)`

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

jeromekelleher · 2025-06-29T19:52:49Z

I think we'll need to wait for @petrelharp to get informed review on this one.

jeromekelleher · 2025-08-06T13:56:54Z

Bumping this one for you @petrelharp

petrelharp · 2025-09-09T21:12:32Z

Ignore my previous comment (now deleted); I was being dense.

petrelharp · 2025-09-10T04:05:28Z

Let's see. First, it's quite easy to modify the checking code that produces the error. I went ahead and did that (see this commit: 57ad49f).

However, one can compute 'self' comparisons for other statistics, for instance, this works:

ts.divergence([[0], [1]], indexes=[(0, 0)])

or this

ts.f4([[0],[1],[2],[3]], indexes=[(0,0,0,0)])

These are perfectly valid things to compute, just the thought was they are probably not telling someone what they actually want to know. So, if the intention behind the restriction for "at least k sample sets for a k-way stat" was to keep people from computing things they probably didn't want to compute, then we're not actually doing that. We could require indexes be unique, but I don't think we should actually do that - for instance, f4(..., indexes=[(0,0,1,1)]) is actually a version of f2( ) that people compute sometimes (I think, or maybe it's (0, 1, 0, 1)). So, in the next commit I've just removed the requirement that there be at least k sample sets - instead, requiring there be at least 1 sample set in all cases.

How's that sound, @andrewkern?

petrelharp · 2025-09-12T00:59:07Z

LMK how this looks @andrewkern; if you like it I'll squash these commits down.

…sons Previously, genetic_relatedness would fail with TSK_ERR_INSUFFICIENT_SAMPLE_SETS when given a single sample set and indexes referring to that set, e.g.: ts.genetic_relatedness([[0]], indexes=[(0,0)]) This was because the C API requires at least k=2 sample sets for k-way statistics, even when indexes only reference a single set for self-comparison. The fix is more general: it removes the restriction that there be at least k sample sets for k-way stats, because this wasn't a useful or consistently applied restriction. Closes tskit-dev#3055

petrelharp · 2025-09-12T19:12:18Z

Rebased; added to CHANGELOG; tests are (should be) passing. Ready to merge when you approve, @andrewkern .

andrewkern mentioned this pull request Jun 26, 2025

genetic_relatedness thinks it needs 2 samples sets but is valid with only one #3055

Open

andrewkern requested review from benjeffery and petrelharp June 26, 2025 21:55

andrewkern force-pushed the fix-issue-3055-genetic-relatedness branch from 89397d5 to cd830f7 Compare June 26, 2025 22:12

petrelharp force-pushed the fix-issue-3055-genetic-relatedness branch from 4b8755f to 7f48542 Compare September 12, 2025 00:58

petrelharp force-pushed the fix-issue-3055-genetic-relatedness branch from 931ce7e to 34d6826 Compare September 12, 2025 19:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix genetic_relatedness to allow single sample set with self-comparisons #3235

Fix genetic_relatedness to allow single sample set with self-comparisons #3235

andrewkern commented Jun 26, 2025 •

edited

Loading

Uh oh!

codecov bot commented Jun 26, 2025 •

edited

Loading

Uh oh!

jeromekelleher commented Jun 29, 2025

Uh oh!

jeromekelleher commented Aug 6, 2025

Uh oh!

petrelharp commented Sep 9, 2025

Uh oh!

petrelharp commented Sep 10, 2025

Uh oh!

petrelharp commented Sep 12, 2025

Uh oh!

petrelharp commented Sep 12, 2025

Uh oh!

Uh oh!

Fix genetic_relatedness to allow single sample set with self-comparisons #3235

Are you sure you want to change the base?

Fix genetic_relatedness to allow single sample set with self-comparisons #3235

Conversation

andrewkern commented Jun 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

The Problem

This PR

Testing

PR Checklist:

Uh oh!

codecov bot commented Jun 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

jeromekelleher commented Jun 29, 2025

Uh oh!

jeromekelleher commented Aug 6, 2025

Uh oh!

petrelharp commented Sep 9, 2025

Uh oh!

petrelharp commented Sep 10, 2025

Uh oh!

petrelharp commented Sep 12, 2025

Uh oh!

petrelharp commented Sep 12, 2025

Uh oh!

Uh oh!

andrewkern commented Jun 26, 2025 •

edited

Loading

codecov bot commented Jun 26, 2025 •

edited

Loading