add Individual_ID as mandatory column #109

stschiff · 2025-09-05T09:13:47Z

As discussed, and perhaps still to be discussed further, here is my suggestion: a mandatory and unique Individual_ID column. No lists; multiple values are not allowed. What do you think?

nevrome · 2025-09-08T12:38:01Z

Ok - I think that is OK.
We should imho also apply the ASCII-only constraint we introduced for Poseidon_IDs and Group_Names.
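
To make the two constraints concrete (single value, ASCII-only), here is a minimal sketch in Python. It is not trident code; the function name `check_individual_id` is hypothetical, and it assumes `;` is the list separator, as in other .janno list columns.

```python
def check_individual_id(value: str) -> list[str]:
    """Return the problems found in one Individual_ID value (empty list if none)."""
    problems = []
    if value == "":
        # Assumption: an empty string stands for a missing value.
        problems.append("missing value")
    if ";" in value:
        # Assumption: ';' separates list entries, so its presence means a list.
        problems.append("multiple values are not allowed")
    if not value.isascii():
        problems.append("non-ASCII characters")
    return problems
```

A value such as `I0001` would pass, while `I0001;I0002` or a name containing an umlaut would be reported.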

nevrome · 2025-09-08T13:01:06Z

OK - I thought about this a bit longer:

1) Relation_To should from now on feature Individual_IDs (and potentially Alternative_IDs), not Poseidon_IDs.
2) Some .janno columns could now refer to the sampled individual in their description, not the sample itself, e.g. Species, Collection_ID, Date_*, Chromosomal_Anomalies, MT_Haplogroup, Y_Haplogroup. In Genetic_Sex we use the phrase "of the individual derived from this sample" and in Alternative_IDs "the sampled individual". Maybe we should use these more often now.
3) Unrelated to the schema: Having a mandatory column that is NOT in the genotype data will require us to rewire some behaviour of trident. Probably the easiest would be to always use the Poseidon_ID in case the Individual_ID is missing, with a warning that this is happening.
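
The fallback described above could look roughly like this. It is a sketch in Python, not trident's Haskell; `resolve_individual_id` and the dict-per-row representation are assumptions for illustration only.

```python
import warnings


def resolve_individual_id(row: dict[str, str]) -> str:
    """Return the Individual_ID of one .janno row, falling back to its Poseidon_ID."""
    # Treat an absent key, None, or whitespace-only value as "missing".
    individual_id = (row.get("Individual_ID") or "").strip()
    if individual_id:
        return individual_id
    poseidon_id = row["Poseidon_ID"]
    warnings.warn(
        f"Individual_ID missing for {poseidon_id}; using the Poseidon_ID instead",
        stacklevel=2,
    )
    return poseidon_id
```

Emitting a warning rather than failing keeps existing packages readable while still making the substitution visible.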

stschiff · 2025-09-09T11:17:44Z

OK, we just made Individual_ID non-mandatory, after some discussion among the core team. Readers should feel free to comment. Our reasoning was that the schema should also be usable for work-in-progress projects, where Individual_ID might be an analysis result that only comes in later. Also, in some edge cases, there might not be Individual_IDs, for example with some sedimentary or residue samples.

Agree with 2)

Any validation of Relation_To should by default only happen within a given package.

stschiff · 2025-09-12T14:13:41Z

OK, important update to @nevrome's comments, after having looked more into it. I have now only changed the Relation_To column, as I agree with @nevrome that this should refer to Individual_ID. I don't think Alternative_IDs should be permitted in Relation_To, though. I am also more careful about sample vs. individual. Here is the new description for Relation_To:

other individuals (by Individual_ID) that are related to the individual this sample derived from, multiple entries separated by ;
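
Combined with the point that validation should stay within one package, this description suggests a check along the following lines. This is only a sketch in Python under assumptions: rows are plain dicts, the function name `unresolved_relations` is hypothetical, and entries are split on `;` as the description states.

```python
def unresolved_relations(rows: list[dict[str, str]]) -> dict[str, list[str]]:
    """Map each Poseidon_ID to its Relation_To entries that match no Individual_ID in the package."""
    # Collect every Individual_ID present in this one package.
    known = {r["Individual_ID"] for r in rows if r.get("Individual_ID")}
    unresolved: dict[str, list[str]] = {}
    for r in rows:
        raw = r.get("Relation_To") or ""
        entries = [e.strip() for e in raw.split(";") if e.strip()]
        missing = [e for e in entries if e not in known]
        if missing:
            unresolved[r["Poseidon_ID"]] = missing
    return unresolved
```

Because relatives may legitimately live in another package, the result is probably better reported as a warning than as a hard validation error.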

Regarding the other fields you suggested, @nevrome, I do not think we should change them:

Species: No, this is the species that the sample derives from. Although it would not be wrong to refer to the Individual here, I think it's unnecessary. As Janno lists Samples, not Individuals, we should not add abstraction if it's not needed.
Collection_ID: Same
Date_*: No, this is an analysis result for the sample (at least in the case of C14), not for the individual. In fact, there could be multiple dates for different samples of the same individual.
Chromosomal_Anomalies: This is an analysis result of the genetic data of the sample, so it should refer to the sample.
MT_Haplogroup and Y_Haplogroup: Same, this is an analysis result so it refers to the entity of analysis, which is the sample.

One more important update: I propose to actually change the wording in Alternative_IDs to refer to the sample, not the individual. I think this makes more sense: since in Poseidon our principal entity of analysis is the sample, not the individual, the Alternative_IDs should refer to other sample IDs.

nevrome · 2025-09-13T18:59:44Z

Ok - thanks for thinking all of this through and engaging thoroughly with the individual points. I think you convinced me and I agree with everything -- except for the very last change:

In my opinion the Alternative_IDs must operate on the individual level. There we have a well-established need to document multiple different identifiers. An alternative name for a Poseidon_ID, on the other hand, is something that should occur very rarely if you think back to its exact definition (as written up in the paper - we should add this to the schema now!). Calling it a "sample" is a simplification that we only adopted for convenience. In fact it is much more specific.

stschiff · 2025-09-22T07:59:19Z

OK, I am not quite following, but certainly open to the suggestion. Let's discuss in person.

add Individual_ID as mandatory column

b0f3ffa

nevrome mentioned this pull request Sep 8, 2025

Add 2025_Lazaridis_IndoEuropeans poseidon-framework/community-archive#295

Merged

22 tasks

stschiff added 2 commits September 9, 2025 10:45

made Individual_ID optional again in README

2fd2807

Update Individual_ID description in janno_columns.tsv to make optional

ecfe343

stschiff added 2 commits September 12, 2025 16:05

updated field descriptions for Individual_ID

88ad934

further update to Related_To

3eddcb8

nevrome mentioned this pull request Sep 15, 2025

Making arbitrary columns in .janno and .ssf mandatory with a command line flag poseidon-framework/poseidon-hs#359

Merged

nevrome mentioned this pull request Sep 24, 2025

better conceptual specification of the Poseidon_ID #112

Merged
