Add 2025_Lazaridis_IndoEuropeans #295

Tlkhi · 2025-08-20T22:56:41Z

PR Checklist for a new package submission

The package does not exist already in the community archive, also not with a different name.
The package title in the POSEIDON.yml conforms to the general title structure suggested here: <Year>_<Last name of first author>_<Region, time period or special feature of the paper>, e.g. 2021_Zegarac_SoutheasternEurope, 2021_SeguinOrlando_BellBeaker or 2021_Kivisild_MedievalEstonia.
The package is stored in a directory that is named like the package title.

Samples that already have been published previously, and got re-analysed (e.g. re-sequenced) for the now packaged publication, have a modified Poseidon_ID of the form <Original Poseidon_ID>_<Initials of the main author>_<Year>. Re-analysed versions of I1685 (Lazaridis et al. 2016) should, for example, be assigned the IDs I1685_IL22 (Lazaridis et al. 2022) and I1685_IL25 (Lazaridis et al. 2025).
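The naming rule for re-analysed samples can be sketched as a small helper. This is only an illustration, not part of trident or any Poseidon tool: the function name `reanalysis_id` is hypothetical, and the two-digit year placed directly after the initials is inferred from the examples above (I1685_IL22, I1685_IL25).

```python
def reanalysis_id(original_id: str, initials: str, year: int) -> str:
    """Build a Poseidon_ID for a re-analysed sample, e.g. I1685 -> I1685_IL25.

    Hypothetical helper: it mirrors the checklist examples, where the author
    initials are followed directly by the last two digits of the year.
    """
    if not original_id or not initials:
        raise ValueError("original_id and initials must be non-empty")
    return f"{original_id}_{initials.upper()}{year % 100:02d}"


print(reanalysis_id("I1685", "IL", 2022))
print(reanalysis_id("I1685", "IL", 2025))
```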

The Publication column in the .janno file is filled and the respective .bib file has complete entries for the listed keys.
The .janno file does not include any empty columns or columns only filled with n/a.
The order of columns in the .janno file adheres to the standard order as defined in the Poseidon schema here.
The .janno and the .ssf files are not fully quoted, so they only use single- or double quotes ("...", '...') to enclose text fields where it is strictly necessary (i.e. their entry includes a TAB).
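One rough way to spot unnecessary quoting before submitting is to scan each tab-separated line for fields that are wrapped in quotes although they contain no TAB. The sketch below is an assumption-laden illustration, not how trident checks files: `unnecessarily_quoted` is a hypothetical name, and the naive split on TAB is deliberate, because a quoted field that really contains a TAB is broken into pieces that no longer start and end with the same quote and so is not flagged.

```python
def unnecessarily_quoted(line: str) -> list[str]:
    """Return fields wrapped in matching quotes that contain no TAB.

    Hypothetical helper for a quick manual check of .janno/.ssf lines.
    """
    flagged = []
    for field in line.rstrip("\n").split("\t"):
        # A legitimately quoted field contains a TAB, so the split cuts it
        # into fragments that fail this start/end test.
        if len(field) >= 2 and field[0] == field[-1] and field[0] in "\"'":
            flagged.append(field)
    return flagged


# Made-up example lines, for illustration only.
print(unnecessarily_quoted('I5124\t"some text"\tn/a'))
print(unnecessarily_quoted('I5124\t"text with\ta tab"\tn/a'))
```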

The package passes a validation with trident validate --fullGeno.

Large genotype data files are properly tracked with Git LFS and not directly pushed to the repository. For an instruction on how to set up Git LFS please look here. If you accidentally pushed the files the wrong way you can fix it with git lfs migrate import --no-rewrite path/to/file.bed (see here).

nevrome · 2025-08-24T15:55:38Z

Thanks for preparing this - looks very well done at first glance! The only thing I noticed is the "ENA:" prefix in the Genetic_Source_Accession_IDs column. The Poseidon schema only requires the ID here.

I'll try to find a reviewer for the package.

nevrome · 2025-09-01T14:15:36Z

@martynamolak offered to review the package. Thank you!

martynamolak · 2025-09-03T08:55:41Z

Thanks @Tlkhi for submitting this!
Here are my comments:

janno file:

1. The Relation_To column should probably contain the Poseidon_IDs of the related individuals, not the Alternative_IDs. I realise this is a bit tricky because the relation will (should) be the same regardless of which reanalysis instance of a given individual is meant. @nevrome, should we enforce this? (e.g. "I23568.AG" has Relation_To: "I23655" rather than "I23655.AG")
2. Please change "ENA: PRJEB81467" to "PRJEB81467" in the Genetic_Source_Accession_IDs field.
3. The Poseidon_ID and Group_Name have the ".SG", ".AG" and ".TW" suffixes that are, as far as I could find, not present in the original paper. I of course see the value of the suffixes, but Poseidon's policy is to keep the labels concordant with the original paper. That's something that should probably be discussed within the Poseidon dev team, @nevrome, @TCLamnidis, @stschiff. Anyway, I think at the current stage it's probably better to leave the suffixes in, i.e. as currently labelled in the package.
4. Y haplogroups are provided in the hierarchical system (ISOGG) rather than the terminal-SNP-based system (Yfull) as required/encouraged by Poseidon. The original paper provides both, so they should be easy enough to replace.
5. Some individuals are merges of several libraries; some of these libraries differ in strandedness and damage removal procedure (e.g. I5124 is 4 libraries: ds.half,ds.half,ss.USER,ss.USER in the original paper, while in the package it's coded solely as: ds, half) - please verify all the samples and correct where necessary.
6. It would be nice to also include contamination estimates in the janno file. The original supplementary table provides HapConX and ANGSD estimates. Unfortunately they would have to be converted from ranges to the mean+stderr format for Poseidon. A damage rate column could also be added.
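The accession-ID cleanup requested above could be done by hand, or with something like the following sketch. It assumes that the unwanted label always has the form of letters followed by a colon (as in "ENA: PRJEB81467") and that multiple IDs in the field are separated by ";"; `clean_accession_ids` is a hypothetical name, not a Poseidon function.

```python
import re

# Matches a leading archive label such as "ENA:" plus any following spaces.
_ARCHIVE_PREFIX = re.compile(r"^\s*[A-Za-z]+:\s*")


def clean_accession_ids(value: str) -> str:
    """Strip archive labels from each ';'-separated accession ID.

    Hypothetical helper; the ';' separator is an assumption about how
    multi-value fields are written in the file being cleaned.
    """
    ids = [_ARCHIVE_PREFIX.sub("", part).strip() for part in value.split(";")]
    return ";".join(i for i in ids if i)


print(clean_accession_ids("ENA: PRJEB81467"))
```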

All the other files in the package seem ok as far as I could see.
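For the contamination estimates, one possible conversion from a range to mean+stderr is sketched below. It rests on a strong assumption that the thread does not confirm: that each reported range is a symmetric 95% confidence interval under a normal approximation. If the ranges in the supplementary table mean something else, this formula does not apply. The function name `range_to_mean_stderr` is hypothetical.

```python
Z_95 = 1.959963984540054  # two-sided 95% normal quantile (assumed interval type)


def range_to_mean_stderr(lower: float, upper: float, z: float = Z_95) -> tuple[float, float]:
    """Convert an assumed symmetric confidence interval to (mean, stderr).

    mean   = midpoint of the interval
    stderr = half-width of the interval divided by the z quantile
    """
    if upper < lower:
        raise ValueError("upper bound must not be smaller than lower bound")
    mean = (lower + upper) / 2
    stderr = (upper - lower) / (2 * z)
    return mean, stderr


# Made-up bounds, for illustration only.
mean, stderr = range_to_mean_stderr(0.01, 0.03)
print(f"{mean:.4f} {stderr:.4f}")
```

The two resulting values would then go into the janno's Contamination and Contamination_Err columns, with the estimation method recorded alongside.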

Tlkhi · 2025-09-05T03:07:48Z

Thank you for your comments!

1. I'm waiting for @nevrome for now - it might look a bit complicated since there is no Individual_ID column yet
2. Fixed
3. I think we should keep the suffixes
4. Replaced the ISOGG format with the terminal-SNP-based format - fixed
5. I will mark them as mixed and note the UDG types (e.g., half; USER) in the Note column
6. Added the Damage column; adding a Contamination column seems a bit challenging
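The "mark as mixed" approach for merged libraries can be sketched as follows. This is an illustrative assumption about how the per-library labels from the paper (e.g. ds.half, ss.USER) might be collapsed, not the procedure actually used for this package: `summarise_libraries` is a hypothetical name, the labels are split on the first "." only, and the raw UDG tokens are copied as-is rather than mapped to the janno vocabulary.

```python
def summarise_libraries(libraries: list[str]) -> dict[str, str]:
    """Collapse per-library labels like 'ds.half' into janno-style fields.

    Hypothetical helper: a field becomes 'mixed' when the libraries disagree,
    and the distinct UDG tokens are then recorded in the Note.
    """
    built, udg = [], []
    for label in libraries:
        strandedness, _, treatment = label.partition(".")
        built.append(strandedness)
        udg.append(treatment)

    def collapse(values: list[str]) -> str:
        unique = list(dict.fromkeys(values))  # distinct values, first-seen order
        return unique[0] if len(unique) == 1 else "mixed"

    udg_types = list(dict.fromkeys(udg))
    note = "UDG types: " + "; ".join(udg_types) if len(udg_types) > 1 else ""
    return {"Library_Built": collapse(built), "UDG": collapse(udg), "Note": note}


# The I5124 example from the review: two ds.half and two ss.USER libraries.
print(summarise_libraries(["ds.half", "ds.half", "ss.USER", "ss.USER"]))
```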

P.S: I will update the janno in the PR after fixing all the issues

stschiff · 2025-09-05T08:48:23Z

For what it's worth, I am OK with the suffixes... I think we have no official policy on this, and since the original authors haven't submitted the package, but @Tlkhi has, they get to decide. I believe since it's a Boston Paper, AADR-like suffixes make sense.

Stephan

nevrome · 2025-09-08T13:16:45Z

Thanks for the review, @martynamolak, and thanks for addressing it promptly, @Tlkhi.

[1]. I think we can leave it like this in anticipation of the changes planned for Poseidon v3.0.0 as discussed here poseidon-framework/poseidon-schema#74 and more concretely here poseidon-framework/poseidon-schema#109
[3]. I don't mind the suffixes as long as we have a Capture_Type column.

Tlkhi · 2025-09-09T22:15:29Z

Fixed the issues.

nevrome · 2025-09-14T18:49:44Z

Perfect - thanks! Sorry for the long delay. Will merge now.

Add 2025_Lazaridis_IndoEuropeans

301009a

nevrome requested a review from martynamolak September 1, 2025 14:14

Tlkhi added 2 commits September 10, 2025 01:33

Update janno

de41693

Update files for 2025 Lazaridis IndoEuropeans

df1c923

nevrome merged commit 254a0ea into poseidon-framework:master Sep 14, 2025
1 check passed
