2 changes: 1 addition & 1 deletion README.md
@@ -143,7 +143,7 @@ The `.janno` file is a tab-separated text file with a header line. It holds cont

- A set of strictly defined core variables (defined by column name) and their possible content are documented here: [janno_columns.tsv](https://github.com/poseidon-framework/poseidon-schema/blob/master/janno_columns.tsv)
- A `.janno` file MAY have all of these core variables, or only a subset of them.
- - Only three columns MUST be present to make the file valid: **Poseidon_ID**, **Group_Name** and **Genetic_Sex**
+ - Only three columns MUST be present to make the file valid: **Poseidon_ID**, **Group_Name** and **Genetic_Sex**.
- Arbitrary columns not defined here MAY be added as long as their column names do not clash with the defined ones.
- The column order is irrelevant.
- If information is unknown or a variable does not apply for a certain sample, then the respective cell(s) MAY be filled with `n/a` or simply an empty string.
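The README rules in this hunk can be checked mechanically. The following is a minimal Python sketch, not code from any Poseidon tool. The function names `check_janno_header` and `read_janno`, and the choice to map `n/a` and empty cells to `None`, are illustrative assumptions drawn from the bullet points above.

```python
from __future__ import annotations

import csv
import io

# Columns the README lists as mandatory for a valid .janno file.
REQUIRED_COLUMNS = {"Poseidon_ID", "Group_Name", "Genetic_Sex"}
# Cell values the README allows for unknown or not-applicable information.
MISSING_VALUES = {"n/a", ""}


def _tsv_rows(janno_text: str) -> list[list[str]]:
    """Split tab-separated text into rows, skipping blank lines (quotes are kept literally)."""
    reader = csv.reader(io.StringIO(janno_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return [row for row in reader if row]


def check_janno_header(janno_text: str) -> list[str]:
    """Hypothetical helper: return the mandatory column names missing from the header.

    Column order is ignored and extra (non-core) columns are allowed, as the README permits.
    """
    rows = _tsv_rows(janno_text)
    header = rows[0] if rows else []
    return sorted(REQUIRED_COLUMNS - set(header))


def read_janno(janno_text: str) -> list[dict[str, str | None]]:
    """Hypothetical helper: parse rows into dicts, mapping n/a or empty cells to None."""
    rows = _tsv_rows(janno_text)
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    records = []
    for row in body:
        if len(row) != len(header):
            raise ValueError(f"expected {len(header)} cells, got {len(row)}: {row!r}")
        records.append(
            {col: (None if cell in MISSING_VALUES else cell) for col, cell in zip(header, row)}
        )
    return records
```

With a made-up two-line file (the sample values are hypothetical), `check_janno_header("Poseidon_ID\tGenetic_Sex\tGroup_Name\tSpecies\nI1234\tF\tSiteA_EN\tn/a\n")` returns `[]`. A header containing only `Poseidon_ID` and `Species` reports `['Genetic_Sex', 'Group_Name']`. The check compares column names as a set, because the README states that column order is irrelevant.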
7 changes: 4 additions & 3 deletions janno_columns.tsv
@@ -2,10 +2,11 @@ janno_column_name description data_type multi choice range choice_options range_
Poseidon_ID sample identifier as defined by the genetics laboratory (e.g. I1234, BOT001), must contain only the ASCII characters “A-Za-z0-9_-.”, must fit to the values in the Poseidon package .fam/.ind file, must be unique within one package, if multiple datasets exist for the same individual different Poseidon_IDs are required String FALSE FALSE FALSE TRUE TRUE
Genetic_Sex genetic sex of the individual derived from this sample, only F, M or U because the EIGENSTRAT and PLINK formats only support these three, edge cases (e.g. XXY, XYY, X0) are undefined and should be grouped as F, M or U, with a Note added Char FALSE TRUE FALSE F;M;U TRUE FALSE
Group_Name meaningful population/group identifiers for the sample, must contain only the ASCII characters “A-Za-z0-9_-.”, can follow the geographic-temporal nomenclature proposed by Eisenmann et al. 2018 (https://doi.org/10.1038/s41598-018-31123-z), or communicate additional categories that are meaningful for groupings in specific analyses, such as cultural labels, outlier status or relatedness to other samples, multiple entries separated by ;, the first value must be equal the group name in the .fam/.ind file String TRUE FALSE FALSE TRUE FALSE
+ Individual_ID identifier for the sampled individual String FALSE FALSE FALSE FALSE TRUE
Species Species name of the sample. Should follow binomial nomenclature as standard in Biology, e.g. Homo sapiens. String FALSE FALSE FALSE FALSE FALSE
- Alternative_IDs alternative identifiers for the same sampled individual, e.g. IDs in other databases or popular names like Ötzi/Iceman String TRUE FALSE FALSE FALSE FALSE
- Relation_To other samples (by Poseidon_ID) that are related/identical to this sample, multiple entries separated by ; String TRUE FALSE FALSE FALSE FALSE
- Relation_Degree relationship degree for relatives mentioned in Related_To, multiple values separated by ; in the same order as Related_To in case of multiple relations String TRUE TRUE FALSE identical;first;second;thirdToFifth;sixthToTenth;unrelated;other FALSE FALSE
+ Alternative_IDs alternative identifiers for the same sample, e.g. IDs in other databases or popular names like Ötzi/Iceman String TRUE FALSE FALSE FALSE FALSE
+ Relation_To other individuals (by Individual_ID) that are related to the individual this sample derived from, multiple entries separated by ; String TRUE FALSE FALSE FALSE FALSE
+ Relation_Degree relationship degree for relatives mentioned in Related_To, multiple values separated by ; in the same order as Related_To in case of multiple relations. Here, "identical" refers to identical twins. Identical individuals should be encoded through a common Individual_ID. String TRUE TRUE FALSE identical;first;second;thirdToFifth;sixthToTenth;unrelated;other FALSE FALSE
Relation_Type relationship type for relatives mentioned in Related_To (e.g. sister_of, child_of, nephew_of), multiple values separated by ; in the same order as Related_To in case of multiple relations String TRUE FALSE FALSE FALSE FALSE
Relation_Note arbitrary comments about the genetic relationships of the sampled individual String FALSE FALSE FALSE FALSE FALSE
Collection_ID alternative sample identifiers shared by the provider/owner of the sample (e.g. grave 40 skeleton 2), multiple values separated by ; String TRUE FALSE FALSE FALSE FALSE
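Several of the column descriptions above impose checkable cell-level constraints:

- the allowed ASCII characters for Poseidon_ID and Group_Name;
- the F/M/U choice for Genetic_Sex;
- `;`-separated multi-value cells;
- Relation_Degree values that must line up one-to-one with the relatives listed.

The following Python sketch reads those constraints off the descriptions; it is not the validation logic of any Poseidon software. Two interpretive choices are assumptions. First, the `Related_To` named in the Relation_Degree description is taken to mean the `Relation_To` column. Second, values are split on `;` without trimming whitespace. The helper names `split_multi` and `check_row` are hypothetical.

```python
from __future__ import annotations

import re

# Allowed characters for Poseidon_ID and Group_Name, per the descriptions ("A-Za-z0-9_-.").
ID_PATTERN = re.compile(r"[A-Za-z0-9_.-]+")
GENETIC_SEX_OPTIONS = {"F", "M", "U"}
# choice_options listed for Relation_Degree.
RELATION_DEGREE_OPTIONS = {
    "identical", "first", "second", "thirdToFifth",
    "sixthToTenth", "unrelated", "other",
}


def split_multi(cell: str | None) -> list[str]:
    """Hypothetical helper: split a multi-value cell on ';' (missing cell -> empty list)."""
    if cell is None:
        return []
    return cell.split(";")


def check_row(row: dict[str, str | None]) -> list[str]:
    """Hypothetical helper: return human-readable problems found in one .janno row."""
    problems = []
    pid = row.get("Poseidon_ID")
    if pid is None or not ID_PATTERN.fullmatch(pid):
        problems.append(f"invalid Poseidon_ID: {pid!r}")
    sex = row.get("Genetic_Sex")
    if sex not in GENETIC_SEX_OPTIONS:
        problems.append(f"Genetic_Sex must be F, M or U, got {sex!r}")
    groups = split_multi(row.get("Group_Name"))
    if not groups or not all(ID_PATTERN.fullmatch(g) for g in groups):
        problems.append(f"invalid Group_Name: {row.get('Group_Name')!r}")
    # Relation_To and Relation_Degree are parallel lists:
    # the i-th degree describes the i-th listed relative.
    relatives = split_multi(row.get("Relation_To"))
    degrees = split_multi(row.get("Relation_Degree"))
    if len(relatives) != len(degrees):
        problems.append(
            f"{len(relatives)} Relation_To entries but {len(degrees)} Relation_Degree entries"
        )
    for degree in degrees:
        if degree not in RELATION_DEGREE_OPTIONS:
            problems.append(f"unknown Relation_Degree: {degree!r}")
    return problems
```

For a made-up row with `Relation_To` set to `IND_B;IND_C` but only one `Relation_Degree` value (`first`), `check_row` reports `['2 Relation_To entries but 1 Relation_Degree entries']`. The positional pairing is the point of the "same order" wording in the descriptions: a length mismatch makes the mapping from relative to degree ambiguous. The sketch does not check that the first Group_Name matches the .fam/.ind file, because that needs the genotype files as extra input.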