A probabilistic graphical model for estimating selection coefficient of nonsynonymous variants from human population sequence data
Zhao et al 2025, https://www.nature.com/articles/s41467-025-59937-2
- estimated S_gene for all human protein-coding genes
- estimated selection coefficient (MisFit S) for all possible missense variants in the human genome caused by SNVs: MisFit v1.5.1
- Note: In MisFit v1.5, released with Zhao et al 2025, 273 transcripts failed to map to canonical Ensembl IDs in the training data. In v1.5.1 we have patched these transcripts by directly using their non-canonical IDs.
pop_model: simulates variants and constructs the PIG model
model_PTV: uses only PTVs; independent of the other models
model_mis
used to find priors of d and s_gene, then initialize s_gene before MisFit training.
model_basic: population data, with or without genes
model_logit: population data + gene + ESM zero-shot as d
model_TF: full MisFit model
*_analysis: used to combine data for the different analyses
Note: the model_selection value output directly by the model may need to be transformed by a sigmoid function to obtain MisFit_S on the original scale.
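The transformation in the note above can be sketched as follows. This is a minimal illustration, not code from this repository: it assumes model_selection is an unbounded real value (a logit) and applies the standard logistic sigmoid, which maps it into (0, 1). The `sigmoid` helper and the example values in `raw_scores` are hypothetical, and whether any further rescaling is needed to match the released MisFit_S is not stated here.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function mapping any real x into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use exp(x) so that math.exp never overflows.
    e = math.exp(x)
    return e / (1.0 + e)


# Hypothetical raw model_selection values (assumed to be on the logit scale).
raw_scores = [-4.0, 0.0, 2.5]

# Transformed values; larger raw scores give larger transformed scores.
transformed = [sigmoid(v) for v in raw_scores]
```

Because the sigmoid is monotonic, the ranking of variants is the same before and after the transformation; only the scale changes.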
To be updated:
- deep mutational scan GMM
- variant annotations