Estimation of selection coefficient of missense variants using human population genomes and machine learning

ShenLab/MisFit

MisFit

A probabilistic graphical model for estimating the selection coefficients of nonsynonymous variants from human population sequence data

Zhao et al. 2025, https://www.nature.com/articles/s41467-025-59937-2

MisFit version 1.5.1 data:

  • estimated S_gene for all human protein-coding genes
  • estimated selection coefficient (MisFit S) for all possible missense variants in the human genome caused by SNVs: MisFit v1.5.1
  • Note: In MisFit v1.5, released with Zhao et al. 2025, 273 transcripts failed to map to canonical Ensembl IDs in the training data. We have patched these transcripts in v1.5.1 by directly using their non-canonical IDs.

population genetics model

pop_model: simulates variants and constructs the PIG model
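
To give a feel for what simulating variants under selection involves, here is a minimal Wright-Fisher sketch. It is not the code in pop_model: the function name `simulate_allele_counts`, the genic selection scheme (each copy of the variant has relative fitness 1 - s), the constant population size, and all default parameter values are assumptions made for illustration only.

```python
import numpy as np


def simulate_allele_counts(s, mu=1e-5, n_chrom=20_000, n_gen=500,
                           n_sites=500, seed=0):
    """Simulate allele counts at independent sites (illustrative only).

    s        selection coefficient against the variant allele (assumed genic)
    mu       per-generation mutation rate toward the variant allele
    n_chrom  number of chromosomes in the (constant-size) population
    n_gen    number of generations to simulate (must be >= 1)
    n_sites  number of independent sites simulated in parallel
    """
    rng = np.random.default_rng(seed)
    freq = np.zeros(n_sites)  # every site starts without the variant
    for _ in range(n_gen):
        # Selection: variant copies are down-weighted by (1 - s).
        freq = freq * (1.0 - s) / (1.0 - s * freq)
        # Mutation: reference copies mutate to the variant at rate mu.
        freq = freq + mu * (1.0 - freq)
        freq = np.clip(freq, 0.0, 1.0)
        # Drift: binomial sampling of the next generation.
        counts = rng.binomial(n_chrom, freq)
        freq = counts / n_chrom
    return counts


strong = simulate_allele_counts(s=0.1)
weak = simulate_allele_counts(s=0.001)
print(strong.mean(), weak.mean())
```

Under these assumptions, sites under stronger selection end up with lower allele counts on average, which is the signal a population genetics model of selection relies on.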

protein-truncating variants

model_PTV: uses only PTVs and is independent of the other models

prior of missense variants

model_mis: used to find the priors of d and s_gene, and then to initialize s_gene before MisFit training

Baseline models

model_basic: population data with or without genes

model_logit: population data + gene + ESM zero-shot as d

MisFit model

model_TF: full MisFit model

*_analysis are used to combine data for different analyses

Note: the model_selection value given directly by the model may need to be transformed by a sigmoid function to obtain MisFit_S on the original scale
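
The sigmoid transform mentioned in the note can be sketched as follows. This assumes that model_selection is a real-valued score on the logit scale and that the standard logistic function is the intended transform; the function name `model_selection_to_misfit_s` is hypothetical and is not part of the repository.

```python
import math


def model_selection_to_misfit_s(model_selection):
    """Map a raw model_selection value to the (0, 1) scale with a sigmoid.

    Assumption: model_selection is on the logit scale, so that
    MisFit_S = 1 / (1 + exp(-model_selection)).
    """
    x = float(model_selection)
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative inputs, use the equivalent form that avoids overflow.
    ex = math.exp(x)
    return ex / (1.0 + ex)


for raw in (-4.0, 0.0, 2.5):
    print(raw, model_selection_to_misfit_s(raw))
```

Check whether the values in the file you are using are already on the original scale before applying this transform, since the note only says it may be needed.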

evaluation and figure-plotting

model_evaluate

data processing

to be updated

  • deep mutational scan GMM
  • variant annotations
