Add multivariate hypergeometric distribution #1963
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## master #1963 +/- ##
==========================================
+ Coverage 86.36% 86.46% +0.10%
==========================================
Files 146 147 +1
Lines 8786 8852 +66
==========================================
+ Hits 7588 7654 +66
Misses 1198 1198
@@ -0,0 +1,125 @@
"""
The [Multivariate hypergeometric distribution](https://en.wikipedia.org/wiki/Hypergeometric_distribution#Multivariate_hypergeometric_distribution)
Julia docstrings should contain the type or function signature in the first line, indented by four spaces.
Generally, could you make this docstring consistent with docstrings of other existing distribution types?
I've added the function signature. The docstring is based on the multinomial distribution docstring. Is there a different distribution I should be following?
Co-authored-by: David Müller-Widmann <[email protected]>
@devmotion Thank you for the review. I have resolved most of the comments but left two open (one about documentation, another about computing the covariance matrix). The PR is ready for a second round of review. Thanks again.
Overview
This pull request adds the multivariate hypergeometric distribution, a generalization of the hypergeometric distribution. It includes:
- src/multivariate/mvhypergeometric.jl, which implements MvHypergeometric as a subtype of DiscreteMultivariateDistribution.
- src/samplers/mvhypergeometric.jl, which implements sampling.
- test/multivariate/mvhypergeometric.jl.
- docs/src/multivariate.md, which is also updated.
Motivation
The multivariate hypergeometric distribution is an important distribution in statistics for testing independence in contingency tables. It is implemented in the numpy and scipy Python packages, but it is not currently supported in Distributions.jl.
Implementation details
The type MvHypergeometric is created as a subtype of DiscreteMultivariateDistribution. Functions for the mean, variance and covariance matrix are implemented, as are evaluation of the log pdf and sampling. Sampling is implemented in the file src/samplers/mvhypergeometric.jl. The procedure is analogous to sampling from a multinomial distribution: the entries are sampled sequentially from univariate hypergeometric distributions.
Testing
Tests are included in test/multivariate/mvhypergeometric.jl. The statistics, pdf and sampling are all tested. The pdf is also compared to the pdf of the univariate hypergeometric distribution, because the marginal and conditional distributions of the multivariate hypergeometric are univariate hypergeometric (the same property used in the sampling).
Dependencies
No new dependencies are added.
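For readers who want to see the idea behind the sequential sampling and the pdf check described under Implementation details and Testing, here is a minimal sketch in Python. It is not the PR's Julia code. The function names (mv_hypergeom_logpmf, hypergeom_draw, mv_hypergeom_sample) are hypothetical, and the univariate draw is done by simulating the urn directly rather than by calling a library sampler.

```python
import math
import random


def log_binom(a, b):
    """Log of the binomial coefficient C(a, b), assuming 0 <= b <= a."""
    return math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)


def mv_hypergeom_logpmf(x, m):
    """Log-pmf of class counts x when sum(x) items are drawn without
    replacement from a population with class sizes m."""
    if len(x) != len(m) or any(xi < 0 or xi > mi for xi, mi in zip(x, m)):
        return -math.inf  # outside the support
    n = sum(x)
    return sum(log_binom(mi, xi) for mi, xi in zip(m, x)) - log_binom(sum(m), n)


def hypergeom_draw(rng, successes, failures, n):
    """One univariate hypergeometric draw: number of successes among n items
    taken without replacement. Simple urn simulation, O(n) per draw."""
    drawn = 0
    for _ in range(n):
        total = successes + failures
        if rng.random() * total < successes:
            successes -= 1
            drawn += 1
        else:
            failures -= 1
    return drawn


def mv_hypergeom_sample(rng, m, n):
    """Sample class counts sequentially: each entry is univariate
    hypergeometric given the draws still to be allocated."""
    if not m or n < 0 or n > sum(m):
        raise ValueError("need at least one class and 0 <= n <= sum(m)")
    x = []
    later_items = sum(m)   # items in classes not yet processed
    draws_left = n
    for mi in m[:-1]:
        later_items -= mi  # now counts only the classes after this one
        xi = hypergeom_draw(rng, mi, later_items, draws_left)
        x.append(xi)
        draws_left -= xi
    x.append(draws_left)   # the last class takes whatever remains
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    print(mv_hypergeom_sample(rng, [5, 3, 4], 6))
    print(math.exp(mv_hypergeom_logpmf([2, 1], [5, 3])))
```

With two classes the log-pmf reduces to the univariate hypergeometric pmf, which is the kind of comparison the Testing section describes.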