Hướng dẫn distfit python

distfit - Probability density fitting

Hướng dẫn distfit python

distfit is a python package for probability density fitting across 89 univariate distributions to non-censored data by residual sum of squares (RSS), and hypothesis testing. Probability density fitting is the fitting of a probability distribution to a series of data concerning the repeated measurement of a variable phenomenon. distfit scores each of the 89 different distributions for the fit wih the empirical distribution and return the best scoring distribution.

⭐️ Star this repo if you like it ⭐️

Documentation pages

On the documentation pages you can find detailed information about the distfit library with many examples.

Installation

Install bnlearn from PyPI
pip install bnlearn
Install from github source (beta version)
 install git+https://github.com/erdogant/distfit#egg=master
Check version
import distfit
print(distfit.__version__)
The following functions are available after installation:
# Import library
from distfit import distfit

dist = distfit()        # Initialize 
dist.fit_transform(X)   # Fit distributions on empirical data X
dist.predict(y)         # Predict the probability of the resonse variables
dist.plot()             # Plot the best fitted distribution (y is included if prediction is made)

Examples

Example: Quick start to find best fit for your input data
# Prints the screen:
# [distfit] >fit..
# [distfit] >transform..
# [distfit] >[norm      ] [RSS: 0.0133619] [loc=-0.059 scale=2.031] 
# [distfit] >[expon     ] [RSS: 0.3911576] [loc=-6.213 scale=6.154] 
# [distfit] >[pareto    ] [RSS: 0.6755185] [loc=-7.965 scale=1.752] 
# [distfit] >[dweibull  ] [RSS: 0.0183543] [loc=-0.053 scale=1.726] 
# [distfit] >[t         ] [RSS: 0.0133619] [loc=-0.059 scale=2.031] 
# [distfit] >[genextreme] [RSS: 0.0115116] [loc=-0.830 scale=1.964] 
# [distfit] >[gamma     ] [RSS: 0.0111372] [loc=-19.843 scale=0.209] 
# [distfit] >[lognorm   ] [RSS: 0.0111236] [loc=-29.689 scale=29.561] 
# [distfit] >[beta      ] [RSS: 0.0113012] [loc=-12.340 scale=41.781] 
# [distfit] >[uniform   ] [RSS: 0.2481737] [loc=-6.213 scale=12.281] 

Example: Plot summary of the tested distributions

After we have a fitted model, we can make some predictions using the theoretical distributions. After making some predictions, we can plot again but now the predictions are automatically included.

Example: Make predictions using the fitted distribution

Example: Test for one specific distributions

The full list of distributions is listed here: https://erdogant.github.io/distfit/pages/html/Parametric.html

Example: Test for multiple distributions

The full list of distributions is listed here: https://erdogant.github.io/distfit/pages/html/Parametric.html

Example: Fit discrete distribution
from scipy.stats import binom
# Generate random numbers

# Set parameters for the test-case
n = 8
p = 0.5

# Generate 10000 samples of the distribution of (n, p)
X = binom(n, p).rvs(10000)
print(X)

# [5 1 4 5 5 6 2 4 6 5 4 4 4 7 3 4 4 2 3 3 4 4 5 1 3 2 7 4 5 2 3 4 3 3 2 3 5
#  4 6 7 6 2 4 3 3 5 3 5 3 4 4 4 7 5 4 5 3 4 3 3 4 3 3 6 3 3 5 4 4 2 3 2 5 7
#  5 4 8 3 4 3 5 4 3 5 5 2 5 6 7 4 5 5 5 4 4 3 4 5 6 2...]

# [distfit] >fit..
# [distfit] >transform..
# [distfit] >Fit using binomial distribution..
# [distfit] >[binomial] [SSE: 7.79] [n: 8] [p: 0.499959] [chi^2: 1.11]
# [distfit] >Compute confidence interval [discrete]

Example: Make predictions on unseen data for discrete distribution

Example: Generate samples based on the fitted distribution

Contribute

  • All kinds of contributions are welcome!

Citation

Please cite distfit in your publications if this is useful for your research. See column right for citation information.

Maintainer

  • Erdogan Taskesen, github: erdogant
  • Contributions are welcome.
  • If you wish to buy me a Coffee for this work, it is very appreciated :)