What is a probability mass function in Python?

In probability theory, a probability mass function (PMF) is a function that gives the probability that a discrete random variable is exactly equal to some value.

It is also known as the discrete density function. The probability mass function is often the primary means of defining a discrete probability distribution, and such functions exist for either scalar or multivariate random variables whose domain is discrete.
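As a minimal sketch of this definition, the PMF of the sum of two fair six-sided dice can be built by enumerating every outcome. The dice example and the names `outcomes` and `pmf` are illustrative assumptions, not part of the original text.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely outcomes of two fair six-sided dice
outcomes = [a + b for a, b in product(range(1, 7), repeat=2)]

# PMF: map each possible sum to its exact probability
counts = Counter(outcomes)
pmf = {value: Fraction(n, len(outcomes)) for value, n in sorted(counts.items())}

for value, prob in pmf.items():
    print(value, prob)

# A valid PMF is non-negative and sums to exactly 1
total = sum(pmf.values())
print("total probability:", total)
```

Using `Fraction` keeps the probabilities exact, so the check that they sum to 1 does not depend on floating-point rounding.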


Calculating probability mass function for drawing marbles from a jar:

The following Python code draws marbles of three colors (blue, yellow and orange) from a jar with replacement, then compares the proportion of each color drawn with its theoretical probability.



import matplotlib.pyplot as plt
import numpy as np

# colored marble counts
blue   = 40
yellow = 30
orange = 20
totalMarbs = blue + yellow + orange

# put them all in a jar (1 = blue, 2 = yellow, 3 = orange)
jar = np.hstack((1*np.ones(blue), 2*np.ones(yellow), 3*np.ones(orange)))

# now we draw 500 marbles (with replacement)
numDraws = 500
drawColors = np.zeros(numDraws)

for drawi in range(numDraws):

    # generate a random integer index into the jar
    randmarble = int(np.random.rand()*len(jar))

    # store the color of that marble
    drawColors[drawi] = jar[randmarble]

# now we need to know the proportion of colors drawn
propBlue = np.sum(drawColors==1) / numDraws
propYell = np.sum(drawColors==2) / numDraws
propOran = np.sum(drawColors==3) / numDraws


# plot those against the theoretical probability
plt.bar([1,2,3], [propBlue, propYell, propOran], label='Proportion')
plt.plot([0.5, 1.5], [blue/totalMarbs, blue/totalMarbs], 'b', linewidth=3, label='Probability')
plt.plot([1.5, 2.5], [yellow/totalMarbs, yellow/totalMarbs], 'b', linewidth=3)
plt.plot([2.5, 3.5], [orange/totalMarbs, orange/totalMarbs], 'b', linewidth=3)

plt.xticks([1,2,3], labels=['Blue','Yellow','Orange'])
plt.xlabel('Marble color')
plt.ylabel('Proportion/probability')
plt.legend()
plt.show()
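The same experiment can also be sketched without an explicit loop. This version assumes NumPy's `Generator.choice` with a probability vector in place of the jar array; the seed and the variable names are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility

colors = np.array(["Blue", "Yellow", "Orange"])
counts = np.array([40, 30, 20])
probs = counts / counts.sum()          # theoretical PMF

num_draws = 500
# draw color indices 0, 1, 2 with replacement, weighted by the PMF
draws = rng.choice(len(colors), size=num_draws, p=probs)

# empirical proportions: count each index, then normalize
proportions = np.bincount(draws, minlength=len(colors)) / num_draws

for name, p, q in zip(colors, probs, proportions):
    print(f"{name:7s} probability={p:.3f} proportion={q:.3f}")
```

With more draws the proportions should move closer to the theoretical probabilities, which is the point of the bar-versus-line plot above.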

Calculating probability density (technically mass) function:

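A probability density function (PDF) differs from a probability mass function in that it is associated with continuous rather than discrete random variables. The example below estimates the distribution of a sampled signal by binning it into a histogram, which is why the result is technically a mass function.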
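One way to see the difference is that a density value is not itself a probability. The sketch below assumes a normal distribution with a small standard deviation (0.1), chosen only so that the density exceeds 1; the helper `normal_pdf` is a hypothetical name written for this illustration.

```python
import math

def normal_pdf(x, mu=0.0, sigma=0.1):
    """Density of a normal distribution (illustrative parameters)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# A density value can exceed 1, so it is not itself a probability
peak = normal_pdf(0.0)
print("density at the mean:", peak)

# Probabilities come from integrating the density over an interval.
# Midpoint Riemann sum over [-1, 1], which covers +/- 10 standard deviations.
n_steps = 20000
dx = 2.0 / n_steps
area = sum(normal_pdf(-1.0 + (i + 0.5) * dx) for i in range(n_steps)) * dx
print("area under the density:", area)
```

A PMF, by contrast, returns probabilities directly, so each of its values lies between 0 and 1.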



import matplotlib.pyplot as plt
import numpy as np

# continuous signal (technically discrete!)
N = 10004
datats1 = np.cumsum(np.sign(np.random.randn(N)))
datats2 = np.cumsum(np.sign(np.random.randn(N)))

# let's see what they look like
plt.plot(np.arange(N), datats1, linewidth=2)
plt.plot(np.arange(N), datats2, linewidth=2)
plt.show()


# discretize using histograms
nbins = 50

# x holds the bin edges; average neighbors to get bin centers
y,x = np.histogram(datats1, nbins)
x1 = (x[1:]+x[:-1])/2
y1 = y/np.sum(y)   # counts -> proportions

y,x = np.histogram(datats2, nbins)
x2 = (x[1:]+x[:-1])/2
y2 = y/np.sum(y)


plt.plot(x1,y1, x2,y2, linewidth=3)
plt.legend(['ts1','ts2'])
plt.xlabel('Data value')
plt.ylabel('Probability')
plt.show()
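The histogram above is normalized so that the bin heights sum to 1, which makes it an estimate of a mass function. As a hedged sketch of the alternative, `np.histogram` also accepts `density=True`, which divides by the bin width so that the area under the bars is 1. The seed and variable names here are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)  # seed chosen only for reproducibility
data = np.cumsum(np.sign(rng.standard_normal(10000)))  # random walk, as above

nbins = 50
counts, edges = np.histogram(data, nbins)
widths = np.diff(edges)

# Proportion of samples per bin: heights sum to 1
proportion = counts / counts.sum()

# Density estimate: proportion divided by bin width, area sums to 1
density, _ = np.histogram(data, nbins, density=True)

print(proportion.sum())
print(np.sum(density * widths))
```

The two normalizations differ only by the bin width, so the curves have the same shape but different vertical scales.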


See also related topics:



Python Basics

This handout only goes over probability functions for Python. For the basics of Python, there are many good online tutorials. CS109 has a good set of notes from our Python review session (including installation instructions)! Check out:
//github.com/yulingl/cs109_python_tutorial/blob/master/cs109_python_tutorial.ipynb. The functions in this tutorial come from the scipy Python library, so you need to have this library installed.

Counting Functions

Factorial

Compute $n!$ as an integer. This example computes $20!$:

import math
print(math.factorial(20))

Choose

Computes $\binom{n}{m}$ as a float. This example computes $\binom{10}{5}$:

from scipy import special
print(special.binom(10, 5))
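The two counting functions are related through the identity $\binom{n}{m} = \frac{n!}{m!\,(n-m)!}$. The sketch below checks this with the standard library only, assuming Python 3.8 or later for `math.comb`, which returns an exact integer rather than a float.

```python
import math

n, m = 10, 5

# Binomial coefficient from its factorial definition: n! / (m! * (n - m)!)
from_factorials = math.factorial(n) // (math.factorial(m) * math.factorial(n - m))

# math.comb (Python 3.8+) returns the same value as an exact integer
exact = math.comb(n, m)

print(from_factorials, exact)
```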

Discrete Random Variables

Binomial

Make a binomial random variable $X$ and compute its probability mass function (PMF) or cumulative distribution function (CDF). We love the scipy stats library because it defines all the functions you would care about for a random variable, including expectation, variance, and even things we haven't talked about in CS109, like entropy. This example declares $X \sim \text{Bin}(n = 10, p = 0.2)$. It calculates a few statistics on $X$. It then calculates $P(X = 3)$ and $P(X \leq 4)$. Finally it generates a few random samples from $X$:

from scipy import stats
X = stats.binom(10, 0.2) # Declare X to be a binomial random variable
print(X.pmf(3))          # P(X = 3)
print(X.cdf(4))          # P(X <= 4)
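The statistics and sampling steps described above might look like the following. This is a sketch using methods of scipy's frozen distribution objects, not the handout's original listing; the seed passed to `rvs` and the hand-computed check are additions for illustration.

```python
import math
from scipy import stats

n, p = 10, 0.2
X = stats.binom(n, p)

# Summary statistics of X
print(X.mean())     # expectation, n * p
print(X.var())      # variance, n * p * (1 - p)
print(X.std())      # standard deviation
print(X.entropy())  # entropy of the distribution

# PMF checked against the closed form C(n, k) * p^k * (1 - p)^(n - k)
k = 3
by_hand = math.comb(n, k) * p**k * (1 - p)**(n - k)
print(X.pmf(k), by_hand)

# The CDF is the running sum of the PMF
cdf_by_sum = sum(X.pmf(i) for i in range(5))
print(X.cdf(4), cdf_by_sum)

# A few random samples (seed is an arbitrary choice for reproducibility)
samples = X.rvs(size=5, random_state=0)
print(samples)
```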
