What is a probability mass function in Python?

In probability theory, a probability mass function is a function that gives the probability that a discrete random variable is exactly equal to some value.

It is also known as the discrete density function. The probability mass function is often the primary means of defining a discrete probability distribution, and such functions exist for either scalar or multivariate random variables whose domain is discrete.
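As a small illustration of the definition, the sketch below builds the probability mass function for the sum of two fair six-sided dice by enumerating every outcome. The dice example and the names `outcomes`, `counts` and `pmf` are assumptions made for illustration; they are not part of the original material.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely outcomes of two fair six-sided dice
outcomes = list(product(range(1, 7), repeat=2))

# Count how many outcomes give each sum, then divide by the number of outcomes
counts = Counter(a + b for a, b in outcomes)
pmf = {total: Fraction(n, len(outcomes)) for total, n in sorted(counts.items())}

for total, prob in pmf.items():
    print(total, prob)

# A valid PMF is non-negative everywhere and its values add up to exactly 1
print(sum(pmf.values()))
```

Exact fractions are used here so that the sum of the probabilities can be compared with 1 without any floating-point rounding.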

Calculating the probability mass function for drawing marbles from a jar:

The following Python code calculates the probabilities and the observed proportions for drawing marbles of different colors (blue, yellow and orange) from a jar.

import matplotlib.pyplot as plt
import numpy as np

# colored marble counts
blue   = 40
yellow = 30
orange = 20
totalMarbs = blue + yellow + orange

# put them all in a jar
jar = np.hstack((1*np.ones(blue),2*np.ones(yellow),3*np.ones(orange)))

# now we draw 500 marbles (with replacement)
numDraws = 500
drawColors = np.zeros(numDraws)

for drawi in range(numDraws):
    # generate a random integer to draw
    randmarble = int(np.random.rand()*len(jar))
    # store the color of that marble
    drawColors[drawi] = jar[randmarble]

# now we need to know the proportion of colors drawn
propBlue = sum(drawColors==1) / numDraws
propYell = sum(drawColors==2) / numDraws
propOran = sum(drawColors==3) / numDraws

# plot those against the theoretical probability
plt.bar([1,2,3],[ propBlue, propYell, propOran ],label='Proportion')
plt.plot([0.5, 1.5],[blue/totalMarbs, blue/totalMarbs],'b',linewidth=3,label='Probability')
plt.plot([1.5, 2.5],[yellow/totalMarbs,yellow/totalMarbs],'b',linewidth=3)
plt.plot([2.5, 3.5],[orange/totalMarbs,orange/totalMarbs],'b',linewidth=3)

# label the axes and bars, then display the figure
plt.xticks([1,2,3],labels=('Blue','Yellow','Orange'))
plt.xlabel('Marble color')
plt.ylabel('Proportion / probability')
plt.legend()
plt.show()
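The same comparison can also be made numerically, without a plot. The following is a minimal sketch using only the standard library; the names `counts`, `pmf` and `proportions` and the fixed seed are assumptions for illustration, and the drawn proportions will differ for other seeds.

```python
import random

# Marble counts from the jar example
counts = {"blue": 40, "yellow": 30, "orange": 20}
total = sum(counts.values())

# Theoretical PMF: the probability of each color on a single draw
pmf = {color: n / total for color, n in counts.items()}

# Simulate 500 draws with replacement (seed fixed only for repeatability)
rng = random.Random(0)
num_draws = 500
draws = rng.choices(list(counts), weights=list(counts.values()), k=num_draws)

# Empirical proportions, to compare against the PMF
proportions = {color: draws.count(color) / num_draws for color in counts}

for color in counts:
    print(f"{color:>6}: probability {pmf[color]:.3f}, proportion {proportions[color]:.3f}")
```

With more draws the proportions should move closer to the probabilities, which is what the bar chart and the horizontal lines are meant to show.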

Calculating a probability density (technically mass) function:

A probability density function (PDF) differs from a probability mass function: it is associated with continuous rather than discrete random variables.

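import matplotlib.pyplot as plt
import numpy as np

# continuous signal (technically discrete!)
N = 10004
datats1 = np.cumsum(np.sign(np.random.randn(N)))
datats2 = np.cumsum(np.sign(np.random.randn(N)))

# let's see what they look like
plt.plot(np.arange(N),datats1,linewidth=2)
plt.plot(np.arange(N),datats2,linewidth=2)
plt.xlabel('Time step')
plt.ylabel('Data value')
plt.show()

# discretize using histograms
nbins = 50

# bin centers (x1) and proportion of samples per bin (y1)
y,x = np.histogram(datats1,nbins)
x1 = (x[1:]+x[:-1])/2
y1 = y/sum(y)

y,x = np.histogram(datats2,nbins)
x2 = (x[1:]+x[:-1])/2
y2 = y/sum(y)

plt.plot(x1,y1, x2,y2,linewidth=3)
plt.xlabel('Data value')
plt.ylabel('Probability')
plt.show()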
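The "technically mass" remark can be made concrete: dividing the bin counts by their total gives proportions that sum to 1, like a PMF over the bins, while dividing those proportions by the bin width gives a density estimate whose area is 1. The sketch below shows this with the standard library only; the hand-written histogram and the names `mass` and `density` are assumptions for illustration, not the code used above.

```python
import random

rng = random.Random(0)

# Random walk: cumulative sum of +1/-1 steps, similar to datats1
N = 10004
position, walk = 0, []
for _ in range(N):
    position += rng.choice((-1, 1))
    walk.append(position)

# Histogram with equal-width bins
nbins = 50
lo, hi = min(walk), max(walk)
width = (hi - lo) / nbins
counts = [0] * nbins
for value in walk:
    # the maximum value is assigned to the last bin
    idx = min(int((value - lo) / width), nbins - 1)
    counts[idx] += 1

# Proportion of samples per bin: sums to 1, like a PMF over the bins
mass = [c / N for c in counts]
# Density estimate: mass per unit of data value, so the total area is 1
density = [m / width for m in mass]

print(sum(mass))
print(sum(d * width for d in density))
```

The density values themselves can be larger or smaller than the proportions depending on the bin width; only their area is constrained to be 1.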

Python Basics

This handout only goes over probability functions for Python. For the basics of Python, there are many good online tutorials. CS109 has a good set of notes from our Python review session (including installation instructions)! Check out:
https://github.com/yulingl/cs109_python_tutorial/blob/master/cs109_python_tutorial.ipynb

The functions in this tutorial come from the scipy Python library. It is essential that you have this library installed!

Counting Functions


Compute $n!$ as an integer. This example computes $20!$:

import math
print(math.factorial(20))


Compute $n \choose m$ as a float. This example computes $10 \choose 5$:

from scipy import special
print(special.binom(10, 5))
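When an exact integer is preferred over a float, one option is the standard library's `math.comb` (available from Python 3.8); `scipy.special.comb` with `exact=True` is another. The short sketch below checks `math.comb` against the factorial formula; the variable names are chosen here for illustration.

```python
import math

n, m = 10, 5

# Exact integer binomial coefficient (Python 3.8+)
exact = math.comb(n, m)

# The same value from the definition n! / (m! (n - m)!)
by_formula = math.factorial(n) // (math.factorial(m) * math.factorial(n - m))

print(exact)
print(exact == by_formula)
```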

Discrete Random Variables


Make a Binomial Random variable $X$ and compute its probability mass function (PMF) or cumulative distribution function (CDF). We love the scipy stats library because it defines all the functions you would care about for a random variable, including expectation, variance, and even things we haven't talked about in CS109, like entropy. This example declares $X \sim \text{Bin}(n = 10, p = 0.2)$. It calculates $P(X = 3)$ and $P(X \leq 4)$, then a few statistics on $X$. Finally it generates a few random samples from $X$:

from scipy import stats
X = stats.binom(10, 0.2) # Declare X to be a binomial random variable
print(X.pmf(3))          # P(X = 3)
print(X.cdf(4))          # P(X <= 4)
print(X.mean())          # E[X]
print(X.var())           # Var(X)
print(X.std())           # Std(X)
print(X.rvs())           # Get a random sample from X
print(X.rvs(10))         # Get 10 random samples from X
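The PMF and CDF values for $\text{Bin}(n = 10, p = 0.2)$ can be checked by hand from the closed form $P(X = k) = {n \choose k} p^k (1-p)^{n-k}$. The following is a minimal sketch using only the standard library; `binom_pmf` and `binom_cdf` are hypothetical helper names, not scipy functions.

```python
import math

n, p = 10, 0.2

def binom_pmf(k):
    """P(X = k) for X ~ Bin(n, p), straight from the formula."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def binom_cdf(k):
    """P(X <= k), obtained by summing the PMF from 0 to k."""
    return sum(binom_pmf(i) for i in range(k + 1))

print(binom_pmf(3))                              # P(X = 3)
print(binom_cdf(4))                              # P(X <= 4)
print(sum(binom_pmf(k) for k in range(n + 1)))   # total probability
print(n * p, n * p * (1 - p))                    # E[X] and Var(X)
```

Summing the PMF to get the CDF is the reason a CDF is a cumulative distribution rather than a density.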

From a Python interpreter you can always use the "help" function to see a full list of methods defined on a variable (or for a package):

from scipy import stats
X = stats.binom(10, 0.2) # Declare X to be a binomial random variable
help(X)                  # List all methods defined for X


Make a Poisson Random variable $Y$. This example declares $Y \sim \text{Poi}(\lambda = 2)$. It then calculates $P(Y = 3)$:

from scipy import stats
Y = stats.poisson(2) # Declare Y to be a Poisson random variable
print(Y.pmf(3))      # P(Y = 3)
print(Y.rvs())       # Get a random sample from Y
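For reference, the Poisson PMF has the closed form $P(Y = k) = e^{-\lambda} \lambda^k / k!$, so the value for $k = 3$ can be reproduced without scipy. This is a sketch with a hypothetical helper name, `poisson_pmf`.

```python
import math

lam = 2  # rate parameter lambda

def poisson_pmf(k):
    """P(Y = k) for Y ~ Poi(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

print(poisson_pmf(3))                           # P(Y = 3)
# The support is infinite, so the total is approximated by a truncated sum
print(sum(poisson_pmf(k) for k in range(50)))
```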


Make a Geometric Random variable $X$, the number of trials until a success. This example declares $X \sim \text{Geo}(p = 0.75)$:

from scipy import stats
X = stats.geom(0.75) # Declare X to be a geometric random variable
print(X.pmf(3))      # P(X = 3)
print(X.rvs())       # Get a random sample from X
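With $X$ counting the trials up to and including the first success, the geometric PMF is $P(X = k) = (1-p)^{k-1} p$ for $k \geq 1$, which is also the convention `scipy.stats.geom` uses. A minimal sketch with hypothetical helper names:

```python
p = 0.75  # probability of success on each trial

def geom_pmf(k):
    """P(X = k): k - 1 failures followed by one success."""
    return (1 - p) ** (k - 1) * p if k >= 1 else 0.0

def geom_cdf(k):
    """P(X <= k): one minus the probability that the first k trials all fail."""
    return 1 - (1 - p) ** k if k >= 1 else 0.0

print(geom_pmf(3))   # 0.25 * 0.25 * 0.75
print(geom_cdf(3))
```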

Continuous Random Variables


Make a Normal Random variable $A$. This example declares $A \sim N(\mu = 3, \sigma^2 = 16)$. It then calculates $f_A(4)$ and $F_A(2)$. Very important! In class the second parameter to a normal was the variance ($\sigma^2$). In the scipy library the second parameter is the standard deviation ($\sigma$):

import math
from scipy import stats
A = stats.norm(3, math.sqrt(16)) # Declare A to be a normal random variable
print(A.pdf(4))      # f(4), the probability density at 4
print(A.cdf(2))      # F(2), which is also P(A < 2)
print(A.rvs())       # Get a random sample from A
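To make the variance versus standard deviation distinction explicit, the density and CDF of $N(\mu = 3, \sigma^2 = 16)$ can be written out directly. The sketch below uses only the `math` module; `normal_pdf` and `normal_cdf` are hypothetical helper names.

```python
import math

mu = 3.0
variance = 16.0
sigma = math.sqrt(variance)   # the standard deviation, which is what scipy expects

def normal_pdf(x):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * variance)) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(x):
    """P(A <= x), written in terms of the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

print(normal_pdf(4))   # f(4)
print(normal_cdf(2))   # F(2)
```

Passing 16 instead of 4 as the second parameter would describe a much wider distribution with variance 256, which is the mistake the warning above is about.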


Make an Exponential Random variable $B$. This example declares $B \sim \text{Exp}(\lambda = 4)$:

from scipy import stats
B = stats.expon(scale=1/4) # Declare B to be an exponential random variable (scipy uses scale = 1/lambda)
print(B.pdf(1))      # f(1), the probability density at 1
print(B.cdf(2))      # F(2) which is also P(B < 2)
print(B.rvs())       # Get a random sample from B
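One point to watch: `scipy.stats.expon` is parameterized by `loc` and `scale`, where `scale` equals $1/\lambda$, so a single positional argument is read as a shift (`loc`) rather than as the rate. The closed forms $f(x) = \lambda e^{-\lambda x}$ and $F(x) = 1 - e^{-\lambda x}$ give values to compare against. A minimal sketch with hypothetical helper names:

```python
import math

lam = 4.0          # rate parameter lambda
scale = 1 / lam    # the value scipy takes as scale=

def expon_pdf(x):
    """Density of Exp(lam) at x."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def expon_cdf(x):
    """P(B <= x) for B ~ Exp(lam)."""
    return 1 - math.exp(-lam * x) if x >= 0 else 0.0

print(expon_pdf(1))   # f(1)
print(expon_cdf(2))   # F(2)
print(scale)          # also the mean, E[B] = 1 / lambda
```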


Make a Beta Random variable $X$. This example declares $X \sim \text{Beta}(\alpha = 1, \beta = 3)$:

from scipy import stats
X = stats.beta(1, 3) # Declare X to be a beta random variable
print(X.pdf(0.5))    # f(0.5), the probability density at 0.5
print(X.cdf(0.7))    # F(0.7) which is also P(X < 0.7)
print(X.rvs())       # Get a random sample from X
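For $\text{Beta}(\alpha = 1, \beta = 3)$ the density simplifies to $3(1-x)^2$ on $[0, 1]$, which gives a quick sanity check. The sketch below evaluates the general formula with the gamma function and approximates the CDF numerically; the helper names and the midpoint-sum approach are assumptions for illustration, not how scipy computes these values.

```python
import math

alpha, beta = 1.0, 3.0

# Normalizing constant B(alpha, beta), computed from the gamma function
B = math.gamma(alpha) * math.gamma(beta) / math.gamma(alpha + beta)

def beta_pdf(x):
    """Density of Beta(alpha, beta) at x; zero outside [0, 1]."""
    if not 0 <= x <= 1:
        return 0.0
    return x ** (alpha - 1) * (1 - x) ** (beta - 1) / B

def beta_cdf(x, steps=10_000):
    """Approximate P(X <= x) with a midpoint Riemann sum of the density."""
    h = x / steps
    return sum(beta_pdf((i + 0.5) * h) for i in range(steps)) * h

print(beta_pdf(0.5))   # compare with 3 * (1 - 0.5)**2
print(beta_cdf(0.7))   # compare with 1 - (1 - 0.7)**3
```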

What do you mean by probability mass function?

Definition. A probability mass function (pmf) is a function over the sample space of a discrete random variable X which gives the probability that X is equal to a certain value. Let X be a discrete random variable on a sample space S. Then the probability mass function f(x) is defined as f(x) = P[X = x].
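Two standard properties follow from this definition, and a third shows how the pmf is used to find the probability of a set of values. They are stated here in the notation of the definition, with the sums running over the possible values of X; the set A is introduced only for this illustration.

```latex
f(x) = P[X = x] \ge 0 \quad \text{for every value } x,
\qquad
\sum_{x} f(x) = 1,
\qquad
P[X \in A] = \sum_{x \in A} f(x).
```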

What is probability mass function in machine learning?

A probability mass function (PMF) is a function that assigns a probability to each potential outcome of a discrete random variable. For a discrete random variable X, we can in principle list the range R of all potential outcomes, since the outcomes are discrete and therefore countable.

Is there a probability function in Python?

Yes. Libraries such as scipy.stats provide probability functions for many distributions. One example is the Bernoulli distribution, a special case of the binomial distribution in which we conduct a single experiment. It is a discrete probability distribution with probability p for the value 1 and probability q=1-p for the value 0. The value 1 can stand for success, yes, true, or one. Similarly, the value 0 can stand for failure, no, false, or zero.
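A Bernoulli variable is simple enough to write by hand. The following is a minimal sketch using only the standard library; `bernoulli_pmf` and `bernoulli_sample` are hypothetical helper names, and `scipy.stats.bernoulli` offers the same distribution with the `pmf` and `rvs` interface shown earlier.

```python
import random

def bernoulli_pmf(x, p):
    """P(X = x) for X ~ Bernoulli(p): p at 1, 1 - p at 0, and 0 elsewhere."""
    if x == 1:
        return p
    if x == 0:
        return 1 - p
    return 0.0

def bernoulli_sample(p, rng):
    """Draw one sample: 1 with probability p, otherwise 0."""
    return 1 if rng.random() < p else 0

p = 0.3
print(bernoulli_pmf(1, p), bernoulli_pmf(0, p))

rng = random.Random(0)   # seed fixed only for repeatability
samples = [bernoulli_sample(p, rng) for _ in range(10)]
print(samples)
```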

Is a probability mass function a probability?

The value of a probability mass function at a given point is a probability: it is the probability that a discrete random variable will be exactly equal to that particular value. In other words, the probability mass function assigns a particular probability to every possible value of a discrete random variable.