How do I fit a GLM in Python?

Using Maximum Likelihood and Gradient Descent to fit GLMs from scratch in Python


In ordinary linear regression, we treat our outcome variable as a linear combination of several input variables plus some random noise, typically assumed to be Normally distributed. While this modeling approach is easily interpreted, efficiently implemented, and capable of accurately capturing many linear relationships, it does come with several significant limitations. The Generalized Linear Model (GLM), a simple extension of the linear model, relaxes some of linear regression's strictest assumptions. These assumptions include:

  1. Linearity between the outcome and input variables
  2. Normal distribution of error terms
  3. Constant variance of error terms

Relaxing these assumptions allows us to fit much more flexible models to a much broader range of data types.

GLMs can be easily fit with a few lines of code in languages like R or Python, but to understand how a model works, it's always helpful to get under the hood and code it up yourself. This article shows how to implement GLMs from scratch using only Python's NumPy package. For more on the basics and intuition behind GLMs, check out this article or this book.

GLM Structure

Fitting a GLM first requires specifying two components: a random distribution for our outcome variable and a link function between the distribution’s mean parameter and its “linear predictor”.

The Random Component

The first step to building our GLM is identifying the distribution of the outcome variable. If the data has a binary response, we might want to use the Bernoulli or Binomial distributions. If we are working with count data, a Poisson model might be more useful. This distribution is typically assumed to come from the Exponential Family of distributions, which includes the Binomial, Poisson, Negative Binomial, Gamma, and Normal.

Each of these models can be expressed in terms of its mean parameter, 𝜇 = E(Y). For instance, we specify a binomial model as Y ~ Bin(n, p), which can also be written as Y ~ Bin(n, 𝜇/n). Once we estimate 𝜇, we model Y as coming from a distribution indexed by 𝜇̂ and our predicted value of Y is simply 𝜇̂.

In a GLM, we estimate 𝜇 as a non-linear function of a “linear predictor” 𝜂, which itself is a linear function of the data. The non-linear function connecting 𝜇 to 𝜂 is called the link function, and we determine it before model-fitting. The link function is written as a function of 𝜇, e.g. 𝜂 = g(𝜇).

Suppose we have the following training data where each x is a D-dimensional vector:

(x_1, y_1), (x_2, y_2), …, (x_N, y_N),    x_n ∈ ℝ^D

We first write 𝜂 as a linear function of x for each observation n = 1, … , N:

𝜂_n = β^T x_n,    n = 1, …, N

Then we connect 𝜂 to 𝜇 with the link function:

g(𝜇_n) = 𝜂_n,    or equivalently    𝜇_n = g⁻¹(𝜂_n)
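To make this concrete, here is a minimal NumPy sketch of the pieces so far: the linear predictor 𝜂 = Xβ and an inverse link mapping 𝜂 to 𝜇. A log link is used purely for illustration, and the toy X_toy and beta_toy below are not from this article's later examples.

import numpy as np

# Illustrative inverse link: with a log link, g(mu) = log(mu), so g^{-1}(eta) = exp(eta)
def inverse_link(eta):
    return np.exp(eta)

X_toy = np.random.randn(5, 3)          # 5 observations, each a 3-dimensional x
beta_toy = np.array([0.5, -1.0, 2.0])  # coefficient vector

eta = X_toy @ beta_toy                 # linear predictor, one value per observation
mu = inverse_link(eta)                 # mean parameter for each observation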

Fitting a GLM

To fit the GLM, we are actually just finding estimates for the βs: from these, we obtain estimates of 𝜂, which leads immediately to an estimate for 𝜇, which then gives us an estimated distribution for Y! To estimate the βs, follow these steps:

  1. Specify the distribution of Y as a function of 𝜇.
  2. Specify the link function, 𝜂 = g(𝜇).
  3. Identify a loss function. Here, we use the negative log-likelihood. We can decompose the loss function into a function of each of the linear predictors and the corresponding true Y values, as shown below.
  4. Find the β values to minimize the loss function, either through a closed-form solution or with gradient descent.

L(β) = −log p(y_1, …, y_N | β) = −∑_n log p(y_n | 𝜂_n) = −∑_n log p(y_n | β^T x_n)
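Before specializing to a particular distribution, steps 3 and 4 can be summarized as a generic gradient-descent loop. The sketch below is only a skeleton: grad_neg_log_lik is a hypothetical function you would supply, and concrete versions of it are derived for the Normal, Bernoulli, and Poisson cases in the sections that follow.

import numpy as np

def fit_glm_gd(X, y, grad_neg_log_lik, learning_rate=1e-4, n_iter=10**5):
    # grad_neg_log_lik(beta, X, y) must return the gradient of the loss at beta
    beta_hat = np.random.randn(X.shape[1])       # initialize the coefficients
    for _ in range(n_iter):
        grad = grad_neg_log_lik(beta_hat, X, y)  # gradient of the negative log-likelihood
        beta_hat -= learning_rate * grad         # take a step downhill
    return beta_hat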

Linear Regression — A Special Case of the GLM

To reinforce our understanding of this structure, let’s first write out a typical linear regression model in GLM format. As step 1, let’s specify the distribution of Y. Recall that a typical linear model assumes

Y_n | x_n ~ N(β^T x_n, σ²),    independently for n = 1, …, N

where β is a length-D vector of coefficients (this assumes we’ve added a 1 to each x so the first element in β is the intercept term). Note that the mean of this distribution is a linear combination of the data, meaning we could write this model in terms of our linear predictor by letting

𝜂_n = β^T x_n

Then for step 2, we need to find the function linking 𝜇 and 𝜂. In the case of linear regression, it’s simple. Since E(Y) = 𝜇 and the mean of our modeled Y is 𝜂, we have 𝜂 = g(𝜇) = 𝜇!

Step 3: let’s find the negative log-likelihood.

L(β) = −∑_n log N(y_n ; 𝜂_n, σ²) = (1 / 2σ²) ∑_n (y_n − β^T x_n)² + constant

Note that our loss function is proportional to the sum of the squared errors.

Finally for step 4, let’s see if we can minimize this loss function analytically. Start by taking the derivative with respect to β and setting it equal to 0.

∂L/∂β = −(1/σ²) ∑_n (y_n − β^T x_n) x_n = −(1/σ²) X^T (y − Xβ) = 0    ⟹    β̂ = (X^T X)⁻¹ X^T y

This gives the closed-form solution we know and love from ordinary linear regression.

Now for some simple coding. Let's randomly generate some normally distributed Y values and fit the model. The scatterplot below shows that our fitted values for β are quite close to the true values.

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
np.random.seed(123)
## GENERATE STUFF ##
# Generate X
N = 1000 # number of observations
D = 10 # dimension of each observation
X = np.random.randn(N, D-1) # (minus 1 because we have to add the intercept)
intercept = np.ones(N).reshape(N, 1)
X = np.concatenate((intercept, X), axis = 1)
# Generate true beta
beta = np.random.randn(D)
# Generate response as function of X and beta
y = X @ beta + np.random.randn(N)
## FIT STUFF ##
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y
## PREDICT STUFF ##
y_hat = X @ beta_hat
## PLOT ##
fig, ax = plt.subplots()
sns.scatterplot(x=beta, y=beta_hat, ax=ax)
ax.set(xlabel = 'True Betas', ylabel = 'Estimated Betas', title = 'Beta vs. Beta Hats');

[Figure: scatterplot of estimated β̂ against true β for the linear regression fit]
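As a quick sanity check (not part of the original fit), the closed-form estimate can be compared against NumPy's built-in least-squares solver, which should return essentially the same coefficients:

# Cross-check the normal-equations solution with np.linalg.lstsq
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_hat, beta_lstsq))  # expect True up to numerical precision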

Logistic Regression — A GLM for Binary Data

In logistic regression, we model our outputs as independent Bernoulli trials. I.e. we assume

Y_n | x_n ~ Bernoulli(p_n),    independently for n = 1, …, N

That completes step 1. For step 2, we must find a way to relate our linear predictor 𝜂 to our parameter p. Since p is between 0 and 1 and 𝜂 can be any real number, a natural choice is the log-odds. I.e.,

𝜂_n = g(p_n) = log( p_n / (1 − p_n) )

Inversely, we use the sigmoid function to get from 𝜂 to p (which I will call S):

p_n = S(𝜂_n) = 1 / (1 + e^(−𝜂_n))
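As a quick numerical check that the sigmoid really does invert the log-odds (the values of p below are purely illustrative):

import numpy as np

p = np.array([0.1, 0.5, 0.9])
eta = np.log(p / (1 - p))                        # log-odds link
print(np.allclose(1 / (1 + np.exp(-eta)), p))    # sigmoid recovers p -> True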

This wraps up step 2. For step 3, find the negative log likelihood.

L(β) = −∑_n [ y_n log S(𝜂_n) + (1 − y_n) log(1 − S(𝜂_n)) ]
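If you want to monitor convergence during the gradient-descent fit below, this loss is easy to evaluate directly; here is one possible NumPy implementation (the small eps guarding the logs is my addition, not part of the derivation):

import numpy as np

def bernoulli_neg_log_lik(beta, X, y, eps=1e-12):
    p = 1 / (1 + np.exp(-(X @ beta)))   # S(eta) for every observation
    return -np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))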

This gives us our loss function and finishes step 3. For step 4, we find the values of β to minimize this loss. Unfortunately, in the logistic regression case, there is no closed-form solution, so we must use gradient descent. So, let’s find the derivative of the loss function with respect to β. First, note that S’(x) = S(x)(1-S(x)):

∂/∂β [ −y_n log S(𝜂_n) − (1 − y_n) log(1 − S(𝜂_n)) ] = [ (1 − y_n) S(𝜂_n) − y_n (1 − S(𝜂_n)) ] x_n

Then we can derive the entire gradient:

∇_β L = ∑_n [ (1 − y_n) S(𝜂_n) − y_n (1 − S(𝜂_n)) ] x_n

To speed up calculations in Python, we can also write this as

∇_β L = X^T (b − a)

where

a = y ⊙ (1 − p̂)

and

b = (1 − y) ⊙ p̂,    where p̂ = S(Xβ) applied elementwise and ⊙ denotes elementwise multiplication.

Now let’s fit the model using gradient descent. Again, the scatterplot below shows that our fitted values for β are quite close to the true values.

np.random.seed(123)
## GENERATE Ys ##
# Generate response as a function of the same X and beta,
# only now it's a Bernoulli r.v.
def sigmoid(x):
    return 1/(1 + np.exp(-x))
p = sigmoid(X @ beta)
y = np.random.binomial(1, p)
## FIT WITH GD ##
nruns = 10**5
learning_rate = 0.0001
beta_hat = np.random.randn(D) # initialize beta_hat estimates
p_hat = sigmoid(X @ beta_hat) # initialize p_hats
for run in range(nruns):
    # gradient of the negative log-likelihood: X^T (p_hat - y)
    a, b = y*(1-p_hat), (1-y)*p_hat
    grad = X.T @ (b-a)
    # adjust beta hats
    beta_hat -= learning_rate*grad
    # adjust p hats
    p_hat = sigmoid(X @ beta_hat)
## PLOT ##
fig, ax = plt.subplots()
sns.scatterplot(x=beta, y=beta_hat, ax=ax)
ax.set(xlabel = 'True Betas', ylabel = 'Estimated Betas', title = 'Beta vs. Beta Hats');

[Figure: scatterplot of estimated β̂ against true β for the logistic regression fit]
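As an optional cross-check (not in the original workflow), the same model can be fit with statsmodels' Logit; its maximum-likelihood estimates should land close to the gradient-descent values above, since X already contains an intercept column:

import statsmodels.api as sm

logit_fit = sm.Logit(y, X).fit(disp=0)   # maximum-likelihood logistic regression
print(logit_fit.params)                  # compare with beta_hat from gradient descent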

Poisson Regression — A GLM for Count Data

The Poisson distribution is a great way to model data that occurs in counts, such as accidents on a highway or deaths by horse kick.

Step 1: Suppose we have

Y_n | x_n ~ Poisson(λ_n),    independently for n = 1, …, N

For step 2, we specify the link function, which must map the non-negative rate parameter λ to a linear predictor 𝜂 ∈ ℝ. A common choice is the log link,

𝜂_n = g(λ_n) = log(λ_n)

which of course has inverse

λ_n = e^(𝜂_n)

Now for step 3, find the negative log-likelihood.

L(β) = −∑_n [ y_n 𝜂_n − e^(𝜂_n) − log(y_n!) ] = ∑_n [ e^(β^T x_n) − y_n β^T x_n + log(y_n!) ]

And for step 4, we optimize over β. Taking the gradient of the negative log-likelihood gives

∇_β L = ∑_n ( e^(β^T x_n) − y_n ) x_n = X^T (λ − y)

Sadly, there is no closed-form solution, so again we turn to gradient descent, as implemented below. Once again, the estimated parameters are plotted against the true parameters and once again the model does pretty well.

np.random.seed(123)
## GENERATE Ys ##
# Generate response as a function of the same X and beta,
# only now it's a Poisson r.v.
beta = np.random.randn(D)/2
lam = np.exp(X @ beta)
y = np.random.poisson(lam = lam)
## FIT WITH GD ##
nruns = 10**5
learning_rate = 0.00001
beta_hat = np.random.randn(D)/2 # initialize beta_hat estimates
lam_hat = np.exp(X @ beta_hat) # initialize lam_hats
for run in range(nruns):
    # gradient of the negative log-likelihood: X^T (lam_hat - y)
    c = lam_hat - y
    grad = X.T @ c
    # adjust beta hats
    beta_hat -= learning_rate*grad
    # adjust lambda hats
    lam_hat = np.exp(X @ beta_hat)
## PLOT ##
fig, ax = plt.subplots()
sns.scatterplot(x=beta, y=beta_hat, ax=ax)
ax.set(xlabel = 'True Betas', ylabel = 'Estimated Betas', title = 'Beta vs. Beta Hats');

[Figure: scatterplot of estimated β̂ against true β for the Poisson regression fit]
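In practice, the same Poisson model can be fit in a couple of lines with statsmodels (a sketch assuming the X and y from this section are still in scope):

import statsmodels.api as sm

# Poisson GLM with the default log link; X already includes the intercept column
glm_fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(glm_fit.params)     # should be close to beta_hat from gradient descent
print(glm_fit.summary())  # coefficient estimates, standard errors, deviance, etc.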

When building GLMs in practice, R's glm command and statsmodels' GLM function in Python are easy to use and efficiently implemented. But becoming familiar with the structure of a GLM is essential for parameter tuning and model selection. When it comes to modeling, often the best way to understand what's under the hood is to build the car yourself.
