Calculate p-value for logistic regression in Python

I am building a multinomial logistic regression with sklearn (LogisticRegression). But after it finishes, how can I get p-values and confidence intervals for my model? It appears that sklearn only provides the coefficients and the intercept.

Thanks a lot.

asked Nov 28, 2016 at 17:10

answered Nov 28, 2016 at 17:23

Hobbes

One way to get confidence intervals is to bootstrap your data, say, $B$ times, and fit a logistic regression model $m_i$ to each bootstrap dataset $B_i$ for $i = 1, 2, \ldots, B$. This gives you an empirical distribution for each parameter you are estimating, from which you can read off confidence intervals (for example, the 2.5th and 97.5th percentiles for a 95% interval).
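The bootstrap procedure above can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions: the synthetic data from `make_classification`, $B = 200$ resamples, and the 95% percentile interval are all choices made for the example. Because `LogisticRegression` applies an L2 penalty by default, the intervals describe the penalized coefficients.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Illustrative synthetic binary data (an assumption for this sketch)
X, y = make_classification(n_samples=300, n_features=3, n_informative=2,
                           n_redundant=0, random_state=0)

B = 200                              # number of bootstrap resamples (assumed)
rng = np.random.default_rng(0)
coefs = np.empty((B, X.shape[1]))

for i in range(B):
    # Bootstrap dataset B_i: n rows drawn with replacement
    idx = rng.integers(0, len(y), size=len(y))
    m_i = LogisticRegression().fit(X[idx], y[idx])
    coefs[i] = m_i.coef_[0]

# 95% percentile interval for each coefficient
lower, upper = np.percentile(coefs, [2.5, 97.5], axis=0)
for j, (lo, hi) in enumerate(zip(lower, upper)):
    print(f"coef[{j}]: [{lo:.3f}, {hi:.3f}]")
```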

answered Nov 28, 2016 at 19:00

darXider

This is still not implemented and not planned, as it seems out of scope for sklearn, as per GitHub discussions #6773 and #13048.

However, the documentation on linear models now mentions the following (see the P-value estimation note):

  • It is theoretically possible to get p-values and confidence intervals for coefficients in cases of regression without penalization.
  • The statsmodels package natively supports this.
  • Within sklearn, one could use bootstrapping.

It appears that it is possible to modify the LinearRegression class to calculate p-values using linear algebra, as per this GitHub code.
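The linear-algebra idea can also be applied to a logistic regression fitted with scikit-learn. The sketch below uses a hypothetical helper (`logit_pvalues` is not a scikit-learn function) and assumes a binary target and an effectively unpenalized fit via a very large `C`: it estimates the coefficient covariance matrix as the inverse of the Fisher information $X^\top W X$, with $W = \operatorname{diag}(\hat p_i(1-\hat p_i))$, and turns the Wald z-statistics into two-sided p-values.

```python
import numpy as np
from scipy.stats import norm
from sklearn.linear_model import LogisticRegression

def logit_pvalues(model, X):
    """Hypothetical helper: Wald p-values for a fitted binary LogisticRegression.

    Assumes the model has an intercept and was fit with a negligible penalty.
    """
    p = model.predict_proba(X)[:, 1]
    # Design matrix with a leading column of ones for the intercept
    X_design = np.column_stack([np.ones(len(X)), X])
    # Fisher information X^T W X with W = diag(p * (1 - p))
    fisher = X_design.T @ (X_design * (p * (1 - p))[:, None])
    cov = np.linalg.inv(fisher)
    se = np.sqrt(np.diag(cov))
    beta = np.concatenate([model.intercept_, model.coef_[0]])
    z = beta / se
    pvalues = 2 * norm.sf(np.abs(z))  # two-sided test of H0: beta_j = 0
    return beta, se, pvalues

# Illustrative data (assumption): only the first feature matters
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = (rng.random(500) < 1 / (1 + np.exp(-(0.5 + 1.5 * X[:, 0])))).astype(int)

# Very large C makes the L2 penalty negligible
model = LogisticRegression(C=1e10, max_iter=1000).fit(X, y)
beta, se, pvalues = logit_pvalues(model, X)
for name, b, s, pv in zip(["intercept", "x1", "x2"], beta, se, pvalues):
    print(f"{name}: coef={b:.3f}, se={s:.3f}, p={pv:.4f}")
```

In line with the documentation note above, this only makes sense without penalization; with the default `C=1.0` the coefficients are shrunk and these Wald p-values are not valid.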

answered Mar 7, 2020 at 19:14

lcrmorin

In this Python tutorial, we will learn about scikit-learn logistic regression and work through examples covering the following topics:

  • Scikit-learn logistic regression
  • Scikit-learn logistic regression standard errors
  • Scikit-learn logistic regression coefficients
  • Scikit-learn logistic regression p value
  • Scikit-learn logistic regression feature importance
  • Scikit-learn logistic regression categorical variables
  • Scikit-learn logistic regression cross-validation
  • Scikit-learn logistic regression threshold

In this section, we will learn how to work with logistic regression in scikit-learn.

  • Logistic regression is a statistical method for predicting binary classes; in other words, it is used when the dependent variable is dichotomous.
  • Dichotomous means there are two possible classes, such as the binary classes 0 and 1.
  • Although it is a regression model for probabilities, logistic regression is mainly used for classification: it computes the probability that an event occurs.
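To make "computes the probability" concrete: for a binary target, the model passes a linear score through the logistic (sigmoid) function, $P(y=1 \mid x) = 1/(1 + e^{-(\beta_0 + \beta^\top x)})$. The sketch below, which assumes synthetic binary data from `make_classification`, checks that `predict_proba` equals the sigmoid of `decision_function`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Illustrative binary data (assumption)
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
model = LogisticRegression().fit(X, y)

# Linear score beta_0 + beta^T x for each sample
scores = model.decision_function(X)
# The sigmoid maps the score to P(y = 1 | x)
manual_proba = 1 / (1 + np.exp(-scores))

print(np.allclose(manual_proba, model.predict_proba(X)[:, 1]))  # prints True
```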

Code:

In this code, we import the load_digits dataset from sklearn. The dataset ships with sklearn, so we do not need to download it.

from sklearn.datasets import load_digits
digits = load_digits()

With the data loaded, the commands below show that there are 1797 images and 1797 labels in the dataset.

print('Image Data Shape', digits.data.shape)
print('Label Data Shape', digits.target.shape)

In the following output, we can see the image data shape, (1797, 64), and the label data shape, (1797,), printed on the screen. Each image is stored as a flat vector of 8×8 = 64 pixels.

Importing dataset value

In this part, we look at a few images and their labels, which helps us get a feel for the data.

  • plot.figure(figsize=(30, 4)) creates a figure that is 30 inches wide and 4 inches tall.
  • for index, (image, label) in enumerate(zip(digits.data[5:10], digits.target[5:10])): loops over five images (samples 5 to 9) together with their labels and positions.
  • plot.subplot(1, 5, index + 1) selects the subplot at position index + 1 in a grid of 1 row and 5 columns.
  • plot.imshow(np.reshape(image, (8, 8)), cmap=plot.cm.gray) reshapes each 64-pixel vector into an 8×8 image and shows it in grayscale.
  • plot.title('Set: %i\n' % label, fontsize=30) gives each image a title showing its label.
import numpy as np
import matplotlib.pyplot as plot

plot.figure(figsize=(30, 4))
for index, (image, label) in enumerate(zip(digits.data[5:10], digits.target[5:10])):
    plot.subplot(1, 5, index + 1)
    # Each sample is a flat vector of 64 pixels; reshape it to 8x8 for display
    plot.imshow(np.reshape(image, (8, 8)), cmap=plot.cm.gray)
    plot.title('Set: %i\n' % label, fontsize=30)
plot.show()

After running the above code, we get the following output, in which the five images are plotted with the titles Set: 5, Set: 6, Set: 7, Set: 8 and Set: 9.

Enumerate digits target set

In the following code, we split the data into a training set and a test set.

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(digits.data, digits.target, test_size=0.25, random_state=0)

Here we import LogisticRegression from sklearn, which lets us focus on modeling the dataset rather than implementing the algorithm ourselves.

from sklearn.linear_model import LogisticRegression

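In the code below, we create an instance of the model. All parameters that are not specified are set to their defaults.

# Default settings; on this data the lbfgs solver may warn about convergence, in which case increase max_iter
logisticRegression = LogisticRegression()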

Having split the data into training and test sets, we first train the model on the training data; afterwards, we evaluate it on the test data.

logisticRegression.fit(x_train, y_train)

Once the model is trained, we can predict the label of a single observation. The prediction is returned as a NumPy array; reshape(1, -1) turns the single sample into a 2D array with one row, as predict expects.

logisticRegression.predict(x_test[0].reshape(1, -1))

In the following output, we see that a NumPy array is returned after predicting one observation.

Return the array

The code below predicts multiple observations at once.

logisticRegression.predict(x_test[0:10])

The following code predicts the entire test set.

logisticRegression.predict(x_test)

To check how well the trained model performs, we measure its accuracy on the test set with the score method.

predictions = logisticRegression.predict(x_test)
score = logisticRegression.score(x_test, y_test)
print(score)

In this output, we can see the accuracy of the model as computed by the score method.

Predict the accuracy of a model
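For classifiers, score returns the mean accuracy. As a small self-contained sketch, the same number can be obtained with `sklearn.metrics.accuracy_score` or by counting correct predictions by hand. The example rebuilds the digits split from above; `max_iter=1000` is an assumption added only to avoid a convergence warning.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

digits = load_digits()
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=1000).fit(x_train, y_train)
predictions = model.predict(x_test)

score = model.score(x_test, y_test)          # built-in mean accuracy
acc = accuracy_score(y_test, predictions)    # same metric from sklearn.metrics
manual = np.mean(predictions == y_test)      # fraction of correct predictions

print(score, acc, manual)
```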


Scikit-learn logistic regression standard errors

As we know, logistic regression is a statistical method for predicting binary classes, and it is used when the dependent variable is dichotomous.

Here we work on the standard errors of a logistic regression. The standard errors of the model coefficients are the square roots of the diagonal entries of the coefficients' covariance matrix.
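For an unpenalized logistic regression, that covariance matrix is commonly estimated as the inverse of the Fisher information. Written out, with design matrix $X$ (including a column of ones for the intercept) and fitted probabilities $\hat p_i$:

$$
\widehat{\operatorname{Cov}}(\hat\beta) = \left(X^\top W X\right)^{-1}, \qquad W = \operatorname{diag}\big(\hat p_1(1-\hat p_1), \ldots, \hat p_n(1-\hat p_n)\big)
$$

$$
\operatorname{SE}(\hat\beta_j) = \sqrt{\left[\left(X^\top W X\right)^{-1}\right]_{jj}}
$$

scikit-learn does not report these values; statsmodels exposes them as the `bse` attribute of a fitted `Logit` result.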

Code:

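In the following code, we use sklearn's mean_squared_error to compute the mean squared error (MSE) and root mean squared error (RMSE) between true and predicted values. Note that these metrics measure prediction error; they are not the standard errors of the model coefficients described above.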

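# root_mean_squared_error requires scikit-learn >= 1.4 (older versions used squared=False)
from sklearn.metrics import mean_squared_error, root_mean_squared_error

# Single-output example
y_true = [4, -0.6, 3, 8]
y_pred = [3.5, 0.1, 3, 9]
print(mean_squared_error(y_true, y_pred))       # MSE, about 0.435
print(root_mean_squared_error(y_true, y_pred))  # RMSE, about 0.66

# Multi-output example: each column is a separate output
y_true = [[0.6, 2], [-2, 2], [8, -7]]
y_pred = [[1, 3], [-1, 3], [7, -6]]
print(mean_squared_error(y_true, y_pred))       # average of per-output MSEs, about 0.86
print(root_mean_squared_error(y_true, y_pred))  # average of per-output RMSEs, about 0.924
print(mean_squared_error(y_true, y_pred, multioutput='raw_values'))  # per-output MSEs, about [0.72, 1.0]
print(mean_squared_error(y_true, y_pred, multioutput=[0.3, 0.7]))    # weighted average, about 0.916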

Output:

After running the above code, we get the following output, in which the error values are printed on the screen.

Scikit-learn logistic regression standard error


Scikit-learn logistic regression coefficients

In this section, we will learn how to work with logistic regression coefficients in scikit-learn.

A coefficient is the number by which a term (here, a feature) is multiplied. In logistic regression, the coefficients express the size and direction of each feature's effect on the log-odds of the outcome.
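Because the coefficients are on the log-odds scale, exponentiating them gives odds ratios: the multiplicative change in the odds of the positive class for a one-unit increase in a feature. A minimal sketch, assuming synthetic binary data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Illustrative binary data (assumption)
X, y = make_classification(n_samples=300, n_features=3, n_informative=2,
                           n_redundant=0, random_state=1)
model = LogisticRegression().fit(X, y)

coef = model.coef_[0]          # size and sign of each feature's effect on the log-odds
odds_ratios = np.exp(coef)     # > 1: odds of y = 1 increase; < 1: they decrease

for j, (b, ratio) in enumerate(zip(coef, odds_ratios)):
    direction = "increases" if b > 0 else "decreases"
    print(f"feature {j}: coef={b:.3f}, odds ratio={ratio:.3f} ({direction} the odds)")
```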

Code:

In the following code, we import pandas, numpy, scikit-learn's LogisticRegression, and statsmodels.

  • The pandas library is used for data manipulation, and numpy is used for working with arrays.
  • The sklearn library lets us focus on modeling the data rather than on manipulating it.
  • x = np.random.randint(0, 7, size=n) generates n random integers from 0 to 6 (the upper bound is exclusive).
  • res_sd = sd.Logit(y, x).fit(method="ncg", maxiter=max_iter) fits the logistic regression with statsmodels, using the Newton conjugate gradient solver.
  • print(res_sl.coef_) prints the scikit-learn coefficient on the screen.

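import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression
import statsmodels.api as sd

n = 250

# n random integers in 0..6, and a binary target that is 1 when x exceeds a noisy threshold
x = np.random.randint(0, 7, size=n)
y = (x > (0.10 + np.random.normal(0, 0.10, n))).astype(int)

# Cross-tabulate target against feature (print also works outside Jupyter, unlike display)
print(pd.crosstab(y, x))

max_iter = 150

# statsmodels fit; note that sd.Logit adds no intercept unless you pass sd.add_constant(x)
res_sd = sd.Logit(y, x).fit(method="ncg", maxiter=max_iter)
print(res_sd.params)

# sklearn fit; a very large C makes the default L2 penalty negligible
# (multi_class is omitted: it is deprecated in recent scikit-learn releases and unnecessary for a binary target)
res_sl = LogisticRegression(solver='newton-cg', max_iter=max_iter, fit_intercept=True, C=1e8)
res_sl.fit(x.reshape(n, 1), y)
print(res_sl.coef_)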

Output:

After running the above code, we get the following output, in which the statsmodels parameters and the scikit-learn logistic regression coefficient are printed on the screen.

scikit learn logistic regression coefficient


Scikit-learn logistic regression p value

In this section, we will learn how to calculate the p-value of logistic regression in scikit-learn.

The p-value of a logistic regression coefficient is used to test the null hypothesis that the coefficient is equal to zero. The lowest p-value is
