Calculate p value logistic regression python

$\begingroup$

I am building a multinomial logistic regression with sklearn (LogisticRegression). After fitting it, how can I get p-values and confidence intervals for my model? It appears that sklearn only provides the coefficients and the intercept.

Thank you a lot.

asked Nov 28, 2016 at 17:10

$\endgroup$

$\begingroup$

answered Nov 28, 2016 at 17:23

Hobbes


$\endgroup$

$\begingroup$

One way to get confidence intervals is to bootstrap your data, say, $B$ times and fit logistic regression models $m_i$ to the dataset $B_i$ for $i = 1, 2, ..., B$. This gives you a distribution for the parameters you are estimating, from which you can find the confidence intervals.
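The bootstrap procedure described above could be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the synthetic data from `make_classification`, the choice of `B = 200` resamples, the large `C` used to approximate an unpenalized fit, and the percentile method for the interval are all assumptions made here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic binary data, used purely for illustration.
X, y = make_classification(n_samples=300, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)

B = 200  # number of bootstrap resamples (assumed value)
boot_coefs = np.empty((B, X.shape[1]))

for i in range(B):
    # Draw n rows with replacement to form the bootstrap dataset B_i.
    idx = rng.integers(0, len(y), size=len(y))
    # A large C makes the L2 penalty negligible (approximately unpenalized).
    m_i = LogisticRegression(C=1e6, max_iter=1000).fit(X[idx], y[idx])
    boot_coefs[i] = m_i.coef_.ravel()

# Percentile 95% confidence interval for each coefficient.
ci_lower, ci_upper = np.percentile(boot_coefs, [2.5, 97.5], axis=0)
for j, (lo, hi) in enumerate(zip(ci_lower, ci_upper)):
    print(f"coef {j}: 95% CI [{lo:.3f}, {hi:.3f}]")
```

For a multinomial model, `coef_` has one row per class, so the same idea applies with the coefficients stored per class instead of flattened into a single row.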

answered Nov 28, 2016 at 19:00

darXider


$\endgroup$

$\begingroup$

This is still not implemented, and it is not planned because it seems out of scope for sklearn, as per GitHub discussions #6773 and #13048.

However, the documentation on linear models now mentions (P-value estimation note):

It is theoretically possible to get p-values and confidence intervals for coefficients in cases of regression without penalization.
The statsmodels package natively supports this.
Within sklearn, one could use bootstrapping.

It appears that it is possible to modify the LinearRegression class to calculate p-values from linear algebra, as per this Github code.

answered Mar 7, 2020 at 19:14

lcrmorin


$\endgroup$

In this Python tutorial, we will learn about scikit-learn logistic regression and work through several related examples. We will cover these topics:

Scikit-learn logistic regression
Scikit-learn logistic regression standard errors
Scikit-learn logistic regression coefficients
Scikit-learn logistic regression p value
Scikit-learn logistic regression feature importance
Scikit-learn logistic regression categorical variables
Scikit-learn logistic regression cross-validation
Scikit-learn logistic regression threshold

In this section, we will learn how to work with logistic regression in scikit-learn.

Logistic regression is a statistical method for predicting binary classes. It is used when the dependent variable is dichotomous.
Dichotomous means there are two possible classes, such as the binary classes 0 and 1.
Despite its name, logistic regression is used for classification. It computes the probability that an event occurs.
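To make "computes the probability of an event" concrete, here is a small sketch of the logistic (sigmoid) function applied to a linear score. The intercept and coefficient values are hypothetical, chosen only for illustration, and the 0.5 cutoff is the usual default rather than something stated in the text above.

```python
import numpy as np

def sigmoid(z):
    """Map a linear score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical fitted parameters for a single feature.
intercept, coef = -1.0, 2.0

x = np.array([0.0, 0.5, 1.0])
probabilities = sigmoid(intercept + coef * x)

# Assign class 1 when the predicted probability is at least 0.5.
predicted_class = (probabilities >= 0.5).astype(int)

print(probabilities)
print(predicted_class)
```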

Code:

In this code, we load the digits dataset with load_digits from the sklearn library. The data ships with sklearn, so we do not need to download or upload anything.

from sklearn.datasets import load_digits
digits = load_digits()

With the data loaded, the commands below show that the dataset contains 1797 images and 1797 labels.

print('Image Data Shape', digits.data.shape)

print("Label Data Shape", digits.target.shape)

The output shows the Image Data Shape, (1797, 64), and the Label Data Shape, (1797,).

Importing dataset value

In this part, we look at a few of the images and their labels to get a feel for the data.

plot.figure(figsize=(30, 4)) creates the figure and sets its size.
for index, (image, label) in enumerate(zip(digits.data[5:10], digits.target[5:10])): loops over five images (indices 5 to 9) together with their labels.
plot.subplot(1, 5, index + 1) selects the next of five side-by-side subplots.
plot.imshow(np.reshape(image, (8, 8)), cmap=plot.cm.gray) reshapes the flat 64-value vector into an 8 by 8 image and displays it in grayscale.
plot.title('Set: %i\n' % label, fontsize=30) uses the label as the subplot title.

import numpy as np
import matplotlib.pyplot as plot

plot.figure(figsize=(30, 4))
for index, (image, label) in enumerate(zip(digits.data[5:10], digits.target[5:10])):
    plot.subplot(1, 5, index + 1)
    # Each image is stored as a flat vector of 64 pixel values.
    plot.imshow(np.reshape(image, (8, 8)), cmap=plot.cm.gray)
    plot.title('Set: %i\n' % label, fontsize=30)
plot.show()

After running the above code, five images are plotted on the screen with the titles Set: 5, Set: 6, Set: 7, Set: 8, and Set: 9.

Enumerate digits target set

In the following code, we split the data into two parts: training data and testing data.

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(digits.data, digits.target, test_size=0.25, random_state=0)

Here we import LogisticRegression from sklearn. Because sklearn handles the model fitting, we can focus on modeling the dataset.

from sklearn.linear_model import LogisticRegression

In the code below, we create an instance of the model. All parameters that are not specified are set to their defaults.

logisticRegression = LogisticRegression()

Above, we split the data into training and testing sets. We now train the model on the training data; afterwards we will evaluate it on the test data.

logisticRegression.fit(x_train, y_train)

Once trained, the model can predict the label of a single observation. The observation is reshaped into a 2D array with one row, and the prediction is returned as an array.

logisticRegression.predict(x_test[0].reshape(1, -1))

In the following output, we see the NumPy array is returned after predicting for one observation.

Return the array

The code below predicts multiple observations at once.

logisticRegression.predict(x_test[0:10])

The code below predicts the entire test set.

logisticRegression.predict(x_test)

After training, we want to know how well the model performs. The score method returns the accuracy of the model on the test data.

predictions = logisticRegression.predict(x_test)
score = logisticRegression.score(x_test, y_test)
print(score)

The output shows the accuracy of the model as returned by the score method.

Predict the accuracy of a model
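The scoring step above can be checked by hand. The sketch below repeats the digits workflow in one self-contained block and compares `score` with the share of correct predictions. The raised `max_iter` and the added confusion matrix are choices made here for illustration, not part of the original walkthrough.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

digits = load_digits()
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

# max_iter is raised above the default (an assumption) to help the solver converge.
model = LogisticRegression(max_iter=5000)
model.fit(x_train, y_train)

predictions = model.predict(x_test)
score = model.score(x_test, y_test)

# score() is the mean accuracy: the share of predictions that match the labels.
manual_accuracy = np.mean(predictions == y_test)
print(score, manual_accuracy)

# Rows are true digits, columns are predicted digits.
cm = confusion_matrix(y_test, predictions)
print(cm)
```

The off-diagonal entries of the confusion matrix show which digits are mistaken for which, something a single accuracy number hides.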


Scikit-learn logistic regression standard errors

As we know, logistic regression is a statistical method for predicting binary classes, used when the dependent variable is dichotomous.

Here we look at standard errors in logistic regression. The standard errors of the model coefficients are the square roots of the diagonal entries of the coefficient covariance matrix.

Code:

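The following code computes the mean squared error of predictions with sklearn.metrics.mean_squared_error. Note that this is a prediction error metric; it is not the standard error of the logistic regression coefficients, which is the square root of the diagonal entries of the covariance matrix.

from sklearn.metrics import mean_squared_error

# Single-output example.
y_true = [4, -0.6, 3, 8]
y_pred = [3.5, 0.1, 3, 9]
print(mean_squared_error(y_true, y_pred))
# approximately 0.435

# squared=False returns the root mean squared error. Recent scikit-learn
# releases deprecate this argument in favor of root_mean_squared_error.
print(mean_squared_error(y_true, y_pred, squared=False))
# approximately 0.66

# Multi-output example: each column is a separate target.
y_true = [[0.6, 2], [-2, 2], [8, -7]]
y_pred = [[1, 3], [-1, 3], [7, -6]]
print(mean_squared_error(y_true, y_pred))
# approximately 0.86
print(mean_squared_error(y_true, y_pred, squared=False))
# approximately 0.92
print(mean_squared_error(y_true, y_pred, multioutput='raw_values'))
# approximately array([0.72, 1.0]), one value per output
print(mean_squared_error(y_true, y_pred, multioutput=[0.3, 0.7]))
# approximately 0.916, the weighted average of the per-output values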

Output:

After running the above code, the error values are printed on the screen.

Scikit-learn logistic regression standard error
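The definition given in this section (standard errors as the square roots of the diagonal of the covariance matrix) can be sketched for a binary model as follows. This is an illustration under stated assumptions: the data is synthetic, the very large `C` is used to approximate an unpenalized fit, the covariance matrix is taken as the inverse of the Fisher information, and the p-values come from a two-sided Wald test. scikit-learn does not provide these quantities itself.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic binary data for illustration only.
n = 500
X = rng.normal(size=(n, 2))
true_logit = 0.5 + 1.5 * X[:, 0]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_logit)))

# A very large C makes the penalty negligible (approximately unpenalized).
model = LogisticRegression(C=1e8, max_iter=1000).fit(X, y)

# Design matrix with a leading column of ones for the intercept.
X_design = np.column_stack([np.ones(n), X])
params = np.concatenate([model.intercept_, model.coef_.ravel()])

# Fisher information X^T W X with W = diag(p * (1 - p)).
p = model.predict_proba(X)[:, 1]
fisher_info = X_design.T @ (X_design * (p * (1 - p))[:, None])
cov = np.linalg.inv(fisher_info)

std_errors = np.sqrt(np.diag(cov))
z_scores = params / std_errors
p_values = 2 * stats.norm.sf(np.abs(z_scores))  # two-sided Wald test

print(std_errors)
print(p_values)
```

With a penalized fit (the scikit-learn default), these formulas no longer give valid standard errors, which is why the penalty is effectively switched off here.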


Scikit-learn logistic regression coefficients

In this section, we will learn how to work with logistic regression coefficients in scikit-learn.

A coefficient is the number by which the value of a term is multiplied. In logistic regression, a coefficient expresses the size and direction of a variable's effect.

Code:

In the following code, we import the libraries pandas as pd, numpy as np, and sklearn as sl.

The pandas library is used for data manipulation and numpy is used for working with arrays.
The sklearn library is used for modeling the data rather than manipulating it.
x = np.random.randint(0, 7, size=n) generates n random integers from 0 to 6.
res_sd = sd.Logit(y, x).fit(method="ncg", maxiter=max_iter) fits a logistic regression with statsmodels.
print(res_sl.coef_) prints the sklearn coefficients on the screen.


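import pandas as pd
import numpy as np
import sklearn as sl
from sklearn.linear_model import LogisticRegression
import statsmodels.api as sd

n = 250

# Random integer feature and a binary response derived from it.
x = np.random.randint(0, 7, size=n)
y = (x > (0.10 + np.random.normal(0, 0.10, n))).astype(int)

# Cross-tabulate the response against the feature.
print(pd.crosstab(y, x))

max_iter = 150

# statsmodels fit. Note that Logit does not add an intercept
# unless a constant column is added to x.
res_sd = sd.Logit(y, x).fit(method="ncg", maxiter=max_iter)
print(res_sd.params)

# sklearn fit. C=1e8 makes the penalty negligible. The multi_class
# argument is deprecated in recent scikit-learn releases.
res_sl = LogisticRegression(solver='newton-cg', multi_class='multinomial',
                            max_iter=max_iter, fit_intercept=True, C=1e8)
res_sl.fit(x.reshape(n, 1), y)
print(res_sl.coef_)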

Output:

After running the above code, the scikit-learn logistic regression coefficient is printed on the screen.

scikit learn logistic regression coefficient
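To illustrate how coefficients express size and direction, the sketch below fits a model on synthetic data where one feature raises the log-odds and the other lowers them, then converts the coefficients to odds ratios. The data-generating values and the odds-ratio step are assumptions added for this example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

n = 1000
X = rng.normal(size=(n, 2))
# Assumed true effects: feature 0 raises the log-odds, feature 1 lowers them.
true_logit = 1.0 * X[:, 0] - 2.0 * X[:, 1]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_logit)))

model = LogisticRegression(max_iter=1000).fit(X, y)
coefs = model.coef_.ravel()

# The sign gives the direction; the magnitude gives the change in log-odds
# for a one-unit increase in the feature.
print(coefs)

# exp(coefficient) is the multiplicative change in the odds per unit increase.
odds_ratios = np.exp(coefs)
print(odds_ratios)
```

An odds ratio above 1 means the feature increases the odds of class 1, and a value below 1 means it decreases them.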


Scikit-learn logistic regression p value

In this section, we will learn how to calculate the p-value of logistic regression in scikit-learn.

The p-value of a logistic regression coefficient is used to test the null hypothesis that the coefficient is equal to zero. The lowest p-value is
