Guide to Welch’s t-test in Python

What is it?

Welch’s t-test is a parametric univariate test that tests for a significant difference between the means of two unrelated (independent) groups. It is an alternative to the independent-samples (Student’s) t-test when the assumption of equal variances is violated.

The hypotheses being tested are:

Null hypothesis (H0): μ1 = μ2, i.e., the population mean of group 1 is equal to the population mean of group 2
Alternative hypothesis (HA): μ1 ≠ μ2, i.e., the population mean of group 1 is not equal to the population mean of group 2

If the p-value is less than the chosen significance level (α, most commonly 0.05), one can reject the null hypothesis.
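To make the decision rule concrete, here is a minimal sketch that computes Welch’s t statistic, the Welch-Satterthwaite degrees of freedom, and the two-sided p-value by hand with NumPy and SciPy’s t distribution, and then compares p with α. The helper name welch_t_from_scratch and the toy data are hypothetical; the hand-computed values are cross-checked against stats.ttest_ind(..., equal_var=False).

```python
import numpy as np
from scipy import stats


def welch_t_from_scratch(x, y):
    """Hypothetical helper: Welch's t, Welch-Satterthwaite df, two-sided p-value."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Squared standard error of each mean, using the sample variance (ddof=1)
    vx = x.var(ddof=1) / x.size
    vy = y.var(ddof=1) / y.size
    t = (x.mean() - y.mean()) / np.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (x.size - 1) + vy ** 2 / (y.size - 1))
    # Two-sided test (HA: mu1 != mu2): probability in both tails beyond |t|
    p = 2 * stats.t.sf(abs(t), df)
    return t, df, p


# Toy data (hypothetical): two groups with different sizes and spreads
rng = np.random.default_rng(42)
group1 = rng.normal(loc=10.0, scale=1.0, size=25)
group2 = rng.normal(loc=11.0, scale=3.0, size=40)

alpha = 0.05
t, df, p = welch_t_from_scratch(group1, group2)
print(f"t = {t:.4f}, df = {df:.2f}, p = {p:.4g}")
print("Reject H0" if p < alpha else "Fail to reject H0")

# Cross-check against SciPy's built-in Welch test
res = stats.ttest_ind(group1, group2, equal_var=False)
print(np.isclose(t, res.statistic), np.isclose(p, res.pvalue))
```

The two booleans printed at the end should both be True, since SciPy uses the same unpooled standard error and degrees of freedom when equal_var=False.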

Welch’s t-test Assumptions

Like every inferential statistical test, Welch’s t-test has assumptions. The data must meet the following assumptions for the test results to be valid:

The independent variable (IV) is categorical, and the test compares two of its levels (groups)
The dependent variable (DV) is continuous, measured on an interval or ratio scale
The observations are independent, both within and between the two groups
The dependent variable is approximately normally distributed within each group

If any of these assumptions is violated, another test should be used (for example, the nonparametric Mann-Whitney U test when normality clearly fails).
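A minimal sketch of how these checks could be automated, assuming the common (but debated) rule of checking normality with the Shapiro-Wilk test and equality of variances with Levene’s test before picking a test. The helper choose_two_group_test and its α cut-off are hypothetical and not part of SciPy; stats.shapiro and stats.levene are real SciPy functions. To keep the demo deterministic, the toy samples are evenly spaced quantiles of a normal or exponential distribution rather than random draws.

```python
import numpy as np
from scipy import stats


def choose_two_group_test(x, y, alpha=0.05):
    """Hypothetical rule of thumb for picking a two-group test."""
    # Shapiro-Wilk: a small p-value suggests the group is not normally distributed
    normal_x = stats.shapiro(x)[1] > alpha
    normal_y = stats.shapiro(y)[1] > alpha
    if not (normal_x and normal_y):
        return "Mann-Whitney U test"
    # Levene's test: a small p-value suggests the variances are unequal
    equal_var = stats.levene(x, y)[1] > alpha
    return "Student's t-test" if equal_var else "Welch's t-test"


# Deterministic toy samples built from distribution quantiles
probs = np.linspace(0.01, 0.99, 50)
normal_q = stats.norm.ppf(probs)
skewed_q = stats.expon.ppf(probs)

print(choose_two_group_test(normal_q, normal_q + 1.0))   # same spread, shifted mean
print(choose_two_group_test(normal_q, 10.0 * normal_q))  # very different spreads
print(choose_two_group_test(skewed_q, normal_q))         # one clearly skewed group
```

Indexing the results with [1] reads the p-value whether SciPy returns a plain tuple or a named result object.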

Data used in this example

The data used in this example is from Kaggle.com and was posted by the user Web IR. The link to the data set is here. The data set contains the sepal and petal length and width of three iris species. We will test whether there is a significant difference in petal length between the species Iris-setosa and Iris-virginica; the relevant columns are “petal_length” and “species”.

Let’s import pandas as pd, load the data, and take a look at what we will be working with.

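import pandas as pd

# Load the Iris data set downloaded from Kaggle
df = pd.read_csv("Iris_Data.csv")

# Summary statistics of petal length for each species
df.groupby("species")["petal_length"].describe()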

species	count	mean	std	min	25%	50%	75%	max
Iris-setosa	50.0	1.464	0.173511	1.0	1.4	1.50	1.575	1.9
Iris-versicolor	50.0	4.260	0.469911	3.0	4.0	4.35	4.600	5.1
Iris-virginica	50.0	5.552	0.551895	4.5	5.1	5.55	5.875	6.9

To make the code in the next steps easier to read, I will create two data frames that are subsets of the original data, each containing only the rows for one flower species.

# One data frame per species of interest
setosa = df[df["species"] == "Iris-setosa"]
virginica = df[df["species"] == "Iris-virginica"]

Welch’s t-test Example

The first thing we need to do is import the stats module from SciPy and then test our assumptions. We can test the assumption of normality using stats.shapiro(). Depending on the SciPy version, the output is either a plain, unlabeled tuple or a ShapiroResult with statistic and pvalue fields; in both cases the first value is the W test statistic and the second value is the p-value.

from scipy import stats

stats.shapiro(setosa['petal_length'])

(0.9549458622932434, 0.05464918911457062)

stats.shapiro(virginica['petal_length'])

(0.9621862769126892, 0.10977369546890259)

Neither group’s petal length deviates significantly from normality (both Shapiro-Wilk p-values are above 0.05, although Setosa’s p = 0.055 is borderline), so we can continue with our analysis plan. To conduct Welch’s t-test, call stats.ttest_ind() with the argument equal_var=False.

stats.ttest_ind(setosa['petal_length'], virginica['petal_length'], equal_var=False)

Ttest_indResult(statistic=-49.965703359355636, pvalue=9.7138670616970964e-50)

The p-value (about 9.7e-50) is far below 0.05, so we reject the null hypothesis in favor of the alternative.
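Because both groups in this example have 50 observations, it is worth knowing what equal_var=False actually changes. With equal group sizes, the pooled (Student) and unpooled (Welch) standard errors are algebraically identical, so the t statistic is the same; only the degrees of freedom, and therefore the p-value, differ. A small sketch on hypothetical toy data with equal sizes and very different spreads:

```python
import numpy as np
from scipy import stats

# Hypothetical toy data: equal group sizes, very different spreads
rng = np.random.default_rng(7)
a = rng.normal(loc=0.0, scale=1.0, size=20)
b = rng.normal(loc=1.0, scale=4.0, size=20)

student = stats.ttest_ind(a, b, equal_var=True)  # pooled variance, df = n1 + n2 - 2
welch = stats.ttest_ind(a, b, equal_var=False)   # unpooled variance, Welch-Satterthwaite df

print(f"Student: t = {student.statistic:.4f}, p = {student.pvalue:.4f}")
print(f"Welch:   t = {welch.statistic:.4f}, p = {welch.pvalue:.4f}")
# Same t statistic (equal n), but Welch uses fewer df, so its p-value is larger
```

With unequal group sizes the two statistics generally differ as well, because the pooled variance then weights each group by its size.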

Another piece of information you will need to report is the degrees of freedom (DoF). Older versions of SciPy do not report it (recent releases expose it as the df attribute of the result returned by stats.ttest_ind()), so below are two functions that compute it. The first only calculates and prints the DoF. The second conducts Welch’s test, calculates the DoF, and prints all the needed information. Note that the DoF is the same whether the test is one- or two-tailed.
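For reference, Welch’s t statistic and the Welch-Satterthwaite approximation to the DoF are shown below, where x̄_i, s_i^2, and n_i are the mean, sample variance (denominator n_i − 1), and size of group i. Both share the per-group terms s_i^2 / n_i, the squared standard error of each mean.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
\nu \approx \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}
{\dfrac{\left(s_1^2 / n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2 / n_2\right)^2}{n_2 - 1}}
```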

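def welch_dof(x, y):
    """Print and return the Welch-Satterthwaite degrees of freedom of samples x and y."""
    # Squared standard error of each mean; ddof=1 gives the sample variance
    # for both pandas Series and NumPy arrays
    vx = x.var(ddof=1) / x.size
    vy = y.var(ddof=1) / y.size
    dof = (vx + vy)**2 / (vx**2 / (x.size - 1) + vy**2 / (y.size - 1))
    print(f"Welch-Satterthwaite degrees of freedom = {dof:.4f}")
    return dof

dof = welch_dof(setosa['petal_length'], virginica['petal_length'])

Welch-Satterthwaite degrees of freedom = 58.5928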

def welch_ttest(x, y):
    """Run Welch's t-test on samples x and y and print t, p, and the DoF."""
    # Welch-Satterthwaite degrees of freedom (sample variances, ddof=1)
    vx = x.var(ddof=1) / x.size
    vy = y.var(ddof=1) / y.size
    dof = (vx + vy)**2 / (vx**2 / (x.size - 1) + vy**2 / (y.size - 1))

    # equal_var=False makes SciPy use Welch's unpooled standard error
    t, p = stats.ttest_ind(x, y, equal_var=False)

    print(f"Welch's t statistic = {t:.4f}")
    # Scientific notation, so a tiny p-value is not displayed as 0.0000
    print(f"p-value = {p:.4e}")
    print(f"Welch-Satterthwaite degrees of freedom = {dof:.4f}")

welch_ttest(setosa['petal_length'], virginica['petal_length'])

Welch's t statistic = -49.9657
p-value = 9.7139e-50
Welch-Satterthwaite degrees of freedom = 58.5928
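As a cross-check, Welch’s t statistic and DoF depend only on each group’s size, mean, and standard deviation, so they can be reproduced from the describe() table shown earlier. The helper welch_from_summary is hypothetical and uses only the Python standard library; because the table’s standard deviations are rounded, the results should match the printed output to about three decimal places rather than exactly.

```python
import math


def welch_from_summary(n1, mean1, sd1, n2, mean2, sd2):
    """Hypothetical helper: Welch's t and Welch-Satterthwaite df from summary statistics."""
    # Squared standard error of each group mean
    v1 = sd1 ** 2 / n1
    v2 = sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# count, mean, and std of petal_length for Iris-setosa and Iris-virginica,
# copied from the describe() output
t, df = welch_from_summary(50, 1.464, 0.173511, 50, 5.552, 0.551895)
print(f"t = {t:.4f}, df = {df:.4f}")
```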

Welch’s t-test Interpretation

The current study aimed to test whether there was a significant difference in petal length between the floral species Setosa and Virginica. Setosa had shorter petals (M = 1.464 cm, SD = 0.174 cm) than Virginica (M = 5.552 cm, SD = 0.552 cm). Welch’s t-test was selected to analyze the data because Levene’s test for homogeneity of variances indicated unequal variances between groups (F = 39.977, p < 0.0001). Petal length differed significantly between the two species (Welch’s t(58.593) = -49.966, p < 0.0001).
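When writing up a result like the paragraph above, a small formatting helper can prevent mistakes such as swapping the degrees of freedom and the t statistic. The sketch below is hypothetical and follows the common APA convention of putting the degrees of freedom in parentheses, dropping the leading zero of p, and writing very small p-values as p < .001; adjust the thresholds and decimal places to your own style guide.

```python
def format_welch_result(t, df, p):
    """Hypothetical helper: APA-style summary string for a Welch's t-test."""
    if p < 0.001:
        p_text = "p < .001"
    else:
        # APA drops the leading zero for values that cannot exceed 1
        p_text = f"p = {p:.3f}".replace("0.", ".", 1)
    return f"Welch's t({df:.2f}) = {t:.2f}, {p_text}"


# Values from the Setosa vs. Virginica test
print(format_welch_result(-49.965703359355636, 58.5928, 9.7138670616970964e-50))
# Welch's t(58.59) = -49.97, p < .001
```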
