Guide to dummy variables in Python

A dataset can contain different types of values, and sometimes these include categorical values. To use those categorical values effectively in a program, we create dummy variables. A dummy variable is a binary variable that indicates whether a given categorical variable takes a specific value or not.
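As a minimal sketch of this idea, the snippet below encodes a single categorical column. The column name `temperature` and its values (`hot`, `cold`, `warm`) are hypothetical sample data, not taken from the original article.

```python
import pandas as pd

# Hypothetical sample data: one categorical column with three distinct values.
df = pd.DataFrame({"temperature": ["hot", "cold", "warm", "cold"]})

# One indicator column per distinct value; dtype=int yields 0/1 instead of booleans.
dummies = pd.get_dummies(df["temperature"], dtype=int)
print(dummies)
```

Each row has exactly one 1, in the column that matches its original category.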

For example, a temperature attribute with three categorical values yields three dummy variables, one per value. In Python, dummy variables can be created with the pandas get_dummies() function.

Syntax: pandas.get_dummies(data, prefix=None, prefix_sep='_')

Parameters:

• data = the input data, e.g. a pandas DataFrame, a list, a NumPy array, etc.
• prefix = the string placed at the start of each dummy column name (the "initial value").
• prefix_sep = the separator placed between the prefix and the data value (default '_').

Return type: dummy variables.
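To illustrate how prefix and prefix_sep shape the resulting column names, here is a small sketch. The `temperature` column, its values, and the `temp` prefix are hypothetical choices made for this example only.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"temperature": ["hot", "cold", "warm"]})

# Encoding a Series with an explicit prefix and a custom separator.
named = pd.get_dummies(df["temperature"], prefix="temp", prefix_sep="-", dtype=int)
print(list(named.columns))

# Encoding a whole DataFrame: by default the column name is used as the prefix.
defaulted = pd.get_dummies(df, dtype=int)
print(list(defaulted.columns))
```

When a DataFrame is passed, the default prefix is the source column name, which helps keep dummy columns from different attributes apart.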
Step-by-step approach:

• Import the necessary modules
• Consider the data
• Perform operations on the data to get the dummies

Example 1:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    my_data = np.array([[5, 'a', 1],
                        [3, 'b', 3],
                        [1, 'b', 2],
                        [3, 'a', 1],
                        [4, 'b', 2],
                        [7, 'c', 1],
                        [7, 'c', 1]])

    df = pd.DataFrame(data=my_data, columns=['y', 'dummy', 'x'])
    just_dummies = pd.get_dummies(df['dummy'])

    step_1 = pd.concat([df, just_dummies], axis=1)
    # To run the regression we drop the string column 'dummy', and we drop one
    # dummy variable to avoid the dummy variable trap. "c" is chosen arbitrarily,
    # so the coefficients on "a" and "b" show their effects relative to "c".
    step_1 = step_1.drop(['dummy', 'c'], axis=1)
    # The mixed-type NumPy array stores every value as a string, so cast all
    # columns to int before fitting.
    step_1 = step_1.astype(int)

    result = sm.OLS(step_1['y'], sm.add_constant(step_1[['x', 'a', 'b']])).fit()
    print(result.summary())
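The regression example drops one dummy column by hand to avoid the dummy variable trap. As a hedged alternative sketch, pandas can drop a level during encoding via the drop_first parameter of get_dummies. Note that this drops the first level ("a" here), so the baseline category differs from the hand-picked "c" above. The data is rebuilt as a plain DataFrame for this illustration.

```python
import pandas as pd

# Same values as the regression example, built directly as typed columns.
df = pd.DataFrame({
    "y": [5, 3, 1, 3, 4, 7, 7],
    "dummy": ["a", "b", "b", "a", "b", "c", "c"],
    "x": [1, 3, 2, 1, 2, 1, 1],
})

# Encode only the 'dummy' column and drop its first level ('a') as the baseline.
encoded = pd.get_dummies(df, columns=["dummy"], drop_first=True, dtype=int)
print(encoded.columns.tolist())
```

With this encoding, coefficients on dummy_b and dummy_c would be read relative to category "a".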
    
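    # Alternative without get_dummies: add one boolean column per distinct
    # value of the 'Category' column of an existing DataFrame named dfrm.
    for elem in dfrm['Category'].unique():
        dfrm[str(elem)] = dfrm['Category'] == elem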
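The loop above assumes that a DataFrame named dfrm with a 'Category' column already exists. A self-contained sketch, using hypothetical sample values, shows the loop in action and checks that it agrees with get_dummies:

```python
import pandas as pd

# Hypothetical sample data for the 'Category' column.
dfrm = pd.DataFrame({"Category": ["x", "y", "x", "z"]})

# Add one boolean indicator column per distinct category value.
for elem in dfrm["Category"].unique():
    dfrm[str(elem)] = dfrm["Category"] == elem

# Compare against pandas' built-in encoding of the same column.
builtin = pd.get_dummies(dfrm["Category"]).astype(bool)
same = (dfrm[["x", "y", "z"]].to_numpy() == builtin[["x", "y", "z"]].to_numpy()).all()
print(dfrm)
print("Matches get_dummies:", same)
```

The manual loop is easy to follow, while get_dummies handles prefixes and multiple columns in a single call.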
    

    Python3

    Thực hiện các hoạt động trên dữ liệu để lấy người giả

    Ví dụ 1: & nbsp;

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    
    my_data = np.array([[5, 'a', 1],
                        [3, 'b', 3],
                        [1, 'b', 2],
                        [3, 'a', 1],
                        [4, 'b', 2],
                        [7, 'c', 1],
                        [7, 'c', 1]])                
    
    
    df = pd.DataFrame(data=my_data, columns=['y', 'dummy', 'x'])
    just_dummies = pd.get_dummies(df['dummy'])
    
    step_1 = pd.concat([df, just_dummies], axis=1)      
    step_1.drop(['dummy', 'c'], inplace=True, axis=1)
    # to run the regression we want to get rid of the strings 'a', 'b', 'c' (obviously)
    # and we want to get rid of one dummy variable to avoid the dummy variable trap
    # arbitrarily chose "c", coefficients on "a" an "b" would show effect of "a" and "b"
    # relative to "c"
    step_1 = step_1.applymap(np.int) 
    
    result = sm.OLS(step_1['y'], sm.add_constant(step_1[['x', 'a', 'b']])).fit()
    print result.summary()
    
    3
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    
    my_data = np.array([[5, 'a', 1],
                        [3, 'b', 3],
                        [1, 'b', 2],
                        [3, 'a', 1],
                        [4, 'b', 2],
                        [7, 'c', 1],
                        [7, 'c', 1]])                
    
    
    df = pd.DataFrame(data=my_data, columns=['y', 'dummy', 'x'])
    just_dummies = pd.get_dummies(df['dummy'])
    
    step_1 = pd.concat([df, just_dummies], axis=1)      
    step_1.drop(['dummy', 'c'], inplace=True, axis=1)
    # to run the regression we want to get rid of the strings 'a', 'b', 'c' (obviously)
    # and we want to get rid of one dummy variable to avoid the dummy variable trap
    # arbitrarily chose "c", coefficients on "a" an "b" would show effect of "a" and "b"
    # relative to "c"
    step_1 = step_1.applymap(np.int) 
    
    result = sm.OLS(step_1['y'], sm.add_constant(step_1[['x', 'a', 'b']])).fit()
    print result.summary()
    
    4
    for elem in dfrm['Category'].unique():
        dfrm[str(elem)] = dfrm['Category'] == elem
    
    0
    for elem in dfrm['Category'].unique():
        dfrm[str(elem)] = dfrm['Category'] == elem
    
    1
    for elem in dfrm['Category'].unique():
        dfrm[str(elem)] = dfrm['Category'] == elem
    
    2
    for elem in dfrm['Category'].unique():
        dfrm[str(elem)] = dfrm['Category'] == elem
    
    3
    for elem in dfrm['Category'].unique():
        dfrm[str(elem)] = dfrm['Category'] == elem
    
    4

    Output:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    
    my_data = np.array([[5, 'a', 1],
                        [3, 'b', 3],
                        [1, 'b', 2],
                        [3, 'a', 1],
                        [4, 'b', 2],
                        [7, 'c', 1],
                        [7, 'c', 1]])                
    
    
    df = pd.DataFrame(data=my_data, columns=['y', 'dummy', 'x'])
    just_dummies = pd.get_dummies(df['dummy'])
    
    step_1 = pd.concat([df, just_dummies], axis=1)      
    step_1.drop(['dummy', 'c'], inplace=True, axis=1)
    # to run the regression we want to get rid of the strings 'a', 'b', 'c' (obviously)
    # and we want to get rid of one dummy variable to avoid the dummy variable trap
    # arbitrarily chose "c", coefficients on "a" an "b" would show effect of "a" and "b"
    # relative to "c"
    step_1 = step_1.applymap(np.int) 
    
    result = sm.OLS(step_1['y'], sm.add_constant(step_1[['x', 'a', 'b']])).fit()
    print result.summary()
    
    3
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    
    my_data = np.array([[5, 'a', 1],
                        [3, 'b', 3],
                        [1, 'b', 2],
                        [3, 'a', 1],
                        [4, 'b', 2],
                        [7, 'c', 1],
                        [7, 'c', 1]])                
    
    
    df = pd.DataFrame(data=my_data, columns=['y', 'dummy', 'x'])
    just_dummies = pd.get_dummies(df['dummy'])
    
    step_1 = pd.concat([df, just_dummies], axis=1)      
    step_1.drop(['dummy', 'c'], inplace=True, axis=1)
    # to run the regression we want to get rid of the strings 'a', 'b', 'c' (obviously)
    # and we want to get rid of one dummy variable to avoid the dummy variable trap
    # arbitrarily chose "c", coefficients on "a" an "b" would show effect of "a" and "b"
    # relative to "c"
    step_1 = step_1.applymap(np.int) 
    
    result = sm.OLS(step_1['y'], sm.add_constant(step_1[['x', 'a', 'b']])).fit()
    print result.summary()
    
    63
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    
    my_data = np.array([[5, 'a', 1],
                        [3, 'b', 3],
                        [1, 'b', 2],
                        [3, 'a', 1],
                        [4, 'b', 2],
                        [7, 'c', 1],
                        [7, 'c', 1]])                
    
    
    df = pd.DataFrame(data=my_data, columns=['y', 'dummy', 'x'])
    just_dummies = pd.get_dummies(df['dummy'])
    
    step_1 = pd.concat([df, just_dummies], axis=1)      
    step_1.drop(['dummy', 'c'], inplace=True, axis=1)
    # to run the regression we want to get rid of the strings 'a', 'b', 'c' (obviously)
    # and we want to get rid of one dummy variable to avoid the dummy variable trap
    # arbitrarily chose "c", coefficients on "a" an "b" would show effect of "a" and "b"
    # relative to "c"
    step_1 = step_1.applymap(np.int) 
    
    result = sm.OLS(step_1['y'], sm.add_constant(step_1[['x', 'a', 'b']])).fit()
    print result.summary()
    
    6
     

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    
    my_data = np.array([[5, 'a', 1],
                        [3, 'b', 3],
                        [1, 'b', 2],
                        [3, 'a', 1],
                        [4, 'b', 2],
                        [7, 'c', 1],
                        [7, 'c', 1]])                
    
    
    df = pd.DataFrame(data=my_data, columns=['y', 'dummy', 'x'])
    just_dummies = pd.get_dummies(df['dummy'])
    
    step_1 = pd.concat([df, just_dummies], axis=1)      
    step_1.drop(['dummy', 'c'], inplace=True, axis=1)
    # to run the regression we want to get rid of the strings 'a', 'b', 'c' (obviously)
    # and we want to get rid of one dummy variable to avoid the dummy variable trap
    # arbitrarily chose "c", coefficients on "a" an "b" would show effect of "a" and "b"
    # relative to "c"
    step_1 = step_1.applymap(np.int) 
    
    result = sm.OLS(step_1['y'], sm.add_constant(step_1[['x', 'a', 'b']])).fit()
    print result.summary()
    
    7
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    
    my_data = np.array([[5, 'a', 1],
                        [3, 'b', 3],
                        [1, 'b', 2],
                        [3, 'a', 1],
                        [4, 'b', 2],
                        [7, 'c', 1],
                        [7, 'c', 1]])                
    
    
    df = pd.DataFrame(data=my_data, columns=['y', 'dummy', 'x'])
    just_dummies = pd.get_dummies(df['dummy'])
    
    step_1 = pd.concat([df, just_dummies], axis=1)      
    step_1.drop(['dummy', 'c'], inplace=True, axis=1)
    # to run the regression we want to get rid of the strings 'a', 'b', 'c' (obviously)
    # and we want to get rid of one dummy variable to avoid the dummy variable trap
    # arbitrarily chose "c", coefficients on "a" an "b" would show effect of "a" and "b"
    # relative to "c"
    step_1 = step_1.applymap(np.int) 
    
    result = sm.OLS(step_1['y'], sm.add_constant(step_1[['x', 'a', 'b']])).fit()
    print result.summary()
    
    8
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    
    my_data = np.array([[5, 'a', 1],
                        [3, 'b', 3],
                        [1, 'b', 2],
                        [3, 'a', 1],
                        [4, 'b', 2],
                        [7, 'c', 1],
                        [7, 'c', 1]])                
    
    
    df = pd.DataFrame(data=my_data, columns=['y', 'dummy', 'x'])
    just_dummies = pd.get_dummies(df['dummy'])
    
    step_1 = pd.concat([df, just_dummies], axis=1)      
    step_1.drop(['dummy', 'c'], inplace=True, axis=1)
    # to run the regression we want to get rid of the strings 'a', 'b', 'c' (obviously)
    # and we want to get rid of one dummy variable to avoid the dummy variable trap
    # arbitrarily chose "c", coefficients on "a" an "b" would show effect of "a" and "b"
    # relative to "c"
    step_1 = step_1.applymap(np.int) 
    
    result = sm.OLS(step_1['y'], sm.add_constant(step_1[['x', 'a', 'b']])).fit()
    print result.summary()
    
    9
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    
    my_data = np.array([[5, 'a', 1],
                        [3, 'b', 3],
                        [1, 'b', 2],
                        [3, 'a', 1],
                        [4, 'b', 2],
                        [7, 'c', 1],
                        [7, 'c', 1]])                
    
    
    df = pd.DataFrame(data=my_data, columns=['y', 'dummy', 'x'])
    just_dummies = pd.get_dummies(df['dummy'])
    
    step_1 = pd.concat([df, just_dummies], axis=1)      
    step_1.drop(['dummy', 'c'], inplace=True, axis=1)
    # to run the regression we want to get rid of the strings 'a', 'b', 'c' (obviously)
    # and we want to get rid of one dummy variable to avoid the dummy variable trap
    # arbitrarily chose "c", coefficients on "a" an "b" would show effect of "a" and "b"
    # relative to "c"
    step_1 = step_1.applymap(np.int) 
    
    result = sm.OLS(step_1['y'], sm.add_constant(step_1[['x', 'a', 'b']])).fit()
    print result.summary()
    
    20
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    
    my_data = np.array([[5, 'a', 1],
                        [3, 'b', 3],
                        [1, 'b', 2],
                        [3, 'a', 1],
                        [4, 'b', 2],
                        [7, 'c', 1],
                        [7, 'c', 1]])                
    
    
    df = pd.DataFrame(data=my_data, columns=['y', 'dummy', 'x'])
    just_dummies = pd.get_dummies(df['dummy'])
    
    step_1 = pd.concat([df, just_dummies], axis=1)      
    step_1.drop(['dummy', 'c'], inplace=True, axis=1)
    # to run the regression we want to get rid of the strings 'a', 'b', 'c' (obviously)
    # and we want to get rid of one dummy variable to avoid the dummy variable trap
    # arbitrarily chose "c", coefficients on "a" an "b" would show effect of "a" and "b"
    # relative to "c"
    step_1 = step_1.applymap(np.int) 
    
    result = sm.OLS(step_1['y'], sm.add_constant(step_1[['x', 'a', 'b']])).fit()
    print result.summary()
    
    21
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    
    my_data = np.array([[5, 'a', 1],
                        [3, 'b', 3],
                        [1, 'b', 2],
                        [3, 'a', 1],
                        [4, 'b', 2],
                        [7, 'c', 1],
                        [7, 'c', 1]])                
    
    
    df = pd.DataFrame(data=my_data, columns=['y', 'dummy', 'x'])
    just_dummies = pd.get_dummies(df['dummy'])
    
    step_1 = pd.concat([df, just_dummies], axis=1)      
    step_1.drop(['dummy', 'c'], inplace=True, axis=1)
    # to run the regression we want to get rid of the strings 'a', 'b', 'c' (obviously)
    # and we want to get rid of one dummy variable to avoid the dummy variable trap
    # arbitrarily chose "c", coefficients on "a" an "b" would show effect of "a" and "b"
    # relative to "c"
    step_1 = step_1.applymap(np.int) 
    
    result = sm.OLS(step_1['y'], sm.add_constant(step_1[['x', 'a', 'b']])).fit()
    print result.summary()
    
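    # Manual alternative: add one boolean column per distinct value of 'Category'
    # (assumes dfrm is an existing DataFrame with a 'Category' column).
    for elem in dfrm['Category'].unique():
        dfrm[str(elem)] = dfrm['Category'] == elem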
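
The loop above can be tried on a small frame and compared with `pd.get_dummies`. This is a minimal sketch: the contents of `dfrm` are made up for illustration, since the original snippet does not show how it is built.

```python
import pandas as pd

# Hypothetical data; only the column name 'Category' comes from the snippet above.
dfrm = pd.DataFrame({"Category": ["red", "green", "red", "blue"]})

# Manual approach: one boolean column per distinct value,
# added in order of first appearance.
manual = dfrm.copy()
for elem in manual["Category"].unique():
    manual[str(elem)] = manual["Category"] == elem

# Built-in approach: get_dummies returns the indicator columns
# with the category labels sorted.
auto = pd.get_dummies(dfrm["Category"], dtype=bool)

# Both approaches should mark exactly the same rows for each category.
same = all((manual[col] == auto[col]).all() for col in auto.columns)
print(manual.columns.tolist())
print(auto.columns.tolist())
print(same)
```

The manual loop keeps the original column and adds the indicators in place, while `get_dummies` returns a new frame, so the two differ in column order and in whether `Category` is retained.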
    



    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    
    my_data = np.array([[5, 'a', 1],
                        [3, 'b', 3],
                        [1, 'b', 2],
                        [3, 'a', 1],
                        [4, 'b', 2],
                        [7, 'c', 1],
                        [7, 'c', 1]])                
    
    
    df = pd.DataFrame(data=my_data, columns=['y', 'dummy', 'x'])
    just_dummies = pd.get_dummies(df['dummy'])
    
    step_1 = pd.concat([df, just_dummies], axis=1)      
    step_1.drop(['dummy', 'c'], inplace=True, axis=1)
    # to run the regression we want to get rid of the strings 'a', 'b', 'c' (obviously)
    # and we want to get rid of one dummy variable to avoid the dummy variable trap
    # arbitrarily chose "c", coefficients on "a" an "b" would show effect of "a" and "b"
    # relative to "c"
    step_1 = step_1.applymap(np.int) 
    
    result = sm.OLS(step_1['y'], sm.add_constant(step_1[['x', 'a', 'b']])).fit()
    print result.summary()
    
    26
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    
    my_data = np.array([[5, 'a', 1],
                        [3, 'b', 3],
                        [1, 'b', 2],
                        [3, 'a', 1],
                        [4, 'b', 2],
                        [7, 'c', 1],
                        [7, 'c', 1]])                
    
    
    df = pd.DataFrame(data=my_data, columns=['y', 'dummy', 'x'])
    just_dummies = pd.get_dummies(df['dummy'])
    
    step_1 = pd.concat([df, just_dummies], axis=1)      
    step_1.drop(['dummy', 'c'], inplace=True, axis=1)
    # to run the regression we want to get rid of the strings 'a', 'b', 'c' (obviously)
    # and we want to get rid of one dummy variable to avoid the dummy variable trap
    # arbitrarily chose "c", coefficients on "a" an "b" would show effect of "a" and "b"
    # relative to "c"
    step_1 = step_1.applymap(np.int) 
    
    result = sm.OLS(step_1['y'], sm.add_constant(step_1[['x', 'a', 'b']])).fit()
    print result.summary()
    
    23
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    
    my_data = np.array([[5, 'a', 1],
                        [3, 'b', 3],
                        [1, 'b', 2],
                        [3, 'a', 1],
                        [4, 'b', 2],
                        [7, 'c', 1],
                        [7, 'c', 1]])                
    
    
    df = pd.DataFrame(data=my_data, columns=['y', 'dummy', 'x'])
    just_dummies = pd.get_dummies(df['dummy'])
    
    step_1 = pd.concat([df, just_dummies], axis=1)      
    step_1.drop(['dummy', 'c'], inplace=True, axis=1)
    # to run the regression we want to get rid of the strings 'a', 'b', 'c' (obviously)
    # and we want to get rid of one dummy variable to avoid the dummy variable trap
    # arbitrarily chose "c", coefficients on "a" an "b" would show effect of "a" and "b"
    # relative to "c"
    step_1 = step_1.applymap(np.int) 
    
    result = sm.OLS(step_1['y'], sm.add_constant(step_1[['x', 'a', 'b']])).fit()
    print result.summary()
    
    24____295 trong gấu trúc hiện hoạt động hoàn toàn tốt. Điều này có nghĩa là những điều sau đây sẽ hoạt động:: Since others seem to be coming across this, the

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    
    my_data = np.array([[5, 'a', 1],
                        [3, 'b', 3],
                        [1, 'b', 2],
                        [3, 'a', 1],
                        [4, 'b', 2],
                        [7, 'c', 1],
                        [7, 'c', 1]])                
    
    
    df = pd.DataFrame(data=my_data, columns=['y', 'dummy', 'x'])
    just_dummies = pd.get_dummies(df['dummy'])
    
    step_1 = pd.concat([df, just_dummies], axis=1)      
    step_1.drop(['dummy', 'c'], inplace=True, axis=1)
    # to run the regression we want to get rid of the strings 'a', 'b', 'c' (obviously)
    # and we want to get rid of one dummy variable to avoid the dummy variable trap
    # arbitrarily chose "c", coefficients on "a" an "b" would show effect of "a" and "b"
    # relative to "c"
    step_1 = step_1.applymap(np.int) 
    
    result = sm.OLS(step_1['y'], sm.add_constant(step_1[['x', 'a', 'b']])).fit()
    print result.summary()
    
    3
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    
    my_data = np.array([[5, 'a', 1],
                        [3, 'b', 3],
                        [1, 'b', 2],
                        [3, 'a', 1],
                        [4, 'b', 2],
                        [7, 'c', 1],
                        [7, 'c', 1]])                
    
    
    df = pd.DataFrame(data=my_data, columns=['y', 'dummy', 'x'])
    just_dummies = pd.get_dummies(df['dummy'])
    
    step_1 = pd.concat([df, just_dummies], axis=1)      
    step_1.drop(['dummy', 'c'], inplace=True, axis=1)
    # to run the regression we want to get rid of the strings 'a', 'b', 'c' (obviously)
    # and we want to get rid of one dummy variable to avoid the dummy variable trap
    # arbitrarily chose "c", coefficients on "a" an "b" would show effect of "a" and "b"
    # relative to "c"
    step_1 = step_1.applymap(np.int) 
    
    result = sm.OLS(step_1['y'], sm.add_constant(step_1[['x', 'a', 'b']])).fit()
    print result.summary()
    
    4
    for elem in dfrm['Category'].unique():
        dfrm[str(elem)] = dfrm['Category'] == elem
    
    0
    for elem in dfrm['Category'].unique():
        dfrm[str(elem)] = dfrm['Category'] == elem
    
    15
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    
    my_data = np.array([[5, 'a', 1],
                        [3, 'b', 3],
                        [1, 'b', 2],
                        [3, 'a', 1],
                        [4, 'b', 2],
                        [7, 'c', 1],
                        [7, 'c', 1]])                
    
    
    df = pd.DataFrame(data=my_data, columns=['y', 'dummy', 'x'])
    just_dummies = pd.get_dummies(df['dummy'])
    
    step_1 = pd.concat([df, just_dummies], axis=1)      
    step_1.drop(['dummy', 'c'], inplace=True, axis=1)
    # to run the regression we want to get rid of the strings 'a', 'b', 'c' (obviously)
    # and we want to get rid of one dummy variable to avoid the dummy variable trap
    # arbitrarily chose "c", coefficients on "a" an "b" would show effect of "a" and "b"
    # relative to "c"
    step_1 = step_1.applymap(np.int) 
    
    result = sm.OLS(step_1['y'], sm.add_constant(step_1[['x', 'a', 'b']])).fit()
    print result.summary()
    
    21
    for elem in dfrm['Category'].unique():
        dfrm[str(elem)] = dfrm['Category'] == elem
    
    10
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    
    my_data = np.array([[5, 'a', 1],
                        [3, 'b', 3],
                        [1, 'b', 2],
                        [3, 'a', 1],
                        [4, 'b', 2],
                        [7, 'c', 1],
                        [7, 'c', 1]])                
    
    
    df = pd.DataFrame(data=my_data, columns=['y', 'dummy', 'x'])
    just_dummies = pd.get_dummies(df['dummy'])
    
    step_1 = pd.concat([df, just_dummies], axis=1)      
    step_1.drop(['dummy', 'c'], inplace=True, axis=1)
    # to run the regression we want to get rid of the strings 'a', 'b', 'c' (obviously)
    # and we want to get rid of one dummy variable to avoid the dummy variable trap
    # arbitrarily chose "c", coefficients on "a" an "b" would show effect of "a" and "b"
    # relative to "c"
    step_1 = step_1.applymap(np.int) 
    
    result = sm.OLS(step_1['y'], sm.add_constant(step_1[['x', 'a', 'b']])).fit()
    print result.summary()
    
    23
    for elem in dfrm['Category'].unique():
        dfrm[str(elem)] = dfrm['Category'] == elem
    
    58
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    
    my_data = np.array([[5, 'a', 1],
                        [3, 'b', 3],
                        [1, 'b', 2],
                        [3, 'a', 1],
                        [4, 'b', 2],
                        [7, 'c', 1],
                        [7, 'c', 1]])                
    
    
    df = pd.DataFrame(data=my_data, columns=['y', 'dummy', 'x'])
    just_dummies = pd.get_dummies(df['dummy'])
    
    step_1 = pd.concat([df, just_dummies], axis=1)      
    step_1.drop(['dummy', 'c'], inplace=True, axis=1)
    # to run the regression we want to get rid of the strings 'a', 'b', 'c' (obviously)
    # and we want to get rid of one dummy variable to avoid the dummy variable trap
    # arbitrarily chose "c", coefficients on "a" an "b" would show effect of "a" and "b"
    # relative to "c"
    step_1 = step_1.applymap(np.int) 
    
    result = sm.OLS(step_1['y'], sm.add_constant(step_1[['x', 'a', 'b']])).fit()
    print result.summary()
    
    23
    for elem in dfrm['Category'].unique():
        dfrm[str(elem)] = dfrm['Category'] == elem
    
    58
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    
    my_data = np.array([[5, 'a', 1],
                        [3, 'b', 3],
                        [1, 'b', 2],
                        [3, 'a', 1],
                        [4, 'b', 2],
                        [7, 'c', 1],
                        [7, 'c', 1]])                
    
    
    df = pd.DataFrame(data=my_data, columns=['y', 'dummy', 'x'])
    just_dummies = pd.get_dummies(df['dummy'])
    
    step_1 = pd.concat([df, just_dummies], axis=1)      
    step_1.drop(['dummy', 'c'], inplace=True, axis=1)
    # to run the regression we want to get rid of the strings 'a', 'b', 'c' (obviously)
    # and we want to get rid of one dummy variable to avoid the dummy variable trap
    # arbitrarily chose "c", coefficients on "a" an "b" would show effect of "a" and "b"
    # relative to "c"
    step_1 = step_1.applymap(np.int) 
    
    result = sm.OLS(step_1['y'], sm.add_constant(step_1[['x', 'a', 'b']])).fit()
    print result.summary()
    
    29
    for elem in dfrm['Category'].unique():
        dfrm[str(elem)] = dfrm['Category'] == elem
    
    0
    for elem in dfrm['Category'].unique():
        dfrm[str(elem)] = dfrm['Category'] == elem
    
    04
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    
    my_data = np.array([[5, 'a', 1],
                        [3, 'b', 3],
                        [1, 'b', 2],
                        [3, 'a', 1],
                        [4, 'b', 2],
                        [7, 'c', 1],
                        [7, 'c', 1]])                
    
    
    df = pd.DataFrame(data=my_data, columns=['y', 'dummy', 'x'])
    just_dummies = pd.get_dummies(df['dummy'])
    
    step_1 = pd.concat([df, just_dummies], axis=1)      
    step_1.drop(['dummy', 'c'], inplace=True, axis=1)
    # to run the regression we want to get rid of the strings 'a', 'b', 'c' (obviously)
    # and we want to get rid of one dummy variable to avoid the dummy variable trap
    # arbitrarily chose "c", coefficients on "a" an "b" would show effect of "a" and "b"
    # relative to "c"
    step_1 = step_1.applymap(np.int) 
    
    result = sm.OLS(step_1['y'], sm.add_constant(step_1[['x', 'a', 'b']])).fit()
    print result.summary()
    
    21
    for elem in dfrm['Category'].unique():
        dfrm[str(elem)] = dfrm['Category'] == elem
    
    06
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    
    my_data = np.array([[5, 'a', 1],
                        [3, 'b', 3],
                        [1, 'b', 2],
                        [3, 'a', 1],
                        [4, 'b', 2],
                        [7, 'c', 1],
                        [7, 'c', 1]])                
    
    
    df = pd.DataFrame(data=my_data, columns=['y', 'dummy', 'x'])
    just_dummies = pd.get_dummies(df['dummy'])
    
    step_1 = pd.concat([df, just_dummies], axis=1)      
    step_1.drop(['dummy', 'c'], inplace=True, axis=1)
    # to run the regression we want to get rid of the strings 'a', 'b', 'c' (obviously)
    # and we want to get rid of one dummy variable to avoid the dummy variable trap
    # arbitrarily chose "c", coefficients on "a" an "b" would show effect of "a" and "b"
    # relative to "c"
    step_1 = step_1.applymap(np.int) 
    
    result = sm.OLS(step_1['y'], sm.add_constant(step_1[['x', 'a', 'b']])).fit()
    print result.summary()
    
    23
    for elem in dfrm['Category'].unique():
        dfrm[str(elem)] = dfrm['Category'] == elem
    
    08
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    
    my_data = np.array([[5, 'a', 1],
                        [3, 'b', 3],
                        [1, 'b', 2],
                        [3, 'a', 1],
                        [4, 'b', 2],
                        [7, 'c', 1],
                        [7, 'c', 1]])                
    
    
    df = pd.DataFrame(data=my_data, columns=['y', 'dummy', 'x'])
    just_dummies = pd.get_dummies(df['dummy'])
    
    step_1 = pd.concat([df, just_dummies], axis=1)      
    step_1.drop(['dummy', 'c'], inplace=True, axis=1)
    # to run the regression we want to get rid of the strings 'a', 'b', 'c' (obviously)
    # and we want to get rid of one dummy variable to avoid the dummy variable trap
    # arbitrarily chose "c", coefficients on "a" an "b" would show effect of "a" and "b"
    # relative to "c"
    step_1 = step_1.applymap(np.int) 
    
    result = sm.OLS(step_1['y'], sm.add_constant(step_1[['x', 'a', 'b']])).fit()
    print result.summary()
    
    23
    for elem in dfrm['Category'].unique():
        dfrm[str(elem)] = dfrm['Category'] == elem
    
    60
    for elem in dfrm['Category'].unique():
        dfrm[str(elem)] = dfrm['Category'] == elem
    
    61
    for elem in dfrm['Category'].unique():
        dfrm[str(elem)] = dfrm['Category'] == elem
    
    2
    for elem in dfrm['Category'].unique():
        dfrm[str(elem)] = dfrm['Category'] == elem
    
    3
    for elem in dfrm['Category'].unique():
        dfrm[str(elem)] = dfrm['Category'] == elem
    
    4

    I'm trying to create a series of dummy variables from a categorical variable using pandas in Python. I came across the get_dummies function, but whenever I try to call it I get an error saying that the name is not defined.

    Any thoughts, or other ways to create the dummy variables, would be appreciated.

    Edit: Since others seem to be coming across this, the get_dummies function in pandas now works perfectly fine, so calling it directly on the categorical column should work.

    See http://blog.yhathq.com/posts/logistic-regression-and-python.html for more information.
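A minimal sketch of the call the edit refers to. The DataFrame `df`, its `Category` column, and the sample values are made up for illustration; only `pd.get_dummies` itself comes from the thread.

```python
import pandas as pd

# Illustrative data; the column name "Category" is an assumption.
df = pd.DataFrame({"Category": ["a", "b", "a", "c"]})

# One indicator column per distinct value of the categorical column.
dummies = pd.get_dummies(df["Category"])
print(dummies)
```

Depending on the pandas version, the indicator columns hold booleans or 0/1 integers; either way each row has exactly one "on" value.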

    piRSquared

    Asked Jul 20, 2012 at 22:33

    When I think of dummy variables, I think of using them in the context of OLS regression, and I would do something like this:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    my_data = np.array([[5, 'a', 1],
                        [3, 'b', 3],
                        [1, 'b', 2],
                        [3, 'a', 1],
                        [4, 'b', 2],
                        [7, 'c', 1],
                        [7, 'c', 1]])

    df = pd.DataFrame(data=my_data, columns=['y', 'dummy', 'x'])
    just_dummies = pd.get_dummies(df['dummy'])

    step_1 = pd.concat([df, just_dummies], axis=1)
    step_1.drop(['dummy', 'c'], inplace=True, axis=1)
    # to run the regression we want to get rid of the strings 'a', 'b', 'c' (obviously)
    # and we want to get rid of one dummy variable to avoid the dummy variable trap
    # arbitrarily chose "c", coefficients on "a" and "b" would show effect of "a" and "b"
    # relative to "c"
    # the mixed-type array stores every value as a string, so cast all columns to int
    step_1 = step_1.astype(int)

    result = sm.OLS(step_1['y'], sm.add_constant(step_1[['x', 'a', 'b']])).fit()
    print(result.summary())

    Answered May 29, 2014 at 3:26

    Akavall

    Based on the official documentation, get_dummies handles this directly.
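One way this could be put together, as a sketch only: the frame `df` and its columns are invented for illustration, and combining `get_dummies` with `pd.concat` and `drop` is an assumption about the intended workflow rather than a quotation from the documentation.

```python
import pandas as pd

# Made-up frame with one categorical and one numeric column.
df = pd.DataFrame({"Category": ["x", "y", "x"], "value": [1, 2, 3]})

# prefix= puts the original column name in front of each category value.
dummies = pd.get_dummies(df["Category"], prefix="Category")

# Attach the indicator columns and drop the original categorical column.
df = pd.concat([df, dummies], axis=1).drop(columns=["Category"])
print(df)
```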

    There is also a nice post on the FastML blog.

    Answered Dec 24, 2015 at 21:07

    It's hard to infer what you're looking for from the question, but my best guess is as follows.

    If we assume you have a DataFrame where some column is 'Category' and contains integers (or otherwise unique identifiers) for categories, then we can do the following.

    Call the DataFrame dfrm, and assume that for each row, dfrm['Category'] is some value in the set of integers from 1 to N. Then,

    for elem in dfrm['Category'].unique():
        dfrm[str(elem)] = dfrm['Category'] == elem

    Now there will be a new indicator column for each category that is True/False depending on whether the data in that row are in that category.

    If you want to control the category names, you could make a dictionary that maps each category value to a column name, so that the columns get the specified names rather than just the string conversion of the category values. In fact, for some types, the plain string conversion may not produce anything useful.

    Answered Jul 21, 2012 at 2:29

    ely
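The dictionary idea from the answer above could look roughly like this. It is only a sketch: the mapping `cat_names` and its labels are hypothetical, and `dfrm` is a tiny made-up frame.

```python
import pandas as pd

# Made-up data: integer category codes from 1 to 3.
dfrm = pd.DataFrame({"Category": [1, 2, 3, 1]})

# Hypothetical mapping from category code to the desired column name.
cat_names = {1: "Some_Treatment", 2: "Full_Treatment", 3: "Control"}

for elem in dfrm["Category"].unique():
    # Name the indicator column via the dictionary instead of str(elem).
    dfrm[cat_names[elem]] = dfrm["Category"] == elem

print(dfrm)
```

A lookup like `cat_names[elem]` raises `KeyError` for a category missing from the dictionary, which is a useful early warning when new category values show up.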

    Answered Sep 23, 2016 at 18:06

    Erdem KAYA

    You can create dummy variables to handle the categorical data by passing the DataFrame to get_dummies.

    This will drop the original columns in trainDf and append the columns with dummy variables at the end of the trainDfDummies DataFrame.

    It automatically creates the column names by appending the values to the end of the original column name.
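A sketch of what that call might look like. The names `trainDf` and `trainDfDummies` come from the answer; the column names `Col1`, `Col2`, `num` and the data are placeholders chosen for illustration.

```python
import pandas as pd

# Placeholder training frame with two categorical columns and one numeric one.
trainDf = pd.DataFrame({
    "Col1": ["a", "b", "a"],
    "Col2": ["x", "x", "y"],
    "num": [10, 20, 30],
})

# columns= selects which columns to encode; they are replaced by their dummies.
trainDfDummies = pd.get_dummies(trainDf, columns=["Col1", "Col2"])
print(trainDfDummies.columns.tolist())
```

The untouched column `num` stays in front, and the generated names follow the `<column>_<value>` pattern described above.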

    Answered May 21, 2017 at 23:28

    rzskhr

    A very simple approach without using get_dummies, if you have only a few categorical values, using NumPy and pandas.

    Say I have a column with 3 categorical values, and we want to assign 0 and 1 to each of them.

    We can do that with simple code that creates three new columns to store the values: "NewYork_State", "California_state", "Florida_state".
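A minimal sketch of this approach. Using `np.where` as the NumPy piece is an assumption (the answer's own snippet may differ), and the source column name `State` and the sample rows are invented; only the three output column names come from the answer.

```python
import numpy as np
import pandas as pd

# Invented sample data; "State" is a hypothetical column name.
df = pd.DataFrame({"State": ["New York", "California", "Florida", "New York"]})

# np.where(condition, 1, 0) yields 1 where the row matches the category, else 0.
df["NewYork_State"] = np.where(df["State"] == "New York", 1, 0)
df["California_state"] = np.where(df["State"] == "California", 1, 0)
df["Florida_state"] = np.where(df["State"] == "Florida", 1, 0)

print(df)
```

This stays readable for a handful of categories; with many distinct values, a loop or `get_dummies` avoids writing one line per category.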

    Answered Jul 23, 2021 at 6:56

    So I actually needed an answer to this question today (7/25/2013), so I wrote this earlier. I've tested it with some toy examples; hopefully you'll get some mileage out of it.

    Answered Jul 25, 2013 at 0:12

    I created a dummy variable for every state.

    More generally, I would just use .apply and pass it an anonymous function with the inequality that defines your category.

    (Thank you to @prpl.mnky.dshwshr for the .unique() insight)
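A rough sketch of the `.apply` idea, not the answer's own code. The frame, the `state` and `pop` columns, and the `_dummy` suffix are all hypothetical.

```python
import pandas as pd

# Hypothetical data: a state code and a numeric column.
df = pd.DataFrame({"state": ["TX", "CA", "TX", "NY"], "pop": [29, 39, 29, 19]})

for state in df["state"].unique():
    # Bind the current value as a default argument so each lambda keeps its own state.
    df[state + "_dummy"] = df["state"].apply(lambda x, s=state: int(x == s))

# The same pattern works for a category defined by an inequality.
df["large_dummy"] = df["pop"].apply(lambda p: int(p > 20))

print(df)
```

For plain equality tests a vectorized comparison such as `(df["state"] == state).astype(int)` gives the same result; `.apply` is mainly useful when the rule is more involved.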

    Answered Dec 20, 2014 at 5:51

    userFog

    Handling categorical features

    scikit-learn expects all features to be numeric. So how do we include a categorical feature in our model?

    • Ordered categories: transform them to sensible numeric values (example: small=1, medium=2, large=3)
    • Unordered categories: use dummy encoding (0/1)

    What are the categorical features in our dataset?

    • Ordered categories: weather (already encoded with sensible numeric values)
    • Unordered categories: season (needs dummy encoding), holiday (already dummy encoded), workingday (already dummy encoded)

    For season, we can't simply leave the encoding as 1 = spring, 2 = summer, 3 = fall, and 4 = winter, because that would imply an ordered relationship. Instead, we create multiple dummy variables.
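As a sketch of that last step, assuming a frame named `bikes` with an integer-coded `season` column (both the name and the rows are stand-ins, since the dataset itself is not shown here):

```python
import pandas as pd

# Stand-in for the dataset: season coded as 1..4.
bikes = pd.DataFrame({"season": [1, 2, 3, 4, 1]})

# One 0/1 column per season code; dtype=int keeps the features numeric.
season_dummies = pd.get_dummies(bikes["season"], prefix="season", dtype=int)
print(season_dummies)
```

Each season now has its own column, so a model no longer reads the codes 1 to 4 as a ranking.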

    Answered Apr 5, 2018 at 7:38

    There is a simple and robust way to create dummies based on a column with your category values.

    But beware when doing an OLS regression: you will need to exclude one of the categories so that you don't fall into the dummy variable trap.
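One way to exclude a category, sketched here with `get_dummies` and its `drop_first` option. This is an assumption, not necessarily the code the answer used, and the `category` column and its values are invented.

```python
import pandas as pd

# Invented data with three categories.
df = pd.DataFrame({"category": ["a", "b", "c", "a"]})

# Full set of dummies: every row sums to 1, so together they duplicate
# the information in a regression intercept (the dummy variable trap).
full = pd.get_dummies(df["category"], dtype=int)
print(full.sum(axis=1).tolist())

# drop_first=True removes the first category, which becomes the baseline.
reduced = pd.get_dummies(df["category"], drop_first=True, dtype=int)
print(reduced.columns.tolist())
```

Coefficients on the remaining columns are then read relative to the dropped baseline category.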

    Answered Nov 6, 2021 at 12:55

    Ramon

    How do you create a dummy variable in Python?

    We can create dummy variables in Python using the get_dummies() method.

    Syntax: pandas.get_dummies(data, prefix=None, prefix_sep='_')

    Return type: dummy variables.
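To make the `prefix` and `prefix_sep` arguments in the syntax line concrete, here is a small sketch. The `temperature` series and its values are invented for illustration.

```python
import pandas as pd

# Invented categorical data with three distinct values.
temperature = pd.Series(["hot", "cold", "warm", "hot"])

# Column names are built as prefix + prefix_sep + category value.
dummies = pd.get_dummies(temperature, prefix="temp", prefix_sep="_")
print(dummies.columns.tolist())
```

Three distinct values give three dummy columns, named after the categories in sorted order.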

    How do you create a new dummy variable?

    There are two steps to successfully set up dummy variables in a multiple regression: (1) create dummy variables that represent the categories of your categorical independent variable; and (2) enter values into these dummy variables, known as dummy coding, to represent the categories of the categorical independent ...

    Which command is used to get dummy variables in Python?

    The get_dummies() function is used to convert a categorical variable into dummy/indicator variables.

    What are dummy variables?

    A dummy variable in pandas is an indicator variable that takes only the value 0 or 1 to indicate whether a separate categorical variable takes a specific value or not. To create a dummy variable in a given DataFrame in pandas, we make use of the get_dummies() function.