Guide: "does not contain" in Python

I've done some searching and can't figure this out. I know how to filter a dataframe by

df["col"].str.contains(word)

however, I'm wondering if there is a way to do the reverse: filter a dataframe by that set's complement, e.g. to the effect of

!(df["col"].str.contains(word))

Can this be done through a DataFrame method?

asked Jun 13, 2013 at 21:43

You can use the invert (~) operator (which acts like a not for boolean data):

new_df = df[~df["col"].str.contains(word)]

where new_df is the copy returned by the right-hand side.

contains also accepts a regular expression...


If the above throws a ValueError or TypeError, the likely cause is mixed datatypes in the column (for example, NaN among the strings), so use na=False:

new_df = df[~df["col"].str.contains(word, na=False)]

Or,

new_df = df[df["col"].str.contains(word) == False]
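As a minimal sketch of the approach above, assuming a small made-up dataframe with one missing value (the data and the word are illustrative, not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing value mixed in with the strings.
df = pd.DataFrame({"col": ["apple pie", "banana", np.nan, "pineapple"]})
word = "apple"

# na=False treats missing values as "no match", so the mask is purely
# boolean and ~ can invert it safely.
mask = df["col"].str.contains(word, na=False)
new_df = df[~mask]

print(new_df)
```

Because missing values count as "no match" here, the NaN row is kept in new_df. Note that contains interprets word as a regular expression by default; passing regex=False should make it a plain substring test.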

fantabolous

answered Jun 13, 2013 at 21:51

Andy Hayden

I was having trouble with the not (~) symbol as well, so here's another way from another StackOverflow thread:

df[df["col"].str.contains('this|that')==False]

Shaido

answered Dec 15, 2016 at 21:10

nanselm2

You can use apply with a lambda:

df[df["col"].apply(lambda x: word not in x)]

Or, if you want to define a more complex rule, you can combine conditions with and:

df[df["col"].apply(lambda x: word_1 not in x and word_2 not in x)]
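One caveat with the lambda version is that `word not in x` fails when x is not a string (for example NaN, which is a float). A hedged sketch of a guarded variant follows; the helper name lacks_words and the sample data are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value.
df = pd.DataFrame({"col": ["red fox", "blue whale", np.nan, "green frog"]})
word_1, word_2 = "red", "whale"

def lacks_words(value):
    """Return True when neither word occurs in value."""
    # Non-string values (e.g. NaN) cannot contain the words, so keep them.
    if not isinstance(value, str):
        return True
    return word_1 not in value and word_2 not in value

result = df[df["col"].apply(lacks_words)]
print(result)
```

Whether rows with missing values should be kept or dropped depends on the use case; returning False in the guard would drop them instead.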

answered Jan 14, 2019 at 3:13

Arash

The answers above already cover the basics.

I am adding a general pattern for matching multiple words and excluding the matching rows from a DataFrame.

Here 'word1', 'word2', 'word3', 'word4' = list of patterns to search for

df = the DataFrame

column_a = a column name in DataFrame df

values_to_remove = ['word1','word2','word3','word4'] 

pattern = '|'.join(values_to_remove)

result = df.loc[~df['column_a'].str.contains(pattern, case=False)]
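Since the joined pattern is interpreted as a regular expression, words containing characters such as + or $ would be read as regex syntax rather than literal text. A possible variant, assuming literal matching is wanted, escapes each word with re.escape first (the sample data here is made up):

```python
import re

import pandas as pd

# Hypothetical data containing regex metacharacters.
df = pd.DataFrame(
    {"column_a": ["C++ intro", "plain text", "cost: $5", "c++ advanced"]}
)
values_to_remove = ["c++", "$5"]

# Escape each term so that +, $ and similar characters are matched literally.
pattern = "|".join(re.escape(value) for value in values_to_remove)

# case=False ignores case; na=False treats missing values as "no match".
result = df.loc[~df["column_a"].str.contains(pattern, case=False, na=False)]
print(result)
```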

answered Feb 8, 2019 at 13:37

Nursnaaz

I had to get rid of the NULL values before using the command recommended by Andy above. An example:

import pandas as pd

df = pd.DataFrame(index=[0, 1, 2], columns=['first', 'second', 'third'])
# .ix was removed in pandas 1.0; .loc is the label-based replacement
df.loc[:, 'first'] = 'myword'
df.loc[0, 'second'] = 'myword'
df.loc[2, 'second'] = 'myword'
df.loc[1, 'third'] = 'myword'
df

    first   second  third
0   myword  myword   NaN
1   myword  NaN      myword 
2   myword  myword   NaN

Now running the command:

~df["second"].str.contains(word)

I get the following error:

TypeError: bad operand type for unary ~: 'float'

I got rid of the NULL values using dropna() or fillna() first and retried the command with no problem.
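The two clean-up routes mentioned above behave differently, which the following sketch tries to show on a reduced, made-up version of the 'second' column: fillna keeps the rows that had missing values, while dropna discards them.

```python
import numpy as np
import pandas as pd

# Hypothetical single-column frame with one missing value.
df = pd.DataFrame({"second": ["myword", np.nan, "other"]})
word = "myword"

# Option 1: replace missing values with an empty string before matching.
filled = df["second"].fillna("")
without_word = df[~filled.str.contains(word)]

# Option 2: drop rows with a missing value in the column, then filter.
cleaned = df.dropna(subset=["second"])
without_word_dropped = cleaned[~cleaned["second"].str.contains(word)]

print(without_word)
print(without_word_dropped)
```

Passing na=False to contains, as in Andy Hayden's answer, should give the same rows as Option 1 without modifying the column.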

answered Nov 22, 2016 at 22:06

Shoresh

In addition to nanselm2's answer, you can use 0 instead of False:

df["col"].str.contains(word)==0

answered Oct 16, 2018 at 7:01

U12-Forward

To negate your query use ~. Using query has the advantage of returning the valid observations of df directly:

df.query('~col.str.contains("word").values')

answered Apr 16 at 21:09

rachwa

To complement the above question: if someone wants to remove all the rows that contain strings, one could do:

df_new=df[~df['col_name'].apply(lambda x: isinstance(x, str))]

answered Aug 5, 2021 at 14:28

vasanth

Somehow '.contains' didn't work for me, but '.isin', as mentioned by @kenan in the answer to "How to drop rows from pandas data frame that contains a particular string in a particular column?", did. In addition, if you want to look at the entire dataframe and remove the rows that have a specific word (or set of words), just use the loop below:

for col in df.columns:
    df = df[~df[col].isin(['string or list of strings separated by commas'])]

Just remove ~ to get the dataframe that contains the word.
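It may help to keep in mind that isin tests whether a cell equals one of the listed values, whereas str.contains tests for a substring, so the two can select different rows. A small sketch on made-up data (the column names a and b are illustrative):

```python
import pandas as pd

# Hypothetical data: "spam" appears both as a whole value and as a substring.
df = pd.DataFrame(
    {"a": ["spam", "spam and eggs", "ham"], "b": ["x", "y", "spam"]}
)

# isin: drops only rows where column a is exactly "spam".
exact = df[~df["a"].isin(["spam"])]

# str.contains: drops every row where column a contains "spam" anywhere.
substring = df[~df["a"].str.contains("spam", regex=False)]

# Whole-frame variant of the loop: drop rows where any cell equals "spam".
any_cell = df[~df.isin(["spam"]).any(axis=1)]

print(exact, substring, any_cell, sep="\n\n")
```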

answered Jun 15 at 12:03

Bhanu Chander
