I've done some searching and can't figure out how to filter a dataframe by

df["col"].str.contains(word)

however I'm wondering if there is a way to do the reverse: filter a dataframe by that set's complement, e.g. something to the effect of

!(df["col"].str.contains(word))

Can this be done through a DataFrame method?
asked Jun 13, 2013 at 21:43
You can use the invert (~) operator (which acts like a not for boolean data):

new_df = df[~df["col"].str.contains(word)]

where new_df is the copy returned by the right-hand side.

contains also accepts a regular expression...

If the above throws a ValueError or TypeError, the reason is likely because you have mixed datatypes, so use na=False:

new_df = df[~df["col"].str.contains(word, na=False)]

Or,

new_df = df[df["col"].str.contains(word) == False]
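As a quick sanity check, here is a minimal sketch with made-up data showing the ~ filter with the na=False variant:

```python
import pandas as pd

# Made-up sample data; the column mixes matching rows and a missing value.
df = pd.DataFrame({"col": ["apple pie", "banana", None, "apple tart"]})
word = "apple"

# na=False treats the missing value as "no match", so that row is kept
# instead of propagating NaN into the boolean mask.
new_df = df[~df["col"].str.contains(word, na=False)]
print(new_df["col"].tolist())  # ['banana', None]
```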
answered Jun 13, 2013 at 21:51
Andy Hayden
I was having trouble with the not (~) symbol as well, so here's another way from another StackOverflow thread:

df[df["col"].str.contains('this|that') == False]
answered Dec 15, 2016 at 21:10
nanselm2
You can use apply and a lambda:

df[df["col"].apply(lambda x: word not in x)]

Or if you want to define a more complex rule, you can use and:

df[df["col"].apply(lambda x: word_1 not in x and word_2 not in x)]
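A small self-contained sketch of the two-word version, with hypothetical words and data (note that apply with in will fail on non-string values such as NaN):

```python
import pandas as pd

# Hypothetical data and words, purely for illustration.
df = pd.DataFrame({"col": ["red car", "blue bike", "red bike", "green van"]})
word_1, word_2 = "red", "bike"

# Keep only the rows containing neither word_1 nor word_2.
result = df[df["col"].apply(lambda x: word_1 not in x and word_2 not in x)]
print(result["col"].tolist())  # ['green van']
```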
answered Jan 14, 2019 at 3:13
Arash
I know answers are already posted, but I am adding a framework to find multiple words and negate those from the DataFrame.

Here 'word1', 'word2', 'word3', 'word4' is the list of patterns to search, df is the DataFrame, and column_a is a column name from df:

values_to_remove = ['word1', 'word2', 'word3', 'word4']
pattern = '|'.join(values_to_remove)
result = df.loc[~df['column_a'].str.contains(pattern, case=False)]
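For example, with a made-up frame (the column name and words are placeholders):

```python
import pandas as pd

# Placeholder data; 'column_a' stands in for a real column name.
df = pd.DataFrame(
    {"column_a": ["Word1 here", "clean row", "has WORD3", "also clean"]}
)

values_to_remove = ["word1", "word2", "word3", "word4"]
pattern = "|".join(values_to_remove)  # 'word1|word2|word3|word4'

# case=False makes the match case-insensitive, so 'Word1' and 'WORD3' both hit.
result = df.loc[~df["column_a"].str.contains(pattern, case=False)]
print(result["column_a"].tolist())  # ['clean row', 'also clean']
```

Since the joined pattern is interpreted as a regular expression, words containing regex metacharacters should be passed through re.escape before joining.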
answered Feb 8, 2019 at 13:37
Nursnaaz
I had to get rid of the NULL values before using the command recommended by Andy above. An example:

df = pd.DataFrame(index=[0, 1, 2], columns=['first', 'second', 'third'])
df.loc[:, 'first'] = 'myword'
df.loc[0, 'second'] = 'myword'
df.loc[2, 'second'] = 'myword'
df.loc[1, 'third'] = 'myword'
df

    first  second   third
0  myword  myword     NaN
1  myword     NaN  myword
2  myword  myword     NaN

Now running the command:

~df["second"].str.contains(word)

I get the following error:

TypeError: bad operand type for unary ~: 'float'

I got rid of the NULL values using dropna() or fillna() first and retried the command with no problem.
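A condensed sketch of the fillna fix, using a small made-up column:

```python
import numpy as np
import pandas as pd

word = "myword"
df = pd.DataFrame({"second": ["myword", np.nan, "myword"]})

# Filling NaN with an empty string first avoids the TypeError;
# passing na=False to contains is an equivalent one-step fix.
mask = ~df["second"].fillna("").str.contains(word)
print(df[mask].index.tolist())  # [1]
```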
answered Nov 22, 2016 at 22:06
Shoresh
In addition to nanselm2's answer, you can use 0 instead of False:

df["col"].str.contains(word) == 0
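This works because False == 0 in Python; a tiny sketch with invented data:

```python
import pandas as pd

df = pd.DataFrame({"col": ["word one", "two", "three word"]})

# Rows where the match mask equals 0, i.e. False.
no_match = df[df["col"].str.contains("word") == 0]
print(no_match["col"].tolist())  # ['two']
```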
answered Oct 16, 2018 at 7:01
U12-Forward
To negate your query use ~. Using query has the advantage of returning the valid observations of df directly:

df.query('~col.str.contains("word").values')
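A sketch with invented data; note that the default numexpr engine does not support the .str accessor, so engine='python' is passed here to be safe:

```python
import pandas as pd

df = pd.DataFrame({"col": ["foo word", "bar", "word baz", "qux"]})

# engine='python' lets query evaluate the .str accessor expression.
result = df.query('~col.str.contains("word")', engine="python")
print(result["col"].tolist())  # ['bar', 'qux']
```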
answered Apr 16 at 21:09
rachwa
To complement the above question, if someone wants to remove all the rows with strings, one could do:

df_new = df[~df['col_name'].apply(lambda x: isinstance(x, str))]
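For instance, with a deliberately mixed-type column:

```python
import pandas as pd

# A column deliberately mixing strings and numbers.
df = pd.DataFrame({"col_name": ["text", 1, 2.5, "more text", 3]})

# Keep only the rows whose value is NOT a string.
df_new = df[~df["col_name"].apply(lambda x: isinstance(x, str))]
print(df_new["col_name"].tolist())  # [1, 2.5, 3]
```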
answered Aug 5, 2021 at 14:28
vasanth
Somehow '.contains' didn't work for me, but when I tried '.isin', as mentioned by @kenan in the answer (How to drop rows from pandas data frame that contains a particular string in a particular column?), it worked. Adding further, if you want to look at the entire dataframe and remove those rows which have the specific word (or set of words), just use the loop below:

for col in df.columns:
    df = df[~df[col].isin(['string or string list separated by comma'])]

Just remove ~ to get the dataframe that contains the word.
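A minimal sketch with invented data; note that isin matches whole cell values exactly, not substrings, which is why it behaves differently from contains:

```python
import pandas as pd

df = pd.DataFrame({"a": ["drop", "keep", "stay"],
                   "b": ["keep", "drop", "stay"]})

# Remove every row where ANY column equals one of the listed values.
for col in df.columns:
    df = df[~df[col].isin(["drop"])]
print(df["a"].tolist())  # ['stay']
```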
answered Jun 15 at 12:03
Bhanu Chander