Guide: how to check for NaN in Python

A comparison of pd.isna, math.isnan and np.isnan and their flexibility in dealing with different types of objects.

The table below shows whether each type of object can be checked with the given method:


+------------+-----+---------+------+--------+------+
|   Method   | NaN | numeric | None | string | list |
+------------+-----+---------+------+--------+------+
| pd.isna    | yes | yes     | yes  | yes    | yes  |
| math.isnan | yes | yes     | no   | no     | no   |
| np.isnan   | yes | yes     | no   | no     | yes* |
+------------+-----+---------+------+--------+------+
* np.isnan raises an error on a list with mixed types.
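The table can be checked empirically. The sketch below calls each method on one sample input per column and records whether the call runs or raises a TypeError. The sample values, the extra "mixed list" column and the helper name supports are illustrative choices, not part of the original answer.

```python
import math

import numpy as np
import pandas as pd

# Illustrative sample input for each column of the table, plus a mixed-type list
samples = {
    "NaN": float("nan"),
    "numeric": 3.5,
    "None": None,
    "string": "10",
    "list": [1.0, float("nan")],
    "mixed list": [3, None, "10"],
}

methods = {"pd.isna": pd.isna, "math.isnan": math.isnan, "np.isnan": np.isnan}


def supports(check, value):
    """Hypothetical helper: True if check(value) runs without raising TypeError."""
    try:
        check(value)
        return True
    except TypeError:
        return False


# One row per method, one True/False per input type
table = {
    name: {col: supports(check, val) for col, val in samples.items()}
    for name, check in methods.items()
}

for name, row in table.items():
    print(name, row)
```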

pd.isna

The most flexible method to check for different types of missing values.


None of the other answers cover the flexibility of pd.isna. While math.isnan and np.isnan return True for NaN values, they cannot check other types of objects such as None or strings: both methods raise an error, so checking a list with mixed types is cumbersome. In contrast, pd.isna is flexible and returns the correct boolean for many different types:

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: missing_values = [3, None, np.nan, pd.NA, pd.NaT, '10']

In [4]: pd.isna(missing_values)
Out[4]: array([False,  True,  True,  True,  True, False])
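Because pd.isna returns a boolean array when given a list, that result can be used as a mask. A minimal sketch, assuming you want to drop the missing entries from the same mixed list (the names values, mask and present are illustrative):

```python
import numpy as np
import pandas as pd

values = [3, None, np.nan, pd.NA, pd.NaT, '10']

# Boolean array: True where the entry is a missing value of any kind
mask = pd.isna(values)

# Keep only the entries that are not missing
present = [v for v, missing in zip(values, mask) if not missing]
print(present)  # [3, '10']
```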

How to check if a single value is NaN in Python, with approaches that use libraries (pandas, math and numpy) and approaches that use no libraries.

NaN stands for Not a Number and is one of the common ways to represent missing values in data. It is a special floating-point value and cannot be converted to an integer.
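To make the float-only nature of NaN concrete, here is a short sketch. The pandas part assumes default dtype inference, under which an integer list containing None becomes a float64 Series, because NaN has no integer representation.

```python
import pandas as pd

nan = float("nan")
print(type(nan))  # <class 'float'>

# NaN has no integer representation, so converting it fails
try:
    int(nan)
    converted = True
except ValueError:
    converted = False
print(converted)  # False

# With default inference, pandas stores the missing entry as NaN
# and upcasts the whole Series from integers to float64
s = pd.Series([1, 2, None])
print(s.dtype)  # float64
```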

NaN values are one of the major problems in data analysis. Dealing with them properly is essential to getting correct results.

Finding and dealing with NaN within an array, Series or DataFrame is easy. However, identifying a standalone NaN value is tricky. In this article I explain five methods to deal with NaN in Python. The first three methods use built-in functions from libraries. The last two rely on properties of NaN itself to find NaN values.

Method 1: Using Pandas Library

The isna() function in the pandas library checks whether a value is null/NaN. It returns True if the value is NaN/null.

import pandas as pd
x = float("nan")
print(f"It's pd.isna : {pd.isna(x)}")

Output:
It's pd.isna : True
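pd.isna also handles the other missing-value markers as standalone scalars. A short sketch with an illustrative set of inputs: the first five are missing values, while 0.0 and the string "nan" are not.

```python
import numpy as np
import pandas as pd

# Illustrative inputs: several kinds of missing value, then two that are not missing
values = [float("nan"), np.nan, None, pd.NA, pd.NaT, 0.0, "nan"]

for value in values:
    # For a scalar argument, pd.isna returns a single boolean
    print(repr(value), pd.isna(value))

# pd.notna is the inverse of pd.isna
print(pd.notna(0.0))  # True
```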

Method 2: Using Numpy Library

The isnan() function in the numpy library checks whether a value is NaN. It is similar to isna() in pandas, but it only accepts numeric input: passing None or a string raises a TypeError.

import numpy as np
x = float("nan")
print(f"It's np.isnan : {np.isnan(x)}")

Output:
It's np.isnan : True
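Where np.isnan shines is on arrays, since it works element-wise. A minimal sketch (the array contents and the flag name none_rejected are illustrative) that also shows the TypeError for None:

```python
import numpy as np

arr = np.array([1.0, np.nan, 3.0])
mask = np.isnan(arr)  # element-wise check, returns a boolean array
print(mask.tolist())  # [False, True, False]
print(bool(mask.any()))  # True: at least one NaN in the array

# None is not numeric, so np.isnan has no implementation for it
try:
    np.isnan(None)
    none_rejected = False
except TypeError:
    none_rejected = True
print(none_rejected)  # True
```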

Method 3: Using math library

The math library provides built-in mathematical functions for real numbers; the cmath library can be used when dealing with complex numbers.
The math library has a built-in function isnan() to check for NaN values.

import math
x = float("nan")
print(f"It's math.isnan : {math.isnan(x)}")

Output:
It's math.isnan : True
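The split between math and cmath shows up directly in isnan. A short sketch (the values are illustrative): math.isnan accepts ints and floats but rejects complex numbers, while cmath.isnan returns True if either the real or the imaginary part is NaN.

```python
import cmath
import math

print(math.isnan(float("nan")))  # True
print(math.isnan(42))            # False: ints are accepted as real numbers

z = complex(float("nan"), 1.0)
print(cmath.isnan(z))            # True: the real part is NaN

# math only accepts real numbers, so a complex argument raises TypeError
try:
    math.isnan(z)
    complex_rejected = False
except TypeError:
    complex_rejected = True
print(complex_rejected)          # True
```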

Method 4: Comparing with itself

When I started my career at a big IT company, I had to undergo training for the first month. When introducing the concept of NaN values, the trainer said that they are like aliens we know nothing about. These aliens are constantly shapeshifting, so we cannot even compare a NaN value against itself.
The most common method to check for NaN values is to check whether the variable is equal to itself. Under the IEEE 754 floating-point standard, NaN compares unequal to every value, including itself, so if the variable is not equal to itself, it must be NaN.

def isNaN(num):
    # NaN is the only value that is not equal to itself
    return num != num

x = float("nan")
print(isNaN(x))

Output:
True
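The self-comparison trick also works beyond plain floats. A sketch (is_nan is a hypothetical name mirroring isNaN above) showing that it never raises for non-numeric input, but also does not flag None, and that NumPy applies the comparison element-wise:

```python
import numpy as np


def is_nan(value):
    # NaN is the only value that compares unequal to itself
    return value != value


print(is_nan(float("nan")))  # True
print(is_nan(None))          # False: None equals itself, so it is not flagged
print(is_nan("10"))          # False: works on any type without raising

arr = np.array([1.0, np.nan, 3.0])
print((arr != arr).tolist())  # [False, True, False]
```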

Method 5: Checking the range

Another property of NaN that can be used to detect it is the range. Every floating-point value other than NaN, including the infinities themselves, falls within the range from minus infinity to infinity (inclusive):

$-\infty \le x \le \infty$

However, NaN does not fall within this range, because every comparison involving NaN evaluates to False. Hence, a value that does not fall within the range from minus infinity to infinity must be NaN.

This can be implemented as below:

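def isNaN(num):
    # Every float except NaN lies between -inf and inf (inclusive);
    # any comparison involving NaN evaluates to False
    if float('-inf') <= float(num) <= float('inf'):
        return False
    else:
        return True

x = float("nan")
print(isNaN(x))

Output:
True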
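The bounds of the range check matter, because float('inf') and float('-inf') are themselves floats sitting at the ends of the range. This sketch compares a strict and an inclusive version of the check (both function names are hypothetical); only the inclusive one agrees with math.isnan for the infinities.

```python
import math


def is_nan_strict(num):
    # Strict bounds: the infinities fail this test as well as NaN
    return not (float("-inf") < float(num) < float("inf"))


def is_nan_inclusive(num):
    # Inclusive bounds: only NaN fails, since every comparison with NaN is False
    return not (float("-inf") <= float(num) <= float("inf"))


for value in [1.5, float("inf"), float("-inf"), float("nan")]:
    print(value, is_nan_strict(value), is_nan_inclusive(value), math.isnan(value))
```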

I hope you have found this article helpful. I am sure there are many other techniques to check for NaN values based on other logic. Please share any other methods you have come across to check for NaN/null values.

Cheers!


You have a couple of options.

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(10,6))
# Make a few areas have NaN values
df.iloc[1:3,1] = np.nan
df.iloc[5,3] = np.nan
df.iloc[7:9,5] = np.nan

Now the DataFrame looks something like this:

          0         1         2         3         4         5
0  0.520113  0.884000  1.260966 -0.236597  0.312972 -0.196281
1 -0.837552       NaN  0.143017  0.862355  0.346550  0.842952
2 -0.452595       NaN -0.420790  0.456215  1.203459  0.527425
3  0.317503 -0.917042  1.780938 -1.584102  0.432745  0.389797
4 -0.722852  1.704820 -0.113821 -1.466458  0.083002  0.011722
5 -0.622851 -0.251935 -1.498837       NaN  1.098323  0.273814
6  0.329585  0.075312 -0.690209 -3.807924  0.489317 -0.841368
7 -1.123433 -1.187496  1.868894 -2.046456 -0.949718       NaN
8  1.133880 -0.110447  0.050385 -1.158387  0.188222       NaN
9 -0.513741  1.196259  0.704537  0.982395 -0.585040 -1.693810
  • Option 1: df.isnull().any().any() - This returns a boolean value

You already know isnull(), which returns a DataFrame like this:

       0      1      2      3      4      5
0  False  False  False  False  False  False
1  False   True  False  False  False  False
2  False   True  False  False  False  False
3  False  False  False  False  False  False
4  False  False  False  False  False  False
5  False  False  False   True  False  False
6  False  False  False  False  False  False
7  False  False  False  False  False   True
8  False  False  False  False  False   True
9  False  False  False  False  False  False

If you make it df.isnull().any(), you get just the columns that have NaN values:

0    False
1     True
2    False
3     True
4    False
5     True
dtype: bool

One more .any() will tell you if any of the above are True:

> df.isnull().any().any()
True
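Since the DataFrame above is random, here is a deterministic sketch on a small hand-built frame (the column names and values are made up) that applies Option 1 and also uses the same isnull().any() mask to list the affected columns and, with axis=1, the affected rows:

```python
import numpy as np
import pandas as pd

# Small, deterministic frame (hypothetical data) with NaN in known places
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0],
    "b": [np.nan, 5.0, 6.0],
    "c": [7.0, 8.0, np.nan],
})

# Option 1: a single boolean for the whole frame
has_nan = df.isnull().any().any()

# Labels of columns that contain at least one NaN
nan_columns = df.columns[df.isnull().any().to_numpy()].tolist()

# Labels of rows that contain at least one NaN (axis=1 reduces across columns)
nan_rows = df.index[df.isnull().any(axis=1).to_numpy()].tolist()

print(has_nan, nan_columns, nan_rows)
```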
  • Option 2: df.isnull().sum().sum() - This returns an integer: the total number of NaN values.

This works the same way as .any().any() does: it first sums the number of NaN values in each column, then sums those values:

df.isnull().sum()
0    0
1    2
2    0
3    1
4    0
5    2
dtype: int64

Finally, to get the total number of NaN values in the DataFrame:

df.isnull().sum().sum()
5
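Option 2 on a small hand-built frame (hypothetical data), so the counts can be checked by eye. The last step swaps sum() for mean() to get the fraction of missing values per column, which works because True and False average as 1 and 0.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column "b" has 2 NaN, column "c" has 1
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [np.nan, 5.0, np.nan, 6.0],
    "c": [7.0, 8.0, 9.0, np.nan],
})

per_column = df.isnull().sum()   # NaN count in each column
total = df.isnull().sum().sum()  # NaN count in the whole frame
share = df.isnull().mean()       # fraction of NaN in each column

print(per_column.tolist())  # [0, 2, 1]
print(int(total))           # 3
print(share.tolist())       # [0.0, 0.5, 0.25]
```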
