Guide to using PySpark explode in Python
In this article, I will explain how to explode array (list) and map columns into rows using the PySpark DataFrame functions explode(), explode_outer(), posexplode(), and posexplode_outer(), with Python examples.
Before we start, let's create a DataFrame with array and map fields. It has three columns: “name” as StringType, “knownLanguage” as ArrayType, and “properties” as MapType.
1. explode() – explode array or map columns to rows
The PySpark explode() function skips rows whose array or map is null or empty. In the example DataFrame, Washington and Jefferson have null or empty values in the array and map columns, so the exploded output does not contain rows for them.
1.1 explode – array column example
1.2 explode – map column example
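2. explode_outer() – create rows for each element in an array or map
Unlike explode(), explode_outer() keeps rows whose array or map is null or empty and returns null for the exploded columns.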
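3. posexplode() – explode array or map elements to rows
posexplode() returns each element together with its position in the array or map. For an array it creates the columns “pos” and “col”; for a map it creates “pos”, “key”, and “value”. Like explode(), it skips rows whose array or map is null or empty, so Washington and Jefferson do not appear in the output.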
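4. posexplode_outer() – explode array or map columns to rows
posexplode_outer() behaves like posexplode(), but for rows whose array or map is null or empty it returns a single row with a null position and null values instead of dropping the row.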
Conclusion
In this article, you have learned how to explode, or convert, array and map DataFrame columns to rows using the explode() and posexplode() PySpark SQL functions and their respective outer variants, and you have seen the differences between these functions through Python examples.