The pivot() command in Spark transforms rows into columns, rotating data from a long format to a wide format. This is particularly useful for building summary tables (pivot tables), where you want to aggregate data and display it in a more readable layout.
Pivoting is always combined with an aggregation function (e.g., sum(), count(), avg()) that determines the value placed in each generated column.
Use pivot() judiciously on large datasets, as it involves shuffling and sorting. In summary, the pivot() command transforms rows into columns, creating a pivot table; in Spark SQL, the same result can also be achieved manually with CASE statements and aggregation functions.