The filter() or where() command in Spark selects rows from a DataFrame that satisfy a specified condition. The two methods are interchangeable aliases and produce exactly the same result; their purpose is to return the subset of rows for which the condition is true.
filter() and where() are essential tools for data manipulation in Spark, letting you select specific rows that satisfy a condition. The condition can be a simple column comparison, a combination of conditions built with the logical operators & (AND), | (OR), and ~ (NOT), or even a SQL-like expression string.