`map`, `filter`, `flatMap`, `join`, `groupBy`, `sort`, `distinct`, `union`, `intersection`, `except`, etc.

`count`, `collect`, `take`, `first`, `reduce`, `saveAsTextFile`, `show`, `write.parquet`, etc.

Feature | Transformations | Actions |
---|---|---|
Nature | Lazy (deferred computation) | Eager (immediate computation) |
Execution | Builds a lineage of operations; doesn’t execute until an action is called | Triggers the execution of the entire lineage |
Return Value | Returns a new RDD or DataFrame | Returns a value to the driver program (e.g., a count or the collected rows) or writes output to external storage |
Effect on Data | Creates a new dataset; original dataset remains unchanged | Leaves the source dataset unchanged; primarily retrieves results, but may have side effects (e.g., writing to a file) |
Examples | `map`, `filter`, `flatMap`, `join`, `groupBy`, `select`, `withColumn` | `count`, `collect`, `take`, `first`, `reduce`, `show`, `saveAsTextFile`, `write.parquet` |
Memory Usage | Generally low until an action is triggered, because only the lineage is recorded | Can consume significant memory, especially on the driver when `collect` is called on a large dataset |
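
The lazy/eager split in the table can be illustrated with a small pure-Python model. This is only a sketch and not Spark's API: the `LazyDataset` class and the `double` helper are hypothetical names invented for the illustration. Transformations merely append a step to a recorded lineage, and nothing runs until an action such as `collect` or `count` is called.

```python
from typing import Any, Callable, Iterable, Iterator, List, Tuple


class LazyDataset:
    """Toy stand-in for an RDD (hypothetical, not Spark's API).

    Transformations record steps in a lineage; actions replay the lineage.
    """

    def __init__(self, source: Iterable[Any], lineage: Tuple = ()):
        self._source = source
        self._lineage = lineage  # ordered (kind, function) steps

    # Transformations: return a new dataset and compute nothing.
    def map(self, fn: Callable[[Any], Any]) -> "LazyDataset":
        return LazyDataset(self._source, self._lineage + (("map", fn),))

    def filter(self, pred: Callable[[Any], bool]) -> "LazyDataset":
        return LazyDataset(self._source, self._lineage + (("filter", pred),))

    def _run(self) -> Iterator[Any]:
        # Replay every recorded step for each source record.
        for item in self._source:
            keep = True
            for kind, fn in self._lineage:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):
                    keep = False
                    break
            if keep:
                yield item

    # Actions: trigger execution of the whole lineage.
    def count(self) -> int:
        return sum(1 for _ in self._run())

    def collect(self) -> List[Any]:
        return list(self._run())


calls = []  # records every invocation of double()


def double(x: int) -> int:
    calls.append(x)
    return x * 2


numbers = LazyDataset(range(5))
doubled = numbers.map(double)            # transformation: nothing runs yet
big = doubled.filter(lambda x: x > 2)    # transformation: still nothing

calls_before_action = len(calls)         # double() has not been called
result = big.collect()                   # action: the lineage executes now
calls_after_action = len(calls)          # double() ran once per record
```

In this model `numbers` is untouched by `map` and `filter`, mirroring the "original dataset remains unchanged" row, and calling a second action replays the lineage from the source, which is roughly what Spark does when a dataset is not cached.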
Choosing the wrong action (e.g., calling `collect` on a massive dataset) can lead to performance issues or application crashes. Always consider the size of your data and choose actions carefully.
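
Why a bounded action is safer than `collect` can be sketched with plain Python generators. This is an analogy under stated assumptions, not Spark code: `records`, `take`, and `collect` below are hypothetical helpers that mimic the behavior of the Spark actions of the same name, where `take(n)` returns only the first `n` elements and `collect` returns everything to the driver.

```python
from itertools import islice
from typing import Iterable, Iterator, List

produced = 0  # how many records the source has actually generated


def records(n: int) -> Iterator[int]:
    """Lazily yield n records, counting each one as it is produced."""
    global produced
    for i in range(n):
        produced += 1
        yield i


def collect(source: Iterable[int]) -> List[int]:
    """Materialize every record in memory (analogous to collect)."""
    return list(source)


def take(source: Iterable[int], k: int) -> List[int]:
    """Materialize only the first k records (analogous to take)."""
    return list(islice(source, k))


produced = 0
preview = take(records(100_000), 5)
produced_by_take = produced       # only the requested records were produced

produced = 0
everything = collect(records(100_000))
produced_by_collect = produced    # every record was produced and held in memory
```

The same reasoning suggests reaching for `take`, `first`, or `show` when inspecting data, and writing large results out with `saveAsTextFile` or `write.parquet` rather than pulling them into the driver.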