Spark: Datasets vs DataFrames vs RDDs
Dataset | DataFrame | RDD |
---|---|---|
Introduced in Spark 1.6 (January 2016) | Introduced in Spark 1.3 (2015) | Core abstraction since Spark's earliest releases; part of Spark 1.0 (2014) |
Type-safe | Not type-safe | Type-safe |
High level API | High level API | Low level API |
Serialized with Encoders (Tungsten binary format) | Rows stored in Tungsten binary format | Java or Kryo serialization |
Catalyst optimizer | Catalyst optimizer | No built-in optimization |
OOP-style API | SQL-style API | OOP-style API |
Declarative ("what to do") | Declarative ("what to do") | Imperative ("how to do it") |
Scala, Java | Scala, Java, Python, R | Scala, Java, Python (the RDD API is not public in SparkR) |
Structured Schema | Structured Schema | No Schema |
Errors caught at compile time | Errors (e.g. a misspelled column name) surface at run time | Errors caught at compile time |
Serialization can be avoided | Serialization can be avoided | Serialization can't be avoided |
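
The differences in the table are easier to see when the same query is written against all three APIs. The following is a minimal sketch, not a definitive implementation: the `Person` case class, the sample data, and the `ApiComparison` object with its method names are hypothetical and exist only for this illustration. It assumes Spark SQL is on the classpath and runs in local mode.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Hypothetical record type, used only for this illustration.
case class Person(name: String, age: Int)

object ApiComparison {
  // Hypothetical sample data.
  val people: Seq[Person] =
    Seq(Person("Ann", 34), Person("Bob", 17), Person("Cid", 52))

  // RDD: typed, but the lambdas are opaque to Spark, so you describe
  // *how* to compute and Catalyst cannot optimize the plan.
  def adultNamesRdd(spark: SparkSession): Array[String] = {
    val rdd: RDD[Person] = spark.sparkContext.parallelize(people)
    rdd.filter(_.age >= 18).map(_.name).collect()
  }

  // DataFrame: columns are referenced by name. A typo such as $"agee"
  // still compiles and only fails at run time during analysis.
  def adultNamesDf(spark: SparkSession): Array[String] = {
    import spark.implicits._
    val df: DataFrame = people.toDF()
    df.filter($"age" >= 18).select("name").collect().map(_.getString(0))
  }

  // Dataset: typed like an RDD, with a schema like a DataFrame.
  // A typo such as `_.agee` is rejected by the compiler.
  def adultNamesDs(spark: SparkSession): Array[String] = {
    import spark.implicits._ // provides the Encoders for Person and String
    val ds: Dataset[Person] = people.toDS()
    ds.filter(_.age >= 18).map(_.name).collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("api-comparison")
      .getOrCreate()
    try {
      println(adultNamesRdd(spark).sorted.mkString(", "))
      println(adultNamesDf(spark).sorted.mkString(", "))
      println(adultNamesDs(spark).sorted.mkString(", "))
    } finally {
      spark.stop()
    }
  }
}
```

All three methods return the same names. What changes is where mistakes are caught (compiler versus run-time analysis) and how much of the query Spark can see and optimize.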