Select specific columns using select().
Filter rows using filter() or where().
Group data using groupBy() and perform aggregations (e.g., sum(), avg(), count(), max(), min()) using agg().
Join DataFrames using join().
Sort rows using orderBy().
Add columns using withColumn() and remove columns using drop().
Save results using write().

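A minimal PySpark sketch of these operations; the column names (name, dept, salary), sample rows, and the output path are made up for illustration:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dataframe-ops").getOrCreate()

# Hypothetical sample data.
employees = spark.createDataFrame(
    [("Alice", "Eng", 100), ("Bob", "Eng", 80), ("Cara", "Sales", 90)],
    ["name", "dept", "salary"],
)
departments = spark.createDataFrame(
    [("Eng", "Building A"), ("Sales", "Building B")],
    ["dept", "location"],
)

# select(): keep specific columns.
names = employees.select("name", "salary")

# filter() / where(): keep rows matching a condition.
well_paid = employees.filter(F.col("salary") > 85)

# groupBy() + agg(): aggregate per group.
by_dept = employees.groupBy("dept").agg(
    F.avg("salary").alias("avg_salary"),
    F.count("*").alias("headcount"),
)

# join(): combine two DataFrames on a key.
joined = employees.join(departments, on="dept", how="inner")

# orderBy(): sort rows.
ranked = employees.orderBy(F.col("salary").desc())

# withColumn() adds a column; drop() removes one.
adjusted = employees.withColumn("salary_k", F.col("salary") / 1000).drop("salary")

# write: save the result (in PySpark, df.write is a property that returns a writer).
by_dept.write.mode("overwrite").parquet("/tmp/by_dept")
```
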
Q: What is a DataFrame in Spark?
A DataFrame is a distributed collection of data organized into named columns, conceptually similar to a table in a relational database, with Spark's optimizations applied under the hood.

Q: How is a DataFrame different from an RDD?
An RDD is a low-level, untyped collection of objects with no schema, while a DataFrame attaches a schema of named, typed columns, which lets Spark optimize queries through the Catalyst optimizer rather than running opaque user functions.

Q: Can I use DataFrame with PySpark?
Yes. The DataFrame API is fully supported in PySpark and is the standard way to work with structured data in Python.

Q: How do I create a DataFrame in Spark?
You can build a DataFrame from in-memory data with spark.createDataFrame(), or load external files (CSV, JSON, Parquet, and more) with the spark.read APIs.

Q: Can DataFrames be used with Spark SQL?
Yes. Register a DataFrame as a temporary view with createOrReplaceTempView() and query it with spark.sql().

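A short sketch of both answers, assuming a local SparkSession; the rows, column names, and the commented-out CSV path are illustrative:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dataframe-create").getOrCreate()

# 1) From in-memory data with createDataFrame().
people = spark.createDataFrame([("Alice", 34), ("Bob", 29)], ["name", "age"])

# 2) From a file with the spark.read APIs (path is illustrative).
# csv_df = spark.read.csv("/data/people.csv", header=True, inferSchema=True)

# Spark SQL: register a temporary view and query it.
people.createOrReplaceTempView("people")
adults = spark.sql("SELECT name FROM people WHERE age >= 30")
adults.show()
```
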
Q: Are DataFrames mutable?
No. DataFrames are immutable; every transformation (filter(), withColumn(), and so on) returns a new DataFrame and leaves the original untouched.

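A tiny sketch of that behavior, with made-up rows:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
people = spark.createDataFrame([("Alice", 34), ("Bob", 16)], ["name", "age"])

# withColumn() returns a *new* DataFrame; `people` itself is unchanged.
people_with_flag = people.withColumn("is_adult", people.age >= 18)

print(people.columns)            # ['name', 'age']
print(people_with_flag.columns)  # ['name', 'age', 'is_adult']
```
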
Q: How does a DataFrame improve performance?
Because a DataFrame carries a schema, Spark can run queries through the Catalyst optimizer and the Tungsten execution engine, producing optimized execution plans and compact in-memory layouts that hand-written RDD code does not get.

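One way to see the optimizer at work is explain(), which prints the plans Catalyst produces for a query; the sample data here is made up:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("Alice", 34), ("Bob", 29)], ["name", "age"])

# Prints the parsed, analyzed, optimized logical, and physical plans.
df.filter(F.col("age") > 30).select("name").explain(True)
```
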
Q: Can I convert between DataFrame and RDD?
Yes. To get the underlying RDD of Row objects from a DataFrame, use the .rdd property. To go the other way, turn an RDD into a DataFrame with the .toDF() method or createDataFrame().

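A minimal sketch of both directions, with made-up rows:

```python
from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("Alice", 34), ("Bob", 29)], ["name", "age"])

# DataFrame -> RDD via the .rdd property (an RDD of Row objects).
rdd = df.rdd
print(rdd.take(1))  # [Row(name='Alice', age=34)]

# RDD -> DataFrame via .toDF() ...
rows = spark.sparkContext.parallelize([Row(name="Cara", age=41)])
df_from_rdd = rows.toDF()

# ... or via createDataFrame().
df_from_rdd2 = spark.createDataFrame(rows)
```
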
Q: What file formats can DataFrames read and write?
DataFrames can read and write common formats such as CSV, JSON, Parquet, ORC, and plain text (with more available through data source packages) via spark.read.format and df.write.format, or through shorthand methods like spark.read.csv() and df.write.parquet().

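A sketch of the generic reader and writer; the paths, options, and formats chosen here are illustrative:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Generic reader: spark.read.format(...).load(...).
df = (
    spark.read.format("csv")
    .option("header", "true")
    .option("inferSchema", "true")
    .load("/data/input.csv")
)

# Generic writer: df.write.format(...).save(...).
df.write.format("parquet").mode("overwrite").save("/data/output")

# Shorthand helpers also exist, e.g.:
# spark.read.json("/data/input.json")
# df.write.orc("/data/output_orc")
```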