The broadcast(df) function in Spark explicitly marks a DataFrame or Dataset to be broadcast to all nodes in the cluster. Broadcasting optimizes join operations by sending a full copy of a small DataFrame to every worker node, which reduces the amount of data shuffled across the network. This is particularly useful when joining a large DataFrame with a small one.
The same request can be expressed with the BROADCAST hint in SQL queries.
Spark also broadcasts small DataFrames automatically (controlled by spark.sql.autoBroadcastJoinThreshold), but you can use broadcast() to explicitly control broadcasting.