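The union() command in Spark combines two DataFrames with the same schema (the same number of columns, in the same order, with compatible data types) into a single DataFrame. It appends the rows of one DataFrame to another, similar to the SQL UNION ALL operation, so duplicate rows are kept. Note that union() matches columns by position, not by name; if the column order may differ between the two DataFrames, unionByName() matches them by name instead. If you want to remove duplicates, call distinct() after the union.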
Use union() judiciously on large datasets: the result contains every row from both inputs, so the DataFrame grows accordingly.
Use distinct() after union() if you need to remove duplicates, but be aware that it triggers a shuffle, which can be expensive on large datasets.

The union() command is used to combine two DataFrames with the same schema into a single DataFrame.
It behaves like SQL UNION ALL: duplicate rows are kept.
In Spark SQL, the equivalent is UNION ALL, or UNION (to remove duplicates).