The col() function in Spark is used to reference a column in a DataFrame. It is part of the pyspark.sql.functions module and is commonly used in DataFrame transformations such as filtering, sorting, and aggregation. col() lets you refer to columns dynamically, which is particularly useful when building complex expressions or when column names are stored in variables.
It returns a Column object that represents the specified column.

col() in an Expression
col() with Aliases
col() in Aggregations
col() with Conditional Logic
col() with String Functions
col() with Mathematical Operations
Key points:
- col() can be used inside DataFrame transformation methods (e.g., filter(), select(), withColumn()).
- Referencing a column with col() is a metadata operation and does not involve data movement, making it very efficient.
- Combine col() with other functions (e.g., sum(), avg()) for advanced transformations.

In summary, the col() function is used to reference a column in a DataFrame. It is lightweight and does not impact performance.