Spark: lit function
The lit() function in Spark creates a Column that holds a constant (literal) value. In PySpark it is part of the pyspark.sql.functions module, and it is particularly useful when you need to add a column with a fixed value to a DataFrame. It is usually combined with other transformations, such as withColumn() or select().
1. Syntax
PySpark:
2. Parameters
- value: The constant value to wrap as a Column. This can be a string, number, boolean, None (for a null value), or any other literal value. If a Column is passed, it is returned unchanged.
3. Return Type
- Returns a Column object representing the constant value.
4. Examples
Example 1: Adding a Constant Column to a DataFrame
PySpark:
Spark SQL:
Output:
Example 2: Adding a Numeric Constant Column
PySpark:
Spark SQL:
Output:
Example 3: Using lit() in an Expression
PySpark:
Spark SQL:
Output:
Example 4: Adding a Boolean Constant Column
PySpark:
Spark SQL:
Output:
Example 5: Using lit() with Null Values
PySpark:
Spark SQL:
Output:
Example 6: Using lit() with Conditional Logic
PySpark:
Spark SQL:
Output:
Example 7: Using lit() with String Concatenation
PySpark:
Spark SQL:
Output:
Example 8: Using lit() with Date and Timestamp Values
PySpark:
Spark SQL:
Output:
5. Common Use Cases
- Adding metadata columns (e.g., country, status, created_date).
- Creating derived columns with fixed values (e.g., bonuses, default values).
- Using constant values in complex expressions or transformations.
6. Performance Considerations
- Adding a column with lit() is a narrow transformation: each row simply receives the constant, with no shuffle or other data movement, so it is very cheap.
- Combine lit() with other functions and methods (e.g., withColumn(), select()) for more advanced transformations.
7. Key Takeaways
- The lit() function creates a Column with a constant or literal value, typically added to a DataFrame as a new column.
- It can be used to add columns with string, numeric, boolean, date/timestamp, or null values.
- In Spark SQL, similar functionality is achieved by writing literal values directly in SELECT statements.
- lit() is lightweight: it adds negligible overhead and works efficiently on large datasets.