> ## Documentation Index
> Fetch the complete documentation index at: https://rajanand.org/llms.txt
> Use this file to discover all available pages before exploring further.

# Spark: withColumnRenamed() function

The `withColumnRenamed()` command in Spark is used to rename an existing column in a DataFrame. This is particularly useful when you need to standardize column names, make them more descriptive, or align them with a specific naming convention.

***

## 1. **Syntax**

**PySpark:**

```python theme={"system"}
df.withColumnRenamed(existing_col_name, new_col_name)
```

**Spark SQL:**

```sql theme={"system"}
SELECT existing_col_name AS new_col_name FROM table_name;
```

## 2. **Parameters**

* **existing\_col\_name**: The current name of the column you want to rename.
* **new\_col\_name**: The new name you want to assign to the column.

## 3. **Return Type**

* Returns a new DataFrame with the renamed column.

## 4. **Examples**

### **Example 1: Renaming a Single Column**

**PySpark:**

```python theme={"system"}
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("WithColumnRenamedExample").getOrCreate()

data = [("Anand", 25), ("Bala", 30), ("Kavitha", 28), ("Raj", 35)]
columns = ["Name", "Age"]

df = spark.createDataFrame(data, columns)

# Renaming the 'Age' column to 'Years'
df_renamed = df.withColumnRenamed("Age", "Years")
df_renamed.show()
```

**Spark SQL:**

```sql theme={"system"}
SELECT Name, Age AS Years FROM people;
```

**Output:**

```
+-------+-----+
|   Name|Years|
+-------+-----+
|  Anand|   25|
|   Bala|   30|
| Kavitha|   28|
|    Raj|   35|
+-------+-----+
```

### **Example 2: Renaming Multiple Columns**

**PySpark:**

```python theme={"system"}
# Renaming multiple columns
df_renamed = df.withColumnRenamed("Name", "FullName") \
               .withColumnRenamed("Age", "Years")
df_renamed.show()
```

**Spark SQL:**

```sql theme={"system"}
SELECT Name AS FullName, Age AS Years FROM people;
```

**Output:**

```
+--------+-----+
|FullName|Years|
+--------+-----+
|   Anand|   25|
|    Bala|   30|
|  Kavitha|   28|
|     Raj|   35|
+--------+-----+
```

### **Example 3: Renaming Columns with Special Characters**

**PySpark:**

```python theme={"system"}
# Renaming a column with special characters
df_with_special_chars = df.withColumnRenamed("Name", "Full Name")
df_with_special_chars.show()
```

**Spark SQL:**

```sql theme={"system"}
SELECT Name AS `Full Name`, Age FROM people;
```

**Output:**

```
+--------+---+
|Full Name|Age|
+--------+---+
|   Anand| 25|
|    Bala| 30|
|  Kavitha| 28|
|     Raj| 35|
+--------+---+
```

### **Example 4: Renaming Columns in a DataFrame with Multiple Columns**

**PySpark:**

```python theme={"system"}
data = [("Anand", 25, "M"), ("Bala", 30, "F"), ("Kavitha", 28, "F"), ("Raj", 35, "M")]
columns = ["Name", "Age", "Gender"]

df = spark.createDataFrame(data, columns)

# Renaming multiple columns
df_renamed = df.withColumnRenamed("Name", "FullName") \
               .withColumnRenamed("Age", "Years") \
               .withColumnRenamed("Gender", "Sex")
df_renamed.show()
```

**Spark SQL:**

```sql theme={"system"}
SELECT Name AS FullName, Age AS Years, Gender AS Sex FROM people;
```

**Output:**

```
+--------+-----+---+
|FullName|Years|Sex|
+--------+-----+---+
|   Anand|   25|  M|
|    Bala|   30|  F|
|  Kavitha|   28|  F|
|     Raj|   35|  M|
+--------+-----+---+
```

### **Example 5: Renaming Columns in a DataFrame with Nested Structures**

**PySpark:**

```python theme={"system"}
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

schema = StructType([
    StructField("Name", StringType(), True),
    StructField("Age", IntegerType(), True),
    StructField("Address", StructType([
        StructField("City", StringType(), True),
        StructField("State", StringType(), True)
    ]), True)
])

data = [("Anand", 25, ("Chennai", "Tamil Nadu")), 
        ("Bala", 30, ("Bangalore", "Karnataka")), 
        ("Kavitha", 28, ("Hyderabad", "Telangana"))]

df = spark.createDataFrame(data, schema)

# Renaming nested columns
df_renamed = df.withColumnRenamed("Address.City", "City") \
               .withColumnRenamed("Address.State", "State")
df_renamed.show(truncate=False)
```

**Output:**

```
+-------+---+------------+
|Name   |Age|Address     |
+-------+---+------------+
|Anand  |25 |{Chennai, Tamil Nadu}|
|Bala   |30 |{Bangalore, Karnataka}|
|Kavitha|28 |{Hyderabad, Telangana}|
+-------+---+------------+
```

### **Example 6: Renaming Columns with Dynamic Names**

**PySpark:**

```python theme={"system"}
# Renaming columns dynamically
new_column_names = {"Name": "FullName", "Age": "Years"}

for old_name, new_name in new_column_names.items():
    df = df.withColumnRenamed(old_name, new_name)

df.show()
```

**Output:**

```
+--------+-----+
|FullName|Years|
+--------+-----+
|   Anand|   25|
|    Bala|   30|
|  Kavitha|   28|
|     Raj|   35|
+--------+-----+
```

## 5. **Common Use Cases**

* Renaming columns to match a specific schema or data model.
* Preparing data for joins or merges by ensuring consistent column names.
* Improving the readability of column names for reporting or analysis.

## 6. **Performance Considerations**

* Renaming columns is a lightweight operation as it only changes metadata and does not involve data transformation.
* Use this command to avoid data duplication or unnecessary transformations.

## 7. **Key Takeaways**

1. The `withColumnRenamed()` command is used to rename one or more columns in a DataFrame.
2. It allows you to rename columns to make them more descriptive or align them with a specific naming convention.
3. Renaming columns is a metadata operation and does not involve data movement, making it very efficient.
4. In Spark SQL, similar renaming can be achieved using `AS` in `SELECT` statements.
5. Works efficiently on large datasets as it does not involve data transformation.
