The display() function is commonly used in Databricks notebooks to render DataFrames, charts, and other visualizations in an interactive and user-friendly format. It is not a native Spark function but is specific to Databricks. The display() function provides a rich set of features for data exploration, including tabular views, charts, and custom visualizations.


1. Syntax

  • Databricks:
    display(df)
    
  • PySpark (outside Databricks):
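    • Use df.show() to print the first rows as a plain-text table, or df.toPandas() to get a pandas DataFrame that Jupyter renders as a table. Note that toPandas() collects every row to the driver, so limit the data first.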
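
Code that has to run both inside and outside Databricks can pick the renderer at run time. The sketch below is one possible approach, not an official API: render is a hypothetical helper name, it assumes the function is defined in the notebook itself (so that the injected display() is visible in its globals), and it relies on the DATABRICKS_RUNTIME_VERSION environment variable that Databricks clusters set.

```python
import os


def render(df, n=20):
    """Render df with display() on Databricks, otherwise print it with df.show(n).

    Returns the name of the method used, which is handy for logging and tests.
    """
    # display() is injected into the notebook's global namespace rather than
    # imported, so look it up by name instead of assuming it exists.
    display_fn = globals().get("display")
    if "DATABRICKS_RUNTIME_VERSION" in os.environ and callable(display_fn):
        display_fn(df)
        return "display"
    # Plain PySpark fallback: show() prints the first n rows as text.
    df.show(n)
    return "show"
```

Calling render(df) in a Databricks notebook would produce the interactive table, while the same call in a local PySpark session falls back to the text output of df.show().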

2. Key Features

  • Interactive Tables: Displays DataFrames in an interactive table with sorting, filtering, and pagination.
  • Visualizations: Supports built-in charts (e.g., bar charts, line charts, pie charts) for data exploration.
  • Custom Visualizations: Allows custom visualizations using libraries like Matplotlib, Plotly, or Seaborn.
  • Rich Output: Can display images, HTML, and other rich content.

3. Examples

Example 1: Displaying a DataFrame as a Table

  • Databricks:
    from pyspark.sql import SparkSession
    
    spark = SparkSession.builder.appName("DisplayExample").getOrCreate()
    
    # Create DataFrame
    data = [("Anand", 25, 3000), ("Bala", 30, 4000), ("Kavitha", 28, 3500), ("Raj", 35, 4500)]
    columns = ["Name", "Age", "Salary"]
    
    df = spark.createDataFrame(data, columns)
    
    # Display the DataFrame
    display(df)
    

Output:

  • An interactive table with columns Name, Age, and Salary.

Example 2: Displaying a Chart

  • Databricks:
    # Display a bar chart of Salary by Name
    display(df)
    
    • After running the above code, click the Chart button below the table (in newer notebook UIs, + > Visualization), select a bar chart, and put Name on the x-axis and Salary on the y-axis.

Output:

  • A bar chart showing Salary on the y-axis and Name on the x-axis.

Example 3: Displaying a Pie Chart

  • Databricks:
    # Display a pie chart of Age distribution
    display(df)
    
    • After running the above code, click on the Chart button and select Pie Chart to visualize the data.

Output:

  • A pie chart showing the distribution of Age.

Example 4: Displaying Custom Visualizations

  • Databricks:
    import matplotlib.pyplot as plt
    
    # Convert the Spark DataFrame to a pandas DataFrame (collects all rows to the driver)
    pandas_df = df.toPandas()
    
    # Create a custom bar chart on an explicit Figure object
    fig, ax = plt.subplots()
    ax.bar(pandas_df["Name"], pandas_df["Salary"])
    ax.set_xlabel("Name")
    ax.set_ylabel("Salary")
    ax.set_title("Salary by Name")
    
    # Display the chart by passing the figure to display()
    display(fig)
    

Output:

  • A custom bar chart created using Matplotlib.

Example 5: Displaying a DataFrame with Filters

  • Databricks:
    # Display the DataFrame with filters
    display(df)
    
    • After running the above code, use the filter options in the interactive table to filter rows.

Output:

  • An interactive table with filter options.

Example 6: Displaying a Line Chart

  • Databricks:
    # Display a line chart of Salary by Age
    display(df)
    
    • After running the above code, click on the Chart button and select Line Chart to visualize the data.

Output:

  • A line chart showing Salary on the y-axis and Age on the x-axis.

Example 7: Displaying HTML Content

  • Databricks:
    # Display HTML content
    html_content = "<h1>Hello, Databricks!</h1>"
    displayHTML(html_content)
    

Output:

  • Rendered HTML content in the notebook.
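
When the HTML is built from data instead of a fixed string, cell values should be escaped before rendering. The following is a minimal sketch: rows_to_html is a hypothetical helper, and the fallback to print() is only there so the snippet also runs outside Databricks, where displayHTML() is not defined.

```python
from html import escape


def rows_to_html(columns, rows):
    """Build a small HTML table string, escaping every header and cell value."""
    header = "".join(f"<th>{escape(str(c))}</th>" for c in columns)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(v))}</td>" for v in row) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{header}</tr>{body}</table>"


html_table = rows_to_html(["Name", "Age"], [("Anand", 25), ("<Bala>", 30)])

# displayHTML() exists only in Databricks notebooks; fall back to print() elsewhere.
show_html = globals().get("displayHTML", print)
show_html(html_table)
```

Escaping keeps a value such as "<Bala>" from being interpreted as markup when the table is rendered.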

4. Common Use Cases

  • Exploring and analyzing data interactively in Databricks notebooks.
  • Creating visualizations for data insights and reporting.
  • Sharing results with stakeholders in a user-friendly format.

5. Performance Considerations

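  • display() renders only a limited preview of the result (the row and size caps depend on the Databricks Runtime version), so the notebook stays responsive on large datasets. The underlying query still has to run.
  • For very wide DataFrames (many columns), select only the columns you need before calling display(), since every column in the DataFrame is rendered.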
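
One way to keep rendering cheap is to trim the DataFrame before it is displayed. The helper below is a sketch under stated assumptions: preview is a hypothetical name, the default of 1000 rows is an arbitrary choice, and it relies only on the standard DataFrame.select() and DataFrame.limit() methods.

```python
def preview(df, columns, max_rows=1000):
    """Return a narrower, shorter DataFrame that is cheaper to render.

    select() and limit() are lazy transformations, so nothing is computed
    until the result is passed to display() or show().
    """
    return df.select(*columns).limit(max_rows)


# Example usage in a Databricks notebook (df is the DataFrame from Example 1):
# display(preview(df, ["Name", "Salary"], max_rows=100))
```

Because both calls are lazy, Spark can often avoid reading the columns and rows that were trimmed away.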

6. Key Takeaways

  1. Purpose: The display() function is used in Databricks notebooks to render DataFrames, charts, and visualizations interactively.
  2. Interactive Tables: Provides sorting, filtering, and pagination for DataFrames.
  3. Visualizations: Supports built-in charts and custom visualizations.
  4. Common Use Cases:
    • Exploring and analyzing data interactively.
    • Creating visualizations for data insights.
    • Sharing results in a user-friendly format.
  5. Performance: display() renders a limited preview of large results; narrow very wide DataFrames with select() before displaying them.