The display() function is commonly used in Databricks notebooks to render DataFrames, charts, and other visualizations in an interactive and user-friendly format. It is not a native Spark function but is specific to Databricks. The display() function provides a rich set of features for data exploration, including tabular views, charts, and custom visualizations.

1. Syntax

  • Databricks:
    display(df)
    
  • PySpark (outside Databricks):
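    • Use df.show() to print a plain-text table, or df.toPandas() to collect the data into a pandas DataFrame. Neither provides the interactive table or built-in charts.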
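
A notebook that has to run both inside and outside Databricks can pick the renderer at run time. The following is a minimal sketch, not an official API: `render` is a hypothetical helper name, and it assumes the code lives in the notebook itself, where Databricks makes display() available by name. Any other environment that also defines a display() name would take the first branch too.

```python
def render(df, n=20):
    """Render df with display() when that name exists, else fall back to df.show()."""
    try:
        # Look the name up first, so that errors raised inside display()
        # itself are not mistaken for "display is missing".
        show_rich = display  # noqa: F821 (defined by the notebook environment)
    except NameError:
        df.show(n)  # plain-text table; works in any PySpark session
        return "show"
    show_rich(df)
    return "display"
```

The return value only reports which path was taken, which makes the helper easy to check in a plain Python session.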

2. Key Features

  • Interactive Tables: Displays DataFrames in an interactive table with sorting, filtering, and pagination.
  • Visualizations: Supports built-in charts (e.g., bar charts, line charts, pie charts) for data exploration.
  • Custom Visualizations: Allows custom visualizations using libraries like Matplotlib, Plotly, or Seaborn.
  • Rich Output: Can display images, HTML, and other rich content.

3. Examples

Example 1: Displaying a DataFrame as a Table

  • Databricks:
    from pyspark.sql import SparkSession
    
    spark = SparkSession.builder.appName("DisplayExample").getOrCreate()
    
    # Create DataFrame
    data = [("Anand", 25, 3000), ("Bala", 30, 4000), ("Kavitha", 28, 3500), ("Raj", 35, 4500)]
    columns = ["Name", "Age", "Salary"]
    
    df = spark.createDataFrame(data, columns)
    
    # Display the DataFrame
    display(df)
    
Output:
  • An interactive table with columns Name, Age, and Salary.

Example 2: Displaying a Chart

  • Databricks:
    # Render the DataFrame; the bar chart (Salary by Name) is then configured in the notebook UI
    display(df)
    
    • After running the above code, click on the Chart button in the Databricks notebook to visualize the data as a bar chart.
Output:
  • A bar chart showing Salary on the y-axis and Name on the x-axis.

Example 3: Displaying a Pie Chart

  • Databricks:
    # Render the DataFrame; the pie chart (Age distribution) is then configured in the notebook UI
    display(df)
    
    • After running the above code, click on the Chart button and select Pie Chart to visualize the data.
Output:
  • A pie chart showing the distribution of Age.

Example 4: Displaying Custom Visualizations

  • Databricks:
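    import matplotlib.pyplot as plt
    
    # Convert the Spark DataFrame to a pandas DataFrame (collects all rows to the driver)
    pandas_df = df.toPandas()
    
    # Create a custom bar chart on an explicit figure
    fig, ax = plt.subplots()
    ax.bar(pandas_df["Name"], pandas_df["Salary"])
    ax.set_xlabel("Name")
    ax.set_ylabel("Salary")
    ax.set_title("Salary by Name")
    
    # Recent Databricks runtimes render Matplotlib figures inline;
    # on older runtimes, pass the figure explicitly with display(fig)
    plt.show()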
    
Output:
  • A custom bar chart created using Matplotlib.

Example 5: Displaying a DataFrame with Filters

  • Databricks:
    # Render the DataFrame; filters are applied afterwards in the table UI
    display(df)
    
    • After running the above code, use the filter options in the interactive table to filter rows.
Output:
  • An interactive table with filter options.

Example 6: Displaying a Line Chart

  • Databricks:
    # Render the DataFrame; the line chart (Salary by Age) is then configured in the notebook UI
    display(df)
    
    • After running the above code, click on the Chart button and select Line Chart to visualize the data.
Output:
  • A line chart showing Salary on the y-axis and Age on the x-axis.

Example 7: Displaying HTML Content

  • Databricks:
    # Display HTML content
    html_content = "<h1>Hello, Databricks!</h1>"
    displayHTML(html_content)
    
Output:
  • Rendered HTML content in the notebook.
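
Because displayHTML() renders whatever markup it is given, values taken from data are best escaped before they are embedded. The sketch below builds a small HTML table with the standard library; `rows_to_html_table` is a hypothetical helper name, and the sample rows are illustrative only.

```python
from html import escape

def rows_to_html_table(columns, rows):
    """Build an HTML table string, escaping every header and cell value."""
    header = "".join(f"<th>{escape(str(c))}</th>" for c in columns)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(v))}</td>" for v in row) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{header}</tr>{body}</table>"

html_content = rows_to_html_table(
    ["Name", "Salary"],
    [("Anand", 3000), ("<b>Bala</b>", 4000)],  # markup in data is escaped, not rendered
)

# In a Databricks notebook, render the result with:
# displayHTML(html_content)
```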

4. Common Use Cases

  • Exploring and analyzing data interactively in Databricks notebooks.
  • Creating visualizations for data insights and reporting.
  • Sharing results with stakeholders in a user-friendly format.

5. Performance Considerations

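  • display() renders only a capped preview of the result rather than every row (the exact limit depends on the Databricks version), which keeps the notebook responsive on large DataFrames. Spark still has to execute the query that produces those rows.
  • For very wide DataFrames (many columns), select only the columns you need before calling display(), since every column in the DataFrame is rendered.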
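
The advice on wide DataFrames can be sketched as a small helper that narrows and shortens a DataFrame before it is rendered. `preview` is a hypothetical name and the default of 1000 rows is an arbitrary assumption; the helper relies only on the DataFrame methods select() and limit().

```python
def preview(df, columns, max_rows=1000):
    """Return a narrower, shorter DataFrame that is cheaper to render."""
    # select() keeps only the requested columns; limit() caps the row count.
    return df.select(*columns).limit(max_rows)

# In a Databricks notebook, with the df from Example 1:
# display(preview(df, ["Name", "Salary"], max_rows=100))
```

Both calls are lazy transformations, so nothing is computed until display() requests the rows.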

6. Key Takeaways

  1. Purpose: The display() function is used in Databricks notebooks to render DataFrames, charts, and visualizations interactively.
  2. Interactive Tables: Provides sorting, filtering, and pagination for DataFrames.
  3. Visualizations: Supports built-in charts and custom visualizations.
  4. Common Use Cases:
    • Exploring and analyzing data interactively.
    • Creating visualizations for data insights.
    • Sharing results in a user-friendly format.
  5. Performance: display() renders only a capped preview of the result, so limit rows and columns yourself when working with large or wide DataFrames.