The display() function is commonly used in Databricks notebooks to render DataFrames, charts, and other visualizations in an interactive, user-friendly format. It is not a native Spark function; it is specific to Databricks. The display() function provides a rich set of features for data exploration, including tabular views, charts, and custom visualizations.
1. Syntax
- Databricks: display(df)
- PySpark (outside Databricks): display() is not available. Use df.show() or df.toPandas() for similar functionality.
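Because display() exists only inside Databricks, code that runs in both environments sometimes has to choose between the two calls at run time. The sketch below is one possible approach, not a Databricks API: the helper names in_databricks and render are hypothetical, and it assumes the DATABRICKS_RUNTIME_VERSION environment variable is set on Databricks clusters.

```python
import os


def in_databricks():
    """Return True when running on a Databricks cluster.

    Assumption: Databricks sets DATABRICKS_RUNTIME_VERSION on its clusters.
    """
    return "DATABRICKS_RUNTIME_VERSION" in os.environ


def render(df, display_fn=None, rows=20):
    """Render df with display_fn on Databricks, otherwise fall back to df.show().

    Hypothetical helper: display_fn is expected to be the notebook's display().
    Returns the name of the method used, so the choice is easy to inspect.
    """
    if in_databricks() and display_fn is not None:
        display_fn(df)
        return "display"
    df.show(rows)  # plain-text table, available on any Spark DataFrame
    return "show"
```

In a notebook, calling render(df, display_fn=globals().get("display")) avoids a NameError in environments where display() is not defined.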
2. Key Features
- Interactive Tables: Displays DataFrames in an interactive table with sorting, filtering, and pagination.
- Visualizations: Supports built-in charts (e.g., bar charts, line charts, pie charts) for data exploration.
- Custom Visualizations: Allows custom visualizations using libraries like Matplotlib, Plotly, or Seaborn.
- Rich Output: Can display images, HTML, and other rich content.
3. Examples
Example 1: Displaying a DataFrame as a Table
- Databricks:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("DisplayExample").getOrCreate()
# Create DataFrame
data = [("Anand", 25, 3000), ("Bala", 30, 4000), ("Kavitha", 28, 3500), ("Raj", 35, 4500)]
columns = ["Name", "Age", "Salary"]
df = spark.createDataFrame(data, columns)
# Display the DataFrame
display(df)
Output:
- An interactive table with columns Name, Age, and Salary.
Example 2: Displaying a Chart
- Databricks:
# Display a bar chart of Salary by Name
display(df)
- After running the above code, click on the Chart button in the Databricks notebook to visualize the data as a bar chart.
Output:
- A bar chart showing Salary on the y-axis and Name on the x-axis.
Example 3: Displaying a Pie Chart
- Databricks:
# Display a pie chart of Age distribution
display(df)
- After running the above code, click on the Chart button and select Pie Chart to visualize the data.
Output:
- A pie chart showing the distribution of Age.
Example 4: Displaying Custom Visualizations
- Databricks:
import matplotlib.pyplot as plt
# Convert the Spark DataFrame to a pandas DataFrame (collects all rows to the driver)
pandas_df = df.toPandas()
# Create a custom bar chart on an explicit figure
fig, ax = plt.subplots()
ax.bar(pandas_df["Name"], pandas_df["Salary"])
ax.set_xlabel("Name")
ax.set_ylabel("Salary")
ax.set_title("Salary by Name")
# Display the chart; display() needs an object to render, so pass the figure
display(fig)
Output:
- A custom bar chart created using Matplotlib.
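Since toPandas() collects every row onto the driver, it can help to cap the row count before converting a large DataFrame for plotting. A minimal sketch, assuming a hypothetical helper named to_pandas_capped and an arbitrary default of 1,000 rows; limit() and toPandas() are standard Spark DataFrame methods.

```python
def to_pandas_capped(df, max_rows=1000):
    """Convert at most max_rows rows of a Spark DataFrame to pandas.

    Hypothetical helper; max_rows=1000 is an arbitrary illustrative default.
    """
    if max_rows <= 0:
        raise ValueError("max_rows must be positive")
    # limit() is applied by Spark, so only the capped rows reach the driver
    return df.limit(max_rows).toPandas()
```

For example, pandas_df = to_pandas_capped(df, max_rows=500) could replace the direct df.toPandas() call when the source table is large.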
Example 5: Displaying a DataFrame with Filters
- Databricks:
# Display the DataFrame with filters
display(df)
- After running the above code, use the filter options in the interactive table to filter rows.
Output:
- An interactive table with filter options.
Example 6: Displaying a Line Chart
- Databricks:
# Display a line chart of Salary by Age
display(df)
- After running the above code, click on the Chart button and select Line Chart to visualize the data.
Output:
- A line chart showing Salary on the y-axis and Age on the x-axis.
Example 7: Displaying HTML Content
- Databricks:
# Display HTML content
html_content = "<h1>Hello, Databricks!</h1>"
displayHTML(html_content)
Output:
- Rendered HTML content in the notebook.
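When the HTML is assembled from data rather than written by hand, escaping the values keeps stray markup out of the rendered page. The sketch below builds a small table string that could then be passed to displayHTML(); the helper name rows_to_html is hypothetical, and the sample rows are taken from Example 1.

```python
from html import escape


def rows_to_html(columns, rows):
    """Build a small HTML table string, escaping every header and cell value."""
    header = "".join(f"<th>{escape(str(c))}</th>" for c in columns)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(v))}</td>" for v in row) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{header}</tr>{body}</table>"


columns = ["Name", "Age", "Salary"]
rows = [("Anand", 25, 3000), ("Bala", 30, 4000)]
table_html = rows_to_html(columns, rows)
# In a Databricks notebook: displayHTML(table_html)
```

The displayHTML() call is left as a comment so the snippet also runs outside Databricks, where that function is not defined.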
4. Common Use Cases
- Exploring and analyzing data interactively in Databricks notebooks.
- Creating visualizations for data insights and reporting.
- Sharing results with stakeholders in a user-friendly format.
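5. Performance Considerations
- display() is optimized for Databricks notebooks and stays responsive with large datasets because it renders only a capped number of rows rather than the entire result.
- Use it judiciously for very wide DataFrames (many columns), as it processes all specified columns.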
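One way to keep a wide DataFrame manageable is to select only the columns of interest before rendering. A minimal sketch, assuming a hypothetical helper named select_existing; it relies only on the standard DataFrame.columns attribute and DataFrame.select() method.

```python
def select_existing(df, wanted):
    """Return df restricted to the columns in wanted that it actually has.

    Hypothetical helper; the order of wanted is preserved in the result.
    """
    present = set(df.columns)
    keep = [c for c in wanted if c in present]
    if not keep:
        raise ValueError("none of the requested columns exist in the DataFrame")
    return df.select(*keep)
```

In a Databricks notebook this might be used as display(select_existing(df, ["Name", "Salary"])).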
6. Key Takeaways
- Purpose: The display() function is used in Databricks notebooks to render DataFrames, charts, and visualizations interactively.
- Interactive Tables: Provides sorting, filtering, and pagination for DataFrames.
- Visualizations: Supports built-in charts and custom visualizations.
- Common Use Cases:
- Exploring and analyzing data interactively.
- Creating visualizations for data insights.
- Sharing results in a user-friendly format.
- Performance: display() is optimized for Databricks notebooks and works efficiently with large datasets.