Spark: display() function
The display() function is commonly used in Databricks notebooks to render DataFrames, charts, and other visualizations in an interactive, user-friendly format. It is not a native Spark function; it is specific to Databricks. The display() function provides a rich set of features for data exploration, including tabular views, charts, and custom visualizations.
1. Syntax
- Databricks: display(df)
- PySpark (outside Databricks):
- Use df.show() or df.toPandas() for similar functionality.
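The split between Databricks and plain PySpark can be handled with a small wrapper. The sketch below is one possible approach, not part of any library: the helper name render and its parameters are hypothetical. It calls a display function when one is supplied and otherwise falls back to df.show().

```python
def render(df, display_fn=None, max_rows=20):
    """Render df with a display function if given, else fall back to df.show().

    `render`, `display_fn`, and `max_rows` are illustrative names only.
    """
    if display_fn is not None:
        display_fn(df)      # rich, interactive output in a Databricks notebook
        return "display"
    df.show(max_rows)       # plain-text table; available in any PySpark session
    return "show"
```

In a Databricks notebook this would be called as render(df, display); elsewhere, render(df) prints the first rows as text.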
2. Key Features
- Interactive Tables: Displays DataFrames in an interactive table with sorting, filtering, and pagination.
- Visualizations: Supports built-in charts (e.g., bar charts, line charts, pie charts) for data exploration.
- Custom Visualizations: Allows custom visualizations using libraries like Matplotlib, Plotly, or Seaborn.
- Rich Output: Can display images, HTML, and other rich content.
3. Examples
Example 1: Displaying a DataFrame as a Table
- Databricks:
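A minimal sketch of this example, assuming a small hypothetical dataset (the names and figures are illustrative only). The Spark session and the display function are passed in as arguments so the snippet does not rely on notebook globals; the function name show_table is an assumption.

```python
# Hypothetical sample data; values are illustrative only.
ROWS = [("Anand", 25, 3000), ("Bala", 30, 4000), ("Kavitha", 28, 3500)]
COLUMNS = ["Name", "Age", "Salary"]


def show_table(spark, display_fn):
    """Build the sample DataFrame and render it as an interactive table."""
    df = spark.createDataFrame(ROWS, COLUMNS)  # column types are inferred
    display_fn(df)                             # Databricks-specific rendering
    return df
```

In a Databricks notebook, where spark and display are predefined, the call would be show_table(spark, display).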
Output:
- An interactive table with columns Name, Age, and Salary.
Example 2: Displaying a Chart
- Databricks:
- After running the above code, click on the Chart button in the Databricks notebook to visualize the data as a bar chart.
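One way to prepare the data for this bar chart, sketched under the assumption that df is a DataFrame with Name and Salary columns. Selecting only the two plotted columns is a design choice rather than a requirement; the function name is hypothetical.

```python
def show_salary_by_name(df, display_fn):
    """Display only the columns the bar chart needs: Name (x) and Salary (y)."""
    chart_data = df.select("Name", "Salary")
    display_fn(chart_data)  # then pick the bar chart in the notebook's chart UI
    return chart_data
```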
Output:
- A bar chart showing Salary on the y-axis and Name on the x-axis.
Example 3: Displaying a Pie Chart
- Databricks:
- After running the above code, click on the Chart button and select Pie Chart to visualize the data.
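A pie chart of a distribution is usually easier to read from pre-aggregated data. The sketch below assumes df has an Age column and counts rows per age before displaying; the aggregation step and the function name are assumptions, not something the example prescribes.

```python
def show_age_distribution(df, display_fn):
    """Count rows per Age value and display the result for a pie chart."""
    counts = df.groupBy("Age").count()  # yields columns: Age, count
    display_fn(counts)
    return counts
```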
Output:
- A pie chart showing the distribution of Age.
Example 4: Displaying Custom Visualizations
- Databricks:
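A hedged sketch of a custom Matplotlib chart. The helper name salary_bar_chart is hypothetical, and passing a Matplotlib figure to display() is assumed to be supported by the runtime in use; recent Databricks runtimes may also render figures inline without it.

```python
import matplotlib.pyplot as plt


def salary_bar_chart(names, salaries):
    """Return a Matplotlib figure with one bar per name."""
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.bar(names, salaries)
    ax.set_xlabel("Name")
    ax.set_ylabel("Salary")
    ax.set_title("Salary by Name")
    fig.tight_layout()
    return fig


# In a Databricks notebook (assumes `df` has Name and Salary columns):
#   pdf = df.select("Name", "Salary").toPandas()  # collects rows to the driver
#   display(salary_bar_chart(pdf["Name"], pdf["Salary"]))
```

Because toPandas() collects every selected row to the driver, this pattern suits small or already aggregated results.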
Output:
- A custom bar chart created using Matplotlib.
Example 5: Displaying a DataFrame with Filters
- Databricks:
- After running the above code, use the filter options in the interactive table to filter rows.
Output:
- An interactive table with filter options.
Example 6: Displaying a Line Chart
- Databricks:
- After running the above code, click on the Chart button and select Line Chart to visualize the data.
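For a line chart, ordering by the x-axis column keeps the line from zigzagging. The sketch assumes df has Age and Salary columns; the function name and the explicit ordering are illustrative choices.

```python
def show_salary_by_age(df, display_fn):
    """Display Age and Salary sorted by Age, ready for a line chart."""
    points = df.select("Age", "Salary").orderBy("Age")
    display_fn(points)
    return points
```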
Output:
- A line chart showing Salary on the y-axis and Age on the x-axis.
Example 7: Displaying HTML Content
- Databricks:
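HTML is typically rendered in Databricks notebooks with displayHTML(), which takes an HTML string. The sketch below builds such a string; the helper name build_summary_html and the sample values are hypothetical, and values are escaped so that data cannot inject markup.

```python
from html import escape


def build_summary_html(title, rows):
    """Return an HTML snippet: a heading followed by a two-column table."""
    body = "".join(
        f"<tr><td>{escape(str(name))}</td><td>{escape(str(value))}</td></tr>"
        for name, value in rows
    )
    return f"<h2>{escape(title)}</h2><table>{body}</table>"


snippet = build_summary_html("Salaries", [("Anand", 3000), ("Bala", 4000)])
# In a Databricks notebook, render it with:
#   displayHTML(snippet)
```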
Output:
- Rendered HTML content in the notebook.
4. Common Use Cases
- Exploring and analyzing data interactively in Databricks notebooks.
- Creating visualizations for data insights and reporting.
- Sharing results with stakeholders in a user-friendly format.
5. Performance Considerations
- display() is optimized for Databricks notebooks and stays responsive on large datasets because it renders a limited preview of the rows rather than the full result.
- Use it judiciously for very wide DataFrames (many columns), as it processes all specified columns.
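One way to act on the advice about wide DataFrames is to narrow the data before displaying it. The sketch below is an illustration only: display_preview, columns, and max_rows are hypothetical names, and the default row cap is an arbitrary choice.

```python
def display_preview(df, display_fn, columns=None, max_rows=1000):
    """Display a bounded slice of df: optional column subset, capped row count."""
    preview = df.select(*columns) if columns else df
    preview = preview.limit(max_rows)  # keep the rendered result small
    display_fn(preview)
    return preview
```

In a Databricks notebook this could be called as display_preview(df, display, columns=["Name", "Salary"]).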
6. Key Takeaways
- Purpose: The display() function is used in Databricks notebooks to render DataFrames, charts, and visualizations interactively.
- Interactive Tables: Provides sorting, filtering, and pagination for DataFrames.
- Visualizations: Supports built-in charts and custom visualizations.
- Common Use Cases:
- Exploring and analyzing data interactively.
- Creating visualizations for data insights.
- Sharing results in a user-friendly format.
- Performance: display() is optimized for Databricks notebooks and works efficiently with large datasets.