Databricks for business leaders - Notes
Unity Catalog
- A new governance solution for the Databricks platform
What is Unity Catalog?
- Centralized governance solution across all your workspaces on any cloud
- Unified governance for all data and AI assets
  - files, tables, ML models and dashboards
  - security model based on standard SQL (permissions are managed with GRANT and REVOKE statements)
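Because the security model is SQL-based, a statement such as GRANT SELECT ON TABLE main.sales.orders TO `analysts` carries four pieces of information: privilege, securable type, securable name and principal. The sketch below is a minimal Python parser for that simplified form. It assumes single-privilege GRANT statements on catalogs, schemas and tables only. The parse_grant helper and the example names (main.sales.orders, analysts) are hypothetical, and real Databricks SQL accepts more variants than this pattern does.

```python
import re
from typing import NamedTuple


class Grant(NamedTuple):
    privilege: str
    securable_type: str
    securable: str
    principal: str


# Simplified pattern (assumption): one privilege, one securable,
# one principal quoted in backticks, optional trailing semicolon.
_GRANT_RE = re.compile(
    r"^GRANT\s+(?P<privilege>[A-Z ]+?)\s+ON\s+(?P<type>CATALOG|SCHEMA|TABLE)\s+"
    r"(?P<name>[\w.]+)\s+TO\s+`(?P<principal>[^`]+)`\s*;?$",
    re.IGNORECASE,
)


def parse_grant(statement: str) -> Grant:
    """Split a simplified GRANT statement into its four parts."""
    match = _GRANT_RE.match(statement.strip())
    if match is None:
        raise ValueError(f"unsupported GRANT statement: {statement!r}")
    return Grant(
        privilege=match["privilege"].upper(),
        securable_type=match["type"].upper(),
        securable=match["name"],
        principal=match["principal"],
    )


print(parse_grant("GRANT SELECT ON TABLE main.sales.orders TO `analysts`;"))
print(parse_grant("GRANT USE CATALOG ON CATALOG main TO `analysts`"))
```

The lazy privilege group lets multi-word privileges like USE CATALOG parse correctly, because it stops at the first " ON " it meets.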
Architecture
Before Unity Catalog, every workspace managed its own governance:
- Workspace 1
  - User/group management
  - Hive metastore
  - Access controls
  - Compute resources
- Workspace 2
  - User/group management
  - Hive metastore
  - Access controls
  - Compute resources
After Unity Catalog, governance moves into one shared layer:
- Unity Catalog
  - User/group management
  - UC metastores
  - Access controls
- Workspace 1
  - Compute resources
- Workspace 2
  - Compute resources
Workspaces are attached to a Unity Catalog metastore, so users, groups and access controls defined there apply in every attached workspace.
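The effect of moving user management and access control out of the workspaces can be shown with a toy Python model. Both workspaces hold a reference to the same metastore object, so a grant recorded once is visible from either one. The Metastore and Workspace classes, their methods and all names are hypothetical illustrations, not a Databricks API.

```python
from dataclasses import dataclass, field


@dataclass
class Metastore:
    """Toy stand-in for a UC metastore: one shared set of grants."""
    grants: set[tuple[str, str, str]] = field(default_factory=set)

    def grant(self, principal: str, privilege: str, securable: str) -> None:
        self.grants.add((principal, privilege, securable))


@dataclass
class Workspace:
    """Keeps its own compute but delegates governance to the metastore."""
    name: str
    metastore: Metastore
    clusters: list[str] = field(default_factory=list)

    def is_allowed(self, principal: str, privilege: str, securable: str) -> bool:
        return (principal, privilege, securable) in self.metastore.grants


uc = Metastore()
ws1 = Workspace("workspace-1", uc, clusters=["etl-cluster"])
ws2 = Workspace("workspace-2", uc, clusters=["bi-cluster"])

# Grant once, centrally ...
uc.grant("analysts", "SELECT", "main.sales.orders")

# ... and the rule applies in every attached workspace.
print(ws1.is_allowed("analysts", "SELECT", "main.sales.orders"))  # True
print(ws2.is_allowed("analysts", "SELECT", "main.sales.orders"))  # True
```

In the "before" picture, each Workspace would carry its own grants set instead of a shared Metastore, so the same grant would have to be repeated in every workspace.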
3-level namespace: tables are addressed as catalog.schema.table instead of schema.table
SELECT * FROM schema.table
==> SELECT * FROM catalog.schema.table
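A reference with fewer than three parts is completed from the session's current catalog and schema, which are set in Databricks SQL with USE CATALOG and USE SCHEMA. The sketch below is a minimal Python version of that completion rule. The resolve_name helper and the defaults main and default are assumptions chosen for illustration, since the actual default catalog depends on how the workspace is configured. It also ignores quoted identifiers that contain dots.

```python
def resolve_name(name: str, current_catalog: str = "main",
                 current_schema: str = "default") -> str:
    """Expand a 1-, 2- or 3-part table name to catalog.schema.table.

    Assumption: identifiers are unquoted and contain no dots.
    """
    parts = name.split(".")
    if len(parts) == 1:        # table -> add catalog and schema
        parts = [current_catalog, current_schema, *parts]
    elif len(parts) == 2:      # schema.table -> add catalog
        parts = [current_catalog, *parts]
    elif len(parts) != 3:      # more than catalog.schema.table
        raise ValueError(f"expected at most 3 name parts, got {name!r}")
    return ".".join(parts)


print(resolve_name("orders"))                # main.default.orders
print(resolve_name("sales.orders"))          # main.sales.orders
print(resolve_name("finance.sales.orders"))  # finance.sales.orders
print(resolve_name("sales.orders", current_catalog="hive_metastore"))
```

The last call mirrors how a UC-enabled workspace exposes its legacy Hive metastore tables under the catalog named hive_metastore, so an old two-level name keeps working when that catalog is the current one.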
Unity Catalog hierarchy:
UC metastore => catalog => schema (database) => tables, views, functions
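The hierarchy also drives access checks. In Unity Catalog, reading a table requires USE CATALOG on its catalog, USE SCHEMA on its schema and SELECT on the table, and a privilege granted on a parent object is inherited by its children. The toy Python model below illustrates that rule. The classes, the has and can_select methods and all object and group names are hypothetical. The sketch also omits ownership and admin rights, which real privilege evaluation takes into account.

```python
from dataclasses import dataclass, field


@dataclass
class Schema:
    tables: set[str] = field(default_factory=set)
    views: set[str] = field(default_factory=set)
    functions: set[str] = field(default_factory=set)


@dataclass
class Catalog:
    schemas: dict[str, Schema] = field(default_factory=dict)


@dataclass
class UCMetastore:
    catalogs: dict[str, Catalog] = field(default_factory=dict)
    # (principal, privilege, full securable name)
    grants: set[tuple[str, str, str]] = field(default_factory=set)

    def has(self, principal: str, privilege: str, securable: str) -> bool:
        """True if granted on the securable or inherited from a parent."""
        parts = securable.split(".")
        # Check "catalog", then "catalog.schema", then "catalog.schema.table".
        return any(
            (principal, privilege, ".".join(parts[:i])) in self.grants
            for i in range(1, len(parts) + 1)
        )

    def can_select(self, principal: str, table: str) -> bool:
        catalog, schema, name = table.split(".")
        if name not in self.catalogs[catalog].schemas[schema].tables:
            raise KeyError(f"table not found: {table}")
        return (
            self.has(principal, "USE CATALOG", catalog)
            and self.has(principal, "USE SCHEMA", f"{catalog}.{schema}")
            and self.has(principal, "SELECT", table)
        )


ms = UCMetastore()
ms.catalogs["main"] = Catalog({"sales": Schema(tables={"orders"})})
ms.grants |= {
    ("analysts", "USE CATALOG", "main"),
    ("analysts", "USE SCHEMA", "main.sales"),
}
print(ms.can_select("analysts", "main.sales.orders"))  # False: no SELECT yet
ms.grants.add(("analysts", "SELECT", "main.sales"))    # granted on the schema
print(ms.can_select("analysts", "main.sales.orders"))  # True: inherited
```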
Unity Catalog metastore != Hive metastore
- The Hive metastore is the legacy metastore built into each Databricks workspace and is scoped to that workspace only.
- The UC metastore adds centralized access control, auditing, lineage and data discovery.
- A UC metastore can contain as many catalogs as needed.
- UC authenticates to the underlying cloud storage directories through storage credentials (e.g. an AWS IAM role or an Azure managed identity).