Databricks for business leaders - Notes
Unity Catalog
- It is the governance solution for the Databricks platform.
What is Unity Catalog?
- Centralized governance solution across all your workspaces on any cloud.
- Unified governance for all data and AI assets
  - files, tables, ML models, and dashboards
- Permissions are managed with SQL (GRANT statements)
Architecture
Before Unity Catalog:
- Workspace 1
  - User/group management
  - Hive metastore
  - Access controls
  - Compute resources
- Workspace 2
  - User/group management
  - Hive metastore
  - Access controls
  - Compute resources
After Unity Catalog:
- Unity Catalog
  - User/group management
  - UC metastores
  - Access controls
- Workspace 1
  - Compute resources
- Workspace 2
  - Compute resources
Workspaces are connected to Unity Catalog.
Three-level namespace
SELECT * FROM schema.table
==> SELECT * FROM catalog.schema.table
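As a rough illustration of the change above, this Python sketch turns a two-level name into a three-level one. The helper name `qualify` and the default catalog `main` are assumptions made for this example only; they are not part of any Databricks API.

```python
def qualify(name: str, default_catalog: str = "main") -> str:
    """Return a three-level catalog.schema.table name.

    Two-level names (schema.table) get the default catalog prepended.
    Names that already have three levels are returned unchanged.
    """
    parts = name.split(".")
    if len(parts) == 3:
        return name
    if len(parts) == 2:
        return f"{default_catalog}.{name}"
    raise ValueError(
        f"expected schema.table or catalog.schema.table, got {name!r}"
    )


# Hypothetical table names, used only to show the two cases.
print(qualify("sales.orders"))      # main.sales.orders
print(qualify("dev.sales.orders"))  # dev.sales.orders
```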
Unity Catalog hierarchy:
- UC metastore
  - Storage credentials
  - External locations
  - Share
  - Recipient
  - Catalog
    - Schema (database)
      - Table
      - View
      - Function
    - Schema (database)
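The catalog branch of the hierarchy above can be modelled as nested containers. This is a minimal sketch, not how Databricks stores metadata: the class names, the `full_names` helper, and all object names (`main`, `sales`, `orders`, and so on) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Schema:
    name: str
    tables: List[str] = field(default_factory=list)
    views: List[str] = field(default_factory=list)
    functions: List[str] = field(default_factory=list)


@dataclass
class Catalog:
    name: str
    schemas: List[Schema] = field(default_factory=list)


@dataclass
class Metastore:
    name: str
    catalogs: List[Catalog] = field(default_factory=list)


def full_names(metastore: Metastore) -> Iterator[str]:
    """Yield catalog.schema.object for every table, view, and function."""
    for catalog in metastore.catalogs:
        for schema in catalog.schemas:
            for obj in schema.tables + schema.views + schema.functions:
                yield f"{catalog.name}.{schema.name}.{obj}"


# One catalog holding two schemas, mirroring the outline above.
metastore = Metastore(
    name="uc_metastore",
    catalogs=[
        Catalog(
            name="main",
            schemas=[
                Schema("sales", tables=["orders"], views=["daily_orders"]),
                Schema("hr", tables=["employees"]),
            ],
        )
    ],
)

for name in full_names(metastore):
    print(name)
```

Each printed name has the three levels used in queries: catalog, schema, then the table, view, or function.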
Unity Catalog metastore != Hive metastore
- The Hive metastore is the default metastore linked to each Databricks workspace.
- The UC metastore offers improved security and advanced features.
- A UC metastore can have as many catalogs as desired.
- UC supports authentication to the underlying storage directories through storage credentials.
Three types of identities (principals):
- Users: identified by e-mail address, e.g. an account administrator.
- Groups: collections of users and service principals, e.g. data engineers, data scientists, data analysts.
  - Groups can be nested.
- Service principals
  - Used to authenticate applications and services that access data in UC.
  - Identified by an application ID.
  - Recommended over user accounts for applications and services.
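Because groups can contain other groups, working out who actually belongs to a group means following the nesting. The sketch below shows one way to do that in plain Python. It is a toy model, not a Databricks API: the `expand_members` helper, the group names, the e-mail addresses, and the service principal ID are all made up for illustration.

```python
from typing import Dict, Set


def expand_members(group: str, groups: Dict[str, Set[str]]) -> Set[str]:
    """Return every user or service principal in `group`, following nested groups."""
    seen_groups: Set[str] = set()
    members: Set[str] = set()
    stack = [group]
    while stack:
        current = stack.pop()
        if current in seen_groups:
            continue  # guard against groups that contain each other
        seen_groups.add(current)
        for member in groups.get(current, set()):
            if member in groups:
                stack.append(member)  # nested group: expand it later
            else:
                members.add(member)  # user or service principal
    return members


# Hypothetical directory: "data-team" nests two other groups.
groups = {
    "data-team": {"data-engineers", "data-scientists"},
    "data-engineers": {"alice@example.com", "etl-app-id-1234"},
    "data-scientists": {"bob@example.com"},
}

print(sorted(expand_members("data-team", groups)))
```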
Identity Federation:
- Identities are created once at the account level, in the account console, and can then be assigned to one or more workspaces.
Privileges:
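- Privileges are granted to users, groups, and service principals at the catalog, schema, or table level.
- Create - Create a new object inside the securable (for example, a table in a schema)
- Usage - Required on the parent catalog and schema before objects inside them can be accessed; it grants no read access by itself
- Select - Read the data in a table or view
- Modify - Add, update, or delete data in a table
- Read Files - Read files directly from the underlying storage location
- Write Files - Write files directly to the underlying storage location
- Execute - Invoke a function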
Security model (Unity Catalog)
GRANT <Privilege> ON <Securable object> TO <Principal>
- Privileges
- Securable objects
- Principals
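The GRANT pattern above can be sketched as a small string builder. This is only an illustration: the helper `grant_statement` and the catalog, schema, table, and group names are hypothetical. The example uses USE CATALOG and USE SCHEMA, which are the names current Unity Catalog SQL gives to the Usage privilege on catalogs and schemas.

```python
def grant_statement(privilege: str, securable: str, principal: str) -> str:
    """Build (but do not run) a GRANT statement.

    `securable` is the object type plus its name, e.g. "TABLE main.sales.orders".
    The principal is wrapped in backticks so names containing '-' or '@' stay valid.
    """
    return f"GRANT {privilege} ON {securable} TO `{principal}`"


# Hypothetical names: catalog "main", schema "sales", group "data-analysts".
statements = [
    grant_statement("USE CATALOG", "CATALOG main", "data-analysts"),
    grant_statement("USE SCHEMA", "SCHEMA main.sales", "data-analysts"),
    grant_statement("SELECT", "TABLE main.sales.orders", "data-analysts"),
]

for statement in statements:
    print(statement)
```

All three grants are listed because reading a table generally requires access to its parent catalog and schema as well as SELECT on the table itself.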
Unity Catalog uses a different security model than the Hive metastore for granting privileges.
- Once UC is enabled, the Hive metastore is still accessible.
- UC does not replace the Hive metastore. It is additive: it provides security and governance alongside the Hive metastore rather than removing it.
- Regardless of the UC metastore assigned to the workspace, the catalog named hive_metastore provides access to the Hive metastore local to that workspace.
UC features:
- Centralized governance
- Automated lineage
- Built-in data search and discovery
- No hard migration required
Account console: https://accounts.cloud.databricks.com
- Account admins can create UC metastores and assign them to workspaces.
- Within a UC metastore, account admins can create catalogs, external locations, storage credentials, shares, and recipients.