Unity Catalog

  • Unity Catalog (UC) is a governance solution for the Databricks platform.

What is Unity Catalog?

  • Centralized governance solution across all your workspaces on any cloud.

  • Unified governance for all data and AI assets

    • files, tables, ML models, and dashboards
    • access control based on standard SQL (GRANT/REVOKE)

Architecture

Before Unity Catalog:

  • Workspace 1

    • user/group management
    • Hive Metastore
    • Access controls
    • Compute resources
  • Workspace 2

    • user/group management
    • Hive Metastore
    • Access controls
    • Compute resources

After Unity Catalog:

  • Unity Catalog

    • User/group management
    • UC metastores
    • Access controls
  • Workspace 1

    • Compute resources
  • Workspace 2

    • Compute resources

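Workspaces are connected to Unity Catalog, which manages identities, metastores, and access controls centrally; each workspace keeps only its own compute resources.

Three-level namespace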

SELECT * FROM schema.table ==> SELECT * FROM catalog.schema.table
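To make the move from two-level to three-level names concrete, here is a minimal Python sketch that expands a partial table name to catalog.schema.table. The function name `qualify` and the catalog/schema names in the usage lines are hypothetical; the helper only illustrates how a current catalog and schema fill in the missing levels, and it does not handle backtick-quoted identifiers that contain dots.

```python
def qualify(name: str, current_catalog: str, current_schema: str) -> str:
    """Expand a one-, two-, or three-part table name to catalog.schema.table.

    Illustrative helper only: missing leading parts are filled in from the
    caller-supplied current catalog and schema.
    """
    parts = name.split(".")
    if not 1 <= len(parts) <= 3 or any(not p for p in parts):
        raise ValueError(f"invalid table name: {name!r}")
    defaults = [current_catalog, current_schema]
    # Prepend only as many defaults as there are missing levels.
    return ".".join(defaults[: 3 - len(parts)] + parts)


# Hypothetical names: catalog "main", schema "default".
two_level = qualify("sales.orders", "main", "default")
one_level = qualify("orders", "main", "default")
three_level = qualify("dev.sandbox.scratch", "main", "default")
```

A query written against `sales.orders` therefore keeps working once a current catalog is chosen, while fully qualified names are unaffected by that choice.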

Unity Catalog hierarchy:

  • UC metastore
    • Storage credentials
    • External locations
    • Share
    • Recipient
    • Catalog
      • Schema (database)
        • Table
        • View
        • Function
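The catalog branch of this hierarchy can be sketched as nested Python dataclasses. This is a toy model for illustration, not a Databricks API: the class names mirror the hierarchy above, and every catalog, schema, and table name below is made up.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class Schema:
    tables: List[str] = field(default_factory=list)
    views: List[str] = field(default_factory=list)
    functions: List[str] = field(default_factory=list)


@dataclass
class Catalog:
    schemas: Dict[str, Schema] = field(default_factory=dict)


@dataclass
class Metastore:
    catalogs: Dict[str, Catalog] = field(default_factory=dict)

    def tables(self) -> Iterator[str]:
        """Yield every table as a three-level name: catalog.schema.table."""
        for catalog_name, catalog in self.catalogs.items():
            for schema_name, schema in catalog.schemas.items():
                for table in schema.tables:
                    yield f"{catalog_name}.{schema_name}.{table}"


# Hypothetical contents of one metastore.
metastore = Metastore(catalogs={
    "main": Catalog(schemas={
        "sales": Schema(tables=["orders", "customers"], views=["daily_orders"]),
    }),
    "dev": Catalog(schemas={"sandbox": Schema(tables=["scratch"])}),
})
table_names = list(metastore.tables())
```

Walking the tree from the metastore down is what produces the three-level names: each table is addressed by the catalog and schema that contain it.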

Unity Catalog metastore != Hive metastore

  • The Hive metastore is the default metastore linked to each Databricks workspace.
  • The UC metastore offers improved security and advanced features.
  • A UC metastore can hold as many catalogs as desired.
  • UC authenticates to the underlying cloud storage through storage credentials.

Three types of identities (principals):

  • Users: identified by e-mail address. Ex. account administrator
  • Groups: collections of users and service principals. Ex. data engineers, data scientists, data analysts
    • Groups can be nested.
  • Service principals
    • Service principals authenticate applications and services that access data in UC.
    • A service principal is identified by its application ID.
    • Use service principals rather than user accounts for applications and services.

Identity Federation:

  • Identities are created once at the account level, in the account console, and can then be assigned to one or more workspaces.

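Privileges:

  • Privileges are granted to users, groups, and service principals on securable objects such as catalogs, schemas, and tables.
  • Create - Create a new object inside the parent (for example, a table in a schema)
  • Usage - Required to access anything inside a catalog or schema; grants no access to data by itself (current releases name it USE CATALOG and USE SCHEMA)
  • Select - Read the data in a table or view
  • Modify - Add, update, and delete data in a table
  • Read Files - Read data files directly from an external location
  • Write Files - Write data files directly to an external location
  • Execute - Invoke a function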
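The interplay between Usage and Select is easiest to see in a small example. The sketch below is a deliberately simplified toy model, not how Unity Catalog is implemented: it assumes grants are stored as (principal, privilege, securable) tuples and ignores group membership, ownership, and privilege inheritance. All principal and object names are hypothetical.

```python
from typing import Set, Tuple

# (principal, privilege, securable name); a toy representation of grants.
Grant = Tuple[str, str, str]


def can_select(principal: str, table: str, grants: Set[Grant]) -> bool:
    """Return True if the principal may read the table in this toy model.

    Reading requires Usage on the catalog, Usage on the schema, and
    Select on the table itself.
    """
    catalog, schema, _ = table.split(".")
    required = {
        (principal, "USAGE", catalog),
        (principal, "USAGE", f"{catalog}.{schema}"),
        (principal, "SELECT", table),
    }
    return required <= grants  # every required grant must be present


grants = {
    ("analysts", "USAGE", "main"),
    ("analysts", "USAGE", "main.sales"),
    ("analysts", "SELECT", "main.sales.orders"),
    ("interns", "SELECT", "main.sales.orders"),  # no Usage on the containers
}
analysts_ok = can_select("analysts", "main.sales.orders", grants)
interns_ok = can_select("interns", "main.sales.orders", grants)
```

In this model `analysts` can read the table while `interns` cannot, even though both hold Select on it, because Select alone does not let a principal reach into the enclosing catalog and schema.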

Security model (Unity Catalog)

GRANT <Privilege> ON <Securable object> TO <Principal>

  1. Privileges
  2. Securable objects
  3. Principals
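As an illustration of how the three parts combine, the following Python sketch assembles GRANT statements as strings. The helper `grant_sql` and all object and group names are hypothetical; the generated statements follow the GRANT pattern shown above, with the principal wrapped in backticks so that names containing hyphens or `@` stay valid. USE CATALOG and USE SCHEMA are the current names for what these notes call Usage.

```python
def grant_sql(privilege: str, securable_type: str, securable_name: str, principal: str) -> str:
    """Build a GRANT statement: GRANT <privilege> ON <type> <name> TO <principal>."""
    # Backtick-quote the principal; a literal backtick is escaped by doubling it.
    quoted = "`" + principal.replace("`", "``") + "`"
    return f"GRANT {privilege.upper()} ON {securable_type.upper()} {securable_name} TO {quoted}"


# Hypothetical group and objects: give a group read access to one table.
statements = [
    grant_sql("use catalog", "catalog", "main", "data-engineers"),
    grant_sql("use schema", "schema", "main.sales", "data-engineers"),
    grant_sql("select", "table", "main.sales.orders", "data-engineers"),
]
```

The three statements go together: the first two let the group reach into the catalog and schema, and the third grants the read itself. Each string could then be executed from a notebook or SQL editor by someone with sufficient rights on the objects.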

Unity Catalog uses a different security model than the Hive metastore for granting privileges.

  • Once UC is enabled, the Hive metastore is still accessible.
  • Enabling UC does not replace or remove the Hive metastore. UC is additive: it runs alongside the Hive metastore, and existing Hive tables stay where they are until they are migrated to a UC catalog.
  • Regardless of which UC metastore is assigned to the workspace, the catalog named hive_metastore provides access to the Hive metastore local to that workspace.

UC features:

  • Centralized governance
  • Automated lineage
  • Built-in data search and discovery
  • No hard migration required

Account console: https://accounts.cloud.databricks.com

  • Account admins can create UC metastores and assign them to workspaces.
  • Account admins can also create the objects that live in a UC metastore:
    • catalogs
    • external locations
    • storage credentials
    • shares
    • recipients