Unity Catalog

  • Unity Catalog is a governance solution for the Databricks platform.

What is Unity Catalog?

  • Centralized governance solution across all your workspaces on any cloud.

  • Unified governance for all data and AI assets

    • files, tables, ML models, and dashboards
    • access control based on standard SQL (GRANT / REVOKE)

Architecture

Before Unity Catalog (each workspace manages everything on its own):

  • Workspace 1

    • User/group management
    • Hive Metastore
    • Access controls
    • Compute resources
  • Workspace 2

    • User/group management
    • Hive Metastore
    • Access controls
    • Compute resources

After Unity Catalog:

  • Unity Catalog

    • User/group management
    • UC metastores
    • Access controls
  • Workspace 1

    • Compute resources
  • Workspace 2

    • Compute resources

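Each workspace is attached to a Unity Catalog metastore: users, groups, and access controls are defined once in the metastore and apply in every attached workspace, while compute resources stay in each workspace.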
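The split above (identity and access control in the metastore, compute in the workspaces) can be illustrated with a small Python model. This is a hypothetical sketch, not Databricks code: the classes `Metastore` and `Workspace`, the group `analysts`, and the table `sales.orders.daily` are invented for illustration. It shows that a grant made once in the shared metastore is visible from every attached workspace.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Metastore:
    """Hypothetical model: users/groups and grants live centrally here."""
    name: str
    groups: dict[str, set[str]] = field(default_factory=dict)  # group -> members
    grants: set[tuple[str, str, str]] = field(default_factory=set)  # (privilege, securable, principal)

    def grant(self, privilege: str, securable: str, principal: str) -> None:
        self.grants.add((privilege, securable, principal))


@dataclass
class Workspace:
    """Hypothetical model: a workspace owns only compute and points at a shared metastore."""
    name: str
    metastore: Metastore
    clusters: list[str] = field(default_factory=list)

    def can(self, user: str, privilege: str, securable: str) -> bool:
        ms = self.metastore
        # The user acts as themselves plus every group they belong to.
        principals = {user} | {g for g, members in ms.groups.items() if user in members}
        return any((privilege, securable, p) in ms.grants for p in principals)


uc = Metastore("my_metastore", groups={"analysts": {"alice"}})
ws1 = Workspace("workspace-1", uc, clusters=["etl-cluster"])
ws2 = Workspace("workspace-2", uc, clusters=["bi-cluster"])

# One grant in the metastore applies in both workspaces.
uc.grant("SELECT", "sales.orders.daily", "analysts")
print(ws1.can("alice", "SELECT", "sales.orders.daily"),
      ws2.can("alice", "SELECT", "sales.orders.daily"))  # True True
```

With per-workspace Hive metastores, the same grant would have to be repeated in each workspace; here both workspaces read the single `uc` object.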

Three-level namespace

Object names gain a catalog level:

SELECT * FROM schema.table  ==>  SELECT * FROM catalog.schema.table
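How a shorter name maps onto the three-level form can be sketched in Python. This is a simplified, hypothetical illustration: `qualify`, the catalog `main`, and the default schema `default` are assumptions, and it ignores backtick-quoted names that contain dots. The idea is that missing levels are filled from the session's current catalog and schema (in Databricks SQL these are set with `USE CATALOG` and `USE SCHEMA`).

```python
def qualify(name: str, current_catalog: str, current_schema: str = "default") -> str:
    """Expand a 1-, 2-, or 3-part object name to catalog.schema.object.

    Simplified sketch: splits on '.', so quoted identifiers containing dots
    are not handled.
    """
    parts = name.split(".")
    if len(parts) == 3:   # already catalog.schema.object
        return name
    if len(parts) == 2:   # schema.object -> prepend the current catalog
        return f"{current_catalog}.{name}"
    if len(parts) == 1:   # object -> prepend current catalog and schema
        return f"{current_catalog}.{current_schema}.{name}"
    raise ValueError(f"expected 1 to 3 name parts, got {len(parts)}: {name!r}")


print(qualify("sales.orders", "main"))      # main.sales.orders
print(qualify("dev.sales.orders", "main"))  # dev.sales.orders
```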

Unity Catalog hierarchy:

UC metastore => catalog => schema (database) => tables, views, and functions
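The hierarchy can be pictured as nested containers. The Python model below is a hypothetical sketch (the classes `UCMetastore`, `Catalog`, `Schema` and the names `dev`, `prod`, `sales`, `orders` are illustrative, not a Databricks API). It lists every object by its full `catalog.schema.object` name and shows why the catalog level is needed: `sales.orders` exists in both catalogs.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Schema:
    """A schema (database) holds tables, views, and functions."""
    name: str
    tables: list[str] = field(default_factory=list)
    views: list[str] = field(default_factory=list)
    functions: list[str] = field(default_factory=list)


@dataclass
class Catalog:
    """A catalog holds schemas."""
    name: str
    schemas: dict[str, Schema] = field(default_factory=dict)


@dataclass
class UCMetastore:
    """The metastore is the top-level container and holds catalogs."""
    name: str
    catalogs: dict[str, Catalog] = field(default_factory=dict)

    def full_names(self) -> list[str]:
        """Return every object as catalog.schema.object."""
        out = []
        for cat in self.catalogs.values():
            for sch in cat.schemas.values():
                for obj in sch.tables + sch.views + sch.functions:
                    out.append(f"{cat.name}.{sch.name}.{obj}")
        return out


ms = UCMetastore("my_metastore")
ms.catalogs["dev"] = Catalog("dev", {"sales": Schema("sales", tables=["orders"], views=["orders_v"])})
ms.catalogs["prod"] = Catalog("prod", {"sales": Schema("sales", tables=["orders"])})
print(ms.full_names())
# ['dev.sales.orders', 'dev.sales.orders_v', 'prod.sales.orders']
```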

Unity Catalog metastore != Hive metastore

  • The Hive metastore is the default metastore linked to each Databricks workspace, and it is local to that workspace.
  • The UC metastore offers improved security and advanced features, such as centralized access control, auditing, and data lineage.
  • A UC metastore can contain as many catalogs as needed.
  • UC authenticates to the underlying cloud storage directories through storage credentials.