
# Databricks for business leaders - Notes

# Unity Catalog

* It is a new governance solution for the Databricks platform

What is Unity Catalog?

* Centralized governance solution across all your workspaces on any cloud.

* Unified governance for all data and AI assets
  * files, tables, ML models and dashboards
  * access control based on standard SQL (`GRANT` / `REVOKE`)

### Architecture

Before Unity Catalog:

* Workspace 1
  * user/group management
  * Hive Metastore
  * Access controls
  * Compute resources

* Workspace 2
  * user/group management
  * Hive Metastore
  * Access controls
  * Compute resources

After Unity Catalog:

* Unity Catalog
  * User/group management
  * UC metastores
  * Access controls

* Workspace 1
  * Compute resources

* Workspace 2
  * Compute resources

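Workspaces are connected to Unity Catalog, which now holds identity management, metastores, and access controls centrally. Each workspace keeps only its own compute resources.

### Three-level namespace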

`SELECT * FROM schema.table` ==> `SELECT * FROM catalog.schema.table`
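
As a minimal sketch of the difference, the statements below use hypothetical names (`main`, `sales`, and `orders` are placeholders, not objects from these notes). A three-level name is unambiguous, while a two-level name is resolved against the session's current catalog.

```sql
-- Hypothetical names: catalog `main`, schema `sales`, table `orders`.

-- Fully qualified three-level name: catalog.schema.table
SELECT * FROM main.sales.orders;

-- Set a default catalog so that two-level names keep working
USE CATALOG main;
SELECT * FROM sales.orders;  -- resolved as main.sales.orders

-- Check which catalog and schema the session currently defaults to
SELECT current_catalog(), current_schema();
```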

Unity Catalog hierarchy:

* UC metastore
  * Storage credentials
  * External locations
  * Share
  * Recipient
  * Catalog
    * Schema (database)
      * Table
      * View
      * Function
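
The catalog branch of this hierarchy can be sketched in SQL as follows. Every name here (`demo_catalog`, `demo_schema`, `customers`, `customer_names`, `full_name`) is made up for illustration, and the sketch assumes the caller is allowed to create catalogs in the metastore and that the metastore has default storage configured.

```sql
-- Hypothetical names throughout; creating a catalog requires the matching privilege on the metastore.

-- Level 1: catalog
CREATE CATALOG IF NOT EXISTS demo_catalog;

-- Level 2: schema (database) inside the catalog
CREATE SCHEMA IF NOT EXISTS demo_catalog.demo_schema;

-- Level 3: table, view, and function inside the schema
CREATE TABLE IF NOT EXISTS demo_catalog.demo_schema.customers (
  id INT,
  first_name STRING,
  last_name STRING
);

CREATE OR REPLACE VIEW demo_catalog.demo_schema.customer_names AS
SELECT id, first_name, last_name
FROM demo_catalog.demo_schema.customers;

CREATE OR REPLACE FUNCTION demo_catalog.demo_schema.full_name(first_name STRING, last_name STRING)
RETURNS STRING
RETURN concat(first_name, ' ', last_name);
```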

Unity Catalog metastore != Hive metastore

* The Hive metastore is the default metastore linked to each Databricks workspace.
* The UC metastore offers improved security and advanced features.
* A UC metastore can have as many catalogs as desired.
* UC supports authentication to the underlying cloud storage directories through storage credentials.
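
To make the last point concrete, here is a rough sketch of how a storage credential is typically put to use through an external location. It assumes a storage credential named `demo_credential` already exists. That name, the location name, the bucket URL, and the `data-engineers` group are all hypothetical.

```sql
-- Assumption: a storage credential called demo_credential already exists in the metastore.
-- All names and the bucket URL below are placeholders.

-- Register a cloud storage path that is accessed through the storage credential
CREATE EXTERNAL LOCATION IF NOT EXISTS demo_location
  URL 's3://example-bucket/landing'
  WITH (STORAGE CREDENTIAL demo_credential);

-- Let a group read files under that path without handing out cloud keys
GRANT READ FILES ON EXTERNAL LOCATION demo_location TO `data-engineers`;

-- List the external locations registered in the metastore
SHOW EXTERNAL LOCATIONS;
```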

Three types of identities/principals:

* Users: identified by e-mail addresses. Ex. account administrator
* Groups: collections of users and service principals. Ex. data engineers, data scientists, data analysts
  * Nested groups are possible
* Service principals
  * Service principals are used to authenticate applications and services that access data in UC.
  * A service principal is identified by its application ID.
  * It is recommended to use service principals instead of user accounts for applications and services.

Identity Federation:

* Identities are created once at the account level (in the account console) and can then be assigned to one or more workspaces.

Privileges:

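* Privileges are granted to users, groups, and service principals on securable objects such as catalogs, schemas, and tables.
* Create: create new objects inside a container (for example, a table in a schema)
* Usage: reference a catalog or schema. It grants no access to data by itself, but it is a prerequisite for reaching the objects inside (`USE CATALOG` and `USE SCHEMA` in Unity Catalog).
* Select: read the data in a table or view
* Modify: add, update, and delete data in a table
* Read Files: read data files directly from cloud storage
* Write Files: write data files directly to cloud storage
* Execute: invoke a function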

### Security model (Unity Catalog)

`GRANT <Privilege> ON <Securable object> TO <Principal>`

1. Privileges
2. Securable objects
3. Principals
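
A minimal sketch of this pattern, assuming a table `main.sales.orders`, a group `data-analysts`, and a user `jane.doe@example.com` (all hypothetical). It also shows that reading a table requires access to the catalog and schema that contain it, not only to the table itself.

```sql
-- Hypothetical objects and principals, for illustration only.

-- USE CATALOG / USE SCHEMA are the Unity Catalog form of the usage privilege:
-- a group needs them on the containers before it can read a table inside
GRANT USE CATALOG ON CATALOG main TO `data-analysts`;
GRANT USE SCHEMA ON SCHEMA main.sales TO `data-analysts`;
GRANT SELECT ON TABLE main.sales.orders TO `data-analysts`;

-- A user is referenced by e-mail address
GRANT MODIFY ON TABLE main.sales.orders TO `jane.doe@example.com`;

-- Review the privileges on the table, then withdraw one of them
SHOW GRANTS ON TABLE main.sales.orders;
REVOKE MODIFY ON TABLE main.sales.orders FROM `jane.doe@example.com`;
```

Granting to groups rather than to individual users keeps the number of grants small as people join or leave teams.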

Unity Catalog uses a different security model than the Hive metastore for granting privileges.

* Once UC is enabled, the Hive metastore is still accessible.
* Enabling UC does not remove or replace the workspace's Hive metastore. UC adds a layer of security and governance, and the Hive metastore stays available next to it.
* Regardless of the UC metastore assigned to the workspace, the catalog named `hive_metastore` provides access to the Hive metastore local to that workspace.
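
In practice this means legacy and UC-governed tables can be queried side by side. The sketch below assumes a legacy table `default.legacy_orders` in the workspace's Hive metastore and a UC catalog `main` with a schema `sales`. Both are hypothetical names.

```sql
-- Hypothetical table and catalog names.

-- List the catalogs visible in this workspace; hive_metastore appears next to UC catalogs
SHOW CATALOGS;

-- Query a legacy table through the hive_metastore catalog
SELECT * FROM hive_metastore.default.legacy_orders;

-- Optionally copy it into a UC catalog at a convenient time
CREATE TABLE IF NOT EXISTS main.sales.orders AS
SELECT * FROM hive_metastore.default.legacy_orders;
```

A copy like the last statement is one simple way to move a table under UC governance gradually. It is shown as an illustration, not as the recommended migration path.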

UC features:

* Centralized governance
* Automated lineage
* Built-in data search and discovery
* No hard migration required

Account console: [https://accounts.cloud.databricks.com](https://accounts.cloud.databricks.com)

* An account admin can create UC metastores and assign them to workspaces.
* An account admin can create catalogs in a UC metastore.
* An account admin can create external locations in a UC metastore.
* An account admin can create storage credentials in a UC metastore.
* An account admin can create shares in a UC metastore.
* An account admin can create recipients in a UC metastore.
