Feature | Data Warehouse | Data Lakehouse |
---|---|---|
Data Types | Structured only | Structured, semi-structured, unstructured |
Cost | Expensive storage | Cost-effective (object storage) |
Performance | Optimized for SQL | Optimized for SQL + ML + Streaming |
Schema | Schema-on-write | Schema-on-read & schema enforcement |
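The Schema row can be made concrete with a small sketch. This is plain Python rather than any warehouse or lakehouse API; the `SCHEMA` dictionary, the record fields, and the helper names are all hypothetical and only illustrate the difference between validating at write time and applying a schema at read time.

```python
# Hypothetical schema: column name -> required Python type.
SCHEMA = {"order_id": int, "amount": float}


def conforms(record: dict) -> bool:
    """True when the record has exactly the schema's columns with matching types."""
    return record.keys() == SCHEMA.keys() and all(
        isinstance(record[col], typ) for col, typ in SCHEMA.items()
    )


def write_with_enforcement(table: list, record: dict) -> None:
    """Schema-on-write: a non-conforming record is rejected before it is stored."""
    if not conforms(record):
        raise ValueError(f"schema mismatch: {record}")
    table.append(record)


def read_with_schema(raw_records: list) -> list:
    """Schema-on-read: records were stored as-is; the schema is applied when querying."""
    return [r for r in raw_records if conforms(r)]


good = {"order_id": 1, "amount": 9.5}
bad = {"order_id": "one", "amount": 9.5}  # wrong type for order_id

# Write path with enforcement: the bad record never lands in the table.
enforced_table = []
write_with_enforcement(enforced_table, good)
try:
    write_with_enforcement(enforced_table, bad)
    rejected = False
except ValueError:
    rejected = True

# Read path: both records were stored raw, and the schema filters at query time.
raw_files = [good, bad]
readable = read_with_schema(raw_files)
```

A lakehouse table can combine both behaviors: raw files stay cheap to land, while managed tables still reject writes that do not match the declared schema.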
`RESTORE TABLE`
`CHECK` constraints

Table Type | Use Case | Source | Example |
---|---|---|---|
Bronze | Raw ingestion | Kafka, Files | sales_raw |
Silver | Cleaning & Enrichment | Bronze | sales_cleaned |
Gold | Reporting & Aggregation | Silver | sales_monthly_metrics |
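The flow in the table above can be sketched without Spark. The following is a minimal illustration in plain Python, assuming hypothetical record fields (`order_id`, `amount`, `order_date`) and hypothetical helper names (`to_silver`, `to_gold`); only the table names come from the Example column. In a real pipeline each step would read from and write to a table rather than an in-memory list.

```python
from collections import defaultdict

# Bronze: raw events exactly as ingested, including a duplicate and a bad row.
sales_raw = [
    {"order_id": "1", "amount": "10.50", "order_date": "2024-01-15"},
    {"order_id": "1", "amount": "10.50", "order_date": "2024-01-15"},  # duplicate
    {"order_id": "2", "amount": "n/a", "order_date": "2024-01-20"},    # unparseable amount
    {"order_id": "3", "amount": "4.25", "order_date": "2024-02-02"},
]


def to_silver(bronze):
    """Clean Bronze rows: drop duplicates and non-numeric amounts, cast types."""
    seen, cleaned = set(), []
    for row in bronze:
        if row["order_id"] in seen:
            continue
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        seen.add(row["order_id"])
        cleaned.append(
            {
                "order_id": int(row["order_id"]),
                "amount": amount,
                "order_date": row["order_date"],
            }
        )
    return cleaned


def to_gold(silver):
    """Aggregate Silver rows into total sales per month (YYYY-MM)."""
    totals = defaultdict(float)
    for row in silver:
        totals[row["order_date"][:7]] += row["amount"]
    return dict(totals)


sales_cleaned = to_silver(sales_raw)                # Silver
sales_monthly_metrics = to_gold(sales_cleaned)      # Gold
print(sales_monthly_metrics)
```

Each layer reads only from the layer before it, which matches the Source column: Silver is derived from Bronze, and Gold from Silver.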

Feature | All-Purpose Cluster | Jobs Cluster |
---|---|---|
Use Case | Interactive (Notebooks) | Scheduled/Automated Jobs |
Cost | More expensive (always-on) | Cheaper (terminates after job) |
Access | Multiple users | Single job execution |
Management | Manual start/stop | Auto-terminates |
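One way to see the difference in the table is in how each cluster is defined. The sketch below builds two request bodies as Python dictionaries: one for creating an all-purpose cluster (`POST /api/2.0/clusters/create`) and one for a job whose task carries its own `new_cluster` (`POST /api/2.1/jobs/create`). The cluster name, job name, notebook path, node type, and worker counts are hypothetical placeholders; nothing is sent over the network.

```python
import json

# All-purpose cluster: created once and shared by notebook users.
all_purpose_cluster = {
    "cluster_name": "shared-notebooks",    # hypothetical name
    "spark_version": "10.4.x-scala2.12",   # runtime key for DBR 10.4 LTS
    "node_type_id": "i3.xlarge",           # assumption: an AWS workspace
    "num_workers": 2,
    "autotermination_minutes": 60,         # optional idle timeout
}

# Jobs cluster: declared inside the job, created for the run, gone afterwards.
job_with_jobs_cluster = {
    "name": "nightly-sales-refresh",       # hypothetical job name
    "tasks": [
        {
            "task_key": "refresh",
            "notebook_task": {"notebook_path": "/Repos/team/project/refresh"},  # hypothetical path
            "new_cluster": {
                "spark_version": "10.4.x-scala2.12",
                "node_type_id": "i3.xlarge",
                "num_workers": 2,
            },
        }
    ],
}

print(json.dumps(job_with_jobs_cluster, indent=2))
```

Because the jobs cluster exists only as part of the task definition, there is nothing to start or stop by hand, which is what the Management row describes.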
`DBR 10.4 LTS`, `DBR 11.3` (latest).
Compute → Filter by “Can Attach To”.
`GET /api/2.0/clusters/list` → Check `can_manage` flag.
`POST /api/2.0/clusters/delete` (terminates the cluster).
`%python`, `%sql`, `%scala`, `%r`.
`%run` Command:
`dbutils.notebook.run()` (for jobs):
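A short sketch of how the two commands differ in use. `%run` inlines another notebook into the current session, so its variables and functions become available to the caller. `dbutils.notebook.run(path, timeout_seconds, arguments)` starts the target as a separate run and returns the string that notebook passes to `dbutils.notebook.exit()`. The notebook path and the `run_date` argument below are hypothetical, and the small stand-in class exists only so the snippet also runs outside Databricks, where `dbutils` is not defined.

```python
# In a notebook, %run must sit alone in its own cell, for example:
#   %run ./child_notebook

try:
    dbutils  # predefined by the Databricks notebook runtime
except NameError:
    # Local stand-in (not the real dbutils) so this sketch runs anywhere.
    class _FakeNotebook:
        def run(self, path, timeout_seconds, arguments=None):
            return f"ran {path} with {arguments or {}}"

    class _FakeDbutils:
        notebook = _FakeNotebook()

    dbutils = _FakeDbutils()

# Run the child notebook as a separate run: 600-second timeout, one parameter.
result = dbutils.notebook.run("./child_notebook", 600, {"run_date": "2024-01-01"})
print(result)
```

Because `dbutils.notebook.run()` returns a value and accepts a timeout and parameters, it suits orchestrating notebooks from a job, while `%run` suits sharing helper code between notebooks.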
`.dbc` or `.ipynb` format.
`Can View`, `Can Edit`, `Can Run`.
`git checkout feature-branch`.
`main`.
`git clone`, `pull`, `push`, `commit`, `branch`.
`git rebase`, `submodules`.