Data & AI Summit Recap - Governance & Unity Catalog OSS - Part 1
06-25-2024 02:17 AM - edited 06-25-2024 02:22 AM
Author: @youssefmrini
Data & AI Summit 2024 is a wrap! With 16k in-person attendees, 40k virtual attendees and vibrant parties, it was the most exciting edition so far. Did you miss the keynotes & all the announcements? Don’t worry, we’re giving you a full recap.
Here are the main takeaways:
- Open Data/Table format war is over - Delta & Iceberg will work together toward unification with the Tabular.io acquisition
- Open Governance is at hand with Unity Catalog being Open Sourced
- Data is becoming more accessible than ever, with business users able to directly get insights or Analysts to build powerful pipelines, all assisted by AI.
- Compound AI is the future - the single large/huge language model is a niche!
Announcement overview that we’ll detail in these next posts:
Unity Catalog is now Open Source
Under the hood, open Lakehouses are built on open formats (Delta + Iceberg with UniForm; more detail on that in our next post). This simplifies your data layer and removes the need for external systems (such as a Hive metastore) to understand your data layout, but open formats alone don’t cover modern requirements around security, ACLs, discoverability and interoperability.
To solve these challenges, Databricks provides Unity Catalog, a unified Data+AI governance layer. However, with each vendor implementing its own catalog, the ecosystem is fragmented and not open, making it harder to build and deploy interoperable systems.
To solve this challenge and accelerate the entire ecosystem, Databricks released unitycatalog.io, an OSS catalog implementation for Data + AI. In a nutshell, Unity Catalog (Apache 2.0 license, Linux Foundation project) provides:
- Multi Engine: you can write your table with Spark and it will appear when you LIST the catalog in another engine such as DuckDB
- Data+AI: UC provides a single namespace for organizing and sharing tables, but also unstructured data and AI assets. This means that you can LIST your tables, but also files or models, from any system supporting UC.
- Iceberg REST catalog support: the first release is already compatible with the Iceberg REST catalog API to access your tables
- Credential vending: UC gates clients' access to the underlying cloud storage for tables and volumes
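To make the "single namespace, many engines" idea concrete, here is a minimal toy sketch in Python. It is not the Unity Catalog API: the `Catalog` class, its methods and the asset names are all hypothetical, and a real deployment would talk to a catalog server rather than an in-memory dict. It only illustrates that whatever one engine registers under `catalog.schema.name` is what another engine sees when it lists the same schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Toy in-memory catalog (hypothetical): one namespace shared by every engine."""

    # Maps "catalog.schema.name" -> asset kind ("table", "volume" or "model").
    assets: dict[str, str] = field(default_factory=dict)

    def register(self, full_name: str, kind: str) -> None:
        """Record an asset under its three-level name."""
        if full_name.count(".") != 2:
            raise ValueError("expected a catalog.schema.name identifier")
        self.assets[full_name] = kind

    def list_assets(self, schema_prefix: str, kind: str | None = None) -> list[str]:
        """Return sorted asset names under 'catalog.schema', optionally filtered by kind."""
        prefix = schema_prefix + "."
        return sorted(
            name
            for name, asset_kind in self.assets.items()
            if name.startswith(prefix) and (kind is None or asset_kind == kind)
        )


catalog = Catalog()

# "Engine A" (say, Spark) registers a table, a model and a volume of files.
catalog.register("main.sales.orders", "table")
catalog.register("main.sales.churn_model", "model")
catalog.register("main.sales.raw_files", "volume")

# "Engine B" (say, DuckDB) lists the same schema and sees the same assets.
tables_seen_by_other_engine = catalog.list_assets("main.sales", kind="table")
all_assets = catalog.list_assets("main.sales")
```

The point of the sketch is the design choice rather than the code: tables, files and models live in one namespace, so listing and access control can be handled in one place.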
More details: https://www.databricks.com/blog/open-sourcing-unity-catalog, or watch Matei open source the project live (without waiting 90 days like others would do - in case you missed the troll/drama)
Join the community now: https://www.unitycatalog.io
Attribute-Based Access Control (ABAC) on Databricks UC!
One of my favorite features. Databricks now provides a very easy way to define policies based on tags. Just tag a column (e.g. pii) and the matching policy will automatically mask the tagged columns or filter the related rows. This is super easy to set up, and will be available soon!
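Since the feature is not released yet, the following is only a conceptual sketch of tag-based masking in plain Python, not Databricks syntax. The tag names, the `COLUMN_TAGS` and `MASK_POLICIES` mappings and the `apply_policies` helper are all made up for illustration. It shows the core idea: policies are attached to tags, not to individual columns, so tagging a new column is enough to protect it.

```python
from __future__ import annotations

from typing import Any, Callable

# Hypothetical column tags, e.g. assigned once by a data steward.
COLUMN_TAGS: dict[str, set[str]] = {
    "email": {"pii"},
    "country": set(),
    "revenue": set(),
}

# Hypothetical policy table: any column carrying the tag gets this mask applied.
MASK_POLICIES: dict[str, Callable[[Any], Any]] = {
    "pii": lambda value: "***",
}


def apply_policies(row: dict[str, Any], user_is_privileged: bool) -> dict[str, Any]:
    """Mask every column whose tags match a policy, unless the user is privileged."""
    if user_is_privileged:
        return dict(row)
    masked: dict[str, Any] = {}
    for column, value in row.items():
        for tag in COLUMN_TAGS.get(column, set()):
            if tag in MASK_POLICIES:
                value = MASK_POLICIES[tag](value)
        masked[column] = value
    return masked


row = {"email": "ana@example.com", "country": "FR", "revenue": 120}
masked_row = apply_policies(row, user_is_privileged=False)
full_row = apply_policies(row, user_is_privileged=True)
```

Here a regular user gets the `email` value replaced by `***`, while a privileged user sees the row unchanged.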
Metrics added to Databricks UC
UC will now support metrics. You can think of a metric as a function computing a business outcome: what is my revenue, what is my churn, what counts as the EMEA region, etc. These definitions differ for each business. Defining them once within your catalog makes it easier to standardize across your org, so every team uses the same definition.
Furthermore, because the Data Intelligence Platform analyzes and understands these metrics, the engine can better generate certified answers, including through AI/BI capabilities and Genie Spaces.
If a business user asks about Churn in EMEA, the engine will know what this means for your business and generate a proper SQL query to answer accordingly!
Metrics are like a user manual for your platform to understand your data!
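As a rough illustration of "a metric is a function defined once and reused everywhere", here is a small Python sketch. It is not the UC metrics syntax: the `METRICS` registry, the `metric` decorator, the `compute` helper and this particular churn definition are all assumptions made for the example.

```python
from __future__ import annotations

from typing import Callable, Optional

Row = dict  # one customer record, e.g. {"region": "EMEA", "churned": True}

# Hypothetical central registry: metric name -> its single shared definition.
METRICS: dict[str, Callable[[list[Row]], float]] = {}


def metric(name: str):
    """Decorator that registers a function as the definition of a named metric."""
    def decorator(fn: Callable[[list[Row]], float]):
        METRICS[name] = fn
        return fn
    return decorator


@metric("churn_rate")
def churn_rate(customers: list[Row]) -> float:
    """Share of customers flagged as churned (one possible business definition)."""
    if not customers:
        return 0.0
    return sum(1 for c in customers if c["churned"]) / len(customers)


def compute(name: str, rows: list[Row], region: Optional[str] = None) -> float:
    """Evaluate a registered metric, optionally restricted to one region."""
    if region is not None:
        rows = [r for r in rows if r["region"] == region]
    return METRICS[name](rows)


customers = [
    {"region": "EMEA", "churned": True},
    {"region": "EMEA", "churned": False},
    {"region": "EMEA", "churned": False},
    {"region": "EMEA", "churned": False},
    {"region": "AMER", "churned": True},
]
emea_churn = compute("churn_rate", customers, region="EMEA")  # 1 of 4 EMEA customers
```

Anyone asking for "churn in EMEA" goes through the same registered definition, which is the standardization benefit described above.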
Databricks Clean Rooms will soon be in public preview
Clean rooms make it easy for companies to collaborate on data while not directly sharing the underlying data. This provides a safe, cross-cloud, cross-data platform environment to collaborate on any data while enforcing privacy.
As with all Databricks capabilities, Clean Rooms let you share data, notebooks, code, and AI models!
Delta Sharing for query federation
You’ll be able to share your data with any other system leveraging the Delta Sharing OSS protocol. This will make it easier to build and interact with open systems supporting the Delta Sharing protocol!
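For readers who have not used Delta Sharing yet, here is a minimal sketch of the recipient side with the open source `delta-sharing` Python connector. The profile file name, share, schema and table names are placeholders: in practice the data provider gives you the profile file. The sketch covers plain table sharing only, not the federation feature announced here.

```python
def table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build the '<profile>#<share>.<schema>.<table>' address used by the connector."""
    return f"{profile_path}#{share}.{schema}.{table}"


def load_shared_table(profile_path: str, share: str, schema: str, table: str):
    """Load a shared table into a pandas DataFrame (needs `pip install delta-sharing`)."""
    import delta_sharing  # imported lazily so table_url works without the package

    return delta_sharing.load_as_pandas(table_url(profile_path, share, schema, table))


# Placeholder names: replace them with the values from your provider.
url = table_url("config.share", "sales_share", "emea", "orders")
```

Calling `load_shared_table("config.share", "sales_share", "emea", "orders")` would then fetch the data, assuming the profile file is valid and the share exists.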
That’s it for this first update on Governance & Unity Catalog.