Data & AI Summit 2024 is a wrap! With 16k in-person attendees, 40k virtual attendees, and vibrant parties, it was the most exciting edition so far. Did you miss the keynotes and all the announcements? Don’t worry, we’re giving you a full recap.
Here are the main takeaways:
Open Data/Table format war is over - Delta & Iceberg will work together toward unification with the Tabular.io acquisition
Open Governance is at hand with Unity Catalog being Open Sourced
Data is becoming more accessible than ever: business users can get insights directly and analysts can build powerful pipelines, all assisted by AI.
Compound AI is the future - the single large/huge language model is a niche!
Announcement overview that we’ll detail in these next posts:
Unity Catalog is now Open Source
Under the hood, open Lakehouses are based on Open Formats (Delta + Iceberg with UniForm; more detail on that in our next post). While this simplifies your data layer and doesn’t require external systems (such as a Hive metastore) to understand your data layout, Open Formats alone don’t meet modern requirements around security, ACLs, discoverability, and interoperability.
To solve these challenges, Databricks provides Unity Catalog, a unified Data+AI governance layer. However, with each vendor implementing its own catalog, the ecosystem is fragmented and not open, making it harder to build and deploy interoperable systems.
To solve this challenge and accelerate the entire ecosystem, Databricks released unitycatalog.io, an OSS catalog implementation for Data + AI. In a nutshell, Unity Catalog (Apache 2.0 license, Linux Foundation project) provides:
Multi-engine: you can write your table with Spark and it will appear when you LIST the catalog in another engine such as DuckDB
Data+AI: UC provides a single namespace for organizing and sharing tables, but also unstructured data and AI assets. This means that you can LIST your tables, but also files or models, from any system supporting UC.
Iceberg REST catalog support: the first release is already compatible with the Iceberg REST catalog, so Iceberg clients can access your tables
Credential vending: UC gates clients' access to the underlying cloud storage for tables and volumes
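To make the credential vending idea more concrete, here is a minimal sketch in plain Python. It is not the Unity Catalog API: `ToyCatalog`, `TemporaryCredential`, the table name, and the storage path are all hypothetical names made up for illustration. The sketch only shows the general pattern, where the catalog checks a grant and then hands out a short-lived credential scoped to one storage location instead of giving clients permanent access to cloud storage.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
import secrets


@dataclass(frozen=True)
class TemporaryCredential:
    """Short-lived, scoped credential (hypothetical shape, for illustration)."""
    token: str
    storage_path: str
    operation: str          # e.g. "READ" or "READ_WRITE"
    expires_at: datetime


class ToyCatalog:
    """In-memory stand-in for a catalog; NOT the Unity Catalog API."""

    def __init__(self):
        self._locations = {}   # full table name -> storage path
        self._grants = {}      # (principal, full table name) -> allowed operations

    def register_table(self, full_name, storage_path):
        self._locations[full_name] = storage_path

    def grant(self, principal, full_name, operation):
        self._grants.setdefault((principal, full_name), set()).add(operation)

    def vend_credential(self, principal, full_name, operation, ttl_minutes=15):
        # The catalog, not the client, decides whether storage access is allowed.
        allowed = self._grants.get((principal, full_name), set())
        if operation not in allowed:
            raise PermissionError(f"{principal} may not {operation} {full_name}")
        # The credential is scoped to one path and expires after ttl_minutes.
        return TemporaryCredential(
            token=secrets.token_hex(16),
            storage_path=self._locations[full_name],
            operation=operation,
            expires_at=datetime.now(timezone.utc) + timedelta(minutes=ttl_minutes),
        )


if __name__ == "__main__":
    catalog = ToyCatalog()
    catalog.register_table("main.sales.orders", "s3://example-bucket/orders")
    catalog.grant("alice", "main.sales.orders", "READ")
    cred = catalog.vend_credential("alice", "main.sales.orders", "READ")
    print(cred.storage_path, cred.operation, cred.expires_at)
```

Any engine that asks the catalog for access goes through the same check, which is what lets several engines share one set of permissions.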
Attribute-Based Access Control (ABAC) on Databricks UC!
One of my favorite features. Databricks now provides a very easy way to define policies based on tags. Just add a tag to any column (e.g., pii) and the matching columns/rows will automatically be masked/filtered. This is super easy to set up, and will be available soon!
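As a rough mental model of tag-based policies, here is a small sketch in plain Python. It does not use Databricks syntax or any Databricks API: the function `apply_tag_policies`, the `pii_readers` group, and the sample columns are assumptions made up for this example, and it only covers column masking (not row filtering). The point is that the policy is attached to the tag, so tagging a new column is enough for it to be masked.

```python
MASK = "****"


def apply_tag_policies(rows, column_tags, masked_tags, user_groups,
                       exempt_group="pii_readers"):
    """Return a copy of rows with every column carrying a masked tag hidden.

    rows         -- list of dicts (column name -> value)
    column_tags  -- dict: column name -> set of tags on that column
    masked_tags  -- set of tags whose columns must be masked
    user_groups  -- set of groups the current user belongs to
    exempt_group -- hypothetical group allowed to see unmasked values
    """
    if exempt_group in user_groups:
        return [dict(row) for row in rows]
    # The policy targets tags, not column names.
    masked_columns = {
        column for column, tags in column_tags.items() if tags & masked_tags
    }
    return [
        {col: (MASK if col in masked_columns else val) for col, val in row.items()}
        for row in rows
    ]


if __name__ == "__main__":
    rows = [{"email": "ada@example.com", "country": "FR", "amount": 10}]
    column_tags = {"email": {"pii"}, "country": set(), "amount": set()}
    print(apply_tag_policies(rows, column_tags, {"pii"}, {"analysts"}))
    print(apply_tag_policies(rows, column_tags, {"pii"}, {"pii_readers"}))
```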
Metrics added to Databricks UC
UC will now support metrics. You can think of a metric as a function that computes some outcome, such as: what is my revenue, what is my churn, what counts as the EMEA region, and so on. These definitions differ for each business. When you define them within your catalog, it becomes easier to standardize across your org and make sure everyone uses the same definition.
Furthermore, because the Data Intelligence Platform analyzes and understands these metrics, the engine can better generate certified answers, including using BI/AI capabilities and Genie Spaces. If a business user asks about Churn in EMEA, the engine will know what this means for your business and generate a proper SQL query to answer accordingly!
Metrics are like a user manual for your platform to understand your data!
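To illustrate why a shared definition helps, here is a minimal sketch in plain Python. It is not the Databricks metrics syntax, which this post does not show: the `Metric` class, the `metric_query` helper, and the table and column names are hypothetical. The idea is simply that the definition lives in one registry, and every query for "revenue" is generated from it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """One shared definition of a business metric (hypothetical shape)."""
    name: str
    expression: str     # SQL aggregate expression, e.g. "SUM(amount)"
    source_table: str   # table the metric is computed from


def metric_query(registry, name, where=None):
    """Build a SQL string for a registered metric, with an optional filter."""
    metric = registry[name]   # KeyError if the metric was never defined
    sql = f"SELECT {metric.expression} AS {metric.name} FROM {metric.source_table}"
    if where:
        sql += f" WHERE {where}"
    return sql


if __name__ == "__main__":
    registry = {
        "revenue": Metric("revenue", "SUM(amount)", "main.sales.orders"),
    }
    # Every team asking for EMEA revenue gets the same generated query.
    print(metric_query(registry, "revenue", where="region = 'EMEA'"))
```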
Databricks Clean Rooms will soon be in public preview
Clean rooms make it easy for companies to collaborate on data while not directly sharing the underlying data. This provides a safe, cross-cloud, cross-data platform environment to collaborate on any data while enforcing privacy.
As with all Databricks capabilities, Clean Rooms let you share data, notebooks, code, and AI models!
Delta Sharing for query federation
You’ll be able to share your data with any other system leveraging the Delta Sharing OSS protocol. This will make it easier to build and interact with open systems supporting the Delta Sharing protocol!
That’s it for this first update on Governance & Unity Catalog.