Announced at the Data + AI Summit in June 2023, Lakehouse Federation in Databricks is a new capability that lets you query data across external data sources, including Snowflake, Synapse, many others and even Databricks itself, without having to move or copy the data. It does this through Databricks’ Unity Catalog, which provides a unified metadata layer for all of your data.
Lakehouse Federation is a game-changer for data teams, as it breaks down the silos that have traditionally kept data locked away in different systems. With Lakehouse Federation, you can finally access all of your data in one place, making it easier to get the insights you need to make better business decisions.
As always, though, no single solution is a silver bullet for your data integration and querying needs. See below for when Federation is a good fit, and for when you’d be better off bringing the data into your lakehouse and processing it as part of your platform pipelines.
A few of the benefits of using Lakehouse Federation in Databricks are:
If you are looking for a way to improve the way you access and manage your data across your analytics estate, then Lakehouse Federation in Databricks is a top choice.
Whilst Lakehouse Federation is a powerful tool, it is not a good fit for every use case. Some specific examples of where Lakehouse Federation is not a good choice:
Therefore, whilst Lakehouse Federation is a great option for certain use cases as highlighted above, it’s not a silver bullet for all scenarios. Consider it an augmentation of your analytics capability that allows for additional use cases that need agility and direct source access for creating a holistic view of your data estate, all controlled through one governance layer.
With that in mind, let’s get started on setting up your first federated Lakehouse in Databricks using Lakehouse Federation.
For this example, we will be using a familiar sample database - Adventure Works - running on an Azure SQL Database. We will be walking you through how to set up your connection to Azure SQL and how to add it as a foreign catalog inside Databricks.
To set up Lakehouse Federation in Databricks, you will need the following prerequisites:
Setting up federation is essentially a three-step process, as follows:
We are going to use Azure SQL Database as the test data source, with the AdventureWorksLT sample database already installed and ready to query:
Example query on the source database
We want to add this database as a foreign catalog in Databricks so that we can query it alongside other data sources. To connect to the database, we need a username, password and hostname, all obtained from our Azure SQL instance.
With these details ready, we can now go into Databricks and add the connection there as our first step.
First, expand the Catalog view, go to Connections and click “Create Connection”:
To add your new connection, give it a name, choose your connection type and then add the relevant login details for that data source:
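The same connection can also be created in SQL rather than through the UI. The snippet below is a minimal sketch that builds a `CREATE CONNECTION` statement in Python; the connection name, host, user and secret scope/key are hypothetical placeholders, and it assumes the password has been stored in a Databricks secret scope rather than typed in directly.

```python
def create_connection_sql(name, host, port, user, secret_scope, secret_key):
    """Return a CREATE CONNECTION statement for a SQL Server-compatible source."""
    return (
        f"CREATE CONNECTION IF NOT EXISTS {name} TYPE sqlserver\n"
        "OPTIONS (\n"
        f"  host '{host}',\n"
        f"  port '{port}',\n"
        f"  user '{user}',\n"
        f"  password secret('{secret_scope}', '{secret_key}')\n"
        ")"
    )


# All values below are hypothetical placeholders, not details from this article.
stmt = create_connection_sql(
    name="adventureworks_conn",
    host="myserver.database.windows.net",
    port=1433,  # default SQL Server port
    user="federation_reader",
    secret_scope="federation",
    secret_key="azure-sql-password",
)
print(stmt)
# In a Databricks notebook, where `spark` is predefined, run it with: spark.sql(stmt)
```

Keeping the password in a secret scope means it never appears in notebook source or query history, which is why the sketch uses `secret(...)` instead of a literal value.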
Test your connection and verify all is well. From there, go back to the Catalog view and go to Create Catalog:
From there, populate the relevant details (choosing Type as “Foreign”), including choosing the connection you created in the first step, and specifying the database you want to add as an external catalog:
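For those who prefer SQL over the UI, this step corresponds to a `CREATE FOREIGN CATALOG` statement. The sketch below builds one in Python; the catalog name `adventureworks` and connection name `adventureworks_conn` are assumptions made for illustration, while `AdventureWorksLT` is the sample database used in this walkthrough.

```python
def create_foreign_catalog_sql(catalog, connection, database):
    """Return a CREATE FOREIGN CATALOG statement that mirrors one source database."""
    return (
        f"CREATE FOREIGN CATALOG IF NOT EXISTS {catalog}\n"
        f"USING CONNECTION {connection}\n"
        f"OPTIONS (database '{database}')"
    )


# Catalog and connection names are hypothetical; the database is the sample one.
catalog_stmt = create_foreign_catalog_sql(
    catalog="adventureworks",
    connection="adventureworks_conn",
    database="AdventureWorksLT",
)
print(catalog_stmt)
# In a Databricks notebook, where `spark` is predefined: spark.sql(catalog_stmt)
```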
Once added, you have the option of granting the relevant user permissions on the objects here, all governed by Unity Catalog (we have skipped this step in this article, as no other users are using this database):
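If you do need to share the catalog, the grants could look something like the sketch below. It assumes a hypothetical catalog called `adventureworks` and a hypothetical group called `data-analysts`, and gives that group read-only access by granting the three privileges at catalog level so that they are inherited by the schemas and tables underneath.

```python
def grant_read_access_sql(catalog, principal):
    """Return the GRANT statements for read-only access to a whole catalog."""
    privileges = ["USE CATALOG", "USE SCHEMA", "SELECT"]
    # Backticks around the principal allow names containing hyphens or spaces.
    return [f"GRANT {p} ON CATALOG {catalog} TO `{principal}`" for p in privileges]


# Catalog and group names are hypothetical placeholders.
grants = grant_read_access_sql("adventureworks", "data-analysts")
for statement in grants:
    print(statement)
    # In a Databricks notebook, where `spark` is predefined: spark.sql(statement)
```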
Our external catalog is now available for querying as you would any other catalog inside Databricks, bringing our broader data estate into our lakehouse:
We can now access our federated Azure SQL Database as normal, straight from our Databricks SQL Warehouse:
And query it as we would any other object:
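Federated tables are addressed with the usual three-level `catalog.schema.table` name. As a rough sketch, assuming the foreign catalog was named `adventureworks` (a hypothetical name), a query against the `SalesLT.Customer` table from the AdventureWorksLT sample could be put together like this:

```python
def qualified_name(catalog, schema, table):
    """Build a backtick-quoted three-level Unity Catalog name."""
    parts = (catalog, schema, table)
    # A backtick inside an identifier is escaped by doubling it.
    return ".".join(f"`{part.replace('`', '``')}`" for part in parts)


# `adventureworks` is a hypothetical catalog name; SalesLT.Customer is part
# of the AdventureWorksLT sample database.
customer_table = qualified_name("adventureworks", "SalesLT", "Customer")
query = (
    "SELECT CustomerID, FirstName, LastName, CompanyName\n"
    f"FROM {customer_table}\n"
    "LIMIT 10"
)
print(query)
# In a Databricks notebook, where `spark` is predefined: display(spark.sql(query))
```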
Or even join it to a local delta table inside our Unity Catalog:
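To make the join concrete, here is a small sketch of what such a query might look like. The local Delta table `main.crm.customer_segments` and its `customer_id` and `segment` columns are invented purely for illustration, as is the `adventureworks` catalog name; only `SalesLT.Customer` comes from the sample database.

```python
# Hypothetical names: swap in your own foreign catalog and local Delta table.
FEDERATED_TABLE = "adventureworks.SalesLT.Customer"
LOCAL_TABLE = "main.crm.customer_segments"

join_query = f"""
SELECT c.CustomerID, c.CompanyName, s.segment
FROM {FEDERATED_TABLE} AS c
JOIN {LOCAL_TABLE} AS s
  ON c.CustomerID = s.customer_id
""".strip()

print(join_query)
# In a Databricks notebook, where `spark` is predefined: display(spark.sql(join_query))
```

Both sides of the join are plain three-level names, so nothing in the query itself reveals that one table lives in Azure SQL and the other in the lakehouse.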
What we’ve shown here is just scratching the surface of what Lakehouse Federation can do with a simple connection and query. By leveraging this offering, combined with the governance and capabilities of Unity Catalog, you can extend the range of your lakehouse estate, ensuring consistent permissions and controls across all of your data sources and thus enabling a plethora of new use cases and opportunities.
Introducing Lakehouse Federation