Reading data in Azure Databricks Delta Lake from AWS Redshift

playermanny2
New Contributor II

We have Databricks set up and running on Azure. Now we want to connect it to Redshift (AWS) to perform further downstream analysis for our Redshift users.

I could find documentation on how to do this within the same cloud (either AWS or Azure), but not cross-cloud.

So I was wondering what the best approach would be to allow Redshift to read the Delta Lake tables hosted in Azure. I was hoping some sort of Glue catalog could be set up that would allow reading from Redshift as an external table.

Highly appreciate the help.

2 REPLIES

Anonymous
Not applicable

@Manny Cato:

To allow Redshift to read data from Delta Lake hosted on Azure, you can use the AWS Glue Data Catalog as an intermediary. The Glue Data Catalog is a fully managed metadata repository that integrates with a variety of data sources, including Delta Lake and Redshift (via Redshift Spectrum), enabling cross-cloud data integration.

Here are the high-level steps you can follow to set up this integration:

  1. Create a database in the AWS Glue Data Catalog in your AWS account. This will serve as the metadata repository for your data.
  2. Set up a Glue Crawler to discover the schema and metadata for your Delta Lake table(s) hosted on Azure.
  3. Configure a Glue ETL job to extract the data from your Delta Lake table(s) and load it into a Redshift cluster (a rough sketch of such a job follows this list).
  4. Define an external schema in Redshift that points to the Glue Data Catalog (see the example at the end of this reply).
  5. Create external tables in Redshift that reference the data in the Glue Data Catalog.
  6. Query the data in Redshift as needed.
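
To make step 3 a bit more concrete, here is a minimal PySpark sketch of such a load job, as it might be run from Databricks or a Glue Spark job. The ADLS path, Redshift endpoint, table name, and credentials are all placeholders, and it assumes the Redshift JDBC driver is available on the cluster and that the cross-cloud network connectivity mentioned below is already in place.

```python
# Minimal sketch (step 3): read a Delta table from ADLS Gen2 and load it into
# Redshift over JDBC. All names, paths, and credentials are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # on Databricks, `spark` already exists

# Hypothetical ADLS Gen2 path to the Delta table
delta_path = "abfss://lake@mystorageaccount.dfs.core.windows.net/silver/orders"

df = spark.read.format("delta").load(delta_path)

# Hypothetical Redshift connection details
redshift_jdbc_url = (
    "jdbc:redshift://my-cluster.abc123.us-east-1.redshift.amazonaws.com:5439/dev"
)

(df.write
   .format("jdbc")
   .option("url", redshift_jdbc_url)
   .option("dbtable", "public.orders")
   .option("user", "redshift_user")
   .option("password", "********")
   .option("driver", "com.amazon.redshift.jdbc42.Driver")  # assumes driver is installed
   .mode("append")
   .save())
```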

Note that there may be additional setup required for network connectivity between Azure and AWS, such as configuring VPC peering or VPN connections.

Overall, using the AWS Glue Data Catalog as an intermediary lets you integrate data across cloud environments while keeping control over your data and a consistent metadata repository.
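
On the Redshift side, steps 4-6 might look roughly like the following sketch using psycopg2. The cluster endpoint, IAM role ARN, Glue database, and table names are placeholders; it only illustrates the catalog wiring and assumes the tables have already been registered in the Glue database (for example by the crawler or ETL job above) and that the cluster's IAM role can access the Glue catalog.

```python
# Rough sketch (steps 4-6): create an external schema backed by the Glue Data
# Catalog and query a catalogued table. Endpoint, role ARN, database, and
# table names are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",  # placeholder endpoint
    port=5439,
    dbname="dev",
    user="redshift_user",
    password="********",
)
conn.autocommit = True

with conn.cursor() as cur:
    # Step 4: external schema pointing at a (hypothetical) Glue database
    cur.execute("""
        CREATE EXTERNAL SCHEMA IF NOT EXISTS lake_ext
        FROM DATA CATALOG
        DATABASE 'my_glue_database'
        IAM_ROLE 'arn:aws:iam::123456789012:role/my-spectrum-role'
        CREATE EXTERNAL DATABASE IF NOT EXISTS
    """)

    # Steps 5-6: tables registered in the Glue database appear under the
    # external schema and can be queried directly
    cur.execute("SELECT * FROM lake_ext.orders LIMIT 10")
    for row in cur.fetchall():
        print(row)

conn.close()
```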

Thank you -- would you happen to know the details of how to set up that crawler? There is an option for Delta Lake, but for the URL it asks for an S3 location. Would I just plug in an Azure Data Lake Storage location, and how would authentication work?
