<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Reading data in Azure Databricks Delta Lake from AWS Redshift in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/reading-data-in-azure-databricks-delta-lake-from-aws-redshift/m-p/5274#M1737</link>
    <description>&lt;P&gt;@Manny Cato:&lt;/P&gt;&lt;P&gt;To allow Redshift to read data from Delta Lake hosted on Azure, you can use the AWS Glue Data Catalog as an intermediary. The Glue Data Catalog is a fully managed metadata catalog that integrates with a variety of data sources, including Delta Lake and Redshift, which makes it a candidate for cross-cloud data integration.&lt;/P&gt;&lt;P&gt;Here are the high-level steps you can follow to set up this integration:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Create a database in the AWS Glue Data Catalog in your AWS account. This will serve as the metadata repository for your data.&lt;/LI&gt;&lt;LI&gt;Set up a Glue Crawler to discover the schema and metadata for your Delta Lake table(s) hosted on Azure.&lt;/LI&gt;&lt;LI&gt;Configure a Glue ETL job to extract the data from your Delta Lake table(s) and load it into a Redshift cluster.&lt;/LI&gt;&lt;LI&gt;Define an external schema in Redshift that points to the Glue Data Catalog.&lt;/LI&gt;&lt;LI&gt;Create external tables in Redshift that reference the data in the Glue Data Catalog.&lt;/LI&gt;&lt;LI&gt;Query the data in Redshift as needed.&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;Note that additional setup may be required for network connectivity between Azure and AWS, such as a site-to-site VPN connection between the Azure virtual network and the AWS VPC.&lt;/P&gt;&lt;P&gt;Overall, using the AWS Glue Data Catalog as an intermediary lets you integrate data between cloud environments while keeping control over your data and a consistent metadata repository.&lt;/P&gt;</description>
    <pubDate>Thu, 27 Apr 2023 04:52:21 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2023-04-27T04:52:21Z</dc:date>
    <item>
      <title>Reading data in Azure Databricks Delta Lake from AWS Redshift</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-data-in-azure-databricks-delta-lake-from-aws-redshift/m-p/5273#M1736</link>
      <description>&lt;P&gt;We have Databricks set up and running on Azure. Now we want to connect it with Redshift (AWS) to perform further downstream analysis for our Redshift users.&lt;/P&gt;&lt;P&gt;I could find documentation on how to do this within the same cloud (either AWS or Azure), but not cross-cloud.&lt;/P&gt;&lt;P&gt;So I was wondering what the best approach would be to allow Redshift to read the Delta Lake hosted in Azure. I was hoping some sort of Glue catalog could be set up that would allow reading from Redshift as an external table.&lt;/P&gt;&lt;P&gt;I'd highly appreciate the help.&lt;/P&gt;</description>
      <pubDate>Tue, 25 Apr 2023 19:51:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-data-in-azure-databricks-delta-lake-from-aws-redshift/m-p/5273#M1736</guid>
      <dc:creator>playermanny2</dc:creator>
      <dc:date>2023-04-25T19:51:23Z</dc:date>
    </item>
    <item>
      <title>Re: Reading data in Azure Databricks Delta Lake from AWS Redshift</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-data-in-azure-databricks-delta-lake-from-aws-redshift/m-p/5274#M1737</link>
      <description>&lt;P&gt;@Manny Cato:&lt;/P&gt;&lt;P&gt;To allow Redshift to read data from Delta Lake hosted on Azure, you can use the AWS Glue Data Catalog as an intermediary. The Glue Data Catalog is a fully managed metadata catalog that integrates with a variety of data sources, including Delta Lake and Redshift, which makes it a candidate for cross-cloud data integration.&lt;/P&gt;&lt;P&gt;Here are the high-level steps you can follow to set up this integration:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Create a database in the AWS Glue Data Catalog in your AWS account. This will serve as the metadata repository for your data.&lt;/LI&gt;&lt;LI&gt;Set up a Glue Crawler to discover the schema and metadata for your Delta Lake table(s) hosted on Azure.&lt;/LI&gt;&lt;LI&gt;Configure a Glue ETL job to extract the data from your Delta Lake table(s) and load it into a Redshift cluster.&lt;/LI&gt;&lt;LI&gt;Define an external schema in Redshift that points to the Glue Data Catalog.&lt;/LI&gt;&lt;LI&gt;Create external tables in Redshift that reference the data in the Glue Data Catalog.&lt;/LI&gt;&lt;LI&gt;Query the data in Redshift as needed.&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;Note that additional setup may be required for network connectivity between Azure and AWS, such as a site-to-site VPN connection between the Azure virtual network and the AWS VPC.&lt;/P&gt;&lt;P&gt;Overall, using the AWS Glue Data Catalog as an intermediary lets you integrate data between cloud environments while keeping control over your data and a consistent metadata repository.&lt;/P&gt;</description>
      <pubDate>Thu, 27 Apr 2023 04:52:21 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-data-in-azure-databricks-delta-lake-from-aws-redshift/m-p/5274#M1737</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2023-04-27T04:52:21Z</dc:date>
    </item>
    <item>
      <title>Re: Reading data in Azure Databricks Delta Lake from AWS Redshift</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-data-in-azure-databricks-delta-lake-from-aws-redshift/m-p/5275#M1738</link>
      <description>&lt;P&gt;Thank you. Would you happen to know the details of how to set up that crawler? There is an option for Delta Lake, but for the URL it asks for an S3 location. Would I just plug in an Azure Data Lake Storage location, and how would authentication work?&lt;/P&gt;</description>
      <pubDate>Thu, 27 Apr 2023 13:31:00 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-data-in-azure-databricks-delta-lake-from-aws-redshift/m-p/5275#M1738</guid>
      <dc:creator>playermanny2</dc:creator>
      <dc:date>2023-04-27T13:31:00Z</dc:date>
    </item>
  </channel>
</rss>

