<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Can I use Databricks to join data from S3 and Postgres using SQL? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/can-i-use-databricks-to-join-data-from-s3-and-postgres-using-sql/m-p/30886#M22444</link>
    <description>&lt;P&gt;Yes, you can. ETL the data into data lake storage, register the tables in the metastore, and register your SELECT with JOINs as a VIEW. Even better, additionally create jobs that store the JOINED table. From your BI tool you can connect to Databricks SQL or directly to the data lake storage.&lt;/P&gt;</description>
    <pubDate>Wed, 26 Jan 2022 22:46:48 GMT</pubDate>
    <dc:creator>Hubert-Dudek</dc:creator>
    <dc:date>2022-01-26T22:46:48Z</dc:date>
    <item>
      <title>Can I use Databricks to join data from S3 and Postgres using SQL?</title>
      <link>https://community.databricks.com/t5/data-engineering/can-i-use-databricks-to-join-data-from-s3-and-postgres-using-sql/m-p/30885#M22443</link>
      <description>&lt;P&gt;Hello, I'm very new to Databricks and I'm finding it hard to tell whether it's the right solution for our needs.&lt;/P&gt;&lt;P&gt;Requirement:&lt;/P&gt;&lt;P&gt;We have multiple data sources spread across AWS S3 and Postgres. We need a common SQL endpoint that can be used to write queries that join data across these different stores.&lt;/P&gt;&lt;P&gt;For example:&lt;/P&gt;&lt;P&gt;We have a BI tool that connects to data sources over JDBC. However, this BI tool &lt;B&gt;cannot "join"&lt;/B&gt; data across multiple data sources. Can I use Databricks to solve this problem?&lt;/P&gt;&lt;P&gt;In my BI tool, I should be able to connect to Databricks over JDBC and write a SQL query like&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;SELECT * FROM
S3.Schema1.Table1 AS s,
Postgres.Schema2.Table2 AS p
WHERE s.x = p.y;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;And this new Databricks SQL endpoint should always be available 24/7, just like a normal DB instance. Is this possible?&lt;/P&gt;&lt;P&gt;PS: I'm aware I can "import" the Postgres data into S3 and then do the joins there. But we need real-time joins without importing.&lt;/P&gt;</description>
      <pubDate>Wed, 26 Jan 2022 21:51:22 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/can-i-use-databricks-to-join-data-from-s3-and-postgres-using-sql/m-p/30885#M22443</guid>
      <dc:creator>venkyv</dc:creator>
      <dc:date>2022-01-26T21:51:22Z</dc:date>
    </item>
    <item>
      <title>Re: Can I use Databricks to join data from S3 and Postgres using SQL?</title>
      <link>https://community.databricks.com/t5/data-engineering/can-i-use-databricks-to-join-data-from-s3-and-postgres-using-sql/m-p/30886#M22444</link>
      <description>&lt;P&gt;Yes, you can. ETL the data into data lake storage, register the tables in the metastore, and register your SELECT with JOINs as a VIEW. Even better, additionally create jobs that store the JOINED table. From your BI tool you can connect to Databricks SQL or directly to the data lake storage.&lt;/P&gt;</description>
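      <content:encoded><![CDATA[
The pattern in this reply (load both sources into tables in one catalog, then register the JOIN as a view that the BI tool queries) can be sketched roughly as follows. This is only an illustration under stated assumptions: Python's built-in sqlite3 module stands in for the Databricks metastore, and the table, column, and view names (s3_table1, pg_table2, s3_value, pg_value, joined_view) are hypothetical. On Databricks the same idea would presumably be a CREATE VIEW over tables registered in the metastore.

```python
import sqlite3


def build_joined_view(s3_rows, pg_rows):
    """Load both row sets into one catalog and expose their join as a view."""
    # Assumption: an in-memory SQLite database stands in for the metastore.
    conn = sqlite3.connect(":memory:")

    # Step 1: "ETL" each source into its own registered table.
    conn.execute("CREATE TABLE s3_table1 (x INTEGER, s3_value TEXT)")
    conn.execute("CREATE TABLE pg_table2 (y INTEGER, pg_value TEXT)")
    conn.executemany("INSERT INTO s3_table1 VALUES (?, ?)", s3_rows)
    conn.executemany("INSERT INTO pg_table2 VALUES (?, ?)", pg_rows)

    # Step 2: register the SELECT with the JOIN as a view, using the
    # join condition from the question (s.x = p.y).
    conn.execute(
        """
        CREATE VIEW joined_view AS
        SELECT s.x AS join_key, s.s3_value, p.pg_value
        FROM s3_table1 AS s
        JOIN pg_table2 AS p ON s.x = p.y
        """
    )
    return conn


# Hypothetical sample rows; only keys 2 and 3 exist in both sources.
conn = build_joined_view(
    s3_rows=[(1, "a"), (2, "b"), (3, "c")],
    pg_rows=[(2, "B"), (3, "C"), (4, "D")],
)

# Step 3: the BI tool only ever queries the view, never the two sources.
joined = conn.execute("SELECT * FROM joined_view ORDER BY join_key").fetchall()
print(joined)
```

Note that the view only reflects whatever was last loaded into the two tables, so this covers the "import, then join" route and not the real-time join without importing that the question asks about.
      ]]></content:encoded>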
      <pubDate>Wed, 26 Jan 2022 22:46:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/can-i-use-databricks-to-join-data-from-s3-and-postgres-using-sql/m-p/30886#M22444</guid>
      <dc:creator>Hubert-Dudek</dc:creator>
      <dc:date>2022-01-26T22:46:48Z</dc:date>
    </item>
  </channel>
</rss>

