01-26-2022 01:51 PM
Hello, I'm very new to Databricks and I'm finding it hard to tell whether it's the right solution for our needs.
Requirement:
We have multiple data sources spread across AWS S3 and Postgres. We need a common SQL endpoint that can be used to write queries to join data across these different stores.
For example:
We have a BI tool that connects to data sources over JDBC. However, this BI tool cannot "join" data across multiple data sources. Can I use Databricks to solve this problem?
In my BI tool, I should be able to connect to Databricks over JDBC and write a SQL query like:
SELECT *
FROM S3.Schema1.Table1 AS s,
     Postgres.Schema2.Table2 AS p
WHERE s.x = p.y;
This new Databricks SQL endpoint should always be available 24/7, just like a normal DB instance. Is this possible?
PS: I'm aware I can "import" the Postgres data into S3 and then run the joins there, but we need real-time joins without importing.
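To make the requirement concrete, here is a toy sketch of what a query-time ("federated") join looks like, using Python's built-in sqlite3 purely as a stand-in. This is not Databricks: two separate in-memory SQLite databases play the roles of the S3 and Postgres sources, and SQLite's ATTACH lets one connection join across them without copying rows. The table and column names (Table1, Table2, x, y) follow the example query above; the schema name `postgres`, the extra columns, and the data are hypothetical.

```python
import sqlite3

def federated_join():
    # One connection; its "main" database stands in for the S3 source.
    conn = sqlite3.connect(":memory:")
    # A second, separate in-memory database stands in for the Postgres source.
    conn.execute("ATTACH DATABASE ':memory:' AS postgres")

    # Hypothetical sample data in each "source".
    conn.execute("CREATE TABLE main.Table1 (x INTEGER, s_val TEXT)")
    conn.executemany("INSERT INTO main.Table1 VALUES (?, ?)",
                     [(1, "a"), (2, "b"), (3, "c")])
    conn.execute("CREATE TABLE postgres.Table2 (y INTEGER, p_val TEXT)")
    conn.executemany("INSERT INTO postgres.Table2 VALUES (?, ?)",
                     [(2, "two"), (3, "three"), (4, "four")])

    # The join is resolved at query time across both databases;
    # no rows are copied from one database into the other.
    rows = conn.execute(
        """
        SELECT s.x, s.s_val, p.p_val
        FROM main.Table1 AS s
        JOIN postgres.Table2 AS p ON s.x = p.y
        ORDER BY s.x
        """
    ).fetchall()
    conn.close()
    return rows

print(federated_join())  # → [(2, 'b', 'two'), (3, 'c', 'three')]
```

The BI tool would only ever see the single SELECT; the hard part the question is asking about is having one engine that can reach both stores at query time.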
01-26-2022 02:46 PM
Yes, you can. You can ETL the data into data lake storage, register your tables in the metastore, and register your SELECT with JOINs as a VIEW. Even better, create additional jobs that store the JOINED result as its own table. From your BI tool you can then connect to Databricks SQL or directly to the data lake storage.
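The steps in this answer (ETL into one store, register the tables, expose the join as a view, and optionally materialize it with a scheduled job) can be sketched end to end with Python's sqlite3 as a small stand-in. Assumptions: a single SQLite database plays the role of the data lake plus metastore, the source rows are hypothetical, and `refresh_joined_table` is a hypothetical function standing in for a scheduled job. In Databricks you would use tables in the metastore, `CREATE VIEW`, and a job instead of these calls.

```python
import sqlite3

def build_lake():
    # Hypothetical rows standing in for the two external sources.
    s3_rows = [(1, "a"), (2, "b"), (3, "c")]
    pg_rows = [(2, "two"), (3, "three"), (4, "four")]

    # A single database stands in for the data lake plus its metastore.
    lake = sqlite3.connect(":memory:")

    # Step 1: ETL each source into the lake as a registered table.
    lake.execute("CREATE TABLE table1 (x INTEGER, s_val TEXT)")
    lake.executemany("INSERT INTO table1 VALUES (?, ?)", s3_rows)
    lake.execute("CREATE TABLE table2 (y INTEGER, p_val TEXT)")
    lake.executemany("INSERT INTO table2 VALUES (?, ?)", pg_rows)

    # Step 2: register the SELECT with the JOIN as a view;
    # the join is re-run every time the view is queried.
    lake.execute(
        """
        CREATE VIEW joined_view AS
        SELECT t1.x, t1.s_val, t2.p_val
        FROM table1 AS t1
        JOIN table2 AS t2 ON t1.x = t2.y
        """
    )
    return lake

def refresh_joined_table(lake):
    # Step 3 (hypothetical "job"): store the joined result as a physical table,
    # so BI queries read precomputed rows instead of re-running the join.
    lake.execute("DROP TABLE IF EXISTS joined_table")
    lake.execute("CREATE TABLE joined_table AS SELECT * FROM joined_view")
    lake.commit()

lake = build_lake()
refresh_joined_table(lake)
print(lake.execute("SELECT * FROM joined_view ORDER BY x").fetchall())
print(lake.execute("SELECT * FROM joined_table ORDER BY x").fetchall())
```

The view is recomputed on each read, while the stored table is only as fresh as its last refresh but avoids repeating the join for every BI query. Note that in both cases the data has first been copied into the lake, so this pattern does not give the import-free, real-time join the question asks for.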
My blog: https://databrickster.medium.com/