Using Databricks in multi-cloud, and querying data from the same instance.

Kaan
New Contributor

I'm looking for a good product to use across two clouds at once for data engineering, data modeling, and governance. I currently have a GCP platform, but most of my current and future data goes through Azure and is then transferred to GCS/BigQuery.

Currently I use Looker as my main visualisation tool, but I will incorporate Power BI as a secondary tool soon. To skip the transfer of data from Azure to GCP and then back to Power BI, I've been looking to make some changes. I've been looking at Databricks as a way to query data in BigQuery and send it to Looker/Power BI, but also to crunch data in Azure, and maybe combine it with a small subset or pre-aggregations of data from BigQuery, if the majority of the data already resides in the Azure cloud.

Databricks talks a lot about multicloud, but I still haven't figured out whether it's a good idea to use only one instance to control both clouds/lakes at the same time.

Are there any good examples out there, other than Databricks' own multicloud monitoring solution? That one seems to run three separate cloud instances of Databricks, isolated from each other.

1 ACCEPTED SOLUTION

Anonymous
Not applicable

@Karl Andrén:

Databricks is a great option for data engineering, data modeling, and governance across multiple clouds. It runs on Azure, AWS, and GCP, and it offers connectors for reading data that lives in a different cloud from the one the workspace runs in.

You can use Databricks to query data from both BigQuery and Azure data sources, and then use Looker or Power BI to visualize the results. Databricks can also be used to perform data processing and transformation on data from both clouds, allowing you to consolidate and aggregate data before sending it to Looker or Power BI.
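As a rough illustration of the point above, here is a minimal PySpark sketch of an Azure-hosted workspace reading a Delta table from ADLS and joining it with a BigQuery table. It assumes the Azure data is stored in Delta format, that the Spark BigQuery connector is available on the cluster, and that credentials for both sources are already configured. The helper names (`abfss_uri`, `bigquery_read_options`, `combine_sources`) and all container, account, project, and table names are hypothetical.

```python
from typing import Any, Dict


def abfss_uri(container: str, storage_account: str, path: str) -> str:
    """Build an ADLS Gen2 URI: abfss://<container>@<account>.dfs.core.windows.net/<path>."""
    return f"abfss://{container}@{storage_account}.dfs.core.windows.net/{path.lstrip('/')}"


def bigquery_read_options(project: str, dataset: str, table: str) -> Dict[str, str]:
    """Reader options for the Spark BigQuery connector (fully qualified table name)."""
    return {"table": f"{project}.{dataset}.{table}"}


def combine_sources(spark: Any, delta_uri: str, bq_options: Dict[str, str], join_key: str) -> Any:
    """Join the (large) Azure-side table with a (small) BigQuery-side table.

    `spark` is the SparkSession that a Databricks notebook provides.
    Authentication for ADLS and BigQuery is assumed to be set up on the cluster.
    """
    azure_df = spark.read.format("delta").load(delta_uri)
    bq_df = spark.read.format("bigquery").options(**bq_options).load()
    # Left join keeps every Azure row and adds matching BigQuery columns.
    return azure_df.join(bq_df, on=join_key, how="left")
```

In a notebook this might be called as `combine_sources(spark, abfss_uri("raw", "mystorage", "sales/orders"), bigquery_read_options("my-project", "reporting", "daily_aggregates"), "order_date")`. Keeping the BigQuery side to a small subset or pre-aggregation, as the question suggests, limits how much data crosses clouds.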

To manage data across multiple clouds, you can use Delta Lake, the open table format that Databricks builds on. It works on top of different cloud object stores (for example ADLS, GCS, and S3). Delta Lake supports ACID transactions, schema enforcement, and table versioning (time travel), making it well suited to managing large, complex data sets across different clouds.
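To make the Delta Lake features above a little more concrete, the following sketch shows an append that relies on schema enforcement and a read of an earlier table version. It assumes a Spark session with Delta Lake available, as on a Databricks cluster. The function names and the table path are hypothetical.

```python
from typing import Any


def append_rows(df: Any, delta_uri: str) -> None:
    """Append a DataFrame to a Delta table.

    With default settings, Delta rejects the write if the DataFrame's schema
    does not match the table's schema (schema enforcement).
    """
    df.write.format("delta").mode("append").save(delta_uri)


def read_version(spark: Any, delta_uri: str, version: int) -> Any:
    """Read the table as it was at a given version number (time travel)."""
    return spark.read.format("delta").option("versionAsOf", version).load(delta_uri)
```

For example, `read_version(spark, "abfss://raw@mystorage.dfs.core.windows.net/sales/orders", 3)` would return the table as of version 3, provided that version is still retained.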

Databricks also provides tools for managing and monitoring this work, with one caveat: a Databricks workspace is deployed in a single cloud and region, so its notebooks and clusters run in that cloud. One workspace (for example on Azure) can still read data held in another cloud, such as BigQuery, through connectors, but it does not act as a single control plane for clusters in several clouds. Cluster activity and resource usage are likewise monitored per workspace.

Overall, Databricks provides a powerful platform for managing data across multiple clouds, and it's worth exploring as a solution for your data engineering, data modeling, and governance needs.
