Data Engineering

Allowing Azure Databricks to query a local/private database

AskMe55
New Contributor

Hello,

I am trying to set up a simple machine learning pipeline: I generate example data on my computer, save it to a MariaDB database on the same machine, and then let Azure Databricks access that local database to train a model, etc. What is the best way to give Azure Databricks access to my database, given that I don't want to migrate my data into Azure?

Thanks!

1 REPLY

-werners-
Esteemed Contributor III

To do that, Databricks needs network access to your local network (LAN).
This means configuring network security groups or a firewall so that traffic from Databricks is allowed through.
Setting up a private endpoint is also a good idea.
You also have to make sure that your Databricks cluster can actually connect to your on-prem database (for MariaDB, the default port is TCP 3306).
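
A quick way to test that last point is to try opening a TCP connection from a notebook on the cluster. This is only a minimal sketch: the host `10.0.0.5` is a made-up placeholder for wherever your MariaDB server is reachable, and `can_reach` is a hypothetical helper name, not a Databricks function.

```python
import socket


def can_reach(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened.

    This only proves that the network path is open (routing, firewall,
    NSG rules). It says nothing about database credentials.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


# Example call from a notebook cell (placeholder host, MariaDB default port):
# can_reach("10.0.0.5", 3306)
```

If this returns `False`, the problem is in the network setup (firewall, NSG, routing), not in Spark or the JDBC driver. Note that the cell runs on the driver node only, so it does not by itself prove that every worker can reach the database.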

Now, assuming all this is working, what will actually happen is this:
Databricks connects to your database, fetches the data you defined (in your Spark script) and moves it to the Databricks workers, which reside in the cloud.
Databricks does the transformations etc. and finally moves the results back to your on-prem system (if you write them back).
Depending on the sizing of the underlying on-prem database, this can take a while.
So your data will be sent to the cloud no matter what (best case: it only lives in worker RAM).
I am not sure if that is what you want.
Most of the time, on-prem data is copied to cloud storage and processed from there.
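
For illustration, the "fetches the data you defined" step could look roughly like the sketch below, which uses Spark's generic JDBC reader. Everything specific here is an assumption: the host, database, user and query are placeholders, the helper names are hypothetical, and it assumes the MariaDB JDBC driver (`org.mariadb.jdbc.Driver`) is available on the cluster.

```python
def mariadb_jdbc_options(host, port, database, user, password, query):
    """Build options for Spark's JDBC reader against a MariaDB source.

    Using "query" instead of "dbtable" means only the rows and columns
    selected by the query are pulled to the workers.
    """
    return {
        "url": f"jdbc:mariadb://{host}:{port}/{database}",
        "driver": "org.mariadb.jdbc.Driver",  # assumed to be on the cluster
        "query": query,
        "user": user,
        "password": password,
        "fetchsize": "10000",  # rows per round trip; tune for your link
    }


def read_from_mariadb(spark, options):
    """Load the query result into a Spark DataFrame via JDBC."""
    return spark.read.format("jdbc").options(**options).load()


# In a Databricks notebook, where `spark` is already defined
# (all values below are placeholders):
# opts = mariadb_jdbc_options(
#     "10.0.0.5", 3306, "ml_demo", "reader", "<password>",
#     "SELECT id, feature_1, label FROM training_data",
# )
# df = read_from_mariadb(spark, opts)
```

Keeping the query narrow limits how much data crosses the link to the cloud, but whatever it selects still ends up in worker memory, as described above. In practice the password would normally come from a secret store rather than being written in the notebook.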
