I want to use Databricks to analyze data that is stored on an on-premises SQL Server. What is the best way to bring this data into Databricks?
I tried to configure Lakehouse Federation so I could query the on-premises data directly from Databricks. However, the connection fails because Databricks and the SQL Server are on different networks. I assume I’d have to configure an Azure Virtual Network or something similar.
The same issue arises with the JDBC connector: I cannot simply connect and create a DataFrame from my data, because the server is on a completely different network.
Let's say I have a simple query that joins 10 tables and outputs just 200 rows. Do I need to bring all 10 of those tables into Databricks? Or should I move them to Data Lake Storage first and then load them into Databricks?
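To make the question concrete, here is a rough sketch of what I would like to run once the network problem is solved. It uses the Spark JDBC reader's `query` option, so the join would run on SQL Server and only the small result set would come back. The host name, database, credentials, and table names are all placeholders I made up for illustration, and the helper functions are hypothetical, not part of any library.

```python
def build_jdbc_options(host, port, database, user, password, query):
    """Build the option map for Spark's JDBC reader against SQL Server.

    Passing `query` (instead of `dbtable`) asks Spark to send the whole
    statement to the database, so the join runs on the SQL Server side.
    """
    url = f"jdbc:sqlserver://{host}:{port};databaseName={database};encrypt=true"
    return {
        "url": url,
        "query": query,
        "user": user,
        "password": password,
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    }


def read_query(spark, options):
    """Run the query through the JDBC reader and return a DataFrame."""
    return spark.read.format("jdbc").options(**options).load()


# Placeholder query: two hypothetical tables standing in for the 10-table join.
# Spark wraps this statement in a subquery, so it avoids ORDER BY and CTEs.
JOIN_QUERY = """
SELECT o.order_id, c.customer_name, o.total
FROM dbo.orders AS o
JOIN dbo.customers AS c ON c.customer_id = o.customer_id
"""

# Placeholder connection details (assumptions, not real values).
options = build_jdbc_options(
    host="onprem-sql.example.internal",
    port=1433,
    database="sales",
    user="reader",
    password="<stored-in-a-secret-scope>",
    query=JOIN_QUERY,
)
```

On a cluster that can actually reach the server, I would expect `df = read_query(spark, options)` to return only the joined rows. This does nothing about the network issue itself, which is the part I am stuck on.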
Thanks