Databricks

andrew0117 · 03-21-2023

If I create a table using the code below:

```sql
CREATE TABLE IF NOT EXISTS jdbcTable
USING org.apache.spark.sql.jdbc
OPTIONS (
  url "sql_server_url",
  dbtable "sqlserverTable",
  user "username",
  password "password"
)
```

will jdbcTable always be automatically synchronized with sqlserverTable? Thanks!

pvignesh92 · 03-22-2023

Hi @andrew li, there is a feature introduced in DBR 11 that lets you ingest data directly into a table from a selected list of sources. Since you are creating a table, I believe this command will create a managed table by loading the data from the SQL Server table into your default warehouse location. Please run DESCRIBE EXTENDED and check the path to see whether there is data there. If there is, it is not going to sync automatically.
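A minimal way to run that check, assuming the table name jdbcTable from the question. The exact rows in the output vary by runtime version, so treat this as a sketch of where to look rather than a guaranteed layout:

```sql
-- Show the metastore entry for the table: look at the Type (MANAGED or EXTERNAL),
-- Provider (the data source backing the table) and Location rows, if present
DESCRIBE TABLE EXTENDED jdbcTable;
```

If a Location row is shown, that is the path to inspect for data files; the Provider row tells you whether the table is backed by the JDBC source or by files in the warehouse.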

Can you try creating a view the same way and see what happens there?
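A sketch of the view variant, reusing the placeholder connection options from the question. This is the temporary-view form shown in the Spark JDBC data source documentation; the view name jdbcView is hypothetical:

```sql
-- Session-scoped view over the SQL Server table; no data is copied into the warehouse
CREATE TEMPORARY VIEW jdbcView
USING org.apache.spark.sql.jdbc
OPTIONS (
  url "sql_server_url",
  dbtable "sqlserverTable",
  user "username",
  password "password"
);

-- Each query against the view is expected to read from SQL Server when it runs
SELECT * FROM jdbcView LIMIT 10;
```

A temporary view only lives for the current Spark session, so it has to be recreated in each new session or notebook.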

Please refer to the link below:

https://docs.databricks.com/external-data/jdbc.html

AFAIK, DBSQL and Delta Lake support external tables on the S3 layer, similar to Hive external tables. Such a table automatically picks up new data once it is loaded into the S3 layer.
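A minimal sketch of such an external table, assuming Delta files already exist at an S3 path the cluster can read. The table name sales_external and the bucket path are hypothetical placeholders:

```sql
-- Register an external Delta table over existing files in S3 (hypothetical path)
CREATE TABLE IF NOT EXISTS sales_external
USING DELTA
LOCATION 's3://my-bucket/path/to/sales';

-- Queries read the latest committed Delta snapshot at that location,
-- so data written there by other jobs becomes visible without re-creating the table
SELECT COUNT(*) FROM sales_external;
```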

andrew0117 · 03-22-2023

Yes, I thought the internal table stored in the Hive warehouse would not get updated automatically. But to my surprise, the table was synchronized immediately after I manually updated the source table in the Azure SQL Server database.

pvignesh92 · 03-22-2023

@andrew li That's interesting. I'm curious to try this out and find out how the Databricks layer knows that the source has been updated. Since this is a pull-based ingestion pattern, the trigger should come from DBx.

Will a table backed by a SQL Server database table automatically get updated if the base table in the SQL Server database is updated?
