On-Premise SQL Server Ingestion to Databricks Bronze Layer

Enzo_Bahrami
New Contributor III

Hello everyone!

So I want to ingest tables (with their schemas) from an on-premises SQL Server into the Databricks Bronze layer with Delta Live Tables, and I want to do it using Azure Data Factory. I want the load to be a snapshot batch load, not an incremental load. Which activities will I have to use in ADF?

1 ACCEPTED SOLUTION

daniel_sahal
Esteemed Contributor

@Parsa Bahraminejad​

You'll need to use the ADF Copy activity to fetch the data from SQL Server into ADLS (storage) in Parquet format. Then you can simply ingest the data from ADLS (the raw layer) into Bronze using Auto Loader or spark.read.format("parquet").
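A minimal sketch of that second step, assuming the Copy activity lands Parquet files under an ADLS path; the storage account, container, folder, and table names below are placeholders, not values from this thread:

# Ingest Parquet files from the raw layer into a bronze Delta table with Auto Loader.
raw_path = "abfss://raw@mystorageaccount.dfs.core.windows.net/sqlserver/dbo.Animal/"
checkpoint_path = "abfss://bronze@mystorageaccount.dfs.core.windows.net/_checkpoints/dbo_animal/"

df = (
    spark.readStream.format("cloudFiles")                   # "cloudFiles" is the Auto Loader source
    .option("cloudFiles.format", "parquet")                 # files written by the ADF Copy activity
    .option("cloudFiles.schemaLocation", checkpoint_path)   # where Auto Loader tracks the inferred schema
    .load(raw_path)
)

(
    df.writeStream.format("delta")
    .option("checkpointLocation", checkpoint_path)
    .trigger(availableNow=True)                             # process all available files, then stop (snapshot-style)
    .toTable("bronze.dbo_animal")
)

For a plain batch snapshot without streaming, the alternative mentioned above (spark.read.format("parquet").load(raw_path), followed by a write to a Delta table) works as well.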


6 REPLIES


Thank you!

Hi, could you give me a specific script to ingest the data from ADLS (.parquet) into a Delta table using Auto Loader? I have not been able to get it working. I tried everything I could, but I get an error every time I try to set up the Auto Loader script in my notebook.

Sure, the first error that popped up was (the notebook code is in the screenshot below):

AnalysisException: Incompatible format detected.
A transaction log for Delta was found at `https://teststorage.blob.core.windows.net/testtest/dbo.Animal.parquet/_delta_log`, but you are trying to read from `https://teststorage.blob.core.windows.net/testtest/dbo.Animal.parquet` using format("parquet"). You must use 'format("delta")' when reading and writing to a delta table.
To disable this check, SET spark.databricks.delta.formatCheck.enabled=false
To learn more about Delta, see https://docs.microsoft.com/azure/databricks/delta/index

I tried to fix it by changing delta_df = spark.read.format("parquet") to delta_df = spark.read.format("delta").
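For reference, a minimal sketch of that change, assuming the path from the error message is accessed through the abfss:// scheme rather than the raw https URL (the scheme translation is my assumption):

# The target folder contains a _delta_log directory, so despite its ".parquet"
# name it is a Delta table and must be read with format("delta").
path = "abfss://testtest@teststorage.dfs.core.windows.net/dbo.Animal.parquet"

# Fails with "Incompatible format detected":
# delta_df = spark.read.format("parquet").load(path)

# Reads the Delta table correctly:
delta_df = spark.read.format("delta").load(path)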

It then raised:

File /databricks/spark/python/pyspark/instrumentation_utils.py:48, in _wrap_function.<locals>.wrapper(*args, **kwargs)
     46 start = time.perf_counter()
     47 try:
---> 48     res = func(*args, **kwargs)
     49     logger.log_success(
     50         module_name, class_name, function_name, time.perf_counter() - start, signature
     51     )

But I am not sure I am doing it the right way. I am trying to finish my project; I got it working through a Copy Data pipeline, but I want to switch to Auto Loader and store the data in a Delta table.

Thanks in advance.

Anonymous
Not applicable

Hi @Parsa Bahraminejad​ 

Thank you for posting your question in our community! We are happy to assist you.

To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?

This will also help other community members who may have similar questions in the future. Thank you for your participation, and let us know if you need any further assistance!
