cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Delta Live Tables: bulk import of historical data?

sarguido
New Contributor II

Hello! I'm very new to working with Delta Live Tables and I'm having some issues. I'm trying to import a large amount of historical data into DLT. However letting the DLT pipeline run forever doesn't work with the database we're trying to import from - despite connection parameters being set, there are still timeout issues.

I'm trying to solve the problem by bulk importing historical data in a one time transfer into Delta Lake, and then converting those tables to DLT using the CONVERT TO DELTA sql statement. However, when I try to hook up my pipeline to those tables, I get an error: Could not materialize 'schema.table_name' because a MANAGED table already exists with that name. The table already exists because I want to seed it with a large amount of historical data. Is there a way to make this work? Do I need to change some configuration somewhere?

Another alternative I thought of was importing everything into Delta Lake and then doing a join or union somewhere in the DLT pipeline, but then I'll need to hard code in the date that I want DLT to start ingesting data at (since the rest of the data will already exist in Delta Lake).

An additional alternative could be just maxing out compute and using photon acceleration for the initial import. Perhaps that is what I should be doing here instead?

Looking for some input from experts/folks with lots of DLT experience with regard to best practices. Thanks for reading!

5 REPLIES 5

Anonymous
Not applicable

@Sarah Guido​ :

It sounds like you are on the right track with importing the historical data into Delta Lake first and then converting the tables to Delta Live Tables using the CONVERT TO DELTA SQL statement. The error message you are seeing about a managed table already existing with that name is because you have already created a managed Delta table with that name in Delta Lake and now you are trying to create a Delta Live Table with the same name.

To avoid this error, you could try renaming the managed Delta table to something else before converting it to a Delta Live Table. You can do this using the ALTER TABLE SQL statement:

ALTER TABLE schema.table_name RENAME TO schema.new_table_name;

Once you have renamed the managed Delta table, you can use the CONVERT TO DELTA statement to create a Delta Live Table with the original name.

Regarding your other alternatives, using Photon acceleration for the initial import could be a good option if you have the resources to do so. This would allow you to import the data quickly and efficiently, and then you could start ingesting new data into DLT from that point forward.

Using a join or union in the DLT pipeline to combine the historical data in Delta Lake with new data ingested into DLT could also work, but as you mentioned, you would need to hard code the date at which DLT should start ingesting data. This could be problematic if you need to make changes to that date later on.

Overall, importing the historical data into Delta Lake first and then converting to Delta Live Tables is likely the best approach for your scenario. Renaming the managed Delta table before converting it to a Delta Live Table should allow you to avoid the error message you are seeing.

Hope this helps! Happy learning.

Anonymous
Not applicable

Hi @Sarah Guido​ 

Thank you for posting your question in our community! We are happy to assist you.

To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?

This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance! 

sarguido
New Contributor II

Unfortunately the answer here didn't help me - I spoke to a friend who works at Databricks to get an answer.

Hi there. Did you get a good reply from your DB friend?

aprakash3
New Contributor II

Hi,

Is your problem resolved. I am also facing the same issues.Can you help me here.

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group