cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Azure Databricks, migrating delta table data with CDF on.

grazie
Contributor

We are on Azure Databricks over ADLS Gen2 and have a set of tables and workflows that process data from and between those tables, using change data feeds. (We are not yet using Unity Catalog, and also not Hive metastore, just accessing delta tables from storage). 

Now we are required by IT to move data to a different Azure storage account, and I am faced with a conundrum. Delta documentation states that it is important to maintain timestamps if you want to preserve CDF. https://docs.delta.io/latest/delta-faq.html#can-i-copy-my-delta-lake-table-to-another-location

On the other hand, Azure support rep states that the only way they can think of is to manually use Storage Explorer something like this:


On Storage Explorer

Export the table as CSV, with option to export the timestamp

Import the CSV into the the new Storage Account changing the name of the Timestamp to something else, like Timestamp2.

Which will add a new field in the new table and also consumes more space because of it.

I need to try that recipe to see if it actually makes sense, but in reality we have many tables and such a manual process would be inefficient and error-prone. 

Maybe someone here knows a better way? 

2 REPLIES 2

grazie
Contributor

As it turns out, due to a misunderstanding, the responses from Azure support were answering a slightly different question (about Azure Table Storage instead of Delta Tables on Blob/ADLS Gen2), so we'll try there again. However, still interested in ideas/experiences from this community. 

Also, I couldn't find a way to fix the original post, or delete and recreate without the confusing details that turned out to be about Azure Table Storage so sorry about that.

Thank you for a comprehensive reply. I have comments to each point below. For now, we have actually decided to put off the move so it is not an imminent issue, though it will come back.

 

The basic issue is related to delta tables, and the fact that timestamps on the files themselves (as opposed to file contents that include timestamps)

 

- Azure Databricks Mount Points
  - This only affects how data is accessed through Databricks, and doesn't deal with the actual moving of data between accounts.

 

- Azure Service Principal and OAuth 2.0
  - This is indeed how we connect to the storage. But again, this doesn't address the issue of moving data between accounts.

 

- Automated Data Migration
  - This addresses the data movement issue, and could be interesting to learn more about. It all depends on whether "allow you to maintain timestamps and minimize manual intervention" is true for the specific timestamps that delta consumer depends on. If you have more specifics on this it would be helpful to anyone with similar issue. I found the following statement which seems to suggest this is not simple: https://learn.microsoft.com/en-us/answers/questions/656437/is-it-possible-to-edit-the-lastmodified-a...

 

- Delta Lake Table Copy
  - COPY INTO "Loads data from a file location into a Delta table.", and accepts the formats CSV, JSON, AVRO, ORC, PARQUET, TEXT, BINARYFILE. This does not seem to address the basic issue
  - We were also considering deep clone, but according to Databricks documentation "A cloned table has an independent history from its source table. Time travel queries on a cloned table do not work with the same inputs as they work on its source table."

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group