cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

How to write Change Data from Delta Lake to aws dynamodb

hari
Contributor

Is there some direct way to write data from DeltaLake to AWS DynamoDB.

If there is none, Is there any way to do the same.

1 ACCEPTED SOLUTION

Accepted Solutions

-werners-
Esteemed Contributor III
8 REPLIES 8

AmanSehgal
Honored Contributor III

This is something that might be achievable via delta live tables and delta sharing in future.

But for now, what you can do is track changes in delta tables when you do an upsert.

To know more you'll have to describe how your delta tables are getting updated? What's the source, frequency of data operations in delta lake?

hari
Contributor

The data will be updated as batches nightly

AmanSehgal
Honored Contributor III

A way to track INSERT/UPDATE/DELETE would be to split out your upsert process as a 3 part process.

Step1: Include an update timestamp field in your target table.

Step2: In your incoming refresh dataset, add a update timestamp column and set it to current timestamp.

UPDATE count

Step3: Run upsert job with just whenMatchedUpdate clause. Count rows that have current timestamp for update.

Also, get an extract of the rows with current timestamp - these are the rows that you'll UPDATE in DyanmoDB

INSERT count

Step5: Get the row count of table - RC1.

Step6: Run upsert job with just whenNotMatchedInsert clause. Get row count of table after insert - RC2

INSERT count = RC2-RC1

Get an extract of the rows with current timestamp and perform a left-anti join with UPDATE df - these are the rows that you'll INSERT in DyanmoDB

DELETE count

Step7: Get the row count of table - RC1.

Step8: Run upsert job with just whenNotMatchedDelete clause. Get row count of table after insert - RC2

DELETE count = RC1-RC2

From your source data, there would be a way to identify the records that are to be deleted. Grab those indices/primary keys to delete records.

-werners-
Esteemed Contributor III

there is change data feed on delta tables:

https://docs.databricks.com/delta/delta-change-data-feed.html

hari
Contributor

@Aman Sehgal​  Thanks for the reply

Are you suggesting collecting the update/delete set and then manually writing to dynamodb?

hari
Contributor

@Werner Stinckens​ Thanks for the reply

I have gone through the CDF docs. But was wondering if there is some connector available to write data from Delta Table directly to dynamodb.

Eg: Read data from Delta table with change data mode on and write to dynamodb directly.

Or are you suggesting to get the Change Data and then write to DynamoDB

-werners-
Esteemed Contributor III

I suggest fetching the change data and write that to dynamodb indeed.

jose_gonzalez
Moderator
Moderator

Hi @Harikrishnan P H​ ,

Did @Werner Stinckens​ reply help you to resolved your issue? if yes, please mark it as best. if not, please let us know.

Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.