Replaying a stream to migrate to liquid clustering

EndreM

The documentation says little about how to migrate a partitioned table to liquid clustering, and the ALTER TABLE command it suggests doesn't work when the table is partitioned.
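For comparison, a batch alternative to replaying the stream is to copy the partitioned table into a new table declared with `CLUSTER BY`, because `CREATE TABLE ... CLUSTER BY (...) AS SELECT ...` is accepted where `ALTER TABLE ... CLUSTER BY` is not. The sketch below only builds that statement as a string; the table names and clustering columns are hypothetical placeholders, and in a Databricks notebook you would pass the result to `spark.sql(...)`.

```python
from __future__ import annotations

import re

# Accepts plain identifiers such as catalog.schema.table; a simple guard, not full SQL quoting.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*$")


def build_liquid_ctas(source_table: str, target_table: str, cluster_cols: list[str]) -> str:
    """Build a CREATE TABLE ... CLUSTER BY ... AS SELECT statement.

    Liquid clustering cannot be enabled with ALTER TABLE on a partitioned
    table, so the data is copied into a new table created with CLUSTER BY.
    All table and column names used with this helper are hypothetical.
    """
    if not cluster_cols:
        raise ValueError("at least one clustering column is required")
    for name in [source_table, target_table, *cluster_cols]:
        if not _IDENT.match(name):
            raise ValueError(f"unexpected identifier: {name!r}")
    cols = ", ".join(cluster_cols)
    return f"CREATE TABLE {target_table} CLUSTER BY ({cols}) AS SELECT * FROM {source_table}"


if __name__ == "__main__":
    stmt = build_liquid_ctas(
        "main.silver.events_partitioned",  # hypothetical existing partitioned table
        "main.silver.events_liquid",       # hypothetical new liquid-clustered table
        ["event_date", "device_id"],       # hypothetical clustering keys
    )
    print(stmt)
    # In a Databricks notebook: spark.sql(stmt)
```

This is a one-off copy rather than a stream, so whether it fits depends on how the downstream streaming checkpoint is handled afterwards.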

The comments on this forum suggest replaying the stream, so that is what I am trying to do. Our bronze table contains JSON records, and each record is transformed into an estimated 200,000 records. This should be easily handled by any compute, but even with the largest compute (384 GB) we get heavy garbage collection, and eventually the driver restarts (or, with G1GC, the job times out).

So I limited the transformation to produce only 100 records. That run succeeded, but I see an excessive number of commit and offset files: it produced 53 of them.

This stream worked when the silver table was partitioned. Now that I have enabled Unity Catalog, the stream fails, and there is no easy way to disable Unity Catalog. So I need to find out why a stream from a partitioned bronze table into a liquid-clustered silver table fails. Any suggestions?

I have tried maxing out memory, splitting the transformation (which converts one large JSON file into many records) into 5 steps with caching between each step, and switching to the G1GC garbage collector. The only thing that worked was limiting the number of records the transformation produces, but that discards records, so it is not a good solution.
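In Structured Streaming, each micro-batch normally writes one file named by its batch id to the checkpoint's `offsets/` directory when the batch is planned, and one to `commits/` when it completes, so a count like 53 most likely reflects the number of micro-batches rather than runaway metadata. The sketch below summarizes a local copy of a checkpoint; the path is a hypothetical placeholder, and a checkpoint in cloud storage would need a different listing call.

```python
from __future__ import annotations

from pathlib import Path


def batch_ids(directory: Path) -> set[int]:
    """Return the micro-batch ids found in a checkpoint subdirectory.

    Offset and commit log entries are named by batch id (0, 1, 2, ...);
    checksum files such as .0.crc are skipped because their names are not numeric.
    """
    if not directory.is_dir():
        return set()
    return {int(p.name) for p in directory.iterdir() if p.is_file() and p.name.isdecimal()}


def summarize_checkpoint(checkpoint: Path) -> dict:
    offsets = batch_ids(checkpoint / "offsets")
    commits = batch_ids(checkpoint / "commits")
    return {
        "planned_batches": len(offsets),
        "committed_batches": len(commits),
        # Offsets written without a matching commit: a batch that started but did not finish.
        "uncommitted": sorted(offsets - commits),
    }


if __name__ == "__main__":
    # Hypothetical local copy of the silver stream's checkpoint location.
    print(summarize_checkpoint(Path("/tmp/silver_checkpoint")))
```

Spark keeps only a bounded number of recent log entries (`spark.sql.streaming.minBatchesToRetain`, default 100), so on a long-running stream these counts reflect retained entries, not every batch ever run.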
