AWS Databricks - Out-of-Memory issue in Delta Live Tables
12-05-2024 07:15 AM
I have been using Delta Live Tables for more than a year and have implemented a good number of DLT pipelines that ingest data from an S3 bucket using SQS. One of these pipelines processes a large volume of data. The pipeline reads the data with Auto Loader (cloudFiles), written in SQL. It is straightforward, with no column-type inference (inferSchemaType) enabled, so it accepts all source data as-is into the RAW layer.

As part of a performance test, we load a large volume of data. The driver and workers are each configured with 32 GB of memory and 16 cores, with worker nodes autoscaling from 2 to 12. While ingesting large volumes, the pipeline never failed before, though it took a long time to process the data. Recently, however, it started throwing an OOM error while ingesting 500K records, with each file smaller than 100 KB.
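For reference, the ingestion is roughly of this shape (a minimal sketch; the table name, S3 path, and source format below are illustrative, not our actual ones):

```sql
-- Minimal sketch of the RAW-layer ingestion (names and path are hypothetical).
CREATE OR REFRESH STREAMING TABLE raw_events
AS SELECT *
FROM cloud_files(
  "s3://my-bucket/landing/",                -- hypothetical S3 path
  "json",                                   -- hypothetical source format
  map(
    "cloudFiles.useNotifications", "true",  -- SQS-based file notification mode
    "cloudFiles.inferColumnTypes", "false"  -- take source data as-is into RAW
  )
);
```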
Our pipeline is running on the PREVIEW channel with Unity Catalog enabled. Let me know if anyone has a solution.
12-05-2024 07:43 AM
Hi @Yaadhudbe,
We would need to review your DLT setup, cluster settings, and Spark processing to better understand the OOM errors and suggest possible mitigations.

I suggest filing a case with us so we can conduct a proper investigation.
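In the meantime, one setting that is often worth reviewing when Auto Loader ingests many small files is the micro-batch size, since bounding it can reduce per-batch memory pressure. A hedged sketch only (the path and option values are illustrative, and this is not a confirmed fix for your pipeline):

```sql
-- Illustrative only: cap how much data each micro-batch pulls in.
-- These Auto Loader rate-limiting options go in the cloud_files() option map.
CREATE OR REFRESH STREAMING TABLE raw_events
AS SELECT *
FROM cloud_files(
  "s3://my-bucket/landing/",                 -- hypothetical S3 path
  "json",
  map(
    "cloudFiles.useNotifications", "true",
    "cloudFiles.maxFilesPerTrigger", "1000", -- illustrative cap on files per batch
    "cloudFiles.maxBytesPerTrigger", "1g"    -- illustrative cap on bytes per batch
  )
);
```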

