Saving a Spark DataFrame to a Delta table is taking too long
05-24-2023 05:08 AM
I am working on a video analytics task and need to save image bytes, previously extracted into a Spark DataFrame, to a Delta table. I overwrite the same Delta table repeatedly over the course of the task, and the size of the input data varies between runs. The write takes too long, even after several attempts at compaction. I can't use streaming Delta tables, because I only want to store the extracted image bytes in the Delta table and then complete the inference task (object detection and other transformations). I even tried dropping the large data columns, but it made no difference.
My current cluster configuration: 1 driver, 16 GB memory, 4 cores, g4dn.xlarge, runtime 11.3.x-gpu-ml-scala2.12.
Labels: Dataframe
05-25-2023 12:52 AM
Can you check the Spark UI to see where the time is spent?
It could be a join, a UDF, ...
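If it helps, here is a rough sketch of one way to separate the cost of computing the DataFrame (joins, UDFs, image extraction) from the cost of the Delta write itself. It assumes Spark 3.0 or later, where the built-in `noop` sink runs the full query but writes nothing. The helper names `time_action` and `profile_write`, and the `df` and `path` arguments, are placeholders for illustration, not anything from your job.

```python
import time


def time_action(label, action):
    """Run a zero-argument callable, print and return its wall-clock time in seconds."""
    start = time.perf_counter()
    action()
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.1f}s")
    return elapsed


def profile_write(df, path):
    """Time the query alone (noop sink), then the query plus the Delta overwrite.

    `df` is assumed to be a Spark DataFrame and `path` the Delta table location.
    """
    # The noop sink executes every upstream stage but discards the rows.
    compute_s = time_action(
        "compute only (noop sink)",
        lambda: df.write.format("noop").mode("overwrite").save(),
    )
    # Unless df is cached, this run recomputes the DataFrame before writing.
    write_s = time_action(
        "compute + Delta overwrite",
        lambda: df.write.format("delta").mode("overwrite").save(path),
    )
    return compute_s, write_s
```

If the two timings are close, most of the time is likely going into producing the DataFrame rather than into the Delta write, and the stages in the Spark UI should show which step dominates.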

