05-24-2023 05:08 AM
While working on a video analytics task, I need to save image bytes, previously extracted into a Spark DataFrame, to a Delta table. I want to overwrite the same Delta table repeatedly over the course of the task, and the size of the input data varies between runs. This is taking too much time, even after several attempts at compaction. I can't use streaming Delta tables, because I simply want to store the extracted image bytes in the Delta table and then complete the inference task for object detection and other transformations. I have even tried dropping the large data columns, but it made no difference.
Cluster configuration: 1 driver, 16 GB memory, 4 cores, 11.3.x-gpu-ml-scala2.12, g4dn.xlarge.
05-25-2023 12:52 AM
Can you check the Spark UI to see where the time is spent?
It could be a join, a UDF, ...
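As a rough complement to the Spark UI, each step of the job can also be timed from the notebook itself. The sketch below is only an illustration: `timed`, `timings`, and `slowest_step` are hypothetical names, and the `time.sleep` calls stand in for the real extract and write steps. Keep in mind that Spark transformations are lazy, so a timed block only measures real work if it contains an action (for example the Delta write itself); otherwise the cost shows up in whichever later step triggers execution.

```python
import time
from contextlib import contextmanager

# Hypothetical helper state: accumulated wall-clock seconds per named step.
timings = {}


@contextmanager
def timed(step):
    """Record how long the enclosed block takes under the given step name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[step] = timings.get(step, 0.0) + elapsed


def slowest_step(recorded):
    """Return the (name, seconds) pair with the largest recorded duration."""
    return max(recorded.items(), key=lambda item: item[1])


# Stand-in workloads (assumption): replace these with the real steps,
# making sure each block contains a Spark action.
with timed("extract"):
    time.sleep(0.01)

with timed("write"):
    time.sleep(0.1)

name, seconds = slowest_step(timings)
print(f"slowest step: {name} ({seconds:.2f}s)")
```

In the real job, the "write" block would wrap the overwrite of the Delta table, and the step reported as slowest is the one to look up in the Spark UI for stage-level detail.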