While working on a video analytics task, I need to save image bytes, previously extracted into a Spark DataFrame, to a Delta table. I want to overwrite the same Delta table repeatedly over the course of the task, and the size of the input data varies from run to run. Writing to the table takes too long, even after several attempts at compaction. I can't use streaming Delta tables, because I simply want to store the extracted image bytes in the Delta table and then complete the object-detection inference and other transformations. I have also tried dropping the long data columns, but it made no difference.
My current cluster configuration: 1 driver, g4dn.xlarge (16 GB memory, 4 cores), runtime 11.3.x-gpu-ml-scala2.12.
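Because the input size changes between overwrites, one common way to keep the number and size of written files predictable is to repartition the DataFrame before each overwrite, with the partition count derived from the estimated payload size. The sketch below shows only that sizing step in plain Python. It assumes the total byte size is already known, for example by aggregating `F.sum(F.length("image_bytes"))` in PySpark (the column name `image_bytes` is hypothetical). The function name `target_partitions` and the 128 MB target are illustrative choices, not Databricks or Delta defaults.

```python
def target_partitions(total_bytes: int,
                      target_file_bytes: int = 128 * 1024 * 1024,
                      min_partitions: int = 1) -> int:
    """Hypothetical helper: how many partitions (roughly, output files) to use
    so that each one holds about target_file_bytes of payload.

    Assumption: total_bytes is an estimate of the raw image payload, e.g. the
    sum of the binary column lengths. Actual on-disk file sizes will differ
    because of the file format's encoding and compression.
    """
    if total_bytes < 0:
        raise ValueError("total_bytes must be non-negative")
    if target_file_bytes <= 0:
        raise ValueError("target_file_bytes must be positive")
    # Integer ceiling division: any remainder gets its own partition.
    needed = (total_bytes + target_file_bytes - 1) // target_file_bytes
    return max(min_partitions, needed)


if __name__ == "__main__":
    # Illustrative input sizes (hypothetical numbers, not measurements).
    for size in (0, 50 * 1024**2, 128 * 1024**2, 3 * 1024**3):
        print(size, target_partitions(size))
```

The result would then feed a write such as `df.repartition(n).write.format("delta").mode("overwrite").saveAsTable("frames")`, where `frames` is a hypothetical table name. Whether this helps here depends on where the time is actually spent (for example, collecting bytes to the single driver versus writing files), which the Spark UI would show.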