03-10-2023 12:26 AM
My job, after doing all the processing in the Databricks layer, writes the final output to Snowflake tables using the df.write API and the Spark Snowflake connector. I often see that even a small dataset (16 partitions with 20k rows in each partition) takes around 2 minutes to write. Is there any way the write can be optimized?
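A minimal sketch of the write path described above (PySpark on Databricks); the connection options and table name are placeholders rather than values from this thread, and `df` stands for the already-processed DataFrame:

```python
# Hypothetical Snowflake connection options -- replace with real values.
sf_options = {
    "sfURL": "<account>.snowflakecomputing.com",
    "sfUser": "<user>",
    "sfPassword": "<password>",
    "sfDatabase": "<database>",
    "sfSchema": "<schema>",
    "sfWarehouse": "<warehouse>",
}

# df is the DataFrame produced by the upstream Databricks processing.
(df.write
    .format("snowflake")                 # Spark Snowflake connector
    .options(**sf_options)
    .option("dbtable", "TARGET_TABLE")   # placeholder target table
    .mode("overwrite")
    .save())
```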
03-10-2023 02:52 AM
AFAIK the Spark connector is already optimized. Can you try changing the partitioning of your dataset? For bulk loading, fewer but larger files are better.
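A short sketch of that suggestion, reusing the placeholder connection options from the earlier sketch; reducing the partition count means the connector stages fewer, larger files for Snowflake's bulk load (the target of 4 partitions is only an illustration):

```python
# Fewer, larger partitions -> fewer, larger staged files for the bulk load.
(df.coalesce(4)                          # coalesce avoids a full shuffle; use repartition(4) if data is skewed
    .write
    .format("snowflake")
    .options(**sf_options)               # placeholder options from the earlier sketch
    .option("dbtable", "TARGET_TABLE")
    .mode("overwrite")
    .save())
```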
03-10-2023 04:12 AM
Yes. I brought that down to 4 partitions during my transformations and tried again. On average, it still takes 2 minutes for the write. I'm not sure if that's the expected behavior with a JDBC connection.
03-10-2023 04:16 AM
That seems slow to me.
Are you sure you are not doing any Spark processing before the write?
Because Spark evaluates lazily, if you are, a chunk of that 2 minutes is Spark transforming the data rather than writing it.
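One way to check this point (a sketch, again with the placeholder options above): materialize the DataFrame first so the upstream transformations run before the write is timed, which separates transformation time from the actual Snowflake write.

```python
import time

# Force the upstream transformations to run before timing the write.
df_materialized = df.cache()
df_materialized.count()

start = time.time()
(df_materialized.write
    .format("snowflake")
    .options(**sf_options)               # placeholder options from the earlier sketch
    .option("dbtable", "TARGET_TABLE")
    .mode("overwrite")
    .save())
print(f"Snowflake write only: {time.time() - start:.1f} s")
```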
04-03-2023 04:21 AM
Hi @Vigneshraja Palaniraj
Hope all is well!
Just wanted to check in to see whether you were able to resolve your issue. If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.
We'd love to hear from you.
Thanks!
04-03-2023 04:32 AM
Thanks @Vartika Nain for following up. I closed this thread.
04-03-2023 04:32 AM
There are a few options I tried out that gave me better performance.