How to optimize the runtime on a 10.4 cluster
01-22-2023 04:58 AM
I am loading 1 billion records from a Spark DataFrame into a target table. On the 7.3 cluster the load takes 3 hours to complete, but after migrating to the 10.4 cluster it takes 8 hours. How can I reduce the run time?
01-24-2023 12:06 AM
Hi, please refer to https://docs.databricks.com/clusters/cluster-config-best-practices.html for best practices on cluster configuration. Please let us know if this helps.
01-24-2023 05:33 PM
Hi @Mohammed sadamusean,
Could you provide more details on what you are doing? What type of transformations/actions are you running? What are your source and sink? Is it batch or streaming? All of that information will help.
01-24-2023 06:32 PM
I have data in ADLS. I load this data into multiple DataFrames in a Databricks notebook, and from a temp view on the final DataFrame I load the data into the final target table. It usually takes 3 hours on the 7.3 cluster, but on the 10.4 cluster it takes around 8 hours. There are 1 billion records.
02-24-2023 03:40 PM
Could you check your Spark UI to identify which stage is taking the longest, and share that information here?
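If reading the Stages tab by eye is tedious, here is a minimal sketch of ranking stages by executor run time. It assumes you already have the stage list as JSON in the shape returned by Spark's monitoring REST API (`/api/v1/applications/<app-id>/stages`), with the `stageId`, `name`, `status` and `executorRunTime` (milliseconds) fields. The helper name `slowest_stages` and the sample records are made up for illustration and are not from your job.

```python
from typing import Any, Dict, List, Tuple


def slowest_stages(
    stages: List[Dict[str, Any]], top_n: int = 5
) -> List[Tuple[int, str, float]]:
    """Return (stageId, name, executor run time in minutes) for the
    completed stages with the highest total executor run time."""
    # Only completed stages have a final executorRunTime value.
    completed = [s for s in stages if s.get("status") == "COMPLETE"]
    # Sort descending by executorRunTime (milliseconds, summed over tasks).
    ranked = sorted(
        completed, key=lambda s: s.get("executorRunTime", 0), reverse=True
    )
    return [
        (s["stageId"], s.get("name", ""), s.get("executorRunTime", 0) / 60000)
        for s in ranked[:top_n]
    ]


# Hypothetical sample records, only to show the expected input shape.
sample_stages = [
    {"stageId": 1, "name": "read source", "status": "COMPLETE", "executorRunTime": 600000},
    {"stageId": 2, "name": "join", "status": "ACTIVE", "executorRunTime": 900000},
    {"stageId": 3, "name": "write target", "status": "COMPLETE", "executorRunTime": 7200000},
]

for stage_id, name, minutes in slowest_stages(sample_stages, top_n=2):
    print(f"stage {stage_id} ({name}): {minutes:.1f} min of executor time")
```

Running the same ranking for a 7.3 run and a 10.4 run of the job, and posting the top few stages from each, should make it easier to see where the extra hours go.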

