05-29-2015 07:49 AM
My company urgently needs help. We are having severe performance problems with Spark and will have to switch to a different solution if we don't get to the bottom of them.
We are on 1.3.1, using Spark SQL and ORC files with partitions and in-memory caching, yet just a few users making 10 requests each seems to really slow our cluster down, and we imminently need to be able to handle many more requests.
We have tried increasing nodes, cores/memory, stripe sizes, config changes etc. to speed up our queries and are getting nowhere. We urgently need any help people can offer. Happy to pay, we just need to understand better the limitations of Spark / Spark SQL so we can decide what we need to do.
05-29-2015 08:00 AM
@msj50 - can you share a URL to a notebook that we can look at to evaluate your performance issues?
Note this is a private comment that only you can see! Please respond with the same.
05-29-2015 08:13 AM
Hi Pat, thanks for getting back to me. We aren't using Databricks Cloud, but have our own cluster running on AWS. I created an earlier post with some additional information: https://forums.databricks.com/questions/919/what-is-the-optimal-number-of-cores-i-should-use-f.html
We are pretty stuck at the moment so I would be grateful for any help you can provide.
05-29-2015 08:29 AM
@msj50 - if you can get running in Databricks Cloud, we can certainly help. You can sign up here: http://go.databricks.com/register-for-dbc.
Otherwise, you may want to post your questions to the users mailing list at: user@spark.apache.org
05-29-2015 08:34 AM
Unfortunately Databricks Cloud doesn't match our requirements at this point in time... I was wondering if Databricks provides any consultancy outside of that, or if you could recommend someone else with a sufficient level of expertise in Spark performance?
06-01-2015 09:02 AM
anybody there?
06-02-2015 06:11 PM
Hi,
The performance of your Spark queries is severely impacted by the way your underlying data is encoded. If you have a ton of files, sometimes the run time for your Spark job can entirely be dependent on the time it takes to read all of your files. Other times, if you have super large files in an unsplittable format, that can also bottleneck your job. Also, if you do certain queries and your data is heavily skewed towards only a few keys, that can make your job very slow too.
But in short - it's really hard to say exactly what is slowing down your jobs and what is going on without doing some diagnosis on what you are doing specifically.
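For a rough first pass at the three causes above (lots of small files, very large files, keys that are heavily skewed), something like the following plain-Python sketch can help. It is not Spark-specific and not an official diagnostic: the function names, the 64 MB "small file" threshold, and the idea of pulling a sample of your join or group-by keys into a list are all assumptions made for illustration.

```python
import os
from collections import Counter

# Assumed cutoff for a "small" file; tune this for your own storage layout.
SMALL_FILE_BYTES = 64 * 1024 * 1024


def list_file_sizes(root):
    """Yield the size in bytes of every file under a local directory."""
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            yield os.path.getsize(os.path.join(dirpath, name))


def summarize_file_sizes(sizes_bytes, small_threshold=SMALL_FILE_BYTES):
    """Count files, files below the threshold, and the largest file size."""
    sizes = list(sizes_bytes)
    return {
        "files": len(sizes),
        "small_files": sum(1 for s in sizes if s < small_threshold),
        "largest_bytes": max(sizes) if sizes else 0,
    }


def key_skew(keys, top_n=3):
    """Return the top_n most frequent keys and their combined share of rows."""
    counts = Counter(keys)
    total = sum(counts.values())
    top = counts.most_common(top_n)
    top_share = sum(c for _, c in top) / total if total else 0.0
    return top, top_share
```

A very high `small_files` count, a single huge `largest_bytes` value in an unsplittable format, or a `top_share` close to 1.0 would each point at one of the problems described above, but none of them is proof on its own.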
07-06-2015 12:03 AM
@vida can you please guide us through the steps required to properly diagnose what is actually slowing down Spark cache data retrieval?
Is there any official or unofficial help and support subscription available that I can buy to get some help?
If you have expertise in diagnosing and fixing slow data retrieval from the Spark cache, please feel free to get in contact with me.
07-15-2015 10:31 AM
@msj50, @happpy
I wish I had a neat checklist of things to check for performance, but there are too many potential issues that can cause slowness. These are the most common I've seen:
As you can see - pinning down exactly which issue you are facing can be really intricate.
Since you both asked about support - with a professional license of Databricks, we can diagnose and work through these issues with you and even advise on architecture level decisions for using Spark. Please email sales@databricks.com to inquire further.
09-15-2019 02:43 AM
Could you please state the workaround for each of the above bottlenecks? These (files of various sizes, tables with a high number of columns, joins, etc.) are very common use cases in data processing.
11-02-2015 02:29 PM
In my project, the following solutions were launched one by one to improve performance: