Topics with Label: Performance

Forum Posts

by msj50 • New Contributor III

05-29-2015 7:49:19 AM

7378 Views
11 replies
1 kudos

Spark Running Really slow - help required

My company urgently needs help: we are having severe performance problems with Spark and will have to switch to a different solution if we don't get to the bottom of it. We are on 1.3.1, using Spark SQL, ORC files with partitions, and caching in me...

Data Engineering

Latest Reply

Kaniz
Community Manager

11-13-2023 10:21:25 PM

1 kudos

Hi @msj50, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your ...

by amitca71 • Contributor II

06-22-2023 4:14:56 AM

1089 Views
2 replies
1 kudos

performance tool for databricks sql

Hi, I'm looking for a performance test tool. I saw that there was a post about JMeter, https://stackoverflow.com/questions/66913893/how-can-i-connect-jmeter-with-databricks-spark-cluster#comment118293766_66915965 , however, the JDBC parameters are requesting...

Data Engineering

Latest Reply

Anonymous
Not applicable

06-22-2023 10:23:27 PM

1 kudos

Hi @Amit Cahanovich, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question. Thanks.

by ron_lusha • New Contributor

06-04-2023 6:27:32 AM

561 Views
1 replies
0 kudos

How can I know if databricks auto-detected to use tuneFileSizesForRewrites?

We are having some issues with merge performance, so I read a bit of the documentation and found this section: https://docs.databricks.com/delta/tune-file-size.html#autotune-file-size-based-on-workload "Databricks recommends setting the table p...

Data Engineering

Latest Reply

Anonymous
Not applicable

06-17-2023 3:02:16 AM

0 kudos

Hi @Ron Serruya, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question. Thanks.

by dukebaslangic • New Contributor II

06-15-2023 11:07:24 PM

951 Views
3 replies
3 kudos

Resolved! Databricks performance related documentation/books

Hi, do you know any good resources about Databricks performance improvements (like improving query performance, monitoring/resolving performance bottlenecks, etc.)? Thanks

Data Engineering

Latest Reply

Anonymous
Not applicable

06-17-2023 12:18:26 AM

3 kudos

Hi @Ömer Özsakarya, we haven't heard from you since the last response from @Lakshay Goel, and I was checking back to see if her suggestions helped you. Otherwise, if you have any solution, please share it with the community, as it can be helpful to ...

by Thor • New Contributor III

05-19-2023 3:07:11 AM

2668 Views
0 replies
0 kudos

What is the performance impact of changing dataSkippingNumIndexedCols from 32 to 16?

I have already improved the performance of our ETL a lot (x20!), but I still want to know where else I can improve it. It seems that table stats and column indexing slow down writes a bit, so I want to decrease dataSkippingNumIndexedCols to match t...

Data Engineering

by param3sh • New Contributor

02-09-2023 8:00:25 AM

863 Views
3 replies
0 kudos

Performance b/w Managed Table and Un-Managed table

I am using Databricks in Azure. I want to mount ADLS Gen2 on Databricks and create unmanaged (external) tables on the mount point. But before that, I want to know which will give the best performance: a Managed table (stores data in DBFS root) or Un-ma...

Data Engineering

Latest Reply

Anonymous
Not applicable

04-08-2023 11:58:38 PM

0 kudos

Hi @Paramesh Malla, thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your feedba...

by sedat • New Contributor II

01-30-2023 8:08:52 AM

1033 Views
2 replies
2 kudos

Hi, is there any document for databricks about performance tuning and reporting?

Hi, I need to analyse performance issues in Databricks. Is there any documentation or monitoring tool I can run to see what is happening in Databricks? I am very new to Databricks. Best

Data Engineering

Latest Reply

Nhan_Nguyen
Valued Contributor

01-31-2023 5:14:30 AM

2 kudos

You could try some courses at "https://customer-academy.databricks.com/": What's New In Apache Spark 3.0; Optimizing Apache Spark on Databricks.

by User16783854657 • New Contributor III

06-09-2021 3:12:47 PM

1850 Views
4 replies
6 kudos

How do I know how much of a query/job used Photon?

I'm trying to use the native execution engine, Photon. How can I tell if a query is using Photon or is falling back to the non-native Spark engine?

Data Engineering

Latest Reply

venkat09
New Contributor III

01-21-2023 5:05:52 PM

6 kudos

Typo in the second point of my previous post. Click the execution plan of your task (available under the SQL/DataFrame tab in the Spark UI). It shows which operations ran in the Photon engine and which were not executed by Photon.
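
The same check can be approximated on the plan text outside the UI. The following is a minimal sketch, assuming that the physical-plan text (for example, what `df.explain()` prints) names Photon-executed operators with a `Photon` prefix; the helper `split_photon_operators` and the sample plan are hypothetical and only illustrate the idea, they are not a Databricks API.

```python
import re

def split_photon_operators(plan_text):
    """Split operator names in a physical-plan dump into Photon and non-Photon.

    Assumption: Photon-executed operators carry a "Photon" name prefix.
    """
    photon, other = [], []
    for line in plan_text.splitlines():
        # Skip tree-drawing characters such as "+-", ":-", "*(1)" and spaces,
        # then capture the operator name that follows.
        match = re.match(r"^[\s:+\-|*()\d]*([A-Za-z]\w*)", line)
        if not match:
            continue  # e.g. the "== Physical Plan ==" header line
        name = match.group(1)
        (photon if name.startswith("Photon") else other).append(name)
    return photon, other

# Hypothetical plan text, for illustration only.
SAMPLE_PLAN = """\
== Physical Plan ==
*(1) ColumnarToRow
+- PhotonResultStage
   +- PhotonGroupingAgg(keys=[day#1], functions=[count(1)])
      +- PhotonScan parquet events[day#1]
"""

photon_ops, other_ops = split_photon_operators(SAMPLE_PLAN)
print("Photon:", photon_ops)
print("Non-Photon:", other_ops)
```

Operators that end up in the second list are the ones to look at when a query appears to fall back to the non-native engine.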

by joakon • New Contributor III

12-16-2022 1:10:40 PM

1504 Views
5 replies
1 kudos

Resolved! slow running query

Hi All, I would like to get some ideas on how to improve performance on a data frame with around 10M rows. Source: ADLS Gen2. df1 = source1, format parquet (10M); df2 = source2, format parquet (10M); df = df1 inner join df2; df.count() is ...

Data Engineering

Latest Reply

Aviral-Bhardwaj
Esteemed Contributor III

12-23-2022 8:37:33 PM

1 kudos

Hey @raghu maremanda, did you get an answer? If yes, please update here so that other people can also get the solution.

by vr • Contributor

11-26-2022 4:26:24 PM

3984 Views
12 replies
9 kudos

Why is execution too fast?

I have a table whose full scan takes ~20 minutes on my cluster. The table has a "Time" TIMESTAMP column and a "day" DATE column. The latter is computed (manually) as "Time" truncated to day and is used for partitioning. I query the table using predicate ...

Data Engineering

Latest Reply

Kaniz
Community Manager

11-29-2022 3:06:44 AM

9 kudos

Hi @Vladimir Ryabtsev, we haven’t heard from you since the last response from @Uma Maheswara Rao Desula, and I was checking back to see if their suggestions helped you. Otherwise, if you have any solution, please share it with the community, as it c...

by AP • New Contributor III

07-31-2022 8:20:58 PM

2328 Views
5 replies
3 kudos

Resolved! AutoOptimize, OPTIMIZE command and Vacuum command : Order, production implementation best practices

So Databricks gives us a great toolkit in the form of optimization and vacuum. But in terms of operationalizing them, I am really confused about the best practice. Should we enable "optimized writes" by setting the following at a workspace level? spark.conf.set...

Data Engineering

Latest Reply

Anonymous
Not applicable

08-03-2022 11:09:30 AM

3 kudos

@AKSHAY PALLERLA Just checking in to see if you got a solution to the issue you shared above. Let us know! Thanks to @Werner Stinckens for jumping in, as always!

by Rahul_Samant • Contributor

03-14-2022 3:55:28 AM

6696 Views
5 replies
3 kudos

Resolved! Bucketing on Delta Tables

I am getting the error below while creating buckets on a Delta table. Error in SQL statement: AnalysisException: Delta bucketed tables are not supported. I have had to fall back to a Parquet table for some use cases because of this. Is there any alternative? I have...

Data Engineering

Latest Reply

Anonymous
Not applicable

05-10-2022 5:57:58 AM

3 kudos

Hi @Rahul Samant, we checked internally on this. Due to certain limitations, bucketing is not supported on Delta tables; the only alternative is to leverage Z-ordering. Below is the link for reference: https://docs.databricks.com/de...
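
For readers unfamiliar with the Z-ordering mentioned in this reply, here is a minimal sketch of what the command looks like, assuming the Delta Lake `OPTIMIZE ... ZORDER BY` SQL syntax. The table and column names are hypothetical, and the helper `build_zorder_statement` only builds the statement text; it is an illustration, not part of any library.

```python
def build_zorder_statement(table, columns):
    """Build an OPTIMIZE ... ZORDER BY statement for a Delta table."""
    if not columns:
        raise ValueError("ZORDER BY needs at least one column")
    return f"OPTIMIZE {table} ZORDER BY ({', '.join(columns)})"

# Hypothetical table and column names, for illustration only.
statement = build_zorder_statement("sales.orders", ["customer_id", "order_date"])
print(statement)
# On a Databricks cluster this string would be executed with spark.sql(statement).
```

Choosing the columns that appear most often in join keys or filters is the usual starting point, since Z-ordering co-locates rows with similar values in the same files.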

by govind • New Contributor

07-28-2021 7:49:40 AM

1193 Views
4 replies
0 kudos

Write 160M rows with 300 columns into Delta Table using Databricks?

Hi, I am using databricks to load data from one delta table into another delta table. I'm using SIMBA Spark JDBC connector to pull data from delta table in my source instance and writing into delta table in my databricks instance. The source has...

Data Engineering

Latest Reply

Anonymous
Not applicable

05-02-2022 8:41:34 AM

0 kudos

Hi @govind@dqlabs.ai, just wanted to check in: were you able to resolve your issue, or do you need more help? We'd love to hear from you. Thanks!

by Suresh1 • New Contributor

04-25-2022 11:57:44 AM

614 Views
0 replies
0 kudos

Query failures are seen during the TPC-DS performance benchmark run

When I run the TPC-DS (1TB) benchmark on Photon 10.2, I see the following failures: Queries Q06, Q09 and Q41 fail with the error "Query: AEValueSubQuery is not supported". Q66 fails with the error "[MISSING_COLUMN] org.apache.spark.sql.A...

Data Engineering

by ArindamHalder • New Contributor II

08-17-2021 12:57:33 PM

1146 Views
3 replies
3 kudos

Resolved! Is there any performance result available for DeltaLake?

Specifically for writing and reading streaming data to HDFS, S3, etc. For an IoT-specific scenario, how does it perform on time-series transactional data? Can we consider a Delta table as a time-series table?

Data Engineering

Latest Reply

Kaniz
Community Manager

04-04-2022 1:28:02 AM

3 kudos

Hi @Arindam Halder, how is it going? Were you able to resolve your problem?
