cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

amitca71
by Contributor II
  • 2413 Views
  • 2 replies
  • 1 kudos

performance tool for databricks sql

Hii'm looking for performance test tool.I saw that there was apost about jmeter https://stackoverflow.com/questions/66913893/how-can-i-connect-jmeter-with-databricks-spark-cluster#comment118293766_66915965 , however, the jdbc paraeters are requesting...

  • 2413 Views
  • 2 replies
  • 1 kudos
Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Amit Cahanovich​ Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question. Thanks.

  • 1 kudos
1 More Replies
ron_lusha
by New Contributor
  • 1099 Views
  • 1 replies
  • 0 kudos

How can I know if databricks auto-detected to use tuneFileSizesForRewrites?

We are having some issues with merge performance, so I went and read a bit in the documentation, I found this section:https://docs.databricks.com/delta/tune-file-size.html#autotune-file-size-based-on-workload"Databricks recommends setting the table p...

  • 1099 Views
  • 1 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Ron Serruya​ Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question. Thanks.

  • 0 kudos
dukebaslangic
by New Contributor II
  • 2089 Views
  • 3 replies
  • 3 kudos

Resolved! Databricks performance related documentation/books

Hi,Do you know any good resources about Databricks performance improvements(like improving query performances, monitoring/resolving performance bottlenecks etc)?Thanks

  • 2089 Views
  • 3 replies
  • 3 kudos
Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Ömer Özsakarya​  We haven't heard from you since the last response from @Lakshay Goel​ â€‹, and I was checking back to see if her suggestions helped you.Or else, If you have any solution, please share it with the community, as it can be helpful to ...

  • 3 kudos
2 More Replies
param3sh
by New Contributor
  • 1725 Views
  • 3 replies
  • 0 kudos

Performance b/w Managed Table and Un-Managed table

I am using Databricks in Azure. I want to mount ADLS Gen2 on Databricks and create unmanged (external) tables on the mount point. But before that I want to know which will give best performance, is it Managed table (stores data in DBFS root)or Un-ma...

  • 1725 Views
  • 3 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Paramesh Malla​ Thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs.Please help us select the best solution by clicking on "Select As Best" if it does.Your feedba...

  • 0 kudos
2 More Replies
sedat
by New Contributor II
  • 2012 Views
  • 2 replies
  • 2 kudos

Hi, is there any document for databricks about performance tuning and reporting?

Hi, I need to analyse performance issues for databricks. Is there any document or monitoring tool to run to see what is happening in databricks? I am very new in databricks. Best

  • 2012 Views
  • 2 replies
  • 2 kudos
Latest Reply
Nhan_Nguyen
Valued Contributor
  • 2 kudos

You could try some courses in "https://customer-academy.databricks.com/"What's New In Apache Spark 3.0Optimizing Apache Spark on Databricks

  • 2 kudos
1 More Replies
cgrant
by Databricks Employee
  • 3702 Views
  • 4 replies
  • 6 kudos

How do I know how much of a query/job used Photon?

I'm trying to use the native execution engine, Photon. How can I tell if a query is using Photon or is falling back to the non-native Spark engine?

  • 3702 Views
  • 4 replies
  • 6 kudos
Latest Reply
venkat09
New Contributor III
  • 6 kudos

Typo error in my second point of the previous post. Click the execution plan of your task[this is available under SQL/Dataframe tab in Spark UI]. It explains what operations run in the photon engine and what didn't execute by photon.

  • 6 kudos
3 More Replies
joakon
by New Contributor III
  • 3055 Views
  • 5 replies
  • 1 kudos

Resolved! slow running query

Hi All, I would you to get some ideas on how to improve performance on a data frame with around 10M rows. adls- gen2df1 =source1 , format , parquet ( 10 m)df2 =source2 , format , parquet ( 10 m)df = join df1 and df2 type =inner join df.count() is ...

  • 3055 Views
  • 5 replies
  • 1 kudos
Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 1 kudos

hey @raghu maremanda​ did you get any answer if yes ,please update here, by that other people can also get the solution

  • 1 kudos
4 More Replies
vr
by Contributor
  • 10019 Views
  • 11 replies
  • 9 kudos

Why is execution too fast?

I have a table, full scan of which takes ~20 minutes on my cluster. The table has "Time" TIMESTAMP column and "day" DATE column. The latter is computed (manually) as "Time" truncated to day and used for partitioning.I query the table using predicate ...

stage stats DAG
  • 10019 Views
  • 11 replies
  • 9 kudos
Latest Reply
UmaMahesh1
Honored Contributor III
  • 9 kudos

Hi @Vladimir Ryabtsev​ ,Because you are creating a delta table, I think that you are seeing a performance improvement because of Dynamic Partition pruning, According to the documentation, "Partition pruning can take place at query compilation time wh...

  • 9 kudos
10 More Replies
AP
by New Contributor III
  • 4383 Views
  • 5 replies
  • 3 kudos

Resolved! AutoOptimize, OPTIMIZE command and Vacuum command : Order, production implementation best practices

So databricks gives us great toolkit in the form optimization and vacuum. But, in terms of operationaling them, I am really confused on the best practice.Should we enable "optimized writes" by setting the following at a workspace level?spark.conf.set...

  • 4383 Views
  • 5 replies
  • 3 kudos
Latest Reply
Anonymous
Not applicable
  • 3 kudos

@AKSHAY PALLERLA​ Just checking in to see if you got a solution to the issue you shared above. Let us know!Thanks to @Werner Stinckens​ for jumping in, as always!

  • 3 kudos
4 More Replies
Rahul_Samant
by Contributor
  • 11654 Views
  • 4 replies
  • 4 kudos

Resolved! Bucketing on Delta Tables

getting error as below while creating buckets on delta table.Error in SQL statement: AnalysisException: Delta bucketed tables are not supported.have fall back to parquet table due to this for some use cases. is their any alternative for this. i have...

  • 11654 Views
  • 4 replies
  • 4 kudos
Latest Reply
Anonymous
Not applicable
  • 4 kudos

Hi @Rahul Samant​  , we checked internally on this due to certain limitations bucketing is not supported on delta tables, the only alternative for bucketing is to leverage the z ordering, below is the link for reference https://docs.databricks.com/de...

  • 4 kudos
3 More Replies
govind
by New Contributor
  • 2133 Views
  • 2 replies
  • 0 kudos

Write 160M rows with 300 columns into Delta Table using Databricks?

Hi, I am using databricks to load data from one delta table into another delta table. I'm using SIMBA Spark JDBC connector to pull data from delta table in my source instance and writing into delta table in my databricks instance. The source has...

  • 2133 Views
  • 2 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @govind@dqlabs.ai​ Just wanted to check in if you were able to resolve your issue or do you need more help? We'd love to hear from you.Thanks!

  • 0 kudos
1 More Replies
guruv
by New Contributor III
  • 5718 Views
  • 4 replies
  • 1 kudos

Resolved! Saprk UI not showing any running tasks

HI,I am running a Notebook job calling a JAR code (application code implmented in C#). in the Spark UI page for almost 2 hrs, it'w not showing any tasks and even the CPU usage is below 20%, memory usage is very small. Before this 2 hr window it shows...

  • 5718 Views
  • 4 replies
  • 1 kudos
Latest Reply
Atanu
Databricks Employee
  • 1 kudos

If I understood the issue correctly .

  • 1 kudos
3 More Replies
ArindamHalder
by New Contributor II
  • 2094 Views
  • 1 replies
  • 3 kudos

Resolved! Is there any performance result available for DeltaLake?

Specifically for write and read streaming data to HDFS or s3 etc. For IoT specific scenario how it performs on time series transactional data. Can we consider delta table as time series table?

  • 2094 Views
  • 1 replies
  • 3 kudos
Latest Reply
mathan_pillai
Databricks Employee
  • 3 kudos

Hi @Arindam Halder​ , Delta lake is more performant compared to a regular parquet table. pls check below for some stats on the performancehttps://docs.azuredatabricks.net/_static/notebooks/delta/optimize-python.htmlyes, you can use it for time series...

  • 3 kudos
Labels