Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

sriram_kumar
by New Contributor II
  • 2313 Views
  • 4 replies
  • 5 kudos

To do Optimization on the real time delta table

Hi Team, we have a few prod tables created in an S3 bucket that have now grown very large. These tables receive real-time data continuously from round-the-clock Databricks workflows; we would like to run the optimization commands (optimize, zord...

Latest Reply
Anonymous
Not applicable
  • 5 kudos

Hi @Sriram Kumar, we haven't heard from you since the last response from @Suteja Kanuri. Kindly share the information with us, and in return, we will provide you with the necessary solution. Thanks and regards

  • 5 kudos
3 More Replies
JRT5933
by New Contributor III
  • 2361 Views
  • 4 replies
  • 7 kudos

Resolved! GOLD table slowed down at MERGE INTO

Howdy - I recently took a table, FACT_TENDER, and rebuilt it as a medallion-style table to test performance, since I suspected medallion would be quicker. Key differences:
  • Both tables use bronze data
  • Original has all logic in one long notebook
  • MERGE INTO t...

Latest Reply
JRT5933
New Contributor III
  • 7 kudos

I ended up instituting tried-and-true PARTITIONING and PRUNING methods to boost performance, which succeeded.

  • 7 kudos
3 More Replies
hello_world
by New Contributor III
  • 3864 Views
  • 3 replies
  • 6 kudos

Resolved! What exactly is Z Ordering and Bloom Filter?

I have gone through the documentation but still cannot understand it. How is Bloom filter indexing a column different from Z-ordering a column? Can somebody explain what exactly happens when these two techniques are applied?

Latest Reply
Rishabh-Pandey
Esteemed Contributor
  • 6 kudos

Hey @Daniel Sahal, 1 - A Bloom filter index is a space-efficient data structure that enables data skipping on chosen columns, particularly for fields containing arbitrary text. Refer to this code snippet to create a Bloom filter: CREATE BLOOMFILTER INDEX ON [TAB...
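
To make the "space-efficient data structure that enables data skipping" part more concrete, here is a minimal pure-Python sketch of the idea. It is not how Databricks implements its index internally; the `BloomFilter` class, the file names, and the column values are all made up for illustration. The point is only that a Bloom filter can answer "definitely not in this file" cheaply, so a reader can skip that file. Z-ordering is a different mechanism: it rewrites the data so that related values are co-located in the same files.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: a fixed-size bit array probed by several hash functions."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, value: str):
        # Derive num_hashes positions by salting a SHA-256 digest with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits[pos] = 1

    def might_contain(self, value: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(value))


# Hypothetical data files and the values each holds in the indexed column.
files = {
    "part-0001": ["alice@example.com", "bob@example.com"],
    "part-0002": ["carol@example.com", "dave@example.com"],
}

# Build one filter per file, as a per-file index would.
filters = {}
for name, values in files.items():
    bloom = BloomFilter()
    for v in values:
        bloom.add(v)
    filters[name] = bloom


def files_to_scan(value: str) -> list:
    """Return only the files whose filter says the value may be present."""
    return [name for name, bloom in filters.items() if bloom.might_contain(value)]


print(files_to_scan("carol@example.com"))
```

A file that really contains the value is never skipped (no false negatives); a file that does not contain it is occasionally scanned anyway (false positives), which costs time but not correctness.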

  • 6 kudos
2 More Replies
Arun_tsr
by New Contributor III
  • 5992 Views
  • 6 replies
  • 2 kudos

How to do bucketing in Databricks?

We are migrating a job from on-prem to Databricks. We are trying to optimize the jobs but couldn't use bucketing, because by default Databricks stores all tables as Delta tables and it shows an error that bucketing is not supported for Delta. Is there anyw...

Latest Reply
Pat
Honored Contributor III
  • 2 kudos

Hi @Arun Balaji, bucketing is not supported for Delta tables, as you have noticed. For optimization and best practices with Delta tables, check these: https://docs.databricks.com/optimizations/index.html https://docs.databricks.com/delta/best-prac...

  • 2 kudos
5 More Replies
isaac_gritz
by Databricks Employee
  • 7082 Views
  • 4 replies
  • 2 kudos

Performance Tuning Best Practices

Recommendations for performance tuning best practices on Databricks. We also recommend checking out this article from my colleague @Franco Patano on best practices for performance tuning on Databricks. Performance tuning your workloads is an important...

Performance Tuning Framework.png
Latest Reply
isaac_gritz
Databricks Employee
  • 2 kudos

Let us know in the comments if you have any other performance tuning tips & tricks

  • 2 kudos
3 More Replies
wyzer
by Contributor II
  • 2043 Views
  • 2 replies
  • 1 kudos

Resolved! Are we using the advantage of "Map & Reduce" ?

Hello, we are new to Databricks and would like to know whether our working method is good. Currently, we work like this: spark.sql("CREATE TABLE Temp (SELECT avg(***), sum(***) FROM aaa LEFT JOIN bbb WHERE *** >= ***)") With this method, are we us...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

Spark will handle the map/reduce for you. So as long as you use Spark-provided functions, be it in Scala, Python, or SQL (or even R), you will be using distributed processing. You just care about what you want as a result. And afterwards, when you are more...
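
For intuition about what "handling the map/reduce for you" means for an aggregate like `avg` or `sum`, here is a rough pure-Python sketch (no Spark involved). The partition contents and the function names `map_partition` and `merge` are invented for illustration, and the real engine is considerably more involved; the sketch only shows that each partition can compute a partial result on its own and that the partials are then merged.

```python
from functools import reduce

# Hypothetical data, already split into partitions the way rows would be
# spread across executors.
partitions = [[4.0, 8.0], [6.0], [2.0, 10.0, 6.0]]


def map_partition(rows):
    # "Map" side: each partition independently computes a partial (sum, count).
    return (sum(rows), len(rows))


def merge(a, b):
    # "Reduce" side: partial results are combined pairwise.
    return (a[0] + b[0], a[1] + b[1])


partials = [map_partition(p) for p in partitions]
total, count = reduce(merge, partials)
average = total / count

print(total, count, average)
```

Because `avg` is derived from a mergeable (sum, count) pair, no single machine ever needs to see all the rows, which is why a plain SQL aggregate already runs as distributed processing.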

  • 1 kudos
1 More Replies
User16826994223
by Honored Contributor III
  • 3327 Views
  • 2 replies
  • 0 kudos

Resolved! Garbage Collection optimization

I have a case where garbage collection is taking a long time, and I want to optimize it for better performance.

Latest Reply
sean_owen
Databricks Employee
  • 0 kudos

You can also tune the JVM's GC parameters directly, if you mean the pauses are too long. Set "spark.executor.extraJavaOptions", but it does require knowing a thing or two about how to tune for what performance goal.
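
As a rough illustration of what setting "spark.executor.extraJavaOptions" might look like, the sketch below assembles a value for that property in plain Python. The helper `build_gc_conf` is hypothetical, and the G1 flags and the 200 ms pause target are example values only, not a recommendation for any particular workload. Executor JVM options take effect when the executor starts, so the resulting key/value pair would normally go into the cluster's Spark configuration rather than being set from a running notebook.

```python
def build_gc_conf(flags):
    """Assemble a Spark conf entry for executor JVM options (hypothetical helper)."""
    for flag in flags:
        # Spark does not allow the max heap size to be set through this property;
        # heap is sized with spark.executor.memory instead.
        if flag.startswith("-Xmx"):
            raise ValueError("set heap size via spark.executor.memory, not -Xmx")
    return {"spark.executor.extraJavaOptions": " ".join(flags)}


# Example values only: use the G1 collector, aim for shorter pauses,
# and log GC activity so the effect can be checked.
gc_flags = ["-XX:+UseG1GC", "-XX:MaxGCPauseMillis=200", "-verbose:gc"]

conf = build_gc_conf(gc_flags)
for key, value in conf.items():
    print(key, value)
```

Which flags actually help depends on the goal (shorter pauses versus higher throughput), so it is worth checking the GC logs before and after a change.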

  • 0 kudos
1 More Replies