Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Hi Team, We have a few prod tables created in an S3 bucket that have grown very large. These tables receive real-time data continuously from round-the-clock Databricks workflows, and we would like to run the optimization commands (OPTIMIZE, ZORD...
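A minimal sketch of what those maintenance commands might look like from a notebook, assuming a Delta table named events_prod and a frequently filtered column device_id (both names are hypothetical placeholders):

# Databricks notebooks provide a ready-made `spark` SparkSession.
# Compact the small files produced by continuous writes.
spark.sql("OPTIMIZE events_prod")
# Optionally co-locate data on a commonly filtered column while compacting.
spark.sql("OPTIMIZE events_prod ZORDER BY (device_id)")
# Remove files no longer referenced by the table (default retention applies).
spark.sql("VACUUM events_prod")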
Hi @Sriram Kumar, we haven't heard from you since the last response from @Suteja Kanuri. Kindly share the information with us, and in return, we will provide you with the necessary solution. Thanks and Regards
Howdy - I recently took a table FACT_TENDER and rebuilt it as a medallion-style table to test performance, since I suspected medallion would be quicker. Key differences: both tables use bronze data; the original has all logic in one long notebook; MERGE INTO t...
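For readers unfamiliar with the pattern, here is a sketch of the kind of incremental MERGE INTO step a medallion pipeline typically runs; the table and column names (silver.fact_tender, bronze_updates, tender_id) are hypothetical, not taken from the post:

# Upsert the latest bronze rows into the silver table.
spark.sql("""
    MERGE INTO silver.fact_tender AS t
    USING bronze_updates AS s
    ON t.tender_id = s.tender_id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")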
Have gone through the documentation, still cannot understand it. How is Bloom filter indexing a column different from Z-ordering a column? Can somebody explain to me what exactly happens when these two techniques are applied?
Hey @Daniel Sahal,
1. A Bloom filter index is a space-efficient data structure that enables data skipping on chosen columns, particularly for fields containing arbitrary text. Refer to this code snippet to create a Bloom filter index:
CREATE BLOOMFILTER INDEX
ON [TAB...
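To make the truncated snippet concrete, here is a sketch showing both techniques side by side; the table name events and the columns device_id and event_date are hypothetical, and the OPTIONS values are illustrative rather than recommendations:

# Bloom filter index: builds a per-file probabilistic filter, useful for
# point lookups on high-cardinality columns such as arbitrary text IDs.
spark.sql("""
    CREATE BLOOMFILTER INDEX ON TABLE events
    FOR COLUMNS (device_id OPTIONS (fpp = 0.1, numItems = 50000000))
""")
# Z-ordering: physically co-locates related rows so per-file min/max
# statistics can skip files on filters over the chosen column.
spark.sql("OPTIMIZE events ZORDER BY (event_date)")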
We are migrating a job from on-prem to Databricks. We are trying to optimize the jobs but couldn't use bucketing, because by default Databricks stores all tables as Delta tables and it throws an error that bucketing is not supported for Delta. Is there anyw...
Hi @Arun Balaji, bucketing is not supported for Delta tables, as you have noticed. For optimization and best practices with Delta tables, check these:
https://docs.databricks.com/optimizations/index.html
https://docs.databricks.com/delta/best-prac...
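If bucketing itself is a hard requirement, one possible workaround (a sketch, not a recommendation, and names like df, customer_id, sales_bucketed, and sales_delta are hypothetical) is to write that one table in Parquet, where Spark's bucketBy is supported; otherwise, Z-ordering the join or filter columns is the Delta-native substitute:

# bucketBy only works with saveAsTable and a non-Delta format such as Parquet.
(df.write.format("parquet")
   .bucketBy(8, "customer_id")
   .sortBy("customer_id")
   .mode("overwrite")
   .saveAsTable("sales_bucketed"))
# The Delta-native alternative: co-locate data on the join key instead.
spark.sql("OPTIMIZE sales_delta ZORDER BY (customer_id)")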
Recommendations for performance tuning best practices on Databricks: we recommend also checking out this article from my colleague @Franco Patano on best practices for performance tuning on Databricks. Performance tuning your workloads is an important...
Hello, we are new to Databricks and we would like to know if our working method is good. Currently, we are working like this: spark.sql("CREATE TABLE Temp AS SELECT avg(***), sum(***) FROM aaa LEFT JOIN bbb WHERE *** >= ***") With this method, are we us...
Spark will handle the map/reduce for you. So as long as you use Spark-provided functions, be it in Scala, Python, or SQL (or even R), you will be using distributed processing. You just care about what you want as a result. And afterwards, when you are more...
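As an illustration of the same idea with the DataFrame API (the table names aaa/bbb and columns id/amount stand in for the masked values above), the sketch below builds a lazy plan that Spark executes in parallel across the cluster:

from pyspark.sql import functions as F

# Equivalent of the spark.sql() aggregation above; each step is planned
# and executed across the cluster's executors, not on the driver alone.
result = (
    spark.table("aaa")
    .join(spark.table("bbb"), on="id", how="left")
    .where(F.col("amount") >= 100)
    .agg(F.avg("amount").alias("avg_amount"),
         F.sum("amount").alias("sum_amount"))
)
result.show()  # triggers the distributed job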
You can also tune the JVM's GC parameters directly, if you mean the pauses are too long. Set "spark.executor.extraJavaOptions", but it does require knowing a thing or two about which settings match which performance goal.
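A sketch of setting executor GC options at session creation (on Databricks this would normally go in the cluster's Spark config instead); the G1GC flags shown are illustrative assumptions, not tuned recommendations:

from pyspark.sql import SparkSession

# extraJavaOptions must be in place before the executor JVMs launch,
# so set it at cluster/session creation time, not at runtime.
spark = (
    SparkSession.builder
    .config("spark.executor.extraJavaOptions",
            "-XX:+UseG1GC -XX:MaxGCPauseMillis=200 -verbose:gc")
    .getOrCreate()
)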