Data Engineering

Forum Posts

by Rinat • New Contributor

05-19-2023 8:29:01 AM

2157 Views
0 replies
0 kudos

How to configure Spark to adjust the number of output partitions after a join or groupBy?

I know you can set "spark.sql.shuffle.partitions" and "spark.sql.adaptive.advisoryPartitionSizeInBytes". The former will not work with adaptive query execution, and the latter only works for the first shuffle for some reason, after which it just uses...
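For readers who want to try the two settings named in the question, here is a minimal PySpark sketch of how they might be set together with the AQE switches they interact with. It does not resolve the behavior the poster describes. The helper names (`AQE_SHUFFLE_SETTINGS`, `apply_settings`, `build_session`) and the app name are hypothetical, and the values are illustrative placeholders rather than tuning advice. As far as the open-source Spark documentation describes it, when AQE partition coalescing is on, `spark.sql.shuffle.partitions` serves as the initial partition count that coalescing starts from; behavior on a specific Databricks runtime may differ.

```python
# Settings named in the post plus the AQE switches they interact with.
# The values are illustrative placeholders, not tuning advice.
AQE_SHUFFLE_SETTINGS = {
    "spark.sql.adaptive.enabled": "true",
    "spark.sql.adaptive.coalescePartitions.enabled": "true",
    "spark.sql.adaptive.advisoryPartitionSizeInBytes": "64m",
    "spark.sql.shuffle.partitions": "200",
}


def apply_settings(builder, settings=AQE_SHUFFLE_SETTINGS):
    """Call builder.config(key, value) for every entry and return the builder."""
    for key, value in settings.items():
        builder = builder.config(key, value)
    return builder


def build_session(app_name="aqe-partition-demo"):
    """Create a SparkSession with the settings above (hypothetical helper)."""
    # Imported lazily so the helpers above can be used without PySpark installed.
    from pyspark.sql import SparkSession

    return apply_settings(SparkSession.builder.appName(app_name)).getOrCreate()
```

On an existing session the same keys can be changed at runtime with `spark.conf.set(key, value)`, which makes it easier to compare partition counts before and after a change.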

by pantelis_mare • Contributor III

04-27-2022 12:41:08 AM

4208 Views
3 replies
0 kudos

Spark 3 AQE and cache

Hello everybody, I recently discovered (the hard way) that when a query plan uses cached data, AQE does not kick in. The result is that you lose the super cool feature of dynamic partition coalescing (no more custom shuffle readers in the DAG). Is ther...

Latest Reply

jose_gonzalez
Databricks Employee

08-15-2022 1:45:53 PM

0 kudos

Hi @Pantelis Maroudis, did you check the physical query plan? Did you check the SQL sub-tab within the Spark UI? It will help you better understand what is happening.

by -werners- • Esteemed Contributor III

12-09-2021 3:27:53 AM

4012 Views
3 replies
14 kudos

Notebook fails in job but not in interactive mode

I have this notebook which is scheduled by Data Factory on a daily basis. It worked fine up to today. All of a sudden I keep getting a NullPointerException when writing the data. After some searching online, I disabled AQE, but this does not help. Th...

Latest Reply

-werners-
Esteemed Contributor III

12-09-2021 7:53:30 AM

14 kudos

After some tests it seems that if I run the notebook on an interactive cluster, I only get 80% load (Ganglia metrics). If I run the same notebook on a job cluster with the same VM types etc. (so the only difference is interactive vs. job), I get over...

by Personal1 • New Contributor II

10-13-2021 4:02:55 PM

5537 Views
2 replies
2 kudos

Resolved! Understanding Partitions in Spark Local Mode

I have a few fundamental questions about Spark 3 while running a simple Spark app on my local Mac (with 6 cores in total). Please help. local[*] runs my Spark application in local mode with all the cores present on my Mac, correct? It also means tha...

Latest Reply

-werners-
Esteemed Contributor III

10-15-2021 1:24:59 AM

2 kudos

That is a lot of questions in one topic. Let's give it a try: [1] this all depends on the values of the relevant parameters and the program you run (think joins, unions, repartition etc.) [2] spark.default.parallelism is by default the number of cores *...

by User16826992666 • Databricks Employee

06-16-2021 8:59:37 PM

1954 Views
1 reply
0 kudos

Resolved! Do I still need to use skew join hints if I have Adaptive Query Execution enabled?

From what I have read about AQE it seems to do a lot of what skew join hints did automatically. So should I still be using skew hints in my queries? Is there harm in using them?

Latest Reply

sajith_appukutt
Databricks Employee

06-17-2021 11:13:31 PM

0 kudos

With AQE, Databricks has the most up-to-date, accurate statistics at the end of a query stage and can opt for a better physical strategy and/or do optimizations that used to require hints. In the case of skew join hints, it is recommended to rely on AQE...

by User16783855117 • Databricks Employee

06-08-2021 3:27:14 PM

1977 Views
0 replies
0 kudos

Is there a way to know if Adaptive Query Execution with Spark 3 has changed my Spark plan?

From the demo notebook located here (https://databricks.com/blog/2020/05/29/adaptive-query-execution-speeding-up-spark-sql-at-runtime.html) it seems like the approach to demonstrate AQE was working was to first calculate the Spark query plan before r...
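One way to approach this, sketched here under assumptions rather than taken from the linked notebook, is to capture the text that `explain()` prints before and after running the query and compare the two. In Spark 3 an adaptive plan is wrapped in an `AdaptiveSparkPlan` node that reports `isFinalPlan=false` before execution and `isFinalPlan=true` afterwards. The helper names below (`explain_text`, `aqe_status`, `plans_before_and_after`) are hypothetical, and the default action `collect()` pulls all rows to the driver, so it only suits small test queries. An action such as `count()` plans a separate aggregate query, so it may not update the plan shown for the original DataFrame.

```python
import contextlib
import io


def explain_text(df):
    """Capture what df.explain() prints to stdout and return it as a string."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        df.explain()
    return buffer.getvalue()


def aqe_status(plan_text):
    """Classify an explain() dump by its AdaptiveSparkPlan marker."""
    if "AdaptiveSparkPlan" not in plan_text:
        return "not adaptive"
    if "isFinalPlan=true" in plan_text:
        return "final"
    return "initial"


def plans_before_and_after(df, action=lambda d: d.collect()):
    """Return the plan text before and after running an action on df.

    The action executes the query; the default collect() brings every row
    to the driver, so pass a cheaper action for large results.
    """
    before = explain_text(df)
    action(df)
    after = explain_text(df)
    return before, after
```

Diffing the two returned strings (for example with `difflib.unified_diff`) shows which operators were rewritten at runtime; the SQL tab in the Spark UI presents the same final plan graphically.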

Databricks Community
