Topics with Label: Performance Issues

Forum Posts

by Jana • New Contributor III

02-15-2022 9:26:54 AM

10574 Views
9 replies
4 kudos

Resolved! Parsing 5 GB json file is running long on cluster

I was creating a Delta table from a JSON input file in ADLS, but the job ran for a long time while creating the table. Below is my cluster configuration. Is the issue related to the cluster config? Do I need to upgrade the cluster config? The cluster ...

Data Engineering

Latest Reply

-werners-
Esteemed Contributor III

03-01-2022 12:48:29 AM

4 kudos

With multiline = true, the JSON is read as a whole and processed as such. I'd try with a beefier cluster.
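A possible alternative to a larger cluster, offered only as a sketch: because a multiline JSON file is read as a single unit, Spark cannot split it across tasks, whereas JSON Lines (one record per line) can be split by the default reader. The helper below rewrites a file holding one top-level JSON array as JSON Lines. The function name and the array-shaped input are assumptions, not details from the thread, and `json.load` holds the whole document in memory, so a 5 GB input would need a streaming parser instead.

```python
import json


def json_array_to_json_lines(src_path, dst_path):
    """Rewrite a file containing one top-level JSON array as JSON Lines.

    Hypothetical helper: assumes the input is a single JSON array of records.
    Returns the number of records written.
    """
    with open(src_path, "r", encoding="utf-8") as src:
        # Loads the entire document; acceptable for a sketch, not for 5 GB.
        records = json.load(src)

    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")

    with open(dst_path, "w", encoding="utf-8") as dst:
        for record in records:
            # json.dumps emits no raw newlines, so each record stays on one line.
            dst.write(json.dumps(record, ensure_ascii=False) + "\n")

    return len(records)
```

The converted file could then be read without the multiline option, which lets Spark parallelize the parse.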

8 More Replies

by mbaumga • New Contributor III

04-18-2023 3:34:20 AM

8767 Views
3 replies
2 kudos

Performance issues when loading an Excel file from DBFS using R

I have uploaded small Excel files to DBFS. I then use the read_xlsx() function from the "readxl" package in R to import a file into R memory. I use a standard cluster (12.1, non ML). The function works, but it takes ages. E.g. a simple Excel tabl...

Data Engineering

Latest Reply

Anonymous
Not applicable

04-23-2023 9:29:01 PM

2 kudos

Hi @Marcel Baumgartner, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best an...

2 More Replies

by sedat • New Contributor II

01-30-2023 8:08:52 AM

2651 Views
2 replies
2 kudos

Hi, is there any document for Databricks about performance tuning and reporting?

Hi, I need to analyse performance issues in Databricks. Is there any document or monitoring tool I can run to see what is happening in Databricks? I am very new to Databricks. Best

Data Engineering

Latest Reply

Nhan_Nguyen
Valued Contributor

01-31-2023 5:14:30 AM

2 kudos

You could try some courses at "https://customer-academy.databricks.com/": What's New In Apache Spark 3.0; Optimizing Apache Spark on Databricks.

1 More Replies

by data_boy_2022 • New Contributor III

08-19-2022 1:51:44 PM

3774 Views
2 replies
0 kudos

Resolved! Writing transformed DataFrame to a persistent table is unbearably slow

I want to transform a DF with a simple UDF. Afterwards I want to store the resulting DF in a new table (see code below).

key = "test_key"
schema = StructType([
    StructField("***", StringType(), True),
    StructField("yyy", StringType(), True),
    StructF...

Data Engineering

Latest Reply

Vidula
Honored Contributor

09-11-2022 11:48:29 PM

0 kudos

Hello @Jan R, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks!

1 More Replies

by User16826992666 • Valued Contributor

06-16-2021 9:42:52 AM

2449 Views
1 reply
0 kudos

How do I know if the number of files is causing performance issues?

I have read and heard that having too many small files can cause performance problems when reading large data sets. But how do I know if that is an issue I am facing?

Data Engineering

Latest Reply

sajith_appukutt
Honored Contributor II

06-18-2021 1:47:00 PM

0 kudos

Databricks SQL endpoint has a query history section which provides additional information to debug / tune queries. One such metric under execution details is the number of files read. For ETL/Data science workloads, you could use the Spark UI of the ...
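Once a file count is available (from the query history's "files read" metric, or for a Delta table from the `numFiles` and `sizeInBytes` columns of `DESCRIBE DETAIL`), a rough check is to compare the average file size against a target. The sketch below is illustrative only: the function names and both thresholds (32 MB average, 1000 files) are assumptions chosen for the example, not Databricks recommendations.

```python
def average_file_size_mb(num_files, size_in_bytes):
    """Return the mean file size in MiB for a table or directory."""
    if num_files <= 0:
        raise ValueError("num_files must be positive")
    return size_in_bytes / num_files / (1024 * 1024)


def looks_like_small_files(num_files, size_in_bytes,
                           threshold_mb=32.0, min_files=1000):
    """Flag a dataset whose many files are small on average.

    threshold_mb and min_files are hypothetical defaults for illustration;
    tune them to the workload.
    """
    if num_files < min_files:
        # Too few files for per-file overhead to dominate.
        return False
    return average_file_size_mb(num_files, size_in_bytes) < threshold_mb
```

For example, 50,000 files totalling about 49 GiB average roughly 1 MiB each and would be flagged, while 100 files of 128 MiB each would not.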

Databricks Community
