Topics with Label: Performance Issue

Forum Posts

by sensanjoy • Contributor II

03-29-2023 2:42:45 AM

27830 Views
6 replies
1 kudos

Resolved! Performance issue with PySpark UDF calling REST API

Hi all, I am facing a performance issue with a PySpark UDF that posts data to a REST API (which uses a Cosmos DB backend to store the data). Please find the details below: # The Spark DataFrame (df) contains about 30-40k rows. # I am using pyt...

Data Engineering

Latest Reply

Anonymous
Not applicable

04-03-2023 11:36:56 PM

1 kudos

Hi @Sanjoy Sen, thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your feedback w...

by Indra • New Contributor

01-18-2023 1:52:24 PM

3067 Views
1 reply
0 kudos

Performance issue with Simba ODBC Driver when performing a simple insert command into Delta Lake

Hi, our team is using the Simba ODBC driver to load data into Delta Lake. For a table with 3 columns, it took around 55 seconds to insert 15 records. How can we improve transactional loading into Delta Lake? Is there any option from the Simba ODBC driver t...

Data Engineering

Latest Reply

Anonymous
Not applicable

04-10-2023 8:03:43 AM

0 kudos

@Indra Limena: There are several ways to improve transactional loading into Delta Lake: Use Delta Lake's native Delta JDBC/ODBC connector instead of a third-party ODBC driver like Simba. The native connector is optimized for Delta Lake and supports b...

by Phani1 • Databricks MVP

03-01-2023 9:40:00 PM

4507 Views
3 replies
0 kudos

Performance issue while loading bulk data into PostgreSQL DB from Databricks

We are facing a performance issue while loading bulk data into a PostgreSQL DB from Databricks. We are using Spark JDBC connections to move the data. However, the transfer rate is very low, which is causing a performance bottleneck. Is there any better...

Data Engineering

Latest Reply

User16502773013
Databricks Employee

03-29-2023 7:30:59 PM

0 kudos

Hello @Janga Reddy, @Daniel Sahal and @Vidula Khanna, To enhance performance in general, we need to design for more parallelism. In the Spark JDBC context, this is controlled by the number of partitions of the data to be written. The example here shows how t...
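
A minimal sketch of the partition-based approach described in this reply, assuming a PostgreSQL target. The helper names (`jdbc_write_options`, `write_in_parallel`), the connection details, and the default values are hypothetical placeholders, not taken from the thread. `numPartitions` and `batchsize` are standard Spark JDBC data source options.

```python
def jdbc_write_options(url, table, user, password,
                       num_partitions=8, batch_size=10000):
    """Build the option map for a Spark JDBC write.

    numPartitions caps the number of concurrent JDBC connections;
    batchsize sets how many rows go into each insert batch.
    The defaults here are illustrative, not tuned values.
    """
    return {
        "url": url,
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "org.postgresql.Driver",
        "numPartitions": str(num_partitions),
        "batchsize": str(batch_size),
    }


def write_in_parallel(df, options):
    """Repartition a Spark DataFrame and append it to the JDBC target.

    Each partition is written over its own connection, so the
    partition count determines the write parallelism.
    """
    num_partitions = int(options["numPartitions"])
    (df.repartition(num_partitions)
       .write.format("jdbc")
       .options(**options)
       .mode("append")
       .save())


# Example with hypothetical connection details:
# opts = jdbc_write_options("jdbc:postgresql://host:5432/db",
#                           "public.target_table", "user", "secret",
#                           num_partitions=16)
# write_in_parallel(df, opts)
```

Raising the partition count only helps up to the number of connections and the write throughput the database can sustain, so it is worth increasing gradually while watching the load on the PostgreSQL side.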

by KuldeepChitraka • New Contributor III

01-31-2023 8:08:58 AM

2953 Views
3 replies
6 kudos

Performance Issue: Create DELTA table from 2 TB PARQUET file

We are trying to create a Delta table (CTAS statement) from a 2 TB Parquet file, and it is taking a huge amount of time, around 12 hrs. Is that normal? What are the options to tune/optimize this? Are we doing anything wrong? Cluster: Interactive / 30 cores / 320 GB ...

Data Engineering

Latest Reply

Hubert-Dudek
Databricks MVP

01-31-2023 10:58:05 AM

6 kudos

Please use COPY INTO (first create an empty Delta table) or CONVERT TO DELTA instead of CTAS. It will be much faster, and the process will be auto-optimized.
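
The two alternatives named in this reply could be issued from a notebook roughly as follows. This is only a sketch: the helper names, the path, and the table name are hypothetical, and the target table for COPY INTO is assumed to already exist as an empty Delta table, as the reply suggests.

```python
def convert_to_delta_sql(parquet_path):
    """SQL that converts an existing Parquet directory to Delta in place.

    CONVERT TO DELTA writes a Delta transaction log next to the
    existing files instead of rewriting the data.
    """
    return f"CONVERT TO DELTA parquet.`{parquet_path}`"


def copy_into_sql(target_table, parquet_path):
    """SQL that loads Parquet files into an existing Delta table.

    Assumes target_table was created beforehand as an empty Delta table.
    """
    return (
        f"COPY INTO {target_table}\n"
        f"FROM '{parquet_path}'\n"
        "FILEFORMAT = PARQUET"
    )


# Example with a hypothetical path and table name, run through an
# existing SparkSession named `spark`:
# spark.sql(convert_to_delta_sql("/mnt/raw/events"))
# spark.sql(copy_into_sql("analytics.events", "/mnt/raw/events"))
```

CONVERT TO DELTA leaves the data where it is, while COPY INTO copies it into the target table and skips files it has already loaded on later runs, so the choice depends on whether the original Parquet location should become the Delta table.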
