Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
I am trying to install the "pycaret" library in a cluster using a whl file, but it sometimes creates dependency conflicts (not always; sometimes it works too). My questions are: 1 - How to install libraries in a cluster only a single time (maybe from ...
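When a conflict like this appears only sometimes, it helps to capture exactly which dependency versions ended up installed on each run so two runs can be diffed. A minimal pure-Python sketch you could run in a notebook cell (nothing here is Databricks-specific; the "pycaret" filter is just the package from this thread):

```python
# Dump every installed distribution and its version, so cluster runs can be
# compared when a dependency conflict appears intermittently.
from importlib import metadata

def installed_versions() -> dict:
    """Map distribution name -> version for the current Python environment."""
    return {
        d.metadata["Name"]: d.version
        for d in metadata.distributions()
        if d.metadata["Name"]  # skip broken metadata entries
    }

versions = installed_versions()
# Print only the entries relevant to the conflict being investigated.
for name in sorted(versions):
    if "pycaret" in name.lower():
        print(name, versions[name])
```

Saving this output from a "good" run and a "bad" run and diffing the two usually points straight at the package whose version drifted.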
@Retired_mod What about question #1, which is what subsequent comments in this thread have been referring to? To recap the question: is it possible for "cluster-installed" libraries to be cached in such a way that they aren't completely reinstalled ev...
I have a delta table that is partitioned by Year, Date and Month. I'm trying to merge data into it on all three partition columns plus an extra column (an ID). My merge statement is below:
MERGE INTO delta.<path of delta table> oldData
USING df newData ...
Isn't the suggested idea only filtering the input dataframe (resulting in a smaller amount of data to match across the whole delta table), rather than pruning the delta table so that only the relevant partitions are scanned?
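One commonly recommended way to get the target side pruned is to put literal predicates on the target's partition columns directly in the MERGE ON clause, since equality between source and target columns alone generally does not let the target scan skip partitions. A minimal sketch that only assembles such a statement as a string; the table path, the `Id` join key, and the partition values are placeholders for this thread, not taken from a real table:

```python
def build_merge_sql(table_path: str, join_key: str, partition_values: dict) -> str:
    """Build a Delta MERGE whose ON clause carries literal partition
    predicates, so the target scan can be pruned to the partitions
    actually present in the incoming batch."""
    # Equality on the partition columns keeps the match correct...
    join_cols = " AND ".join(
        f"oldData.{c} = newData.{c}" for c in partition_values
    )
    # ...while the literal IN-lists are what allow partition pruning.
    literals = " AND ".join(
        f"oldData.{c} IN ({', '.join(str(v) for v in vals)})"
        for c, vals in partition_values.items()
    )
    return (
        f"MERGE INTO delta.`{table_path}` oldData USING newData "
        f"ON {join_cols} AND {literals} "
        f"AND oldData.{join_key} = newData.{join_key} "
        f"WHEN MATCHED THEN UPDATE SET * "
        f"WHEN NOT MATCHED THEN INSERT *"
    )

# Hypothetical values: partitions covered by the current batch.
sql = build_merge_sql(
    "/mnt/delta/mytable",
    "Id",
    {"Year": [2023], "Month": [12], "Date": [25, 26]},
)
```

In a notebook you would collect the distinct partition values from the source dataframe first, then run the generated statement with `spark.sql(sql)`.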
I am trying to save a dataframe, after a series of data manipulations using UDF functions, to a delta table. I tried using this code:
df.write.format('delta').mode('overwrite').option('overwriteSchema', 'true').saveAsTable('output_table')
but this...
You should also look at the SQL plan to check whether the writing phase is really the part that is taking time. Since Spark uses lazy evaluation, some other phase may be the one actually consuming the time.
When I go to add data, I see that the Starter Warehouse Pro cluster spun up after the first use and has been there for a long time. It does not show in my clusters and I can't find a way to shut it down. Am I being charged for this? If so, how do I s...
As companies grow and evolve, a Chief Technology Officer (CTO) becomes crucial in shaping the organization's technical direction and driving innovation. When it comes to filling this critical leadership position, companies decide either to promote an existi...
We are installing google-chrome-stable in a Databricks cluster using apt-get install. This had been working fine for a long time, but over the past few days it has started to fail intermittently. The following is the code that we run:
%sh
sudo curl -s...
Hi, the issue was still persistent. We are trying to solve this by using a Docker image with a preinstalled Selenium driver and Chrome browser.
Regards,
Dharmin
We have configured a workspace with our own VPC. We need to extract data from DB2 and write it in delta format. We tried it for 550k records with 230 columns and it took 50 mins to complete the task. 15mn records take more than 18 hrs. Not sure why this takes suc...
Hello. We face exactly the same issue. Reading is quick but writing takes a long time. Just to clarify, it is about a table with only 700k rows. Any suggestions please? Thank you
remote_table = spark.read.format("jdbc") \
    .option("driver", "com...
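Because the JDBC read is lazy, "reading is quick but writing is slow" usually means the actual pull from DB2 happens during the write, and by default Spark's JDBC source reads through a single partition, so one task fetches every row. Spark's JDBC options `partitionColumn`/`lowerBound`/`upperBound`/`numPartitions` split the read into parallel range queries, and `fetchsize` controls rows per network round trip. A sketch that only assembles those options; the URL, table, column, and bounds are placeholders, not values from this thread:

```python
def jdbc_read_options(url, table, partition_column, lower, upper,
                      num_partitions, fetch_size=10000):
    """Assemble Spark JDBC options for a parallel read: the source table is
    split into num_partitions range queries on partition_column."""
    return {
        "url": url,
        "dbtable": table,
        "partitionColumn": partition_column,  # must be numeric or date-like
        "lowerBound": str(lower),
        "upperBound": str(upper),
        "numPartitions": str(num_partitions),
        "fetchsize": str(fetch_size),          # rows per round trip
    }

# Hypothetical connection details for illustration only.
opts = jdbc_read_options(
    "jdbc:db2://host:50000/MYDB", "SCHEMA.MYTABLE", "ID", 1, 15_000_000, 32
)
# Usage inside Databricks:
# df = spark.read.format("jdbc").options(**opts).load()
```

With a roughly uniform `ID` column, 32 partitions means 32 concurrent range queries instead of one serial scan, which is typically where multi-hour loads of this size come from.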