Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by AyushModi038 (New Contributor III)
  • 3139 Views
  • 4 replies
  • 3 kudos

Library installation in cluster taking a long time

I am trying to install the "pycaret" library in a cluster using a whl file. But it sometimes creates dependency conflicts (not always; sometimes it works too). My questions are: 1 - How to install libraries in a cluster only a single time (maybe from ...

Latest Reply
Anonymous (Not applicable)

Hi @Ayush Modi, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers yo...

3 More Replies
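One common answer to the "install only once" part, sketched below with placeholder values: attach the wheel as a cluster-scoped library through the Libraries REST API, so every notebook on the cluster sees it without per-notebook installs. The workspace URL, token, cluster ID, and DBFS wheel path are assumptions, not values from the thread.

import requests

host = "https://<workspace-url>"        # placeholder workspace URL
token = "<personal-access-token>"       # placeholder PAT
resp = requests.post(
    f"{host}/api/2.0/libraries/install",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "cluster_id": "<cluster-id>",   # placeholder cluster ID
        "libraries": [
            # wheel previously uploaded to DBFS; path is an assumption
            {"whl": "dbfs:/FileStore/wheels/pycaret-3.0.0-py3-none-any.whl"}
        ],
    },
)
resp.raise_for_status()                 # library is now attached cluster-wide

Pinning one exact wheel version this way also narrows the intermittent dependency conflicts, since the resolver sees the same inputs on every cluster start.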
by suresh1122 (New Contributor III)
  • 8862 Views
  • 11 replies
  • 7 kudos

Dataframe takes an unusually long time to save as a Delta table using SQL for a very small dataset with 30k rows. It takes around 2 hrs. Is there a solution for this problem?

I am trying to save a dataframe to a Delta table after a series of data manipulations using UDF functions. I tried using this code: df.write.format('delta').mode('overwrite').option('overwriteSchema', 'true').saveAsTable('output_table'), but this...

Latest Reply
Lakshay (Esteemed Contributor)

You should also look into the SQL plan to check whether the writing phase is indeed the part that is taking time. Since Spark works on lazy evaluation, some other phase might be the one that is actually slow.

10 More Replies
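A minimal diagnostic sketch along the lines of this reply, assuming a PySpark DataFrame named df: force the UDF-heavy transformations to evaluate before the write, so you can tell which phase is actually slow.

# Inspect the physical plan; Python UDFs show up as BatchEvalPython stages
df.explain(mode="formatted")

# Materialize the transformations; if this count is slow, the UDFs, not the
# Delta write, are the bottleneck
df = df.cache()
df.count()

(df.write
   .format("delta")
   .mode("overwrite")
   .option("overwriteSchema", "true")
   .saveAsTable("output_table"))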
by ChrisS (New Contributor III)
  • 10285 Views
  • 2 replies
  • 2 kudos

Resolved! Am I being charged for Starter Warehouse Pro?

When I go to add data, I see that the Starter Warehouse Pro cluster spun up after the first use and has been there for a long time. It does not show in my clusters and I can't find a way to shut it down. Am I being charged for this? If so, how do I s...

Latest Reply
ChrisS (New Contributor III)

Thank you

1 More Reply
by Anonymous (Not applicable)
  • 570 Views
  • 0 replies
  • 0 kudos

As companies grow and evolve, a Chief Technology Officer (CTO) becomes crucial in shaping the organization's technical direction and driving innov...

As companies grow and evolve, a Chief Technology Officer (CTO) becomes crucial in shaping the organization's technical direction and driving innovation. Regarding filling this critical leadership position, companies decide to either promote an existi...

by ghofigjong (New Contributor)
  • 3485 Views
  • 2 replies
  • 1 kudos

Resolved! How does partition pruning work on a merge into statement?

I have a Delta table that is partitioned by Year, Date, and Month. I'm trying to merge data into it on all three partition columns plus an extra column (an ID). My merge statement is below: MERGE INTO delta.<path of delta table> oldData USING df newData ...

Latest Reply
Umesh_S (New Contributor II)

Isn't the suggested idea only filtering the input dataframe (resulting in a smaller amount of data to match across the whole Delta table) rather than pruning the Delta table so that only the relevant partitions are scanned?

1 More Reply
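For reference, partition pruning in a MERGE generally requires literal predicates on the partition columns in the merge condition; column-to-column equality alone does not tell Delta which partitions it can skip. A hedged sketch with illustrative table, path, and column names (none taken from the thread):

df.createOrReplaceTempView("updates")   # the incoming dataframe from the post

spark.sql("""
  MERGE INTO delta.`/mnt/events` AS oldData        -- illustrative table path
  USING updates AS newData
    ON  oldData.Year  = newData.Year
    AND oldData.Month = newData.Month
    AND oldData.Date  = newData.Date
    AND oldData.Id    = newData.Id
    AND oldData.Year  = 2023                       -- literal on a partition column
    AND oldData.Month = 6                          -- lets Delta prune partitions
  WHEN MATCHED THEN UPDATE SET *
  WHEN NOT MATCHED THEN INSERT *
""")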
by sid_de (New Contributor II)
  • 2335 Views
  • 3 replies
  • 2 kudos

404 Not Found [IP: 185.125.190.36 80] on trying to install google-chrome in databricks spark driver

We are installing google-chrome-stable in a Databricks cluster using apt-get install. This has been working fine for a long time, but over the past few days it has started to fail intermittently. The following is the code that we run: %sh sudo curl -s...

Latest Reply
sid_de (New Contributor II)

Hi, the issue is still persistent. We are trying to solve it by using a Docker image with a preinstalled Selenium driver and Chrome browser. Regards, Dharmin

2 More Replies
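A hedged sketch of one common mitigation, assuming the Google Chrome apt repository has already been configured on the driver (as the thread's curl step suggests): intermittent 404s from apt mirrors are often caused by stale package indexes, so refreshing them immediately before the install can help.

%sh
# Refresh package indexes first; stale lists are a frequent cause of 404s
sudo apt-get update -y
# Retry the install against the freshly fetched index
sudo apt-get install -y google-chrome-stable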
by Anonymous (Not applicable)
  • 9413 Views
  • 3 replies
  • 1 kudos

Cluster in Pending State for long time

Pending for a long time at this stage “Finding instances for new nodes, acquiring more instances if necessary”. How can this be fixed?

Latest Reply
Databricks_Buil (New Contributor III)

Figured out after multiple discussions that this is typically a cloud provider issue. You can file a support ticket if the issue persists.

2 More Replies
by Anonymous (Not applicable)
  • 5889 Views
  • 9 replies
  • 7 kudos

Resolved! Dataframe takes an unusually long time to write for small datasets

We have configured the workspace with our own VPC. We need to extract data from DB2 and write it in Delta format. We tried this for 550k records with 230 columns; it took 50 minutes to complete the task. 15M records take more than 18 hrs. Not sure why this takes suc...

Latest Reply
elgeo (Valued Contributor II)

Hello. We face exactly the same issue. Reading is quick but writing takes a long time. Just to clarify, it is about a table with only 700k rows. Any suggestions please? Thank you. remote_table = spark.read.format("jdbc") \ .option("driver", "com...

8 More Replies
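A minimal sketch of one likely cause, with placeholder connection details: without partitioning options, a JDBC read runs as a single task, and because Spark is lazy, the slow single-threaded read only surfaces when the write materializes it. Splitting the read across tasks often fixes the apparent "slow write". The table name, partition column, and bounds below are assumptions.

remote_table = (spark.read.format("jdbc")
    .option("driver", "com.ibm.db2.jcc.DB2Driver")
    .option("url", "jdbc:db2://<host>:<port>/<database>")   # placeholder
    .option("user", "<user>").option("password", "<password>")
    .option("dbtable", "SCHEMA.SOURCE_TABLE")               # placeholder
    .option("partitionColumn", "ID")    # numeric, roughly uniform column (assumed)
    .option("lowerBound", "1")
    .option("upperBound", "15000000")
    .option("numPartitions", "16")      # 16 parallel read tasks instead of 1
    .load())

remote_table.write.format("delta").mode("overwrite").saveAsTable("target_table")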
by LukaszJ (Contributor III)
  • 2461 Views
  • 7 replies
  • 1 kudos

Resolved! Long time to run another notebook

Hello, I want to run some notebooks from notebook "A". Regardless of the contents of the called notebook, it runs for a long time (20 seconds). It is a constant value and I do not know why it takes so long. I tried running a simple notebook with one input p...

Latest Reply
LukaszJ (Contributor III)

Okay, I am not able to set the same session for both notebooks (parent and child). So my solution is to use %run ./notebook_name. I put all the code into functions and now I can use them. Example: # Child notebook def do_something(param1, param2): ...

6 More Replies
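A hedged sketch of the pattern from this reply, with assumed notebook names: the child notebook defines only functions, and the parent pulls them in with %run, which executes inline in the parent's existing Spark session and so avoids the per-call startup cost of dbutils.notebook.run().

Child notebook (assumed path ./child_notebook), definitions only:

def do_something(param1, param2):
    # no side effects here; nothing executes at %run time until called
    return param1 + param2

Parent notebook "A" (%run must sit alone in its own cell):

%run ./child_notebook

Then, in a later cell of the parent:

result = do_something(1, 2)   # plain function call; no new notebook session is spun up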