Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

jlanglois98
by New Contributor II
  • 783 Views
  • 2 replies
  • 0 kudos

Bootstrap timeout during cluster start

Hi all, I am getting the following error when I try to start a cluster in our Databricks workspace for East US 2: Bootstrap Timeout: Compute terminated. Reason: Bootstrap Timeout. Please try again later. Instance bootstrap failed c...

Latest Reply
szymon_dybczak
Contributor III
  • 0 kudos

Hi @jlanglois98, take a look at the thread below; similar issue: Solved: Re: Problem with spinning up a cluster on a new wo... - Databricks Community - 29996

1 More Reply
vannipart
by New Contributor III
  • 225 Views
  • 0 replies
  • 0 kudos

Unzip files in Volumes

I have this shell snippet that I use to unzip files:
%sh
sudo apt-get update
sudo apt-get install -y p7zip-full
But when it comes to a new workspace, I get the error: sudo: a terminal is required to read the password; either use the -S option to read from standa...
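A workaround that avoids sudo entirely, since Python's standard library can do the extraction itself: a minimal sketch, assuming plain .zip archives and hypothetical /Volumes paths.

```python
import zipfile

# Hypothetical paths: point these at your own Unity Catalog volume.
src = "/Volumes/my_catalog/my_schema/landing/archive.zip"
dst = "/Volumes/my_catalog/my_schema/extracted/"

# Extracts straight to the volume; no apt-get, p7zip, or sudo needed.
with zipfile.ZipFile(src) as zf:
    zf.extractall(dst)
```

For 7z archives specifically, a notebook-scoped library such as py7zr (installed with %pip install py7zr) also avoids apt-get.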

ajbush
by New Contributor III
  • 16106 Views
  • 8 replies
  • 2 kudos

Connecting to Snowflake using an SSO user from Azure Databricks

Hi all, I'm just reaching out to see if anyone has information or can point me in a useful direction. I need to connect to Snowflake from Azure Databricks using the connector: https://learn.microsoft.com/en-us/azure/databricks/external-data/snowflake T...

Latest Reply
BobGeor_68322
New Contributor II
  • 2 kudos

We ended up using device-flow OAuth because, as noted above, it is not possible to launch a browser on the Databricks cluster from a notebook, so you cannot use the "externalBrowser" flow. It gives you a URL and a code; you open the URL in a new tab an...
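For anyone reconstructing this, a rough sketch of that device-flow approach, assuming an Azure AD app registration plus the msal and snowflake-connector-python packages; every ID, scope, and connection value below is a placeholder:

```python
import msal
import snowflake.connector

# Placeholders: your Azure AD app registration and tenant.
app = msal.PublicClientApplication(
    "your-client-id",
    authority="https://login.microsoftonline.com/your-tenant-id",
)

# Device flow: prints a URL and a code; open the URL in another browser
# tab, enter the code, and complete SSO there (no browser is needed on
# the cluster itself).
flow = app.initiate_device_flow(scopes=["your-snowflake-oauth-scope"])
print(flow["message"])
result = app.acquire_token_by_device_flow(flow)  # blocks until sign-in

# Hand the access token to the Snowflake connector as an OAuth token.
conn = snowflake.connector.connect(
    account="your_account",
    authenticator="oauth",
    token=result["access_token"],
)
```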

7 More Replies
Steve_Harrison
by New Contributor III
  • 228 Views
  • 0 replies
  • 0 kudos

Invalid Path when getting Notebook Path

The undocumented feature to get a notebook path is great, but it does not actually return a valid path that can be used in Python, e.g.:
from pathlib import Path
print(Path(dbutils.notebook.entry_point.getDbutils().notebook().getContext().notebookPat...
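For context, the path comes back workspace-relative (e.g. /Users/someone@example.com/my_notebook), while on the cluster's filesystem notebooks live under /Workspace. A minimal sketch of the usual fix, assuming a runtime where workspace files are mounted there:

```python
from pathlib import Path

# dbutils is available in Databricks notebooks; notebookPath() returns
# a workspace-relative path wrapped in an Option, hence .get().
nb_path = (
    dbutils.notebook.entry_point.getDbutils()
    .notebook()
    .getContext()
    .notebookPath()
    .get()
)

# Prepend /Workspace to get a path that exists on the driver filesystem.
fs_path = Path("/Workspace") / nb_path.lstrip("/")
print(fs_path)
```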

Thor
by New Contributor III
  • 175 Views
  • 0 replies
  • 0 kudos

Native code in Databricks clusters

Is it possible to install our own binaries (lib or exec) in Databricks clusters and use JNI to execute them? I guess that Photon is native code, as far as I could read, so it must use a similar technique.
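In general yes: binaries placed on the nodes (via an init script or a volume) can be loaded and called. A minimal sketch from the Python side, using ctypes rather than JNI; the library path and its add() function are hypothetical:

```python
import ctypes

# Hypothetical: libmylib.so copied onto each node by an init script,
# exporting a C function: int add(int, int)
lib = ctypes.CDLL("/local_disk0/libmylib.so")
lib.add.argtypes = (ctypes.c_int, ctypes.c_int)
lib.add.restype = ctypes.c_int

print(lib.add(2, 3))  # -> 5
```

From Scala or Java the same idea applies with System.loadLibrary once the .so is on java.library.path.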

guangyi
by Contributor II
  • 274 Views
  • 1 reply
  • 0 kudos

How to identify the mandatory fields of the create clusters API

After several attempts I found some mandatory fields for the cluster creation API: num_workers, spark_version, and node_type_id. I didn't find these fields documented directly against the API, but rather via the job cluster definition in the asset bundle YAML file. I ask the Chat...
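For comparison, a minimal create call that is commonly accepted by the REST API; a sketch only, with placeholder host, token, and node type (valid node types depend on your cloud):

```python
import requests

host = "https://<workspace-host>"    # placeholder
token = "<personal-access-token>"    # placeholder

# Minimal body: spark_version, node_type_id, and either num_workers
# or an autoscale block.
body = {
    "cluster_name": "api-minimal-test",
    "spark_version": "14.3.x-scala2.12",
    "node_type_id": "Standard_D8s_v3",
    "num_workers": 1,
}

resp = requests.post(
    f"{host}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {token}"},
    json=body,
)
print(resp.status_code, resp.json())
```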

Latest Reply
guangyi
Contributor II
  • 0 kudos

I also found that the `defaultValue` in the policy definition is not working. Here is the node_type_id allowlist in the policy: "node_type_id": { "defaultValue": "Standard_D8s_v3", "type": "allowlist", "values": [ ...

alexgavrysh
by New Contributor
  • 152 Views
  • 0 replies
  • 0 kudos

Job scheduled run fail alert

Hello, I have a job that should run every six hours. I need to set up an alert for the case where it doesn't start (for example, someone paused it). How do I configure such an alert using Databricks native alerts? Theoretically, this may be done using s...
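One workable pattern, sketched under assumptions (this polls the Jobs API from outside the job rather than using a built-in Databricks alert; host, token, job_id, and the notification hook are placeholders):

```python
import time
import requests

host = "https://<workspace-host>"    # placeholder
token = "<personal-access-token>"    # placeholder
job_id = 123                         # placeholder

# Fetch the most recent run of the job.
resp = requests.get(
    f"{host}/api/2.1/jobs/runs/list",
    headers={"Authorization": f"Bearer {token}"},
    params={"job_id": job_id, "limit": 1},
)
runs = resp.json().get("runs", [])

six_hours_ms = 6 * 60 * 60 * 1000
now_ms = int(time.time() * 1000)

# Alert if no run started within the expected window.
if not runs or now_ms - runs[0]["start_time"] > six_hours_ms:
    print("ALERT: job has not started in the last 6 hours")
    # replace the print with your email/webhook notification
```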

biafch
by Contributor
  • 1428 Views
  • 2 replies
  • 1 kudos

Resolved! Failure starting repl. Try detaching and re-attaching the notebook

I just started my manual cluster this morning in the production environment to run some code, and it isn't executing, giving me the error "Failure starting repl. Try detaching and re-attaching the notebook.". What can I do to solve this? I have tried...

Latest Reply
biafch
Contributor
  • 1 kudos

Just in case anyone needs to know how to solve this in the future: apparently one of my clusters was suddenly having library compatibility issues, mainly between pandas, numpy, and pyarrow. So I fixed this by forcing specific versions in my global init s...
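For readers reconstructing that fix, a global init script along these lines is the usual shape; the pinned versions below are placeholders, not the ones the poster used:

```bash
#!/bin/bash
# Global init script: pin mutually compatible versions so the driver's
# Python environment is consistent on every cluster start.
/databricks/python/bin/pip install \
    "pandas==1.5.3" "numpy==1.23.5" "pyarrow==8.0.0"
```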

1 More Reply
biafch
by Contributor
  • 436 Views
  • 2 replies
  • 0 kudos

Resolved! Runtime 11.3 LTS not working in my production

Hello, I have a cluster with Runtime 11.3 LTS in my production environment. Whenever I start it up and try to run my notebooks it gives me the error: Failure starting repl. Try detaching and re-attaching the notebook. I have a cluster with the same Runtime in my...

Latest Reply
biafch
Contributor
  • 0 kudos

Just in case anyone needs to know how to solve this in the future: apparently one of my clusters was suddenly having library compatibility issues, mainly between pandas, numpy, and pyarrow. So I fixed this by forcing specific versions in my global init s...

1 More Reply
ImAbhishekTomar
by New Contributor III
  • 386 Views
  • 2 replies
  • 0 kudos

Drop duplicates in 500B records

I'm trying to drop duplicates in a DataFrame with 500B records, deleting based on multiple columns, but the process takes 5 hours. I have tried a lot of the things available on the internet, but nothing works for me. My code is like this: df_1=spark....

Latest Reply
filipniziol
Contributor
  • 0 kudos

Drop the duplicates from df_1 and df_2 first and then do the join. If the join key is just a city code, then most likely you know which rows in df_1 and df_2 will give you the duplicates in df_join. So drop in df_1 and drop in df_2 instead of df_jo...
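In code, that suggestion looks roughly like this; a sketch with made-up column names, since the original snippet is truncated:

```python
# Deduplicate each side on the join key (plus whatever columns define
# uniqueness) *before* joining, so the join never multiplies rows.
df_1_clean = df_1.dropDuplicates(["city_code", "record_id"])
df_2_clean = df_2.dropDuplicates(["city_code"])

df_join = df_1_clean.join(df_2_clean, on="city_code", how="inner")
```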

1 More Reply
dashawn
by New Contributor
  • 2171 Views
  • 3 replies
  • 1 kudos

DLT Pipeline Error Handling

Hello all. We are a new team implementing DLT and have set up a number of tables in a pipeline loading from S3 with UC as the target. I'm noticing that if any of the 20 or so tables fails to load, the entire pipeline fails, even when there are no depende...

Data Engineering
Delta Live Tables
Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

Thank you for sharing this @Retired_mod. @dashawn, were you able to check Kaniz's docs? Do you still need help, or can you accept Kaniz's solution?

2 More Replies
eriodega
by Contributor
  • 429 Views
  • 1 reply
  • 0 kudos

Resolved! Escaping $ (dollar sign) in a regex backreference in notebook (so not seen as a parameter)

I am trying to do a regular expression replace in a Databricks notebook. The following query works fine as a regular query (i.e. not run in a notebook cell):  select regexp_replace('abcd', '^(.+)c(.+)$', '$1_$2') --normally outputs ab_d  H...

Latest Reply
filipniziol
Contributor
  • 0 kudos

Hi, just put a backslash before $ as an escape character: 
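Applied to the query above, the escaped version looks like this in a notebook cell (assuming the backslash is only needed where the notebook would otherwise read $ as a parameter marker):

```sql
select regexp_replace('abcd', '^(.+)c(.+)\$', '\$1_\$2') -- outputs ab_d
```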

geronimo_signol
by New Contributor
  • 294 Views
  • 1 reply
  • 0 kudos

ISSUE: PySpark task exception handling on "Shared Compute" cluster

I am experiencing an issue with a PySpark job that behaves differently depending on the compute environment in Databricks, and this is blocking us from deploying the job into the PROD environment for our planned release. Specifically:
- When running th...

Latest Reply
filipniziol
Contributor
  • 0 kudos

Hi @geronimo_signol, recently another user reported similar behavior on shared clusters, and both issues seem to be related to Spark Connect. To verify whether your cluster is using Spark Connect, please run the following code in your notebook: pri...
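The snippet was cut off above; one plausible version of the check (a guess at the intent, not the poster's exact code) is to inspect the session class, which differs under Spark Connect:

```python
# Under Spark Connect the session class comes from pyspark.sql.connect;
# on classic compute it is the usual pyspark.sql session.
print(type(spark))
# <class 'pyspark.sql.connect.session.SparkSession'>  -> Spark Connect
# <class 'pyspark.sql.session.SparkSession'>          -> classic compute
```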

annetemplon
by New Contributor II
  • 448 Views
  • 3 replies
  • 0 kudos

Explaining the explain plan

Hi all, I am new to Databricks and have recently started exploring Databricks' explain plans to try and understand how queries are executed (and eventually tune them as needed). There are some things that I can somehow "guess" based on what I know ...

Latest Reply
szymon_dybczak
Contributor III
  • 0 kudos

Hi @annetemplon, there are plenty of resources on this topic, but they are scattered all over the internet. I like the videos below; pretty informative:
https://m.youtube.com/watch?v=99fYi2mopbs
https://www.google.com/url?sa=t&source=web&rct=j&opi=89978449&u...
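Alongside those videos, a quick self-contained way to practice reading plans is PySpark's formatted explain output (assuming the notebook's spark session):

```python
from pyspark.sql import functions as F

# Small aggregation whose plan is easy to follow.
df = (
    spark.range(1_000_000)
    .withColumn("bucket", F.col("id") % 10)
    .groupBy("bucket")
    .count()
)

# "formatted" prints numbered operators with per-node details, which is
# easier to read than the default single-block physical plan.
df.explain(mode="formatted")
```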

2 More Replies
Databricks143
by New Contributor III
  • 14047 Views
  • 14 replies
  • 3 kudos

Recursive CTE in Databricks SQL

Hi Team, how do I write a recursive CTE in Databricks SQL? Please let me know if anyone has a solution for this.

Latest Reply
dlehmann
New Contributor III
  • 3 kudos

Hello @filipniziol, I went with your second suggestion, as I preferred to use views in this case. It works very well since there is a limited depth, and I could just write that many unions. Thanks for your response!
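For readers whose hierarchy depth is not bounded up front, an iterative PySpark loop is a common alternative to a fixed stack of unioned views; a sketch with a hypothetical parent/child edge table:

```python
from pyspark.sql import functions as F

# Hypothetical edge table with columns: parent, child.
edges = spark.table("my_catalog.my_schema.edges")

# Seed: direct children of the root node.
frontier = (
    edges.filter(F.col("parent") == "root")
    .select(F.col("child").alias("node"))
)
reachable = frontier

# Add one level per iteration until no new rows appear; the fixpoint is
# what a recursive CTE would compute.
while frontier.count() > 0:
    step = frontier.alias("f").join(
        edges.alias("e"), F.col("f.node") == F.col("e.parent")
    )
    frontier = step.select(F.col("e.child").alias("node")).subtract(reachable)
    reachable = reachable.union(frontier)

reachable.show()
```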

13 More Replies

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group