Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

biafch
by New Contributor III
  • 363 Views
  • 2 replies
  • 0 kudos

Resolved! Failure starting repl. Try detaching and re-attaching the notebook

I just started my manual cluster this morning in the production environment to run some code, but it isn't executing and gives me the error "Failure starting repl. Try detaching and re-attaching the notebook." What can I do to solve this? I have tried...

Latest Reply
biafch
New Contributor III
  • 0 kudos

Just in case anyone needs to know how to solve this in the future: apparently one of my clusters was suddenly having library compatibility issues, mainly between pandas, numpy, and pyarrow. So I fixed this by forcing specific versions in my global init s...

1 More Replies
biafch
by New Contributor III
  • 236 Views
  • 2 replies
  • 0 kudos

Resolved! Runtime 11.3 LTS not working in my production

Hello, I have a cluster with Runtime 11.3 LTS in my production. Whenever I start it up and try to run my notebooks, it gives me the error: "Failure starting repl. Try detaching and re-attaching the notebook." I have a cluster with the same Runtime in my...

Latest Reply
biafch
New Contributor III
  • 0 kudos

Just in case anyone needs to know how to solve this in the future: apparently one of my clusters was suddenly having library compatibility issues, mainly between pandas, numpy, and pyarrow. So I fixed this by forcing specific versions in my global init s...

1 More Replies
johnb1
by Contributor
  • 19482 Views
  • 14 replies
  • 10 kudos

Problems with pandas.read_parquet() and path

I am doing the "Data Engineering with Databricks V2" learning path. I cannot run "DE 4.2 - Providing Options for External Sources", as the first code cell does not run successfully: %run ../Includes/Classroom-Setup-04.2 Screenshot 1: Inside the setup note...

Latest Reply
jonathanchcc
New Contributor III
  • 10 kudos

Thanks for sharing, this helped me too.

13 More Replies
Stephanos
by New Contributor
  • 266 Views
  • 0 replies
  • 0 kudos

Sequencing Job Deployments with Databricks Asset Bundles

Hello Databricks Community! I'm working on a project where I need to deploy jobs in a specific sequence using Databricks Asset Bundles. Some of my jobs (let's call them coordination jobs) depend on other jobs (base jobs) and need to look up their job ...

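A lookup like the one this post describes, resolving a base job's ID by name so a coordination job can reference it, can be sketched as a small helper. The payload shape below (a `jobs` array with `job_id` and `settings.name`, as returned by `GET /api/2.1/jobs/list`) is my understanding of the Databricks Jobs API; treat the field names as assumptions and verify against the API reference.

```python
def resolve_job_id(jobs: list, name: str) -> int:
    """Return the job_id of the job whose settings.name equals `name`.

    `jobs` stands in for the "jobs" array of a Jobs API list response;
    the field names are assumptions about that API's payload shape.
    """
    matches = [j["job_id"] for j in jobs
               if j.get("settings", {}).get("name") == name]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one job named {name!r}, got {len(matches)}")
    return matches[0]

# Toy payload standing in for a real API response:
sample = [
    {"job_id": 101, "settings": {"name": "base_ingest"}},
    {"job_id": 202, "settings": {"name": "base_transform"}},
]
base_id = resolve_job_id(sample, "base_ingest")
```

A coordination job could run a helper like this at deploy time and fail fast (the `ValueError`) if a base job is missing or its name is ambiguous.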
ImAbhishekTomar
by New Contributor III
  • 260 Views
  • 2 replies
  • 0 kudos

drop duplicate in 500B records

I'm trying to drop duplicates in a DF where I have 500B records. I'm trying to delete based on multiple columns, but this process takes 5h. I've tried a lot of things available on the internet, but nothing works for me. My code is like this: df_1=spark....

Latest Reply
filipniziol
New Contributor III
  • 0 kudos

Drop the duplicates from df_1 and df_2 first, and then do the join. If the join key is just a city code, then most likely you know which rows in df_2 and in df_1 will give you the duplicates in df_join. So drop in df_1 and drop in df_2 instead of df_jo...

1 More Replies
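The advice above, deduplicating each side before the join rather than deduplicating the joined result, can be seen in miniature with plain Python lists standing in for DataFrames. In PySpark the equivalent calls would presumably be along the lines of `df_1.dropDuplicates([...])` on each input before `join`; the exact column lists depend on the data.

```python
# Two "tables" with duplicate rows; the first field is the join key (city code).
df_1 = [("NYC", "a"), ("NYC", "a"), ("LON", "b")]
df_2 = [("NYC", 1), ("NYC", 1), ("LON", 2)]

def join(left, right):
    # Naive inner join on the first field: every matching pair is emitted,
    # so duplicates on either side multiply the output.
    return [(k, v, w) for (k, v) in left for (k2, w) in right if k == k2]

naive = join(df_1, df_2)                              # 2 x 2 = 4 NYC rows
deduped = join(sorted(set(df_1)), sorted(set(df_2)))  # dedupe inputs first

# Same final rows either way, but deduplicating first keeps the
# intermediate result small -- which is the whole point at 500B rows.
```

Deduplicating after the join forces the engine to materialize the inflated result first; at this scale that difference dominates the runtime.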
dashawn
by New Contributor
  • 1733 Views
  • 4 replies
  • 1 kudos

DLT Pipeline Error Handling

Hello all. We are a new team implementing DLT and have set up a number of tables in a pipeline loading from S3 with UC as the target. I'm noticing that if any of the 20 or so tables fails to load, the entire pipeline fails even when there are no depende...

Data Engineering
Delta Live Tables
Latest Reply
jose_gonzalez
Moderator
  • 1 kudos

Thank you for sharing this @Kaniz_Fatma. @dashawn, were you able to check Kaniz's docs? Do you still need help, or can you accept Kaniz's solution?

3 More Replies
eriodega
by New Contributor III
  • 202 Views
  • 1 reply
  • 0 kudos

Resolved! Escaping $ (dollar sign) in a regex backreference in notebook (so not seen as a parameter)

I am trying to do a regular expression replace in a Databricks notebook. The following query works fine as a regular query (i.e. not running it in a cell in a notebook): select regexp_replace('abcd', '^(.+)c(.+)$', '$1_$2') --normally outputs ab_d H...

Latest Reply
filipniziol
New Contributor III
  • 0 kudos

Hi, just put a backslash before $ as an escape character: 

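For reference, the replacement itself (capturing the text around the removed character and stitching the groups back together) looks like this in plain Python, where backreferences are written `\1` rather than `$1`. In a Databricks notebook cell, the reply's workaround as I read it is to write the SQL backreferences as `\$1_\$2` so the dollar signs are not parsed as parameters.

```python
import re

# Same pattern as the SQL example: capture the text before and after 'c',
# then join the two groups with an underscore.
result = re.sub(r"^(.+)c(.+)$", r"\1_\2", "abcd")
print(result)  # ab_d
```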
geronimo_signol
by New Contributor
  • 164 Views
  • 1 reply
  • 0 kudos

ISSUE: PySpark task exception handling on "Shared Compute" cluster

I am experiencing an issue with a PySpark job that behaves differently depending on the compute environment in Databricks, and this is blocking us from deploying the job into the PROD environment for our planned release. Specifically: when running th...

Latest Reply
filipniziol
New Contributor III
  • 0 kudos

Hi @geronimo_signol, another user recently reported similar behavior on shared clusters, and both issues seem to be related to Spark Connect. To verify whether your cluster is using Spark Connect, please run the following code in your notebook: pri...

Data_Engineer3
by Contributor III
  • 2192 Views
  • 3 replies
  • 0 kudos

Default maximum spark streaming chunk size in delta files in each batch?

Working with Delta files in Spark Structured Streaming, what is the default maximum chunk size in each batch? How do I identify this type of Spark configuration in Databricks? #DatabricksSQL #SparkStreaming #SparkStructuredStreaming #Spark

Latest Reply
NandiniN
Esteemed Contributor III
  • 0 kudos

Hello @KARTHICK N, the default value for spark.sql.files.maxPartitionBytes is 128 MB. These defaults are in the Apache Spark documentation: https://spark.apache.org/docs/latest/sql-performance-tuning.html (unless there are overrides). To che...

2 More Replies
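To put the 128 MB figure in context: in a notebook, `spark.conf.get("spark.sql.files.maxPartitionBytes")` typically comes back as a byte count string such as `134217728b`. A small stdlib sketch of turning such a value back into megabytes (the suffix handling is an assumption about the format Spark returns):

```python
def partition_bytes_to_mb(value: str) -> float:
    """Convert a Spark size string like '134217728b' or '128m' to MB."""
    value = value.strip().lower()
    units = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3}
    if value and value[-1] in units:
        number, factor = value[:-1], units[value[-1]]
    else:
        number, factor = value, 1
    return int(number) * factor / (1024**2)

# 134217728 bytes is the documented 128 MB default.
default_mb = partition_bytes_to_mb("134217728b")
```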
annetemplon
by New Contributor II
  • 266 Views
  • 3 replies
  • 0 kudos

Explaining the explain plan

Hi All, I am new to Databricks and have recently started exploring Databricks' explain plans to try to understand how queries are executed (and eventually tune them as needed). There are some things that I can somehow "guess" based on what I know ...

Latest Reply
szymon_dybczak
Contributor
  • 0 kudos

Hi @annetemplon, there are plenty of resources on this topic, but they are scattered all over the internet. I like the videos below, pretty informative: https://m.youtube.com/watch?v=99fYi2mopbs https://www.google.com/url?sa=t&source=web&rct=j&opi=89978449&u...

2 More Replies
Databricks143
by New Contributor III
  • 11229 Views
  • 14 replies
  • 3 kudos

Recursive CTE in Databricks SQL

Hi Team, how do I write a recursive CTE in Databricks SQL? Please let me know if anyone has a solution for this.

Latest Reply
dlehmann
New Contributor II
  • 3 kudos

Hello @filipniziol, I went with your second suggestion, as I preferred to use views in this case. It works very well: there is a limited depth, and I could just write that many unions. Thanks for your response!

13 More Replies
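The union-based workaround the reply settles on, which is feasible because the hierarchy has a known, limited depth, can be sketched with the stdlib `sqlite3` module standing in for Databricks SQL (table and column names are made up for the example): one SELECT per level, combined with UNION, instead of a recursive CTE.

```python
import sqlite3

# Toy org chart: 1 manages 2, 2 manages 3, 3 manages 4.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (id INTEGER, mgr INTEGER)")
con.executemany("INSERT INTO emp VALUES (?, ?)",
                [(1, None), (2, 1), (3, 2), (4, 3)])

# Fixed-depth expansion: each SELECT reaches one level deeper below the
# root (id = 1). At a known maximum depth, the UNIONs replace recursion.
sql = """
SELECT id FROM emp WHERE mgr = 1
UNION
SELECT e2.id FROM emp e1 JOIN emp e2 ON e2.mgr = e1.id
WHERE e1.mgr = 1
UNION
SELECT e3.id FROM emp e1 JOIN emp e2 ON e2.mgr = e1.id
                         JOIN emp e3 ON e3.mgr = e2.id
WHERE e1.mgr = 1
"""
rows = sorted(r[0] for r in con.execute(sql))
print(rows)  # [2, 3, 4]
```

The trade-off is that the query must be rewritten if the hierarchy ever grows deeper than the number of unions, which is why it suits a bounded depth, as the reply notes.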
s3
by New Contributor II
  • 10471 Views
  • 5 replies
  • 8 kudos

Resolved! notebook for SFTP server connectivity without password.

I am trying to develop a script in Python to access an SFTP server from a notebook without a password, using valid public/private keys. However, I am not finding any such example; all the examples have a password in them. Can I get some help?

Latest Reply
Kaniz_Fatma
Community Manager
  • 8 kudos

Hi @soumen sarangi, just a friendly follow-up. Do you still need help, or did the above responses help you find the solution? Please let us know.

4 More Replies
dener
by New Contributor
  • 187 Views
  • 0 replies
  • 0 kudos

Infinity load execution

I am experiencing performance issues when loading a table with 50 million rows into Delta Lake on AWS using Databricks. Despite successfully handling other, larger tables, this specific table/process takes hours and doesn't finish. Here's the command...

dcrezee
by New Contributor III
  • 135 Views
  • 0 replies
  • 0 kudos

workflow set maximum queued items

Hi all, I have a question regarding Workflows and queuing of job runs. I'm running into a case where jobs are running longer than expected and result in job runs being queued, which is expected and desired. However, in this particular case we only nee...


Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group