Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

wesg2
by New Contributor
  • 1413 Views
  • 1 reply
  • 0 kudos

Programmatically create Databricks Notebook

I am creating a Databricks notebook via string concatenation (sample below): Notebook_Head = """# Databricks notebook source # from pyspark.sql.types import StringType # from pyspark.sql.functions import split # COMMAND ----------""" Full_NB = Notebook_Head + Mi...

Latest Reply
filipniziol
Esteemed Contributor
  • 0 kudos

Hi @wesg2, one needs to be very precise when building this. The code below works: # Define the content of the .py file with cell separators (works!) notebook_content = """# Databricks notebook source # This is the header of the notebook # You can add i...

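The approach discussed in this thread can be sketched as follows. This is a minimal illustration, not the poster's actual code: the workspace path is hypothetical, and the final API call (the Workspace Import endpoint, which expects base64-encoded source) is shown as a comment.

```python
import base64

# The cell separator must be exactly "# COMMAND ----------" and the source
# must begin with "# Databricks notebook source" to be imported as a notebook.
HEADER = "# Databricks notebook source"
SEPARATOR = "\n\n# COMMAND ----------\n\n"

def build_notebook(cells):
    """Join code cells into Databricks notebook source format."""
    return HEADER + "\n" + SEPARATOR.join(cells)

source = build_notebook([
    "from pyspark.sql.types import StringType",
    "df = spark.range(10)  # placeholder cell",
])

# The Workspace Import API takes the source base64-encoded:
payload = {
    "path": "/Users/someone@example.com/generated_nb",  # hypothetical path
    "format": "SOURCE",
    "language": "PYTHON",
    "content": base64.b64encode(source.encode("utf-8")).decode("ascii"),
    "overwrite": True,
}
# requests.post(f"{host}/api/2.0/workspace/import", headers=auth, json=payload)
```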
DBUser2
by New Contributor III
  • 2378 Views
  • 2 replies
  • 0 kudos

How to use transaction when connecting to Databricks using Simba ODBC driver

I'm connecting to a Databricks instance using the Simba ODBC driver (version 2.8.0.1002), and I am able to read and write the Delta tables. But if I want to do INSERT/UPDATE/DELETE operations within a transaction, I get the below error, an...

Latest Reply
florence023
New Contributor III
  • 0 kudos

@DBUser2 wrote: I'm connecting to a Databricks instance using the Simba ODBC driver (version 2.8.0.1002), and I am able to read and write the Delta tables. But if I want to do INSERT/UPDATE/DELETE operations within a transaction, I get the ...

1 More Replies
FabriceDeseyn
by Contributor
  • 12777 Views
  • 6 replies
  • 6 kudos

Resolved! What does autoloader's cloudfiles.backfillInterval do?

I'm using Auto Loader directory listing mode (without incremental file listing) and sometimes new files are not picked up and found in the cloud_files-listing. I have found that using the 'cloudFiles.backfillInterval' option can resolve the detection ...

Latest Reply
822025
New Contributor II
  • 6 kudos

If we set the backfill to 1 week, will it run only once a week, or will it look for old unprocessed files on every trigger? For example, if we set it to 1 day and the job runs every hour, will it look for files in the past 24 hours on a sliding ...

5 More Replies
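For reference, a configuration sketch of the option under discussion. Auto Loader only runs on Databricks, so this just assembles the options; per the cloudFiles documentation, backfillInterval schedules asynchronous backfills at the configured interval (it re-lists the directory to catch missed files at that cadence, not on every micro-batch). The format, paths, and interval value here are illustrative assumptions.

```python
# Option names follow the documented cloudFiles.* configuration keys.
autoloader_opts = {
    "cloudFiles.format": "json",
    "cloudFiles.backfillInterval": "1 day",          # backfill cadence
    "cloudFiles.schemaLocation": "/tmp/schema",      # hypothetical path
}

# Usage on Databricks (sketch):
# df = (spark.readStream.format("cloudFiles")
#       .options(**autoloader_opts)
#       .load("s3://bucket/path"))
```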
jlanglois98
by New Contributor II
  • 4065 Views
  • 2 replies
  • 0 kudos

Bootstrap timeout during cluster start

Hi all, I am getting the following error when I try to start a cluster in our Databricks workspace for East US 2: Bootstrap Timeout: Compute terminated. Reason: Bootstrap Timeout. Please try again later. Instance bootstrap failed c...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @jlanglois98, take a look at the thread below; it's a similar issue: Solved: Re: Problem with spinning up a cluster on a new wo... - Databricks Community - 29996

1 More Replies
ajbush
by New Contributor III
  • 29067 Views
  • 8 replies
  • 3 kudos

Connecting to Snowflake using an SSO user from Azure Databricks

Hi all, I'm just reaching out to see if anyone has information or can point me in a useful direction. I need to connect to Snowflake from Azure Databricks using the connector: https://learn.microsoft.com/en-us/azure/databricks/external-data/snowflakeT...

Latest Reply
BobGeor_68322
New Contributor III
  • 3 kudos

We ended up using device-flow OAuth because, as noted above, it is not possible to launch a browser on the Databricks cluster from a notebook, so you cannot use the "externalBrowser" flow. It gives you a URL and a code, and you open the URL in a new tab an...

7 More Replies
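A sketch of the device-flow approach the reply describes, using MSAL to obtain an Azure AD token without a local browser. The client ID, tenant ID, scope, and the Spark Snowflake connector option names in the comment are assumptions for illustration, not the replier's actual configuration.

```python
def get_token_via_device_flow(client_id, tenant_id, scope):
    """Obtain an access token via the OAuth device-code flow."""
    import msal  # third-party: pip install msal

    app = msal.PublicClientApplication(
        client_id,
        authority=f"https://login.microsoftonline.com/{tenant_id}",
    )
    flow = app.initiate_device_flow(scopes=[scope])
    # Prints the URL and one-time code to enter in a browser tab elsewhere.
    print(flow["message"])
    result = app.acquire_token_by_device_flow(flow)  # blocks until completed
    return result["access_token"]

# The token could then be passed to the Spark Snowflake connector (sketch):
# options = {"sfUrl": ..., "sfUser": ..., "sfAuthenticator": "oauth",
#            "sfToken": token, "sfDatabase": ..., "sfWarehouse": ...}
```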
biafch
by Contributor
  • 13214 Views
  • 2 replies
  • 4 kudos

Resolved! Failure starting repl. Try detaching and re-attaching the notebook

I just started my manual cluster this morning in the production environment to run some code, and it isn't executing, giving me the error "Failure starting repl. Try detaching and re-attaching the notebook." What can I do to solve this? I have tried...

Latest Reply
biafch
Contributor
  • 4 kudos

Just in case anyone needs to know how to solve this in the future: apparently one of my clusters was suddenly having library compatibility issues, mainly between pandas, numpy, and pyarrow. So I fixed this by forcing specific versions in my global init s...

1 More Replies
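For future readers, a hypothetical shape for such a global init script. The pinned versions below are placeholders, not the poster's actual values; align them with the compatibility matrix of your Databricks Runtime before using anything like this.

```shell
#!/bin/bash
# Hypothetical global init script pinning the libraries named above.
# Versions are assumptions; match them to your runtime.
set -euo pipefail

/databricks/python/bin/pip install --force-reinstall \
    "pandas==1.5.3" \
    "numpy==1.23.5" \
    "pyarrow==8.0.0"
```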
biafch
by Contributor
  • 1891 Views
  • 2 replies
  • 0 kudos

Resolved! Runtime 11.3 LTS not working in my production

Hello, I have a cluster with Runtime 11.3 LTS in my production. Whenever I start it up and try to run my notebooks, it gives me the error: "Failure starting repl. Try detaching and re-attaching the notebook." I have a cluster with the same Runtime in my...

Latest Reply
biafch
Contributor
  • 0 kudos

Just in case anyone needs to know how to solve this in the future: apparently one of my clusters was suddenly having library compatibility issues, mainly between pandas, numpy, and pyarrow. So I fixed this by forcing specific versions in my global init s...

1 More Replies
ImAbhishekTomar
by New Contributor III
  • 1086 Views
  • 2 replies
  • 0 kudos

Drop duplicates in 500B records

I'm trying to drop duplicates in a DF with 500B records, deleting based on multiple columns, but this process takes 5h. I've tried a lot of things available on the internet but nothing works for me. My code is like this: df_1=spark....

Latest Reply
filipniziol
Esteemed Contributor
  • 0 kudos

Drop the duplicates from df_1 and df_2 first and then do the join. If the join is just on a city code, then most likely you know which rows in df_1 and df_2 will give you the duplicates in df_join. So drop them in df_1 and in df_2 instead of df_jo...

1 More Replies
dashawn
by New Contributor
  • 5518 Views
  • 3 replies
  • 1 kudos

DLT Pipeline Error Handling

Hello all. We are a new team implementing DLT and have set up a number of tables in a pipeline loading from S3 with UC as the target. I'm noticing that if any of the 20 or so tables fails to load, the entire pipeline fails even when there are no depende...

Data Engineering
Delta Live Tables
Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

Thank you for sharing this, @Retired_mod. @dashawn, were you able to check Kaniz's docs? Do you still need help, or can you accept Kaniz's solution?

2 More Replies
eriodega
by Contributor
  • 3179 Views
  • 1 reply
  • 0 kudos

Resolved! Escaping $ (dollar sign) in a regex backreference in notebook (so not seen as a parameter)

I am trying to do a regular expression replace in a Databricks notebook. The following query works fine as a regular query (i.e. not run in a cell in a notebook): select regexp_replace('abcd', '^(.+)c(.+)$', '$1_$2') -- normally outputs ab_d. H...

Latest Reply
filipniziol
Esteemed Contributor
  • 0 kudos

Hi, just put a backslash before $ as an escape character: 

geronimo_signol
by New Contributor
  • 831 Views
  • 1 replies
  • 0 kudos

ISSUE: PySpark task exception handling on "Shared Compute" cluster

I am experiencing an issue with a PySpark job that behaves differently depending on the compute environment in Databricks, and this is blocking us from deploying the job into the PROD environment for our planned release. Specifically: - When running th...

Latest Reply
filipniziol
Esteemed Contributor
  • 0 kudos

Hi @geronimo_signol, recently another user reported similar behavior on shared clusters, and both issues seem to be related to Spark Connect. To verify whether your cluster is using Spark Connect, please run the following code in your notebook: pri...

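One way to perform the check the reply hints at (an assumption based on its truncated snippet): a Spark Connect session's class lives under the pyspark.sql.connect package rather than the classic session module, so inspecting the module name of the session object is enough.

```python
def is_spark_connect(spark):
    """Return True if the given session object is a Spark Connect session."""
    return "connect" in type(spark).__module__

# In a notebook:
# print(type(spark))  # e.g. pyspark.sql.connect.session.SparkSession
# print(is_spark_connect(spark))
```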
annetemplon
by New Contributor II
  • 1657 Views
  • 3 replies
  • 0 kudos

Explaining the explain plan

Hi all, I am new to Databricks and have recently started exploring Databricks' explain plans to try to understand how queries are executed (and eventually tune them as needed). There are some things that I can somehow "guess" based on what I know ...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @annetemplon, there are plenty of resources about this topic, but they are scattered all over the internet. I like the videos below; pretty informative: https://m.youtube.com/watch?v=99fYi2mopbs https://www.google.com/url?sa=t&source=web&rct=j&opi=89978449&u...

2 More Replies
s3
by New Contributor II
  • 16773 Views
  • 4 replies
  • 8 kudos

Resolved! notebook for SFTP server connectivity without password.

I am trying to develop a script in Python to access an SFTP server from a notebook without a password, using valid public/private keys. However, I am not finding any such example; all the examples have a password in them. Can I get some help?

Latest Reply
Atanu
Databricks Employee
  • 8 kudos

This example looks good to me: https://stackoverflow.com/questions/58562744/how-to-upload-text-file-to-ftp-from-databricks-notebook. Or maybe try using the CData libs: https://www.cdata.com/kb/tech/sftp-jdbc-azure-databricks.rst

3 More Replies
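A minimal sketch of password-less SFTP with key-based auth, using paramiko (one common choice; the links in this thread show other routes too). The host, user, and key path are hypothetical placeholders.

```python
def upload_via_sftp(host, username, key_path, local_path, remote_path):
    """Upload a file over SFTP, authenticating with a private key only."""
    import paramiko  # third-party: pip install paramiko

    key = paramiko.RSAKey.from_private_key_file(key_path)
    transport = paramiko.Transport((host, 22))
    try:
        transport.connect(username=username, pkey=key)
        sftp = paramiko.SFTPClient.from_transport(transport)
        sftp.put(local_path, remote_path)
        sftp.close()
    finally:
        transport.close()

# Usage (hypothetical values):
# upload_via_sftp("sftp.example.com", "svc_user",
#                 "/dbfs/keys/id_rsa", "/tmp/out.csv", "/inbox/out.csv")
```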
maafsl
by New Contributor II
  • 1530 Views
  • 1 replies
  • 2 kudos

Vulnerability in the Guava dependency of the Databricks jdbc driver

Good afternoon. I want to report that the JDBC driver incorporates a version of com.google.guava:guava that has two vulnerabilities (image attached). Could the dependency be updated?

Latest Reply
maafsl
New Contributor II
  • 2 kudos

Dnirmania
by Contributor
  • 2457 Views
  • 2 replies
  • 1 kudos

Resolved! Dynamic Python UDF in unity catalog

Hi team, I am trying to create a Python UDF which I want to use for column masking. This function will take 2 input parameters (column name and group names) and return the column value if the user is part of the group, otherwise return a masked value. I wrote fol...

Latest Reply
menotron
Valued Contributor
  • 1 kudos

Hi @Dnirmania, you could achieve something similar using this UDF: %sql CREATE OR REPLACE FUNCTION ryanlakehouse.default.column_masking(column_value STRING, groups_str STRING) RETURNS STRING LANGUAGE SQL COMMENT 'Return the column value if use...

1 More Replies
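A hedged sketch of how such a SQL masking function could be completed, assuming the group list is comma-separated. The body after RETURN is an assumption, not the replier's actual code; is_account_group_member is the Databricks built-in, and exists() is the Spark SQL higher-order array function.

```sql
-- Sketch only: catalog/schema and signature follow the reply; body is assumed.
CREATE OR REPLACE FUNCTION ryanlakehouse.default.column_masking(
  column_value STRING,
  groups_str   STRING
)
RETURNS STRING
LANGUAGE SQL
RETURN IF(
  -- true if the current user belongs to any of the comma-separated groups
  exists(split(groups_str, ','), g -> is_account_group_member(trim(g))),
  column_value,
  '****'
);
```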
