Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

jayj_us
by New Contributor
  • 439 Views
  • 1 reply
  • 0 kudos

Intellisense doesn't work most of the time

I have noticed that in the Databricks SQL editor, IntelliSense doesn't work most of the time. Is there a setting to make it work consistently? It's very counterproductive to have to go look for table columns manually.

Latest Reply
florence023
New Contributor III
  • 0 kudos

@jayj_us wrote: I have noticed that in the Databricks SQL editor, IntelliSense doesn't work most of the time. Is there a setting to make it work consistently? It's very counterproductive to have to go look for table columns manually. Hello, I understand...

talenik
by New Contributor III
  • 630 Views
  • 1 reply
  • 0 kudos

Not able to access DBFS in an init script on GCP Databricks

Hi everyone, I am trying to access DBFS files from an init script while the cluster is starting on GCP Databricks, but I am not able to list the files that are on DBFS. I tried to download the files from a GCS bucket as well, but the init script throws timeout errors...

Data Engineering
Databricks
GCP databricks
spark
Latest Reply
jason34
New Contributor II
  • 0 kudos

Hello, to access DBFS files or download from a GCS bucket within a Databricks cluster's init script, consider the following approaches: install Databricks Connect on your local machine, connect to your Databricks cluster using Databricks Connect, use the...
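As a first debugging step (separate from the reply above), here is a minimal notebook-side sketch, with hypothetical paths, for confirming that the files the init script expects actually exist on DBFS and are readable; it does not change the init script itself.

```python
# Hypothetical DBFS location that the init script is expected to read from.
src_dir = "dbfs:/init-data/"

# List what is actually there; a missing or empty directory would explain the failure.
display(dbutils.fs.ls(src_dir))

# Copy one file to the driver's local disk to confirm it is readable outside the init script.
dbutils.fs.cp(src_dir + "config.json", "file:/tmp/config.json")
```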

ggsmith
by New Contributor III
  • 604 Views
  • 2 replies
  • 0 kudos

Resolved! DLT Streaming Schema and Select

I am reading JSON files written to ADLS from Kafka, using DLT and spark.readStream to create a streaming table for my raw ingest data. My schema has two arrays at the top level: a NewRecord array and an OldRecord array. I pass the schema and I run a select on Ne...

Data Engineering
dlt
streaming
Latest Reply
ggsmith
New Contributor III
  • 0 kudos

I did a full refresh of the Delta Live Tables pipeline and that fixed it. I guess it was remembering the first run, where I just had the top-level arrays as two columns in the table.
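For reference, a minimal sketch of what such a DLT streaming table might look like, assuming a hypothetical ADLS path and a simplified two-array schema; after a schema change like the one described, a full refresh is what resets the table the pipeline created on the first run.

```python
import dlt
from pyspark.sql import functions as F
from pyspark.sql.types import ArrayType, StringType, StructField, StructType

# Simplified stand-in for the real payload: two top-level arrays of structs.
record = StructType([
    StructField("id", StringType()),
    StructField("value", StringType()),
])
schema = StructType([
    StructField("NewRecord", ArrayType(record)),
    StructField("OldRecord", ArrayType(record)),
])

@dlt.table(name="raw_ingest")
def raw_ingest():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .schema(schema)
        .load("abfss://container@storageaccount.dfs.core.windows.net/kafka-sink/")  # assumed path
        .select(F.explode("NewRecord").alias("new"))  # flatten the array so each element becomes a row
        .select("new.*")
    )
```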

1 More Reply
wesg2
by New Contributor
  • 411 Views
  • 1 reply
  • 0 kudos

Programmatically create Databricks Notebook

I am creating a Databricks notebook via string concatenation (sample below): Notebook_Head = """# Databricks notebook source # from pyspark.sql.types import StringType # from pyspark.sql.functions import split # COMMAND ----------""" Full_NB = Notebook_Head + Mi...

Latest Reply
filipniziol
Contributor III
  • 0 kudos

Hi @wesg2, one needs to be very precise when building this. The below code works: # Define the content of the .py file with cell separators notebook_content = """# Databricks notebook source # This is the header of the notebook # You can add i...
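A self-contained sketch of the same idea: build the source string with the "# Databricks notebook source" header and "# COMMAND ----------" cell separators, then import it through the Workspace Import REST API. The host, token, and target path below are placeholders.

```python
import base64
import requests

# Placeholder workspace URL, token, and target path.
HOST = "https://<workspace-instance>"
TOKEN = "<personal-access-token>"
TARGET_PATH = "/Users/someone@example.com/generated_notebook"

# The header line and the "# COMMAND ----------" separators are what make the
# imported .py source render as individual notebook cells.
notebook_source = "\n".join([
    "# Databricks notebook source",
    "from pyspark.sql.functions import split",
    "",
    "# COMMAND ----------",
    "",
    "df = spark.range(10)",
    "display(df)",
])

resp = requests.post(
    f"{HOST}/api/2.0/workspace/import",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "path": TARGET_PATH,
        "format": "SOURCE",
        "language": "PYTHON",
        "overwrite": True,
        "content": base64.b64encode(notebook_source.encode("utf-8")).decode("ascii"),
    },
)
resp.raise_for_status()
```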

DBUser2
by New Contributor III
  • 651 Views
  • 2 replies
  • 0 kudos

How to use transactions when connecting to Databricks using the Simba ODBC driver

I'm connecting to a Databricks instance using the Simba ODBC driver (version 2.8.0.1002), and I am able to read and write the Delta tables. But if I want to do some INSERT/UPDATE/DELETE operations within a transaction, I get the below error, an...

Latest Reply
florence023
New Contributor III
  • 0 kudos

@DBUser2 wrote: I'm connecting to a Databricks instance using the Simba ODBC driver (version 2.8.0.1002), and I am able to read and write the Delta tables. But if I want to do some INSERT/UPDATE/DELETE operations within a transaction, I get the ...
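Databricks SQL does not support multi-statement BEGIN/COMMIT transactions over ODBC, but each individual statement against a Delta table is atomic, so an upsert that would otherwise need an INSERT plus an UPDATE inside a transaction can often be collapsed into a single MERGE. A rough pyodbc sketch, with a made-up DSN, table, and parameter values:

```python
import pyodbc

# Assumed DSN configured for the Databricks (Simba) ODBC driver.
conn = pyodbc.connect("DSN=Databricks", autocommit=True)
cur = conn.cursor()

# Each statement against a Delta table is atomic, so an insert-or-update that
# would otherwise need a transaction can be expressed as a single MERGE.
cur.execute(
    """
    MERGE INTO catalog.schema.customers AS t   -- hypothetical table
    USING (SELECT ? AS id, ? AS name) AS s
    ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET t.name = s.name
    WHEN NOT MATCHED THEN INSERT (id, name) VALUES (s.id, s.name)
    """,
    (42, "Acme"),
)

cur.close()
conn.close()
```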

1 More Reply
FabriceDeseyn
by Contributor
  • 7664 Views
  • 6 replies
  • 6 kudos

Resolved! What does autoloader's cloudfiles.backfillInterval do?

I'm using Auto Loader directory listing mode (without incremental file listing) and sometimes new files are not picked up and found in the cloud_files listing. I have found that using the 'cloudFiles.backfillInterval' option can resolve the detection ...

Latest Reply
822025
New Contributor II
  • 6 kudos

If we set the backfill to 1 week, will it run only once a week, or will it look for old files not processed on every trigger? For example, if we set it to 1 day and the job runs every hour, will it look for files from the past 24 hours on a sliding ...
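For context, a minimal Auto Loader sketch showing where the option goes; the paths, file format, and the one-day interval are assumptions, not values from this thread.

```python
# Hypothetical paths and interval; the relevant piece is cloudFiles.backfillInterval.
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.useIncrementalListing", "false")  # plain directory listing mode
    .option("cloudFiles.backfillInterval", "1 day")       # periodically re-list to catch missed files
    .option("cloudFiles.schemaLocation", "/Volumes/main/default/checkpoints/events_schema")
    .load("abfss://landing@storageaccount.dfs.core.windows.net/events/")
)

(
    df.writeStream
    .option("checkpointLocation", "/Volumes/main/default/checkpoints/events")
    .trigger(availableNow=True)
    .toTable("main.default.events_bronze")
)
```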

5 More Replies
jlanglois98
by New Contributor II
  • 1532 Views
  • 2 replies
  • 0 kudos

Bootstrap timeout during cluster start

Hi all, I am getting the following error when I try to start a cluster in our Databricks workspace for East US 2: Bootstrap Timeout: Compute terminated. Reason: Bootstrap Timeout. Please try again later. Instance bootstrap failed c...

Latest Reply
szymon_dybczak
Contributor III
  • 0 kudos

Hi @jlanglois98, take a look at the below thread, which covers a similar issue: Solved: Re: Problem with spinning up a cluster on a new wo... - Databricks Community - 29996

1 More Reply
vannipart
by New Contributor III
  • 303 Views
  • 0 replies
  • 0 kudos

Volumes unzip files

I have this shell snippet that I use to unzip files: %sh sudo apt-get update; sudo apt-get install -y p7zip-full. But when it comes to a new workspace, I get the error: sudo: a terminal is required to read the password; either use the -S option to read from standa...
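If the archives are plain .zip files, one way to sidestep apt-get and sudo entirely is Python's standard library, which can read and write Unity Catalog Volume paths directly; the paths below are assumptions, and 7z archives would still need a tool such as p7zip or the py7zr package.

```python
import zipfile

# Hypothetical Volume paths; Volumes appear as ordinary POSIX paths on the cluster.
src = "/Volumes/main/raw/landing/archive.zip"
dst = "/Volumes/main/raw/extracted/"

with zipfile.ZipFile(src) as zf:
    zf.extractall(dst)
```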

ajbush
by New Contributor III
  • 17073 Views
  • 8 replies
  • 2 kudos

Connecting to Snowflake using an SSO user from Azure Databricks

Hi all, I'm just reaching out to see if anyone has information or can point me in a useful direction. I need to connect to Snowflake from Azure Databricks using the connector: https://learn.microsoft.com/en-us/azure/databricks/external-data/snowflake T...

Latest Reply
BobGeor_68322
New Contributor II
  • 2 kudos

We ended up using device flow OAuth because, as noted above, it is not possible to launch a browser on the Databricks cluster from a notebook, so you cannot use the "externalBrowser" flow. It gives you a URL and a code, and you open the URL in a new tab an...
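A rough sketch of what that device-flow setup can look like with MSAL plus the Snowflake Python connector; the tenant, client ID, scope, account, and user values are placeholders, and the exact OAuth scope depends on how the Snowflake security integration was configured.

```python
import msal
import snowflake.connector

# Placeholder Azure AD and Snowflake values.
TENANT_ID = "<tenant-id>"
CLIENT_ID = "<app-client-id>"
SCOPES = ["<snowflake-oauth-resource>/session:scope:analyst"]

app = msal.PublicClientApplication(
    CLIENT_ID, authority=f"https://login.microsoftonline.com/{TENANT_ID}"
)

# Device code flow: prints a URL and a code, you sign in from any browser,
# and the notebook polls until Azure AD issues the token.
flow = app.initiate_device_flow(scopes=SCOPES)
print(flow["message"])
token = app.acquire_token_by_device_flow(flow)

conn = snowflake.connector.connect(
    account="<account-identifier>",
    user="sso.user@example.com",
    authenticator="oauth",
    token=token["access_token"],
    warehouse="<warehouse>",
)
```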

7 More Replies
Thor
by New Contributor III
  • 227 Views
  • 0 replies
  • 0 kudos

Native code in Databricks clusters

Is it possible to install our own binaries (lib or exec) on Databricks clusters and use JNI to execute them? I guess that Photon is native code, as far as I could read, so it must use a similar technique.

guangyi
by Contributor III
  • 348 Views
  • 1 reply
  • 0 kudos

How to identify the mandatory fields of the create clusters API

After several attempts I found some mandatory fields for the cluster creation API: num_workers, spark_version, node_type_id. I'm not finding these fields directly against the API, but via the job cluster definition in the asset bundle YAML file. I ask the Chat...

Latest Reply
guangyi
Contributor III
  • 0 kudos

I also found the `defaultValue` in the policy definition not working. Here I give the node_type_id allow list in the policy: "node_type_id": { "defaultValue": "Standard_D8s_v3", "type": "allowlist", "values": [ ...
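For what it's worth, a minimal cluster-creation sketch using the Databricks Python SDK with just the fields the post identifies as mandatory; the runtime version, node type, and other values are placeholders.

```python
from databricks.sdk import WorkspaceClient

# Minimal creation call with the fields the post lists as mandatory.
w = WorkspaceClient()  # picks up host/token from env vars or ~/.databrickscfg

cluster = w.clusters.create(
    cluster_name="api-created-cluster",
    spark_version="14.3.x-scala2.12",   # placeholder runtime
    node_type_id="Standard_D8s_v3",     # placeholder node type
    num_workers=1,
    autotermination_minutes=30,
).result()  # waits until the cluster reaches RUNNING

print(cluster.cluster_id)
```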

alexgavrysh
by New Contributor
  • 206 Views
  • 0 replies
  • 0 kudos

Job scheduled run fail alert

Hello, I have a job that should run every six hours. I need to set up an alert for the case where it doesn't start (for example, someone paused it). How do I configure such an alert using Databricks native alerts? Theoretically, this may be done using s...
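One possible building block (not a native Databricks alert) is a small scheduled check against the Jobs API via the Python SDK; the job ID below is a placeholder and the alerting action itself (email, webhook, ...) is left out of the sketch.

```python
import time
from databricks.sdk import WorkspaceClient

# Placeholder job ID for the job that should run every six hours.
JOB_ID = 123456789
SIX_HOURS_MS = 6 * 60 * 60 * 1000

w = WorkspaceClient()
latest = next(iter(w.jobs.list_runs(job_id=JOB_ID, limit=1)), None)

now_ms = int(time.time() * 1000)
if latest is None or now_ms - latest.start_time > SIX_HOURS_MS:
    raise RuntimeError(f"Job {JOB_ID} has not started a run in the last six hours")
```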

biafch
by Contributor
  • 2244 Views
  • 2 replies
  • 1 kudos

Resolved! Failure starting repl. Try detaching and re-attaching the notebook

I just started my manual cluster this morning in the production environment to run some code, and it isn't executing, giving me the error "Failure starting repl. Try detaching and re-attaching the notebook." What can I do to solve this? I have tried...

Latest Reply
biafch
Contributor
  • 1 kudos

Just in case anyone needs to know how to solve this in the future: apparently one of my clusters was suddenly having library compatibility issues, mainly between pandas, numpy, and pyarrow. So I fixed this by forcing specific versions in my global init s...

1 More Reply
biafch
by Contributor
  • 547 Views
  • 2 replies
  • 0 kudos

Resolved! Runtime 11.3 LTS not working in my production

Hello, I have a cluster with Runtime 11.3 LTS in my production. Whenever I start it up and try to run my notebooks, it gives me the error: Failure starting repl. Try detaching and re-attaching the notebook. I have a cluster with the same runtime in my...

Latest Reply
biafch
Contributor
  • 0 kudos

Just in case anyone needs to know how to solve this in the future: apparently one of my clusters was suddenly having library compatibility issues, mainly between pandas, numpy, and pyarrow. So I fixed this by forcing specific versions in my global init s...

1 More Reply
ImAbhishekTomar
by New Contributor III
  • 444 Views
  • 2 replies
  • 0 kudos

Drop duplicates in 500B records

I'm trying to drop duplicates in a DataFrame where I have 500B records. I'm trying to delete based on multiple columns, but this process takes 5h. I have tried a lot of things available on the internet, but nothing works for me. My code is like this: df_1=spark....

Latest Reply
filipniziol
Contributor III
  • 0 kudos

Drop the duplicates from df_1 and df_2 first and then do the join. If the join is just on a city code, then most likely you know which rows in df_2 and in df_1 will give you the duplicates in df_join. So drop in df_1 and drop in df_2 instead of df_jo...
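A small sketch of that suggestion, with made-up table names and key columns: de-duplicate each input on its own keys, then join, instead of running dropDuplicates on the much larger join result.

```python
# Hypothetical tables and key columns; de-duplicate each side before the join.
df_1 = spark.table("catalog.schema.table_1").dropDuplicates(["city_code", "record_id"])
df_2 = spark.table("catalog.schema.table_2").dropDuplicates(["city_code"])

# Joining the already de-duplicated inputs avoids a global dropDuplicates over the joined result.
df_join = df_1.join(df_2, on="city_code", how="left")
```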

1 More Reply
