Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by SrinuM (New Contributor III)
  • 180 Views
  • 4 replies
  • 1 kudos

CLOUD_PROVIDER_LAUNCH_FAILURE (CLOUD_FAILURE) for workflow job with all-purpose cluster

One of our Databricks workflow jobs fails occasionally with the error below; after re-running, it works fine without any issue. What is the exact reason for the issue, and how can we fix it? Error: Unexpected failure while waiting for the cluster to be ...

Latest Reply
PSR100
New Contributor
  • 1 kudos

These are cloud-provider-related errors, and we will not get much detail from the error message. Based on the error message, and given that you have enough CPU/VM quota available, I think the issue might be due to the storage creation stage in ...

3 More Replies
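Since this error is transient on the cloud provider side, a common mitigation is to let the task retry automatically instead of re-running the whole job by hand. A minimal sketch using the databricks-sdk, with a placeholder job ID and task key (not a fix for the underlying cloud failure):

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

# Retry the task up to 2 times with 5 minutes between attempts, so a
# transient CLOUD_PROVIDER_LAUNCH_FAILURE does not fail the whole run.
w.jobs.update(
    job_id=123,  # placeholder: your job ID
    new_settings=jobs.JobSettings(
        tasks=[
            jobs.Task(
                task_key="my_task",  # placeholder: your task key
                notebook_task=jobs.NotebookTask(notebook_path="/path/to/notebook"),
                max_retries=2,
                min_retry_interval_millis=300_000,
                retry_on_timeout=False,
            )
        ]
    ),
)
```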
by ksenija (Contributor)
  • 94 Views
  • 2 replies
  • 1 kudos

DLT pipeline - silver table, joining streaming data

Hello! I'm trying to do my modeling in DLT pipelines. For bronze, I created 3 streaming views. When I try to join them to create a silver table, I get an error that I can't join a stream with another stream without watermarks. I tried adding them, but then I got no...

Latest Reply
Ravivarma
New Contributor III
  • 1 kudos

Hello @ksenija, greetings! Streaming uses watermarks to control the threshold for how long to continue processing updates for a given state entity. Common examples of state entities include: aggregations over a time window; unique keys in a join b...

1 More Replies
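For reference, a minimal sketch of the pattern the reply describes: both sides of a stream-stream join need a watermark on an event-time column, and the join condition must bound the time difference so state can be purged. Table, column, and interval names below are assumptions:

```python
import dlt
from pyspark.sql import functions as F

@dlt.table
def silver_joined():
    # Declare how late data may arrive on each side; without watermarks,
    # Spark cannot know when stream-stream join state may be dropped.
    a = dlt.read_stream("bronze_a").withWatermark("event_time", "10 minutes").alias("a")
    b = dlt.read_stream("bronze_b").withWatermark("event_time", "10 minutes").alias("b")
    # Bound the allowed time skew between matching events.
    return a.join(
        b,
        F.expr("""
            a.id = b.id AND
            b.event_time BETWEEN a.event_time AND a.event_time + INTERVAL 10 MINUTES
        """),
    )
```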
by thackman (New Contributor)
  • 88 Views
  • 3 replies
  • 0 kudos

Databricks cluster random slow start times.

We have a job that runs on single-user job compute because we've had compatibility issues switching to shared compute. Normally the cluster (1 driver, 1 worker) takes five to six minutes to start. This is on Azure, and we only include two small Python l...

Latest Reply
PSR100
New Contributor
  • 0 kudos

Sometimes there can be a delay in init script execution. But based on the screenshot, there are no init script logs, and as you mentioned, there are only 2 libraries to be installed on the cluster. So this should not take much time to insta...

2 More Replies
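To see where the extra minutes go, one option is to read the cluster's event timeline and compare timestamps across the provisioning, init-script, and library-installation phases. A sketch with the databricks-sdk; the cluster ID is a placeholder:

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Gaps between consecutive event timestamps show whether a slow start was
# spent on VM provisioning, init scripts, or library installation.
for event in w.clusters.events(cluster_id="0123-456789-abcdefgh"):
    print(event.timestamp, event.type, event.details)
```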
by alpine (New Contributor)
  • 1784 Views
  • 4 replies
  • 0 kudos

Deploy lock force acquired error when deploying asset bundle using databricks cli

I'm running this command in a DevOps pipeline: databricks bundle deploy -t dev. I receive the error below and have tried using --force-lock, but it still doesn't work. Error: deploy lock force acquired by name@company.com at 2024-02-20 16:38:34.99794209 +0000 ...

Latest Reply
manish1987c
New Contributor III
  • 0 kudos

Why do we generally get this error? Is there any specific reason?

3 More Replies
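For context, the deploy lock prevents two concurrent deploys from corrupting bundle state, and it is left behind when a previous deploy is interrupted. A sketch of the usual recovery; the state path below assumes the default bundle layout and may differ in your setup:

```bash
# Retry, force-acquiring the stale lock (documented flag):
databricks bundle deploy -t dev --force-lock

# If that still fails, inspect the bundle state folder in the workspace
# (default layout; <bundle_name> is a placeholder):
databricks workspace list /Users/name@company.com/.bundle/<bundle_name>/dev/state
```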
by Littlesheep_ (New Contributor)
  • 140 Views
  • 2 replies
  • 0 kudos

How to run a notebook from a .py file in Databricks

The situation is that my colleague was using PyCharm and now needs to adapt to Databricks. They are now doing their job by connecting VS Code to Databricks and running .py files on Databricks clusters. The problem is they want to call a notebook in d...

Latest Reply
jacovangelder
Contributor III
  • 0 kudos

You can do so by adding your own dbutils function in your .py file:

```python
def get_dbutils():
    """This is to make your local env (and flake) happy."""
    from pyspark.sql import SparkSession
    spark = SparkSession.getActiveSession()
    if spark....
```

1 More Replies
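The reply's preview is cut off; a hedged completion of the helper (the branch after `if spark` is an assumption based on the common pattern), followed by how the .py file can then invoke a notebook:

```python
def get_dbutils():
    """Return a dbutils handle that works locally and on a cluster."""
    from pyspark.sql import SparkSession
    spark = SparkSession.getActiveSession()
    if spark is not None:
        try:
            from pyspark.dbutils import DBUtils  # present on Databricks clusters
            return DBUtils(spark)
        except ImportError:
            pass
    # Fall back to the notebook-injected global.
    import IPython
    return IPython.get_ipython().user_ns["dbutils"]

# Run a workspace notebook from the .py file (path and params are placeholders):
dbutils = get_dbutils()
result = dbutils.notebook.run("/path/to/notebook", 300, {"param": "value"})
```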
by RKNutalapati (Valued Contributor)
  • 1250 Views
  • 3 replies
  • 0 kudos

Jobs API "run now" - How to set task-wise parameters

I have a job with multiple tasks, like Task1 -> Task2 -> Task3. I am trying to trigger the job using the "run now" API. Task details are below: Task1 executes a notebook with some input parameters; Task2 runs "ABC.jar", so it's a JAR-based task ...

Latest Reply
Harsha777
New Contributor
  • 0 kudos

Hi, it would be a good feature to pass parameters at the task level. We have scenarios where we would like to create a job with multiple tasks (notebook/dbt) and pass parameters at the task level.

2 More Replies
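As the thread notes, run-now takes parameters per task *type* rather than per task: notebook_params go to all notebook tasks, jar_params to all JAR tasks. A sketch of the call; job ID and parameter values are placeholders:

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# notebook_params are merged into every notebook task's widgets;
# jar_params become the main-method arguments of every JAR task.
waiter = w.jobs.run_now(
    job_id=123,  # placeholder
    notebook_params={"input_date": "2024-07-01"},
    jar_params=["arg1", "arg2"],
)
run = waiter.result()  # block until the run reaches a terminal state
```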
by hadoan (New Contributor II)
  • 231 Views
  • 3 replies
  • 1 kudos

How to define DLT table with cyclic reference

```python
@dlt.table
def table_A():
    return (
        dlt.read_stream(...)
    )

@dlt.table
def table_join_A_and_C():
    df_A = dlt.read_stream(table_A)
    df_C = dlt.read_stream(table_C)
    return (
        ...df_A.join(df_C)
    )

@dlt.table
def table_C():
    return (
        ...
```

Latest Reply
Rishabh_Tiwari
Community Manager
  • 1 kudos

Hi @hadoan , Thank you for reaching out to our community! We're here to help you. To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your feedback ...

2 More Replies
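DLT resolves the pipeline into a DAG, so a declaration cycle (as the title describes, e.g. table_join_A_and_C reading table_C while table_C reads the join) cannot be evaluated. One way out, sketched under the assumption that table_C can be derived from its own upstream source rather than from the join, is to make every table read only from earlier layers:

```python
import dlt

@dlt.table
def table_A():
    return dlt.read_stream("source_a")  # placeholder source

@dlt.table
def table_C():
    return dlt.read_stream("source_c")  # placeholder source, not the join

@dlt.table
def table_join_A_and_C():
    # Batch reads of the live tables; the join key is an assumption.
    return dlt.read("table_A").join(dlt.read("table_C"), on="id", how="inner")
```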
by Ram-Dev7 (New Contributor)
  • 96 Views
  • 2 replies
  • 0 kudos

Query on using secret scope for dbt-core integration with databricks workflow

Hello all, I am currently configuring dbt-core with Azure Databricks Workflow and using Azure Databricks M2M (machine-to-machine) authentication for this setup. I have the cluster ID and cluster secret ID stored in a Databricks secret scope. I am seeking...

Latest Reply
Rishabh_Tiwari
Community Manager
  • 0 kudos

Hi @Ram-Dev7 , Thank you for reaching out to our community! We're here to help you. To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your feedbac...

1 More Replies
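One pattern, sketched with placeholder scope and key names, is to keep the service principal's OAuth credentials in the secret scope and surface them to dbt as environment variables so profiles.yml contains no literals:

```python
import os
from databricks.sdk.runtime import dbutils  # available in notebooks/jobs

# Pull M2M (OAuth) credentials from a secret scope; names are placeholders.
os.environ["DATABRICKS_CLIENT_ID"] = dbutils.secrets.get("dbt-scope", "sp-client-id")
os.environ["DATABRICKS_CLIENT_SECRET"] = dbutils.secrets.get("dbt-scope", "sp-client-secret")

# profiles.yml can then reference them, e.g.:
#   auth_type: oauth
#   client_id: "{{ env_var('DATABRICKS_CLIENT_ID') }}"
#   client_secret: "{{ env_var('DATABRICKS_CLIENT_SECRET') }}"
```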
by safoineext (New Contributor)
  • 94 Views
  • 2 replies
  • 0 kudos

Uploading a wheel using `dbutils.fs.cp` to the workspace and installing it on Runtime > 15

I have been trying to find an alternative to copying a wheel file from my local file system to Databricks and then installing it on the cluster. Doing this: databricks_client.dbutils.fs.cp("file:/local..../..whl", "dbfs:/Workspace/users/..../..whl")...

Latest Reply
Rishabh_Tiwari
Community Manager
  • 0 kudos

Hi @safoineext , Thank you for reaching out to our community! We're here to help you. To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your feedb...

1 More Replies
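Note that dbfs:/Workspace/... is a DBFS path, not the workspace file tree, which is a common reason a copied wheel is not found. One alternative on recent runtimes, sketched with placeholder volume and file names, is to stage the wheel in a Unity Catalog volume and pip-install from there:

```python
# Stage the wheel in a UC volume (paths are placeholders):
dbutils.fs.cp(
    "file:/local/path/mypkg-0.1.0-py3-none-any.whl",
    "/Volumes/main/default/artifacts/mypkg-0.1.0-py3-none-any.whl",
)

# Then, in a notebook cell on the cluster (DBR 13+):
# %pip install /Volumes/main/default/artifacts/mypkg-0.1.0-py3-none-any.whl
```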
by Mahesh_Yadav (New Contributor)
  • 91 Views
  • 2 replies
  • 0 kudos

System Access Column lineage showing inaccurate results

Hi All, I have been trying to leverage the system column lineage table to check the overall journey of a column, but I am getting inaccurate results wherever unpivot transformations are used. Instead of showing the results in a way that 20 columns are ...

Latest Reply
Rishabh_Tiwari
Community Manager
  • 0 kudos

Hi @Mahesh_Yadav , Thank you for reaching out to our community! We're here to help you. To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your fee...

1 More Replies
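To see exactly what lineage was recorded for the unpivoted columns, the system table can be queried directly. A sketch; the target table name is a placeholder:

```python
# Inspect recorded column-level lineage for one target table.
df = spark.sql("""
    SELECT source_table_full_name,
           source_column_name,
           target_column_name,
           event_time
    FROM system.access.column_lineage
    WHERE target_table_full_name = 'catalog.schema.my_table'
    ORDER BY event_time DESC
""")
display(df)
```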
by beautrincia (New Contributor)
  • 123 Views
  • 2 replies
  • 0 kudos

How to get data permissions from Sharepoint and Confluence to Unity Catalog for RAG LLM chatbot

We're implementing a chatbot where documents in SharePoint and pages in Confluence augment the results. We want to adhere to existing RBAC policies in these data sources so that the chatbot doesn't produce results that someone should not see. Are you...

Latest Reply
Rishabh_Tiwari
Community Manager
  • 0 kudos

Hi @beautrincia , Thank you for reaching out to our community! We're here to help you. To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your feed...

1 More Replies
by Tiwarisk (New Contributor II)
  • 525 Views
  • 5 replies
  • 3 kudos

How can I preserve the data types of Delta tables while writing to Azure Blob Storage?

I am writing a file using this, but the data types of the columns get changed while reading. df.write.format("com.crealytics.spark.excel").option("header", "true").mode("overwrite").save(path) Due to this I have to manually change them every time, as I can't chang...

Latest Reply
Rishabh_Tiwari
Community Manager
  • 3 kudos

Hi @Tiwarisk , Thank you for reaching out to our community! We're here to help you.To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your feedback...

4 More Replies
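Excel files store cell values, not column types, so the types have to be reimposed when reading. A sketch using an explicit schema with the same reader; the column names are assumptions:

```python
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, DateType

# Declare the schema explicitly instead of relying on inference.
schema = StructType([
    StructField("id", IntegerType()),
    StructField("name", StringType()),
    StructField("created_on", DateType()),
])

df = (spark.read.format("com.crealytics.spark.excel")
      .option("header", "true")
      .schema(schema)  # enforce the declared types on read
      .load(path))
```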
by 938452 (New Contributor III)
  • 7875 Views
  • 5 replies
  • 3 kudos

Resolved! Executor memory increase limitation based on node type

Hi Databricks community, I'm using a Databricks Jobs Cluster to run some jobs. I'm setting the worker and driver type to AWS m6gd.large, which has 2 cores and 8 GB of memory each. After seeing that it defaults executor memory to 2 GB, I wanted to increase it,...

Latest Reply
938452
New Contributor III
  • 3 kudos

I think I found the right answer here: https://kb.databricks.com/en_US/clusters/spark-shows-less-memory. It seems a fixed size of ~4 GB is used for internal node services, so depending on the node type, `spark.executor.memory` is fixed by Databricks...

4 More Replies
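The linked article's point can be checked from a notebook: Databricks reserves a fixed amount of node memory for its services, so the executor heap is pinned per node type and user overrides are capped. A quick sketch:

```python
# The effective executor heap, as set by Databricks for this node type:
print(spark.sparkContext.getConf().get("spark.executor.memory"))
```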
by Karthig (New Contributor III)
  • 19893 Views
  • 14 replies
  • 8 kudos

Error Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient - while trying to create database

Hello All, I get the org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient while trying to create a database scr...

Latest Reply
mroy
Contributor
  • 8 kudos

Alright, we've implemented a workaround for this, and so far it's been working very well:First, we created a reusable notebook to wait until Hive has been initialized (see code below).We then execute this notebook using the %run command at the top of...

13 More Replies
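A hedged reconstruction of the waiting notebook the reply describes (retry count and delay are assumptions): poll a cheap metastore call until the Hive client initializes, then let dependent notebooks proceed via %run:

```python
import time

# Each attempt forces metastore access, which raises while
# HiveMetaStoreClient is still initializing.
for attempt in range(30):  # retry budget is an assumption
    try:
        spark.sql("SHOW DATABASES").collect()
        print(f"Metastore ready after {attempt + 1} attempt(s)")
        break
    except Exception:
        time.sleep(10)  # delay between attempts is an assumption
else:
    raise TimeoutError("Hive metastore did not initialize in time")
```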
by Mathias_Peters (Contributor)
  • 74 Views
  • 1 reply
  • 2 kudos

Resolved! Service principal seemingly cannot access its own workspace folder

We have implemented an asset bundle (DAB) that creates a wheel. During DAB deployment, the wheel is built and stored in the folder of the service principal running the deployment via a GH workflow. The full path is /Workspace/Users/SERVICE-PRINCIPAL-ID/...

Latest Reply
Mathias_Peters
Contributor
  • 2 kudos

Update: When moving the whl to the shared workspace folder, the installation on the cluster works. 

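Matching the update above, the bundle can also point its artifact location at a shared folder explicitly, so the wheel never lands under the service principal's user folder. A sketch of a databricks.yml fragment; the path is a placeholder:

```yaml
targets:
  dev:
    workspace:
      # Store built wheels somewhere the cluster and other principals can read.
      artifact_path: /Workspace/Shared/.bundle/${bundle.name}/${bundle.target}/artifacts
```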