Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Eduard (New Contributor II)
  • 117373 Views
  • 3 replies
  • 1 kudos

Cluster xxxxxxx was terminated during the run.

Hello, I have a problem with the autoscaling of a cluster. Every time autoscaling is activated I get this error. Does anyone have any idea why this could be? "Cluster xxxxxxx was terminated during the run (cluster state message: Lost communication ...

Latest Reply
louisgarza
New Contributor II
  • 1 kudos

Hello Databricks Community, the error message indicates that the driver node was lost, which can happen due to network issues or malfunctioning instances. Here are a few possible reasons and solutions: Instance instability: if your cloud provider has u...

2 More Replies
Fatimah-Tariq (New Contributor III)
  • 701 Views
  • 4 replies
  • 0 kudos

Schema update Issue in DLT

I have a pipeline in Databricks with this flow: SQL SERVER (Source) -> Staging (Parquet) -> Bronze (DLT) -> Silver (DLT) -> Gold (DLT). The pipeline has been up and running smoothly for months, but recently there was a schema update at my source level and one o...

Latest Reply
Fatimah-Tariq
New Contributor III
  • 0 kudos

Hi @Alberto_Umana, is there any word on how to fix my data and bring all the records back to the pipeline schema?

3 More Replies
cpayne_vax (New Contributor III)
  • 23066 Views
  • 15 replies
  • 9 kudos

Resolved! Delta Live Tables: dynamic schema

Does anyone know if there's a way to specify an alternate Unity Catalog schema in a DLT workflow using the @dlt.table syntax? In my case, I'm looping through folders in Azure Data Lake Storage to ingest data. I'd like those folders to get created in different...
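For reference, a minimal sketch of the looping pattern described here, with hypothetical folder names and paths; in classic publishing mode the target catalog and schema come from the pipeline settings, while the decorator's name argument only controls the table name:

```python
import dlt

# Hypothetical folder list; in practice these might be discovered with dbutils.fs.ls().
folders = ["customers", "orders", "payments"]

def make_table(folder: str):
    # Bind the folder name at definition time so each table gets its own source path.
    @dlt.table(name=f"raw_{folder}")
    def ingest():
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load(f"abfss://landing@myaccount.dfs.core.windows.net/{folder}/")
        )

for f in folders:
    make_table(f)
```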

Latest Reply
abhishek_02
New Contributor II
  • 9 kudos

Hi @kuldeep-in, could you please point me to the exact location for disabling the DPM option? I was not able to locate it in the pipeline settings or the Databricks settings. Thank you.

14 More Replies
zmsoft (Contributor)
  • 568 Views
  • 1 replies
  • 0 kudos

Resolved! How do I use an Azure Databricks DLT pipeline to consume Azure Event Hubs data

Hi there, how do I use an Azure Databricks DLT pipeline to consume Azure Event Hubs data? Code: TOPIC = "myeventhub" KAFKA_BROKER = "" GROUP_ID = "group_dev" raw_kafka_events = (spark.readStream .format("kafka") .option("subscribe", EH_NAME) .opt...

Latest Reply
ashraf1395
Honored Contributor
  • 0 kudos

Hi there @zmsoft, did you have a look at this reference doc: https://docs.databricks.com/aws/en/dlt/event-hubs ? This might help.
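For reference, a minimal sketch of the pattern that doc describes — reading Event Hubs through its Kafka-compatible endpoint; the namespace, hub name, and secret scope/key below are hypothetical placeholders:

```python
import dlt

EH_NAMESPACE = "my-eh-namespace"   # hypothetical
EH_NAME = "myeventhub"             # Kafka topic = event hub name
EH_CONN_STR = dbutils.secrets.get("my-scope", "eh-conn-str")  # hypothetical scope/key

KAFKA_OPTIONS = {
    "kafka.bootstrap.servers": f"{EH_NAMESPACE}.servicebus.windows.net:9093",
    "subscribe": EH_NAME,
    "kafka.security.protocol": "SASL_SSL",
    "kafka.sasl.mechanism": "PLAIN",
    # Databricks runtimes shade the Kafka client, hence the kafkashaded prefix.
    "kafka.sasl.jaas.config": (
        "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule "
        f'required username="$ConnectionString" password="{EH_CONN_STR}";'
    ),
}

@dlt.table(name="raw_kafka_events")
def raw_kafka_events():
    return spark.readStream.format("kafka").options(**KAFKA_OPTIONS).load()
```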

Brianben (New Contributor III)
  • 487 Views
  • 1 replies
  • 0 kudos

Getting errors when reading data from Excel: InternalError: pip is not installed for /local_disk

Hi all, we have a daily Databricks job that downloads Excel files from SharePoint and reads them. The job worked fine until today (3 March). We are getting the following error message when running the code to read the Excel file: org.apache.spark.SparkExc...

Latest Reply
Renu_
Contributor III
  • 0 kudos

I think the issue comes from installing Office365-REST-Python-Client using dbutils.library.installPyPI, which seems to create a conflicting Python environment for the Spark executors. Since notebook-specific installs modify the environment dynamically, t...
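A minimal sketch of the usual workaround, assuming the package can simply be installed with %pip so the driver and executors share one environment (package names follow the post):

```python
# Cell 1: %pip scopes the install to this notebook's Python environment on
# driver and executors alike; dbutils.library.installPyPI is deprecated and
# can leave executors out of sync.
%pip install Office365-REST-Python-Client openpyxl

# Cell 2 (runs after the automatic Python restart): imports resolve normally.
# from office365.sharepoint.client_context import ClientContext
```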

xx123 (New Contributor III)
  • 315 Views
  • 1 replies
  • 0 kudos

ETL pipeline works fine, but when executed via a workflow it fails due to StorageAccessError

I have a fairly simple ETL pipeline that uses DLT. It streams data from an ADLS Gen2 storage account and creates a materialized view using two tables. It works fine when I execute it on its own; the materialized view is properly refreshed. Now I wanted to add this as a task to...

Latest Reply
Nivethan_Venkat
Contributor II
  • 0 kudos

Hi @xx123, could you please provide more details on the cluster configuration? I am guessing the cluster policy you use for deploying the job in the workflow and the one you use when testing might be different. Please try to use the same cluster policy ...

Nalapriya (New Contributor II)
  • 1024 Views
  • 3 replies
  • 0 kudos

I have data in S3 Iceberg tables. How do I read it using Databricks Spark SQL?

I tried this method: df = spark.read.format("iceberg").load("s3-bucket-path") but got an error: Multiple sources found for iceberg (com.databricks.sql.transaction.tahoe.uniform.sources.IcebergBrowseOnlyDataSource, org.apache.iceberg.spark.source.Icebe...

Latest Reply
Nalapriya
New Contributor II
  • 0 kudos

Hi @Alberto_Umana, I tried the steps you provided, but I'm still not able to read the data in Iceberg format. Any other suggestions would be helpful.
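The "Multiple sources found" error means the short name iceberg is ambiguous on that runtime, and the usual Spark remedy is to spell out which implementation you want; a hedged sketch (the bucket path and table name are hypothetical):

```python
# Option 1: disambiguate by passing the fully qualified class name of one of
# the two sources listed in the error message.
df = (
    spark.read.format("org.apache.iceberg.spark.source.IcebergSource")
    .load("s3://my-bucket/path/to/iceberg-table")
)

# Option 2: if the table is registered in Unity Catalog (e.g., a foreign
# Iceberg table), read it by name and let the catalog pick the source.
df = spark.table("my_catalog.my_schema.my_iceberg_table")
```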

2 More Replies
Srujanm01 (New Contributor III)
  • 2314 Views
  • 1 replies
  • 0 kudos

Databricks managed RG storage cost is high

Hi Community, how do I calculate Databricks storage cost, and where can I see the data that is stored and charged in Databricks? I'm trying to understand the storage cost on a managed resource group, and I'm clueless about the data and where it is stored...

Latest Reply
Brahmareddy
Honored Contributor III
  • 0 kudos

Hi, how are you doing today? To understand Databricks storage costs in Azure, you can check where your data is stored and how it's being charged. Managed tables, DBFS files, and Unity Catalog volumes are usually stored in an Azure Data Lake Storage (A...
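One hedged way to put numbers on this from a notebook is to ask Delta itself: DESCRIBE DETAIL reports each table's storage location and sizeInBytes (the catalog and schema names below are hypothetical):

```python
# Sketch: print the physical location and size of every table in one schema.
rows = spark.sql("SHOW TABLES IN main.sales").collect()
for r in rows:
    d = (
        spark.sql(f"DESCRIBE DETAIL main.sales.{r.tableName}")
        .select("location", "sizeInBytes")
        .first()
    )
    print(r.tableName, d.location, f"{(d.sizeInBytes or 0) / 1e9:.2f} GB")
```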

narendra11 (New Contributor)
  • 464 Views
  • 1 replies
  • 1 kudos

Need to run an UPDATE statement from Databricks using an Azure SQL pyodbc connection

Hi all, I was trying to run an UPDATE statement in a Databricks notebook using a pyodbc connection, and I was getting the following error. I need assistance to solve this. Error: ('01000', "[01000] [unixODBC][Driver Manager]Can't open lib 'ODB...

Latest Reply
Brahmareddy
Honored Contributor III
  • 1 kudos

Hi Narendra, how are you doing today? As per my understanding, it looks like your Databricks notebook can't find the ODBC Driver 17 for SQL Server. You can first check whether the driver is installed by running !odbcinst -q -d in a notebook cell. If it's m...
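A sketch of that check plus the UPDATE itself; the server, database, credentials, and table are hypothetical placeholders:

```python
import pyodbc

# 1) Confirm the driver named in the connection string is actually installed.
print(pyodbc.drivers())  # should include "ODBC Driver 17 for SQL Server"

# 2) Hypothetical connection and UPDATE; pyodbc does not autocommit by default,
#    so commit explicitly.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver.database.windows.net;DATABASE=mydb;UID=myuser;PWD=<password>"
)
cur = conn.cursor()
cur.execute("UPDATE dbo.orders SET status = ? WHERE order_id = ?", ("shipped", 42))
conn.commit()
conn.close()
```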

BobCat62 (New Contributor III)
  • 1002 Views
  • 3 replies
  • 0 kudos

Resolved! Delta Live Tables are refreshed in parallel rather than sequentially

Hi experts, I have defined my DLT pipeline as follows: -- Define a streaming table to ingest data from a volume CREATE OR REFRESH STREAMING TABLE pumpdata_bronze TBLPROPERTIES ("myCompanyPipeline.quality" = "bronze") AS SELECT * FROM cloud_files("abfss...

Latest Reply
ashraf1395
Honored Contributor
  • 0 kudos

Hi @BobCat62, the thing is that DLT now has different modes: direct publishing mode and classic mode (legacy). Look here for more details: https://docs.databricks.com/aws/en/release-notes/product/2025/january#dlt-now-supports-publishing-to-tables-in-m...
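Also worth noting for the original question: DLT schedules refreshes from the dependency graph, so tables that do not reference each other run in parallel; a hedged Python sketch of making the order explicit by reading one table from the other (the source path is hypothetical):

```python
import dlt

@dlt.table(name="pumpdata_bronze")
def pumpdata_bronze():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("abfss://landing@myaccount.dfs.core.windows.net/pumpdata/")
    )

@dlt.table(name="pumpdata_silver")
def pumpdata_silver():
    # Reading the bronze table adds an edge to the pipeline graph, so this
    # table is refreshed only after pumpdata_bronze completes.
    return dlt.read_stream("pumpdata_bronze")
```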

2 More Replies
Venugopal (New Contributor III)
  • 1911 Views
  • 5 replies
  • 1 kudos

Databricks Asset Bundles: unable to fetch variables from variable-overrides.json

Hi, I am using Databricks CLI 0.227.1 to create a bundle project to deploy a job. As per https://learn.microsoft.com/en-us/azure/databricks/dev-tools/bundles/variables, I wanted to use variable-overrides.json to hold my variables. I created a js...
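For anyone comparing setups, a hedged sketch of what the linked page describes: the overrides file is per target and lives under the bundle root at .databricks/bundle/&lt;target&gt;/variable-overrides.json, and every key must match a variable already declared in databricks.yml (the variable names here are hypothetical):

```json
{
  "catalog": "dev_catalog",
  "cluster_workers": 2
}
```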

Latest Reply
Venugopal
New Contributor III
  • 1 kudos

@ashraf1395 any thoughts on the above issue?

4 More Replies
NehaR (New Contributor III)
  • 2875 Views
  • 4 replies
  • 2 kudos

Set a timeout or auto-termination for long-running queries

Hi, we want to set auto-termination for long-running queries on a Databricks ad hoc cluster. I attempted the two approaches below in my notebook. Despite my understanding that queries should automatically terminate after one hour, with both approaches q...

Latest Reply
JissMathew
Valued Contributor
  • 2 kudos

Hi @NehaR, apply these settings at the cluster level in the Databricks UI:
Go to the cluster settings.
Add the following Spark configuration:
spark.databricks.queryWatchdog.enabled true
spark.databricks.queryWatchdog.timeout 3600
Restart the...
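If editing the cluster config isn't an option, the same keys quoted above can be tried from a notebook at runtime; a hedged sketch, assuming the cluster accepts these watchdog settings dynamically:

```python
# Runtime equivalent of the cluster-level Spark config quoted in the reply.
spark.conf.set("spark.databricks.queryWatchdog.enabled", "true")
spark.conf.set("spark.databricks.queryWatchdog.timeout", "3600")  # seconds, per the reply
```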

3 More Replies
tp992 (New Contributor II)
  • 2002 Views
  • 1 replies
  • 0 kudos

Using PySpark UDFs in Databricks with functions imported from another module

Problem with minimal example: the minimal example below does not run locally with databricks-connect==15.3 but does run within the Databricks workspace. main.py: from databricks.connect import DatabricksSession from module.udf import send_message, send_compl...

Latest Reply
tp992
New Contributor II
  • 0 kudos

I think the solution is in .addArtifact, if I'm reading these correctly: https://kb.databricks.com/en_US/clusters/cannot-access-apache-sparkcontext-object-using-addpyfile and https://www.databricks.com/blog/python-dependency-management-spark-connect . But I have not gotten it ...
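A hedged sketch of that idea with databricks-connect: upload the module that defines the UDF before calling it, so its code is importable where the UDF actually executes (the layout follows the post's module/udf.py):

```python
from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.getOrCreate()

# PySpark's Spark Connect session exposes addArtifacts; pyfile=True puts the
# uploaded file on the session's Python path on the cluster side.
spark.addArtifacts("module/udf.py", pyfile=True)

from module.udf import send_message  # UDF from the post; usage unchanged
```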

yorkuDE01 (New Contributor II)
  • 524 Views
  • 2 replies
  • 1 kudos

Resolved! Key Vault reference for federated connection setup - Azure

I am trying to create a federated connection in Unity Catalog for an Oracle database. The connection configuration GUI seems to ask for the password. Is it possible to put a Key Vault reference here instead?

Latest Reply
Nivethan_Venkat
Contributor II
  • 1 kudos

Hi @yorkuDE01, I suppose this could be done when you create or set up the federated connection using the API. But I don't think this is possible via the UI, where you would reference a Key Vault-backed secret scope. Please refer to the documentation...
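A hedged sketch of the API/SQL route: per the Lakehouse Federation docs, CREATE CONNECTION accepts secret('scope', 'key') references in its options, and the secret scope can be backed by Azure Key Vault (the host, scope, and key names below are hypothetical):

```python
# Run from a notebook; secret() resolves values from a secret scope, which can
# be an Azure Key Vault-backed scope.
spark.sql("""
  CREATE CONNECTION oracle_conn TYPE oracle
  OPTIONS (
    host 'mydb.example.com',
    port '1521',
    user secret('my-kv-scope', 'oracle-user'),
    password secret('my-kv-scope', 'oracle-password')
  )
""")
```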

1 More Replies
