Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Fatimah-Tariq
by New Contributor III
  • 844 Views
  • 4 replies
  • 0 kudos

Schema update Issue in DLT

I have a pipeline in Databricks with this flow: SQL SERVER (Source) -> Staging (Parquet) -> Bronze (DLT) -> Silver (DLT) -> Gold (DLT). The pipeline has been up and running smoothly for months, but recently there was a schema update at my source level and one o...

Latest Reply
Fatimah-Tariq
New Contributor III
  • 0 kudos

Hi @Alberto_Umana, is there any word on how to fix my data and bring all the records back to the pipeline schema?

3 More Replies
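For readers hitting similar schema drift between staging and bronze: if the bronze layer ingests the Parquet staging files with Auto Loader, schema-drift behavior is governed by cloudFiles.schemaEvolutionMode. A minimal sketch of that knob, with a placeholder ADLS path; this is general guidance, not necessarily the specific fix worked out in this thread:

```python
# Minimal Auto Loader sketch; the path and format are placeholders.
# "addNewColumns" evolves the stream's schema when new source columns
# appear, instead of failing the pipeline run.
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "parquet")
    .option("cloudFiles.schemaEvolutionMode", "addNewColumns")
    .load("abfss://staging@<account>.dfs.core.windows.net/<table>/")
)
```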
cpayne_vax
by New Contributor III
  • 24382 Views
  • 15 replies
  • 9 kudos

Resolved! Delta Live Tables: dynamic schema

Does anyone know if there's a way to specify an alternate Unity Catalog schema in a DLT workflow using the @dlt.table syntax? In my case, I'm looping through folders in Azure Data Lake Storage to ingest data. I'd like those folders to get created in different...

Latest Reply
abhishek_02
New Contributor II
  • 9 kudos

Hi @kuldeep-in, could you please point to the exact location for disabling the DPM-enabled option? I was not able to locate it in the pipeline settings or the Databricks settings. Thank you.

14 More Replies
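The looping pattern this thread asks about is usually implemented with a table factory: a helper function that closes over the folder name and registers one @dlt.table per folder. A minimal sketch, assuming JSON folders under one container; the folder list, account, and paths are placeholders, and in classic publishing mode the target catalog/schema remain fixed per pipeline:

```python
import dlt

# Hypothetical folder list; in practice this might be discovered by
# listing the storage container.
FOLDERS = ["sales", "customers", "orders"]

def make_bronze_table(folder: str):
    # The name= argument gives each generated table its own name; using a
    # factory function (not a bare loop body) avoids late-binding bugs.
    @dlt.table(name=f"{folder}_bronze")
    def bronze():
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load(f"abfss://raw@<account>.dfs.core.windows.net/{folder}/")
        )

for f in FOLDERS:
    make_bronze_table(f)
```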
zmsoft
by Contributor
  • 667 Views
  • 1 replies
  • 0 kudos

Resolved! How do I use the Azure Databricks DLT pipeline to consume Azure Event Hub data

Hi there, how do I use the Azure Databricks DLT pipeline to consume Azure Event Hub data? Code: TOPIC = "myeventhub" KAFKA_BROKER = "" GROUP_ID = "group_dev" raw_kafka_events = (spark.readStream .format("kafka") .option("subscribe", EH_NAME) .opt...

Latest Reply
ashraf1395
Honored Contributor
  • 0 kudos

Hi there @zmsoft, did you have a look at this reference doc? https://docs.databricks.com/aws/en/dlt/event-hubs might help.

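The linked doc reads Event Hubs through its Kafka-compatible endpoint, which matches the snippet in the question. A minimal sketch along those lines, assuming a secret scope holds the Event Hubs connection string; the namespace, hub, and scope names are placeholders:

```python
import dlt

EH_NAMESPACE = "<eventhubs-namespace>"
EH_NAME = "myeventhub"
KAFKA_BROKER = f"{EH_NAMESPACE}.servicebus.windows.net:9093"
# Pull the connection string from a secret scope rather than hard-coding it.
EH_CONN_STR = dbutils.secrets.get("my-scope", "eh-connection-string")
EH_SASL = (
    "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule "
    f'required username="$ConnectionString" password="{EH_CONN_STR}";'
)

@dlt.table(comment="Raw events from the Event Hubs Kafka surface.")
def raw_kafka_events():
    return (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", KAFKA_BROKER)
        .option("subscribe", EH_NAME)
        .option("kafka.security.protocol", "SASL_SSL")
        .option("kafka.sasl.mechanism", "PLAIN")
        .option("kafka.sasl.jaas.config", EH_SASL)
        .option("startingOffsets", "earliest")
        .load()
    )
```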
Brianben
by New Contributor III
  • 547 Views
  • 1 replies
  • 0 kudos

Getting errors when reading data from Excel: InternalError: pip is not installed for /local_disk

Hi all, we have a daily Databricks job that downloads Excel files from SharePoint and reads them. The job worked fine until today (3 March). We are getting the following error message when running the code to read the Excel file: org.apache.spark.SparkExc...

Latest Reply
Renu_
Valued Contributor II
  • 0 kudos

I think the issue comes from installing Office365-REST-Python-Client using dbutils.library.installPyPI, which seems to create a conflicting Python environment for the Spark executors. Since notebook-specific installs modify the environment dynamically, t...

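For anyone hitting the same error: the usual replacement for dbutils.library.installPyPI (removed in recent runtimes) is a notebook-scoped %pip install, which keeps the driver and executor environments consistent. A minimal sketch of the first notebook cells, assuming only the library name from the thread:

```python
# Cell 1: notebook-scoped install; unlike the deprecated
# dbutils.library.installPyPI, %pip applies to driver and executors.
%pip install Office365-REST-Python-Client

# Cell 2: after the automatic Python restart, import and use as normal.
from office365.sharepoint.client_context import ClientContext
```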
xx123
by New Contributor III
  • 361 Views
  • 1 replies
  • 0 kudos

ETL pipeline works fine, but when executed via Workflow it fails due to StorageAccessError

I have a fairly simple ETL pipeline that uses DLT. It streams data from an ADLS2 storage account and creates a materialized view using two tables. It works fine when I execute it on its own; the materialized view is properly refreshed. Now I wanted to add this as a task to...

Latest Reply
Nivethan_Venkat
Contributor III
  • 0 kudos

Hi @xx123, could you please provide more details on the cluster configuration? I am guessing the cluster policy used when deploying the job in the workflow and the one used when you are testing might be different. Please try to use the same cluster policy ...

Nalapriya
by New Contributor II
  • 1204 Views
  • 3 replies
  • 0 kudos

I have data in S3 Iceberg tables. How do I read it using Databricks Spark SQL?

I tried this method: df = spark.read.format("iceberg").load("s3-bucket-path") but got an error: Multiple sources found for iceberg (com.databricks.sql.transaction.tahoe.uniform.sources.IcebergBrowseOnlyDataSource, org.apache.iceberg.spark.source.Icebe...

Latest Reply
Nalapriya
New Contributor II
  • 0 kudos

Hi @Alberto_Umana, I tried the steps you provided but I'm still not able to read the data, which is in Iceberg format. Any other suggestions would be helpful.

2 More Replies
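One workaround sometimes suggested for this "Multiple sources found" ambiguity is to bypass the short name "iceberg" and name the DataSource class explicitly (or to read through a configured Iceberg catalog instead of a raw path). A minimal, unverified sketch of the first option, using the class name that appears in the error message and a placeholder S3 path:

```python
# Naming the source class explicitly avoids the short-name clash between
# Databricks' UniForm browse-only source and the OSS Iceberg source.
# Requires the Iceberg Spark runtime on the cluster; path is a placeholder.
df = (
    spark.read.format("org.apache.iceberg.spark.source.IcebergSource")
    .load("s3://<bucket>/<path-to-iceberg-table>")
)
df.show()
```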
Srujanm01
by New Contributor III
  • 3085 Views
  • 1 replies
  • 0 kudos

Databricks managed resource group storage cost is high

Hi Community, how do I calculate Databricks storage cost, and where can I see the data that is stored and charged in Databricks? I'm trying to understand the storage cost on a managed resource group, and I'm clueless about the data and where it is stored...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 0 kudos

Hi, how are you doing today? To understand Databricks storage costs in Azure, you can check where your data is stored and how it's being charged. Managed tables, DBFS files, and Unity Catalog volumes are usually stored in an Azure Data Lake Storage (A...

narendra11
by New Contributor
  • 531 Views
  • 1 replies
  • 1 kudos

Need to run an update statement from Databricks using an Azure SQL pyodbc connection

Hi all, I was trying to run an update statement in a Databricks notebook using a pyodbc connection, and I got the following error. I need assistance to solve this. Error: ('01000', "[01000] [unixODBC][Driver Manager]Can't open lib 'ODB...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 1 kudos

Hi Narendra, how are you doing today? As per my understanding, it looks like your Databricks notebook can't find ODBC Driver 17 for SQL Server. You can first check if the driver is installed by running !odbcinst -q -d in a notebook cell. If it's m...

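If !odbcinst -q -d comes back empty, the driver must be installed on the cluster (typically via an init script) before pyodbc can load it. Once the driver resolves, the update itself is standard pyodbc; a minimal sketch with placeholder server, database, and credentials (in practice these would come from a secret scope):

```python
import pyodbc

# Placeholder connection details; pull real credentials from a secret scope.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=<server>.database.windows.net,1433;"
    "DATABASE=<database>;UID=<user>;PWD=<password>"
)
cursor = conn.cursor()
# Parameterized update; the table, column, and values are illustrative.
cursor.execute("UPDATE dbo.orders SET status = ? WHERE id = ?", ("done", 42))
conn.commit()
conn.close()
```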
BobCat62
by New Contributor III
  • 1161 Views
  • 3 replies
  • 0 kudos

Resolved! Delta Live Tables are refreshed in parallel rather than sequentially

Hi experts, I have defined my DLT pipeline as follows: -- Define a streaming table to ingest data from a volume CREATE OR REFRESH STREAMING TABLE pumpdata_bronze TBLPROPERTIES ("myCompanyPipeline.quality" = "bronze") AS SELECT * FROM cloud_files("abfss...

Latest Reply
ashraf1395
Honored Contributor
  • 0 kudos

Hi @BobCat62, so the thing is, DLT now has different modes: direct publishing mode and classic mode (legacy). Look here for more details: https://docs.databricks.com/aws/en/release-notes/product/2025/january#dlt-now-supports-publishing-to-tables-in-m...

2 More Replies
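Worth noting for this thread in general: DLT derives the run order from the dependency graph, so tables refresh in parallel unless one actually reads from the other. A minimal Python sketch of that idea (in SQL the equivalent is selecting FROM the upstream live table rather than from the raw files again); table names follow the thread, while the path and filter column are placeholders:

```python
import dlt
from pyspark.sql import functions as F

@dlt.table
def pumpdata_bronze():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("abfss://raw@<account>.dfs.core.windows.net/pumpdata/")
    )

@dlt.table
def pumpdata_silver():
    # Reading the bronze table (not the raw files) is what creates the edge
    # in the pipeline graph and makes silver wait for bronze.
    return dlt.read_stream("pumpdata_bronze").where(F.col("value").isNotNull())
```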
Venugopal
by New Contributor III
  • 2345 Views
  • 5 replies
  • 1 kudos

databricks asset bundles: Unable to fetch variables from variable-overrides.json

Hi, I am using Databricks CLI 0.227.1 to create a bundle project to deploy a job. As per https://learn.microsoft.com/en-us/azure/databricks/dev-tools/bundles/variables, I wanted variable-overrides.json to hold my variables. I created a js...

Latest Reply
Venugopal
New Contributor III
  • 1 kudos

@ashraf1395 any thoughts on the above issue?

4 More Replies
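For reference, the linked page describes a specific layout: variables are declared in databricks.yml, and the per-target override file lives at .databricks/bundle/&lt;target&gt;/variable-overrides.json relative to the bundle root, mapping variable names to values. A minimal sketch of that file, with hypothetical variable names:

```json
{
  "catalog": "dev_catalog",
  "warehouse_id": "abc123"
}
```

A common cause of "unable to fetch variables" is placing this file elsewhere in the project, so the path is worth double-checking against the doc.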
NehaR
by New Contributor III
  • 3508 Views
  • 4 replies
  • 2 kudos

Set a timeout or auto-termination for long-running queries

Hi, we want to set auto-termination for long-running queries on a Databricks ad hoc cluster. I attempted the two approaches below in my notebook. Despite my understanding that queries should automatically terminate after one hour, with both approaches q...

Latest Reply
JissMathew
Valued Contributor
  • 2 kudos

Hi @NehaR, apply these settings at the cluster-level configuration in the Databricks UI:
1. Go to the Cluster Settings.
2. Add the following Spark configuration:
   spark.databricks.queryWatchdog.enabled true
   spark.databricks.queryWatchdog.timeout 3600
3. Restart the...

3 More Replies
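If editing the cluster config isn't an option, the same keys from the reply above can be tried at session level. A minimal sketch; note these key names are taken from the reply, not independently verified, and Query Watchdog is documented mainly for interactive SQL workloads, so confirm they are honored on your runtime:

```python
# Session-level equivalents of the cluster Spark config from the reply.
spark.conf.set("spark.databricks.queryWatchdog.enabled", "true")
spark.conf.set("spark.databricks.queryWatchdog.timeout", "3600")
```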
tp992
by New Contributor II
  • 2701 Views
  • 1 replies
  • 0 kudos

Using pyspark databricks UDFs with outside function imports

Problem with minimal example: the minimal example below does not run locally with databricks-connect==15.3 but does run within the Databricks workspace. main.py: from databricks.connect import DatabricksSession from module.udf import send_message, send_compl...

Latest Reply
tp992
New Contributor II
  • 0 kudos

I think the solution is in .addArtifact, if I read these correctly: https://kb.databricks.com/en_US/clusters/cannot-access-apache-sparkcontext-object-using-addpyfile and https://www.databricks.com/blog/python-dependency-management-spark-connect but I have not gotten it ...

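The blog post linked above covers exactly this case with the Spark Connect artifact API: shipping a local module so that imports inside UDFs resolve on the cluster. A minimal sketch, assuming the module/udf.py layout from the thread:

```python
from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.getOrCreate()

# Upload the local module to the Spark Connect session so executors can
# import it inside UDFs; the path matches the thread's module/udf.py layout.
spark.addArtifacts("module/udf.py", pyfile=True)
```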
yorkuDE01
by New Contributor II
  • 626 Views
  • 2 replies
  • 1 kudos

Resolved! Keyvault reference for federated connection setup - Azure

I am trying to create a federated connection in Unity Catalog for an Oracle database. The connection configuration GUI seems to ask for the password. Is it possible to put a key vault reference here instead?

Latest Reply
Nivethan_Venkat
Contributor III
  • 1 kudos

Hi @yorkuDE01, I suppose this could be done when you create or set up the federated connection using the API, but I don't think it is possible via the UI to reference a key-vault-scoped secret. Please refer to the documentation...

1 More Replies
Ramonrcn
by New Contributor III
  • 3085 Views
  • 8 replies
  • 1 kudos

Can't read/write tables with a shared cluster

Hi! I have a pipeline that I can't execute successfully on a shared cluster. Basically I read a query from multiple sources on my Databricks instance, including streaming tables (that's the reason I have to use a shared cluster). But when it comes to the par...

Latest Reply
Nivethan_Venkat
Contributor III
  • 1 kudos

Hi @Ramonrcn, if I understand your question, you need MODIFY / ALL PRIVILEGES permission on the table in order to drop or modify it. And if you are performing this change using a Managed Identity / IAM, the same permission ment...

7 More Replies
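On the permissions point: in Unity Catalog, write privileges are granted with standard SQL. A minimal sketch, with placeholder table and group names:

```python
# Run by an identity allowed to grant on this table; names are placeholders.
spark.sql(
    "GRANT SELECT, MODIFY ON TABLE main.my_schema.my_table TO `data-engineers`"
)
```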
