Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Srujanm01
by New Contributor III
  • 3851 Views
  • 1 reply
  • 0 kudos

Databricks Managed RG Storage cost is High

Hi Community, how do I calculate the Databricks storage cost, and where can I see the data that is stored and charged in Databricks? I'm trying to understand the storage cost on a managed resource group, and I'm clueless about the data and where it is stored...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 0 kudos

Hi, how are you doing today? To understand Databricks storage costs in Azure, you can check where your data is stored and how it's being charged. Managed tables, DBFS files, and Unity Catalog volumes are usually stored in an Azure Data Lake Storage (A...
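
As a quick way to see what is actually occupying that storage, a hedged sketch (the catalog and schema names below are placeholders): DESCRIBE DETAIL reports each Delta table's on-disk size and location, which is what the managed resource group's storage account ends up billing for.

```python
# List each table's storage location and size; main.sales is a hypothetical schema.
tables = spark.sql("SHOW TABLES IN main.sales").collect()
for t in tables:
    detail = spark.sql(f"DESCRIBE DETAIL main.sales.{t.tableName}").first()
    print(t.tableName, detail.location, f"{detail.sizeInBytes / 1e9:.2f} GB")
```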

narendra11
by New Contributor
  • 639 Views
  • 1 reply
  • 1 kudos

Need to run an UPDATE statement from Databricks using an Azure SQL pyodbc connection

Hi all, I was trying to run an UPDATE statement in a Databricks notebook using a pyodbc connection, and while doing so I got the following error. I need assistance to solve this. Error: ('01000', "[01000] [unixODBC][Driver Manager]Can't open lib 'ODB...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 1 kudos

Hi Narendra, how are you doing today? As per my understanding, it looks like your Databricks notebook can't find the ODBC Driver 17 for SQL Server. You can first check if the driver is installed by running !odbcinst -q -d in a notebook cell. If it's m...
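
Once the driver is present (for example via an init script that installs msodbcsql17), a minimal sketch of the update itself; the server, database, and credentials below are placeholders:

```python
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver.database.windows.net;"   # hypothetical Azure SQL server
    "DATABASE=mydb;"                          # hypothetical database
    "UID=myuser;PWD=mypassword"               # prefer dbutils.secrets over literals
)
cursor = conn.cursor()
cursor.execute("UPDATE dbo.orders SET status = ? WHERE order_id = ?", ("shipped", 42))
conn.commit()  # pyodbc does not autocommit by default
cursor.close()
conn.close()
```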

BobCat62
by New Contributor III
  • 1428 Views
  • 3 replies
  • 0 kudos

Resolved! Delta Live Tables are refreshed in parallel rather than sequentially

Hi experts, I have defined my DLT pipeline as follows: -- Define a streaming table to ingest data from a volume CREATE OR REFRESH STREAMING TABLE pumpdata_bronze TBLPROPERTIES ("myCompanyPipeline.quality" = "bronze") AS SELECT * FROM cloud_files("abfss...

Latest Reply
ashraf1395
Honored Contributor
  • 0 kudos

Hi @BobCat62, so the thing is that DLT now has different modes: direct publishing mode and classic mode (legacy). Look here for more details: https://docs.databricks.com/aws/en/release-notes/product/2025/january#dlt-now-supports-publishing-to-tables-in-m...
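
On the sequencing itself, DLT only runs tables one after another when one reads from the other; a hedged Python sketch of that dependency pattern (table names and the source path are placeholders):

```python
import dlt
from pyspark.sql import functions as F

@dlt.table(table_properties={"myCompanyPipeline.quality": "bronze"})
def pumpdata_bronze():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")  # assumption: JSON landing files
        .load("abfss://<container>@<account>.dfs.core.windows.net/pumpdata")
    )

@dlt.table(table_properties={"myCompanyPipeline.quality": "silver"})
def pumpdata_silver():
    # Reading the bronze table adds an edge to the pipeline graph, so this
    # table refreshes only after pumpdata_bronze, not in parallel with it.
    return dlt.read_stream("pumpdata_bronze").where(F.col("pressure").isNotNull())
```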

2 More Replies
Venugopal
by New Contributor III
  • 2922 Views
  • 5 replies
  • 1 kudos

databricks asset bundles: Unable to fetch variables from variable-overrides.json

Hi, I am using Databricks CLI 0.227.1 to create a bundle project that deploys a job. As per this, https://learn.microsoft.com/en-us/azure/databricks/dev-tools/bundles/variables, I wanted to have variable-overrides.json hold my variables. I created a js...

Latest Reply
Venugopal
New Contributor III
  • 1 kudos

@ashraf1395 any thoughts on the above issue?

4 More Replies
yorkuDE01
by New Contributor II
  • 788 Views
  • 2 replies
  • 1 kudos

Resolved! Keyvault reference for federated connection setup - Azure

I am trying to create a federated connection in Unity Catalog for an Oracle database. The connection configuration GUI seems to ask for the password. Is it possible to put a Key Vault reference here instead?

Latest Reply
Nivethan_Venkat
Contributor III
  • 1 kudos

Hi @yorkuDE01, I suppose this could be done when you create/set up the federated connection using the API, but I don't think this is possible via the UI, where you could reference a Key Vault-backed secret scope. But please refer to the documentation...
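
As an illustration of the API route, a hedged sketch against the Unity Catalog connections REST endpoint; the connection name, options, and secret scope/keys are placeholders, so verify the exact payload shape against the current docs:

```python
import requests

host = "https://<workspace-url>"
token = dbutils.secrets.get(scope="kv-backed-scope", key="api-token")        # hypothetical scope/key
oracle_pw = dbutils.secrets.get(scope="kv-backed-scope", key="oracle-pass")  # password never hard-coded

resp = requests.post(
    f"{host}/api/2.1/unity-catalog/connections",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "name": "oracle_federated",       # hypothetical connection name
        "connection_type": "ORACLE",
        "options": {
            "host": "oracle.example.com",
            "port": "1521",
            "user": "app_user",
            "password": oracle_pw,        # resolved from the Key Vault-backed scope
        },
    },
)
resp.raise_for_status()
```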

1 More Reply
Ramonrcn
by New Contributor III
  • 3509 Views
  • 8 replies
  • 1 kudos

Can't read/write tables with a shared cluster

Hi! I have a pipeline that I can't execute successfully on a shared cluster. Basically, I read a query from multiple sources on my Databricks instance, including streaming tables (that's the reason I have to use a shared cluster). But when it comes to the par...

Latest Reply
Nivethan_Venkat
Contributor III
  • 1 kudos

Hi @Ramonrcn, if I understand your question, you need the MODIFY / ALL PRIVILEGES permission on the table in order to drop or modify it. And if you are performing this change using a managed identity / IAM, the same permission ment...
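
For reference, a minimal sketch of the grants involved (the principal, catalog, schema, and table names are placeholders):

```python
# The writing principal needs access along the whole securable chain.
spark.sql("GRANT USE CATALOG ON CATALOG main TO `etl-service-principal`")
spark.sql("GRANT USE SCHEMA ON SCHEMA main.sales TO `etl-service-principal`")
spark.sql("GRANT SELECT, MODIFY ON TABLE main.sales.orders TO `etl-service-principal`")
```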

7 More Replies
megan
by New Contributor
  • 3173 Views
  • 1 reply
  • 0 kudos

Databricks is not converting my .ipynb file to .py

I can't find a way to have Databricks automatically convert my Python notebooks to .py files when committing in a Git folder. This seems to contradict the documentation I've read, which says it should convert automatically. I've double-checked that my se...

Latest Reply
Nivethan_Venkat
Contributor III
  • 0 kudos

Hi @megan, the default format for notebooks in the Databricks workspace is now .ipynb, not .py. You can navigate to Settings -> Developer -> Default file format for notebooks and set the default format acros...

sowj02
by New Contributor
  • 542 Views
  • 1 reply
  • 0 kudos

Stream-stream join using MongoDB sink

I am performing a stream-to-stream join in Databricks using MongoDB as a source (readStream()). Both source collections receive data at the same time. Initially I tried using watermarks: orderWithWatermark = order \ .selectExpr("order_id AS orderId",...
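
For context, a self-contained sketch of the watermarked stream-stream join pattern described here (table names and schemas are placeholders standing in for the MongoDB read streams):

```python
from pyspark.sql import functions as F

orders = (
    spark.readStream.table("orders_raw")      # stand-in for the MongoDB source
    .selectExpr("order_id AS orderId", "CAST(order_ts AS TIMESTAMP) AS orderTime")
    .withWatermark("orderTime", "10 minutes")
)
payments = (
    spark.readStream.table("payments_raw")    # second streaming source
    .selectExpr("order_id AS payOrderId", "CAST(pay_ts AS TIMESTAMP) AS payTime")
    .withWatermark("payTime", "10 minutes")
)

# Inner stream-stream joins need both watermarks and a time-bounded condition
# so the engine can purge old state.
joined = orders.join(
    payments,
    F.expr("""
        orderId = payOrderId AND
        payTime BETWEEN orderTime AND orderTime + INTERVAL 15 minutes
    """),
)
```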

Latest Reply
cgrant
Databricks Employee
  • 0 kudos

There is not enough information in this high-level error message. Please expand the full stack trace and feel free to post it here.

jeremy98
by Honored Contributor
  • 2549 Views
  • 9 replies
  • 0 kudos

Restarting an always-running cluster doesn't free the memory?

Hello community, I was working on optimising the driver memory, since there is code that is not optimised for Spark, and I was planning temporarily to restart the cluster to free up the memory. That could be a potential solution, since if the cluster i...

Latest Reply
jeremy98
Honored Contributor
  • 0 kudos

Any suggestion, Mr. @Alberto_Umana?

8 More Replies
jkb7
by New Contributor III
  • 764 Views
  • 1 reply
  • 0 kudos

How can we import the exception "MetadataChangedException"?

I regularly get MetadataChangedException: [DELTA_METADATA_CHANGED] MetadataChangedException: The metadata of the Delta table has been changed by a concurrent update. Please try the operation again. What is the recommended way to import this specific ty...

Latest Reply
Nik_Vanderhoof
Contributor
  • 0 kudos

Hi! It depends on whether you're using Scala or Python. If you're using Scala, you should be able to import `io.delta.exceptions.MetadataChangedException`, which you can see defined here: https://github.com/delta-io/delta/blob/master/spark/src/main/sc...
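
On the Python side, the delta-spark package exposes a matching class in delta.exceptions; a hedged retry sketch (the table and UPDATE below are placeholders):

```python
import time
from delta.exceptions import MetadataChangedException

for attempt in range(3):
    try:
        spark.sql("UPDATE main.sales.orders SET status = 'done' WHERE status = 'new'")
        break
    except MetadataChangedException:
        time.sleep(2 ** attempt)  # back off, then retry the whole operation
else:
    raise RuntimeError("gave up after repeated concurrent metadata changes")
```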

Akshay_Petkar
by Valued Contributor
  • 1202 Views
  • 1 reply
  • 0 kudos

Issue with Liquid Clustering on Partitioned Table in Databricks

I recently tried applying Liquid Clustering to a partitioned table in Databricks and encountered the following error: [DELTA_ALTER_TABLE_CLUSTER_BY_ON_PARTITIONED_TABLE_NOT_ALLOWED] ALTER TABLE CLUSTER BY cannot be applied to a partitioned table. I u...

Latest Reply
koji_kawamura
Databricks Employee
  • 0 kudos

Hi @Akshay_Petkar, since we cannot use Liquid Clustering with a partitioned table, the only way I can think of is migrating from partitioning to Liquid Clustering. The same partitioning key columns and the additional columns you wanted to add can be ...
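
A hedged sketch of that migration (table and column names are placeholders): rebuild the table unpartitioned with CLUSTER BY, then swap it in once validated.

```python
spark.sql("""
    CREATE OR REPLACE TABLE main.sales.events_clustered
    CLUSTER BY (event_date, country)   -- former partition key plus a new column
    AS SELECT * FROM main.sales.events
""")
# Once validated, retire the old table, e.g.:
# spark.sql("ALTER TABLE main.sales.events RENAME TO main.sales.events_old")
```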

joseph_sf
by New Contributor
  • 1283 Views
  • 1 reply
  • 1 kudos

Implement Delta tables optimized for Databricks SQL service

This question is on the Databricks Certified Data Engineer Professional exam, in section 1: "Implement Delta tables optimized for Databricks SQL service". I do not understand what is being asked by this question. I would assume that there are different way...

Latest Reply
koji_kawamura
Databricks Employee
  • 1 kudos

Hi @joseph_sf, I assume you are referring to the exam guide PDF file. As you assumed, there are different techniques to optimize a Delta table. Some of them are already mentioned in the other bullet points in the same section 1, such as partitioning...
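
For a concrete flavor, a hedged sketch of a few standard Delta optimizations that section likely has in mind (table and column names are placeholders):

```python
spark.sql("OPTIMIZE main.sales.orders ZORDER BY (customer_id)")   # compact files, co-locate a hot column
spark.sql("ANALYZE TABLE main.sales.orders COMPUTE STATISTICS")   # statistics for the query optimizer
spark.sql("VACUUM main.sales.orders")                             # drop files no longer referenced
```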

drollason
by New Contributor II
  • 854 Views
  • 1 reply
  • 1 kudos

Resolved! Issue with UDFs and DLT where the UDF is multi-layered and externalized

Having an issue getting UDFs to work within a DLT pipeline where the UDF is externalized outside of the notebook and attempts to call other functions. The end goal is to put unit-test coverage around the various functions, hence the pattern. For test purposes I cre...

Latest Reply
bgiesbrecht
Databricks Employee
  • 1 kudos

Hi @drollason. In DLT pipelines, I would try packaging up your code as a wheel and then installing it via pip. I had the same scenario as you and was able to bring in my custom code this way.
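
A hedged sketch of that pattern (the module, function, wheel path, and table names are all placeholders):

```python
# First cell of the DLT notebook (wheel uploaded to a hypothetical UC volume):
# %pip install /Volumes/main/utils/wheels/my_udfs-0.1.0-py3-none-any.whl

import dlt
from pyspark.sql import functions as F
from my_udfs.cleaning import normalize_name  # plain, unit-testable function from the wheel

normalize_udf = F.udf(normalize_name)  # wrap it as a UDF inside the pipeline notebook

@dlt.table
def customers_clean():
    return spark.read.table("customers_raw").withColumn("name", normalize_udf("name"))
```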

nolanreilly
by New Contributor II
  • 1360 Views
  • 1 reply
  • 1 kudos

Impossible to read a custom pipeline? (Scala)

I have created a custom transformer to be used in an ML pipeline. I was able to write the pipeline to storage by extending the transformer class with DefaultParamsWritable. Reading the pipeline back in, however, does not seem possible in Scala. I have...
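
The question is Scala-specific (where the usual fix is a companion object extending DefaultParamsReadable), but the round-trip pattern is easiest to show in PySpark; a toy sketch with a hypothetical transformer:

```python
from pyspark.ml import Transformer
from pyspark.ml.util import DefaultParamsReadable, DefaultParamsWritable
from pyspark.sql import functions as F

class UpperCaser(Transformer, DefaultParamsReadable, DefaultParamsWritable):
    """Toy transformer: upper-cases a hard-coded column."""
    def _transform(self, df):
        return df.withColumn("name", F.upper("name"))

UpperCaser().save("/tmp/upper_caser")            # persists via DefaultParamsWritable
restored = UpperCaser.load("/tmp/upper_caser")   # restores via DefaultParamsReadable
```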

Latest Reply
WarrenO
New Contributor III
  • 1 kudos

Hi, did you ever find a solution for this?

NaeemS
by New Contributor III
  • 2064 Views
  • 2 replies
  • 4 kudos

Custom transformers with MLflow

Hi everyone, I have created a Spark pipeline in which I have a stage that is a custom transformer. Now I am using feature stores to log my model, but the issue is that the custom transformer stage is not serialized properly and is not logged along wi...

Latest Reply
WarrenO
New Contributor III
  • 4 kudos

Hi @NaeemS, did you ever get a solution to this problem? I've now encountered it myself. When I save the pipeline using MLflow log_model, I am able to load the model fine. When I log it with the Databricks Feature Engineering package, it throws an erro...

1 More Reply
