Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

saicharandeepb
by New Contributor III
  • 29 Views
  • 3 replies
  • 0 kudos

Looking for Suggestions: Designed a Decision Tree to Recommend Optimal VM Types for Workloads

Hi everyone! I recently designed a decision tree model to help recommend the most suitable VM types for different kinds of workloads in Databricks. Thought process behind the design: determining the optimal virtual machine (VM) for a workload is heavil...

Latest Reply
jameswood32
New Contributor III

Your decision tree idea sounds solid! To improve it, consider including additional factors like network bandwidth, storage IOPS, and workload burst patterns. Also, think about cost-performance trade-offs and potential scaling requirements. Validating...
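For anyone sketching something similar, here is a toy illustration of the idea; every feature name, training row, and VM family below is invented for illustration only:

# Toy sketch: a decision tree over hypothetical workload features.
# All feature names, rows, and VM families are made up.
from sklearn.tree import DecisionTreeClassifier

# columns: [vCPUs needed, memory GiB, storage IOPS, network Gbps]
X = [
    [4, 16, 500, 1],
    [16, 128, 2000, 10],
    [8, 32, 8000, 2],
]
y = ["general_purpose", "memory_optimized", "storage_optimized"]

model = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(model.predict([[8, 64, 1000, 5]]))  # recommend a family for a new workload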

2 More Replies
deng_dev
by New Contributor III
  • 35 Views
  • 3 replies
  • 1 kudos

Databricks Apps pricing

Hi everyone! I was investigating Databricks Apps as a solution for my task and didn't fully understand the pricing. I found this page, and it indicates a cost of $75 / DBU on the Premium subscription plan when using the AWS cloud. Is that the full cost, or will the...

Latest Reply
ManojkMohan
Honored Contributor

@deng_dev The $75 per DBU Premium subscription plan price for Databricks Apps on AWS shown on the Databricks Apps pricing page reflects the charge from Databricks itself: https://www.databricks.com/product/pricing/databricks-apps. However, this is not ...

2 More Replies
JameDavi_51481
by Contributor
  • 9846 Views
  • 11 replies
  • 13 kudos

Can we add tags to Unity Catalog through Terraform?

We use Terraform to manage most of our infrastructure, and I would like to extend this to Unity Catalog. However, we are extensive users of tagging to categorize our datasets, and the only programmatic method I can find for adding tags is to use SQL ...

Latest Reply
jlieow
Databricks Employee

In case anyone comes across this, have a look at databricks_entity_tag_assignment and see if it suits your needs.
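For reference, the SQL route the original poster mentions looks like this when run from a notebook; the table and tag names here are placeholders:

# Unity Catalog tagging via SQL, issued from Python. Names are hypothetical.
spark.sql("""
    ALTER TABLE main.sales.orders
    SET TAGS ('domain' = 'finance', 'contains_pii' = 'false')
""")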

10 More Replies
DataGirl
by New Contributor
  • 16320 Views
  • 7 replies
  • 2 kudos

Multi-value parameter on Power BI Paginated / SSRS connected to Databricks using ODBC

Hi all, I'm wondering if anyone has had any luck setting up multi-valued parameters on SSRS using an ODBC connection to Databricks? I'm getting a "Cannot add multi value query parameter" error every time I change my parameter to multi-value. In the query s...

Latest Reply
kashti123
Visitor

Hi, I am also trying to set multi-value parameters using a dynamic SQL expression. However, the report gives an error that multi-value parameters are not supported by the data extension. Any help on this would be highly appreciated. Thanks, Drishti

6 More Replies
kcyugesh
by Visitor
  • 29 Views
  • 1 reply
  • 0 kudos

Delta Live Tables not showing in workspace (Azure Databricks with premium plan)

I have a premium plan and owner-level access.

Latest Reply
szymon_dybczak
Esteemed Contributor III

Hi @kcyugesh, they changed the name from DLT to Lakeflow Declarative Pipelines, so you won't find the DLT name in the UI. Click Jobs & Pipelines, then ETL pipeline, to access the declarative pipeline editor.

DE5
by Visitor
  • 27 Views
  • 1 reply
  • 1 kudos

Unable to see the Assistant suggested code and current code side by side

Hi, I'm unable to see the Assistant's suggested code and my current code side by side. Previously I was able to see my code and the Assistant's suggested code side by side, which helped me understand the changes. Please suggest if there is any way to do this. T...

Latest Reply
ManojkMohan
Honored Contributor

@DE5 Some recent updates moved comparison features into the SQL Editor side panel or rely on "Cell Actions," where you can generate or format code and then see the differences before applying changes: https://www.databricks.com/blog/introducing-new-sql-...

Dhruv-22
by Contributor II
  • 268 Views
  • 7 replies
  • 6 kudos

Reading an empty JSON file on serverless gives an error

I ran a Databricks notebook to do incremental loads from files in the raw layer to bronze-layer tables. Today, I encountered a case where the delta file was empty. I tried running it manually on serverless compute and encountered an error. df = spark....

Latest Reply
K_Anudeep
Databricks Employee

Hello @Dhruv-22, can you share the schema of the df? Do you have a _corrupt_record column in your DataFrame? If yes, where are you getting it from, because you said it's an empty file, correct? As per the design, Spark blocks queries that only referen...
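For anyone hitting the same error, a minimal sketch of the usual guard, assuming your own path and schema: supplying an explicit schema skips inference, so an empty file simply yields an empty DataFrame.

# Sketch: an explicit schema avoids schema inference on an empty JSON file.
# The path and columns below are placeholders.
from pyspark.sql.types import StructType, StructField, StringType, LongType

schema = StructType([
    StructField("id", LongType(), True),
    StructField("name", StringType(), True),
])

df = spark.read.schema(schema).json("/Volumes/main/raw/events/")
print(df.count())  # 0 for an empty file, instead of an inference error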

6 More Replies
aav331
by Visitor
  • 60 Views
  • 1 reply
  • 0 kudos

Unable to install libraries from requirements.txt in a Serverless Job and spark_python_task

I am running into the following error while trying to deploy a serverless job running a spark_python_task with Git as the source for the code. The job was deployed as part of a DAB from a GitHub Actions runner. Run failed with error message: Library i...

Latest Reply
Louis_Frolio
Databricks Employee

Hey @aav331, here's a focused analysis of the issue and how to fix it. Summary of the problem: the job is a serverless spark_python_task sourced from Git, and it fails to install packages from a requirements.txt because the file i...

dbdev
by Contributor
  • 108 Views
  • 3 replies
  • 0 kudos

Lakehouse Federation - fetch size parameter for optimization

Hi, we use Lakehouse Federation to connect to a database. A performance recommendation is to use 'fetchSize' (Lakehouse Federation performance recommendations - Azure Databricks | Microsoft Learn): SELECT * FROM mySqlCatalog.schema.table WITH ('fetchSiz...

Latest Reply
Louis_Frolio
Databricks Employee

Hello @dbdev, I did some digging and here are some suggestions. The `fetchSize` parameter in Lakehouse Federation is currently only available through SQL syntax using the `WITH` clause, as documented in the performance recommendations. Unfortunately...
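As a concrete form of that SQL-only route, here is a sketch with placeholder names; check the exact WITH clause syntax against the performance-recommendations page linked above:

# Sketch: fetchSize hint via the SQL WITH clause, issued from Python.
# Catalog/schema/table names are placeholders.
df = spark.sql(
    "SELECT * FROM mySqlCatalog.schema.table WITH ('fetchSize' 10000)"
)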

2 More Replies
hgm251
by New Contributor
  • 37 Views
  • 1 reply
  • 1 kudos

badrequest: cannot create online table is being deprecated. creating new online table is not allowed

Hello! This seems so sudden that we cannot create online tables anymore. Is there a workaround to be able to create online tables temporarily, as we need more time to move to synced tables? #online_tables

Latest Reply
nayan_wylde
Esteemed Contributor

Yes, the Databricks online tables (legacy) are being deprecated, and after January 15, 2026, you will no longer be able to access or create them: https://docs.databricks.com/aws/en/machine-learning/feature-store/migrate-from-online-tables. Here are a few ...

databricksero
by New Contributor II
  • 71 Views
  • 2 replies
  • 3 kudos

Resolved! Databricks Bundle Validation Error After CLI Upgrade (0.274.0 → 0.276.0)

After upgrading the Databricks CLI from version 0.274.0 to 0.276.0, bundle validation is failing with an error indicating that my configuration is formatted for "open-source Spark Declarative Pipelines" while the CLI now only supports "Lakeflow Decla...

Latest Reply
szymon_dybczak
Esteemed Contributor III

Hi @databricksero, it's a bug. I've checked, and the PR fixing it is already merged to the main branch. Check the GitHub thread below, then update the Databricks CLI once they cut a new release (they should soon release a version without the bug). Fix oss...

1 More Replies
Y_WANG
by New Contributor
  • 34 Views
  • 1 reply
  • 0 kudos

Want to use DataFrame equality functions but also NumPy >= 2.0

In my team, we have a lot of data science workflows using Spark and Pandas. To ensure the stability of our workflows, we need to implement unit tests. Recently, I found the DataFrame equality test functions introduced in Spark 3.5, which se...

Latest Reply
ManojkMohan
Honored Contributor

@Y_WANG The root cause of the AttributeError you see when importing assertDataFrameEqual from pyspark.testing in Spark 3.5 is that Spark's code uses the deprecated np.NaN attribute, which was removed in NumPy 2.0 (replaced by np.nan). This break...
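Until that is patched upstream, besides pinning numpy<2, a commonly used stopgap (a sketch, not an officially documented fix) is to restore the removed alias before pyspark.testing is imported:

# Stopgap sketch: restore the alias NumPy 2.0 removed so Spark 3.5's
# np.NaN reference still resolves. Run this before importing pyspark.testing.
import numpy as np

if not hasattr(np, "NaN"):
    np.NaN = np.nan

from pyspark.testing import assertDataFrameEqual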

der
by Contributor II
  • 115 Views
  • 5 replies
  • 1 kudos

EXCEL_DATA_SOURCE_NOT_ENABLED Excel data source is not enabled in this cluster

I want to read an Excel xlsx file on DBR 17.3. On the cluster, the library dev.mauch:spark-excel_2.13:4.0.0_0.31.2 is installed. The V1 implementation works fine: df = spark.read.format("dev.mauch.spark.excel").schema(schema).load(excel_file) display(df) V2...

Latest Reply
mmayorga
Databricks Employee

Hi @der, first of all, thank you for your patience and for providing more information about your case. Use of ".format("excel")": I replicated your cluster config in Azure. Without installing any library, I was able to run and load the xlsx fil...
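For context, the V1 reader call from the question looks like this; the schema and file path here are placeholders:

# The V1 spark-excel reader path reported to work, with the
# dev.mauch:spark-excel library installed. Schema and path are placeholders.
from pyspark.sql.types import StructType, StructField, StringType

schema = StructType([StructField("col_a", StringType(), True)])
excel_file = "/Volumes/main/default/files/report.xlsx"

df = spark.read.format("dev.mauch.spark.excel").schema(schema).load(excel_file)
display(df)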

4 More Replies
erigaud
by Honored Contributor
  • 2976 Views
  • 10 replies
  • 8 kudos

Databricks asset bundles and Dashboards - pass parameters depending on bundle target

Hello everyone! Since Databricks Asset Bundles can now be used to deploy dashboards, I'm wondering how to pass parameters so that the queries for the dev dashboard query the dev catalog, the dashboard in stg queries the stg catalog, etc. Is there any...

Latest Reply
Coffee77
Contributor

What I did as a workaround works pretty well, but you'll need to duplicate the dashboard JSON code per environment and then replace the catalog names. It is not the perfect solution, but the only way I could find to include these deployments in my Databric...
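A rough sketch of that duplication step; the file names, placeholder token, and catalog mapping are all made up:

# Sketch of the workaround above: copy the dashboard JSON once per target,
# substituting the catalog name. Paths, token, and mapping are placeholders.
from pathlib import Path

src = Path("src/sales.lvdash.json").read_text()
for target, catalog in {"dev": "dev_catalog", "stg": "stg_catalog"}.items():
    Path(f"src/sales_{target}.lvdash.json").write_text(
        src.replace("__CATALOG__", catalog)
    )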

9 More Replies
bidek56
by Contributor
  • 118 Views
  • 3 replies
  • 0 kudos

Location of spark.scheduler.allocation.file

In DBR 16.4 LTS, I am trying to add the following Spark config: spark.scheduler.allocation.file: file:/Workspace/init/fairscheduler.xml. But the all-purpose cluster is throwing this error. Spark error: Driver down cause: com.databricks.backend.daemon.dri...

Latest Reply
bidek56
Contributor

@mark_ott Setting WSFS_ENABLE=false does not affect anything. Thx

2 More Replies
