Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

GJ2
by New Contributor II
  • 10394 Views
  • 11 replies
  • 1 kudos

Install the ODBC Driver 17 for SQL Server

Hi, I am not a Data Engineer; I want to connect to SSAS. It looks like it can be connected to through pyodbc; however, it looks like I need to install "ODBC Driver 17 for SQL Server" using the following command. How do I install the driver on the cluster an...

Latest Reply
kathrynshai
New Contributor
  • 1 kudos

Hello Databricks Community, you're right: the "ODBC Driver 17 for SQL Server" is typically needed to connect to SSAS using pyodbc. The process for installing it on your cluster varies depending on your operating system: for Windows, you can download the ins...

10 More Replies
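For readers with the same question, a minimal sketch of the install step the reply describes, assuming an Ubuntu-based Databricks Runtime. The Ubuntu version in the repository path is an assumption, and for multi-node clusters the same commands belong in a cluster-scoped init script rather than a notebook cell.

```python
# Hedged sketch: install "ODBC Driver 17 for SQL Server" (msodbcsql17) on the
# driver node so pyodbc can load it. Assumes an Ubuntu-based runtime; the
# 22.04 repo path is an assumption. Notebook cells run as root on Databricks.
import subprocess

commands = [
    "curl -fsSL https://packages.microsoft.com/keys/microsoft.asc | apt-key add -",
    "curl -fsSL https://packages.microsoft.com/config/ubuntu/22.04/prod.list"
    " > /etc/apt/sources.list.d/mssql-release.list",
    "apt-get update",
    "ACCEPT_EULA=Y apt-get install -y msodbcsql17 unixodbc-dev",
]
for cmd in commands:
    subprocess.run(cmd, shell=True, check=True)  # shell=True for the pipe/redirect
```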
QuanSun
by New Contributor II
  • 1486 Views
  • 6 replies
  • 3 kudos

How to select performance mode for Databricks Delta Live Tables

Hi everyone, based on the official documentation: "For triggered pipelines, you can select the serverless compute performance mode using the Performance optimized setting in the pipeline scheduler. When this setting is disabled, the pipeline uses standard perfor...

Latest Reply
mimimon
New Contributor II
  • 3 kudos

May I know if this is automatically enabled for all DLT tables? How do we monitor the timestamps of it being turned on and off, and the ID of who did it? Or is it automatically configured?

5 More Replies
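On the monitoring question in the latest reply: pipeline setting changes show up in the audit log. A hedged sketch of querying it, assuming system tables are enabled in the workspace and following the documented system.access.audit schema; `spark` is the ambient SparkSession in a notebook.

```python
# Sketch: who created or edited DLT pipelines, and when. The request_params
# column carries the submitted pipeline settings, so toggles of the
# performance-optimized setting can be traced there.
events = spark.sql("""
    SELECT event_time,
           user_identity.email AS actor,
           action_name,
           request_params
    FROM system.access.audit
    WHERE service_name = 'deltaPipelines'
      AND action_name IN ('create', 'edit')
    ORDER BY event_time DESC
""")
events.show(truncate=False)
```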
Anonymous
by Not applicable
  • 11355 Views
  • 9 replies
  • 8 kudos

Resolved! Data frame takes an unusually long time to write for small data sets

We have configured the workspace with our own VPC. We need to extract data from DB2 and write it in Delta format. We tried 550k records with 230 columns; it took 50 minutes to complete the task. 15M records take more than 18 hours. Not sure why this takes suc...

Latest Reply
Sown7
New Contributor II
  • 8 kudos

Facing the same issue: I have ~700k rows and am trying to write this table, but it takes forever. One time it took only about 5 seconds to write, but since then, whenever we update the analysis and rewrite the table, it takes very long an...

8 More Replies
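A frequent cause of loads like this is that the JDBC read from DB2 runs over a single connection, so the whole extract is one task no matter how large the cluster is. A minimal sketch of a partitioned read; the URL, credentials, table, key column, and bounds are all placeholders.

```python
# Sketch: read DB2 over N parallel JDBC connections, then write Delta.
# Everything in <angle brackets> and the bounds are placeholders; pick a
# roughly uniformly distributed numeric key for partitionColumn.
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:db2://<host>:50000/<database>")
    .option("dbtable", "<schema>.<table>")
    .option("user", "<user>")
    .option("password", "<password>")
    .option("partitionColumn", "<numeric_key>")
    .option("lowerBound", "1")
    .option("upperBound", "15000000")
    .option("numPartitions", "16")   # 16 concurrent reads
    .load()
)
df.write.format("delta").mode("overwrite").saveAsTable("bronze.db2_extract")
```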
saicharandeepb
by New Contributor III
  • 77 Views
  • 5 replies
  • 0 kudos

Looking for Suggestions: Designed a Decision Tree to Recommend Optimal VM Types for Workloads

Hi everyone! I recently designed a decision tree model to help recommend the most suitable VM types for different kinds of workloads in Databricks. Thought process behind the design: determining the optimal virtual machine (VM) for a workload is heavil...

Latest Reply
Coffee77
Contributor
  • 0 kudos

It looks interesting and I'll take a deeper look! At first sight, as a suggestion, I would include a new decision node to conditionally include VMs ready for "delta cache acceleration", now called "disk caching". These VMs have local SSD volumes so that the...

4 More Replies
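To make the suggested extension concrete, a toy version of such a decision node in code; the workload attributes and VM categories below are hypothetical illustrations, not the original model.

```python
# Illustrative only: a hand-rolled decision rule in the spirit of the thread,
# with the suggested "disk caching" branch added. Attribute names and VM
# categories are made up for the example.
def recommend_vm(workload: dict) -> str:
    if workload.get("io_bound") and workload.get("rereads_same_data"):
        # VMs with local SSDs benefit from disk caching (formerly
        # "delta cache acceleration"), per the reply above.
        return "storage-optimized (local SSD, disk caching)"
    if workload.get("memory_intensive"):
        return "memory-optimized"
    if workload.get("cpu_intensive"):
        return "compute-optimized"
    return "general-purpose"

print(recommend_vm({"io_bound": True, "rereads_same_data": True}))
```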
73334
by New Contributor II
  • 3651 Views
  • 2 replies
  • 1 kudos

Dedicated Access Mode Interactive Cluster with a Service Principal

Hi, I am wondering if it is possible to set up an interactive cluster in dedicated access mode and have that user be a machine user. I've tried the cluster creation API, /api/2.1/clusters/create, and set the user name to the service principal na...

Latest Reply
Isi
Honored Contributor III
  • 1 kudos

Hello @73334, I tested it and it's possible. You have to use the application ID. Hope this helps, Isi

1 More Reply
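For anyone landing here later, a hedged sketch of the call described in this thread, with host, token, and node details as placeholders; the key point from the reply is that single_user_name takes the service principal's application ID, not its display name.

```python
# Sketch: create a dedicated (single-user) interactive cluster assigned to a
# service principal via the Clusters API. All <angle-bracket> values are
# placeholders; spark_version and node_type_id are illustrative.
import requests

resp = requests.post(
    "https://<workspace-host>/api/2.1/clusters/create",
    headers={"Authorization": "Bearer <token>"},
    json={
        "cluster_name": "sp-dedicated-cluster",
        "spark_version": "15.4.x-scala2.12",
        "node_type_id": "Standard_DS3_v2",
        "num_workers": 1,
        "data_security_mode": "SINGLE_USER",
        "single_user_name": "<service-principal-application-id>",
    },
)
resp.raise_for_status()
print(resp.json()["cluster_id"])
```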
Coffee77
by Contributor
  • 61 Views
  • 5 replies
  • 2 kudos

Resolved! Databricks Asset Bundles - High Level Diagrams Flow

Hi guys! I've recently been working on fully understanding (and helping others understand) Databricks Asset Bundles (DAB), and having fun creating some high-level diagrams of the DAB flow. The first one shows the flow for a simple deployment in PROD, and the second one contains...

Latest Reply
Coffee77
Contributor
  • 2 kudos

Updated the first high-level diagram. Now it looks like this: DAB High Level Diagram v1.1

4 More Replies
kanikvijay9
by New Contributor III
  • 29 Views
  • 2 replies
  • 5 kudos

Optimizing Delta Table Writes for Massive Datasets in Databricks

Problem statement. In one of my recent projects, I faced a significant challenge: writing a huge dataset of 11,582,763,212 rows and 2,068 columns to a Databricks managed Delta table. The initial write operation took 22.4 hours using the following setup:...

Latest Reply
kanikvijay9
New Contributor III
  • 5 kudos

Hey @Louis_Frolio, thank you for the thoughtful feedback and great suggestions! A few clarifications: AQE is already enabled in my setup, and it definitely helped reduce shuffle overhead during the write. Regarding column pruning, in this case, the fina...

1 More Reply
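For context, a sketch of the write-side levers discussed in this thread (AQE, optimized writes, explicit repartitioning). The numbers are illustrative rather than tuned for the poster's 11.6B-row, 2,068-column table, and `df` stands in for the source DataFrame.

```python
# Sketch: common knobs for very large/wide Delta writes. Values are
# illustrative; the target table name is a placeholder.
spark.conf.set("spark.sql.adaptive.enabled", "true")                    # AQE
spark.conf.set("spark.databricks.delta.optimizeWrite.enabled", "true")  # bin-pack output files

(
    df.repartition(2048)   # right-size tasks and output files for the cluster
      .write.format("delta")
      .mode("overwrite")
      .saveAsTable("catalog.schema.huge_table")
)
```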
eyalholzmann
by Visitor
  • 30 Views
  • 1 reply
  • 0 kudos

Does VACUUM on Delta Lake also clean Iceberg metadata when using Iceberg Uniform feature?

I'm working with Delta tables using the Iceberg Uniform feature to enable Iceberg-compatible reads. I'm trying to understand how metadata cleanup works in this setup. Specifically, does the VACUUM operation, which removes old Delta Lake metadata based ...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Great question @eyalholzmann. In Databricks Delta Lake with the Iceberg Uniform feature, VACUUM operations on the Delta table do NOT automatically clean up the corresponding Iceberg metadata. The two metadata layers are managed separately, and unde...

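For reference, the Delta-side cleanup the question asks about; per the reply, it prunes unreferenced Delta data files by retention period but leaves the Iceberg metadata that Uniform maintains to be expired separately. The table name and retention window are placeholders.

```python
# Sketch: Delta VACUUM only. Per the reply above, this does not touch the
# Iceberg metadata written by the Uniform feature.
spark.sql("VACUUM catalog.schema.my_uniform_table RETAIN 168 HOURS")
```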
RobsonNLPT
by Contributor III
  • 3736 Views
  • 2 replies
  • 0 kudos

Databricks Rest API Statement Execution - External Links

Hi. I've tested the ADB REST API to execute queries on Databricks SQL Serverless. Using INLINE as the disposition, I get the JSON array with my correct results, but using EXTERNAL_LINKS I get the chunks, yet the external_link (URL starting with http://stor...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

The issue you're experiencing with the Databricks SQL Serverless REST API in EXTERNAL_LINKS mode, where the external_link URL (http://storage-proxy.databricks.com/...) does not work but you can access chunks directly via /api/2.0/sql/statements/{stat...

1 More Reply
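A hedged sketch of the paging pattern this thread converges on: rather than dereferencing a broken external_link host, walk the documented chunk endpoint and follow each chunk's presigned URL. Host, token, warehouse ID, and query are placeholders, and the sketch assumes the statement finishes within the wait timeout (production code should poll the statement status).

```python
# Sketch: execute a statement with EXTERNAL_LINKS disposition, then page
# through result chunks. Presigned chunk URLs must be fetched WITHOUT the
# Authorization header.
import requests

HOST = "https://<workspace-host>"
HEADERS = {"Authorization": "Bearer <token>"}

stmt = requests.post(
    f"{HOST}/api/2.0/sql/statements",
    headers=HEADERS,
    json={
        "warehouse_id": "<warehouse-id>",
        "statement": "SELECT * FROM samples.nyctaxi.trips LIMIT 100000",
        "disposition": "EXTERNAL_LINKS",
        "format": "JSON_ARRAY",
        "wait_timeout": "30s",
    },
).json()

chunk_index = 0
while chunk_index is not None:
    chunk = requests.get(
        f"{HOST}/api/2.0/sql/statements/{stmt['statement_id']}/result/chunks/{chunk_index}",
        headers=HEADERS,
    ).json()
    link = chunk["external_links"][0]
    rows = requests.get(link["external_link"]).json()  # presigned URL, no auth header
    chunk_index = link.get("next_chunk_index")         # None on the last chunk
```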
rbee
by New Contributor II
  • 5029 Views
  • 1 reply
  • 0 kudos

Connect to SQL Server Analysis Services (SSAS) to run DAX queries using Python

Hi, I have a Power BI server which I'm able to connect to through SSMS. I tried using pyodbc to connect to the same in Databricks, but it is throwing the error below: OperationalError: ('HYT00', '[HYT00] [Microsoft][ODBC Driver 17 for SQL Server]Login timeout exp...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

Your understanding is correct: Power BI’s data model is stored in an Analysis Services (SSAS) engine, not a traditional SQL Server database. This means that while SSMS may connect to Power BI Premium datasets via XMLA endpoints, attempting to use pyo...

RobsonNLPT
by Contributor III
  • 4514 Views
  • 4 replies
  • 0 kudos

Connection timeout when connecting to MongoDB using MongoDB Connector for Spark 10.x

Hi. I'm testing a Databricks connection to a Mongo cluster v7 (Azure cluster) using the library org.mongodb.spark:mongo-spark-connector_2.13:10.4.1. I can connect using Compass, but I get a timeout error in my ADB notebook: MongoTimeoutException: Timed ...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

The error you're seeing, MongoTimeoutException referencing localhost:27017, suggests your Databricks cluster is trying to connect to MongoDB using the wrong address, or that it cannot properly reach the MongoDB cluster endpoint from the notebook, ev...

3 More Replies
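A sketch of a read with the MongoDB Spark Connector 10.x for comparison; the detail that usually explains a MongoTimeoutException like the one above is that the connector falls back to localhost:27017 when no connection string is set on the read. The URI and namespace are placeholders.

```python
# Sketch: connector 10.x uses format "mongodb" and the connection.uri option;
# without connection.uri the driver defaults to localhost:27017.
df = (
    spark.read.format("mongodb")
    .option("connection.uri", "mongodb+srv://<user>:<password>@<cluster-host>/?tls=true")
    .option("database", "<database>")
    .option("collection", "<collection>")
    .load()
)
df.printSchema()
```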
kertsman_nm
by New Contributor
  • 3826 Views
  • 1 reply
  • 0 kudos

Trying to use Broadcast to run Presidio distributed

Hello, I am currently evaluating Microsoft's Presidio de-identification libraries for my organization and would like to see if we can take advantage of Spark's broadcast capabilities, but I am getting an error message: "[BROADCAST_VARIABLE_NOT_LOA...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

You’re encountering the [BROADCAST_VARIABLE_NOT_LOADED] error because Databricks in shared access mode cannot use broadcast variables with non-serializable Python objects (such as your Presidio engines) due to cluster architecture limitations. The cl...

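A hedged sketch of the usual workaround for this error: construct the Presidio engine inside each task with mapInPandas instead of broadcasting it, so the non-picklable object never crosses the driver/executor boundary. The input DataFrame and its column names are placeholders, and presidio-analyzer must be installed on the cluster.

```python
# Sketch: per-task Presidio initialization; nothing is broadcast.
from presidio_analyzer import AnalyzerEngine

def count_findings(batches):
    analyzer = AnalyzerEngine()  # built on the executor, never serialized
    for pdf in batches:          # pdf is a pandas DataFrame batch
        pdf["n_findings"] = pdf["text"].map(
            lambda t: len(analyzer.analyze(text=t, language="en"))
        )
        yield pdf

# df is assumed to have columns: id (long), text (string)
result = df.mapInPandas(count_findings, schema="id long, text string, n_findings long")
```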
