Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

73334
by New Contributor II
  • 4014 Views
  • 3 replies
  • 1 kudos

Dedicated Access Mode Interactive Cluster with a Service Principal

Hi, I am wondering if it is possible to set up an interactive cluster in dedicated access mode with a machine user as the assigned user? I've tried the cluster creation API, /api/2.1/clusters/create, and set the user name to the service principal na...

Latest Reply
Coffee77
Contributor III
  • 1 kudos

It turns out that it is now possible to deploy interactive clusters and SQL warehouses with Databricks Asset Bundles, so you can include a YAML file similar to this one to deploy that type of interactive cluster: Definition of Interactive ...

2 More Replies
TomDeas
by New Contributor II
  • 2188 Views
  • 2 replies
  • 2 kudos

Resolved! Resource Throttling; Large Merge Operation - Recent Engine Change?

Morning all, hope you can help as I've been stumped for weeks. Question: have there been recent changes to the Databricks query engine, or Photon (etc.), which may impact large sort operations? I have a Jobs pipeline that runs a series of notebooks which...

runhistory.JPG query1.png query2.png query_peak.JPG
Data Engineering
MERGE
Performance Optimisation
Photon
Query Plan
serverless
Latest Reply
mark_ott
Databricks Employee
  • 2 kudos

There have indeed been recent changes to the Databricks query engine and Photon, especially during the June 2025 platform releases, which may influence how large sort operations and resource allocation are handled in SQL pipelines similar to yours. S...

1 More Replies
feliximmanuel
by New Contributor II
  • 2685 Views
  • 1 reply
  • 1 kudos

Error: oidc: fetch .well-known: Get "https://%E2%80%93host/oidc/.well-known/oauth-authorization-serv

I'm trying to authenticate to Databricks from WSL but am suddenly getting this error. /databricks-asset-bundle$ databricks auth login –host https://<XXXXXXXXX>.12.azuredatabricks.net Databricks Profile Name: <XXXXXXXXX> Error: oidc: fetch .well-known: Get "ht...

Latest Reply
code-vj
New Contributor II
  • 1 kudos

It looks like the issue is caused by the dash before host. The command uses an en dash (–) instead of regular hyphens (--), which breaks the URL parsing. Try running this instead: databricks auth login --host https://<your-instance>.azuredatabrick...
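To illustrate the diagnosis above, here is a minimal Python sketch. It shows that percent-encoding "–host" produces the `%E2%80%93host` seen in the error title, which suggests the CLI treated the mistyped flag as a host name. The helper `normalize_flags` and the example URL are hypothetical and are not part of the Databricks CLI.

```python
from urllib.parse import quote

# Dashes that editors and chat tools often substitute for "--".
UNICODE_DASHES = ("\u2013", "\u2014")  # en dash, em dash


def normalize_flags(argv):
    """Return a copy of argv where a leading Unicode dash becomes '--'."""
    fixed = []
    for token in argv:
        if token.startswith(UNICODE_DASHES):
            token = "--" + token.lstrip("".join(UNICODE_DASHES))
        fixed.append(token)
    return fixed


# The en dash is three bytes in UTF-8 (E2 80 93), hence the odd host name.
encoded = quote("\u2013host")
print(encoded)

pasted = ["databricks", "auth", "login", "\u2013host", "https://example.invalid"]
print(normalize_flags(pasted))
```

Retyping the flag by hand in the terminal achieves the same result; the sketch is only useful for checking commands pasted from documents or chat.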

Coffee77
by Contributor III
  • 402 Views
  • 6 replies
  • 2 kudos

Resolved! Databricks Asset Bundles - High Level Diagrams Flow

Hi guys! I've recently been working on fully understanding (and helping others understand) Databricks Asset Bundles (DAB), and having fun creating some high-level diagrams of the DAB flow. The first one contains the flow for a simple deployment in PROD and the second one contains...

databricks_dab_deployment_prod.png databricks_dab_deployment_prod_with_tests.png
Latest Reply
Coffee77
Contributor III
  • 2 kudos

I will go with only the latest version then, which can be applied to any lower environment for QA or testing.

5 More Replies
QuanSun
by New Contributor II
  • 1726 Views
  • 6 replies
  • 3 kudos

How to select performance mode for Databricks Delta Live Tables

Hi everyone, based on the official link: For triggered pipelines, you can select the serverless compute performance mode using the Performance optimized setting in the pipeline scheduler. When this setting is disabled, the pipeline uses standard perfor...

Latest Reply
mimimon
New Contributor II
  • 3 kudos

May I know if this was automatically turned on for all DLT tables? How do we monitor the timestamp of turning this on and off and the ID of the user who did it? Or is it automatically configured?

5 More Replies
Anonymous
by Not applicable
  • 11659 Views
  • 9 replies
  • 8 kudos

Resolved! data frame takes unusually long time to write for small data sets

We have configured a workspace with our own VPC. We need to extract data from DB2 and write it in Delta format. We tried this for 550k records with 230 columns, and it took 50 minutes to complete the task. 15 million records take more than 18 hours. Not sure why this takes suc...

Latest Reply
Sown7
New Contributor II
  • 8 kudos

Facing the same issue: I have ~700k rows and I am trying to write this table, but it takes forever to write. Previously it once took only about 5 seconds to write, but since then, whenever we update the analysis and rewrite the table, it takes very long an...

8 More Replies
saicharandeepb
by Contributor
  • 350 Views
  • 5 replies
  • 2 kudos

Looking for Suggestions: Designed a Decision Tree to Recommend Optimal VM Types for Workloads

Hi everyone! I recently designed a decision tree model to help recommend the most suitable VM types for different kinds of workloads in Databricks. Thought process behind the design: Determining the optimal virtual machine (VM) for a workload is heavil...

saicharandeepb_0-1762515348166.png
Latest Reply
Coffee77
Contributor III
  • 2 kudos

It looks interesting and I'll take a deeper look! At first sight, as a suggestion, I would include a new decision node to conditionally include VMs that support "delta cache acceleration", now called "disk caching". These VMs have local SSD volumes so that the...

4 More Replies
RobsonNLPT
by Contributor III
  • 3951 Views
  • 2 replies
  • 0 kudos

Databricks Rest API Statement Execution - External Links

Hi. I've tested the ADB REST API to execute queries on Databricks SQL serverless. Using INLINE as the disposition I get the JSON array with my correct results, but using EXTERNAL_LINKS I get the chunks but the external_link (URL starting with http://stor...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

The issue you're experiencing with Databricks SQL Serverless REST API in EXTERNAL_LINKS mode—where the external_link URL (http://storage-proxy.databricks.com/...) does not work, but you can access chunks directly via the /api/2.0/sql/statements/{stat...

1 More Replies
rbee
by New Contributor II
  • 5298 Views
  • 1 reply
  • 0 kudos

Connect to SQL Server Analysis Services (SSAS) server to run DAX query using Python

Hi, I have a Power BI server which I'm able to connect to through SSMS. I tried using pyodbc to connect to the same server in Databricks, but it throws the error below. OperationalError: ('HYT00', '[HYT00] [Microsoft][ODBC Driver 17 for SQL Server]Login timeout exp...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

Your understanding is correct: Power BI’s data model is stored in an Analysis Services (SSAS) engine, not a traditional SQL Server database. This means that while SSMS may connect to Power BI Premium datasets via XMLA endpoints, attempting to use pyo...

RobsonNLPT
by Contributor III
  • 4741 Views
  • 4 replies
  • 0 kudos

Connection timeout when connecting to MongoDB using MongoDB Connector for Spark 10.x

Hi. I'm testing a Databricks connection to a MongoDB V7 cluster (Azure cluster) using the library org.mongodb.spark:mongo-spark-connector_2.13:10.4.1. I can connect using Compass but I get a timeout error using my ADB notebook: MongoTimeoutException: Timed ...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

The error you're seeing, a MongoTimeoutException referencing localhost:27017, suggests your Databricks cluster is trying to connect to MongoDB using the wrong address or that it cannot properly reach the MongoDB cluster endpoint from the notebook, ev...
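As a rough illustration of the "wrong address" case, the sketch below builds read options for the 10.x connector and refuses a URI that resolves to a local host. It assumes the timeout on localhost:27017 comes from the connector falling back to its default URI because no `connection.uri` reached it (for example, when configuration keys from older connector versions are used). The helper name, host, credentials, database, and collection are all hypothetical.

```python
from urllib.parse import urlsplit

LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def mongo_read_options(uri, database, collection):
    """Build reader options for format("mongodb"), rejecting local URIs."""
    host = urlsplit(uri).hostname
    if host is None or host in LOCAL_HOSTS:
        raise ValueError(f"connection URI points at {host!r}, not a remote cluster")
    return {
        "connection.uri": uri,   # option key used by connector 10.x
        "database": database,
        "collection": collection,
    }


opts = mongo_read_options(
    "mongodb+srv://user:secret@cluster0.example.invalid/",  # placeholder URI
    "sales",
    "orders",
)
# In a notebook this would be passed as:
#   spark.read.format("mongodb").options(**opts).load()
print(opts["connection.uri"])
```

If the URI is correct and the timeout persists, the remaining suspect is network reachability from the cluster to the MongoDB endpoint, which this check cannot detect.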

3 More Replies
kertsman_nm
by New Contributor
  • 3980 Views
  • 1 reply
  • 0 kudos

Trying to use Broadcast to run Presidio distributed

Hello, I am currently evaluating Microsoft's Presidio de-identification libraries for my organization and would like to see if we can take advantage of Spark's broadcast capabilities, but I am getting an error message: "[BROADCAST_VARIABLE_NOT_LOA...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

You’re encountering the [BROADCAST_VARIABLE_NOT_LOADED] error because Databricks in shared access mode cannot use broadcast variables with non-serializable Python objects (such as your Presidio engines) due to cluster architecture limitations. The cl...
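One commonly used alternative to broadcasting a heavy object is to build it lazily, once per worker process, inside the function that uses it. The sketch below shows that pattern in plain Python; it is not necessarily what the truncated reply goes on to recommend. `StandInEngine` is a hypothetical placeholder for a Presidio engine, and in Spark the `redact_value` function would be wrapped in a UDF.

```python
from functools import lru_cache


class StandInEngine:
    """Hypothetical placeholder for a heavy object that should not be broadcast."""

    instances = 0  # counts constructions, to show the engine is built only once

    def __init__(self):
        StandInEngine.instances += 1

    def redact(self, text):
        return text.replace("Alice", "<PERSON>")


@lru_cache(maxsize=1)
def get_engine():
    """Create the engine on first use and reuse it for later calls in this process."""
    return StandInEngine()


def redact_value(text):
    # Nothing is captured from the driver; the engine is resolved where the code runs.
    return get_engine().redact(text)


results = [redact_value(t) for t in ["Alice called", "Hello Alice"]]
print(results, StandInEngine.instances)
```

The design choice is that only the function is shipped to workers, so the engine itself never has to be serialized.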

6502
by New Contributor III
  • 3403 Views
  • 1 reply
  • 0 kudos

Schema change and OpenSearch

Let me be crystal clear: schema changes and OpenSearch do not fit well together. However, the data pushed to it is processed and always has the same schema. The problem here is that Spark is reading a CDC feed, which is subject to schema changes becau...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

You are encountering a common issue in Databricks Delta Lake streaming when working with Change Data Capture (CDC) feeds: schema evolution, especially with column mapping enabled, is not fully supported automatically in streaming reads—that includes ...

van45678
by New Contributor
  • 4084 Views
  • 2 replies
  • 0 kudos

Getting connection reset issue while connecting to a SQL server

Hello all, I am unable to connect to a SQL Server instance installed in an on-premises network from Databricks. I am able to successfully ping the server from the notebook using this command [nc -vz <hostname> <port>], which means I am able to e...

Data Engineering
Databricks
sqlserver
timeout
Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

The error you are encountering, "com.microsoft.sqlserver.jdbc.SQLServerException: Connection reset," even after a successful nc (netcat) connection, is a common but nuanced problem when connecting Databricks to an on-premise SQL Server. Although your...

1 More Replies
meghana_tulla
by New Contributor III
  • 2347 Views
  • 1 reply
  • 0 kudos

Issue: UCX Assessment Installation Error in Databricks Automation Script

Hi, I'm experiencing a problem when installing UCX Assessment through an automation script in Databricks. The script fails with this error: 13:38:06 WARNING [databricks.labs.ucx.hive_metastore.tables] {listing_tables_0} failed-table-crawl: listing tabl...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

The manual installation of UCX Assessment in Databricks works with the default <ALL> values, but automation scripts that set WORKSPACE_GROUPS="<ALL>" DATABASES="<ALL>" often encounter a SCHEMA_NOT_FOUND error related to 'ALL' not being recognized as ...

aav331
by New Contributor II
  • 445 Views
  • 3 replies
  • 2 kudos

Resolved! Unable to install libraries from requirements.txt in a Serverless Job and spark_python_task

I am running into the following error while trying to deploy a serverless job running a spark_python_task with GIT as the source for the code. The job was deployed as part of a DAB from a GitHub Actions runner. Run failed with error message Library i...

Latest Reply
Louis_Frolio
Databricks Employee
  • 2 kudos

@aav331 , if you are happy with the result please "Accept as Solution." This will help others who may be in the same boat. Cheers, Louis.

2 More Replies
