Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

hidden
by New Contributor II
  • 56 Views
  • 1 reply
  • 0 kudos

DLT Parameterization from Job Parameters

I have created a DLT pipeline notebook which creates tables based on a config file that holds the configuration of the tables to be created. Now I want to run my pipeline every 30 minutes for 4 tables from the config and every 3 hours...

Latest Reply
Coffee77
Contributor III
  • 0 kudos

Define "parameters" in job as usual and then, try to capture them in DLT by using similar code to this one:dlt.conf.get("PARAMETER_NAME", "PARAMETER_DEFAULT_VALUE")It should get parameter values from job if value exists, otherwise it'll set the defau...

santosh_bhosale
by New Contributor
  • 49 Views
  • 2 replies
  • 1 kudos

Issue with Unity Catalog on Azure

When I create a Databricks workspace on Azure and try to log in at https://accounts.azuredatabricks.net/, it redirects to my workspace, whereas on the Azure subscription I am the owner. I created this Azure subscription, and the Databricks workspace is also cr...

Latest Reply
Coffee77
Contributor III
  • 1 kudos

Clearly, you don't have "account admin" permissions. Try clicking the workspace drop-down, then check whether you can see and click "Manage Account" to confirm, but it is very likely you are not allowed to access it. You must be an Azure Global Adm...

1 More Reply
leenack
by New Contributor
  • 246 Views
  • 7 replies
  • 2 kudos

No rows returned when calling Databricks procedure via .NET API and Simba ODBC driver

I created a simple Databricks procedure that should return a single value: "SELECT 1 AS result;". When I call this procedure from my .NET API using ExecuteReader, ExecuteAdapter, or ExecuteScalar, the call completes without any errors, but no rows are r...

Latest Reply
Coffee77
Contributor III
  • 2 kudos

So @leenack, the best option so far is to refactor part of your code from stored procedures to functions, specifically the data-querying part; exactly what I proposed in previous comments. Thanks @matt for your response.
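
As a hedged sketch of that refactor, assuming Unity Catalog SQL table functions are available (the three-part function name is hypothetical):

```python
# Replace the procedure's query logic with a SQL table function.
spark.sql("""
    CREATE OR REPLACE FUNCTION main.default.get_result()
    RETURNS TABLE (result INT)
    RETURN SELECT 1 AS result
""")

# Clients (including a .NET API over the Simba ODBC driver) then receive an
# ordinary result set from a plain SELECT:
spark.sql("SELECT * FROM main.default.get_result()").show()
```

Unlike executing a procedure, a SELECT against a table function returns a regular cursor, which is what ExecuteReader-style APIs expect.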

6 More Replies
Allen123Maria_1
by New Contributor
  • 110 Views
  • 2 replies
  • 0 kudos

Optimizing Azure Functions for Performance and Cost with Variable Workloads

Hey, everyone!! I use Azure Functions in a project where the workloads change a lot. Sometimes it's quiet, and other times we get a lot of traffic. Azure Functions is very scalable, but I've had some trouble with cold starts and keeping costs down. I'm ...

Latest Reply
susanrobert3
New Contributor
  • 0 kudos

Hey!!! Cold starts on Azure Functions Premium can still bite if your instances go idle long enough, even with pre-warmed instances. What usually helps is bumping the `preWarmedInstanceCount` to at least 1 per plan (so there's always a warm worker), an...

1 More Reply
wkgcls
by New Contributor
  • 79 Views
  • 2 replies
  • 1 kudos

Resolved! DQX usage outside Databricks

Hello, when evaluating data quality frameworks for PySpark pipelines, I came across DQX. I noticed it's available on PyPI (databricks-labs-dqx) and GitHub, which is great for accessibility. However, I'm trying to understand the licensing requirements....

Latest Reply
wkgcls
New Contributor
  • 1 kudos

Thanks a lot for the quick response, @ManojkMohan! This was very helpful. I'll keep this in mind.

1 More Reply
liquibricks
by New Contributor II
  • 76 Views
  • 3 replies
  • 2 kudos

Moving tables between pipelines in production

We are testing an ingestion from Kafka to Databricks using a streaming table. The streaming table was created by a DAB deployed to "production", which runs as a service principal. This means the service principal is the "owner" of the table. We now wan...

Latest Reply
nayan_wylde
Esteemed Contributor
  • 2 kudos

You’ve hit two limitations:
  • Streaming tables don’t allow SET OWNER: ownership cannot be changed.
  • Lakeflow pipeline ID changes require pipeline-level permissions: if you’re not the pipeline owner, you can’t run ALTER STREAMING TABLE ... SET PIPELINE_I...

2 More Replies
Suheb
by New Contributor II
  • 85 Views
  • 4 replies
  • 3 kudos

When working with large data sets in Databricks, what are best practices to avoid out-of-memory errors?

How can I optimize Databricks to handle large datasets without running into memory or performance problems?

Latest Reply
tarunnagar
New Contributor III
  • 3 kudos

Hey! Great question. I've run into this issue quite a few times while working with large datasets in Databricks, and out-of-memory errors can be a real headache. One of the biggest things that helps is making sure your cluster configuration matches ...
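
A minimal sketch of a few of those habits in PySpark; the table, columns, and shuffle setting below are illustrative assumptions, not prescriptions:

```python
from pyspark.sql import functions as F

df = (
    spark.read.table("main.default.events")         # hypothetical large table
        .select("user_id", "event_date", "amount")  # prune columns early
        .where(F.col("event_date") >= "2024-01-01") # prune rows early
)

# Right-size shuffles for the data volume instead of relying on defaults
# (applies to classic compute; serverless manages this for you).
spark.conf.set("spark.sql.shuffle.partitions", "400")

(
    df.groupBy("user_id")
      .agg(F.sum("amount").alias("total_amount"))
      .write.mode("overwrite")
      .saveAsTable("main.default.user_totals")
)
```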

3 More Replies
Marcus_S
by New Contributor II
  • 2856 Views
  • 2 replies
  • 0 kudos

Change in UNRESOLVED_COLUMN error behavior in Runtime 14.3 LTS

I've noticed a change in how Databricks handles unresolved column references in PySpark when using All-purpose compute (not serverless). In Databricks Runtime 14.3 LTS, referencing a non-existent column like this: df = spark.table('default.example').se...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

Databricks has recently changed how unresolved column references are handled in PySpark on All-purpose compute clusters. In earlier Databricks Runtime (DBR) 14.3 LTS builds, referencing a non-existent column, such as df = spark.tabl...
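
For illustration, a sketch of how the eager analysis error surfaces and can be handled; the table and column names are hypothetical:

```python
# On older PySpark/DBR versions import from pyspark.sql.utils instead.
from pyspark.errors import AnalysisException

df = spark.table("default.example")
try:
    df.select("no_such_column").show()
except AnalysisException as e:
    # Recent runtimes fail eagerly at analysis time with an
    # UNRESOLVED_COLUMN error class and a column-name suggestion.
    print(e)
```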

1 More Reply
Asaph
by New Contributor
  • 3919 Views
  • 8 replies
  • 1 kudos

Issue with databricks.sdk - AccountClient Service Principals API

Hi everyone, I've been trying to work with the databricks.sdk Python library to manage service principals programmatically. However, I'm running into an issue when attempting to create a service principal using the AccountClient class. Below is the co...

Latest Reply
MarlonFojas
New Contributor
  • 1 kudos

I am using the Python SDK, and to authenticate I am using an SP and a secret. Here is the code that worked for me in an Azure Databricks notebook: from databricks.sdk import AccountClient acct_client = AccountClient( host="https://accounts.azuredatabr...
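
For reference, a fuller hedged sketch of that approach with the databricks-sdk package; every credential value below is a placeholder:

```python
from databricks.sdk import AccountClient

acct_client = AccountClient(
    host="https://accounts.azuredatabricks.net",
    account_id="<databricks-account-id>",
    client_id="<service-principal-client-id>",   # OAuth M2M credentials
    client_secret="<oauth-secret>",
)

sp = acct_client.service_principals.create(
    display_name="my-service-principal",
)
print(sp.id, sp.display_name)
```

With Entra ID credentials instead of Databricks-native OAuth, the azure_client_id / azure_client_secret / azure_tenant_id parameters are the usual route.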

7 More Replies
Ramana
by Valued Contributor
  • 783 Views
  • 6 replies
  • 1 kudos

Resolved! Serverless Compute - Spark - Jobs failing with Max iterations (1000) reached for batch Resolution

Hello Community, We have been trying to migrate our jobs from Classic Compute to Serverless Compute. As part of this process, we face several challenges, and this is one of them. When we try to execute the existing jobs with Serverless Compute, if the ...

Latest Reply
Ramana
Valued Contributor
  • 1 kudos

In Serverless Version 4, Databricks fixed this issue.

5 More Replies
akuma643
by New Contributor II
  • 3728 Views
  • 3 replies
  • 1 kudos

The authentication value "ActiveDirectoryManagedIdentity" is not valid.

Hi Team, I am trying to connect to SQL Server hosted in an Azure VM using Entra ID authentication from Databricks. ("authentication", "ActiveDirectoryManagedIdentity") Below is the notebook script I am using: driver = "com.microsoft.sqlserver.jdbc.SQLServe...

Latest Reply
mark_ott
Databricks Employee
  • 1 kudos

You are encountering an error because the default SQL Server JDBC driver bundled with Databricks may not fully support the authentication value "ActiveDirectoryManagedIdentity"; this option requires at least version 10.2.0 of the Microsoft SQL Server ...
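
A hedged sketch of the connection once a recent mssql-jdbc (>= 10.2, plus its authentication dependencies) is installed on the cluster; the server, database, and table names are placeholders:

```python
jdbc_url = "jdbc:sqlserver://<server>.database.windows.net:1433;databaseName=<db>"

df = (
    spark.read.format("jdbc")
        .option("url", jdbc_url)
        .option("dbtable", "dbo.<table>")
        # Passed through to the driver as a connection property.
        .option("authentication", "ActiveDirectoryManagedIdentity")
        .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
        .load()
)
df.show(5)
```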

2 More Replies
cdn_yyz_yul
by New Contributor III
  • 137 Views
  • 4 replies
  • 1 kudos

Delta as a streaming source: can the reader read only newly appended rows?

Hello everyone, In our implementation of the Medallion Architecture, we want to stream changes with Spark Structured Streaming. I would like some advice on how to use a Delta table as a source correctly, and whether there is a performance (memory usage) concern in t...

Latest Reply
mark_ott
Databricks Employee
  • 1 kudos

In your scenario using Medallion Architecture with Delta tables as both streaming source and sink, it is important to understand Spark Structured Streaming behavior and performance characteristics, especially with joins and memory usage. Here is a di...
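
As a hedged illustration of the append-only read pattern the question asks about; the table names and checkpoint path are placeholders:

```python
# A Delta streaming source emits only newly appended rows by default;
# commits that rewrite data cause an error unless explicitly skipped.
bronze = (
    spark.readStream.format("delta")
        .option("skipChangeCommits", "true")  # skip update/delete commits
        .table("main.bronze.events")
)

query = (
    bronze.writeStream
        .option("checkpointLocation", "/Volumes/main/bronze/checkpoints/events")
        .trigger(availableNow=True)
        .toTable("main.silver.events")
)
```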

3 More Replies
Shubhankar_123
by New Contributor
  • 92 Views
  • 1 reply
  • 0 kudos

Internal error 500 on databricks vector search endpoint

We are facing an internal 500 error accessing the vector search endpoint through a Streamlit application. If I refresh the application, the error sometimes goes away, but it has now started to become a regular occurrence. If I try to query the endpoint using...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

The intermittent Internal 500 errors you're experiencing when accessing the vector search endpoint through a Streamlit app on Databricks (while direct console queries work) suggest an issue with the interaction between your Streamlit app's environment ...
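
Until the root cause is isolated, a hedged client-side mitigation is to retry transient failures; the endpoint and index names below are placeholders:

```python
import time
from databricks.vector_search.client import VectorSearchClient

client = VectorSearchClient()  # picks up auth from the app's environment
index = client.get_index(
    endpoint_name="<endpoint-name>",
    index_name="main.default.<index-name>",
)

def search_with_retry(query_text, attempts=3, backoff_s=2.0):
    """Retry transient 500s with linear backoff before giving up."""
    for attempt in range(attempts):
        try:
            return index.similarity_search(
                query_text=query_text,
                columns=["id", "text"],
                num_results=5,
            )
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(backoff_s * (attempt + 1))
```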

SumitB14
by New Contributor
  • 69 Views
  • 1 reply
  • 0 kudos

Databricks Nested JSON Flattening

Hi Databricks Community, I am facing an issue while exploding nested JSON data. In the content column, I have dynamic nested JSON, and I am using the below approach to parse and explode it: from pyspark.sql import SparkSession from pyspark.sql.functions ...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

You are encountering an AttributeError related to strip, which likely means that some entries for activity.value are not strings (maybe None or dicts) and your code expects all to be strings before calling .strip(). This kind of problem can arise if ...
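
Since the posted code is truncated, here is only a hedged plain-Python sketch of the guard being described, normalizing mixed-type values before any .strip() call (the helper name is hypothetical):

```python
import json

def normalize_value(v):
    """Coerce dynamic JSON values to strings so .strip() never fails."""
    if v is None:
        return ""
    if isinstance(v, str):
        return v.strip()
    return json.dumps(v)  # dicts/lists become JSON text instead of raising

print(normalize_value("  a  "))   # -> 'a'
print(normalize_value({"k": 1}))  # -> '{"k": 1}'
print(normalize_value(None))      # -> ''
```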

ShivangiB1
by New Contributor III
  • 74 Views
  • 2 replies
  • 0 kudos

Databricks Lakeflow SQL Server Ingestion Pipeline Error

Hey Team, I am getting the below error while creating a pipeline: com.databricks.pipelines.execution.extensions.managedingestion.errors.ManagedIngestionNonRetryableException: [INGESTION_GATEWAY_DDL_OBJECTS_MISSING] DDL objects missing on table 'coedb.dbo.so...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

The error you are seeing means Databricks cannot capture DDL (table definition) changes, even though CDC (Change Data Capture) and CT (Change Tracking) are enabled. You must run the specific DDL support objects script for Databricks ingestion and the...

1 More Reply
