Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Galih
by New Contributor II
  • 815 Views
  • 3 replies
  • 4 kudos

Resolved! Spark structured streaming- calculate signal, help required! 🙏

Hello everyone! I’m very new to Spark Structured Streaming, and not a data engineer, so I would appreciate guidance on how to efficiently process streaming data and emit only changed aggregate results over multiple time windows. Input Stream: Source: A...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 4 kudos

I would implement stateful streaming by using transformWithStateInPandas to keep the state and implement the logic there. I would avoid doing stream-stream JOINs.
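The suggestion above can be sketched conceptually in plain Python (this is not the Spark API; a real transformWithStateInPandas handler would keep the equivalent per-key state in Spark's state store): remember the last emitted aggregate per key and emit only when it changes.

```python
# Conceptual sketch of "emit only changed aggregates" per key.
# Plain Python, no Spark: illustrates the state-store logic only.
class ChangedAggregateEmitter:
    def __init__(self):
        self._last = {}  # key -> last emitted aggregate value

    def update(self, key, aggregate):
        """Return the aggregate if it changed for this key, else None."""
        if self._last.get(key) != aggregate:
            self._last[key] = aggregate
            return aggregate
        return None  # unchanged -> suppress the emit

emitter = ChangedAggregateEmitter()
print(emitter.update("sensor-1", 10.0))  # 10.0 (first value, emit)
print(emitter.update("sensor-1", 10.0))  # None (unchanged, suppressed)
print(emitter.update("sensor-1", 12.5))  # 12.5 (changed, emit)
```

The key names and aggregate values here are hypothetical; in the actual handler this dictionary would be replaced by the value state managed by Spark.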

2 More Replies
chirag_nagar
by New Contributor
  • 5190 Views
  • 12 replies
  • 2 kudos

Seeking Guidance on Migrating Informatica PowerCenter Workflows to Databricks using Lakebridge

Hi everyone, I hope you're doing well. I'm currently exploring options to migrate a significant number of Informatica PowerCenter workflows and mappings to Databricks. During my research, I came across Lakebridge, especially its integration with BladeB...

Latest Reply
AnnaKing
Databricks Partner
  • 2 kudos

Hi Chirag. At Kanerika Inc., we've built a migration accelerator that automates 80% of the Informatica-to-Databricks migration process, saving you significant time, effort, and resources. You can check out the demo video here - https://ww...

11 More Replies
bercaakbayir
by New Contributor
  • 244 Views
  • 1 reply
  • 0 kudos

Data Ingestion - Missing Permission

Hi, I would like to use Data Ingestion through Fivetran connectors to get data from an external data source into Databricks, but I am getting a missing-permission error. I already have admin permission. I kindly ask for your help regarding this situation. Look...

Latest Reply
Raman_Unifeye
Honored Contributor III
  • 0 kudos

@bercaakbayir - two areas to look at for permissions: Unity Catalog permissions and destination-level permissions. Please check: UC is enabled for your workspace [Metastore Admin, not workspace Admin]; CREATE permissions on the target catalog - the user or SP should hav...
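For the Unity Catalog side, the grants typically look like the following sketch (catalog, schema, and principal names here are hypothetical; adjust them to the destination your Fivetran connector writes to):

```sql
-- Hypothetical names: catalog `main`, schema `raw`, service principal `fivetran_sp`
GRANT USE CATALOG ON CATALOG main TO `fivetran_sp`;
GRANT USE SCHEMA ON SCHEMA main.raw TO `fivetran_sp`;
GRANT CREATE TABLE ON SCHEMA main.raw TO `fivetran_sp`;
```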

der
by Contributor III
  • 1865 Views
  • 7 replies
  • 5 kudos

Resolved! EXCEL_DATA_SOURCE_NOT_ENABLED Excel data source is not enabled in this cluster

I want to read an Excel xlsx file on DBR 17.3. On the cluster the library dev.mauch:spark-excel_2.13:4.0.0_0.31.2 is installed. The V1 implementation works fine: df = spark.read.format("dev.mauch.spark.excel").schema(schema).load(excel_file) display(df) V2...

Latest Reply
der
Contributor III
  • 5 kudos

I reached out to Databricks support and they fixed it with the December 2025 maintenance update. Now both the open-source Excel reader and the new built-in one should work: https://learn.microsoft.com/en-gb/azure/databricks/query/formats/excel

6 More Replies
dvd_lg_bricks
by New Contributor III
  • 1421 Views
  • 10 replies
  • 3 kudos

Resolved! Questions About Workers and Executors Configuration in Databricks

Hi everyone, sorry, I’m new here. I’m considering migrating to Databricks, but I need to clarify a few things first. When I define and launch an application, I see that I can specify the number of workers, and then later configure the number of execut...

Latest Reply
Abeshek
Databricks Partner
  • 3 kudos

Regarding your Databricks question about workers versus executors: many teams encounter the same sizing and configuration issues when evaluating a migration. At Kanerika, we help companies plan cluster architecture, optimize Spark workloads, and avoid overspen...

9 More Replies
michal1228
by Databricks Partner
  • 737 Views
  • 4 replies
  • 0 kudos

Import Python Modules with Git Folder Error

Dear Databricks Community, we encountered a bug in the behaviour of the import method explained in the documentation: https://learn.microsoft.com/en-us/azure/databricks/files/workspace-modules#autoreload-for-python-modules. A couple of months ago we migrated our pipelin...

Latest Reply
michal1228
Databricks Partner
  • 0 kudos

We're using DBR version 16.4

3 More Replies
Fatimah-Tariq
by New Contributor III
  • 1660 Views
  • 7 replies
  • 4 kudos

Resolved! Writing to Foreign catalog

I have a running notebook job where I am doing some processing and writing the tables to a foreign catalog. It has been running successfully for about a year. The job is scheduled and runs on a job cluster with DBR 16.2. Recently, I had to add a new noteb...

Latest Reply
Fatimah-Tariq
New Contributor III
  • 4 kudos

Thank you @Louis_Frolio! Your suggestions really helped me understand the scenario.

6 More Replies
skuvisk
by New Contributor II
  • 500 Views
  • 2 replies
  • 1 kudos

Resolved! CLS function with lookup fails on dates

Hello, I'm conducting research on utilizing CLS in a project. We are implementing a lookup table to determine what tags a user can see. The CLS function looks like this: CREATE OR REPLACE FUNCTION {catalog}.{schema}.mask_column(value VARIANT, tag STRIN...

Latest Reply
skuvisk
New Contributor II
  • 1 kudos

Thank you for an insightful answer @Poorva21. I conclude from your reasoning that this is the result of an optimization/engine error. It seems like I will need to resort to a workaround for the date columns then...
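One common workaround for date columns in this situation (sketched below with hypothetical function, group, and table names; the original function signature is truncated above) is a separate mask function typed DATE, bound directly to the column, so the value never passes through the VARIANT code path:

```sql
-- Hypothetical sketch: a DATE-typed mask avoids routing dates through VARIANT
CREATE OR REPLACE FUNCTION main.security.mask_date(value DATE)
RETURNS DATE
RETURN CASE
  WHEN is_account_group_member('privileged_readers') THEN value
  ELSE NULL
END;

ALTER TABLE main.sales.orders
  ALTER COLUMN order_date SET MASK main.security.mask_date;
```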

1 More Replies
Jarno
by New Contributor III
  • 1752 Views
  • 4 replies
  • 1 kudos

Dangerous implicit type conversions on 17.3 LTS.

Starting with DBR 17 running Spark 4.0, spark.sql.ansi.enabled is set to true by default. With the flag enabled, strings are implicitly converted to numbers in a very dangerous manner. Consider: SELECT 123='123'; SELECT 123='123X'; The first one is succe...

Latest Reply
Jarno
New Contributor III
  • 1 kudos

FYI, it seems I was mistaken about the behaviour of '::' on Spark 4.0.1. It does indeed work like CAST on both DBR 17.3 and Spark 4.0.1 and raises an exception on '123X'::int. The '?::' operator seems to be a Databricks only extension at the moment (...
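The corrected behaviour described above can be summarised in a short sketch (assuming ANSI mode is on, as it is by default on DBR 17 / Spark 4.0):

```sql
SELECT CAST('123X' AS INT);     -- raises an error under ANSI mode
SELECT '123X'::int;             -- same as CAST: raises an error
SELECT try_cast('123X' AS INT); -- explicit null-on-failure: returns NULL
```

Using try_cast makes the lenient behaviour visible in the query rather than depending on a session flag.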

3 More Replies
prashant151
by New Contributor II
  • 518 Views
  • 2 replies
  • 3 kudos

Resolved! Using Init Script to execute Python notebook at all-purpose cluster level

Hi, we have setup.py in my Databricks workspace. This script is executed in other transformation scripts using %run /Workspace/Common/setup.py, which consumes a lot of time. This setup.py internally calls other utility notebooks using %run: %run /Workspace/Co...

Latest Reply
iyashk-DB
Databricks Employee
  • 3 kudos

You can’t “%run a notebook” from a cluster init script: init scripts are shell-only and meant for environment setup (install libs, set env vars), not for executing notebooks or sharing Python state across sessions. +1 to what @Raman_Unifeye has said. ...
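A common alternative to chained %run calls is keeping shared setup code in a plain Python module that notebooks import once. The sketch below simulates that with a temporary folder; the module name and returned config are hypothetical, and in a real workspace you would append an existing folder such as /Workspace/Common to sys.path instead.

```python
import importlib
import os
import sys
import tempfile

# Simulate a workspace folder holding a shared module; in Databricks this
# would be an existing folder like /Workspace/Common.
workspace = tempfile.mkdtemp()
with open(os.path.join(workspace, "common_setup.py"), "w") as f:
    f.write("def get_config():\n    return {'env': 'dev'}\n")

# Make the folder importable, then import the module instead of %run-ing it.
sys.path.append(workspace)
common_setup = importlib.import_module("common_setup")
print(common_setup.get_config())  # {'env': 'dev'}
```

Imports are cached per session, so repeated calls cost nothing, unlike repeated %run invocations.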

1 More Replies
nick_heybuddy
by New Contributor II
  • 397 Views
  • 1 reply
  • 2 kudos

Notebooks suddenly fail to retrieve Databricks secrets

At around 5:30 am (UTC+11) this morning, a number of our scheduled serverless notebook jobs started failing when attempting to retrieve Databricks secrets. We are able to retrieve the secrets using the databricks CLI and the jobs are run as a user tha...

Latest Reply
liu
Databricks Partner
  • 2 kudos

Me too. But it looks like there hasn't been any official reply regarding this matter yet.

demo-user
by New Contributor III
  • 477 Views
  • 3 replies
  • 0 kudos

Connecting to an S3 compatible bucket

Hi everyone, I’m trying to connect Databricks to an S3-compatible bucket using a custom endpoint URL and access keys. I’m using an Express account with Serverless SQL Warehouses, but the only external storage options I see are AWS IAM roles or Cloudfla...

Latest Reply
Raman_Unifeye
Honored Contributor III
  • 0 kudos

Serverless compute does not support setting most Apache Spark configuration properties, irrespective of tier, as Databricks fully manages the underlying infrastructure.

2 More Replies
lucami
by Contributor
  • 807 Views
  • 3 replies
  • 4 kudos

Resolved! What's the difference between dbmanagedidentity and a storage credential based on managed identity?

I’m looking for guidance on the differences between: dbmanagedidentity (the workspace-managed identity), and Unity Catalog storage credentials based on Azure Managed Identity. Specifically, I’d like to understand: What are the key differences between thes...

Latest Reply
Raman_Unifeye
Honored Contributor III
  • 4 kudos

Use dbmanagedidentity for non-storage Azure services such as Cosmos DB, Azure SQL, Event Hubs, and Key Vault.

2 More Replies
Malthe
by Valued Contributor II
  • 1059 Views
  • 5 replies
  • 6 kudos

Self-referential foreign key constraint for streaming tables

When defining a streaming table using DLT (declarative pipelines), we can provide a schema which lets us define primary and foreign key constraints. However, references to self, i.e. the defining table, are not currently allowed (you get a "table not...

Latest Reply
Malthe
Valued Contributor II
  • 6 kudos

Each of these workarounds gives up the optimizations that are enabled by the use of key constraints.

4 More Replies
RobFer1985
by New Contributor
  • 483 Views
  • 1 reply
  • 0 kudos

Databricks pipeline fails expectation when executing a Python script, throws error: Update FAILED

Hi Community, I'm new to Databricks and am trying to implement pipeline expectations. The pipelines work without errors and my job works. I've tried multiple ways to implement expectations, SQL and Python. I keep resolving the errors but end ...

Latest Reply
emma_s
Databricks Employee
  • 0 kudos

Hey, I think it may be the row_count condition causing the issue. The expectation runs on each row and checks whether the record meets the criteria, so you're effectively evaluating count(*) against each record, which will always evaluate to 1 and...
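Following that reasoning, a row-count check belongs on an aggregate dataset rather than in a per-row expectation. A sketch in DLT SQL (source table name and threshold are hypothetical):

```sql
-- Per-row expectations see one record at a time, so aggregate first and
-- attach the expectation to the single-row result.
CREATE OR REFRESH MATERIALIZED VIEW orders_volume_check (
  CONSTRAINT enough_rows EXPECT (row_total >= 1000) ON VIOLATION FAIL UPDATE
) AS
SELECT COUNT(*) AS row_total
FROM LIVE.orders;
```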
