Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

erigaud
by Honored Contributor
  • 2879 Views
  • 8 replies
  • 7 kudos

Databricks asset bundles and Dashboards - pass parameters depending on bundle target

Hello everyone! Since Databricks Asset Bundles can now be used to deploy dashboards, I'm wondering how to pass parameters so that the queries for the dev dashboard query the dev catalog, the dashboard in stg queries the stg catalog, etc. Is there any...

Latest Reply
jelmer
New Contributor II
  • 7 kudos

Has this been added yet? Having dashboards in asset bundles without support for parameterization is borderline broken.

7 More Replies
Danish11052000
by New Contributor II
  • 32 Views
  • 2 replies
  • 0 kudos

Looking for Advice: Robust Backup Strategy for Databricks System Tables

Hi, I'm planning to build a backup system for all Databricks system tables (audit, usage, price, history, etc.) to preserve data beyond retention limits. Currently, I'm using Spark Streaming with readStream + writeStream and checkpointing in LakeFlow ...
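
For illustration, here is a minimal sketch of the readStream + writeStream pattern described above, assuming a Databricks notebook session; the source system table, target table, and checkpoint path are placeholders, and skipChangeCommits is only needed when the source table rewrites existing rows.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Incrementally copy a system table into a longer-retention backup table.
# Table names and the checkpoint path are hypothetical.
src = (
    spark.readStream
    .option("skipChangeCommits", "true")  # tolerate commits that rewrite existing data
    .table("system.billing.usage")
)

query = (
    src.writeStream
    .option("checkpointLocation", "/Volumes/main/backup/checkpoints/usage")
    .trigger(availableNow=True)  # process whatever is new, then stop
    .toTable("main.backup.system_billing_usage")
)
query.awaitTermination()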

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Greetings @Danish11052000, here's a pragmatic way to choose, based on the nature of Databricks system tables and the guarantees you want. Bottom line: for ongoing replication to preserve data beyond free retention, a Lakeflow Declarative Pipeline w...

1 More Replies
aiohi
by New Contributor
  • 16 Views
  • 1 reply
  • 0 kudos

Claude Access to Workspace and Catalog

I have a question: if we have a Claude corporate account, are we able to link it directly to the Databricks Playground, so that we would not have to separately add files that are already available in our workspace or catalog?

Latest Reply
MuthuLakshmi
Databricks Employee
  • 0 kudos

@aiohi Yes, you should be able to access the available files. https://www.databricks.com/blog/anthropic-claude-37-sonnet-now-natively-available-databricks https://support.claude.com/en/articles/12430928-using-databricks-for-data-analysis Docs for your...

Ericsson
by New Contributor II
  • 4761 Views
  • 3 replies
  • 1 kudos

SQL week format issue: it's not showing the result as 01 (ww)

Hi folks, I have a requirement to show the week number in ww format. Please see the code below: select weekofyear(date_add(to_date(current_date, 'yyyyMMdd'), +35)). Also, please refer to the screenshot for the result.

[screenshot: query result]
Latest Reply
Lauri
New Contributor III
  • 1 kudos

You can use lpad() to achieve the 'ww' format.
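
A minimal sketch of that suggestion, mirroring the query from the question (run in a notebook with an active Spark session):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# weekofyear() returns an integer such as 1; lpad() pads it to the two-digit 'ww' form.
spark.sql("""
    SELECT lpad(CAST(weekofyear(date_add(current_date(), 35)) AS STRING), 2, '0') AS week_ww
""").show()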

2 More Replies
vartyg
by Visitor
  • 22 Views
  • 1 reply
  • 0 kudos

Scaling Declarative Streaming Pipelines for CDC from On-Prem Database to Lakehouse

We have a scenario where we need to mirror thousands of tables from on-premises Db2 databases to an Azure Lakehouse. The goal is to create mirror Delta tables in the Lakehouse. Since LakeFlow Connect currently does not support direct mirroring from on...

Latest Reply
bidek56
Contributor
  • 0 kudos

Just use https://flink.apache.org

hgm251
by Visitor
  • 53 Views
  • 2 replies
  • 1 kudos

Online tables to synced tables: why is it creating a different service principal every time?

Hello! We started to move our online tables to synced tables. We just couldn't figure out why it creates a new service principal every time we run the same code we use for online tables. try: fe.create_feature_spec(name=feature_spec_name ...

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Greetings @hgm251, here are some things to consider. Things are working as designed: when you create a new Feature Serving or Model Serving endpoint, Databricks automatically provisions a dedicated service principal for that endpoint, and a fresh...

1 More Replies
Mathias_Peters
by Contributor II
  • 29 Views
  • 1 reply
  • 0 kudos

Reading MongoDB collections into an RDD

Hi, for a Spark job that does some custom computation, I need to access data from a MongoDB collection and work with the elements as type Document. The reason for this is that I want to apply some custom type serialization which is already implemen...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Greetings @Mathias_Peters, here are some suggestions for your consideration. Analysis: you're encountering a common challenge when migrating to newer versions of the MongoDB Spark Connector. The architecture changed significantly between versions 2.x ...
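
As a hedged illustration of the newer connector's DataFrame-based API: in the 10.x connector the source name is "mongodb" and the RDD-of-Document entry points from 2.x are gone, so Document-based serialization has to be reworked against Rows or JSON; the connection URI, database, and collection below are placeholders.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read a collection with the 10.x connector; elements come back as DataFrame Rows.
df = (
    spark.read.format("mongodb")
    .option("connection.uri", "mongodb://user:password@host:27017")  # placeholder URI
    .option("database", "mydb")            # placeholder database
    .option("collection", "mycollection")  # placeholder collection
    .load()
)
df.show()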

pooja_bhumandla
by New Contributor III
  • 21 Views
  • 1 reply
  • 0 kudos

Broadcast Join Failure in Streaming: Failed to store executor broadcast in BlockManager

Hi Databricks Community, I'm running a Structured Streaming job in Databricks with foreachBatch writing to a Delta table. The job fails with: Failed to store executor broadcast spark_join_relation_1622863 (size = Some(67141632)) in BlockManager with storageLevel=StorageLev...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Greetings @pooja_bhumandla, here are some helpful hints and tips. Diagnosis: your error indicates that a broadcast join operation is attempting to send ~64MB of data to executors, but the BlockManager cannot store it due to memory constraints. This c...
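
One common mitigation, sketched here under the assumption that falling back to a sort-merge join is acceptable for this workload, is to lower or disable the broadcast thresholds; the second setting is Databricks-specific and controls broadcasts planned by adaptive query execution.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Stop Spark from shipping a ~64MB relation to executors as a broadcast join.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")
# AQE on Databricks can still convert joins to broadcasts at runtime; disable that as well.
spark.conf.set("spark.databricks.adaptive.autoBroadcastJoinThreshold", "-1")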

pabloratache
by New Contributor
  • 49 Views
  • 4 replies
  • 2 kudos

Resolved! [FREE TRIAL] Missing All-Purpose Clusters Access - New Account

Issue Description: I created a new Databricks Free Trial account ("For Work" plan with $400 credits) but I don't have access to All-Purpose Clusters or PySpark compute. My workspace only shows SQL-only features. Current Setup: - Account Email: ronel.ra...

Latest Reply
Louis_Frolio
Databricks Employee
  • 2 kudos

Ah, got it @pabloratache, I did some digging and here is what I found (learned a few things myself). Thanks for the detailed context — this behavior is expected for the current Databricks 14‑day Free Trial (“For Work” plan). What’s happening with ...

3 More Replies
SahiSammu
by New Contributor
  • 58 Views
  • 2 replies
  • 0 kudos

Resolved! Auto Loader vs Batch for Large File Loads

Hi everyone, I'm seeing a dramatic difference in processing times between batch and streaming (Auto Loader) approaches for reading about 250,000 files from S3 in Databricks. My goal is to read metadata from these files and register it as a table (even...

Data Engineering
autoloader
Directory Listing
ingestion
Latest Reply
SahiSammu
New Contributor
  • 0 kudos

Thank you, Anudeep. I plan to tune Auto Loader by increasing the maxFilesPerTrigger parameter to optimize performance. My decision to use Auto Loader is primarily driven by its built-in backup functionality via cloudFiles.cleanSource.moveDestination, ...
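
A minimal sketch of that tuning, assuming a Databricks notebook and a DBR version recent enough to support cloudFiles.cleanSource; the paths, source format, and target table are placeholders.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Auto Loader with larger micro-batches and archival of processed source files.
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")              # placeholder source format
    .option("cloudFiles.maxFilesPerTrigger", 10000)    # more files per micro-batch
    .option("cloudFiles.cleanSource", "MOVE")          # move files away after processing
    .option("cloudFiles.cleanSource.moveDestination", "s3://my-bucket/archive/")  # placeholder
    .load("s3://my-bucket/landing/")                   # placeholder source path
)

(
    df.writeStream
    .option("checkpointLocation", "s3://my-bucket/checkpoints/landing/")  # placeholder
    .trigger(availableNow=True)
    .toTable("main.bronze.file_metadata")              # placeholder target table
)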

1 More Replies
noorbasha534
by Valued Contributor II
  • 3009 Views
  • 1 reply
  • 0 kudos

Databricks Jobs Failure Notification to Azure DevOps as incident

Dear all, has anyone tried sending Databricks Jobs failure notifications to Azure DevOps as incidents? I see webhooks as an OOTB destination for jobs and am thinking of leveraging that, but I'd like to hear any success stories or other smart approaches....

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

Yes, there are successful approaches and best practices for sending Databricks Job Failure notifications to Azure DevOps as incidents, primarily by leveraging the webhook feature as an out-of-the-box (OOTB) destination in Databricks Jobs. The workflo...
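
A minimal sketch of that relay idea, assuming a small HTTP handler (for example, an Azure Function) sits between the Databricks webhook destination and Azure DevOps. The webhook payload fields read below are assumptions; the work item call is the standard Azure DevOps JSON Patch REST endpoint, and the organization, project, PAT, and work item type are placeholders.

import base64

import requests

ORG, PROJECT, PAT = "my-org", "my-project", "my-personal-access-token"  # placeholders


def handle_databricks_webhook(payload: dict) -> None:
    # Assumed webhook payload shape; adjust to the actual notification body.
    job_name = payload.get("job", {}).get("name", "unknown job")
    run_url = payload.get("run", {}).get("run_page_url", "")

    # Create an Azure DevOps work item (type "Issue" here; could be "Bug", etc.).
    url = f"https://dev.azure.com/{ORG}/{PROJECT}/_apis/wit/workitems/$Issue?api-version=7.1"
    auth = base64.b64encode(f":{PAT}".encode()).decode()
    body = [
        {"op": "add", "path": "/fields/System.Title",
         "value": f"Databricks job failed: {job_name}"},
        {"op": "add", "path": "/fields/System.Description",
         "value": f"Failed run: {run_url}"},
    ]
    resp = requests.post(
        url,
        json=body,
        headers={
            "Content-Type": "application/json-patch+json",
            "Authorization": f"Basic {auth}",
        },
    )
    resp.raise_for_status()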

aonurdemir
by Contributor
  • 137 Views
  • 3 replies
  • 5 kudos

Resolved! Broken s3 file paths in File Notifications for auto loader

Suddenly, at 2025-10-23T14:12:48.409+00:00, file paths coming from the file notification queue started to be URL-encoded. Hence, our pipeline gets a file-not-found exception. I think something changed suddenly and broke the notification system. Here are th...

Latest Reply
K_Anudeep
Databricks Employee
  • 5 kudos

Hello @aonurdemir, Could you please re-run your pipeline now and check? This issue should be mitigated now. It is due to a recent internal bug that led to the unexpected handling of file paths with special characters. You should set ignoreMissingFile...

2 More Replies
der
by Contributor
  • 20 Views
  • 1 reply
  • 0 kudos

EXCEL_DATA_SOURCE_NOT_ENABLED Excel data source is not enabled in this cluster

I want to read an Excel xlsx file on DBR 17.3. On the cluster, the library dev.mauch:spark-excel_2.13:4.0.0_0.31.2 is installed. The V1 implementation works fine: df = spark.read.format("dev.mauch.spark.excel").schema(schema).load(excel_file) display(df) V2...

Latest Reply
der
Contributor
  • 0 kudos

If I build the spark-excel library with another short name (for example, "excelv2"), everything works fine. https://github.com/nightscape/spark-excel/issues/896#issuecomment-3486861693

Dhruv-22
by Contributor II
  • 219 Views
  • 6 replies
  • 6 kudos

Reading empty json file in serverless gives error

I ran a Databricks notebook to do incremental loads from files in the raw layer to bronze layer tables. Today, I encountered a case where the delta file was empty. I tried running it manually on serverless compute and encountered an error. df = spark....

Latest Reply
K_Anudeep
Databricks Employee
  • 6 kudos

Hello @Dhruv-22, can you share the schema of the df? Do you have a _corrupt_record column in your DataFrame? If yes, where are you getting it from, given that you said it's an empty file? By design, Spark blocks queries that only referen...
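
A hedged sketch of one common workaround: supplying an explicit schema so that an empty JSON file yields an empty DataFrame instead of an inferred _corrupt_record-only one. The schema and path are placeholders, not the poster's actual ones.

from pyspark.sql import SparkSession
from pyspark.sql.types import StringType, StructField, StructType

spark = SparkSession.builder.getOrCreate()

# With an explicit schema, no inference runs, so an empty file simply yields zero rows.
schema = StructType([
    StructField("id", StringType()),
    StructField("name", StringType()),
])
df = spark.read.schema(schema).json("/Volumes/main/raw/landing/")  # placeholder path
df.show()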

5 More Replies
vinaykumar
by New Contributor III
  • 10523 Views
  • 7 replies
  • 0 kudos

Log files are not getting deleted automatically after the logRetentionDuration interval

Hi team, log files are not getting deleted automatically after the logRetentionDuration interval from the delta log folder, and after analysis I see that checkpoint files are not getting created after 10 commits. Below are the table properties, set using spark.sql(f"""  ...

[screenshot: no checkpoint.parquet files in the _delta_log folder]
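
For context, a sketch of the kind of table properties involved; the table name and values are illustrative placeholders, not the poster's actual settings. Checkpoints are normally written roughly every delta.checkpointInterval commits, and log entries older than delta.logRetentionDuration only become eligible for cleanup when a checkpoint is written.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Illustrative property values only.
spark.sql("""
    ALTER TABLE main.bronze.events SET TBLPROPERTIES (
        'delta.checkpointInterval' = '10',
        'delta.logRetentionDuration' = 'interval 30 days'
    )
""")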
Latest Reply
alex307
Visitor
  • 0 kudos

Did anybody get a solution?

6 More Replies
