Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

abetogi
by New Contributor III
  • 2010 Views
  • 3 replies
  • 2 kudos

AI

At Chevron we actively use Databricks to provide answers to business users. It was extremely interesting to see the LakeHouseIQ initiatives, as they can expedite how fast our users can receive their answers/reports. Is there any documentation that I...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 2 kudos

Guys, this thread was created in 2023. And the user who created it was last seen in 2023. I think there’s no point in resurrecting this thread

2 More Replies
radha_krishna
by New Contributor
  • 809 Views
  • 4 replies
  • 1 kudos

"ai_parse_document()" is not a full OCR engine ? It's not extracting text from high quality image

 I used "ai_parse_document()" to parse a PNG file that contains cat images and text. From the image, I wanted to extract all the cat names, but the response returned nothing. It seems that "ai_parse_document()" does not support rich image extraction....

Latest Reply
Raman_Unifeye
Honored Contributor III
  • 1 kudos

@szymon_dybczak - yes, since it relies on AI models, there's a chance of it missing a few cases due to its non-deterministic nature. I have used it in real-world workloads with a vast number of PDFs and it has worked pretty well in all those cases. Have not tried with PNG...

3 More Replies
Michael_Galli
by Databricks Partner
  • 15871 Views
  • 5 replies
  • 8 kudos

Resolved! Monitoring Azure Databricks in an Azure Log Analytics Workspace

Does anyone have experience with the mspnp/spark-monitoring library? Is this best practice, or are there better ways to monitor a Databricks cluster?

Latest Reply
vr
Valued Contributor
  • 8 kudos

Interesting that Microsoft deleted this project. Was there any announcement as to when, why, and what to do now?

4 More Replies
Ravikumashi
by Contributor
  • 3541 Views
  • 4 replies
  • 1 kudos

Resolved! Issue with Logging Spark Events to LogAnalytics after Upgrading to Databricks 11.3 LTS

We have recently been in the process of upgrading our Databricks clusters to version 11.3 LTS. As part of this upgrade, we have been working on integrating the logging of Spark events to LogAnalytics using the repository available at https://github.c...

Latest Reply
vr
Valued Contributor
  • 1 kudos

Does anyone know why this repository was deleted? https://github.com/mspnp/spark-monitoring

3 More Replies
LeoGaller
by New Contributor II
  • 10022 Views
  • 5 replies
  • 5 kudos

Resolved! What are the options for "spark_conf.spark.databricks.cluster.profile"?

Hey guys, I'm trying to find out what options we can pass to spark_conf.spark.databricks.cluster.profile. Looking around, I know that some of the available configs are singleNode and serverless, but are there others? Where is the documentation for it?...

Latest Reply
LeoGallerDbx
Databricks Employee
  • 5 kudos

Looking internally, I was able to find the following:
  • For single node mode: the config should be set to 'singleNode'
  • For standard mode: the config should NOT be set to 'singleNode'
  • For serverless mode: the config should be set to 'serverless'
So, in ...
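For reference, those three modes can be expressed as cluster-spec fragments. This is a minimal sketch of the Clusters API shape; the surrounding keys (num_workers, etc.) are illustrative placeholders, not a definitive spec:

```python
# Sketch: spark_conf fragments for each cluster profile mode.
# Keys beyond spark.databricks.cluster.profile are illustrative placeholders.

single_node_conf = {
    "spark_conf": {"spark.databricks.cluster.profile": "singleNode"},
    "num_workers": 0,  # single-node clusters have no workers
}

standard_conf = {
    # standard (multi-node) mode: simply omit the profile key
    "spark_conf": {},
    "num_workers": 2,
}

serverless_conf = {
    "spark_conf": {"spark.databricks.cluster.profile": "serverless"},
}
```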

4 More Replies
deng_dev
by New Contributor III
  • 1369 Views
  • 3 replies
  • 3 kudos

Resolved! Databricks AutoLoader IncrementalListing mode changes

Hi everyone! I was investigating how the Databricks AutoLoader IncrementalListing mode changes will impact my current Auto Loader streams. Currently all of them are set to cloudFiles.useIncrementalListing: auto. So I wanted to check if any of the streams is ac...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 3 kudos

Hi @deng_dev, when cloudFiles.useIncrementalListing is set to auto, Auto Loader automatically detects whether a given directory is applicable for incremental listing by checking and comparing file paths of previously completed directory listings. To e...
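As a sketch of where that setting lives: the listing mode is just an Auto Loader option, so it can be pinned explicitly rather than left on auto. The format and path below are placeholders, and the readStream wiring only runs on a Databricks runtime:

```python
# Sketch: Auto Loader options controlling the listing mode.
# "auto" lets Auto Loader decide per directory; "true"/"false" force the choice.
autoloader_options = {
    "cloudFiles.format": "json",                     # placeholder format
    "cloudFiles.useIncrementalListing": "auto",      # or "true" / "false"
}

# On a Databricks cluster this would be wired up roughly like:
# df = (spark.readStream
#         .format("cloudFiles")
#         .options(**autoloader_options)
#         .load("s3://my-bucket/landing/"))          # placeholder path
```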

2 More Replies
Pratikmsbsvm
by Contributor
  • 2989 Views
  • 2 replies
  • 2 kudos

Resolved! Establishing a Connection between ADLS Gen2, Databricks and ADF In Microsoft Azure

Hello, may someone please help me with establishing a connection between ADLS Gen2, Databricks and ADF, with full steps if possible? Do I need to route through Key Vault? This is my first time doing this in production. May somebody please share detailed step ...

Latest Reply
nayan_wylde
Esteemed Contributor II
  • 2 kudos

For a production environment (ADF as orchestrator, ADLS Gen2 as storage, Databricks for PySpark transformations), follow Microsoft-recommended best practices. Databricks → ADLS Gen2: use Unity Catalog with Azure Managed Identity (via Access Connector)...
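Once the Access Connector and a Unity Catalog external location are in place, reading from ADLS Gen2 reduces to addressing an abfss:// path; no keys or secrets appear in code. A minimal sketch, with all account/container/path names as placeholders:

```python
# Sketch: addressing ADLS Gen2 from Databricks once Unity Catalog grants access
# via an Access Connector (managed identity). All names below are placeholders.
storage_account = "mystorageacct"
container = "raw"
path = f"abfss://{container}@{storage_account}.dfs.core.windows.net/sales/2024/"

# On a cluster with Unity Catalog access this is then just:
# df = spark.read.format("parquet").load(path)
```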

1 More Replies
sumit2jha
by New Contributor III
  • 8148 Views
  • 7 replies
  • 5 kudos

Resolved! ADE 2.1 Unable to run Classroom-Setup-3.1

%run ../Includes/Classroom-Setup-3.1
After running the above command, I'm getting this error message (error screenshot attached): AnalysisException: You are trying to read a Delta table `spark_catalog`.`dbacademy_sumit_s_jha_hk_ey_com_adewd_3_1`.`date_looku...

Latest Reply
sumit2jha
New Contributor III
  • 5 kudos

Run this file first before starting; the problem will be solved.

6 More Replies
Mathias_Peters
by Contributor II
  • 478 Views
  • 2 replies
  • 1 kudos

Resolved! Streamed DLT Pipeline using a lookup table

Hi, I need to join three streams/streamed data sets in a DLT pipeline. I am reading from a Kinesis data stream a sequence of events per group key. The logically first of the events per group contains a marker which determines whether that group is re...

Latest Reply
Mathias_Peters
Contributor II
  • 1 kudos

Hi @mark_ott, thank you for your help. I have a follow-up question regarding data completeness and out-of-order processing. I have decided to go with the delta table option, since super low latency is not an issue and since this option has (seemingly...

1 More Replies
hidden
by New Contributor II
  • 454 Views
  • 1 replies
  • 0 kudos

Resolved! Delta live tables upsert logic without apply changes or autocdc logic

I want to create Delta Live Tables which should be streaming, and I want to use manual upsert logic without using the APPLY CHANGES API or Auto CDC API. How can I do it?

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Hello @hidden, regarding creating streaming Delta Live Tables with manual upsert logic: let's dig in… this question comes up a lot when folks want upsert behavior in DLT but aren't using APPLY CHANGES or Auto-CDC. The short version: DLT doesn't let you drop...
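One common alternative pattern, sketched below under the assumption that a plain structured-streaming job (rather than a DLT-managed table) is acceptable: foreachBatch with a Delta MERGE. Table, column, and checkpoint names are placeholders, and the Spark wiring in comments only runs on a cluster with delta-spark available:

```python
# Sketch: manual upsert via foreachBatch + MERGE instead of APPLY CHANGES.
# KEY_COLS and table names are placeholders for this illustration.
KEY_COLS = ["id"]
merge_condition = " AND ".join(f"t.{c} = s.{c}" for c in KEY_COLS)

def upsert_batch(batch_df, batch_id):
    # Deferred import: requires delta-spark on the cluster.
    from delta.tables import DeltaTable
    target = DeltaTable.forName(batch_df.sparkSession, "silver.events")
    (target.alias("t")
           .merge(batch_df.alias("s"), merge_condition)
           .whenMatchedUpdateAll()
           .whenNotMatchedInsertAll()
           .execute())

# On a cluster this would be attached to a stream roughly like:
# (spark.readStream.table("bronze.events")
#       .writeStream
#       .foreachBatch(upsert_batch)
#       .option("checkpointLocation", "/tmp/checkpoints/events")  # placeholder
#       .start())
```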

dhruvs2
by New Contributor II
  • 1682 Views
  • 4 replies
  • 5 kudos

How to trigger a Databricks job only after multiple other jobs have completed

We have a use case where Job C should start only after both Job A and Job B have successfully completed. In Airflow, we achieve this using an ExternalTaskSensor to set dependencies across different DAGs. Is there a way to configure something similar in...

Latest Reply
BS_THE_ANALYST
Databricks Partner
  • 5 kudos

Hi @dhruvs2. A Lakeflow Job consists of tasks; the tasks can be things like notebooks or other jobs. If you want to orchestrate many jobs, I'd agree that having a controller job to do this is your best bet. Then you can set up the dependencies as you require. I...
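As a sketch, such a controller job can be expressed with run_job tasks and depends_on in a Jobs API payload; the job IDs and task keys below are placeholders:

```python
# Sketch of a Jobs API 2.x payload for a "controller" job: the task that triggers
# Job C runs only after the tasks triggering Jobs A and B succeed.
# All job IDs and task keys are placeholders.
controller_job = {
    "name": "run-a-b-then-c",
    "tasks": [
        {"task_key": "job_a", "run_job_task": {"job_id": 111}},
        {"task_key": "job_b", "run_job_task": {"job_id": 222}},
        {
            "task_key": "job_c",
            # job_c waits for both upstream tasks to complete successfully
            "depends_on": [{"task_key": "job_a"}, {"task_key": "job_b"}],
            "run_job_task": {"job_id": 333},
        },
    ],
}
```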

3 More Replies
andreacfm
by New Contributor II
  • 672 Views
  • 1 replies
  • 1 kudos

Resolved! Simple append only in DLT

I am facing an issue trying to find a way to insert some computed rows into a table in the context of a DLT pipeline. My use case is extremely simple: moving from bronze to silver, I update several tables using a mix of streaming and materialized table...

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Greetings @andreacfm,  You’re not missing a thing. What you’re seeing is a known limitation in how DLT/Lakeflow pipelines handle append_flow. It really does expect a streaming source, and the once=True flag only fires during the first run of the pipe...

keeplearning
by New Contributor II
  • 38796 Views
  • 5 replies
  • 3 kudos

Resolved! How can I send custom email notification

I am using the edit notification in Databricks to send email notifications in case of workflow failure or success. How can I add additional information to this report? For example, if I want to notify about the number of rows processed or added, how can ...

Latest Reply
Kundan579
New Contributor II
  • 3 kudos

Has anybody tried to configure the custom email notification using a Logic App POST URL? Asking because I am stuck, not able to configure it the right way. Basically, I am deploying the job using DAB, and I have created a Logic App with a custom email; now I am ...

4 More Replies
SRJDB
by New Contributor II
  • 831 Views
  • 1 replies
  • 0 kudos

Why am I getting a cast invalid input error when using display()?

I have a Spark data frame. It consists of a single column, in string format, with 28750 values in it. The values are all 10 digits long. I want to look at the data, like this: my_dataframe.display(). But this returns the following error: [CAST_INVALID_IN...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @SRJDB, could you execute my_dataframe.printSchema() and attach the result here?
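A likely follow-up once the schema is known: Spark's try_cast yields NULL instead of raising on bad input, which makes it easy to isolate offending rows (e.g. my_dataframe.where("try_cast(value AS bigint) IS NULL AND value IS NOT NULL"), with value as a placeholder column name). A pure-Python illustration of the same idea:

```python
# Pure-Python illustration of what try_cast does: a cast that yields None on
# bad input instead of raising, so the offending values can be isolated.
def try_cast_bigint(s):
    try:
        return int(s)
    except ValueError:
        return None

sample = ["1234567890", "12E4567890", "0987654321"]  # one non-numeric offender
offenders = [v for v in sample if try_cast_bigint(v) is None]
```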

Magesh_Kumar
by New Contributor II
  • 781 Views
  • 3 replies
  • 0 kudos

[CONFIG_NOT_AVAILABLE] Configuration spark.sql.legacy.timeParserPolicy is not available. SQLSTATE:

Running dbt in the development environment, QA, and PROD. The same config is working in QA and PROD, but in dev I'm facing this issue: [CONFIG_NOT_AVAILABLE] Configuration spark.sql.legacy.timeParserPolicy is not available. SQLSTATE: and the compute type is...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 0 kudos

SET legacy_time_parser_policy = legacy;
https://docs.databricks.com/aws/en/sql/language-manual/parameters/legacy_time_parser_policy
https://docs.getdbt.com/docs/core/connect-data-platform/databricks-setup
your_profile_name:
  target: dev
  outputs...

2 More Replies