Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

abetogi
by New Contributor III
  • 2010 Views
  • 3 replies
  • 2 kudos

AI

At Chevron we actively use Databricks to provide answers to business users. It was extremely interesting to see the LakeHouseIQ initiatives, as they can expedite how fast our users can receive their answers/reports. Is there any documentation that I...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 2 kudos

Guys, this thread was created in 2023. And the user who created it was last seen in 2023. I think there’s no point in resurrecting this thread

2 More Replies
radha_krishna
by New Contributor
  • 809 Views
  • 4 replies
  • 1 kudos

"ai_parse_document()" is not a full OCR engine ? It's not extracting text from high quality image

 I used "ai_parse_document()" to parse a PNG file that contains cat images and text. From the image, I wanted to extract all the cat names, but the response returned nothing. It seems that "ai_parse_document()" does not support rich image extraction....

Latest Reply
Raman_Unifeye
Honored Contributor III
  • 1 kudos

@szymon_dybczak - yes, since it relies on AI models, there's a chance of it missing a few cases due to its non-deterministic nature. I have used it in real-world workloads with a vast number of PDFs and it has worked pretty well in all those cases. Have not tried with PNG...

3 More Replies
Michael_Galli
by Databricks Partner
  • 15871 Views
  • 5 replies
  • 8 kudos

Resolved! Monitoring Azure Databricks in an Azure Log Analytics Workspace

Does anyone have experience with the mspnp/spark-monitoring library? Is this best practice, or are there better ways to monitor a Databricks cluster?

Latest Reply
vr
Valued Contributor
  • 8 kudos

Interesting that Microsoft deleted this project. Was there any announcement as to when, why, and what to do now?

4 More Replies
Ravikumashi
by Contributor
  • 3541 Views
  • 4 replies
  • 1 kudos

Resolved! Issue with Logging Spark Events to LogAnalytics after Upgrading to Databricks 11.3 LTS

We have recently been in the process of upgrading our Databricks clusters to version 11.3 LTS. As part of this upgrade, we have been working on integrating the logging of Spark events to LogAnalytics using the repository available at https://github.c...

Latest Reply
vr
Valued Contributor
  • 1 kudos

Does anyone know why this repository was deleted? https://github.com/mspnp/spark-monitoring

3 More Replies
LeoGaller
by New Contributor II
  • 10022 Views
  • 5 replies
  • 5 kudos

Resolved! What are the options for "spark_conf.spark.databricks.cluster.profile"?

Hey guys, I'm trying to find out what options we can pass to spark_conf.spark.databricks.cluster.profile. Looking around, I know that some of the available configs are singleNode and serverless, but are there others? Where is the documentation for it?...

Latest Reply
LeoGallerDbx
Databricks Employee
  • 5 kudos

Looking internally, I was able to find the following:
  • For single node mode: the config should be set to 'singleNode'
  • For standard mode: the config should NOT be set to 'singleNode'
  • For serverless mode: the config should be set to 'serverless'
So, in ...
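For reference, those three modes can be expressed as cluster-spec fragments. This is a minimal sketch of the Clusters API shape; the surrounding keys (num_workers, etc.) are illustrative placeholders, not a definitive spec:

```python
# Sketch: spark_conf fragments for each cluster profile mode.
# Keys beyond spark.databricks.cluster.profile are illustrative placeholders.

single_node_conf = {
    "spark_conf": {"spark.databricks.cluster.profile": "singleNode"},
    "num_workers": 0,  # single-node clusters have no workers
}

standard_conf = {
    # standard (multi-node) mode: simply omit the profile key
    "spark_conf": {},
    "num_workers": 2,
}

serverless_conf = {
    "spark_conf": {"spark.databricks.cluster.profile": "serverless"},
}
```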

4 More Replies
deng_dev
by New Contributor III
  • 1369 Views
  • 3 replies
  • 3 kudos

Resolved! Databricks AutoLoader IncrementalListing mode changes

Hi everyone! I was investigating how the Databricks AutoLoader IncrementalListing mode changes will impact my current Auto Loader streams. Currently all of them are set to cloudFiles.useIncrementalListing: auto. So I wanted to check if any of the streams is ac...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 3 kudos

Hi @deng_dev, when cloudFiles.useIncrementalListing is set to auto, Auto Loader automatically detects whether a given directory is applicable for incremental listing by checking and comparing file paths of previously completed directory listings. To e...
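As a sketch of where that setting lives: the listing mode is just an Auto Loader option, so it can be pinned explicitly rather than left on auto. The format and path below are placeholders, and the readStream wiring only runs on a Databricks runtime:

```python
# Sketch: Auto Loader options controlling the listing mode.
# "auto" lets Auto Loader decide per directory; "true"/"false" force the choice.
autoloader_options = {
    "cloudFiles.format": "json",                     # placeholder format
    "cloudFiles.useIncrementalListing": "auto",      # or "true" / "false"
}

# On a Databricks cluster this would be wired up roughly like:
# df = (spark.readStream
#         .format("cloudFiles")
#         .options(**autoloader_options)
#         .load("s3://my-bucket/landing/"))          # placeholder path
```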

2 More Replies
Pratikmsbsvm
by Contributor
  • 2989 Views
  • 2 replies
  • 2 kudos

Resolved! Establishing a Connection between ADLS Gen2, Databricks and ADF In Microsoft Azure

Hello, may someone please help me with establishing a connection between ADLS Gen2, Databricks and ADF, with full steps if possible? Do I need to route through Key Vault? This is my first time doing this in production. May somebody please share detailed step ...

Latest Reply
nayan_wylde
Esteemed Contributor II
  • 2 kudos

For a production environment (ADF as orchestrator, ADLS Gen2 as storage, Databricks for PySpark transformations), follow Microsoft-recommended best practices. Databricks → ADLS Gen2: use Unity Catalog with Azure Managed Identity (via Access Connector)...
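Once the Access Connector and a Unity Catalog external location are in place, reading from ADLS Gen2 reduces to addressing an abfss:// path; no keys or secrets appear in code. A minimal sketch, with all account/container/path names as placeholders:

```python
# Sketch: addressing ADLS Gen2 from Databricks once Unity Catalog grants access
# via an Access Connector (managed identity). All names below are placeholders.
storage_account = "mystorageacct"
container = "raw"
path = f"abfss://{container}@{storage_account}.dfs.core.windows.net/sales/2024/"

# On a cluster with Unity Catalog access this is then just:
# df = spark.read.format("parquet").load(path)
```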

1 More Replies
sumit2jha
by New Contributor III
  • 8148 Views
  • 7 replies
  • 5 kudos

Resolved! ADE 2.1 Unable to run Classroom-Setup-3.1

%run ../Includes/Classroom-Setup-3.1
After running the above command, I'm getting this error message (error screenshot attached): AnalysisException: You are trying to read a Delta table `spark_catalog`.`dbacademy_sumit_s_jha_hk_ey_com_adewd_3_1`.`date_looku...

Latest Reply
sumit2jha
New Contributor III
  • 5 kudos

Run this file first before starting; the problem will be solved.

6 More Replies
Mathias_Peters
by Contributor II
  • 478 Views
  • 2 replies
  • 1 kudos

Resolved! Streamed DLT Pipeline using a lookup table

Hi, I need to join three streams/streamed data sets in a DLT pipeline. I am reading from a Kinesis data stream a sequence of events per group key. The logically first of the events per group contains a marker which determines whether that group is re...

Latest Reply
Mathias_Peters
Contributor II
  • 1 kudos

Hi @mark_ott, thank you for your help. I have a follow-up question regarding data completeness and out-of-order processing. I have decided to go with the delta table option, since super low latency is not an issue and since this option has (seemingly...

1 More Replies
hidden
by New Contributor II
  • 454 Views
  • 1 replies
  • 0 kudos

Resolved! Delta live tables upsert logic without apply changes or autocdc logic

I want to create Delta Live Tables which should be streaming, and I want to use manual upsert logic without using the APPLY CHANGES API or Auto CDC API. How can I do it?

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Hello @hidden, regarding creating streaming Delta Live Tables with manual upsert logic: let's dig in… this question comes up a lot when folks want upsert behavior in DLT but aren't using APPLY CHANGES or Auto-CDC. The short version: DLT doesn't let you drop...
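One common alternative pattern, sketched below under the assumption that a plain structured-streaming job (rather than a DLT-managed table) is acceptable: foreachBatch with a Delta MERGE. Table, column, and checkpoint names are placeholders, and the Spark wiring in comments only runs on a cluster with delta-spark available:

```python
# Sketch: manual upsert via foreachBatch + MERGE instead of APPLY CHANGES.
# KEY_COLS and table names are placeholders for this illustration.
KEY_COLS = ["id"]
merge_condition = " AND ".join(f"t.{c} = s.{c}" for c in KEY_COLS)

def upsert_batch(batch_df, batch_id):
    # Deferred import: requires delta-spark on the cluster.
    from delta.tables import DeltaTable
    target = DeltaTable.forName(batch_df.sparkSession, "silver.events")
    (target.alias("t")
           .merge(batch_df.alias("s"), merge_condition)
           .whenMatchedUpdateAll()
           .whenNotMatchedInsertAll()
           .execute())

# On a cluster this would be attached to a stream roughly like:
# (spark.readStream.table("bronze.events")
#       .writeStream
#       .foreachBatch(upsert_batch)
#       .option("checkpointLocation", "/tmp/checkpoints/events")  # placeholder
#       .start())
```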

dhruvs2
by New Contributor II
  • 1682 Views
  • 4 replies
  • 5 kudos

How to trigger a Databricks job only after multiple other jobs have completed

We have a use case where Job C should start only after both Job A and Job B have successfully completed. In Airflow, we achieve this using an ExternalTaskSensor to set dependencies across different DAGs. Is there a way to configure something similar in...

Latest Reply
BS_THE_ANALYST
Databricks Partner
  • 5 kudos

Hi @dhruvs2. A Lakeflow Job consists of tasks; the tasks can be things like notebooks or other jobs. If you want to orchestrate many jobs, I'd agree that having a controller job to do this is your best bet. Then you can set up the dependencies as you require. I...
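As a sketch, such a controller job can be expressed with run_job tasks and depends_on in a Jobs API payload; the job IDs and task keys below are placeholders:

```python
# Sketch of a Jobs API 2.x payload for a "controller" job: the task that triggers
# Job C runs only after the tasks triggering Jobs A and B succeed.
# All job IDs and task keys are placeholders.
controller_job = {
    "name": "run-a-b-then-c",
    "tasks": [
        {"task_key": "job_a", "run_job_task": {"job_id": 111}},
        {"task_key": "job_b", "run_job_task": {"job_id": 222}},
        {
            "task_key": "job_c",
            # job_c waits for both upstream tasks to complete successfully
            "depends_on": [{"task_key": "job_a"}, {"task_key": "job_b"}],
            "run_job_task": {"job_id": 333},
        },
    ],
}
```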

3 More Replies
andreacfm
by New Contributor II
  • 672 Views
  • 1 replies
  • 1 kudos

Resolved! Simple append only in DLT

I am facing an issue trying to find a way to insert some computed rows into a table in the context of a DLT pipeline. My use case is extremely simple: moving from bronze to silver, I update several tables using a mix of streaming and materialized table...

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Greetings @andreacfm,  You’re not missing a thing. What you’re seeing is a known limitation in how DLT/Lakeflow pipelines handle append_flow. It really does expect a streaming source, and the once=True flag only fires during the first run of the pipe...

keeplearning
by New Contributor II
  • 38796 Views
  • 5 replies
  • 3 kudos

Resolved! How can I send custom email notification

I am using the edit notification in Databricks to send email notifications in case of workflow failure or success. How can I add additional information to this report? For example, if I want to notify about the number of rows processed or added, how can ...

Latest Reply
Kundan579
New Contributor II
  • 3 kudos

Has anybody tried to configure the custom email notification using a Logic App POST URL? Asking because I am stuck, not able to configure it the right way. Basically, I am deploying the job using DAB, and I have created a Logic App with a custom email; now I am ...

4 More Replies
SRJDB
by New Contributor II
  • 831 Views
  • 1 replies
  • 0 kudos

Why am I getting a cast invalid input error when using display()?

I have a Spark data frame. It consists of a single column, in string format, with 28750 values in it. The values are all 10 digits long. I want to look at the data, like this: my_dataframe.display(). But this returns the following error: [CAST_INVALID_IN...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @SRJDB, could you execute my_dataframe.printSchema() and attach the result here?
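A likely follow-up once the schema is known: Spark's try_cast yields NULL instead of raising on bad input, which makes it easy to isolate offending rows (e.g. my_dataframe.where("try_cast(value AS bigint) IS NULL AND value IS NOT NULL"), with value as a placeholder column name). A pure-Python illustration of the same idea:

```python
# Pure-Python illustration of what try_cast does: a cast that yields None on
# bad input instead of raising, so the offending values can be isolated.
def try_cast_bigint(s):
    try:
        return int(s)
    except ValueError:
        return None

sample = ["1234567890", "12E4567890", "0987654321"]  # one non-numeric offender
offenders = [v for v in sample if try_cast_bigint(v) is None]
```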

Magesh_Kumar
by New Contributor II
  • 781 Views
  • 3 replies
  • 0 kudos

[CONFIG_NOT_AVAILABLE] Configuration spark.sql.legacy.timeParserPolicy is not available. SQLSTATE:

Running dbt in the development environment, QA, and PROD. The same config is working in QA and PROD, but in dev I'm facing this issue: [CONFIG_NOT_AVAILABLE] Configuration spark.sql.legacy.timeParserPolicy is not available. SQLSTATE: and the compute type is...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 0 kudos

SET legacy_time_parser_policy = legacy;
https://docs.databricks.com/aws/en/sql/language-manual/parameters/legacy_time_parser_policy
https://docs.getdbt.com/docs/core/connect-data-platform/databricks-setup
your_profile_name:
  target: dev
  outputs...

2 More Replies