Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Digvijay_11
by New Contributor
  • 31 Views
  • 1 reply
  • 0 kudos

Lakeflow Spark Declarative Pipeline

How can we run an SDP pipeline in parallel with dynamic parameter passing at the pipeline level? How can we consume a job-level parameter in the pipeline? If parameters with the same name are defined at the pipeline level, then the job-level parameters are getting over...

Latest Reply
osingh
Contributor
  • 0 kudos

To run an SDP (Spark Declarative Pipeline) in parallel with dynamic parameters, you need to understand that SDP is "smart": it builds a dependency graph and runs everything it can at the same time by default. Here is a simple breakdown of how to handle...
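A minimal sketch of that idea in Python (run inside a pipeline; the configuration key "source_path" and the JSON path are hypothetical values you would set in the pipeline settings):

import dlt

# Pipeline-level configuration values are exposed through spark.conf;
# "source_path" is a hypothetical key defined in the pipeline settings.
source_path = spark.conf.get("source_path")

@dlt.table
def bronze_events():
    # Tables with no dependencies on each other are scheduled in
    # parallel automatically once SDP resolves the dependency graph.
    return spark.read.format("json").load(source_path)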

r0nald
by New Contributor II
  • 10684 Views
  • 5 replies
  • 1 kudos

UDF not working inside transform() & lambda (SQL)

Below is a toy example of what I'm trying to achieve, but I don't understand why it fails. Can anyone explain why, and suggest a fix or a not overly bloated workaround?

%sql
create or replace function status_map(status int)
returns string
return map(10, "STATU...

Latest Reply
marcogrcr
Visitor
  • 1 kudos

Scoped variables in a transform() are not accessible by UDFs. However, you can work around this using explode():

# equivalent of: select transform(arr, e -> status_map(e.v1)) from s1
select collect_list(status_map(status_id)) from explode((select trans...
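A runnable reconstruction of that idea (a sketch, assuming the table s1 with an array-of-structs column arr and the status_map SQL UDF from the question; note collect_list collapses all rows into a single list, matching the toy example rather than a general per-row transform):

# equivalent of: select transform(arr, e -> status_map(e.v1)) from s1
df = spark.sql("""
    SELECT collect_list(status_map(e.v1)) AS statuses
    FROM (SELECT explode(arr) AS e FROM s1)
""")
df.show()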

4 More Replies
dpc
by Contributor
  • 25 Views
  • 1 reply
  • 0 kudos

Case insensitive data

For all its positives, one of the first general issues we had with Databricks was case sensitivity. We have a lot of data-specific filters in our code. The problem is, we land and view data from lots of different case-insensitive source systems, e.g. SQL Se...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @dpc, I think you can try to use a collation for that purpose. A collation is a set of rules that determines how string comparisons are performed. Collations are used to compare strings in a case-insensitive, accent-insensitive, or trailing-space ...
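A small sketch of how that can look, assuming a runtime recent enough to support collations (table and column names are hypothetical):

# Case-insensitive comparison via an explicit collation.
spark.sql(
    "SELECT 'SQL Server' COLLATE UTF8_LCASE = 'sql server' AS is_equal"
).show()

# Or declare the collation on the column itself, so every comparison
# and filter against it is case-insensitive by default.
spark.sql(
    "CREATE TABLE IF NOT EXISTS main.default.sources "
    "(name STRING COLLATE UTF8_LCASE)"
)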

seefoods
by Valued Contributor
  • 60 Views
  • 3 replies
  • 2 kudos

Resolved! Write logging errors for both PySpark and Python exceptions

Hello guys, Happy New Year and best wishes to all of us. I am catching both PySpark and Python exceptions, but I want to write these logging errors to a Delta table when logging. Does someone know the best practice for this? Thanks. Cordially,
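One common pattern is a small helper that appends each caught error to a Delta table. A minimal sketch, assuming the table name main.ops.error_log and its three-column schema (both hypothetical):

import datetime
from pyspark.errors import PySparkException

def log_error(source: str, message: str) -> None:
    # Append one error row to the (hypothetical) Delta log table.
    row = [(datetime.datetime.now(datetime.timezone.utc), source, message)]
    (spark.createDataFrame(row, "ts TIMESTAMP, source STRING, message STRING")
          .write.mode("append").saveAsTable("main.ops.error_log"))

try:
    spark.read.table("main.ops.does_not_exist")
except PySparkException as e:   # Spark-side errors (AnalysisException, ...)
    log_error("pyspark", str(e))
    raise
except Exception as e:          # plain Python errors
    log_error("python", str(e))
    raise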

Latest Reply
seefoods
Valued Contributor
  • 2 kudos

Thanks a lot @szymon_dybczak 

2 More Replies
DataGuy2
by Visitor
  • 22 Views
  • 0 replies
  • 0 kudos

Databricks notebook Issue

Hello Databricks Community, I'm facing multiple issues while working in Azure Databricks notebooks, and I'd appreciate guidance or troubleshooting suggestions. Issue 1: Failed to reconnect. While running a notebook, I frequently see a "Failed to reconnec...

Digvijay_11
by New Contributor
  • 32 Views
  • 1 reply
  • 0 kudos

Few queries on Autoloader

How to retrieve the filename and file path from the trigger and consume them in a Databricks notebook dynamically? If the same file is being modified with no change in name but a change in data, will this trigger work? If not, what is the workaround? In landing we are g...

  • 32 Views
  • 1 replies
  • 0 kudos
Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @Digvijay_11,
1. You can use the file metadata column for that purpose: File metadata column - Azure Databricks | Microsoft Learn
2. With the default setting (cloudFiles.allowOverwrites = false), files are processed exactly once. When a file is appended to o...
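A minimal sketch of both points together (paths and option values are illustrative):

from pyspark.sql.functions import col

df = (spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        # Opt in to reprocessing files whose contents change in place;
        # the default (false) processes each file exactly once.
        .option("cloudFiles.allowOverwrites", "true")
        .load("/Volumes/main/landing/events/")   # hypothetical landing path
        .select("*",
                col("_metadata.file_path").alias("source_file_path"),
                col("_metadata.file_name").alias("source_file_name")))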

ChristianRRL
by Honored Contributor
  • 59 Views
  • 2 replies
  • 1 kudos

Spark Declarative Pipelines use in All-purpose compute?

Hi there, I know Spark Declarative Pipelines (previously DLT) has undergone some changes since last year and is even now open source (announcement). For a long while, I know that SDP/DLT was locked to only working with job compute. I'm wondering, wit...

Latest Reply
MoJaMa
Databricks Employee
  • 1 kudos

Not today, and likely not in the future either.
1. If you create a pipeline yourself, the pipeline creates its own compute. This can be classic (i.e. you choose the nodes/node types) or serverless (recommended).
2. If you define a Streaming Table or Materi...

1 More Replies
PabloCSD
by Valued Contributor II
  • 124 Views
  • 4 replies
  • 1 kudos

How to use/install a driver in Spark Declarative Pipelines (ETL)?

Salutations, I'm using SDP for an ETL that extracts data from HANA and puts it into Unity Catalog. I defined a policy with the needed driver, but I get this error: An error occurred while calling o1013.load. : java.lang.ClassNotFoundException: com.sap....

Latest Reply
anshu_roy
Databricks Employee
  • 1 kudos

Hello, It is recommended that you upload libraries to source locations that support installation onto compute with standard access mode (formerly shared access mode), as this is the recommended mode for all workloads. Please refer to the documentation ...
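As a sketch of that idea, the policy can reference a driver JAR uploaded to a Unity Catalog volume; the path below is hypothetical, and the exact policy shape should be checked against the linked documentation:

"libraries": [
  { "jar": "/Volumes/main/default/drivers/ngdbc.jar" }
]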

3 More Replies
ChristianRRL
by Honored Contributor
  • 64 Views
  • 2 replies
  • 3 kudos

Resolved! Serverless Compute Spark Version Flexibility?

Hi there, I'm wondering what determines the serverless compute Spark version. Is it based on the current DBR LTS? And is there a way to modify the Spark version for serverless compute? For example, when I check the Spark version for our serverless com...

Latest Reply
Databricks77
  • 3 kudos

Serverless compute always runs on the latest runtime version. You cannot choose it as you can with standard compute.
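For a quick check of what a given serverless environment is currently running, a notebook cell like this is enough:

# Prints the Spark version of the attached compute, serverless included.
print(spark.version)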

1 More Replies
ChristianRRL
by Honored Contributor
  • 131 Views
  • 4 replies
  • 7 kudos

Resolved! Testing Spark Declarative Pipeline in Docker Container > PySparkRuntimeError

Hi there, I saw via an announcement last year that Spark Declarative Pipelines (previously DLT) was getting open-sourced into Apache Spark, and I see that this recently became true as of Apache Spark 4.1: Spark Declarative Pipelines Programming Guide. I'm trying ...

Latest Reply
aleksandra_ch
Databricks Employee
  • 7 kudos

Hi @ChristianRRL, In addition to @osingh's answers, check out this old but good blog post about how to structure the pipeline's code to enable a dev and test cycle: https://www.databricks.com/blog/applying-software-development-devops-best-practices-d...

3 More Replies
Chandana_Ramesh
by New Contributor II
  • 118 Views
  • 3 replies
  • 1 kudos

Lakebridge SetUp Issue

Hi, I'm getting the below error upon executing the databricks labs lakebridge analyze command. All the dependencies were installed before executing the command. Can someone please give a solution, or suggest if anything is missing? Below attached ...

Latest Reply
aleksandra_ch
Databricks Employee
  • 1 kudos

Hi Chandana_Ramesh, Please rerun the command with a --debug flag and share the command and the whole output. From the message that you shared, it looks like the Analyzer.exe binary is not accessible: Verify the binary exists and is accessible: C:\Use...
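That is, assuming the same invocation as in the question:

databricks labs lakebridge analyze --debug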

2 More Replies
Anish_2
by New Contributor II
  • 66 Views
  • 2 replies
  • 0 kudos

Databricks workflow design

Hello Team, I have a use case in which I want to trigger another DLT pipeline if one table succeeded in my parent DLT pipeline. I don't want to create a pipeline-to-pipeline dependency. Is there any way to create a table-to-pipeline dependency? Thank you, Anis...

Data Engineering
deltalivetable
workflowdesign
Latest Reply
Raman_Unifeye
Contributor III
  • 0 kudos

@Anish_2 - A table update trigger (TUT) is the solution. With a TUT, instead of the parent pipeline "pushing" a notification, the child job is "pulled" into action by a metadata change. Set it up as below: create a Databricks job and add a Pipeline task pointing to your Secondary ...
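As a sketch, the job's trigger settings in Jobs API JSON look roughly like this (the table name and condition are illustrative; check the Jobs API reference for the exact schema):

"trigger": {
  "table_update": {
    "table_names": ["main.silver.orders"],
    "condition": "ANY_UPDATED"
  }
}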

1 More Replies
Alf01
by New Contributor
  • 128 Views
  • 1 reply
  • 1 kudos

Databricks Serverless Pipelines - Incremental Refresh Doubts

Hello everyone, I would like to clarify some doubts regarding how Databricks pipelines (DLT) behave when using serverless pipelines with incremental updates. In general, incremental processing is enabled and works as expected. However, I have observed ...

Latest Reply
aleksandra_ch
Databricks Employee
  • 1 kudos

Hi @Alf01, and welcome to the Databricks Community! The Lakeflow Spark Declarative Pipelines (SDP) cost model considers multiple factors when deciding whether to perform an incremental refresh or a full recompute. It makes a best-effort attempt to increm...

Dhruv-22
by Contributor II
  • 142 Views
  • 4 replies
  • 0 kudos

Feature request: Allow to set value as null when not present in schema evolution

I want to raise a feature request as follows. Currently, with automatic schema evolution for merge, when a column is not present in the source dataset it is not changed in the target dataset. For example:

%sql
CREATE OR REPLACE TABLE edw_nprd_aen.bronze.t...

Latest Reply
ManojkMohan
Honored Contributor II
  • 0 kudos

@Dhruv-22
Problem: When using MERGE INTO ... WITH SCHEMA EVOLUTION, if a column exists in the target table but is not present in the source dataset, that column is left unchanged on matched rows.
Solution thinking: This can be emulated by introspecting th...
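A minimal sketch of that introspection idea, run against the source before the MERGE (assumes a source DataFrame source_df; the target table name is hypothetical):

from pyspark.sql.functions import lit

target = spark.table("edw_nprd_aen.bronze.target_table")  # hypothetical
# Add every target column missing from the source as an explicit NULL,
# so a schema-evolving MERGE overwrites it instead of leaving it unchanged.
for f in target.schema.fields:
    if f.name not in source_df.columns:
        source_df = source_df.withColumn(f.name, lit(None).cast(f.dataType))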

3 More Replies
Dhruv-22
by Contributor II
  • 112 Views
  • 3 replies
  • 0 kudos

Merge with schema evolution fails because of upper case columns

The following is a minimal reproducible example of what I'm facing right now.

%sql
CREATE OR REPLACE TABLE edw_nprd_aen.bronze.test_table (
  id INT
);
INSERT INTO edw_nprd_aen.bronze.test_table VALUES (1);
SELECT * FROM edw_nprd_aen.bronze.test_tab...

Latest Reply
css-1029
New Contributor
  • 0 kudos

Hi @Dhruv-22, It's actually not a bug. Let me explain what's happening.
The root cause: The issue stems from how schema evolution works with Delta Lake's MERGE statement, combined with Spark SQL's case-insensitivity settings. Here's the key insight: spark...
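One way to sidestep this (a sketch, assuming a source DataFrame source_df) is to align the source column casing with the target's before the schema-evolving MERGE, since Delta is case-insensitive but case-preserving:

# Map lowercase names to the target table's canonical casing.
target_cols = {c.lower(): c
               for c in spark.table("edw_nprd_aen.bronze.test_table").columns}
source_df = source_df.toDF(*[target_cols.get(c.lower(), c)
                             for c in source_df.columns])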

2 More Replies