Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Dhruv-22
by Contributor III
  • 1130 Views
  • 10 replies
  • 0 kudos

Merge with schema evolution fails because of upper case columns

The following is a minimal reproducible example of what I'm facing right now. %sql CREATE OR REPLACE TABLE edw_nprd_aen.bronze.test_table ( id INT ); INSERT INTO edw_nprd_aen.bronze.test_table VALUES (1); SELECT * FROM edw_nprd_aen.bronze.test_tab...
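For context, here is a minimal sketch of the pattern being described, with placeholder catalog/table names and an assumed upper-case source column; it illustrates the shape of a MERGE WITH SCHEMA EVOLUTION statement, not the poster's exact code:

    # Sketch with placeholder names: the target lacks the upper-case column "Name"
    # that the source carries, and schema evolution is asked to add it.
    from pyspark.sql import Row

    spark.sql("CREATE OR REPLACE TABLE main.default.test_table (id INT)")
    spark.sql("INSERT INTO main.default.test_table VALUES (1)")

    src = spark.createDataFrame([Row(id=1, Name="a"), Row(id=2, Name="b")])
    src.createOrReplaceTempView("src")

    spark.sql("""
        MERGE WITH SCHEMA EVOLUTION INTO main.default.test_table AS t
        USING src AS s
        ON t.id = s.id
        WHEN MATCHED THEN UPDATE SET *
        WHEN NOT MATCHED THEN INSERT *
    """)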

Latest Reply
SteveOstrowski
Databricks Employee
  • 0 kudos

Hi @Dhruv-22, I did check with our product teams and they agree with what I wrote above, and suggest that if you have a support contract you open a ticket about it. They are aware of this behavior and the workaround needed. However, they haven't seen this af...

9 More Replies
cdn_yyz_yul
by Contributor II
  • 1259 Views
  • 13 replies
  • 4 kudos

Resolved! unionByName several streaming dataframes from different sources

Is the following type of union safe with Spark Structured Streaming? Union multiple streaming dataframes, each from a different source. Is there a better solution? For example, df1 = spark.readStream.table(f"{bronze_catalog}.{bronze_schema}.table1") ...
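As a rough sketch of the pattern in question (catalog, schema, and table names are placeholders), unioning several streaming DataFrames by column name before a single writeStream might look like this:

    # Placeholder names: three streaming sources unioned by column name.
    df1 = spark.readStream.table("bronze_catalog.bronze_schema.table1")
    df2 = spark.readStream.table("bronze_catalog.bronze_schema.table2")
    df3 = spark.readStream.table("bronze_catalog.bronze_schema.table3")

    # allowMissingColumns=True fills columns absent from one source with nulls,
    # so the schemas do not need to match exactly.
    combined = (df1
        .unionByName(df2, allowMissingColumns=True)
        .unionByName(df3, allowMissingColumns=True))

    (combined.writeStream
        .option("checkpointLocation", "/Volumes/main/default/checkpoints/union_demo")  # placeholder
        .trigger(availableNow=True)
        .toTable("silver_catalog.silver_schema.combined"))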

Latest Reply
cdn_yyz_yul
Contributor II
  • 4 kudos

Thanks @Kirankumarbs and thanks @SteveOstrowski, you have provided very useful information.

12 More Replies
ChrisHunt
by New Contributor III
  • 534 Views
  • 2 replies
  • 1 kudos

Resolved! How to stop Databricks adding quotes to multi-line selections

I'm using a query to generate some YML code from my tables, and running into an annoying behaviour. Here's a simplified example. Run this query in a notebook or the SQL editor: SELECT 'foo\nbar' FROM system.information_schema.tables LIMIT 10. You get a...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 1 kudos

Hi @ChrisHunt, As Ale_Armillotta mentioned, any field that contains a newline is wrapped in double quotes so that each row still represents a single CSV record. I think that is a logical and expected behaviour. There is currently no setting to turn t...

1 More Replies
CodeInYellow
by New Contributor II
  • 388 Views
  • 2 replies
  • 2 kudos

Resolved! Pool Max Capacity and Cluster Creation

Hello, I have a theoretical question for which I have not been able to find a clear answer in the documentation. When a cluster is created using an instance pool, what exactly is checked when the pool is asked to provide nodes? More specifically, does t...

Latest Reply
Louis_Frolio
Databricks Employee
  • 2 kudos

Greetings @CodeInYellow, I did some research and here is what I found. Your first scenario is correct: the pool checks actual current usage, not possible future usage across attached clusters. Using your example with a pool max capacity of 23: Clu...

1 More Replies
maikel
by Contributor II
  • 567 Views
  • 6 replies
  • 4 kudos

Resolved! Job description

Hello! Is there a way to add a job description with some information about what the parameters mean, for example? Or can only the notebook that is the source of the job be used for that? Thank you!

Latest Reply
maikel
Contributor II
  • 4 kudos

OK! I found it:
resources:
  jobs:
    example_job:
      name: example_job${bundle.target}
      description: "y description"
Thanks a lot!

5 More Replies
drag7ter
by Contributor
  • 12698 Views
  • 8 replies
  • 4 kudos

Resolved! foreachBatch doesn't work in structured streaming

I'm trying to print out the number of rows in the batch, but it doesn't seem to work properly. I have a 1-node compute-optimized cluster and I run this code in a notebook: # Logging the row count using a streaming-friendly approach def log_row_count(batch_df, ba...
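For reference, a hedged sketch of the foreachBatch row-count pattern being discussed (source table and checkpoint path are placeholders); the callback runs on the driver, so any print output may land in the driver log rather than the notebook cell, which is part of what the thread is debating:

    # Placeholder source/checkpoint: log the row count of each micro-batch.
    def log_row_count(batch_df, batch_id):
        # Runs on the driver once per micro-batch.
        print(f"batch {batch_id}: {batch_df.count()} rows")

    query = (spark.readStream
        .table("main.default.events")  # placeholder source table
        .writeStream
        .foreachBatch(log_row_count)
        .option("checkpointLocation", "/Volumes/main/default/checkpoints/row_count_demo")
        .trigger(availableNow=True)
        .start())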

Latest Reply
Malthe
Valued Contributor II
  • 4 kudos

@szymon_dybczak in my testing, the print output does not appear anywhere. There is no trace of it anywhere, neither in the notebook nor in the driver logs.

7 More Replies
Seunghyun
by Contributor
  • 462 Views
  • 3 replies
  • 1 kudos

Resolved! Issue with SQL Alert Task in Databricks Asset Bundles: Unknown Alert ID and alerts-v2 URL Mismatch

Hello, I have deployed a Databricks SQL Alert using Databricks Asset Bundles (DABs), and I've also deployed a Job to execute this alert through the same bundle. Below is a snippet of the task configuration I used:
- task_key: "alerts_error"
  sql_task: a...

Latest Reply
pradeep_singh
Contributor III
  • 1 kudos

You will probably have to come up with a solution that involves running the SQL that creates the alert, instead of using DABs to create the alert, until Alert V2 becomes GA.

2 More Replies
Danny_Lee
by Databricks Partner
  • 537 Views
  • 3 replies
  • 2 kudos

Resolved! DAB YML Samples

Hi all, I recently read this post and it was insightful for me because I had never seen the extension of an existing cluster by inheriting a previously defined cluster and then adding on top. It made me wonder whether others might also be interested i...

Latest Reply
Danny_Lee
Databricks Partner
  • 2 kudos

Fantastic @SteveOstrowski! This is exactly the kind of resource I was looking for: it lets me understand what's possible and gives me the kind of templates I can use to get ahead of my work. Thank you!

2 More Replies
truongtran
by New Contributor II
  • 456 Views
  • 3 replies
  • 1 kudos

SQS messages disappear immediately when File Events enabled on External Location with pre-provisioned SQS

Environment: Cloud: AWS; Unity Catalog: Enabled; Auto Loader Mode: File Notification (Legacy) with pre-provisioned SQS. Problem: We are using Auto Loader in legacy file notification mode with a pre-provisioned SQS queue (cloudFiles.useNotifications = true + cl...
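For reference, a sketch of the legacy file-notification configuration being described, with placeholder paths, schema, and queue URL; cloudFiles.useNotifications and cloudFiles.queueUrl are the options that point Auto Loader at a pre-provisioned SQS queue on AWS:

    # Placeholder values: Auto Loader in legacy file notification mode
    # reading from a pre-provisioned SQS queue.
    df = (spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.useNotifications", "true")
        .option("cloudFiles.queueUrl",
                "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue")  # placeholder
        .schema("id INT, payload STRING")  # placeholder schema
        .load("s3://my-bucket/landing/"))  # placeholder external location path

    (df.writeStream
        .option("checkpointLocation", "s3://my-bucket/checkpoints/autoloader_demo")
        .trigger(availableNow=True)
        .toTable("main.default.bronze_events"))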

Latest Reply
SteveOstrowski
Databricks Employee
  • 1 kudos

Hi @truongtran, Thank you for the thorough write-up with the environment details and reproducible scenarios; that makes it much easier to pinpoint what is happening. What is happening: when you enable File Events on an External Location in Unity Cat...

2 More Replies
DitchT
by New Contributor III
  • 383 Views
  • 3 replies
  • 2 kudos

Resolved! ODBC Parameterization issue (Basic .NET)

Hey all, I have some rather basic C# code that I'm running against the newest Databricks ODBC driver, attempting to insert parameterized queries. I see the option to disable parameterized queries in the documentation: UseNativeQuery=false, FastSQLPrepa...

Latest Reply
emma_s
Databricks Employee
  • 2 kudos

Hi,  I haven't come across this issue myself but according to some internal resources I think the following fix may work. This is a known issue introduced in ODBC driver version 2.8.0. The root cause is that the default for EnableNativeParameterizedQ...

2 More Replies
Nmtc9to5
by New Contributor II
  • 544 Views
  • 4 replies
  • 3 kudos

Resolved! Multiple Instances of a Databricks Asset Bundle

Hi everyone. I'm new to Databricks Asset Bundling. I'm trying to generate a parameterized DAB template, like a class in OOP, to allow the instantiation of multiple independent Lakeflow pipelines. However, when deploying the resources, even after changi...

Labels: Data Engineering, DAB, DABs, Databricks Asset Bundle
Latest Reply
emma_s
Databricks Employee
  • 3 kudos

Hey, as others have said, you can't really do what you're trying to do via DABs. You have to specify each object for deployment, and if you redeploy, it will overwrite the old objects. There are two potential ways you could deploy the pipelines via DAB...

3 More Replies
malterializedvw
by New Contributor III
  • 361 Views
  • 2 replies
  • 1 kudos

Resolved! Effects of materialized view with CLUSTER BY

Hi folks, I have a question on whether I am using materialized views right. Our pipeline looks like this: 1. A Spark job creates a table `source` with columns a, b and c. 2. A materialized view `target` is created on `source`. I want to partition it by ...
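For illustration, a sketch of the general shape such a definition might take (all names are placeholders); this assumes the statement is run from compute that supports materialized view creation, and that CLUSTER BY is used in place of classic partitioning:

    # Placeholder names; requires an environment that can create materialized
    # views (for example a SQL warehouse or serverless compute).
    spark.sql("""
        CREATE OR REPLACE MATERIALIZED VIEW main.default.target
        CLUSTER BY (a)
        AS SELECT a, b, c
        FROM main.default.source
    """)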

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Greetings @malterializedvw, I did some digging and have some helpful hints for you to consider as you work through your scenario. Your MV definition looks syntactically fine, but there are a few things I'd check. First, CLUSTER BY on a materialize...

1 More Replies
maikel
by Contributor II
  • 370 Views
  • 3 replies
  • 1 kudos

Resolved! Databricks orchestration job

Hello Community, We are currently building a system in Databricks where multiple tasks are combined into a single job that produces final output data. So far, our approach is based on Python notebooks (with asset bundles) that orchestrate the workflow....

Latest Reply
maikel
Contributor II
  • 1 kudos

Hello @aleksandra_ch, thanks a lot for your response! Very helpful! One thing I would like to ask: by Lakeflow Spark Declarative Pipelines, do you mean the chain of jobs that performs some data engineering operations? Thank you!

2 More Replies
Seunghyun
by Contributor
  • 270 Views
  • 1 reply
  • 0 kudos

Issue with ValueError: unknown: unknown when using Databricks SDK for Python (AccountClient)

Hello, I am currently developing a Python script using the Databricks SDK to manage Service Principal secrets within a Databricks Notebook environment. I am using M2M (Machine-to-Machine) authentication, and the Service Principal in use has been grant...
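For reference, a sketch of explicit M2M (OAuth) authentication for the account-level client, with placeholder account ID, client ID, and secret; passing these explicitly can help separate an authentication failure from an API failure when the SDK raises an opaque error:

    # Placeholder credentials: explicit M2M auth for the account-level client.
    from databricks.sdk import AccountClient

    account = AccountClient(
        host="https://accounts.cloud.databricks.com",       # AWS accounts endpoint
        account_id="00000000-0000-0000-0000-000000000000",  # placeholder
        client_id="my-service-principal-client-id",         # placeholder
        client_secret="my-service-principal-secret",        # placeholder
    )

    # Example account-level call: list service principals.
    for sp in account.service_principals.list():
        print(sp.display_name)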

Latest Reply
Ashwin_DSA
Databricks Employee
  • 0 kudos

Hi @Seunghyun, This error is likely caused by the Databricks SDK’s authentication step, not by the Service Principals API itself. The SDK is trying to obtain an OAuth token for your AccountClient, the token request is failing, and the error payload c...

prafulja
by New Contributor II
  • 325 Views
  • 3 replies
  • 1 kudos

Resolved! Found issue with DLT for-each-batch sink

We are creating a Bronze table on top of ADLS data using Auto Loader with DLT. After that, we create the Silver table using a for-each-batch sink. Finally, we create the Gold table through a DLT materialized view. However, when creating the Gold table...

Latest Reply
prafulja
New Contributor II
  • 1 kudos

Thank you for sharing the detailed explanation. I was following the same approach, but the challenge was that with foreachBatch, SDP wasn’t able to reliably track whether the table had been created or not. When I tried Option 3 (without using the LIV...

2 More Replies