Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

deepu1
by Visitor
  • 1 View
  • 0 replies
  • 0 kudos

DLT Gold aggregation with apply_change

I am building a Gold table using Delta Live Tables (DLT). The Gold table contains aggregated data derived from a Silver table. Aggregation happens monthly. However, the requirement is that only the current (year, month) should be recalculated. Previous mo...
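A minimal sketch of one way to meet that requirement outside of apply_changes (which is aimed at CDC feeds): recompute only the current month's aggregate and MERGE it into the Gold table so earlier months are never rewritten. The table and column names (silver_events, gold_monthly_agg, event_ts, amount) are hypothetical.

from pyspark.sql import functions as F

# Recompute only the current (year, month) aggregate from the Silver table.
current = (
    spark.table("silver_events")                                        # hypothetical Silver table
    .filter(
        (F.year("event_ts") == F.year(F.current_date()))
        & (F.month("event_ts") == F.month(F.current_date()))
    )
    .groupBy(F.year("event_ts").alias("year"), F.month("event_ts").alias("month"))
    .agg(F.sum("amount").alias("total_amount"))
)
current.createOrReplaceTempView("gold_current_month")

# Upsert just that month; previous months in the Gold table stay untouched.
spark.sql("""
    MERGE INTO gold_monthly_agg AS t
    USING gold_current_month AS s
    ON t.year = s.year AND t.month = s.month
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")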

Chandana_Ramesh
by New Contributor II
  • 16 Views
  • 0 replies
  • 0 kudos

Lakebridge SetUp Issue

Hi, I'm getting the below error upon executing the databricks labs lakebridge analyze command. All the dependencies had been installed before executing the command. Can someone please give a solution, or suggest if anything is missing? Below attached ...

HarishKumarM
by Visitor
  • 34 Views
  • 1 reply
  • 0 kudos

Zerobus Connector Issue

I was trying to implement the example posted at the below link for the Zerobus connector to test its functionality on my free edition workspace, but unfortunately I am getting the below error. Reference Code: https://learn.microsoft.com/en-us/azure/databricks/...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Hey @HarishKumarM, I did some digging and found some helpful information to help you troubleshoot. What the error means: your workspace isn't currently enrolled in the Zerobus Ingest preview. Even though Zerobus is labeled a Public Preview, it's st...

vijsharm
by New Contributor II
  • 46 Views
  • 4 replies
  • 0 kudos

checkpoint changes not working on my databricks job

Hi, I have a job processing a Kafka stream using a readStream process. Due to an issue we changed the checkpoint path to another path and it pulled all the records; later, when I changed back to the original checkpoint location, it is not pulling ...

Data Engineering
checkpoint
Latest Reply
cgrant
Databricks Employee
  • 0 kudos

When you swapped back to the old checkpoint, were any records flowing through, and were batches completing? It's possible that you've accumulated a big backlog with the old checkpoint, and/or records in Kafka have expired. And the "startingOffsets" o...
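For reference, a minimal sketch of the setup under discussion; the broker, topic, target table, and checkpoint path are placeholders. Note that "startingOffsets" only applies when the checkpoint is empty, so once the old checkpoint holds committed offsets the stream resumes from those instead.

# Hypothetical Kafka-to-Delta stream with an explicit checkpoint location.
raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")     # placeholder broker
    .option("subscribe", "events")                         # placeholder topic
    .option("startingOffsets", "earliest")                 # ignored once the checkpoint has offsets
    .load()
)

query = (
    raw.writeStream
    .format("delta")
    .option("checkpointLocation", "/Volumes/main/default/checkpoints/events")  # original path
    .trigger(availableNow=True)
    .toTable("main.default.bronze_events")                 # placeholder target table
)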

3 More Replies
csondergaardp
by New Contributor
  • 67 Views
  • 1 reply
  • 0 kudos

[PATH_NOT_FOUND] Structured Streaming uses wrong checkpoint location

I'm trying to perform a simple example using Structured Streaming on a directory created as a Volume. The use case is purely educational; I am investigating various forms of triggers. Basic info: Catalog: "dev_catalog", Schema: "stream", Volume: "streamin...

Latest Reply
cgrant
Databricks Employee
  • 0 kudos

Your checkpoint code looks correct. What is the source of `df`? Is it `/Volumes/dev_catalog/default/streaming_basics/` ? The path looks incorrect - add `stream` to it.  
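A minimal sketch of the three-level Volume path convention the reply points to (/Volumes/&lt;catalog&gt;/&lt;schema&gt;/&lt;volume&gt;/...), built from the catalog, schema, and volume named in the thread; the input subfolder, file format, and output table are assumptions.

source_path = "/Volumes/dev_catalog/stream/streaming_basics/input/"            # assumed input folder
checkpoint_path = "/Volumes/dev_catalog/stream/streaming_basics/_checkpoint/"  # assumed checkpoint folder

df = (
    spark.readStream
    .format("cloudFiles")                          # Auto Loader; a plain file source also works
    .option("cloudFiles.format", "json")           # assumed file format
    .option("cloudFiles.schemaLocation", checkpoint_path)
    .load(source_path)
)

query = (
    df.writeStream
    .option("checkpointLocation", checkpoint_path)
    .trigger(availableNow=True)
    .toTable("dev_catalog.stream.demo_output")     # assumed output table
)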

SatabrataMuduli
by New Contributor II
  • 61 Views
  • 1 reply
  • 1 kudos

Unable to Connect to Oracle from Databricks UC Cluster (DBR 15.4) – ORA-12170 Timeout Error

 Hi all,I’m trying to connect to an Oracle database from my Databricks UC cluster (DBR 15.4) using the ojdbc8.jar driver, which I’ve installed on the cluster. Here’s the code I’m using:df = spark.read.format("jdbc")\ .option("url", jdbc_url)\ ...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @SatabrataMuduli, I'm quite sure this is a networking issue. You didn't provide many details about your environment, so I'll give you general advice. You cannot reach an on-premises Oracle database unless networking is explicitly configured or your dat...
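To confirm the networking theory before touching any JDBC options, a quick reachability check from a notebook on the same cluster helps; the host, port, service name, and credentials below are placeholders.

import socket

host, port = "oracle-host.example.com", 1521          # placeholder host and listener port
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.settimeout(5)
    reachable = s.connect_ex((host, port)) == 0       # 0 means the TCP connection succeeded
print("reachable" if reachable else "not reachable - ORA-12170 is usually a firewall/routing issue")

# Once the port is reachable, the JDBC read from the post should work as written:
df = (
    spark.read.format("jdbc")
    .option("url", f"jdbc:oracle:thin:@//{host}:{port}/ORCLPDB1")   # placeholder service name
    .option("dbtable", "SCHEMA_NAME.TABLE_NAME")
    .option("user", "db_user")
    .option("password", "db_password")
    .option("driver", "oracle.jdbc.driver.OracleDriver")
    .load()
)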

Dhruv-22
by Contributor II
  • 77 Views
  • 2 replies
  • 0 kudos

Feature request: Allow to set value as null when not present in schema evolution

I want to raise a feature request as follows. Currently, with automatic schema evolution for MERGE, when a column is not present in the source dataset it is not changed in the target dataset. For example: %sql CREATE OR REPLACE TABLE edw_nprd_aen.bronze.t...

Latest Reply
ManojkMohan
Honored Contributor II
  • 0 kudos

@Dhruv-22 Problem: when using MERGE INTO ... WITH SCHEMA EVOLUTION, if a column exists in the target table but is not present in the source dataset, that column is left unchanged on matched rows. Solution thinking: this can be emulated by introspecting th...
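A minimal sketch of that idea, assuming a Delta target, a hypothetical source DataFrame source_df, and a hypothetical join key id: columns present in the target but missing from the source are added as typed NULL literals, so UPDATE SET * overwrites them on matched rows. The target table name is a placeholder.

from delta.tables import DeltaTable
from pyspark.sql import functions as F

target = DeltaTable.forName(spark, "catalog.schema.target_table")   # placeholder target table
target_schema = {f.name: f.dataType for f in target.toDF().schema.fields}

# Add every target column missing from the source as a NULL of the matching type.
for name, dtype in target_schema.items():
    if name not in source_df.columns:
        source_df = source_df.withColumn(name, F.lit(None).cast(dtype))

(
    target.alias("t")
    .merge(source_df.alias("s"), "t.id = s.id")                     # assumed join key
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)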

1 More Replies
RevanthV
by New Contributor III
  • 76 Views
  • 3 replies
  • 2 kudos

Data validation with df writes using append mode

Hi Team, recently I came across a situation where I had to write a huge amount of data and it took 6 hrs to complete... Later, when I checked the target data, I saw that 20% of the total records had been written incorrectly or were corrupted because the source data itself was corr...

Latest Reply
RevanthV
New Contributor III
  • 2 kudos

Hey @K_Anudeep, thanks a lot for tagging me into the GitHub issue. This is exactly the "validate and commit" feature I want, and I see you have already raised a PR for it with a new option called . I will try this out and check if it satisfie...
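Until that option is available, a simple pre-write validation pattern can keep corrupt rows out of the append; the rule, column, and table names below are hypothetical.

from pyspark.sql import functions as F

rule = F.col("amount").isNotNull() & (F.col("amount") >= 0)   # hypothetical validity rule

valid = source_df.filter(rule)                                # source_df is hypothetical
invalid = source_df.filter(~rule)

# Append only the rows that pass; quarantine the rest for inspection instead of
# discovering corruption in the target after a long write.
valid.write.mode("append").saveAsTable("main.sales.target_table")
invalid.write.mode("append").saveAsTable("main.sales.target_table_quarantine")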

2 More Replies
ramsai
by New Contributor
  • 94 Views
  • 5 replies
  • 2 kudos

Updating Job Creator to Service Principal

Regarding data governance best practices: I have jobs created by a user who has left the organization, and I need to change the job creator to a service principal. Currently, it seems the only option is to clone the job and update it. Is this the rec...

Latest Reply
Sanjeeb2024
Contributor III
  • 2 kudos

I agree with @nayan_wylde; for auditing, the creator is important and it should be immutable by nature.

4 More Replies
Rose_15
by New Contributor
  • 96 Views
  • 3 replies
  • 0 kudos

Databricks SQL Warehouse fails when streaming ~53M rows via Python (token/session expiry)

Hello Team, I am facing a consistent issue when streaming a large table (~53 million rows) from a Databricks SQL Warehouse using Python (databricks-sql-connector) with OAuth authentication. I execute a single long-running query and fetch data in batche...

Latest Reply
Sanjeeb2024
Contributor III
  • 0 kudos

Hi @Rose_15 - thanks for the details. It is better to plan around the number of tables, their size, and the number of records, and to extract the files to cloud storage and then reload the data using any mechanism. Once your extraction is complete, you wi...
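A minimal sketch of that approach, exporting the table to files in a Unity Catalog Volume from a notebook or job instead of pulling ~53M rows through a single long-lived connector session; the table name and path are placeholders.

# Export once to cloud storage; downstream consumers then read the files,
# with no dependency on one OAuth session staying alive for hours.
(
    spark.table("main.sales.big_table")                  # placeholder table
    .write
    .mode("overwrite")
    .parquet("/Volumes/main/sales/exports/big_table/")   # placeholder Volume path
)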

2 More Replies
jfvizoso
by New Contributor II
  • 12923 Views
  • 6 replies
  • 0 kudos

Can I pass parameters to a Delta Live Table pipeline at running time?

I need to execute a DLT pipeline from a Job, and I would like to know if there is any way of passing a parameter. I know you can have settings in the pipeline that you use in the DLT notebook, but it seems you can only assign values to them when crea...

Latest Reply
Sudharsan
New Contributor II
  • 0 kudos

@DeepakAI: May I know how you resolved it?

5 More Replies
Phani1
by Databricks MVP
  • 3163 Views
  • 8 replies
  • 0 kudos

Triggering DLT Pipelines with Dynamic Parameters

Hi Team, We have a scenario where we need to pass a dynamic parameter to a Spark job that will trigger a DLT pipeline in append mode. Can you please suggest an approach for this? Regards, Phani

Latest Reply
Sudharsan
New Contributor II
  • 0 kudos

@koji_kawamura: I have more or less the same scenario; say I have 3 tables. The sources and targets are different, but I would like to use a generic pipeline, pass in the source and target as parameters, and run them in parallel. @sas30: can you be m...
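One hedged sketch of the generic-pipeline idea: DLT pipelines accept key/value configuration in their settings, readable in the notebook via spark.conf.get, so the same notebook can back several pipelines (one per source/target pair), or a pipeline whose configuration is updated via the API before each triggered run. The configuration keys and the fallback default below are assumptions.

import dlt

source_table = spark.conf.get("mypipeline.source_table")              # assumed configuration key
target_table = spark.conf.get("mypipeline.target_table", "gold_out")  # assumed key with a default

@dlt.table(name=target_table)
def generic_target():
    # A streaming source would use spark.readStream.table(source_table) instead.
    return spark.read.table(source_table)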

7 More Replies
