Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ajgold
by New Contributor II
  • 1537 Views
  • 6 replies
  • 2 kudos

DLT Expectations Alert for Warning

I want to receive an alert via email or Slack when the @dlt.expect declaration fails its validation check in my DLT pipeline. I only see the option to add an email alert for @dlt.expect_or_fail failures, but not for warnings.

Latest Reply
RiyazAliM
Honored Contributor
  • 2 kudos

Hey @ajgold, I don't think DLT has this feature yet. You may raise a feature request for Databricks to add it in a future release here: https://databricks.aha.io/. Cheers!

5 More Replies
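RiyazAliM is right that there is no built-in alert for @dlt.expect warnings; a common workaround is to poll the pipeline's event log for expectation metrics and send the alert yourself. A minimal sketch, assuming the event log exposes expectation counts under details:flow_progress.data_quality.expectations (verify against your runtime) and using a hypothetical Slack webhook:

```python
import json

# Sketch (assumptions): DLT writes expectation metrics into its event log under
# details:flow_progress.data_quality.expectations. The exact schema may differ
# by release -- check your pipeline's event log before relying on this.

def failed_expectations(details_json: str):
    """Return [(name, failed_records)] for expectations that flagged rows."""
    details = json.loads(details_json)
    expectations = (
        details.get("flow_progress", {})
               .get("data_quality", {})
               .get("expectations", [])
    )
    return [(e["name"], e["failed_records"])
            for e in expectations if e.get("failed_records", 0) > 0]

# In a scheduled notebook you might feed it rows from the event log and post
# to Slack (SLACK_WEBHOOK_URL is hypothetical):
#
#   rows = spark.sql("""
#     SELECT details FROM event_log(TABLE(my_catalog.my_schema.my_mv))
#     WHERE event_type = 'flow_progress'
#   """).collect()
#   for r in rows:
#       for name, n in failed_expectations(r.details):
#           requests.post(SLACK_WEBHOOK_URL, json={"text": f"{name}: {n} bad rows"})
```

The pure parsing helper keeps the alert logic testable outside a pipeline; only the commented part needs a cluster.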
ande
by New Contributor
  • 2552 Views
  • 2 replies
  • 0 kudos

IP address for accessing external SFTP server

I am trying to pull data into my Databricks workspace via an external SFTP server. I am using Azure for my compute. To access the SFTP server, they need to whitelist my IP address. My IP address in Azure Databricks seems to be constantly changing fro...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Azure Databricks, like many cloud services, does not provide static IP addresses for outbound connections. This is because the compute resources are dynamically allocated and can change over time. One potential workaround could be to use a Virtual N...

1 More Replies
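As a small companion to Walter_C's point, here is a sketch for confirming which outbound IP a cluster currently presents before requesting whitelisting. The echo-service URL is an assumption (any "what is my IP" endpoint works), and as the reply says, only a NAT gateway on a VNet-injected workspace gives a stable egress IP:

```python
import re

# Sketch: confirm the cluster's current outbound IP and sanity-check that it
# is a public address (a private address would mean you are seeing an
# internal IP, not what the SFTP provider would see).

IPV4 = re.compile(r"^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$")

def is_public_ipv4(addr: str) -> bool:
    m = IPV4.match(addr)
    if not m:
        return False
    octets = [int(g) for g in m.groups()]
    if any(o > 255 for o in octets):
        return False
    # RFC 1918 private ranges.
    if octets[0] == 10 or (octets[0] == 172 and 16 <= octets[1] <= 31) \
       or (octets[0] == 192 and octets[1] == 168):
        return False
    return True

# On the cluster (echo service is an assumption):
#   import requests
#   ip = requests.get("https://api.ipify.org", timeout=10).text
#   print(ip, is_public_ipv4(ip))
# This IP changes between cluster launches; it is only useful for verifying
# that your NAT gateway setup is actually in effect.
```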
fjrodriguez
by New Contributor III
  • 641 Views
  • 2 replies
  • 0 kudos

Job Preview in ADF

I have one Spark job that is triggered via ADF as a regular "Python" activity. Now I want to move to the Job activity, which is in Preview. Normally, at the linked-service level I have the Spark config and environment needed for the execution of this scri...

Latest Reply
radothede
Valued Contributor II
  • 0 kudos

Hi @fjrodriguez, my understanding is you've already created a cluster for your job. If that's the case, you can put that Spark configuration and those env variables directly on the cluster your job is using. If for some reason that's not possible, then you c...

1 More Replies
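radothede's suggestion of carrying the Spark conf and env vars on the job cluster itself might look like the sketch below; runtime version, node type, and all conf/env values are placeholders to adapt:

```python
# Sketch of a job-cluster spec carrying the Spark conf and env vars that used
# to live on the ADF linked service. All names and values are placeholders.

job_cluster_spec = {
    "spark_version": "15.4.x-scala2.12",     # assumption: pick your LTS runtime
    "node_type_id": "Standard_DS3_v2",
    "num_workers": 2,
    "spark_conf": {
        "spark.sql.shuffle.partitions": "64",
    },
    "spark_env_vars": {
        "ENVIRONMENT": "prod",               # hypothetical variable your script reads
    },
}

# This dict drops into the "new_cluster" field of a Jobs API job definition
# (or a job_clusters entry in a Databricks asset bundle), so the ADF "Job"
# activity only has to trigger the job by ID.
```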
jdlogos
by New Contributor III
  • 4847 Views
  • 5 replies
  • 2 kudos

apply_changes_from_snapshot with expectations

Hi, Question: are expectations supposed to function in conjunction with create_streaming_table() and apply_changes_from_snapshot? Our team is investigating Delta Live Tables and we have a working prototype using Autoloader to ingest some files from a m...

Latest Reply
jbrmn
New Contributor II
  • 2 kudos

Also facing the same issue - did you find a solution? Thinking I will have to apply expectations at the next stage of the pipeline until this is worked out.

4 More Replies
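jbrmn's workaround (applying expectations at the next stage of the pipeline) can be sketched like this. Table names and rules are illustrative, and the DLT calls are shown as comments because they only run inside a pipeline:

```python
# Sketch of the "next stage" workaround: leave apply_changes_from_snapshot
# unchecked, then enforce expectations on a downstream table. Rule names and
# their SQL conditions are examples.

RULES = {
    "valid_key": "id IS NOT NULL",
    "recent": "updated_at >= '2020-01-01'",
}

# In the pipeline notebook (Lakeflow/DLT Python API):
#
#   import dlt
#
#   dlt.create_streaming_table("customers_raw")
#   dlt.apply_changes_from_snapshot(
#       target="customers_raw", source="customers_snapshots", keys=["id"], ...)
#
#   @dlt.table(name="customers_clean")
#   @dlt.expect_all_or_drop(RULES)   # expectations applied one stage later
#   def customers_clean():
#       return dlt.read_stream("customers_raw")
```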
dnz
by New Contributor
  • 1340 Views
  • 1 reply
  • 0 kudos

Performance Issue with OPTIMIZE Command for Historical Data Migration Using Liquid Clustering

Hello Databricks Community, I’m experiencing performance issues with the OPTIMIZE command when migrating historical data into a table with liquid clustering. Specifically, I am processing one year’s worth of data at a time. For example: the OPTIMIZE co...

Latest Reply
HimanshuSingh
New Contributor II
  • 0 kudos

Did you get any solution? If yes, please post it.

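Since no solution was posted, one hedged mitigation: on a liquid-clustered table, OPTIMIZE is incremental, so running it after each yearly slice keeps every clustering pass small instead of paying for one huge pass at the end. A sketch with a placeholder staging table:

```python
# Sketch: interleave OPTIMIZE with the yearly backfill batches. Table and
# staging names are placeholders; adapt the INSERT to your ingestion pattern.

def backfill_statements(table: str, years: range):
    stmts = []
    for y in years:
        # hypothetical ingestion step for one year of history
        stmts.append(
            f"INSERT INTO {table} SELECT * FROM staging_history WHERE year = {y}")
        stmts.append(f"OPTIMIZE {table}")   # incremental: clusters only new data
    stmts.append(f"OPTIMIZE {table} FULL")  # optional one-time full recluster
    return stmts

# for s in backfill_statements("main.sales.orders", range(2015, 2025)):
#     spark.sql(s)
```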
yuinagam
by New Contributor II
  • 852 Views
  • 2 replies
  • 0 kudos

how can I verify that the result of a dlt will have enough rows before updating the table?

I have a dlt/lakeflow pipeline that creates a table, and I need to make sure that it will only update the resulting materialized view if it will have more than one million records. I've found this, but it seems to only work if I have already updated t...

Latest Reply
yuinagam
New Contributor II
  • 0 kudos

Thank you for the quick reply. Is there a common/recommended/possible way to work around this limitation? I don't mind not using the expectation API if it doesn't support logic that's based on aggregations.

1 More Replies
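Since expectations are row-level and cannot express an aggregate condition, one possible workaround is to count the source first and deliberately fail the update when it is too small; a failed update leaves the existing materialized view contents in place. A sketch, with the DLT decorator shown as comments and the threshold/table names as examples:

```python
# Sketch of an aggregate "expectation": enforce the minimum-row condition
# yourself and raise to abort the update.

MIN_ROWS = 1_000_000

def require_min_rows(n: int, minimum: int = MIN_ROWS) -> int:
    if n < minimum:
        raise ValueError(f"refusing to update: {n} rows < required {minimum}")
    return n

# In the pipeline (Lakeflow/DLT Python API):
#
#   @dlt.table(name="big_mv")
#   def big_mv():
#       df = spark.read.table("source_tbl")   # hypothetical source
#       require_min_rows(df.count())          # raises -> update fails,
#       return df                             # MV keeps its previous data
```

Note the extra count does add a pass over the source, which may matter for large inputs.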
shan-databricks
by Databricks Partner
  • 3646 Views
  • 9 replies
  • 4 kudos

Resolved! Databricks Autoloader BadRecords path Issue

I have one file with 100 rows: two rows are bad data and the remaining 98 rows are good data. But when I use badRecordsPath, it moves the entire file to the bad records path, which includes the good data as well, and it should move ...

Latest Reply
ShaileshBobay
Databricks Employee
  • 4 kudos

Why entire files go to badRecordsPath: when you enable badRecordsPath in Autoloader or in Spark's file readers (with formats like CSV/JSON), here's what happens: Spark expects each data file to be internally well-formed with respect to the declared s...

8 More Replies
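To actually keep the 98 good rows, a row-granular alternative to badRecordsPath is PERMISSIVE mode with a corrupt-record column, then splitting good from bad rows yourself. A sketch; the column name follows Spark's default `_corrupt_record`, and the reader is shown as comments since it needs a cluster:

```python
# Sketch: split at row granularity instead of file granularity. The splitter
# itself is plain Python over row dicts so it is easy to test.

def split_rows(rows, corrupt_col="_corrupt_record"):
    good, bad = [], []
    for r in rows:
        (bad if r.get(corrupt_col) else good).append(r)
    return good, bad

# With Auto Loader / the CSV reader:
#
#   df = (spark.readStream.format("cloudFiles")
#         .option("cloudFiles.format", "csv")
#         .option("mode", "PERMISSIVE")
#         .option("columnNameOfCorruptRecord", "_corrupt_record")
#         .schema(declared_schema)              # must include _corrupt_record
#         .load(src_path))
#   good_df = df.filter("_corrupt_record IS NULL")
#   bad_df  = df.filter("_corrupt_record IS NOT NULL")  # write these aside
```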
yit
by Databricks Partner
  • 5265 Views
  • 8 replies
  • 4 kudos

Resolved! Schema evolution for JSON files with AutoLoader

I am using Auto Loader to ingest JSON files into a managed table. Auto Loader saves only the first-level fields as new columns, while nested structs are stored as values within those columns. My goal is to support schema evolution when loading new fi...

Latest Reply
BS_THE_ANALYST
Databricks Partner
  • 4 kudos

@yit awesome. Glad that you got this solved. I look forward to the next problem. All the best, BS

7 More Replies
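For reference, a sketch of the Auto Loader options typically involved in this fix: inferColumnTypes makes nested JSON objects come through as structs rather than strings, and addNewColumns lets new fields evolve the schema. Paths are placeholders:

```python
# Sketch of the Auto Loader options usually involved in nested-JSON schema
# evolution. Both paths below are placeholders.

autoloader_options = {
    "cloudFiles.format": "json",
    "cloudFiles.inferColumnTypes": "true",
    "cloudFiles.schemaLocation": "/Volumes/cat/sch/vol/_schemas/events",
    "cloudFiles.schemaEvolutionMode": "addNewColumns",
}

# df = (spark.readStream.format("cloudFiles")
#       .options(**autoloader_options)
#       .load("/Volumes/cat/sch/vol/raw/events"))
# The stream restarts once per schema change; the managed table picks up the
# new nested columns on the next run.
```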
ZD
by New Contributor III
  • 2404 Views
  • 5 replies
  • 0 kudos

How to replace ${param} by :param

Hello, we previously used ${param} in our SQL queries: SELECT * FROM json.`${source_path}/file.json`. However, this syntax is now deprecated. The recommended approach is to use :param instead. But when I attempt to replace ${param} with :param, I encounte...

Latest Reply
radothede
Valued Contributor II
  • 0 kudos

Hi @ZD, please try this syntax in your notebook for SQL:
%sql
DECLARE _my_path = 'some_path';
SELECT _my_path;

4 More Replies
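The screenshot likely shows the usual pitfall: :param markers are not substituted in every position where ${param} used to work (path literals inside json.`...` being a common one, where constructs like IDENTIFIER(:tbl) or query restructuring are needed instead). A mechanical rewriter plus the parameter-binding form of spark.sql, as a sketch:

```python
import re

# Sketch: rewrite the deprecated ${param} markers to named parameter markers.
# This is only the mechanical half; queries where the parameter sits inside a
# path or identifier literal also need restructuring, not just renaming.

def to_named_params(sql: str) -> str:
    return re.sub(r"\$\{(\w+)\}", r":\1", sql)

# spark.sql supports binding the values directly:
#   spark.sql("SELECT * FROM my_table WHERE id = :id", args={"id": 42})
```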
Johannes_E
by New Contributor III
  • 1366 Views
  • 2 replies
  • 1 kudos

Resolved! Job cluster has no permission to create folder in Unity Catalog Volume

Hello everybody, I want to run a job that collects some CSV files from an SFTP server and saves them on my Unity Catalog volume. While my personal cluster, defined as follows, has access to create folders on the volume, my job cluster doesn't. Defi...

Latest Reply
Johannes_E
New Contributor III
  • 1 kudos

Thank you, that helped, although I had to use "SINGLE_USER" instead of "DATA_SECURITY_MODE_DEDICATED". According to the docs (https://docs.databricks.com/api/workspace/clusters/create), "SINGLE_USER" is an alias for "DATA_SECURITY_MODE_DEDICATED".

1 More Replies
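For anyone landing here, a sketch of the relevant cluster-spec fragment: dedicated (single-user) access mode is what gives the job cluster a Unity Catalog identity that can be granted WRITE VOLUME on the target volume. Everything except data_security_mode and single_user_name is a placeholder:

```python
# Sketch of the "new_cluster" fragment of a job definition. Runtime, node
# type, and the user are placeholders.

new_cluster = {
    "spark_version": "15.4.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    "num_workers": 1,
    "data_security_mode": "SINGLE_USER",        # alias of DATA_SECURITY_MODE_DEDICATED
    "single_user_name": "someone@example.com",  # user or service principal running the job
}
```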
Aviral-Bhardwaj
by Esteemed Contributor III
  • 1892 Views
  • 4 replies
  • 2 kudos

Resolved! Not able to read data from volume and data is in JSON format

Not able to read data from a volume; the data is in JSON format.
data = spark.read.json("/Volumes/mydatabricksaviral/datatesting/datavolume/mytest.json")
display(data)
Py4JJavaError: An error occurred while...

Latest Reply
radothede
Valued Contributor II
  • 2 kudos

Hi @Aviral-Bhardwaj, please double check:
- if the volume path is correct
- if you have READ VOLUME permission on this volume
- if your cluster has access to Unity Catalog
- if the JSON file exists

3 More Replies
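The first and last items of radothede's checklist can be partly automated, since volumes are FUSE-mounted under /Volumes on UC-enabled clusters; a sketch:

```python
import os

# Sketch: a plain existence check catches path typos before spark.read.json
# even runs, and reports how deep the path is valid. The path in the comment
# is the one from the post.

def explain_missing(path: str) -> str:
    if os.path.exists(path):
        return "path exists; check READ VOLUME grant and cluster UC access"
    parent = os.path.dirname(path)
    while parent and parent != "/" and not os.path.exists(parent):
        parent = os.path.dirname(parent)
    return f"path missing; deepest existing ancestor: {parent or '/'}"

# print(explain_missing("/Volumes/mydatabricksaviral/datatesting/datavolume/mytest.json"))
```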
Pratikmsbsvm
by Contributor
  • 774 Views
  • 1 reply
  • 0 kudos

Resolved! Low Level Design for Moving Data from Databricks A to Databricks B

Hello Techie, could someone please help me with the low-level design points we should consider while moving data from one Delta Lake instance to another? For example: service principal creation, IP whitelisting, any GitLab/DevOps relate...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 0 kudos

Hi @Pratikmsbsvm Here's a brief low-level design checklist for Delta Lake to Delta Lake data migration:
1. Security & Authentication
- Create service principals for both environments
- Set up Azure Key Vault for credential management
- Configure IP white...

root92
by New Contributor
  • 1092 Views
  • 1 reply
  • 0 kudos

finishes execution in 6 seconds but the notebook still shows "waiting"

Issue: Although my SQL query completes execution in approximately 2-3 seconds, the notebook interface continues to show "waiting" for an extended period before displaying results. The only way to see the results of my cell execution is by refreshing the w...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 0 kudos

Hi @root92, this is a known Databricks interface issue, not related to query performance or account type. Most likely causes:
- WebSocket connection timeout between browser and Databricks
- Browser memory issues with long-running notebook sessions
- Network pro...

noorbasha534
by Valued Contributor II
  • 1080 Views
  • 5 replies
  • 0 kudos

in-house built predictive optimization

Hello all, has anyone attempted to look at the internals of predictive optimization and built an in-house solution mimicking its functionality? I understood that there are no plans from Databricks to roll out this feature for external tables, and hence,...

Data Engineering
Delta Lake
Liquid clustering
predictive optimization
spark
Latest Reply
noorbasha534
Valued Contributor II
  • 0 kudos

@LinlinH thanks for the details. Can you please share any GitHub link where the community work is published, so I can verify whether any code can be re-used...

4 More Replies
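For anyone sketching such an in-house job, the core of it is a heuristic over file statistics plus a driver loop over candidate tables. The thresholds below are illustrative and not what the managed feature uses:

```python
# Sketch: decide which tables are worth an OPTIMIZE pass based on file-level
# stats (e.g. from DESCRIBE DETAIL). Thresholds are illustrative defaults.

def needs_optimize(num_files: int, size_in_bytes: int,
                   target_file_mb: int = 128, min_files: int = 10) -> bool:
    if num_files < min_files:
        return False
    avg_mb = size_in_bytes / num_files / (1024 * 1024)
    return avg_mb < target_file_mb / 2   # many small files -> compaction pays off

# Driver loop (hypothetical table_list):
#   for t in table_list:
#       d = spark.sql(f"DESCRIBE DETAIL {t}").first()
#       if needs_optimize(d.numFiles, d.sizeInBytes):
#           spark.sql(f"OPTIMIZE {t}")
```

A real replacement would also need scheduling, VACUUM, and some cost accounting, which is where most of the effort in the managed feature sits.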
William_Scardua
by Valued Contributor
  • 2506 Views
  • 2 replies
  • 0 kudos

Collecting Job Usage Metrics Without Unity Catalog

Hi, I would like to request assistance on how to collect usage metrics and job execution data for my Databricks environment. We are currently not using Unity Catalog, but I would still like to monitor and analyze usage. Could you please provide guidance...

Latest Reply
alsetr
Databricks Partner
  • 0 kudos

Hi @William_Scardua , were you able to collect the job metrics?

1 More Replies
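One UC-free option is the Jobs API run history. A sketch in which the aggregation is plain Python over the runs payload, and the HTTP call (workspace host and token assumed) is left as comments:

```python
# Sketch: aggregate job-run history pulled from the Jobs API. Field names
# follow the runs payload (run_name, run_duration in ms, state.result_state);
# verify against your API version.

def summarize_runs(runs):
    """runs: list of Jobs API run dicts -> per-job count/duration/failures."""
    out = {}
    for r in runs:
        name = r.get("run_name", "unknown")
        s = out.setdefault(name, {"count": 0, "total_ms": 0, "failures": 0})
        s["count"] += 1
        s["total_ms"] += r.get("run_duration", 0)
        if r.get("state", {}).get("result_state") == "FAILED":
            s["failures"] += 1
    return out

# import requests
# resp = requests.get(f"{HOST}/api/2.1/jobs/runs/list",        # HOST/TOKEN assumed
#                     headers={"Authorization": f"Bearer {TOKEN}"},
#                     params={"completed_only": "true"})
# print(summarize_runs(resp.json().get("runs", [])))
```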