Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Ivaylo
by Databricks Partner
  • 1077 Views
  • 1 reply
  • 1 kudos

Resolved! read_files vs. cloud_file

I was wondering what the difference is between read_files and cloud_file. I can't find an explicit explanation or comparison in the Databricks documentation. Best regards, Ivaylo

Latest Reply
nayan_wylde
Esteemed Contributor II
  • 1 kudos

## Key Differences Between `read_files` and `cloud_files`

### **`read_files` Function**

`read_files` is a table-valued function that reads files under a provided location and returns the data in tabular form. It supports reading JSON, CSV, XML, TEXT, B...
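A minimal sketch of the two approaches, assuming a JSON landing path and schema location that are purely illustrative (not from this thread):

```python
# Batch: read_files is a table-valued function, invoked here through spark.sql().
batch_df = spark.sql(
    "SELECT * FROM read_files('/Volumes/main/raw/landing/', format => 'json')"
)

# Incremental: cloud_files / Auto Loader discovers new files as a stream and
# remembers which ones it has already processed.
stream_df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/Volumes/main/raw/_schemas/landing")
    .load("/Volumes/main/raw/landing/")
)
```

In short, read_files suits ad hoc or scheduled batch reads, while Auto Loader is the incremental, checkpointed option.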

WiliamRosa
by Databricks Partner
  • 1485 Views
  • 1 reply
  • 4 kudos

Resolved! Recommended approach for handling deletes in a Delta table

What is the recommended approach for handling deletes in a Delta table? I have a table in MySQL (no soft delete flag) that I read and write into Azure as a Delta table. My current flow is:
- If an ID exists in both MySQL and the Delta table → update th...

Latest Reply
nayan_wylde
Esteemed Contributor II
  • 4 kudos

The recommended way of handling CDC in Databricks is by using the MERGE command: https://docs.databricks.com/aws/en/sql/language-manual/delta-merge-into
If you're using SQL:
-- Delete all target rows that have a match in the source table.
> MERGE INTO targe...
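A sketch of that pattern, with target_table, source_updates, and the id key as placeholder names; the WHEN NOT MATCHED BY SOURCE clause (available on recent Databricks Runtimes) is what removes rows that no longer exist in MySQL:

```python
# Placeholder names throughout; adapt to your tables and join key.
spark.sql("""
    MERGE INTO target_table AS t
    USING source_updates AS s
      ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
    WHEN NOT MATCHED BY SOURCE THEN DELETE  -- rows deleted upstream disappear here
""")
```

On older runtimes without that clause, a separate DELETE against IDs missing from the source achieves the same result.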

AmarK
by Databricks Employee
  • 17364 Views
  • 5 replies
  • 0 kudos

Is there a way to programmatically retrieve a workspace name?

Is there a spark command in databricks that will tell me what databricks workspace I am using? I’d like to parameterise my code so that I can update delta lake file paths automatically depending on the workspace (i.e. it picks up the dev workspace na...

Latest Reply
WiliamRosa
Databricks Partner
  • 0 kudos

To programmatically retrieve the Databricks workspace name from within a notebook, you can use Spark configuration or the notebook context. One method is to read the workspace URL using spark.conf.get("spark.databricks.workspaceUrl") and then extract...
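A small sketch of that approach; the way the host string maps onto an environment name below is an assumption, not something fixed by Databricks:

```python
# e.g. "adb-1234567890123456.7.azuredatabricks.net"
workspace_url = spark.conf.get("spark.databricks.workspaceUrl")

# Assumed convention: derive an environment label from the URL and switch paths on it.
env = "dev" if "dev" in workspace_url else "prod"
base_path = f"/mnt/{env}/delta"
```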

4 More Replies
Maxi1693
by New Contributor II
  • 4965 Views
  • 6 replies
  • 1 kudos

Monitoring Structured Streaming in an external sink

Hi! Today I am working on collecting some metrics to create a plot for my Spark Structured Streaming job. It is configured with a trigger(processingTime="30 seconds") and I am trying to collect data with the following Listener class (just an example).  # D...

Latest Reply
WiliamRosa
Databricks Partner
  • 1 kudos

Hi everyone, I recently worked on a similar requirement and would like to share a structured approach to monitoring Structured Streaming when writing to external sinks.
1. Use a Unique Query Name
Always assign a clear and meaningful name to each streami...
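A compact sketch combining the unique query name with a listener, assuming the PySpark StreamingQueryListener API (Spark 3.4+ / recent Databricks Runtime); `df` is an existing streaming DataFrame and all paths and names are placeholders:

```python
from pyspark.sql.streaming import StreamingQueryListener

class MetricsListener(StreamingQueryListener):
    def onQueryStarted(self, event):
        print(f"started: {event.name} ({event.id})")

    def onQueryProgress(self, event):
        p = event.progress
        # numInputRows and batchId are the basic throughput/progress signals.
        print(f"{p.name}: batchId={p.batchId}, rows={p.numInputRows}")

    def onQueryTerminated(self, event):
        print(f"terminated: {event.id}, exception={event.exception}")

spark.streams.addListener(MetricsListener())

query = (
    df.writeStream.queryName("orders_to_external_sink")   # step 1: unique query name
      .trigger(processingTime="30 seconds")
      .format("delta")
      .option("checkpointLocation", "/Volumes/main/ops/_checkpoints/orders")
      .start("/Volumes/main/ops/orders_sink")
)
```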

5 More Replies
dholea
by New Contributor II
  • 1290 Views
  • 3 replies
  • 1 kudos

Help required for executing a geospatial query

We have a requirement to find a specific distance based on longitude and latitude. Can you please help me with the detailed steps for how we can achieve this using PySpark? Thank you.

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @dholea, Databricks Runtime 17.1 Beta added native support for Spatial SQL, so, for example, it lets you calculate the distance between coordinates. I think you can try the ST_Distance() function.
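If you are not on that Beta runtime yet, a plain PySpark haversine calculation is an alternative; a sketch where the lat1/lon1/lat2/lon2 column names are assumptions:

```python
from pyspark.sql import functions as F

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance in kilometres; 6371.0 km is the mean Earth radius.
    dlat = F.radians(lat2 - lat1)
    dlon = F.radians(lon2 - lon1)
    a = (
        F.sin(dlat / 2) ** 2
        + F.cos(F.radians(lat1)) * F.cos(F.radians(lat2)) * F.sin(dlon / 2) ** 2
    )
    return 6371.0 * 2 * F.asin(F.sqrt(a))

df = df.withColumn(
    "distance_km",
    haversine_km(F.col("lat1"), F.col("lon1"), F.col("lat2"), F.col("lon2")),
)
```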

2 More Replies
sebih
by New Contributor II
  • 668 Views
  • 2 replies
  • 0 kudos

Cannot use join with Enzyme

I suppose I can use incrementalization on pipelines. Supported operators are listed here: https://docs.databricks.com/aws/en/optimizations/incremental-refresh#support-for-materialized-view-incremental-refresh However, when I run the pipeline, it do...

Latest Reply
sebih
New Contributor II
  • 0 kudos

Thank you for your reply. Even though we only do one join, we keep getting this error.

1 More Replies
turagittech
by Contributor
  • 946 Views
  • 2 replies
  • 1 kudos

Resolved! Managing values that change between development and production

Hi all, when moving from development to testing and production, one often needs to handle changing values like the blob store or database server being different. I have seen that using widgets can be a useful way to have updateable values for Notebooks and ...
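One common pattern, sketched under the assumption that the values arrive as job or task parameters and that the widget names below are illustrative:

```python
# Widgets with dev-friendly defaults; job parameters of the same name override them.
dbutils.widgets.text("storage_account", "devstorageacct")
dbutils.widgets.text("database_server", "dev-sql-server.database.windows.net")

storage_account = dbutils.widgets.get("storage_account")
database_server = dbutils.widgets.get("database_server")

base_path = f"abfss://landing@{storage_account}.dfs.core.windows.net/"
```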

Latest Reply
turagittech
Contributor
  • 1 kudos

Great, thanks. Speed in this case isn't critical as it's not processing massive amounts of data, well I hope not massive amounts at this time. It'll be some batch processes that can't use dlt.

1 More Replies
Odoo_ERP
by New Contributor II
  • 4055 Views
  • 2 replies
  • 1 kudos

Odoo ERP customization Odoo is one of the most popular ERP software packages. It is widely used by companies. Odoo customization mainly includes changing the sy...

Odoo ERP customization. Odoo is one of the most popular ERP software packages. It is widely used by companies. Odoo customization mainly includes changing the system by including new features and functionalities in accordance with the business needs of the clien...

Latest Reply
danieljogi
New Contributor II
  • 1 kudos

Odoo ERP customization is the process of customizing modules, CRM, website, POS, reports, and more to meet specific business requirements.

1 More Replies
Datalight
by Contributor
  • 3849 Views
  • 4 replies
  • 2 kudos

Resolved! Data Transfer using Unity Catalog full implementation

I have to share data between Azure A and Azure B using Unity Catalog and Delta Sharing. Every time data comes to Azure A, the same data can be read by Azure B. How do I handle the incremental load? For changed records I think I need to use a Merge Statement....

Latest Reply
turagittech
Contributor
  • 2 kudos

This works well when set up. If you're securely set up in Azure, you will need to grant a Private Link to the underlying storage for their service to read data. For enhanced security I'd recommend the catalog for the other party then be in external st...

3 More Replies
vishesh_berera
by New Contributor III
  • 586 Views
  • 1 reply
  • 0 kudos

How can we Implement Conditional Logic on SQL Query Output in Job Workflow

I'm trying to create a job where I define a get data task that executes a SQL query. After that, I want to apply conditional logic using an if-else task based on the query output. Specifically, I want to check each row individually—if a condition is ...
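One possible way to wire this up (not confirmed in this thread): have the first task reduce the query output to a value, publish it with task values, and point the If/else task's condition at that value; all names below are placeholders:

```python
# In the "get_data" notebook task: evaluate the condition and publish the result.
failed_count = spark.sql(
    "SELECT COUNT(*) AS cnt FROM my_catalog.my_schema.my_table WHERE status = 'FAILED'"
).collect()[0]["cnt"]

dbutils.jobs.taskValues.set(key="has_failures", value=failed_count > 0)

# In the If/else task, the condition can then reference:
#   {{tasks.get_data.values.has_failures}} == true
```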

Latest Reply
BR_DatabricksAI
Databricks Partner
  • 0 kudos

Hello, I believe the fixed parameter option exists and was introduced recently in Lakeflow Declarative Pipelines; you need to navigate to the configuration section and add parameters.

Ramu1821
by New Contributor II
  • 3185 Views
  • 2 replies
  • 0 kudos

Merge using DLT

I have a requirement where I need only 24 hours of data from my Delta table. Let's call this the latest table. This latest table should be in sync with the source, so it should handle all updates and inserts along with deletes (if something gets deleted at source, ...

Latest Reply
Ramu1821
New Contributor II
  • 0 kudos

from pyspark.sql.functions import col, lit, expr, when, to_timestamp, current_timestamp
from pyspark.sql.functions import max as max_
import dlt
from pyspark.sql.types import StructType, StructField, StringType
from pyspark.sql.utils import AnalysisExcep...
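For the keep-in-sync-with-deletes requirement itself, the usual DLT mechanism is apply_changes with apply_as_deletes; a sketch that assumes the upstream feed exposes an operation column and an event_ts ordering column (both assumptions):

```python
import dlt
from pyspark.sql.functions import expr

dlt.create_streaming_table("latest_table")

dlt.apply_changes(
    target="latest_table",
    source="source_feed",                          # assumed upstream streaming table/view
    keys=["id"],
    sequence_by="event_ts",                        # assumed ordering column
    apply_as_deletes=expr("operation = 'DELETE'"), # propagate deletes from the source
    stored_as_scd_type=1,
)
```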

1 More Replies
boitumelodikoko
by Databricks Partner
  • 14294 Views
  • 7 replies
  • 4 kudos

Resolved! Databricks Autoloader Checkpoint

Hello Databricks Community, I'm encountering an issue with the Databricks Autoloader where, after running successfully for a period of time, it suddenly stops detecting new files in the source directory. This issue only gets resolved when I reset the ...

Latest Reply
boitumelodikoko
Databricks Partner
  • 4 kudos

I have found that reducing the number of objects in the landing path (via an archive/cleanup process) is the most reliable fix. Auto Loader's file discovery can bog down in big/"long-lived" landing folders—especially in directory-listing mode—so clea...
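If archiving alone does not keep up, switching Auto Loader to file notification mode avoids the repeated directory listings; a sketch where the path and format are placeholders and the option assumes the cluster has the required cloud permissions:

```python
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("cloudFiles.useNotifications", "true")  # event-driven file discovery
    .option("cloudFiles.schemaLocation", "/Volumes/main/raw/_schemas/landing")
    .load("abfss://landing@mystorage.dfs.core.windows.net/incoming/")
)
```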

6 More Replies
boskicl
by New Contributor III
  • 41453 Views
  • 8 replies
  • 12 kudos

Resolved! Table write command stuck "Filtering files for query."

Hello all. Background: I am having an issue today with Databricks using pyspark-sql and writing a Delta table. The dataframe is made by doing an inner join between two tables and that is the table which I am trying to write to a delta table. The table ...

Latest Reply
nvashisth
New Contributor III
  • 12 kudos

@timo199, @boskicl I had a similar issue and the job was getting stuck at "Filtering files for query" indefinitely. I checked the Spark logs and based on that figured out that we had enabled Photon acceleration on our cluster for the job, and the datatype of our columns...

7 More Replies
sowanth
by New Contributor II
  • 929 Views
  • 3 replies
  • 0 kudos

Spark Memory Configuration – Request for Clarification

Hi Team, I have noticed the following Spark configuration being applied, though it's not defined in our repo or anywhere in the policies:
spark.memory.offHeap.enabled = true
spark.memory.offHeap.size = around 3/4 of the node instance memory (i.e. 1-...
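To confirm what is actually in effect on the cluster, the values can be read back at runtime; a small sketch using the conf names from the post (assumes sparkContext access is available on the cluster type in use):

```python
conf = spark.sparkContext.getConf()
print(conf.get("spark.memory.offHeap.enabled", "not set"))
print(conf.get("spark.memory.offHeap.size", "not set"))
```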

Latest Reply
sowanth
New Contributor II
  • 0 kudos

Now I understand how it's automatically configured in our cluster, along with the rationale behind this off-heap memory approach. However, I have some concerns about this configuration:
General applicability: Most jobs don't actually require 70% off-hea...

2 More Replies