Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

murtadha_s
by Databricks Partner
  • 5 Views
  • 0 replies
  • 0 kudos

Default ACL for Jobs and Clusters

Hi, I want to set a default ACL that applies to all created jobs and clusters (according to a cluster policy, for example), but currently I need to apply my ACL to every created job/cluster separately. Is there a way to do that? BR,

sdurai
by Visitor
  • 56 Views
  • 2 replies
  • 0 kudos

Databricks to Salesforce Core (Not cloud)

Hi, Is there any native connector available to connect to Salesforce Core (not cloud) in Databricks? If there is no native connector, what are the recommended approaches to connect to Salesforce Core? Thanks, Subashini

Latest Reply
Ashwin_DSA
Databricks Employee
  • 0 kudos

Hi @sdurai, Yes. Databricks has a native Salesforce connector for core Salesforce (Sales Cloud / Service Cloud / Platform objects) via Lakeflow Connect - Salesforce ingestion connector. It lets you create fully managed, incremental pipelines from Sal...

1 More Replies
IM_01
by Contributor II
  • 1007 Views
  • 19 replies
  • 3 kudos

Resolved! Lakeflow SDP failed with DELTA_STREAMING_INCOMPATIBLE_SCHEMA_CHANGE_USE_LOG

Hi, A column was deleted on the source table. When I ran LSDP it failed with the error DELTA_STREAMING_INCOMPATIBLE_SCHEMA_CHANGE_USE_LOG: Streaming read is not supported on tables with read-incompatible schema changes (e.g. rename or drop or datatype ch...

Latest Reply
gullsher98743
  • 3 kudos

This looks like a very practical template, especially for teams trying to structure their Data & AI strategy without overcomplicating things. The step-by-step format and examples should be really helpful for workshops and collaborative sessions. Curi...

18 More Replies
mits1
by New Contributor III
  • 99 Views
  • 7 replies
  • 0 kudos

Autoloader inserts null rows in delta table while reading json file

Hi, I am exploring schema inference and schema evolution using Auto Loader. I am reading a single-line JSON file and writing to a Delta table which does not exist already (creating it on the fly), using PySpark (below is the code). Code: spark.readStream...

Latest Reply
saurabh18cs
Honored Contributor III
  • 0 kudos

Hi @mits1, can you try adding this option as well: {"multiLine": "true"}
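To make the suggestion concrete, here is a minimal sketch of the relevant Auto Loader options. The paths are hypothetical, and the actual stream call needs a Databricks runtime, so it is shown only as a comment:

```python
# Auto Loader options for reading multi-line JSON; all paths are placeholders.
autoloader_options = {
    "cloudFiles.format": "json",
    "cloudFiles.schemaLocation": "/tmp/_schema",  # hypothetical schema-tracking path
    "multiLine": "true",  # parse a JSON record that spans several lines as one row
}

# On a Databricks cluster this would be applied roughly as:
# (spark.readStream.format("cloudFiles")
#      .options(**autoloader_options)
#      .load("/tmp/source")
#      .writeStream.option("checkpointLocation", "/tmp/_chk")
#      .table("bronze.events"))
```

Without `multiLine`, each physical line is parsed as its own JSON record, which can surface as null rows when a single record is pretty-printed across lines.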

6 More Replies
databrciks
by New Contributor II
  • 45 Views
  • 1 reply
  • 0 kudos

Parametrize the DLT pipeline for dynamic loading of many tables

I need to load many tables into the Bronze layer, connecting to a SQL Server DB. How can I pass the table names dynamically in DLT, so that one piece of code loads many tables into the Bronze layer?

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @databrciks, You can use pipeline parameters. Below you'll find some examples: Use parameters with pipelines - Azure Databricks | Microsoft Learn
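As a hedged sketch of that pattern: the parameter name `source_tables` and the `bronze_` naming below are made up for illustration; in a real pipeline the raw value would come from the pipeline configuration via `spark.conf.get`.

```python
def tables_from_param(param_value: str) -> list:
    """Split a comma-separated pipeline parameter into clean table names."""
    return [t.strip() for t in param_value.split(",") if t.strip()]

# In a real pipeline the raw value would come from the pipeline configuration:
#   raw = spark.conf.get("source_tables")
raw = "customers, orders, items"  # stand-in value for this sketch

bronze_tables = [f"bronze_{name}" for name in tables_from_param(raw)]
# Each name would then drive one table definition, e.g. a factory function
# that registers a @dlt.table(name=...) per entry in the loop.
```

Defining the tables inside a loop over the parsed list is what lets one piece of code fan out to many Bronze tables.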

stemill
by New Contributor II
  • 433 Views
  • 7 replies
  • 0 kudos

update on iceberg table creating duplicate records

We are using Databricks to connect to a Glue catalog which contains Iceberg tables. We are using DBR 17.2 and adding the jars org.apache.iceberg:iceberg-spark-runtime-4.0_2.13:1.10.0 and org.apache.iceberg:iceberg-aws-bundle:1.10.0; the Spark config is then...

Latest Reply
aleksandra_ch
Databricks Employee
  • 0 kudos

Hi @stemill, The way of connecting to Iceberg tables managed by a Glue catalog that you described is not officially supported, because spark_catalog is not a generic catalog slot – it's a special, tightly-wired session catalog with a lot of assumptio...
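A common open-source Iceberg alternative to overriding `spark_catalog` is to register the Glue catalog under its own name. The catalog name `glue` and the warehouse path below are placeholders for illustration; the configuration keys themselves are standard Iceberg Spark settings:

```python
# Spark conf for a dedicated Iceberg catalog backed by AWS Glue.
# The catalog name "glue" and the S3 warehouse path are placeholders.
iceberg_conf = {
    "spark.sql.catalog.glue": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.glue.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
    "spark.sql.catalog.glue.warehouse": "s3://my-bucket/warehouse",
    "spark.sql.catalog.glue.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
}

# Tables are then addressed with the explicit catalog prefix,
# e.g. SELECT * FROM glue.db.events
# rather than going through the session catalog.
```

Keeping the session catalog untouched avoids the duplicate-record behavior that can appear when the special-cased `spark_catalog` slot is rewired in unsupported ways.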

6 More Replies
BF7
by Contributor
  • 1619 Views
  • 3 replies
  • 2 kudos

Resolved! Using cloudFiles.inferColumnTypes with inferSchema and without defining schema checkpoint

Two issues: 1. What is the behavior of cloudFiles.inferColumnTypes with and without cloudFiles.inferSchema? Why would you use both? 2. When can cloudFiles.inferColumnTypes be used without a schema checkpoint? How does that affect the behavior of cloud...

Latest Reply
Louis_Frolio
Databricks Employee
  • 2 kudos

Behavior of cloudFiles.inferColumnTypes with and without cloudFiles.inferSchema: When cloudFiles.inferColumnTypes is enabled, Auto Loader attempts to identify the appropriate data types for columns instead of defaulting everything to strings, which i...
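A small sketch of the option combination being discussed (the schema path is hypothetical): without `inferColumnTypes`, inferred JSON columns default to strings; enabling it asks Auto Loader to infer real types, and the inferred schema is tracked under `cloudFiles.schemaLocation`.

```python
# Default behavior: the schema is inferred, but every column lands as a string.
options_strings = {
    "cloudFiles.format": "json",
    "cloudFiles.schemaLocation": "/tmp/_schema",  # hypothetical schema-tracking path
}

# With inferColumnTypes, Auto Loader samples the data and infers real types
# (ints, doubles, timestamps, ...) instead of defaulting everything to string.
options_typed = dict(options_strings, **{"cloudFiles.inferColumnTypes": "true"})
```

The schema location is what lets Auto Loader remember and evolve the inferred schema across restarts, which is why dropping it changes the behavior of inference.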

2 More Replies
beaglerot
by Databricks Partner
  • 130 Views
  • 4 replies
  • 5 kudos

Python Data Source API — worth using?

Hi all, I've been looking into the Python Data Source API and wanted to get some feedback from others who may be experimenting with it. One of the more common challenges I run into is working with applications that expose APIs but don't have out-of-the...

Latest Reply
Louis_Frolio
Databricks Employee
  • 5 kudos

Adding on to @edonaire's points, which are accurate. @beaglerot, your contacts project is the right use case for the pattern you have. Small data, infrequent changes, direct read into bronze. That works. The real question you're asking is what happens when t...

3 More Replies
Manjusha
by New Contributor II
  • 100 Views
  • 3 replies
  • 1 kudos

Running python functions (written using polars) on databricks

Hi, We are planning to rewrite our application (which was originally running in R) in Python. We chose to use Polars as it seems to be faster than pandas. We have functions written in R which we are planning to convert to Python. However, in one of ...

Latest Reply
Manjusha
New Contributor II
  • 1 kudos

Thank you @Louis_Frolio and @pradeep_singh for the detailed explanation. I will discuss your inputs with the team and get back in case we have more questions.

2 More Replies
maikel
by Contributor II
  • 86 Views
  • 2 replies
  • 2 kudos

Running Spark Tests

Hello Community! I'm writing to you with a question about the best way to run Spark unit tests in Databricks. Currently we have a set of notebooks which are responsible for performing the operations on the data (joins, merging, etc.). Of course, to do ...

Latest Reply
Louis_Frolio
Databricks Employee
  • 2 kudos

Great suggestions @lingareddy_Alva regarding Databricks Connect v2! @maikel, A few things to layer on top of that. First, the fact that you already have your functions in a separate directory outside of notebooks is exactly the right foundation. T...
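To illustrate the "functions outside notebooks" test shape: a sketch of an importable transformation plus a plain pytest-style test. The function, file names, and data are invented for this example, and plain Python stands in for the DataFrame logic so the shape stays runnable anywhere; with Databricks Connect the same test could target real DataFrames.

```python
# transforms.py - keep logic importable, outside notebooks, so tests can target it.
def dedupe_latest(rows, key, ts):
    """Keep the most recent row per key; a plain-Python stand-in for a
    Spark dropDuplicates/window transformation, used to show the test shape."""
    latest = {}
    for row in rows:
        k = row[key]
        if k not in latest or row[ts] > latest[k][ts]:
            latest[k] = row
    return sorted(latest.values(), key=lambda r: r[key])

# test_transforms.py - runnable with plain pytest, locally or in CI.
def test_dedupe_latest():
    rows = [
        {"id": 1, "ts": "2024-01-01"},
        {"id": 1, "ts": "2024-02-01"},
        {"id": 2, "ts": "2024-01-15"},
    ]
    out = dedupe_latest(rows, key="id", ts="ts")
    assert out == [
        {"id": 1, "ts": "2024-02-01"},
        {"id": 2, "ts": "2024-01-15"},
    ]
```

Because the logic lives in a module rather than a notebook cell, the same function can be exercised by pytest in CI and imported by the notebook in production.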

1 More Replies
Malthe
by Valued Contributor II
  • 216 Views
  • 1 reply
  • 0 kudos

Observable API and Delta Table merge

Using the Observable API on the source dataframe to a Delta Table merge seems to hang indefinitely. Steps to reproduce: 1. Create one or more pyspark.sql.Observation objects. 2. Use DataFrame.observe on the merge source. 3. Run the merge. Accessing Observation.get blo...

Latest Reply
AnthonyAnand
Databricks Partner
  • 0 kudos

Hi @Malthe, You have hit a very specific, known behavioral gap in how Apache Spark and Delta Lake interact. To answer your question directly: yes, the Observable API is effectively incompatible with Delta Table merges when used directly. Why It ...

Ashwin_DSA
by Databricks Employee
  • 119 Views
  • 1 reply
  • 2 kudos

Is Address Line 4 the place where data goes to die?

I’ve spent the last few years jumping between insurance, healthcare, and retail, and I’ve come to a very painful conclusion that we should never have let humans type their own addresses into a text box.  For a pet project, I’m currently looking at a ...

Latest Reply
pradeep_singh
Contributor
  • 2 kudos

I have never worked on this problem, but based on previous posts from other community users I have come to know that fuzzy logic can help find records that are most likely to be the same or similar. Here are some links where this has been discussed i g...
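As a toy illustration of the fuzzy-matching idea using only the Python standard library (the normalization rules and threshold here are arbitrary choices for the sketch, not a recommendation; dedicated libraries do much better on real address data):

```python
from difflib import SequenceMatcher

def address_similarity(a: str, b: str) -> float:
    """Crude similarity score in [0, 1] between two free-text addresses."""
    def norm(s: str) -> str:
        # Arbitrary normalization for this sketch: lowercase, drop commas,
        # collapse whitespace. Real pipelines normalize far more aggressively.
        return " ".join(s.lower().replace(",", " ").split())
    return SequenceMatcher(None, norm(a), norm(b)).ratio()

score = address_similarity("12 High St., London", "12 high street london")
# A high score flags the pair as likely duplicates for human or rule review.
```

Scoring every candidate pair like this and reviewing those above a threshold is the usual first pass before investing in proper address standardization.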

kevinzhang29
by New Contributor II
  • 81 Views
  • 1 reply
  • 1 kudos

Issue with create_auto_cdc_flow Not Updating Business Columns for DELETE Events

We're currently working with Databricks AUTO CDC in a data pipeline and have encountered an issue with create_auto_cdc_flow (AUTO CDC) when using SCD Type 2. We are using the following configuration: stored_as_scd_type = 2, apply_as_deletes = expr("op...

Latest Reply
pradeep_singh
Contributor
  • 1 kudos

Operation type DELETE means the record is supposed to disappear. If you were using SCD Type 1, the record would be removed from the silver table. When using SCD Type 2, AUTO CDC only updates the lifecycle metadata columns to make the record inactive;...
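A toy plain-Python model of that behavior, assuming the `__START_AT`/`__END_AT` lifecycle columns used by SCD Type 2 tables (the function and sample data are invented for illustration, not the engine's actual implementation):

```python
from datetime import datetime

def apply_scd2_delete(history, key, ts):
    """Toy model of SCD Type 2 delete handling: the active row is closed by
    stamping its end timestamp; business columns are deliberately untouched."""
    for row in history:
        if row["id"] == key and row["__END_AT"] is None:
            row["__END_AT"] = ts  # only lifecycle metadata changes
    return history

history = [
    {"id": 1, "email": "a@x.com",
     "__START_AT": datetime(2024, 1, 1), "__END_AT": None},
]
apply_scd2_delete(history, key=1, ts=datetime(2024, 6, 1))
# The row is now inactive (__END_AT is set) but email still reads "a@x.com",
# preserving the last known business state at the moment of deletion.
```

This is why DELETE events don't update business columns under SCD Type 2: the point of the history table is to freeze the record's final state, not to overwrite it.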
