Data Engineering

Forum Posts

by murtadha_s • Databricks Partner

2 hours ago

17 Views
1 reply
0 kudos

Default ACL for Jobs and Clusters

Hi, I want to set a default ACL that applies to all newly created jobs and clusters, for example through a cluster policy. Currently I have to apply my ACL to every created job and cluster separately. Is there a way to do that? BR,

Latest Reply

Ashwin_DSA
Databricks Employee

15m ago

0 kudos

Hi @murtadha_s, can you please clarify what you are after? The second part of your question sounded more like a statement: "but currently I need to apply my ACL at every created job/cluster separately," and that confused me a bit. To make sure we poi...

by sai_sakhamuri • Databricks Partner

49m ago

14 Views
1 reply
1 kudos

Resolved! Databricks optimization for query performance and pipeline run

I am currently working on optimizing several Spark pipelines and wanted to gather community insights on advanced performance tuning. Typically, my workflow for traditional SQL optimization involves a deep dive into the execution plan to identify bott...

Latest Reply

lingareddy_Alva
Esteemed Contributor

18m ago

1 kudos

Hi @sai_sakhamuri, you're clearly past the basics. Let me give you a practitioner-level breakdown of each layer you mentioned, plus a few things that often get overlooked.

Spark Catalyst Optimizer: Working With the Rules Engine

Catalyst operates in fou...

by databrciks • New Contributor III

yesterday

59 Views
3 replies
1 kudos

Resolved! Parametrize the DLT pipeline for dynamic loading of many tables

I need to load many tables from a SQL Server database into the Bronze layer. How can I pass the table names dynamically in DLT? That is, a single piece of code should accept many table names and load each table into the Bronze layer.

Latest Reply

databrciks
New Contributor III

28m ago

1 kudos

Hi Ashwin, thanks for the quick response. Yes, I want to pass all the tables through a config parameter or param file and load them into the Bronze layer. I will try this approach. Thanks.

2 More Replies
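
A minimal sketch of the config-driven approach discussed in this thread, under stated assumptions. It assumes the table list arrives as a comma-separated string, for example from a pipeline configuration value. The helper names `parse_table_list`, `bronze_name` and `build_specs`, the `bronze_` prefix, the `bronze.tables` key and the JDBC URL are all hypothetical and do not come from the thread. The plain-Python part only turns the config string into one spec per table. The commented loop at the end shows roughly how such specs might be used to register one table per name inside a pipeline.

```python
def parse_table_list(raw):
    """Split 'dbo.Customers, dbo.Orders' into clean, de-duplicated names."""
    seen, tables = set(), []
    for item in raw.split(","):
        name = item.strip()
        if name and name not in seen:
            seen.add(name)
            tables.append(name)
    return tables


def bronze_name(source_table):
    """Derive a target table name (hypothetical naming convention)."""
    return "bronze_" + source_table.replace(".", "_").lower()


def build_specs(raw, jdbc_url):
    """One spec per source table: where to read from and what to call the result."""
    return [
        {
            "source": table,
            "target": bronze_name(table),
            # "url" and "dbtable" are standard Spark JDBC reader options;
            # credentials are deliberately left out of this sketch.
            "options": {"url": jdbc_url, "dbtable": table},
        }
        for table in parse_table_list(raw)
    ]


# Hypothetical config value and connection string, for illustration only.
specs = build_specs(
    "dbo.Customers, dbo.Orders,,dbo.Customers",
    "jdbc:sqlserver://example-host:1433;databaseName=demo",
)
for spec in specs:
    print(spec["source"], "->", spec["target"])

# Inside a pipeline, a loop over the specs could register one table each.
# This part is an assumption and needs the Databricks pipeline runtime:
#
#   def register(spec):
#       @dlt.table(name=spec["target"])
#       def load():
#           return spark.read.format("jdbc").options(**spec["options"]).load()
#
#   for spec in build_specs(spark.conf.get("bronze.tables"), jdbc_url):
#       register(spec)
```

Wrapping the decorated function in `register` gives each table its own `spec`, which avoids the usual late-binding problem of defining functions directly inside a `for` loop.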

by mits1 • New Contributor III

yesterday

102 Views
8 replies
0 kudos

Autoloader inserts null rows in delta table while reading json file

Hi, I am exploring schema inference and schema evolution using Auto Loader. I am reading a single-line JSON file and writing to a Delta table that does not exist yet (it is created on the fly), using PySpark (code below). Code: spark.readStream...

Latest Reply

saurabh18cs
Honored Contributor III

8 hours ago

0 kudos

Hi @mits1, can you try adding this option as well: {"multiLine": "true"}

7 More Replies
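
As a plain-Python illustration of what the multiLine option is about (this is not Auto Loader code, and whether it explains the null rows in this particular thread is an assumption): a reader that expects one JSON record per line cannot parse a record that is spread over several lines, so every line ends up as an unparseable row. The helper names below are hypothetical.

```python
import json

record = {"id": 1, "name": "a"}
single_line = json.dumps(record)          # one record on one line
pretty = json.dumps(record, indent=2)     # the same record spread over 4 lines


def parse_line_by_line(text):
    """Mimic a line-delimited reader: each line must be a complete JSON record.
    Lines that cannot be parsed become None, similar to an all-null row."""
    rows = []
    for line in text.splitlines():
        try:
            rows.append(json.loads(line))
        except json.JSONDecodeError:
            rows.append(None)
    return rows


def parse_whole_document(text):
    """Mimic a multi-line reader: the whole text is one JSON document."""
    return [json.loads(text)]


print(parse_line_by_line(single_line))
print(parse_line_by_line(pretty))
print(parse_whole_document(pretty))
```

If the file really is a single line, this mismatch would not apply and the cause is likely elsewhere.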

by ittzzmalind • New Contributor

44m ago

12 Views
0 replies
0 kudos

Databricks Workspace - Unknown IP access

Azure Monitor logs show authentication requests to the Databricks workspace from an unknown IP.
-- When the IP is searched at the URL below, the result shows it belongs to AZURE CLOUD: <Region> (the same region as the workspace): https://azureipranges.azurewebsites.net/SearchFor
-- ...

by ittzzmalind • New Contributor

Friday

129 Views
2 replies
0 kudos

DLT Pipeline Error - key not found: all_info_dlt_cx_utils_cod resulting in a NoSuchElementException.

Our Databricks ETL pipeline update fails with an error tied to the @DP.expectorfail decorator. The error message is 'key not found: all_info_dlt_cx_utils_cod', resulting in a NoSuchElementException. Note: if we commen...

Latest Reply

ittzzmalind
New Contributor

an hour ago

0 kudos

@MoJaMa Thanks for the reply. The issue was in the code; the corrected code worked.

1 More Replies

by bi123 • Visitor

an hour ago

18 Views
0 replies
0 kudos

How to import python modules in a notebook?

I have a job with a notebook task that uses Python modules located in a different folder from the notebook itself. When I try to import the module in the notebook, it raises a module-not-found error. I solved the problem using sys.path. But I am curious if there...
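
For readers who land here, a minimal sketch of the sys.path workaround mentioned in the post. It is plain Python and runs anywhere. The folder, the module name `demo_shared_helpers` and the helper `add_to_sys_path` are hypothetical and are created only for the demonstration. In a real job you would point at the folder that holds your modules instead of a temporary one.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical layout: a folder of shared modules that is NOT the notebook's folder.
shared_dir = Path(tempfile.mkdtemp()) / "shared_utils"
shared_dir.mkdir()
(shared_dir / "demo_shared_helpers.py").write_text(
    "def greet(name):\n    return f'hello {name}'\n"
)


def add_to_sys_path(folder):
    """Put the folder at the front of sys.path, but only once."""
    folder = str(folder)
    if folder not in sys.path:
        sys.path.insert(0, folder)
    return folder


add_to_sys_path(shared_dir)
importlib.invalidate_caches()  # the module file was created after startup

helpers = importlib.import_module("demo_shared_helpers")
print(helpers.greet("databricks"))
```

Guarding the insert keeps sys.path from growing each time the cell is re-run.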

by Eliza_geo • Visitor

an hour ago

19 Views
0 replies
0 kudos

Databricks SQL Alerts and Jobs Integration: Legacy vs V2

Hi, I’m facing a challenge integrating Databricks SQL alerts with a Databricks Job. After reviewing the documentation, I arrived at the following understanding and would appreciate confirmation from the community or Databricks team: Legacy SQL alerts ca...

by sdurai • Visitor

yesterday

59 Views
2 replies
0 kudos

Databricks to Salesforce Core (Not cloud)

Hi, is there a native connector for connecting to Salesforce core (not cloud) from Databricks? If there is no native connector, what are the recommended approaches for connecting to Salesforce core? Thanks, Subashini

Latest Reply

Ashwin_DSA
Databricks Employee

2 hours ago

0 kudos

Hi @sdurai, Yes. Databricks has a native Salesforce connector for core Salesforce (Sales Cloud / Service Cloud / Platform objects) via Lakeflow Connect - Salesforce ingestion connector. It lets you create fully managed, incremental pipelines from Sal...

1 More Replies

by IM_01 • Contributor II

4 weeks ago

1010 Views
19 replies
3 kudos

Resolved! Lakeflow SDP failed with DELTA_STREAMING_INCOMPATIBLE_SCHEMA_CHANGE_USE_LOG

Hi, a column was deleted from the source table. When I ran LSDP, it failed with the error DELTA_STREAMING_INCOMPATIBLE_SCHEMA_CHANGE_USE_LOG: Streaming read is not supported on tables with read-incompatible schema changes (e.g. rename or drop or datatype ch...

Latest Reply

gullsher98743
Visitor

2 hours ago

3 kudos

This looks like a very practical template, especially for teams trying to structure their Data & AI strategy without overcomplicating things. The step-by-step format and examples should be really helpful for workshops and collaborative sessions. Curi...

18 More Replies

by stemill • New Contributor II

3 weeks ago

435 Views
7 replies
0 kudos

update on iceberg table creating duplicate records

We are using Databricks to connect to a Glue catalog that contains Iceberg tables. We are using DBR 17.2 and adding the jars org.apache.iceberg:iceberg-spark-runtime-4.0_2.13:1.10.0 and org.apache.iceberg:iceberg-aws-bundle:1.10.0. The Spark config is then...

Latest Reply

aleksandra_ch
Databricks Employee

a week ago

0 kudos

Hi @stemill, the way of connecting to Iceberg tables managed by Glue catalog that you described is not officially supported. Because spark_catalog is not a generic catalog slot – it’s a special, tightly‑wired session catalog with a lot of assumptio...

6 More Replies

by mordex • New Contributor III

yesterday

40 Views
0 replies
0 kudos

Databricks workflows for APIs with different frequencies (cluster keeps restarting)

Hey everyone, I’m stuck with a Databricks workflow design and could use some advice. Currently, we are calling 70+ APIs. Right now the workflow looks something l...

by BF7 • Contributor

04-28-2025 6:56:25 AM

1619 Views
3 replies
2 kudos

Resolved! Using cloudFiles.inferColumnTypes with inferSchema and without defining schema checkpoint

Two issues:
1. What is the behavior of cloudFiles.inferColumnTypes with and without cloudFiles.inferSchema? Why would you use both?
2. When can cloudFiles.inferColumnTypes be used without a schema checkpoint? How does that affect the behavior of cloud...

Latest Reply

Louis_Frolio
Databricks Employee

04-28-2025 10:57:34 AM

2 kudos

Behavior of cloudFiles.inferColumnTypes with and without cloudFiles.inferSchema:

When cloudFiles.inferColumnTypes is enabled, Auto Loader attempts to identify the appropriate data types for columns instead of defaulting everything to strings, which i...

2 More Replies

by Phani1 • Databricks MVP

yesterday

49 Views
0 replies
0 kudos

Best Practices for Implementing Automated, Scalable, and Auditable Purge Mechanism on Azure Databric

Hi All, I'm looking to implement an automated, scalable, and auditable purge mechanism on Azure Databricks to manage data retention, deletion and archival policies across our Unity Catalog-governed Delta tables. I've come across various approaches, s...

by beaglerot • Databricks Partner

Tuesday

133 Views
4 replies
5 kudos

Python Data Source API — worth using?

Hi all, I’ve been looking into the Python Data Source API and wanted to get some feedback from others who may be experimenting with it. One of the more common challenges I run into is working with applications that expose APIs but don’t have out-of-the...

Latest Reply

Louis_Frolio
Databricks Employee

yesterday

5 kudos

Adding on to @edonaire's points, which are accurate. @beaglerot, your contacts project is the right use case for the pattern you have. Small data, infrequent changes, direct read into bronze. That works. The real question you're asking is what happens when t...

3 More Replies
