Data Engineering

Forum Posts

Sorted by:

Start a conversation

by R_Chaitanya • New Contributor

an hour ago

17 Views
0 replies
0 kudos

I completed all the modules in data Engineering Associate. But I received badge for only 2 modules

Data Engineering

17 Views
0 replies
0 kudos

an hour ago

by abhishek13 • Visitor

yesterday

43 Views
0 replies
0 kudos

Connection reset error from Databricks notebook but works via curl (GCP)

Hi everyone,I’m facing a connectivity issue in my Databricks workspace on GCP and would appreciate any guidance. ProblemWhen I run commands from a Databricks notebook, I see intermittent errors like:Connection reset Retrying request to https://us-eas...

Data Engineering

43 Views
0 replies
0 kudos

yesterday

by MaartenH • New Contributor III

05-07-2025 1:58:47 AM

3767 Views
12 replies
4 kudos

Lakehouse federation for SQL server: database name with spaces

We're currently using lakehouse federation for various sources (Snowflake, SQL Server); usually succesful. However we've encountered a case where one of the databases on the SQL Server has spaces in its name, e.g. 'My Database Name'. We've tried vari...

Data Engineering

3767 Views
12 replies
4 kudos

05-07-2025 1:58:47 AM

View Replies

Latest Reply

QueryingQuail
New Contributor III

Tuesday

4 kudos

Hello all,We have a good amount of tables from an external ERP system that are being replicated to an existing dwh in an Azure SQL Server database.We have set up a foreign connection for this database and we can connect to the server and database. Sa...

4 kudos

Tuesday

11 More Replies

by BF7 • Contributor

04-28-2025 6:56:25 AM

1690 Views
7 replies
3 kudos

Resolved! Using cloudFiles.inferColumnTypes with inferSchema and without defining schema checkpoint

Two Issues:1. What is the behavior of cloudFiles.inferColumnTypes with and without cloudFiles.inferSchema? Why would you use both?2. When can cloudFiles.inferColumnTypes be used without a schema checkpoint? How does that affect the behavior of cloud...

Data Engineering

1690 Views
7 replies
3 kudos

04-28-2025 6:56:25 AM

View Replies

Latest Reply

Louis_Frolio
Databricks Employee

yesterday

3 kudos

@mits1 , if you are happy with the answer please click on "Accept as Solution." It will give confidence to others. Cheers, Lou.

3 kudos

yesterday

6 More Replies

by Manjusha • New Contributor II

Tuesday

162 Views
4 replies
2 kudos

Running python functions (written using polars) on databricks

Hi,We are planning to re-write our application ( which was originally running in R) in python. We chose to use Polars as they seems to be faster than pandas. We have functions written in R which we are planning to convert to Python.However in one of ...

Data Engineering

162 Views
4 replies
2 kudos

Tuesday

View Replies

Latest Reply

Louis_Frolio
Databricks Employee

yesterday

2 kudos

@Manjusha , if you are happy with my response please click on "Accept as Solution." It will help others trust the guidance.

2 kudos

yesterday

3 More Replies

by sdurai • New Contributor

Wednesday

155 Views
4 replies
4 kudos

Resolved! Databricks to Salesforce Core (Not cloud)

Hi,Is there any native connector available to connect salesforce core (not cloud) in Databricks? If no native connector, what are all recommended approaches to connect to Salesforce coreThanks,Subashini

Data Engineering

155 Views
4 replies
4 kudos

Wednesday

View Replies

Latest Reply

Ashwin_DSA
Databricks Employee

yesterday

4 kudos

Hi @sdurai, Yes. Databricks has a native Salesforce connector for core Salesforce (Sales Cloud / Service Cloud / Platform objects) via Lakeflow Connect - Salesforce ingestion connector. It lets you create fully managed, incremental pipelines from Sal...

4 kudos

yesterday

3 More Replies

by P10d • New Contributor

Monday

122 Views
1 replies
0 kudos

Connect Databrick's cluster with Artifactory

Hello,I'm trying to connect databricks with an own JFrog Artifactory. The objective is to download both PIP/JAR dependencies from it instead of connecting to maven-central/PyPi. Im struggling with JAR's. My aproximation to solve the problem is:1. Cre...

Data Engineering

122 Views
1 replies
0 kudos

Monday

View Replies

Latest Reply

emma_s
Databricks Employee

yesterday

0 kudos

Hi, I haven't got the ability to test this myself but based on some internal research, I think the following is true: Hi, The most likely issue is your truststore configuration. Setting spark.driver.extraJavaOptions -Djavax.net.ssl.trustStore=<custom...

0 kudos

yesterday

by mordex • New Contributor III

Wednesday

59 Views
1 replies
0 kudos

Databricks workflows for APIs with different frequencies (cluster keeps restarting)

Title: Databricks workflows for APIs with different frequencies (cluster keeps restarting)Hey everyone,I’m stuck with a Databricks workflow design and could use some advice.Currently, we are calling 70+ APIs Right now the workflow looks something l...

Data Engineering

59 Views
1 replies
0 kudos

Wednesday

View Replies

Latest Reply

lingareddy_Alva
Esteemed Contributor

yesterday

0 kudos

Hi @mordex This is a classic high-frequency orchestration problem on Databricks. The core issue is that Databricks job clusters are designed for batch workloads, not sub-5-minute polling loops. Job clusters have a ~3–5 min cold start. For a 1-min fre...

0 kudos

yesterday

by bi123 • Visitor

yesterday

103 Views
3 replies
0 kudos

How to import python modules in a notebook?

I have a job with a notebook task that utilizes python modules in another folder than the notebook itself. When I try to import the module in the notebook, it raises module not found error. I solved the problem using sys.pathBut I am curious if there...

Data Engineering

103 Views
3 replies
0 kudos

yesterday

View Replies

Latest Reply

shazi
New Contributor III

yesterday

0 kudos

While this is a classic way to solve this, it can sometimes be "brittle" if your folder structure changes or if you share the notebook with others who have different file paths. In modern notebook environments.

0 kudos

yesterday

2 More Replies

by ittzzmalind • New Contributor

yesterday

49 Views
1 replies
0 kudos

Databricks Workspace - Unknow IP access

Azure monitor log showing unknow ip authentication requests to Databricks workspace . -- When searched ip below url, result showing its from AZURE CLOUD : <Region> (the region is same as workspace)https://azureipranges.azurewebsites.net/SearchFor -- ...

Data Engineering

49 Views
1 replies
0 kudos

yesterday

View Replies

Latest Reply

Ashwin_DSA
Databricks Employee

yesterday

0 kudos

Hi @ittzzmalind, Because the IP is in the same Azure region but not listed in the Azure Databricks control plane ranges, it’s very likely not a Databricks owned control plane IP. It’s typically either a user or service coming from another Azure resou...

0 kudos

yesterday

by murtadha_s • Databricks Partner

yesterday

56 Views
1 replies
0 kudos

Default ACL for Jobs and Clusters

Hi,I want to set default ACL that applies to all created jobs and clusters, according to a cluster policy for example, but currently I need to apply my ACL at every created job/cluster separately.is there a way to do that?BR,

Data Engineering

56 Views
1 replies
0 kudos

yesterday

View Replies

Latest Reply

Ashwin_DSA
Databricks Employee

yesterday

0 kudos

Hi @murtadha_s Can you please clarify what you are after? The second part of your question sounded more like a statement: "but currently I need to apply my ACL at every created job/cluster separately," and that confused me a bit. To make sure we poi...

0 kudos

yesterday

by sai_sakhamuri • Databricks Partner

yesterday

126 Views
1 replies
1 kudos

Resolved! Databricks optimization for query perfomance and pipeline run

I am currently working on optimizing several Spark pipelines and wanted to gather community insights on advanced performance tuning. Typically, my workflow for traditional SQL optimization involves a deep dive into the execution plan to identify bott...

Data Engineering

126 Views
1 replies
1 kudos

yesterday

View Replies

Latest Reply

lingareddy_Alva
Esteemed Contributor

yesterday

1 kudos

Hi @sai_sakhamuri You're clearly past the basics. Let me give you a practitioner-level breakdown of each layer you mentioned, plus a few things that often get overlooked.Spark Catalyst Optimizer — Working With the Rules EngineCatalyst operates in fou...

1 kudos

yesterday

by databrciks • New Contributor III

Wednesday

166 Views
3 replies
1 kudos

Resolved! Parametrize the DLT pipeline for dynamic loading of many tables

I need to load many tables into Bronze layer connecting to sql server DB. How can i pass the tables names dynamically in DLT. Means one code pass many tables and load into bronze layer

Data Engineering

166 Views
3 replies
1 kudos

Wednesday

View Replies

Latest Reply

databrciks
New Contributor III

yesterday

1 kudos

Hi Ashwin Thanks for the quick response. Yes I want to pass all the tables through config parameter/param file and load that into bronze layerI will try this approach. Thanks

1 kudos

yesterday

2 More Replies

by mits1 • New Contributor III

Wednesday

205 Views
8 replies
0 kudos

Autoloader inserts null rows in delta table while reading json file

Hi,I am exploring Schema inference and Schema evolution using Autoloader.I am reading a single line json file and writing in a delta table which does not exist already (creating it on the fly), using pyspark (below is the code).Code :spark.readStream...

Data Engineering

205 Views
8 replies
0 kudos

Wednesday

View Replies

Latest Reply

saurabh18cs
Honored Contributor III

yesterday

0 kudos

Hi @mits1 can you try adding this option as well:{"multiLine": "true"}

0 kudos

yesterday

7 More Replies

by ittzzmalind • New Contributor

Friday

138 Views
2 replies
0 kudos

DLT Pipeline Error -key not found: all_info_dlt_cx_utils_cod resulting in a NoSuchElementException.

Databricks ETL pipeline, specifically an error with the @DP.expectorfail decorator causing the pipeline update to fail. The error message indicated a 'key not found: all_info_dlt_cx_utils_cod ' resulting in a NoSuchElementException.Note: if we commen...

Databricks Community

Forum Posts

I completed all the modules in data Engineering Associate. But I received badge for only 2 modules

Connection reset error from Databricks notebook but works via curl (GCP)

Lakehouse federation for SQL server: database name with spaces

Resolved! Using cloudFiles.inferColumnTypes with inferSchema and without defining schema checkpoint

Running python functions (written using polars) on databricks

Resolved! Databricks to Salesforce Core (Not cloud)

Connect Databrick's cluster with Artifactory

Databricks workflows for APIs with different frequencies (cluster keeps restarting)

How to import python modules in a notebook?

Databricks Workspace - Unknow IP access

Default ACL for Jobs and Clusters

Resolved! Databricks optimization for query perfomance and pipeline run

Resolved! Parametrize the DLT pipeline for dynamic loading of many tables

Autoloader inserts null rows in delta table while reading json file

DLT Pipeline Error -key not found: all_info_dlt_cx_utils_cod resulting in a NoSuchElementException.

Databricks to Salesforce Core (Not cloud)

Databricks optimization for query perfomance and p...

Parametrize the DLT pipeline for dynamic loading o...

File Arrival Trigger - Multiple tables

Issue while handling Deletes and Inserts in Struct...