Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Karl
by New Contributor II
  • 207 Views
  • 0 replies
  • 0 kudos

DB2 JDBC Connection from Databricks cluster

Has anyone successfully connected to a DB2 database on z/OS from a Databricks cluster using a JDBC connection? I also need to specify an SSL certificate path, and I'm not sure whether I need to use an init script on the cluster to do so. Any examples would be ver...

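For anyone attempting the same, a minimal sketch of what such a read might look like, assuming the IBM JCC driver JAR is installed on the cluster; the host, database, table, credentials, and truststore path below are hypothetical, and the truststore could be placed on the cluster via an init script or a volume:

```python
# Hedged sketch: DB2 on z/OS over JDBC with SSL, using IBM JCC URL properties.
# Every connection detail below is a placeholder, not a verified value.
df = (spark.read.format("jdbc")
      .option("driver", "com.ibm.db2.jcc.DB2Driver")
      .option("url", "jdbc:db2://zos-host.example.com:446/MYDB:"
                     "sslConnection=true;"
                     "sslTrustStoreLocation=/dbfs/certs/db2-truststore.jks;"
                     "sslTrustStorePassword=changeit;")
      .option("dbtable", "MYSCHEMA.MY_TABLE")
      .option("user", "db2user")
      .option("password", "db2password")
      .load())
df.show(5)
```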
bantarobugs
by New Contributor
  • 213 Views
  • 0 replies
  • 0 kudos

Job Run failure - Azure Container does not exist

Hello, I have an ETL pipeline in Databricks that works perfectly when I execute it manually in the notebook using an all-purpose cluster. However, when I try to schedule it using a job cluster, it fails immediately with the error message: 'Azure conta...

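A common cause of this symptom is that the job cluster lacks the storage credentials the all-purpose cluster carried in its Spark config. A minimal sketch, assuming service-principal (OAuth) access to ADLS Gen2; the storage account, secret scope, and key names are hypothetical:

```python
# Hedged sketch: set ADLS Gen2 OAuth credentials on the Spark session so a
# scheduled job cluster can reach the container. All names are placeholders.
storage_account = "mystorageacct"                                   # hypothetical
client_id = dbutils.secrets.get("my-scope", "sp-client-id")         # hypothetical scope/keys
client_secret = dbutils.secrets.get("my-scope", "sp-client-secret")
tenant_id = dbutils.secrets.get("my-scope", "sp-tenant-id")

spark.conf.set(f"fs.azure.account.auth.type.{storage_account}.dfs.core.windows.net", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{storage_account}.dfs.core.windows.net",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{storage_account}.dfs.core.windows.net", client_id)
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{storage_account}.dfs.core.windows.net", client_secret)
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{storage_account}.dfs.core.windows.net",
               f"https://login.microsoftonline.com/{tenant_id}/oauth2/token")
```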
ahab
by New Contributor
  • 1340 Views
  • 1 reply
  • 0 kudos

Error deploying DLT DAB: Validation failed for cluster_type, the value must be dlt (is "job")

Hello. I'm getting a cluster validation error while trying to deploy a DLT pipeline via DAB. See the attached screenshots for the config and the error. Hoping someone has run into this before and can guide me. Thanks.

Latest Reply
MohcineRouessi
New Contributor II
  • 0 kudos

Hey @ahab, were you able to solve this issue? If yes, would you mind sharing your findings? If not: looking at your cluster policy, I see that you are using dlt as the cluster type, which is the correct type for DLT pipelines, but in the resourc...

ivanychev
by Contributor II
  • 1542 Views
  • 3 replies
  • 0 kudos

Resolved! Delta table takes too long to write due to S3 full scan

DBR 14.3, Spark 3.5.0. We use AWS Glue Metastore. On August 20th some of our pipelines started timing out during writes to a Delta table. The driver spends many hours executing post-commit hooks. We write dataframes to Delta with `mode=overw...

Latest Reply
ivanychev
Contributor II
  • 0 kudos

The spark.databricks.delta.catalog.update.enabled=true setting helped, but I still don't understand why the problem started occurring. https://docs.databricks.com/en/archive/external-metastores/external-hive-metastore.html#external-apache-hive-metastore-leg...

2 More Replies
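For readers who land here with the same symptom, a minimal sketch of applying the setting from the reply above before the overwrite write; whether session scope suffices, versus setting it in the cluster's Spark config, is an assumption, and the table name is hypothetical:

```python
# Hedged sketch: apply the config the reply above reports fixed the slow
# post-commit hooks, then perform the overwrite write the post describes.
spark.conf.set("spark.databricks.delta.catalog.update.enabled", "true")

(df.write.format("delta")
   .mode("overwrite")
   .saveAsTable("analytics.events"))  # hypothetical target table
```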
bhakti
by New Contributor II
  • 793 Views
  • 7 replies
  • 0 kudos

Databricks job getting stuck at saveAsTable

I am trying to write logs to a Delta table, but after running for some time the job gets stuck at saveAsTable. Traceback (most recent call last): File "/databricks/spark/python/pyspark/errors/exceptions.py", line 228, in deco: return f(*a, **kw) File ...

Latest Reply
ivanychev
Contributor II
  • 0 kudos

Hey @bhakti! Please provide the full stack trace / error message. Your log doesn't give any strong clue; a failure during write can occur for various reasons.

6 More Replies
Amodak91
by New Contributor II
  • 500 Views
  • 1 reply
  • 4 kudos

Resolved! How to use a dataframe created in one notebook from another, without writing it anywhere?

I have a large notebook and want to divide it into multiple notebooks and use Databricks Jobs to run them in parallel. However, one of the notebooks uses a dataframe from one of the other notebooks, so it has to run downstream of them. Now, since...

Latest Reply
menotron
Valued Contributor
  • 4 kudos

Hi @Amodak91, you could use the %run magic command from within the downstream notebook to call the upstream notebook, thus having it run in the same context, with all its variables accessible, including the dataframe, without needing to persist it....

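For reference, a minimal sketch of the pattern from the reply above, as two cells of the downstream notebook; the notebook path and the dataframe name df are hypothetical, and %run must sit alone in its own cell:

```python
# Cell 1 (hypothetical path): run the upstream notebook in this context.
%run ./upstream_notebook

# Cell 2: everything the upstream notebook defined is now in scope,
# including its dataframe, without persisting it anywhere.
display(df.limit(10))
```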
JaviPA
by New Contributor
  • 323 Views
  • 1 reply
  • 0 kudos

Spark BigQuery Connector Availability

Hello, we are trying to use this library (https://github.com/GoogleCloudDataproc/spark-bigquery-connector) to read BigQuery data from a Databricks cluster on Azure. Could someone confirm whether this library is fully available and supported on Databricks? ...

Latest Reply
szymon_dybczak
Contributor III
  • 0 kudos

Hi @JaviPA, the documentation refers to the library you're going to use: GitHub - GoogleCloudDataproc/spark-bigquery-connector: BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables. Also, it...

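For completeness, a minimal sketch of a BigQuery read via the connector, assuming the library is installed on the cluster and GCP credentials are configured; the project, dataset, and table names are hypothetical:

```python
# Hedged sketch: read a BigQuery table into a Spark DataFrame.
df = (spark.read.format("bigquery")
      .option("parentProject", "my-gcp-project")  # hypothetical billing project
      .option("table", "my-gcp-project.my_dataset.my_table")
      .load())
df.printSchema()
```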
RohitKulkarni
by Contributor II
  • 1319 Views
  • 11 replies
  • 6 kudos

Partial upload of a 1.2 GB file

Hello Team, I have a 1.2 GB file in txt format. I am trying to upload the data into an MS SQL Server database table, but only 10% of the data gets uploaded. Example: total records in the file: 51303483; number of records inserted: 10224430. I am usi...

Latest Reply
RohitKulkarni
Contributor II
  • 6 kudos

There were access lines in the document; because of them, the data loaded only partially. Thanks for the support.

10 More Replies
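For anyone debugging a similar partial load, a minimal sketch that quarantines lines the parser cannot handle instead of silently dropping them, then writes to SQL Server over JDBC; the delimiter, paths, and connection details are assumptions:

```python
# Hedged sketch: load the text file with a bad-records quarantine (a
# Databricks CSV/JSON reader option), then bulk-write over JDBC.
# All names below are placeholders.
raw = (spark.read
       .option("header", "true")
       .option("badRecordsPath", "/mnt/landing/bad_records")  # unparsable lines land here
       .csv("/mnt/landing/bigfile.txt", sep="|"))             # hypothetical path/delimiter

(raw.write.format("jdbc")
    .option("url", "jdbc:sqlserver://sqlhost:1433;databaseName=mydb")
    .option("dbtable", "dbo.target_table")
    .option("user", "sqluser").option("password", "sqlpassword")
    .mode("append")
    .save())
```

Comparing the source row count against the table row count plus the quarantined lines should reveal where the missing 90% went.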
ahen
by New Contributor
  • 216 Views
  • 0 replies
  • 0 kudos

Deployed DABs job via GitLab CI/CD; it is creating duplicate jobs.

We had an error in the DABs deploy, and subsequent retries resulted in a locked state. As suggested in the logs, we used the --force-lock option and the deploy succeeded. However, it created duplicate jobs for all assets in the bundle instead of updating the...

akihiko
by New Contributor III
  • 3082 Views
  • 4 replies
  • 1 kudos

Resolved! Attach notebook to cluster via REST API

Is it possible to attach a notebook to a cluster and run it via the REST API? The closest approach I have found is to run a notebook, export the results (HTML!) and import it into the workspace again, but this does not allow us to retain the original ex...

Latest Reply
baert23
New Contributor II
  • 1 kudos

I'm looking for a way to programmatically copy a notebook in Databricks using the workspace/export and workspace/import APIs. Once the notebook is copied, I want to automatically attach it to a specific cluster using its cluster ID. The challenge is ...

3 More Replies
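For the archive, a minimal sketch of the nearest documented pattern: submitting a one-time notebook run against an existing cluster via the Jobs API. This runs the notebook as a job run rather than attaching it in the notebook UI; the host, token, cluster ID, and notebook path are hypothetical:

```python
# Hedged sketch: POST /api/2.1/jobs/runs/submit with an existing_cluster_id.
import requests

HOST = "https://adb-0000000000000000.0.azuredatabricks.net"  # hypothetical workspace
TOKEN = "dapi-..."                                           # hypothetical PAT

resp = requests.post(
    f"{HOST}/api/2.1/jobs/runs/submit",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "run_name": "notebook-on-existing-cluster",
        "tasks": [{
            "task_key": "nb",
            "existing_cluster_id": "0000-000000-abcd1234",             # hypothetical
            "notebook_task": {"notebook_path": "/Users/me/my_notebook"},
        }],
    },
)
resp.raise_for_status()
print("run_id:", resp.json()["run_id"])
```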
Rishabh-Pandey
by Esteemed Contributor
  • 456 Views
  • 2 replies
  • 1 kudos

Resolved! The Latest Improvements to Databricks Workflows

What's new in Workflows?  @Sujitha @Retired_mod

Latest Reply
Rishabh-Pandey
Esteemed Contributor
  • 1 kudos

@Sujitha I am happy to see workflows maturing day by day; this is going to be a game changer for the market. I am also very excited about the upcoming feature, Lakeflow.

1 More Reply
VIRALKUMAR
by Contributor II
  • 925 Views
  • 2 replies
  • 0 kudos

How to Determine the Cost for Each Query Run Against SQL Warehouse Serverless?

Hello everyone. First of all, I would like to thank Databricks for enabling system tables for customers. It does help a lot. I am working on a cost optimization topic, particularly SQL warehouse serverless. I am not sure whether all of you have tried system...

Latest Reply
katefray
Databricks Employee
  • 0 kudos

Hey VIRALKUMAR, I recommend using the billing usage system table to find total DBUs by SKU (SQL) and the pricing system table to find the appropriate price. You can use the sample queries in those pages to get started. Hope that's helpful!

1 More Reply
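Following the reply above, a minimal sketch that joins billing usage to list prices to estimate serverless SQL spend per day; the SKU filter and the price-validity join condition are assumptions to verify against the system-table docs:

```python
# Hedged sketch: estimated daily cost by SKU from the billing system tables.
cost_df = spark.sql("""
    SELECT u.usage_date,
           u.sku_name,
           SUM(u.usage_quantity * p.pricing.default) AS estimated_cost
    FROM system.billing.usage u
    JOIN system.billing.list_prices p
      ON u.sku_name = p.sku_name
     AND u.usage_start_time >= p.price_start_time
     AND (p.price_end_time IS NULL OR u.usage_start_time < p.price_end_time)
    WHERE u.sku_name LIKE '%SQL%'
      AND u.sku_name LIKE '%SERVERLESS%'   -- hypothetical serverless SQL filter
    GROUP BY u.usage_date, u.sku_name
    ORDER BY u.usage_date
""")
cost_df.show()
```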
CURIOUS_DE
by New Contributor III
  • 239 Views
  • 0 replies
  • 1 kudos

A Surprising Finding in Delta Live Tables

While DLT has some powerful features, I found myself doing a double-take when I realized it doesn’t natively support hard deletes. Instead, it leans on a delete flag identifier to manage these in the source table. A bit surprising for a tool of its c...

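For context, a minimal sketch of the delete-flag pattern the post describes, using DLT's apply_changes; the source feed, key, sequence column, and operation flag names are hypothetical:

```python
# Hedged sketch: propagate deletes from a CDC feed via a delete-flag column.
import dlt
from pyspark.sql.functions import expr

dlt.create_streaming_table("customers")

dlt.apply_changes(
    target="customers",
    source="customers_cdc",                         # hypothetical CDC source view
    keys=["customer_id"],                           # hypothetical key column
    sequence_by="event_ts",                         # hypothetical ordering column
    apply_as_deletes=expr("operation = 'DELETE'"),  # the delete-flag identifier
)
```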
mangel
by New Contributor III
  • 6325 Views
  • 6 replies
  • 3 kudos

Resolved! Delta Live Tables pivot error

I'm facing an error in Delta Live Tables when I want to pivot a table. The error is shown in the attached image, and the code to replicate it is the following: import pandas as pd import pyspark.sql.functions as F pdf = pd.DataFrame({"A": ["foo", "foo", "f...

Latest Reply
Khalil
Contributor
  • 3 kudos

The DLT documentation says that "pivot" is not supported in DLT, but I noticed that if you want the pivot function to work you have to do one of the following things: apply the pivot in your first dlt.view + the config "spark.databricks.d...

5 More Replies
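To make the first workaround concrete, a minimal sketch of pivoting inside a dlt.view that feeds a table; the source table and columns are hypothetical, and the config name the reply mentions is truncated above, so it is not reproduced here:

```python
# Hedged sketch: keep the pivot in a batch dlt.view rather than a streaming table.
import dlt
import pyspark.sql.functions as F

@dlt.view()
def pivoted_view():
    src = spark.read.table("source_table")  # hypothetical batch source
    return src.groupBy("A").pivot("B").agg(F.sum("C"))

@dlt.table()
def pivoted():
    return dlt.read("pivoted_view")
```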
