Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Bram
by New Contributor II
  • 7924 Views
  • 9 replies
  • 1 kudos

Configuration spark.sql.sources.partitionOverwriteMode is not available.

Dear, In the current setup, we are using dbt as a modeling tool for our data lakehouse. For a specific use case, we want to use the insert_overwrite strategy, where dbt will replace all data for a specific partition: Databricks configurations | dbt Dev...
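For reference, a minimal PySpark sketch of the mechanism the insert_overwrite strategy relies on: dynamic partition overwrite. The table name, columns, and partition column below are placeholders, and whether this session config can be set at all depends on the compute type, which is what this thread is about.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Dynamic mode: only the partitions present in the incoming DataFrame are replaced,
# not the whole table. This is the configuration named in the title; on some compute
# types (e.g. SQL Warehouses) it may not be settable per session, hence the error.
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

# Toy incremental batch covering a single partition (placeholder schema).
incoming_df = spark.createDataFrame(
    [("2024-01-01", "A", 10), ("2024-01-01", "B", 7)],
    ["event_date", "key", "value"],
)

# insertInto follows the target table's existing partitioning (assumed here to be event_date).
incoming_df.write.mode("overwrite").insertInto("catalog.schema.sales")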

Latest Reply
hendrik
New Contributor II
  • 1 kudos

An approach that works well when using a Databricks SQL Warehouse is to use the replace_where strategy - I've just tested this. It also works with partitioned tables: {{ config( materialized='incremental', incremental_strategy='replace_where', ...
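The dbt config above is cut off in the preview; outside dbt, the same effect can be sketched with Delta's replaceWhere option in PySpark. The table, columns, and predicate below are placeholders rather than anything from the thread.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Toy batch for a single partition value (placeholder schema).
batch_df = spark.createDataFrame(
    [("2024-01-01", "A", 10)], ["event_date", "key", "value"]
)

# replaceWhere overwrites only the rows matching the predicate, which is the
# Delta-level behaviour behind dbt's replace_where incremental strategy.
(batch_df.write
    .format("delta")
    .mode("overwrite")
    .option("replaceWhere", "event_date = '2024-01-01'")
    .saveAsTable("catalog.schema.sales"))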

8 More Replies
SteveC527
by New Contributor
  • 2922 Views
  • 6 replies
  • 1 kudos

Medallion Architecture and Databricks Assistant

I am in the process of rebuilding the data lake at my current company with Databricks, and I'm struggling to find comprehensive best practices for naming conventions and structuring a medallion architecture to work optimally with the Databricks assistan...

Latest Reply
suman23479
New Contributor II
  • 1 kudos

If we talk about the traditional data warehouse way of building the architecture, we can consider the Silver layer as a data mart with star-schema-style relations for dimensions and facts. Can we build an entire enterprise-scale DWH using Databricks? I see in pr...

5 More Replies
Splush
by New Contributor II
  • 334 Views
  • 1 reply
  • 0 kudos

JDBC Oracle Connection change Container Statement

Hey, I'm running into a weird issue while running the following code: def getDf(query, preamble_sql=None): jdbc_url = f"jdbc:oracle:thin:@//{host}:{port}/{service_name}" request = spark.read \ .format("jdbc") \ .o...
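The code in the post is truncated; one way to run a session-level statement (such as switching the Oracle container) before the query executes is Spark's sessionInitStatement JDBC option. The sketch below is an assumption about the intent, with placeholder host, port, service name, credentials, container, and query.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder connection details.
host, port, service_name = "oracle.example.com", 1521, "ORCLPDB1"
jdbc_url = f"jdbc:oracle:thin:@//{host}:{port}/{service_name}"

df = (spark.read
    .format("jdbc")
    .option("url", jdbc_url)
    .option("user", "app_user")                # placeholder credentials
    .option("password", "app_password")
    .option("driver", "oracle.jdbc.OracleDriver")
    # Executed once per JDBC session before the query runs, e.g. to switch container.
    .option("sessionInitStatement", "ALTER SESSION SET CONTAINER = SOME_PDB")
    .option("query", "SELECT * FROM some_schema.some_table")
    .load())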

Latest Reply
BigRoux
Databricks Employee
  • 0 kudos

Here is something to consider: The issue you're experiencing likely stems from differences in behavior when accessing Oracle database objects via Spark JDBC versus other database clients like DBeaver. Specifically, Spark's JDBC interface may perform ...

JD2
by Contributor
  • 5790 Views
  • 6 replies
  • 7 kudos

Resolved! Auto Loader for Shape File

Hello: As you can see from the link below, it supports 7 file formats. I am dealing with geospatial shapefiles and I want to know if Auto Loader can support shapefiles? Any help on this is greatly appreciated. Thanks. https://docs.microsoft.com/...

Latest Reply
-werners-
Esteemed Contributor III
  • 7 kudos

You could try to use the binary file type. But the disadvantage of this is that the content of the shapefiles will be put into a column, which might not be what you want. If you absolutely want to use the autoloader, maybe some thinking outside the b...
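A minimal sketch of the binary-file route mentioned above, with placeholder paths and table name; parsing the shapefile contents would still have to happen downstream.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Auto Loader with the binaryFile format loads each file's raw bytes into a
# `content` column alongside path, modificationTime and length.
shapes_df = (spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "binaryFile")
    .load("/Volumes/raw/geo/shapefiles/"))        # placeholder input location

(shapes_df.writeStream
    .option("checkpointLocation", "/Volumes/raw/geo/_checkpoints/shapefiles")
    .toTable("catalog.schema.raw_shapefiles"))    # placeholder target table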

5 More Replies
petehart92
by New Contributor II
  • 6249 Views
  • 6 replies
  • 6 kudos

Error While Rendering Visualization -- Map (Markers)

I have a table with latitude and longitude for a few addresses (no more than 10 at the moment), but when I select the appropriate columns in the visualization editor for Map (Markers) I get a message that states "error while rendering visualization"...

Not a lot of detail...
Latest Reply
Gabi_A
New Contributor II
  • 6 kudos

Having the same issue. Every time I update my SQL, all the widgets drop and show the error 'Unable to render visualization'. The only way I found to fix it is to manually duplicate all my widgets and delete the old ones with errors, which is a pain and ...

5 More Replies
martheelise
by New Contributor
  • 340 Views
  • 1 reply
  • 0 kudos

What happens when you change from .ipynb to .py as the default file format for notebooks

Hi, I was struggling to do pull requests with the "new" default file format for notebooks and wanted to change it back to source (.py). My questions are: 1) Does this affect the whole workspace for all users? 2) Does this change the format of old .ipynb ...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Changing the default notebook file format from .ipynb to .py in Databricks has several implications based on current implementations and user scenarios: User Experience: The .ipynb format captures more comprehensive data, including environment setti...

Sahil0007
by New Contributor III
  • 1022 Views
  • 8 replies
  • 0 kudos

Databricks Delta table Merge Command Issue

I have one customer table and a temp view that I create from the incremental file and use as the source in a merge command. Earlier the notebook was working fine from the ADF pipeline, but for the past few days I have been getting an error stating that my ...
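For reference, the general shape of such a merge; the table, view, and key columns below are placeholders, not the poster's actual schema.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder incremental batch registered as the merge source.
incremental_df = spark.createDataFrame(
    [(1, "Alice", "2024-01-01")], ["customer_id", "name", "updated_at"]
)
incremental_df.createOrReplaceTempView("customer_updates")

# Upsert the incremental rows into the target Delta table.
spark.sql("""
    MERGE INTO catalog.schema.customer AS t
    USING customer_updates AS s
    ON t.customer_id = s.customer_id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")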

Latest Reply
MujtabaNoori
New Contributor III
  • 0 kudos

Hi @Sahil0007, the [REDACTED] value you're seeing is being retrieved from Key Vault. Here is the workaround: you can reverse the value twice to decode it and retrieve the original string. Alternatively, you can slice the string into two parts and conc...

7 More Replies
jorhona
by New Contributor III
  • 535 Views
  • 2 replies
  • 0 kudos

Resolved! Deleted schema leads to DLT pipeline problems

Hello. When testing a DLT pipeline I accidentally misspelt the target schema. The pipeline worked and created the tables. After realising my mistake, I deleted the tables and the schema, thinking nothing of it. However, when I run the pipeline w...

Data Engineering
Databricks
dlt
pipeline
Latest Reply
jorhona
New Contributor III
  • 0 kudos

In the end I deleted and recreated the pipeline, which fixed the problem. Luckily it was only in dev, so I didn't lose any history of pipeline successes etc. in prod. Still, it is a bit of a pain for DLT, along with the problem of multiple developers not being ...

1 More Replies
ForestDD
by New Contributor
  • 8635 Views
  • 5 replies
  • 1 kudos

java.lang.NoSuchMethodError after upgrade to Databricks Runtime 13

We use the Spark MSSQL connector to connect to SQL Server; it works well on DBR 10.*, 11.*, and 12.*. But when we use DBR 13.*, we get the error below. It happens when we try to use df.write to save the data to the SQL database. We have encount...

Latest Reply
AradhanaSahu
New Contributor II
  • 1 kudos

I was also facing the same issue while writing to SQL Server. I was able to resolve it by updating the format to "jdbc" instead of "com.microsoft.sqlserver.jdbc.spark". df.write.format("jdbc") works on DBR 13.3 LTS using the connector: com.microsoft.a...
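A sketch of the generic JDBC write described above; the server, database, table, and credentials are placeholders.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([(1, "example")], ["id", "name"])   # toy data

# On DBR 13.x, the built-in "jdbc" format with the Microsoft JDBC driver avoids the
# NoSuchMethodError seen with format("com.microsoft.sqlserver.jdbc.spark").
(df.write
    .format("jdbc")
    .option("url", "jdbc:sqlserver://myserver.database.windows.net:1433;databaseName=mydb")
    .option("dbtable", "dbo.my_table")
    .option("user", "db_user")                 # placeholder credentials
    .option("password", "db_password")
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
    .mode("append")
    .save())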

4 More Replies
iarregui
by New Contributor
  • 4312 Views
  • 3 replies
  • 0 kudos

Getting a Databricks static IP

Hello. I want to connect from my Databricks workspace to an external API to extract some data. The owner of the API asks for an IP to provide the token necessary for the extraction of data. Therefore I would need to set a static IP in Databricks that...

Latest Reply
Wojciech_BUK
Valued Contributor III
  • 0 kudos

Hello, the easiest way (in Azure) is to deploy the Workspace in VNet injection mode and attach a NAT Gateway to your VNet. The NAT GW requires a Public IP, and this IP will be your static egress IP for all clusters in this Workspace. Note: both the NAT GW and the IP Address...

2 More Replies
samtech
by New Contributor
  • 366 Views
  • 1 reply
  • 0 kudos

Regional Workspaces . How to consolidate

Hi, we have a similar catalog (specific to regional data) in the APAC workspace and the Americas workspace. Our goal is to have Silver tables created in each regional workspace and then consolidated as Gold in one of the workspaces. So if I create Silver in APAC and...

Latest Reply
lingareddy_Alva
Honored Contributor III
  • 0 kudos

Hi @samtech, yes, you're on the right track for cross-workspace data access in Databricks: Delta Sharing is the recommended approach for accessing tables across different Databricks workspaces/regions.
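Once the regional Silver tables are exposed in the gold workspace (for example through Databricks-to-Databricks Delta Sharing into shared catalogs), the consolidation itself is an ordinary read/union/write; the catalog and table names below are placeholders and the share setup is not shown.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Regional Silver tables surfaced via shared catalogs (placeholder names).
apac_silver = spark.read.table("apac_shared.sales.orders_silver")
amer_silver = spark.read.table("amer_shared.sales.orders_silver")

# Consolidate both regions into a single Gold table in the local catalog.
(apac_silver.unionByName(amer_silver)
    .write.mode("overwrite")
    .saveAsTable("main.sales.orders_gold"))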

Bart_DE
by New Contributor II
  • 464 Views
  • 1 reply
  • 0 kudos

Resolved! Databricks Asset Bundle conditional job cluster size?

Hey folks, can someone please suggest whether there is a way to spawn a job cluster of a given size if a parameter of the job invocation (e.g. file_name) contains a desired value? I have a job which 90% of the time deals with very small files, but the remai...

Latest Reply
lingareddy_Alva
Honored Contributor III
  • 0 kudos

Hi @Bart_DE, no: a single job.yml file can't "look inside" a parameter like file_name and then decide to spin up a different job-cluster size on the fly. Job-cluster definitions in Databricks Workflows (Jobs) are static. All the heavy lifting has to b...

Vasu_Kumar_T
by New Contributor II
  • 302 Views
  • 1 reply
  • 0 kudos

Job performance issue : Configurations

Hello All, one job is taking more than 7 hrs; when we added the configuration below it takes less than 2:30 mins, but after deployment with the same parameters it again takes 7+ hrs. 1) spark.conf.set("spark.sql.shuffle.partitions", 500) --> spark.conf.set("spark.sql.s...

Latest Reply
lingareddy_Alva
Honored Contributor III
  • 0 kudos

Hi @Vasu_Kumar_T This is a classic Spark performance inconsistency issue. The fact that it works fine in your notebook but degrades after deployment suggests several potential causes. Here are the most likely culprits and solutions: Primary Suspects: 1. ...
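For reference, the kind of session-level settings this thread is about; the values are illustrative only, and whether they persist after deployment depends on where they are set (notebook code vs. the job cluster's Spark config).

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Static shuffle partition count, as in the original post.
spark.conf.set("spark.sql.shuffle.partitions", "500")

# With Adaptive Query Execution enabled (the default on recent DBR versions),
# Spark can coalesce shuffle partitions at runtime, which often matters more
# than the static number above.
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")

print(spark.conf.get("spark.sql.shuffle.partitions"))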

Mahtab67
by New Contributor
  • 642 Views
  • 1 reply
  • 0 kudos

Spark Kafka Client Not Using Certs from Default truststore

Hi Team, I'm working on connecting Databricks to an external Kafka cluster secured with SASL_SSL (SCRAM-SHA-512 + certificate trust). We've encountered an issue where certificates imported into the default JVM truststore (cacerts) via an init script ...

Latest Reply
lingareddy_Alva
Honored Contributor III
  • 0 kudos

Hi @Mahtab67 This is a common issue with Databricks and Kafka SSL connectivity. The problem stems from how Spark's Kafka connector handles SSL context initialization versus the JVM's default truststore. Root Cause Analysis: The Spark Kafka connector cre...
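One commonly used workaround, sketched under assumptions rather than taken from this thread, is to hand the truststore to the Kafka client directly instead of relying on the JVM default cacerts; the broker address, topic, secrets, and paths below are placeholders.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder credentials; in practice these would come from a secret scope.
username, password = "kafka_user", "kafka_password"

df = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1.example.com:9093")
    .option("subscribe", "my_topic")
    .option("kafka.security.protocol", "SASL_SSL")
    .option("kafka.sasl.mechanism", "SCRAM-SHA-512")
    # On Databricks the bundled Kafka client classes are relocated, hence the kafkashaded prefix.
    .option("kafka.sasl.jaas.config",
            "kafkashaded.org.apache.kafka.common.security.scram.ScramLoginModule required "
            f'username="{username}" password="{password}";')
    # Point the Kafka client at the truststore explicitly instead of the default cacerts.
    .option("kafka.ssl.truststore.location", "/dbfs/FileStore/certs/kafka.truststore.jks")
    .option("kafka.ssl.truststore.password", "truststore_password")
    .load())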

Sainath368
by New Contributor III
  • 417 Views
  • 1 reply
  • 0 kudos

COMPUTE DELTA STATISTICS vs COMPUTE STATISTICS - Data Skipping

Hi all, I recently altered the data skipping stats columns on my Delta Lake table to optimize data skipping. Now, I'm wondering about the best practice for updating statistics: Is running ANALYZE TABLE <table_name> COMPUTE DELTA STATISTICS sufficient a...

Latest Reply
Advika
Databricks Employee
  • 0 kudos

Hello @Sainath368! Running ANALYZE TABLE <table_name> COMPUTE DELTA STATISTICS is a good practice after modifying data skipping stats columns on a Delta Lake table. However, this command doesn’t update query optimizer stats. For that, you’ll need to ...
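The two commands the reply distinguishes, run against a placeholder table:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Recomputes the file-level min/max statistics used for data skipping,
# e.g. after changing the table's data skipping stats columns.
spark.sql("ANALYZE TABLE catalog.schema.events COMPUTE DELTA STATISTICS")

# Updates the table-level statistics used by the query optimizer;
# this is a separate step from the data-skipping statistics above.
spark.sql("ANALYZE TABLE catalog.schema.events COMPUTE STATISTICS FOR ALL COLUMNS")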

