Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Subhrajyoti
by New Contributor
  • 396 Views
  • 1 reply
  • 0 kudos

Issue with Databricks Cluster Configuration in Pre-Prod Environment

Hi team, hope you are doing well! This is to share an incident regarding one of the difficulties we are facing with a UC-enabled cluster (both interactive and job clusters) in our pre-prod environment: the data is not getting refreshed properl...

Latest Reply
Sidhant07
Databricks Employee
  • 0 kudos

Hi @Subhrajyoti, can you please try running REFRESH TABLE table_name when you encounter this issue? Can you also try disabling Delta caching and checking whether it returns the correct result (spark.databricks.io.cache.enabled = false)?
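For reference, a minimal sketch of those two checks from a notebook (the three-level table name is a placeholder, not from the thread):

```python
# Placeholder table name; force a refresh of cached metadata/data for the stale table
spark.sql("REFRESH TABLE my_catalog.my_schema.my_table")

# Temporarily disable the Delta disk cache for this session to rule out stale cached files
spark.conf.set("spark.databricks.io.cache.enabled", "false")
display(spark.table("my_catalog.my_schema.my_table").limit(10))
```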

bharathelsker
by New Contributor
  • 343 Views
  • 1 reply
  • 0 kudos

Resolved! Why does disabling Photon fix my ConcurrentDeleteDeleteException in Databricks?

I'm running a Databricks 15.4 LTS job with Photon acceleration enabled. I have a wrapper notebook that uses ThreadPoolExecutor to trigger multiple child notebooks in parallel. Each thread calls a function that runs a child notebook and updates an audit...

Latest Reply
Sidhant07
Databricks Employee
  • 0 kudos

Hi @bharathelsker, a ConcurrentDeleteDeleteException occurs when a concurrent operation deleted a file that your operation also deletes. This can be caused by two concurrent compaction operations rewriting the same files. To further i...
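The rest of the reply is cut off above. One common mitigation (an assumption on my part, not necessarily what the reply goes on to suggest) is to stop the parallel writers from compacting the same files, for example by disabling auto compaction and optimized writes on the shared table:

```python
# Hypothetical table name; with these properties off, concurrent writers are less
# likely to rewrite (and therefore delete) the same underlying files
spark.sql("""
  ALTER TABLE my_catalog.my_schema.audit_log SET TBLPROPERTIES (
    'delta.autoOptimize.autoCompact'   = 'false',
    'delta.autoOptimize.optimizeWrite' = 'false'
  )
""")
```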

Ovasheli
by New Contributor
  • 321 Views
  • 1 reply
  • 0 kudos

How to Get CDF Metadata from an Overwritten Batch Source in DLT?

Hello, I'm working on a Delta Live Tables pipeline and need help with a data source challenge. My source tables are batch-loaded SCD2 tables with CDF (Change Data Feed) enabled. These tables are updated daily using a complete overwrite operation. For my...

Latest Reply
Sidhant07
Databricks Employee
  • 0 kudos

Hi @Ovasheli, I believe the error message would be something like the one below. Error: com.databricks.sql.transaction.tahoe.DeltaUnsupportedOperationException: [DELTA_SOURCE_TABLE_IGNORE_CHANGES] Detected a data update (for example DELETE (Map(predicate - ...
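Since the reply is truncated, here is a hedged sketch of one way such a source is often read in a DLT pipeline when the upstream table is fully overwritten: the skipChangeCommits reader option avoids the DELTA_SOURCE_TABLE_IGNORE_CHANGES failure, though it skips rather than propagates the rewritten data (table and function names are placeholders):

```python
import dlt

@dlt.table(name="silver_from_scd2")  # placeholder target name
def silver_from_scd2():
    return (
        spark.readStream
        .option("skipChangeCommits", "true")          # don't fail on rewritten source files
        .table("my_catalog.my_schema.scd2_source")    # placeholder source table
    )
```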

turagittech
by Contributor
  • 602 Views
  • 1 reply
  • 1 kudos

Credential Sharing Across Cluster Nodes - spark.conf()

Hi all, I am struggling to understand how to manage credentials for Azure storage across the cluster when trying to use Azure Python libraries within functions that may end up on the cluster worker nodes. I am building a task to load blobs from Azure stora...

Latest Reply
Sidhant07
Databricks Employee
  • 1 kudos

Hi @turagittech, I found a KB article related to this error. Let me know if this helps: https://kb.databricks.com/data-sources/keyproviderexception-error-when-trying-to-create-an-external-table-on-an-external-schema-with-authentication-at-the-noteboo...
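For context, the KB article is about where the storage authentication is configured. A hedged sketch of setting the standard ABFS OAuth properties at the session or cluster level, rather than only in notebook scope; the storage account, secret scope, and tenant ID are placeholders:

```python
# Placeholder storage account / service principal; these are the standard ABFS OAuth keys
storage_account = "mystorageaccount"
prefix = f"fs.azure.account"
suffix = f"{storage_account}.dfs.core.windows.net"

spark.conf.set(f"{prefix}.auth.type.{suffix}", "OAuth")
spark.conf.set(f"{prefix}.oauth.provider.type.{suffix}",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"{prefix}.oauth2.client.id.{suffix}",
               dbutils.secrets.get("my-scope", "sp-client-id"))
spark.conf.set(f"{prefix}.oauth2.client.secret.{suffix}",
               dbutils.secrets.get("my-scope", "sp-client-secret"))
spark.conf.set(f"{prefix}.oauth2.client.endpoint.{suffix}",
               "https://login.microsoftonline.com/<tenant-id>/oauth2/token")
```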

JothyGanesan
by New Contributor III
  • 809 Views
  • 5 replies
  • 2 kudos

Resolved! History Retention in DLT table (Delta Live Table)

Hi all, we have a DLT table as our curated layer with apply changes. The DLT pipeline runs in continuous mode for real-time streaming data ingestion. There is a regulatory requirement to retain only 1 year of data in the DLT table and to move th...

Latest Reply
Ranga_naik1180
New Contributor III
  • 2 kudos

@szymon_dybczak, can you please suggest how we can delete records? We have an SCD2 target table (Silver), and on top of that SCD2 table we have another SCD2 target table (Gold_layer). The idea was: if I delete a row in the Silver table, how can it pro...

4 More Replies
Datalight
by Contributor
  • 845 Views
  • 9 replies
  • 2 kudos

Adobe Campaign to Azure Databricks file transfer

I have to create a data pipeline which pushes data (JSON files) from the Adobe source to ADLS Gen2 using a cron job. 1. How will my ADLS Gen2 container know a new file has arrived from Adobe? I am using Databricks as the orchestrator and ETL tool. 2. What all ...

Latest Reply
nachoBot
New Contributor II
  • 2 kudos

Datalight, with regards to 1), I see that you are using the medallion architecture. Have you considered using Auto Loader for the detection and ingestion of new files in ADLS Gen2?
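For illustration, a minimal Auto Loader stream that detects and ingests new JSON files landing in an ADLS Gen2 container; the container, storage account, checkpoint paths, and target table are placeholders:

```python
# Placeholder paths; Auto Loader discovers newly arrived files incrementally
df = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "abfss://bronze@<storage>.dfs.core.windows.net/_schemas/adobe")
    .load("abfss://landing@<storage>.dfs.core.windows.net/adobe/")
)

(df.writeStream
   .option("checkpointLocation", "abfss://bronze@<storage>.dfs.core.windows.net/_checkpoints/adobe")
   .trigger(availableNow=True)          # run as a scheduled batch-style job
   .toTable("my_catalog.bronze.adobe_raw"))
```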

8 More Replies
Anubhav2603
by New Contributor
  • 510 Views
  • 1 reply
  • 0 kudos

DLT Pipeline Question

I am new to DLT and trying to understand the process. My bronze table will receive incremental data from SAP in real time. In my bronze table, we are not maintaining any history and any data older than 2 weeks will be deleted. This data from bronze w...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

When loading data from SAP, how will you determine which records are new? With Lakeflow, incremental loads from cloud storage or another Delta table are fully automated. However, when pulling directly from SAP, Lakeflow does not have visibility into ...
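The reply is truncated above. As a generic illustration (not a Lakeflow feature), one common pattern when the source system cannot tell you what is new is a high-watermark filter on a change timestamp; all table and column names below are hypothetical:

```python
from pyspark.sql import functions as F

# Hypothetical tables/columns: bronze holds ~2 weeks of SAP data, silver keeps history.
last_wm = (spark.table("my_catalog.silver.sap_orders")
           .agg(F.max("changed_at"))
           .collect()[0][0])

bronze = spark.table("my_catalog.bronze.sap_orders")
# First load (empty silver) takes everything; later loads take only newer rows
new_rows = bronze if last_wm is None else bronze.where(F.col("changed_at") > F.lit(last_wm))

new_rows.write.mode("append").saveAsTable("my_catalog.silver.sap_orders")
```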

HoussemBL
by New Contributor III
  • 3714 Views
  • 11 replies
  • 3 kudos

DLT Pipeline & Automatic Liquid Clustering Syntax

Hi everyone, I noticed Databricks recently released the automatic liquid clustering feature, which looks very promising. I'm currently implementing a DLT pipeline and would like to leverage this new functionality. However, I'm having trouble figuring o...

Latest Reply
jsturgeon
New Contributor II
  • 3 kudos

Is there a resolution to this? I am having the same problem. I can create tables with cluster by auto, but the MVs are failing saying I need to enable PO. This was working yesterday and is working in other environments.
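For reference, a sketch of the table-level syntax the reply says works (the table name is a placeholder); per this thread, materialized views additionally appear to require predictive optimization to be enabled:

```python
# Placeholder table name; automatic liquid clustering on a regular Delta table
spark.sql("""
  CREATE OR REPLACE TABLE my_catalog.my_schema.events (
    id BIGINT,
    event_ts TIMESTAMP,
    payload STRING
  )
  CLUSTER BY AUTO
""")
```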

10 More Replies
jeremy98
by Honored Contributor
  • 508 Views
  • 6 replies
  • 1 kudos

Is there a way to discover in the next task if the previous for loop task has some...

Hi community, as the title suggests, I'm looking for a smart way to determine which runs in a for-loop task succeeded and which didn't, so I can use that information in the next task. Summary: I have a for-loop task that runs multiple items (e.g., run1,...

Latest Reply
SebastianRowan
Contributor
  • 1 kudos

The easiest way is to log each loop iteration's status with `dbutils.jobs.taskValues.set`, then grab those values in the next task and only work with the runs that passed.
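A minimal sketch of that pattern; the task key, parameter name, and item list are placeholders for whatever the job actually defines:

```python
# In the notebook executed for each for-each iteration: record this run's outcome
item = dbutils.widgets.get("item")                                   # placeholder parameter name
dbutils.jobs.taskValues.set(key=f"status_{item}", value="success")   # or "failed"

# In the downstream task: read back what each iteration recorded
items = ["run1", "run2", "run3"]                                     # placeholder item list
succeeded = [
    i for i in items
    if dbutils.jobs.taskValues.get(taskKey="for_each_task", key=f"status_{i}",
                                   default="failed", debugValue="failed") == "success"
]
```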

5 More Replies
SebastianRowan
by Contributor
  • 856 Views
  • 8 replies
  • 6 kudos

Resolved! Batch jobs suddenly slow down?

What do you do when batch jobs sometimes take much longer even though the data size hasn't changed? What causes this? And do you use any tool for that?

Latest Reply
SebastianRowan
Contributor
  • 6 kudos

Thanks for the AMAZING response!

7 More Replies
Hasiok1337
by New Contributor II
  • 903 Views
  • 3 replies
  • 2 kudos

Transfer Data from a SharePoint Excel File to a Databricks Table

Hello, is there a way in a Databricks notebook to pull data from an Excel file stored on SharePoint and upload it into my table in Databricks? I have a situation where I maintain a few tables on SharePoint and a few tables with the same data in Databri...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 2 kudos

Hi @Hasiok1337, your approach is valid in my opinion. But you can also check the SharePoint connector. It is currently in beta, but should work. It gives you an out-of-the-box way to extract files from SharePoint into Databricks.
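For the approach the original post describes (pulling the file directly from a notebook), a rough sketch using the Microsoft Graph API and pandas; the token helper, site ID, file path, and target table are all hypothetical:

```python
import io
import requests
import pandas as pd

token = get_graph_token()  # hypothetical helper returning an Azure AD token for Microsoft Graph
url = ("https://graph.microsoft.com/v1.0/sites/<site-id>/drive/root:"
       "/Shared Documents/my_tables.xlsx:/content")
resp = requests.get(url, headers={"Authorization": f"Bearer {token}"}, timeout=60)
resp.raise_for_status()

# Requires an Excel engine such as openpyxl to be installed on the cluster
pdf = pd.read_excel(io.BytesIO(resp.content), sheet_name="Sheet1")
spark.createDataFrame(pdf).write.mode("overwrite").saveAsTable("my_catalog.my_schema.sharepoint_copy")
```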

2 More Replies
ckarrasexo
by New Contributor III
  • 23034 Views
  • 9 replies
  • 5 kudos

pyspark.sql.connect.dataframe.DataFrame vs pyspark.sql.DataFrame

I noticed that on some Databricks 14.3 clusters, I get DataFrames with type pyspark.sql.connect.dataframe.DataFrame, while on other clusters also with Databricks 14.3, the exact same code gets DataFrames of type pyspark.sql.DataFrame. pyspark.sql.conne...

Latest Reply
Gleydson404
New Contributor II
  • 5 kudos

I have found a workaround for this issue. Basically, I create a dummy_df and then check whether the DataFrame I want to test has the same type as the dummy_df. def get_dummy_df() -> DataFrame: """ Generates a dummy DataFrame with a range of int...
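The code in the reply is cut off; here is a hedged reconstruction of the idea it describes (names are guesses, not the original author's code):

```python
def get_dummy_df():
    """Generates a dummy DataFrame with a range of integers."""
    return spark.range(1)

def is_spark_dataframe(obj) -> bool:
    # Compare against whichever DataFrame type the current runtime produces, so the
    # check passes for both pyspark.sql.DataFrame and
    # pyspark.sql.connect.dataframe.DataFrame without importing either explicitly
    return isinstance(obj, type(get_dummy_df()))
```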

8 More Replies
sharukh_lodhi
by New Contributor III
  • 4253 Views
  • 5 replies
  • 3 kudos

Azure IMDS is not accessible when selecting the shared compute policy

Hi Databricks community, I recently encountered an issue while using the 'azure.identity' Python library on a cluster set to the personal compute policy in Databricks. In this case, Databricks successfully returns the Azure Databricks managed user id...

Data Engineering
azure IMDS
DefaultAzureCredential
Latest Reply
Malthe
Contributor II
  • 3 kudos

How does this work with serverless (for example with DLT pipelines), which runs in standard access mode? "Serverless compute is based on Databricks standard access mode compute architecture (formerly called shared access mode)." To my understanding, from ...

4 More Replies
der
by Contributor II
  • 1888 Views
  • 2 replies
  • 0 kudos

Resolved! Databricks JDBC Driver 2.7.3 with OAuth2 M2M on Databricks

We have an application implemented in Java and installed as a JAR on the cluster. The application reads data from Unity Catalog over the Databricks JDBC Driver. We used PAT tokens for the service principal in the past and everything worked fine. Now we chan...

Latest Reply
der
Contributor II
  • 0 kudos

According to the support team, I had to set the JDBC parameter OAuthEnabledIPAddressRanges. The IP range should be the resolved private link IP (usually starting with 10.x) of the hostname for the Databricks workspace URL.
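For anyone hitting the same thing, a hedged example of what such a JDBC URL might look like with OAuth M2M (AuthMech=11 / Auth_Flow=1) plus that parameter; the host, HTTP path, client ID/secret, and IP are placeholders:

```python
# Placeholder values throughout; shown as a Python string for readability
jdbc_url = (
    "jdbc:databricks://adb-1234567890123456.7.azuredatabricks.net:443;"
    "httpPath=/sql/1.0/warehouses/abcdef1234567890;"
    "AuthMech=11;Auth_Flow=1;"
    "OAuth2ClientId=<service-principal-application-id>;"
    "OAuth2Secret=<service-principal-oauth-secret>;"
    "OAuthEnabledIPAddressRanges=10.1.2.3"  # resolved private link IP of the workspace hostname
)
```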

1 More Reply
chexa_Wee
by New Contributor III
  • 1250 Views
  • 8 replies
  • 5 kudos

Error creating catalog in Unity Catalog – EXTERNAL_LOCATION_DOES_NOT_EXIST and Admin Console storage

Hi all, I’m trying to create a new catalog in Azure Databricks Unity Catalog but I’m running into issues. When I tried to add a default path in the Admin Console → Metastore settings, I got this error: “Metastore storage root URL does not exist. Plea...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 5 kudos

Hi @chexa_Wee, starting from November 9, 2023, Databricks by default won't configure metastore-level storage for managed tables and volumes. Databricks recommends that you create a separate managed storage location for each catalog in your metastore....
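A sketch of that recommendation, assuming an external location backed by an existing storage credential; all names and the abfss path are placeholders:

```python
# Placeholder names/path; register the storage path, then give the catalog its own managed location
spark.sql("""
  CREATE EXTERNAL LOCATION IF NOT EXISTS my_catalog_root
  URL 'abfss://uc-managed@mystorageaccount.dfs.core.windows.net/my_catalog'
  WITH (STORAGE CREDENTIAL my_storage_credential)
""")

spark.sql("""
  CREATE CATALOG IF NOT EXISTS my_catalog
  MANAGED LOCATION 'abfss://uc-managed@mystorageaccount.dfs.core.windows.net/my_catalog'
""")
```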

7 More Replies
