Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Nidhig
by Contributor II
  • 1026 Views
  • 1 reply
  • 1 kudos

Resolved! Databricks Badge Challenge

Hi Team, as part of the badge challenge from Databricks for partners: is the required badge the one from Sales/Pre-Sales, not from Tech-Sales?

Latest Reply
Advika
Community Manager
  • 1 kudos

Hello @Nidhig! Please raise a ticket with the Databricks Support Team, they’ll be able to provide clarity on this topic.

jeremy98
by Honored Contributor
  • 1167 Views
  • 2 replies
  • 1 kudos

How to manage Databricks failure notifications to a Slack webhook?

Hi community, I’d like to handle the case when one of our Databricks jobs fails. From the documentation, I understand that the HTTP notification payload from Databricks will look like this: { "event_type": "jobs.on_failure", "workspace_id": "your_workspace_id"...

Latest Reply
jeremy98
Honored Contributor
  • 1 kudos

Because this is not correct, I suppose, right?

1 More Replies
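The on_failure payload shown in the question can be mapped to a Slack message with a small handler. A minimal sketch, assuming only the two fields visible in the post (the function name and message text are hypothetical):

```python
import json

# Hypothetical handler: turns a Databricks job-failure notification payload
# (shape taken from the question above) into a Slack webhook message body.
# Only event_type and workspace_id come from the post; the rest is assumed.
def to_slack_message(payload: str) -> dict:
    event = json.loads(payload)
    if event.get("event_type") != "jobs.on_failure":
        return {}  # ignore events other than failures
    workspace = event.get("workspace_id", "unknown")
    return {"text": f"Databricks job failed in workspace {workspace}"}

# The returned dict would be POSTed as JSON to the Slack incoming-webhook URL.
msg = to_slack_message('{"event_type": "jobs.on_failure", "workspace_id": "123"}')
```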
RPalmer
by New Contributor III
  • 5761 Views
  • 6 replies
  • 4 kudos

Confusion around the dollar param being flagged as deprecated

Over the past week we have seen a warning in our notebooks that the dollar param is deprecated (apparently since runtime 7), but I cannot find any info about when it will actually be removed. Will the removal be t...

Latest Reply
BS_THE_ANALYST
Esteemed Contributor III
  • 4 kudos

Hey everyone, I've just had a look into this. I think the documentation seems pretty clear on what's expected with using the legacy-style parameters https://docs.databricks.com/aws/en/notebooks/legacy-widgets @Yourstruly, I appreciate the frustration...

5 More Replies
Subhrajyoti
by New Contributor
  • 541 Views
  • 1 reply
  • 0 kudos

Issue with Databricks Cluster Configuration in Pre-Prod Environment

Hi team, hope you are doing well! This is just to share one incident regarding one of the difficulties we are facing with a UC-enabled cluster (both interactive and job clusters) in our pre-prod environment: the data is not getting refreshed properl...

Latest Reply
Sidhant07
Databricks Employee
  • 0 kudos

Hi @Subhrajyoti, can you please try running REFRESH TABLE table_name when you encounter this issue? Can you also try disabling Delta caching (set spark.databricks.io.cache.enabled to false) and check whether it then returns the correct result?

bharathelsker
by New Contributor
  • 504 Views
  • 1 reply
  • 0 kudos

Resolved! Why does disabling Photon fix my ConcurrentDeleteDeleteException in Databricks?

I’m running a Databricks 15.4 LTS job with Photon acceleration enabled. I have a wrapper notebook that uses ThreadPoolExecutor to trigger multiple child notebooks in parallel. Each thread calls a function that runs a child notebook and updates an audit...

Latest Reply
Sidhant07
Databricks Employee
  • 0 kudos

Hi @bharathelsker, a ConcurrentDeleteDeleteException occurs when a concurrent operation deletes a file that your operation also deletes. This can be caused by two concurrent compaction operations rewriting the same files. To further i...

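If serializing the conflicting notebook runs isn't an option, a common mitigation for write conflicts like this is to retry with backoff. A minimal sketch (the exception class here is a local stand-in; in a real job you would catch the Delta conflict exception your write raises):

```python
import time

# Local stand-in for the Delta conflict exception, so the sketch runs anywhere.
class ConcurrentDeleteDeleteException(Exception):
    pass

def with_retries(action, attempts=3, base_delay=0.01):
    # Retry the action on write conflicts, doubling the delay each attempt.
    for attempt in range(attempts):
        try:
            return action()
        except ConcurrentDeleteDeleteException:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the conflict
            time.sleep(base_delay * (2 ** attempt))

# Simulated flaky write that conflicts twice, then succeeds.
calls = {"n": 0}
def flaky_write():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConcurrentDeleteDeleteException("conflicting compaction")
    return "committed"

result = with_retries(flaky_write)  # "committed" after two retries
```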
Ovasheli
by New Contributor
  • 462 Views
  • 1 reply
  • 0 kudos

How to Get CDF Metadata from an Overwritten Batch Source in DLT?

Hello, I'm working on a Delta Live Tables pipeline and need help with a data source challenge. My source tables are batch-loaded SCD2 tables with CDF (Change Data Feed) enabled. These tables are updated daily using a complete overwrite operation. For my...

Latest Reply
Sidhant07
Databricks Employee
  • 0 kudos

Hi @Ovasheli , I believe the error message would be something like below. Error: com.databricks.sql.transaction.tahoe.DeltaUnsupportedOperationException: [DELTA_SOURCE_TABLE_IGNORE_CHANGES] Detected a data update (for example DELETE (Map(predicate - ...

turagittech
by Contributor
  • 892 Views
  • 1 reply
  • 1 kudos

Credential Sharing Across Cluster Nodes - spark.conf()

Hi All, I am struggling to understand how to manage credentials for Azure Storage across the cluster when trying to use Azure Python libraries within functions that may end up on the cluster worker nodes. I am building a task to load blobs from Azure stora...

Latest Reply
Sidhant07
Databricks Employee
  • 1 kudos

Hi @turagittech, I found a KB article related to this error. Let me know if this helps: https://kb.databricks.com/data-sources/keyproviderexception-error-when-trying-to-create-an-external-table-on-an-external-schema-with-authentication-at-the-noteboo...

JothyGanesan
by New Contributor III
  • 1280 Views
  • 5 replies
  • 2 kudos

Resolved! History Retention in DLT table (Delta Live Table)

Hi All, we have the DLT table as our curated layer with apply changes. The DLT pipeline runs in continuous mode for streaming real-time data ingestion. There is a regulatory requirement to retain only 1 year of data in the DLT table and to move th...

Latest Reply
Ranga_naik1180
New Contributor III
  • 2 kudos

@szymon_dybczak, can you please suggest how we can delete records? We have an SCD2 target table (Silver), and on top of that SCD2 table we have another SCD2 target table (Gold_layer). The idea was: if I delete a row in the Silver table, how can it pro...

4 More Replies
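The one-year regulatory split this thread describes (keep recent rows, move the rest) boils down to partitioning records around a cutoff date. An illustrative plain-Python sketch, where the event_date column name and record shape are assumptions:

```python
from datetime import date, timedelta

RETENTION_DAYS = 365  # regulatory window from the thread

def split_by_retention(records, today):
    # Rows on or after the cutoff stay; older rows go to the archive set.
    cutoff = today - timedelta(days=RETENTION_DAYS)
    keep = [r for r in records if r["event_date"] >= cutoff]
    archive = [r for r in records if r["event_date"] < cutoff]
    return keep, archive

rows = [
    {"id": 1, "event_date": date(2024, 1, 1)},
    {"id": 2, "event_date": date(2025, 6, 1)},
]
keep, archive = split_by_retention(rows, today=date(2025, 7, 1))
```

In a real pipeline the same predicate would drive an archive copy followed by a DELETE on the curated table.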
Datalight
by Contributor
  • 1996 Views
  • 9 replies
  • 2 kudos

Adobe Campaign to Azure Databricks file transfer

I have to create a data pipeline that pushes data (2 JSON files) from the source, Adobe, to ADLS Gen2 using a cron job. 1. How will my ADLS Gen2 know a new file has arrived in the container from Adobe? I am using Databricks as the orchestrator and ETL tool. 2. What all ...

Latest Reply
nachoBot
New Contributor II
  • 2 kudos

Datalight, with regards to 1): I see that you are using the Medallion Architecture. Have you considered using Auto Loader for the detection and ingestion of new files in ADLS Gen2?

8 More Replies
Anubhav2603
by New Contributor
  • 641 Views
  • 1 reply
  • 0 kudos

DLT Pipeline Question

I am new to DLT and trying to understand the process. My bronze table will receive incremental data from SAP in real time. In my bronze table, we are not maintaining any history and any data older than 2 weeks will be deleted. This data from bronze w...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

When loading data from SAP, how will you determine which records are new? With Lakeflow, incremental loads from cloud storage or another Delta table are fully automated. However, when pulling directly from SAP, Lakeflow does not have visibility into ...

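The usual fallback when the source cannot report which rows are new, as the reply notes for SAP, is a watermark: track the highest change timestamp already loaded and pull only rows beyond it. A minimal sketch (the changed_at column name and integer timestamps are assumptions):

```python
# Watermark-based incremental detection: keep only rows newer than the last
# loaded watermark, and advance the watermark for the next run.
def new_records(rows, last_watermark):
    fresh = [r for r in rows if r["changed_at"] > last_watermark]
    new_watermark = max((r["changed_at"] for r in fresh), default=last_watermark)
    return fresh, new_watermark

rows = [
    {"id": "a", "changed_at": 100},
    {"id": "b", "changed_at": 205},
    {"id": "c", "changed_at": 310},
]
fresh, wm = new_records(rows, last_watermark=200)  # picks up b and c
```

The watermark itself would be persisted between runs (e.g., in a small state table) so each load resumes where the previous one stopped.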
HoussemBL
by New Contributor III
  • 5528 Views
  • 11 replies
  • 3 kudos

DLT Pipeline & Automatic Liquid Clustering Syntax

Hi everyone, I noticed Databricks recently released the automatic liquid clustering feature, which looks very promising. I'm currently implementing a DLT pipeline and would like to leverage this new functionality. However, I'm having trouble figuring o...

Latest Reply
jsturgeon
New Contributor II
  • 3 kudos

Is there a resolution to this? I am having the same problem. I can create tables with cluster by auto, but the MVs are failing saying I need to enable PO. This was working yesterday and is working in other environments.

10 More Replies
jeremy98
by Honored Contributor
  • 756 Views
  • 6 replies
  • 1 kudos

Is there a way to discover in the next task if the previous for loop task has some...

Hi community, as the title suggests, I'm looking for a smart way to determine which runs in a for-loop task succeeded and which didn't, so I can use that information in the next task. Summary: I have a for-loop task that runs multiple items (e.g., run1,...

Latest Reply
SebastianRowan
Contributor
  • 1 kudos

Easiest way is to log each loop’s status with `dbutils.jobs.taskValues.set` then just grab those in the next task and only work with the ones that passed.

5 More Replies
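The pattern in the reply can be sketched without a workspace by standing in for the task-values store; on Databricks you would call dbutils.jobs.taskValues.set and its get counterpart instead of these local helpers:

```python
# In-memory stand-in for the task-values store, for illustration only.
_task_values = {}

def set_task_value(task, key, value):
    # Loop task: record one iteration's outcome under (task, key).
    _task_values[(task, key)] = value

def get_task_value(task, key, default=None):
    # Downstream task: read back what the loop recorded.
    return _task_values.get((task, key), default)

# For-loop task: log each iteration's status.
for run, ok in [("run1", True), ("run2", False), ("run3", True)]:
    set_task_value("loop_task", run, "success" if ok else "failed")

# Next task: keep only the runs that succeeded.
succeeded = [r for r in ("run1", "run2", "run3")
             if get_task_value("loop_task", r) == "success"]
```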
SebastianRowan
by Contributor
  • 1679 Views
  • 8 replies
  • 6 kudos

Resolved! Batch jobs suddenly slow down?

What do you do when batch jobs sometimes take much longer even though the data size hasn't changed? What causes this? And do you use any tool for that?

Latest Reply
SebastianRowan
Contributor
  • 6 kudos

Thanks for the AMAZING response!

7 More Replies
Hasiok1337
by New Contributor II
  • 1401 Views
  • 3 replies
  • 2 kudos

Transfer Data from a SharePoint Excel File to a Databricks Table

Hello, is there a way in a Databricks notebook to pull data from an Excel file stored on SharePoint and upload it into my table in Databricks? I have a situation where I maintain a few tables on SharePoint and a few tables with the same data in Databri...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 2 kudos

Hi @Hasiok1337, your approach is valid in my opinion. But you can also check the SharePoint connector. It is currently in beta, but it should work. It gives you an out-of-the-box way to extract files from SharePoint into Databricks.

2 More Replies
ckarrasexo
by New Contributor III
  • 26394 Views
  • 9 replies
  • 5 kudos

pyspark.sql.connect.dataframe.DataFrame vs pyspark.sql.DataFrame

I noticed that on some Databricks 14.3 clusters, I get DataFrames with type pyspark.sql.connect.dataframe.DataFrame, while on other clusters, also with Databricks 14.3, the exact same code gets DataFrames of type pyspark.sql.DataFrame. pyspark.sql.conne...

Latest Reply
Gleydson404
New Contributor II
  • 5 kudos

I have found a workaround for this issue. Basically, I create a dummy_df and then check whether the DataFrame in question has the same type as the dummy_df. def get_dummy_df() -> DataFrame: """Generates a dummy DataFrame with a range of int...

8 More Replies
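A lighter variant of the dummy_df workaround, assuming you only need a boolean check: inspect the class's module path instead of comparing against a dummy DataFrame. The function name is ours, and the stand-in classes just let the sketch run without Spark:

```python
# Distinguish a Spark Connect DataFrame from a classic one without importing
# either class: Connect DataFrames live under the pyspark.sql.connect package.
def is_connect_dataframe(df) -> bool:
    return type(df).__module__.startswith("pyspark.sql.connect")

# Stand-in classes mimicking the two module paths from the question:
class _ConnectDF: pass
_ConnectDF.__module__ = "pyspark.sql.connect.dataframe"

class _ClassicDF: pass
_ClassicDF.__module__ = "pyspark.sql.dataframe"

connect_result = is_connect_dataframe(_ConnectDF())   # True
classic_result = is_connect_dataframe(_ClassicDF())   # False
```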