Data Engineering

Forum Posts

by Anske (New Contributor II)
  • 56 Views
  • 0 replies
  • 0 kudos

DLT apply_changes applies only deletes and inserts not updates

Hi, I have a DLT pipeline that applies changes from a source table (cdctest_cdc_enriched) to a target table (cdctest) with the following code: dlt.apply_changes(target = "cdctest", source = "cdctest_cdc_enriched", keys = ["ID"], sequence_by...

Data Engineering
Delta Live Tables
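The behavior being asked about can be reasoned through outside DLT. Below is a minimal pure-Python sketch of the apply-changes semantics, not the DLT implementation: the table and key names come from the post, while the `op` and `ts` fields and the merge logic itself are illustrative assumptions. A correctly applied change feed upserts on the key in sequence order, so an UPDATE event should overwrite the existing row rather than being skipped.

```python
def apply_changes_sim(target, events, key="ID", sequence_by="ts"):
    """Apply CDC events to a dict keyed by `key`, in sequence order.
    INSERT and UPDATE both upsert; DELETE removes the row."""
    for e in sorted(events, key=lambda e: e[sequence_by]):
        k = e[key]
        if e["op"] == "DELETE":
            target.pop(k, None)
        else:  # INSERT or UPDATE: upsert the row
            target[k] = {c: v for c, v in e.items()
                         if c not in ("op", sequence_by)}
    return target

events = [
    {"ID": 1, "val": "a", "op": "INSERT", "ts": 1},
    {"ID": 1, "val": "b", "op": "UPDATE", "ts": 2},  # must win over ts=1
    {"ID": 2, "val": "c", "op": "INSERT", "ts": 1},
    {"ID": 2, "val": "c", "op": "DELETE", "ts": 3},
]
mirrored = apply_changes_sim({}, events)
```

If updates come through as deletes-plus-inserts or are dropped entirely, the first things to check in a real pipeline are the `sequence_by` column and whether the feed marks updates with a distinct operation code.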
by Phani1 (Valued Contributor)
  • 115 Views
  • 1 reply
  • 0 kudos

Boomi integrating with Databricks

Hi Team, is there any impact when integrating Databricks with Boomi as opposed to Azure Event Hub? Could you offer some insights on the integration of Boomi with Databricks? https://boomi.com/blog/introducing-boomi-event-streams/ Regards, Janga

Latest Reply by Kaniz (Community Manager)
  • 0 kudos

Hi @Phani1, let's explore the integration of Databricks with Boomi and compare it to Azure Event Hub. Databricks Integration with Boomi: Databricks is a powerful data analytics platform that allows you to process large-scale data and build machin...

by niruban (New Contributor)
  • 256 Views
  • 2 replies
  • 0 kudos

Databricks Asset Bundle to deploy only one workflow

Hello Community - I am trying to deploy only one workflow from my CI/CD. But whenever I try to deploy one workflow using "databricks bundle deploy - prod", it deletes all the existing workflows in the target environment. Is there any option av...

Data Engineering
CICD
DAB
Databricks Asset Bundle
DevOps
Latest Reply by niruban (New Contributor)
  • 0 kudos

@Rajani: This is what I am doing. I have GitHub Actions kicking off, which run: - name: bundle-deploy run: | cd ${{ vars.HOME }}/dev-ops/databricks_cicd_deployment databricks bundle deploy --debug. Before running this step, I am creatin...

1 More Reply
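The deletion the poster describes is consistent with `bundle deploy` being declarative: the target workspace is converged to exactly the set of workflows the bundle declares, so a bundle containing only one workflow implies removing the others. A hypothetical sketch of that reconciliation (the function and workflow names are illustrative, not the Databricks CLI's internals):

```python
def plan_deploy(existing, bundle):
    """Declarative reconciliation: anything in the bundle but not in the
    target gets created; anything in the target but not in the bundle
    gets deleted."""
    to_create = [w for w in bundle if w not in existing]
    to_delete = [w for w in existing if w not in bundle]
    return to_create, to_delete

# Deploying a bundle that declares only "etl_job" against a target that
# already has two workflows plans the other one for deletion.
created, deleted = plan_deploy(["etl_job", "report_job"], ["etl_job"])
```

Under this model, keeping other workflows alive means either declaring them all in one bundle or giving each independently deployed workflow its own bundle and target.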
by dashawn (New Contributor)
  • 109 Views
  • 1 reply
  • 0 kudos

DLT Pipeline Error Handling

Hello all. We are a new team implementing DLT and have set up a number of tables in a pipeline loading from S3 with UC as the target. I'm noticing that if any of the 20 or so tables fails to load, the entire pipeline fails, even when there are no depende...

Data Engineering
Delta Live Tables
Latest Reply by Kaniz (Community Manager)
  • 0 kudos

Hi @dashawn, when data processing fails, manual investigation of logs to understand the failures, data cleanup, and determining the restart point can be time-consuming and costly. DLT provides features to handle errors more intelligently. By default,...

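The "handle errors more intelligently" advice above usually points at DLT expectations, which can drop (or quarantine) bad rows instead of failing the whole pipeline. A rough pure-Python sketch of the drop semantics only; the function name is illustrative and is not the actual `@dlt.expect_or_drop` decorator:

```python
def expect_or_drop(rows, predicate):
    """Keep rows passing the expectation; count the ones dropped
    instead of raising and failing the whole load."""
    kept = [r for r in rows if predicate(r)]
    return kept, len(rows) - len(kept)

rows = [{"id": 1}, {"id": None}, {"id": 3}]
kept, dropped = expect_or_drop(rows, lambda r: r["id"] is not None)
```

Note this addresses row-level failures; a hard failure of one table's source (e.g. a missing S3 path) is a different problem and may still stop the update.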
by Dom1 (New Contributor II)
  • 316 Views
  • 2 replies
  • 2 kudos

Show log4j messages in run output

Hi, I have an issue when running JAR jobs. I expect to see logs in the output window of a run. Unfortunately, I can only see messages that are generated with "System.out.println" or "System.err.println". Everything that is logged via slf4j is only ...

Latest Reply by Kaniz (Community Manager)
  • 2 kudos

Hi @Dom1, ensure that both the slf4j-api and exactly one implementation binding (such as slf4j-simple, logback, or another compatible library) are present in your classpath. If you're developing a library, it's recommended to depend only on slf4j-ap...

1 More Reply
by Abhi0607 (New Contributor II)
  • 198 Views
  • 2 replies
  • 0 kudos

Variables passed from ADF to Databricks Notebook Try-Catch are not accessible

Dear Members, I need your help with the scenario below. I am passing a few parameters from an ADF pipeline to a Databricks notebook. If I execute the ADF pipeline to run my Databricks notebook and use these variables as-is in my code (Python), it works fine. But as s...

Latest Reply by Ajay-Pandey (Esteemed Contributor III)
  • 0 kudos

Hi @Abhi0607, can you please clarify whether you are reading or defining these parameter values outside the try/catch or inside it?

1 More Reply
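The reply's question matters because of a common Python pitfall: if a parameter is read inside the try block and that read fails, the except branch (or later code) then references a name that was never bound. A hedged sketch of the safer pattern, binding every expected parameter up front with defaults; the names and the dict-based source here are illustrative stand-ins, not the ADF/Databricks widget API:

```python
def read_params(source, names, defaults=None):
    """Bind every expected parameter before entering any try/except,
    so later error handling never hits an unbound name."""
    defaults = defaults or {}
    return {n: source.get(n, defaults.get(n)) for n in names}

# Only "env" arrived from the caller; "run_id" falls back to a default
# instead of leaving the name unbound when the lookup fails.
params = read_params({"env": "dev"}, ["env", "run_id"], {"run_id": "0"})
```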
by PrebenOlsen (New Contributor III)
  • 165 Views
  • 2 replies
  • 0 kudos

Job stuck while utilizing all workers

Hi! I started a job yesterday. It was iterating over data, two months at a time, and writing to a table. It did this successfully for 4 out of 6 time periods. The 5th time period, however, got stuck 5 hours in. I can find one failed stage that reads ...

Data Engineering
job failed
Job froze
need help
Latest Reply by -werners- (Esteemed Contributor III)
  • 0 kudos

As Spark is lazily evaluated, using only small clusters for reads and large ones for writes is not something that will happen. The data is read when you apply an action (a write, for example). That being said: I have no knowledge of a bug in Databricks on clusters...

1 More Reply
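The lazy-evaluation point in the reply can be illustrated outside Spark with a plain generator: declaring the "read" does no work at all, and every read happens only when an action consumes the data, which is why read and write cannot be scheduled onto different clusters.

```python
log = []

def read_rows(n):
    """Pretend data source that records each row it actually reads."""
    for i in range(n):
        log.append(f"read {i}")
        yield i

rows = read_rows(3)    # "transformation": nothing has been read yet
before = list(log)     # still empty at this point
total = sum(rows)      # "action": all reads happen here
```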
by Anske (New Contributor II)
  • 594 Views
  • 1 reply
  • 0 kudos

One-time backfill for DLT streaming table before apply_changes

Hi, absolute Databricks noob here, but I'm trying to set up a DLT pipeline that processes CDC records from an external SQL Server instance to create a mirrored table in my Databricks delta lakehouse. For this, I need to do some initial one-time backfi...

Data Engineering
Delta Live Tables
Latest Reply by Anske (New Contributor II)
  • 0 kudos

So since nobody responded, I decided to try my own suggestion and hack the snapshot data into the table that gathers the change data capture. After some straying I ended up with the notebook as attached. The notebook first creates 2 DLT tables (lookup...

by ThomazRossito (New Contributor II)
  • 107 Views
  • 0 replies
  • 0 kudos

Post: Lakehouse Federation - Databricks

Lakehouse Federation - Databricks. In the world of data, innovation is constant, and the most recent revolution comes with Lakehouse Federation: a fusion between data lakes and data warehouses, taking data manipulation to a new level. This advancement...

Data Engineering
data engineer
Lakehouse
SQL Analytics
by 57410 (New Contributor)
  • 653 Views
  • 1 reply
  • 0 kudos

Deploy python application with submodules - Poetry library management

Hi, I'm using DBX (I'll soon move to Databricks Asset Bundles, but it doesn't change anything in my situation) to deploy a Python application to Databricks. I'm also using Poetry to manage my libraries and dependencies. My project looks like this: Proje...

Latest Reply by Kaniz (Community Manager)
  • 0 kudos

Hi @57410, it seems you're transitioning from DBX to Databricks Asset Bundles (DABs) for managing your complex data, analytics, and ML projects on the Databricks platform. Let's dive into the details and address the issue you're facing. Databricks...

by brian_zavareh (New Contributor III)
  • 1545 Views
  • 5 replies
  • 4 kudos

Resolved! Optimizing Delta Live Table Ingestion Performance for Large JSON Datasets

I'm currently facing challenges with optimizing the performance of a Delta Live Table pipeline in Azure Databricks. The task involves ingesting over 10 TB of raw JSON log files from an Azure Data Lake Storage account into a bronze Delta Live Table la...

Data Engineering
autoloader
bigdata
delta-live-tables
json
Latest Reply by standup1 (New Contributor III)
  • 4 kudos

Hey @brian_zavareh, see this document; I hope it can help: https://learn.microsoft.com/en-us/azure/databricks/compute/cluster-config-best-practices. Just keep in mind that there's some extra cost on the Azure VM side; check your Azure Cost Analysis for...

4 More Replies
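For a 10 TB ingest like the one described, much of Auto Loader's value is incremental file discovery: each run processes only files it has not seen before, so reruns don't re-read the whole lake. A deliberately simplified pure-Python sketch of that checkpointing idea, not Auto Loader's actual implementation:

```python
def discover_new_files(listing, processed):
    """Return files not yet processed, plus the updated checkpoint set."""
    new = sorted(set(listing) - processed)
    return new, processed | set(new)

checkpoint = set()
# First run sees everything; the second run only picks up the new file.
batch1, checkpoint = discover_new_files(["a.json", "b.json"], checkpoint)
batch2, checkpoint = discover_new_files(["a.json", "b.json", "c.json"],
                                        checkpoint)
```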
by 397973 (New Contributor III)
  • 355 Views
  • 3 replies
  • 0 kudos

Having trouble installing my own Python wheel?

I want to install my own Python wheel package on a cluster but can't get it working. I tried two ways: I followed these steps: https://docs.databricks.com/en/workflows/jobs/how-to/use-python-wheels-in-workflows.html#:~:text=March%2025%2C%202024,code%...

Data Engineering
cluster
Notebook
Latest Reply by shan_chandra (Honored Contributor III)
  • 0 kudos

@397973 - Once you uploaded the .whl file, did you have a chance to list the file manually in the notebook? Also, did you have a chance to move the .whl file to /Volumes?

2 More Replies
by SyedSaqib (New Contributor II)
  • 224 Views
  • 2 replies
  • 0 kudos

Delta Live Table : [TABLE_OR_VIEW_ALREADY_EXISTS] Cannot create table or view

Hi, I have a Delta Live Table workflow with storage enabled for cloud storage to a blob store. Syntax of the bronze table in the notebook: @dlt.table(spark_conf = {"spark.databricks.delta.schema.autoMerge.enabled": "true"}, table_properties = {"quality": "bron...

Latest Reply by SyedSaqib (New Contributor II)
  • 0 kudos

Hi Kaniz, thanks for replying. I am using Python for Delta Live Table creation, so how can I set these configurations? "When creating the table, add the IF NOT EXISTS clause to tolerate pre-existing objects; consider using the OR REFRESH clause." Answe...

1 More Reply
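The TABLE_OR_VIEW_ALREADY_EXISTS discussion boils down to idempotent creation: creating an object should either be a no-op or a refresh when it already exists, instead of an error. A language-agnostic sketch of those semantics; the registry dict stands in for the metastore, and this is not the DLT or Spark SQL API:

```python
def create_table(registry, name, definition,
                 if_not_exists=False, or_refresh=False):
    """CREATE TABLE semantics over a dict: fail on conflict unless the
    caller opts into IF NOT EXISTS (keep old) or OR REFRESH (replace)."""
    if name in registry:
        if or_refresh:
            registry[name] = definition
        elif not if_not_exists:
            raise ValueError(f"TABLE_OR_VIEW_ALREADY_EXISTS: {name}")
    else:
        registry[name] = definition
    return registry[name]

reg = {"bronze": "v1"}
kept = create_table(reg, "bronze", "v2", if_not_exists=True)    # keeps v1
refreshed = create_table(reg, "bronze", "v3", or_refresh=True)  # replaces
```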
by BenDataBricks (New Contributor)
  • 176 Views
  • 1 reply
  • 0 kudos

Register more redirect URIs for OAuth U2M

I am following this guide on allowing OAuth U2M for Azure Databricks. When I get to Step 2, I make a request to account.azuredatabricks.net and specify a redirect URI to receive a code. The redirect URI in the example is localhost:8020. If I change thi...

Latest Reply by Kaniz (Community Manager)
  • 0 kudos

Hi @BenDataBricks, OAuth user-to-machine (U2M) authentication in Azure Databricks allows real-time human sign-in and consent to authenticate the target user account. After successful sign-in and consent, an OAuth token is granted to the particip...

by Yuki (New Contributor)
  • 688 Views
  • 3 replies
  • 0 kudos

Can I use Git provider with using Service Principal in job

Hi everyone, I'm trying to use a Git provider in a Databricks job. First, I was using my personal user account as `Run as`. But when I changed `Run as` to a Service Principal, it failed because of a permission error, and I can't find a way to solve it. Could I...

Latest Reply by martindlarsson (New Contributor III)
  • 0 kudos

The documentation is lacking in this area, which should be easy to set up. Instead, we are forced to search among community topics such as this one.

2 More Replies