Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

yazz
by Databricks Partner
  • 2065 Views
  • 2 replies
  • 0 kudos

Converting Existing Streaming Job to Delta Live Tables with Historical Backfill

Description: I’m migrating a two-stage streaming job into Delta Live Tables (DLT). Bronze: read from Pub/Sub → write to Bronze table. Silver: use create_auto_cdc_flow on Bronze → upsert into Silver table. New data works perfectly, but I now need to backfil...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @yazz, I’m wondering if you could use a similar approach to the one in the article below. So, just backfill your bronze table first. Then, the downstream silver and gold layers will pick up the new data from the bronze layer. In that approach you ...

1 More Replies
pt16
by Databricks Partner
  • 1730 Views
  • 3 replies
  • 0 kudos

Enable automatic identity management in Azure Databricks

We have Databricks account admin access but are not able to see the option in the Databricks admin console to enable automatic identity management. We wanted to enable it from the Previews page and followed the steps below: 1. As an account admin, log in to the accou...

Latest Reply
pt16
Databricks Partner
  • 0 kudos

After raising a Databricks ticket, I am now able to see the Automatic Identity Management public preview option.

2 More Replies
seefoods
by Valued Contributor
  • 1687 Views
  • 1 reply
  • 1 kudos

Process MongoDB table to Delta table in Databricks

Hello guys, I have a MongoDB table that is 67 GB in size. I use streaming to ingest it, but copying all the data to a Delta table is very slow. Does someone have an answer to this? I use the MongoDB connector v10.5. This is my code: pipeline_mongo_sec = [ { "$u...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

What if you do not update the Delta table for each incoming microbatch but, for example, only do this every 15 minutes/hour/whatever? That way you can keep ingesting in a streaming way, but the actual update to the Delta table is more batch-oriented, so ...
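The idea in this reply can be sketched framework-agnostically: buffer the rows from each microbatch and only write to the sink once a minimum interval has elapsed. This is a minimal pure-Python illustration (class and parameter names are hypothetical, not a Databricks API); in Structured Streaming the equivalent would be a longer trigger interval or a foreachBatch sink that flushes periodically.

```python
import time

class IntervalFlusher:
    """Buffer incoming micro-batches and flush them to a sink only
    when at least `interval_seconds` has passed since the last flush."""

    def __init__(self, sink, interval_seconds, clock=time.monotonic):
        self.sink = sink                  # callable that receives a list of rows
        self.interval = interval_seconds
        self.clock = clock                # injectable for testing
        self.buffer = []
        self.last_flush = clock()

    def on_microbatch(self, rows):
        self.buffer.extend(rows)
        if self.clock() - self.last_flush >= self.interval:
            self.flush()

    def flush(self):
        if self.buffer:
            self.sink(self.buffer)        # one bigger write instead of many small ones
        self.buffer = []
        self.last_flush = self.clock()
```

The trade-off is latency for write efficiency: the Delta table sees fewer, larger commits, which reduces small-file and transaction overhead.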

mr3
by New Contributor
  • 3223 Views
  • 2 replies
  • 2 kudos

Update Delta Table with Apache Spark connector

Hi everyone. I'd like to ask a question about updating Delta tables using the Apache Spark connector.Let's say I have two tables: one is a product dimension table with items from my shop, and the other contains a single column with the IDs of the pro...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 2 kudos

Hi @mr3, yes, it’s perfectly fine to use a MERGE operation solely for updates. The UPDATE statement has many limitations: it supports neither UPDATE FROM nor subqueries. There are situations where we would like t...
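The matched-only MERGE semantics described here can be illustrated with a small pure-Python sketch (function name and sample data are hypothetical): rows whose key appears in the updates source are updated, everything else is left untouched, and nothing is inserted.

```python
def merge_update(target, updates, key="id"):
    """Simulate MERGE ... WHEN MATCHED THEN UPDATE with no
    WHEN NOT MATCHED clause: update-only, no inserts."""
    by_key = {u[key]: u for u in updates}
    return [
        {**row, **by_key[row[key]]} if row[key] in by_key else row
        for row in target
    ]
```

In Databricks SQL this corresponds to a MERGE INTO statement that has only a WHEN MATCHED THEN UPDATE SET clause; omitting WHEN NOT MATCHED makes the merge behave as a pure update.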

1 More Replies
shrutikatyal
by New Contributor III
  • 5674 Views
  • 9 replies
  • 2 kudos

Resolved! Commit time is coming as null in Auto Loader

As per the new Databricks Auto Loader feature, we can use the archival and move feature in Auto Loader. However, I am trying to use that feature on Databricks 16.4.x-scala2.12 and the commit time is still coming back null, even though it's mentioned in the documen...

Latest Reply
TheOC
Databricks Partner
  • 2 kudos

Hey @shrutikatyal, I believe the only current route to get a discount voucher would be the following: https://community.databricks.com/t5/events/dais-2025-virtual-learning-festival-11-june-02-july-2025/ev-p/119323 I think it’s the last day of the event ...

8 More Replies
MinuN
by New Contributor
  • 3740 Views
  • 1 reply
  • 0 kudos

Handling Merged Heading Rows When Converting Excel to CSV in Databricks

Hi all, I'm working on a process in Databricks to convert multiple Excel files to CSV format. These Excel files follow a similar structure but with some variations. Here's the situation: each file contains two header rows. The first row contains merged ...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 0 kudos

Hi MinuN, how are you doing today? That’s a great question, and you're definitely on the right path using BeautifulSoup to extract the table structure from .xls HTML-like files. To generate the repeated first row of main headings for the CSV, one pract...
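One way the two header rows can be flattened into a single CSV header is to forward-fill the merged top-row heading across the empty cells under each merge, then join it with the sub-heading row. A minimal sketch (function name, separator, and sample headings are assumptions, not from the thread):

```python
def flatten_headers(top_row, sub_row, sep=" - "):
    """Combine a merged-heading top row (empty cells under a merge)
    with a sub-heading row into one flat header list."""
    filled, last = [], ""
    for cell in top_row:
        last = cell if cell else last   # forward-fill the merged heading
        filled.append(last)
    # Join main heading and sub-heading; keep the main heading alone
    # when the second row has no sub-heading for that column.
    return [f"{t}{sep}{s}" if s else t for t, s in zip(filled, sub_row)]
```

The resulting list can be written as the first row of the CSV before appending the data rows.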

sridharplv
by Valued Contributor II
  • 1983 Views
  • 1 reply
  • 1 kudos

Need help on "You cannot enable Iceberg reads on materialized views and streaming tables"

Hi all, since we "cannot enable Iceberg reads on materialized views and streaming tables", is there any option in private preview to enable Iceberg reads for materialized views and streaming tables? I tried using the DLT Sink API with table c...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 1 kudos

Hi sridharplv, how are you doing today? As per my understanding, Databricks does not support Iceberg reads for materialized views and streaming tables, and there’s no official preview or timeline shared publicly for enabling this support. Your workaro...

ddundovic
by New Contributor III
  • 6989 Views
  • 2 replies
  • 1 kudos

Resolved! Lookup dashboard ID in bundle variables

Hi all, I have an asset bundle that contains the following dashboard_task:

resources:
  jobs:
    my_job:
      name: my_job_name
      tasks:
        - task_key: refresh_my_dashboard
          dashboard_task:
            dashboard_id: ${var.my_dashbo...

Latest Reply
ddundovic
New Contributor III
  • 1 kudos

Thanks! That does make sense. When I run `databricks lakeview list` I do get the dashboard I want:

[
  {
    "create_time": "2025-06-23T08:09:49.595Z",
    "dashboard_id": "id000000000000000000000",
    "display_name": "My_Dashboard_Name",
    "lifecy...
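If it helps, the listing output above can be filtered programmatically to resolve a display name to its dashboard ID, rather than copying the ID by hand. A small sketch (the function name and sample values are illustrative):

```python
import json

def find_dashboard_id(listing_json, display_name):
    """Return the dashboard_id whose display_name matches, from the
    JSON array printed by `databricks lakeview list`; None if absent."""
    for dashboard in json.loads(listing_json):
        if dashboard.get("display_name") == display_name:
            return dashboard["dashboard_id"]
    return None
```

The resolved ID could then be passed into the bundle as a variable value at deploy time.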

1 More Replies
varni
by New Contributor III
  • 1550 Views
  • 1 reply
  • 0 kudos

Widget value not synchronized after detach/reattach

Hello Databricks Team, I hope you are doing well. I’m working with dbutils.widgets in a Databricks notebook using the Accessed Commands mode, and I have encountered some challenges. Specifically, after detaching and reattaching to the cluster: the widg...

Latest Reply
Khaja_Zaffer
Esteemed Contributor
  • 0 kudos

Hello there, can you please share the code used for the widgets? Also, if you change the value manually, does it work? (Did it work before?) Are you trying to load it via some parent notebook? Waiting for your response.

JameDavi_51481
by Contributor
  • 1866 Views
  • 1 reply
  • 0 kudos

making REORG TABLE to enable Iceberg Uniform more efficient and faster

I am upgrading a large number of tables for Iceberg / Uniform compatibility by running REORG TABLE <tablename> APPLY (UPGRADE UNIFORM(ICEBERG_COMPAT_VERSION=2)); and finding that some tables take several hours to upgrade, presumably because they are ...

Latest Reply
sridharplv
Valued Contributor II
  • 0 kudos

Hi @JameDavi_51481, hope you tried this approach for enabling Iceberg metadata along with the Delta format: ALTER TABLE internal_poc_iceberg.iceberg_poc.clickstream_gold_sink_dlt SET TBLPROPERTIES ('delta.columnMapping.mode' = 'name', 'delta.enableIceberg...

vsam
by New Contributor II
  • 2992 Views
  • 5 replies
  • 2 kudos

OPTIMIZE FULL taking longer time on clustered table

Hi everyone, currently we are facing an issue with the OPTIMIZE table_name FULL operation. The dataset consists of 150 billion rows of data, and it takes 8 hours to optimize the reloaded clustered table. The table is refreshed every month and it needs cluste...

Latest Reply
sridharplv
Valued Contributor II
  • 2 kudos

Hi @vsam, have you tried automatic liquid clustering with predictive optimization enabled? With it you don't need to specify the CLUSTER BY columns yourself, and the optimization is handled in the background by predictive optimization. http...

4 More Replies
glevin1
by Databricks Partner
  • 2711 Views
  • 1 reply
  • 0 kudos

API response code when running a new job

We are attempting to use the POST /api/2.2/jobs/run-now endpoint with OAuth 2.0 Client Credentials authentication. We are finding that when sending a request with an expired token, we receive an HTTP code of 400. This contradicts the documentation on ...

Latest Reply
Khaja_Zaffer
Esteemed Contributor
  • 0 kudos

Hello @glevin1, please raise a ticket using this link: https://help.databricks.com/s/contact-us?ReqType=training. Please explain the issue clearly so that it will be easy for the support team to help.

carolregatt
by New Contributor II
  • 3453 Views
  • 2 replies
  • 1 kudos

Resolved! Databricks Asset Bundle wrongfully deleting job

Hey, so I've just started to use DAB to automatically manage job configs via CI/CD. I had a previously existing job (let's say ID 123) which was created manually and had this config:

resources:
  jobs:
    My_Job_A:
      name: My Job A

And I wanted to automat...

Latest Reply
carolregatt
New Contributor II
  • 1 kudos

Thanks so much for the response, @Advika! That makes sense! Can you explain why the remote config had a different key compared to the local one? I guess that was what threw me off and made me want to change the local key to match the remote.

1 More Replies
Hoviedo
by New Contributor III
  • 1850 Views
  • 4 replies
  • 0 kudos

Apply expectations only if column exists

Hi, is there any way to apply an expectation only if that column exists? I am creating multiple DLT tables with the same Python function, so I would like to create different expectations based on the table name. Currently I can only create expectations...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

To apply expectations only if a column exists in Delta Live Tables (DLT), you can use the @dlt.expect decorator conditionally within your Python function. Here is a step-by-step approach to achieve this: Check if the Column Exists: Before applying th...
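The "check first, then apply" step can be factored into a small helper that filters a candidate expectations map down to the columns actually present in the schema. This is a pure-Python sketch (helper name and data shapes are hypothetical); the surviving entries would then be passed to something like dlt.expect_all in the pipeline.

```python
def applicable_expectations(expectations, columns):
    """Keep only the expectations whose referenced column is present.

    `expectations` maps expectation name -> (column, constraint SQL);
    `columns` is the list of column names in the DataFrame schema.
    """
    return {
        name: (col, constraint)
        for name, (col, constraint) in expectations.items()
        if col in columns
    }
```

Inside the table function you would obtain `columns` from `df.columns` and apply only the constraints that survive the filter, so tables missing a column skip its expectation instead of failing.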

3 More Replies
Arturo_Franco
by New Contributor II
  • 1155 Views
  • 2 replies
  • 0 kudos

I can't find Data Engineering Associate resources

I'm taking the self-paced Databricks Data Engineering Associate course. Where can I find the link to the repo that is shared throughout the course?

Latest Reply
Arturo_Franco
New Contributor II
  • 0 kudos

Hi @Advika, thanks for the response. I'm currently using the partner Databricks account; don't I have access to the resources with this partner subscription?

1 More Replies