Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by MrChromatic (New Contributor II)
  • 482 Views
  • 2 replies
  • 2 kudos

Resolved! Databricks UI and Backend Deployment Issue

Hi everyone, I’m deploying a frontend (Streamlit) and backend (FastAPI) as two separate Databricks Apps within the same workspace, both with user authentication enabled. The frontend makes a server-side HTTP request to the backend app URL when a user s...

Latest Reply
SteveOstrowski
Databricks Employee
  • 2 kudos

Hi @MrChromatic, The behavior you are seeing is expected. When your Streamlit frontend app makes an HTTP request to your FastAPI backend app URL, that request goes through the Databricks authentication proxy just like any browser request would. Since...

1 More Reply
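The reply above notes that app-to-app requests pass through the Databricks authentication proxy, so the server-side call needs to carry a valid token. A minimal sketch of forwarding the signed-in user's token from the frontend to the backend call, assuming the Apps proxy injects it in the `x-forwarded-access-token` request header (verify the header name against your app's incoming requests):

```python
def build_backend_headers(incoming_headers: dict) -> dict:
    """Copy the user's forwarded access token into an Authorization header
    for the server-side call to the backend app URL."""
    token = incoming_headers.get("x-forwarded-access-token")
    if token is None:
        raise ValueError("no forwarded access token; is user auth enabled?")
    return {"Authorization": f"Bearer {token}"}
```

The backend app must also be configured to accept the calling identity; the header-forwarding alone only moves the credential across.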
by Saf4Databricks (Contributor)
  • 768 Views
  • 8 replies
  • 1 kudos

Resolved! Issue on Service Credential creation for Azure Databricks access connector

Question: Why am I getting the following error, and how can we fix it? In step 6 of Create service credentials - Azure Databricks | Microsoft Learn, when I enter the resource ID of my Azure Databricks access connector, I get the following error: /subscrip...

Latest Reply
SteveOstrowski
Databricks Employee
  • 1 kudos

Hi @Saf4Databricks, The error message is the key clue here. When you enter the Azure Access Connector resource ID and get back: "is not a valid IAM role ARN. Valid ARNs normally look like arn:aws:iam::<account>:role/<iam-role-name>" This tells you th...

7 More Replies
by yit337 (Contributor)
  • 456 Views
  • 4 replies
  • 1 kudos

Are streaming tables suitable for Gold layer Star schema?

Based on the docs, we can't use identity columns or ANALYZE TABLE on streaming tables. So, should we avoid using streaming tables for a Gold layer star schema? https://docs.databricks.com/aws/en/ldp/developer/ldp-sql-ref-create-streaming-table#limitations

Latest Reply
SteveOstrowski
Databricks Employee
  • 1 kudos

Hi @yit337, You are on the right track noticing those limitations. The short answer is: for a Gold layer star schema, materialized views are generally the better fit, though streaming tables are not completely ruled out depending on the specific tabl...

3 More Replies
by Akash_Varuna (New Contributor II)
  • 241 Views
  • 1 reply
  • 0 kudos

Streaming Table data leakage to historical permanent table

Data leakage in historical table from streaming table. Environment: Azure Databricks + Azure Event Hubs; streaming framework: Spark Structured Streaming; storage: Delta Lake. Pipeline: Event Hubs → stream_messages (live 24 hr rolling window) → message...

Latest Reply
SteveOstrowski
Databricks Employee
  • 0 kudos

Hi @Akash_Varuna, The count discrepancies you are seeing between stream_messages and messages are almost certainly caused by the 24-hour rolling window on your stream_messages table expiring data while the load_messages job is paused during your main...

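The failure mode the reply describes can be seen without Spark at all: if the downstream load job is paused longer than the rolling window's retention, rows expire from stream_messages before they are ever copied. A toy illustration (all names and times illustrative):

```python
from datetime import datetime, timedelta

RETENTION = timedelta(hours=24)  # the 24h rolling window on stream_messages

def missed_rows(event_times, last_load, next_load):
    """Rows that arrived after the last load but fell out of the rolling
    window before the next load ran."""
    return [t for t in event_times
            if last_load < t < next_load - RETENTION]

last = datetime(2026, 2, 1, 0, 0)
nxt = last + timedelta(hours=30)   # e.g. a 30h maintenance pause
events = [last + timedelta(hours=h) for h in (1, 5, 20)]
# the +1h and +5h events expire before the next load; the +20h event survives
```

The usual mitigations are to size the window's retention above the worst-case pause, or to checkpoint the copy so it resumes from the source's change feed rather than from the live window.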
by senkii (Databricks Partner)
  • 958 Views
  • 2 replies
  • 1 kudos

Resolved! How to stop task retry

I would like to stop automatic retries, but the max retries configuration does not seem to work. Could you please tell me how to disable retries? I would also like to understand why the task retries automatically. I did not set any scheduler. I created...

Latest Reply
SteveOstrowski
Databricks Employee
  • 1 kudos

Hi @senkii, There are two separate retry mechanisms in Databricks that can cause tasks to run again, and distinguishing between them is important for your situation. 1. TASK-LEVEL RETRIES (Workflows setting) This is the "Retries" setting you configur...

1 More Reply
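The task-level "Retries" setting the reply describes maps to fields on the task spec in the Jobs API. A sketch of a task payload with retries disabled (the task_key is a placeholder; field names per Jobs API 2.1):

```python
# Setting max_retries to 0 disables automatic task-level retries.
task_spec = {
    "task_key": "my_task",            # placeholder task name
    "max_retries": 0,                 # 0 = do not retry this task
    "min_retry_interval_millis": 0,   # irrelevant once retries are off
    "retry_on_timeout": False,
}
```

Note this only covers task-level retries; as the reply points out, a separate mechanism can also re-run tasks, so check both before concluding the setting is ignored.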
by developer3535 (New Contributor II)
  • 517 Views
  • 2 replies
  • 0 kudos

Resolved! Zerobus Kafka-compatible API

Hi Team, I went through a recording where it was mentioned that a Kafka-compatible API is planned for a Beta release in Q1. Do we have any rough timeline on when this feature might be available? We already have Kafka producer topics, and we would like ...

Latest Reply
SteveOstrowski
Databricks Employee
  • 0 kudos

Hi @developer3535, I see @stbjelcevic already confirmed the Q1 2026 timeline for the Kafka-compatible API Beta. I wanted to add some context on what you can do in the meantime and where to look for updates. CURRENT ZEROBUS INGEST INTERFACES While wai...

1 More Reply
by samuelperezh (New Contributor)
  • 664 Views
  • 2 replies
  • 2 kudos

Architecture Advice: DLT Strategy for Daily Snapshots to SCD2 with "Grace Period" Deletes

Hi everyone, I’m looking for architectural advice on building a Silver layer in DLT. I am dealing with inventory data and need to handle historical tracking, "sold" logic based on disappearance, and storage cost optimization. Here's how the situation l...

Latest Reply
SteveOstrowski
Databricks Employee
  • 2 kudos

Hi @samuelperezh, Building on @aleksandra_ch's reply, I wanted to add some additional detail around each of your three questions, especially around the grace period implementation and the backfill strategy. 1. GRACE PERIOD PATTERN As aleksandra_ch no...

1 More Reply
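The "grace period" pattern the thread discusses — only mark an inventory item sold after it has been absent from the daily snapshot for several consecutive days — reduces to a cutoff comparison on last-seen dates. A pure-Python sketch of that core rule (names and the 3-day grace period are illustrative; in DLT this would feed the SCD2 close-out condition):

```python
from datetime import date, timedelta

GRACE_DAYS = 3  # illustrative grace period before treating absence as "sold"

def items_to_close(last_seen: dict, today: date) -> set:
    """Keys whose last snapshot appearance is older than the grace period."""
    cutoff = today - timedelta(days=GRACE_DAYS)
    return {key for key, seen in last_seen.items() if seen < cutoff}
```

Keeping the rule this explicit makes the backfill question tractable too: replaying historical snapshots through the same function yields the same close-out dates.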
by aranjan99 (Contributor)
  • 410 Views
  • 2 replies
  • 1 kudos

How does job cluster autoscaling work?

Can you share the metrics Databricks uses during job cluster autoscaling? Is Databricks looking at queued tasks, slot utilization, etc., or just at CPU utilization? The autoscaling document https://docs.databricks.com/aws/en/compute/configure?u...

Latest Reply
SteveOstrowski
Databricks Employee
  • 1 kudos

Hi @aranjan99, The autoscaling behavior on job clusters depends on your workspace pricing tier. Here is a breakdown of the metrics and mechanics involved. WHAT METRICS DRIVE SCALING DECISIONS Job cluster autoscaling uses Spark scheduler signals, not ...

1 More Reply
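Whatever signals drive the scaling decision, the range the autoscaler works within is set on the cluster spec. A sketch of the relevant fields in a job's new_cluster payload (version and node type strings are illustrative):

```python
# The autoscaler picks a worker count between min_workers and max_workers;
# fixing both to the same value effectively disables autoscaling.
new_cluster = {
    "spark_version": "15.4.x-scala2.12",   # illustrative runtime version
    "node_type_id": "i3.xlarge",           # illustrative node type
    "autoscale": {"min_workers": 2, "max_workers": 8},
}
```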
by Ashley1 (Contributor)
  • 1432 Views
  • 4 replies
  • 1 kudos

Resolved! Turn off AI assistance in notebooks

Hi, has anyone found a way that the AI assistant can be turned off in notebooks? I would be happy to keep code introspection but I find I'm more often hitting escape than accepting the AI's suggestions (or removing the code it has suggested when I ac...

Latest Reply
SteveOstrowski
Databricks Employee
  • 1 kudos

Hi @Ashley1, There are a few different levels where you can control the AI assistance behavior in notebooks. Here is a breakdown: USER-LEVEL: DISABLE AI AUTOCOMPLETE (INLINE SUGGESTIONS) This is the setting that controls the "ghost text" inline code ...

3 More Replies
by yit337 (Contributor)
  • 389 Views
  • 2 replies
  • 1 kudos

Resolved! Identity column has null values

I want to update a dimension table in the gold model from a silver table by using create_auto_cdc_from_snapshot_flow and SCD2. In the target table, I have defined an IDENTITY column, which should be populated automatically. The DLT flow runs successf...

Latest Reply
SteveOstrowski
Databricks Employee
  • 1 kudos

Hi @yit337, The reason your identity column values are NULL is that the target table created by create_auto_cdc_from_snapshot_flow is a streaming table, and streaming tables do not support identity columns. This is a documented limitation: https://do...

1 More Reply
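Since streaming tables do not support identity columns (the documented limitation the reply links), one common workaround is a deterministic surrogate key hashed from the business key plus the SCD2 validity start. A pure-Python sketch of the idea (the 63-bit truncation is just to keep the key a positive BIGINT):

```python
import hashlib

def surrogate_key(business_key: str, valid_from: str) -> int:
    """Stable, collision-resistant key derived from the natural key and the
    SCD2 start date, usable in place of an identity column."""
    digest = hashlib.sha256(f"{business_key}|{valid_from}".encode()).digest()
    return int.from_bytes(digest[:8], "big") >> 1  # drop top bit: positive
```

Because the key is a pure function of its inputs, re-running the flow regenerates the same values, which identity columns cannot guarantee across rewrites.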
by Saikumar_Manne (New Contributor II)
  • 704 Views
  • 4 replies
  • 1 kudos

Resolved! How to use multi-threading and batch inserts for large UPSERT to PostgreSQL from Databricks?

Hi everyone, We have a Databricks (Unity Catalog) pipeline where we process large datasets in Spark and need to load incremental data into a PostgreSQL target table. Our scenario is: Initial full load (~300 million rows) to PostgreSQL using bulk COPY is...

Latest Reply
SteveOstrowski
Databricks Employee
  • 1 kudos

Hi @Saikumar_Manne, With 190M+ daily rows going into PostgreSQL via INSERT ON CONFLICT DO UPDATE, there are several levers to pull. Here is a breakdown of the approaches and tuning options. APPROACH 1: STAGING TABLE + MERGE (RECOMMENDED FOR THIS VOLU...

3 More Replies
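The staging-table approach the reply recommends — bulk-load into a staging table, then one set-based statement into the target — can be sketched as a small SQL builder. Table and column names here are illustrative, not from the thread:

```python
def upsert_sql(target: str, staging: str, key: str, cols: list) -> str:
    """Build a single set-based PostgreSQL upsert from staging into target,
    updating every non-key column on conflict."""
    col_list = ", ".join(cols)
    updates = ", ".join(f"{c} = EXCLUDED.{c}" for c in cols if c != key)
    return (
        f"INSERT INTO {target} ({col_list}) "
        f"SELECT {col_list} FROM {staging} "
        f"ON CONFLICT ({key}) DO UPDATE SET {updates}"
    )
```

One server-side statement like this usually outperforms many client-driven per-row upserts, since the conflict resolution happens inside PostgreSQL in a single scan.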
by ChrisLawford_n1 (Contributor II)
  • 543 Views
  • 3 replies
  • 1 kudos

Resolved! DeltaFileOperations: Listing improvement?

Hello, I am using Databricks Auto Loader with managed file events turned on and include existing files enabled. I want to understand if there is a way of increasing the speed of the initial listing of the files for Auto Loader. I thought that the idea behind the m...

Latest Reply
SteveOstrowski
Databricks Employee
  • 1 kudos

Hi @ChrisLawford_n1, You are correct that managed file events (cloudFiles.useManagedFileEvents = true) works by having Databricks maintain a record of file events on the external location, so when you start a new Auto Loader stream, it can replay tho...

2 More Replies
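For reference, the two Auto Loader options the thread is about, as they would appear in a readStream configuration (the option names come from the thread and Auto Loader docs; the format value is illustrative):

```python
# Options passed to spark.readStream.format("cloudFiles").options(**...):
autoloader_options = {
    "cloudFiles.format": "json",                   # illustrative source format
    "cloudFiles.useManagedFileEvents": "true",     # replay recorded file events
    "cloudFiles.includeExistingFiles": "true",     # triggers the initial listing
}
```

The initial listing cost comes from includeExistingFiles: managed file events only cover files observed after the event record began, so pre-existing files still need one directory listing.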
by arushigulati (Databricks Partner)
  • 503 Views
  • 2 replies
  • 0 kudos

Lakebridge transpile to translate from Oracle to Databricks SQL

Hi Community, I am currently working on a PoC to migrate data from Oracle to Databricks. As part of this, we are attempting to automate the DDL conversion process. We are leveraging Databricks Labs Lakebridge for transpilation, but it is failing to con...

Latest Reply
SteveOstrowski
Databricks Employee
  • 0 kudos

Hi @arushigulati, Lakebridge (the Databricks Labs project formerly known as Remorph) does support Oracle as a source dialect for transpilation, but the DDL handling, particularly around constraints like PRIMARY KEY, has some gaps depending on the ver...

1 More Reply
by echol (New Contributor II)
  • 870 Views
  • 6 replies
  • 1 kudos

Redeploy Databricks Asset Bundle created by others

Hi everyone, Our team is using Databricks Asset Bundles (DAB) with a customized template to develop data pipelines. We have a core team that maintains the shared infrastructure and templates, and multiple product teams that use this template to develo...

Latest Reply
SteveOstrowski
Databricks Employee
  • 1 kudos

Hi @echol, This is a common scenario when multiple team members work with Databricks Asset Bundles, and there are a few approaches to solve it cleanly. THE ROOT CAUSE When Staff A deploys a bundle, the jobs and other resources are created with Staff ...

5 More Replies
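One common fix for cross-user redeploys is to pin resource ownership to a service principal via the bundle's run_as setting, so it no longer matters which staff member runs the deploy. Shown here as the equivalent dict (in databricks.yml this is the top-level run_as block; the ID is a placeholder):

```python
# Equivalent of the databricks.yml run_as block: all deployed jobs run as
# this service principal regardless of who executes `databricks bundle deploy`.
bundle_config = {
    "run_as": {
        "service_principal_name": "00000000-0000-0000-0000-000000000000"  # placeholder application ID
    }
}
```

Every deploying user then needs permission to use that service principal, which is usually granted once to the team's group.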
by RIDBX (Contributor)
  • 336 Views
  • 3 replies
  • 1 kudos

Robust/complex scheduling with dependency within Databricks?

Thanks for reviewing my threads. I would like to explore robust/complex scheduling with dependencies within Databricks. We know traditional scheduling frameworks allow ...

Latest Reply
SteveOstrowski
Databricks Employee
  • 1 kudos

Hi @RIDBX, Databricks Lakeflow Jobs has several features that let you build exactly this kind of tiered, dependency-driven orchestration natively. Here is how I would approach your HR (Tier 1) and Finance (Tier 2) scenario. OPTION 1: SINGLE ORCHESTRA...

2 More Replies
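The tiered dependency the reply describes (Tier 2 Finance running only after Tier 1 HR succeeds) is expressed with depends_on on the task spec. A sketch of the Jobs API task list (task keys are illustrative):

```python
# Finance only starts after the HR task succeeds; run_if ALL_SUCCESS is the
# default and is shown explicitly for clarity.
tasks = [
    {"task_key": "hr_tier1"},
    {
        "task_key": "finance_tier2",
        "depends_on": [{"task_key": "hr_tier1"}],
        "run_if": "ALL_SUCCESS",
    },
]
```

For cross-job tiers, the same shape applies with a job-run task per tier instead of notebook tasks, keeping the dependency graph in one orchestrator job.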