Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Forum Posts

susanne
by Databricks Partner
  • 1875 Views
  • 4 replies
  • 0 kudos

Resolved! Authentication failure Lakeflow SQL Server Ingestion

Hi all, I am trying to create a Lakeflow Ingestion Pipeline for SQL Server, but I am running into the following authentication error when using my Databricks Database User for the connection: Gateway is stopping. Authentication failure while obtaining ...

Latest Reply
rkhbo3003
New Contributor II
  • 0 kudos

I am also facing the same issue. Our user ID is the service principal name, but the SQL log shows the application ID and reports that it cannot log in. The service principal (by name) has the highest privileges in the SQL database. However, the same login works fine through JDBC.

3 More Replies
ChristianRRL
by Honored Contributor
  • 97 Views
  • 2 replies
  • 2 kudos

Resolved! Unity Catalog - How to read prod data in dev with appropriate read-only access?

Hi there, our team is currently migrating to Unity Catalog. We have two Databricks workspaces for dev & prod, and one thing that I'm wondering is if there is a simple/appropriate way to have only two catalogs dev & prod, where the prod databrick...

Latest Reply
nayan_wylde
Esteemed Contributor II
  • 2 kudos

Yes, you can accomplish exactly what you described with only two catalogs (dev + prod). You do not need a third prod_readonly catalog. There are two complementary control planes in Unity Catalog: Workspace-level restriction (workspace-catalog binding)...

1 More Replies
chiruinfo5262
by New Contributor II
  • 1362 Views
  • 6 replies
  • 0 kudos

Trying to convert oracle sql to databricks sql but not getting the desired output

ORACLE SQL: COUNT( CASE WHEN TRUNC(WORKORDER.REPORTDATE) BETWEEN SELECTED_PERIOD_START_DATE AND SELECTED_PERIOD_END_DATE THEN 1 END ) SELECTED_PERIOD_BM, COUNT( CASE WHEN TRUNC(WORKORDER.REPORTDATE) BETWEEN COMPARISON_PERIOD_START_DATE AND COMPARISON_...

Latest Reply
nayan_wylde
Esteemed Contributor II
  • 0 kudos

You’re using date_format(...), which turns dates into strings, so BETWEEN becomes a string comparison. You can also look into Databricks Lakebridge, which can assist with code conversion and migrations: https://databrickslabs.github.io/lakebridge/
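To illustrate why a string comparison can give the wrong answer, here is a minimal Python sketch. The day-first pattern and the sample dates are assumptions for illustration only; the pattern actually passed to date_format(...) in the original query is not shown in the thread.

```python
from datetime import date

# Assumed day-first pattern, similar in spirit to date_format(col, 'dd-MM-yyyy').
ASSUMED_FORMAT = "%d-%m-%Y"


def in_period_as_strings(d: date, start: date, end: date) -> bool:
    """Compare formatted strings, which Python orders lexicographically."""
    fmt = ASSUMED_FORMAT
    return start.strftime(fmt) <= d.strftime(fmt) <= end.strftime(fmt)


def in_period_as_dates(d: date, start: date, end: date) -> bool:
    """Compare real date values, which are ordered chronologically."""
    return start <= d <= end


# Hypothetical sample values: the report date lies inside the period.
report_date = date(2024, 2, 15)
period_start = date(2024, 1, 20)
period_end = date(2024, 3, 10)

string_result = in_period_as_strings(report_date, period_start, period_end)
date_result = in_period_as_dates(report_date, period_start, period_end)

print(string_result)  # "20-01-2024" sorts after "15-02-2024", so the check fails
print(date_result)    # chronological comparison succeeds
```

In Databricks SQL, the usual way to avoid this is to keep the value typed as a date (for example with CAST(... AS DATE) or to_date(...)) rather than formatting it before the BETWEEN.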

5 More Replies
Danish11052000
by Contributor
  • 84 Views
  • 2 replies
  • 1 kudos

Need to fetch Mount Point details

Hi Team, I’m currently working on building a consolidated view of access permissions across our Databricks environment. For Unity Catalog (UC) objects, I’m able to retrieve permission details using system tables (privileges / audit logs). However, for l...

Latest Reply
amirabedhiafi
New Contributor II
  • 1 kudos

Hello @Danish11052000! Thank you for the question; it really helped me review my knowledge and revisit this subject. You are correct, because UC permissions alone will not give complete access governance for leg...

1 More Replies
mnissen1337
by New Contributor
  • 120 Views
  • 4 replies
  • 3 kudos

Resolved! AI/BI Dashboard refresh via DABs + Jobs executes successfully but dashboard does not update without

I’m migrating a solution from an on-prem setup to Databricks AI/BI Dashboards, and I’m trying to replicate a near real-time dashboard experience (around 1 minute of latency is acceptable). In the legacy setup, we used DirectQuery combined with automatic...

Latest Reply
mnissen1337
New Contributor
  • 3 kudos

Thanks for the answer! That's unfortunate. Do you think Databricks will support this use case in the future, or will we need workarounds such as embedding the dashboard in a DBKS app, or maybe just creating the entire dashboard in an app us...

3 More Replies
bi_123
by New Contributor II
  • 57 Views
  • 1 replies
  • 1 kudos

PII tags in Spark Declarative Pipelines

I need to add PII tags at both the table and column levels for a streaming table created using Spark Declarative Pipelines. I tried applying Unity Catalog tags with the following code inside the SDP Python pipeline: spark.sql(f"""ALTER TABLE {table_nam...

Latest Reply
amirabedhiafi
New Contributor II
  • 1 kudos

Hi @bi_123! You need to apply UC tags outside the SDP definition, not inside the SDP Python function. @dp.table(table_properties=...) can set table properties, but those are not the same as UC tags, and spark.sql("ALTER TABLE ...") inside SDP Python is no...
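As a rough sketch of what "outside the SDP definition" could look like, the helper below only builds the tag statements as strings, so they can be run from a separate step (for example a notebook or job task after the pipeline update). The table name, tag keys, and column names are hypothetical, and whether a given streaming table accepts these statements in your workspace is an assumption the thread does not confirm.

```python
def sql_literal(value: str) -> str:
    """Quote a string as a SQL literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def format_tags(tags: dict) -> str:
    """Render {'k': 'v'} as ('k' = 'v', ...) for a SET TAGS clause."""
    pairs = ", ".join(f"{sql_literal(k)} = {sql_literal(v)}" for k, v in tags.items())
    return f"({pairs})"


def build_tag_statements(table_name: str, table_tags: dict, column_tags: dict) -> list:
    """Build table-level and column-level SET TAGS statements (strings only)."""
    statements = []
    if table_tags:
        statements.append(f"ALTER TABLE {table_name} SET TAGS {format_tags(table_tags)}")
    for column, tags in column_tags.items():
        statements.append(
            f"ALTER TABLE {table_name} ALTER COLUMN `{column}` SET TAGS {format_tags(tags)}"
        )
    return statements


# Hypothetical table and tag names, for illustration only.
statements = build_tag_statements(
    "main.silver.customers",
    {"pii": "true"},
    {"email": {"pii_type": "email"}},
)
for stmt in statements:
    print(stmt)  # in a notebook or job task, each could be passed to spark.sql(stmt)
```

Keeping the statement building separate from execution makes it easy to review the generated SQL before running it against Unity Catalog.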

MiriamHundemer
by New Contributor III
  • 52 Views
  • 1 replies
  • 1 kudos

Calls to databricks api taking more than 60 seconds to complete

Hi, since April 1st (2026) we have been having problems calling the Databricks /api/2.2/jobs/runs/list and /api/2.0/sql/history/queries endpoints. Calls to these endpoints sometimes seem to take longer than 60 seconds now using the Databricks Python SDK ...

Latest Reply
amirabedhiafi
New Contributor II
  • 1 kudos

Hello @MiriamHundemer! I don't think this is a rate limit issue. The limit is indeed 30 req/sec per workspace, but that only tells you when throttling may happen; it does not guarantee that every call returns within 60 seconds. Also don't forget ...
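Since the rest of this reply is cut off, the following is only a generic client-side mitigation and not the fix recommended in the thread: a small retry wrapper with exponential backoff around a slow call. The function name, parameters, and the choice of TimeoutError as the retryable error are assumptions; it uses only the Python standard library and does not rely on any Databricks SDK setting.

```python
import time


def call_with_retries(fn, max_attempts=4, base_delay=1.0,
                      retry_on=(TimeoutError,), sleep=time.sleep):
    """Call fn(); on a retryable error wait base_delay * 2**(attempt - 1) and retry.

    The last failure is re-raised once max_attempts is reached.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


# Usage sketch with a stand-in for a slow API call (hypothetical).
attempts = {"count": 0}


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("request took too long")
    return "ok"


waits = []
result = call_with_retries(flaky_call, sleep=waits.append)  # record waits instead of sleeping
print(result, waits)
```

Narrowing the query (smaller page sizes or tighter time filters) is a complementary option, since a retry alone does not make a long-running request any faster.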

ChristianRRL
by Honored Contributor
  • 204 Views
  • 3 replies
  • 3 kudos

Resolved! Declarative Automation Bundle - Reusable job_cluster configuration

Hi there, I'm running into some trouble abstracting job_clusters configurations to improve reusability. At the moment, I have many job YAML files that require the following configuration: What would be the best approach(es) to remove this configuration fr...

[Attached screenshot: ChristianRRL_0-1777669403132.png]
Latest Reply
amirabedhiafi
New Contributor II
  • 3 kudos

Hello @ChristianRRL, I suspect the issue is in cluster_definitions.yml, because it not only defines a reusable cluster profile but also redefines the same jobs that already exist in the individual fleet_*.yml files. Why? Because ...

2 More Replies
ChristianRRL
by Honored Contributor
  • 147 Views
  • 2 replies
  • 2 kudos

Resolved! run_if condition to handle prior task excluded?

Hi there, I kind of know the answer here but want to check in case I'm missing anything (or else maybe vent slightly and hope for new functionality in the future). Basically, I'm looking for a way to run a task if either (A) the prior step ran success...

[Attached screenshots: ChristianRRL_1-1778008957285.png, ChristianRRL_2-1778009198388.png]
Latest Reply
amirabedhiafi
New Contributor II
  • 2 kudos

Hello @ChristianRRL! You are totally right. With the current DBKS dependency semantics, a downstream task cannot run when all of its direct upstream dependencies are excluded, regardless of the run_if option. If you check the docs, they explicitly say (...

1 More Replies
lrm_data
by New Contributor II
  • 94 Views
  • 1 replies
  • 0 kudos

Resolved! Is Lakeflow Connect SCD Type 2 output incompatible with Spark Declarative Pipeline streaming tables?

Problem: When using Lakeflow Connect to ingest from SQL Server with SCD Type 2 enabled, any downstream Streaming Table (auto cdc flow) in a Spark Declarative pipeline will fail with the following error: "An error occurred because we detected an updat...

Latest Reply
lrm_data
New Contributor II
  • 0 kudos

Following up with a recommendation from Databricks: For tables that need incremental processing - SQL Server → Lakeflow Connect → Bronze SCD2 Streaming Table (CDF enabled → consume CDF, not base table using AUTO CDC → Silver SCD2 Streaming Table → Do...

lrm_data
by New Contributor II
  • 378 Views
  • 3 replies
  • 2 kudos

Resolved! Lakeflow Connect SQL Server - Snapshots Firing Outside Configured Full Refresh Window?

Has anyone else seen full refresh snapshots trigger outside of their configured refresh window in Lakeflow Connect? Here's our situation: - We have a full refresh window configured to restrict snapshot operations to off-hours - On at least one occasion,...

Latest Reply
lrm_data
New Contributor II
  • 2 kudos

Hello @Sumit_7, I have tested a few scenarios, logged a ticket with Databricks, and discovered the following: Common misconception: The start_window setting does not define a bounded time window during which full refreshes are contained. It is simply a...

2 More Replies
lrm_data
by New Contributor II
  • 440 Views
  • 4 replies
  • 0 kudos

Resolved! Lakeflow Connect - SQL Server - Issues restarting after failure

Has anyone else run into a situation where a breaking schema change on a SQL Server source table leaves their Lakeflow Connect pipeline in a state it can't recover from, even after destroying and recreating the pipeline? Here's what happened to us: - ...

Latest Reply
lrm_data
New Contributor II
  • 0 kudos

Hey all, following up. I was able to recover. The one step I was missing was resetting CDC on the source side. After that, I was able to destroy and recreate the bundle and successfully refresh all tables. Thanks!

3 More Replies
ittzzmalind
by New Contributor III
  • 209 Views
  • 2 replies
  • 0 kudos

Azure Databricks Serverless – SFTP Connectivity (external provider)

Hi, I need to establish connectivity from Azure Databricks serverless compute to an external SFTP provider hosted outside our organization. When I searched, I found that one way is IP whitelisting: 1) The SFTP provider requires IP whitelistin...

Latest Reply
Lu_Wang_ENB_DBX
Databricks Employee
  • 0 kudos

Recommendation: if the external SFTP vendor strictly requires source-IP allowlisting, the most reliable path is usually classic compute with your own NAT gateway/static public IP. For serverless, Azure Databricks can reach public external resources v...

1 More Replies
bi_123
by New Contributor II
  • 113 Views
  • 2 replies
  • 1 kudos

Serverless compute throws OUT_OF_MEMORY exception

I'm running a Lakeflow Declarative Pipeline that reads data from a bronze table ingested by Auto Loader and writes it to a silver table with simple transformations. The source data contains struct columns with many deeply nested fields. The table curr...

Latest Reply
amirabedhiafi
New Contributor II
  • 1 kudos

Hello @bi_123! Serverless is normally the recommended default for Lakeflow Declarative Pipelines because DBKS manages the infrastructure and uses enhanced autoscaling, including horizontal and vertical scaling. However, you may require classic compute...

1 More Replies