Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

sminamioka
by New Contributor III
  • 63 Views
  • 1 replies
  • 1 kudos

Compute tab doesn't show and doesn't give the option to create a cluster

I've just created an Azure Databricks workspace (Premium tier). When I try to create a cluster by clicking Compute, the UI automatically opens the SQL Warehouse menu instead; I'm not sure if it's a glitch. Someone said "Ask the admin to ...

Data Engineering
cluster
clusters
Latest Reply
amirabedhiafi
New Contributor III
  • 1 kudos

Hi! You are probably seeing only SQL Warehouses because your user or group does not currently have permission to create classic compute. An admin needs to grant you the Allow unrestricted cluster creation entitlement or assign you a compute policy such as Personal Compute. After ...

IM_01
by Contributor III
  • 46 Views
  • 1 replies
  • 1 kudos

Lakeflow SDP partition error

Hi, I was trying to log an exception in Lakeflow SDP. First, I create an empty streaming DataFrame in case of an exception and write the log into an audit table, as shown below:
try:
    raise Exception("testexception")
    return df
except Exception as e:
    df=...

Latest Reply
amirabedhiafi
New Contributor III
  • 1 kudos

Hi @IM_01! I think your issue is caused by using the rate source as a dummy empty stream. The rate source stores its partition count in the streaming checkpoint, and because numPartitions was not explicitly set, it can change between runs dependin...

ideal_knee
by New Contributor III
  • 12114 Views
  • 7 replies
  • 8 kudos

Reading an Iceberg table with AWS Glue Data Catalog as metastore

I have created an Iceberg table using AWS Glue; however, whenever I try to read it using a Databricks cluster, I get `java.lang.InstantiationException`. I have tried every combination of Spark configs for my Databricks compute cluster that I can think...

Latest Reply
ideal_knee
New Contributor III
  • 8 kudos

In case someone happens upon this in the future, I ended up using Unity Catalog with Hive metastore federation for Glue. The Iceberg support is currently "coming soon in Public Preview."

manish_de
by New Contributor
  • 67 Views
  • 0 replies
  • 1 kudos

Query-based connector snapshot feature

In an ingestion pipeline with a query-based connector, the Cursor column dropdown offers a batch snapshot option instead of a column name. If I choose batch snapshot, will the Databricks engine run SELECT * FROM my source table, say Sql serv...

batch_bender
by New Contributor
  • 1031 Views
  • 4 replies
  • 2 kudos

create_auto_cdc_from_snapshot_flow vs create_auto_cdc_flow – when is snapshot CDC actually worth it?

I am deciding between create_auto_cdc_from_snapshot_flow() and create_auto_cdc_flow() in a pipeline. My source is a daily full snapshot table: no operation column (no insert/update/delete flags); order can be derived from snapshot_date (sequence by); rows ...

Latest Reply
manish_de
New Contributor
  • 2 kudos

Does this work only for tables with a primary key? What if the source table doesn't have a primary key at all? Does it hash a concatenation of all columns and use that as the merge key?
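The answer to that question isn't visible in this listing, so here is only a conceptual sketch (plain Python, not the Databricks implementation of create_auto_cdc_from_snapshot_flow) of how inserts, updates and deletes can be derived by comparing two full snapshots on a key. The helpers diff_snapshots and row_hash are hypothetical; row_hash shows what a hash-of-all-columns surrogate key would look like, and why it changes the result.

```python
import hashlib
from typing import Dict, List

Row = Dict[str, object]


def row_hash(row: Row, columns: List[str]) -> str:
    """Hypothetical surrogate key for a keyless source: SHA-256 over every column value."""
    # Join with a separator so adjacent values cannot run together.
    joined = "\x1f".join(repr(row[c]) for c in columns)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def diff_snapshots(prev: List[Row], curr: List[Row], key: str) -> Dict[str, List[Row]]:
    """Compare two full snapshots on `key` and classify rows as inserts, updates or deletes."""
    prev_by_key = {r[key]: r for r in prev}
    curr_by_key = {r[key]: r for r in curr}
    inserts = [curr_by_key[k] for k in sorted(curr_by_key.keys() - prev_by_key.keys())]
    deletes = [prev_by_key[k] for k in sorted(prev_by_key.keys() - curr_by_key.keys())]
    updates = [
        curr_by_key[k]
        for k in sorted(curr_by_key.keys() & prev_by_key.keys())
        if curr_by_key[k] != prev_by_key[k]
    ]
    return {"inserts": inserts, "updates": updates, "deletes": deletes}


if __name__ == "__main__":
    day1 = [{"id": 1, "status": "open"}, {"id": 2, "status": "open"}]
    day2 = [{"id": 2, "status": "closed"}, {"id": 3, "status": "open"}]
    # With a real key, id 2 is reported as an update.
    print(diff_snapshots(day1, day2, key="id"))

    # Keyed on a hash of all columns, the same change appears as a delete plus an insert.
    cols = ["id", "status"]
    h1 = [dict(r, _hash=row_hash(r, cols)) for r in day1]
    h2 = [dict(r, _hash=row_hash(r, cols)) for r in day2]
    print(diff_snapshots(h1, h2, key="_hash"))
```

The design point: a hash of every column identifies a row version, not an entity, so it can never detect an "update". That is why a stable business key matters for SCD-style history.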

DushendRaghavan
by New Contributor
  • 168 Views
  • 1 replies
  • 0 kudos

How to handle MERGE with Schema Evolution in Delta Lake

Hi everyone, schema evolution during MERGE is one of the trickiest parts of building robust Delta Lake pipelines. Databricks actually has a native SQL syntax for this, plus Python API options for...

Latest Reply
nayan_wylde
Esteemed Contributor II
  • 0 kudos

Great post. I would also consider the following points: Guardrails: schema evolution is powerful, but it can also accidentally add garbage columns if upstream sends unexpected fields. Recommendation: validate/allowlist schema changes in higher envir...
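A minimal sketch of that allowlist guardrail, assuming you compare the incoming DataFrame's column names (for example df.columns) with the target table's before running a MERGE with schema evolution. The function check_new_columns and the allowlist contents are hypothetical, and the case-insensitive comparison assumes Spark's default case-insensitive column resolution.

```python
from typing import Iterable, List


def check_new_columns(target_cols: Iterable[str], source_cols: Iterable[str],
                      allowlist: Iterable[str]) -> List[str]:
    """Return source columns missing from the target; raise if any is not allowlisted.

    Names are compared case-insensitively (assumption: default Spark/Delta behaviour).
    """
    existing = {c.casefold() for c in target_cols}
    allowed = {c.casefold() for c in allowlist}
    new_cols = [c for c in source_cols if c.casefold() not in existing]
    rejected = [c for c in new_cols if c.casefold() not in allowed]
    if rejected:
        raise ValueError(f"unexpected upstream columns, refusing schema evolution: {rejected}")
    return new_cols


if __name__ == "__main__":
    # An expected new column passes and can be evolved into the target.
    print(check_new_columns(["id", "name"], ["id", "name", "email"], allowlist=["email"]))
    # An unexpected column stops the run before MERGE adds it.
    try:
        check_new_columns(["id"], ["id", "tmp_debug"], allowlist=[])
    except ValueError as err:
        print(err)
```

Failing loudly in higher environments keeps evolution opt-in per column instead of accepting whatever upstream sends.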

yit337
by Contributor
  • 149 Views
  • 2 replies
  • 0 kudos

Does Lakeflow Connect guarantee no out-of-order records?

I use Lakeflow Connect to load data from my source databases to bronze tables. Then I use auto_cdc to track SCD2 changes in my silver tables. I use _commit_timestamp from the bronze CDF as the sequence_by property in auto_cdc to order the vers...

Latest Reply
Lu_Wang_ENB_DBX
Databricks Employee
  • 0 kudos

Recommendation: use a business/effective timestamp in sequence_by if your source can emit late/backdated changes and you want SCD2 history to reflect source event time, not bronze arrival/commit time. If ties are possible, use a STRUCT for determinis...
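To illustrate the ordering idea, here is a plain-Python sketch (not auto_cdc itself) that picks the winning version per key by comparing tuples field by field, which is the same lexicographic idea as sequencing by a STRUCT(event_ts, commit_ts). The field names event_ts and commit_ts and the helper latest_per_key are hypothetical. It only models which version wins per key, not the full SCD2 history auto_cdc builds.

```python
from datetime import datetime
from typing import Dict, List, Sequence


def latest_per_key(events: List[dict], key: str, order_by: Sequence[str]) -> Dict[object, dict]:
    """Keep the highest-ranked event per key, ranking by a tuple of fields.

    Python compares tuples left to right, field by field, so ("event_ts", "commit_ts")
    orders by event time first and uses commit time only to break ties.
    """
    latest: Dict[object, dict] = {}
    for event in events:
        rank = tuple(event[f] for f in order_by)
        current = latest.get(event[key])
        if current is None or rank > tuple(current[f] for f in order_by):
            latest[event[key]] = event
    return latest


if __name__ == "__main__":
    e1 = {"id": 1, "status": "A", "event_ts": datetime(2024, 5, 1, 9), "commit_ts": datetime(2024, 5, 1, 10)}
    e2 = {"id": 1, "status": "B", "event_ts": datetime(2024, 5, 2, 9), "commit_ts": datetime(2024, 5, 2, 10)}
    # Backdated change: committed last, but its business time is older than e2.
    e3 = {"id": 1, "status": "C", "event_ts": datetime(2024, 5, 1, 12), "commit_ts": datetime(2024, 5, 3, 8)}
    print(latest_per_key([e1, e2, e3], "id", ["commit_ts"])[1]["status"])               # C
    print(latest_per_key([e1, e2, e3], "id", ["event_ts", "commit_ts"])[1]["status"])   # B
```

Ordering by commit time lets the late, backdated row become "current"; ordering by business time first keeps the newer business state, and the second field makes ties deterministic regardless of input order.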

Danish11052000
by Contributor
  • 129 Views
  • 3 replies
  • 1 kudos

Need to fetch Mount Point details

Hi Team, I’m currently working on building a consolidated view of access permissions across our Databricks environment. For Unity Catalog (UC) objects, I’m able to retrieve permission details using system tables (privileges / audit logs). However, for l...

Latest Reply
amirabedhiafi
New Contributor III
  • 1 kudos

Hello @Danish11052000! Thank you for the question; it made me go back and review this subject. You are correct: UC permissions alone will not give complete access governance for leg...

susanne
by Databricks Partner
  • 1909 Views
  • 4 replies
  • 0 kudos

Resolved! Authentication failure Lakeflow SQL Server Ingestion

Hi all, I am trying to create a Lakeflow Ingestion Pipeline for SQL Server, but I am running into the following authentication error when using my Databricks Database User for the connection: Gateway is stopping. Authentication failure while obtaining ...

Latest Reply
rkhbo3003
New Contributor II
  • 0 kudos

I am also facing the same issue. We set the user ID to the service principal name; however, the SQL log shows that the application ID cannot log in. The service principal (name) has the highest privileges in the SQL database. However, the same setup works fine through JDBC.

ChristianRRL
by Honored Contributor
  • 176 Views
  • 2 replies
  • 2 kudos

Resolved! Unity Catalog - How to read prod data in dev with appropriate read-only access?

Hi there, our team is currently migrating to Unity Catalog. We have two Databricks workspaces for dev and prod, and I'm wondering if there is a simple, appropriate way to have only two catalogs, dev and prod, where the prod databrick...

Latest Reply
nayan_wylde
Esteemed Contributor II
  • 2 kudos

Yes, you can accomplish exactly what you described with only two catalogs (dev + prod). You do not need a third prod_readonly catalog. There are two complementary control planes in Unity Catalog: Workspace-level restriction (workspace-catalog binding)...

chiruinfo5262
by New Contributor II
  • 1401 Views
  • 6 replies
  • 0 kudos

Trying to convert Oracle SQL to Databricks SQL but not getting the desired output

ORACLE SQL:
COUNT(CASE WHEN TRUNC(WORKORDER.REPORTDATE) BETWEEN SELECTED_PERIOD_START_DATE AND SELECTED_PERIOD_END_DATE THEN 1 END) SELECTED_PERIOD_BM,
COUNT(CASE WHEN TRUNC(WORKORDER.REPORTDATE) BETWEEN COMPARISON_PERIOD_START_DATE AND COMPARISON_...

Latest Reply
nayan_wylde
Esteemed Contributor II
  • 0 kudos

You’re using date_format(...), which turns dates into strings, so BETWEEN becomes a string comparison. You can also look into Databricks Lakebridge, which can assist with code conversion and migrations: https://databrickslabs.github.io/lakebridge/
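To see why that breaks the counts, here is a small Python illustration (not Databricks SQL) of an inclusive BETWEEN on real dates versus on strings formatted day-first. The "%d-%m-%Y" format is a hypothetical stand-in for something like date_format(col, 'dd-MM-yyyy'); the actual format in the original query isn't shown.

```python
from datetime import date


def between(value, start, end) -> bool:
    """Inclusive range check, like SQL's value BETWEEN start AND end."""
    return start <= value <= end


# Hypothetical day-first display format, similar to date_format(col, 'dd-MM-yyyy').
FMT = "%d-%m-%Y"

start, end = date(2024, 1, 10), date(2024, 12, 31)
in_range = date(2024, 3, 5)
out_of_range = date(2025, 1, 20)

# Comparing date values gives the expected answers.
print(between(in_range, start, end))      # True
print(between(out_of_range, start, end))  # False

# Comparing formatted strings goes character by character, so both answers flip.
s, e = start.strftime(FMT), end.strftime(FMT)
print(between(in_range.strftime(FMT), s, e))      # False: "05-03-2024" sorts before "10-01-2024"
print(between(out_of_range.strftime(FMT), s, e))  # True: "20-01-2025" sorts between the bounds
```

In Databricks SQL the equivalent fix is to keep both sides as DATE values, for example with to_date(...) or CAST(... AS DATE) in place of Oracle's TRUNC(date), and only apply date_format for display. ISO yyyy-MM-dd strings happen to sort chronologically, but comparing dates avoids relying on that.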

mnissen1337
by New Contributor
  • 228 Views
  • 4 replies
  • 3 kudos

Resolved! AI/BI Dashboard refresh via DABs + Jobs executes successfully but dashboard does not update without

I’m migrating a solution from an on-prem setup to Databricks AI/BI Dashboards, and I’m trying to replicate a near real-time dashboard experience (latency of about 1 minute is acceptable). In the legacy setup, we used DirectQuery combined with automatic...

Latest Reply
mnissen1337
New Contributor
  • 3 kudos

Thanks for the answer! That's unfortunate. Do you think Databricks will support this use case in the future, or will we need workarounds such as embedding the dashboard in a Databricks app, or maybe just creating the entire dashboard in an app us...

bi_123
by New Contributor II
  • 83 Views
  • 1 replies
  • 1 kudos

PII tags in Spark Declarative Pipelines

I need to add PII tags at both the table and column levels for a streaming table created using Spark Declarative Pipelines. I tried applying Unity Catalog tags with the following code inside the SDP Python pipeline: spark.sql(f"""ALTER TABLE {table_nam...

Latest Reply
amirabedhiafi
New Contributor III
  • 1 kudos

Hi @bi_123! You need to apply UC tags outside the SDP definition, not inside the SDP Python function. @dp.table(table_properties=...) can set table properties, but those are not the same as UC tags, and spark.sql("ALTER TABLE ...") inside SDP Python is no...
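A sketch of doing the tagging as a separate step: build the statements in plain Python and run each one (for example with spark.sql) from a job task that runs after the pipeline update, outside any @dp.table function. This assumes the ALTER TABLE ... SET TAGS and ALTER TABLE ... ALTER COLUMN ... SET TAGS forms are accepted for your streaming table; please confirm against the Unity Catalog docs, since some object types may need a different statement. The helper build_tag_statements, the table name and the tag names are hypothetical, and the strict character check is a simplification to avoid SQL escaping, not a Unity Catalog rule.

```python
import re
from typing import Dict, List

# Simplification (assumption): only allow plain characters so no escaping is needed.
_SAFE_TEXT = re.compile(r"[A-Za-z0-9_.\-]+")


def _literal(text: str) -> str:
    """Render a tag name or value as a single-quoted SQL string literal."""
    if not _SAFE_TEXT.fullmatch(text):
        raise ValueError(f"unsupported characters in tag text: {text!r}")
    return f"'{text}'"


def _quote_part(name: str) -> str:
    """Backtick-quote one identifier part, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def _tag_list(tags: Dict[str, str]) -> str:
    return ", ".join(f"{_literal(k)} = {_literal(v)}" for k, v in tags.items())


def build_tag_statements(table: str, table_tags: Dict[str, str],
                         column_tags: Dict[str, Dict[str, str]]) -> List[str]:
    """Build table-level and column-level SET TAGS statements for catalog.schema.table."""
    qualified = ".".join(_quote_part(p) for p in table.split("."))
    statements = []
    if table_tags:
        statements.append(f"ALTER TABLE {qualified} SET TAGS ({_tag_list(table_tags)})")
    for column, tags in column_tags.items():
        statements.append(
            f"ALTER TABLE {qualified} ALTER COLUMN {_quote_part(column)} "
            f"SET TAGS ({_tag_list(tags)})"
        )
    return statements


if __name__ == "__main__":
    for stmt in build_tag_statements(
        "main.sales.customers", {"pii": "true"}, {"email": {"pii": "email"}}
    ):
        print(stmt)  # in a job task after the pipeline update: spark.sql(stmt)
```

Keeping the tagging outside the pipeline function means it runs once per deployment or update, rather than being evaluated as part of the dataset definition.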
