Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Data_NXT
by New Contributor III
  • 1024 Views
  • 3 replies
  • 5 kudos

Resolved! Databricks Business dashboards - Interactive cluster Total dollar spent

I'm working on Databricks Business Dashboards and trying to calculate interactive cluster compute time and total dollar spend per workspace. As per standard understanding, the total dollar spent = Interactive Clusters + Job Clusters + SQL Warehouses. I...

Latest Reply
nayan_wylde
Esteemed Contributor II
  • 5 kudos

Also, the system table will not provide you the exact dollar amount that you spend on interactive compute. Here is the cost breakdown for running interactive compute:
Component | Description | Cost Source
DBU Cost | Based on workload type and tier | Databricks
V...
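The reply's point is that the total cost has two components billed by two parties. A minimal sketch of that arithmetic, assuming made-up placeholder rates (the numbers below are NOT real Databricks or cloud pricing):

```python
# Sketch: total interactive-compute cost = DBU cost (Databricks) + VM cost (cloud provider).
# Both rates below are hypothetical placeholders, not real pricing.

HYPOTHETICAL_DBU_RATE = 0.55   # $ per DBU for all-purpose (interactive) compute
HYPOTHETICAL_VM_RATE = 0.40    # $ per VM-hour, billed separately by the cloud provider

def interactive_compute_cost(dbus_consumed: float, vm_hours: float) -> float:
    """Combine the Databricks DBU charge with the cloud provider's VM charge."""
    dbu_cost = dbus_consumed * HYPOTHETICAL_DBU_RATE
    vm_cost = vm_hours * HYPOTHETICAL_VM_RATE
    return round(dbu_cost + vm_cost, 2)

print(interactive_compute_cost(100.0, 20.0))  # 100*0.55 + 20*0.40 = 63.0
```

This is also why the billing system table alone undercounts: it reports DBU usage, while the VM portion lives in the cloud provider's bill.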

2 More Replies
Sunil_Poluri
by Databricks Partner
  • 1710 Views
  • 1 reply
  • 1 kudos

Resolved! Unexpected Schema ID Folder Creation in Unity Catalog External Location

I've set up Unity Catalog with an external location pointing to a storage account. For each schema, I've configured a dedicated container path. For example: abfss://schemas@<storage_account>.dfs.core.windows.net/_unityStorage/schemas/<schema_id> When I...

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Hey @Sunil_Poluri , I did some research (learned a few things) and here is what I found.  Unity Catalog manages cloud storage mapping for schemas using internal IDs (schema_id) to ensure data isolation, governance, and uniqueness within a metastore—e...

Anubhav2011
by New Contributor II
  • 1428 Views
  • 5 replies
  • 4 kudos

What is the Power of DLT Pipeline to read streaming data

I am getting thousands of records every second in my bronze table from Qlik, and every second the bronze table is truncated and loaded with new data by Qlik itself. How do I process this much data every second into my silver streaming table before...

Latest Reply
Krishna_S
Databricks Employee
  • 4 kudos

The Apply Changes API is getting deprecated. The AUTO CDC APIs replace the APPLY CHANGES APIs, and have the same syntax. The APPLY CHANGES APIs are still available, but Databricks recommends using the AUTO CDC APIs in their place. Please refer to the...
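As a rough illustration of what both APPLY CHANGES and its AUTO CDC replacement do under the hood, here is a pure-Python sketch of upsert-by-latest-sequence semantics. This is illustrative logic only, not the DLT API; the field names are invented:

```python
# Sketch of CDC upsert semantics: keep the latest change per key (by a
# sequence column) and drop keys whose latest operation is a delete.
# Illustrative only -- not the actual AUTO CDC / APPLY CHANGES implementation.

def apply_changes(changes):
    """changes: list of dicts with 'key', 'seq', 'op' ('upsert'|'delete'), 'value'."""
    latest = {}
    for row in changes:
        cur = latest.get(row["key"])
        if cur is None or row["seq"] > cur["seq"]:
            latest[row["key"]] = row
    # Materialize the target table: only keys whose latest op is not a delete.
    return {k: r["value"] for k, r in latest.items() if r["op"] != "delete"}

changes = [
    {"key": 1, "seq": 1, "op": "upsert", "value": "a"},
    {"key": 1, "seq": 2, "op": "upsert", "value": "b"},
    {"key": 2, "seq": 1, "op": "upsert", "value": "c"},
    {"key": 2, "seq": 2, "op": "delete", "value": None},
]
print(apply_changes(changes))  # {1: 'b'}
```

Out-of-order arrival is handled by the sequence comparison, which is the same role the sequence-by column plays in the DLT APIs.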

4 More Replies
gayatrikhatale
by Databricks Partner
  • 1394 Views
  • 3 replies
  • 5 kudos

Resolved! How to stream data from azure event hub to databricks delta table

Hi, I want to stream data from Azure Event Hub to a Databricks table. But I want to use service principal details for that, not the Event Hub connection string. Can anyone please share a code snippet? Thank you!

Latest Reply
gayatrikhatale
Databricks Partner
  • 5 kudos

Thank you @szymon_dybczak, it's working for me. I have also found one more way to do the same thing. Below is the code snippet:
from azure.identity import DefaultAzureCredential
from azure.eventhub import EventHubConsumerClient
# Replace with your Eve...

2 More Replies
StephanK8
by Databricks Partner
  • 2128 Views
  • 2 replies
  • 0 kudos

Updates of Materialized Views in Lakeflow Pipelines Produce MetadataChangedException en masse

Hi, We've set up materialized views (as dlt.table()) for something like 300 tables in a single Lakeflow pipeline. The pipeline is triggered externally by a workflow job (to run twice a day). Running the pipeline, we experience something strange. A larg...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

Workarounds & Recommendations:
  • Limit Pipeline Parallelism: Modify the pipeline's configuration to reduce the maximum concurrency for DLT task execution, forcing more serialized or grouped updates.
  • Restructure Pipeline Graph: Instead of 300+ separate...

1 More Replies
LarsMewa
by New Contributor III
  • 1028 Views
  • 4 replies
  • 1 kudos

Resolved! Databricks Jobs & Pipelines: Serverless SparkOutOfMemoryError while reading a 500 MB JSON file

I'm getting the following SparkOutOfMemoryError message while reading a 500 MB JSON file, see below. I'm loading four CSV files (around 150 MB per file) and the JSON file in the same pipeline. When I load the JSON file alone it reads fine, same when I ...

Latest Reply
LarsMewa
New Contributor III
  • 1 kudos

This fixed it: As a quick workaround to address out-of-memory errors when processing large JSON files in Databricks serverless pipelines, we recommend disabling the Photon JSON scan. The Photon engine is optimized for performance, but scanning large J...
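The reply is cut off before it names the exact setting. As a hypothetical sketch only (the conf key below is an assumption, not taken from the thread or official docs; verify the real key with Databricks before relying on it), disabling an engine feature via a Spark conf generally looks like:

```python
# HYPOTHETICAL: the conf key below is an assumed placeholder, not a
# confirmed Databricks setting -- check the full thread or docs for the real one.
spark.conf.set("spark.databricks.photon.jsonScan.enabled", "false")
```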

3 More Replies
tana_sakakimiya
by Contributor
  • 970 Views
  • 1 reply
  • 1 kudos

Resolved! Any Advice on Dynamic Masking while maintaining performance?

I plan to mask columns with a specific tag like "sensitive" or "PII", which indicates that the column values should only be seen by privileged user groups because they contain credentials or personal identity data. To implement that, I plan to create a ...

Latest Reply
saurabh18cs
Honored Contributor III
  • 1 kudos

Hi @tana_sakakimiya Your approach—using Unity Catalog column tags (like "sensitive" or "PII") and applying masking policies based on those tags—is a recommended and scalable way to manage data access in Databricks, especially for compliance and priva...
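The tag-driven policy described in the thread can be sketched in plain Python to show the intended semantics. This illustrates the logic only, not the Unity Catalog API (in UC you would implement it with SQL masking functions attached to columns); the tag and group names are example placeholders:

```python
# Sketch of tag-driven column masking: values in columns tagged "sensitive"
# or "PII" are visible only to privileged groups. Illustrative logic only --
# not Unity Catalog; all names are example placeholders.

SENSITIVE_TAGS = {"sensitive", "PII"}
PRIVILEGED_GROUPS = {"pii_readers"}

def mask_value(value, column_tags, user_groups):
    """Return the real value for privileged users, a mask otherwise."""
    needs_mask = bool(SENSITIVE_TAGS & set(column_tags))
    if needs_mask and not (PRIVILEGED_GROUPS & set(user_groups)):
        return "****"
    return value

print(mask_value("123-45-6789", ["PII"], ["analysts"]))     # ****
print(mask_value("123-45-6789", ["PII"], ["pii_readers"]))  # 123-45-6789
print(mask_value("hello", [], ["analysts"]))                # hello
```

For performance, the key point is that this branch is evaluated per column policy, not per row lookup against an external service, which is why tag-based masking functions tend to scale well.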

shweta_m
by New Contributor III
  • 577 Views
  • 3 replies
  • 2 kudos

Resolved! Assigning Databricks Account Admin role to User group

Hi, As per our company policy, individual users should not be given elevated privileges. Permissions should be assigned to user groups, so that group membership can be managed at the AD level. In that context, is there a way to assign the 'Databricks A...

Latest Reply
shweta_m
New Contributor III
  • 2 kudos

Hi @szymon_dybczak, I tried this and it worked. Thanks!

2 More Replies
Travis84
by New Contributor II
  • 516 Views
  • 1 reply
  • 1 kudos

Which table should i use for a range join hint?

I am a bit confused about how to use range join hints. Consider the following query:
```
SELECT
  p.id,
  p.ts,
  p.value,
  rg.metric1,
  rg.metric2,
  rg.ts AS range_ts
FROM points p
LEFT JOIN LATERAL (
  SELECT r.metric1, r.metric2, r.ts
  FROM ranges r
  WHE...
```

Latest Reply
K_Anudeep
Databricks Employee
  • 1 kudos

Hello @Travis84, below are the answers to your questions. Where to put the hint? On either one of the two relations that participate in the range join for that specific join block. In simple two-table queries, it doesn't matter. In multi-join querie...
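To see why the hint helps at all, here is a pure-Python sketch of the bin-based strategy a range join optimization enables: index each interval into fixed-size bins, then probe only the bin containing each point instead of scanning every interval. This is illustrative, not Spark's implementation, and the bin size of 10 is an arbitrary example (analogous to the hint's numeric parameter):

```python
from collections import defaultdict

BIN = 10  # bin size; plays the role of the hint's numeric parameter

def build_index(ranges):
    """Index half-open intervals (start, end, payload) into fixed-size bins."""
    index = defaultdict(list)
    for start, end, payload in ranges:
        for b in range(start // BIN, (end - 1) // BIN + 1):
            index[b].append((start, end, payload))
    return index

def range_join(points, ranges):
    """Join each point to every interval containing it, probing one bin per point."""
    index = build_index(ranges)
    out = []
    for p in points:
        for start, end, payload in index[p // BIN]:
            if start <= p < end:
                out.append((p, payload))
    return out

ranges = [(0, 15, "r1"), (12, 30, "r2")]
points = [5, 13, 40]
print(range_join(points, ranges))  # [(5, 'r1'), (13, 'r1'), (13, 'r2')]
```

The trade-off mirrors the hint's parameter: smaller bins mean fewer candidate intervals per probe but a larger index, which is why picking a bin size close to the typical range width matters.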

youssefmrini
by Databricks Employee
  • 6157 Views
  • 2 replies
  • 0 kudos
Latest Reply
idtaylor
New Contributor II
  • 0 kudos

No, you cannot. The demo video was from 2023, Alation no longer appears in the Databricks Marketplace, and Alation no longer allows free trials, instead having you fill out a request for a demo, which is not ideal for proving out practical functionality.

1 More Replies
TetianaDromova
by Databricks Partner
  • 1829 Views
  • 1 reply
  • 1 kudos

PGP decryption in python file

The same decryption code works in a notebook, but fails in a Python file:
import gnupg
from pyspark.dbutils import DBUtils
dbutils = DBUtils(spark)
gpg = gnupg.GPG()
decryption_key = dbutils.secrets.get(secret_scope, secret_name)
gpg.import_keys(decryption_ke...

Latest Reply
mmayorga
Databricks Employee
  • 1 kudos

Hi @TetianaDromova, thank you for reaching out, and for your patience. Having your code working in a notebook is a significant first step, so you are on the right path; but when moving to a Python file, we must consider specific details: How is...

Phani1
by Databricks MVP
  • 2939 Views
  • 3 replies
  • 1 kudos

Available Connectors for ServiceNow to Databricks

Hi Team, What are the available connectors to bring data and metadata from ServiceNow to Databricks, and what are the best options/best practices for integrating ServiceNow with Databricks? Regards, Phani

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @Phani1, there's an official Databricks ServiceNow connector. You can read about it at the link below: Configure ServiceNow for Databricks ingestion - Azure Databricks | Microsoft Learn. And here you have an example of how to create an ingestion pipeline using ...

2 More Replies
khasim76
by New Contributor II
  • 529 Views
  • 3 replies
  • 1 kudos
Data Engineering
community edition
json
sql
Latest Reply
Khaja_Zaffer
Esteemed Contributor
  • 1 kudos

Hello @khasim76, good day! Can I know what error message you are getting? Is it the legacy Community Edition or the free Community Edition? Also, there are some limitations, like: - Table results: ≤10,000 rows or 2 MB (whichever comes first). - Text outputs: Trunc...

2 More Replies
seefoods
by Valued Contributor
  • 3072 Views
  • 10 replies
  • 4 kudos

Resolved! append using foreach batch autoloader

Hello guys, when I append I get this error. Does someone know how to fix it? raise converted from None pyspark.errors.exceptions.captured.AnalysisException: [TABLE_OR_VIEW_ALREADY_EXISTS] Cannot create table or view `s_test` because it already exists. Ch...

Latest Reply
seefoods
Valued Contributor
  • 4 kudos

Thanks a lot @szymon_dybczak 

9 More Replies
Malthe
by Valued Contributor II
  • 2381 Views
  • 1 reply
  • 2 kudos

Resolved! Trigger table sync from job

When setting up a table sync using the UI, a pipeline is created, but this is not visible through the Pipelines overview, presumably because it's "managed" by the target table (at least this is where you manage the data ingest process). This means that ...

Latest Reply
Saritha_S
Databricks Employee
  • 2 kudos

Hi @Malthe, the recommended method is to manage and trigger the sync via table-level APIs or management interfaces, not the pipeline-level job triggers: For Unity Catalog synced tables (e.g., syncing to Postgres), triggering a sync or refresh is per...
