Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

gayatrikhatale
by Contributor
  • 358 Views
  • 3 replies
  • 5 kudos

Resolved! How to stream data from Azure Event Hub to a Databricks Delta table

Hi, I want to stream data from Azure Event Hub to a Databricks table, but I want to use service principal details for that rather than the Event Hub connection string. Can anyone please share a code snippet? Thank you!

Latest Reply
gayatrikhatale
Contributor
  • 5 kudos

Thank you @szymon_dybczak, it's working for me. I have also found one more way to do the same thing. Below is the code snippet:
```
from azure.identity import DefaultAzureCredential
from azure.eventhub import EventHubConsumerClient
# Replace with your Eve...
```
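For anyone landing here later, a minimal sketch of the service-principal variant, assuming the credentials sit in a Databricks secret scope (the scope, key, namespace, and hub names below are placeholders) and the principal has the Azure Event Hubs Data Receiver role:

```
from azure.identity import ClientSecretCredential
from azure.eventhub import EventHubConsumerClient

# Service principal credentials from a secret scope (placeholder names).
credential = ClientSecretCredential(
    tenant_id=dbutils.secrets.get("my-scope", "sp-tenant-id"),
    client_id=dbutils.secrets.get("my-scope", "sp-client-id"),
    client_secret=dbutils.secrets.get("my-scope", "sp-client-secret"),
)

client = EventHubConsumerClient(
    fully_qualified_namespace="my-namespace.servicebus.windows.net",  # placeholder
    eventhub_name="my-eventhub",                                      # placeholder
    consumer_group="$Default",
    credential=credential,
)

with client:
    # Print each event body; swap this callback for a write to the Delta table.
    client.receive(on_event=lambda ctx, event: print(event.body_as_str()))
```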

2 More Replies
StephanK8
by New Contributor
  • 1747 Views
  • 2 replies
  • 0 kudos

Updates of Materialized Views in Lakeflow Pipelines Produce MetadataChangedException en masse

Hi, We've set up materialized views (as dlt.table()) for something like 300 tables in a single Lakeflow pipeline. The pipeline is triggered externally by a workflow job (to run twice a day). Running the pipeline, we experience something strange. A larg...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

Workarounds & Recommendations:
  • Limit pipeline parallelism: modify the pipeline's configuration to reduce the maximum concurrency for DLT task execution, forcing more serialized or grouped updates.
  • Restructure pipeline graph: instead of 300+ separate...
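A minimal sketch of the grouping idea, assuming the ~300 definitions are generated in a loop (the source names here are placeholders); with the list factored out like this, subsets can be moved into separate, smaller pipelines so fewer metadata updates run concurrently:

```
import dlt

# Placeholder subset of the ~300 sources; split this list across several
# smaller pipelines to reduce concurrent materialized-view updates.
SOURCES = ["bronze.orders", "bronze.customers", "bronze.payments"]

def define_mv(source: str):
    @dlt.table(name="mv_" + source.split(".")[-1])
    def materialized_view():
        return spark.read.table(source)

for src in SOURCES:
    define_mv(src)
```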

1 More Replies
LarsMewa
by New Contributor III
  • 423 Views
  • 4 replies
  • 1 kudos

Resolved! Databricks Jobs & Pipelines: serverless SparkOutOfMemoryError while reading a 500 MB JSON file

I'm getting the following SparkOutOfMemoryError message while reading a 500 MB JSON file; see below. I'm loading four CSV files (around 150 MB per file) and the JSON file in the same pipeline. When I load the JSON file alone it reads fine, same when I ...

Latest Reply
LarsMewa
New Contributor III
  • 1 kudos

This fixed it: as a quick workaround to address out-of-memory errors when processing large JSON files in Databricks serverless pipelines, we recommend disabling the Photon JSON scan. The Photon engine is optimized for performance, but scanning large J...
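The reply is cut off before it names the actual setting, so the configuration key in this sketch is an assumption; confirm the exact flag with Databricks support before relying on it:

```
# Assumed key name: the reply was truncated before naming the flag.
# In a serverless pipeline, put this in the pipeline's configuration block
# rather than calling spark.conf.set at runtime if the latter is blocked.
spark.conf.set("spark.databricks.photon.jsonScan.enabled", "false")
```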

3 More Replies
tbailey
by New Contributor II
  • 2160 Views
  • 2 replies
  • 1 kudos

DABs, policies and cluster pools

My scenario: a policy called 'Job Pool', which has the following overrides: "instance_pool_id": { "type": "unlimited", "hidden": true }, "driver_instance_pool_id": { "type": "unlimited", "hidden": true }. I have an asset bundle that sets a new cluster as...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

I have a similar issue, also with the bid price. It seems that the Databricks API/DAB does not take the correct values in the case of mixed clusters (driver/workers). The funny part is that this only occurs when redeploying a DAB, not on the initial create. So...
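For comparison, a minimal bundle sketch that pins both pool IDs explicitly (the variables and spark_version are placeholders); declaring driver_instance_pool_id alongside instance_pool_id in the bundle itself is one way to keep mixed driver/worker clusters from drifting on redeploy:

```
# databricks.yml fragment; all IDs are placeholder variables.
resources:
  jobs:
    my_job:
      job_clusters:
        - job_cluster_key: pooled
          new_cluster:
            policy_id: ${var.job_pool_policy_id}
            instance_pool_id: ${var.worker_pool_id}
            driver_instance_pool_id: ${var.driver_pool_id}
            spark_version: 15.4.x-scala2.12
            num_workers: 2
```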

1 More Replies
tana_sakakimiya
by Contributor
  • 287 Views
  • 1 reply
  • 1 kudos

Resolved! Any Advice on Dynamic Masking while maintaining performance?

I plan to mask columns with a specific tag like "sensitive" or "PII", which indicates that the column values should only be seen by privileged user groups because they contain credentials or personal identity data. To implement that, I plan to create a ...

Latest Reply
saurabh18cs
Honored Contributor II
  • 1 kudos

Hi @tana_sakakimiya, your approach of using Unity Catalog column tags (like "sensitive" or "PII") and applying masking policies based on those tags is a recommended and scalable way to manage data access in Databricks, especially for compliance and priva...
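A minimal sketch of the tag-driven wiring under those assumptions (the catalog, schema, and group names are placeholders); it looks up tagged columns via the catalog's information_schema.column_tags view and attaches a Unity Catalog column mask to each:

```
# Placeholder names throughout (main.sec, pii_readers).
spark.sql("""
  CREATE OR REPLACE FUNCTION main.sec.mask_string(val STRING)
  RETURN CASE WHEN is_account_group_member('pii_readers') THEN val
              ELSE '****' END
""")

# Every column tagged sensitive/PII gets the mask attached.
tagged = spark.sql("""
  SELECT catalog_name, schema_name, table_name, column_name
  FROM main.information_schema.column_tags
  WHERE tag_name IN ('sensitive', 'PII')
""").collect()

for r in tagged:
    spark.sql(
        f"ALTER TABLE {r.catalog_name}.{r.schema_name}.{r.table_name} "
        f"ALTER COLUMN {r.column_name} SET MASK main.sec.mask_string"
    )
```

Because the mask is a simple CASE over is_account_group_member, it is evaluated at query time and tends to preserve filter pushdown better than rewriting the data itself.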

shweta_m
by New Contributor III
  • 235 Views
  • 3 replies
  • 2 kudos

Resolved! Assigning Databricks Account Admin role to User group

Hi, As per our company policy, individual users should not be given elevated privileges. Permissions should be assigned to user groups so that group membership can be managed at the AD level. In that context, is there a way to assign the 'Databricks A...

Latest Reply
shweta_m
New Contributor III
  • 2 kudos

Hi @szymon_dybczak, I tried this and it worked. Thanks!
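The suggestion that worked isn't quoted in this excerpt; for reference, a hedged sketch of one documented route, assigning the account_admin role to a group through the account-level SCIM Groups API (account ID, group ID, token, and host are placeholders; Azure uses accounts.azuredatabricks.net):

```
import requests

ACCOUNT_HOST = "https://accounts.cloud.databricks.com"  # placeholder host
account_id, group_id, token = "<account-id>", "<group-id>", "<oauth-token>"

# SCIM PatchOp adding the account_admin role to the whole group.
resp = requests.patch(
    f"{ACCOUNT_HOST}/api/2.0/accounts/{account_id}/scim/v2/Groups/{group_id}",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
        "Operations": [
            {"op": "add", "path": "roles", "value": [{"value": "account_admin"}]}
        ],
    },
)
resp.raise_for_status()
```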

2 More Replies
Travis84
by New Contributor II
  • 173 Views
  • 1 reply
  • 1 kudos

Which table should I use for a range join hint?

I am a bit confused about how to use range join hints. Consider the following query:
```
SELECT
  p.id,
  p.ts,
  p.value,
  rg.metric1,
  rg.metric2,
  rg.ts AS range_ts
FROM points p
LEFT JOIN LATERAL (
  SELECT r.metric1, r.metric2, r.ts
  FROM ranges r
  WHE...
```

Latest Reply
K_Anudeep
Databricks Employee
  • 1 kudos

Hello @Travis84, below are the answers to your questions. Where to put the hint? On either one of the two relations that participate in the range join for that specific join block. In simple two-table queries, it doesn't matter. In multi-join querie...
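To make that concrete, a small sketch using the RANGE_JOIN hint (the bin size of 10 and the range columns are placeholders; per the answer above, the hint may name either relation in that join block):

```
# The second argument is the bin size; tune it to the typical range width.
df = spark.sql("""
  SELECT /*+ RANGE_JOIN(r, 10) */
    p.id, p.ts, r.metric1, r.metric2
  FROM points p
  JOIN ranges r
    ON p.ts BETWEEN r.start_ts AND r.end_ts
""")
```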

youssefmrini
by Databricks Employee
  • 5844 Views
  • 2 replies
  • 0 kudos
Latest Reply
idtaylor
New Contributor II
  • 0 kudos

No, you cannot. The demo video was from 2023; Alation no longer appears in the Databricks Marketplace, and Alation no longer offers free trials, instead having you fill out a request for a demo, which is not ideal for proving out practical functionality.

1 More Replies
TetianaDromova
by New Contributor
  • 1348 Views
  • 1 reply
  • 1 kudos

PGP decryption in a Python file

The same decryption code works in a notebook, but fails in a Python file:
```
import gnupg
from pyspark.dbutils import DBUtils
dbutils = DBUtils(spark)
gpg = gnupg.GPG()
decryption_key = dbutils.secrets.get(secret_scope, secret_name)
gpg.import_keys(decryption_ke...
```

Latest Reply
mmayorga
Databricks Employee
  • 1 kudos

Hi @TetianaDromova, thank you for reaching out, and thanks for waiting for a response. Having your code working in a notebook is a significant first step, so you are on the right path; but when moving into a Python file, we must consider specific details: How is...
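A sketch of the usual notebook-to-file gotchas, under the assumption that the failure comes from the implicit notebook globals and the GnuPG home directory (the scope/key names and the path are placeholders):

```
import os
import gnupg
from pyspark.sql import SparkSession
from pyspark.dbutils import DBUtils

# Notebooks inject `spark` and `dbutils`; a standalone Python file must build them.
spark = SparkSession.builder.getOrCreate()
dbutils = DBUtils(spark)

# Assumption: give gnupg an explicit, writable home directory on the driver.
gnupg_home = "/tmp/gnupg"
os.makedirs(gnupg_home, exist_ok=True)
gpg = gnupg.GPG(gnupghome=gnupg_home)

decryption_key = dbutils.secrets.get("my_scope", "pgp_private_key")  # placeholders
import_result = gpg.import_keys(decryption_key)
print(import_result.count)  # confirm the key actually imported before decrypting
```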

Phani1
by Valued Contributor II
  • 348 Views
  • 3 replies
  • 1 kudos

Available Connectors for ServiceNow to Databricks

Hi Team, What are the available connectors to bring data and metadata from ServiceNow to Databricks, and what are the best options/best practices for integrating ServiceNow with Databricks? Regards, Phani

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @Phani1, there's an official Databricks ServiceNow connector. You can read about it at the link below: Configure ServiceNow for Databricks ingestion - Azure Databricks | Microsoft Learn. And here you have an example of how to create an ingestion pipeline using ...
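As a rough sketch of the first step only: the managed ingestion connector sits on top of a Unity Catalog connection, but the connection type and option names below are assumptions, so verify them against the linked documentation before use:

```
# Option names are assumptions; check the ServiceNow ingestion docs.
spark.sql("""
  CREATE CONNECTION IF NOT EXISTS servicenow_conn TYPE servicenow
  OPTIONS (
    host 'https://my-instance.service-now.com',  -- placeholder instance
    client_id secret('my-scope', 'snow-client-id'),
    client_secret secret('my-scope', 'snow-client-secret')
  )
""")
```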

2 More Replies
khasim76
by New Contributor II
  • 245 Views
  • 3 replies
  • 1 kudos
Labels: Data Engineering, community edition, json, sql
Latest Reply
Khaja_Zaffer
Contributor III
  • 1 kudos

Hello @khasim76, good day! Can I know what error message you are getting? Is it the legacy Community Edition or the new Free Edition? Also, there are some limitations, like:
- Table results: ≤10,000 rows or 2 MB (whichever comes first).
- Text outputs: Trunc...

2 More Replies
seefoods
by Valued Contributor
  • 1267 Views
  • 10 replies
  • 4 kudos

Resolved! Append using foreachBatch with Auto Loader

Hello guys, when I append I get this error; does someone know how to fix it? raise converted from None pyspark.errors.exceptions.captured.AnalysisException: [TABLE_OR_VIEW_ALREADY_EXISTS] Cannot create table or view `s_test` because it already exists. Ch...

Latest Reply
seefoods
Valued Contributor
  • 4 kudos

Thanks a lot @szymon_dybczak 
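The accepted fix isn't quoted in this excerpt; a common pattern that avoids TABLE_OR_VIEW_ALREADY_EXISTS is to append inside foreachBatch rather than re-creating the table on every micro-batch (the paths and table name here are placeholders):

```
def append_batch(batch_df, batch_id):
    # Append to the existing table instead of creating it each micro-batch.
    batch_df.write.format("delta").mode("append").saveAsTable("s_test")

(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .load("/Volumes/main/raw/landing")                       # placeholder path
    .writeStream
    .foreachBatch(append_batch)
    .option("checkpointLocation", "/Volumes/main/raw/_chk")  # placeholder path
    .trigger(availableNow=True)
    .start())
```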

9 More Replies
Malthe
by Contributor II
  • 1017 Views
  • 1 reply
  • 2 kudos

Resolved! Trigger table sync from job

When setting up a table sync using the UI, a pipeline is created, but it is not visible in the Pipelines overview, presumably because it's "managed" by the target table (at least this is where you manage the data ingest process). This means that ...

Latest Reply
Saritha_S
Databricks Employee
  • 2 kudos

Hi @Malthe, the recommended method is to manage and trigger the sync via table-level APIs or management interfaces, not pipeline-level job triggers. For Unity Catalog synced tables (e.g., syncing to Postgres), triggering a sync or refresh is per...

Balram-snaplogi
by New Contributor II
  • 655 Views
  • 1 reply
  • 0 kudos

Resolved! Clarification on Default Socket Timeout Behavior Introduced in v2.6.35

Hi Team,I’m currently using the Databricks JDBC driver version 2.6.40, and I’ve noticed intermittent socket timeout errors during pipeline execution.I came across the release note for version 2.6.35 mentioning the following change:[SPARKJ-688] The co...

Latest Reply
Saritha_S
Databricks Employee
  • 0 kudos

Hi @Balram-snaplogi, the default socket timeout introduced in Databricks JDBC driver version 2.6.35 is 300 seconds (5 minutes). Please refer to the KB below: https://kb.databricks.com/dbsql/job-timeout-when-connecting-to-a-sql-endpoint-over-jd...
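For anyone hitting the same thing, a sketch of a connection URL that raises the timeout (host, HTTP path, and token are placeholders, and the SocketTimeout property name should be confirmed in the driver's install guide for your version):

```
# SocketTimeout is specified in seconds (assumption based on the driver docs).
url = (
    "jdbc:databricks://adb-1234567890123456.7.azuredatabricks.net:443/default;"
    "transportMode=http;ssl=1;"
    "httpPath=/sql/1.0/warehouses/abc123;"
    "AuthMech=3;UID=token;PWD=<personal-access-token>;"
    "SocketTimeout=600"
)
```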

bhargavabasava
by New Contributor III
  • 278 Views
  • 1 reply
  • 0 kudos

Resolved! Establish connectivity from databricks serverless to Cloud SQL (GCP) Postgres database

Hi team, we have set up a workspace using a Databricks-managed network (Databricks is set up on GCP). There is a Cloud SQL DB in a customer-managed network. We want to establish connectivity from Databricks serverless to the database. Could someone please tell di...

Latest Reply
Saritha_S
Databricks Employee
  • 0 kudos

Hi @bhargavabasava, can you please refer to the doc below for complete details: https://docs.databricks.com/gcp/en/security/network/classic/private-service-connect

