Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

gayatrikhatale
by Contributor
  • 358 Views
  • 3 replies
  • 5 kudos

Resolved! How to stream data from Azure Event Hub to a Databricks Delta table

Hi, I want to stream data from Azure Event Hub to a Databricks table, but I want to use service principal details for that rather than the Event Hub connection string. Can anyone please share a code snippet? Thank you!

Latest Reply
gayatrikhatale
Contributor
  • 5 kudos

Thank you @szymon_dybczak, it's working for me. I have also found one more way to do the same thing. Below is the code snippet:
```
from azure.identity import DefaultAzureCredential
from azure.eventhub import EventHubConsumerClient
# Replace with your Eve...
```
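For anyone landing here later, a minimal sketch of the service-principal variant, assuming the credentials sit in a Databricks secret scope (the scope, key, namespace, and hub names below are placeholders) and the principal has the Azure Event Hubs Data Receiver role:

```
from azure.identity import ClientSecretCredential
from azure.eventhub import EventHubConsumerClient

# Service principal credentials from a secret scope (placeholder names).
credential = ClientSecretCredential(
    tenant_id=dbutils.secrets.get("my-scope", "sp-tenant-id"),
    client_id=dbutils.secrets.get("my-scope", "sp-client-id"),
    client_secret=dbutils.secrets.get("my-scope", "sp-client-secret"),
)

client = EventHubConsumerClient(
    fully_qualified_namespace="my-namespace.servicebus.windows.net",  # placeholder
    eventhub_name="my-eventhub",                                      # placeholder
    consumer_group="$Default",
    credential=credential,
)

with client:
    # Print each event body; swap this callback for a write to the Delta table.
    client.receive(on_event=lambda ctx, event: print(event.body_as_str()))
```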

2 More Replies
StephanK8
by New Contributor
  • 1747 Views
  • 2 replies
  • 0 kudos

Updates of Materialized Views in Lakeflow Pipelines Produce MetadataChangedException en masse

Hi, We've set up materialized views (as dlt.table()) for something like 300 tables in a single Lakeflow pipeline. The pipeline is triggered externally by a workflow job (to run twice a day). Running the pipeline, we experience something strange. A larg...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

Workarounds & Recommendations:
  • Limit pipeline parallelism: modify the pipeline's configuration to reduce the maximum concurrency for DLT task execution, forcing more serialized or grouped updates.
  • Restructure pipeline graph: instead of 300+ separate...
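A minimal sketch of the grouping idea, assuming the ~300 definitions are generated in a loop (the source names here are placeholders); with the list factored out like this, subsets can be moved into separate, smaller pipelines so fewer metadata updates run concurrently:

```
import dlt

# Placeholder subset of the ~300 sources; split this list across several
# smaller pipelines to reduce concurrent materialized-view updates.
SOURCES = ["bronze.orders", "bronze.customers", "bronze.payments"]

def define_mv(source: str):
    @dlt.table(name="mv_" + source.split(".")[-1])
    def materialized_view():
        return spark.read.table(source)

for src in SOURCES:
    define_mv(src)
```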

1 More Replies
LarsMewa
by New Contributor III
  • 423 Views
  • 4 replies
  • 1 kudos

Resolved! Databricks Jobs & Pipelines: serverless SparkOutOfMemoryError while reading a 500 MB JSON file

I'm getting the following SparkOutOfMemoryError message while reading a 500 MB JSON file; see below. I'm loading four CSV files (around 150 MB per file) and the JSON file in the same pipeline. When I load the JSON file alone it reads fine, same when I ...

Latest Reply
LarsMewa
New Contributor III
  • 1 kudos

This fixed it: as a quick workaround to address out-of-memory errors when processing large JSON files in Databricks serverless pipelines, we recommend disabling the Photon JSON scan. The Photon engine is optimized for performance, but scanning large J...
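The reply is cut off before it names the actual setting, so the configuration key in this sketch is an assumption; confirm the exact flag with Databricks support before relying on it:

```
# Assumed key name: the reply was truncated before naming the flag.
# In a serverless pipeline, put this in the pipeline's configuration block
# rather than calling spark.conf.set at runtime if the latter is blocked.
spark.conf.set("spark.databricks.photon.jsonScan.enabled", "false")
```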

3 More Replies
tbailey
by New Contributor II
  • 2160 Views
  • 2 replies
  • 1 kudos

DABs, policies and cluster pools

My scenario: a policy called 'Job Pool', which has the following overrides: "instance_pool_id": { "type": "unlimited", "hidden": true }, "driver_instance_pool_id": { "type": "unlimited", "hidden": true }. I have an asset bundle that sets a new cluster as...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

I have a similar issue, also with the bid price. It seems that the Databricks API/DAB does not take the correct values in the case of mixed clusters (driver/workers). The funny part is that this only occurs when redeploying a DAB, not on the initial create. So...
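For comparison, a minimal bundle sketch that pins both pool IDs explicitly (the variables and spark_version are placeholders); declaring driver_instance_pool_id alongside instance_pool_id in the bundle itself is one way to keep mixed driver/worker clusters from drifting on redeploy:

```
# databricks.yml fragment; all IDs are placeholder variables.
resources:
  jobs:
    my_job:
      job_clusters:
        - job_cluster_key: pooled
          new_cluster:
            policy_id: ${var.job_pool_policy_id}
            instance_pool_id: ${var.worker_pool_id}
            driver_instance_pool_id: ${var.driver_pool_id}
            spark_version: 15.4.x-scala2.12
            num_workers: 2
```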

1 More Replies
tana_sakakimiya
by Contributor
  • 287 Views
  • 1 reply
  • 1 kudos

Resolved! Any Advice on Dynamic Masking while maintaining performance?

I plan to mask columns with a specific tag like "sensitive" or "PII", which indicates that the column values should only be seen by privileged user groups because they contain credentials or personal identity data. To implement that, I plan to create a ...

Latest Reply
saurabh18cs
Honored Contributor II
  • 1 kudos

Hi @tana_sakakimiya, your approach of using Unity Catalog column tags (like "sensitive" or "PII") and applying masking policies based on those tags is a recommended and scalable way to manage data access in Databricks, especially for compliance and priva...
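A minimal sketch of the tag-driven wiring under those assumptions (the catalog, schema, and group names are placeholders); it looks up tagged columns via the catalog's information_schema.column_tags view and attaches a Unity Catalog column mask to each:

```
# Placeholder names throughout (main.sec, pii_readers).
spark.sql("""
  CREATE OR REPLACE FUNCTION main.sec.mask_string(val STRING)
  RETURN CASE WHEN is_account_group_member('pii_readers') THEN val
              ELSE '****' END
""")

# Every column tagged sensitive/PII gets the mask attached.
tagged = spark.sql("""
  SELECT catalog_name, schema_name, table_name, column_name
  FROM main.information_schema.column_tags
  WHERE tag_name IN ('sensitive', 'PII')
""").collect()

for r in tagged:
    spark.sql(
        f"ALTER TABLE {r.catalog_name}.{r.schema_name}.{r.table_name} "
        f"ALTER COLUMN {r.column_name} SET MASK main.sec.mask_string"
    )
```

Because the mask is a simple CASE over is_account_group_member, it is evaluated at query time and tends to preserve filter pushdown better than rewriting the data itself.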

shweta_m
by New Contributor III
  • 235 Views
  • 3 replies
  • 2 kudos

Resolved! Assigning Databricks Account Admin role to User group

Hi, As per our company policy, individual users should not be given elevated privileges. Permissions should be assigned to user groups so that group membership can be managed at the AD level. In that context, is there a way to assign the 'Databricks A...

Latest Reply
shweta_m
New Contributor III
  • 2 kudos

Hi @szymon_dybczak, I tried this and it worked. Thanks!
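The suggestion that worked isn't quoted in this excerpt; for reference, a hedged sketch of one documented route, assigning the account_admin role to a group through the account-level SCIM Groups API (account ID, group ID, token, and host are placeholders; Azure uses accounts.azuredatabricks.net):

```
import requests

ACCOUNT_HOST = "https://accounts.cloud.databricks.com"  # placeholder host
account_id, group_id, token = "<account-id>", "<group-id>", "<oauth-token>"

# SCIM PatchOp adding the account_admin role to the whole group.
resp = requests.patch(
    f"{ACCOUNT_HOST}/api/2.0/accounts/{account_id}/scim/v2/Groups/{group_id}",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
        "Operations": [
            {"op": "add", "path": "roles", "value": [{"value": "account_admin"}]}
        ],
    },
)
resp.raise_for_status()
```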

2 More Replies
Travis84
by New Contributor II
  • 173 Views
  • 1 reply
  • 1 kudos

Which table should I use for a range join hint?

I am a bit confused about how to use range join hints. Consider the following query:
```
SELECT
  p.id,
  p.ts,
  p.value,
  rg.metric1,
  rg.metric2,
  rg.ts AS range_ts
FROM points p
LEFT JOIN LATERAL (
  SELECT r.metric1, r.metric2, r.ts
  FROM ranges r
  WHE...
```

Latest Reply
K_Anudeep
Databricks Employee
  • 1 kudos

Hello @Travis84, below are the answers to your questions. Where to put the hint? On either one of the two relations that participate in the range join for that specific join block. In simple two-table queries, it doesn't matter. In multi-join querie...
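To make that concrete, a small sketch using the RANGE_JOIN hint (the bin size of 10 and the range columns are placeholders; per the answer above, the hint may name either relation in that join block):

```
# The second argument is the bin size; tune it to the typical range width.
df = spark.sql("""
  SELECT /*+ RANGE_JOIN(r, 10) */
    p.id, p.ts, r.metric1, r.metric2
  FROM points p
  JOIN ranges r
    ON p.ts BETWEEN r.start_ts AND r.end_ts
""")
```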

youssefmrini
by Databricks Employee
  • 5844 Views
  • 2 replies
  • 0 kudos
Latest Reply
idtaylor
New Contributor II
  • 0 kudos

No, you cannot. The demo video was from 2023; Alation no longer appears in the Databricks Marketplace, and Alation no longer offers free trials, instead having you fill out a request for a demo, which is not ideal for proving out practical functionality.

1 More Replies
TetianaDromova
by New Contributor
  • 1348 Views
  • 1 reply
  • 1 kudos

PGP decryption in a Python file

The same decryption code works in a notebook, but fails in a Python file:
```
import gnupg
from pyspark.dbutils import DBUtils
dbutils = DBUtils(spark)
gpg = gnupg.GPG()
decryption_key = dbutils.secrets.get(secret_scope, secret_name)
gpg.import_keys(decryption_ke...
```

Latest Reply
mmayorga
Databricks Employee
  • 1 kudos

Hi @TetianaDromova, thank you for reaching out, and thanks for waiting for a response. Having your code working in a notebook is a significant first step, so you are on the right path; but when moving into a Python file, we must consider specific details: How is...
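A sketch of the usual notebook-to-file gotchas, under the assumption that the failure comes from the implicit notebook globals and the GnuPG home directory (the scope/key names and the path are placeholders):

```
import os
import gnupg
from pyspark.sql import SparkSession
from pyspark.dbutils import DBUtils

# Notebooks inject `spark` and `dbutils`; a standalone Python file must build them.
spark = SparkSession.builder.getOrCreate()
dbutils = DBUtils(spark)

# Assumption: give gnupg an explicit, writable home directory on the driver.
gnupg_home = "/tmp/gnupg"
os.makedirs(gnupg_home, exist_ok=True)
gpg = gnupg.GPG(gnupghome=gnupg_home)

decryption_key = dbutils.secrets.get("my_scope", "pgp_private_key")  # placeholders
import_result = gpg.import_keys(decryption_key)
print(import_result.count)  # confirm the key actually imported before decrypting
```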

Phani1
by Valued Contributor II
  • 348 Views
  • 3 replies
  • 1 kudos

Available Connectors for ServiceNow to Databricks

Hi Team, What are the available connectors to bring data and metadata from ServiceNow to Databricks, and what are the best options/best practices for integrating ServiceNow with Databricks? Regards, Phani

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @Phani1, there's an official Databricks ServiceNow connector. You can read about it at the link below: Configure ServiceNow for Databricks ingestion - Azure Databricks | Microsoft Learn. And here you have an example of how to create an ingestion pipeline using ...
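As a rough sketch of the first step only: the managed ingestion connector sits on top of a Unity Catalog connection, but the connection type and option names below are assumptions, so verify them against the linked documentation before use:

```
# Option names are assumptions; check the ServiceNow ingestion docs.
spark.sql("""
  CREATE CONNECTION IF NOT EXISTS servicenow_conn TYPE servicenow
  OPTIONS (
    host 'https://my-instance.service-now.com',  -- placeholder instance
    client_id secret('my-scope', 'snow-client-id'),
    client_secret secret('my-scope', 'snow-client-secret')
  )
""")
```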

2 More Replies
khasim76
by New Contributor II
  • 245 Views
  • 3 replies
  • 1 kudos
Labels: Data Engineering, community edition, json, sql
Latest Reply
Khaja_Zaffer
Contributor III
  • 1 kudos

Hello @khasim76, good day! Can I know what error message you are getting? Is it the legacy Community Edition or the new Free Edition? Also, there are some limitations, like:
- Table results: ≤10,000 rows or 2 MB (whichever comes first).
- Text outputs: Trunc...

2 More Replies
seefoods
by Valued Contributor
  • 1267 Views
  • 10 replies
  • 4 kudos

Resolved! Append using foreachBatch with Auto Loader

Hello guys, when I append I get this error; does someone know how to fix it? raise converted from None pyspark.errors.exceptions.captured.AnalysisException: [TABLE_OR_VIEW_ALREADY_EXISTS] Cannot create table or view `s_test` because it already exists. Ch...

Latest Reply
seefoods
Valued Contributor
  • 4 kudos

Thanks a lot @szymon_dybczak 
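The accepted fix isn't quoted in this excerpt; a common pattern that avoids TABLE_OR_VIEW_ALREADY_EXISTS is to append inside foreachBatch rather than re-creating the table on every micro-batch (the paths and table name here are placeholders):

```
def append_batch(batch_df, batch_id):
    # Append to the existing table instead of creating it each micro-batch.
    batch_df.write.format("delta").mode("append").saveAsTable("s_test")

(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .load("/Volumes/main/raw/landing")                       # placeholder path
    .writeStream
    .foreachBatch(append_batch)
    .option("checkpointLocation", "/Volumes/main/raw/_chk")  # placeholder path
    .trigger(availableNow=True)
    .start())
```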

9 More Replies
Malthe
by Contributor II
  • 1017 Views
  • 1 reply
  • 2 kudos

Resolved! Trigger table sync from job

When setting up a table sync using the UI, a pipeline is created, but it is not visible in the Pipelines overview, presumably because it's "managed" by the target table (at least this is where you manage the data ingest process). This means that ...

Latest Reply
Saritha_S
Databricks Employee
  • 2 kudos

Hi @Malthe, the recommended method is to manage and trigger the sync via table-level APIs or management interfaces, not pipeline-level job triggers. For Unity Catalog synced tables (e.g., syncing to Postgres), triggering a sync or refresh is per...

Balram-snaplogi
by New Contributor II
  • 655 Views
  • 1 reply
  • 0 kudos

Resolved! Clarification on Default Socket Timeout Behavior Introduced in v2.6.35

Hi Team,I’m currently using the Databricks JDBC driver version 2.6.40, and I’ve noticed intermittent socket timeout errors during pipeline execution.I came across the release note for version 2.6.35 mentioning the following change:[SPARKJ-688] The co...

Latest Reply
Saritha_S
Databricks Employee
  • 0 kudos

Hi @Balram-snaplogi, the default socket timeout introduced in Databricks JDBC driver version 2.6.35 is 300 seconds (5 minutes). Please refer to the KB below: https://kb.databricks.com/dbsql/job-timeout-when-connecting-to-a-sql-endpoint-over-jd...
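For anyone hitting the same thing, a sketch of a connection URL that raises the timeout (host, HTTP path, and token are placeholders, and the SocketTimeout property name should be confirmed in the driver's install guide for your version):

```
# SocketTimeout is specified in seconds (assumption based on the driver docs).
url = (
    "jdbc:databricks://adb-1234567890123456.7.azuredatabricks.net:443/default;"
    "transportMode=http;ssl=1;"
    "httpPath=/sql/1.0/warehouses/abc123;"
    "AuthMech=3;UID=token;PWD=<personal-access-token>;"
    "SocketTimeout=600"
)
```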

bhargavabasava
by New Contributor III
  • 278 Views
  • 1 reply
  • 0 kudos

Resolved! Establish connectivity from databricks serverless to Cloud SQL (GCP) Postgres database

Hi team, we have set up a workspace using a Databricks-managed network (Databricks is set up on GCP). There is a Cloud SQL DB in a customer-managed network. We want to establish connectivity from Databricks serverless to the database. Could someone please tell di...

Latest Reply
Saritha_S
Databricks Employee
  • 0 kudos

Hi @bhargavabasava, can you please refer to the doc below for complete details: https://docs.databricks.com/gcp/en/security/network/classic/private-service-connect

