Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Data_NXT
by New Contributor III
  • 1024 Views
  • 3 replies
  • 5 kudos

Resolved! Databricks Business dashboards - Interactive cluster Total dollar spent

I'm working on Databricks Business Dashboards and trying to calculate interactive cluster compute time and total dollar spend per workspace. As per standard understanding, the total dollar spent = Interactive Clusters + Job Clusters + SQL Warehouses. I...

Latest Reply
nayan_wylde
Esteemed Contributor II
  • 5 kudos

Also, the system table will not provide you the exact dollar amount that you spend on interactive compute. Here is the cost breakdown for running interactive compute:
Component | Description | Cost Source
DBU Cost | Based on workload type and tier | Databricks
V...
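The reply's point is that the total cost has two components billed by two parties. A minimal sketch of that arithmetic, assuming made-up placeholder rates (the numbers below are NOT real Databricks or cloud pricing):

```python
# Sketch: total interactive-compute cost = DBU cost (Databricks) + VM cost (cloud provider).
# Both rates below are hypothetical placeholders, not real pricing.

HYPOTHETICAL_DBU_RATE = 0.55   # $ per DBU for all-purpose (interactive) compute
HYPOTHETICAL_VM_RATE = 0.40    # $ per VM-hour, billed separately by the cloud provider

def interactive_compute_cost(dbus_consumed: float, vm_hours: float) -> float:
    """Combine the Databricks DBU charge with the cloud provider's VM charge."""
    dbu_cost = dbus_consumed * HYPOTHETICAL_DBU_RATE
    vm_cost = vm_hours * HYPOTHETICAL_VM_RATE
    return round(dbu_cost + vm_cost, 2)

print(interactive_compute_cost(100.0, 20.0))  # 100*0.55 + 20*0.40 = 63.0
```

This is also why the billing system table alone undercounts: it reports DBU usage, while the VM portion lives in the cloud provider's bill.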

2 More Replies
Sunil_Poluri
by Databricks Partner
  • 1710 Views
  • 1 reply
  • 1 kudos

Resolved! Unexpected Schema ID Folder Creation in Unity Catalog External Location

I've set up Unity Catalog with an external location pointing to a storage account. For each schema, I've configured a dedicated container path. For example: abfss://schemas@<storage_account>.dfs.core.windows.net/_unityStorage/schemas/<schema_id> When I...

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Hey @Sunil_Poluri , I did some research (learned a few things) and here is what I found.  Unity Catalog manages cloud storage mapping for schemas using internal IDs (schema_id) to ensure data isolation, governance, and uniqueness within a metastore—e...

Anubhav2011
by New Contributor II
  • 1428 Views
  • 5 replies
  • 4 kudos

What is the Power of DLT Pipeline to read streaming data

I am getting thousands of records every second in my bronze table from Qlik, and every second the bronze table is truncated and loaded with new data by Qlik itself. How do I process this much data every second into my silver streaming table before...

Latest Reply
Krishna_S
Databricks Employee
  • 4 kudos

The Apply Changes API is getting deprecated. The AUTO CDC APIs replace the APPLY CHANGES APIs, and have the same syntax. The APPLY CHANGES APIs are still available, but Databricks recommends using the AUTO CDC APIs in their place. Please refer to the...
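As a rough illustration of what both APPLY CHANGES and its AUTO CDC replacement do under the hood, here is a pure-Python sketch of upsert-by-latest-sequence semantics. This is illustrative logic only, not the DLT API; the field names are invented:

```python
# Sketch of CDC upsert semantics: keep the latest change per key (by a
# sequence column) and drop keys whose latest operation is a delete.
# Illustrative only -- not the actual AUTO CDC / APPLY CHANGES implementation.

def apply_changes(changes):
    """changes: list of dicts with 'key', 'seq', 'op' ('upsert'|'delete'), 'value'."""
    latest = {}
    for row in changes:
        cur = latest.get(row["key"])
        if cur is None or row["seq"] > cur["seq"]:
            latest[row["key"]] = row
    # Materialize the target table: only keys whose latest op is not a delete.
    return {k: r["value"] for k, r in latest.items() if r["op"] != "delete"}

changes = [
    {"key": 1, "seq": 1, "op": "upsert", "value": "a"},
    {"key": 1, "seq": 2, "op": "upsert", "value": "b"},
    {"key": 2, "seq": 1, "op": "upsert", "value": "c"},
    {"key": 2, "seq": 2, "op": "delete", "value": None},
]
print(apply_changes(changes))  # {1: 'b'}
```

Out-of-order arrival is handled by the sequence comparison, which is the same role the sequence-by column plays in the DLT APIs.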

4 More Replies
gayatrikhatale
by Databricks Partner
  • 1394 Views
  • 3 replies
  • 5 kudos

Resolved! How to stream data from azure event hub to databricks delta table

Hi, I want to stream data from Azure Event Hub to a Databricks table. But I want to use service principal details for that, not the Event Hub connection string. Can anyone please share a code snippet? Thank you!

Latest Reply
gayatrikhatale
Databricks Partner
  • 5 kudos

Thank you @szymon_dybczak, it's working for me. I have also found one more way to do the same thing. Below is the code snippet:
from azure.identity import DefaultAzureCredential
from azure.eventhub import EventHubConsumerClient
# Replace with your Eve...

2 More Replies
StephanK8
by Databricks Partner
  • 2128 Views
  • 2 replies
  • 0 kudos

Updates of Materialized Views in Lakeflow Pipelines Produce MetadataChangedException en masse

Hi, We've set up materialized views (as dlt.table()) for something like 300 tables in a single Lakeflow pipeline. The pipeline is triggered externally by a workflow job (to run twice a day). Running the pipeline, we experience something strange. A larg...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

Workarounds & Recommendations:
  • Limit Pipeline Parallelism: Modify the pipeline's configuration to reduce the maximum concurrency for DLT task execution, forcing more serialized or grouped updates.
  • Restructure Pipeline Graph: Instead of 300+ separate...

1 More Replies
LarsMewa
by New Contributor III
  • 1028 Views
  • 4 replies
  • 1 kudos

Resolved! Databricks Jobs & Pipelines: Serverless SparkOutOfMemoryError while reading a 500 MB JSON file

I'm getting the following SparkOutOfMemoryError message while reading a 500 MB JSON file, see below. I'm loading four CSV files (around 150 MB per file) and the JSON file in the same pipeline. When I load the JSON file alone it reads fine, same when I ...

Latest Reply
LarsMewa
New Contributor III
  • 1 kudos

This fixed it: As a quick workaround to address out-of-memory errors when processing large JSON files in Databricks serverless pipelines, we recommend disabling the Photon JSON scan. The Photon engine is optimized for performance, but scanning large J...
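The reply is cut off before it names the exact setting. As a hypothetical sketch only (the conf key below is an assumption, not taken from the thread or official docs; verify the real key with Databricks before relying on it), disabling an engine feature via a Spark conf generally looks like:

```python
# HYPOTHETICAL: the conf key below is an assumed placeholder, not a
# confirmed Databricks setting -- check the full thread or docs for the real one.
spark.conf.set("spark.databricks.photon.jsonScan.enabled", "false")
```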

3 More Replies
tana_sakakimiya
by Contributor
  • 970 Views
  • 1 reply
  • 1 kudos

Resolved! Any Advice on Dynamic Masking while maintaining performance?

I plan to mask columns with a specific tag like "sensitive" or "PII", which indicates that the column values should only be seen by privileged user groups because they contain credentials or personal identity data. To implement that, I plan to create a ...

Latest Reply
saurabh18cs
Honored Contributor III
  • 1 kudos

Hi @tana_sakakimiya Your approach—using Unity Catalog column tags (like "sensitive" or "PII") and applying masking policies based on those tags—is a recommended and scalable way to manage data access in Databricks, especially for compliance and priva...
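The tag-driven policy described in the thread can be sketched in plain Python to show the intended semantics. This illustrates the logic only, not the Unity Catalog API (in UC you would implement it with SQL masking functions attached to columns); the tag and group names are example placeholders:

```python
# Sketch of tag-driven column masking: values in columns tagged "sensitive"
# or "PII" are visible only to privileged groups. Illustrative logic only --
# not Unity Catalog; all names are example placeholders.

SENSITIVE_TAGS = {"sensitive", "PII"}
PRIVILEGED_GROUPS = {"pii_readers"}

def mask_value(value, column_tags, user_groups):
    """Return the real value for privileged users, a mask otherwise."""
    needs_mask = bool(SENSITIVE_TAGS & set(column_tags))
    if needs_mask and not (PRIVILEGED_GROUPS & set(user_groups)):
        return "****"
    return value

print(mask_value("123-45-6789", ["PII"], ["analysts"]))     # ****
print(mask_value("123-45-6789", ["PII"], ["pii_readers"]))  # 123-45-6789
print(mask_value("hello", [], ["analysts"]))                # hello
```

For performance, the key point is that this branch is evaluated per column policy, not per row lookup against an external service, which is why tag-based masking functions tend to scale well.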

shweta_m
by New Contributor III
  • 577 Views
  • 3 replies
  • 2 kudos

Resolved! Assigning Databricks Account Admin role to User group

Hi, As per our company policy, individual users should not be given elevated privileges. Permissions should be assigned to user groups, so that group membership can be managed at the AD level. In that context, is there a way to assign the 'Databricks A...

Latest Reply
shweta_m
New Contributor III
  • 2 kudos

Hi @szymon_dybczak, I tried this and it worked. Thanks!

2 More Replies
Travis84
by New Contributor II
  • 516 Views
  • 1 reply
  • 1 kudos

Which table should i use for a range join hint?

I am a bit confused about how to use range join hints. Consider the following query:
```
SELECT
  p.id,
  p.ts,
  p.value,
  rg.metric1,
  rg.metric2,
  rg.ts AS range_ts
FROM points p
LEFT JOIN LATERAL (
  SELECT r.metric1, r.metric2, r.ts
  FROM ranges r
  WHE...
```

Latest Reply
K_Anudeep
Databricks Employee
  • 1 kudos

Hello @Travis84, below are the answers to your questions. Where to put the hint? On either one of the two relations that participate in the range join for that specific join block. In simple two-table queries, it doesn't matter. In multi-join querie...
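To see why the hint helps at all, here is a pure-Python sketch of the bin-based strategy a range join optimization enables: index each interval into fixed-size bins, then probe only the bin containing each point instead of scanning every interval. This is illustrative, not Spark's implementation, and the bin size of 10 is an arbitrary example (analogous to the hint's numeric parameter):

```python
from collections import defaultdict

BIN = 10  # bin size; plays the role of the hint's numeric parameter

def build_index(ranges):
    """Index half-open intervals (start, end, payload) into fixed-size bins."""
    index = defaultdict(list)
    for start, end, payload in ranges:
        for b in range(start // BIN, (end - 1) // BIN + 1):
            index[b].append((start, end, payload))
    return index

def range_join(points, ranges):
    """Join each point to every interval containing it, probing one bin per point."""
    index = build_index(ranges)
    out = []
    for p in points:
        for start, end, payload in index[p // BIN]:
            if start <= p < end:
                out.append((p, payload))
    return out

ranges = [(0, 15, "r1"), (12, 30, "r2")]
points = [5, 13, 40]
print(range_join(points, ranges))  # [(5, 'r1'), (13, 'r1'), (13, 'r2')]
```

The trade-off mirrors the hint's parameter: smaller bins mean fewer candidate intervals per probe but a larger index, which is why picking a bin size close to the typical range width matters.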

youssefmrini
by Databricks Employee
  • 6157 Views
  • 2 replies
  • 0 kudos
Latest Reply
idtaylor
New Contributor II
  • 0 kudos

No, you cannot. The demo video was from 2023, Alation no longer appears in the Databricks Marketplace, and Alation no longer allows free trials, instead having you fill out a request for a demo, which is not ideal for proving out practical functionality.

1 More Replies
TetianaDromova
by Databricks Partner
  • 1829 Views
  • 1 reply
  • 1 kudos

PGP decryption in python file

The same decryption code works in a notebook, but fails in a Python file:
import gnupg
from pyspark.dbutils import DBUtils
dbutils = DBUtils(spark)
gpg = gnupg.GPG()
decryption_key = dbutils.secrets.get(secret_scope, secret_name)
gpg.import_keys(decryption_ke...

Latest Reply
mmayorga
Databricks Employee
  • 1 kudos

Hi @TetianaDromova, thank you for reaching out, and for your patience. Having your code working in a notebook is a significant first step, so you are on the right path; but when moving to a Python file, we must consider specific details: How is...

Phani1
by Databricks MVP
  • 2939 Views
  • 3 replies
  • 1 kudos

Available Connectors for ServiceNow to Databricks

Hi Team, What are the available connectors to bring data and metadata from ServiceNow to Databricks, and what are the best options/best practices for integrating ServiceNow with Databricks? Regards, Phani

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @Phani1, there's an official Databricks ServiceNow connector. You can read about it at the link below: Configure ServiceNow for Databricks ingestion - Azure Databricks | Microsoft Learn. And here you have an example of how to create an ingestion pipeline using ...

2 More Replies
khasim76
by New Contributor II
  • 529 Views
  • 3 replies
  • 1 kudos
Data Engineering
community edition
json
sql
Latest Reply
Khaja_Zaffer
Esteemed Contributor
  • 1 kudos

Hello @khasim76, good day! Can I know what error message you are getting? Is it the legacy Community Edition or the free Community Edition? Also, there are some limitations, like: - Table results: ≤10,000 rows or 2 MB (whichever comes first). - Text outputs: Trunc...

2 More Replies
seefoods
by Valued Contributor
  • 3072 Views
  • 10 replies
  • 4 kudos

Resolved! append using foreach batch autoloader

Hello guys, when I append I get this error. Does someone know how to fix it? raise converted from None pyspark.errors.exceptions.captured.AnalysisException: [TABLE_OR_VIEW_ALREADY_EXISTS] Cannot create table or view `s_test` because it already exists. Ch...

Latest Reply
seefoods
Valued Contributor
  • 4 kudos

Thanks a lot @szymon_dybczak 

9 More Replies
Malthe
by Valued Contributor II
  • 2381 Views
  • 1 reply
  • 2 kudos

Resolved! Trigger table sync from job

When setting up a table sync using the UI, a pipeline is created, but this is not visible through the Pipelines overview, presumably because it's "managed" by the target table (at least this is where you manage the data ingest process). This means that ...

Latest Reply
Saritha_S
Databricks Employee
  • 2 kudos

Hi @Malthe, the recommended method is to manage and trigger the sync via table-level APIs or management interfaces, not the pipeline-level job triggers: For Unity Catalog synced tables (e.g., syncing to Postgres), triggering a sync or refresh is per...
