Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

cz0
by Databricks Partner
  • 2578 Views
  • 2 replies
  • 1 kudos

Monitoring structured streaming and Log4J properties

Hi guys, I would like to monitor a streaming job on metrics like delay, processing time, and more. I found this documentation, but I get messages in the starting and terminating phases and not while I process a record. The job is a pretty easy streaming which ...

Latest Reply
saurabh18cs
Honored Contributor III
  • 1 kudos

Hi @cz0, the StreamingQueryListener in Spark is designed to give you metrics at the micro-batch level (not per individual record), which is typical for Spark Structured Streaming:
  • onQueryStarted: called when the streaming job starts.
  • onQueryProgress: Ca...
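The per-batch delay and throughput numbers the question asks about are already in the progress payload delivered to onQueryProgress. A pure-Python sketch of reading them (field names follow the StreamingQueryProgress documentation; the dict below is illustrative, not captured from a real run):

```python
import json

# Illustrative progress payload, shaped like StreamingQueryProgress JSON.
progress_json = json.dumps({
    "batchId": 42,
    "numInputRows": 1000,
    "inputRowsPerSecond": 250.0,
    "processedRowsPerSecond": 500.0,
    "durationMs": {"triggerExecution": 2000, "addBatch": 1500},
})

def batch_metrics(progress: str) -> dict:
    """Extract the delay/throughput numbers a monitoring sink would record."""
    p = json.loads(progress)
    return {
        "batch_id": p["batchId"],
        "input_rows": p["numInputRows"],
        "processing_ms": p["durationMs"]["triggerExecution"],
        "rows_per_sec": p["processedRowsPerSecond"],
    }

metrics = batch_metrics(progress_json)
```

In a real listener, the same extraction would live inside `onQueryProgress`, pushing the dict to whatever metrics sink you use.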

1 More Replies
jeff2
by New Contributor
  • 1746 Views
  • 1 reply
  • 1 kudos

Resolved! When embedding redash, is it possible to make it visible without an account?

Same as the title. I created a redash in Databricks and want to embed it to show it in another portal. However, other users have accounts in the portal but not in Databricks. In this case, is it possible to show Redash to all these users?

Latest Reply
mark_ott
Databricks Employee
  • 1 kudos

It is possible to embed a Redash dashboard created in Databricks into another portal so that users without Databricks accounts can view it, but this requires specific setup and permission management. How embedding works: to show the dashboard to users...

TinaDouglass
by New Contributor
  • 1484 Views
  • 3 replies
  • 1 kudos

Resolved! Summarized Data from Source system into Bronze

Hello, we are just starting with Databricks. Quick question: we have a table in our legacy source system that summarizes values that are used on legacy reports and used for payment in our legacy system. The business wants a dashboard on our new plat...

Latest Reply
Advika
Community Manager
  • 1 kudos

Hello @TinaDouglass! Did the suggestions shared above help address your concern? If so, please consider marking one as the accepted solution.

2 More Replies
juanjomendez96
by Contributor
  • 2539 Views
  • 16 replies
  • 4 kudos

Resolved! Control Databricks Platform version

Hello there! I have noticed that my Databricks UI changes from time to time. I was wondering how I can control the Databricks Platform version so I don't keep getting new changes and new ways/names in my UI. I have found a release page https:...

Latest Reply
georgeb
New Contributor II
  • 4 kudos

Hi, can we get official feedback on when the issue with adding new users/groups in the Databricks Apps UI (not working) will be fixed? I tried with the Python SDK as well and it does not work. The issue was posted previously and applies to my case as well....

15 More Replies
y_sanjay
by New Contributor
  • 2800 Views
  • 2 replies
  • 2 kudos

Temporary view

Hi, I wrote a query to create a temp view in my catalog; query execution was successful and returned the result 'OK' in the SQL editor window. However, when I executed the commands 'Show Tables' and 'Select * {temp_view}', it's not identifying the view. W...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 2 kudos

Hi @y_sanjay, I guess that is somehow related to how sessions are managed within the SQL Editor. For instance, when I ran the following queries in the SQL Editor all at once, it worked and I got 3 result sets: 1) First result set with OK status - which means tha...
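The session-scoping behavior described here is not unique to Databricks. As an analogy (SQLite, not Databricks code), a TEMP view created on one connection is invisible to a second connection against the same database file:

```python
import os
import sqlite3
import tempfile

# Two connections ("sessions") to the same database file.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn_a = sqlite3.connect(path)
conn_b = sqlite3.connect(path)

# A permanent table is visible to both connections...
conn_a.execute("CREATE TABLE t (x INT)")
conn_a.commit()

# ...but a TEMP view exists only in the connection that created it.
conn_a.execute("CREATE TEMP VIEW v AS SELECT 1 AS x")
in_a = conn_a.execute("SELECT x FROM v").fetchone()  # (1,)

try:
    conn_b.execute("SELECT x FROM v")
    in_b = True
except sqlite3.OperationalError:  # "no such table: v"
    in_b = False
```

The same logic explains the SQL Editor behavior: the CREATE and the SELECT must run in the same session for the temp view to resolve.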

1 More Replies
Data_NXT
by New Contributor III
  • 1072 Views
  • 3 replies
  • 5 kudos

Resolved! Databricks Business dashboards - Interactive cluster Total dollar spent

I'm working on Databricks Business Dashboards and trying to calculate interactive cluster compute time and total dollar spend per workspace. As per standard understanding, the total dollar spent = Interactive Clusters + Job Clusters + SQL Warehouses. I...

Latest Reply
nayan_wylde
Esteemed Contributor II
  • 5 kudos

Also, the system table will not provide you the exact dollar amount that you spend on interactive compute. Here is the cost breakdown for running interactive compute:

Component | Description                     | Cost Source
DBU Cost  | Based on workload type and tier | Databricks
V...
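A hedged arithmetic sketch of that breakdown: the DBU portion is what the billing system tables report, while the VM portion comes from the cloud provider's bill. The rates below are placeholders, not real prices:

```python
def interactive_cost(dbus: float, dbu_rate: float,
                     vm_hours: float, vm_hourly_price: float) -> float:
    """Total spend = Databricks DBU charge + cloud provider VM charge."""
    return dbus * dbu_rate + vm_hours * vm_hourly_price

# Illustrative numbers only: 120 DBUs at $0.55/DBU plus 40 VM-hours at $0.90/h.
total = interactive_cost(dbus=120.0, dbu_rate=0.55,
                         vm_hours=40.0, vm_hourly_price=0.90)
# 120 * 0.55 = 66.0 and 40 * 0.90 = 36.0, so total = 102.0
```

The point of the split is that a dashboard built only on the DBU side will systematically understate the true cost of interactive compute.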

2 More Replies
Sunil_Poluri
by Databricks Partner
  • 1781 Views
  • 1 reply
  • 1 kudos

Resolved! Unexpected Schema ID Folder Creation in Unity Catalog External Location

I've set up Unity Catalog with an external location pointing to a storage account. For each schema, I've configured a dedicated container path. For example: abfss://schemas@<storage_account>.dfs.core.windows.net/_unityStorage/schemas/<schema_id> When I...

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Hey @Sunil_Poluri, I did some research (learned a few things) and here is what I found. Unity Catalog manages cloud storage mapping for schemas using internal IDs (schema_id) to ensure data isolation, governance, and uniqueness within a metastore—e...

Anubhav2011
by New Contributor II
  • 1568 Views
  • 5 replies
  • 4 kudos

What is the Power of DLT Pipeline to read streaming data

I am getting thousands of records every second in my bronze table from Qlik, and every second the bronze table is truncated and loaded with new data by Qlik itself. How do I process this much data every second to my silver streaming table before...

Latest Reply
Krishna_S
Databricks Employee
  • 4 kudos

The Apply Changes API is getting deprecated. The AUTO CDC APIs replace the APPLY CHANGES APIs, and have the same syntax. The APPLY CHANGES APIs are still available, but Databricks recommends using the AUTO CDC APIs in their place. Please refer to the...
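A hedged sketch of what that rename looks like in a pipeline definition, assuming the create_auto_cdc_flow name from the current DLT Python API docs. The dlt module only exists inside a Databricks pipeline, so the call is guarded here, and the table and column names are illustrative:

```python
import importlib.util

# Illustrative CDC arguments: same shape for apply_changes and the AUTO CDC API.
cdc_args = dict(
    target="silver_orders",   # streaming table to keep up to date
    source="bronze_orders",   # CDC feed landed by the source system
    keys=["order_id"],        # primary key used for upserts
    sequence_by="event_ts",   # ordering column for out-of-order events
)

if importlib.util.find_spec("dlt"):  # only true inside a Databricks pipeline
    import dlt
    dlt.create_streaming_table(cdc_args["target"])
    # Replaces the deprecated dlt.apply_changes(**cdc_args) call.
    dlt.create_auto_cdc_flow(**cdc_args)
```

Because the two APIs share a syntax, migration is mostly a matter of swapping the function name once you are on a runtime that ships AUTO CDC.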

4 More Replies
gayatrikhatale
by Databricks Partner
  • 1481 Views
  • 3 replies
  • 5 kudos

Resolved! How to stream data from azure event hub to databricks delta table

Hi, I want to stream data from Azure Event Hubs to a Databricks table, but I want to use service principal details for that, not the Event Hubs connection string. Can anyone please share a code snippet? Thank you!

Latest Reply
gayatrikhatale
Databricks Partner
  • 5 kudos

Thank you @szymon_dybczak, it's working for me. I have also found one more way to do the same thing. Below is the code snippet:

from azure.identity import DefaultAzureCredential
from azure.eventhub import EventHubConsumerClient
# Replace with your Eve...
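For reference, a fuller (hedged) sketch of the same idea: EventHubConsumerClient accepts an Entra ID credential in place of a connection string. The namespace and hub names are placeholders, and the azure packages are imported lazily so the sketch degrades gracefully where they are not installed:

```python
def fq_namespace(namespace: str) -> str:
    """Build the fully qualified Event Hubs namespace host name."""
    return f"{namespace}.servicebus.windows.net"

try:
    from azure.identity import DefaultAzureCredential
    from azure.eventhub import EventHubConsumerClient

    # DefaultAzureCredential picks up the service principal from the
    # AZURE_CLIENT_ID / AZURE_TENANT_ID / AZURE_CLIENT_SECRET env vars.
    client = EventHubConsumerClient(
        fully_qualified_namespace=fq_namespace("my-namespace"),  # placeholder
        eventhub_name="my-hub",                                  # placeholder
        consumer_group="$Default",
        credential=DefaultAzureCredential(),
    )
except ImportError:
    client = None  # azure packages not installed in this environment
```

From there, `client.receive(...)` consumes events without any connection string being stored.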

2 More Replies
StephanK8
by Databricks Partner
  • 2208 Views
  • 2 replies
  • 0 kudos

Updates of Materialized Views in Lakeflow Pipelines Produce MetadataChangedException en masse

Hi, we've set up materialized views (as dlt.table()) for something like 300 tables in a single Lakeflow pipeline. The pipeline is triggered externally by a workflow job (to run twice a day). Running the pipeline, we experience something strange. A larg...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

Workarounds & recommendations:
  • Limit pipeline parallelism: modify the pipeline's configuration to reduce the maximum concurrency for DLT task execution, forcing more serialized or grouped updates.
  • Restructure the pipeline graph: instead of 300+ separate...

1 More Replies
LarsMewa
by New Contributor III
  • 1094 Views
  • 4 replies
  • 1 kudos

Resolved! Databricks Jobs & Pipelines: Serverless SparkOutOfMemoryError while reading 500 MB JSON file

I'm getting the following SparkOutOfMemoryError message while reading a 500 MB JSON file (see below). I'm loading four CSV files (around 150 MB per file) and the JSON file in the same pipeline. When I load the JSON file alone it reads fine, same when I ...

Latest Reply
LarsMewa
New Contributor III
  • 1 kudos

This fixed it: as a quick workaround to address out-of-memory errors when processing large JSON files in Databricks serverless pipelines, we recommend disabling the Photon JSON scan. The Photon engine is optimized for performance, but scanning large J...

3 More Replies
tana_sakakimiya
by Contributor
  • 1059 Views
  • 1 reply
  • 1 kudos

Resolved! Any Advice on Dynamic Masking while maintaining performance?

I plan to mask columns with a specific tag like "sensitive" or "PII", which indicates that the column values should only be seen by privileged user groups because they contain credentials or personal identity data. To implement that, I plan to create a ...

Latest Reply
saurabh18cs
Honored Contributor III
  • 1 kudos

Hi @tana_sakakimiya, your approach, using Unity Catalog column tags (like "sensitive" or "PII") and applying masking policies based on those tags, is a recommended and scalable way to manage data access in Databricks, especially for compliance and priva...
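One way to sketch the tag-driven approach is to generate the mask function and the per-column ALTER statements as SQL strings (Python here; is_account_group_member is the built-in Databricks SQL group check, while the catalog, group, and table names are made up for illustration):

```python
# Hypothetical mask function: privileged group sees the value, everyone
# else sees a constant. Catalog/schema/group names are placeholders.
MASK_FN_DDL = """
CREATE OR REPLACE FUNCTION main.governance.mask_pii(val STRING)
RETURN CASE
  WHEN is_account_group_member('pii_readers') THEN val
  ELSE '***MASKED***'
END
""".strip()

def set_mask_ddl(table: str, column: str,
                 mask_fn: str = "main.governance.mask_pii") -> str:
    """Emit the ALTER TABLE statement that attaches the mask to one column."""
    return f"ALTER TABLE {table} ALTER COLUMN {column} SET MASK {mask_fn}"

# A tag-discovery job would list tagged columns (e.g. from
# information_schema.column_tags) and apply the mask to each:
stmt = set_mask_ddl("main.hr.employees", "ssn")
```

Because the mask is an ordinary SQL function evaluated at query time, it composes with the optimizer rather than requiring separate masked copies of the data.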

shweta_m
by New Contributor III
  • 643 Views
  • 3 replies
  • 2 kudos

Resolved! Assigning Databricks Account Admin role to User group

Hi, as per our company policy, individual users should not be given elevated privileges. Permissions should be assigned to user groups, so that group membership can be managed at the AD level. In that context, is there a way to assign the 'Databricks A...

Latest Reply
shweta_m
New Contributor III
  • 2 kudos

Hi @szymon_dybczak, I tried this and it worked. Thanks!

2 More Replies
Travis84
by New Contributor II
  • 551 Views
  • 1 reply
  • 1 kudos

Which table should i use for a range join hint?

I am a bit confused about how to use range join hints. Consider the following query:

```
SELECT
  p.id,
  p.ts,
  p.value,
  rg.metric1,
  rg.metric2,
  rg.ts AS range_ts
FROM points p
LEFT JOIN LATERAL (
  SELECT r.metric1, r.metric2, r.ts
  FROM ranges r
  WHE...
```

Latest Reply
K_Anudeep
Databricks Employee
  • 1 kudos

Hello @Travis84, below are the answers to your questions: Where to put the hint? On either one of the two relations that participate in the range join for that specific join block. In simple two-table queries, it doesn't matter. In multi-join querie...
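As a concrete (hedged) illustration of the placement advice: Databricks documents the hint as /*+ RANGE_JOIN(relation_or_alias, bin_size) */ placed in the SELECT of the join block. A small Python helper that emits such a query over the question's tables (the start_ts/end_ts column names are assumptions for illustration):

```python
def range_join_query(bin_size: int) -> str:
    """Build a hinted range-join query; the hint names the `p` alias and a
    bin size chosen to match the typical interval width in `ranges`."""
    return f"""
SELECT /*+ RANGE_JOIN(p, {bin_size}) */
       p.id, p.ts, r.metric1
FROM points p
JOIN ranges r
  ON p.ts BETWEEN r.start_ts AND r.end_ts
""".strip()

sql = range_join_query(60)
```

The bin size is the tuning knob: too small and rows land in many bins, too large and each bin holds many unrelated ranges, so it should approximate the common interval length.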

youssefmrini
by Databricks Employee
  • 6187 Views
  • 2 replies
  • 0 kudos
Latest Reply
idtaylor
New Contributor II
  • 0 kudos

No, you cannot. The demo video was from 2023, Alation no longer appears in the Databricks Marketplace, and Alation no longer allows free trials, instead having you fill out a request for a demo, which is not ideal for proving out practical functionality.

1 More Replies