Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

smoortema
by Contributor
  • 483 Views
  • 2 replies
  • 2 kudos

Resolved! Check statistics of clustering columns per file to see how liquid clustering works

I have a Delta table on which I set up liquid clustering using three columns. I would like to check file statistics to see how the clustering column values are distributed along the files. How can I write a query that shows min and max values, etc. o...

Latest Reply
SteveOstrowski
Databricks Employee
  • 2 kudos

Hi @smoortema, There are several approaches for inspecting per-file column statistics on a liquid-clustered Delta table. Here is a walkthrough from simplest to most detailed. APPROACH 1: CONFIRM CLUSTERING CONFIGURATION First, verify that clustering ...
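One way to see the per-file distribution directly is to group on the hidden `_metadata.file_path` column and aggregate the clustering columns. A minimal sketch — the table name and the columns `col_a` and `col_b` are placeholders, substitute your own three clustering columns:

```sql
-- Per-file min/max of the clustering columns (names are placeholders).
SELECT
  _metadata.file_path AS file,
  COUNT(*)            AS row_count,
  MIN(col_a)          AS min_col_a,
  MAX(col_a)          AS max_col_a,
  MIN(col_b)          AS min_col_b,
  MAX(col_b)          AS max_col_b
FROM my_catalog.my_schema.my_table
GROUP BY _metadata.file_path
ORDER BY file;
```

Narrow, non-overlapping min/max ranges per file suggest liquid clustering has co-located similar values; wide overlapping ranges suggest the table has not been reclustered yet.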

1 More Replies
MRTN
by Contributor
  • 408 Views
  • 3 replies
  • 2 kudos

Resolved! Reading data from Serverless Warehouse from Azure Functions in Python - using managed identities

We are trying to run a simple service on an Azure Function app, where we need to query some data from a Databricks Warehouse. We want to avoid managing secrets, and hence try to use Microsoft Entra authentication all the way. Using various available ...

Latest Reply
SteveOstrowski
Databricks Employee
  • 2 kudos

Hi @MRTN, The 403 Forbidden error you are seeing when using DefaultAzureCredential from the Azure Function (while it works locally with AzureCliCredential) comes down to a key distinction in how the token is being used and who the token represents. U...
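The usual shape of this setup is to request a token for the Azure Databricks first-party application (resource ID `2ff814a6-3304-4ab8-85cb-9250f3c8af72`) rather than a management or storage scope, and pass it to the SQL connector. A sketch only — it requires `azure-identity` and `databricks-sql-connector`, the hostname and HTTP path are placeholders, and the Function app's managed identity must also be added as a service principal in the Databricks workspace and granted access to the warehouse:

```python
# Sketch, not a definitive implementation; names below are placeholders.
from azure.identity import DefaultAzureCredential
from databricks import sql

# Request a token for the Azure Databricks first-party application.
credential = DefaultAzureCredential()
token = credential.get_token("2ff814a6-3304-4ab8-85cb-9250f3c8af72/.default")

with sql.connect(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",  # placeholder
    http_path="/sql/1.0/warehouses/abcdef1234567890",              # placeholder
    access_token=token.token,
) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT 1")
        print(cur.fetchone())
```

Locally, `DefaultAzureCredential` falls back to the Azure CLI identity (your user), while in the Function app it resolves to the managed identity — which is why the same code can 403 in one place and work in the other.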

2 More Replies
Seunghyun
by Contributor
  • 305 Views
  • 2 replies
  • 0 kudos

Resolved! Freezing the Page Filter Section During Scroll in Databricks

I have a question regarding Databricks Dashboards. I would like to fix the filter area within the dashboard so that it remains visible at all times, even when scrolling up or down. Is there a way to make the Page Filter area 'sticky' or frozen, exclud...

Latest Reply
SteveOstrowski
Databricks Employee
  • 0 kudos

Hi @Seunghyun, AI/BI Dashboards do not currently have a built-in option to make page-level filters sticky or frozen during scroll. However, there are a couple of approaches you can use to keep filters accessible for your viewers. OPTION 1: USE GLOBAL...

1 More Replies
MRTN
by Contributor
  • 659 Views
  • 6 replies
  • 6 kudos

Error reading an external table - when using a serverless compute.

I am trying to read an external table in Databricks, created and maintained by using the delta-rs Python module. This usually works just fine, but after a recent checkpoint generation, I get the error below. However, the error only appears when readin...

Latest Reply
SteveOstrowski
Databricks Employee
  • 6 kudos

Hi @MRTN, Errors reading external tables from serverless compute are almost always caused by how serverless handles cloud storage access compared to classic compute. Serverless compute runs in a Databricks-managed compute plane, so it cannot use DBFS...
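Because serverless storage access goes through Unity Catalog rather than cluster-level Spark configs or DBFS mounts, the table's path typically needs to be covered by an external location backed by a storage credential. A hedged sketch — every name and the URL are placeholders:

```sql
-- Placeholder names and URL; substitute your storage account and credential.
CREATE EXTERNAL LOCATION IF NOT EXISTS my_ext_location
  URL 'abfss://mycontainer@myaccount.dfs.core.windows.net/tables/'
  WITH (STORAGE CREDENTIAL my_storage_credential);

GRANT READ FILES ON EXTERNAL LOCATION my_ext_location TO `data_engineers`;
```

Once the path is governed by an external location, both serverless and classic compute resolve access the same way.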

5 More Replies
mjtd
by New Contributor III
  • 437 Views
  • 4 replies
  • 2 kudos

Resolved! Spark suddenly can't seem to read .compacted.json transaction log files.

I can't read a table, and the error message is this: Unable to reconstruct state at version 1023 as the transaction log has been truncated due to manual deletion or the log retention policy. The _delta_log folder contains these files: 00000000000000001...

Latest Reply
SteveOstrowski
Databricks Employee
  • 2 kudos

Hi @mjtd, The error "Unable to reconstruct state at version 1023 as the transaction log has been truncated due to manual deletion or the log retention policy" is telling you that Delta cannot find a continuous chain of log entries from the most recen...
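Once the table is readable again, the retention windows that control how long log entries and checkpoints survive can be lengthened so that history is not truncated out from under slow readers. A sketch with a placeholder table name and illustrative durations:

```sql
-- Placeholder table name; durations are examples, tune to your needs.
ALTER TABLE my_catalog.my_schema.my_table SET TBLPROPERTIES (
  'delta.logRetentionDuration'        = 'interval 90 days',
  'delta.checkpointRetentionDuration' = 'interval 30 days'
);
```

Longer retention trades extra small files in `_delta_log` for a wider window of reconstructable versions.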

3 More Replies
surajitDE
by Contributor
  • 400 Views
  • 2 replies
  • 2 kudos

Resolved! How can we disable incremental refresh for a Materialized View when using Databricks DLT

How can we disable incremental refresh for a Materialized View when using Databricks Delta Live Tables (DLT)? I am using serverless compute; here is the code: @dlt.table(name="orders_destination_table_testing_16") def orders_final(): return ( ...

Latest Reply
SteveOstrowski
Databricks Employee
  • 2 kudos

Hi @surajitDE, First, a quick naming note: Delta Live Tables (DLT) has been renamed to Lakeflow Spark Declarative Pipelines (SDP). The functionality is the same, just a new name. MATERIALIZED VIEW REFRESH BEHAVIOR For materialized views in SDP, the p...
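Outside the pipeline definition itself, a full (non-incremental) recompute of a materialized view can also be forced from SQL. The three-level name below mirrors the table from the question and is a placeholder:

```sql
REFRESH MATERIALIZED VIEW my_catalog.my_schema.orders_destination_table_testing_16 FULL;
```

The `FULL` keyword discards incrementally maintained state and recomputes the view from scratch.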

1 More Replies
cmantilla
by New Contributor II
  • 308 Views
  • 2 replies
  • 1 kudos

Resolved! How can I (and my org) subscribe to any breaking changes?

My team would like to learn about databricks releases, specifically any breaking changes that are made. What's the best way to subscribe and learn about these changes?

Latest Reply
SteveOstrowski
Databricks Employee
  • 1 kudos

Hi @cmantilla, There are several channels you can use to stay on top of breaking changes and platform updates. Here is a rundown of each one. RSS FEED FOR DOCUMENTATION RELEASE NOTES The Databricks docs site publishes an RSS feed that covers product ...

1 More Replies
ChristianRRL
by Honored Contributor
  • 827 Views
  • 4 replies
  • 1 kudos

Resolved! pytest unit testing help

Hi there, I am hoping someone can help me understand why I'm having issues with a simple pytest unit test... test? When attempting to run a `run_tests_notebook` in our all-purpose compute cluster (Runtime 14.3) with the following: import pytest import ...

Latest Reply
SteveOstrowski
Databricks Employee
  • 1 kudos

Hi @ChristianRRL, Your follow-up solution is spot on. To summarize and add some context for others who land here: THE ROOT CAUSE The ModuleNotFoundError you initially saw happens because the Workspace filesystem (/Workspace/...) does not fully suppor...
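A minimal, self-contained sketch of the usual workaround: Workspace paths do not fully support the bytecode cache files Python writes when importing test modules, so disabling bytecode writing before invoking pytest avoids the import errors. The temp-file demo below is purely illustrative:

```python
import os
import sys
import tempfile
import textwrap

import pytest

# Workspace paths (/Workspace/...) do not fully support writing .pyc
# bytecode caches, a common source of ModuleNotFoundError when running
# pytest from a notebook. Disabling bytecode writing sidesteps that.
sys.dont_write_bytecode = True

# Self-contained demo: write a trivial test file to local disk and run it.
with tempfile.TemporaryDirectory() as tmp:
    test_file = os.path.join(tmp, "test_demo.py")
    with open(test_file, "w") as fh:
        fh.write(textwrap.dedent("""\
            def test_addition():
                assert 1 + 1 == 2
        """))
    exit_code = pytest.main(["-q", test_file])

print(int(exit_code))  # 0 means all tests passed
```

In a notebook the same two lines (`sys.dont_write_bytecode = True` followed by `pytest.main([...])`) are typically all that is needed before pointing pytest at your repo's test directory.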

3 More Replies
hobrob_ex
by New Contributor III
  • 552 Views
  • 4 replies
  • 2 kudos

Resolved! Calling stored procs using identifier function

Hi folks, I'm hitting an error when trying to call a stored procedure using the identifier function; potentially looks like it could be a bug. Calling the proc with a normal reference as follows works just fine: `call my_catalog.my_schema.my_proc('2026-0...

Latest Reply
SteveOstrowski
Databricks Employee
  • 2 kudos

Hi @hobrob_ex, The IDENTIFIER() clause does not currently support the CALL statement for stored procedures. The IDENTIFIER clause documentation lists its supported contexts: DDL operations (CREATE, ALTER, DROP, UNDROP), DML operations (MERGE, UPDATE,...
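A common workaround when a statement type is outside IDENTIFIER's supported contexts is to build the statement as a string and run it with EXECUTE IMMEDIATE. A sketch, reusing the hypothetical procedure name from the question:

```sql
-- Build the CALL dynamically; the procedure name is a placeholder.
DECLARE stmt STRING;
SET VAR stmt = 'CALL my_catalog.my_schema.my_proc(?)';
EXECUTE IMMEDIATE stmt USING '2026-01-01';
```

The `?` parameter marker keeps the argument out of the string itself, which avoids quoting problems when the value comes from a variable.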

3 More Replies
Aloknath_Ganage
by Databricks Partner
  • 446 Views
  • 2 replies
  • 0 kudos

Lakebridge Analyzer stopped working.

Hi there, I was using the Lakebridge Analyzer and transpiler for the last 2 months and it was working fine and was providing the expected output. But for the last 2-3 days, when I'm running the Analyzer command for any of the dialects, it is generating a...

Latest Reply
SteveOstrowski
Databricks Employee
  • 0 kudos

Hi @Aloknath_Ganage, The symptoms you describe (the .tmp file contains all the data but the .xlsx is empty with only placeholder sheet tabs) point to the Excel report merge/generation step failing after the core analysis completes successfully. Here ...

1 More Replies
AgusBudianto
by Contributor
  • 634 Views
  • 5 replies
  • 1 kudos

Resolved! Why count Run Status not Showing

Hi everyone, may I ask: when monitoring Databricks jobs, why is the Run Status count not showing? Thank you, Anto

Latest Reply
SteveOstrowski
Databricks Employee
  • 1 kudos

Hi @AgusBudianto, The run status counts (Active, Completed, Failed, etc.) that appear on the right side of the Workflows page are only visible when you are on the "Jobs" tab. If you are on a different tab such as "Job Runs" or "Delta Live Tables," th...

4 More Replies
holunder42
by New Contributor II
  • 576 Views
  • 4 replies
  • 2 kudos

Resolved! Using built-in display method modules

The built-in `display` function is very helpful, but we're moving code from notebooks into Python modules. Here, it seems that `display` is defined differently, which results in poor visualization. Example: ```df = spark.createDataFrame([{'x': 1}])display...

Latest Reply
SteveOstrowski
Databricks Employee
  • 2 kudos

Hi @holunder42, The behavior you are seeing is expected. The display() function is not a standard Python built-in. It is injected into the notebook's global namespace by the Databricks runtime when a notebook cell executes. When you move code into an...
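One portable pattern is to stop referencing the notebook-injected global from module code and instead accept the renderer as a parameter. A minimal sketch; `show_head` and the module name are hypothetical:

```python
# my_module.py (hypothetical): `display` is injected into the notebook's
# globals by the Databricks runtime, so module code should not rely on it.
# Accept a renderer instead, with a plain-print fallback.
def show_head(df, render=print):
    """Render `df` with whatever display function the caller provides."""
    render(df)

# In a Databricks notebook you would pass the rich notebook renderer:
#   import my_module
#   my_module.show_head(df, render=display)
# Anywhere else, the default fallback is used:
show_head([{"x": 1}])
```

Depending on runtime version, `from databricks.sdk.runtime import display` inside the module may also resolve the notebook's renderer, but the injection pattern above works everywhere, including in unit tests.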

3 More Replies
Danish11052000
by Contributor
  • 795 Views
  • 5 replies
  • 1 kudos

Resolved! How should I correctly extract the full table name from request_params in audit logs?

I'm trying to build a UC usage/refresh tracking table for every workspace. For each workspace, I want to know how many times a UC table was refreshed or accessed each month. To do this, I'm reading the Databricks audit logs and I need to extract only ...

Latest Reply
SteveOstrowski
Databricks Employee
  • 1 kudos

Hi @Danish11052000, You are on the right track with the COALESCE approach. The reason for the inconsistency is that different Unity Catalog action types populate different keys in request_params. Here is a breakdown of the key fields and which action...
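The COALESCE shape usually looks like the sketch below: some action types carry a full three-level name in one key, others only the catalog/schema/name parts. The exact keys vary by action type, so verify the candidates against your own logs before relying on them:

```sql
-- Key names and action list are illustrative; check request_params in your
-- own audit rows for the keys each action type actually populates.
SELECT
  event_time,
  action_name,
  COALESCE(
    request_params.full_name_arg,
    request_params.table_full_name,
    NULLIF(CONCAT_WS('.',
      request_params.catalog_name,
      request_params.schema_name,
      request_params.name), '')
  ) AS table_full_name
FROM system.access.audit
WHERE service_name = 'unityCatalog'
  AND action_name IN ('getTable', 'createTable', 'deleteTable');
```

The `NULLIF(..., '')` guard matters because CONCAT_WS returns an empty string, not NULL, when all of its arguments are NULL, which would otherwise short-circuit the COALESCE.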

4 More Replies
Malthe
by Valued Contributor II
  • 643 Views
  • 7 replies
  • 1 kudos

Resolved! Genie generates MEASURE expression with "filter" clause

Genie generated a query against a metric view that introduces a "filter" clause as a second parameter to MEASURE:SELECT `countryName`, MEASURE(`deviceCount`, `isActive` = true) AS `online`, MEASURE(`deviceCount`, `isActive` = false) AS `offline...

Latest Reply
SteveOstrowski
Databricks Employee
  • 1 kudos

Hi @Malthe, The behavior you are seeing is indeed the LLM generating invalid SQL syntax. The MEASURE() function takes exactly one argument, which is a reference to a measure column defined in a metric view. There is no second "filter" parameter, and ...
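The per-aggregate filtering shape Genie was imitating is valid SQL only for ordinary aggregate functions, which do accept a FILTER clause. A sketch against a hypothetical `devices` base table with the columns from the question:

```sql
-- Hypothetical base table; ordinary aggregates accept FILTER, MEASURE() does not.
SELECT
  countryName,
  COUNT(*) FILTER (WHERE isActive = true)  AS online,
  COUNT(*) FILTER (WHERE isActive = false) AS offline
FROM devices
GROUP BY countryName;
```

If the split must live in the metric view itself, the usual approach is to define two separate measures (e.g. an online count and an offline count) in the view definition and query each with a plain single-argument MEASURE() call.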

6 More Replies
peterlewis
by New Contributor II
  • 328 Views
  • 2 replies
  • 0 kudos

LaTeX Markdown

It looks like in-line LaTeX is not supported in Markdown cells. Is that accurate? 

peterlewis_0-1772564790286.png
Latest Reply
SteveOstrowski
Databricks Employee
  • 0 kudos

Hi @peterlewis, Databricks notebooks support LaTeX mathematical notation inside %md (markdown) cells. The rendering engine uses MathJax, so standard LaTeX math syntax works. Here is a rundown of how to use it and some common patterns. INLINE MATH Wra...
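A minimal example of both forms in a markdown cell; the escaped-parentheses delimiters are what the MathJax renderer expects for inline math, and the integral is just a worked illustration:

```
%md
Inline math uses escaped parentheses: \\( E = mc^2 \\)

Display math uses double dollar signs:

$$\int_0^1 x^2 \, dx = \frac{1}{3}$$
```

Note that single `$...$` delimiters are not rendered inline by default, which is the usual reason inline LaTeX appears unsupported.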

1 More Replies