Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Hi, I have a job where a job cluster is reused twice for task A and task C. Between A and C, task B runs for 4 hours on a different interactive cluster. The issue here is that the job cluster doesn't terminate as soon as Task A is completed and sits ...
Are there any Databricks accelerators to convert C# and QlikView code to PySpark? We are using open-source AI tools for the conversion now, but wondering if there is a better way to do the same. Thanks in advance.
Hi @nkrish, unfortunately I don't think so. You can find the available accelerators here: Databricks Solution Accelerators for Data & AI | Databricks. But I haven't heard anything about an accelerator for C# and QlikView specifically.
Hello Databricks Community, I am working on a Dash dashboard (Python/Flask backend) deployed on Databricks, and I need to play or stream MP4 video files stored in a Unity Catalog Volume. I have tried accessing these files both from a Databricks notebo...
@GergoBo - Since notebooks cannot reach out to the file system to stream, you must embed the video as a Base64-encoded string. I tried the code below and it works well in a notebook, as it plays the video in the output. import base64 from IPython.display imp...
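For anyone looking for the general pattern, here is a minimal sketch of the Base64 embedding approach described above (not the poster's exact code; the Volume path and file name are hypothetical):

```python
import base64
from IPython.display import HTML

# Hypothetical Unity Catalog Volume path; Volumes are exposed as a FUSE path on DBR.
video_path = "/Volumes/main/media/videos/demo.mp4"

# Read the MP4 bytes and encode them as Base64 so they can be inlined in HTML.
with open(video_path, "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

# Render an HTML5 <video> tag whose source is a data URI built from the encoded bytes.
HTML(f"""
<video width="640" controls>
  <source src="data:video/mp4;base64,{encoded}" type="video/mp4">
</video>
""")
```

Keep in mind that Base64 inflates the payload by roughly a third, so this works best for short clips rather than large videos.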
Hi @Malthe, this might be because of the new DBR (18.0) GA release yesterday (January 2026 - Azure Databricks | Microsoft Learn). You might need to use a custom Spark version until the engineering team fixes this issue in DBR. Below is the response from...
TL;DR - UDFs work fine when imported from the `utilities/` folder in DLT pipelines, but custom Python DataSource APIs fail with `ModuleNotFoundError: No module named 'utilities'` during serialization. Only inline definitions work. Need reusable DataSource ...
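For context, the "inline definition" case that does work looks roughly like the sketch below: a minimal custom source using the PySpark Python Data Source API, defined directly in the pipeline notebook rather than imported from `utilities/`. Class, format, and column names are hypothetical.

```python
from pyspark.sql.datasource import DataSource, DataSourceReader

class MySourceReader(DataSourceReader):
    def __init__(self, schema, options):
        self.schema = schema
        self.options = options

    def read(self, partition):
        # Yield rows matching the declared schema; a real reader would call an external API here.
        yield (1, "a")
        yield (2, "b")

class MySource(DataSource):
    @classmethod
    def name(cls):
        return "my_source"

    def schema(self):
        return "id INT, value STRING"

    def reader(self, schema):
        return MySourceReader(schema, self.options)

# Registering and reading the custom source succeeds when the classes are defined inline,
# because there is no external module to resolve during serialization.
spark.dataSource.register(MySource)
df = spark.read.format("my_source").load()
```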
I am noticing a difference between using Auto Loader in an interactive notebook vs using it in a Spark Declarative Pipeline (DLT pipeline). This issue seems to be very similar to this other unanswered post from a few years ago. Bug report: the delimit...
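For reference, a minimal sketch of the kind of Auto Loader read where such option differences would show up; the paths and options here are hypothetical, not taken from the post:

```python
# Auto Loader CSV read in an interactive notebook; the same options passed through a
# DLT/SDP table definition are where the reported discrepancy would appear.
df = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("cloudFiles.schemaLocation", "/Volumes/main/demo/_schemas/events")  # hypothetical
    .option("delimiter", "|")   # custom delimiter option under discussion
    .option("header", "true")
    .load("/Volumes/main/demo/landing/events/")  # hypothetical landing path
)
```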
I’m working with Databricks Unity Catalog and observing inconsistent permission behavior for views.
Scenario:
A view exists that was created by another user.
I have sufficient privileges on the catalog/schema/view (SELECT, MODIFY, ALL PRIVILEGES).
I can:...
Interesting, for UC, COMMENT ON COLUMN requires MODIFY on a table, and OWNER on a view. If multiple people need to maintain a view, the recommended pattern is to make a group the owner and grant that group the required access to source data. Why is i...
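A minimal sketch of that group-ownership pattern, run from a notebook; the group, catalog, schema, and view names are hypothetical:

```python
# Make a group the owner of the view so any member can maintain it,
# then grant the group access to the underlying source schema.
spark.sql("ALTER VIEW main.analytics.customer_view OWNER TO `data-engineers`")
spark.sql("GRANT SELECT ON SCHEMA main.silver TO `data-engineers`")
```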
I am building a Gold table using Delta Live Tables (DLT). The Gold table contains aggregated data derived from a Silver table. Aggregation happens monthly. However, the requirement is that only the current (year, month) should be recalculated. Previous mo...
Hi @deepu1 ,
Assuming that @dlt.table refers to a Materialized View (MV), you are correct that this is the standard way to create aggregated tables in the Gold layer. A Materialized View is essentially a table that stores the results of a specific qu...
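A minimal sketch of such a Gold-layer aggregation as a materialized view (the table and column names are hypothetical); by default the whole result may be recomputed on each update unless incremental refresh applies:

```python
import dlt
from pyspark.sql import functions as F

@dlt.table(name="gold_monthly_sales", comment="Monthly aggregates derived from the Silver layer")
def gold_monthly_sales():
    return (
        dlt.read("silver_sales")                      # hypothetical Silver table in the same pipeline
        .groupBy("year", "month", "product_id")
        .agg(F.sum("amount").alias("total_amount"))
    )
```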
Hello, we have implemented a data pipeline to ingest data from Oracle UCM using the SOAP API. This was working fine with job and all-purpose clusters. Recently we wanted to use serverless to take advantage of the faster startup time. In this case we were n...
Salutations, I'm using SDP for an ETL that extracts data from HANA and puts it into Unity Catalog. I defined a policy with the needed driver, but I get this error: An error occurred while calling o1013.load. : java.lang.ClassNotFoundException: com.sap....
At this time, Databricks does not offer native connectors for SAP HANA. You can find the complete list of managed connectors currently available in Databricks here.
We generally recommend beginning with SAP’s own commercial tools, prioritizing SAP Bu...
How can we run an SDP pipeline in parallel with dynamic parameter passing at the pipeline level? How can we consume a job-level parameter in the pipeline? If parameters with the same name are defined at the pipeline level, then the job-level parameters are getting over...
To run an SDP (Spark Declarative Pipeline) in parallel with dynamic parameters, you need to understand that SDP is "smart": it builds a dependency graph and runs everything it can at the same time by default. Here is a simple breakdown of how to handle...
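As an illustration, a minimal sketch of reading a pipeline-level configuration value inside a pipeline table definition; the key `source_table` and the table names are hypothetical, and a job-level parameter would need a distinct key to avoid the override behavior described in the question:

```python
import dlt
from pyspark.sql import functions as F

# Pipeline-level parameters are set in the pipeline's "configuration" map,
# e.g. {"source_table": "main.silver.events"}, and read via spark.conf.
source_table = spark.conf.get("source_table")

@dlt.table(name="active_events")
def active_events():
    return (
        spark.read.table(source_table)        # fully qualified UC table taken from the parameter
        .filter(F.col("is_active"))
    )
```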
Below is a toy example of what I'm trying to achieve, but I don't understand why it fails. Can anyone explain why, and suggest a fix or a not overly bloated workaround?
%sql
create or replace function status_map(status int)
returns string
return map(10, "STATU...
Scoped variables in a transform() are not accessible by UDFs. However, you can work around this using explode():
# equivalent of: select transform(arr, e -> status_map(e.v1)) from s1
select collect_list(status_map(status_id))
from explode((select trans...
For all its positives, one of the first general issues we had with Databricks was case sensitivity. We have a lot of data-specific filters in our code. The problem is, we land and view data from lots of different case-insensitive source systems, e.g. SQL Se...
Hi @dpc, I think you can try to use a collation for that purpose. A collation is a set of rules that determines how string comparisons are performed. Collations are used to compare strings in a case-insensitive, accent-insensitive, or trailing-space ...
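A minimal sketch of the case-insensitive collation approach, assuming a runtime where the UTF8_LCASE collation is available; the catalog, schema, and table names are hypothetical:

```python
# Declare a string column with a case-insensitive collation, then compare against it.
spark.sql("""
    CREATE OR REPLACE TABLE main.demo.customers (
        name STRING COLLATE UTF8_LCASE
    )
""")
spark.sql("INSERT INTO main.demo.customers VALUES ('Alice'), ('ALICE')")

# Both rows are returned because comparisons on a UTF8_LCASE column ignore case.
display(spark.sql("SELECT * FROM main.demo.customers WHERE name = 'alice'"))
```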
Hello guys, Happy New Year and best wishes to all of us. I am catching both PySpark and Python exceptions, but I want to write these logged errors to a Delta table as part of my logging. Does anyone know the best practice for this? Thanks. Cordially,
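One possible pattern, sketched under assumptions (the table name, columns, and job name are hypothetical): catch the exception, build a one-row DataFrame, and append it to a Delta table.

```python
from datetime import datetime, timezone
from pyspark.sql import Row

def log_error_to_delta(job_name: str, exc: Exception,
                       table: str = "main.ops.error_log"):
    """Append one row describing a caught exception to a Delta table."""
    row = Row(
        logged_at=datetime.now(timezone.utc),
        job_name=job_name,
        error_type=type(exc).__name__,
        error_message=str(exc),
    )
    spark.createDataFrame([row]).write.format("delta").mode("append").saveAsTable(table)

try:
    spark.read.table("main.silver.does_not_exist").count()
except Exception as e:
    log_error_to_delta("nightly_load", e)
    raise  # re-raise if the job should still fail after logging
```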
Hello Databricks Community, I’m facing multiple issues while working in Azure Databricks notebooks, and I’d appreciate guidance or troubleshooting suggestions.
Issue 1: Failed to reconnect
While running a notebook, I frequently see a “Failed to reconnec...