Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

surajitDE
by New Contributor III
  • 22 Views
  • 2 replies
  • 0 kudos

Question on assigning email_notification_group to DLT Job Notifications?

Hi Folks, I wanted to check if there's a way to assign an email notification group to a Delta Live Tables (DLT) job for notifications. I know that it's possible to configure Teams workflows and email notification groups for Databricks jobs, but in the ...

Latest Reply
SP_6721
Honored Contributor
  • 0 kudos

Hi @surajitDE, At the moment, DLT doesn't support linking existing email notification groups or Teams workflows directly. You can only add individual email addresses in the DLT UI. If you have a group email alias, you can use it as a single address so...
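For reference, pipeline email notifications live in the DLT pipeline settings JSON. A minimal sketch, assuming a hypothetical group alias dataeng-alerts@yourcompany.com used as a single recipient (verify the alert names against the current DLT settings reference):

    {
      "notifications": [
        {
          "email_recipients": ["dataeng-alerts@yourcompany.com"],
          "alerts": ["on-update-failure", "on-flow-failure"]
        }
      ]
    }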

1 More Reply
sgreenuk
by Visitor
  • 23 Views
  • 1 reply
  • 0 kudos

Orphaned __dlt_materialization schemas left behind after dropping materialized views

Hi everyone, I'm seeing several internal schemas under the __databricks_internal catalog that were auto-created when I built a few materialized views in Databricks SQL. However, after dropping the materialized views, the schemas were not automatically...

Latest Reply
nayan_wylde
Honored Contributor III
  • 0 kudos

Yes, this is expected behavior in Databricks. The __databricks_internal catalog contains system-owned schemas that support features like materialized views and Delta Live Tables (DLT). When you create materialized views, Databricks generates internal...
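A quick, read-only way to see what is left behind (a sketch run in a notebook where spark is available; dropping these system-owned schemas by hand is not recommended):

    # List the internal schemas that back materialized views and DLT.
    spark.sql("SHOW SCHEMAS IN __databricks_internal").show(truncate=False)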

databricksero
by Visitor
  • 23 Views
  • 1 reply
  • 0 kudos

DLT pipeline fails with “can not infer schema from empty dataset” — works fine when run manually

Hi everyone, I'm running into an issue with a Delta Live Tables (DLT) pipeline that processes a few transformation layers (raw → intermediate → primary → feature). When I trigger the entire pipeline, it fails with the following error: can not infer sche...

Latest Reply
ManojkMohan
Honored Contributor
  • 0 kudos

@databricksero The error occurs right at this line: df_spark = spark.createDataFrame(df_cleaned). This issue arises because, during the end-to-end execution of the pipeline, df_cleaned might end up being an empty pandas DataFrame. This can happen...
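A minimal sketch of the usual workaround: pass an explicit schema so Spark never has to infer types from a possibly empty pandas DataFrame. The column names here are hypothetical; use the real columns of df_cleaned (spark is the notebook/DLT session):

    from pyspark.sql import types as T

    # With an explicit schema, Spark no longer needs to infer types,
    # so an empty pandas DataFrame no longer raises the error above.
    schema = T.StructType([
        T.StructField("sensor_id", T.StringType(), True),
        T.StructField("value", T.DoubleType(), True),
    ])
    df_spark = spark.createDataFrame(df_cleaned, schema=schema)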

pranaav93
by New Contributor II
  • 38 Views
  • 1 reply
  • 1 kudos

Databricks Compute Metrics Alerts

Hi All, I'm looking for some implementation ideas where I can use information from the system.compute.node_timeline table to catch memory spikes and, if above a given threshold, restart the cluster through an API call. Have any of you implemented a simil...

Latest Reply
NandiniN
Databricks Employee
  • 1 kudos

Hey @pranaav93, this is a very common use case for the system table system.compute.node_timeline: building alerting and remediation. Check this KB: https://kb.databricks.com/en_US/clusters/getting-node-specific-instead-of-cluster-wide-memory-usage-data-from-...
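A sketch of the detection half, assuming node_timeline exposes a mem_used_percent column (check the table's schema in your workspace) and that spark is a notebook session:

    # Find nodes above 90% memory in the last 15 minutes.
    spikes = spark.sql("""
        SELECT cluster_id, instance_id, start_time, mem_used_percent
        FROM system.compute.node_timeline
        WHERE start_time >= current_timestamp() - INTERVAL 15 MINUTES
          AND mem_used_percent > 90
    """)
    if spikes.count() > 0:
        pass  # remediation goes here, e.g. a call to the Clusters restart REST API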

Mous92i
by Visitor
  • 20 Views
  • 1 reply
  • 0 kudos

Liquid Clustering With Merge

Hello, I'm facing severe performance issues with a MERGE INTO on Databricks. merge_condition = """ source.data_hierarchy = target.data_hierarchy AND source.sensor_id = target.sensor_id AND source.timestamp = target.timestamp """ The target Delt...

Latest Reply
ManojkMohan
Honored Contributor
  • 0 kudos

@Mous92i Root cause: the log "MERGE operation, scanning files for matches … 32 min | 3113/3113 files scanned (~72.2 GiB)" shows that every data file in the target is scanned during the merge. This leads to high input/output and long execution...
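A sketch of the usual fix: make the merge keys the liquid clustering keys so MERGE can prune files instead of scanning all of them. target_table is a placeholder, and this assumes the table can be converted to liquid clustering on your runtime:

    # Cluster on the columns used in the merge condition.
    spark.sql("ALTER TABLE target_table CLUSTER BY (data_hierarchy, sensor_id, timestamp)")
    # Recluster existing data so old files also benefit from pruning.
    spark.sql("OPTIMIZE target_table")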

georgemichael40
by New Contributor II
  • 16 Views
  • 0 replies
  • 0 kudos

Python Wheel in Serverless Job in DAB

Hey, I am trying to run a job with serverless compute that runs Python scripts. I need the paramiko package to get my scripts to work. I managed to get it working by doing: environments: - environment_key: default # Full documentation of this spec can be...
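For context, a minimal sketch of the serverless environment spec the post describes, in the bundle's own YAML format (keys per the Databricks Asset Bundles docs; verify against your bundle version):

    environments:
      - environment_key: default
        spec:
          client: "1"
          dependencies:
            - paramiko   # extra PyPI package for the serverless job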

chanukya-pekala
by Contributor II
  • 39 Views
  • 2 replies
  • 2 kudos

Lost access to Databricks account console on Free Edition

Hi everyone, I'm having trouble accessing the Databricks account console and need some guidance. Background: I successfully set up Databricks Free Edition with Terraform using my personal account. I was able to access accounts.cloud.databricks.com to obta...

Latest Reply
chanukya-pekala
Contributor II
  • 2 kudos

Yeah, I have seen the docs. I wonder how I could get the S3 URL. I do have an account with my email from before Free Edition, maybe Community Edition. When I opened the accounts console, it did show up; I was thinking it could be the Free Edition only! Thanks a...

1 More Reply
vpacik
by New Contributor
  • 2170 Views
  • 1 reply
  • 0 kudos

Databricks-connect OpenSSL Handshake failed on WSL2

When trying to set up databricks-connect on WSL2 using a 13.3 cluster, I receive the following error regarding OpenSSL CERTIFICATE_VERIFY_FAILED. The authentication is done via the SPARK_REMOTE env variable. E0415 11:24:26.646129568 142172 ssl_transport_sec...

Latest Reply
ez
New Contributor
  • 0 kudos

@vpacik Was it solved? I have the same issue

donlxz
by New Contributor II
  • 20 Views
  • 1 reply
  • 0 kudos

Deadlock occurs with USE statement

When issuing a query from Informatica using a Delta connection, the statement use catalog_name.schema_name is executed first. At that time, the following error appeared in the query history: Query could not be scheduled: (conn=5073499) Deadlock found w...

Latest Reply
ManojkMohan
Honored Contributor
  • 0 kudos

@donlxz Informatica documentation and community discussions mention deadlock retry strategies, primarily for DML operations. However, metadata locks for catalog or schema operations can also lead to deadlocks. https://docs.informatica.com/data-integra...
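Since the reply points at retry strategies, a generic sketch of a retry wrapper around the USE statement (the matched error string and the backoff are assumptions; tune both to what your query history actually shows):

    import time

    def run_with_retry(stmt, attempts=3, backoff_seconds=2):
        # Retry only on the transient deadlock error; re-raise anything else.
        for attempt in range(attempts):
            try:
                return spark.sql(stmt)
            except Exception as e:
                if "Deadlock found" in str(e) and attempt < attempts - 1:
                    time.sleep(backoff_seconds * (attempt + 1))
                else:
                    raise

    run_with_retry("USE catalog_name.schema_name")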

Hritik_Moon
by New Contributor II
  • 266 Views
  • 7 replies
  • 3 kudos

Resolved! Create Delta table in Free Edition

table_name = f"project.bronze.{file_name}"
spark.sql(
    f"""
    CREATE TABLE IF NOT EXISTS {table_name}
    USING DELTA
    """
)
What am I getting wrong?

Latest Reply
Hritik_Moon
New Contributor II
  • 3 kudos

Yes, multiline solved it. Is there any better approach to this scenario?
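One commonly suggested alternative, as a sketch: let a DataFrame define the table and its schema instead of creating an empty Delta table first (df is assumed to hold the data that will populate the table, and file_name comes from the original post):

    # "ignore" mimics IF NOT EXISTS: the write is a no-op when the table exists.
    df.write.format("delta").mode("ignore").saveAsTable(f"project.bronze.{file_name}")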

6 More Replies
Jonathan_
by New Contributor II
  • 73 Views
  • 2 replies
  • 3 kudos

Slow PySpark operations after long DAG that contains many joins and transformations

We are using PySpark and notice that when we do many transformations/aggregations/joins of the data, at some point the execution time of simple tasks (count, display, union of 2 tables, ...) becomes very slow even if we have small data (ex...

Latest Reply
BS_THE_ANALYST
Esteemed Contributor II
  • 3 kudos

@Jonathan_ I think @Khaja_Zaffer has raised some great points. With Spark, broadcast joins will ensure the smaller table is in memory across all of the worker nodes. This should certainly help with speed. Shuffling is certainly always going to take s...
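A sketch of both suggestions in PySpark (large_df, small_df, and the "id" join key are placeholders, and the checkpoint path is only an example):

    from pyspark.sql import functions as F

    # Broadcasting the small side keeps the join shuffle-free.
    joined = large_df.join(F.broadcast(small_df), "id")

    # After many chained joins, truncating the lineage stops the planner
    # from re-analysing an ever-growing DAG on every small action.
    spark.sparkContext.setCheckpointDir("/tmp/checkpoints")
    joined = joined.checkpoint()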

1 More Reply
B_Stam
by New Contributor II
  • 47 Views
  • 1 reply
  • 1 kudos

Resolved! Set default tblproperties for pipeline

I'd like to set tblproperties ("delta.feature.timestampNtz" = "supported") for all tables in a pipeline instead of setting this option for every table definition. The property must be set directly on creation. I have tried it in the pipeline settings - conf...

Latest Reply
ManojkMohan
Honored Contributor
  • 1 kudos

Databricks does not allow you to set a global default for all TBLPROPERTIES. However, you can use the spark.databricks.delta.properties.defaults configuration key to set defaults for new Delta tables created in a specific session or pipeline. If you w...
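A sketch of the session-level default the reply describes, using the spark.databricks.delta.properties.defaults prefix (in a DLT pipeline, the same key/value pair would go under the pipeline's configuration settings):

    # New Delta tables created in this session pick up
    # delta.feature.timestampNtz = supported at creation time.
    spark.conf.set(
        "spark.databricks.delta.properties.defaults.feature.timestampNtz",
        "supported",
    )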

sumitkumar_284
by New Contributor
  • 108 Views
  • 3 replies
  • 1 kudos

Not able to refresh Power BI dashboard from Databricks jobs

I am trying to refresh a Power BI dashboard using Databricks jobs and constantly getting this error, even though I am providing the optional parameters, which include catalog and database. Also, note that I am able to refresh in the Power BI UI using both...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @sumitkumar_284, Can you provide us more details? Are you using Unity Catalog? Which authentication mechanism do you use? In which version of Power BI Desktop did you develop your semantic model/dashboard? Do you meet all the requirements below? Publish...

2 More Replies
donlxz
by New Contributor II
  • 73 Views
  • 2 replies
  • 3 kudos

Resolved! Error occurs on create materialized view with spark.sql

When creating a materialized view with the spark.sql function, it returns the following error message: [MATERIALIZED_VIEW_OPERATION_NOT_ALLOWED.MV_NOT_ENABLED] The materialized view operation CREATE is not allowed: Materialized view features are not enabled for ...

Latest Reply
donlxz
New Contributor II
  • 3 kudos

Hi @szymon_dybczak, Thank you for your response. You're right, it was mentioned in the documentation; I missed it when checking. I understand now that it's not possible to do this with spark.sql. Thanks for clarifying!

1 More Reply
fellipeao
by New Contributor III
  • 1987 Views
  • 9 replies
  • 3 kudos

Resolved! How to create parameters that work in Power BI Report Builder (SSRS)

Hello! I'm trying to create an item in Power BI Report Server (SSRS) connected to Databricks. I can connect normally, but I'm having trouble using a parameter that Databricks recognizes. First, I'll illustrate what I do when I connect to SQL Server and...

Latest Reply
J-Usef
New Contributor II
  • 3 kudos

@fellipeao This is the only way I found that works well with Databricks, since positional arguments (?) were a failure for me. This is the latest version of Paginated Report Builder. https://learn.microsoft.com/en-us/power-bi/paginated-reports/report-build...

8 More Replies
