Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

surajitDE
by New Contributor III
  • 22 Views
  • 2 replies
  • 0 kudos

Question on assigning email_notification_group to DLT Job Notifications?

Hi Folks, I wanted to check if there's a way to assign an email notification group to a Delta Live Tables (DLT) job for notifications. I know that it's possible to configure Teams workflows and email notification groups for Databricks jobs, but in the ...

Latest Reply
SP_6721
Honored Contributor
  • 0 kudos

Hi @surajitDE, At the moment, DLT doesn't support linking existing email notification groups or Teams workflows directly. You can only add individual email addresses in the DLT UI. If you have a group email alias, you can use it as a single address so...
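For reference, pipeline email notifications live in the DLT pipeline settings JSON. A minimal sketch, assuming a hypothetical group alias dataeng-alerts@yourcompany.com used as a single recipient (verify the alert names against the current DLT settings reference):

    {
      "notifications": [
        {
          "email_recipients": ["dataeng-alerts@yourcompany.com"],
          "alerts": ["on-update-failure", "on-flow-failure"]
        }
      ]
    }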

1 More Reply
sgreenuk
by Visitor
  • 23 Views
  • 1 reply
  • 0 kudos

Orphaned __dlt_materialization schemas left behind after dropping materialized views

Hi everyone, I'm seeing several internal schemas under the __databricks_internal catalog that were auto-created when I built a few materialized views in Databricks SQL. However, after dropping the materialized views, the schemas were not automatically...

Latest Reply
nayan_wylde
Honored Contributor III
  • 0 kudos

Yes, this is expected behavior in Databricks. The __databricks_internal catalog contains system-owned schemas that support features like materialized views and Delta Live Tables (DLT). When you create materialized views, Databricks generates internal...
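A quick, read-only way to see what is left behind (a sketch run in a notebook where spark is available; dropping these system-owned schemas by hand is not recommended):

    # List the internal schemas that back materialized views and DLT.
    spark.sql("SHOW SCHEMAS IN __databricks_internal").show(truncate=False)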

databricksero
by Visitor
  • 23 Views
  • 1 reply
  • 0 kudos

DLT pipeline fails with “can not infer schema from empty dataset” — works fine when run manually

Hi everyone, I'm running into an issue with a Delta Live Tables (DLT) pipeline that processes a few transformation layers (raw → intermediate → primary → feature). When I trigger the entire pipeline, it fails with the following error: can not infer sche...

Latest Reply
ManojkMohan
Honored Contributor
  • 0 kudos

@databricksero The error occurs right at this line: df_spark = spark.createDataFrame(df_cleaned). This issue arises because, during the end-to-end execution of the pipeline, df_cleaned might end up being an empty pandas DataFrame. This can happen...
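A minimal sketch of the usual workaround: pass an explicit schema so Spark never has to infer types from a possibly empty pandas DataFrame. The column names here are hypothetical; use the real columns of df_cleaned (spark is the notebook/DLT session):

    from pyspark.sql import types as T

    # With an explicit schema, Spark no longer needs to infer types,
    # so an empty pandas DataFrame no longer raises the error above.
    schema = T.StructType([
        T.StructField("sensor_id", T.StringType(), True),
        T.StructField("value", T.DoubleType(), True),
    ])
    df_spark = spark.createDataFrame(df_cleaned, schema=schema)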

pranaav93
by New Contributor II
  • 38 Views
  • 1 reply
  • 1 kudos

Databricks Compute Metrics Alerts

Hi All, I'm looking for some implementation ideas where I can use information from the system.compute.node_timeline table to catch memory spikes and, if above a given threshold, restart the cluster through an API call. Have any of you implemented a simil...

Latest Reply
NandiniN
Databricks Employee
  • 1 kudos

Hey @pranaav93, this is a very common use case for the system table system.compute.node_timeline: building alerting and remediation. Check this KB: https://kb.databricks.com/en_US/clusters/getting-node-specific-instead-of-cluster-wide-memory-usage-data-from-...
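A sketch of the detection half, assuming node_timeline exposes a mem_used_percent column (check the table's schema in your workspace) and that spark is a notebook session:

    # Find nodes above 90% memory in the last 15 minutes.
    spikes = spark.sql("""
        SELECT cluster_id, instance_id, start_time, mem_used_percent
        FROM system.compute.node_timeline
        WHERE start_time >= current_timestamp() - INTERVAL 15 MINUTES
          AND mem_used_percent > 90
    """)
    if spikes.count() > 0:
        pass  # remediation goes here, e.g. a call to the Clusters restart REST API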

Mous92i
by Visitor
  • 20 Views
  • 1 reply
  • 0 kudos

Liquid Clustering With Merge

Hello, I'm facing severe performance issues with a MERGE INTO on Databricks. merge_condition = """ source.data_hierarchy = target.data_hierarchy AND source.sensor_id = target.sensor_id AND source.timestamp = target.timestamp """ The target Delt...

Latest Reply
ManojkMohan
Honored Contributor
  • 0 kudos

@Mous92i Root cause: the log "MERGE operation, scanning files for matches … 32 min | 3113/3113 files scanned (~72.2 GiB)" shows that every data file in the target is scanned during the merge. This leads to high input/output and long execution...
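A sketch of the usual fix: make the merge keys the liquid clustering keys so MERGE can prune files instead of scanning all of them. target_table is a placeholder, and this assumes the table can be converted to liquid clustering on your runtime:

    # Cluster on the columns used in the merge condition.
    spark.sql("ALTER TABLE target_table CLUSTER BY (data_hierarchy, sensor_id, timestamp)")
    # Recluster existing data so old files also benefit from pruning.
    spark.sql("OPTIMIZE target_table")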

georgemichael40
by New Contributor II
  • 16 Views
  • 0 replies
  • 0 kudos

Python Wheel in Serverless Job in DAB

Hey, I am trying to run a job with serverless compute that runs Python scripts. I need the paramiko package to get my scripts to work. I managed to get it working by doing: environments: - environment_key: default # Full documentation of this spec can be...
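For context, a minimal sketch of the serverless environment spec the post describes, in the bundle's own YAML format (keys per the Databricks Asset Bundles docs; verify against your bundle version):

    environments:
      - environment_key: default
        spec:
          client: "1"
          dependencies:
            - paramiko   # extra PyPI package for the serverless job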

chanukya-pekala
by Contributor II
  • 39 Views
  • 2 replies
  • 2 kudos

Lost access to Databricks account console on Free Edition

Hi everyone, I'm having trouble accessing the Databricks account console and need some guidance. Background: I successfully set up Databricks Free Edition with Terraform using my personal account. I was able to access accounts.cloud.databricks.com to obta...

Latest Reply
chanukya-pekala
Contributor II
  • 2 kudos

Yeah, I have seen the docs. I wonder how I could get the S3 URL. I do have an account with my email from before Free Edition, maybe Community Edition. When I opened the accounts console, it did show up; I was thinking it could be the Free Edition only! Thanks a...

1 More Reply
vpacik
by New Contributor
  • 2170 Views
  • 1 reply
  • 0 kudos

Databricks-connect OpenSSL Handshake failed on WSL2

When trying to set up databricks-connect on WSL2 using a 13.3 cluster, I receive the following error regarding OpenSSL CERTIFICATE_VERIFY_FAILED. The authentication is done via the SPARK_REMOTE env variable. E0415 11:24:26.646129568 142172 ssl_transport_sec...

Latest Reply
ez
New Contributor
  • 0 kudos

@vpacik Was it solved? I have the same issue

donlxz
by New Contributor II
  • 20 Views
  • 1 reply
  • 0 kudos

Deadlock occurs with USE statement

When issuing a query from Informatica using a Delta connection, the statement use catalog_name.schema_name is executed first. At that time, the following error appeared in the query history: Query could not be scheduled: (conn=5073499) Deadlock found w...

Latest Reply
ManojkMohan
Honored Contributor
  • 0 kudos

@donlxz Informatica documentation and community discussions mention deadlock retry strategies, primarily for DML operations. However, metadata locks for catalog or schema operations can also lead to deadlocks. https://docs.informatica.com/data-integra...
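Since the reply points at retry strategies, a generic sketch of a retry wrapper around the USE statement (the matched error string and the backoff are assumptions; tune both to what your query history actually shows):

    import time

    def run_with_retry(stmt, attempts=3, backoff_seconds=2):
        # Retry only on the transient deadlock error; re-raise anything else.
        for attempt in range(attempts):
            try:
                return spark.sql(stmt)
            except Exception as e:
                if "Deadlock found" in str(e) and attempt < attempts - 1:
                    time.sleep(backoff_seconds * (attempt + 1))
                else:
                    raise

    run_with_retry("USE catalog_name.schema_name")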

Hritik_Moon
by New Contributor II
  • 266 Views
  • 7 replies
  • 3 kudos

Resolved! Create Delta table in Free Edition

table_name = f"project.bronze.{file_name}"
spark.sql(
    f"""
    CREATE TABLE IF NOT EXISTS {table_name}
    USING DELTA
    """
)
What am I getting wrong?

Latest Reply
Hritik_Moon
New Contributor II
  • 3 kudos

Yes, multiline solved it. Is there any better approach to this scenario?
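One commonly suggested alternative, as a sketch: let a DataFrame define the table and its schema instead of creating an empty Delta table first (df is assumed to hold the data that will populate the table, and file_name comes from the original post):

    # "ignore" mimics IF NOT EXISTS: the write is a no-op when the table exists.
    df.write.format("delta").mode("ignore").saveAsTable(f"project.bronze.{file_name}")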

6 More Replies
Jonathan_
by New Contributor II
  • 73 Views
  • 2 replies
  • 3 kudos

Slow PySpark operations after long DAG that contains many joins and transformations

We are using PySpark and notice that when we do many transformations/aggregations/joins of the data, at some point the execution time of simple tasks (count, display, union of 2 tables, ...) becomes very slow even if we have small data (ex...

Latest Reply
BS_THE_ANALYST
Esteemed Contributor II
  • 3 kudos

@Jonathan_ I think @Khaja_Zaffer has raised some great points. With Spark, broadcast joins will ensure the smaller table is in memory across all of the worker nodes. This should certainly help with speed. Shuffling is certainly always going to take s...
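A sketch of both suggestions in PySpark (large_df, small_df, and the "id" join key are placeholders, and the checkpoint path is only an example):

    from pyspark.sql import functions as F

    # Broadcasting the small side keeps the join shuffle-free.
    joined = large_df.join(F.broadcast(small_df), "id")

    # After many chained joins, truncating the lineage stops the planner
    # from re-analysing an ever-growing DAG on every small action.
    spark.sparkContext.setCheckpointDir("/tmp/checkpoints")
    joined = joined.checkpoint()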

1 More Reply
B_Stam
by New Contributor II
  • 47 Views
  • 1 reply
  • 1 kudos

Resolved! Set default tblproperties for pipeline

I'd like to set tblproperties ("delta.feature.timestampNtz" = "supported") for all tables in a pipeline instead of setting this option for every table definition. The property must be set directly on creation. I have tried it in the pipeline settings - conf...

Latest Reply
ManojkMohan
Honored Contributor
  • 1 kudos

Databricks does not allow you to set a global default for all TBLPROPERTIES. However, you can use the spark.databricks.delta.properties.defaults configuration key to set defaults for new Delta tables created in a specific session or pipeline. If you w...
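A sketch of the session-level default the reply describes, using the spark.databricks.delta.properties.defaults prefix (in a DLT pipeline, the same key/value pair would go under the pipeline's configuration settings):

    # New Delta tables created in this session pick up
    # delta.feature.timestampNtz = supported at creation time.
    spark.conf.set(
        "spark.databricks.delta.properties.defaults.feature.timestampNtz",
        "supported",
    )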

sumitkumar_284
by New Contributor
  • 108 Views
  • 3 replies
  • 1 kudos

Not able to refresh Power BI dashboard from Databricks jobs

I am trying to refresh a Power BI dashboard using Databricks jobs and constantly getting this error, even though I am providing the optional parameters, which include catalog and database. Also, note that I am able to refresh in the Power BI UI using both...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @sumitkumar_284, Can you provide us more details? Are you using Unity Catalog? Which authentication mechanism do you use? In which version of Power BI Desktop did you develop your semantic model/dashboard? Do you meet all the requirements below? Publish...

2 More Replies
donlxz
by New Contributor II
  • 73 Views
  • 2 replies
  • 3 kudos

Resolved! Error occurs on create materialized view with spark.sql

When creating a materialized view with the spark.sql function, it returns the following error message: [MATERIALIZED_VIEW_OPERATION_NOT_ALLOWED.MV_NOT_ENABLED] The materialized view operation CREATE is not allowed: Materialized view features are not enabled for ...

Latest Reply
donlxz
New Contributor II
  • 3 kudos

Hi @szymon_dybczak, Thank you for your response. You're right, it was mentioned in the documentation; I missed it when checking. I understand now that it's not possible to do this with spark.sql. Thanks for clarifying!

1 More Reply
fellipeao
by New Contributor III
  • 1987 Views
  • 9 replies
  • 3 kudos

Resolved! How to create parameters that work in Power BI Report Builder (SSRS)

Hello! I'm trying to create an item in Power BI Report Server (SSRS) connected to Databricks. I can connect normally, but I'm having trouble using a parameter that Databricks recognizes. First, I'll illustrate what I do when I connect to SQL Server and...

Latest Reply
J-Usef
New Contributor II
  • 3 kudos

@fellipeao This is the only way I found that works well with Databricks, since positional arguments (?) were a failure for me. This is the latest version of Paginated Report Builder. https://learn.microsoft.com/en-us/power-bi/paginated-reports/report-build...

8 More Replies
