Data Engineering

Forum Posts

Sorted by:

by george_ognyanov • New Contributor III

11-09-2023 1:54:44 AM

1901 Views
7 replies
3 kudos

Resolved! Terraform Azure Databricks Unity Catalogue - Failed to check metastore quota limit for region

I am trying to create a metastore via the Terraform Azure databricks_metastore resource but I keep getting the error: This is the exact code I am using to create the resource:I have tried using both my Databricks account and a service principal appli...

Data Engineering

1901 Views
7 replies
3 kudos

11-09-2023 1:54:44 AM

View Replies

Latest Reply

george_ognyanov
New Contributor III

11-09-2023 3:15:19 AM

3 kudos

Hi @Kaniz as far as I understand one region can have one metastore. I am able to create a metastore in the same region if I log into the Databricks GUI and do it there.Alternatively, if I already have a metastore created and try to execute the above ...

3 kudos

11-09-2023 3:15:19 AM

6 More Replies

by Nathant93 • New Contributor II

11-08-2023 3:50:15 AM

1027 Views
2 replies
0 kudos

SQL Server OUTPUT clause alternative

I am looking at after a merge or insert has happened to get the records in that batch that had been inserted via either method, much like the OUTPUT clause in sql server.Does anyone have any suggestions, the only thing I can think of is to add a time...

Data Engineering

1027 Views
2 replies
0 kudos

11-08-2023 3:50:15 AM

View Replies

Latest Reply

Nathant93
New Contributor II

11-29-2023 2:35:28 AM

0 kudos

I've managed to do it like this qry = spark.sql(f"DESCRIBE history <table_name> limit 1").collect()current_version = int(qry[0][0])prev_version = current_version - 1Then do an except statement between the versions.

0 kudos

11-29-2023 2:35:28 AM

1 More Replies

by delta_bravo • New Contributor

11-10-2023 11:59:57 AM

1030 Views
1 replies
0 kudos

Cluster termination issue

I am using Databricks as a Community Edition user with a limited cluster (just 1 Driver: 15.3 GB Memory, 2 Cores, 1 DBU). I am trying to run some custom algorithms for continuous calculations and writing results to the delta table every 15 minutes al...

Data Engineering

1030 Views
1 replies
0 kudos

11-10-2023 11:59:57 AM

View Replies

Latest Reply

Kaniz
Community Manager

11-29-2023 1:40:51 AM

0 kudos

Hi @delta_bravo, I understand your frustration with the cluster termination issue in Databricks Community Edition. Let’s explore some potential solutions to keep your cluster alive and address the limitations you’re facing: Cluster Termination Reason...

0 kudos

11-29-2023 1:40:51 AM

by Anonymous47 • New Contributor II

11-09-2023 12:40:47 AM

929 Views
1 replies
0 kudos

Best practices to load single delta table in parallel from multiple processes.

Hi all,A delta lake table is created with identity column, and it is not possible to load the data parallelly to this table from multiple process as it leads to MetadataChangedException.Based on another post from community, we can have try to repeat ...

Data Engineering

929 Views
1 replies
0 kudos

11-09-2023 12:40:47 AM

View Replies

Latest Reply

Kaniz
Community Manager

11-29-2023 1:21:21 AM

0 kudos

Hi @Anonymous47 , Let’s dive into your questions regarding Delta Lake and parallel writes: Best Practices for Parallel Writes: Partitioning: Choose an appropriate partition column for your Delta table. Typically, the most commonly used partition co...

0 kudos

11-29-2023 1:21:21 AM

by xssdfd • New Contributor II

11-08-2023 4:03:47 AM

4609 Views
2 replies
0 kudos

Resolved! Workspace API

Hello friends. I am having problem with Workspace API. I have many folders inside my /Workspace (200+) which I would like to copy my Program, whole Program folder, which includes 20 spark scripts are Databricks notebooks. I tried Workspace API and I ...

Data Engineering

4609 Views
2 replies
0 kudos

11-08-2023 4:03:47 AM

View Replies

Latest Reply

xssdfd
New Contributor II

11-08-2023 4:05:28 AM

0 kudos

I am using this as api = /api/2.0/workspace/import

0 kudos

11-08-2023 4:05:28 AM

1 More Replies

by Chalki • New Contributor III

11-08-2023 6:41:56 AM

1428 Views
1 replies
0 kudos

DataBricks Asset Bundles - Don't deploy to the workspace, update only the repo

Hello Guys, So basically me and my team have bunch of jobs, which are pointing to a remote repo directly - they are not pointing to the workspace of the related environment. Is there a way to update the repo part in our databricks environment, instea...

Data Engineering

1428 Views
1 replies
0 kudos

11-08-2023 6:41:56 AM

View Replies

Latest Reply

Kaniz
Community Manager

11-29-2023 12:50:17 AM

0 kudos

Hi @Chalki, Let’s address both aspects of your question: Updating the Repo in Databricks Environment: If your jobs are currently pointing directly to a remote repository and you want to update the code without deploying it to the workspace, you can...

0 kudos

11-29-2023 12:50:17 AM

by Akshay_127877 • New Contributor II

03-21-2023 11:45:03 AM

19776 Views
7 replies
1 kudos

How to open Streamlit URL that is hosted by Databricks in local web browser?

I have run this webapp code on Databricks notebook. It works properly without any errors. With databricks acting as server, I am unable open this link on my browser for this webapp.But when I run the code on my local IDE, I am able to just open the U...

Data Engineering

19776 Views
7 replies
1 kudos

03-21-2023 11:45:03 AM

View Replies

Latest Reply

Anonymous
Not applicable

03-26-2023 9:54:27 PM

1 kudos

Hi @Akshay Aravinnakshan Thank you for posting your question in our community!Your input matters! Help our community thrive by coming back and marking the most helpful and accurate answers. Together, we can make a difference!

1 kudos

03-26-2023 9:54:27 PM

6 More Replies

by KNYSJOA • New Contributor

07-18-2023 8:16:05 AM

1238 Views
4 replies
0 kudos

SDK Workspace client HTTP Connection Pool

Hello.Do you know how to solve issue with the HTTPSConnectionPool when we are using SDK WorkspaceClient in notebook via workflow?I would like to trigger job when some conditions are met. These condition are done using Python. I am using SDK to trigge...

Data Engineering

1238 Views
4 replies
0 kudos

07-18-2023 8:16:05 AM

View Replies

Latest Reply

Dribka
New Contributor III

11-28-2023 11:18:42 AM

0 kudos

It seems like the issue you're facing with the HTTPSConnectionPool in the SDK WorkspaceClient when using it within a workflow may be related to the environment variables or credentials not being propagated correctly. When running the notebook manuall...

0 kudos

11-28-2023 11:18:42 AM

3 More Replies

by Deexith • New Contributor

04-27-2023 9:31:00 AM

2418 Views
3 replies
0 kudos

getting this error in logs Status logger error unable to locate configured logger context factory though i am able to connect with databricks db and retrive the data for mulesoft integration

ERROR StatusLogger Unable to locate configured LoggerContextFactory org.mule.runtime.module.launcher.log4j2.MuleLog4jContextFactoryERROR StatusLogger Unable to load class org.apache.logging.log4j.core.config.xml.XmlConfigurationFactoryjava.lang.Class...

Data Engineering

2418 Views
3 replies
0 kudos

04-27-2023 9:31:00 AM

View Replies

Latest Reply

DataBricks1565
New Contributor II

11-28-2023 9:14:46 AM

0 kudos

Hi @Uppala Deexith Any update on how you fixed this issue would greatly appreciated.

0 kudos

11-28-2023 9:14:46 AM

2 More Replies

by Agus1 • New Contributor III

11-08-2023 10:07:12 AM

1090 Views
3 replies
1 kudos

Obtain the source table version number from checkpoint file when using Structured Streaming

Hello!I'm using Structured Streaming to write to a delta table. The source is another delta table written with Structured Streaming as well. In order to datacheck the results I'm attempting to obtain from the checkpoint files of the target table the ...

Data Engineering

1090 Views
3 replies
1 kudos

11-08-2023 10:07:12 AM

View Replies

Latest Reply

Agus1
New Contributor III

11-28-2023 7:01:31 AM

1 kudos

Hello @Kaniz, thank you for your answer.I'm a bit confused here because you seem to be describing the opposite behavior of what I've seen in our checkpoint files.Here I repost my examples to try to understand better.First checkpoint file:{"sourceVers...

1 kudos

11-28-2023 7:01:31 AM

2 More Replies

by CKBertrams • New Contributor III

11-28-2023 4:25:14 AM

729 Views
2 replies
2 kudos

Resolved! Stream failure notifications

Hi all,I have a job running three consecutive streams, when just one of them fails I want to get notified. The notification only triggers when all tasks have failed or are skipped/canceled. Does anyone have a suggestion on how to implement this?

Data Engineering

729 Views
2 replies
2 kudos

11-28-2023 4:25:14 AM

View Replies

Latest Reply

deng_dev
New Contributor III

11-28-2023 5:28:54 AM

2 kudos

Hi!You can add notifications directly on tasks

2 kudos

11-28-2023 5:28:54 AM

1 More Replies

by Kayla • Contributor

11-17-2023 6:36:53 AM

713 Views
2 replies
0 kudos

Clusters Suddenly Failing - java.lang.RuntimeException: abort: DriverClient destroyed

I'm having clusters randomly failing that we've been using without issue for weeks. We're able to run a handful of cells and then get an error about "java.lang.RuntimeException: abort: DriverClient destroyed". Has anyone run into this before?Edit: I ...

Data Engineering

713 Views
2 replies
0 kudos

11-17-2023 6:36:53 AM

View Replies

Latest Reply

Kaniz
Community Manager

11-27-2023 12:14:20 AM

0 kudos

Hi @Kayla , Let’s explore some potential solutions to address this issue: Cluster Configuration: You mentioned that the same code worked before with a smaller 6-node cluster but started failing after upgrading to a 12-node cluster. Consider the f...

0 kudos

11-27-2023 12:14:20 AM

1 More Replies

by Trifa • New Contributor II

11-28-2023 3:23:16 AM

232 Views
0 replies
0 kudos

Override DLT Fille Refresh using a Job parameter

HelloI have a Job with a DLT pipeline as a first task. From time to time, I want to execute this Job with a Full Refresh of the DLT pipeline. How could I override my default "full_refresh = false" ?This was possible before using the Legacy parameters...

Data Engineering

232 Views
0 replies
0 kudos

11-28-2023 3:23:16 AM

by mk3 • New Contributor II

11-23-2023 4:24:38 AM

1874 Views
2 replies
1 kudos

Resolved! Plotly error while plotting into ipywidget output on Databricks

Hello, while plotting into databricks notebook with plotly into ipywidget output, I am getting following error SyntaxError: expected expression, got ','. Here is my snippet. import ipywidgets, random import plotly.express as px from plotly.offline ...

Data Engineering

Plotly

1874 Views
2 replies
1 kudos

11-23-2023 4:24:38 AM

View Replies

Latest Reply

Kaniz
Community Manager

11-26-2023 11:11:02 PM

1 kudos

Hi @mk3 , Instead of using interactive_output, consider directly displaying the Plotly figure as a widget. You can create a Plotly figure widget and update it dynamically based on widget interactions.

1 kudos

11-26-2023 11:11:02 PM

1 More Replies

by nag_kanchan • New Contributor III

11-28-2023 12:59:00 AM

383 Views
0 replies
0 kudos

Applying SCD in DLT using 3 different tables at source

My organization has recently started using Delta Live Tables in Databricks for data modeling. One of the dimensions I am trying to model takes data from 3 existing tables in the data lake and needs to be slowly changing dimensions (SCD Type 1).This a...

Data Engineering

SCD

383 Views
0 replies
0 kudos

11-28-2023 12:59:00 AM

User

Count

1601

736

343

284

246

Databricks

Forum Posts

Resolved! Terraform Azure Databricks Unity Catalogue - Failed to check metastore quota limit for region

SQL Server OUTPUT clause alternative

Cluster termination issue

Best practices to load single delta table in parallel from multiple processes.

Resolved! Workspace API

DataBricks Asset Bundles - Don't deploy to the workspace, update only the repo

How to open Streamlit URL that is hosted by Databricks in local web browser?

SDK Workspace client HTTP Connection Pool

getting this error in logs Status logger error unable to locate configured logger context factory though i am able to connect with databricks db and retrive the data for mulesoft integration

Obtain the source table version number from checkpoint file when using Structured Streaming

Resolved! Stream failure notifications

Clusters Suddenly Failing - java.lang.RuntimeException: abort: DriverClient destroyed

Override DLT Fille Refresh using a Job parameter

Resolved! Plotly error while plotting into ipywidget output on Databricks

Applying SCD in DLT using 3 different tables at source

DELTA_EXCEED_CHAR_VARCHAR_LIMIT

Not able to set run_as service_principal_name

Pyspark operations slowness in CLuster 14.3LTS as ...

[Databricks Assets Bundles] Workflow trigger on fi...

Addressing Pipeline Error Handling in Databricks b...