Get Started Discussions

by Gal_Sb • New Contributor

03-19-2025 6:18:15 AM

1455 Views
1 reply
0 kudos

Text alignment in databricks dashboard markdown

Hi All, How can I align the text inside the Dashboard markdown to the middle? Is there an option to do this? Thanks, Gal


Latest Reply

Advika
Databricks Employee

03-24-2025 8:11:17 AM

0 kudos

Hello @Gal_Sb! Databricks markdown does not support text alignment, and HTML/CSS do not work for this purpose in Databricks dashboards. You can try formatting options like headers or spacing adjustments. I'll also check with the team to explore possi...


by T0M • New Contributor III

02-22-2025 5:50:07 AM

1568 Views
3 replies
1 kudos

Resolved! DLT Pipeline Validate will always spawn new cluster

Hi all! I've started learning DLT pipelines, but I am struggling with the development of a pipeline. As far as I understand it, once I click on “Validate”, a cluster will spin up and stay (by default for 2 hours) if the pipeline is in “Development” mode....


Latest Reply

T0M
New Contributor III

03-24-2025 3:47:51 AM

1 kudos

Well, it turns out that if I do not make any changes to the cluster settings when creating a new pipeline (i.e. keep the defaults), it works as expected (every new "Validate" skips the "waiting for resources" step). Initially, I reduced the number of workers to a m...


2 More Replies

by surajitDE • New Contributor III

03-16-2025 11:29:26 PM

1422 Views
4 replies
0 kudos

DLT refresh time for combination of streaming and non streaming tables?

import dlt

@dlt.table
def joined_table():
    dim_df = spark.read.table("dim_table")  # Reloads every batch
    fact_df = spark.readStream.table("fact_stream")
    return fact_df.join(dim_df, "id", "left")


Latest Reply

brycejune
New Contributor III

03-22-2025 4:03:15 AM

0 kudos

Hi, The current approach reloads dim_df in every batch, which can be inefficient. To optimize, consider broadcasting dim_df if it's small or using a mapGroupsWithState function for stateful joins. Also, ensure that fact_df has sufficient watermarking to h...
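The broadcast suggestion in this reply could look roughly like the sketch below. It assumes the dimension table is small enough to fit in executor memory. The function name `join_with_broadcast_dim` is hypothetical; the `id` key and the left join come from the question. `DataFrame.hint("broadcast")` is only a hint, so the optimizer may still choose another join strategy.

```python
def join_with_broadcast_dim(fact_df, dim_df):
    """Left-join a streaming fact DataFrame to a small static dimension DataFrame.

    Assumption: dim_df is small enough to be copied to every executor.
    """
    # hint("broadcast") asks Spark to ship dim_df to each executor
    # instead of shuffling both sides of the join.
    return fact_df.join(dim_df.hint("broadcast"), on="id", how="left")
```

Inside the DLT function from the question, this would stand in for the final `fact_df.join(dim_df, "id", "left")` call.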


3 More Replies

by dollyb • Contributor II

03-03-2024 8:54:51 AM

9949 Views
2 replies
0 kudos

How to detect if running in a workflow job?

Hi there, what's the best way to differentiate in what environment my Spark session is running? Locally I develop with databricks-connect's DatabricksSession, but that doesn't work when running a workflow job, which requires SparkSession.getOrCreate()....


Latest Reply

Rob-Altmiller
Databricks Employee

03-21-2025 8:08:17 PM

0 kudos

import json

def get_job_context():
    """Retrieve job-related context from the current Databricks notebook."""
    # Retrieve the notebook context
    ctx = dbutils.notebook.entry_point.getDbutils().notebook().getContext()
    # Convert the context...
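A lighter-weight alternative for the session question is sketched below. It assumes that the `DATABRICKS_RUNTIME_VERSION` environment variable is set on Databricks compute and absent on a local machine; the helper names are hypothetical. Note that this only separates "on a Databricks runtime" from "local", not a job run from an interactive notebook.

```python
import os


def running_on_databricks_runtime(env=None):
    """Return True when the process appears to run on a Databricks runtime.

    Assumption: DATABRICKS_RUNTIME_VERSION is set on cluster nodes only.
    """
    env = os.environ if env is None else env
    return bool(env.get("DATABRICKS_RUNTIME_VERSION"))


def get_spark(env=None):
    """Pick the session builder that matches the current environment."""
    if running_on_databricks_runtime(env):
        # On a cluster (notebook or workflow job): use the regular session.
        from pyspark.sql import SparkSession
        return SparkSession.builder.getOrCreate()
    # Locally: connect to remote compute through databricks-connect.
    from databricks.connect import DatabricksSession
    return DatabricksSession.builder.getOrCreate()
```

The imports are deferred so that neither package has to be installed in the environment where it is not used.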


1 More Replies

by SB93 • New Contributor II

03-20-2025 12:50:44 AM

1239 Views
1 reply
0 kudos

Help Needed: Executor Lost Error in Multi-Node Distributed Training with PyTorch

Hi everyone, I'm currently working on distributed training of a PyTorch model, following the example provided here. The training runs perfectly on a single node with a single GPU. However, when I attempt multi-node training using the following configu...


Latest Reply

cgrant
Databricks Employee

03-21-2025 9:02:56 AM

0 kudos

We do not recommend using spot instances with distributed ML training workloads that use barrier mode, like TorchDistributor, as these workloads are extremely sensitive to executor loss. Please disable spot/pre-emption and try again.
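As a rough illustration of "disable spot", the sketch below builds the availability fragment of a cluster specification. The helper name is hypothetical, and the field names and values (`aws_attributes.availability = "ON_DEMAND"`, `azure_attributes.availability = "ON_DEMAND_AZURE"`) are assumed to follow the Databricks Clusters API; check the API reference for your cloud before relying on them.

```python
def on_demand_attributes(cloud):
    """Return a cluster-spec fragment that requests on-demand (non-spot) nodes.

    Assumption: field names and enum values match the Clusters API.
    """
    if cloud == "aws":
        return {"aws_attributes": {"availability": "ON_DEMAND"}}
    if cloud == "azure":
        return {"azure_attributes": {"availability": "ON_DEMAND_AZURE"}}
    raise ValueError(f"unsupported cloud: {cloud!r}")
```

The returned fragment would be merged into the rest of the cluster definition used for the training job.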


by manoj_2355ca • New Contributor III

01-05-2024 1:15:58 AM

4874 Views
2 replies
0 kudos

cannot create external location: invalid Databricks Workspace configuration

Hi all, I am trying to create Databricks storage credentials, an external location, and a catalog with Terraform. Cloud: Azure. My storage credentials code is working correctly, but the external location code is throwing the below error when executing the Terraf...


azuredatabricks


Latest Reply

badari_narayan
New Contributor II

03-20-2025 9:54:13 PM

0 kudos

Hi @manoj_2355ca, I am also facing the same error. Did you find a solution for it?


1 More Replies

by vigneshkannan12 • New Contributor

06-07-2024 3:13:34 AM

5669 Views
5 replies
0 kudos

typing extensions import match error

I am trying to install the stanza library and create a UDF to generate NER tags for the chunk_text column in my dataframe. Cluster config: DBR 14.3 LTS, Spark 3.5.0, Scala 2.12. Code below:

def extract_entities(text):
    import stanza
    nlp = stanza....


Latest Reply

Optimusprime
New Contributor II

03-20-2025 3:38:13 PM

0 kudos

@SaadhikaB Hi, when I run dbutils.library.restartPython(), I get the following error


4 More Replies

by unj1m • New Contributor III

12-27-2024 12:02:54 PM

9736 Views
4 replies
0 kudos

Resolved! What version of Python is used for the 16.1 runtime

I'm trying to create a Spark UDF for a registered model and getting: Exception: Python versions in the Spark Connect client and server are different. To execute user-defined functions, client and server should have the same minor Python version. Pleas...


Latest Reply

AndriusVitkausk
New Contributor III

03-19-2025 4:59:40 AM

0 kudos

Does this mean that:
1. A new dbx runtime comes out
2. Serverless compute automatically switches to the new runtime + new Python version
3. Any external environments that use serverless, i.e. local VS Code / CI/CD environments, also need to upgrade their pyt...
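The error in this thread is about the client and server needing the same minor Python version. A small pre-flight check along those lines is sketched below; the function names are hypothetical, and the server version is assumed to be supplied by the caller (for example, from the runtime release notes), since nothing here queries the cluster.

```python
import sys


def major_minor(version):
    """Parse a version string such as '3.11.4' or '3.11' into (3, 11)."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def python_versions_match(server_version, client_version=None):
    """Compare major.minor of the local interpreter with the server's.

    client_version defaults to the interpreter running this code.
    """
    if client_version is None:
        client = tuple(sys.version_info[:2])
    else:
        client = major_minor(client_version)
    return client == major_minor(server_version)
```

A local or CI environment could call `python_versions_match("<server version>")` before creating UDFs and fail early with a clearer message.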


3 More Replies

by nikhil_2212 • New Contributor II

03-13-2025 12:29:14 PM

975 Views
1 reply
0 kudos

Lakehouse monitoring metrics tables not created automatically.

Hello, I have an external table created in a Databricks Unity Catalog workspace and am trying to "Create a monitor" for it from the Quality tab. While creating it, the dashboard gets created; however, the two metrics tables "profile" & "drift" a...


Latest Reply

Advika
Databricks Employee

03-19-2025 2:00:21 AM

0 kudos

Hello @nikhil_2212! It looks like this post duplicates the one you recently posted. A response has already been provided to the original post. I recommend continuing the discussion in that thread to keep the conversation focused and organised.


by VijayP • New Contributor

03-16-2025 6:42:05 AM

778 Views
1 reply
0 kudos

Stream processing large number of JSON files and handling exception

The application writes many small JSON files, and the expected volume of these files is high (estimate: 1 million in an hourly window during peak season). As per the current design, these files are streamed through Spark streaming and we use autolo...


Latest Reply

cgrant
Databricks Employee

03-18-2025 2:50:27 PM

0 kudos

We have customers that read millions of files per hour+ using Databricks Auto Loader. For high-volume use cases, we recommend enabling file notification mode, which, instead of continuously performing list operations on the filesystem, uses cloud nat...
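A minimal sketch of switching Auto Loader to file notification mode follows. It assumes JSON input as in the question; the helper names and the paths passed in are hypothetical. `cloudFiles.useNotifications` is the option that turns on notification mode, and the cloud-side permissions it needs are not covered here.

```python
def autoloader_options(schema_location, use_notifications=True):
    """Build Auto Loader reader options for small JSON files."""
    return {
        "cloudFiles.format": "json",
        # "true" switches from directory listing to file notification mode.
        "cloudFiles.useNotifications": str(use_notifications).lower(),
        # Where Auto Loader stores the inferred schema between runs.
        "cloudFiles.schemaLocation": schema_location,
    }


def read_json_stream(spark, source_path, schema_location):
    """Return a streaming DataFrame over source_path using Auto Loader."""
    return (
        spark.readStream.format("cloudFiles")
        .options(**autoloader_options(schema_location))
        .load(source_path)
    )
```

Keeping the options in one helper makes it easy to toggle between listing and notification mode when comparing the two.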


by Pooviond • New Contributor

03-17-2025 8:21:22 PM

1132 Views
1 reply
0 kudos

Urgent: Need Authentication Reset for Databricks Workspace Access

I am unable to access my Databricks workspace because it is still redirecting to Microsoft Entra ID (Azure AD) authentication, even after I have removed the Azure AD enterprise application and changed the AWS IAM Identity Center settings. Issue Detail...


Latest Reply

Advika
Databricks Employee

03-18-2025 6:46:24 AM

0 kudos

Hello @Pooviond! Please submit a ticket with the Databricks Support team for assistance in resolving this issue.


by mrstevegross • Contributor III

03-12-2025 10:33:38 AM

4539 Views
4 replies
1 kudos

Resolved! How best to measure the time-spent-waiting-for-an-instance?

I'm exploring using an instance pool. Can someone clarify for me which job event log tells me the time-spent-waiting-for-an-instance? I've found 2 candidates:
1. The delta between "waitingForCluster" and "started" on the "run events" log, accessible v...


Latest Reply

julieAnderson
New Contributor II

03-17-2025 7:07:24 PM

1 kudos

System Logs or Event Timings
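For the first candidate in the question (the delta between "waitingForCluster" and "started"), the arithmetic could be sketched as below. The event shape is an assumption: each event is taken to be a dict with hypothetical keys `type` and `timestamp_ms` (epoch milliseconds), so the real log would need to be mapped into that form first.

```python
def cluster_wait_seconds(events):
    """Seconds between the first 'waitingForCluster' and first 'started' event.

    Assumption: events are dicts with hypothetical keys 'type' and
    'timestamp_ms'. Returns None if either event is missing.
    """
    first_seen = {}
    for event in events:
        # Keep only the earliest timestamp recorded for each event type.
        first_seen.setdefault(event["type"], event["timestamp_ms"])
    if "waitingForCluster" not in first_seen or "started" not in first_seen:
        return None
    return (first_seen["started"] - first_seen["waitingForCluster"]) / 1000.0
```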


3 More Replies

by Forssen • New Contributor II

01-30-2025 10:44:09 AM

1632 Views
2 replies
1 kudos

Resolved! When is it time to change from ETL in notebooks to whl/py?

Hi! I would like some input/tips from the community regarding when it is time to go from a working solution in notebooks to something more "stable", like whl/py files. What are the pros/cons of notebooks compared to whl/py? The way I structured things...


Latest Reply

Isi
Honored Contributor III

02-02-2025 3:17:06 PM

1 kudos

Hey @Forssen, My advice: Using .py files and .whl packages is generally more secure and scalable, especially when working in a team. One of the key advantages is that code reviews and version control are much more efficient with .py files, as changes ...


1 More Replies

by hbs59 • New Contributor III

01-11-2024 11:45:43 AM

9210 Views
7 replies
2 kudos

Resolved! Move multiple notebooks at the same time (programmatically)

If I want to move multiple (hundreds of) notebooks at the same time from one folder to another, what is the best way to do that, other than going to each individual notebook and clicking "Move"? Is there a way to programmatically move notebooks? Like ...


Latest Reply

Walter_C
Databricks Employee

03-17-2025 9:55:20 AM

2 kudos

You can use the export and import API calls in order to export this notebook to your local machine and then import it to the new workspace.
Export: https://docs.databricks.com/api/workspace/workspace/export
Import: https://docs.databricks.com/api/works...
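The export-then-import flow described in this reply might be scripted roughly as follows. This is a sketch under assumptions: the endpoints are taken to be `/api/2.0/workspace/export` and `/api/2.0/workspace/import` with a personal access token, the function names are hypothetical, and the code copies a notebook rather than moving it (deleting the source would be a separate step).

```python
import base64
import json
import urllib.parse
import urllib.request


def build_export_url(host, path):
    """URL for exporting one notebook in SOURCE format."""
    query = urllib.parse.urlencode({"path": path, "format": "SOURCE"})
    return f"{host.rstrip('/')}/api/2.0/workspace/export?{query}"


def build_import_payload(path, source, language="PYTHON", overwrite=False):
    """JSON body for the import call; content must be base64-encoded."""
    return {
        "path": path,
        "format": "SOURCE",
        "language": language,
        "content": base64.b64encode(source).decode("ascii"),
        "overwrite": overwrite,
    }


def copy_notebook(host, token, src_path, dst_path):
    """Export src_path and import it at dst_path; returns the HTTP status."""
    headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}
    request = urllib.request.Request(build_export_url(host, src_path), headers=headers)
    with urllib.request.urlopen(request) as response:
        exported = json.load(response)  # assumed to carry base64 'content'
    payload = build_import_payload(dst_path, base64.b64decode(exported["content"]))
    request = urllib.request.Request(
        f"{host.rstrip('/')}/api/2.0/workspace/import",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

For hundreds of notebooks, `copy_notebook` would be called in a loop over the source paths.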


6 More Replies

by LasseL • New Contributor III

03-14-2025 3:59:11 AM

3559 Views
1 reply
0 kudos

Resolved! Deduplication with rocksdb, should old state files be deleted manually (to manage storage size)?

Hi, I have the following streaming setup: I want to remove duplicates in streaming.
1) The deduplication strategy is defined by two fields: extraction_timestamp and hash (row-wise hash)
2) Watermark strategy: extraction_timestamp with a "10 seconds" interval
--> R...
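The setup described in this question could be expressed roughly as below. It is a sketch that assumes a streaming DataFrame with the `extraction_timestamp` and `hash` columns named in the post; the function name is hypothetical, and state store configuration (RocksDB, retention) is left out.

```python
def deduplicate(stream_df):
    """Drop duplicate rows keyed on extraction_timestamp and hash.

    The watermark lets Spark discard deduplication state for rows
    older than the 10-second threshold.
    """
    return (
        stream_df
        .withWatermark("extraction_timestamp", "10 seconds")
        .dropDuplicates(["extraction_timestamp", "hash"])
    )
```

Including the event-time column in the deduplication keys is what allows the watermark to bound the state that has to be kept.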


Latest Reply

LasseL
New Contributor III

03-17-2025 9:34:21 AM

0 kudos

Found solution. https://kb.databricks.com/streaming/how-to-efficiently-manage-state-store-files-in-apache-spark-streaming-applications <-- these two parameters.

