Get Started Discussions
Start your journey with Databricks by joining discussions on getting started guides, tutorials, and introductory topics. Connect with beginners and experts alike to kickstart your Databricks experience.

Forum Posts

surajitDE
by New Contributor III
  • 429 Views
  • 4 replies
  • 0 kudos

DLT refresh time for combination of streaming and non-streaming tables?

@dlt.table
def joined_table():
    dim_df = spark.read.table("dim_table")  # Reloads every batch
    fact_df = spark.readStream.table("fact_stream")
    return fact_df.join(dim_df, "id", "left")

Latest Reply
brycejune
New Contributor III
  • 0 kudos

Hi, the current approach reloads dim_df in every batch, which can be inefficient. To optimize, consider broadcasting dim_df if it's small, or using a mapGroupsWithState function for stateful joins. Also, ensure that fact_df has sufficient watermarking to h...
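For the broadcast suggestion, a minimal sketch (assuming dim_table is small enough to fit on each executor; table names are taken from the original post):

from pyspark.sql.functions import broadcast
import dlt

@dlt.table
def joined_table():
    dim_df = spark.read.table("dim_table")           # static dimension, re-read per batch
    fact_df = spark.readStream.table("fact_stream")  # streaming facts
    # broadcast() ships the small dimension table to every executor,
    # avoiding a shuffle of the streaming side on each micro-batch.
    return fact_df.join(broadcast(dim_df), "id", "left")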

3 More Replies
dollyb
by Contributor
  • 8245 Views
  • 2 replies
  • 0 kudos

How to detect if running in a workflow job?

Hi there, what's the best way to differentiate in which environment my Spark session is running? Locally I develop with databricks-connect's DatabricksSession, but that doesn't work when running a workflow job, which requires SparkSession.getOrCreate()....

Latest Reply
Rob-Altmiller
Databricks Employee
  • 0 kudos

import json

def get_job_context():
    """Retrieve job-related context from the current Databricks notebook."""
    # Retrieve the notebook context
    ctx = dbutils.notebook.entry_point.getDbutils().notebook().getContext()
    # Convert the context...
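Since the reply is truncated, here is a hedged sketch of how this context is typically used to detect a job run (the "jobId" tag check is an assumption based on this common pattern, not part of the original reply):

import json

def running_in_job() -> bool:
    """Return True when this code executes inside a Databricks workflow job."""
    try:
        ctx = dbutils.notebook.entry_point.getDbutils().notebook().getContext()
        tags = json.loads(ctx.toJson()).get("tags", {})
        # Job runs carry a jobId tag; interactive sessions do not.
        return "jobId" in tags
    except Exception:
        # databricks-connect sessions have no notebook context at all.
        return False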

1 More Reply
SB93
by New Contributor II
  • 306 Views
  • 1 reply
  • 0 kudos

Help Needed: Executor Lost Error in Multi-Node Distributed Training with PyTorch

Hi everyone, I'm currently working on distributed training of a PyTorch model, following the example provided here. The training runs perfectly on a single node with a single GPU. However, when I attempt multi-node training using the following configu...

Latest Reply
cgrant
Databricks Employee
  • 0 kudos

We do not recommend using spot instances with distributed ML training workloads that use barrier mode, like TorchDistributor, as these workloads are extremely sensitive to executor loss. Please disable spot/pre-emption and try again.
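A sketch of the relevant cluster setting on AWS (field names per the Clusters API; adjust for the Azure/GCP equivalents):

cluster_spec = {
    "num_workers": 2,
    "aws_attributes": {
        # ON_DEMAND avoids spot pre-emption, which kills barrier-mode
        # training (e.g. TorchDistributor) whenever an executor is reclaimed.
        "availability": "ON_DEMAND",
        "first_on_demand": 1,
    },
}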

manoj_2355ca
by New Contributor III
  • 3694 Views
  • 2 replies
  • 0 kudos

cannot create external location: invalid Databricks Workspace configuration

Hi all, I am trying to create Databricks storage credentials, an external location, and a catalog with Terraform. Cloud: Azure. My storage credentials code is working correctly, but the external location code is throwing the below error when executing the Terraf...

Latest Reply
badari_narayan
New Contributor II
  • 0 kudos

Hi @manoj_2355ca, I am also facing the same error. Did you find a solution for it?

1 More Reply
vigneshkannan12
by New Contributor
  • 3867 Views
  • 5 replies
  • 0 kudos

typing_extensions import match error

I am trying to install the stanza library and create a UDF to generate NER tags for the chunk_text column in my DataFrame. Cluster config: DBR 14.3 LTS, Spark 3.5.0, Scala 2.12. Code below:
def extract_entities(text):
    import stanza
    nlp = stanza....
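A minimal sketch of such a UDF (assumes stanza and its English models are installed on the cluster; the column handling and entity formatting are illustrative):

import pandas as pd
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import StringType

@pandas_udf(StringType())
def extract_entities(texts: pd.Series) -> pd.Series:
    # Import and build the pipeline inside the UDF so each executor
    # initializes its own stanza model rather than pickling it from the driver.
    import stanza
    nlp = stanza.Pipeline(lang="en", processors="tokenize,ner", verbose=False)
    return texts.map(lambda t: "; ".join(e.text for e in nlp(t).ents))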

Latest Reply
Optimusprime
New Contributor II
  • 0 kudos

@SaadhikaB Hi, when I run dbutils.library.restartPython(), I get the following error

4 More Replies
ramisinghl01
by New Contributor
  • 397 Views
  • 0 replies
  • 0 kudos

PYTEST: Module not found error

Hi, apologies, I am trying to use pytest for the first time. I know this question has been raised before, and I went through previous answers, but the issue still exists. I am following Databricks and other articles on pytest. My structure is simple: -tests--co...
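Since the post is truncated, one common cause is worth sketching: pytest fails to import source modules when the repo root is not on sys.path, and a conftest.py at the repo root is the usual fix (an assumption about this particular setup, not a confirmed diagnosis):

# conftest.py at the repository root. pytest imports this file automatically
# and treats its directory as the rootdir, so it is a standard place to make
# the repo root importable by tests.
import os
import sys

sys.path.insert(0, os.path.dirname(__file__))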

ismaelhenzel
by Contributor
  • 3000 Views
  • 4 replies
  • 3 kudos

Failure when deploying a custom serving endpoint LLM

I'm currently experimenting with vector search using Databricks. Everything runs smoothly when I load the model deployed in Unity Catalog into a notebook session and ask questions using Python. However, when I attempt to serve it, I encounter a gener...

Latest Reply
Usmanr000
New Contributor II
  • 3 kudos

"Deploying a custom serving endpoint for LLMs can be challenging, especially when handling model dependencies and scaling issues. Has anyone found a reliable workaround for deployment failures? Also, for those looking for updates on government assist...

3 More Replies
unj1m
by New Contributor III
  • 4742 Views
  • 4 replies
  • 0 kudos

Resolved! What version of Python is used for the 16.1 runtime?

I'm trying to create a Spark UDF for a registered model and getting: Exception: Python versions in the Spark Connect client and server are different. To execute user-defined functions, client and server should have the same minor Python version. Pleas...
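A quick way to see the client side of the mismatch (the server side is fixed by the runtime; the DBR 16.x release notes list Python 3.12):

import sys

# Spark Connect requires client and server to share the same minor Python
# version; print the local (client) interpreter to compare against the runtime.
print("client Python:", ".".join(map(str, sys.version_info[:3])))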

Latest Reply
AndriusVitkausk
New Contributor III
  • 0 kudos

Does this mean that:
1. A new Databricks runtime comes out
2. Serverless compute automatically switches to the new runtime + new Python version
3. Any external environments that use serverless, i.e. local VS Code / CI/CD environments, also need to upgrade their Pyt...

3 More Replies
nikhil_2212
by New Contributor
  • 295 Views
  • 1 reply
  • 0 kudos

Lakehouse monitoring metrics tables not created automatically.

Hello, I have an external table created in a Databricks Unity Catalog workspace and am trying to "Create a monitor" for it from the Quality tab. The dashboard is getting created, however the two metrics tables, "profile" & "drift", a...

Latest Reply
Advika
Databricks Employee
  • 0 kudos

Hello @nikhil_2212! It looks like this post duplicates the one you recently posted. A response has already been provided to the Original post. I recommend continuing the discussion in that thread to keep the conversation focused and organised.

VijayP
by New Contributor
  • 308 Views
  • 1 reply
  • 0 kudos

Stream processing a large number of JSON files and handling exceptions

The application writes several small JSON files, and the expected volumes are high (estimate: 1 million in an hourly window during the peak season). As per the current design, these files are streamed through Spark Structured Streaming and we use Auto Lo...

Latest Reply
cgrant
Databricks Employee
  • 0 kudos

We have customers that read millions of files per hour+ using Databricks Auto Loader. For high-volume use cases, we recommend enabling file notification mode, which, instead of continuously performing list operations on the filesystem, uses cloud nat...
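A minimal Auto Loader sketch with file notification mode enabled (cloudFiles.useNotifications is the documented option; the paths here are placeholders):

df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    # Use cloud-native event notifications instead of repeatedly listing
    # the input directory, which scales to millions of files per hour.
    .option("cloudFiles.useNotifications", "true")
    .option("cloudFiles.schemaLocation", "/tmp/checkpoints/schema")  # placeholder
    .load("/mnt/landing/json")                                       # placeholder
)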

Pooviond
by New Contributor
  • 617 Views
  • 1 reply
  • 0 kudos

Urgent: Need Authentication Reset for Databricks Workspace Access

I am unable to access my Databricks workspace because it is still redirecting to Microsoft Entra ID (Azure AD) authentication, even after I have removed the Azure AD enterprise application and changed the AWS IAM Identity Center settings. Issue detail...

Latest Reply
Advika
Databricks Employee
  • 0 kudos

Hello @Pooviond! Please submit a ticket with the Databricks Support team for assistance in resolving this issue.

mrstevegross
by Contributor III
  • 1249 Views
  • 4 replies
  • 1 kudos

Resolved! How best to measure the time-spent-waiting-for-an-instance?

I'm exploring using an instance pool. Can someone clarify which job event log tells me the time spent waiting for an instance? I've found 2 candidates:
1. The delta between "waitingForCluster" and "started" on the "run events" log, accessible v...
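One hedged way to pull a comparable number per run is the Jobs API's setup_duration field (a sketch; setup_duration reports cluster setup time in milliseconds, which for pool-backed clusters approximates time spent waiting for an instance):

import requests

def get_setup_wait_ms(host: str, token: str, run_id: int) -> int:
    # Hypothetical helper: fetch one job run and return its cluster-setup time.
    resp = requests.get(
        f"{host}/api/2.1/jobs/runs/get",
        headers={"Authorization": f"Bearer {token}"},
        params={"run_id": run_id},
    )
    resp.raise_for_status()
    return resp.json().get("setup_duration", 0)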

Latest Reply
julieAnderson
New Contributor II
  • 1 kudos

 System Logs or Event Timings

3 More Replies
Forssen
by New Contributor II
  • 568 Views
  • 2 replies
  • 1 kudos

Resolved! When is it time to change from ETL in notebooks to whl/py?

Hi! I would like some input/tips from the community regarding when it is time to go from a working solution in notebooks to something more "stable", like whl/py files. What are the pros/cons of notebooks compared to whl/py? The way I structured things...

Latest Reply
Isi
Contributor
  • 1 kudos

Hey @Forssen, my advice: using .py files and .whl packages is generally more secure and scalable, especially when working in a team. One of the key advantages is that code reviews and version control are much more efficient with .py files, as changes ...
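For reference, a minimal packaging sketch (package and module names are hypothetical placeholders):

# setup.py -- minimal example of packaging shared ETL code as a wheel.
from setuptools import find_packages, setup

setup(
    name="my_etl",                        # hypothetical package name
    version="0.1.0",
    packages=find_packages(where="src"),
    package_dir={"": "src"},
    install_requires=[],                  # pin runtime dependencies here
)

Build with python -m build and install the resulting .whl on the cluster or reference it from a job.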

1 More Reply
hbs59
by New Contributor III
  • 7290 Views
  • 7 replies
  • 2 kudos

Resolved! Move multiple notebooks at the same time (programmatically)

If I want to move multiple (hundreds of) notebooks at the same time from one folder to another, what is the best way to do that, other than going to each individual notebook and clicking "Move"? Is there a way to programmatically move notebooks? Like ...

Latest Reply
Walter_C
Databricks Employee
  • 2 kudos

You can use the export and import API calls to export this notebook to your local machine and then import it to the new workspace.
Export: https://docs.databricks.com/api/workspace/workspace/export
Import: https://docs.databricks.com/api/works...
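A sketch of that pattern with the Databricks Python SDK (the paths are placeholders; treat the loop as illustrative, not a tested migration script):

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.workspace import ExportFormat, ImportFormat, ObjectType

w = WorkspaceClient()  # reads credentials from env vars / .databrickscfg

src, dst = "/Workspace/old_folder", "/Workspace/new_folder"  # placeholders

# "Move" = export each notebook, re-import it under the new path, delete the original.
for obj in w.workspace.list(src):
    if obj.object_type == ObjectType.NOTEBOOK:
        data = w.workspace.export(obj.path, format=ExportFormat.SOURCE)
        w.workspace.import_(
            path=obj.path.replace(src, dst, 1),
            content=data.content,            # base64-encoded source
            format=ImportFormat.SOURCE,
            language=obj.language,
            overwrite=True,
        )
        w.workspace.delete(obj.path)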

6 More Replies
LasseL
by New Contributor III
  • 736 Views
  • 1 reply
  • 0 kudos

Resolved! Deduplication with RocksDB, should old state files be deleted manually (to manage storage size)?

Hi, I have the following streaming setup: I want to remove duplicates in streaming.
1) Deduplication strategy is defined by two fields: extraction_timestamp and hash (row-wise hash)
2) Watermark strategy: extraction_timestamp with a "10 seconds" interval
--> R...
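A sketch of the described setup (the source name is a placeholder; the watermark and key columns come from the post):

deduped = (
    spark.readStream.table("source_stream")              # placeholder source
    .withWatermark("extraction_timestamp", "10 seconds")
    # Drop duplicates keyed on the watermark column plus the row hash;
    # state older than the watermark can then be purged by the state store.
    .dropDuplicates(["extraction_timestamp", "hash"])
)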

Latest Reply
LasseL
New Contributor III
  • 0 kudos

Found solution. https://kb.databricks.com/streaming/how-to-efficiently-manage-state-store-files-in-apache-spark-streaming-applications <-- these two parameters.

