Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

loinguyen3182
by New Contributor
  • 77 Views
  • 1 reply
  • 0 kudos

Spark Streaming Error Listing in GCS

I have run into an error listing _delta_log when Spark reads a stream in Delta format from GCS. This is the full log of the issue: org.apache.spark.sql.streaming.StreamingQueryException: Failed to get result: java.io.IOException: Error ...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

The key contributing factors to this issue, according to internal investigations and customer tickets, include: Large Number of Log Files in _delta_log: Delta Lake maintains a JSON transaction log that grows with every commit. The more files present...
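Not from the truncated reply above, but one commonly suggested mitigation is to keep _delta_log smaller by tightening the Delta log retention on the source table; a minimal sketch, where the GCS path and retention value are placeholders:

# Hypothetical sketch: shorten how long Delta keeps old JSON commit files
# (the default is 30 days) so fewer files accumulate under _delta_log.
# In a Databricks notebook `spark` is already defined.
spark.sql("""
    ALTER TABLE delta.`gs://my-bucket/path/to/table`
    SET TBLPROPERTIES ('delta.logRetentionDuration' = 'interval 7 days')
""")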

Monteiro_12
by New Contributor
  • 68 Views
  • 1 reply
  • 0 kudos

How to Add a Certified Tag to a Table Using a DLT Pipeline

Is there a table property or configuration that allows me to add a certified tag directly to a table when using a Delta Live Tables pipeline?

Latest Reply
SP_6721
Contributor
  • 0 kudos

Hi @Monteiro_12, as far as I know, a DLT pipeline doesn't support adding a certified tag directly through table properties or pipeline configurations. Tags like system.Certified need to be applied manually via SQL after the table is created.
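For illustration, a minimal sketch of tagging the table with SQL once the pipeline has created it (catalog, schema, table, and tag names below are placeholders; the reply above refers to the system.Certified tag specifically):

# Hypothetical sketch: apply a tag after the DLT pipeline run, e.g. from a notebook
# or a follow-up job task. All names are placeholders.
spark.sql("""
    ALTER TABLE main.my_schema.my_table
    SET TAGS ('certified' = 'true')
""")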

mai_luca
by New Contributor III
  • 201 Views
  • 5 replies
  • 2 kudos

Resolved! Validation with views - DLT pipeline expectations

I have a question about how expectations work when applied to views inside a Delta Live Tables (DLT) pipeline. For instance, suppose we define this view inside a pipeline to stop the pipeline if we spot some duplicates: @dlt.view( name=view_name, ...

Latest Reply
Yogesh_378691
New Contributor III
  • 2 kudos

In DLT, expectations defined with dlt.expect_or_fail() on views are only evaluated if the view is used downstream by a materialized table. Since views are logical and lazily evaluated, if no table depends on the view, the expectation is skipped and t...
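A minimal sketch of the behaviour described above, with placeholder names: the expect_or_fail expectation on the view is only enforced because a downstream table reads from it.

import dlt
from pyspark.sql import functions as F

@dlt.view(name="orders_validation")
@dlt.expect_or_fail("no_duplicates", "cnt = 1")
def orders_validation():
    # One row per order_id with its occurrence count; duplicates make cnt > 1.
    return (
        spark.read.table("main.sales.orders")
        .groupBy("order_id")
        .agg(F.count("*").alias("cnt"))
    )

# Without a downstream consumer like this, the view is never materialized
# and the expectation above is skipped.
@dlt.table(name="orders_validated")
def orders_validated():
    return dlt.read("orders_validation")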

4 More Replies
cool_cool_cool
by New Contributor II
  • 1431 Views
  • 3 replies
  • 0 kudos

Databricks Workflow is stuck on the first task and doesn't do any workload

Heya, I have a workflow in Databricks with 2 tasks. They are configured to run on the same job cluster, and the second task depends on the first. I have seen weird behavior that happened twice now - the job takes a long time (it usually finishes within 30...

Latest Reply
Sri_M
New Contributor II
  • 0 kudos

@cool_cool_cool I am facing the same issue as well. Is this issue resolved for you? If yes, can you please let me know what action you have taken?

2 More Replies
hpant
by New Contributor III
  • 538 Views
  • 2 replies
  • 1 kudos

Is it possible to create an external volume using a Databricks asset bundle?

Is it possible to create an external volume using a Databricks asset bundle? I have this code from the databricks.yml file which is working perfectly fine for a managed volume:
    resources:
      volumes:
        bronze_checkpoints_volume:
          catalog_name: ...

Latest Reply
nayan_wylde
Contributor
  • 1 kudos

bundle:
  name: my_azure_volume_bundle

resources:
  volumes:
    my_external_volume:
      catalog_name: main
      schema_name: my_schema
      name: my_external_volume
      volume_type: EXTERNAL
      storage_location: abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/<path>...

1 More Reply
PedroFaria2135
by New Contributor II
  • 299 Views
  • 1 reply
  • 0 kudos

Resolved! How to add permissions to a Databricks Workflow deployed via Asset Bundle YAML?

Hey! I was deploying a new Databricks Workflow into my workspace via Databricks Asset Bundles. Currently, I have a very simple workflow, defined in a YAML file like this:
  resources:
    jobs:
      example_job:
        name: example_job
        schedule:
          ...

Latest Reply
nikhilj0421
Databricks Employee
  • 0 kudos

Hi @PedroFaria2135, this can be done using the permissions key in the YAML file. Please refer to this document: https://learn.microsoft.com/en-us/azure/databricks/dev-tools/bundles/reference#permissions
  permissions:
    - level: CAN_VIEW
      group_name: te...

stefan-vulpe
by New Contributor II
  • 390 Views
  • 2 replies
  • 1 kudos

Resolved! Batch Python UDFs in Unity Catalog and Spark SQL

Hello datanauts, I'm encountering a conceptual challenge regarding Batch Python UDFs within Spark SQL in Databricks. My primary question is: can Batch Python UDFs be used directly via Spark SQL? As a Databricks beginner, I'm seeking to understand ...

Data Engineering
spark sql
udf
Unity Catalog
Latest Reply
lingareddy_Alva
Honored Contributor II
  • 1 kudos

Hi @stefan-vulpe, looking at your code and the behavior you're describing, I can identify the core issue and provide some insights about Batch Python UDFs in Databricks. The Core Problem: the issue you're encountering is related to session isolation and ...
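For context, a minimal sketch (placeholder names, and the simple scalar form rather than the batch form discussed in this thread) of a Unity Catalog Python UDF being registered and then invoked from Spark SQL:

# Hypothetical sketch: register a scalar Unity Catalog Python UDF and call it from SQL.
# Catalog/schema/function names are placeholders; Batch Python UDFs are defined differently.
spark.sql("""
    CREATE OR REPLACE FUNCTION main.my_schema.double_it(x INT)
    RETURNS INT
    LANGUAGE PYTHON
    AS $$
    return x * 2
    $$
""")

spark.sql("SELECT main.my_schema.double_it(21) AS doubled").show()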

1 More Reply
Edoa
by New Contributor
  • 164 Views
  • 1 reply
  • 0 kudos

SFTP Connection Timeout on Job Cluster but Works on Serverless Compute

Hi all, I'm experiencing inconsistent behavior when connecting to an SFTP server using Paramiko in Databricks. When I run the code on Serverless Compute, the connection to xxx.yyy.com via SFTP works correctly. When I run the same code on a Job Cluster, ...

Latest Reply
lingareddy_Alva
Honored Contributor II
  • 0 kudos

Hi @Edoa, this is a common networking issue in Databricks related to the different network configurations between Serverless Compute and Job Clusters. Here are the key differences and potential solutions: Root Cause: Serverless Compute runs in Databricks'...
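As a quick check of the network-path explanation above, a minimal probe (hypothetical; the host and port come from the anonymized post) that separates a blocked route from an application-level problem:

import socket

HOST, PORT = "xxx.yyy.com", 22  # placeholders from the anonymized post

try:
    # If this times out on the Job Cluster but works on Serverless, the cluster's
    # VNet/firewall rules or the SFTP server's IP allow-list are the likely cause.
    with socket.create_connection((HOST, PORT), timeout=10):
        print("TCP connection succeeded - the network path from this cluster is open")
except OSError as exc:
    print(f"TCP connection failed: {exc}")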

DarioB
by New Contributor II
  • 489 Views
  • 1 reply
  • 0 kudos

Resolved! DAB for_each_task - Passing task values

I am trying to deploy a job with a for_each_task using DAB and Terraform, and I am unable to properly pass the task value into the subsequent task. This is my job tasks definition in the YAML:
      tasks:
        - task_key: FS_batching
          job_c...

Latest Reply
DarioB
New Contributor II
  • 0 kudos

We have been testing and found the issue (I just realized that my anonymization of the names removed the source of the error). We have tracked it down to the inputs parameter of the for_each_task. It seems that it is unable to call task names with das...

arendon
by New Contributor II
  • 265 Views
  • 2 replies
  • 1 kudos

Resolved! Asset Bundles: How to mute job failure notifications until final retry?

I'm trying to configure a job to only send failure notifications on the final retry failure (not on intermediate retry failures). This feature is available in the Databricks UI as "Mute notifications until the last retry", but I can't get this to wor...

Latest Reply
arendon
New Contributor II
  • 1 kudos

Thank you for the response, @lingareddy_Alva! I'll take a look at the workarounds you shared.

1 More Reply
Dnt_TchTheRolex
by New Contributor II
  • 804 Views
  • 4 replies
  • 0 kudos

Trouble Enabling File Events For An External Location

Hello all, I am trying to enable file events on my Azure Workspace for the File Arrival Trigger mode for Databricks Workflows. I'm following this documentation exactly (I think), but I'm not seeing the option to enable them. As you can see here...

Latest Reply
Stentone
New Contributor II
  • 0 kudos

@Dnt_TchTheRolex - After talking with our Databricks solutions engineer, this does seem to be a feature that needs to be enabled on the Databricks side. I recommend reaching out to them to see if they can enable it for your account.

3 More Replies
anil_reddaboina
by New Contributor II
  • 334 Views
  • 2 replies
  • 0 kudos

Slow-running Spark job issue due to unknown Spark stages created by the Databricks compute cluster

Hi Team, recently we migrated the Spark jobs from a self-hosted Spark (YARN) cluster to Databricks. Currently we are using Databricks Workflows with job compute clusters and the Spark JAR job type for execution, so when we run the job in Databricks...

Latest Reply
anil_reddaboina
New Contributor II
  • 0 kudos

Hey Brahma, thanks for your reply. As a first step I will disable the AQE config and test it. We are using node pools with the job compute cluster type so that it's not spinning up a new cluster for each job. I'm configuring the below two configs also, do ...
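For reference, a small sketch of the first step mentioned above (disabling AQE for a test run); the same setting can also be put in the job cluster's Spark config:

# Hypothetical test: turn Adaptive Query Execution off for this session only,
# then compare the stage breakdown of the slow job with and without it.
spark.conf.set("spark.sql.adaptive.enabled", "false")
print(spark.conf.get("spark.sql.adaptive.enabled"))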

1 More Reply
BMex
by New Contributor III
  • 149 Views
  • 1 reply
  • 0 kudos

Folders in Workflows/Jobs

It would be great if we could "group" Workflows/Jobs in Databricks using folders. This way, the Workflows list won't be cluttered with all Workflows/Jobs at the same root level.

Data Engineering
Folders
ideas
Workflows
Latest Reply
Advika
Databricks Employee
  • 0 kudos

Hello @BMex! You can submit this as a feature request through the Databricks Ideas Portal. This helps the product team consider it for future improvements.

mkwparth
by New Contributor III
  • 699 Views
  • 2 replies
  • 1 kudos

Resolved! Intermittent Timeout Error While Waiting for Python REPL to Start in Databricks

Hi everyone, I've been encountering an error that says "Timeout while waiting for the Python REPL to start. Took longer than 60 seconds" during my work in Databricks. The issue seems to happen intermittently - sometimes the REPL starts without any pro...

Latest Reply
mkwparth
New Contributor III
  • 1 kudos

@Rohan2405 "If everything else is in place, increasing the REPL startup timeout in the cluster configuration may help accommodate slower setups". Can you please guide me on how to increase the REPL timeout in the cluster configuration? Like, I've added this conf...

1 More Reply