Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

kenmyers-8451
by Contributor II
  • 3345 Views
  • 9 replies
  • 13 kudos

Workflows now harder to find old failed runs

Some time in the past few weeks I think there was an update to Databricks workflows. Previously you could:
  • run a workflow
  • it fails
  • repair the workflow
  • click into the workflow
  • view past runs before that failed via a dropdown bar (like in the screenshot bel...

Latest Reply
hansonma-8451
New Contributor II
  • 13 kudos

I am a Databricks admin in the workspace where @kenmyers-8451 is having problems, and I am seeing the same issue: the retries show up for a brief second, then the page redirects/refreshes and the retries disappear. This seems to happen when the wor...

  • 13 kudos
8 More Replies
kranthit
by New Contributor II
  • 1216 Views
  • 2 replies
  • 0 kudos

Serverless base env setup in Databricks Asset Bundle (DAB)

I am trying to set a base environment for my notebook task, which runs on serverless. Below is the DAB YAML I am using. When I ran bundle deploy -t users, it did not throw any error, but it is not installing the libraries from the base env, ca...

Latest Reply
Yogesh_Verma_
Contributor II
  • 0 kudos

Your YAML is valid, but the libraries are not being installed because base_environment_path is not supported for serverless compute. Serverless jobs use a fully managed environment and you can’t override it with a custom base environmen...

  • 0 kudos
1 More Replies
ivni
by New Contributor III
  • 1471 Views
  • 8 replies
  • 1 kudos

JDBC driver CPU consumption

Hi, I am using the JDBC driver to execute an insert statement with several thousand rows (~4MB). It takes several seconds to complete and for some reason consumes one full CPU core. It seems like a lot of the time is spent in this method: com.databr...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @ivni, yes, that method could be CPU intensive. According to the driver's docs, it removes the catalog name from the query statement. But it does this via regex patterns, which is a heavy operation from a CPU perspective, especially if you have a lot of complex q...

  • 1 kudos
7 More Replies
aravindan_tk
by Databricks Partner
  • 2092 Views
  • 1 reply
  • 1 kudos

Resolved! Issue with Lakebridge transpile installation – SSL Certificate Error

Hi Team, I am trying to use Lakebridge to test a small piece of code for conversion. The base installation of Lakebridge worked fine, but when I attempt to install transpile, I encounter SSL-related errors. I even tried to hardcode of the certificates...

Latest Reply
mark_ott
Databricks Employee
  • 1 kudos

The error indicates that while installing Lakebridge's transpile component on Databricks with Python 3.13, SSL certificate verification fails due to a "Missing Authority Key Identifier" in the certificate chain. This is a result of stricter requireme...

  • 1 kudos
Satyam_Patel
by New Contributor II
  • 1573 Views
  • 3 replies
  • 2 kudos

Resolved! Inconsistent behavior of LakeBridge transpiler for similar scripts

Hi everyone, I am testing the LakeBridge prototype and noticed inconsistent behavior when converting stored procedures. For simple scripts, the conversion is correct. But for medium/complex scripts, especially those with multiple LEFT JOINs and column e...

Data Engineering
Lakebridge
Latest Reply
mark_ott
Databricks Employee
  • 2 kudos

"Is there a way to standardize/optimize the conversion so it doesn’t blow up into thousands of lines?"
Yes, there are actionable methods to standardize and optimize LakeBridge conversions to prevent code from ballooning into thousands of lines, but t...

  • 2 kudos
2 More Replies
radhag
by Databricks Partner
  • 733 Views
  • 1 reply
  • 2 kudos

Resolved! Vacuuming clones in USER_ISOLATION mode and ThreadPool Executor

Hello, I run my process on a shared, interactive cluster (data security mode: USER_ISOLATION). I run operations on multiple tables, each in a separate thread. Pseudo-code:
try:
    truncate target_table
    vacuum target_table (retain 0 hours with...

Latest Reply
mark_ott
Databricks Employee
  • 2 kudos

This is most likely a bug. It certainly is unexpected and should be reported to Databricks support or your platform administrator for clarification and remediation. As a temporary workaround, running vacuum commands outside of threaded contexts or s...

  • 2 kudos
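
The workaround suggested in the reply above (running vacuum outside of threaded contexts) could be sketched roughly as follows. This is a minimal sketch, not the poster's actual code: `truncate_then_vacuum` and the `run_sql` callable are hypothetical names introduced for illustration, and `run_sql` is assumed to behave like `spark.sql`. Note that `VACUUM ... RETAIN 0 HOURS` only runs if the Delta retention duration check has been disabled.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def truncate_then_vacuum(
    tables: Iterable[str],
    run_sql: Callable[[str], object],
    max_workers: int = 4,
) -> List[str]:
    """Truncate tables in parallel threads, then vacuum them one at a time.

    `run_sql` is a hypothetical callable (for example `spark.sql`) that
    executes a single SQL statement. Returns the tables that were vacuumed.
    """
    tables = list(tables)

    # Threaded phase: only the TRUNCATE statements run concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() waits for completion and re-raises any worker exception here.
        list(pool.map(lambda t: run_sql(f"TRUNCATE TABLE {t}"), tables))

    # Sequential phase: VACUUM runs on the calling thread, outside the pool.
    vacuumed = []
    for table in tables:
        run_sql(f"VACUUM {table} RETAIN 0 HOURS")
        vacuumed.append(table)
    return vacuumed
```

On a cluster this might be called as `truncate_then_vacuum(["schema.t1", "schema.t2"], spark.sql)`, where the table names are placeholders. Whether this avoids the behavior described in the thread is not confirmed here; it only separates the two phases.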
SharathE
by New Contributor III
  • 5542 Views
  • 5 replies
  • 0 kudos

Delta Live tables stream output to Kafka

Hello, I wanted to know whether we can write the stream output to a Kafka topic in a DLT pipeline. Please let me know. Thank you.

Latest Reply
mtajmouati
Contributor
  • 0 kudos

Hi! Ensure your code is set up to use these libraries. Here is the complete example:
Navigate to your cluster configuration:
  • Go to your Databricks workspace.
  • Click on "Clusters" and select your cluster.
  • Go to the "Libraries" tab.
Install the necessar...

  • 0 kudos
4 More Replies
HoussemBL
by New Contributor III
  • 1769 Views
  • 5 replies
  • 0 kudos

Databricks bundle repository permission

Hi everyone, how can I use Databricks Asset Bundle configuration to set permissions on the workspace folder (root_path) where my code is deployed, in order to protect it from manual changes by users? My current bundle config for production looks like t...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @HoussemBL, in Databricks there is a users group to which all workspace users belong by default (displayed in the UI as "All workspace users"). That group has a default permission that cannot be revoked at the top-level Shared folder. So, any new folder ...

  • 0 kudos
4 More Replies
htu
by Contributor
  • 29697 Views
  • 25 replies
  • 30 kudos

Installing Databricks Connect breaks pyspark local cluster mode

Hi, it seems that when databricks-connect is installed, pyspark is modified at the same time so that it no longer works with a local master node. Local mode has been especially useful in testing, when unit tests for spark-related code without any remot...

Latest Reply
saurabh18cs
Honored Contributor III
  • 30 kudos

Hi Utu, try doing something like this: wrap the import within the fixture itself.
import os
import pytest
from pyspark.sql import SparkSession

_local_test = True

@pytest.fixture(scope='session')
def spark():
    if 'DATABRICKS_RUNTIME_VERSION' in os.environ:
        ...

  • 30 kudos
24 More Replies
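
The reply above is cut off, so here is a minimal sketch of the general idea it seems to describe: pick the Spark session based on the `DATABRICKS_RUNTIME_VERSION` environment variable and defer the pyspark import until the session is actually needed. The helper names `running_on_databricks` and `make_spark` are hypothetical, and what each branch does is an assumption, not a reconstruction of the original code. As the thread itself reports, the local branch may still fail when databricks-connect is installed in the same environment.

```python
import os
from typing import Mapping


def running_on_databricks(environ: Mapping[str, str] = os.environ) -> bool:
    """True when the Databricks runtime marker variable is present."""
    return "DATABRICKS_RUNTIME_VERSION" in environ


def make_spark(environ: Mapping[str, str] = os.environ):
    """Return a SparkSession suited to the current environment.

    The import is deferred so that importing this module does not pull in
    pyspark until a session is actually requested.
    """
    from pyspark.sql import SparkSession  # deferred import

    if running_on_databricks(environ):
        # On a cluster, reuse the session the runtime provides.
        return SparkSession.builder.getOrCreate()

    # Assumed local setup for unit tests: a single-threaded local master.
    return (
        SparkSession.builder.master("local[1]")
        .appName("unit-tests")
        .getOrCreate()
    )
```

In a `conftest.py`, a session-scoped pytest fixture could simply return `make_spark()`, so every test shares one session.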
gayatrikhatale
by Databricks Partner
  • 4746 Views
  • 6 replies
  • 7 kudos

Resolved! Retrieving Last Data and Metadata Refresh DateTimes of table in Databricks

Hi, I had a query regarding how to accurately retrieve the last data refresh datetime and last metadata refresh datetime for tables in Databricks. Currently, the only reliable approach I am aware of is using the DESCRIBE HISTORY command with filters ...

Latest Reply
gayatrikhatale
Databricks Partner
  • 7 kudos

Thank you @szymon_dybczak , @siva-anantha !

  • 7 kudos
5 More Replies
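
For the DESCRIBE HISTORY approach mentioned in the thread above, the filtering step could look something like this. It is a minimal sketch over plain dictionaries with the `operation` and `timestamp` columns that DESCRIBE HISTORY returns; the split of operations into "data" and "metadata" sets is an assumption made for illustration, and `last_refresh_times` is a hypothetical helper name.

```python
from datetime import datetime
from typing import Any, Iterable, Mapping, Optional, Tuple

# Assumed classification of Delta history operations; adjust for your tables.
DATA_OPERATIONS = {
    "WRITE", "MERGE", "UPDATE", "DELETE", "TRUNCATE", "STREAMING UPDATE",
}
METADATA_OPERATIONS = {
    "SET TBLPROPERTIES", "ADD COLUMNS", "CHANGE COLUMN", "RENAME COLUMN",
}


def last_refresh_times(
    history_rows: Iterable[Mapping[str, Any]],
) -> Tuple[Optional[datetime], Optional[datetime]]:
    """Return (last data refresh, last metadata refresh) from history rows.

    Rows whose operation is in neither set (for example OPTIMIZE) are ignored.
    """
    last_data: Optional[datetime] = None
    last_meta: Optional[datetime] = None
    for row in history_rows:
        operation, timestamp = row["operation"], row["timestamp"]
        if operation in DATA_OPERATIONS:
            if last_data is None or timestamp > last_data:
                last_data = timestamp
        elif operation in METADATA_OPERATIONS:
            if last_meta is None or timestamp > last_meta:
                last_meta = timestamp
    return last_data, last_meta
```

In a notebook, the rows might come from `[r.asDict() for r in spark.sql("DESCRIBE HISTORY my_table").collect()]`, where `my_table` is a placeholder name.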
Gauri_Bhandari
by New Contributor
  • 960 Views
  • 1 reply
  • 2 kudos

Inconsistency between the analyze and transpile commands with respect to SSIS support

Hi Team, I'm using Databricks Labs LakeBridge and noticed an inconsistency between the analyze and transpile commands with respect to SSIS support. Analyzer: When I run the analyze command, I’m able to select SSIS as the source technology, and the tool ...

Latest Reply
saurabh18cs
Honored Contributor III
  • 2 kudos

Hi Gauri, as far as I can tell (someone from Databricks can confirm), SSIS is currently not supported as a source dialect for the transpile command in Databricks Labs LakeBridge. The analyze command supports SSIS for assessment and reporting, but the...

  • 2 kudos
AanchalSoni
by Databricks Partner
  • 3206 Views
  • 5 replies
  • 3 kudos

Resolved! Unable to use Auto Loader for External Location in Community Edition

Using Community Edition. I'm trying to create a streaming pipeline using Auto Loader (accessing an external location), and each time my select query throws this error: "An error occurred while calling t.analyzeAndFormatResult. : java.lang.UnsupportedO...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 3 kudos

Hi @AanchalSoni, to be clear: are you using Free Edition or Community Edition? People confuse them all the time, so it's better to clarify this first.

  • 3 kudos
4 More Replies
mattstyl-ff
by Databricks Partner
  • 2258 Views
  • 8 replies
  • 1 kudos

Error with AutoLoader pipeline ingesting from external location: LOCATION_OVERLAP

Hello, I am trying to use pipelines in Databricks to ingest data from an external location to the data lake using Auto Loader, and I am facing this issue. I have noticed other posts with similar errors, but in those posts, the error was related to the d...

Latest Reply
saurabh18cs
Honored Contributor III
  • 1 kudos

Also try this, adding another folder for the file:
csv_file_path = "abfss://storage-dm-int-container@devdomaindmdbxint01.dfs.core.windows.net/raw_data/dummy.csv"

  • 1 kudos
7 More Replies
RikL
by Databricks Partner
  • 861 Views
  • 1 reply
  • 1 kudos

Resolved! PipelineSpec object does not seem to show event_log when defining a pipeline with DAB

Hi all, I am looking for help on a very specific subject. I am trying to access the event_log property (EventLogSpec) of an object from PipelineSpec that I get by running a query on the Workspace Client, which is part of the Databricks Python SDK: w.pi...

Latest Reply
mmayorga
Databricks Employee
  • 1 kudos

Hi @RikL, thank you for reaching out. It doesn't seem you are doing anything wrong. Per the documentation, the event_log spec should indeed be returned when you run pipeline.get and then access the spec. I was able to test and confirm the correct behavio...

  • 1 kudos
alexbush-mas
by New Contributor
  • 1000 Views
  • 1 reply
  • 1 kudos

Custom stream-stream join using transformWithState - expanded example

Hi all, just wondering if anyone has more information or an expanded example of the "Custom stream-stream join using transformWithState" example on the Stateful applications page: Example stateful applications | Databricks on AWS. I'm looking to implement so...

Latest Reply
Khaja_Zaffer
Esteemed Contributor
  • 1 kudos

Hello @alexbush-mas Good day!! Unioning the streams is the standard method for feeding multiple input streams into a single transformWithStateInPandas operation for custom stream-stream joins, so your intuition is correct. After grouping the input by...

  • 1 kudos
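
To make the "union, then group by key, then join in state" idea from the reply above more concrete, here is a plain-Python simulation of the pattern. It does not use Spark or the transformWithStateInPandas API; the names `union_streams` and `stateful_join` are hypothetical, and the sketch deliberately leaves out watermarks, timers, and state eviction, which a real streaming job would need.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, Any]       # (join key, payload)
Tagged = Tuple[str, str, Any]  # (side, join key, payload)


def union_streams(
    left: Iterable[Record], right: Iterable[Record]
) -> Iterator[Tagged]:
    """Tag each record with its side so both inputs share one shape."""
    for key, payload in left:
        yield ("left", key, payload)
    for key, payload in right:
        yield ("right", key, payload)


def stateful_join(events: Iterable[Tagged]) -> List[Tuple[str, Any, Any]]:
    """Join left and right records per key using buffered per-key state."""
    # Per-key state: rows seen so far from each side (never evicted here).
    state: Dict[str, Dict[str, List[Any]]] = defaultdict(
        lambda: {"left": [], "right": []}
    )
    joined = []
    for side, key, payload in events:
        other = "right" if side == "left" else "left"
        # Match the new row against everything buffered from the other side.
        for match in state[key][other]:
            pair = (payload, match) if side == "left" else (match, payload)
            joined.append((key, *pair))
        # Buffer the new row so later rows from the other side can match it.
        state[key][side].append(payload)
    return joined
```

Each output tuple is `(key, left_payload, right_payload)`. In a real job the per-key buffers would live in the state store and be cleaned up by a timer or TTL, rather than growing without bound as they do here.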