Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

kranthit
by New Contributor II
  • 338 Views
  • 2 replies
  • 0 kudos

Serverless base env setup in Databricks Asset Bundle (DAB)

I am trying to set a base environment for my task (a notebook) running on serverless. The following is the DAB YAML I am using. When I ran bundle deploy -t users, it didn't throw any error, but it isn't installing the libraries from the base env, ca...

Latest Reply
Yogesh_Verma_
Contributor
  • 0 kudos

Your YAML is valid, but the reason the libraries are not being installed is that base_environment_path is not supported for serverless compute. Serverless jobs use a fully managed environment, and you can’t override it with a custom base environmen...
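For reference, a minimal sketch of the alternative this reply points to: declaring a serverless environment on the job itself instead of base_environment_path. Job, task, and dependency names here are placeholders, and whether notebook tasks honor environment_key may depend on your CLI and platform version.

```yaml
# Hypothetical DAB fragment: serverless library installation via a job-level
# environment instead of base_environment_path. All names are placeholders.
resources:
  jobs:
    my_job:
      name: my_job
      tasks:
        - task_key: my_task
          environment_key: default
          notebook_task:
            notebook_path: ../src/my_notebook.ipynb
      environments:
        - environment_key: default
          spec:
            client: "1"
            dependencies:
              - some-library==1.0.0
```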

1 More Replies
ivni
by New Contributor III
  • 455 Views
  • 8 replies
  • 1 kudos

JDBC driver CPU consumption

Hi, I am using the JDBC driver to execute an insert statement with several thousand rows (~4 MB). It takes several seconds to complete and for some reason consumes one full CPU core. It seems like a lot of the time is spent in this method: com.databr...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @ivni, yes, that method can be CPU intensive. According to the driver's docs it removes the catalog name from the query statement. But it does this via regex patterns, which is a heavy operation from a CPU perspective, especially if you have a lot of complex q...
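The cost described here can be illustrated outside the driver. This is not the driver's actual code, just a sketch showing that a regex rewrite of a multi-megabyte SQL string has to scan every character; the catalog name and statement are made up:

```python
import re
import time

# Illustrative only: a regex that strips a (placeholder) catalog prefix must
# scan the entire statement, so CPU time grows with statement size.
catalog_pattern = re.compile(r"\bmy_catalog\.")

# Build a large INSERT statement (a few MB), similar to the post's scenario.
rows = ",".join(f"({i}, 'value_{i}')" for i in range(100_000))
sql = f"INSERT INTO my_catalog.t VALUES {rows}"

start = time.perf_counter()
stripped = catalog_pattern.sub("", sql)
elapsed = time.perf_counter() - start
print(f"scanned {len(sql):,} chars in {elapsed:.4f}s")
```

Splitting huge literals into smaller batches reduces the amount of text such rewrites have to process on each statement.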

7 More Replies
aravindan_tk
by New Contributor
  • 606 Views
  • 1 replies
  • 1 kudos

Resolved! Issue with Lakebridge transpile installation – SSL Certificate Error

Hi Team, I am trying to use Lakebridge to test a small piece of code for conversion. The base installation of Lakebridge worked fine, but when I attempt to install transpile, I encounter SSL-related errors. I even tried hardcoding the certificates...

Latest Reply
mark_ott
Databricks Employee
  • 1 kudos

The error indicates that while installing Lakebridge's transpile component on Databricks with Python 3.13, SSL certificate verification fails due to a "Missing Authority Key Identifier" in the certificate chain. This is a result of stricter requireme...
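One commonly suggested direction for this class of SSL failure, sketched here with placeholder paths (not taken from the original thread), is to point Python and pip at a CA bundle that contains the full certificate chain used on your network:

```shell
# Placeholder paths; the bundle must contain your network's full CA chain.
export SSL_CERT_FILE=/path/to/ca-bundle.pem       # Python's ssl module
export REQUESTS_CA_BUNDLE=/path/to/ca-bundle.pem  # the requests library
pip config set global.cert /path/to/ca-bundle.pem # pip downloads
```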

Satyam_Patel
by New Contributor
  • 465 Views
  • 3 replies
  • 2 kudos

Resolved! Inconsistent behavior of LakeBridge transpiler for similar scripts

Hi Everyone, I am testing the LakeBridge prototype and noticed inconsistent behavior when converting stored procedures. For simple scripts, the conversion is correct, but for medium/complex scripts, especially those with multiple LEFT JOINs and column e...

Data Engineering
Lakebridge
Latest Reply
mark_ott
Databricks Employee
  • 2 kudos

"Is there a way to standardize/optimize the conversion so it doesn’t blow up into thousands of lines?" Yes, there are actionable methods to standardize and optimize LakeBridge conversions to prevent code from ballooning into thousands of lines, but t...

2 More Replies
radhag
by New Contributor
  • 326 Views
  • 1 replies
  • 2 kudos

Resolved! Vacuuming clones in USER_ISOLATION mode and ThreadPool Executor

Hello, I run my process on a shared, interactive cluster (data security mode: USER_ISOLATION). I run operations on multiple tables, with each table handled in a separate thread. Pseudo-code:

try:
    truncate target_table
    vacuum target_table (retain 0 hours with...

Latest Reply
mark_ott
Databricks Employee
  • 2 kudos

This is most likely a bug. It certainly is unexpected and should be reported to Databricks support or your platform administrator for clarification and remediation. As a temporary workaround, running vacuum commands outside of threaded contexts or s...
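The workaround of keeping table work parallel while serializing the vacuum step can be sketched with a lock. The spark.sql calls are replaced here by a plain list append so the pattern is runnable anywhere:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Sketch of the suggested workaround: per-table work stays threaded, but the
# vacuum step is serialized through a lock so only one runs at a time.
vacuum_lock = threading.Lock()
executed = []

def process_table(table):
    # ... truncate/load work could run concurrently here ...
    with vacuum_lock:  # serialize only the vacuum step
        executed.append(f"VACUUM {table} RETAIN 0 HOURS")  # stand-in for spark.sql(...)

with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(process_table, ["t1", "t2", "t3"]))

print(executed)
```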

SharathE
by New Contributor III
  • 3824 Views
  • 5 replies
  • 0 kudos

Delta Live tables stream output to Kafka

Hello, I wanted to know if we can write the stream output to a Kafka topic in a DLT pipeline? Please let me know. Thank you.

Latest Reply
mtajmouati
Contributor
  • 0 kudos

Hi! Ensure your code is set up to use these libraries. Here is the complete example. Navigate to your cluster configuration:
  • Go to your Databricks workspace.
  • Click on "Clusters" and select your cluster.
  • Go to the "Libraries" tab.
  • Install the necessar...
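As a complement to the cluster-library steps, here is a generic Structured Streaming sketch of a Kafka sink. It is not DLT-specific, it requires a Spark environment with the Kafka connector, and the table name, broker, topic, and checkpoint path are all placeholders:

```python
# Reads a (placeholder) DLT output table and streams it to Kafka. A DLT
# pipeline defines tables/views, so the Kafka sink typically runs as a
# separate streaming job like this one.
(spark.readStream.table("catalog.schema.dlt_output")
      .selectExpr("CAST(id AS STRING) AS key", "to_json(struct(*)) AS value")
      .writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker-1:9092")
      .option("topic", "events_out")
      .option("checkpointLocation", "/tmp/checkpoints/kafka_sink")
      .start())
```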

4 More Replies
HoussemBL
by New Contributor III
  • 430 Views
  • 5 replies
  • 0 kudos

Databricks bundle repository permission

Hi everyone, how can I use Databricks Asset Bundle configuration to set permissions on the workspace folder (root_path) where my code is deployed, in order to protect it from manual changes by users? My current bundle config for production looks like t...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @HoussemBL, in Databricks there is a users group to which, by default, all workspace users belong (displayed in the UI as "All workspace users"). That group has a default permission on the top-level Shared folder that cannot be revoked. So, any new folder ...
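A sketch of the bundle-level route, under the assumption that a target-scoped permissions mapping covers the deployed resources (whether it also locks the root_path folder itself may depend on the CLI version). The group and service principal names are placeholders:

```yaml
# Hypothetical fragment for databricks.yml; names are placeholders.
targets:
  prod:
    permissions:
      - level: CAN_VIEW
        group_name: users
      - level: CAN_MANAGE
        service_principal_name: 00000000-0000-0000-0000-000000000000
```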

4 More Replies
htu
by Contributor
  • 20803 Views
  • 25 replies
  • 28 kudos

Installing Databricks Connect breaks pyspark local cluster mode

Hi, it seems that when databricks-connect is installed, pyspark is modified at the same time so that it no longer works with a local master node. This has been especially useful in testing, when running unit tests for Spark-related code without any remot...

Latest Reply
saurabh18cs
Honored Contributor II
  • 28 kudos

Hi @htu, try doing something like this: wrap the import within a fixture itself.

import os
import pytest
from pyspark.sql import SparkSession

_local_test = True

@pytest.fixture(scope='session')
def spark():
    if 'DATABRICKS_RUNTIME_VERSION' in os.environ:
        ...

24 More Replies
gayatrikhatale
by Contributor
  • 1267 Views
  • 6 replies
  • 7 kudos

Resolved! Retrieving Last Data and Metadata Refresh DateTimes of table in Databricks

Hi, I had a query regarding how to accurately retrieve the last data refresh datetime and last metadata refresh datetime for tables in Databricks. Currently, the only reliable approach I am aware of is using the DESCRIBE HISTORY command with filters ...
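The DESCRIBE HISTORY approach the post mentions can be sketched like this (the table name is a placeholder, and the operation list is illustrative, not exhaustive):

```sql
SELECT MAX(timestamp) AS last_data_refresh
FROM (DESCRIBE HISTORY my_catalog.my_schema.my_table)
WHERE operation IN ('WRITE', 'MERGE', 'UPDATE', 'DELETE', 'STREAMING UPDATE');
```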

Latest Reply
gayatrikhatale
Contributor
  • 7 kudos

Thank you @szymon_dybczak , @siva-anantha !

5 More Replies
Gauri_Bhandari
by New Contributor
  • 343 Views
  • 1 replies
  • 2 kudos

Inconsistency between the analyze and transpile commands with respect to SSIS support

Hi Team, I'm using Databricks Labs LakeBridge and noticed an inconsistency between the analyze and transpile commands with respect to SSIS support. Analyzer: when I run the analyze command, I'm able to select SSIS as the source technology, and the tool ...

Latest Reply
saurabh18cs
Honored Contributor II
  • 2 kudos

Hi Gauri, as far as I can read (someone from Databricks can also confirm): as of now, SSIS is not supported as a source dialect for the transpile command in Databricks Labs LakeBridge. The analyze command supports SSIS for assessment and reporting, but the...

AanchalSoni
by New Contributor III
  • 1301 Views
  • 5 replies
  • 3 kudos

Resolved! Unable to use Auto Loader for External Location in Community Edition

Using Community Edition. I'm trying to create a streaming pipeline using Auto Loader (accessing an external location), and each time my select query throws this error: "An error occurred while calling t.analyzeAndFormatResult. : java.lang.UnsupportedO...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 3 kudos

Hi @AanchalSoni, to make it clear: are you using Free Edition or Community Edition? People confuse them all the time, so it's better to clarify this first.

4 More Replies
mattstyl-ff
by New Contributor II
  • 850 Views
  • 8 replies
  • 1 kudos

Error with AutoLoader pipeline ingesting from external location: LOCATION_OVERLAP

Hello, I am trying to use pipelines in Databricks to ingest data from an external location into the data lake using AutoLoader, and I am facing this issue. I have noticed other posts with similar errors, but in those posts the error was related to the d...

Latest Reply
saurabh18cs
Honored Contributor II
  • 1 kudos

Also try this (adding another folder level for the file): csv_file_path = "abfss://storage-dm-int-container@devdomaindmdbxint01.dfs.core.windows.net/raw_data/dummy.csv"

7 More Replies
RikL
by New Contributor II
  • 259 Views
  • 1 replies
  • 1 kudos

Resolved! PipelineSpec object does not seem to show event_log when defining a pipeline with DAB

Hi all, I am looking for help on a very specific subject. I am trying to access the event_log property (EventLogSpec) of a PipelineSpec object that I get by running a query on the Workspace Client, which is part of the Databricks Python SDK: w.pi...

Latest Reply
mmayorga
Databricks Employee
  • 1 kudos

Hi @RikL, thank you for reaching out. It doesn't seem you are doing anything wrong. Per the documentation, the event_log spec should indeed be returned when you run pipelines.get and then read the spec. I was able to test and confirm the correct behavio...
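A minimal sketch of the lookup being discussed, using the Databricks Python SDK against a live workspace (the pipeline ID is a placeholder):

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
pipeline = w.pipelines.get(pipeline_id="0123456789abcdef")  # placeholder ID
# Per the thread, spec.event_log should be populated when the pipeline
# definition includes an event_log block.
print(pipeline.spec.event_log)
```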

alexbush-mas
by New Contributor
  • 458 Views
  • 1 replies
  • 1 kudos

Custom stream-stream join using transformWithState - expanded example

Hi all, just wondering if anyone has more information or an expanded example of the "Custom stream-stream join using transformWithState" example on the stateful applications page: Example stateful applications | Databricks on AWS. I'm looking to implement so...

Latest Reply
Khaja_Zaffer
Contributor III
  • 1 kudos

Hello @alexbush-mas, good day! Unioning the streams is the standard method for feeding multiple input streams into a single transformWithStateInPandas operation for custom stream-stream joins, so your intuition is correct. After grouping the input by...
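A hypothetical sketch of that union-then-group shape (it assumes a runtime with transformWithStateInPandas, such as Spark 4.0 or a recent DBR; MyJoinProcessor stands in for a StatefulProcessor you would implement, and the table names are placeholders):

```python
# Tag each side, union into one stream, and group by the join key so a single
# stateful processor sees both sides for a key.
left = (spark.readStream.table("left_events")
             .selectExpr("key", "'left' AS side", "to_json(struct(*)) AS payload"))
right = (spark.readStream.table("right_events")
              .selectExpr("key", "'right' AS side", "to_json(struct(*)) AS payload"))

joined = (left.unionByName(right)
              .groupBy("key")
              .transformWithStateInPandas(
                  statefulProcessor=MyJoinProcessor(),  # placeholder implementation
                  outputStructType="key STRING, joined STRING",
                  outputMode="Append",
                  timeMode="ProcessingTime",
              ))
```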

Sinkrad
by New Contributor II
  • 2358 Views
  • 3 replies
  • 4 kudos

Resolved! Permission denied on schema evolution view

Hey Databricks community, we are registering views in Databricks with schema evolution; however, these views fail when a user other than the owner is the first to query the view after the schema change. PERMISSION_DENIED: User is not an owner of Table...

Latest Reply
Alberto_Umana
Databricks Employee
  • 4 kudos

Hi @Sinkrad, there is an internal case about this: ES-1260035, and a fix is expected to be out this quarter. The behavior where only the view owner can update the view definition after a schema change is not intended. It is expected that the view sh...

2 More Replies
