Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

eric-c
by New Contributor II
  • 261 Views
  • 3 replies
  • 0 kudos

Job tasks failing with error "Failed to fetch SQL file" when file exists

I have a job with anywhere from 500-1000 SQL tasks, where each SQL task uses a SQL warehouse and runs a SQL script stored at a workspace path like /Workspace/folder/file.sql. The SQL task will fail with the error: Run failed with error m...

Latest Reply
SteveOstrowski
Databricks Employee
  • 0 kudos

Hi @antgei, thanks for sharing your experience and the workaround. You raise a valid point -- the platform should ensure task files are fully synced before attempting execution, regardless of API rate limiting on the backend. When a job has hundreds...

2 More Replies
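As a general client-side mitigation for transient fetch failures like the one in this thread (a sketch with illustrative names, not the platform-level fix), a retry-with-backoff wrapper around the flaky call can paper over brief sync delays:

```python
import time

def retry_with_backoff(fn, retries=5, base_delay=1.0, retriable=(IOError,)):
    """Call fn(), retrying with exponential backoff on retriable errors."""
    for attempt in range(retries):
        try:
            return fn()
        except retriable:
            if attempt == retries - 1:
                raise  # out of attempts: surface the error
            time.sleep(base_delay * (2 ** attempt))

# Demo: a stand-in operation that fails twice, then succeeds.
calls = {"n": 0}
def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("Failed to fetch SQL file")
    return "SELECT 1"

print(retry_with_backoff(flaky_fetch, base_delay=0.01))  # → SELECT 1
```

This doesn't address the root cause (files not yet synced), but it keeps large fan-out jobs from failing on a momentary race.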
praveenm00
by New Contributor
  • 180 Views
  • 3 replies
  • 0 kudos

How to read and use physical plans in Spark to optimize TB- and PB-scale data workflows

In one of the Amazon interviews I attended, for a Big Data Engineer role, I was asked about this particular skill of reading and understanding physical plans in Spark to optimize MASSIVE data loads. But I thought Spark automatically does all these optimiza...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Greetings @praveenm00, good question, and you're right that AQE handles a lot automatically. But understanding physical plans is still worth the investment, especially at TB/PB scale, because AQE works within constraints. It can't fix a bad query s...

2 More Replies
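Reading plans starts with spotting the expensive operators. As an illustration (pure Python, with a hypothetical plan excerpt of the kind `df.explain("formatted")` prints), you can scan a captured plan for shuffle-heavy nodes:

```python
import re

# Hypothetical physical-plan excerpt, as copied from notebook output.
PLAN = """\
== Physical Plan ==
AdaptiveSparkPlan (7)
+- SortMergeJoin Inner (6)
   :- Sort (3)
   :  +- Exchange hashpartitioning(id#0L, 200) (2)
   :     +- Scan parquet spark_catalog.db.orders (1)
   +- Sort (5)
      +- Exchange hashpartitioning(id#4L, 200) (4)
         +- Scan parquet spark_catalog.db.customers (3)
"""

# Operators that usually dominate cost at TB scale.
EXPENSIVE = ("Exchange", "SortMergeJoin", "BroadcastNestedLoopJoin", "CartesianProduct")

def summarize(plan: str) -> dict:
    """Count occurrences of shuffle/join operators in a plan string."""
    counts = {op: len(re.findall(rf"\b{op}\b", plan)) for op in EXPENSIVE}
    return {op: n for op, n in counts.items() if n}

print(summarize(PLAN))  # → {'Exchange': 2, 'SortMergeJoin': 1}
```

Each `Exchange` is a full shuffle of that subtree's data; at PB scale those (plus join strategy choices) are the usual first things to investigate.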
praveenm00
by New Contributor
  • 119 Views
  • 2 replies
  • 2 kudos

Question on cluster sizing as per SLA - No resources in DE certification

How do we optimally size clusters and set configs for a given SLA in production workloads? It would have been great to have a real-life project or implementation to understand this in detail. I wish Databricks had a good resource in their certification ...

Latest Reply
Louis_Frolio
Databricks Employee
  • 2 kudos

Greetings @praveenm00, good question, and honestly a fair callout on the cert — it covers cluster config conceptually but never puts you in front of a real sizing problem, and there is a good reason for this: it is hard and depends on many factors....

1 More Replies
maikel
by Contributor II
  • 259 Views
  • 5 replies
  • 2 kudos

Resolved! Do not deploy all notebooks to a given environment

Hello Community! What is the best way to avoid deploying some notebooks from the asset bundle to higher environments? Given I have the following resources structure: resources/ ├── jobs/ │ ├── notebook_a.yml │ ├── notebook_b.yml ← dev onl...

Latest Reply
maikel
Contributor II
  • 2 kudos

This is perfect! Thank you very much @Ashwin_DSA !

4 More Replies
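One commonly used Databricks Asset Bundles pattern for this (a sketch; bundle and job names are illustrative, and the accepted answer in the thread is truncated above): include only shared resource files at the top level and declare dev-only resources inline under the dev target, so they are never deployed to prod:

```yaml
# databricks.yml — sketch with illustrative names
bundle:
  name: my_bundle

# Shared resources, deployed to every target.
include:
  - resources/jobs/notebook_a.yml

targets:
  dev:
    mode: development
    # Dev-only resources: scoped to this target, never deployed to prod.
    resources:
      jobs:
        notebook_b_job:
          name: notebook_b (dev only)
  prod:
    mode: production
```

The key idea is that anything under `targets.dev.resources` exists only when deploying with `--target dev`.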
xwu
by New Contributor II
  • 213 Views
  • 1 replies
  • 1 kudos

Resolved! Iceberg native table Streaming in databricks

Hi! I’ve been exploring the new Managed Iceberg tables integration and noticed a potential discrepancy between the documentation and actual behavior regarding streaming/incremental workloads. According to the official limitations, managed Iceberg tab...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 1 kudos

Hi @xwu, Given that managed Iceberg and many of its features are still in Public Preview and explicitly "subject to change," you should treat this as a preview or advanced usage, not as a contractually supported workaround. In other words, it is not ...

thackman
by New Contributor III
  • 134 Views
  • 2 replies
  • 1 kudos

Intermittent failure with Python import statements after upgrading to DBR 18.0

We have a Python module (WidgetUtil.py) that sits in the same folder as our notebook. For the past few years we have been using a simple import statement to use it. Starting with DBR 18.0, the import fails intermittently (25% of the time) when running...

Latest Reply
thackman
New Contributor III
  • 1 kudos

Thanks for the suggestion, Fabricio. We tried your suggestion of using sys.path.insert and it didn't improve the reliability. We found that converting some of the modules into notebooks improved reliability a lot, but other Python modules we couldn'...

1 More Replies
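One generic workaround pattern for flaky sibling-module imports (a sketch with illustrative names; it bypasses sys.path entirely by loading the file directly via importlib):

```python
import importlib.util
import os
import tempfile

def load_module_from_path(name: str, path: str):
    """Load a Python module from an explicit file path, independent of sys.path."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module

# Demo with a throwaway module file standing in for WidgetUtil.py.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "widget_util.py")
    with open(path, "w") as f:
        f.write("def greet():\n    return 'hello'\n")
    widget_util = load_module_from_path("widget_util", path)

print(widget_util.greet())  # → hello
```

Whether this helps with the DBR 18.0 behavior is untested here; it simply removes sys.path ordering from the equation, which is one variable in intermittent import failures.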
Saf4Databricks
by Contributor
  • 160 Views
  • 2 replies
  • 0 kudos

Resolved! Why does this notebook return an error only when called by another notebook?

When I uncomment the last two lines of Called_Notebook.py and run it manually by itself, it correctly returns the output: Status: SUCCESS, Circle area: 50.26544. But when I comment out the last two lines of Called_Notebook.py and run it from the Caller...

Latest Reply
Saf4Databricks
Contributor
  • 0 kudos

Hi @pradeep_singh, your suggestion worked. Thank you for sharing your knowledge. Worth noting that not including dbutils.notebook.exit(f"{Value to return}") raised the error in the exception block of the function inside the Called_Notebook - and th...

1 More Replies
utkarshamone
by New Contributor III
  • 156 Views
  • 3 replies
  • 0 kudos

Getting driver error for my job when migrating to Unity

I am in the process of migrating our jobs from the legacy Hive metastore to Unity Catalog. I have modified my existing job to read and write from a different bucket as part of the migration. The only change I have made to my job config is to enable this sett...

Latest Reply
balajij8
Contributor
  • 0 kudos

You can validate using the "SINGLE_USER" access mode

2 More Replies
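For reference, the access mode is set per cluster. In a job's cluster spec it is the data_security_mode field of the Clusters API (a sketch; the Spark version, node type, and user shown here are illustrative):

```json
{
  "new_cluster": {
    "spark_version": "15.4.x-scala2.12",
    "node_type_id": "i3.xlarge",
    "num_workers": 2,
    "data_security_mode": "SINGLE_USER",
    "single_user_name": "user@example.com"
  }
}
```

Running the job once in SINGLE_USER mode is a common way to isolate whether a Unity Catalog migration error is caused by the shared (USER_ISOLATION) access mode's restrictions.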
Jake3
by New Contributor II
  • 91 Views
  • 1 replies
  • 1 kudos

Matching SAS PROC SURVEYMEANS quantiles in Databricks

Hi, I currently have the following code in Databricks that I am using to calculate survey estimates and quantiles. I wish to match (or get as close as possible to) SAS results using PROC SURVEYMEANS for quantiles (I am able to match proportions fine...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 1 kudos

Hi @Jake3, I didn’t expect to jump into this thread given my complete lack of SAS knowledge and the rather serious-looking statistics you’re working with. But I decided to treat your post as a chance to see what Genie Code could do with it...  and it...

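For background, a plain weighted-quantile function looks like the sketch below (pure Python, inverse-CDF rule). Note that SAS PROC SURVEYMEANS applies its own interpolation conventions, so matching its output exactly requires replicating that specific rule rather than this one:

```python
def weighted_quantile(values, weights, q):
    """Weighted quantile via the cumulative-weight (inverse-CDF) rule.

    This is a starting point, not an exact match for SAS's
    PROC SURVEYMEANS interpolation.
    """
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cum = 0.0
    for v, w in pairs:
        cum += w
        if cum / total >= q:
            return v
    return pairs[-1][0]

# With equal weights this reduces to the usual empirical quantile.
print(weighted_quantile([1, 2, 3, 4], [1, 1, 1, 1], 0.5))  # → 2
# A heavily weighted observation pulls the quantile toward itself.
print(weighted_quantile([1, 2, 3], [1, 1, 10], 0.5))       # → 3
```

When two implementations disagree on quantiles, the interpolation rule (how the quantile is defined between cumulative-weight steps) is almost always the culprit.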
tsmith-11
by New Contributor
  • 155 Views
  • 1 replies
  • 1 kudos

Resolved! Azure Databricks S3 External Location

Hi, I have recently created a new Azure Databricks account and several workspaces. I need to ingest data from an S3 bucket and am trying to follow the documentation detailed here: https://learn.microsoft.com/en-us/azure/databricks/connect/unity-c...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 1 kudos

Hi @tsmith-11, Having checked internally and from the screenshot, this doesn’t look like a configuration issue on your side but rather that the cross‑cloud S3 feature isn’t enabled on your Azure Databricks account/metastore yet. You should see an AWS...

Anandhi-Sekaran
by New Contributor II
  • 116 Views
  • 3 replies
  • 1 kudos

Refresh streaming table error

The REFRESH STREAMING TABLE SQL statement succeeds the first time, but subsequent refresh statements fail with a TABLE_OR_VIEW_NOT_FOUND error, even though the streaming table is still available in the same catalog and schema.

Latest Reply
Anandhi-Sekaran
New Contributor II
  • 1 kudos

Hi, here is the query I use: REFRESH STREAMING TABLE `cxxxxx`.`tgt_dev`.`ldp_csv`. It is successful when I execute it the first time. If I run the same query after 30 min, it throws the error TABLE_OR_VIEW_NOT_FOUND.

2 More Replies
Sega2
by New Contributor III
  • 187 Views
  • 2 replies
  • 0 kudos

Resolved! Creating a synced table from a workspace catalog to a project

We have a table in a workspace that we would like to sync to a project. We can choose the database project fine, but we cannot see the database in the first section (Destination); see the attached file.

Latest Reply
sv_databricks
Databricks Employee
  • 0 kudos

Hi @Sega2, Thanks for sharing the screenshot — this helps clarify what's happening. There are two likely reasons why you're not seeing your database in the Destination section: 1. The source table must be in Unity Catalog. Synced tables only support ...

1 More Replies
Jpeterson
by New Contributor III
  • 7278 Views
  • 7 replies
  • 4 kudos

Databricks SQL Warehouse, Tableau and spark.driver.maxResultSize error

I'm attempting to create a Tableau extract on Tableau Server with a connection to a Databricks large SQL warehouse. The extract process fails due to a spark.driver.maxResultSize error. Using a Databricks interactive cluster in the Data Science & Engineer...

Latest Reply
IsabellaNelson
New Contributor
  • 4 kudos

This is a common headache, I've definitely hit this wall myself when dealing with large datasets! My go-to is usually optimizing queries to return less data initially, maybe aggregating more in SQL. Have you considered checking your click speed on a ...

6 More Replies
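For context on the error itself: on an interactive cluster the driver-side result cap can be raised in the cluster's Spark config (the value below is illustrative; setting it to 0 removes the limit entirely but risks driver out-of-memory):

```
spark.driver.maxResultSize 8g
```

This only moves the ceiling; reducing the amount of data pulled back to the driver (e.g. aggregating further in SQL before extraction) is the more durable fix.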
Yousry_Ibrahim
by New Contributor III
  • 2922 Views
  • 9 replies
  • 4 kudos

Resolved! Directories added to the Python sys.path do not always work fine on executors for shared access mode

Let's assume we have a workspace folder containing two Python files. module1, with a simple addition function: def add_numbers(a, b): return a + b. module2, with a dummy PySpark custom data source: from pyspark.sql.datasource import DataSource, DataSource...

Latest Reply
Yousry_Ibrahim
New Contributor III
  • 4 kudos

Hi all, thanks for the feedback and proposed ideas. @szymon_dybczak Your idea of relative imports works when the module is hosted in a child directory of the currently running notebook. It does not work if we need to go up one or two directories and navi...

8 More Replies
IM_01
by Contributor
  • 732 Views
  • 18 replies
  • 3 kudos

Resolved! Lakeflow SDP failed with DELTA_STREAMING_INCOMPATIBLE_SCHEMA_CHANGE_USE_LOG

Hi, a column was deleted on the source table. When I ran LSDP it failed with the error DELTA_STREAMING_INCOMPATIBLE_SCHEMA_CHANGE_USE_LOG: Streaming read is not supported on tables with read-incompatible schema changes (e.g. rename, drop, or datatype ch...

Latest Reply
IM_01
Contributor
  • 3 kudos

Thanks @SteveOstrowski, now it's working.

17 More Replies