Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

jonhieb
by New Contributor III
  • 647 Views
  • 4 replies
  • 0 kudos

Resolved! [Databricks Asset Bundles] Triggering Delta Live Tables

I would like to know how to schedule a DLT pipeline using DABs. I'm trying to trigger a Delta Live Tables pipeline using Databricks Asset Bundles. Below is my YAML code:

resources:
  pipelines:
    data_quality_pipelines:
      name: data_quality_pipeline...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

As of now, Databricks Asset Bundles do not support direct scheduling of DLT pipelines using cron expressions within the bundle configuration. Instead, you can achieve scheduling by creating a Databricks job that triggers the DLT pipeline and then sch...
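A minimal sketch of the job-wrapper approach Walter_C describes, as it might look in the bundle (resource names and the cron expression are hypothetical, not from this thread):

resources:
  pipelines:
    data_quality_pipeline:
      name: data_quality_pipeline

  jobs:
    data_quality_job:
      name: data_quality_job
      # The job, not the pipeline, carries the schedule.
      schedule:
        quartz_cron_expression: "0 0 6 * * ?"  # daily at 06:00
        timezone_id: "UTC"
      tasks:
        - task_key: run_dlt
          pipeline_task:
            # Reference the pipeline defined in the same bundle.
            pipeline_id: ${resources.pipelines.data_quality_pipeline.id}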

3 More Replies
HoussemBL
by New Contributor III
  • 234 Views
  • 4 replies
  • 1 kudos

DLT Pipeline & Automatic Liquid Clustering Syntax

Hi everyone, I noticed Databricks recently released the automatic liquid clustering feature, which looks very promising. I'm currently implementing a DLT pipeline and would like to leverage this new functionality. However, I'm having trouble figuring o...

Latest Reply
RiyazAli
Valued Contributor II
  • 1 kudos

Hey @HoussemBL, you're correct that DLT does not support automatic liquid clustering. You can assign any columns in cluster_by, but if you set it to auto, it will throw an error complaining that auto is not present in the list of columns. Maybe altering the table to ...
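For reference, a minimal DLT sketch with explicit clustering columns (table and column names are hypothetical); per the reply above, cluster_by accepts a column list but not auto:

import dlt

# Explicit liquid clustering columns are accepted in a DLT table definition;
# cluster_by=["auto"] is what triggers the error described above.
@dlt.table(cluster_by=["customer_id", "order_date"])
def orders_clean():
    return spark.readStream.table("orders_raw")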

3 More Replies
manish_tanwar
by New Contributor II
  • 336 Views
  • 5 replies
  • 3 kudos

Databricks Streamlit app for data ingestion into a table

I am using this code in a notebook to save a data row to a table, and it is working perfectly. Now I am using the same function to save data from a chatbot in a Streamlit chatbot application on Databricks, and I am getting an error: ERROR ##############...

Latest Reply
pradeepvatsvk
New Contributor III
  • 3 kudos

Hi @manish_tanwar, how can we work with Streamlit apps in Databricks? I have a use case where I want to read data from different CSV files and ingest it into Delta tables.
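Not from the thread itself, but a hedged sketch of that CSV-to-Delta use case from a Streamlit app, going through a SQL warehouse with databricks-sql-connector (hostname, HTTP path, table, and column names are all hypothetical):

import pandas as pd
import streamlit as st
from databricks import sql

uploaded = st.file_uploader("Upload a CSV", type="csv")
if uploaded is not None:
    # Assumes the CSV has name/value columns matching the target table.
    df = pd.read_csv(uploaded)
    # Append each row to a Delta table via a SQL warehouse.
    with sql.connect(
        server_hostname="adb-1234567890123456.7.azuredatabricks.net",  # hypothetical
        http_path="/sql/1.0/warehouses/abc123def456",                  # hypothetical
        access_token="<your-token>",
    ) as conn:
        with conn.cursor() as cur:
            for row in df.to_dict(orient="records"):
                cur.execute(
                    "INSERT INTO main.demo.uploads (name, value) VALUES (:name, :value)",
                    row,
                )
    st.success(f"Inserted {len(df)} rows")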

4 More Replies
harman
by New Contributor II
  • 188 Views
  • 3 replies
  • 0 kudos

Serverless Compute

Hi Team, we are using Azure Databricks Serverless Compute to execute workflows and notebooks. My question is: does serverless compute support Maven library installations? I appreciate any insights or suggestions you might have. Thanks in advance for yo...

Latest Reply
BigRoux
Databricks Employee
  • 0 kudos

So, it appears that there is conflicting documentation on this topic. I checked our internal documentation, and what I found is that you CANNOT install JDBC or ODBC drivers on Serverless. See the limitations here: https://docs.databricks.com/aws...
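For what it's worth, serverless tasks take Python dependencies through an environment spec rather than cluster libraries; there is no Maven equivalent there. A bundle-level sketch (package and paths hypothetical):

resources:
  jobs:
    my_serverless_job:
      environments:
        - environment_key: default
          spec:
            client: "1"
            dependencies:
              - "httpx==0.27.0"  # PyPI only; Maven coordinates are not accepted
      tasks:
        - task_key: main
          environment_key: default
          spark_python_task:
            python_file: ./src/main.py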

2 More Replies
annagriv
by New Contributor II
  • 3395 Views
  • 6 replies
  • 5 kudos

Resolved! How to get git commit ID of the repository the script runs on?

I have a script in a repository on Databricks. The script should log the current git commit ID of the repository. How can that be implemented? I tried various commands, for example: result = subprocess.run(['git', 'rev-parse', 'HEAD'], stdout=subproce...

Latest Reply
bestekov
New Contributor II
  • 5 kudos

Here is a version of @vr's solution that can be run from any folder within the repo. It uses a regex to extract the repo root from a path of the form /Repos/<username>/<some-repo>:

import os
import re
from databricks.sdk import WorkspaceClient

w = Worksp...
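Fleshed out, the approach above might look like this sketch (assumes databricks-sdk is available and the notebook runs inside a repo under /Repos/<username>/<some-repo>):

import re
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Current notebook path, e.g. /Repos/alice/my-repo/jobs/etl
notebook_path = (
    dbutils.notebook.entry_point.getDbutils().notebook()
    .getContext().notebookPath().get()
)

# Extract the repo root: /Repos/<username>/<some-repo>
repo_root = re.match(r"(/Repos/[^/]+/[^/]+)", notebook_path).group(1)

# Look the repo up by path and log its current HEAD commit.
repo = next(r for r in w.repos.list(path_prefix=repo_root) if r.path == repo_root)
print(repo.head_commit_id)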

5 More Replies
Vasu_Kumar_T
by New Contributor II
  • 124 Views
  • 3 replies
  • 0 kudos

Default Code generated by Bladebridge converter

Hello all,
1. What is the default code generated by the Bladebridge converter? For example, when we migrate Teradata or Oracle to Databricks using Bladebridge, what is the default code base?
2. If the generated code is PySpark, do I have any control over the generate...

Latest Reply
RiyazAli
Valued Contributor II
  • 0 kudos

Hello @Vasu_Kumar_T - we've used Bladebridge to convert from Redshift to Databricks. Bladebridge can definitely convert to Spark SQL; not sure about Scala Spark, though.

2 More Replies
AsgerLarsen
by New Contributor III
  • 246 Views
  • 7 replies
  • 0 kudos

Using yml variables as table owner through SQL

I'm trying to change the ownership of a Unity Catalog table created through a SQL script, and I want to do this through code. I'm using a standard Databricks bundle setup with three workspaces: dev, test, and prod. I have created a variable in ...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

I guess that is a safe bet. Good luck!
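A hedged sketch of the pattern this thread is after: declare the owner per target in databricks.yml and hand it to the task that runs the ALTER statement (all names hypothetical):

# databricks.yml
variables:
  table_owner:
    description: Principal that should own the table

targets:
  dev:
    variables:
      table_owner: data-eng-dev
  prod:
    variables:
      table_owner: data-eng-prod

# in the job resource
tasks:
  - task_key: set_owner
    notebook_task:
      notebook_path: ./notebooks/set_owner
      base_parameters:
        table_owner: ${var.table_owner}

The notebook would then read the parameter with dbutils.widgets.get("table_owner") and run spark.sql(f"ALTER TABLE main.sales.orders OWNER TO `{owner}`").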

6 More Replies
Aatma
by New Contributor
  • 1292 Views
  • 3 replies
  • 1 kudos

Resolved! DABs require library dependencies from a private GitHub repository.

I'm developing a Python wheel file using DABs which requires library dependencies from a private GitHub repository. Please help me understand how to set up the git user and token in the resource.yml file and how to authenticate the GitHub package. pip install...

Latest Reply
sandy311
New Contributor III
  • 1 kudos

Could you please give a detailed example? How do I define env variables? BUNDLE_VAR?
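For what it's worth, the BUNDLE_VAR mechanism works like this: any variable declared in databricks.yml can be supplied at deploy time through an environment variable named BUNDLE_VAR_<variable_name> (token value hypothetical):

# databricks.yml
variables:
  github_token:
    description: PAT for the private GitHub repository

# shell, before deploying
export BUNDLE_VAR_github_token=ghp_exampletoken
databricks bundle deploy -t dev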

2 More Replies
minhhung0507
by Contributor III
  • 78 Views
  • 1 reply
  • 0 kudos

Handling Hanging Pipelines in Real-Time Environments: Leveraging Databricks’ Idle Event Monitoring

Hi everyone, I’m running multiple real-time pipelines on Databricks using a single job that submits them via a thread pool. While most pipelines are running smoothly, I’ve noticed that a few of them occasionally get “stuck” or hang for several hours w...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

May I ask why you use thread pools? With jobs you can define multiple tasks which do the same thing. I'm asking because thread pools and Spark resource management can interfere with each other.
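To illustrate the suggestion: tasks with no depends_on between them run concurrently, so a fan-out that might otherwise use a thread pool can be expressed directly in the job definition (sketch, names hypothetical):

resources:
  jobs:
    realtime_pipelines:
      name: realtime_pipelines
      tasks:
        # No depends_on between these tasks, so they run in parallel,
        # each with its own task-level monitoring, timeouts, and retries.
        - task_key: pipeline_a
          notebook_task:
            notebook_path: ./notebooks/pipeline_a
        - task_key: pipeline_b
          notebook_task:
            notebook_path: ./notebooks/pipeline_b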

jeremy98
by Contributor III
  • 63 Views
  • 1 reply
  • 0 kudos

How to fall back the entire job in case of cluster failure?

Hi community, my team and I are using a job that is triggered based on dynamic scheduling, with the schedule defined within some of the job's tasks. However, this job is attached to a cluster that is always running and never terminated. I understand th...

Latest Reply
RiyazAli
Valued Contributor II
  • 0 kudos

Hi @jeremy98, fundamentally, changing some design patterns would help save cluster costs and avoid job failures due to cluster crashes. I understand that you're using an always-running cluster to run your workflow; I'm not sure about the use case, but I'd sugg...

jar
by New Contributor III
  • 284 Views
  • 5 replies
  • 1 kudos

Continuous workflow job creating new job clusters?

Hey. I am testing a continuous workflow job which executes the same notebook, so it's rather simple and it works well. It seems like it re-creates the job cluster for every iteration, instead of just re-using the one created at the first execution. Is that...

Latest Reply
jar
New Contributor III
  • 1 kudos

Thank you all for your answers! I did use dbutils.notebook.run() inside a while loop at first, but ultimately ran into OOM errors, even when I added a cache clear after each iteration. I'm curious @RefactorDuncan, if you don't mind...
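For context, a continuous job is declared roughly like this sketch (names hypothetical); the scheduler starts a fresh run whenever the previous one exits, which would explain a new job cluster appearing on each iteration:

resources:
  jobs:
    my_continuous_job:
      name: my_continuous_job
      continuous:
        pause_status: UNPAUSED
      tasks:
        - task_key: main
          notebook_task:
            notebook_path: ./notebooks/loop_body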

4 More Replies
Jaclaglez13
by New Contributor II
  • 323 Views
  • 2 replies
  • 2 kudos

[UNRESOLVED_ROUTINE] Cannot resolve function `date_format`

Hi all, we are getting the following error log in a Workflow:

AnalysisException: [UNRESOLVED_ROUTINE] Cannot resolve function `date_format` on search path [`system`.`builtin`, `system`.`session`]. SQLSTATE: 42883

The Workflow consists of different noteb...

Latest Reply
Brahmareddy
Honored Contributor III
  • 2 kudos

Hi Jaclaglez13, how are you doing today? As per my understanding, this issue usually happens when Unity Catalog affects function resolution across different tasks in a Workflow. Since your first task runs fine but the second one fails, it’s lik...
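One hedged way to narrow this down is to log the session's catalog context at the top of the failing task and call the builtin directly (diagnostic sketch, not a fix):

# Run at the start of the failing task.
print(spark.catalog.currentCatalog(), spark.catalog.currentDatabase())
spark.sql("SELECT date_format(current_timestamp(), 'yyyy-MM-dd HH:mm')").show()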

1 More Replies
PunithRaj
by New Contributor
  • 5686 Views
  • 2 replies
  • 2 kudos

How to read a PDF file from Azure Data Lake blob storage to Databricks

I have a scenario where I need to read a PDF file from Azure Data Lake blob storage into Databricks, where the connection is made through AD access. Generating a SAS token has been restricted in our environment due to security issues. The below script ca...

Latest Reply
Mykola_Melnyk
New Contributor III
  • 2 kudos

@PunithRaj You can try the PDF DataSource for Apache Spark to read PDF files directly into a DataFrame. You will get the extracted text and the rendered page as an image in the output. More details here: https://stabrise.com/spark-pdf/

df = spark.read.forma...
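Loosely following the linked project's README (the option and column names below should be verified against https://stabrise.com/spark-pdf/, and the ADLS path is hypothetical), the read might look like:

# Assumes the spark-pdf package is attached to the cluster.
df = (
    spark.read.format("pdf")
    .option("imageType", "BINARY_FILE")
    .option("resolution", "200")
    .load("abfss://docs@mystorageacct.dfs.core.windows.net/reports/*.pdf")
)
df.select("path", "text").show()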

1 More Replies
