Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

eniwoke
by Contributor II
  • 37 Views
  • 1 reply
  • 2 kudos

Databricks Bundle Inspector: A VS Code extension for local bundle review

Hi all, I have been working with Databricks Asset Bundles (now Declarative Automation Bundles) and kept running into the same friction point: there is no easy way to visually inspect a bundle locally before you deploy. Of course, you can read the YAML,...

Latest Reply
amirabedhiafi
New Contributor II
  • 2 kudos

Hi @eniwoke! Glad to see that! I will try it for sure. If you need any contribution from my side, please let me know!

mnissen1337
by Visitor
  • 52 Views
  • 1 reply
  • 1 kudos

Best practice for creating SQL views on top of continuously running Spark Structured Streaming jobs

I am working with a continuously running Spark Structured Streaming job in Databricks, deployed as a standalone job using continuous trigger mode via Databricks Asset Bundles (DABs). On top of the streaming output table (created via writeStream), I wa...

Latest Reply
amirabedhiafi
New Contributor II
  • 1 kudos

Hello @mnissen1337! Have you thought about decoupling the view DDL from the continuously running streaming job? Do not make the view creation a downstream task of the stream, because continuous jobs are not meant to reach success, and in DBKS continuous t...

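The decoupling suggested in the reply can be sketched in SQL. The catalog, schema, table, and view names below are hypothetical; the key point is that the view is created once, from a separate job or notebook, not as a task downstream of the continuous stream:

```sql
-- Run once, outside the continuous streaming job. A view only stores
-- query text, so it does not need the stream to reach a "success" state;
-- readers always see the latest data in the underlying streaming table.
CREATE VIEW IF NOT EXISTS main.analytics.events_recent AS
SELECT *
FROM main.analytics.events_stream_output  -- table written by writeStream
WHERE event_time >= current_timestamp() - INTERVAL 7 DAYS;
```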
pepco
by New Contributor III
  • 127 Views
  • 8 replies
  • 6 kudos

DAB git - sometimes doesn't see modules

We are using DABs to deploy our jobs. DABs have source set to a git branch or git tag depending on the environment. The repository is structured in mono-repo fashion. We don't use wheels for our modules. Sometimes when the jobs run they "randomly" fail th...

Latest Reply
amirabedhiafi
New Contributor II
  • 6 kudos

Hello @pepco! I will share my personal experience of a very similar behaviour. If you check the DBKS docs you will find that git_source and task source: GIT are not recommended for DAB because local relative paths may not point ...

7 More Replies
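One way to apply the advice in the reply is to drop `git_source` and let the bundle deploy the code itself. A minimal sketch, with made-up job and file names:

```yaml
# With no git_source block and no `source: GIT` on the task, paths resolve
# against the files the bundle uploads at deploy time, so every run sees
# exactly the modules that were deployed with it.
resources:
  jobs:
    nightly_etl:                         # hypothetical job name
      name: nightly_etl
      tasks:
        - task_key: transform
          spark_python_task:
            python_file: ../src/transform.py  # relative to this yml file
```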
ChristianRRL
by Honored Contributor
  • 35 Views
  • 1 reply
  • 1 kudos

run_if condition to handle prior task excluded?

Hi there, I kind of know the answer here but want to check in case I'm missing anything (or else maybe vent slightly and hope for new functionality in the future). Basically, I'm looking for a way to run a task if either (A) the prior step ran success...

Latest Reply
amirabedhiafi
New Contributor II
  • 1 kudos

Hello @ChristianRRL! You are totally right. With the current DBKS dependency semantics, a downstream task cannot run when all of its direct upstream dependencies are excluded, regardless of the run_if option. If you check the doc, it explicitly says (...

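For reference, `run_if` is set per task alongside `depends_on`. A sketch with hypothetical task names; note that, as the reply says, no `run_if` value fires when every upstream task is excluded:

```yaml
tasks:
  - task_key: load
  - task_key: cleanup
    depends_on:
      - task_key: load
    # ALL_DONE runs after the upstream succeeds, fails, or is skipped,
    # but an excluded upstream still prevents this task from running.
    run_if: ALL_DONE
```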
yanchr
by Visitor
  • 62 Views
  • 2 replies
  • 2 kudos

Migrating external tables to managed tables from HMS to UC

I think the easiest way to do that is to use DEEP CLONE. However, since the SET MANAGED approach was introduced in DBR 17, wouldn't it be better to first migrate the table as external and then convert it to managed using SET MANAGED? The Databricks a...

Data Engineering
migration
UC
Unity Catalog
Latest Reply
amirabedhiafi
New Contributor II
  • 2 kudos

Hello @yanchr! Yes, if the table can first be registered as a UC external table, then ALTER TABLE ... SET MANAGED is now better practice than a direct DEEP CLONE, especially on DBR 17+ / serverless. It is recommended to use SET MANAGED for con...

1 More Replies
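The two-step path the reply favours can be sketched as follows. The table names are hypothetical, and `SYNC` assumes the HMS table is external with a location Unity Catalog can reach:

```sql
-- 1. Upgrade the HMS external table to a Unity Catalog external table
SYNC TABLE main.sales.orders FROM hive_metastore.sales.orders;

-- 2. Convert it in place to a UC managed table (DBR 17+)
ALTER TABLE main.sales.orders SET MANAGED;
```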
Karsten
by Visitor
  • 96 Views
  • 5 replies
  • 3 kudos

Tableau Prep to Databricks Error

Hi all, when writing from Tableau Prep to Databricks on Azure, we receive the following error message: [Databricks][Hardy] (52) Error communicating with the service: 403. Interestingly, reading from Databricks works without any issues. The Databricks us...

Latest Reply
amirabedhiafi
New Contributor II
  • 3 kudos

Hello @Karsten! I don't think it is a read permission issue, because the Tableau Prep write path is different from its read path. Tableau Prep writes database output in stages: it generates rows, writes them to a temporary table/staging area, then moves t...

4 More Replies
Pradip007
by Visitor
  • 60 Views
  • 2 replies
  • 1 kudos

Why does the same Databricks SQL query take different time to run?

Hi all, I’m using Databricks Free Edition with a serverless SQL warehouse. I’m the only user in this workspace.
Warehouse config: Type: Serverless SQL; Size: Large; Max clusters: 2.
Query:
SELECT
  w.workspace_name,
  ROUND(SUM(u.usage_quantity * COALESCE(lp.p...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 1 kudos

Hi @Pradip007, This is expected in the free edition. In the free edition, your queries run on a shared regional serverless pool with a small effective warehouse capacity and best-effort access. Even if you’re the only user in your workspace, you’re s...

1 More Replies
yit337
by Contributor
  • 48 Views
  • 2 replies
  • 2 kudos

How to limit max concurrent tasks runs in a job?

I read somewhere that there's a max_concurrent_task_runs property, but can't find it anywhere in the docs. So, how to limit the maximum concurrent tasks run in a job?

Latest Reply
Ashwin_DSA
Databricks Employee
  • 2 kudos

Hi @yit337, There isn’t a max_concurrent_task_runs setting in Databricks Jobs. The only setting you get is max_concurrent_runs, which limits how many runs of the same job can be active at once, plus a workspace-wide limit of 2000 concurrent task runs...

1 More Replies
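The job-level knob the reply names looks like this in a job spec (the job name is made up). Concurrency of tasks within a single run is shaped only by the dependency graph, with the exception of `for_each_task`, which has its own `concurrency` field:

```yaml
resources:
  jobs:
    hourly_ingest:              # hypothetical job name
      name: hourly_ingest
      max_concurrent_runs: 1    # at most one active run of this job
```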
jmeer
by New Contributor II
  • 2455 Views
  • 6 replies
  • 2 kudos

Cannot click Compute tab

Hi, I want to change the cluster I am using. However, when I click on the "Compute" tab on the platform, I get automatically redirected to the "SQL Warehouses" page. I am not able to click and enter the "Compute" page. How can I solve this? Thank you

Latest Reply
vakulgoyal
Visitor
  • 2 kudos

THIS IS NOT RELATED TO A FREE ACCOUNT. I ran into the same issue: there was no option to create general compute, only SQL Warehouse or serverless. Solution: when creating a Databricks resource in the Azure portal, it defaults to a serverless workspace, wh...

5 More Replies
ChristianRRL
by Honored Contributor
  • 122 Views
  • 2 replies
  • 2 kudos

Declarative Automation Bundle - Reusable job_cluster configuration

Hi there, running into some trouble abstracting job_clusters configurations to improve reusability. At the moment, I have many job yaml files that require the following configuration: What would be the best approach(es) to remove this configuration fr...

Latest Reply
amirabedhiafi
New Contributor II
  • 2 kudos

Hello @ChristianRRL! My suspicion is that your issue is happening in cluster_definitions.yml, because it is not only defining a reusable cluster profile, it is also redefining the same jobs that already exist in the individual fleet_*.yml files. Why? Because ...

1 More Replies
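One approach is a bundle-level complex variable holding the cluster spec, referenced from each job file instead of repeating the block. A sketch with hypothetical names (complex variables need a reasonably recent Databricks CLI):

```yaml
# databricks.yml: define the cluster spec once
variables:
  shared_job_cluster:
    description: Reusable job cluster spec
    type: complex
    default:
      spark_version: 15.4.x-scala2.12
      node_type_id: Standard_DS3_v2
      num_workers: 2

# resources/fleet_job_a.yml: reference it from every job
resources:
  jobs:
    fleet_job_a:
      name: fleet_job_a
      job_clusters:
        - job_cluster_key: main
          new_cluster: ${var.shared_job_cluster}
```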
tt_921
by New Contributor II
  • 109 Views
  • 2 replies
  • 2 kudos

Resolved! Lakeflow Declarative Pipeline queue

In the January 2026 release notes, it was announced that: "Pipelines now support queued execution mode, where multiple update requests are automatically queued and executed sequentially instead of failing with conflicts. This simplifies operations fo...

Data Engineering
DAB
pipeline
Latest Reply
tt_921
New Contributor II
  • 2 kudos

Thank you very much for the detailed response! We unfortunately can't proceed with option 1, as we do require multiple places that can trigger the pipeline (an API call to the parent job, and a direct API call to the pipeline itself). This is due to ...

1 More Replies
Areqio
by New Contributor
  • 359 Views
  • 3 replies
  • 1 kudos

How to Stream Azure event hub to databricks delta table

I am trying to stream my IoT data from Azure Event Hub to Databricks. I'm running Databricks Runtime 17.3 LTS with Scala 2.13.

Latest Reply
rohan22sri
New Contributor III
  • 1 kudos

Hi @Areqio,

def getKafkaOptions(
    env: String,
    ehNameSpace: String,
    ehName: String,
    scopeName: String,
    kafkaOffset: String,
    ehConnKey: String,
    maxOffsetsPerTrigger: String = "50000"
): Map[String, String] = {
  val connStr = dbutils.sec...

2 More Replies
staskh
by Contributor
  • 184 Views
  • 3 replies
  • 4 kudos

Resolved! Delta update/insert from multiple source tables

[Sorry for a novice question.] I have multiple tables periodically updated from external sources (including insert, update, or delete). I need to update a target table, which is an outer join from multiple source tables, without rewriting it each time....

Latest Reply
staskh
Contributor
  • 4 kudos

Thank you for such an informative and helpful response!

2 More Replies
aonurdemir
by Contributor
  • 174 Views
  • 2 replies
  • 1 kudos

Liquid Clustering file pruning breaks when filtering on a high NULL numeric column in dataSkipping

Environment:
Cloud: AWS
Compute: Serverless
Table: a_big_table
Table type: Streaming Table (SDP pipeline)
Table size: 641 GB, 6,210 files
Liquid Clustering columns: [event_time, integer_userId]
delta.dataSkippingStatsColumns: event_time, integer_userId, integ...

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Hello @aonurdemir , I looked into your query and have compiled some helpful tips: I don't have direct access to your workspace internals, so I can't prove this definitively. But what you're seeing is consistent with how Delta's stats-based data skipp...

1 More Replies
AlexSantiago
by New Contributor II
  • 17608 Views
  • 25 replies
  • 4 kudos

spotify API get token - raw_input was called, but this frontend does not support input requests.

Hello everyone, I'm trying to use Spotify's API to analyse my music data, but I'm receiving an error during authentication, specifically when I try to get the token (my code below). Is it a Databricks bug? pip install spotipy; from spotipy.oauth2 import SpotifyO...

Latest Reply
ElevateNew
New Contributor
  • 4 kudos


24 More Replies