Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

yanchr
by Visitor
  • 34 Views
  • 2 replies
  • 1 kudos

Migrating external tables to managed tables from HMS to UC

I think the easiest way to do that is to use DEEP CLONE. However, since the SET MANAGED approach was introduced in DBR 17, wouldn't it be better to first migrate the table as external and then convert it to managed using SET MANAGED? The Databricks a...

Data Engineering
migration
UC
Unity Catalog
  • 34 Views
  • 2 replies
  • 1 kudos
Latest Reply
amirabedhiafi
New Contributor II
  • 1 kudos

Hello @yanchr! Yes, if the table can first be registered as a UC external table, then ALTER TABLE ... SET MANAGED is now a better practice than a direct DEEP CLONE, especially on DBR 17+ / serverless. It is recommended to use SET MANAGED for con...

  • 1 kudos
1 More Replies
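The two-step path discussed in this thread (register as external, then convert in place) can be sketched as SQL generated from a small helper. This is a minimal sketch, assuming DBR 17+; the catalog, schema, and table names are hypothetical placeholders, not values from the thread.

```python
# Sketch of the external-then-SET MANAGED migration path from HMS to UC.
# All object names below are hypothetical.

def hms_to_uc_managed_sql(catalog: str, schema: str, table: str) -> list[str]:
    """Return the SQL statements for the external-first migration path."""
    fqn = f"{catalog}.{schema}.{table}"
    return [
        # Step 1: register the HMS table as a UC external table.
        f"SYNC TABLE {fqn} FROM hive_metastore.{schema}.{table}",
        # Step 2 (DBR 17+): convert the UC external table to managed in place,
        # without copying data the way DEEP CLONE would.
        f"ALTER TABLE {fqn} SET MANAGED",
    ]

for stmt in hms_to_uc_managed_sql("main", "sales", "orders"):
    print(stmt)
```

The point of this ordering is that SET MANAGED keeps the table identity and avoids the full data copy that DEEP CLONE performs.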
Karsten
by Visitor
  • 52 Views
  • 5 replies
  • 3 kudos

Tableau Prep to Databricks Error

Hi all, When writing from Tableau Prep to Databricks on Azure, we receive the following error message: [Databricks][Hardy] (52) Error communicating with the service: 403. Interestingly, reading from Databricks works without any issues. The Databricks us...

  • 52 Views
  • 5 replies
  • 3 kudos
Latest Reply
amirabedhiafi
New Contributor II
  • 3 kudos

Hello @Karsten! I don't think it is a read permission issue, because Tableau Prep's write path is different from its read path. Tableau Prep writes database output in stages: it generates rows, writes them to a temporary table/staging area, then moves t...

  • 3 kudos
4 More Replies
pepco
by New Contributor III
  • 98 Views
  • 7 replies
  • 5 kudos

DAB git - sometimes doesn't see modules

We are using DABs to deploy our jobs. The DABs' source is set to a git branch or a git tag, depending on the environment. The repository is structured as a monorepo. We don't use wheels for our modules. Sometimes when the jobs run, they "randomly" fail th...

  • 98 Views
  • 7 replies
  • 5 kudos
Latest Reply
amirabedhiafi
New Contributor II
  • 5 kudos

Hello @pepco! I will share my personal experience with a very similar behaviour. If you check the Databricks docs, you will find that git_source and task source: GIT are not recommended for DABs, because local relative paths may not point ...

  • 5 kudos
6 More Replies
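The alternative the reply above points toward is letting the bundle deploy the repository contents itself, so tasks resolve paths against the synced workspace files rather than a separate git checkout. This is a minimal sketch, not a drop-in config; the bundle, job, and notebook names are hypothetical.

```yaml
# databricks.yml (sketch; all names hypothetical).
# Instead of git_source + task source: GIT, deploy the monorepo contents
# with the bundle so relative module paths resolve against the synced files.
bundle:
  name: my_monorepo_bundle

resources:
  jobs:
    etl_job:
      name: etl_job
      tasks:
        - task_key: main
          notebook_task:
            notebook_path: ../src/jobs/etl_main
            source: WORKSPACE   # use the bundle-deployed copy, not a git checkout
```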
Pradip007
by Visitor
  • 42 Views
  • 2 replies
  • 1 kudos

Why does the same Databricks SQL query take different time to run?

Hi all, I’m using Databricks Free Edition with a serverless SQL warehouse. I’m the only user in this workspace.
Warehouse config:
  • Type: Serverless SQL
  • Size: Large
  • Max clusters: 2
Query: SELECT w.workspace_name, ROUND(SUM(u.usage_quantity * COALESCE(lp.p...

  • 42 Views
  • 2 replies
  • 1 kudos
Latest Reply
Ashwin_DSA
Databricks Employee
  • 1 kudos

Hi @Pradip007, This is expected in the free edition. In the free edition, your queries run on a shared regional serverless pool with a small effective warehouse capacity and best-effort access. Even if you’re the only user in your workspace, you’re s...

  • 1 kudos
1 More Replies
yit337
by Contributor
  • 38 Views
  • 2 replies
  • 2 kudos

How to limit max concurrent tasks runs in a job?

I read somewhere that there's a max_concurrent_task_runs property, but I can't find it anywhere in the docs. So, how do I limit the maximum number of concurrent task runs in a job?

  • 38 Views
  • 2 replies
  • 2 kudos
Latest Reply
Ashwin_DSA
Databricks Employee
  • 2 kudos

Hi @yit337, There isn’t a max_concurrent_task_runs setting in Databricks Jobs. The only setting you get is max_concurrent_runs, which limits how many runs of the same job can be active at once, plus a workspace-wide limit of 2000 concurrent task runs...

  • 2 kudos
1 More Replies
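The job-level setting the reply above names, max_concurrent_runs, caps concurrent runs of the whole job; there is no per-task equivalent. A minimal sketch of where it sits in a job definition (the job name is hypothetical):

```yaml
# Job settings sketch (name hypothetical). max_concurrent_runs limits how
# many runs of this job can be active at once; individual tasks within a
# run are scheduled by their depends_on graph, not by a concurrency knob.
resources:
  jobs:
    nightly_load:
      name: nightly_load
      max_concurrent_runs: 1   # prevent overlapping runs of the same job
```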
jmeer
by New Contributor II
  • 2450 Views
  • 6 replies
  • 2 kudos

Cannot click Compute tab

Hi, I want to change the cluster I am using. However, when I click on the "Compute" tab on the platform, I get automatically redirected to the "SQL Warehouses" page. I am not able to click and enter the "Compute" page. How can I solve this? Thank you

  • 2450 Views
  • 6 replies
  • 2 kudos
Latest Reply
vakulgoyal
Visitor
  • 2 kudos

THIS IS NOT RELATED TO A FREE ACCOUNT. I ran into the same issue: there was no option to create general compute, only SQL Warehouse or serverless. Solution: When creating a Databricks resource in the Azure portal, it defaults to a serverless workspace, wh...

  • 2 kudos
5 More Replies
ChristianRRL
by Honored Contributor
  • 116 Views
  • 2 replies
  • 2 kudos

Declarative Automation Bundle - Reusable job_cluster configuration

Hi there, I'm running into some trouble abstracting job_clusters configurations to improve reusability. At the moment, I have many job YAML files that require the following configuration. What would be the best approach(es) to remove this configuration fr...

  • 116 Views
  • 2 replies
  • 2 kudos
Latest Reply
amirabedhiafi
New Contributor II
  • 2 kudos

Hello @ChristianRRL! My suspicion is that your issue is happening in cluster_definitions.yml, because it is not only defining a reusable cluster profile, it is also redefining the same jobs that already exist in the individual fleet_*.yml files. Why? Because ...

  • 2 kudos
1 More Replies
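One common way to share a cluster spec across job YAML files without redefining jobs is a DAB complex variable referenced from each job. This is a sketch under that assumption; the variable, job, node type, and Spark version values are hypothetical.

```yaml
# Sketch: define the cluster spec once as a complex variable, then
# reference it from each fleet_*.yml job (all names hypothetical).
variables:
  shared_job_cluster:
    type: complex
    default:
      spark_version: 15.4.x-scala2.12
      node_type_id: Standard_DS3_v2
      num_workers: 2

resources:
  jobs:
    fleet_a:
      name: fleet_a
      job_clusters:
        - job_cluster_key: shared
          new_cluster: ${var.shared_job_cluster}
```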
tt_921
by New Contributor II
  • 91 Views
  • 2 replies
  • 2 kudos

Resolved! Lakeflow Declarative Pipeline queue

In the January 2026 release notes, it was announced that: "Pipelines now support queued execution mode, where multiple update requests are automatically queued and executed sequentially instead of failing with conflicts. This simplifies operations fo...

Data Engineering
DAB
pipeline
  • 91 Views
  • 2 replies
  • 2 kudos
Latest Reply
tt_921
New Contributor II
  • 2 kudos

Thank you very much for the detailed response! We unfortunately can't proceed with option 1, as we do require multiple places that can trigger the pipeline (an API call to the parent job, and a direct API call to the pipeline itself). This is due to ...

  • 2 kudos
1 More Replies
Areqio
by New Contributor
  • 355 Views
  • 3 replies
  • 1 kudos

How to stream Azure Event Hubs to a Databricks Delta table

I am trying to stream my IoT data from Azure Event Hubs to Databricks. I'm running Databricks Runtime 17.3 LTS with Scala 2.13.

  • 355 Views
  • 3 replies
  • 1 kudos
Latest Reply
rohan22sri
New Contributor III
  • 1 kudos

Hi @Areqio ,def getKafkaOptions( env: String, ehNameSpace: String, ehName: String, scopeName: String, kafkaOffset: String, ehConnKey: String, maxOffsetsPerTrigger: String = "50000" ): Map[String, String] = { val connStr = dbutils.sec...

  • 1 kudos
2 More Replies
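The Scala helper in the reply above is truncated, but the idea is language-agnostic: build the option map for Event Hubs' Kafka-compatible endpoint and pass it to a Kafka stream reader. Here is a hedged Python sketch of that option map; the namespace, hub name, and connection string are hypothetical, and in practice the connection string would come from a secret scope via dbutils.secrets.get.

```python
# Sketch of Kafka options for reading Azure Event Hubs through its
# Kafka-compatible endpoint (all names hypothetical).

def event_hubs_kafka_options(namespace: str, event_hub: str, conn_str: str,
                             max_offsets_per_trigger: str = "50000") -> dict[str, str]:
    # Event Hubs uses SASL PLAIN with "$ConnectionString" as the username.
    eh_sasl = (
        "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule "
        f'required username="$ConnectionString" password="{conn_str}";'
    )
    return {
        "kafka.bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "subscribe": event_hub,
        "kafka.security.protocol": "SASL_SSL",
        "kafka.sasl.mechanism": "PLAIN",
        "kafka.sasl.jaas.config": eh_sasl,
        "startingOffsets": "earliest",
        "maxOffsetsPerTrigger": max_offsets_per_trigger,
    }

opts = event_hubs_kafka_options("my-ns", "iot-events", "Endpoint=sb://...")
# These options would then be passed to spark.readStream.format("kafka").options(**opts)
```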
staskh
by Contributor
  • 173 Views
  • 3 replies
  • 4 kudos

Resolved! Delta update/insert from multiple source tables

[Sorry for a novice question.] I have multiple tables periodically updated from external sources (including inserts, updates, or deletes). I need to update a target table, which is an outer join of multiple source tables, without rewriting it each time....

  • 173 Views
  • 3 replies
  • 4 kudos
Latest Reply
staskh
Contributor
  • 4 kudos

Thank you for such an informative and helpful response!

  • 4 kudos
2 More Replies
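The usual pattern for this kind of problem is one MERGE per source's change set into the target, rather than rebuilding the full join each cycle. A minimal sketch that generates such a MERGE statement (table and column names are hypothetical, and real code would add a WHEN MATCHED AND s._deleted THEN DELETE branch for deletes):

```python
# Sketch: generate a MERGE statement that upserts one source's changes
# into the shared target table (all names hypothetical).

def merge_source_sql(target: str, source: str, key: str, cols: list[str]) -> str:
    set_clause = ", ".join(f"t.{c} = s.{c}" for c in cols)
    insert_cols = ", ".join([key] + cols)
    insert_vals = ", ".join(f"s.{c}" for c in [key] + cols)
    return (
        f"MERGE INTO {target} t USING {source} s ON t.{key} = s.{key} "
        f"WHEN MATCHED THEN UPDATE SET {set_clause} "
        f"WHEN NOT MATCHED THEN INSERT ({insert_cols}) VALUES ({insert_vals})"
    )

print(merge_source_sql("target_tbl", "source_a_changes", "id", ["a_col"]))
```

Running one such MERGE per source keeps the target incrementally up to date, touching only the rows each source actually changed.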
aonurdemir
by Contributor
  • 172 Views
  • 2 replies
  • 1 kudos

Liquid Clustering file pruning breaks when filtering on a high NULL numeric column in dataSkipping

Environment:
  • Cloud: AWS
  • Compute: Serverless
  • Table: a_big_table
  • Table type: Streaming Table (SDP pipeline)
  • Table size: 641 GB, 6,210 files
  • Liquid Clustering columns: [event_time, integer_userId]
  • delta.dataSkippingStatsColumns: event_time, integer_userId, integ...

  • 172 Views
  • 2 replies
  • 1 kudos
Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Hello @aonurdemir , I looked into your query and have compiled some helpful tips: I don't have direct access to your workspace internals, so I can't prove this definitively. But what you're seeing is consistent with how Delta's stats-based data skipp...

  • 1 kudos
1 More Replies
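The behaviour described in the reply above is consistent with stats-based data skipping: a file can only be pruned when its per-file stats prove no row can match, and NULL-heavy columns weaken that proof. A toy model of the pruning decision for a predicate like `col > threshold` (this is a deliberate simplification for illustration, not Delta's actual implementation):

```python
# Toy model of stats-based file skipping for "col > threshold".
# A file is skipped only when its stats *prove* no row can match;
# missing min/max stats (e.g. an all-NULL column) force a read.

from dataclasses import dataclass
from typing import Optional

@dataclass
class FileStats:
    min_val: Optional[int]   # None when stats are absent or the column is all NULL
    max_val: Optional[int]
    null_count: int
    num_records: int

def can_skip_gt(stats: FileStats, threshold: int) -> bool:
    if stats.null_count == stats.num_records:
        return True   # every value is NULL: "col > threshold" matches no row
    if stats.max_val is None:
        return False  # no usable stats: the file must be read
    return stats.max_val <= threshold  # no row can exceed the threshold

print(can_skip_gt(FileStats(1, 10, 0, 100), 50))        # skippable: max 10 <= 50
print(can_skip_gt(FileStats(None, None, 90, 100), 50))  # not skippable: stats unusable
```

This is why a high-NULL column in the stats/clustering set can quietly turn pruning off: many files end up in the "stats unusable" branch.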
AlexSantiago
by New Contributor II
  • 17573 Views
  • 25 replies
  • 4 kudos

spotify API get token - raw_input was called, but this frontend does not support input requests.

Hello everyone, I'm trying to use Spotify's API to analyse my music data, but I'm receiving an error during authentication, specifically when I try to get the token; my code is below. Is it a Databricks bug? pip install spotipy from spotipy.oauth2 import SpotifyO...

  • 17573 Views
  • 25 replies
  • 4 kudos
Latest Reply
ElevateNew
New Contributor
  • 4 kudos

In this context, Elevate New is relevant as a digital content platform that covers technology trends, online platforms, software ecosystems, and   modern internet-based solutions. As developers and tech communities continue discussing APIs, cloud ser...

  • 4 kudos
24 More Replies
FantineM
by New Contributor
  • 351 Views
  • 4 replies
  • 3 kudos

Resolved! Vector index not syncing: DELTA_UNSUPPORTED_TIME_TRAVEL_BEYOND_DELETED_FILE_RETENTION_DURATION

Hi All, Lately I have had issues with my vector search index not syncing. The associated pipeline fails to create with the error: failed to resolve flow: '__online_index_view'. com.databricks.sql.transaction.tahoe.DeltaAnalysisException: [DELTA_UNSUPPORTED_...

  • 351 Views
  • 4 replies
  • 3 kudos
Latest Reply
FantineM
New Contributor
  • 3 kudos

Thanks again for the kind help!

  • 3 kudos
3 More Replies
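The error name in this thread points at the source table's deleted-file retention: the index sync tried to time travel past files that had already been vacuumed away. One hedged mitigation, assuming the retention window is the culprit, is to lengthen it on the source Delta table (the table name below is hypothetical):

```sql
-- Sketch (table name hypothetical): retain deleted files longer so the
-- index sync / time travel read can still resolve older versions.
ALTER TABLE main.docs.source_chunks SET TBLPROPERTIES (
  'delta.deletedFileRetentionDuration' = 'interval 7 days'
);
```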
Kirankumarbs
by Contributor III
  • 905 Views
  • 4 replies
  • 2 kudos

Resolved! Serverless notebook idle timeout — is it configurable? What exactly am I paying for? Really Ambiguous

Been running notebooks on serverless compute and watching the indicator in the UI. After my last cell finishes, it goes from dark green to this fading green, sits there for maybe 5-10 minutes, then finally goes grey. Pretty sure I'm paying for that e...

  • 905 Views
  • 4 replies
  • 2 kudos
Latest Reply
hali
New Contributor
  • 2 kudos

I have the same concern and feedback as the OP. I wish there were a way to set auto-termination after the serverless compute has been idle for X minutes, and to not be billed if our users left their notebooks attached to serverless compute and forgot to hit "term...

  • 2 kudos
3 More Replies
MrJava
by New Contributor III
  • 21110 Views
  • 18 replies
  • 13 kudos

How to know, who started a job run?

Hi there! We have different jobs/workflows configured in our Databricks workspace running on AWS and would like to know who actually started a job run. Are they started by a user, or by a service principal using curl? Currently one can only see who is t...

  • 21110 Views
  • 18 replies
  • 13 kudos
Latest Reply
saibabu
New Contributor
  • 13 kudos

Any update on this feature?

  • 13 kudos
17 More Replies