Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

furqanraza
by New Contributor
  • 83 Views
  • 2 replies
  • 0 kudos

ETL job execution failure in serverless compute

Hi, we are facing an issue when executing the ELT pipeline on serverless compute. In the ETL pipeline, for some users, a task gets stuck every time we run the job. However, a similar ETL pipeline works fine for other users. Furthermore, there are canceled...

Latest Reply
ozaaditya
New Contributor II
  • 0 kudos

Hi, could you please share the error message or any logs you're seeing when the task gets stuck? This will help in diagnosing the issue and identifying potential solutions.

1 More Replies
ChristianRRL
by Valued Contributor
  • 1932 Views
  • 1 reply
  • 1 kudos

DLT Deduping Best Practice in Medallion

Hi there, I have what may be a deceptively simple question, but I suspect it may have a variety of answers: what is the "right" place to handle deduplication in the medallion architecture? In my example, I already have everything properly laid out with data...

Latest Reply
cgrant
Databricks Employee
  • 1 kudos

A typical recommendation is not to do any transformations as the data lands in the bronze layer (ELT). The idea is that you want your bronze layer to be as close a representation of your source data as possible, so if there are any mistakes later...
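Building on this, below is a minimal PySpark/DLT-style sketch of keeping bronze raw and handling deduplication on the way into silver. The table and column names (bronze_events, event_id, ingest_ts) are hypothetical, and it assumes a batch (materialized) silver table rather than a streaming one:

import dlt
from pyspark.sql import functions as F
from pyspark.sql.window import Window

@dlt.table(name="silver_events", comment="Deduplicated copy of the raw bronze data")
def silver_events():
    # Rank duplicates per business key, newest ingestion first (names are assumptions)
    w = Window.partitionBy("event_id").orderBy(F.col("ingest_ts").desc())
    return (
        dlt.read("bronze_events")            # bronze stays an untouched copy of the source
        .withColumn("_rn", F.row_number().over(w))
        .filter("_rn = 1")                   # keep only the latest record per key
        .drop("_rn")
    )

Bronze stays a faithful landing copy, and the dedup logic lives entirely in the silver definition, so it can be changed and recomputed later without re-ingesting the source.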

lauraxyz
by New Contributor II
  • 287 Views
  • 8 replies
  • 4 kudos

How to execute .sql file in volume

I have giant queries (SELECT .. FROM) that I store in .sql files. I want to put those files in a Volume and run the queries from a workflow task. I can load the file content into a 'text' format string, then run the query. My question is, is there...

Latest Reply
lauraxyz
New Contributor II
  • 4 kudos

Issue resolved: for .py, I was using Spark, and I had to explicitly create the Spark session so that the query could run properly and insert data.
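For reference, a minimal sketch of what that looks like in a .py workflow task, assuming a hypothetical Volume path and target table (adjust the names to your catalog and schema):

from pyspark.sql import SparkSession

# In a plain .py task the session is not injected automatically, so create/get it explicitly
spark = SparkSession.builder.getOrCreate()

# Hypothetical Volume path holding the stored query
sql_path = "/Volumes/my_catalog/my_schema/my_volume/queries/big_query.sql"

with open(sql_path, "r") as f:            # UC Volumes are exposed as local file paths
    query = f.read()

result = spark.sql(query)                 # run the giant SELECT ... FROM query
result.write.mode("append").saveAsTable("my_catalog.my_schema.results")  # hypothetical target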

7 More Replies
100databricks
by New Contributor III
  • 53 Views
  • 2 replies
  • 1 kudos

Resolved! How can I force a data frame to evaluate without saving it?

The problem at hand requires me to take a set of actions on a very large data frame df_1. This set of actions results in a second data frame df_2, and from this second data frame, I have multiple downstream tasks, task_1, task_2 ... By default, t...

Latest Reply
filipniziol
Contributor III
  • 1 kudos

Hi @100databricks, yes, you can run df_2.cache() or df_2.persist() (df_2.cache() is a shortcut for df_2.persist() with the default storage level). Here is the pseudo-code: # df_1 is your large initial DataFrame df_1 = ... # Perform expensive transformations ...
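For completeness, a runnable sketch of that idea (table and column names are made up; note that in recent Spark versions DataFrame.cache() defaults to the MEMORY_AND_DISK storage level):

from pyspark.sql import functions as F

df_1 = spark.read.table("my_catalog.my_schema.big_table")   # large source DataFrame (hypothetical)

# Expensive transformations producing df_2
df_2 = (df_1
        .filter(F.col("status") == "active")
        .withColumn("amount_2x", F.col("amount") * 2))

df_2.cache()      # mark df_2 for caching (lazy, nothing happens yet)
df_2.count()      # trigger one action to force evaluation and populate the cache, without saving anything

# Downstream tasks now reuse the cached df_2 instead of recomputing from df_1
task_1 = df_2.groupBy("status").count()
task_2 = df_2.select("amount_2x").summary()

df_2.unpersist()  # release the cache when all downstream work is done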

1 More Replies
eballinger
by New Contributor
  • 77 Views
  • 2 replies
  • 0 kudos

Looking for ways to speed up DLT testing

Hi guys, I am new to this community. I am guessing we have a typical setup (DLT tables, 3 layers - bronze, silver and gold) and while it works fine in our development environment I have always looked for ways to speed things up for testers. For exampl...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

There isn't a direct way to achieve this within the current DLT framework. When a DLT table is undeclared, it is designed to be removed from the pipeline, which includes the underlying data. However, there are a few strategies you can consider to spe...

1 More Replies
darkolexis
by New Contributor
  • 893 Views
  • 2 replies
  • 1 kudos

Service Principal types in Azure Databricks

In Azure Databricks, we can create two types of Service Principals, namely: 1. Databricks-managed SP, 2. Microsoft Entra ID-managed SP. What is the difference between the two, other than one being specific to a single workspace and the other being usable from m...

Latest Reply
arunprakash1986
  • 1 kudos

So, what use would it be in a situation where I have a Docker image that runs as a job using Databricks compute? Here the job has "Run As" set to a service principal, say "svc1", which is a Databricks-managed service principal. I believe that...

1 More Replies
AcrobaticMonkey
by New Contributor II
  • 24 Views
  • 1 reply
  • 0 kudos

Alerts for Failed Queries in Databricks

How can we set up automated alerts to notify us when queries executed by a specific service principal fail in Databricks?

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @AcrobaticMonkey, how are you triggering the queries with the service principal? Is it through a workflow job? If so, you can use job notifications. I would need more details, but one possible solution is to implement Alerts: https://docs.databr...
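One possible sketch of the Alerts-style approach, as a query you could run on a schedule. The system table system.query.history and its column names (executed_by, execution_status, end_time) are assumptions to verify in your workspace, and the service principal identifier is a placeholder:

failed = spark.sql("""
    SELECT statement_id, executed_by, execution_status, end_time
    FROM system.query.history
    WHERE executed_by = 'svc-principal-application-id'   -- placeholder SP identifier
      AND execution_status = 'FAILED'
      AND end_time >= current_timestamp() - INTERVAL 1 HOUR
""")

if failed.count() > 0:
    # Hook in whatever notification channel you use (email, webhook, SQL alert, ...)
    print(f"{failed.count()} failed queries for the service principal in the last hour")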

646901
by New Contributor II
  • 52 Views
  • 2 replies
  • 0 kudos

Get user who ran a job

From the Databricks API/CLI, is it possible to get the user who triggered a job run programmatically? The information can be found in the job "event log" and can be queried in the "audit log", but neither of these seems to have an API option. Is there a w...

Latest Reply
Stefan-Koch
Contributor II
  • 0 kudos

With the Databricks CLI you can get all the information about a job run with this command: databricks jobs get-run <run-id> (replace <run-id> with your actual run ID).
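The same lookup can be done programmatically; a small sketch with the Databricks SDK for Python (the run ID is a placeholder, and creator_user_name is only populated when the API returns it for that run):

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()                      # uses your default authentication
run = w.jobs.get_run(run_id=123456789)     # equivalent to `databricks jobs get-run`

# The user (or service principal) that triggered the run
print(run.creator_user_name)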

1 More Replies
Nes_Hdr
by New Contributor II
  • 205 Views
  • 2 replies
  • 0 kudos

Limitations for Unity Catalog on single user access mode clusters

Hello! According to the Databricks documentation on Azure: "On Databricks Runtime 15.3 and below, fine-grained access control on single user compute is not supported. Specifically: You cannot access a table that has a row filter or column mask. You cannot ...

(screenshot attached: Nes_Hdr_0-1732872787713.png)
Latest Reply
Nes_Hdr
New Contributor II
  • 0 kudos

@MuthuLakshmi Thank you very much for the reply! Yes, I have serverless compute enabled on the workspace, but the cluster I am using is not serverless. The error happens whenever access mode is "Single user" (see pictures below). It also happens when...

1 More Replies
Sanjeev
by New Contributor II
  • 72 Views
  • 1 reply
  • 0 kudos

Triggering a Databricks job more than once daily

Hi team, I have a requirement to trigger a Databricks job more than once daily, maybe twice or three times a day. I have checked Workflows but couldn't find any option in the UI. Please advise.

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hello @Sanjeev, you can use cron syntax to schedule jobs more than once a day. Please refer to the document below: https://docs.databricks.com/en/jobs/scheduled.html
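If you prefer to set the schedule programmatically rather than in the UI, here is a hedged sketch using the Databricks SDK for Python; the job ID is a placeholder and the Quartz expression runs the job at 06:00 and 18:00 every day:

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()
w.jobs.update(
    job_id=123456789,                      # placeholder job ID
    new_settings=jobs.JobSettings(
        schedule=jobs.CronSchedule(
            quartz_cron_expression="0 0 6,18 * * ?",   # twice daily
            timezone_id="UTC",
        )
    ),
)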

skarpeck
by New Contributor III
  • 164 Views
  • 2 replies
  • 0 kudos

Update set in foreachBatch

I need to track codes of records that were ingested in a foreachBatch function and pass them as a task value, so downstream tasks can take actions based on this output. What would be the best approach to achieve that? Now, I have the following solution, b...

Latest Reply
raphaelblg
Databricks Employee
  • 0 kudos

@skarpeck does your input df contain any filters? The empty codes variable could be due to empty micro-batches. Please check numInputRows in your query's stream monitoring metrics. I recommend checking whether there are input rows for the b...
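For the original question, one hedged sketch of accumulating codes on the driver inside foreachBatch and publishing them as a task value afterwards (stream_df and the code column are hypothetical, and it assumes a finite availableNow run so the value can be set once the stream stops):

seen_codes = set()

def track_codes(batch_df, batch_id):
    # The foreachBatch function runs on the driver, so it can update driver-side state
    codes = [row["code"] for row in batch_df.select("code").distinct().collect()]
    seen_codes.update(codes)
    # ... write batch_df to its target here ...

(stream_df.writeStream
    .foreachBatch(track_codes)
    .trigger(availableNow=True)            # finite run: stops when the backlog is processed
    .start()
    .awaitTermination())

# Make the codes available to downstream tasks in the same job
dbutils.jobs.taskValues.set(key="ingested_codes", value=sorted(seen_codes))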

1 More Replies
ismaelhenzel
by New Contributor III
  • 231 Views
  • 0 replies
  • 0 kudos

Delta live tables - foreign keys

I'm creating ingestions using Delta Live Tables; DLT supports the use of a schema with constraints like foreign keys. The problem is: how can I create foreign keys between tables in the same pipeline that have no read/write relation, but do have a foreign key rela...

Syed-SnapLogic
by New Contributor
  • 94 Views
  • 1 reply
  • 1 kudos

Does Databricks support the password grant type?

Hi, for my Azure Databricks instance, I am able to generate an access token using the client_credentials and authorization_code grant types. I would like to know if Databricks supports the password grant type or not. Is there any document or reference to...

Latest Reply
Alberto_Umana
Databricks Employee
  • 1 kudos

Hello @Syed-SnapLogic, Databricks does not support the password grant type for generating access tokens. The supported grant types for generating access tokens in Databricks are client_credentials and authorization_code. For more information, you can...

Dom1
by New Contributor III
  • 2312 Views
  • 5 replies
  • 3 kudos

Show log4j messages in run output

Hi, I have an issue when running JAR jobs. I expect to see logs in the output window of a run. Unfortunately, I can only see messages that are generated with "System.out.println" or "System.err.println". Everything that is logged via slf4j is only ...

(screenshot attached: Dom1_0-1713189014582.png)
Latest Reply
dbal
New Contributor III
  • 3 kudos

Any update on this? I am also facing this issue.

4 More Replies
jeremy98
by New Contributor II
  • 87 Views
  • 2 replies
  • 0 kudos

start another workflow waiting the completion of a job-run of the same workflow

Hello community, I'm using DABs and I want to know if it is possible to configure, in the YAML file, logic that allows me to run a workflow only if the previous job run of the same workflow has finished. Is it possible to do it? Do I need to create a task that che...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hello @jeremy98, Yes, it is possible to configure a YAML file to run a workflow only if the previous job run of the same workflow has finished. You can achieve this by defining dependencies between tasks within the workflow. You can specify task depe...
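As a complement, a hedged sketch of the "task that checks" idea from the question: a first task that waits until no other run of the same job is active before the rest of the workflow proceeds. The job and run IDs are placeholders (they could be passed in via job parameters such as {{job.id}} and {{job.run_id}}), and this is an illustration rather than a built-in DABs feature:

import time
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
JOB_ID = 123456789          # placeholder
CURRENT_RUN_ID = 987654321  # placeholder

while True:
    other_active = [
        r for r in w.jobs.list_runs(job_id=JOB_ID, active_only=True)
        if r.run_id != CURRENT_RUN_ID      # ignore the run this task belongs to
    ]
    if not other_active:
        break                              # previous run finished; downstream tasks can proceed
    time.sleep(60)                         # still running; check again in a minute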

1 More Replies