Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

harraz
by New Contributor III
  • 3173 Views
  • 1 replies
  • 0 kudos

Run result unavailable: run failed with error message Notebook not found:

I'm trying to create a workflow job that fetches the notebook from a remote git repository (Bitbucket Cloud). I tried everything in the Path field and nothing is working. Note that the Bitbucket repo is already connected to Databricks and there are no issues che...

(screenshot attached)
Latest Reply
Debayan
Esteemed Contributor III
  • 0 kudos

Hi @harraz, could you please confirm whether Files in Repos has been enabled? See https://docs.databricks.com/files/workspace.html#configure-support-for-files-in-repos. You can use the command %sh pwd in a notebook inside a repo to check if Files ...
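
A quick sketch of that check, run from a notebook cell inside the repo (the printed path below is illustrative):

    %sh pwd
    # With Files in Repos enabled, this prints a repo path such as
    # /Workspace/Repos/<user>/<repo> (illustrative); otherwise the repo
    # file tree is not visible to shell commands.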

deep_thought
by Contributor
  • 17498 Views
  • 16 replies
  • 9 kudos

Resolved! Schedule job to run sequentially after another job

Is there a way to schedule a job to run after some other job is complete? E.g., schedule Job A, then upon its completion run Job B.

Latest Reply
claytonseverson
New Contributor II
  • 9 kudos

Here is the User Guide for Jobs-as-Tasks - https://docs.google.com/document/d/1OJsc-g7IwAJjYooCp7T01Rxyt_xFkMPjmAAGdDGPkY4/edit#heading=h.oudvb5fyfd0n
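
For readers without access to that doc, a minimal sketch of the same idea using the Jobs 2.1 API's run_job_task (job IDs, names, DOMAIN, and TOKEN are all placeholders):

    import requests

    # Sketch: a wrapper job whose tasks trigger existing jobs; task B depends on A,
    # so Job B runs only after Job A completes. IDs and credentials are placeholders.
    DOMAIN = "<workspace-host>"
    TOKEN = "<personal-access-token>"

    resp = requests.post(
        "https://%s/api/2.1/jobs/create" % DOMAIN,
        headers={"Authorization": "Bearer %s" % TOKEN},
        json={
            "name": "job_all",
            "tasks": [
                {"task_key": "run_job_a", "run_job_task": {"job_id": 111}},
                {
                    "task_key": "run_job_b",
                    "depends_on": [{"task_key": "run_job_a"}],
                    "run_job_task": {"job_id": 222},
                },
            ],
        },
    )
    resp.raise_for_status()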

15 More Replies
MarsSu
by New Contributor II
  • 2235 Views
  • 3 replies
  • 3 kudos

Resolved! Does the driver node of job compute have HA?

I would like to confirm and discuss the HA mechanism for the driver node of job compute, because we can think of the driver node as the master node of the cluster. In AWS EMR, we can set up 2 master nodes so that if one master node fails, another master node can re...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Mars Su, we haven't heard from you since the last response from @Werner Stinckens and @karthik p, and I was checking back to see if their suggestions helped you. Otherwise, if you have found a solution, please share it with the community, as it can be...

2 More Replies
psps
by New Contributor III
  • 3545 Views
  • 3 replies
  • 4 kudos

Databricks job run logs only show prints/logs from the driver and not the executors

Hi, in the Databricks job run output, only logs from the driver are displayed. We have a function parallelized to run on executor nodes. The logs/prints from that function are not displayed in the job run output. Is there a way to configure and show those logs i...

Latest Reply
psps
New Contributor III
  • 4 kudos

Thanks @Debayan Mukherjee. That enables executor logging; however, the executor logs still do not appear in the Databricks job run output. Only driver logs are displayed.
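
One workaround, sketched under the assumption that capturing executor stdout/stderr in files is acceptable (the run output page itself only shows the driver): configure cluster log delivery on the job cluster.

    # Sketch: job cluster spec with log delivery to DBFS (runtime, node type, and
    # destination path are examples). Executor stdout/stderr then lands under
    # dbfs:/cluster-logs/<cluster-id>/executor/.
    new_cluster = {
        "spark_version": "13.3.x-scala2.12",
        "node_type_id": "i3.xlarge",
        "num_workers": 2,
        "cluster_log_conf": {"dbfs": {"destination": "dbfs:/cluster-logs"}},
    }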

2 More Replies
B_J_Innov
by New Contributor III
  • 5893 Views
  • 12 replies
  • 0 kudos

Resolved! Can't use job cluster for scheduled jobs: ADD_NODES_FAILED: Failed to add 9 containers to the cluster. Will attempt retry: false. Reason: Azure Quota Exceeded Exception

Hi everyone, I've been using my all-purpose cluster for scheduled jobs, and I've been told that it's suboptimal and that using a job cluster for scheduled jobs cuts costs by half. Unfortunately, when I tried to switch clusters on my ex...

Latest Reply
karthik_p
Esteemed Contributor
  • 0 kudos

@Bassem Jaber, if you are seeing the same error then you need to increase the quota; for that, your Azure plan should be changed from pay-as-you-go to another plan, as the pay-as-you-go Azure model has limitations on quota increases.

11 More Replies
oleole
by Contributor
  • 3556 Views
  • 3 replies
  • 3 kudos

Resolved! How to delay a new job run after a job failure

I have a daily job run that occasionally fails with the error "The Spark driver has stopped unexpectedly and is restarting. Your notebook will be automatically reattached." After I get the notification that this job failed on schedule, I manually run ...

(screenshots attached)
Latest Reply
oleole
Contributor
  • 3 kudos

According to this documentation, you can specify the wait time between the "start" of the first run and the retry start time.
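
For reference, a sketch of the corresponding task-level retry fields in a Jobs 2.1 task definition (path and values are illustrative):

    # Sketch: retry settings on a task (values are illustrative).
    # min_retry_interval_millis is measured from the start of the failed run
    # to the start of the retry run.
    task = {
        "task_key": "daily_task",
        "notebook_task": {"notebook_path": "/Repos/project/daily"},  # hypothetical path
        "max_retries": 1,
        "min_retry_interval_millis": 15 * 60 * 1000,  # 15 minutes
        "retry_on_timeout": False,
    }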

2 More Replies
Michael_Papadop
by New Contributor II
  • 9834 Views
  • 3 replies
  • 0 kudos

How can I set the status of a Databricks job as skipped via Python?

I have a basic 2-task job. The 1st notebook (task) checks whether the source file has changes and, if so, refreshes a corresponding materialized view. In case there are no changes, I use dbutils.jobs.taskValues.set(key = "skip_job", value = 1) &...
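
A sketch of the pattern being described, with a hypothetical upstream task key "check_source"; note that exiting early still leaves the task marked as succeeded, not skipped, which is the limitation the question is about:

    # Task 1 (task_key "check_source", hypothetical): flag that work can be skipped.
    dbutils.jobs.taskValues.set(key="skip_job", value=1)

    # Task 2: read the flag back; default applies if the key is missing,
    # debugValue is returned when running outside a job.
    skip_job = dbutils.jobs.taskValues.get(
        taskKey="check_source", key="skip_job", default=0, debugValue=0
    )
    if skip_job == 1:
        dbutils.notebook.exit("skipped")  # ends the task early, but its status is still "succeeded"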

Latest Reply
karthik_p
Esteemed Contributor
  • 0 kudos

@Michael Papadopoulos, usually that should not be the case, I think: at the task level we have three notification types (success, failure, start), whereas at the whole-job level a skip option is available to discard notifications. Will see if someone from the commu...

2 More Replies
jakubk
by Contributor
  • 7616 Views
  • 13 replies
  • 9 kudos

dbt workflow job limitations - naming the target? where do docs go?

I'm on Unity Catalog. I'm trying to do a dbt run on a project that works locally, but the Databricks dbt workflow task seems to be ignoring the project.yml settings for schemas and catalogs, as well as those defined in the config block of individual model...

Latest Reply
Anonymous
Not applicable
  • 9 kudos

Hi @Jakub K, I'm sorry you could not find a solution to your problem in the answers provided. Our community strives to provide helpful and accurate information, but sometimes an immediate solution may not be available for every issue. I suggest provid...

12 More Replies
Tjadi
by New Contributor III
  • 1315 Views
  • 2 replies
  • 4 kudos

Specifying a cluster when running a job

Hi, let's say that I am starting jobs with different parameters at a certain time each day in the following manner: response = requests.post( "https://%s/api/2.0/jobs/run-now" % (DOMAIN), headers={"Authorization": "Bearer %s" % TOKEN}, json={ ...
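
A completed sketch of that call, in the same style (job_id and notebook_params are placeholders):

    import requests

    DOMAIN = "<workspace-host>"        # placeholder
    TOKEN = "<personal-access-token>"  # placeholder

    # Trigger an existing job with per-run parameters.
    response = requests.post(
        "https://%s/api/2.0/jobs/run-now" % (DOMAIN),
        headers={"Authorization": "Bearer %s" % TOKEN},
        json={
            "job_id": 12345,
            "notebook_params": {"run_date": "2023-06-01"},
        },
    )
    response.raise_for_status()
    print(response.json()["run_id"])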

Latest Reply
karthik_p
Esteemed Contributor
  • 4 kudos

@Tjadi Peeters, you can select the Autoscaling/Enhanced Scaling option in workflows, which will scale based on workload.

1 More Replies
Saty
by New Contributor
  • 13948 Views
  • 3 replies
  • 1 kudos

Job fails with java.lang.NoClassDefFoundError: Could not initialize class error

Hi, it is Scala code where we are connecting to Redis for storage (sparkContext.toRedisKV), and I am also using a Scala UDF. I have executed the same code in a notebook without a Scala object and it works fine, but it fails every time when I am using the same code in a jar...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Satish Kumbhar, hope all is well! Just wanted to check in to see if you were able to resolve your issue. If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Tha...

2 More Replies
yzhang
by New Contributor III
  • 2351 Views
  • 5 replies
  • 0 kudos

Does Databricks support nested jobs or tasks?

Cannot find info on whether Databricks supports nested jobs or tasks. For example, I have a 'job_a', which contains a list of tasks, and another 'job_b', which also contains a list of tasks. Now I'd like to have a 'job_all' that will run both 'job_a' and 'job_b...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Yanan Zhang, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers yo...

4 More Replies
sage5616
by Valued Contributor
  • 4385 Views
  • 1 replies
  • 3 kudos

Resolved! Set Workflow Job Concurrency Limit

Hi everyone, I need a job to be triggered every 5 minutes. However, if that job is already running, it must not be triggered again until that run is finished. Hence, I need to set the maximum run concurrency for that job to only one instance at a time...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

@Michael Okulik: To ensure that a Databricks job is not triggered again until a running instance of the job is completed, you can set the maximum concurrency for the job to 1. Here's how you can configure this in Databricks: go to the Databricks work...
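
The same setting via the API is a one-field sketch (job_id, DOMAIN, and TOKEN are placeholders):

    import requests

    DOMAIN = "<workspace-host>"        # placeholder
    TOKEN = "<personal-access-token>"  # placeholder

    # Cap the job at a single concurrent run; extra triggers are skipped.
    requests.post(
        "https://%s/api/2.1/jobs/update" % DOMAIN,
        headers={"Authorization": "Bearer %s" % TOKEN},
        json={"job_id": 12345, "new_settings": {"max_concurrent_runs": 1}},
    ).raise_for_status()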

vinaykumar
by New Contributor III
  • 3609 Views
  • 3 replies
  • 1 kudos

Resolved! Run a Databricks job instantly without waiting for the job cluster to become active

When we run a Databricks job, it takes some time for the job cluster to become active. I also created a pool and attached it to the job cluster, but it still takes time to attach the cluster and for the job cluster to become active so the job run can start. Is there any way we can run the Databricks job instantly...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

If you want instant processing, you will have to have a cluster running all the time. As mentioned above, Databricks is testing serverless compute for data engineering workloads (comparable to serverless SQL). This fires up a cluster in a few seconds...
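
For the pool approach from the question, a sketch of a job cluster that draws from a warm pool (IDs and runtime are placeholders); this only shortens startup when the pool actually has idle, pre-warmed instances:

    # Sketch: job cluster backed by an instance pool (IDs are placeholders).
    new_cluster = {
        "spark_version": "13.3.x-scala2.12",
        "instance_pool_id": "pool-1234567890abcdef",         # workers from this pool
        "driver_instance_pool_id": "pool-1234567890abcdef",  # driver from the pool too
        "num_workers": 2,
    }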

2 More Replies
User16783854657
by New Contributor III
  • 2931 Views
  • 4 replies
  • 6 kudos

How do I know how much of a query/job used Photon?

I'm trying to use the native execution engine, Photon. How can I tell if a query is using Photon or is falling back to the non-native Spark engine?

Latest Reply
venkat09
New Contributor III
  • 6 kudos

Typo in my second point of the previous post: click the execution plan of your task (this is available under the SQL/DataFrame tab in the Spark UI). It shows which operations ran in the Photon engine and which were not executed by Photon.
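
A quick way to eyeball this from a notebook, as a sketch (the table name is an example; exact operator names vary by runtime):

    # Photon-executed operators show up with a "Photon" prefix in the physical plan
    # (e.g. PhotonGroupingAgg); plain Spark names (e.g. HashAggregate) indicate fallback.
    df = spark.table("samples.nyctaxi.trips").groupBy("pickup_zip").count()
    df.explain(mode="formatted")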

3 More Replies
JD410993
by New Contributor II
  • 1902 Views
  • 3 replies
  • 2 kudos

Job runs indefinitely after integrating with PyDeequ

I'm using PyDeequ data quality checks in one of our jobs. After adding this check, I noticed that the job does not complete and keeps running indefinitely after the PyDeequ checks are completed and results are returned. As stated in the PyDeequ documentation ...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

Hm, Deequ certainly works, as I have read about multiple people using it. And when reading the issues (open/closed) on the GitHub pages of PyDeequ, Databricks is mentioned in some issues, so it might be possible after all. But I think you need to check y...
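
If the hang turns out to be the Py4J callback server that PyDeequ starts, shutting it down at the end of the job is worth trying; a sketch based on the pattern in the PyDeequ README:

    # Sketch: stop PyDeequ's Py4J callback server so the job can terminate.
    # The PyDeequ README pairs this with spark.stop(), but stopping the session
    # may be undesirable on shared clusters, so verify for your setup.
    spark.sparkContext._gateway.shutdown_callback_server()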

2 More Replies