Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

psps
by New Contributor III
  • 2766 Views
  • 3 replies
  • 4 kudos

Databricks job run logs only show prints/logs from the driver and not the executors

Hi, In the Databricks job run output, only logs from the driver are displayed. We have a function parallelized to run on executor nodes. The logs/prints from that function are not displayed in the job run output. Is there a way to configure and show those logs i...

Latest Reply
psps
New Contributor III
  • 4 kudos

Thanks @Debayan Mukherjee. That enables executor logging. However, the executor logs do not appear in the Databricks job run output. Only driver logs are displayed.

  • 4 kudos
2 More Replies
B_J_Innov
by New Contributor III
  • 4651 Views
  • 12 replies
  • 0 kudos

Resolved! Can't use job cluster for scheduled jobs ADD_NODES_FAILED : Failed to add 9 containers to the cluster. Will attempt retry: false. Reason: Azure Quota Exceeded Exception

Hi everyone, I've been using my all-purpose cluster for scheduled jobs, and I've been told that this is suboptimal and that using a job cluster for scheduled jobs cuts costs by half. Unfortunately, when I tried to switch clusters on my ex...

Latest Reply
karthik_p
Esteemed Contributor
  • 0 kudos

@Bassem Jaber If you are seeing the same error, then you need to increase your quota. For that, your Azure plan should be changed from pay-as-you-go to another plan, as the pay-as-you-go Azure model has limitations on quota increases.

  • 0 kudos
11 More Replies
kjoth
by Contributor II
  • 13093 Views
  • 9 replies
  • 6 kudos

How to make the job fail via code after handling exception

Hi, We are capturing the exception with try/except if an error occurs, but we want the job status to be failed once we get the exception. What's the best way to do that? We are using PySpark.

Latest Reply
AkA
New Contributor II
  • 6 kudos

Instead of exiting the notebook, which marks the task/job as successful, the exception object needs to be raised again from the except block to fail the job:
try:
    <your code>
except Exception as err:
    <your block of exception handling>
    raise err
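A minimal sketch of the handle-then-re-raise pattern described in this reply. The names `run_step` and `failing_step` are hypothetical, and the failing function only stands in for real PySpark work; the point is that the exception is still propagated after it has been handled.

```python
import logging

logger = logging.getLogger(__name__)


def run_step(step):
    """Run a callable, log any failure, then re-raise so the caller sees it."""
    try:
        return step()
    except Exception:
        # Handle the error here (log, clean up, notify), then propagate it.
        logger.exception("Step failed")
        raise  # bare `raise` re-raises the active exception with its traceback


def failing_step():
    # Hypothetical stand-in for real PySpark work that goes wrong.
    raise ValueError("source table is empty")
```

Calling `run_step(failing_step)` logs the error and then raises the same `ValueError`. Per the reply above, an exception that escapes the notebook or task is what marks the run as failed, whereas exiting cleanly after the handler would mark it as successful.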

  • 6 kudos
8 More Replies
oleole
by Contributor
  • 2691 Views
  • 3 replies
  • 3 kudos

Resolved! How to delay a new job run after job

I have a daily job run that occasionally fails with the error: The spark driver has stopped unexpectedly and is restarting. Your notebook will be automatically reattached. After I get the notification that this job failed on schedule, I manually run ...

Latest Reply
oleole
Contributor
  • 3 kudos

According to this documentation, you can specify the wait time between the "start" of the first run and the retry start time.
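A hedged sketch of what such a retry setting could look like, assuming the documentation referred to is the Jobs API task retry settings. `max_retries` and `min_retry_interval_millis` are Jobs API field names; the helper `with_retry` and the task key `daily_load` are hypothetical.

```python
def with_retry(task, max_retries=1, wait_minutes=10):
    """Return a copy of a task-settings dict with retry fields added."""
    updated = dict(task)
    # Number of times to retry a failed run.
    updated["max_retries"] = max_retries
    # Minimum wait, in milliseconds, between the start of the failed run
    # and the start of the retry run.
    updated["min_retry_interval_millis"] = wait_minutes * 60 * 1000
    return updated


# Hypothetical task: retry once, no sooner than 30 minutes after the failed run started.
task = with_retry({"task_key": "daily_load"}, max_retries=1, wait_minutes=30)
```

The resulting dictionary would be sent as part of the job's task settings. Because the interval is measured from the start of the failed run, a run that fails after 20 minutes would wait roughly 10 more minutes with this setting.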

  • 3 kudos
2 More Replies
Michael_Papadop
by New Contributor II
  • 5075 Views
  • 3 replies
  • 0 kudos

How can I set the status of a databricks job as skipped via python?

I have a basic 2-task job. The 1st notebook (task) checks whether the source file has changes and, if so, refreshes a corresponding materialized view. If there are no changes, I use dbutils.jobs.taskValues.set(key = "skip_job", value = 1) &...
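A rough sketch of passing such a flag between tasks with task values. It does not set the run status to skipped; it only shows one way a downstream task could read the `skip_job` value. On Databricks, `task_values` would be `dbutils.jobs.taskValues`. The helper names, the upstream task key `check_source`, and the in-memory stand-in are all hypothetical and exist only so the example runs locally.

```python
class InMemoryTaskValues:
    """Hypothetical local stand-in for dbutils.jobs.taskValues (testing only)."""

    def __init__(self):
        self._values = {}

    def set(self, key, value):
        self._values[key] = value

    def get(self, taskKey, key, default=None, debugValue=None):
        # The stand-in ignores taskKey; the real utility scopes values per task.
        return self._values.get(key, default)


def publish_skip_flag(task_values, source_changed):
    """First task: record 1 when there is nothing to refresh, else 0."""
    flag = 0 if source_changed else 1
    task_values.set(key="skip_job", value=flag)
    return flag


def should_skip(task_values, upstream_task_key):
    """Second task: read the flag published by the upstream task."""
    return task_values.get(taskKey=upstream_task_key, key="skip_job", default=0) == 1
```

In the second notebook, `should_skip(dbutils.jobs.taskValues, "check_source")` could guard the remaining work, assuming the first task is named `check_source`.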

Latest Reply
karthik_p
Esteemed Contributor
  • 0 kudos

@Michael Papadopoulos Usually that should not be the case, I think: at the task level we have three notification types (success, failure, start), whereas at the whole-job level a skip option is available to discard notifications. Will see if someone from commu...

  • 0 kudos
2 More Replies
jakubk
by Contributor
  • 5960 Views
  • 13 replies
  • 9 kudos

dbt workflow job limitations - naming the target? where do docs go?

I'm on Unity Catalog. I'm trying to do a dbt run on a project that works locally, but the Databricks dbt workflow task seems to be ignoring the project.yml settings for schemas and catalogs, as well as those defined in the config block of individual model...

Latest Reply
Anonymous
Not applicable
  • 9 kudos

Hi @Jakub K I'm sorry you could not find a solution to your problem in the answers provided. Our community strives to provide helpful and accurate information, but sometimes an immediate solution may only be available for some issues. I suggest provid...

  • 9 kudos
12 More Replies
Tjadi
by New Contributor III
  • 1098 Views
  • 2 replies
  • 4 kudos

Specifying cluster on running a job

Hi, Let's say that I am starting jobs with different parameters at a certain time each day in the following manner: response = requests.post( "https://%s/api/2.0/jobs/run-now" % (DOMAIN), headers={"Authorization": "Bearer %s" % TOKEN}, json={ ...
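The request body in the post is truncated, so the following is only a sketch of how such a run-now call could be assembled. The URL and header format come from the post. The `job_id` and `notebook_params` fields are Jobs API run-now fields, but their use here is an assumption, and the helper name and all values are hypothetical.

```python
def build_run_now_request(domain, token, job_id, notebook_params):
    """Assemble the pieces of a run-now call without sending it."""
    return {
        "url": "https://%s/api/2.0/jobs/run-now" % domain,
        "headers": {"Authorization": "Bearer %s" % token},
        # Assumed payload: which job to trigger and the parameters for this run.
        "json": {"job_id": job_id, "notebook_params": notebook_params},
    }


# Hypothetical values, for illustration only.
req = build_run_now_request(
    "example.cloud.databricks.com", "my-token", 123, {"run_date": "2023-01-01"}
)
# To send it: requests.post(req["url"], headers=req["headers"], json=req["json"])
```

Keeping the request construction separate from the HTTP call makes it easy to vary the parameters per run and to check the payload before anything is triggered.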

Latest Reply
karthik_p
Esteemed Contributor
  • 4 kudos

@Tjadi Peeters You can select the Autoscaling/Enhanced Scaling option in Workflows, which will scale based on workload.

  • 4 kudos
1 More Replies
Saty
by New Contributor
  • 8936 Views
  • 3 replies
  • 1 kudos

Job fails with java.lang.NoClassDefFoundError: Could not initialize class error

Hi, It is Scala code where we are connecting to Redis to store data (sparkcontext.toRedisKV), and I am also using a Scala UDF. I have executed the same code in a notebook without a Scala object and it works fine, but it fails every time when I am using the same code in a jar...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Satish Kumbhar Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Tha...

  • 1 kudos
2 More Replies
yzhang
by New Contributor III
  • 1866 Views
  • 5 replies
  • 0 kudos

Cannot find such info if Databricks supports nested jobs or tasks. For example, I have a 'job_a', which contains list of tasks, and another ...

I cannot find information on whether Databricks supports nested jobs or tasks. For example, I have a 'job_a', which contains a list of tasks, and another 'job_b', which also contains a list of tasks. Now I'd like to have a 'job_all' that will run both 'job_a' and 'job_b...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Yanan Zhang Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the response and select the one that best answers yo...

  • 0 kudos
4 More Replies
sage5616
by Valued Contributor
  • 3342 Views
  • 1 replies
  • 3 kudos

Resolved! Set Workflow Job Concurrency Limit

Hi Everyone, I need a job to be triggered every 5 minutes. However, if that job is already running, it must not be triggered again until that run is finished. Hence, I need to set the maximum run concurrency for that job to only one instance at a time...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

@Michael Okulik: To ensure that a Databricks job is not triggered again until a running instance of the job is completed, you can set the maximum concurrency for the job to 1. Here's how you can configure this in Databricks: Go to the Databricks work...
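The same setting can also be expressed in job settings sent to the Jobs API, sketched below under stated assumptions. `max_concurrent_runs` and the `schedule` fields are Jobs API names; the helper name, the job name, and the UTC time zone are hypothetical choices.

```python
def every_five_minutes_settings(name):
    """Job settings for a 5-minute schedule that allows one active run at a time."""
    return {
        "name": name,
        # At most one active run of this job at any moment.
        "max_concurrent_runs": 1,
        "schedule": {
            # Quartz cron fields: second, minute, hour, day-of-month, month, day-of-week.
            "quartz_cron_expression": "0 0/5 * * * ?",
            "timezone_id": "UTC",
        },
    }


settings = every_five_minutes_settings("five_minute_job")
```

With a concurrency limit of 1, a trigger that fires while the previous run is still active should not start a second run, which is the behavior the question asks for.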

  • 3 kudos
vinaykumar
by New Contributor III
  • 2954 Views
  • 3 replies
  • 1 kudos

Resolved! Run Databricks job instantly without waiting for the job cluster to become active

When we run a Databricks job, it takes some time for the job cluster to become active. I also created a pool and attached it to the job cluster, but it still takes time to attach the cluster and for the job cluster to become active before the job run starts. Is there any way we can run d...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

If you want instant processing, you will have to have a cluster running all the time. As mentioned above, Databricks is testing serverless compute for data engineering workloads (comparable to serverless SQL). This fires up a cluster in a few seconds...

  • 1 kudos
2 More Replies
hanish
by New Contributor II
  • 1809 Views
  • 3 replies
  • 2 kudos

Job cluster support in jobs/runs/submit API

We are using the jobs/runs/submit API of Databricks to create and trigger a one-time run with new_cluster and existing_cluster configuration. We would like to check whether there is a provision to pass "job_clusters" in this API to reuse the same cluster across...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

@Hanish Bansal A shared job cluster for the jobs/runs/submit API is not supported at the moment.

  • 2 kudos
2 More Replies
User16783854657
by New Contributor III
  • 2429 Views
  • 4 replies
  • 6 kudos

How do I know how much of a query/job used Photon?

I'm trying to use the native execution engine, Photon. How can I tell if a query is using Photon or is falling back to the non-native Spark engine?

Latest Reply
venkat09
New Contributor III
  • 6 kudos

Typo error in my second point of the previous post. Click the execution plan of your task [this is available under the SQL/DataFrame tab in the Spark UI]. It shows which operations ran in the Photon engine and which were not executed by Photon.

  • 6 kudos
3 More Replies
JD410993
by New Contributor II
  • 1577 Views
  • 3 replies
  • 2 kudos

Job runs indefinitely after integrating with PyDeequ

I'm using PyDeequ data quality checks in one of our jobs. After adding this check, I noticed that the job does not complete and keeps running indefinitely after PyDeequ checks are completed and results are returned. As stated in the PyDeequ documentation ...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

Hm, deequ certainly works, as I have read about multiple people using it. And when reading the issues (open/closed) on the GitHub pages of pydeequ, Databricks is mentioned in some issues, so it might be possible after all. But I think you need to check y...

  • 2 kudos
2 More Replies
lcalca95
by New Contributor II
  • 1097 Views
  • 0 replies
  • 0 kudos

Azure Databricks job and exception handling

Hi, I'm working on Azure Databricks and I created two jobs, one based on a Python wheel and the other based on a notebook, with the same code. The code gets data from Azure Blob Storage, processes data with PySpark and sends data to EventHub. The whole co...
