Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by MrJava (New Contributor III)
  • 9321 Views
  • 15 replies
  • 12 kudos

How to know who started a job run?

Hi there! We have different jobs/workflows configured in our Databricks workspace running on AWS and would like to know who actually started a job run. Are they started by a user or by a service principal using curl? Currently one can only see who is t...

Latest Reply
mcveyroosevelt
New Contributor II
  • 12 kudos

To determine who started a job run in Databricks, you can use the Audit Logs feature by enabling workspace-level events and analyzing the runStarted events. Look for the userIdentity field within these logs, which identifies whether the run was trigg...
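For reference, a hedged sketch of that lookup, assuming the audit logs are surfaced in the system.access.audit system table (field names may differ if logs are instead delivered to cloud storage):

```python
# A minimal sketch, assuming audit logs land in the system.access.audit
# system table. 'runStart' is the jobs-service action name in the audit
# schema (the "runStarted" events mentioned above).
runs = spark.sql("""
    SELECT event_time,
           user_identity.email       AS started_by,  -- user or service principal
           request_params['jobId']   AS job_id,
           request_params['runId']   AS run_id
    FROM system.access.audit
    WHERE service_name = 'jobs'
      AND action_name = 'runStart'
    ORDER BY event_time DESC
""")
runs.show(truncate=False)
```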

14 More Replies
by cmilligan (Contributor II)
  • 863 Views
  • 1 reply
  • 1 kudos

Return notebook path from job that is run remotely from the repo

I want to set up some email alerts for issues in the data as part of a job run, and I want to point the user to the notebook that the issue occurred in. I think this would be simple enough, but another layer is that the job is going to be run...

Latest Reply
Debayan
Databricks Employee
  • 1 kudos

Hi, could you please clarify what you mean by returning the file from the remote repo? Please tag @Debayan in your next response, which will notify me. Thank you!
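For reference, a minimal sketch of one common way to resolve the running notebook's path from inside the notebook itself (this goes through an internal, undocumented API, so behavior may vary by runtime):

```python
# A minimal sketch: read the current notebook's workspace path from the
# notebook context. For a job run from a Repo, this returns the checked-out
# path, e.g. /Repos/<user>/<repo>/<notebook>.
ctx = dbutils.notebook.entry_point.getDbutils().notebook().getContext()
notebook_path = ctx.notebookPath().get()
print(notebook_path)  # embed this in the alert email body
```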

by Dipesh (New Contributor II)
  • 4443 Views
  • 4 replies
  • 2 kudos

Pausing a scheduled Azure Databricks job after failure

Hi all, I have a job/workflow scheduled in Databricks to run every hour. How can I configure my job to pause whenever a job run fails? (Pause the job/workflow on the first failure.) I would want to prevent triggering multiple runs due to the scheduled/...

Latest Reply
Dipesh
New Contributor II
  • 2 kudos

Hi @Hubert Dudek, thank you for your suggestion. I understand that we can use the Jobs API to change the pause_status of a job on errors, but sometimes we observed that the workflow/job fails due to cluster issues (while the job clusters are getting creat...
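For completeness, a minimal sketch of the Jobs API approach discussed above; a final on-failure task (or an external monitor) could make this call. Host, token, job ID, and the cron expression are placeholders:

```python
# A minimal sketch: pause a job's schedule via the Jobs 2.1 update endpoint.
import requests

HOST = "https://<workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                  # placeholder

requests.post(
    f"{HOST}/api/2.1/jobs/update",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "job_id": 123,  # placeholder job ID
        "new_settings": {
            "schedule": {
                "quartz_cron_expression": "0 0 * * * ?",  # existing hourly cron
                "timezone_id": "UTC",
                "pause_status": "PAUSED",
            }
        },
    },
).raise_for_status()
```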

3 More Replies
by dceman (New Contributor)
  • 1664 Views
  • 0 replies
  • 0 kudos

Databricks with CloudWatch metrics without Instanceid dimension

I have jobs running on job clusters, and I want to send metrics to CloudWatch. I set up the CloudWatch agent following this guide. But the issue is that I can't create a useful metrics dashboard and alarms because I always have the InstanceId dimension, and InstanceId is d...

by AlexDavies (Contributor)
  • 1527 Views
  • 0 replies
  • 3 kudos

Jobs with dynamic task parameters

We have a jar method that takes in a parameter "--date 2022-01-01" and processes that date's worth of data. However, when invoked via a job, the date we want to pass in is the day before the job run was started. We could default this in the jar j...
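One hedged sketch of a workaround: pass Databricks' {{start_date}} parameter variable (the UTC date the run started) and derive the previous day in code rather than in the job configuration:

```python
# A minimal sketch, assuming the task is invoked with "--date {{start_date}}"
# so Databricks substitutes the run's start date, and the code itself steps
# back one day.
from datetime import date, timedelta
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--date", required=True)  # e.g. "2022-01-01"
args = parser.parse_args()

process_date = date.fromisoformat(args.date) - timedelta(days=1)
print(process_date.isoformat())  # the day before the run started
```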

by RaymondLC92 (New Contributor II)
  • 2049 Views
  • 2 replies
  • 1 kudos

Resolved! How to obtain run_id without using dbutils in python?

We would like to be able to get the run_id in a job run, and we have the unfortunate restriction that we cannot use dbutils. Is there a way to get it in Python? I know for the job ID it's possible to retrieve it from the environment variables.

Latest Reply
artsheiko
Databricks Employee
  • 1 kudos

Hi, please refer to the following thread: https://community.databricks.com/s/question/0D58Y00008pbkj9SAA/how-to-get-the-job-id-and-run-id-and-save-into-a-database — hope this helps.
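In short, the usual dbutils-free pattern from that thread is to let the job pass the run ID in as a task parameter; a minimal sketch, assuming the task is configured with the {{run_id}} parameter variable:

```python
# A minimal sketch, assuming the job passes "--run_id {{run_id}}" (or the
# newer {{job.run_id}} dynamic value reference) to the task.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--run_id", required=True)
run_id = parser.parse_args().run_id
print(f"current run_id: {run_id}")
```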

1 More Replies
by celerity12 (New Contributor II)
  • 5731 Views
  • 7 replies
  • 4 kudos

Pulling list of running jobs using JOBS API 2.1

I need to find all jobs that are currently running, and not get the other jobs. The command below fetches all the jobs: curl --location --request GET 'https://xxxxxx.gcp.databricks.com/api/2.1/jobs/list?active_only=true&expand_tasks=true&run_type=JOB_RUN...

  • 5731 Views
  • 7 replies
  • 4 kudos
Latest Reply
User16764241763
Honored Contributor
  • 4 kudos

Hi @Sumit Rohatgi, it seems like active_only=true only applies to the jobs/runs/list API and not to jobs/list. Can you please try the jobs/runs/list API?
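A minimal sketch of that suggestion, with placeholder credentials:

```python
# A minimal sketch: list only active (pending/running) runs via the
# Jobs 2.1 runs/list endpoint instead of jobs/list.
import requests

HOST = "https://xxxxxx.gcp.databricks.com"  # from the post
TOKEN = "<personal-access-token>"           # placeholder

resp = requests.get(
    f"{HOST}/api/2.1/jobs/runs/list",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"active_only": "true", "run_type": "JOB_RUN"},
)
resp.raise_for_status()
for run in resp.json().get("runs", []):
    print(run["run_id"], run["state"]["life_cycle_state"])
```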

6 More Replies
by SimonY (New Contributor III)
  • 2770 Views
  • 3 replies
  • 3 kudos

Resolved! Trigger.AvailableNow does not support maxOffsetsPerTrigger in Databricks runtime 10.3

Hello, I ran a Spark streaming job to ingest data from Kafka to test Trigger.AvailableNow. What environment did the job run in? 1: Databricks Runtime 10.3; 2: Azure cloud; 3: 1 driver node + 3 worker nodes (14 GB, 4 cores). val maxOffsetsPerTrigger = "500" spark.conf.set...
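For context, a minimal PySpark sketch of the setup being tested (broker, topic, and paths are placeholders, and it assumes a runtime where the availableNow trigger is exposed in Python). Note that maxOffsetsPerTrigger is a Kafka source option rather than a Spark conf; per this thread's title, Trigger.AvailableNow on DBR 10.3 does not honor it:

```python
# A minimal sketch of the pattern under test; broker, topic, and paths
# are placeholders. maxOffsetsPerTrigger is set on the Kafka source.
df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "events")
      .option("maxOffsetsPerTrigger", "500")  # rate limit per micro-batch
      .load())

(df.writeStream
   .format("delta")
   .option("checkpointLocation", "/tmp/checkpoints/events")
   .trigger(availableNow=True)
   .start("/tmp/tables/events"))
```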

Latest Reply
Anonymous
Not applicable
  • 3 kudos

You'd be better off with 1 node with 12 cores than 3 nodes with 4 each. Your shuffles are going to perform much better on one machine.

2 More Replies
by Anonymous (Not applicable)
  • 1867 Views
  • 2 replies
  • 4 kudos

Multi-task Job Run starting point

Hi community! I would like to know if it is possible to start a multi-task job run from a specific task. The use case is as follows: I have a 17-task job. A task in the middle, let's say a task after 2 dependencies, fails. I found the error and now it i...

Latest Reply
BilalAslamDbrx
Databricks Employee
  • 4 kudos

+1 to what @Dan Zafar said. We're working hard on this. Looking forward to bringing this to you in the near future.
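For readers landing here later: the capability alluded to above shipped as "repair run". A hedged sketch, assuming the Jobs 2.1 runs/repair endpoint with placeholder host, token, run ID, and task key:

```python
# A hedged sketch: rerun only the failed task (and its downstream
# dependents) of an existing multi-task job run via runs/repair.
import requests

HOST = "https://<workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                  # placeholder

requests.post(
    f"{HOST}/api/2.1/jobs/runs/repair",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"run_id": 123456,                        # placeholder run ID
          "rerun_tasks": ["task_key_that_failed"]},
).raise_for_status()
```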

1 More Replies
by brickster_2018 (Databricks Employee)
  • 2205 Views
  • 1 reply
  • 0 kudos

Resolved! Scheduled job did not trigger the job run

I have a job that is scheduled to run every hour, but on rare occasions I see that job runs are skipped.

Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

If you choose a timezone that observes daylight saving time, this issue can happen. We recommend choosing the UTC timezone to avoid it. If you select a zone that observes daylight saving time, an hourly job will be skipped or may appear to not fire for an hour...
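A minimal sketch of the recommended setting, expressed as the schedule object passed when creating or updating the job (the cron expression is a placeholder):

```python
# A minimal sketch: pin an hourly schedule to UTC so daylight saving
# transitions cannot skip or double-fire runs.
schedule = {
    "quartz_cron_expression": "0 0 * * * ?",  # top of every hour
    "timezone_id": "UTC",                     # not a DST-observing zone
    "pause_status": "UNPAUSED",
}
```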
