Data Engineering

Forum Posts

sandeep91
by New Contributor III
  • 4611 Views
  • 5 replies
  • 2 kudos

Resolved! Databricks Job: Package Name and EntryPoint parameters for the Python Wheel file

I have created Python wheel file with simple file structure and uploaded into cluster library and was able to run the packages in Notebook but, when I am trying to create a Job using python wheel and provide the package name and run the task it fails...

Latest Reply
AndréSalvati
New Contributor III
  • 2 kudos

In the template project below you can see a complete setup with the new Databricks Asset Bundles tool and a Python wheel task. Please follow the instructions for deployment: https://github.com/andre-salvati/databricks-template
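
For readers hitting the same failure, a minimal sketch of how the two job fields map onto the wheel's own metadata (the project name my_package and the script name main are assumptions, not values from this thread): the task's "Package Name" must match the wheel's distribution name, and "Entry Point" must match an entry point declared when the wheel is built.

    # setup.py for the wheel (hypothetical names)
    from setuptools import setup, find_packages

    setup(
        name="my_package",              # -> "Package Name" in the Python wheel task
        version="0.1.0",
        packages=find_packages(),
        entry_points={
            "console_scripts": [
                # "main" -> "Entry Point" in the Python wheel task
                "main=my_package.cli:run",
            ]
        },
    )

With this layout the job invokes the function my_package.cli:run; a mismatch between either field and the wheel metadata is a common cause of the task failing even though the package imports fine in a notebook.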

4 More Replies
User16790091296
by Contributor II
  • 1616 Views
  • 1 replies
  • 0 kudos

How to create a databricks job with parameters via CLI?

I'm creating a new job in Databricks using the databricks-cli: databricks jobs create --json-file ./deploy/databricks/config/job.config.json with the following JSON: { "name": "Job Name", "new_cluster": { "spark_version": "4.1.x-scala2.1...

Latest Reply
matthew_m
New Contributor III
  • 0 kudos

This is an old post but still relevant for future readers, so I will answer how it is done. You need to add the base_parameters field in the notebook_task config, like the following: "notebook_task": { "notebook_path": "...", "base_parameters": { ...
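
To make that concrete, here is a sketch of a complete job.config.json that the CLI command from the question could consume, written as a small Python script so it stays in one language; the cluster values, notebook path and parameter names are placeholders, not taken from the thread.

    # build_job_config.py -- writes the JSON consumed by:
    #   databricks jobs create --json-file ./deploy/databricks/config/job.config.json
    import json

    job_config = {
        "name": "Job Name",
        "new_cluster": {
            "spark_version": "13.3.x-scala2.12",   # placeholder cluster spec
            "node_type_id": "i3.xlarge",
            "num_workers": 2,
        },
        "notebook_task": {
            "notebook_path": "/Shared/my_notebook",  # placeholder path
            "base_parameters": {
                # values the notebook reads via dbutils.widgets.get(...)
                "env": "dev",
                "run_date": "2024-01-01",
            },
        },
    }

    with open("./deploy/databricks/config/job.config.json", "w") as f:
        json.dump(job_config, f, indent=2)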

LidorAbo
by New Contributor II
  • 709 Views
  • 1 replies
  • 1 kudos

bucket ownership of s3 bucket in databricks

We had a Databricks job with strange behavior: when we passed the string literal 'output_path' to saveAsTextFile instead of the output_path variable, the data was saved to the following path: s3://dev-databricks-hy1-rootbucket/nvirginiaprod/3219117805926709/output_pa...

Latest Reply
User16752239289
Valued Contributor
  • 1 kudos

I suspect you provided a DBFS path to save the data, hence the data was saved under your workspace root bucket. For the workspace root bucket, the Databricks workspace will use the Databricks credential to make sure Databricks has access to it and is able t...
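
A small sketch of the difference being described (the bucket name and prefix are invented, and write access via an instance profile is assumed): a bare string is resolved as a DBFS-relative path and lands in the workspace root bucket, while a fully qualified S3 URI goes to the bucket you own.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    rdd = spark.sparkContext.parallelize(["a", "b", "c"])

    output_path = "s3a://my-own-bucket/exports/run-42"   # hypothetical bucket/prefix

    # rdd.saveAsTextFile("output_path")   # literal string -> DBFS-relative path,
    #                                     # ends up under the workspace root bucket
    rdd.saveAsTextFile(output_path)       # explicit S3 URI -> the bucket you control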

Divya_Bhadauria
by New Contributor II
  • 1349 Views
  • 2 replies
  • 2 kudos

Running databricks job with different parameter automatically

I have a Python script running as a Databricks job. Is there a way I can run this job with a different set of parameters automatically or programmatically, without using the 'Run now with different parameters' option available in the UI?

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Divya Bhadauria, we haven't heard from you since the last response from @Lakshay Goel, and I was checking back to see if the suggestions helped you. Or else, if you have any solution, please share it with the community, as it can be helpful to ...
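
One programmatic route for the original question, sketched against the Jobs 2.1 run-now REST endpoint (the host, token, job ID and parameter values are placeholders): each call starts a run of the existing job with a different argument list handed to the Python script.

    import requests

    HOST = "https://<workspace-url>"   # placeholder workspace URL
    TOKEN = "dapi..."                  # placeholder personal access token
    JOB_ID = 123                       # placeholder job ID

    def run_with_params(params):
        """Trigger one run of the job, overriding the Python script's arguments."""
        resp = requests.post(
            f"{HOST}/api/2.1/jobs/run-now",
            headers={"Authorization": f"Bearer {TOKEN}"},
            json={"job_id": JOB_ID, "python_params": params},
        )
        resp.raise_for_status()
        return resp.json()["run_id"]

    for day in ["2024-01-01", "2024-01-02"]:
        print(run_with_params(["--run-date", day]))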

1 More Replies
source2sea
by Contributor
  • 2611 Views
  • 4 replies
  • 2 kudos

Resolved! how to make databricks job to fail when the application has already given "exit code 1"?

object OurMainObject extends LazyLogging with IOApp { def run(args: List[String]): IO[ExitCode] = { logger.info("Started the application")   val conf = defaultOverrides.withFallback(defaultApplication).withFallback(defaultReference) val...

Latest Reply
source2sea
Contributor
  • 2 kudos

My workaround now is to write the code like below (chaining the IOs so the log actually runs before the error is raised), which makes the Databricks job run end as a failure: case Left(ex) => IO(logger.error("Glue failure", ex)) *> IO.raiseError(ex)
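
The same principle applies to a Python task; a sketch under assumed names (the script and its logic are made up, not from this thread): let the exception propagate, or exit with a non-zero status, and the task run is marked as failed.

    # main.py for a spark_python_task (hypothetical)
    import logging
    import sys

    def process():
        raise RuntimeError("Glue failure")   # stand-in for the real error path

    if __name__ == "__main__":
        try:
            process()
        except Exception:
            logging.exception("Job failed")
            sys.exit(1)   # non-zero exit code -> the Databricks task run is marked Failed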

3 More Replies
psps
by New Contributor III
  • 2171 Views
  • 3 replies
  • 4 kudos

Databricks Job run logs only shows prints/logs from driver and not executors

Hi, in Databricks Job run output, only logs from the driver are displayed. We have a function parallelized to run on executor nodes. The logs/prints from that function are not displayed in the job run output. Is there a way to configure and show those logs i...

Latest Reply
psps
New Contributor III
  • 4 kudos

Thanks @Debayan Mukherjee. That setting enables executor logging; however, the executor logs still do not appear in the Databricks Job run output. Only driver logs are displayed.
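
For anyone who wants to inspect those executor-side messages anyway, a sketch (the logger name and data are arbitrary): anything logged inside the parallelized function goes to the executors' stderr/log4j output, which is viewable in the Spark UI or at the cluster log delivery location, not in the job run output pane, which only shows driver output.

    import logging
    from pyspark.sql import SparkSession

    def process_partition(rows):
        # Runs on the executors; these messages end up in the executor logs,
        # not in the driver's job run output.
        log = logging.getLogger("partition_worker")
        for row in rows:
            log.warning("processing %s", row)
            yield row

    spark = SparkSession.builder.getOrCreate()
    count = spark.range(100).rdd.mapPartitions(process_partition).count()
    print(f"processed {count} rows")   # driver-side print -> visible in job run output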

2 More Replies
Divya_Bhadauria
by New Contributor II
  • 2952 Views
  • 3 replies
  • 2 kudos

Unable to run python script from git repo in Databricks job

I'm getting 'cannot read python file' when running this job, which is configured to run a Python script from a git repo. Run result unavailable: run failed with error message Cannot read the python file /Repos/.internal/7c39d645692_commits/ff669d089cd8f93e9...

Latest Reply
Divya_Bhadauria
New Contributor II
  • 2 kudos

Hi Vidula, yes, the above solution worked out for me. I tried debugging using all of the above steps and it turned out that the path I was using in the job config was incorrect.
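
For completeness, a sketch of the relevant part of a job definition that runs a Python file from a remote Git repo (the URL, branch, cluster ID and file path are placeholders): the python_file path is resolved relative to the repository root, which is where a wrong path produces the "Cannot read the python file" error.

    # Fragment of a Jobs 2.1 job settings payload (placeholders throughout)
    job_settings = {
        "name": "repo-script-job",
        "git_source": {
            "git_url": "https://github.com/my-org/my-repo",   # placeholder repo
            "git_provider": "gitHub",
            "git_branch": "main",
        },
        "tasks": [
            {
                "task_key": "run_script",
                "existing_cluster_id": "1234-567890-abcde123",  # placeholder cluster
                "spark_python_task": {
                    # Path relative to the repo root -- a typo here is what
                    # triggers "Cannot read the python file ..."
                    "python_file": "jobs/main.py",
                },
            }
        ],
    }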

2 More Replies
MarsSu
by New Contributor II
  • 3218 Views
  • 5 replies
  • 1 kudos

Resolved! Databricks job about spark structured streaming zero downtime deployment in terraform.

I would like to ask how to implement zero-downtime deployment of Spark Structured Streaming on Databricks job compute with Terraform, because we will upgrade the Spark application code version. Currently we find that every deployment cancels the original...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

@Mars Su: Yes, you can implement zero-downtime deployment of Spark Structured Streaming in Databricks job compute using Terraform. One way to achieve this is by using Databricks' "job clusters" feature, which allows you to create a cluster specifica...
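
Independent of the Terraform wiring, the property that makes redeployments safe is the stream's checkpoint; a minimal sketch (the source, checkpoint path and target table are invented): as long as the new job run points at the same checkpoint location, it resumes where the cancelled run stopped, so a cancel-and-redeploy does not lose data even though there is a brief gap in processing.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    query = (
        spark.readStream.format("rate").load()      # stand-in for the real source
        .writeStream
        .format("delta")
        .option("checkpointLocation", "dbfs:/checkpoints/my_stream")  # reused across deployments
        .toTable("events_bronze")                    # hypothetical target table
    )
    query.awaitTermination()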

4 More Replies
Michael_Papadop
by New Contributor II
  • 4185 Views
  • 3 replies
  • 0 kudos

How can I set the status of a databricks job as skipped via python?

I have a basic 2-task job. The 1st notebook (task) checks whether the source file has changes and, if so, refreshes a corresponding materialized view. In case we have no changes, I use dbutils.jobs.taskValues.set(key = "skip_job", value = 1) &...

Latest Reply
karthik_p
Esteemed Contributor
  • 0 kudos

@Michael Papadopoulos, usually that should not be the case, I think: at the task level we have 3 notification levels (success, failure, start), whereas at the whole-job level a skip option is available to discard notifications. Will see if someone from the commu...
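
A sketch of the workaround pattern the question describes, split across the two notebooks (the task key and exit message are assumptions): the first task records the flag with taskValues.set, the second reads it with taskValues.get and exits early, so the run effectively no-ops even though the UI still shows it as succeeded rather than skipped.

    # Task 1 notebook: decide whether downstream work is needed
    source_changed = False   # placeholder for the real change-detection logic
    dbutils.jobs.taskValues.set(key="skip_job", value=0 if source_changed else 1)

    # Task 2 notebook: read the flag set by the upstream task (taskKey is the
    # upstream task's name in the job definition -- assumed to be "check_source")
    skip = dbutils.jobs.taskValues.get(taskKey="check_source", key="skip_job", default=0)
    if skip == 1:
        dbutils.notebook.exit("Skipped: source unchanged")

    # ...otherwise refresh the materialized view here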

2 More Replies
youssefmrini
by Honored Contributor III
  • 619 Views
  • 1 replies
  • 1 kudos
Latest Reply
youssefmrini
Honored Contributor III
  • 1 kudos

You can ensure there is always an active run of your Databricks job with the new continuous trigger type. https://docs.databricks.com/workflows/jobs/jobs.html#continuous-jobs
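
A sketch of what that looks like in a job's settings, shown here as the Python dict you would send to the Jobs 2.1 API (everything except the continuous block is a placeholder):

    job_settings = {
        "name": "always-on-job",
        "continuous": {"pause_status": "UNPAUSED"},   # keeps one active run at all times
        "tasks": [
            {
                "task_key": "main",
                "notebook_task": {"notebook_path": "/Shared/streaming_notebook"},  # placeholder
                "new_cluster": {
                    "spark_version": "13.3.x-scala2.12",   # placeholder cluster spec
                    "node_type_id": "i3.xlarge",
                    "num_workers": 1,
                },
            }
        ],
    }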

Mohit_m
by Valued Contributor II
  • 13907 Views
  • 2 replies
  • 4 kudos

Resolved! How to get the Job ID and Run ID and save into a database

We have a Databricks job running with a main class and JAR file in it. Our JAR code base is in Scala. Now, when our job starts running, we need to log the Job ID and Run ID into the database for future purposes. How can we achieve this?

Latest Reply
User16783853961
New Contributor II
  • 4 kudos

Here is a blog with code and examples on how to achieve this https://medium.com/@canadiandataguy/how-to-get-the-job-id-and-run-id-for-a-databricks-job-b0da484e66f5
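
Besides the blog, one common mechanism is to let the job substitute its own IDs into the task parameters via the {{job_id}} and {{run_id}} parameter variables; a sketch, illustrated in Python even though the original task is a Scala JAR, with the target table as a placeholder (a JAR task would read the same arguments from its main method).

    # Task parameters in the job definition (applies to JAR and Python tasks alike):
    #   ["--job-id", "{{job_id}}", "--run-id", "{{run_id}}"]
    import argparse

    from pyspark.sql import SparkSession

    parser = argparse.ArgumentParser()
    parser.add_argument("--job-id")
    parser.add_argument("--run-id")
    args = parser.parse_args()

    # Persist however you like -- shown here as a simple append to a placeholder table.
    spark = SparkSession.builder.getOrCreate()
    spark.createDataFrame(
        [(args.job_id, args.run_id)], ["job_id", "run_id"]
    ).write.mode("append").saveAsTable("audit.job_runs")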

1 More Replies
vinaykumar
by New Contributor III
  • 2391 Views
  • 3 replies
  • 1 kudos

Resolved! Run databricks job instantly without waiting job cluster get active

When we run a Databricks job it takes some time for the job cluster to become active. I also created a pool and attached it to the job cluster, but it still takes time to attach the cluster and for the job cluster to become active and start the job run. Is there any way we can run d...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

If you want instant processing, you will have to have a cluster running all the time. As mentioned above, Databricks is testing serverless compute for data engineering workloads (comparable to serverless SQL). This fires up a cluster in a few seconds...
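
Pools do not make startup instant, but attaching the job cluster to a warm pool removes the instance-provisioning part of the wait; a sketch of the relevant job-cluster fragment (the pool ID and sizes are placeholders):

    # Fragment of the job's new_cluster spec, referencing a pre-warmed instance pool
    new_cluster = {
        "spark_version": "13.3.x-scala2.12",
        "instance_pool_id": "0101-120000-pool-abcdef",   # placeholder pool ID
        "num_workers": 2,
        # node_type_id is omitted because the pool already fixes the instance type
    }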

2 More Replies
joakon
by New Contributor III
  • 1414 Views
  • 4 replies
  • 3 kudos

Resolved! Databricks - Workflow- Jobs- Script to automate

Hi, I have created a Databricks job under Workflows and it's running fine without any issues. I would like to promote this job to other workspaces using a script. Is there a way to script the job definition and deploy it across multiple workspaces? I ...

Latest Reply
joakon
New Contributor III
  • 3 kudos

Thank you @Landan George
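
One scripted route with the legacy databricks-cli, assuming two configured CLI profiles named dev and prod (the profile names, job ID and file path are placeholders): export the job's settings from the source workspace and re-create them in the target.

    # promote_job.py -- copy a job definition between workspaces (sketch)
    import json
    import subprocess

    SOURCE_JOB_ID = "123"   # placeholder

    # Export from the source workspace
    raw = subprocess.check_output(
        ["databricks", "jobs", "get", "--job-id", SOURCE_JOB_ID, "--profile", "dev"]
    )
    settings = json.loads(raw)["settings"]   # keep only the settings, not the job_id metadata

    with open("job.json", "w") as f:
        json.dump(settings, f, indent=2)

    # Re-create in the target workspace
    subprocess.run(
        ["databricks", "jobs", "create", "--json-file", "job.json", "--profile", "prod"],
        check=True,
    )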

3 More Replies
Bartek
by Contributor
  • 2296 Views
  • 0 replies
  • 1 kudos

How to pass all dag_run.conf parameters to python_wheel_task

I want to trigger a Databricks job from Airflow using DatabricksSubmitRunDeferrableOperator and I need to pass configuration params. Here is an excerpt from my code (the definition is not complete, only crucial properties): from airflow.providers.databricks.op...
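
Since the thread has no replies, here is one hedged way to forward the whole dag_run.conf: the operator's json argument is templated, so a Jinja expression inside it is rendered at trigger time; the package name, entry point, cluster spec and connection ID below are assumptions, not values from the post.

    from airflow import DAG
    from airflow.providers.databricks.operators.databricks import (
        DatabricksSubmitRunDeferrableOperator,
    )
    import pendulum

    with DAG(
        dag_id="run_wheel_with_conf",
        start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
        schedule=None,
    ) as dag:
        run_wheel = DatabricksSubmitRunDeferrableOperator(
            task_id="run_wheel",
            databricks_conn_id="databricks_default",
            json={
                "run_name": "wheel-from-airflow",
                "new_cluster": {
                    "spark_version": "13.3.x-scala2.12",   # placeholder cluster spec
                    "node_type_id": "i3.xlarge",
                    "num_workers": 1,
                },
                "python_wheel_task": {
                    "package_name": "my_package",   # placeholder
                    "entry_point": "main",          # placeholder
                    # Rendered by Airflow's templating: the full dag_run.conf arrives
                    # as a single JSON string argument for the entry point to parse.
                    "parameters": ["{{ dag_run.conf | tojson }}"],
                },
            },
        )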

swzzzsw
by New Contributor III
  • 5129 Views
  • 5 replies
  • 10 kudos

"Run now with different parameters" - different parameters not recognized by jobs involving multiple tasks

I'm running a Databricks job involving multiple tasks and would like to run the job with a different set of task parameters. I can achieve that by editing each task and changing the parameter values. However, it gets very manual when I have a lot of tas...

Latest Reply
erens
New Contributor II
  • 10 kudos

Hello, I am also facing the same issue. The problem is described below: I have a multi-task job. This job consists of multiple "spark_python_task" tasks that execute a Python script on a Spark cluster. This pipeline is created within a CI/CD ...

4 More Replies