Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
I'm running a Databricks job involving multiple tasks and would like to run the job with different sets of task parameters. I can achieve that by editing each task and changing the parameter values. However, it gets very manual when I have a lot of tasks...
Dear Team, for now I found a solution: disconnect the bundle source on Databricks, edit the parameters that you want to run with, and after execution redeploy your code from the repository again.
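For a more automated route, here is a minimal sketch using the Databricks SDK for Python. It assumes the job defines job-level parameters; the job ID and parameter names are placeholders, not values from this thread. Overriding them at run time avoids editing each task by hand.

    from databricks.sdk import WorkspaceClient

    w = WorkspaceClient()  # authenticates from the environment or ~/.databrickscfg

    # Override job-level parameters for a single run; every task that references
    # {{job.parameters.<name>}} picks up these values.
    w.jobs.run_now(job_id=123, job_parameters={"env": "dev", "run_date": "2024-01-01"})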
I have created an Azure AD group of the "Microsoft 365" type with its own email address, which has been added to the notifications of a Databricks job (on failure). But no mail is sent to the Azure group mailbox when the job fails. I am able to send a d...
Hello guys, I have set up SES email receiving for Databricks notifications. When I send an email from Gmail or Yahoo Mail, it reaches the SES email receiving rule. However, notifications from Databricks don't reach the same SES email receiving...
We have a Databricks job running with a main class and a JAR file. Our JAR file code base is in Scala. Now, when our job starts running, we need to log the Job ID and Run ID into the database for future use. How can we achieve this?
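One approach (an assumption on my part, not confirmed in this thread) is to pass the IDs into the JAR task as parameters using dynamic value references, and have the Scala main method read them from its arguments and write them to the database. A sketch of the task config, with a placeholder main class name:

    "spark_jar_task": {
        "main_class_name": "com.example.Main",
        "parameters": ["{{job.id}}", "{{job.run_id}}"]
    }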
I have created a Python wheel file with a simple file structure and uploaded it as a cluster library, and I was able to run the packages in a notebook. But when I try to create a job using the Python wheel, provide the package name, and run the task, it fails...
Here you can see a complete template project with the (new!) Databricks Asset Bundles tool and a Python wheel task. Please follow the instructions for deployment: https://github.com/andre-salvati/databricks-template
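When a wheel imports fine in a notebook but fails as a Python wheel task, the mismatch is often between the task's package name / entry point and what the wheel actually declares. A minimal setup.py sketch (the package and function names "my_pkg" and "main" are placeholders, not from the original post) that exposes an entry point the task can call:

    # setup.py
    from setuptools import setup, find_packages

    setup(
        name="my_pkg",
        version="0.0.1",
        packages=find_packages(),
        entry_points={
            # the wheel task's entry point must match this name;
            # it resolves to my_pkg/entrypoint.py :: main()
            "console_scripts": ["main=my_pkg.entrypoint:main"],
        },
    )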
I'm creating a new job in Databricks using the databricks-cli:

    databricks jobs create --json-file ./deploy/databricks/config/job.config.json

With the following JSON:

    {
        "name": "Job Name",
        "new_cluster": {
            "spark_version": "4.1.x-scala2.1...
This is an old post, but it is still relevant for future readers, so I will answer how it is done. You need to add the base_parameters field to the notebook_task config, like the following.

    "notebook_task": {
        "notebook_path": "...",
        "base_parameters": {
            ...
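For illustration, a filled-in notebook_task block might look like the following; the notebook path and parameter names here are placeholders, not values from the original job:

    "notebook_task": {
        "notebook_path": "/Repos/project/notebooks/ingest",
        "base_parameters": {
            "env": "dev",
            "run_date": "2024-01-01"
        }
    }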
We had a Databricks job with strange behavior: when we passed 'output_path' (the string literal) to the saveAsTextFile function instead of the output_path variable, the data was saved to the following path: s3://dev-databricks-hy1-rootbucket/nvirginiaprod/3219117805926709/output_pa...
I suspect you provided a DBFS path to save the data, hence the data was saved under your workspace root bucket. For the workspace root bucket, the Databricks workspace will interact with Databricks credentials to make sure Databricks has access to it and is able t...
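A small PySpark sketch of the mix-up described above (bucket and paths are placeholders): the quoted literal is treated as a relative DBFS path and therefore lands in the workspace root bucket, while the variable holds the intended S3 URI.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    rdd = spark.sparkContext.parallelize(["a", "b", "c"])

    output_path = "s3://my-bucket/exports/run1"   # intended destination (placeholder)

    rdd.saveAsTextFile("output_path")   # wrong: literal string -> relative DBFS path under the root bucket
    rdd.saveAsTextFile(output_path)     # right: writes to the S3 path held by the variable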
I have a Python script running as a Databricks job. Is there a way I can run this job with a different set of parameters automatically or programmatically, without using the "Run now with different parameters" option available in the UI?
Hi @Divya Bhadauria, we haven't heard from you since the last response from @Lakshay Goel, and I was checking back to see if those suggestions helped you. Otherwise, if you have found a solution, please share it with the community, as it can be helpful to ...
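One programmatic option, as a sketch rather than the solution referenced above: trigger the job from the Databricks SDK for Python and pass different command-line arguments each time. The job ID and argument values are placeholders.

    from databricks.sdk import WorkspaceClient

    w = WorkspaceClient()

    for region in ["us-east-1", "eu-west-1"]:
        # python_params are handed to the job's Python script as sys.argv arguments
        w.jobs.run_now(job_id=123, python_params=["--region", region])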
My workaround for now is to write the code like below, so the Databricks job is marked as failed.

    case Left(ex) => {
      // Chain the effects so the error is actually logged before the exception is raised;
      // raising it makes the job run finish with a failure status.
      IO(logger.error("Glue failure", ex)).flatMap(_ => IO.raiseError[ExitCode](ex))
    }
Hi, in the Databricks job run output, only logs from the driver are displayed. We have a function parallelized to run on executor nodes. The logs/prints from that function are not displayed in the job run output. Is there a way to configure and show those logs i...
Thanks @Debayan Mukherjee. That enables executor logging; however, the executor logs still do not appear in the Databricks job run output. Only driver logs are displayed.
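As a workaround sketch (an assumption, not a built-in option of the run output page): have the executor-side function return its messages instead of printing them, collect them on the driver, and print them there so they appear in the job run output.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    def process_partition(rows):
        messages = []
        for row in rows:
            # ... real per-row work would go here ...
            messages.append(f"processed id={row['id']}")
        yield messages   # print() here only reaches the executor's stderr, not the run output

    collected = spark.range(10).rdd.mapPartitions(process_partition).collect()
    for partition_messages in collected:
        for msg in partition_messages:
            print(msg)   # printed on the driver, so it shows up in the job run output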
I'm getting a "cannot read Python file" error when running this job, which is configured to run a Python script from a Git repo. Run result unavailable: run failed with error message Cannot read the python file /Repos/.internal/7c39d645692_commits/ff669d089cd8f93e9...
Hi Vidula, yes, the above solution worked for me. I tried debugging using all of the above steps, and it turned out the path I was using in the job config was incorrect.
I would like to ask how to implement zero-downtime deployment of Spark Structured Streaming on Databricks job compute with Terraform, because we will upgrade the Spark application code version. Currently we find that every deployment cancels the original...
@Mars Su: Yes, you can implement zero-downtime deployment of Spark Structured Streaming on Databricks job compute using Terraform. One way to achieve this is by using Databricks' "job clusters" feature, which allows you to create a cluster specifically...
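Outside of Terraform itself, the swap can also be orchestrated with a small script. The sketch below is an assumption rather than the thread's exact solution: the job IDs are placeholders, the old and new versions are assumed not to share a checkpoint location, and the sink is assumed to tolerate a brief overlap. It starts the new job and cancels the old streaming run only once the new one is actually running.

    import time
    from databricks.sdk import WorkspaceClient
    from databricks.sdk.service.jobs import RunLifeCycleState

    w = WorkspaceClient()
    OLD_JOB_ID, NEW_JOB_ID = 123, 456   # placeholders

    w.jobs.run_now(job_id=NEW_JOB_ID)   # start the new application version

    def has_running_run(job_id):
        return any(r.state.life_cycle_state == RunLifeCycleState.RUNNING
                   for r in w.jobs.list_runs(job_id=job_id, active_only=True))

    while not has_running_run(NEW_JOB_ID):
        time.sleep(10)

    w.jobs.cancel_all_runs(job_id=OLD_JOB_ID)   # retire the old stream once the new one is up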
I have a basic two-task job. The first notebook (task) checks whether the source file has changes and, if so, refreshes a corresponding materialized view. If there are no changes, I use dbutils.jobs.taskValues.set(key = "skip_job", value = 1) &...
@Michael Papadopoulos, usually that should not be the case, I think: at the task level we have three notification types (success, failure, start), whereas at the whole-job level a skip option is available to discard notifications. Will see if someone from the community...
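For reference, a sketch of the pattern described above; the task key and the exit handling are placeholders, and dbutils is assumed to be available because the code runs in Databricks notebooks.

    # Task 1 (task key assumed to be "check_source"): flag that the source file has not changed.
    dbutils.jobs.taskValues.set(key="skip_job", value=1)

    # Task 2: read the flag back, defaulting to 0 if the upstream task did not set it.
    skip_job = dbutils.jobs.taskValues.get(taskKey="check_source", key="skip_job", default=0)
    if skip_job == 1:
        dbutils.notebook.exit("Source unchanged - skipping refresh")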
You can ensure there is always an active run of your Databricks job with the new continuous trigger type. https://docs.databricks.com/workflows/jobs/jobs.html#continuous-jobs
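In the job settings JSON, the continuous trigger described in the docs is a small block like this (sketch):

    "continuous": {
        "pause_status": "UNPAUSED"
    }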
When we run a Databricks job, it takes some time for the job cluster to become active. I also created a pool and attached it to the job cluster, but it still takes time for the cluster to attach and for the job cluster to become active and start the job run. Is there any way we can run d...
If you want instant processing, you will have to have a cluster running all the time. As mentioned above, Databricks is testing serverless compute for data engineering workloads (comparable to serverless SQL). This fires up a cluster in a few seconds...
Hi - I have created a Databricks job under Workflows, and it's running fine without any issues. I would like to promote this job to other workspaces using a script. Is there a way to script the job definition and deploy it across multiple workspaces? I ...
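One scripted approach, as a sketch (the hosts, tokens, and job ID below are placeholders): export the job definition with the Jobs API from the source workspace and recreate it in each target workspace.

    import requests

    SRC = {"host": "https://adb-1111111111111111.11.azuredatabricks.net", "token": "<src-pat>"}
    TGT = {"host": "https://adb-2222222222222222.22.azuredatabricks.net", "token": "<tgt-pat>"}

    def auth(ws):
        return {"Authorization": f"Bearer {ws['token']}"}

    # Export the job definition (its "settings" block) from the source workspace.
    job = requests.get(f"{SRC['host']}/api/2.1/jobs/get",
                       headers=auth(SRC), params={"job_id": 123}).json()

    # Recreate the job in the target workspace from the exported settings.
    created = requests.post(f"{TGT['host']}/api/2.1/jobs/create",
                            headers=auth(TGT), json=job["settings"]).json()
    print(created)   # e.g. {"job_id": ...} in the target workspace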