Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
I have a job running multiple tasks: Task 1 runs a machine learning pipeline from git repo 1. Task 2 runs an ETL pipeline from git repo 1. Task 2 is actually a generic pipeline and should not be checked into repo 1, and will be made available in another re...
Had this same problem. The fix was to have two workflows with no triggers, each pointing to the respective git repo. Then set up a third workflow, with the appropriate triggers/schedule, which calls the first two workflows. A workflow can run other workflows.
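For reference, a minimal sketch of that pattern against the Jobs API 2.1, assuming the two untriggered "child" jobs already exist; the workspace URL, token, and job IDs below are placeholders:

    import requests

    HOST = "https://<your-workspace>.azuredatabricks.net"  # placeholder workspace URL
    TOKEN = "<personal-access-token>"                       # placeholder token
    ML_JOB_ID = 111    # placeholder: job id of the ML workflow (repo 1)
    ETL_JOB_ID = 222   # placeholder: job id of the ETL workflow (repo 2)

    parent_job = {
        "name": "orchestrator",
        "schedule": {  # only the parent carries the trigger/schedule
            "quartz_cron_expression": "0 0 6 * * ?",
            "timezone_id": "UTC",
        },
        "tasks": [
            {"task_key": "run_ml", "run_job_task": {"job_id": ML_JOB_ID}},
            {
                "task_key": "run_etl",
                "depends_on": [{"task_key": "run_ml"}],  # drop this to run both in parallel
                "run_job_task": {"job_id": ETL_JOB_ID},
            },
        ],
    }

    resp = requests.post(
        f"{HOST}/api/2.1/jobs/create",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json=parent_job,
    )
    resp.raise_for_status()
    print(resp.json())  # {"job_id": ...}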
Hi Databricks Experts: I'm using Databricks on Azure. I'd like to understand the following: 1) whether there is a way of automating the re-run of some specific failed tasks from a job (with several tasks), for example if I have 4 tasks, and tasks 1 and 2 h...
You can use "retries".In Workflow, select your job, the task, and in the options below, configure retries.If so, you can also see more options at:https://learn.microsoft.com/pt-br/azure/databricks/dev-tools/api/2.0/jobs?source=recommendations
I cannot find any info on whether Databricks supports nested jobs or tasks. For example, I have a 'job_a', which contains a list of tasks, and another 'job_b', which also contains a list of tasks. Now I'd like to have a 'job_all' that will run both 'job_a' and 'job_b...
Hi @Yanan Zhang, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers yo...
I set up a workflow using 2 tasks. Just for demo purposes, I'm using an interactive cluster for running the workflow.

    {
        "task_key": "prepare",
        "spark_python_task": {
            "python_file": "file...
Hi @Fran Pérez, just a friendly follow-up. Did any of the responses help you to resolve your question? If one did, please mark it as best. Otherwise, please let us know if you still need help.
I have created a job. Inside the job I have created tasks which are independent; I have used the concept of concurrent futures to exhibit parallelism, and in each task there are a couple of notebooks running (which are independent). Each notebook running ha...
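A rough sketch of the setup described above, assuming dbutils.notebook.run is what each future invokes; the notebook paths are hypothetical:

    from concurrent.futures import ThreadPoolExecutor, as_completed

    # Independent notebooks to run in parallel inside one task (hypothetical paths).
    notebooks = ["/Repos/team/nb_load", "/Repos/team/nb_transform", "/Repos/team/nb_export"]

    def run_notebook(path, timeout_seconds=3600):
        # dbutils is available as a global on Databricks notebook/job clusters.
        return dbutils.notebook.run(path, timeout_seconds)

    with ThreadPoolExecutor(max_workers=len(notebooks)) as pool:
        futures = {pool.submit(run_notebook, nb): nb for nb in notebooks}
        for fut in as_completed(futures):
            print(futures[fut], "->", fut.result())  # .result() re-raises if a notebook failed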
Hi @swetha kadiyala, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Th...
Py4JJavaError: An error occurred while calling o236.sql.
: org.apache.spark.SparkException: Job aborted.
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:201)
    at org.apache.spark.sql.execution.datasources.I...
Could you please increase the config below (at the cluster level) to a higher value, or set it to zero: spark.databricks.queryWatchdog.maxQueryTasks 0. This Spark config change should alleviate the issue.
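A minimal sketch of both ways to apply it; the cluster-level Spark config is what the reply refers to, and whether the runtime spark.conf.set call takes effect on your cluster is an assumption:

    # Option 1: cluster-level Spark config (Compute > cluster > Advanced options > Spark):
    #   spark.databricks.queryWatchdog.maxQueryTasks 0
    #
    # Option 2 (assumed to also work from a notebook on an interactive cluster):
    spark.conf.set("spark.databricks.queryWatchdog.maxQueryTasks", "0")  # 0 removes the task limit
    print(spark.conf.get("spark.databricks.queryWatchdog.maxQueryTasks"))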
Is there a way to schedule tasks or jobs through the Databricks CLI instead of the GUI? I want to be able to create a job flow with different notebooks through the CLI.
I agree with @Kaniz Fatma. This is the Jobs CLI we currently support, @Shreya Gupta: https://docs.databricks.com/dev-tools/cli/jobs-cli.html?_ga=2.101966982.684786035.1646666830-480220406.1638459894
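As a concrete sketch, assuming the legacy Jobs CLI: write the job definition to a JSON file, then pass it to databricks jobs create. The cluster ID, notebook path, and schedule below are placeholders:

    import json

    # Hypothetical job spec (Jobs API 2.0 style); adjust cluster ID, path, and cron to your setup.
    job_spec = {
        "name": "nightly-etl",
        "existing_cluster_id": "1234-567890-abcdefg",
        "notebook_task": {"notebook_path": "/Repos/team/etl/main"},
        "schedule": {
            "quartz_cron_expression": "0 0 2 * * ?",  # every day at 02:00
            "timezone_id": "UTC",
        },
    }

    with open("job.json", "w") as f:
        json.dump(job_spec, f, indent=2)

    # Then, from a shell:
    #   databricks jobs create --json-file job.json
    #   databricks jobs run-now --job-id <returned job_id>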
Hi, I am running a notebook job calling JAR code (application code implemented in C#). In the Spark UI page, for almost 2 hrs it's not showing any tasks, and even the CPU usage is below 20% and memory usage is very small. Before this 2 hr window it shows...
Enabling the Task Orchestration feature in Jobs via the API as well: Databricks supports the ability to orchestrate multiple tasks within a job. You must enable this feature in the admin console. Once enabled, this feature cannot be disabled. To enable orch...
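Once orchestration is enabled, a single job can carry several tasks with dependencies. A minimal sketch of such a Jobs API 2.1 payload; the task keys and notebook paths are hypothetical:

    # Sketch of a multi-task job payload for the Jobs API 2.1 jobs/create endpoint.
    multi_task_job = {
        "name": "orchestrated-pipeline",
        "tasks": [
            {"task_key": "ingest",
             "notebook_task": {"notebook_path": "/Repos/team/ingest"}},
            {"task_key": "transform",
             "depends_on": [{"task_key": "ingest"}],  # runs only after ingest succeeds
             "notebook_task": {"notebook_path": "/Repos/team/transform"}},
            {"task_key": "publish",
             "depends_on": [{"task_key": "transform"}],
             "notebook_task": {"notebook_path": "/Repos/team/publish"}},
        ],
    }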
We are using the Terraform Databricks provider, which starts a cluster and checks every mount (since there is no mount REST API!). Each mount takes 20 seconds to check, 99.9% of that time is idle waiting, and it starts a job per mount. If w...
Hi @Erik Parmann, it is possible to do, but you might also need to enable dynamic allocation at the cluster level to make sure your settings are applied at cluster creation. You can find more details here. As a best practice, we do not recom...
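A rough sketch of the cluster-level Spark settings that reply appears to refer to; the exact keys and values are assumptions and depend on the workload:

    # Spark conf entries for the cluster definition (for example the spark_conf block of a
    # databricks_cluster resource in Terraform, or the cluster UI); values are illustrative.
    dynamic_allocation_conf = {
        "spark.dynamicAllocation.enabled": "true",
        "spark.dynamicAllocation.minExecutors": "1",
        "spark.dynamicAllocation.maxExecutors": "4",
        "spark.dynamicAllocation.executorIdleTimeout": "60s",  # release idle executors quickly
    }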
In general, one task per core is how Spark executes tasks. If we want to restrict the number of tasks submitted to the executor, to get a better task-to-memory ratio (more memory per task), how can we achieve that?
We can use a config called "spark.task.cpus". This specifies the number of cores to allocate for each task. The default value is 1. If we specify, say, 2, fewer tasks will be assigned to the executor.
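For instance, with 8 cores per executor, spark.task.cpus set to 2 caps the executor at 4 concurrent tasks, so each task gets a larger share of executor memory. A minimal standalone sketch (on Databricks you would put these keys in the cluster's Spark config instead):

    from pyspark.sql import SparkSession

    # spark.task.cpus has to be set when the session/cluster starts, not at runtime;
    # the core counts below are illustrative.
    spark = (
        SparkSession.builder
        .appName("fewer-tasks-per-executor")
        .config("spark.executor.cores", "8")  # 8 cores per executor
        .config("spark.task.cpus", "2")       # each task reserves 2 cores, so at most 4 tasks run at once
        .getOrCreate()
    )
    print(spark.conf.get("spark.task.cpus"))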