Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
I'm running a Databricks job involving multiple tasks and would like to run the job with different sets of task parameters. I can achieve that by editing each task and changing the parameter values. However, it gets very manual when I have a lot of tas...
Dear Team, for now I found a solution: disconnect the bundle source on Databricks, then edit the parameters you want to run with. After execution, redeploy your code from the repository.
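Another way to avoid editing tasks by hand is to override parameters at run time through the Jobs API, so the deployed bundle stays untouched. A minimal sketch, assuming Jobs API 2.1 and a hypothetical `job_id`, host, and token; `job_parameters` applies only to that one run:

```python
import json
import urllib.request

def build_run_now_payload(job_id: int, job_params: dict) -> dict:
    """Payload for POST /api/2.1/jobs/run-now; job_parameters overrides
    job-level parameters for this run only."""
    return {"job_id": job_id, "job_parameters": job_params}

def run_now(host: str, token: str, payload: dict) -> dict:
    """Trigger the run and return the API response (contains run_id)."""
    req = urllib.request.Request(
        f"{host}/api/2.1/jobs/run-now",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_run_now_payload(123, {"run_date": "2024-06-01"})
# run_now("https://<workspace-url>", "<token>", payload)
```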
I have a job running multiple tasks:
- Task 1 runs a machine learning pipeline from git repo 1
- Task 2 runs an ETL pipeline from git repo 1
Task 2 is actually a generic pipeline and should not be checked in to repo 1, and will be made available in another re...
Had this same problem. The fix was to have two workflows with no triggers, each pointing to its respective git repo, then set up a third workflow with the appropriate triggers/schedule which calls the first two. A workflow can run other workflows.
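The "workflow that runs workflows" pattern above maps to the `run_job_task` task type in the Jobs API. A sketch of the orchestrating job's definition, assuming Jobs API 2.1 and hypothetical child job IDs:

```python
def orchestrator_job(name: str, ml_job_id: int, etl_job_id: int) -> dict:
    """Job definition whose tasks simply trigger two existing jobs;
    each child job keeps its own git source and settings."""
    return {
        "name": name,
        "tasks": [
            {"task_key": "run_ml", "run_job_task": {"job_id": ml_job_id}},
            {
                "task_key": "run_etl",
                "run_job_task": {"job_id": etl_job_id},
                # drop depends_on to run the two children in parallel
                "depends_on": [{"task_key": "run_ml"}],
            },
        ],
        # schedule/triggers belong on this job only; the children have none
    }
```

Only the parent carries a trigger, so the two repos stay cleanly separated while still running on one schedule.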
I have a job with multiple tasks like Task1 -> Task2 -> Task3, and I am trying to call the job using the "run now" API. Task details are below:
- Task1 executes a notebook with some input parameters
- Task2 runs using "ABC.jar", so it is a JAR-based task ...
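For a run-now call against a mixed job like this, the API takes parameter overrides keyed by task type rather than by task. A sketch, assuming Jobs API 2.1; `notebook_params` reaches the notebook tasks and `jar_params` the JAR tasks (the parameter names and values here are hypothetical):

```python
def build_run_now_payload(job_id: int) -> dict:
    """run-now payload: notebook_params is a map applied to notebook tasks,
    jar_params is a positional argument list applied to spark_jar tasks."""
    return {
        "job_id": job_id,
        "notebook_params": {"input_date": "2024-06-01"},
        "jar_params": ["2024-06-01", "full"],
    }
```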
Hi, it would be a good feature to pass parameters at the task level. We have scenarios where we would like to create a job with multiple tasks (notebook/dbt) and pass parameters per task.
I have a job with multiple tasks running asynchronously, and I don't think it's leveraging all the nodes on the cluster, judging by the runtime. I open the Spark UI for the cluster, check the executors, and don't see any tasks on my worker nodes. How ca...
I am trying to run an incremental data processing job using a Python wheel. The job is scheduled to run, e.g., every hour. For my code to know which data increment to process, I inject it with the {{start_time}} as part of the command line, like so: ["end_dat...
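The wheel side of this can stay generic: a `python_wheel_task` passes its `parameters` as a flat argv list, so a key/value convention can be parsed in the entry point. A sketch; the assumption that the substituted {{start_time}} arrives as UTC epoch milliseconds should be verified against your runtime:

```python
import sys
from datetime import datetime, timezone

def parse_kv_args(argv: list[str]) -> dict:
    """Interpret ['key1', 'value1', 'key2', 'value2', ...] argv pairs."""
    if len(argv) % 2:
        raise ValueError("expected an even number of key/value arguments")
    return dict(zip(argv[0::2], argv[1::2]))

def main(argv=None):
    params = parse_kv_args(argv if argv is not None else sys.argv[1:])
    # {{start_time}} is substituted by the scheduler before launch; assuming
    # it resolves to UTC epoch milliseconds (check your runtime's docs)
    start = datetime.fromtimestamp(int(params["start_time"]) / 1000,
                                   tz=timezone.utc)
    return start
```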
We have multiple joins involving a large table (about 500 GB in size). The output of the joins is stored as many small files, each 800 KB-1.5 MB in size. Because of this, the job is split into many tasks and takes a long time to complete....
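Output files that small usually mean the write inherited a high shuffle partition count from the joins; repartitioning before the write (or lowering `spark.sql.shuffle.partitions`) consolidates them. A sketch of sizing the repartition, with a hypothetical 128 MB target file size:

```python
def target_partitions(total_size_bytes: int,
                      target_file_bytes: int = 128 * 1024**2) -> int:
    """Partitions needed so each output file lands near the target size."""
    return max(1, -(-total_size_bytes // target_file_bytes))  # ceiling division

n = target_partitions(500 * 1024**3)  # ~500 GB of join output -> 4000
# in the Spark job itself:
# df.repartition(n).write.parquet(path)
```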
I am trying to create a job which has 2 tasks as follows:
1. A Python task which accepts a date and an integer from the user and outputs a list of dates (say, a list of 5 dates in string format).
2. A notebook which runs once for each of the dates from the d...
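The first task here is plain date arithmetic, and its output can be handed to the downstream task as a task value. A sketch of task 1 (pure Python; the commented `dbutils` call at the end is the assumed hand-off and only works inside a job):

```python
from datetime import date, timedelta

def date_list(start: str, n: int) -> list[str]:
    """n consecutive ISO dates beginning at start."""
    first = date.fromisoformat(start)
    return [(first + timedelta(days=i)).isoformat() for i in range(n)]

dates = date_list("2024-06-01", 5)
# Inside the job, publish the list for the downstream task:
# dbutils.jobs.taskValues.set(key="dates", value=dates)
```

The notebook task can then read the list and loop over it, or, if your workspace has the For Each task type, fan out one run per date.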
I created a Databricks job with multiple tasks. Is there a way to pass variable values from one task to another? For example, if I have tasks A and B as Databricks notebooks, can I create a variable (e.g. x) in notebook A and later use that value in ...
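Task values are the mechanism for exactly this A-to-B hand-off. The sketch below stands in a plain dict for `dbutils.jobs.taskValues` so it runs anywhere; the comments show the assumed real calls inside the two notebooks:

```python
# Stand-in store so the sketch runs outside Databricks.
_task_values: dict = {}

def set_task_value(key, value):
    # Notebook A on a cluster: dbutils.jobs.taskValues.set(key=key, value=value)
    _task_values[key] = value

def get_task_value(task_key, key, default=None):
    # Notebook B: dbutils.jobs.taskValues.get(taskKey=task_key, key=key,
    #                                         default=default)
    return _task_values.get(key, default)

set_task_value("x", 42)       # end of notebook A
x = get_task_value("A", "x")  # start of notebook B
```

Note that values passed this way should be small and serializable; for large data, write to a table or path in task A and pass only the location.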
You could also consider using an orchestration tool like Data Factory (Azure) or Glue (AWS); there you can inject and use parameters from notebooks. The job scheduling of Databricks also has the option to add parameters, but I do not know if yo...