Data Engineering

Forum Posts

dave_hiltbrand
by New Contributor II
  • 1188 Views
  • 3 replies
  • 0 kudos

I have a job with multiple tasks running asynchronously and I don't think it's leveraging all the nodes on the cluster based on runtime.

I have a job with multiple tasks running asynchronously and I don't think it's leveraging all the nodes on the cluster based on runtime. I open the Spark UI for the cluster and check out the executors and don't see any tasks for my worker nodes. How ca...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Dave Hiltbrand, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question. Thanks.

2 More Replies
thib
by New Contributor III
  • 3127 Views
  • 3 replies
  • 2 kudos

Can we use multiple git repos for a job running multiple tasks?

I have a job running multiple tasks:
  • Task 1 runs a machine learning pipeline from git repo 1
  • Task 2 runs an ETL pipeline from git repo 1
Task 2 is actually a generic pipeline and should not be checked in repo 1, and will be made available in another re...

Latest Reply
trijit
New Contributor II
  • 2 kudos

The way to go about this would be to create Databricks repos in the workspace and then use them when defining the tasks. This way we can refer to multiple repos in different tasks.

2 More Replies
swzzzsw
by New Contributor III
  • 5451 Views
  • 5 replies
  • 10 kudos

"Run now with different parameters" - different parameters not recognized by jobs involving multiple tasks

I'm running a Databricks job involving multiple tasks and would like to run the job with a different set of task parameters. I can achieve that by editing each task and changing the parameter values. However, it gets very manual when I have a lot of tas...

Latest Reply
erens
New Contributor II
  • 10 kudos

Hello, I am also facing the same issue. The problem is described below: I have a multi-task job. This job consists of multiple "spark_python_task" tasks that execute a Python script in a Spark cluster. This pipeline is created within a CI/CD ...

4 More Replies
Arun_tsr
by New Contributor III
  • 1005 Views
  • 2 replies
  • 0 kudos

Spark SQL output multiple small files

We have multiple joins involving a large table (about 500 GB in size). The output of the joins is stored in multiple small files, each 800 KB to 1.5 MB in size. Because of this the job is split into multiple tasks and takes a long time to complete....
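One common way to reason about the small-files symptom described above is to pick a target file size and derive how many output partitions that implies. The sketch below is only arithmetic, not a diagnosis of this job: the 128 MiB target, the 10 GiB output size, and the helper name `target_partition_count` are all assumptions for illustration, since the post does not state the actual output size.

```python
import math


def target_partition_count(output_bytes, target_file_bytes=128 * 1024**2):
    """Estimate how many output files give roughly `target_file_bytes` each.

    Both arguments are illustrative assumptions; measure the real output
    size of the join before relying on the result.
    """
    if output_bytes <= 0:
        return 1
    return max(1, math.ceil(output_bytes / target_file_bytes))


# Hypothetical example: 10 GiB of join output, 128 MiB target files.
num_files = target_partition_count(10 * 1024**3)
print(num_files)
```

In PySpark, a count like this could be passed to `DataFrame.repartition(num_files)` before the write, which typically yields about one file per partition for a non-partitioned write. Whether that helps here depends on details the post does not give.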

Spark UI metrics
Latest Reply
Debayan
Esteemed Contributor III
  • 0 kudos

Hi @Arun Balaji, could you please provide the error message you are receiving?

1 More Replies
RJB
by New Contributor II
  • 7477 Views
  • 6 replies
  • 0 kudos

Resolved! How to pass outputs from a python task to a notebook task

I am trying to create a job which has 2 tasks as follows:
  • A Python task which accepts a date and an integer from the user and outputs a list of dates (say, a list of 5 dates in string format).
  • A notebook which runs once for each of the dates from the d...

Latest Reply
BilalAslamDbrx
Honored Contributor II
  • 0 kudos

Just a note that this feature, Task Values, has been generally available for a while.
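For readers who have not used Task Values, here is a minimal sketch of how the date-list scenario from the question might look. It assumes the `dbutils.jobs.taskValues.set` and `dbutils.jobs.taskValues.get` calls available in Databricks jobs; the helper names, the key `"dates"`, and passing `dbutils` in as an argument are choices made for this example, and how a Python script task obtains `dbutils` is left out.

```python
from datetime import date, timedelta


def make_date_list(start, count):
    """Return `count` consecutive dates starting at `start`, as ISO strings."""
    return [(start + timedelta(days=i)).isoformat() for i in range(count)]


def publish_dates(dbutils, start, count, key="dates"):
    """Upstream task: compute the dates and store them as a task value."""
    dates = make_date_list(start, count)
    # Task values must be JSON-serializable; a list of strings qualifies.
    dbutils.jobs.taskValues.set(key=key, value=dates)
    return dates


def read_dates(dbutils, upstream_task_key, key="dates"):
    """Downstream task: read the list published by the upstream task.

    `upstream_task_key` is the task name given in the job definition.
    `debugValue` is what is returned when running outside of a job.
    """
    return dbutils.jobs.taskValues.get(
        taskKey=upstream_task_key, key=key, default=[], debugValue=[]
    )
```

This only moves the list between tasks. Running the notebook once per date would still need a loop inside the downstream notebook or some other mechanism.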

5 More Replies
RKNutalapati
by Valued Contributor
  • 1046 Views
  • 2 replies
  • 0 kudos

Jobs API "run now" - How to set task wise parameters

I have a job with multiple tasks like Task1 -> Task2 -> Task3. I am trying to call the job using the API "run now". Task details are below:
  • Task1 - It executes a notebook with some input parameters
  • Task2 - It runs using "ABC.jar", so it's a JAR-based task ...

Latest Reply
Prabakar
Esteemed Contributor III
  • 0 kudos

@Rama Krishna N, you can refer to https://docs.databricks.com/dev-tools/api/latest/jobs.html#operation/JobsRunNow
"jar_params": [ "john", "doe", "35" ],
"notebook_params": { "name": "john doe", "age": "35" },
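To make the fragment above concrete, here is a hedged sketch of a complete "run now" request carrying both parameter types in one body. The `/api/2.1/jobs/run-now` path is an assumption about the API version behind the linked "latest" docs; the workspace URL, token, and job ID are placeholders. As far as the linked reference describes it, `notebook_params` is meant for notebook tasks and `jar_params` for JAR tasks, so check that against your own job. The request is only built here, not sent.

```python
import json
import urllib.request


def build_run_now_request(host, token, job_id, notebook_params=None, jar_params=None):
    """Build (but do not send) a POST request for the Jobs run-now endpoint."""
    payload = {"job_id": job_id}
    if notebook_params:
        payload["notebook_params"] = notebook_params  # dict of name -> string value
    if jar_params:
        payload["jar_params"] = jar_params  # positional list of strings
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.1/jobs/run-now",  # assumed API version
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_run_now_request(
    host="https://example.cloud.databricks.com",  # placeholder workspace URL
    token="example-token",                        # placeholder access token
    job_id=123,                                   # placeholder job ID
    notebook_params={"name": "john doe", "age": "35"},
    jar_params=["john", "doe", "35"],
)
# To actually trigger the run: urllib.request.urlopen(request)
```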

1 More Replies
swzzzsw
by New Contributor III
  • 3019 Views
  • 5 replies
  • 2 kudos

Resolved! Pass variable values from one task to another

I created a Databricks job with multiple tasks. Is there a way to pass variable values from one task to another? For example, if I have tasks A and B as Databricks notebooks, can I create a variable (e.g. x) in notebook A and later use that value in ...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

You could also consider using an orchestration tool like Data Factory (Azure) or Glue (AWS). There you can inject and use parameters from notebooks. The job scheduling of Databricks also has the possibility to add parameters, but I do not know if yo...

4 More Replies