Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Pass Dataframe to child job in "Run Job" task

erigaud
Honored Contributor

Hello,

I have a Job A that runs a Job B, and Job A defines a globalTempView that I would like to somehow access in the child job.

Is that in any way possible? Can the same cluster be used for both jobs? If it is not possible, does someone know of a workaround? Thank you very much!

5 REPLIES

Kaniz_Fatma
Community Manager

Hi @erigaud, when you have a Job A that runs a Job B, and Job A defines a globalTempView, you might wonder how to access it in the child job.

Let's break it down:

  1. Global Temporary Views:

    • A global temporary view is tied to the Spark application rather than a specific session or cluster. It is accessible across different Spark sessions within the same application.
    • When you create a global temporary view, it is registered in the catalog with a name prefixed by global_temp. (e.g., global_temp.my_view).
    • The lifetime of a global temporary view is tied to the Spark application. It exists as long as the application is running.
  2. Accessing Global Temporary Views in Child Jobs:

    • To access a global temporary view in a child job (Job B), you need to ensure that both Job A and Job B are part of the same Spark application.
    • If you're using Databricks, you can run both jobs on the same all-purpose cluster, and they will share the same Spark application context.
    • In your child job (Job B), you can directly query the global temporary view created by Job A using its fully qualified name (e.g., global_temp.my_view).
  3. Example:

    # In Job A
    df = ...  # Your DataFrame
    df.createGlobalTempView("my_view")
    
    # In Job B (child job)
    spark.sql("SELECT * FROM global_temp.my_view").show()
    
  4. Workaround:

    • If you're unable to run both jobs in the same Spark application (e.g., different clusters or separate sessions), consider persisting the data from Job A (the global temporary view) to more permanent storage (e.g., Delta Lake, Parquet files, or a database table).
    • Then, in Job B, read the data from the persisted storage instead of relying on the global temporary view (see the sketch after this list).
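
A minimal sketch of that workaround, assuming a placeholder table name my_schema.shared_view (substitute a schema and table you control):

    # In Job A: persist the DataFrame to a Delta table instead of a temp view.
    # "my_schema.shared_view" is a hypothetical name; use your own location.
    df.write.format("delta").mode("overwrite").saveAsTable("my_schema.shared_view")

    # In Job B (child job): read the persisted table back into a DataFrame.
    shared_df = spark.table("my_schema.shared_view")
    shared_df.show()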

Remember that global temporary views are scoped to the application, so as long as both jobs are part of the same application, you can access the global temporary view in the child job. If not, consider using a more persistent storage solution for sharing data between jobs. 😊

 

erigaud
Honored Contributor

Hello @Kaniz_Fatma,

Thank you for the very detailed answer. If I understand correctly, there is no way to do this using temp views and a job cluster? Do I need, in that case, to use the same all-purpose cluster for all my tasks in order to remain in the same Spark application?

rahuja
New Contributor III

Hello @Kaniz_Fatma and @erigaud,

I am also having the same issue. We have multiple tasks inside Databricks jobs that need to share dictionaries of DataFrames among them. Is there any way we can do this? Initially we thought task values might help, but it seems you cannot send large data loads through them even if they are JSON-serializable. Any ideas as to how we can do this?
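
For reference, task values are intended for small JSON-serializable payloads only, so one pattern (a sketch, not confirmed in this thread) is to persist each DataFrame as a table and pass only the table names between tasks via dbutils.jobs.taskValues; the task key and table names below are hypothetical:

    # Upstream task: persist each DataFrame and publish only the table names.
    table_names = {}
    for name, df in dataframes.items():  # dataframes: your dict of DataFrames
        table = f"my_schema.shared_{name}"  # placeholder naming scheme
        df.write.format("delta").mode("overwrite").saveAsTable(table)
        table_names[name] = table
    dbutils.jobs.taskValues.set(key="table_names", value=table_names)

    # Downstream task: fetch the names and reload the DataFrames.
    table_names = dbutils.jobs.taskValues.get(taskKey="upstream_task", key="table_names")
    dataframes = {name: spark.table(t) for name, t in table_names.items()}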

ranged_coop
Valued Contributor II

Just curious, will using the same job cluster within the same workflow work? Theoretically it should. If it is across jobs with different job clusters it may not work, and persistent tables are the solution. You could drop the table at the end of the flow with a generic cleanup task, as sketched below.
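
A minimal sketch of that cleanup idea, reusing the placeholder table name from above:

    # Final task of the workflow: drop the intermediate table once all
    # downstream tasks have consumed it. The table name is a placeholder.
    spark.sql("DROP TABLE IF EXISTS my_schema.shared_view")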

rahuja
New Contributor III

Hi @ranged_coop,
Yes, we are using the same job compute across the different workflows. But I think different tasks run in isolation, like different Docker containers, which is why this becomes an issue. It would be nice if you could explain a bit more about the approach you think would work for accessing data from one task in another.
