Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Pass Dataframe to child job in "Run Job" task

erigaud
Honored Contributor

Hello,

I have a Job A that runs a Job B. Job A defines a global temp view (globalTempView), and I would like to access it in the child job.

Is that in any way possible? Can the same cluster be used for both jobs? If it is not possible, does anyone know of a workaround? Thank you very much!

2 REPLIES

Kaniz
Community Manager

Hi @erigaud, whether Job B can read a global temp view defined by Job A depends on whether both jobs run in the same Spark application.

Let’s break it down:

  1. Global Temporary Views:

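    • A global temporary view is tied to the Spark application rather than to a single Spark session. It is accessible from every Spark session within the same application. On Databricks, each cluster runs its own Spark application, so the view is visible only to workloads running on the cluster that created it.
    • When you create a global temporary view, it is registered in the system database global_temp, so you reference it by a qualified name (e.g., global_temp.my_view).
    • The lifetime of a global temporary view is tied to the Spark application. It exists as long as the application is running and is dropped when the application ends, for example when the cluster terminates or restarts.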
  2. Accessing Global Temporary Views in Child Jobs:

    • To access a global temporary view in a child job (Job B), both Job A and Job B must run in the same Spark application.
    • If you’re using Databricks, this means both jobs must run on the same cluster, for example the same all-purpose cluster. A job cluster is created for a job run and terminated when the run finishes, so a child job that runs on its own job cluster is a separate Spark application and will not see the view.
    • When both jobs do share a cluster, the child job (Job B) can query the global temporary view created by Job A directly, using its fully qualified name (e.g., global_temp.my_view).
  3. Example:

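    # In Job A
    df = ...  # Your DataFrame
    # createOrReplaceGlobalTempView avoids an error if the view is left over from an earlier run
    df.createOrReplaceGlobalTempView("my_view")
    
    # In Job B (child job), running on the same cluster as Job A
    spark.sql("SELECT * FROM global_temp.my_view").show()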
    
  4. Workaround:

    • If you’re unable to run both jobs in the same Spark application (e.g., because they run on different clusters), persist the data behind the global temporary view from Job A to durable storage (e.g., a Delta table, Parquet files, or a database table).
    • Then, in Job B, read the data from that storage instead of relying on the global temporary view.
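
The workaround above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the table name handoff_db.job_a_output and the helper names persist_for_child and load_in_child are hypothetical, and it assumes Job A's DataFrame and the notebook-provided spark session are passed in by the caller.

```python
# Sketch only: the table name and both helper names are hypothetical placeholders.
HANDOFF_TABLE = "handoff_db.job_a_output"


def persist_for_child(df, table_name=HANDOFF_TABLE):
    """Job A: write the DataFrame to a Delta table that outlives the cluster."""
    # Overwrite so that each run of Job A replaces the previous hand-off data.
    df.write.format("delta").mode("overwrite").saveAsTable(table_name)
    return table_name


def load_in_child(spark, table_name=HANDOFF_TABLE):
    """Job B: read the table back as a DataFrame from whichever cluster Job B runs on."""
    return spark.table(table_name)
```

In Job A you would call persist_for_child(df) before the "Run Job" task starts, and in Job B call load_in_child(spark). Because the data lives in a table rather than in a view, this works whether or not the two jobs share a cluster, provided both can reach the same metastore.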

Remember that global temporary views are scoped to the Spark application, so Job B can read the view only if it runs in the same application as Job A. If it does not, use persistent storage to share data between the jobs. 😊

 

erigaud
Honored Contributor

Hello @Kaniz,

Thank you for the very detailed answer. If I understand correctly, there is no way to do this using temp views on a job cluster? In that case, I need to use the same all-purpose cluster for all my tasks in order to remain in the same Spark application?
