Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Pass Dataframe to child job in "Run Job" task

Honored Contributor


I have a Job A that runs a Job B. Job A defines a globalTempView, and I would like to somehow access it in the child job.

Is that in any way possible? Can the same cluster be used for both jobs? If it is not possible, does someone know of a workaround? Thank you very much!


Community Manager

Hi @erigaud, when you have a Job A that runs a Job B, and Job A defines a globalTempView, you might wonder how to access it in the child job.

Let’s break it down:

  1. Global Temporary Views:

    • A global temporary view is tied to the Spark application rather than a specific session or cluster. It is accessible across different Spark sessions within the same application.
    • When you create a global temporary view, it is registered in the catalog with a name prefixed by global_temp. (e.g., global_temp.my_view).
    • The lifetime of a global temporary view is tied to the Spark application. It exists as long as the application is running.
  2. Accessing Global Temporary Views in Child Jobs:

    • To access a global temporary view in a child job (Job B), you need to ensure that both Job A and Job B are part of the same Spark application.
    • If you’re using Databricks, you can run both jobs on the same cluster, and they will share the same Spark application context.
    • In your child job (Job B), you can directly query the global temporary view created by Job A using its fully qualified name (e.g., global_temp.my_view).
  3. Example:

    # In Job A
    df = ...  # Your DataFrame
    df.createOrReplaceGlobalTempView("my_view")

    # In Job B (child job), within the same Spark application
    spark.sql("SELECT * FROM global_temp.my_view").show()
  4. Workaround:

    • If you’re unable to run both jobs in the same Spark application (e.g., different clusters or separate sessions), consider persisting the data from Job A (global temporary view) to a more permanent storage (e.g., Delta Lake, Parquet files, or a database table).
    • Then, in Job B, read the data from the persisted storage instead of relying on the global temporary view.

Remember that global temporary views are scoped to the application, so as long as both jobs are part of the same application, you can access the global temporary view in the child job. If not, consider using a more persistent storage solution for sharing data between jobs. 😊


Honored Contributor

Hello @Kaniz,

Thank you for the very detailed answer. If I understand correctly, there is no way to do this using temp views and a Job Cluster? In that case, I need to use the same All-Purpose cluster for all my tasks in order to remain in the same Spark application?
