You can obtain the query execution plan programmatically using the EXPLAIN
statement in SQL. The EXPLAIN
statement displays the execution plan that the database planner generates for the supplied statement. The execution plan shows how the table(s) referenced by the statement will be scanned (by plain sequential scan, index scan, etc.) and, if multiple tables are referenced, what join algorithms will be used to bring together the required rows from each input table.
Here is an example of how you can use it:
query = "SELECT * FROM table"
plan = spark.sql(f"EXPLAIN {query}")
plan.show(truncate=False)
This will return a DataFrame with a single row and column that contains the execution plan as a string.
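If you want the plan as a plain Python string rather than a printed table, you can collect that single cell. A minimal sketch, continuing from the plan DataFrame above:

plan_text = plan.collect()[0][0]  # the one cell in the result holds the full plan text
print(plan_text)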
Note that the EXPLAIN command only provides the logical and physical plans. It does not provide runtime details such as how much time each stage took or how much data was read. For that level of detail, you would need to consult the Spark UI or parse the logs.
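If you are working with a DataFrame rather than a raw SQL string, the same plans are available through the DataFrame.explain() method. A brief sketch, assuming a table named table is registered in the catalog:

df = spark.sql("SELECT * FROM table")
df.explain()               # physical plan only
df.explain(extended=True)  # parsed, analyzed and optimized logical plans plus the physical plan

Either way, these are static plans produced before execution; per-stage timings and data volumes still come from the Spark UI.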