topic Re: EXPLAIN PLAN parser in Data Engineering

EXPLAIN PLAN parser

noorbasha534 — Fri, 22 Aug 2025 22:34:54 GMT

hello all,

have you come across a freely available parser that does good job for parsing explain plans for SQL queries in databricks...

Re: EXPLAIN PLAN parser

WiliamRosa — Fri, 22 Aug 2025 23:53:54 GMT

Hi @noorbasha534,

See if this helps: you can run an EXPLAIN directly in the SQL editor. Try something like:

EXPLAIN FORMATTED SELECT c.id, SUM(s.val) v FROM sales s JOIN customers c ON s.cid = c.id WHERE s.dt >= '2025-08-01' GROUP BY c.id;

You can also view details via the “See performance” icon that appears after the query runs.

Hope that helps!

Re: EXPLAIN PLAN parser

noorbasha534 — Sat, 23 Aug 2025 01:20:01 GMT

@WiliamRosa thanks. What I meant is a parser that gives me filter columns, join columns from the explain plan🙂

Re: EXPLAIN PLAN parser

WiliamRosa — Sat, 23 Aug 2025 01:33:03 GMT

See if this helps:

Import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.catalyst.plans.logical._
import org.apache.spark.sql.catalyst.expressions._

def extractFiltersAndJoins(df: DataFrame): (Set[String], Set[(String,String)]) = {
val plan = df.queryExecution.optimizedPlan
var filterCols = Set.empty[String]
var joinCols = Set.empty[(String,String)]

plan.foreach {
case Filter(condition, _) =>
condition.references.foreach(ref => filterCols += ref.sql)
case Join(left, right, _, condOpt, _) =>
condOpt.foreach { cond =>
cond.references.toSeq.combinations(2).foreach {
case Seq(a,b) => joinCols += (a.sql -> b.sql)
case _ =>
}
}
case _ =>
}
(filterCols, joinCols)
}

// use it
val df = spark.sql("""
SELECT c.id, SUM(s.val) v
FROM sales s JOIN customers c ON s.cid = c.id
WHERE s.dt >= DATE '2025-08-01'
GROUP BY c.id
""")
val (filters, joins) = extractFiltersAndJoins(df)
println("Filter columns: " + filters)
println("Join columns: " + joins)

Re: EXPLAIN PLAN parser

noorbasha534 — Sat, 23 Aug 2025 01:39:17 GMT

@WiliamRosa ah thanks much.

Just a final question - it seems the code executes the SQL query & then captures the columns from the plan? If so, my need is to go through around 1000 sqls each day & get the columns used in filters/joins/group by of these. So, executing them is not an option🙂

Re: EXPLAIN PLAN parser

WiliamRosa — Sat, 23 Aug 2025 01:50:04 GMT

got it. If execution is not an option, you can still extract columns without running the queries by parsing their AST.

Use Spark’s internal SQL parser (no execution)

You can parse to an unresolved logical plan and walk the tree for Filter, Join, and Aggregate nodes. This avoids query execution and catalog reads on data (though the analyzer may still look up objects if you resolve). In Scala:

import org.apache.spark.sql.SparkSession import org.apache.spark.sql.catalyst.plans.logical._ import org.apache.spark.sql.catalyst.expressions._ val spark: SparkSession = SparkSession.getActiveSession.get val parser = spark.sessionState.sqlParser // parse only, no execution def colsFromExpr(e: Expression): Set[String] = e.references.map(_.sql).toSet case class Parsed(colsFilter: Set[String], colsJoin: Set[(String,String)], colsGroupBy: Set[String]) def scan(sql: String): Parsed = { val plan = parser.parsePlan(sql) // Unresolved plan (no execution) var filterCols = Set.empty[String] var joinCols = Set.empty[(String,String)] var groupCols = Set.empty[String] plan.foreach { case f: Filter => filterCols ++= colsFromExpr(f.condition) case j: Join => j.condition.foreach { cond => val refs = cond.references.toSeq.map(_.sql).distinct // pairwise for simple equi-joins for { i <- refs.indices j <- (i+1) until refs.length } joinCols += (refs(i) -> refs(j)) } case a: Aggregate => groupCols ++= a.groupingExpressions.flatMap(e => colsFromExpr(e)) case _ => } Parsed(filterCols, joinCols, groupCols) } // Example val sql = """SELECT c.id, SUM(s.val) FROM sales s JOIN customers c ON s.cid = c.id WHERE s.dt >= DATE '2025-08-01' GROUP BY c.id""" val out = scan(sql) println(s"Filter: ${out.colsFilter}") println(s"Joins: ${out.colsJoin}") println(s"Group By: ${out.colsGroupBy}")