<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic SQL cell v spark.sql in notebooks in Get Started Discussions</title>
    <link>https://community.databricks.com/t5/get-started-discussions/sql-cell-v-spark-sql-in-notebooks/m-p/139125#M11022</link>
    <description>&lt;P&gt;I am fairly new to Databricks, and indeed Python, so apologies if this has been answered elsewhere but I've been unable to find it.&lt;/P&gt;&lt;P&gt;I have been mainly working in notebooks as opposed to the SQL editor, but coding in SQL where possible using SQL cells or the %sql magic command. I have also occasionally used SQL within spark.sql() in Python cells. To my untrained eye these appear to be two ways of doing the same thing so I'm wondering what the differences are, if any? I am guessing spark.sql gives greater flexibility to combine SQL with Python but are there any other advantages or disadvantages of either method that I should be aware of?&lt;/P&gt;&lt;P&gt;Thank you&lt;/P&gt;</description>
    <pubDate>Fri, 14 Nov 2025 17:15:41 GMT</pubDate>
    <dc:creator>CookDataSol</dc:creator>
    <dc:date>2025-11-14T17:15:41Z</dc:date>
    <item>
      <title>SQL cell v spark.sql in notebooks</title>
      <link>https://community.databricks.com/t5/get-started-discussions/sql-cell-v-spark-sql-in-notebooks/m-p/139125#M11022</link>
      <description>&lt;P&gt;I am fairly new to Databricks, and indeed Python, so apologies if this has been answered elsewhere but I've been unable to find it.&lt;/P&gt;&lt;P&gt;I have been mainly working in notebooks as opposed to the SQL editor, but coding in SQL where possible using SQL cells or the %sql magic command. I have also occasionally used SQL within spark.sql() in Python cells. To my untrained eye these appear to be two ways of doing the same thing so I'm wondering what the differences are, if any? I am guessing spark.sql gives greater flexibility to combine SQL with Python but are there any other advantages or disadvantages of either method that I should be aware of?&lt;/P&gt;&lt;P&gt;Thank you&lt;/P&gt;</description>
      <pubDate>Fri, 14 Nov 2025 17:15:41 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/sql-cell-v-spark-sql-in-notebooks/m-p/139125#M11022</guid>
      <dc:creator>CookDataSol</dc:creator>
      <dc:date>2025-11-14T17:15:41Z</dc:date>
    </item>
    <item>
      <title>Re: SQL cell v spark.sql in notebooks</title>
      <link>https://community.databricks.com/t5/get-started-discussions/sql-cell-v-spark-sql-in-notebooks/m-p/139149#M11024</link>
      <description>&lt;P&gt;Greetings&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/154636"&gt;@CookDataSol&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="qt3gz91 paragraph"&gt;Great question—this trips up a lot of folks when starting with Databricks. In a notebook attached to an all-purpose cluster, both SQL cells (%sql) and spark.sql(...) ultimately execute the same Spark SQL engine against the notebook’s SparkSession, so results are comparable; the choice is mostly about ergonomics and how much Python you want to mix in.&lt;/P&gt;
&lt;H3 class="_7uu25p0 qt3gz9c _7pq7t612 heading3 _7uu25p1"&gt;Short answer&lt;/H3&gt;
&lt;UL class="qt3gz97 qt3gz92"&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;Use a dedicated SQL cell or the &lt;STRONG&gt;%sql magic&lt;/STRONG&gt; when you’re writing mostly SQL, want rich result rendering, and plan to chain SQL queries in the notebook UI.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;Use &lt;STRONG&gt;spark.sql(...)&lt;/STRONG&gt; in a Python cell when you want the query’s result as a DataFrame you can immediately manipulate in Python, or you need to compose SQL with Python control flow and libraries.&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3 class="_7uu25p0 qt3gz9c _7pq7t612 heading3 _7uu25p1"&gt;Key differences and when it matters&lt;/H3&gt;
&lt;UL class="qt3gz97 qt3gz92"&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;Compute and engine&lt;/P&gt;
&lt;UL class="qt3gz98 qt3gz92"&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;On an all-purpose cluster in a notebook, both SQL cells and spark.sql run on the same SparkSession for that notebook.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;SQL cells can also target a &lt;STRONG&gt;SQL warehouse&lt;/STRONG&gt; from a notebook, which is different compute optimized for SQL (and may have different configuration capabilities). Python cells do not run on a SQL warehouse.&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;Ergonomics and results&lt;/P&gt;
&lt;UL class="qt3gz98 qt3gz92"&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;SQL cells render a nice table UI and charts directly in the cell, and they automatically expose the result as an implicit DataFrame named &lt;STRONG&gt;_sqldf&lt;/STRONG&gt; for use in later cells.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;spark.sql(...) returns a regular DataFrame that you assign to a variable (df = ...), which is ideal if you want to immediately call DataFrame operations, UDFs, display(df), write(), etc. SQL syntax highlighting also works inside Python strings in spark.sql calls.&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;Language mixing and state&lt;/P&gt;
&lt;UL class="qt3gz98 qt3gz92"&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;When you switch cell languages (e.g., from %python to %sql), those run in distinct REPLs; variables defined in one language aren’t automatically available in the other. Share state via tables/views, files, or widgets.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;For parameters across languages in notebooks, &lt;STRONG&gt;widgets&lt;/STRONG&gt; are the standard approach; SQL can safely consume widget values via parameter markers (for example, IDENTIFIER(:db)).&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;Parameters and variables in SQL&lt;/P&gt;
&lt;UL class="qt3gz98 qt3gz92"&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;In notebooks, use widgets + parameter markers to pass values into SQL safely (for identifiers, wrap with IDENTIFIER()).&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;You can also use SQL’s SET to change session-level Spark SQL settings in SQL cells; spark.conf.set(...) is the Python analog.&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;Parallel execution (handy for exploratory work)&lt;/P&gt;
&lt;UL class="qt3gz98 qt3gz92"&gt;
&lt;LI class="qt3gz9a"&gt;SQL cells can run in parallel with currently running commands when the notebook is attached to an interactive cluster.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;Limitations/nuances to remember&lt;/P&gt;
&lt;UL class="qt3gz98 qt3gz92"&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;The implicit &lt;STRONG&gt;_sqldf&lt;/STRONG&gt; variable from a SQL cell isn’t available if the notebook is running on a SQL warehouse, and there are version-specific limitations noted in the docs.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;Databricks SQL warehouses allow only a limited, aliased set of Spark configurations (admins can set data access configs globally). This differs from notebooks on clusters where you can set many confs per SparkSession.&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3 class="_7uu25p0 qt3gz9c _7pq7t612 heading3 _7uu25p1"&gt;Practical examples&lt;/H3&gt;
&lt;UL class="qt3gz97 qt3gz92"&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;Equivalent query in a SQL cell vs Python&lt;/P&gt;
&lt;P class="qt3gz91 paragraph"&gt;SQL cell:&lt;/P&gt;
&lt;DIV class="go8b9g1 _7pq7t6cl" data-ui-element="code-block-container"&gt;
&lt;PRE&gt;&lt;CODE class="markdown-code-sql qt3gz9e hljs language-sql _1ymogdh2"&gt;&lt;SPAN class="hljs-comment"&gt;-- in a SQL cell or %sql&lt;/SPAN&gt;
&lt;SPAN class="hljs-keyword"&gt;SELECT&lt;/SPAN&gt; col1, &lt;SPAN class="hljs-built_in"&gt;COUNT&lt;/SPAN&gt;(&lt;SPAN class="hljs-operator"&gt;*&lt;/SPAN&gt;) &lt;SPAN class="hljs-keyword"&gt;AS&lt;/SPAN&gt; c
&lt;SPAN class="hljs-keyword"&gt;FROM&lt;/SPAN&gt; catalog.schema.table
&lt;SPAN class="hljs-keyword"&gt;GROUP&lt;/SPAN&gt; &lt;SPAN class="hljs-keyword"&gt;BY&lt;/SPAN&gt; col1
&lt;SPAN class="hljs-keyword"&gt;ORDER&lt;/SPAN&gt; &lt;SPAN class="hljs-keyword"&gt;BY&lt;/SPAN&gt; c &lt;SPAN class="hljs-keyword"&gt;DESC&lt;/SPAN&gt;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;DIV class="go8b9g3 _7pq7t62y _7pq7t6cm _7pq7t6ay _7pq7t6bo"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV class="_7pq7t614 _7pq7t6cl wrz27r2 wrz27r0"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;P class="qt3gz91 paragraph"&gt;Python cell:&lt;/P&gt;
&lt;DIV class="go8b9g1 _7pq7t6cl" data-ui-element="code-block-container"&gt;
&lt;PRE&gt;&lt;CODE class="markdown-code-python qt3gz9e hljs language-python _1ymogdh2"&gt;df = spark.sql(&lt;SPAN class="hljs-string"&gt;"""
  SELECT col1, COUNT(*) AS c
  FROM catalog.schema.table
  GROUP BY col1
  ORDER BY c DESC
"""&lt;/SPAN&gt;)
display(df)&lt;/CODE&gt;&lt;/PRE&gt;
&lt;DIV class="go8b9g3 _7pq7t62y _7pq7t6cm _7pq7t6ay _7pq7t6bo"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV class="_7pq7t614 _7pq7t6cl wrz27r2 wrz27r0"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;Using _sqldf from a previous SQL cell&lt;/P&gt;
&lt;DIV class="go8b9g1 _7pq7t6cl" data-ui-element="code-block-container"&gt;
&lt;PRE&gt;&lt;CODE class="markdown-code-sql qt3gz9e hljs language-sql _1ymogdh2"&gt;&lt;SPAN class="hljs-comment"&gt;-- SQL cell 1&lt;/SPAN&gt;
&lt;SPAN class="hljs-keyword"&gt;SELECT&lt;/SPAN&gt; &lt;SPAN class="hljs-operator"&gt;*&lt;/SPAN&gt; &lt;SPAN class="hljs-keyword"&gt;FROM&lt;/SPAN&gt; catalog.schema.table LIMIT &lt;SPAN class="hljs-number"&gt;100&lt;/SPAN&gt;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;DIV class="go8b9g3 _7pq7t62y _7pq7t6cm _7pq7t6ay _7pq7t6bo"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV class="_7pq7t614 _7pq7t6cl wrz27r2 wrz27r0"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="go8b9g1 _7pq7t6cl" data-ui-element="code-block-container"&gt;
&lt;PRE&gt;&lt;CODE class="markdown-code-python qt3gz9e hljs language-python _1ymogdh2"&gt;&lt;SPAN class="hljs-comment"&gt;# Python cell 2&lt;/SPAN&gt;
&lt;SPAN class="hljs-comment"&gt;# _sqldf is the DataFrame produced by the last SQL cell&lt;/SPAN&gt;
display(_sqldf)&lt;/CODE&gt;&lt;/PRE&gt;
&lt;DIV class="go8b9g3 _7pq7t62y _7pq7t6cm _7pq7t6ay _7pq7t6bo"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV class="_7pq7t614 _7pq7t6cl wrz27r2 wrz27r0"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;Passing parameters with widgets (safe in SQL)&lt;/P&gt;
&lt;DIV class="go8b9g1 _7pq7t6cl" data-ui-element="code-block-container"&gt;
&lt;PRE&gt;&lt;CODE class="markdown-code-python qt3gz9e hljs language-python _1ymogdh2"&gt;&lt;SPAN class="hljs-comment"&gt;# Python cell: create widgets&lt;/SPAN&gt;
dbutils.widgets.dropdown(&lt;SPAN class="hljs-string"&gt;"database"&lt;/SPAN&gt;, &lt;SPAN class="hljs-string"&gt;"main"&lt;/SPAN&gt;, [d.name &lt;SPAN class="hljs-keyword"&gt;for&lt;/SPAN&gt; d &lt;SPAN class="hljs-keyword"&gt;in&lt;/SPAN&gt; spark.catalog.listDatabases()])
dbutils.widgets.text(&lt;SPAN class="hljs-string"&gt;"tbl"&lt;/SPAN&gt;, &lt;SPAN class="hljs-string"&gt;"my_table"&lt;/SPAN&gt;)&lt;/CODE&gt;&lt;/PRE&gt;
&lt;DIV class="go8b9g3 _7pq7t62y _7pq7t6cm _7pq7t6ay _7pq7t6bo"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV class="_7pq7t614 _7pq7t6cl wrz27r2 wrz27r0"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="go8b9g1 _7pq7t6cl" data-ui-element="code-block-container"&gt;
&lt;PRE&gt;&lt;CODE class="markdown-code-sql qt3gz9e hljs language-sql _1ymogdh2"&gt;&lt;SPAN class="hljs-comment"&gt;-- SQL cell: safely use widget values&lt;/SPAN&gt;
&lt;SPAN class="hljs-keyword"&gt;SHOW&lt;/SPAN&gt; TABLES &lt;SPAN class="hljs-keyword"&gt;IN&lt;/SPAN&gt; IDENTIFIER(:database);

&lt;SPAN class="hljs-keyword"&gt;SELECT&lt;/SPAN&gt; &lt;SPAN class="hljs-operator"&gt;*&lt;/SPAN&gt;
&lt;SPAN class="hljs-keyword"&gt;FROM&lt;/SPAN&gt; IDENTIFIER(CONCAT(:database, &lt;SPAN class="hljs-string"&gt;'.'&lt;/SPAN&gt;, :tbl))
LIMIT &lt;SPAN class="hljs-number"&gt;10&lt;/SPAN&gt;;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;DIV class="go8b9g3 _7pq7t62y _7pq7t6cm _7pq7t6ay _7pq7t6bo"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV class="_7pq7t614 _7pq7t6cl wrz27r2 wrz27r0"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3 class="_7uu25p0 qt3gz9c _7pq7t612 heading3 _7uu25p1"&gt;Bottom line&lt;/H3&gt;
&lt;UL class="qt3gz97 qt3gz92"&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;Prefer a SQL cell or %sql when your workflow is predominantly SQL and you want great inline tables/charts, the implicit _sqldf bridge to Python, parameterized widgets, and optionally the ability to run on a SQL warehouse.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;Prefer spark.sql in Python when you need to weave SQL with Python logic, immediately manipulate results as a DataFrame, or integrate with Python libraries and DataFrame APIs.&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Hope this helps. Louis.&lt;/P&gt;</description>
      <pubDate>Sat, 15 Nov 2025 03:53:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/sql-cell-v-spark-sql-in-notebooks/m-p/139149#M11024</guid>
      <dc:creator>Louis_Frolio</dc:creator>
      <dc:date>2025-11-15T03:53:31Z</dc:date>
    </item>
    <item>
      <title>Re: SQL cell v spark.sql in notebooks</title>
      <link>https://community.databricks.com/t5/get-started-discussions/sql-cell-v-spark-sql-in-notebooks/m-p/139293#M11033</link>
      <description>&lt;P&gt;Thanks Louis, really good explanation and helpful examples!&lt;/P&gt;</description>
      <pubDate>Mon, 17 Nov 2025 10:06:10 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/sql-cell-v-spark-sql-in-notebooks/m-p/139293#M11033</guid>
      <dc:creator>CookDataSol</dc:creator>
      <dc:date>2025-11-17T10:06:10Z</dc:date>
    </item>
  </channel>
</rss>

