sql query is not returning _sqldf.

Somia
New Contributor III

Notebooks in my workspace are not returning _sqldf when a sql query is run. 

If I run this code, it would give an error in second cell that _sqldf is not defined.

First Cell:

%sql

select * from some_table limit 10

Second Cell:

%sql

select * from _sqldf



However, the same code runs fine in other people's notebook in my organization. I suspect it started when I connected my notebook with sql warehouse as my whole notebook was in sql rather than an all purpose compute. Now I cannot run this above code even when I make a new notebook.

Can anybody suggest how to fix this.

Alberto_Umana
Databricks Employee
Databricks Employee

Hi @Somia,

If you run that function in all-purpose does it work fine?

Is there any equivalent to _sqldf for SQL Warehouses?

Somia
New Contributor III

I switched my notebook back to all purpose compute as _sqldf is not supported with sql warehouse notebook. It didn't work as explained above. 

MadhuB
Valued Contributor

@Somia 

To replicate your scenario, you need an All-Purpose Cluster and a notebook defaulted to python language. Then query a table using %sql as below. This creates a temp dataframe for you to use it in the python cells. Keep in mind, that this dataframe keep changing as you execute a different %sql cell.

 

%sql
-- cell 1
select * from catalog.schema.123_sample
# cell 2
display(_sqldf)

 

MadhuB_1-1738873741814.png

To summarize, the %sql magic command behaves differently depending on whether your Databricks notebook is connected to an All-Purpose cluster or a SQL Warehouse.

  1. All-Purpose Cluster: %sql creates a DataFrame named _sqldf that you can use in subsequent Python cells.
  2. SQL Warehouse: %sql executes the query but does not create the _sqldf DataFrame.


Please let me know for anything, else mark it as a solution.

JakubSkibicki
Contributor

@Somia _sqldf is a pyspark df not a sql object.

It works only for such direction 

exec sql -> call _sqldf in pyspark

Somia
New Contributor III

Changing the notebook to default python and all purpose compute have fixed the issue. I am able to access _sqldf in subsequent sql or python cell.

View solution in original post

MadhuB
Valued Contributor

@Somia Can you mark my detailed explanation as solution that helped to resolve your issue.