cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

sql query is not returning _sqldf.

Somia
New Contributor III

Notebooks in my workspace are not returning _sqldf when a sql query is run. 

If I run this code, it would give an error in second cell that _sqldf is not defined.

First Cell:

%sql

select * from some_table limit 10

Second Cell:

%sql

select * from _sqldf



However, the same code runs fine in other people's notebook in my organization. I suspect it started when I connected my notebook with sql warehouse as my whole notebook was in sql rather than an all purpose compute. Now I cannot run this above code even when I make a new notebook.

Can anybody suggest how to fix this.

1 ACCEPTED SOLUTION

Accepted Solutions

Somia
New Contributor III

Changing the notebook to default python and all purpose compute have fixed the issue. I am able to access _sqldf in subsequent sql or python cell.

View solution in original post

6 REPLIES 6

Alberto_Umana
Databricks Employee
Databricks Employee

Hi @Somia,

If you run that function in all-purpose does it work fine?

Somia
New Contributor III

I switched my notebook back to all purpose compute as _sqldf is not supported with sql warehouse notebook. It didn't work as explained above. 

MadhuB
Contributor III

@Somia 

To replicate your scenario, you need an All-Purpose Cluster and a notebook defaulted to python language. Then query a table using %sql as below. This creates a temp dataframe for you to use it in the python cells. Keep in mind, that this dataframe keep changing as you execute a different %sql cell.

 

%sql
-- cell 1
select * from catalog.schema.123_sample
# cell 2
display(_sqldf)

 

MadhuB_1-1738873741814.png

To summarize, the %sql magic command behaves differently depending on whether your Databricks notebook is connected to an All-Purpose cluster or a SQL Warehouse.

  1. All-Purpose Cluster: %sql creates a DataFrame named _sqldf that you can use in subsequent Python cells.
  2. SQL Warehouse: %sql executes the query but does not create the _sqldf DataFrame.


Please let me know for anything, else mark it as a solution.

JakubSkibicki
Contributor

@Somia _sqldf is a pyspark df not a sql object.

It works only for such direction 

exec sql -> call _sqldf in pyspark

Somia
New Contributor III

Changing the notebook to default python and all purpose compute have fixed the issue. I am able to access _sqldf in subsequent sql or python cell.

MadhuB
Contributor III

@Somia Can you mark my detailed explanation as solution that helped to resolve your issue.

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you wonโ€™t want to miss the chance to attend and share knowledge.

If there isnโ€™t a group near you, start one and help create a community that brings people together.

Request a New Group