Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to access the result of a %sql cell from python

adrin
New Contributor III

I see the way to move from python to sql is to create a temp view, and then access that dataframe from sql in a %sql cell.

Now the question is, how can I have a %sql cell with a select statement in it, and assign the result of that statement to a dataframe variable which I can then use in the next python cell?

EDIT: I'm aware that I can run a SQL query in python and store the result in a dataframe. But what happens in reality is that I develop the SQL code in a %sql cell, and once it works I have to cut and paste the query back into a python cell, put it in quotes, and lose the highlighting and all. This happens very frequently when I'm doing some data analysis where most of my code involves some SQL queries.
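For illustration, the paste-back step looks something like this (table and column names are made up):

query = """
select col_a, col_b
from some_table          -- made-up table name
where col_a is not null
"""
df = spark.sql(query)    # the query is now a plain string, so the SQL highlighting is lost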

To me, it makes sense to add an option to the select statement such as into dataframe_name, which would then register that dataframe in the python space, without actually running the query.

1 ACCEPTED SOLUTION


dmbaker
New Contributor III

@adrin it may not be a parameter to select, and it isn't one sql statement, but it is possible without having to copy and paste:

# Python cell: create a DataFrame and register it as a temp view
df1 = spark.createDataFrame([(None, None), (1, None), (None, 2)], ("a", "b"))
display(df1)
df1.createOrReplaceTempView('table1')

%sql
-- SQL cell: develop the query and capture its result as a temp view
CREATE OR REPLACE TEMPORARY VIEW table2 AS SELECT * FROM table1

%sql
-- SQL cell: inspect the result
SELECT * FROM table2

# Python cell: read the view back into a DataFrame
df2 = spark.table('table2')
display(df2)


9 REPLIES

john_lynch
New Contributor II

@adrin, why not just run the query in a python cell? Do something like below:

df = sqlContext.sql("select * from table limit 10")
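As a side note, on newer runtimes the same call is usually written against the SparkSession entry point (spark is predefined in Databricks notebooks); the table name here is just a placeholder:

df = spark.sql("select * from table limit 10")  # spark.sql returns a DataFrame, same as sqlContext.sql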

adrin
New Contributor III

@jlynch I'm aware of this, but my question is not how I can do it in python; my question is how I can do it the other way around. Please see the edit.

dmbaker
New Contributor III

@adrinโ€‹ it may not be a parameter to select and not in one sql statement, but it is possible without having to copy and paste:

df1 = spark.createDataFrame([(None, None), (1, None), (None, 2)], ("a", "b"))
display(df1)
df1.createOrReplaceTempView('table1')
%sql
CREATE OR REPLACE TEMPORARY VIEW table2 AS SELECT * FROM table1
%sql
SELECT * FROM table2
df2 = spark.table('table2')
display(df2)

jlammens
New Contributor II

I don't have an answer, but I have the same question. I am used to Mathematica notebooks, where the result of evaluating input [1] is always available in the variable %1, and so on. Is something similar available in Databricks? I don't think any of the answers above really answers that question. @adrin, have you found a better answer in the meantime?

Zachary_Higgins
Contributor

Variables can't be accessed between APIs this way.

See: Switch between %python variables and %sql variables in Databricks - Stack Overflow

In my opinion, the best way is to use the recommended answer above and create/update a tempview, or just run the query in sqlContext.sql()/spark.sql(). Creating a tempview from pyspark or capturing the result of a sql query in a dataframe isn't the same as accessing a python variable in SQL. The link above has some pretty good info in it.
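To make the distinction concrete, here is a minimal sketch of the two directions being discussed; the names df, my_view, some_col, and the threshold value are all placeholders:

# Python -> SQL: expose a DataFrame to %sql cells by registering a temp view
df.createOrReplaceTempView("my_view")

# Python value -> SQL text: the query string is built in Python, so a Python
# variable can be interpolated, but this is not the same as a shared variable
threshold = 10
result_df = spark.sql(f"SELECT * FROM my_view WHERE some_col > {threshold}")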


dogwoodlx
New Contributor II

Results from an SQL cell are available as a Python DataFrame. The Python DataFrame name is _sqldf.

To save the DataFrame, run this code in a Python cell:

df = _sqldf

Keep in mind that the value in _sqldf is held in memory and will be replaced with the most recent results of each SQL cell run.
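A minimal sketch of the workflow this enables (the table and column names are hypothetical):

%sql
-- develop the query in a SQL cell as usual
SELECT col_a, COUNT(*) AS n FROM some_table GROUP BY col_a

# next Python cell: capture the implicit result before another SQL cell overwrites it
df = _sqldf
display(df)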

https://docs.databricks.com/notebooks/notebooks-use.html#explore-sql-cell-results-in-python-notebook...
