Databricks Community

adrin · ‎07-19-2018

I see the way to move from

python

to

sql

is to create a temp view, and then access that dataframe from sql, and in a sql cell.

Now the question is, how can I have a

%sql

cell with a

select

statement in it, and assign the result of that statement to a dataframe variable which I can then use in the next

python

cell?

EDIT: I'm aware that I can run a SQL query in python and store the result in a dataframe. But what happens in reality is that I develop the SQL code in a

%sql

cell, and once it works I have to cut and paste the query back into a python cell and put it in quotes and loose the highlighting and all. This happens very frequently when I'm doing some data analysis where most of my code involves some SQL queries.

To me, it makes sense to add an option to the

select

statement such as

into dataframe_name

, which would then register that dataframe in the python space, without actually running the query.

dmbaker · ‎07-20-2018

@adrin it may not be a parameter to select and not in one sql statement, but it is possible without having to copy and paste:

df1 = spark.createDataFrame([(None, None), (1, None), (None, 2)], ("a", "b"))
display(df1)

df1.createOrReplaceTempView('table1')

%sql
CREATE OR REPLACE TEMPORARY VIEW table2 AS SELECT * FROM table1

%sql
SELECT * FROM table2

df2 = spark.table('table2')
display(df2)

View solution in original post

john_lynch · ‎07-19-2018

@adrin, why not just run the query in a python cell? Do something like below:

df = sqlContext.sql("select * from table limit 10")

adrin · ‎07-20-2018

@jlynch

I'm aware of this, but my question is not how I can do it in python, my question is how I can do it the other way around. Please see the edit.

dmbaker · ‎07-20-2018

@adrin it may not be a parameter to select and not in one sql statement, but it is possible without having to copy and paste:

df1 = spark.createDataFrame([(None, None), (1, None), (None, 2)], ("a", "b"))
display(df1)

df1.createOrReplaceTempView('table1')

%sql
CREATE OR REPLACE TEMPORARY VIEW table2 AS SELECT * FROM table1

%sql
SELECT * FROM table2

df2 = spark.table('table2')
display(df2)

jlammens · ‎06-01-2022

I don't have an answer, but I have the same question. I am used to Mathematica notebooks where the result of evaluating input [1] is always available in the variable %1, etc. Is something similar available in databricks? I don't think any of the answers above really answers that question. @adrin, have you found a better answer in the meantime?

rjdagner · ‎08-03-2022

https://docs.databricks.com/notebooks/notebooks-use.html#explore-sql-cell-results-in-python-notebook...

Zachary_Higgins · ‎06-01-2022

Variables can't be accessed between API's this way.

See: Switch between %python variables and %sql variables in Databricks - Stack Overflow

In my opinion, the best way is to use the recommended answer above and create/update a tempview, or just run the query in sqlContext.sql()/spark.sql(). Creating a tempview from pyspark or capturing the result of a sql query in a dataframe isn't the same as accessing a python variable in SQL. The link above has some pretty good info in it.

Sjackson3289 · ‎06-06-2022

Very an informative post! This post has a lot of information here that can help many students. However, TFTH is also one of the most reliable and professional academic help providers. They cover complete programming subjects whether it's computer science assignments or finance assignments. Students who are unable to manage time for assignments can get Assignment help online.