07-19-2018 07:11 AM
I see that the way to move from Python to SQL is to create a temp view, and then access that dataframe from SQL in a %sql cell.
Now the question is: how can I have a %sql cell with a select statement in it, and assign the result of that statement to a dataframe variable which I can then use in the next Python cell?
EDIT: I'm aware that I can run a SQL query in Python and store the result in a dataframe. But what happens in reality is that I develop the SQL code in a %sql cell, and once it works I have to cut and paste the query back into a Python cell, put it in quotes, and lose the highlighting and all. This happens very frequently when I'm doing data analysis where most of my code involves SQL queries.
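Concretely, the workaround looks something like this (a sketch; the table and column names are placeholders):
# Query developed and tested in a %sql cell, then pasted back
# as a plain string, losing the SQL highlighting along the way.
df = spark.sql("""
    SELECT col_a, COUNT(*) AS n
    FROM some_table
    GROUP BY col_a
""")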
To me, it makes sense to add an option to the select statement, such as INTO dataframe_name, which would then register that dataframe in the Python namespace without actually running the query.
07-19-2018 09:13 AM
@adrin, why not just run the query in a Python cell? Do something like below:
# sqlContext is pre-created in Databricks notebooks; .sql() returns a DataFrame
df = sqlContext.sql("select * from table limit 10")
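In newer runtimes the SparkSession entry point does the same thing, so an equivalent (assuming the spark session Databricks provides in every notebook) is:
df = spark.sql("select * from table limit 10")
display(df)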
07-20-2018 01:19 AM
I'm aware of this, but my question is not how to do it in Python; it's how to do it the other way around. Please see the edit.
07-20-2018 05:48 AM
@adrin it may not be a parameter to select, and not in one SQL statement, but it is possible without having to copy and paste:
df1 = spark.createDataFrame([(None, None), (1, None), (None, 2)], ("a", "b"))
display(df1)
df1.createOrReplaceTempView('table1')  # expose the DataFrame to SQL

%sql
CREATE OR REPLACE TEMPORARY VIEW table2 AS SELECT * FROM table1

%sql
SELECT * FROM table2

df2 = spark.table('table2')  # pull the SQL result back into a Python DataFrame
display(df2)
06-01-2022 05:46 AM
I don't have an answer, but I have the same question. I am used to Mathematica notebooks, where the result of evaluating input [1] is always available in the variable %1, etc. Is something similar available in Databricks? I don't think any of the answers above really answers that question. @adrin, have you found a better answer in the meantime?
06-01-2022 02:28 PM
Variables can't be accessed between APIs this way.
See: Switch between %python variables and %sql variables in Databricks - Stack Overflow
In my opinion, the best way is to use the recommended answer above and create/update a temp view, or just run the query in sqlContext.sql()/spark.sql(). Creating a temp view from PySpark, or capturing the result of a SQL query in a dataframe, isn't the same as accessing a Python variable in SQL. The link above has some pretty good info in it.
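If the goal is to push a Python value into a query, one common pattern is plain string interpolation before Spark ever sees the SQL (a minimal sketch; the table and variable names are made up, and anything user-supplied should be sanitized first):
threshold = 10
# The f-string bakes the Python value into the SQL text
df = spark.sql(f"SELECT * FROM some_table WHERE amount > {threshold}")
display(df)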
04-05-2023 06:33 AM
Results from a %sql cell are automatically available as a Python DataFrame named _sqldf.
To save the DataFrame under a name of your own, run this code in a Python cell:
df = _sqldf
Keep in mind that the value in _sqldf is held in memory and will be replaced with the results of the most recent SQL cell run, so capture it before running another %sql cell.
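Putting it together, the flow looks like this (a sketch; the table name is a placeholder):
%sql
SELECT * FROM some_table LIMIT 10

# next Python cell: grab the implicit result before another %sql cell overwrites it
df = _sqldf
display(df)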