yesterday
I have created a DataFrame in a notebook using PySpark and am considering creating a fully-featured dashboard in SQL. My question is whether I need to first store the DataFrame as a table in order to use it in the dashboard, or if it's possible to use the DataFrame directly without storing it as a table. While I attempted to display data and create visuals directly from the notebook, I noticed that this approach lacks some of the robust features available in an SQL dashboard. Could you advise on the best approach to integrate the DataFrame into a feature-rich SQL dashboard?
yesterday
To integrate a PySpark DataFrame into a feature-rich SQL dashboard, it's best to store the DataFrame as a table in your database. This allows you to use SQL for complex queries and leverages the advanced features of SQL dashboards. Simply write the DataFrame to a SQL database table and connect your dashboard tool (like Tableau or Power BI) to that database. This ensures data persistence and compatibility with robust visualization tools.
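For example, here is a minimal PySpark sketch of persisting a DataFrame as a table so a SQL warehouse or dashboard can query it; the catalog and schema names below are illustrative, so substitute a schema you can write to:
# Persist an existing DataFrame as a managed table (catalog/schema names are illustrative)
df = spark.table("system.information_schema.catalog_privileges")  # any DataFrame built in the notebook
(df.write
   .mode("overwrite")
   .saveAsTable("main.analytics.catalog_privileges_snapshot"))
# From the SQL editor or a dashboard dataset:
#   SELECT * FROM main.analytics.catalog_privileges_snapshot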
yesterday - last edited yesterday
Hi @amarnathpal,
You can create a view (permanent view) on top of the DataFrame, which you can leverage in any SQL dashboard. The view will hold the transformation logic of the DataFrame.
Also, you can create a materialized view, which is preferred for dashboard scenarios.
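As a hedged sketch (the catalog, schema, and view names are illustrative, not from this thread): a materialized view is defined with SQL over existing tables, and on Databricks it generally requires Unity Catalog; it is typically created from a SQL warehouse, and whether the statement can be run from a notebook depends on your workspace and compute.
# Illustrative only: define a materialized view over an existing table for dashboard use
spark.sql("""
    CREATE OR REPLACE MATERIALIZED VIEW main.analytics.catalog_privileges_mv AS
    SELECT privilege_type, COUNT(*) AS privilege_count
    FROM system.information_schema.catalog_privileges
    GROUP BY privilege_type
""")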
pyspark.sql.DataFrame.createOrReplaceGlobalTempView — PySpark 3.5.4 documentation
Regards,
Hari Prasad
9 hours ago
Could you guide me on how to store a view in a table? For example, if I create a view in a notebook, how can I locate it later using the SQL editor?
For example, I have created this in a notebook.
5 hours ago
Sorry, I vaguely remember we used to be able to create persistent views on a DataFrame earlier.
Currently, Spark doesn't allow you to create a persistent view directly on a DataFrame; rather, you have to create a table to use it in a SQL warehouse.
# Read an existing table into a DataFrame
df = spark.table("system.information_schema.catalog_privileges")
df.createOrReplaceTempView("temp_view")  # temp view is scoped to the current session and is dropped when the session ends
df.createOrReplaceGlobalTempView("gbl_temp_view")  # global temp view lives in the global_temp database, is visible to all sessions of the same Spark application, and is dropped when the application terminates
spark.table("global_temp.gbl_temp_view").display()
And you cannot create a persistent view on top of a DataFrame or a temp view; attempting to do so throws an error.
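For instance, a small illustrative sketch of what fails (the target catalog and schema are placeholders):
# A permanent view cannot reference a temporary object such as temp_view above
try:
    spark.sql("""
        CREATE OR REPLACE VIEW main.analytics.persisted_from_temp AS
        SELECT * FROM temp_view
    """)
except Exception as e:
    print(e)  # an AnalysisException explaining that a permanent view cannot reference a temporary view/object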
Instead, you can create views directly on tables (even with joins), which can then be accessed in a SQL warehouse. You do have to recreate the Spark logic in SQL; for that, you can leverage the Databricks Assistant to convert it with less effort.
create or replace view hive_metastore.labuser8777982_1736395171_wa99_da_adewd_lab.sys_catalog_privileges_new as
select *
from system.information_schema.catalog_privileges -- this is an actual table in the system schema
where privilege_type = 'EXECUTE'
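Once created, this view shows up under its catalog and schema (for example in Catalog Explorer) and can be queried from the SQL editor or used as a dashboard dataset; from the notebook you can also read it back to verify:
spark.table("hive_metastore.labuser8777982_1736395171_wa99_da_adewd_lab.sys_catalog_privileges_new").display()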
Regards,
Hari Prasad.