Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Integrating PySpark DataFrame into SQL Dashboard for Enhanced Visualization

amarnathpal
Visitor

I have created a DataFrame in a notebook using PySpark and am considering building a fully featured dashboard in SQL. My question is whether I need to first store the DataFrame as a table in order to use it in the dashboard, or whether it's possible to use the DataFrame directly without storing it as a table. I tried displaying data and creating visuals directly from the notebook, but that approach lacks some of the robust features available in an SQL dashboard. Could you advise on the best approach to integrate the DataFrame into a feature-rich SQL dashboard?


4 REPLIES

norma44shah
Visitor

To integrate a PySpark DataFrame into a feature-rich SQL dashboard, it's best to store the DataFrame as a table in your database. This allows you to use SQL for complex queries and leverages the advanced features of SQL dashboards. Simply write the DataFrame to a SQL database table and connect your dashboard tool (like Tableau or Power BI) to that database. This ensures data persistence and compatibility with robust visualization tools.
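For example, a minimal sketch of that pattern on Databricks, assuming the DataFrame from the notebook is called df and that the catalog and schema names below are placeholders you replace with your own:

# Persist the DataFrame as a managed Delta table so SQL tooling can see it
(
    df.write
      .format("delta")
      .mode("overwrite")
      .saveAsTable("my_catalog.my_schema.dashboard_source")  # placeholder name
)

# The table is now queryable from the SQL editor / dashboard dataset, e.g.:
# SELECT * FROM my_catalog.my_schema.dashboard_source

If the underlying data changes regularly, scheduling the notebook as a job keeps the table, and therefore the dashboard, refreshed.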

hari-prasad
Contributor III

Hi @amarnathpal,

You can create a view (permanent view) on top of the DataFrame, which you can leverage in any SQL dashboard. The view will hold the transformation logic of the DataFrame.

Also, you can create a materialized view, which is preferred for dashboard scenarios.

pyspark.sql.DataFrame.createOrReplaceGlobalTempView — PySpark 3.5.4 documentation
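As a rough sketch of the materialized-view route, assuming the DataFrame has already been persisted as a table (the names below are placeholders) and that your workspace has Unity Catalog with compute that supports materialized views:

# Materialized views are defined in SQL; if your notebook compute cannot run
# this statement, paste the same SQL into the SQL editor on a SQL warehouse.
spark.sql("""
    create materialized view my_catalog.my_schema.dashboard_source_mv as
    select *
    from my_catalog.my_schema.dashboard_source   -- placeholder table name
""")

A dashboard can then query the materialized view instead of recomputing the transformation on every refresh.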

Regards,

Hari Prasad




Could you guide me on how to store a view as a table? For example, if I create a view in a notebook, how can I locate it later using the SQL editor?
For example, I have created this in a notebook:

active__df.createOrReplaceTempView("active_view")

hari-prasad
Contributor III

Sorry, I vaguely remember we used to be able to create persistent views on DataFrames earlier.

Currently, a Spark DataFrame doesn't let you create a persistent view on it; instead, you have to create a table to use it in a SQL warehouse.

# Load an existing table as a DataFrame (a system table is used here as an example)
df = spark.table("system.information_schema.catalog_privileges")

df.createOrReplaceTempView("temp_view")  # temp view is session-scoped; once the session is closed, the view is dropped
df.createOrReplaceGlobalTempView("gbl_temp_view")  # global temp view lives in the global_temp schema and is accessible across sessions on the same cluster, until the application terminates

spark.table("global_temp.gbl_temp_view").display()


And you cannot create a persistent view on top of a DataFrame or temp view; attempting to do so throws an error.
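For illustration, an attempt along these lines (the catalog and schema names are placeholders) is what fails:

# Defining a persistent view over a temp view fails with an analysis error,
# because the temp view only exists in the current Spark session
spark.sql("""
    create or replace view my_catalog.my_schema.persistent_over_temp as
    select * from temp_view
""")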


Instead, you can create a view directly on tables (including joins), which can then be accessed in the SQL warehouse. You do have to recreate the Spark logic in SQL, but you can leverage the Databricks Assistant to convert it with less effort.

create or replace view hive_metastore.labuser8777982_1736395171_wa99_da_adewd_lab.sys_catalog_privileges_new as
select *
from system.information_schema.catalog_privileges -- this is an actual table in the system schema
where privilege_type = 'EXECUTE'
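Tying this back to the earlier active_view example: a minimal sketch would be to write the DataFrame behind the temp view out as a table, and optionally define a view on that table, so it becomes visible in the SQL editor and Catalog Explorer. The catalog and schema names below are placeholders.

# Persist the DataFrame that currently backs the temp view as a managed table
active__df.write.format("delta").mode("overwrite").saveAsTable("my_catalog.my_schema.active_users")

# Optionally, create a persistent view on the table; this is stored in the
# metastore and can be browsed and queried from the SQL editor
spark.sql("""
    create or replace view my_catalog.my_schema.active_users_view as
    select * from my_catalog.my_schema.active_users
""")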


 

Regards,

Hari Prasad.



