Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Integrating PySpark DataFrame into SQL Dashboard for Enhanced Visualization

amarnathpal
Visitor

I have created a DataFrame in a notebook using PySpark and am considering building a fully featured dashboard in SQL. My question is whether I need to first store the DataFrame as a table in order to use it in the dashboard, or whether it's possible to use the DataFrame directly without storing it as a table. I tried displaying data and creating visuals directly from the notebook, but that approach lacks some of the robust features available in an SQL dashboard. Could you advise on the best approach to integrate the DataFrame into a feature-rich SQL dashboard?


4 REPLIES

norma44shah
Visitor

To integrate a PySpark DataFrame into a feature-rich SQL dashboard, it's best to store the DataFrame as a table in your database. This allows you to use SQL for complex queries and leverages the advanced features of SQL dashboards. Simply write the DataFrame to a SQL database table and connect your dashboard tool (like Tableau or Power BI) to that database. This ensures data persistence and compatibility with robust visualization tools.
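For example, a minimal sketch of that pattern on Databricks, assuming the DataFrame from the notebook is called df and that the catalog and schema names below are placeholders you replace with your own:

# Persist the DataFrame as a managed Delta table so SQL tooling can see it
(
    df.write
      .format("delta")
      .mode("overwrite")
      .saveAsTable("my_catalog.my_schema.dashboard_source")  # placeholder name
)

# The table is now queryable from the SQL editor / dashboard dataset, e.g.:
# SELECT * FROM my_catalog.my_schema.dashboard_source

If the underlying data changes regularly, scheduling the notebook as a job keeps the table, and therefore the dashboard, refreshed.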

hari-prasad
Contributor III

Hi @amarnathpal,

You can create a view (permanent view) on top of the DataFrame, which you can leverage in any SQL dashboard. The view will hold the transformation logic of the DataFrame.

Also, you can create a materialized view, which is preferred for dashboard scenarios.

pyspark.sql.DataFrame.createOrReplaceGlobalTempView — PySpark 3.5.4 documentation
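As a rough sketch of the materialized-view route, assuming the DataFrame has already been persisted as a table (the names below are placeholders) and that your workspace has Unity Catalog with compute that supports materialized views:

# Materialized views are defined in SQL; if your notebook compute cannot run
# this statement, paste the same SQL into the SQL editor on a SQL warehouse.
spark.sql("""
    create materialized view my_catalog.my_schema.dashboard_source_mv as
    select *
    from my_catalog.my_schema.dashboard_source   -- placeholder table name
""")

A dashboard can then query the materialized view instead of recomputing the transformation on every refresh.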

Regards,

Hari Prasad




Could you guide me on how to store a view as a table? For example, if I create a view in a notebook, how can I locate it later using the SQL editor?
For example, I have created this in a notebook:

active__df.createOrReplaceTempView("active_view")

hari-prasad
Contributor III

Sorry, I vaguely remember we used to be able to create persistent views on DataFrames earlier.

Currently, a Spark DataFrame doesn't let you create a persistent view on it; instead, you have to create a table to use it in a SQL warehouse.

# Load an existing table as a DataFrame (a system table is used here as an example)
df = spark.table("system.information_schema.catalog_privileges")

df.createOrReplaceTempView("temp_view")  # temp view is session-scoped; once the session is closed, the view is dropped
df.createOrReplaceGlobalTempView("gbl_temp_view")  # global temp view lives in the global_temp schema and is accessible across sessions on the same cluster, until the application terminates

spark.table("global_temp.gbl_temp_view").display()


And you cannot create a persistent view on top of a DataFrame or temp view; attempting to do so throws an error.
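For illustration, an attempt along these lines (the catalog and schema names are placeholders) is what fails:

# Defining a persistent view over a temp view fails with an analysis error,
# because the temp view only exists in the current Spark session
spark.sql("""
    create or replace view my_catalog.my_schema.persistent_over_temp as
    select * from temp_view
""")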


Instead, you can create a view directly on tables (including joins), which can then be accessed in the SQL warehouse. You do have to recreate the Spark logic in SQL, but you can leverage the Databricks Assistant to convert it with less effort.

create or replace view hive_metastore.labuser8777982_1736395171_wa99_da_adewd_lab.sys_catalog_privileges_new as
select *
from system.information_schema.catalog_privileges -- this is an actual table in the system schema
where privilege_type = 'EXECUTE'
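Tying this back to the earlier active_view example: a minimal sketch would be to write the DataFrame behind the temp view out as a table, and optionally define a view on that table, so it becomes visible in the SQL editor and Catalog Explorer. The catalog and schema names below are placeholders.

# Persist the DataFrame that currently backs the temp view as a managed table
active__df.write.format("delta").mode("overwrite").saveAsTable("my_catalog.my_schema.active_users")

# Optionally, create a persistent view on the table; this is stored in the
# metastore and can be browsed and queried from the SQL editor
spark.sql("""
    create or replace view my_catalog.my_schema.active_users_view as
    select * from my_catalog.my_schema.active_users
""")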


 

Regards,

Hari Prasad.



