cancel
Showing results for 
Search instead for 
Did you mean: 
Warehousing & Analytics
Engage in discussions on data warehousing, analytics, and BI solutions within the Databricks Community. Share insights, tips, and best practices for leveraging data for informed decision-making.
cancel
Showing results for 
Search instead for 
Did you mean: 

Re-Using Datasets inside the Same SQL Dashboard

mtreigelman
New Contributor III

Hi folks, I am creating a SQL dashboard and want to know if I can re-use datasets within the same dashboard. 

The screenshot below captures what I would like to do pretty well, but to summarize... I need to run an computationally expensive query and want to visualize those results in a figure on the dashboard. I then need to aggregate that table further to create additional figures. Is there a way I can call the first query again so that my SQL warehouse does not need to run that expensive first query twice? 

databricks_dash_dataset_recycle_question.png

6 REPLIES 6

Walter_C
Databricks Employee
Databricks Employee

You should be able to do it by using static widgets as mentioned in docs https://docs.databricks.com/en/dashboards/index.html#static-widget-parameters  

seans
New Contributor III

This still needs a solution. @Walter_C gave a reference to having two visualizations use the same dataset. This question (and my current problem) is about referencing a SQL dataset as if it was a view inside another SQL dataset in the dashboard. Dataset "my_agg" references dataset "my_raw" in the same dashboard (this is before any visualizations are defined, so parameters and widgets have no bearing yet). 

sonynbcu
New Contributor II

Bumping this. I also have a use case where it would be beneficial to reference one dataset from another dataset.

rskuntoji24
New Contributor II

Would appreciate a solution on this at the earliest as well. Thanks!

mtreigelman
New Contributor III

@rskuntoji24 My current workaround has been to create the first query as a materialized view in my catalog, and then run a workflow daily to update the view. 

The downsides to this strategy is that it might not be practical for streaming data (not an issue for me personally), and creating a materialized view is saving the data that the query outputs somewhere (so storage requirements are increasing). I am not sure exactly where materialized view data is being stored, but I would guess somewhere on your cloud provider... perhaps someone like @Walter_C  could confirm. 

SFDataEng
New Contributor III

Agreement

Yes, a feature to support nested datasets in Databricks AI/BI dashboards would seem convenient, e.g. as in both examples in the OP's screenshot.

Specific need

A use case that comes to mind would be to avoid the maintenance of parallel code across dashboard SQL datasets that contains parameters driven by dashboard widgets/filters, which might ideally be defined and located once in an "inner" dataset.

Why it's needed

The maintenance inefficiencies that arise in the absence of such a feature can include, for example,

  • keeping a common CTE in sync across datasets, and also
  • linking each and every dashboard filter to each and every instance of that parameter's usage across all datasets.

The pain

This parallel definition has potential to multiply out rather unsuitably for a dashboard with many (common) dashboard-wide filter widgets/parameters and many charts, each having its own custom dataset meticulously defining metrics, in terms of dashboard construction, maintenance, and reliability.  Nonetheless, the "hard/inefficient-mantenace way" seems possible as a brute force workaround to the lack of "nested dataset" support.

Scope and non-alternatives

Obviously, this problem/opportunity applies mainly to datasets that are "dynamic" and not suitable for being fully pre-materialized; pipelining a table or materialized view does not solve this problem.  To be clear, regular views also are not suitable because views cannot contain parameters.  (If view supported `:parameters` to be filled in at run time, that could be an option).

Example in another dashboard tool

In contrast, and perhaps for inspiration, dashboarding in Apache Superset supports a built-in dataset() macro for n-level nested dataset usage, which works quite nicely.  This is basically the ask in the OP -- to be able to query/reference another dataset in the same query.

Possible alternative approach in Databricks

A potential alternative that appears to exist within the Databricks ecosystem for further exploration might be the Unity catalog metric views.  Still, possibility of nested Dashboard datasets, to make use of "common parameters" in the core/central/sub-datasets, would be a most welcome feature addition, since some use cases might not rise to the occasion of adopting a whole "metrics framework" as it were.