11-25-2024 09:06 AM
Hi folks, I am creating a SQL dashboard and want to know if I can re-use datasets within the same dashboard.
The screenshot below captures what I would like to do pretty well, but to summarize... I need to run an computationally expensive query and want to visualize those results in a figure on the dashboard. I then need to aggregate that table further to create additional figures. Is there a way I can call the first query again so that my SQL warehouse does not need to run that expensive first query twice?
11-25-2024 03:49 PM
You should be able to do it by using static widgets as mentioned in docs https://docs.databricks.com/en/dashboards/index.html#static-widget-parameters
12-09-2024 06:48 PM
This still needs a solution. @Walter_C gave a reference to having two visualizations use the same dataset. This question (and my current problem) is about referencing a SQL dataset as if it was a view inside another SQL dataset in the dashboard. Dataset "my_agg" references dataset "my_raw" in the same dashboard (this is before any visualizations are defined, so parameters and widgets have no bearing yet).
02-06-2025 01:55 PM
Bumping this. I also have a use case where it would be beneficial to reference one dataset from another dataset.
03-13-2025 05:34 AM
Would appreciate a solution on this at the earliest as well. Thanks!
03-18-2025 08:16 AM
@rskuntoji24 My current workaround has been to create the first query as a materialized view in my catalog, and then run a workflow daily to update the view.
The downsides to this strategy is that it might not be practical for streaming data (not an issue for me personally), and creating a materialized view is saving the data that the query outputs somewhere (so storage requirements are increasing). I am not sure exactly where materialized view data is being stored, but I would guess somewhere on your cloud provider... perhaps someone like @Walter_C could confirm.
6m ago - last edited 5m ago
Yes, a feature to support nested datasets in Databricks AI/BI dashboards would seem convenient, e.g. as in both examples in the OP's screenshot.
A use case that comes to mind would be to avoid the maintenance of parallel code across dashboard SQL datasets that contains parameters driven by dashboard widgets/filters, which might ideally be defined and located once in an "inner" dataset.
The maintenance inefficiencies that arise in the absence of such a feature can include, for example,
This parallel definition has potential to multiply out rather unsuitably for a dashboard with many (common) dashboard-wide filter widgets/parameters and many charts, each having its own custom dataset meticulously defining metrics, in terms of dashboard construction, maintenance, and reliability. Nonetheless, the "hard/inefficient-mantenace way" seems possible as a brute force workaround to the lack of "nested dataset" support.
Obviously, this problem/opportunity applies mainly to datasets that are "dynamic" and not suitable for being fully pre-materialized; pipelining a table or materialized view does not solve this problem. To be clear, regular views also are not suitable because views cannot contain parameters. (If view supported `:parameters` to be filled in at run time, that could be an option).
In contrast, and perhaps for inspiration, dashboarding in Apache Superset supports a built-in dataset() macro for n-level nested dataset usage, which works quite nicely. This is basically the ask in the OP -- to be able to query/reference another dataset in the same query.
A potential alternative that appears to exist within the Databricks ecosystem for further exploration might be the Unity catalog metric views. Still, possibility of nested Dashboard datasets, to make use of "common parameters" in the core/central/sub-datasets, would be a most welcome feature addition, since some use cases might not rise to the occasion of adopting a whole "metrics framework" as it were.
Passionate about hosting events and connecting people? Help us grow a vibrant local community—sign up today to get started!
Sign Up Now