Technical Blog
Explore in-depth articles, tutorials, and insights on data analytics and machine learning in the Databricks Technical Blog. Stay updated on industry trends, best practices, and advanced techniques.
holly
Databricks Employee

Firstly. 

Don’t make a demo and then release it under its old name Lakeview, 6 weeks before the name changes to AI/BI, because now no one in the future knows what you're on about.

holly_0-1728986657129.png

Secondly, I spoke at length about how to make a dashboard: which buttons to click, how to change the colours, how to share it. But I think I missed the why. Especially when there are so many dashboarding tools out there, it can be easy to mistake it for 'just another hosted bar chart'.

In this blog I'd like to cover off the why, and challenge the way that we view and use BI tools.

Lakeview was rebranded to AI/BI and announced with much fanfare at Data + AI Summit. It’s pretty much the golden child of AI use cases:
Helps people bring meaning to their data
Brings insights to the masses 
Starts to remove the bottleneck of skilled practitioners

So what are these BI features?

  1. Dashboards - you know what a dashboard is
  2. Genie spaces - the ability to use plain English to ask questions about your data

holly_1-1728986714548.png

Once you or a data SME has set up a Genie space, you can ask it questions about your data. You can ask follow-on questions, ask for further breakdowns, and ask it to plot the data in various ways to help visualise the results.

…and the AI bit? 

The AI fairy has sprinkled her magic throughout this part of the product in some obvious and not so obvious ways.

The Obvious AI

With dashboards, once you’ve pointed the dashboard at the right dataset, you can use plain English to describe the chart you want.

holly_3-1728989100008.png

This feature is ...fine. I know other BI tools have started doing this, but unless you’re building dashboards all day every day, I don’t think this is going to save much time.

Much more impressive is Genie, a RAG-type chatbot for your data. I’ve had complaints that it’s “baby stuff” and no serious developer would use it. But hear me out: this isn’t for serious developers who can write their own Scala to find answers. This is for the management types who want answers to questions but don’t want to bother an analyst to get them. Let’s not forget that in the data world, most managers started their careers doing analysis in Excel or with very basic SQL. They are data literate, they’ve just not flexed that muscle in quite a while.

There's a Genie integration for Dashboards. If you're supplied with a dashboard that sort of answers your question, but you need some more detail or slicing & dicing, you can hit 'Ask Genie' instead of going back to the Dashboard creator.

holly_2-1728986770840.png

The reaction to this feature surprised me. Management types admit to loving Genie, but in hushed clandestine tones, almost like they’ve found some secret hack to getting instant answers. My hypothesis is that it feels like a workaround for bureaucracy: no longer do their questions need to get ticketed, scoped, prioritised and allocated to the analysis team, they can just …get answers.

The Not So Obvious AI 

It’s easy to roll your eyes at an assistant that saves 60 seconds on chart creation, but in Databricks a lot of the AI is working hard underneath to add performance and context to your data.

holly_6-1728989595366.jpeg

AI generated column descriptors 

Most data is labelled terribly. And even when there is a data dictionary, odds are it’s in an Excel file somewhere.

The AI assistant can make an educated guess about what your tables and columns contain, but crucially, it gives you the chance to overwrite or update what it has written. These descriptions are stored in Unity Catalog and add context to interactions with the assistant.

holly_5-1728989141274.png

This doesn’t have to be done before using Dashboards and Genie Spaces as it’ll be done automatically. However, if you do it beforehand to include changes and corrections then the AI assistant can be a lot smarter in the way it interacts with humans. 
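If you'd rather write the corrections yourself than click through the UI, here is a minimal sketch of what that could look like. It only builds the SQL text; the table name, column names and descriptions are made up for illustration, and the backslash escaping is my assumption about how your SQL dialect handles quotes, so check it before running anything against real tables.

```python
def sql_literal(text: str) -> str:
    """Wrap text in single quotes, backslash-escaping backslashes and quotes.

    Assumption: the target SQL dialect accepts backslash escapes in strings.
    """
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def comment_statements(table, table_comment, column_comments):
    """Build one table-level comment statement plus one per column."""
    statements = [f"COMMENT ON TABLE {table} IS {sql_literal(table_comment)}"]
    for column, comment in column_comments.items():
        statements.append(
            f"ALTER TABLE {table} ALTER COLUMN {column} COMMENT {sql_literal(comment)}"
        )
    return statements


# Hypothetical table and descriptions, purely for illustration.
for statement in comment_statements(
    "main.sales.orders",
    "One row per customer order",
    {"amt": "Order value in GBP", "cust_id": "Customer identifier"},
):
    print(statement)
```

In a notebook you would pass each statement to whatever runs SQL for you, rather than printing it.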

AI in the Query Engine

Underneath the compute are complex algorithms figuring out the fastest way to process your data. Yes, some of this is built on Apache Spark, but with additions like faster planning and faster data retrieval. Rather than one giant optimization, there are lots of smaller improvements coming together. We used to publicise these improvements individually, but people got fatigued by the many “obscure improvement makes queries faster” posts, so at Data + AI Summit we were happy to announce the 73% improvement over 2 years of development.

AI for Data Layouts

Bad layouts lead to bad performance. Back in the day it was easy to shoot yourself in the foot with over-partitioning and a lacklustre OPTIMIZE or VACUUM schedule. I don’t think it’s reasonable that someone who wants to make a few dashboards should have to learn about file protocols and statistics, and with liquid clustering and predictive optimization this should become a thing of the past.

If you’re reading this and thinking “what on earth is a data layout?” I aggressively recommend you find someone who can help you turn on Predictive Optimization. It’ll speed up your queries and save a bit of money on storage. 
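For the person you rope in to help, a rough sketch of the two statements involved might look like this. The schema, table and column names are hypothetical, and I'm assuming Unity Catalog managed tables and that predictive optimization is allowed at the account level; treat it as a starting point for a conversation, not a recipe.

```python
def layout_statements(schema, table, cluster_columns):
    """Build SQL text that turns on predictive optimization for a schema
    and sets liquid clustering columns on one table in it."""
    columns = ", ".join(cluster_columns)
    return [
        f"ALTER SCHEMA {schema} ENABLE PREDICTIVE OPTIMIZATION",
        f"ALTER TABLE {schema}.{table} CLUSTER BY ({columns})",
    ]


# Hypothetical names, purely for illustration.
for statement in layout_statements("main.sales", "orders", ["order_date", "region"]):
    print(statement)
```

Pick clustering columns that your dashboards actually filter on; clustering on columns nobody filters by won't buy you much.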

AI for Workload Management

Your dashboard refreshes and Genie questions shouldn’t be held up behind some behemoth job that someone else is running. If you’re using SQL warehouses (serverless or not) you’ll benefit from smarter utilisation, queuing for better compute elasticity, and routing. This will benefit scheduled refreshes as well as interactive work.

Word on the street

Now that AI/BI has been out for a while, I’ve had the chance to speak to people in the wild about what it does, and doesn’t, do. Here are the most overlooked features.

It’s laughably easy to get started 

If you’ve worked in an enterprise, you know damn well the pain involved getting a licence allocated and then connecting system A to system B to move some data. 

Dashboards and Genie spaces don’t have a licensing model. Instead you pay for the compute that you use. 

This is great for eliminating friction for anyone who wants to get started making business decisions based on data.

So how much is compute? 

Most dashboards or Genie spaces will only need a 2XS Warehouse to process data, billed at ~$3/hour. Warehouses are billed down to the second, so a daily refresh that takes 3 minutes will cost far less than a whole hour.

Developing a dashboard needs compute too. So if a dashboard took 4 hours to develop, that’s ~$12.

These numbers are all estimates. They change depending on region, cloud and discounts, but also on whether you share your warehouses or need something bigger for heavier processing. For a realistic estimate, use the Databricks pricing page, or contact your account team.
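To make the per-second billing concrete, here is the back-of-envelope arithmetic as a small sketch. The ~$3/hour rate is the rough figure from above, not a quote, and I'm ignoring any idle time the warehouse spends running before it auto-stops, which in practice adds a little on top.

```python
import math


def warehouse_cost(runtime_seconds, dollars_per_hour=3.0):
    """Estimate cost for a run billed per second at an hourly rate."""
    billed_seconds = math.ceil(runtime_seconds)
    return billed_seconds * dollars_per_hour / 3600


daily_refresh = warehouse_cost(3 * 60)      # one 3-minute refresh
development = warehouse_cost(4 * 60 * 60)   # 4 hours of dashboard building

print(f"One daily refresh: ${daily_refresh:.2f}")          # $0.15
print(f"30 days of refreshes: ${30 * daily_refresh:.2f}")  # $4.50
print(f"4 hours of development: ${development:.2f}")       # $12.00
```

So under these assumptions a month of daily refreshes costs less than half of what building the dashboard did.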

You get the same goodies you would from ETL in Databricks

We’re still working in the same tech stack that the Data Engineers and Machine Learning Engineers are using. That means a lot of the tools they use are also available for Dashboards and Genie Spaces. And technically PowerBI or any other third party tool, but that’s a story for another time.

You can do advanced ETL in the tool

Any advanced SQL you might do for a DE pipeline, you can use for Dashboarding and Genie. This means joins are optimized, you can use AI SQL functions, complex type handling, you name it, you could do it for a dashboard.

Sometimes you need syntax precision to make the bar chart display exactly what you need. I once made an esoteric chart of consultant hours used above the line, and remaining hours below … with complex logic to add more hours accruing each month and some hours expiring. It was a pig to make but greatly appreciated by a small number of people.

Buuutttt… just because you can doesn’t mean you should every time. If this is ETL that needs to be replicated elsewhere, do your coworkers a favour and write the results somewhere. Don’t be a silo maker.

You can ramp up the amount of data used

Again this is the same chunky data engineering engine underneath that people use to process terabytes of data.

…but do consider there’s a front end here. Just because you have 100k records doesn’t mean your browser will have the oomph to display all of them.
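The usual fix is to aggregate before anything reaches the chart. In a real dashboard you would do this in the dataset's SQL, but the idea is easy to show in a few lines of Python. The records here are synthetic and the daily grain is just an assumption for the example.

```python
from collections import defaultdict
from datetime import date, timedelta


def daily_totals(records):
    """Collapse (day, amount) rows into one total per day, sorted by day."""
    totals = defaultdict(float)
    for day, amount in records:
        totals[day] += amount
    return sorted(totals.items())


# Synthetic data: 100k rows spread over 365 days, each worth 1.0.
start = date(2024, 1, 1)
records = [(start + timedelta(days=i % 365), 1.0) for i in range(100_000)]

summary = daily_totals(records)
print(len(records), "rows in,", len(summary), "points out")
```

A few hundred points is something a browser can draw without breaking a sweat, and the line chart looks the same.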

Git-able

I don’t think that’s a word. But dashboards can be checked into GitHub, which is great for version control and moving a dashboard through dev, test and prod environments.

Unity Catalog funsies

Lineage! Discoverability! Access Control! 

All these things come as standard with no additional config or setup needed. 

It’s not a reskinned ReDash

In consulting we use the phrase “it was the right decision at the time”, and whilst the old ReDash dashboards had their charm, their drawbacks did grate. Cross filtering was a nightmare, browser performance wasn’t snappy, moving between dev/test/prod was for the brave, and although not ReDash’s fault, the warehouse performance just wasn’t that great on small data.

All of this has been addressed with the new Dashboards, and usability and performance are still at the forefront of development. I once trashed dashboard performance by including a column that contained an entire binary file. The engineering team took it very seriously, and it got fixed almost immediately.

Genie is a helper, not a fortune teller

Remember when Copilot came out, and everyone thought it was the end of Software Engineers …and then it wasn’t? The bubble burst and people realised that Software Engineers don’t just write code all day. It’s easy to get carried away with what AI can do, so I’m here to keep your enthusiasm in check.

What Genie can do

“What is the breakdown of the thing?”
“Can you plot the thing against the other thing?”
“Can you add this other category to the chart?”

What Genie can’t do

“Why does the line go up?” Fundamentally, the reason the line goes up might not even be in the dataset. Ice cream sales are unlikely to be in a drowning dataset.

“Are these things correlated?” There’s no statistical modelling as of yet, but I imagine it’s on a roadmap somewhere.

“Combine this data with other data” Although data can be joined together, this has to be done in the setup with SQL and can’t be done with plain English instructions.
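On the correlation question: until something like it lands in the product, you can export the numbers and check for yourself. Below is a minimal sketch of a Pearson correlation in plain Python. The monthly figures are invented for illustration, and remember that a high score still won't tell you why the line goes up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length, at least 2 long")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return covariance / math.sqrt(spread_x * spread_y)


# Invented monthly figures, purely for illustration.
ice_cream_sales = [120, 150, 310, 480, 520, 330]
drownings = [2, 3, 6, 9, 11, 5]
print(f"r = {pearson(ice_cream_sales, drownings):.2f}")
```

Both series rise in summer, so the score comes out high even though neither causes the other.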

I will caveat this by saying this is correct as of Oct-24. I know the team has received many enthusiastic requests to incorporate new features, so much like the original video, this will probably be out of date in 6 weeks too.

Death to the single use Dashboard

I love Genie spaces as a way to prevent the single use dashboard from being created. You know the ones: someone senior has a specific question one time, so an entire dashboard gets created. Genie is meant to be the antidote to that. People can iterate faster on their questions and get to the crux of their issues without needing to wait for a 2 week sprint to update the dashboard.

People have been harping on about data democratization for a long time now, and whilst People and Process are not absolved from this delivery, I do think that AI/BI is the Platform to make this a reality. 

Further Reading

Genie Documentation - Dashboard Documentation - Announcement Blog