Technical Blog
Explore in-depth articles, tutorials, and insights on data analytics and machine learning in the Databricks Technical Blog. Stay updated on industry trends, best practices, and advanced techniques.
josh_melton
Contributor II

Introduction

Data and AI media will try to convince you of two terribly misguided ideas about AI’s impact on analytics. First of all, there’s the idea that prompt engineers who know just the right phrase to unlock the secret powers of generative AI will be the developers of tomorrow. LLMs are really good at producing many variants of text, and in the future prompt engineering will be done by language models that are trained to produce optimal prompts (check out DSPy as one example of “programming, not prompting” AI systems). For this reason, courses and blogs telling you to memorize various sets of instructions are simply a waste of time. 

The second misguided idea is that AI will wipe out analyst jobs because AI tools can generate code, particularly for a highly structured language like SQL. This is a common fear among people who are early in their analytics journey - I hear customers wonder if they should even bother learning analyst skills like data modeling or SQL (to be clear, there has never been a better time to learn technical skills!). Both of these ideas give too much credit to LLMs’ ability to reason, and not enough credit to analysts, whose jobs consist of far more than writing generic SQL queries. Furthermore, non-technical users already rely on more experienced programmers to transform and model data. With the emergence of generative AI, non-technical users become far more able to create and harness business insights independently, because they are no longer restricted by the bottleneck of waiting for help from data modelers and ETL developers.

It is true, however, that all of us will have to know how to guide AI systems in the right way. This sounds similar to the concept of a prompt engineer, but the skill set required won’t be knowing whether a model works better after being asked to “take a breath and think step by step” rather than being threatened with a high-stakes scenario. Guiding AI systems will be about rigorously gathering requirements, converting requirements into precise success criteria, and providing the system with the right context to meet those criteria. Rather than prompting, the mode of communication will be plugging in the right business context, providing the right data model, or iteratively designing the right abstractions for generative AI systems to be successful, all with the goal of eliminating the need to prompt the system in just the right way.

Genie Spaces

The role of an analyst in creating an AI/BI Genie Space is a perfect example of these ideas. Genie is a compound AI system that, given a handful of tables, can write queries to answer simple questions asked in natural language. However, simply clicking “Create Genie Space” does not mean that an analyst’s job is done! Genie (like any other compound AI system) cannot anticipate the questions that an executive is likely to ask, does not know how to model complex formulas specific to your business, and cannot do the high-level planning required to solve abstract problems. Are we past the days of having to memorize the specifics of SQL syntax? Maybe so. Are we past the days of analysts doing the critical thinking required to answer a business’ most important questions? Definitely not. Like most jobs impacted by technology shifts, analysts’ jobs will be both unrecognizable and orders of magnitude more impactful as generative AI finds a foothold in their daily workflow.

In this article, we’ll walk through how an analyst might magnify their impact using Genie Spaces and Databricks Assistant. You can follow along by cloning this IoT example, which will set up a sample dataset, pipeline, and dashboard with a few clicks. To skip to the Genie demo, simply click “Run All” in notebook 00_setup, run the DLT pipeline it creates, and click “Run All” in notebook 04_actionable_insights. Follow the dashboard link in notebook 04_actionable_insights to test some of the approaches from this article for yourself.

Getting Started with AI/BI

The dashboard development process is our first example of how an analyst can begin to guide the AI system to best answer the questions a business is asking. To build this dashboard, we start by working with consumers of the report to understand their requirements - what sort of questions do they want to answer with this report? Given these requirements, an analyst needs to work with the owners of the upstream data to select the right tables to answer those questions, which requires the context the analyst has about the business. With the help of Databricks Assistant, an analyst can simply describe their SQL query (assuming it’s not overly complex), describe their chart, and immediately have a query and chart generated. These steps, however, were never the most important or difficult part of the job. As the more menial aspects of the analyst role are automated, the role will shift toward the more interesting challenges: combining domain-specific expertise, critical thinking, and communication.

[Image: josh_melton_0-1724170554601.png]

The New Data Analyst Skill Set

The first skill of the New Data Analyst is adding business-specific context to enable generative AI systems to answer common questions more accurately. To start, the dashboard we’ve built can serve as the foundation for creating our text-to-insights Genie. This allows the consumers of our dashboard to ask ad-hoc questions of the data using natural language. Our Genie Space will be created with each of the queries that our dashboard references already embedded into the context of the Space. If your business is like every data organization I’ve ever worked with, however, the names of some of your tables and columns are borderline nonsensical. The best AI in the world won’t know that dim_met_dev_id represents device ID, which is the same as ESN, which means Engine Serial Number in natural language. Similarly, it’s likely that you have columns with specific filtering requirements, such as using a three-letter ISO country code instead of the colloquial country name. By applying your understanding of your data and domain, you can curate instructions that enable Genie to magnify your ability to provide useful answers to more specific questions.
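One way to keep such instructions consistent is to maintain the business glossary as data and render it into plain-English lines. The snippet below is a minimal sketch of that idea, not a Genie API: the `country` column name, the wording of each entry, and the `build_instructions` helper are all assumptions made for illustration, and the resulting text would simply be pasted into the Space's instructions.

```python
# Hypothetical glossary: column names and wording are illustrative only.
COLUMN_GLOSSARY = {
    "dim_met_dev_id": "device ID, also called ESN (Engine Serial Number)",
    "country": "three-letter ISO country code, e.g. 'USA' rather than 'United States'",
}


def build_instructions(glossary: dict[str, str]) -> str:
    """Render one plain-English instruction line per glossary entry."""
    lines = [
        f"- Column `{name}` means: {meaning}."
        for name, meaning in glossary.items()
    ]
    return "\n".join(lines)


# The rendered text is meant to be reviewed by the analyst before use.
instructions = build_instructions(COLUMN_GLOSSARY)
print(instructions)
```

Keeping the glossary in one place means the same definitions can be reused for column descriptions and documentation, rather than being retyped for each Space.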

[Images: josh_melton_1-1724170554693.png, josh_melton_2-1724170554643.png]

The second emergent skill of the New Data Analyst is mitigating the risk of our system hallucinating by providing examples and defining trusted assets. It’s likely that many of the common relationships in your data model are difficult for Genie to piece together, even given some hints. If your data is in fact and dimension tables, Genie might not determine the correct join key, or that a join is required in the first place. The more complex the metric or data model, the more guidance Genie is likely to need. In this instance, analysts will be tasked with anticipating the common requests and providing examples for Genie to follow. The queries from our original dashboard are automatically provided as examples, but for deeper analysis we might add extra examples that demonstrate a complex metric or series of joins. For the particularly common or critical examples, you can create trusted assets. We can use trusted assets to define templated tools that Genie can use to answer questions. For example, a formula for calculating net profit within a certain date range might require joins across several tables and plugging start and end dates into the correct portions of the query. It’s unlikely that Genie will always get the specifics of this pivotal formula completely correct. However, if we provide a trusted asset that boils down the amount of “thinking” required from Genie to simply providing start and end dates to a predefined formula, we greatly reduce the room for error. With example queries and trusted assets, analysts have another tool to amplify their ability to answer business questions while hedging against the risks of hallucinations in AI systems.
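The idea behind a templated tool can be sketched outside of Genie with a parameterized query: the joins and the formula are fixed by the analyst, and only the start and end dates vary. The example below is a minimal sketch using Python's built-in sqlite3 module as a stand-in for the warehouse. The `sales` and `costs` tables, their columns, the sample figures, and the simplified definition of net profit are all hypothetical; in a real Space the same query would be registered as a trusted asset rather than wrapped in a Python function.

```python
import sqlite3

# Hypothetical schema and figures, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (sale_date TEXT, product_id INTEGER, revenue REAL);
    CREATE TABLE costs (product_id INTEGER, unit_cost REAL);
    INSERT INTO sales VALUES ('2024-01-05', 1, 100.0),
                             ('2024-01-20', 2, 250.0),
                             ('2024-02-10', 1, 100.0);
    INSERT INTO costs VALUES (1, 60.0), (2, 150.0);
""")

# The join and the formula are fixed; only the two dates are parameters.
NET_PROFIT_SQL = """
    SELECT COALESCE(SUM(s.revenue - c.unit_cost), 0.0)
    FROM sales AS s
    JOIN costs AS c ON c.product_id = s.product_id
    WHERE s.sale_date BETWEEN :start_date AND :end_date
"""


def net_profit(conn: sqlite3.Connection, start_date: str, end_date: str) -> float:
    """Run the predefined formula for an inclusive ISO-8601 date range."""
    params = {"start_date": start_date, "end_date": end_date}
    return conn.execute(NET_PROFIT_SQL, params).fetchone()[0]


january_profit = net_profit(conn, "2024-01-01", "2024-01-31")
print(january_profit)
```

Because the caller can only supply dates, there is no opportunity to pick the wrong join key or misplace a filter, which is the reduction in “thinking” that the paragraph above describes.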

[Image: josh_melton_3-1724170554662.png]

The third emergent skill of the New Data Analyst is actively managing the metadata of the tables and columns themselves. If we apply annotations to a column, Genie and other users alike will have more clues to devise the appropriate query. For example, a description of the temp column can specify that the column refers to the ambient temperature in degrees Fahrenheit. To make this easier, Databricks AI-generated documentation uses a model fine-tuned specifically to generate descriptions of tables and columns, which you can run with the click of a button. We can also provide guidance on how to join tables together by adding primary key / foreign key relationships. Determining how to join tables is another example of something that likely comes naturally to an analyst who is familiar with the data but won’t be possible for Genie without thoughtful indicators. Finally, it may be useful to create new views altogether with unnecessary columns removed to reduce the “noise” that Genie needs to sift through to answer questions. This also lets you keep descriptions on the underlying tables that are most useful for data engineers or other analysts, while writing descriptions on the Genie views that are specifically geared towards pointing Genie in the right direction.
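The curated-view idea can be illustrated with a small, self-contained sketch. It uses Python's built-in sqlite3 module as a stand-in for the warehouse, and everything about the schema is an assumption: the `raw_sensor` table, the pipeline-only columns, and the `genie_sensor` view are hypothetical. SQLite has no column descriptions, so descriptive aliases stand in for them here; on Databricks you would more likely attach comments to the view's columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical raw table: cryptic names plus pipeline-only columns.
    CREATE TABLE raw_sensor (
        dim_met_dev_id TEXT,
        temp REAL,
        _ingest_batch INTEGER,
        _src_file TEXT
    );
    INSERT INTO raw_sensor VALUES ('ESN-001', 71.5, 42, 'a.json');

    -- Curated view: drops the noise and exposes self-describing names.
    CREATE VIEW genie_sensor AS
    SELECT dim_met_dev_id AS engine_serial_number,
           temp AS ambient_temp_fahrenheit
    FROM raw_sensor;
""")

# Inspect what a consumer of the view would actually see.
cursor = conn.execute("SELECT * FROM genie_sensor")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()
print(view_columns)
print(view_rows)
```

The raw table stays untouched for data engineers, while the view presents only the columns and names that are useful for answering business questions.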

[Image: josh_melton_4-1724170554677.png]

Once a Genie Space is deployed, users will inevitably think of new questions that Genie hasn’t been primed to answer, regardless of how well requirements were defined. Genie Spaces have a much tighter feedback loop than traditional dashboarding tools. Therefore, the final emergent skill for the New Data Analyst is monitoring and incorporating feedback from user questions into the model. We begin this feedback loop by checking the Monitoring tab of the Genie Space. Similar to the requirements gathering during the earlier stages of development, the Monitoring tab will surface user feedback, common questions, and challenges found in answering them. We can use the monitoring in Genie to determine the most important enhancements to make in the next version of the Space.

[Image: josh_melton_5-1724170554675.png]

Conclusion

Cutting-edge generative AI tools like Genie serve to further deepen and enhance the relationship between important business questions and the data that answer them. The new data analysts who are able to create this bridge will be invaluable to their organizations. While our relationship with, for example, writing SQL is certainly changing, we will need to think critically to solve dynamic business problems for a long time to come. With the barrier to entry for learning programming languages approaching zero, analysts (and likely other technical roles) will be able to shift their time to these more interesting parts of their jobs. Importantly, the value they can produce will grow massively due to the automation of many of the time-consuming tasks that were traditionally required. For this reason, demand for analysts won’t be drying up any time soon. If you’re looking to gain experience or insight into how the field will evolve, try designing a Genie Space to answer your business’ most important questions.
