Technical Blog
Explore in-depth articles, tutorials, and insights on data analytics and machine learning in the Databricks Technical Blog. Stay updated on industry trends, best practices, and advanced techniques.
brett-aulbaugh
Databricks Employee

Energy Industry Chatbots in Minutes with Databricks Marketplace, S&P Global, and Genie Rooms

In this blog, we will demonstrate how to combine Databricks Marketplace data from S&P Global with Databricks Genie Rooms to create powerful, industry-specific chatbots in a matter of minutes that transform how your team interacts with industry data. 

Introduction to Databricks Marketplace and Delta Sharing

Databricks Marketplace represents a paradigm shift in how organizations access and utilize data. It's an open marketplace for data, analytics, and AI assets powered by Delta Sharing. In the past, most data sharing approaches required complicated ETL processes or expensive data replication. Databricks Marketplace allows organizations to obtain datasets, ML models, notebooks, applications, and dashboards without proprietary platform dependencies. In practice, this means data can be shared without having to move it or build complicated ETL around it.

What makes this truly revolutionary is the underlying Delta Sharing protocol, an open standard for secure data sharing that enables organizations to share data with other entities regardless of which computing platforms they use. This technology allows for sharing terabyte-scale datasets reliably and efficiently by leveraging cloud storage systems while maintaining governance, tracking, and auditing capabilities.

brettaulbaugh_0-1748444382779.png

S&P Global: A Data Powerhouse

S&P Global is one of the premier data providers in the Databricks Marketplace ecosystem. It provides numerous datasets that are particularly important for energy companies and that serve use cases such as commodity trading, competitor analysis, and regulatory reporting. Historically, energy companies would ingest S&P Global data through complex ETL processes, requiring dedicated engineering resources to transform and load the data into their analytical environments. This created data silos, latency issues, and governance challenges that limited the value organizations could extract from this rich information source.

Today, through the Databricks Marketplace, S&P Global provides an open solution to securely share their Global Commodity Insights data without replication or complicated ETL. Energy companies can now access this data directly in their lakehouse environment with a few clicks.

brettaulbaugh_1-1748444382791.png

Introduction to Databricks Genie Rooms

Now that we understand how to acquire high-quality energy-related data, let's explore how to make it accessible to business users through Databricks Genie Rooms. Genie is an AI-powered Databricks feature that allows business teams to interact with data using natural language. Think of it as having your own personal data analyst who is always ready to get you the data you need, whether in a table or in a chart.

brettaulbaugh_2-1748444382686.png

At its core, Genie is a compound AI system that combines metadata from Unity Catalog with user-provided context to translate business questions into insights. When a business user asks a question like "What were our top 5 crude oil suppliers last quarter by volume?", Genie does the following:

  1. Parses the natural language request from the user.
  2. Converts the request to SQL using the metadata and structure of the data. This is known as “Text-to-SQL”.
  3. Executes read-only queries on your SQL warehouse.
  4. Returns results and visualizations based on the query.

See the chart below for a visual of the above steps.

brettaulbaugh_3-1748444382699.png
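
To make the four steps concrete, here is a minimal sketch of the same flow in Python. It is not how Genie is implemented: the `deliveries` table, its columns, and the canned question lookup are hypothetical stand-ins, with an in-memory SQLite database playing the role of the SQL warehouse and a dictionary playing the role of the Text-to-SQL model.

```python
import sqlite3

# Toy stand-in for the SQL warehouse: one table with hypothetical columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE deliveries (supplier TEXT, volume REAL)")
conn.executemany(
    "INSERT INTO deliveries VALUES (?, ?)",
    [("A", 100.0), ("B", 250.0), ("A", 50.0), ("C", 75.0)],
)


def text_to_sql(question: str) -> str:
    """Steps 1-2 (stub): a real system would use an LLM plus table metadata."""
    canned = {
        "top suppliers by volume": (
            "SELECT supplier, SUM(volume) AS total_volume "
            "FROM deliveries GROUP BY supplier ORDER BY total_volume DESC"
        ),
    }
    return canned[question.strip().lower()]


def run_read_only(sql: str):
    """Step 3: refuse anything that is not a single SELECT statement."""
    statement = sql.strip().rstrip(";")
    if not statement.lower().startswith("select") or ";" in statement:
        raise ValueError("only single SELECT statements are allowed")
    return conn.execute(statement).fetchall()


def ask(question: str):
    """Steps 1-4 end to end: question in, result rows out."""
    return run_read_only(text_to_sql(question))


rows = ask("Top suppliers by volume")
print(rows)  # [('B', 250.0), ('A', 150.0), ('C', 75.0)]
```

The read-only guard is the part worth noticing: because generated SQL is only ever allowed to read, a badly translated question can return a wrong answer but cannot modify the underlying data.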

The magic happens through Unity Catalog integration, where Genie leverages table metadata, column descriptions, and defined relationships to understand your data model. This makes it particularly powerful for domain-specific applications where specialized terminology and data relationships can be challenging for generic AI systems to understand.

How to Create an Energy Industry Chatbot Using Databricks Marketplace and Genie Rooms in Minutes

Now let's walk through the process of creating an energy industry-specific chatbot.

Step 1: Browse Datasets and Request Access to Data

Your first step is to explore the Databricks Marketplace for S&P Global datasets relevant to your use case. In the marketplace, browse or search for S&P Global's energy-related datasets.

For this particular use case, we will be leveraging the Platts eWindow Market Data. This dataset “brings an immediacy to the price discovery process that can't be experienced anywhere else. Its near real-time structured layout gives an enhanced, at-a-glance view of all bids, offers and transaction data shared during the Platts Market on Close (MOC) process.” Use cases include asset valuation, risk and supply chain analytics, and in-depth financial analyses.

Once you've identified the datasets you need, requesting access is straightforward, especially if your organization already has an S&P Global subscription. Click on the dataset listing to view details and request access.

brettaulbaugh_4-1748444382616.png

For existing S&P Global customers, this is often as simple as linking your S&P Global account to Databricks. Your account representative can facilitate this process, typically requiring just a quick approval. If you're not currently an S&P Global customer, you can request a trial or contact S&P Global through the marketplace listing.

Step 2: Create a Catalog Based on the S&P Global Dataset

After gaining access to the datasets of interest, users can easily add this data into their Databricks workspace by following these steps:

  1. Navigate to the catalog view and select the “Delta Sharing” option at the top.
  2. Search for and select the provider, in this case “spgci” (S&P Global Commodity Insights).
  3. Choose the “Create Catalog” option next to the dataset of your choice and give the catalog an appropriate name.
  4. Click “Create”.

brettaulbaugh_5-1748444383134.gif
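
If you prefer to script this step rather than click through the UI, Databricks SQL also supports creating a catalog from a share with `CREATE CATALOG ... USING SHARE`. The sketch below only builds the statement. The catalog name `energy_market_data` and the share name `example_share` are hypothetical placeholders, so substitute the share name shown for the “spgci” provider in your workspace.

```python
def create_catalog_sql(catalog: str, provider: str, share: str) -> str:
    """Build a CREATE CATALOG ... USING SHARE statement with quoted identifiers."""

    def quote(identifier: str) -> str:
        # Backtick-quote the identifier, doubling any embedded backticks.
        return "`" + identifier.replace("`", "``") + "`"

    return (
        f"CREATE CATALOG IF NOT EXISTS {quote(catalog)} "
        f"USING SHARE {quote(provider)}.{quote(share)}"
    )


# "spgci" is the provider named above; the other two names are placeholders.
statement = create_catalog_sql("energy_market_data", "spgci", "example_share")
print(statement)
# CREATE CATALOG IF NOT EXISTS `energy_market_data` USING SHARE `spgci`.`example_share`

# In a Databricks notebook, the statement could then be run with:
# spark.sql(statement)
```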

Step 3: Set Up a Genie Room with Your Energy Industry Data

Now comes the fun part: creating a Genie Room that enables your business users to interact with this data using plain language.

To create a Genie space:

  1. Navigate to the Genie section in your Databricks workspace.
  2. Click "Create Space".
  3. Name your space (e.g., "Energy Trading Bot").
  4. Add tables and views from your previously created catalog.
  5. Curate the space with domain-specific instructions and example queries.

Here are a few tips for proper curation of Genie Room datasets:

  • Annotate Your Tables Well: Include clear descriptions for tables and columns that explain energy industry terminology. Additionally, providing sample values significantly increases Genie's accuracy and adds context to the columns.

brettaulbaugh_6-1748444382685.png

 

  • Provide Example Queries: Add sample SQL queries that demonstrate common energy analytics questions. This helps Genie understand typical patterns in your questions and how tables/fields should be leveraged.

brettaulbaugh_7-1748444382694.png

 

  • Add General Instructions: Provide some general context about your energy data model, terminology, and business context. Adding business terminology and acronyms to this section helps Genie translate them into the appropriate SQL. The general instructions section is also a great place to define synonymous terms that users may use in their queries. For example, in this room we might add the following instructions to get started:
You are an advanced chatbot powered by the S&P Global Commodity eWindow trading dataset. Your primary function is to assist users in accessing, analyzing, and interpreting comprehensive energy trading data.

The field ORDER_TIME should be used if questions are asked around time of trade execution.

When calculating the total number of trades, count the distinct order_id entries in the dataset.

Please use this information to accurately answer user queries, provide relevant market insights, and support data-driven decision-making in the context of energy trading. Ensure all responses are clear, concise, and based on the most current and reliable data available.

Following these best practices goes a long way toward improving the accuracy of your Genie Room.
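
To see why an instruction like the distinct `order_id` rule matters, consider the small sketch below. It uses an in-memory SQLite table as a stand-in for the shared dataset. Only `order_id` and `ORDER_TIME` come from the instructions above. The table name, the other columns, and the assumption that one order can appear in several rows are illustrative and not taken from the real eWindow schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ewindow_toy "
    "(order_id INTEGER, order_time TEXT, product TEXT, price REAL)"
)
conn.executemany(
    "INSERT INTO ewindow_toy VALUES (?, ?, ?, ?)",
    [
        (1, "2025-05-01 16:00:00", "Product X", 70.0),
        (1, "2025-05-01 16:05:00", "Product X", 70.5),  # same order, later update
        (2, "2025-05-01 16:10:00", "Product X", 71.0),
        (3, "2025-05-01 16:15:00", "Product Y", 55.0),
    ],
)

# Counting rows overstates the number of trades when an order has several rows.
row_count = conn.execute("SELECT COUNT(*) FROM ewindow_toy").fetchone()[0]

# Counting distinct order_id values follows the instruction given to Genie.
trade_count = conn.execute(
    "SELECT COUNT(DISTINCT order_id) FROM ewindow_toy"
).fetchone()[0]

print(row_count, trade_count)  # 4 3
```

A query like the second one is also a good candidate for the example queries section, since it shows Genie the intended pattern rather than only describing it.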

Step 4: Start Chatting on Energy Industry Topics

With your Genie Room set up, users across your organization can now “chat” with the data using natural language. No SQL knowledge is required, so you can share the Genie Room with analysts, traders, executives, or other stakeholders who might benefit from these energy market insights.

Users can ask questions like:

  • "What is the most frequently traded product in the US over the last 5 days?"
  • "What product has seen the most price volatility this month?"
  • "What is the most recent price for product X?"

Genie will translate these questions into SQL queries, execute the queries, and return results and visualizations, all without users needing to write a single line of code.

brettaulbaugh_8-1748444383122.gif

In the example above, Genie was asked "What product has seen the most price volatility in the last week?" Genie answered the question correctly by computing the standard deviation across all trades over the last week and returning the product with the highest value. We can review the generated SQL statement and give quick feedback to Genie to reinforce the expected behavior.
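
The logic behind that answer can be sketched in a few lines of Python. This is an illustration on made-up data, not the SQL Genie generated: the product names, prices, and dates are hypothetical, and the use of the sample standard deviation is an assumption, since the example does not say which variant was applied.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import stdev

# (product, order_time, price) tuples with made-up values.
trades = [
    ("Product X", datetime(2025, 5, 26), 70.0),
    ("Product X", datetime(2025, 5, 27), 72.0),
    ("Product X", datetime(2025, 5, 28), 68.0),
    ("Product Y", datetime(2025, 5, 1), 90.0),  # outside the one-week window
    ("Product Y", datetime(2025, 5, 26), 55.0),
    ("Product Y", datetime(2025, 5, 27), 55.5),
    ("Product Y", datetime(2025, 5, 28), 54.5),
]


def most_volatile(trades, as_of, days=7):
    """Return the product whose prices vary most within the trailing window."""
    cutoff = as_of - timedelta(days=days)
    prices = defaultdict(list)
    for product, order_time, price in trades:
        if cutoff <= order_time <= as_of:
            prices[product].append(price)
    # A standard deviation needs at least two prices per product.
    spread = {p: stdev(v) for p, v in prices.items() if len(v) >= 2}
    return max(spread, key=spread.get)


print(most_volatile(trades, as_of=datetime(2025, 5, 28)))  # Product X
```

Widening the window changes the answer here, because the older Product Y trade then falls inside it. That is exactly the kind of detail worth checking in the generated SQL before giving feedback.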

Final Thoughts and Next Steps

By combining the energy industry data available from S&P Global in the Databricks Marketplace with the conversational AI capabilities of Databricks Genie Rooms, you've created a powerful tool that makes it possible for business users to access energy market insights across your organization, all while eliminating the past barriers to data analysis:

  • No more waiting for data engineering resources to prepare datasets.
  • No need for business users to know SQL.
  • Instant access to trusted, high-quality energy market data.
  • Ability to combine external market data with internal business data.

The result is an organization that can make better and more informed decisions in a fraction of the time it used to take, be it optimizing trading strategies, planning refinery operations, or evaluating renewable energy investments.

Want to take Genie and these datasets even further? Achieving high accuracy with natural language interfaces often involves some level of iterative refinement and user feedback. Databricks tooling makes this easy in various ways, such as implementing query monitoring to further refine the model. Also, check out how you can use the Genie API to leverage the engine outside of the Databricks UI in tools such as Microsoft Teams.
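
As a rough idea of what calling Genie from an external tool might look like, the sketch below builds (but does not send) an HTTP request that starts a conversation in a Genie space. The endpoint path and the `content` payload field reflect our understanding of the Genie Conversation API and should be verified against the current Databricks REST API documentation. The host, space ID, and token are placeholders.

```python
import json
import urllib.request


def build_start_conversation_request(host, space_id, token, question):
    """Build a POST request that asks a Genie space a question.

    Assumption: the endpoint path and payload shape below match the Genie
    Conversation API; check the Databricks REST API reference before use.
    """
    url = f"https://{host}/api/2.0/genie/spaces/{space_id}/start-conversation"
    body = json.dumps({"content": question}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values; replace with your workspace host, space ID, and token.
request = build_start_conversation_request(
    host="example.cloud.databricks.com",
    space_id="SPACE_ID",
    token="TOKEN",
    question="What is the most recent price for product X?",
)

# Sending is left commented out so the sketch runs without network access:
# with urllib.request.urlopen(request) as response:
#     reply = json.load(response)
```

A bot in a chat tool would wrap a call like this, then poll for the generated answer and post it back to the channel.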

So what are you waiting for? If your company already subscribes to S&P Global data services, you're just a few steps away from fundamentally transforming how your organization performs market research. Start building your energy analytics chatbot today!