Petroleum engineers spend hours wrangling and analyzing data to estimate well production targets. In this article, we demonstrate how the Data Science Agent can significantly accelerate this type of analysis, along with many other problems petroleum engineers work to solve every day. Building on my previous article on leveraging the Databricks Assistant for petroleum workflows, this post looks at the next step-change in analytics tooling. With the introduction of the Databricks Data Science Agent, engineers can move beyond simple AI assistance to an autonomous analytical partner that streamlines data work and shortens the path from question to actionable results.
The transition from copilot to autonomous agent addresses a critical gap in petroleum engineering analytics. While the Assistant proved valuable for generating SQL queries and Python code snippets, complex multi-step analyses still required extensive manual orchestration. The global AI market in oil and gas is projected to grow from $3.01 billion in 2025 to $6.92 billion by 2032. The Databricks Data Science Agent positions petroleum engineering teams at the forefront of this transformation, providing immediate access to autonomous capabilities that enable comprehensive analytical workflows from initial data exploration through final executive reporting, all with a single prompt.
The Data Science Agent represents a fundamental advancement in autonomous analytics, extending beyond code generation to comprehensive workflow orchestration. This capability proves particularly valuable for engineering applications that require integration of multiple analytical techniques, complex data relationships, and domain-specific modeling approaches.
Autonomous multi-step execution distinguishes the Agent from traditional AI assistants. Rather than responding to individual queries, the Agent can plan and execute complete analytical workflows. For petroleum applications, this means seamless integration of data validation, statistical modeling, forecasting, and visualization within a single prompt.
Unity Catalog Integration provides critical context awareness for petroleum datasets. The Agent leverages Unity Catalog's metadata to understand table relationships, data lineage, and business semantics. This integration enables the Agent to navigate complex petroleum data architectures autonomously, understanding that production tables relate to completion data, geological formations, and operational parameters.
Governance and Review Capabilities address enterprise requirements for data security and analytical transparency. The Agent operates within existing Unity Catalog permissions, requests human approval for data modifications, and maintains comprehensive audit trails for transparency and accountability. These safeguards ensure that autonomous analytics align with industry standards for data governance and operational integrity in the petroleum sector.
For petroleum engineering teams, these capabilities enable sophisticated analyses previously requiring specialized data science expertise. The Agent democratizes advanced analytics while maintaining professional standards for accuracy, transparency, and governance.
Currently, Agent Mode is in Beta and must be activated by a workspace administrator through the Databricks preview portal. Administrators enable the "Assistant Agent Mode Beta" feature within the preview management interface to give their team access to autonomous analytics capabilities; this step will soon no longer be required. Because the product is still in Beta, some components of the interface may continue to change and might differ slightly from what is shown in this blog.
Following administrator enablement, users will observe a toggle control in the Assistant interface located in the notebook's bottom-right panel. This toggle allows switching between traditional Assistant mode and the new Agent mode, providing immediate access to autonomous analytical capabilities.
The Data Science Agent and Databricks Assistant support customizable instructions that allow petroleum engineering teams to establish domain-specific preferences, coding standards, and business context that the Agent will consistently apply across all interactions. These instructions prove particularly valuable for standardizing analytical approaches across petroleum engineering workflows.
User-level instructions provide individual customization capabilities, enabling petroleum engineers to specify their preferred analytical libraries, coding conventions, and personal preferences. To configure user instructions:
Open the Assistant panel and click the settings icon
Select "Add instructions file" to create a .assistant_instructions.md file
Define custom instructions using a natural language format
Example petroleum engineering user instructions:
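The exact contents depend on the engineer and the team; the following is a hypothetical sketch of what a .assistant_instructions.md file might contain for a production engineer. The library choices, unit conventions, and formatting rules below are illustrative assumptions, not Databricks defaults.

```
# Personal assistant instructions (illustrative example)
- Prefer pandas, NumPy, SciPy, and Plotly for analysis and visualization.
- Report oil volumes in barrels (bbl) and gas volumes in Mcf, and label all chart axes with units.
- When fitting decline curves, default to the Arps hyperbolic form and report qi, Di, and b-factor.
- Use snake_case column names and add a markdown summary cell at the top of every notebook.
```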
Workspace-Level Instructions enable standardization across entire petroleum engineering teams, ensuring consistent analytical approaches regardless of individual user preferences. To enable workspace instructions, create a new file named .assistant_workspace_instructions.md in the Workspace/ directory of the workspace. Workspace administrators can use it to establish organization-wide standards for the areas below (an illustrative example follows the list):
Preferred Python Libraries: Standardize on specific packages for reservoir engineering, geophysics, or production analysis
Business Terminology: Define company-specific terms, field names, and operational definitions
Coding Standards: Establish formatting conventions, documentation requirements, and quality standards
Analytical Methodologies: Specify preferred approaches for common petroleum engineering analyses
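As a hedged illustration of what such a file might contain (the specific libraries, terms, and standards below are assumptions chosen to fit this article's example, not recommendations from Databricks):

```
# Workspace assistant instructions (illustrative example)

## Preferred Python Libraries
- Use pandas and SciPy for decline-curve analysis and Plotly for interactive visualizations.

## Business Terminology
- "Type curve" means the type well forecast assigned to a well, anchored at its first production date.
- Formation identifiers follow the pattern FORMATION_<letter>.

## Coding Standards
- Follow PEP 8, include docstrings on all functions, and embed units in column names (e.g., oil_bbl).

## Analytical Methodologies
- Fit Arps declines with nonlinear least squares and report fit diagnostics (R², RMSE) alongside parameters.
```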
Priority Hierarchy ensures that workspace instructions generally take precedence over user instructions, maintaining organizational consistency while allowing individual customization for non-conflicting preferences.
This analysis demonstrates the Agent's capability to address complex petroleum engineering challenges through autonomous analytics. A production engineer responsible for a particular set of wells must complete a year-end production assessment to determine how likely those wells are to meet their targets.
Business Requirements encompassed 47 active wells with varying completion designs, production histories, and operational timelines. The analysis needed to compare actual performance against original type curve forecasts and provide actionable insights for year-end planning decisions.
This analysis utilizes two core data tables that upstream operators commonly create as foundational data products: the Daily Production table, which captures actual production volumes at the well and date level, and the Type Well Forecast table, which provides projected production values based on Days from First Production for specific forecast scenarios. Type Well Forecasts are typically assigned to individual wells to enhance production forecasting accuracy. Within our current infrastructure, both tables are maintained in Delta format under Unity Catalog management, where comprehensive metadata and descriptions are systematically documented to support continuous workflow optimization and governance.
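To make the data model concrete, here is a minimal sketch of how the two tables might be inspected from a Databricks notebook. The fully qualified table names match those used in the prompt below; the column names in the comments are assumptions about a typical layout, not the actual table definitions.

```python
# Minimal sketch: read the two foundational Delta tables from Unity Catalog.
# `spark` is the session predefined in a Databricks notebook.
daily_production = spark.table("vdm_serverless_frlx2w.ds_agent.daily_production")
type_curve = spark.table("vdm_serverless_frlx2w.ds_agent.type_curve_daily_df")

# Assumed grain (for illustration only):
#   daily_production     -> one row per well per date: well_id, production_date, oil_bbl, ...
#   type_curve_daily_df  -> one row per scenario per day: scenario, days_from_first_production, oil_bbl, ...
daily_production.printSchema()
type_curve.printSchema()
```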
The comprehensive prompt provided to our Agent is as follows:
I am a production engineer responsible for all currently producing wells in FORMATION_C, I need a report to assess whether our total oil production for FORMATION_C will end 2025 above or below the originally forecasted values.
Actual daily production data is in vdm_serverless_frlx2w.ds_agent.daily_production.
Type curve forecasts are in vdm_serverless_frlx2w.ds_agent.type_curve_daily_df.
For each well:
Fit an ARPS decline curve to the well's actual daily production history to determine per-well ARPS parameters.
Use these fitted parameters to forecast each well's daily production through the end of 2025.
Compare the summed, re-forecasted production for all FORMATION_C wells to the original type curve forecast, anchored at each well's initial production date.
The final report should include:
The total production delta (difference) between the original type curve forecast and the re-forecasted values for FORMATION_C for 2025.
The set of ARPS parameters (e.g., initial rate, decline rate, b-factor) you found would have best fit the actual performance for FORMATION_C type curves.
Please structure the results so they are actionable for year-end production planning and forecasts.
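For readers less familiar with the decline model the prompt references, the Arps hyperbolic form is q(t) = qi / (1 + b·Di·t)^(1/b), where qi is the initial rate, Di the initial decline rate, and b the b-factor. The sketch below shows one common way to fit it to a single well's history with SciPy; it is an illustration of the technique, not the code the Agent generated, and the days_on and oil_bbl columns are assumed.

```python
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit

def arps_hyperbolic(t, qi, di, b):
    """Arps hyperbolic decline: daily rate at t days on production."""
    return qi / np.power(1.0 + b * di * t, 1.0 / b)

def fit_arps(well_df: pd.DataFrame):
    """Fit (qi, Di, b) to one well's history; assumes columns days_on and oil_bbl."""
    t = well_df["days_on"].to_numpy(dtype=float)
    q = well_df["oil_bbl"].to_numpy(dtype=float)
    p0 = [q.max(), 0.01, 0.8]                          # starting guesses for qi, Di (1/day), b
    bounds = ([0.0, 1e-6, 0.01], [np.inf, 1.0, 2.0])   # keep parameters physically plausible
    (qi, di, b), _ = curve_fit(arps_hyperbolic, t, q, p0=p0, bounds=bounds)
    return qi, di, b

def forecast_remaining(qi, di, b, days_on_today: int, days_to_year_end: int) -> float:
    """Sum forecast daily rates from tomorrow through year end (bbl)."""
    future_days = np.arange(days_on_today + 1, days_on_today + days_to_year_end + 1)
    return float(arps_hyperbolic(future_days, qi, di, b).sum())
```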
Once the prompt is submitted, the Agent immediately begins outlining a proposed plan and asking clarifying questions.
 
 
These clarifying questions continue to shape the agent’s plan. In this case, the agent asked the following:
Once the requirements were clarified, the agent began its work by writing code, executing it, and correcting mistakes. Ultimately, it generated an entire notebook of code that addressed my problem statement. This notebook includes methods for data sampling and extraction, curve fitting techniques, statistical interpretations, and advanced visualizations, all generated by the Databricks Data Science Agent.
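As a rough sketch of the comparison step at the heart of that notebook (again with assumed column names, and not the Agent's actual code), the field-level 2025 delta might be computed along these lines:

```python
import pandas as pd

def formation_delta_2025(reforecast_df: pd.DataFrame, type_curve_df: pd.DataFrame):
    """Compare re-forecasted 2025 totals against the original type curve, well by well.

    Assumed inputs: one row per well with columns well_id and oil_bbl_2025
    (total forecast oil for calendar-year 2025), the type curve totals already
    anchored at each well's first production date.
    """
    merged = reforecast_df.merge(
        type_curve_df, on="well_id", how="outer",
        suffixes=("_reforecast", "_type_curve"),
    ).fillna(0.0)
    merged["delta_bbl"] = merged["oil_bbl_2025_reforecast"] - merged["oil_bbl_2025_type_curve"]
    total_delta_bbl = merged["delta_bbl"].sum()  # negative means FORMATION_C is tracking below forecast
    return merged, total_delta_bbl
```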
 
To conclude the analysis, the agent added a markdown cell at the top of the notebook summarizing the key takeaways, so the deliverable is both an executive summary and a fully reproducible notebook.
Traditionally, conducting this level of analysis would have required several days of effort from a multidisciplinary team, including reservoir engineers, data scientists, and analysts. With the Agent, these tasks were completed autonomously within minutes. Not only did the Agent automate outlier detection, statistical confidence intervals, and professional visualizations, but it also uncovered a 265k barrel production variance. This critical insight now enables proactive intervention strategies that have the potential to prevent millions in lost revenue.
Technical Accessibility empowers petroleum engineers to tackle complex analyses—like advanced statistical modeling, machine learning, and probabilistic analysis—through intuitive conversational interfaces, eliminating the need for specialized programming skills. By making these capabilities broadly available, engineers can continuously adapt to new technologies and data-driven workflows, effectively future-proofing their job roles and responsibilities in an evolving industry.
Governance and Compliance ensure enterprise-level security and transparency. All Agent actions adhere to Unity Catalog permissions, need approval for sensitive tasks, and keep the human in the loop throughout the entire process. This ensures autonomous analytics comply with petroleum industry governance standards.
Analytical Scalability supports expanding workflows from single-well analyses to field-wide evaluations without requiring method changes. The Agent maintains analytical rigor across different scopes, enabling consistent approaches regardless of dataset complexity or size.
Knowledge Transfer happens naturally through Agent interactions. As the Agent explains its analytical decisions, methodological choices, and technical details, petroleum engineers build better data science skills through hands-on experience rather than formal training. The overall impact extends beyond efficiency gains to encompass advanced analytical capabilities, enhanced problem-solving skills, and increased confidence in tackling complex data challenges in the field of petroleum engineering.