Oil and gas companies are no strangers to data silos. Despite decades of drilling and technological advances, extremely important well information, carefully recorded in the field, often winds up scattered across proprietary tools, legacy databases, or aging file servers. A great example of this is well log data.
Well logging is the process of recording geological and fluid properties from inside a borehole using specialized equipment, like a logging truck and cable, as shown in the image below.
Despite its value, well log data often remains siloed in proprietary systems and outdated formats, restricting collaboration and analytics. Traditional tools for accessing formats like Log ASCII Standard (LAS) are cumbersome and expensive, keeping vital subsurface insights out of reach.
Modern solutions, such as Databricks AI Functions, provide a path to unlock and scale access to well log data, enabling teams to leverage advanced analytics, cloud integrations, and rapid decision-making across their operations.
LAS files serve as the industry standard for storing and transferring well log data within the oil and gas sector. Developed by the Canadian Well Logging Society in the 1990s, LAS files contain crucial geological, geophysical, and petrophysical measurements that provide detailed insights into subsurface formations.
LAS files follow a structured ASCII format consisting of several distinct sections:
- ~VERSION: the LAS version and wrap mode
- ~WELL: well identification and metadata such as start/stop depth, step, null value, company, field, and location
- ~CURVE: the mnemonic, unit, and description for each recorded measurement
- ~PARAMETER (optional): job parameters such as mud properties and logging conditions
- ~OTHER (optional): free-form comments
- ~ASCII: the depth-indexed measurement data itself
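For illustration, an abbreviated and entirely hypothetical LAS 2.0 file might look roughly like this (the well name, depths, and readings are made up):

~VERSION INFORMATION
 VERS.            2.0 : CWLS LOG ASCII STANDARD - VERSION 2.0
 WRAP.            NO  : ONE LINE PER DEPTH STEP
~WELL INFORMATION
 STRT.FT     1670.000 : START DEPTH
 STOP.FT     1671.000 : STOP DEPTH
 STEP.FT        0.500 : STEP
 NULL.        -999.25 : NULL VALUE
 WELL.  EXAMPLE WELL 1 : WELL NAME
~CURVE INFORMATION
 DEPT.FT              : DEPTH
 GR  .GAPI            : GAMMA RAY
 DT  .US/FT           : DELTA-T (SONIC)
 RT  .OHMM            : RESISTIVITY
 SP  .MV              : SPONTANEOUS POTENTIAL
~ASCII
 1670.000   85.20   68.50   12.30  -45.10
 1670.500   87.10   69.00   11.80  -46.00
 1671.000  -999.25  70.20   11.20  -44.70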
Well log data contains a wealth of geological intelligence that can drive critical business decisions:
These curves are usually shown on a log plot, often in isolation, as demonstrated below. However, combining this data with other operational information and applying advanced analytics and modeling can offer significant value to oil and gas operators: rock properties can be interpreted alongside additional operational details such as drilling parameters and production performance.
Databricks AI Functions represent a paradigm shift in how organizations can apply artificial intelligence directly within their data processing workflows. These functions enable users to leverage the power of large language models (LLMs) and other AI capabilities through simple SQL queries, eliminating the need for complex model deployments or specialized AI infrastructure.
The Databricks platform offers several specialized AI functions designed for different use cases: task-specific functions such as ai_extract, ai_classify, ai_summarize, ai_translate, and ai_analyze_sentiment, plus the general-purpose ai_query function for sending custom prompts to any model serving endpoint.
The ai_query function stands out as the most flexible option, allowing users to interact with foundation models using custom prompts. This function accepts several key parameters that make it particularly well-suited for processing semi-structured data like LAS files:
Key parameters:
- endpoint: the model serving endpoint to call, such as a Foundation Model API endpoint
- request: the prompt sent to the model, which can be built dynamically with functions like CONCAT
- returnType: the expected type of the response
- responseFormat: an optional JSON schema that constrains the model output to a structured format
- failOnError: controls whether a malformed response fails the query or is surfaced alongside the result
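Put together, a minimal ai_query call looks something like the sketch below. The endpoint name matches the one used later in this post, while the table and column names (las_raw_text, las_text) are hypothetical placeholders:

SELECT
  ai_query(
    endpoint => 'databricks-gpt-oss-120b',
    request => CONCAT('List the curve mnemonics found in the ~CURVE section of this LAS file: ', las_text),
    returnType => 'STRING'
  ) AS curve_summary
FROM las_raw_text;  -- hypothetical table holding raw LAS file text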
The first step is storing these files within Unity Catalog Volumes, which can be completely governed and secured to ensure only the appropriate personnel can access these files:
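A minimal sketch of that setup is shown below; the catalog, schema, volume, and group names are illustrative:

-- Create a governed location for raw LAS files (names are illustrative)
CREATE VOLUME IF NOT EXISTS main.geoscience.las_files
  COMMENT 'Raw LAS well log files';

-- Restrict read access to the subsurface team (hypothetical group)
GRANT READ VOLUME ON VOLUME main.geoscience.las_files TO `subsurface-team`;

-- Confirm the uploaded files are visible
LIST '/Volumes/main/geoscience/las_files/';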
Once files have been uploaded and governed via Unity Catalog, the underlying text can be accessed through the read_files SQL call, which can directly read LAS files in their original format and maintain the semi-structured schema defined above.
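For example, assuming the files landed in the volume created above, a sketch of that read looks like this:

SELECT
  value AS las_content   -- one row per file, containing the full LAS text
FROM read_files(
  '/Volumes/main/geoscience/las_files/',
  format => 'text',
  wholeText => true      -- read each file as a single string rather than line by line
);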
This approach treats each LAS file as a complete text object, preserving the structured format that AI models need to understand the geological context and data relationships.
The key to a high-performing extraction function is being intentional about the ai_query parameters described above. The power of ai_query lies in combining detailed instructions with the raw LAS file content, giving the foundation model the best chance of extracting the data in the format we need:
This is where the real magic happens! By crafting specific instructions—such as "Identify ALL curve mnemonics from ~CURVE section" or "Sections start with ~ (tilde) character"—users can fine-tune the behavior of the foundation model each time it’s called. For more detailed information on crafting an effective prompt, visit this blog that thoroughly discusses the elements of a good prompt. Whether the goal is to extract high-level summaries or detailed, curve-by-curve analysis, adjusting the prompt text tailors the interpretive lens for each run. This flexibility enables teams to iterate rapidly, test hypotheses, and deploy workflows that adapt to evolving requirements.
In addition to the instructions provided in the request, the responseFormat parameter enables enforcement of structured output. For this example, we aim to return a JSON object that adheres to a specific schema.
This ensures consistent, parseable results that integrate easily into downstream analytics workflows and eliminates the need for complex post-processing of AI responses. Other parameters can be defined here as well, such as failOnError, which controls whether the query fails outright or surfaces the error alongside the result when a response does not meet the desired output format.
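For instance, with failOnError set to false the response comes back as a struct containing result and errorMessage fields, so problem responses can be inspected rather than failing the whole query. A minimal sketch (the prompt is illustrative):

SELECT
  response.result,        -- the model output, when the call succeeded
  response.errorMessage   -- populated instead of failing the query when it did not
FROM (
  SELECT ai_query(
    endpoint => 'databricks-gpt-oss-120b',
    request => 'List the curve mnemonics that appear in the ~CURVE section of this LAS header.',
    failOnError => false
  ) AS response
);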
With the ai_query call completely filled out, we can now package it into a reusable SQL function that can be easily leveraged within Lakeflow Declarative Pipelines, notebooks, or SQL editors. These functions are stored within Unity Catalog, allowing them to be governed and secured, with access restricted to the appropriate personnel. The complete code is presented below. The function accepts a text input, in this case our LAS file contents coming from a read_files() call, and returns a structured object that matches the schema defined in the responseFormat.
CREATE OR REPLACE FUNCTION {catalog}.{schema}.extract_las_info(input STRING)
RETURNS STRUCT<
  well: STRING,
  null_value: DOUBLE,
  api_number: STRING,
  curve_data: ARRAY<STRUCT<
    depth: DOUBLE,
    gamma_ray: DOUBLE,
    delta_t: DOUBLE,
    resistivity: DOUBLE,
    sp: DOUBLE
  >>
>
RETURN from_json(ai_query(
  endpoint => 'databricks-gpt-oss-120b',
  request => CONCAT(
    '''
    You are an expert drilling engineer and well log analyst with 15+ years of experience parsing LAS (Log ASCII Standard) files.
    CRITICAL INSTRUCTIONS:
    1. This is a LAS file containing well log data from oil/gas drilling operations
    2. Parse ALL sections methodically: ~VERSION, ~WELL, ~CURVE, ~PARAMETER (if present), ~ASCII data
    3. PRESERVE original depth values - do NOT interpolate, average, or modify any measurements
    4. Extract curve data for ALL available log types (not just the schema examples)
    5. Handle null values properly (commonly -999.25, -9999, or as specified in NULL field)
    6. Maintain data precision as recorded in the file
    7. If any section is missing or corrupted, note this in the response
    WELL INFORMATION EXTRACTION:
    - Parse ~WELL section for: WELL name, COMP (company), FLD (field), LOC (location),
      CNTY (county), STAT (state), CTRY (country), STRT (start depth), STOP (stop depth),
      STEP (step interval), NULL (null value), UWI/API (well identifier)
    CURVE DATA EXTRACTION:
    - Identify ALL curve mnemonics from ~CURVE section (e.g., DEPT, GR, NPHI, RHOB, RT, SP, etc.)
    - Extract complete depth series with ALL available log curves
    - Common curve types include: Gamma Ray (GR), Neutron Porosity (NPHI), Bulk Density (RHOB),
      Resistivity (various: RT, ILD, MSFL), Spontaneous Potential (SP), Photoelectric Factor (PEF),
      Caliper (CALI), Delta-T/Sonic (DT), and many others
    - DO NOT assume only specific curves exist - extract whatever is available
    DATA QUALITY CHECKS:
    - Verify depth progression is logical (increasing or decreasing consistently)
    - Flag any depth gaps or overlaps
    - Note data density and any sparse sections
    - Identify outlier values that may indicate data quality issues
    PARSING RULES:
    - Sections start with ~ (tilde) character
    - In header sections, data format is: MNEM.UNIT DATA :DESCRIPTION
    - ASCII data section has space-separated columns matching curve order
    - Handle wrapped lines if WRAP.YES is specified
    - Respect the null value specified in the file for missing data
    - Return a complete JSON object with this exact structure, including ALL curves found in the file:
    ''',
    input
  ),
  returnType => 'JSON',
  responseFormat => '{
    "type": "json_schema",
    "json_schema": {
      "name": "las_extraction",
      "schema": {
        "type": "object",
        "properties": {
          "well": {"type": "string"},
          "null_value": {"type": "number"},
          "api_number": {"type": "string"},
          "curve_data": {
            "type": "array",
            "items": {
              "type": "object",
              "properties": {
                "depth": {"type": "number"},
                "gamma_ray": {"type": "number"},
                "delta_t": {"type": "number"},
                "resistivity": {"type": "number"},
                "sp": {"type": "number"}
              }
            }
          }
        }
      },
      "strict": true
    }
  }',
  failOnError => false
).result,
'STRUCT<well:STRING,
        null_value:DOUBLE,
        api_number:STRING,
        curve_data:ARRAY<STRUCT<depth:DOUBLE, gamma_ray:DOUBLE, delta_t:DOUBLE, resistivity:DOUBLE, sp:DOUBLE>>>'
);
With the custom AI function now producing structured, actionable output, turning each LAS file into analytics-ready data is simple. By applying the function to files stored in Unity Catalog Volumes and using the explode operation, the curve_data arrays are expanded into tabular rows, ready for storage in your lakehouse and for use in downstream analyses, as shown in the sketch below.
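A sketch of that pattern, reusing the illustrative catalog, schema, and volume names from earlier:

CREATE OR REPLACE TABLE main.geoscience.well_log_curves AS
SELECT
  parsed.well,
  parsed.api_number,
  parsed.null_value,
  curve.depth,
  curve.gamma_ray,
  curve.delta_t,
  curve.resistivity,
  curve.sp
FROM (
  SELECT main.geoscience.extract_las_info(value) AS parsed
  FROM read_files(
    '/Volumes/main/geoscience/las_files/',
    format => 'text',
    wholeText => true
  )
)
LATERAL VIEW explode(parsed.curve_data) curves AS curve;  -- one row per depth sample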
These custom SQL functions aren't just convenience tools; they underpin robust, production-grade analytics. Run them in ad hoc queries, trigger them automatically with Lakeflow pipelines when each new file lands (see the sketch below), or embed them in batch jobs. Centralizing AI-powered geological interpretation in this way gives teams both operational strength and agility, eliminating the need to rewrite business logic for every new workflow.
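As one sketch of the pipeline pattern, a Lakeflow Declarative Pipelines streaming table can apply the function incrementally as new files arrive, assuming the same read_files options are supported in streaming mode (names remain illustrative):

CREATE OR REFRESH STREAMING TABLE extracted_las_logs AS
SELECT
  main.geoscience.extract_las_info(value) AS parsed
FROM STREAM read_files(
  '/Volumes/main/geoscience/las_files/',
  format => 'text',
  wholeText => true
);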
Once LAS files are processed through the AI pipeline, the extracted insights become the foundation for advanced analytics that drive significant business value across exploration, development, and production operations. Some of those analytics projects include the following:
This revolutionary approach transforms how oil and gas companies extract value from their subsurface data assets. By breaking free from proprietary software silos and leveraging the power of Databricks AI Functions, organizations can democratize access to geological insights, accelerate decision-making, and unlock new opportunities for operational excellence.
The future of well log analysis lies not in expensive, specialized software packages, but in open, AI-powered platforms that put the power of advanced analytics directly into the hands of geoscientists, engineers, and data teams. With Databricks ai_query processing LAS files at scale, that future is already here.