Introducing Simple, Fast, and Scalable Batch LLM Inference on Mosaic AI Model Serving

Sujitha
Databricks Employee

Over the years, organizations have amassed a vast amount of unstructured text data—documents, reports, and emails—but extracting meaningful insights has remained a challenge. Large Language Models (LLMs) now offer a scalable way to analyze this data, with batch inference as the most efficient solution. However, many tools still focus on online inference, leaving a gap for better batch processing capabilities.

Today, we’re excited to announce a simpler, faster, and more scalable way to apply LLMs to large documents. No more exporting data as CSV files to unmanaged locations—now you can run batch inference directly within your workflows, with full governance through Unity Catalog. Simply write the query below and execute it in a notebook or workflow.

With this release, you can now run ai_query at scale with unmatched speed, ensuring fast processing of even the largest datasets. We've also expanded the interface to support all AI models, allowing you to securely apply LLMs, traditional AI models, or compound AI systems to analyze your data at scale.

SELECT
  ai_query(
    'finetuned-llama-3.1-405b',
    'Can you evaluate this call transcript and write me a summary with action items of the main grievances: ' || transcript_raw_text
  ) AS summary_analysis
FROM call_center_transcripts
LIMIT 10;

Figure 1: A batch inference job of any scale - millions or billions of tokens - is defined using the same, familiar SQL interface
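
Because the job is just a SQL statement, the same pattern extends naturally to persisting results rather than previewing them: wrap the query in a CREATE TABLE AS statement so the output lands in a governed Unity Catalog table. The sketch below is illustrative only; the catalog, schema, table name, and transcript_id column are assumed names, not part of this announcement.

-- Hedged sketch: write batch inference results to a Unity Catalog table.
-- main.support.transcript_summaries and transcript_id are assumed names.
CREATE TABLE main.support.transcript_summaries AS
SELECT
  transcript_id,  -- assumed key for joining summaries back to source transcripts
  ai_query(
    'finetuned-llama-3.1-405b',
    'Summarize the main grievances and list action items for this call transcript: ' || transcript_raw_text
  ) AS summary_analysis
FROM call_center_transcripts;

A statement like this can be executed in a notebook or scheduled in a workflow like any other query, and access to both the source and output tables stays governed by Unity Catalog.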

“With Databricks, we processed over 400 billion tokens by running a multi-modal batch pipeline for document metadata extraction and post-processing. Working directly where our data resides with familiar tools, we ran the unified workflow without exporting data or managing massive GPU infrastructure, quickly bringing generative AI value directly to our data. We are excited to use batch inference for even more opportunities to add value for our customers at Scribd, Inc." - Steve Neola, Senior Director at Scribd

Read the full announcement to learn more.
