Introducing Simple, Fast, and Scalable Batch LLM Inference on Mosaic AI Model Serving

Sujitha — Thu, 24 Oct 2024 07:53:45 GMT

Over the years, organizations have amassed a vast amount of unstructured text data—documents, reports, and emails—but extracting meaningful insights has remained a challenge. Large Language Models (LLMs) now offer a scalable way to analyze this data, with batch inference as the most efficient solution. However, many tools still focus on online inference, leaving a gap for better batch processing capabilities.

Today, we’re excited to announce a simpler, faster, and more scalable way to apply LLMs to large documents. No more exporting data as CSV files to unmanaged locations—now you can run batch inference directly within your workflows, with full governance through Unity Catalog. Simply write the query below and execute it in a notebook or workflow.

With this release, you can now run ai_query at scale with unmatched speed, ensuring fast processing of even the largest datasets. We've also expanded the interface to support all AI models, allowing you to securely apply LLMs, traditional AI models, or compound AI systems to analyze your data at scale.

SELECT
  ai_query(
    'finetuned-llama-3.1-405b',
    "Can you evaluate this call transcript and write me a summary with action items of the main grievances: " || transcript_raw_text
  ) AS summary_analysis
FROM call_center_transcripts
LIMIT 10;

Figure 1: A batch inference job of any scale - millions or billions of tokens - is defined using the same, familiar SQL interface
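Because the inference runs as ordinary SQL, the output can stay in a governed table rather than being exported. The following is a minimal sketch under stated assumptions, not a definitive pipeline: the target table name call_center_summaries is hypothetical, while the endpoint name, source table, and column are taken from the query above.

```sql
-- Sketch only: call_center_summaries is a hypothetical target table.
-- The endpoint, source table, and column come from the query above.
CREATE OR REPLACE TABLE call_center_summaries AS
SELECT
  transcript_raw_text,
  -- ai_query(endpoint, request) is applied once per input row
  ai_query(
    'finetuned-llama-3.1-405b',
    'Can you evaluate this call transcript and write me a summary with action items of the main grievances: ' || transcript_raw_text
  ) AS summary_analysis
FROM call_center_transcripts;
```

Dropping the LIMIT clause lets the statement cover the whole table. Keeping a small LIMIT while iterating on the prompt is a cheap way to check output quality before a full run.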

“With Databricks, we processed over 400 billion tokens by running a multi-modal batch pipeline for document metadata extraction and post-processing. Working directly where our data resides with familiar tools, we ran the unified workflow without exporting data or managing massive GPU infrastructure, quickly bringing generative AI value directly to our data. We are excited to use batch inference for even more opportunities to add value for our customers at Scribd, Inc.” - Steve Neola, Senior Director at Scribd
