Introduction

In the rapidly evolving financial services industry, the ability to efficiently handle customer interactions is paramount. The call center, often the first point of contact for customers, plays a crucial role in shaping customer experiences and satisfaction levels. However, traditional call center operations face challenges in terms of speed, efficiency, and accuracy, which can impact customer satisfaction and business outcomes. Large volumes of audio call recordings need to be stored, processed, and governed. Insights and next-best-action recommendations need to be intelligently extracted from this data to drive customer happiness and retention. Many personas have a stake in deriving value from this data, including Banking Service Agents, Investment Advisors, Insurance Claims Officers, Adjusters, and Customer Service Operators.

With the Databricks Data Intelligence Platform, we can unlock the value of this data at scale. The new AI Query (ai_query) makes developing an end-to-end workflow as simple as a few lines of SQL, reducing complexity and saving time.

 

Call Center Batch LLM Inference Workflow

A traditional call center analytics workflow involves many steps and technologies: 

  • Audio data ingestion
  • Speech-to-text transcription
  • NLP Analytics
    • Text tokenization
    • Construct training and validation datasets
    • Train and validate NLP models
    • Model deployment
    • Endpoint performance test and tuning
    • Model inferences
  • Create structured output for downstream consumption

With Large Language Models (LLMs), the NLP analytics steps of the workflow can be simplified with prompt engineering. But there is still a lot of work required to host LLMs, make API calls, develop pipelines, and orchestrate workflows, and it can be particularly challenging to achieve good performance at scale. This is where Databricks AI Query (the ai_query SQL API) comes to the rescue. It not only simplifies the implementation of the workflow in SQL but also optimizes batch LLM inference performance at scale with built-in fault tolerance.

 

What is AI Query?

The ai_query function provides a simple way to apply AI directly to data on Databricks. ai_query supports Databricks foundation model endpoints, external model endpoints, and custom model endpoints served with Databricks Model Serving.

Here is a simple example showing the syntax of ai_query:

 

SELECT 
  ai_query(
      "llama-70b-endpoint", 
      "Summarize this call transcript: " || transcript
  ) AS summary_analysis 
FROM call_center_transcripts;

 

 

Using ai_query, you can now run batch LLM inference at high scale with unmatched speed thanks to Databricks Model Serving, which minimizes batch LLM inference processing time and cost by auto-scaling resources, adjusting batching configurations, and improving workload management. In addition, built-in fault tolerance with automatic retries ensures large workflows run smoothly, handling transient errors without disruption.
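
Beyond the basic syntax, ai_query also accepts request-level options such as failOnError and modelParameters (both used later in this post). As a minimal sketch, reusing the placeholder endpoint and table names from the example above, a tuned summarization call might look like this:

SELECT 
  ai_query(
      "llama-70b-endpoint", 
      "Summarize this call transcript: " || transcript,
      failOnError => true,                                                           -- fail the query if a row cannot be processed
      modelParameters => named_struct('max_tokens', 300, 'temperature', float(0))   -- cap output length, deterministic decoding
  ) AS summary_analysis 
FROM call_center_transcripts;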

Here is an example of how one can perform a call center batch LLM inference workflow on Databricks (Figure 1) in just four steps using ai_query:


 Figure 1. Call Center Batch Inference Workflow

  1. Use the Databricks Auto Loader function to ingest the raw audio files (e.g., .mp3, .wav) as bytes into a Delta table from a Unity Catalog volume location
  2. Set up dedicated Databricks foundation model serving endpoints for an OpenAI Whisper v3 large speech-to-text (STT) model and an open-source Llama 3.1 70B LLM from system.ai using the Databricks UI
  3. Perform audio transcription with the ai_query function using the STT model deployed with Databricks Model Serving
  4. Perform NLP analytics with the ai_query function using an LLM deployed with a Databricks Model Serving provisioned throughput endpoint

 

Example Implementation

Step 1: Ingest raw audio files as bytes using the Auto Loader function from a Unity Catalog volume location

Typically, call center datasets have caller IDs associated with the audio files. In this example, the caller ID is embedded in the names of the folders that contain the audio files.

 

CREATE OR REFRESH STREAMING TABLE raw_audio_files
AS 
SELECT *, regexp_extract(path, r'.*\/caller_id_(\d+)\/.*', 1) AS caller_id 
FROM STREAM read_files(
  '/Volumes/genai/call_center/volume_speech/audio_clips/',
  format => 'binaryFile',
  inferColumnTypes => 'true',
  recursiveFileLookup => 'true',
  pathGlobFilter => '*.wav'
);
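
As a quick sanity check (not part of the original workflow), you can query the streaming table to confirm that the files were ingested and the caller IDs were extracted as expected; the columns below come from the binaryFile schema plus the derived caller_id:

-- Inspect a few ingested audio files and their extracted caller IDs
SELECT caller_id, path, length, modificationTime
FROM raw_audio_files
LIMIT 10;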

 

 

Step 2: Set up dedicated Databricks model serving endpoints using the Databricks UI

Foundation models such as the OpenAI Whisper large v3 speech-to-text model and the Meta Llama 3.1 series LLMs are located in the Unity Catalog schema system.ai. One can navigate to a model asset and deploy it with a few clicks. Please follow the screen recordings below to deploy a foundation model.

Navigate to system.ai for foundation models


 

Find and deploy OpenAI whisper large V3 speech to text model


 

The recommended endpoint compute for the OpenAI Whisper large v3 model is:

Model Name         Suggested workload type (AWS)   Suggested workload type (Azure)
whisper_large_v3   GPU Medium                      GPU Large

 

Find and deploy Llama 3.1 70B LLM


 

Step 3: Perform audio transcription

The ai_query function is applied to the content column, which was created in Step 1 when the files were loaded and contains the audio byte data.

 

CREATE TABLE IF NOT EXISTS raw_audio_transcription (
  caller_id STRING,
  modificationTime TIMESTAMP,
  length BIGINT,
  transcript STRING
)
USING DELTA;

INSERT INTO raw_audio_transcription (caller_id, modificationTime, length, transcript)
SELECT 
  caller_id,
  modificationTime,
  length,
  ai_query(
    "whisper_v3", 
    content, 
    failOnError => True
  ) as transcript
FROM raw_audio_files;
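
Before moving on to the NLP analytics, it can be useful to spot-check a few transcriptions; here is a minimal sketch of such a query:

-- Preview the first 200 characters of a few transcripts
SELECT caller_id, length, substring(transcript, 1, 200) AS transcript_preview
FROM raw_audio_transcription
LIMIT 5;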

 

 

 

Step 4: Perform NLP analytics

The example here shows a set of prompts that works well with Llama 3.1 LLMs and an example dataset for the tasks of summarization, sentiment analysis, and topic analysis. We encourage readers to experiment and optimize prompts based on your own data and chosen LLMs.

 

CREATE TABLE IF NOT EXISTS transcription_nlp_analysis (
  caller_id STRING,
  modificationTime TIMESTAMP,
  length BIGINT,
  transcript STRING,
  summary STRING,
  sentiment STRING,
  topic STRING
)
USING DELTA;

INSERT INTO transcription_nlp_analysis (caller_id, modificationTime, length, transcript, summary, sentiment, topic)
SELECT 
  caller_id,
  modificationTime,
  length,
  transcript,
  ai_query(
    'meta-llama3-1-70b',
    CONCAT(
      "Summarize the conversation in max of 150 words",
      transcript
    ),
    failOnError => True,
    modelParameters => named_struct('max_tokens', 300, 'temperature', float(0))
  ) as summary,
  ai_query(
    'meta-llama3-1-70b',
    CONCAT(
      "analyze the sentiment of the conversion", 
      transcript,
      ", return 'positive', 'negative', or 'neutral'. Return only the overall sentiment, do not explain"
      ),
      failOnError => True,
      modelParameters => named_struct('max_tokens', 50, 'temperature', float(0))
  ) as sentiment,
  ai_query(
    'meta-llama3-1-70b',
      CONCAT(
        "Return the predominant topic in the below conversation. please include only one main topic from the provided list ", 
        transcript,
        "\n\nlist of topics wtih description delimited with ':' \n",
        "* car accident: the customer involved in a car accident \n",
        "* policy change: the customer would like change, update, or add on their policy or information\n",
        "* home accident: the customer has a damage in his or her home\n",
        "* motorcrycle: the customer has a motorcycle related question\n",
        "* theft: the customer had things stolen from their cars and homes\n",
        "Return only the topic, Do not explain"
      ),
      failOnError => True,
      modelParameters => named_struct('max_tokens', 50, 'temperature', float(0))
  ) as topic
FROM raw_audio_transcription;

 

 

Here are a few examples of the analysis results:

[Sample analysis results: summary, sentiment, and topic outputs for several calls]

The call center batch LLM inference workflow is now complete, and the results are ready to be consumed by downstream applications for business insights.
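
For example, a downstream dashboard could aggregate the sentiment mix per topic directly from the result table; here is a minimal sketch of such a query:

-- Call volume and sentiment mix per topic for downstream reporting
SELECT topic, sentiment, COUNT(*) AS call_count
FROM transcription_nlp_analysis
GROUP BY topic, sentiment
ORDER BY topic, call_count DESC;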

 

Additional Recommendations

  • Use Databricks SQL warehouse compute to achieve the best performance with the ai_query SQL API
  • Set up a Databricks provisioned throughput foundation model endpoint to achieve the best inference speed at scale. See the Databricks documentation
  • The OpenAI Whisper v3 large STT model has a file size limit of 25 MB. For long audio files, the recommendation is to perform chunking or re-encoding on the raw audio files before the ingestion step (a simple query to flag oversized files follows this list). Alternatively, one can also perform audio dilation and speech-to-text model distillation. Please refer to this excellent set of notebooks from my colleague Sri Tikkireddy
  • In addition to the OpenAI Whisper v3 large model, users can deploy other open-source speech-to-text (STT) models on Databricks and use them with ai_query, for example nvidia/canary-1b, facebook/w2v-bert-2.0, etc.
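
As mentioned above, a simple way to flag audio files that exceed the 25 MB limit after ingestion, so they can be chunked or re-encoded and re-ingested, is to filter on the length column of the ingestion table; a minimal sketch:

-- Flag ingested audio files larger than the 25 MB Whisper limit
SELECT caller_id, path, length
FROM raw_audio_files
WHERE length > 25 * 1024 * 1024;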

 

Conclusion

In this article, we showed how using Mosaic AI Batch Inference and ai_query can simplify Customer Call Center NLP Analytics for the financial services industry. 

We plan to add more exciting features to this capability in the near future:

  • ai_query LLM batch inference with No Endpoint Provisioning
  • AI Functions at Scale
  • Cost-Optimized Batch Jobs

Stay tuned!