Debu-Sinha
Databricks Employee

Databricks ships some killer toys for large-language-model work:

  • ai_query for in-warehouse inference

  • Vector Search for lightning-fast retrieval

  • Serving Endpoints for real-time chat

Put them together, though, and you’ll trip over a few booby traps I learned about the hard way.

 

  #  The surprise                                         Why it hurts
  1  A single NULL in CONCAT nukes the whole prompt       The LLM never even sees your question
  2  similarity_search() only accepts one string          Batch jobs grind along row-by-row
  3  Calling an endpoint in a loop feels like dial-up     Hundreds of prompts = coffee-break latency

Here’s how I dodge each land-mine — code included, copy-paste away.

1 · Vaccinate Your Prompts Against NULL

SQL’s motto is: “If anything is NULL, everybody’s NULL.”
So instead of begging the LLM to ignore missing data, I scrub the prompt string first:

SELECT
  id,
  ai_query(
    'your-endpoint-name',
    CONCAT_WS(' ',
      'Answer from context:',
      COALESCE(context, 'No context.'),
      'Question:',
      COALESCE(question, 'No question.')
    ),
    modelParameters => named_struct('temperature', 0.3, 'max_tokens', 100)
  ) AS response
FROM questions_table;

COALESCE supplies a sensible default; CONCAT_WS quietly skips any leftover NULLs.

Result: every row ships a valid prompt.
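
If you want to see the difference for yourself, here is a quick standalone check (illustrative literals only, no table required):

-- Plain CONCAT propagates NULL: one missing piece and the whole prompt vanishes
SELECT CONCAT('Answer from context: ', NULL, ' Question: ', 'What is Delta Lake?') AS prompt;
-- returns NULL

-- CONCAT_WS plus COALESCE keeps the prompt intact
SELECT CONCAT_WS(' ',
  'Answer from context:', COALESCE(NULL, 'No context.'),
  'Question:', 'What is Delta Lake?') AS prompt;
-- returns 'Answer from context: No context. Question: What is Delta Lake?'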

2 · Faux-Batch Vector Search

The Vector Search SDK is single-query only. I trick it into “batch mode” with a thread pool:

 

# Parallel similarity_search()
from databricks.vector_search.client import VectorSearchClient
from concurrent.futures import ThreadPoolExecutor
import logging, time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("vector-search")

def get_index(endpoint, name):
    # One client and one index handle, shared by every worker thread
    vs_client = VectorSearchClient()
    return vs_client.get_index(endpoint_name=endpoint, index_name=name)

def search(index, query, cols, tries=3):
    # Exponential back-off smooths over transient throttling and network blips
    for n in range(tries):
        try:
            return index.similarity_search(query_text=query, columns=cols, num_results=5)
        except Exception as e:
            if n == tries - 1:
                return {"error": str(e)}
            logger.warning(f"Retry {n+1}: {e}")
            time.sleep(2 ** n)

def batch_search(queries, endpoint="my-endpoint", idx="my-index", workers=20):
    index = get_index(endpoint, idx)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futs = [pool.submit(search, index, q, ["id", "text", "metadata"]) for q in queries]
    # Leaving the with-block waits for every future, so all results are ready here
    return [f.result() for f in futs]

Twenty threads on the driver give me a 10–20× speed-up versus a plain for-loop, with back-off retries to smooth over momentary blips.
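
For completeness, here is roughly how a notebook cell would call it (endpoint and index names are placeholders, and the exact response shape can vary a little between SDK versions):

queries = [
    "How do I create a Delta table?",
    "What is Unity Catalog?",
    "How does ai_query handle NULL inputs?",
]
results = batch_search(queries, endpoint="my-endpoint", idx="my-index")

for query, res in zip(queries, results):
    if "error" in res:
        print(f"{query!r} failed: {res['error']}")
    else:
        # similarity_search typically returns matching rows under result.data_array
        print(f"{query!r} -> {len(res['result']['data_array'])} hits")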

3 · Fire-Hose Calls to an LLM Endpoint

Exact same threading trick, but wrapped around WorkspaceClient so I can send system + user prompts together:

 

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import ChatMessage, ChatMessageRole
from concurrent.futures import ThreadPoolExecutor
import logging, time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("llm")

class FastLLM:
    def __init__(self, endpoint, workers=10):
        self.endpoint = endpoint
        self.workers = workers
        self.wsc = WorkspaceClient()

    def _ask(self, sys_msg, user_msg, tries=3):
        for n in range(tries):
            try:
                resp = self.wsc.serving_endpoints.query(
                    name=self.endpoint,
                    messages=[
                        ChatMessage(role=ChatMessageRole.SYSTEM, content=sys_msg),
                        ChatMessage(role=ChatMessageRole.USER, content=user_msg)
                    ],
                    max_tokens=200,
                    temperature=0.2
                )
                return {"content": resp.choices[0].message.content, "error": None}
            except Exception as e:
                if n == tries - 1:
                    return {"content": None, "error": str(e)}
                log.warning(f"Retry {n+1}: {e}")
                time.sleep(2 ** n)

    def ask_many(self, prompts, sys_msg="You are a helpful assistant"):
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            futs = [pool.submit(self._ask, sys_msg, p) for p in prompts]
        return [f.result() for f in futs]

# Demo
if __name__ == "__main__":
    engine = FastLLM("my-endpoint", workers=10)
    answers = engine.ask_many([
        "What’s the capital of France?",
        "Explain machine learning in one sentence.",
        "Write a haiku about mountains."
    ])
    for a in answers:
        print(a["content"] or a["error"])

Ten threads is my comfort zone: quick yet gentle enough to dodge rate-limits. Scale up or chunk the inputs once you see how your endpoint behaves.
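
If the endpoint does start pushing back, the simplest throttle is to feed ask_many smaller chunks with a short pause in between. A minimal sketch (chunk size and pause are arbitrary; tune them to your rate limits):

import time

# Work through prompts in smaller batches; the pause between chunks keeps the
# sustained request rate below the endpoint's limits
def ask_in_chunks(engine, prompts, chunk_size=50, pause_secs=1.0):
    answers = []
    for i in range(0, len(prompts), chunk_size):
        answers.extend(engine.ask_many(prompts[i:i + chunk_size]))
        if i + chunk_size < len(prompts):
            time.sleep(pause_secs)
    return answers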

TL;DR

  • Sanitize prompts in SQL, not in the model.

  • Threads beat async in a Databricks notebook for I/O-heavy jobs.

  • Reuse connections and sprinkle in exponential back-off; half the “random” failures vanish.

Steal these snippets, remix them, and let me know what other hurdles you run into. Always happy to swap tips; just tag me on LinkedIn.

Happy building!
