<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Databricks LLM Evolution and Future Prospects in Community Articles</title>
    <link>https://community.databricks.com/t5/community-articles/databricks-llm-evolution-and-future-prospects/m-p/126250#M498</link>
    <description>&lt;H1&gt;&lt;STRONG&gt;Databricks LLM Evolution and Future Prospects&lt;BR /&gt;&lt;/STRONG&gt;&lt;/H1&gt;&lt;P class="lia-align-justify"&gt;Databricks has progressed from a big-data compute engine to a full-stack AI powerhouse that designs, trains, and serves state‐of‐the-art large language models (LLMs). This article explores two key technical innovations—&lt;STRONG&gt;DBRX&lt;/STRONG&gt;, a fine-grained mixture-of-experts (MoE) model, and &lt;STRONG&gt;Test‑Time Adaptive Optimization (TAO)&lt;/STRONG&gt;—highlighting their architectures, algorithms, performance, and future potential. Code examples and implementation details are provided to enable hands-on experimentation.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;H1&gt;&lt;STRONG&gt;Executive Overview&lt;/STRONG&gt;&lt;/H1&gt;&lt;P class="lia-align-justify"&gt;&lt;SPAN&gt;Since 2023, Databricks has integrated the MosaicML acquisition, released the fine-grained mixture-of-experts (MoE) model DBRX, and built a unified Data Intelligence Platform that fuses data governance, model training, serving, and evaluation. The platform’s architectural focus on compound AI systems—multiple models orchestrated with rigorous governance—positions Databricks to dominate enterprise generative-AI adoption through 2026 and beyond.&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;H1&gt;&lt;STRONG&gt;&lt;BR /&gt;Databricks: Key Milestones&lt;BR /&gt;&lt;/STRONG&gt;&lt;/H1&gt;&lt;P class="lia-align-justify"&gt;Databricks, evolving from its 2013 Spark roots, now drives &lt;STRONG&gt;enterprise GenAI&lt;/STRONG&gt; through the integration of MosaicML, DBRX, and adaptive optimization techniques.&lt;/P&gt;&lt;TABLE border="1" width="100%"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD width="8.602150537634412%"&gt;&lt;STRONG&gt;Year&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD width="36.43966547192353%"&gt;&lt;STRONG&gt;Milestone&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD width="54.95818399044206%"&gt;&lt;STRONG&gt;Technical Significance&lt;/STRONG&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="8.602150537634412%"&gt;2013&lt;/TD&gt;&lt;TD width="36.43966547192353%"&gt;&lt;SPAN&gt;Founding and Apache Spark launch&lt;/SPAN&gt;&lt;/TD&gt;&lt;TD width="54.95818399044206%"&gt;&lt;SPAN&gt;Distributed, in-memory compute core&lt;/SPAN&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="8.602150537634412%"&gt;2017&lt;/TD&gt;&lt;TD width="36.43966547192353%"&gt;&lt;SPAN&gt;Azure Databricks GA&lt;/SPAN&gt;&lt;/TD&gt;&lt;TD width="54.95818399044206%"&gt;&lt;SPAN&gt;First managed Spark-as-a-service&lt;/SPAN&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="8.602150537634412%"&gt;2023&lt;/TD&gt;&lt;TD width="36.43966547192353%"&gt;&lt;SPAN&gt;Acquisition of MosaicML for $1.3 billion&lt;/SPAN&gt;&lt;/TD&gt;&lt;TD width="54.95818399044206%"&gt;&lt;SPAN&gt;Adds efficient LLM training stack&lt;/SPAN&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="8.602150537634412%"&gt;2024&lt;/TD&gt;&lt;TD width="36.43966547192353%"&gt;&lt;SPAN&gt;DBRX open-sourced&lt;/SPAN&gt;&lt;/TD&gt;&lt;TD width="54.95818399044206%"&gt;&lt;SPAN&gt;132 B-parameter MoE surpasses Llama 2 70B&lt;/SPAN&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="8.602150537634412%"&gt;2025&lt;/TD&gt;&lt;TD width="36.43966547192353%"&gt;&lt;SPAN&gt;Data Intelligence Platform update&lt;/SPAN&gt;&lt;/TD&gt;&lt;TD width="54.95818399044206%"&gt;&lt;SPAN&gt;Seamless RAG, vector search, agentic frameworks&lt;/SPAN&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;H1&gt;&lt;STRONG&gt;&lt;BR /&gt;DBRX: &lt;/STRONG&gt;Fine‑Grained Mixture‑of‑Experts LLM&lt;/H1&gt;&lt;UL class="lia-list-style-type-square"&gt;&lt;LI&gt;&lt;H2&gt;&lt;STRONG&gt;&lt;SPAN&gt;Architecture&lt;BR /&gt;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/H2&gt;DBRX is a decoder-only transformer with a total of 132 B parameters, but only 36 B are active per token, achieved via a fine-grained MoE approach.&lt;UL&gt;&lt;LI&gt;&lt;STRONG&gt;16 experts, with 4 selected per&lt;/STRONG&gt;&lt;SPAN&gt; token → 65x more routing combinations than previous MoEs.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;Rotary position encodings (RoPE), Gated Linear Units (GLU), Grouped Query Attention (GQA) for efficient long-context modeling (up to 32 K tokens).&lt;BR /&gt;&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;&lt;H2&gt;&lt;STRONG&gt;&lt;SPAN&gt;Training Stack&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/H2&gt;&lt;UL&gt;&lt;LI&gt;&lt;STRONG&gt;Compute&lt;/STRONG&gt;&lt;SPAN&gt;: 3,072 × NVIDIA H100 at 3.2 TB/s InfiniBand; 2.5 months; US$10M&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;&lt;SPAN&gt;Data: 12 T tokens curated with Unity Catalog lineage.&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;&lt;H2&gt;&lt;STRONG&gt;&lt;SPAN&gt;Efficiency Gains&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/H2&gt;&lt;UL&gt;&lt;LI&gt;&lt;STRONG&gt;Up to 2x faster inference&lt;/STRONG&gt;&lt;SPAN&gt; on H100 GPUs compared to LLaMA‑2‑70B.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;Uses MegaBlocks, LLM Foundry, Composer, Spark, MLflow—fully integrated within Databricks workflows.&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;&lt;SPAN&gt;Code Example&lt;BR /&gt;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN&gt;Load DBRX (Base / Instruct)&lt;/SPAN&gt;&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;LI-CODE lang="python"&gt;from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "databricks/dbrx-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

prompt = "Explain the mixture-of-experts architecture."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))&lt;/LI-CODE&gt;&lt;UL&gt;&lt;LI&gt;&lt;H2&gt;&lt;SPAN&gt;Serving Modes&lt;BR /&gt;&lt;/SPAN&gt;&lt;/H2&gt;&lt;TABLE border="1" width="100%"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD width="22.16645754914262%" height="30px"&gt;&lt;STRONG&gt;Mode&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD width="26.181514010874114%" height="30px"&gt;&lt;STRONG&gt;Use Case&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD width="51.65202843998327%" height="30px"&gt;&lt;STRONG&gt;SLA/Capacity&lt;/STRONG&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="22.16645754914262%" height="30px"&gt;&lt;SPAN&gt;Pay-per-token&lt;/SPAN&gt;&lt;/TD&gt;&lt;TD width="26.181514010874114%" height="30px"&gt;&lt;SPAN&gt;Prototyping, ad-hoc queries&lt;/SPAN&gt;&lt;/TD&gt;&lt;TD width="51.65202843998327%" height="30px"&gt;&lt;SPAN&gt;Best for light workloads; multi-tenant latency trade-off&lt;/SPAN&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="22.16645754914262%" height="30px"&gt;&lt;SPAN&gt;Provisioned throughput&lt;/SPAN&gt;&lt;/TD&gt;&lt;TD width="26.181514010874114%" height="30px"&gt;&lt;SPAN&gt;High-traffic production&lt;/SPAN&gt;&lt;/TD&gt;&lt;TD width="51.65202843998327%" height="30px"&gt;&lt;SPAN&gt;Dedicated GPUs, HIPAA compliance&lt;/SPAN&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="22.16645754914262%" height="30px"&gt;&lt;SPAN&gt;Batch inference&lt;/SPAN&gt;&lt;/TD&gt;&lt;TD width="26.181514010874114%" height="30px"&gt;&lt;SPAN&gt;Large offline jobs&lt;/SPAN&gt;&lt;/TD&gt;&lt;TD width="51.65202843998327%" height="30px"&gt;&lt;SPAN&gt;AI Functions orchestrate Spark+ML pipelines&lt;/SPAN&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;H2&gt;&amp;nbsp;&lt;/H2&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;H1&gt;&lt;STRONG&gt;TAO: &lt;/STRONG&gt;&lt;SPAN&gt;Test‑Time Adaptive Optimization&lt;/SPAN&gt;&lt;/H1&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;TAO enables LLM adaptation with &lt;/SPAN&gt;&lt;STRONG&gt;unlabeled usage data&lt;/STRONG&gt;&lt;SPAN&gt;, only using test-time compute. The pipeline:&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;OL&gt;&lt;LI&gt;&lt;STRONG&gt;Generate&lt;/STRONG&gt;&lt;SPAN&gt; N candidate responses.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Score&lt;/STRONG&gt;&lt;SPAN&gt; using a &lt;/SPAN&gt;&lt;STRONG&gt;Databricks Reward Model (DBRM)&lt;/STRONG&gt;&lt;SPAN&gt; trained on synthetic or preference data.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Reinforcement‑Learning&lt;/STRONG&gt;&lt;SPAN&gt; (on best‑of‑N) to update weights.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;Resulting model incurs &lt;STRONG&gt;no extra cost at inference time.&lt;/STRONG&gt;&lt;/LI&gt;&lt;/OL&gt;&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Benchmarks&lt;BR /&gt;&lt;/STRONG&gt;On &lt;STRONG&gt;FinanceBench&lt;/STRONG&gt;, TAO-tuned Llama 3.1B improved from 68.4% to &lt;STRONG&gt;82.8%&lt;/STRONG&gt;, outperforming proprietary GPT‑4-class models.&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Reward Model (DBRM)&lt;BR /&gt;&lt;/STRONG&gt;&lt;SPAN&gt;DBRM mimics human preference using predicted rankings, enabling synthetic training generation.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Code Skeleton: TAO Loop&lt;/STRONG&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;LI-CODE lang="python"&gt;# Pseudocode
for prompt_batch in prompt_stream:
    candidate_resps = [model.generate(prompt_batch) for _ in range(N)]
    scores = db_reward_model.score(candidate_resps)
    top_resp = candidate_resps[argmax(scores)]
    loss = rl_loss(model(prompt_batch), top_resp)
    loss.backward(); optimizer.step()&lt;/LI-CODE&gt;&lt;H1&gt;&lt;SPAN&gt;&lt;BR /&gt;Vector Search and RAG&lt;/SPAN&gt;&lt;/H1&gt;&lt;P class="lia-align-justify"&gt;&lt;SPAN&gt;Databricks Vector Search auto-syncs Delta tables and embeddings; index freshness is governed by streaming CDC pipelines and audited via Unity Catalog. Compared with standalone vector DBs, this cuts maintenance overhead and enforces row-level security by design.&lt;/SPAN&gt;&lt;/P&gt;&lt;H1&gt;&lt;SPAN&gt;&lt;BR /&gt;Agentic Framework&lt;/SPAN&gt;&lt;/H1&gt;&lt;P&gt;&lt;SPAN&gt;The Mosaic AI Agent Framework coordinates compound systems:&lt;/SPAN&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;Planning – LLM decomposes the task.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;Tool Use – External APIs queried via secure credentials.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;Evaluation – AI judges plus SME feedback score accuracy, hallucination, helpfulness, and safety.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;Continuous Learning – Results stored, labeled, and recycled into fine-tuning sets.&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;H1&gt;&lt;SPAN&gt;&lt;BR /&gt;Competitive Landscape&lt;/SPAN&gt;&lt;/H1&gt;&lt;TABLE border="1" width="99.76105137395459%"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD width="16.71867734065918%" height="66px"&gt;&lt;P&gt;&lt;STRONG&gt;Model&lt;/STRONG&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD width="15.521072550240017%" height="66px"&gt;&lt;P&gt;&lt;STRONG&gt;Params (Total/Active)&lt;/STRONG&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD width="13.844425843653193%" height="66px"&gt;&lt;P&gt;&lt;STRONG&gt;MoE Experts&lt;/STRONG&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD width="18.634845005329844%" height="66px"&gt;&lt;P&gt;&lt;STRONG&gt;MMLU&lt;/STRONG&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD width="35.04203063407235%" height="66px"&gt;&lt;P&gt;&lt;STRONG&gt;Notable Strength&lt;/STRONG&gt;&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="16.71867734065918%" height="50px"&gt;&lt;P&gt;&lt;SPAN&gt;DBRX&lt;/SPAN&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD width="15.521072550240017%" height="50px"&gt;&lt;P&gt;&lt;SPAN&gt;132 B / 36 B&lt;/SPAN&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD width="13.844425843653193%" height="50px"&gt;&lt;P&gt;&lt;SPAN&gt;16 (4 active)&lt;/SPAN&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD width="18.634845005329844%" height="50px"&gt;&lt;P&gt;&lt;SPAN&gt;73.7%&lt;/SPAN&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD width="35.04203063407235%" height="50px"&gt;&lt;P&gt;&lt;SPAN&gt;Fast 150 tok/s user rate&lt;/SPAN&gt;&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="16.71867734065918%" height="50px"&gt;&lt;P&gt;&lt;SPAN&gt;Mixtral 8x7B&lt;/SPAN&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD width="15.521072550240017%" height="50px"&gt;&lt;P&gt;&lt;SPAN&gt;47 B / 13 B&lt;/SPAN&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD width="13.844425843653193%" height="50px"&gt;&lt;P&gt;&lt;SPAN&gt;8 (2 active)&lt;/SPAN&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD width="18.634845005329844%" height="50px"&gt;&lt;P&gt;&lt;SPAN&gt;70%&lt;/SPAN&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD width="35.04203063407235%" height="50px"&gt;&lt;P&gt;&lt;SPAN&gt;6x faster than Llama 2 70B&lt;/SPAN&gt;&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="16.71867734065918%" height="50px"&gt;&lt;P&gt;&lt;SPAN&gt;Grok-1&lt;/SPAN&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD width="15.521072550240017%" height="50px"&gt;&lt;P&gt;&lt;SPAN&gt;314 B / 78 B&lt;/SPAN&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD width="13.844425843653193%" height="50px"&gt;&lt;P&gt;&lt;SPAN&gt;8 (2 active)&lt;/SPAN&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD width="18.634845005329844%" height="50px"&gt;&lt;P&gt;&lt;SPAN&gt;73%&lt;/SPAN&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD width="35.04203063407235%" height="50px"&gt;&lt;P&gt;&lt;SPAN&gt;Largest open MoE&lt;/SPAN&gt;&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="16.71867734065918%" height="50px"&gt;&lt;P&gt;&lt;SPAN&gt;Llama 2 70B&lt;/SPAN&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD width="15.521072550240017%" height="50px"&gt;&lt;P&gt;&lt;SPAN&gt;70 B / 70 B&lt;/SPAN&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD width="13.844425843653193%" height="50px"&gt;&lt;P&gt;&lt;SPAN&gt;Dense&lt;/SPAN&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD width="18.634845005329844%" height="50px"&gt;&lt;P&gt;&lt;SPAN&gt;67–70%&lt;/SPAN&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD width="35.04203063407235%" height="50px"&gt;&lt;P&gt;&lt;SPAN&gt;Broad adoption&lt;/SPAN&gt;&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="16.71867734065918%" height="50px"&gt;&lt;P&gt;&lt;SPAN&gt;Code Llama 70B&lt;/SPAN&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD width="15.521072550240017%" height="50px"&gt;&lt;P&gt;&lt;SPAN&gt;70 B / 70 B&lt;/SPAN&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD width="13.844425843653193%" height="50px"&gt;&lt;P&gt;&lt;SPAN&gt;Dense&lt;/SPAN&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD width="18.634845005329844%" height="50px"&gt;&lt;P&gt;&lt;SPAN&gt;HumanEval 65.2%&lt;/SPAN&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD width="35.04203063407235%" height="50px"&gt;&lt;P&gt;&lt;SPAN&gt;Code generation&lt;/SPAN&gt;&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&lt;SPAN&gt;DBRX edges Grok-1 on throughput with a fraction of the cost footprint while maintaining equal reasoning scores.&lt;/SPAN&gt;&lt;/P&gt;&lt;H1&gt;&lt;SPAN&gt;&lt;BR /&gt;Governance and Security&lt;/SPAN&gt;&lt;/H1&gt;&lt;P class="lia-align-justify"&gt;&lt;SPAN&gt;Unity Catalog’s hierarchical model (account → catalog → schema → asset) governs both data and derived embeddings, delivering row-level masking, lineage, and audit logs. Model access passes through Mosaic AI Gateway, which tracks usage, latency, and token spend per endpoint.&lt;/SPAN&gt;&lt;/P&gt;&lt;H1&gt;&lt;SPAN&gt;&lt;BR /&gt;Future Trajectory&lt;/SPAN&gt;&lt;/H1&gt;&lt;UL&gt;&lt;LI&gt;&lt;STRONG&gt;Multimodal MoE&lt;/STRONG&gt;&lt;SPAN&gt; – Audio-vision experts integrated into DBRX-2 expected by 2026; likely 4-expert activation for each modality to keep costs flat.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Incremental Learning&lt;/STRONG&gt;&lt;SPAN&gt; – Streaming fine-tunes leveraging Delta Live Tables to update weights nightly without full retraining.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Edge Serving&lt;/STRONG&gt;&lt;SPAN&gt; – Quantized 8-bit MoE splits per-expert shards across heterogeneous GPU clusters, targeting 30 tok/s on T4 cards for compliance regions.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Federated Governance&lt;/STRONG&gt;&lt;SPAN&gt; – Cross-cloud lineage via OpenLLM schema federation; builds on Unity Catalog metadata outbox events.&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;H1&gt;&lt;SPAN&gt;&lt;BR /&gt;Conclusion&lt;/SPAN&gt;&lt;/H1&gt;&lt;P class="lia-align-justify"&gt;&lt;SPAN&gt;Databricks has shifted the center of gravity for enterprise AI from monolithic black-box APIs to an open, modular, and governable lakehouse ecosystem. DBRX proves that sparse MoE architectures can match or surpass dense giants at a fraction of serving cost, while the Mosaic AI stack addresses the lifecycle gaps—evaluation, governance, and orchestration—that stall enterprise roll-outs today. With continued investment in multimodal expertise, automated RAG, and federated governance, Databricks is poised to remain a primary conduit between corporate data estates and next-generation AI applications through the rest of the decade.&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Wed, 23 Jul 2025 19:11:09 GMT</pubDate>
    <dc:creator>ayushbadhera1</dc:creator>
    <dc:date>2025-07-23T19:11:09Z</dc:date>
    <item>
      <title>Databricks LLM Evolution and Future Prospects</title>
      <link>https://community.databricks.com/t5/community-articles/databricks-llm-evolution-and-future-prospects/m-p/126250#M498</link>
      <description>&lt;H1&gt;&lt;STRONG&gt;Databricks LLM Evolution and Future Prospects&lt;BR /&gt;&lt;/STRONG&gt;&lt;/H1&gt;&lt;P class="lia-align-justify"&gt;Databricks has progressed from a big-data compute engine to a full-stack AI powerhouse that designs, trains, and serves state‐of‐the-art large language models (LLMs). This article explores two key technical innovations—&lt;STRONG&gt;DBRX&lt;/STRONG&gt;, a fine-grained mixture-of-experts (MoE) model, and &lt;STRONG&gt;Test‑Time Adaptive Optimization (TAO)&lt;/STRONG&gt;—highlighting their architectures, algorithms, performance, and future potential. Code examples and implementation details are provided to enable hands-on experimentation.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;H1&gt;&lt;STRONG&gt;Executive Overview&lt;/STRONG&gt;&lt;/H1&gt;&lt;P class="lia-align-justify"&gt;&lt;SPAN&gt;Since 2023, Databricks has integrated the MosaicML acquisition, released the fine-grained mixture-of-experts (MoE) model DBRX, and built a unified Data Intelligence Platform that fuses data governance, model training, serving, and evaluation. The platform’s architectural focus on compound AI systems—multiple models orchestrated with rigorous governance—positions Databricks to dominate enterprise generative-AI adoption through 2026 and beyond.&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;H1&gt;&lt;STRONG&gt;&lt;BR /&gt;Databricks: Key Milestones&lt;BR /&gt;&lt;/STRONG&gt;&lt;/H1&gt;&lt;P class="lia-align-justify"&gt;Databricks, evolving from its 2013 Spark roots, now drives &lt;STRONG&gt;enterprise GenAI&lt;/STRONG&gt; through the integration of MosaicML, DBRX, and adaptive optimization techniques.&lt;/P&gt;&lt;TABLE border="1" width="100%"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD width="8.602150537634412%"&gt;&lt;STRONG&gt;Year&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD width="36.43966547192353%"&gt;&lt;STRONG&gt;Milestone&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD width="54.95818399044206%"&gt;&lt;STRONG&gt;Technical Significance&lt;/STRONG&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="8.602150537634412%"&gt;2013&lt;/TD&gt;&lt;TD width="36.43966547192353%"&gt;&lt;SPAN&gt;Founding and Apache Spark launch&lt;/SPAN&gt;&lt;/TD&gt;&lt;TD width="54.95818399044206%"&gt;&lt;SPAN&gt;Distributed, in-memory compute core&lt;/SPAN&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="8.602150537634412%"&gt;2017&lt;/TD&gt;&lt;TD width="36.43966547192353%"&gt;&lt;SPAN&gt;Azure Databricks GA&lt;/SPAN&gt;&lt;/TD&gt;&lt;TD width="54.95818399044206%"&gt;&lt;SPAN&gt;First managed Spark-as-a-service&lt;/SPAN&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="8.602150537634412%"&gt;2023&lt;/TD&gt;&lt;TD width="36.43966547192353%"&gt;&lt;SPAN&gt;Acquisition of MosaicML for $1.3 billion&lt;/SPAN&gt;&lt;/TD&gt;&lt;TD width="54.95818399044206%"&gt;&lt;SPAN&gt;Adds efficient LLM training stack&lt;/SPAN&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="8.602150537634412%"&gt;2024&lt;/TD&gt;&lt;TD width="36.43966547192353%"&gt;&lt;SPAN&gt;DBRX open-sourced&lt;/SPAN&gt;&lt;/TD&gt;&lt;TD width="54.95818399044206%"&gt;&lt;SPAN&gt;132 B-parameter MoE surpasses Llama 2 70B&lt;/SPAN&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="8.602150537634412%"&gt;2025&lt;/TD&gt;&lt;TD width="36.43966547192353%"&gt;&lt;SPAN&gt;Data Intelligence Platform update&lt;/SPAN&gt;&lt;/TD&gt;&lt;TD width="54.95818399044206%"&gt;&lt;SPAN&gt;Seamless RAG, vector search, agentic frameworks&lt;/SPAN&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;H1&gt;&lt;STRONG&gt;&lt;BR /&gt;DBRX: &lt;/STRONG&gt;Fine‑Grained Mixture‑of‑Experts LLM&lt;/H1&gt;&lt;UL class="lia-list-style-type-square"&gt;&lt;LI&gt;&lt;H2&gt;&lt;STRONG&gt;&lt;SPAN&gt;Architecture&lt;BR /&gt;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/H2&gt;DBRX is a decoder-only transformer with a total of 132 B parameters, but only 36 B are active per token, achieved via a fine-grained MoE approach.&lt;UL&gt;&lt;LI&gt;&lt;STRONG&gt;16 experts, with 4 selected per&lt;/STRONG&gt;&lt;SPAN&gt; token → 65x more routing combinations than previous MoEs.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;Rotary position encodings (RoPE), Gated Linear Units (GLU), Grouped Query Attention (GQA) for efficient long-context modeling (up to 32 K tokens).&lt;BR /&gt;&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;&lt;H2&gt;&lt;STRONG&gt;&lt;SPAN&gt;Training Stack&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/H2&gt;&lt;UL&gt;&lt;LI&gt;&lt;STRONG&gt;Compute&lt;/STRONG&gt;&lt;SPAN&gt;: 3,072 × NVIDIA H100 at 3.2 TB/s InfiniBand; 2.5 months; US$10M&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;&lt;SPAN&gt;Data: 12 T tokens curated with Unity Catalog lineage.&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;&lt;H2&gt;&lt;STRONG&gt;&lt;SPAN&gt;Efficiency Gains&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/H2&gt;&lt;UL&gt;&lt;LI&gt;&lt;STRONG&gt;Up to 2x faster inference&lt;/STRONG&gt;&lt;SPAN&gt; on H100 GPUs compared to LLaMA‑2‑70B.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;Uses MegaBlocks, LLM Foundry, Composer, Spark, MLflow—fully integrated within Databricks workflows.&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;&lt;SPAN&gt;Code Example&lt;BR /&gt;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN&gt;Load DBRX (Base / Instruct)&lt;/SPAN&gt;&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;LI-CODE lang="python"&gt;from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "databricks/dbrx-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

prompt = "Explain the mixture-of-experts architecture."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))&lt;/LI-CODE&gt;&lt;UL&gt;&lt;LI&gt;&lt;H2&gt;&lt;SPAN&gt;Serving Modes&lt;BR /&gt;&lt;/SPAN&gt;&lt;/H2&gt;&lt;TABLE border="1" width="100%"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD width="22.16645754914262%" height="30px"&gt;&lt;STRONG&gt;Mode&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD width="26.181514010874114%" height="30px"&gt;&lt;STRONG&gt;Use Case&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD width="51.65202843998327%" height="30px"&gt;&lt;STRONG&gt;SLA/Capacity&lt;/STRONG&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="22.16645754914262%" height="30px"&gt;&lt;SPAN&gt;Pay-per-token&lt;/SPAN&gt;&lt;/TD&gt;&lt;TD width="26.181514010874114%" height="30px"&gt;&lt;SPAN&gt;Prototyping, ad-hoc queries&lt;/SPAN&gt;&lt;/TD&gt;&lt;TD width="51.65202843998327%" height="30px"&gt;&lt;SPAN&gt;Best for light workloads; multi-tenant latency trade-off&lt;/SPAN&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="22.16645754914262%" height="30px"&gt;&lt;SPAN&gt;Provisioned throughput&lt;/SPAN&gt;&lt;/TD&gt;&lt;TD width="26.181514010874114%" height="30px"&gt;&lt;SPAN&gt;High-traffic production&lt;/SPAN&gt;&lt;/TD&gt;&lt;TD width="51.65202843998327%" height="30px"&gt;&lt;SPAN&gt;Dedicated GPUs, HIPAA compliance&lt;/SPAN&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="22.16645754914262%" height="30px"&gt;&lt;SPAN&gt;Batch inference&lt;/SPAN&gt;&lt;/TD&gt;&lt;TD width="26.181514010874114%" height="30px"&gt;&lt;SPAN&gt;Large offline jobs&lt;/SPAN&gt;&lt;/TD&gt;&lt;TD width="51.65202843998327%" height="30px"&gt;&lt;SPAN&gt;AI Functions orchestrate Spark+ML pipelines&lt;/SPAN&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;H2&gt;&amp;nbsp;&lt;/H2&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;H1&gt;&lt;STRONG&gt;TAO: &lt;/STRONG&gt;&lt;SPAN&gt;Test‑Time Adaptive Optimization&lt;/SPAN&gt;&lt;/H1&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;TAO enables LLM adaptation with &lt;/SPAN&gt;&lt;STRONG&gt;unlabeled usage data&lt;/STRONG&gt;&lt;SPAN&gt;, only using test-time compute. The pipeline:&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;OL&gt;&lt;LI&gt;&lt;STRONG&gt;Generate&lt;/STRONG&gt;&lt;SPAN&gt; N candidate responses.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Score&lt;/STRONG&gt;&lt;SPAN&gt; using a &lt;/SPAN&gt;&lt;STRONG&gt;Databricks Reward Model (DBRM)&lt;/STRONG&gt;&lt;SPAN&gt; trained on synthetic or preference data.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Reinforcement‑Learning&lt;/STRONG&gt;&lt;SPAN&gt; (on best‑of‑N) to update weights.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;Resulting model incurs &lt;STRONG&gt;no extra cost at inference time.&lt;/STRONG&gt;&lt;/LI&gt;&lt;/OL&gt;&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Benchmarks&lt;BR /&gt;&lt;/STRONG&gt;On &lt;STRONG&gt;FinanceBench&lt;/STRONG&gt;, TAO-tuned Llama 3.1B improved from 68.4% to &lt;STRONG&gt;82.8%&lt;/STRONG&gt;, outperforming proprietary GPT‑4-class models.&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Reward Model (DBRM)&lt;BR /&gt;&lt;/STRONG&gt;&lt;SPAN&gt;DBRM mimics human preference using predicted rankings, enabling synthetic training generation.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Code Skeleton: TAO Loop&lt;/STRONG&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;LI-CODE lang="python"&gt;# Pseudocode
for prompt_batch in prompt_stream:
    candidate_resps = [model.generate(prompt_batch) for _ in range(N)]
    scores = db_reward_model.score(candidate_resps)
    top_resp = candidate_resps[argmax(scores)]
    loss = rl_loss(model(prompt_batch), top_resp)
    loss.backward(); optimizer.step()&lt;/LI-CODE&gt;&lt;H1&gt;&lt;SPAN&gt;&lt;BR /&gt;Vector Search and RAG&lt;/SPAN&gt;&lt;/H1&gt;&lt;P class="lia-align-justify"&gt;&lt;SPAN&gt;Databricks Vector Search auto-syncs Delta tables and embeddings; index freshness is governed by streaming CDC pipelines and audited via Unity Catalog. Compared with standalone vector DBs, this cuts maintenance overhead and enforces row-level security by design.&lt;/SPAN&gt;&lt;/P&gt;&lt;H1&gt;&lt;SPAN&gt;&lt;BR /&gt;Agentic Framework&lt;/SPAN&gt;&lt;/H1&gt;&lt;P&gt;&lt;SPAN&gt;The Mosaic AI Agent Framework coordinates compound systems:&lt;/SPAN&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;Planning – LLM decomposes the task.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;Tool Use – External APIs queried via secure credentials.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;Evaluation – AI judges plus SME feedback score accuracy, hallucination, helpfulness, and safety.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;Continuous Learning – Results stored, labeled, and recycled into fine-tuning sets.&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;H1&gt;&lt;SPAN&gt;&lt;BR /&gt;Competitive Landscape&lt;/SPAN&gt;&lt;/H1&gt;&lt;TABLE border="1" width="99.76105137395459%"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD width="16.71867734065918%" height="66px"&gt;&lt;P&gt;&lt;STRONG&gt;Model&lt;/STRONG&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD width="15.521072550240017%" height="66px"&gt;&lt;P&gt;&lt;STRONG&gt;Params (Total/Active)&lt;/STRONG&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD width="13.844425843653193%" height="66px"&gt;&lt;P&gt;&lt;STRONG&gt;MoE Experts&lt;/STRONG&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD width="18.634845005329844%" height="66px"&gt;&lt;P&gt;&lt;STRONG&gt;MMLU&lt;/STRONG&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD width="35.04203063407235%" height="66px"&gt;&lt;P&gt;&lt;STRONG&gt;Notable Strength&lt;/STRONG&gt;&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="16.71867734065918%" height="50px"&gt;&lt;P&gt;&lt;SPAN&gt;DBRX&lt;/SPAN&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD width="15.521072550240017%" height="50px"&gt;&lt;P&gt;&lt;SPAN&gt;132 B / 36 B&lt;/SPAN&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD width="13.844425843653193%" height="50px"&gt;&lt;P&gt;&lt;SPAN&gt;16 (4 active)&lt;/SPAN&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD width="18.634845005329844%" height="50px"&gt;&lt;P&gt;&lt;SPAN&gt;73.7%&lt;/SPAN&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD width="35.04203063407235%" height="50px"&gt;&lt;P&gt;&lt;SPAN&gt;Fast 150 tok/s user rate&lt;/SPAN&gt;&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="16.71867734065918%" height="50px"&gt;&lt;P&gt;&lt;SPAN&gt;Mixtral 8x7B&lt;/SPAN&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD width="15.521072550240017%" height="50px"&gt;&lt;P&gt;&lt;SPAN&gt;47 B / 13 B&lt;/SPAN&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD width="13.844425843653193%" height="50px"&gt;&lt;P&gt;&lt;SPAN&gt;8 (2 active)&lt;/SPAN&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD width="18.634845005329844%" height="50px"&gt;&lt;P&gt;&lt;SPAN&gt;70%&lt;/SPAN&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD width="35.04203063407235%" height="50px"&gt;&lt;P&gt;&lt;SPAN&gt;6x faster than Llama 2 70B&lt;/SPAN&gt;&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="16.71867734065918%" height="50px"&gt;&lt;P&gt;&lt;SPAN&gt;Grok-1&lt;/SPAN&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD width="15.521072550240017%" height="50px"&gt;&lt;P&gt;&lt;SPAN&gt;314 B / 78 B&lt;/SPAN&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD width="13.844425843653193%" height="50px"&gt;&lt;P&gt;&lt;SPAN&gt;8 (2 active)&lt;/SPAN&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD width="18.634845005329844%" height="50px"&gt;&lt;P&gt;&lt;SPAN&gt;73%&lt;/SPAN&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD width="35.04203063407235%" height="50px"&gt;&lt;P&gt;&lt;SPAN&gt;Largest open MoE&lt;/SPAN&gt;&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="16.71867734065918%" height="50px"&gt;&lt;P&gt;&lt;SPAN&gt;Llama 2 70B&lt;/SPAN&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD width="15.521072550240017%" height="50px"&gt;&lt;P&gt;&lt;SPAN&gt;70 B / 70 B&lt;/SPAN&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD width="13.844425843653193%" height="50px"&gt;&lt;P&gt;&lt;SPAN&gt;Dense&lt;/SPAN&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD width="18.634845005329844%" height="50px"&gt;&lt;P&gt;&lt;SPAN&gt;67–70%&lt;/SPAN&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD width="35.04203063407235%" height="50px"&gt;&lt;P&gt;&lt;SPAN&gt;Broad adoption&lt;/SPAN&gt;&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="16.71867734065918%" height="50px"&gt;&lt;P&gt;&lt;SPAN&gt;Code Llama 70B&lt;/SPAN&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD width="15.521072550240017%" height="50px"&gt;&lt;P&gt;&lt;SPAN&gt;70 B / 70 B&lt;/SPAN&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD width="13.844425843653193%" height="50px"&gt;&lt;P&gt;&lt;SPAN&gt;Dense&lt;/SPAN&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD width="18.634845005329844%" height="50px"&gt;&lt;P&gt;&lt;SPAN&gt;HumanEval 65.2%&lt;/SPAN&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD width="35.04203063407235%" height="50px"&gt;&lt;P&gt;&lt;SPAN&gt;Code generation&lt;/SPAN&gt;&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&lt;SPAN&gt;DBRX edges Grok-1 on throughput with a fraction of the cost footprint while maintaining equal reasoning scores.&lt;/SPAN&gt;&lt;/P&gt;&lt;H1&gt;&lt;SPAN&gt;&lt;BR /&gt;Governance and Security&lt;/SPAN&gt;&lt;/H1&gt;&lt;P class="lia-align-justify"&gt;&lt;SPAN&gt;Unity Catalog’s hierarchical model (account → catalog → schema → asset) governs both data and derived embeddings, delivering row-level masking, lineage, and audit logs. Model access passes through Mosaic AI Gateway, which tracks usage, latency, and token spend per endpoint.&lt;/SPAN&gt;&lt;/P&gt;&lt;H1&gt;&lt;SPAN&gt;&lt;BR /&gt;Future Trajectory&lt;/SPAN&gt;&lt;/H1&gt;&lt;UL&gt;&lt;LI&gt;&lt;STRONG&gt;Multimodal MoE&lt;/STRONG&gt;&lt;SPAN&gt; – Audio-vision experts integrated into DBRX-2 expected by 2026; likely 4-expert activation for each modality to keep costs flat.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Incremental Learning&lt;/STRONG&gt;&lt;SPAN&gt; – Streaming fine-tunes leveraging Delta Live Tables to update weights nightly without full retraining.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Edge Serving&lt;/STRONG&gt;&lt;SPAN&gt; – Quantized 8-bit MoE splits per-expert shards across heterogeneous GPU clusters, targeting 30 tok/s on T4 cards for compliance regions.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Federated Governance&lt;/STRONG&gt;&lt;SPAN&gt; – Cross-cloud lineage via OpenLLM schema federation; builds on Unity Catalog metadata outbox events.&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;H1&gt;&lt;SPAN&gt;&lt;BR /&gt;Conclusion&lt;/SPAN&gt;&lt;/H1&gt;&lt;P class="lia-align-justify"&gt;&lt;SPAN&gt;Databricks has shifted the center of gravity for enterprise AI from monolithic black-box APIs to an open, modular, and governable lakehouse ecosystem. DBRX proves that sparse MoE architectures can match or surpass dense giants at a fraction of serving cost, while the Mosaic AI stack addresses the lifecycle gaps—evaluation, governance, and orchestration—that stall enterprise roll-outs today. With continued investment in multimodal expertise, automated RAG, and federated governance, Databricks is poised to remain a primary conduit between corporate data estates and next-generation AI applications through the rest of the decade.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 23 Jul 2025 19:11:09 GMT</pubDate>
      <guid>https://community.databricks.com/t5/community-articles/databricks-llm-evolution-and-future-prospects/m-p/126250#M498</guid>
      <dc:creator>ayushbadhera1</dc:creator>
      <dc:date>2025-07-23T19:11:09Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks LLM Evolution and Future Prospects</title>
      <link>https://community.databricks.com/t5/community-articles/databricks-llm-evolution-and-future-prospects/m-p/126431#M504</link>
      <description>&lt;P&gt;Hey&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/175312"&gt;@ayushbadhera1&lt;/a&gt;&amp;nbsp;- Did you miss to mention Databricks Dolly by any chance? &lt;span class="lia-unicode-emoji" title=":winking_face:"&gt;😉&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 25 Jul 2025 09:50:19 GMT</pubDate>
      <guid>https://community.databricks.com/t5/community-articles/databricks-llm-evolution-and-future-prospects/m-p/126431#M504</guid>
      <dc:creator>RiyazAliM</dc:creator>
      <dc:date>2025-07-25T09:50:19Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks LLM Evolution and Future Prospects</title>
      <link>https://community.databricks.com/t5/community-articles/databricks-llm-evolution-and-future-prospects/m-p/126438#M505</link>
      <description>&lt;P class=""&gt;Thanks, &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/15469"&gt;@RiyazAliM&lt;/a&gt;,&amp;nbsp;for checking out the blog post!&lt;BR /&gt;More insights on Databricks LLM and Dolly are on the way in the next one. &lt;span class="lia-unicode-emoji" title=":winking_face:"&gt;😉&lt;/span&gt;&lt;BR /&gt;Stay tuned and keep learning!&lt;BR /&gt;&lt;BR /&gt;Best,&lt;BR /&gt;Ayush&lt;/P&gt;</description>
      <pubDate>Fri, 25 Jul 2025 10:04:40 GMT</pubDate>
      <guid>https://community.databricks.com/t5/community-articles/databricks-llm-evolution-and-future-prospects/m-p/126438#M505</guid>
      <dc:creator>ayushbadhera1</dc:creator>
      <dc:date>2025-07-25T10:04:40Z</dc:date>
    </item>
  </channel>
</rss>

