lara_rachidi
Valued Contributor III

No time to read? Check out this recap video of all the announcements listed below for July 2024 👇

→ Subscribe to our YouTube channel for regular updates

Table of Contents
· Meta Llama 3.1 on Databricks: A New Standard in Open Source AI
· Mosaic AI Model Training available in public preview to all AWS us-east-1 and us-west-2 customers
· Shutterstock ImageAI available in private preview on Foundation Model API pay-per-token
· AI Functions now supports ai_forecast()
· Mosaic AI Model Serving now supports serving multiple external models per model serving endpoint
· Function calling is now available in Public Preview
· Blog Post: How Long Should You Train Your Language Model? (by MosaicAI Research)
· Blog Post: Training MoEs at Scale with PyTorch
· Databricks Assistant is now generally available
· AI-generated comments are now generally available

 

Meta Llama 3.1 on Databricks: A New Standard in Open Source AI

  • Databricks partnered with Meta to release the Llama 3.1 series of models on Databricks, further advancing the standard of powerful open models.
  • The Meta Llama 3.1 family is a collection of pre-trained and instruction-tuned generative models in 8B, 70B, and 405B sizes. At the time of release, Meta Llama 3.1–405B-Instruct was the world’s highest-quality open model, and the quality of existing 8B and 70B models was improved.
  • All models support an expanded context length (128k) and are optimized for inference with support for grouped query attention (GQA).
  • The Llama 3.1 family of models is now available in the system.ai catalog (within Unity Catalog) and can be accessed on Mosaic AI Model Serving using the same unified API and SDK that works with other Foundation Models (see the query sketch after this list).
  • It supports 8 languages and outperforms many of the available open source chat models on common industry benchmarks.
  • It has improved tool use and function calling, allowing the creation of complex multi-step agentic workflows that can automate sophisticated tasks and answer complex queries.
  • Llama 3.1 Instruct Model (Text) is fine-tuned for tool use, enabling it to generate tool calls for search, image generation, code execution, and mathematical reasoning, and also supports zero-shot tool use.
  • The release also includes an upgraded Llama Guard model and safety models, enabling secure and responsible deployment of compound AI systems for enterprise use cases.
  • View the pricing page
  • View this article for more details and benchmarks
  • Note that to continue supporting the most state-of-the-art models in the Foundation Model APIs (FM API) product, Databricks will retire the older Llama 2 and Llama 3 models from the pay-per-token offering in favor of the newer Llama 3.1 family. These models will remain available through Provisioned Throughput, but you will no longer be able to use their custom weights as inputs to another fine-tuning run. This change takes effect on October 30, 2024. If you would like to fine-tune Llama 2 or Llama 3, we recommend instead trying the closest equivalent Llama 3.1 model, as shown below:
  • Llama 2 7B → Llama 3.1 8B
  • Llama 2 13B → Llama 3.1 8B
  • Llama 2 70B → Llama 3.1 70B
  • Llama 2 7B Chat → Llama 3.1 8B Instruct
  • Llama 2 13B Chat → Llama 3.1 8B Instruct
  • Llama 2 70B Chat → Llama 3.1 70B Instruct
  • Llama 3 8B → Llama 3.1 8B
  • Llama 3 70B → Llama 3.1 70B
  • Llama 3 8B Instruct → Llama 3.1 8B Instruct
  • Llama 3 70B Instruct → Llama 3.1 70B Instruct
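
As a minimal sketch of how these models can be queried: Mosaic AI Model Serving exposes an OpenAI-compatible API, so the standard openai client works against a pay-per-token endpoint. The workspace URL is a placeholder and the endpoint name is assumed to follow the usual databricks-meta-llama-3-1-... naming convention.

```python
import os
from openai import OpenAI

# Mosaic AI Model Serving exposes an OpenAI-compatible endpoint.
# Workspace URL and endpoint name below are illustrative assumptions.
client = OpenAI(
    api_key=os.environ["DATABRICKS_TOKEN"],
    base_url="https://<your-workspace>.cloud.databricks.com/serving-endpoints",
)

response = client.chat.completions.create(
    model="databricks-meta-llama-3-1-70b-instruct",  # assumed pay-per-token endpoint name
    messages=[{"role": "user", "content": "Summarize grouped query attention in two sentences."}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```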

Mosaic AI Model Training available in public preview to all AWS us-east-1 and us-west-2 customers

  • Mosaic AI Model Training (formerly Foundation Model Training) is now available in public preview to all customers in these regions. With Mosaic AI Model Training, you use your own data to customize a foundation model, optimizing its performance for your specific application. By fine-tuning or continuing training of a foundation model, you can train your own model using significantly less data, time, and compute than training a model from scratch. You can fine-tune or pretrain a wide range of models, including Llama 3, Mistral, DBRX, and more, with your enterprise data. With this release, we now fully support AWS PrivateLink for workspaces in these regions.
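
For illustration, here is a hedged sketch of launching a fine-tuning run from a notebook. It assumes the databricks_genai Python client is installed and the workspace is enabled for the preview; the base model identifier, Unity Catalog table, and training duration are placeholders to adapt to your environment.

```python
# Hedged sketch: assumes the databricks_genai package and a workspace enabled for
# Mosaic AI Model Training (public preview). Names below are illustrative.
from databricks.model_training import foundation_model as fm

run = fm.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",   # base model to fine-tune (assumed identifier)
    train_data_path="main.finetuning.train_data",  # hypothetical UC table or .jsonl path
    register_to="main.finetuning",                 # catalog.schema where the tuned model is registered
    training_duration="3ep",                       # e.g. three epochs
)
print(run.name)  # monitor the run, then serve the registered model on Mosaic AI Model Serving
```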

Shutterstock ImageAI available in private preview on Foundation Model API pay-per-token

  • Shutterstock ImageAI is a text-to-image model that was announced at DAIS and is now available in Private Preview on Foundation Models API (pay per token), with Provisioned Throughput support coming soon.
  • ImageAI is a proprietary diffusion model for text-to-image generation, trained on the Databricks Mosaic AI platform using Shutterstock's high-quality images. The model was designed with quality and safety in mind: its training set of more than 550 million images comes from a trusted data source, curated from millions of contributors in 150+ countries and reflecting different ethnicities, genders, and orientations. As a result, the model can generate high-quality, photo-realistic images that are commercially safe.
  • Unlike other providers that mix licensed data (sometimes including Shutterstock's) with other public sources, ImageAI is trained on a single dataset that is fully licensed and commercially safe for enterprises to use. Because the training images are free of copyrighted material, logos, famous characters, and celebrities, customers can be confident that their generated content won't face legal issues or copyright-infringement risks.
  • The Shutterstock model is available everywhere that the Foundation Models API (pay per token) is available, namely on AWS and Azure in the US. The model outputs are priced at $0.06 per image.
  • Reach out to your account team to get access!

AI Functions now supports ai_forecast()

  • ai_forecast() is a new Databricks SQL function for analysts and data scientists, designed to extrapolate time series data into the future (a minimal usage sketch follows below).
  • Watch a demo on how to use it.👇
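
As a rough sketch of the call pattern: ai_forecast() is used as a table-valued function in Databricks SQL, and from a Python notebook it can be wrapped in spark.sql(). The table and column names below are hypothetical, and the argument names follow the documented horizon/time_col/value_col style but should be checked against the current docs.

```python
# Hedged sketch: forecast daily revenue with the ai_forecast() table-valued function.
# Table main.demo.sales_daily and its columns ds/revenue are illustrative assumptions.
forecast_df = spark.sql("""
    SELECT *
    FROM AI_FORECAST(
        TABLE(SELECT ds, revenue FROM main.demo.sales_daily),
        horizon   => '2024-12-31',
        time_col  => 'ds',
        value_col => 'revenue'
    )
""")
display(forecast_df)
```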

Mosaic AI Model Serving now supports serving multiple external models per model serving endpoint
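
  • A single Mosaic AI Model Serving endpoint can now front more than one external model, which makes it easier to compare providers or switch between them behind one stable endpoint name.

For illustration, a hedged sketch using the MLflow Deployments client; the endpoint name, external model identifiers, and secret scope/key names are assumptions, and the exact config schema should be checked against the external models documentation.

```python
# Hedged sketch: one serving endpoint configured with two external chat models.
# Secret references and model names are illustrative assumptions.
from mlflow.deployments import get_deploy_client

client = get_deploy_client("databricks")
client.create_endpoint(
    name="multi-external-chat",
    config={
        "served_entities": [
            {
                "name": "gpt-4o",
                "external_model": {
                    "name": "gpt-4o",
                    "provider": "openai",
                    "task": "llm/v1/chat",
                    "openai_config": {"openai_api_key": "{{secrets/my_scope/openai_key}}"},
                },
            },
            {
                "name": "claude-3-5-sonnet",
                "external_model": {
                    "name": "claude-3-5-sonnet-20240620",
                    "provider": "anthropic",
                    "task": "llm/v1/chat",
                    "anthropic_config": {"anthropic_api_key": "{{secrets/my_scope/anthropic_key}}"},
                },
            },
        ]
    },
)
```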

Function calling is now available in Public Preview

  • This functionality is available with the Foundation Model APIs pay-per-token models DBRX Instruct and Meta-Llama-3-70B-Instruct (see the sketch below).
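
A minimal function-calling sketch through the same OpenAI-compatible client, assuming the pay-per-token endpoint databricks-meta-llama-3-70b-instruct; the get_weather tool is hypothetical.

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DATABRICKS_TOKEN"],
    base_url="https://<your-workspace>.cloud.databricks.com/serving-endpoints",
)

# Hypothetical tool definition; the model decides whether to emit a tool call.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="databricks-meta-llama-3-70b-instruct",  # assumed pay-per-token endpoint name
    messages=[{"role": "user", "content": "What's the weather in Paris right now?"}],
    tools=tools,
)
print(response.choices[0].message.tool_calls)  # structured tool call(s) for your code to execute
```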

Blog Post: How Long Should You Train Your Language Model? (by MosaicAI Research)

  • Over the last few years, researchers have developed scaling laws, or empirical formulas for estimating the most efficient way to scale up the pretraining of language models. However, popular scaling laws only factor in training costs, and ignore the often incredibly expensive costs of deploying these models. In this blog post, the MosaicAI Research team suggests a modified scaling law to account for the cost of both training and inference. They experimentally demonstrate how “overtrained” smaller LLMs can be optimal.
  • First, they explain the widely used “Chinchilla” Scaling Law and its limits. With a fixed compute budget, Chinchilla law states that there is a tradeoff between increasing model size vs. increasing training duration. It focuses on training costs, which are determined by model size (parameter count) multiplied by data size (number of tokens). Larger models are more capable than smaller ones, but training on more data also improves model quality.
  • The Chinchilla Law doesn’t take into account the fact that since LLM serving costs are a function of the model size (in addition to user demand), larger models are much more expensive to deploy. Model size is an important cost factor for both training and inference time. The authors therefore suggest a brand new scaling law that quantifies a training-inference trade-off, producing models that are optimal over their total lifetime, instead of just looking at training costs. To validate this, they ran experiments where they spent extra money on training to produce a smaller but equivalently powerful model, validating the hypothesis that they could make up for the extra training costs at inference time.
  • The key takeaway is that the more inference demand you expect from your users, the smaller and longer you should train your models, as you should continue to see quality improvements. The MosaicAI research team suggests modifying scaling laws to account for the computational and real-world costs of both training and inference. As inference demand grows, the additional cost pushes the optimal training setup toward smaller, longer-trained models (see the back-of-the-envelope sketch after this list). A trend has already started toward powerful, smaller models in the 1B to 70B parameter range that are easier and cheaper to fine-tune and deploy.
  • Link to the blog post.
  • Watch the video that discusses this blog post.👇
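
A back-of-the-envelope sketch of the training-inference trade-off, using the standard approximations of about 6·N·D FLOPs for training and 2·N FLOPs per inference token. The model sizes, token counts, and inference demand below are illustrative assumptions, not figures from the blog post.

```python
# Illustrative lifetime-compute comparison (numbers are assumptions, not from the post).
# Approximations: training ~ 6 * N * D FLOPs, inference ~ 2 * N FLOPs per token.

def lifetime_flops(params, train_tokens, inference_tokens):
    return 6 * params * train_tokens + 2 * params * inference_tokens

inference_demand = 2e12  # assume 2 trillion tokens served over the model's lifetime

big   = lifetime_flops(params=70e9, train_tokens=1.4e12, inference_tokens=inference_demand)
small = lifetime_flops(params=13e9, train_tokens=6.0e12, inference_tokens=inference_demand)

print(f"70B model, 1.4T training tokens: {big:.2e} lifetime FLOPs")
print(f"13B model, 6.0T training tokens: {small:.2e} lifetime FLOPs")
# Under heavy inference demand, the smaller, longer-trained ("overtrained") model
# wins on total lifetime compute, provided it reaches comparable quality.
```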

Blog Post: Training MoEs at Scale with PyTorch

  • Over the past year, Mixture of Experts (MoE) models have surged in popularity, fueled by powerful open-source models like DBRX, Mixtral, DeepSeek, and many more. Databricks worked closely with the PyTorch team to scale the training of MoE models. This blog post discusses how they scaled to over three thousand GPUs using PyTorch Distributed and MegaBlocks, an efficient open-source MoE implementation in PyTorch.
  • Compared to dense models, MoEs provide more efficient training for a given compute budget. This is because the gating network sends each token to only a subset of experts, reducing the computational load (see the routing sketch after this list). As a result, the capacity of a model (its total number of parameters) can be increased without proportionally increasing the computational requirements. During inference, only some of the experts are used, so an MoE can perform inference faster than a dense model. However, the entire model needs to be loaded in memory, not just the experts being used.
  • MegaBlocks is an efficient MoE implementation that uses sparse matrix multiplication to compute expert outputs in parallel despite uneven token assignment. MegaBlocks implements a dropless MoE that avoids dropping tokens while using GPU kernels that maintain efficient training. Prior to MegaBlocks, dynamic routing formulations forced a tradeoff between model quality and hardware efficiency. Databricks brought MegaBlocks into its open source training stack, LLM Foundry, to enable scaling MoE training to thousands of GPUs.
  • This blog post shows how the MosaicAI research team implemented efficient MoE training through PyTorch Distributed and MegaBlocks on Foundry. The authors also explain how PyTorch elastic checkpointing allowed them to quickly resume training on a different number of GPUs when node failures occurred, and how PyTorch HSDP allowed them to scale training efficiently and improve checkpoint-resumption times.
  • Link to the blog post.
  • Watch the video that discusses this blog post.👇
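
A toy sketch of the top-k expert routing idea described above; this is an illustration only, not the MegaBlocks or LLM Foundry implementation, and it loops over experts rather than using sparse kernels.

```python
# Toy MoE layer: a gating network routes each token to its top-k experts.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                               # x: (num_tokens, d_model)
        scores = F.softmax(self.router(x), dim=-1)      # routing probabilities per token
        weights, idx = scores.topk(self.top_k, dim=-1)  # each token keeps its top-k experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                   # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

tokens = torch.randn(16, 64)
print(ToyMoE()(tokens).shape)  # torch.Size([16, 64]); only 2 of 8 experts run per token
```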

Databricks Assistant is now generally available

  • The Databricks Assistant is now GA. Adding additional context to the Assistant from stack traces, lineage, popular and favorite tables, dataframe schemas, nearby cells, and relevant documentation has greatly improved Assistant accuracy.
  • We've moved the Assistant beyond the notebook and SQL editor and added it to every page in Databricks. We also added inline chat for code and query refinement, as well as autocomplete (Preview), which offers real-time code suggestions as you type.
  • The Assistant can help you quickly create and iterate on visualizations in the Databricks AI/BI Dashboards editor.

AI-generated comments are now generally available

  • This feature leverages generative AI to provide relevant table descriptions and column comments. Since the feature launched, more than 80% of table metadata updates on Databricks have been AI-assisted. Building descriptive metadata takes time, but it is one of the highest-value tasks you can undertake to improve Assistant accuracy.

AI Agent Framework Availability

Subscribe to the NextGenLakehouse newsletter to receive monthly updates!