Streamline AI Agent Evaluation using synthetic evaluation sets (Public Preview)
- Demo: https://www.youtube.com/watch?v=8Mb91QtLzJ8
- You can evaluate your AI agent by generating a representative evaluation set from your documents. The synthetic generation API is tightly integrated with Agent Evaluation, allowing you to quickly evaluate and improve the quality of your agent’s responses without going through the costly process of human labeling. See Synthesize evaluation sets.
- One effective method for synthesizing evaluation datasets involves using the `generate_evals_df` method from the `databricks-agents` Python package. This method requires a DataFrame with two columns: `content` (the parsed document content as a string) and `doc_uri` (the document's URI). Developers can control the generation process using key parameters: `num_evals` (the total number of evaluations to generate), `agent_description` (the task description of the agent), and `question_guidelines` (guidelines for generating synthetic questions). The `num_evals` parameter intelligently distributes evaluations across documents, balancing the number of questions per page and considering document size. The output includes detailed columns such as `request_id`, `request`, `expected_facts`, and `expected_retrieved_context`, ensuring traceability with the corresponding `doc_uri`. For developers unsure about how many evaluations are needed, the `estimate_synthetic_num_evals` method helps estimate the ideal number of evaluations for the desired coverage.
- By using synthetic evaluation datasets, AI teams can automate and optimize the testing process, saving valuable time and achieving thorough validation of their agents' capabilities.
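A minimal sketch of the flow described above, assuming the documented `generate_evals_df` entry point in `databricks.agents.evals`; the sample document, URI, and guideline strings are placeholders:

```python
# Minimal sketch: generate a synthetic evaluation set from parsed documents.
import pandas as pd
from databricks.agents.evals import generate_evals_df

# One row per parsed document; both column names are required by the API.
docs = pd.DataFrame(
    {
        "content": [
            "Apache Spark is a unified analytics engine for large-scale data processing."
        ],
        "doc_uri": ["https://example.com/docs/spark-overview"],  # placeholder URI
    }
)

evals = generate_evals_df(
    docs,
    num_evals=10,  # total evaluations, distributed across documents
    agent_description="A chatbot that answers questions about Databricks documentation.",
    question_guidelines="Questions should be answerable from a single document.",
)

# Output columns include request_id, request, expected_facts, and
# expected_retrieved_context, each traceable back to its doc_uri.
evals.head()
```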
Python code executor for AI agents (Public Preview)
- You can now quickly give your AI agents the ability to run Python code. Databricks now offers a pre-built Unity Catalog function that an AI agent can use as a tool to expand its capabilities beyond language generation. See Code interpreter AI agent tools.
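As a quick illustration, the pre-built function can be invoked directly from a notebook; the sketch below assumes the function is `system.ai.python_exec`, the name used in the code interpreter tool docs:

```python
# Sanity check of the built-in code-interpreter tool from a Databricks notebook.
# system.ai.python_exec runs the given Python snippet in a sandbox and returns
# its stdout as a string.
result = spark.sql(
    "SELECT system.ai.python_exec('print(sum(range(10)))') AS output"
)
result.show(truncate=False)  # expected output column contains "45\n"
```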
Add budget policies to model serving endpoints (Public Preview)
- Demo: https://www.youtube.com/watch?v=6y9rpReGquM
- Effectively managing cloud computing costs is essential, especially when working with serverless compute resources. Databricks addresses this need with budget policies, which help organizations track and control serverless usage. These policies work by applying tags to any serverless compute activity associated with the policy. These tags are logged in billing records, making it easier to attribute serverless spending to specific budgets and perform granular billing analysis. A notable update is that budget policies are now supported on model serving endpoints, enabling organizations to monitor and control serverless costs tied to machine learning models. See Manage model serving endpoints.
- To get started, workspace admins can create budgets in the Account Console and apply budget policies directly through the Databricks UI. Admins can also manage and view policies they have created or have permissions for. However, to manage all policies across an account, admins must also have the Billing admin role at the account level. Non-admin users can manage budget policies if they are assigned the Budget Policy Manager permissions.
- Applying a budget policy is straightforward. When creating a model serving endpoint, users can select a budget policy from the Budget Policy menu in the Serving UI. If a budget policy is already assigned to a user, it will automatically apply to any new endpoints they create. Existing endpoints, however, need to be manually updated to include a budget policy. Additionally, users with MANAGE permissions can modify budget policies for existing endpoints via the Endpoint Details page.
- It’s important to note that this feature is currently in Public Preview and does not support endpoints serving External Models or Foundation Model APIs with pay-per-token workloads. Moreover, existing endpoints won’t automatically inherit newly assigned budget policies — they must be manually updated.
- By implementing budget policies, organizations can achieve greater visibility and control over their serverless spending, ensuring more effective resource management and cost optimization.
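For teams automating endpoint creation, here is a hedged sketch of attaching a budget policy via the REST API; the `budget_policy_id` field name and payload shape are assumptions inferred from the UI flow described above, so verify them against the serving-endpoints API reference before use:

```python
# Hedged sketch: create a serving endpoint with a budget policy attached.
# The budget_policy_id field is an assumption inferred from the UI flow;
# check the serving-endpoints API reference for the exact schema.
import requests

resp = requests.post(
    "https://<workspace-host>/api/2.0/serving-endpoints",  # placeholder host
    headers={"Authorization": "Bearer <personal-access-token>"},  # placeholder token
    json={
        "name": "my-endpoint",
        "config": {
            "served_entities": [
                {
                    "entity_name": "main.default.my_model",  # hypothetical UC model
                    "entity_version": "1",
                    "workload_size": "Small",
                    "scale_to_zero_enabled": True,
                }
            ]
        },
        "budget_policy_id": "<budget-policy-id>",  # assumed top-level field
    },
)
resp.raise_for_status()
print(resp.json())
```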
Mosaic AI Model Training serverless forecasting (Public Preview)
- Demo: https://www.youtube.com/watch?v=WXF88P7tHCA
- Mosaic AI Model Training — forecasting improves upon the existing AutoML forecasting experience with serverless compute, Unity Catalog support, access to deep learning algorithms, and an upgraded interface.
- The goal is to simplify time-series forecasting by automatically selecting the optimal algorithms and hyperparameters while running on fully managed, scalable compute resources.
- Getting started with serverless forecasting is straightforward. All you need is a training dataset with a time-series column stored as a Unity Catalog table. If your workspace uses a Secure Egress Gateway (SEG), make sure to add `pypi.org` to the allowed domains list to avoid connectivity issues. This setup allows seamless forecasting without worrying about infrastructure management.
- To begin, navigate to the Experiments tab in Databricks and use the sample dataset `dbdemos.dbdemos_iot_turbine.turbine_training_dataset_ml`. Set the forecast frequency (e.g., hourly) and follow these steps:
- Select Training Data: Choose your dataset from accessible Unity Catalog tables.
- Configure Columns:
  - Time Column: Identify the column with timestamps or dates.
  - Forecast Frequency: Define the data's time unit (minutes, hours, days, months).
  - Forecast Horizon: Specify how far into the future to predict.
  - Prediction Target Column: Choose the feature to forecast.
- Optional Settings:
  - Prediction Data Path: Store output forecasts in a Unity Catalog table.
  - Model Registration Location: Define where to save the trained model.
  - Advanced Options: Customize experiment names, identifier columns for multi-series forecasting, evaluation metrics, training frameworks, data splits, weighting, holiday regions, and timeout settings.
Once configured, click Start Training to run the AutoML experiment. You can monitor progress, stop the experiment if needed, and explore results in real time. After training, the forecast results are saved to a Delta table, and the best-performing model is registered in Unity Catalog.
From there, you can:
- View Predictions: Examine the forecasting results table.
- Run Batch Inference: Use the auto-generated notebook for batch predictions.
- Deploy the Model: Easily create a serving endpoint for real-time forecasting.
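For readers who prefer a programmatic path, the classic AutoML forecasting Python API accepts arguments that mirror the UI fields above. This is a hedged sketch using classic `databricks.automl.forecast`, not the serverless UI flow itself; the column names and output location are hypothetical:

```python
# Hedged sketch: classic AutoML forecasting API, whose arguments mirror the
# UI fields above. Column names below are hypothetical; serverless forecasting
# itself is configured through the Experiments UI.
from databricks import automl

df = spark.table("dbdemos.dbdemos_iot_turbine.turbine_training_dataset_ml")

summary = automl.forecast(
    df,
    target_col="energy",        # hypothetical prediction target column
    time_col="timestamp",       # hypothetical time column
    frequency="h",              # forecast frequency: hourly
    horizon=24,                 # predict 24 steps into the future
    output_database="default",  # schema where the best model's predictions are saved
    timeout_minutes=120,
)

print(summary.best_trial.model_path)  # URI of the best-performing trained model
```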
For a detailed comparison between serverless forecasting and classic compute forecasting, check out the official documentation here.
This serverless forecasting feature empowers data teams to quickly build and deploy accurate time-series models without the burden of managing infrastructure, making forecasting more accessible and efficient than ever before.
Platform Updates
- databricks-agents SDK 0.13.0 release: Version 0.13.0 of the `databricks-agents` SDK has been released to PyPI.
- Meta Llama 3.3 is now available for provisioned throughput workloads.
- Meta Llama 3.3 70B Instruct is now available on Model Serving.
- bamboolib is now deprecated.
Blog Posts
Benchmarking Domain Intelligence
- Blog post available here.
Batch Inference on Fine Tuned Llama Models with Mosaic AI Model Serving
- Blog post available here.
Build an Autonomous AI Assistant with Mosaic AI Agent Framework
- Blog post available here.
Aimpoint Digital: AI Agent Systems for Building Travel Itineraries
- Blog post available here.