July 2025 Release Highlights
🛠️Data Engineering
Moving tables between Lakeflow Declarative Pipelines is GA
Teams can seamlessly reorganize streaming tables and materialized views as needs change—split a pipeline, consolidate workloads, or migrate to improved refresh schedules—without disruption or rebuilding from scratch. This increases agility in managing and scaling data platforms, helping maintain velocity as business requirements evolve. 📖 Documentation
Dynamic Partition overwrite with INSERT REPLACE USING
You can now surgically replace rows in a table based on match conditions, improving performance and simplifying updates out of the box. 📖 Documentation
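A minimal sketch of what this can look like, issued from a notebook. The table names (sales, sales_updates) and the key column (sale_date) are hypothetical, and the REPLACE USING column-list form is written here from the feature name, so verify the exact syntax against the linked documentation:

```python
# Sketch: replace only the target rows whose key column matches incoming rows.
# Table names and the key column are placeholders; syntax per the linked docs.
spark.sql("""
    INSERT INTO sales
    REPLACE USING (sale_date)
    SELECT * FROM sales_updates
""")
```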
Real-time mode is now available in Structured Streaming
You can now use real-time mode, a trigger type for Structured Streaming that enables sub-second latency data processing. It’s designed for operational workloads that require immediate response to streaming data (typically Kafka to Kafka). 📖 Documentation
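For context, here is a minimal sketch of the kind of Kafka-to-Kafka Structured Streaming job this mode targets. Broker addresses, topic names, and the checkpoint path are placeholders, and the real-time trigger configuration itself is covered by the linked documentation rather than shown here:

```python
# Sketch of a Kafka-to-Kafka Structured Streaming job (all values are placeholders).
# Enabling real-time mode is done via the trigger described in the linked docs.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
    .option("subscribe", "orders_in")                    # placeholder source topic
    .load()
)

query = (
    events.selectExpr("key", "value")                    # pass records through unchanged
    .writeStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("topic", "orders_out")                       # placeholder sink topic
    .option("checkpointLocation", "/Volumes/main/default/chk/orders")  # placeholder path
    .start()
)
```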
🪄Serverless
New features are now available on Serverless Compute:
SQL procedure support
Set a default collation for SQL Functions
Recursive common table expressions (rCTE) support
PySpark and Spark Connect now support the DataFrame df.mergeInto API (see the sketch after this list)
Support ALL CATALOGS in SHOW SCHEMAS
Liquid clustering now compacts deletion vectors more efficiently
Allow non-deterministic expressions in UPDATE/INSERT column values for MERGE operations
Change Delta MERGE Python APIs to return DataFrame instead of Unit
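As referenced above, a minimal sketch of the df.mergeInto API. The table names, source alias, and join condition qualifiers are illustrative:

```python
from pyspark.sql import functions as F

# Hypothetical tables: merge staged customer updates into a target Delta table.
updates = spark.table("staging.customer_updates").alias("src")

(
    updates.mergeInto("main.sales.customers", F.expr("src.id = customers.id"))
    .whenMatched().updateAll()        # update matching rows with source values
    .whenNotMatched().insertAll()     # insert rows that do not exist in the target
    .merge()                          # execute the merge
)
```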
⚡Lakebase
Synced tables are now automatically metered and billed
Synced tables are automatically tracked for usage and costs. This brings full cost transparency and control—critical for data teams scaling to many pipelines or sharing resources across business units. No more manual tracking when optimizing spend, forecasting budgets, or allocating costs between departments—usage is visible and billable out of the box. All usage is tracked and reported in the system.billing.usage system table.
Databricks Apps support for Lakebase resources
Teams can now integrate Lakebase databases directly as resources for Databricks Apps, streamlining the workflow from analytics and operational data to application development. This enables building smarter, data-driven applications faster—without managing separate infrastructure or extra integrations—accelerating time to insight and innovation. 📖Documentation
Run job and For each tasks have a separate concurrency limit
Complex workflows that rely on “Run job” or “For each” tasks can now operate in parallel without hitting the overall task concurrency ceiling. This means increased throughput for orchestrating batch workloads or parallel pipelines, less bottlenecking, and more efficient use of compute—especially in large-scale environments where maximizing concurrency is critical to meeting SLAs and deadlines.
🔗Data sharing
Delta Sharing supports sharing tables and schemas secured by ABAC
Delta Sharing providers can add tables and schemas secured by attribute-based access control to a Delta share. The policy does not govern the recipient's access, so recipients have full access to the shared asset. Recipients can apply their own ABAC policies.
The name of the organization is required to enable Delta Sharing on the metastore
When enabling Delta Sharing on your metastore, you must specify an organization name if you are sharing data with a Databricks recipient outside your account. 📖Documentation
Git support updates
Databricks Git folders can be used to track and manage changes to alerts. 📖Documentation
You can use the UI to add and manage multiple Git credentials in the workspace from one or multiple Git providers. 📖Documentation
Databricks Connector for Power BI supports the ADBC Driver
You can set the Databricks Connector for Power BI to use the Arrow Database Connectivity (ADBC) driver instead of the ODBC driver. 📖Documentation
🖥️Platform
New compute policy form
Jobs and Pipelines list now includes DBSQL Pipelines
The Jobs & Pipelines list now includes pipelines for materialized views and streaming tables that were created with Databricks SQL.
System tables improvements
The usage_metadata.job_name value in the system.billing.usage table now contains the run names for runs triggered through the one-time run API.
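For example, a quick way to inspect those run names in billed usage (a sketch; the filter and row limit are illustrative):

```python
# Sketch: inspect run names recorded for job usage in system.billing.usage.
display(spark.sql("""
    SELECT usage_date,
           usage_metadata.job_id,
           usage_metadata.job_name,
           usage_quantity
    FROM system.billing.usage
    WHERE usage_metadata.job_id IS NOT NULL
    ORDER BY usage_date DESC
    LIMIT 20
"""))
```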
New columns are now available in the query history system table, providing additional query insights (see the example query after this list):
cache_origin_statement_id: For query results fetched from cache, this field contains the statement ID of the query that originally inserted the result into the cache.
query_parameters: A struct containing named and positional parameters used in parameterized queries.
written_rows: The number of rows of persistent data written to cloud object storage.
written_files: The number of files of persistent data written to cloud object storage.
Disable DBFS root and mounts is now available
You can now disable access to the DBFS root and DBFS mounts in existing Databricks workspaces. 📖Documentation
Notebooks improvements
You can add a split view to edit notebooks side by side. 📖 Documentation.
Pressing Cmd + F (Mac) or Ctrl + F (Windows) in a notebook now opens the native Databricks find-and-replace tool. This allows you to quickly search and replace text throughout your entire notebook, including content outside the current viewport. 📖Documentation
Data exploration using an LLM
You can ask natural language questions about the sample data using Catalog Explorer. The Assistant generates the SQL based on metadata context and table usage patterns. After the query is generated, you can validate the query and then run it against the underlying table.
Restore Python Variables after idle termination in Serverless Notebooks
Databricks snapshots your notebook’s Python variables before terminating idle serverless compute. When you reconnect, your notebook is automatically restored from its snapshot, letting you continue your work seamlessly.
🤖GenAI & ML
Expanding region availability for ai_parse_document
ai_parse_document is now available in the following AWS regions: us-west-2, us-east-1, us-east-2, ap-northeast-1, ap-northeast-2, ap-south-1, ap-southeast-1, ap-southeast-2, ca-central-1, eu-central-1, eu-west-1, eu-west-2, and sa-east-1.
Agent Bricks: Multi-Agent Supervisor is in Beta
Agent Bricks: Multi-Agent Supervisor lets you design a multi-agent AI system that combines Genie spaces and Knowledge Assistant agent endpoints so they work together on complex tasks requiring different specialized skills. 📖 Documentation
Google Gemma 3 12B is available on Mosaic AI Model Serving
Gemma 3 12B supports text inputs on Foundation Model APIs pay-per-token and provisioned throughput, as well as AI Functions.
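A minimal sketch calling the model through the ai_query AI Function. The endpoint name databricks-gemma-3-12b and the source table are assumptions, so check the serving endpoint name in your workspace:

```python
# Sketch: call Gemma 3 12B through the ai_query AI Function.
# The endpoint name and table below are assumptions; use the names in your workspace.
display(spark.sql("""
    SELECT ai_query(
        'databricks-gemma-3-12b',
        'Summarize the key risks in this support ticket: ' || ticket_text
    ) AS summary
    FROM main.support.tickets    -- hypothetical table
    LIMIT 10
"""))
```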
📝AIBI Genie
Caching enhancements speed up question performance when starting a new Genie session.
Genie automatically cancels query executions that take longer than 15 minutes to execute.
Genie allows editable parameters on date and numeric fields, expanding beyond previous support for string fields.
You can edit Genie space descriptions directly in place.
📊AIBI Dashboard
Authors can now undo and redo actions on the canvas using keyboard shortcuts or the toolbar.
Within the Add data dialog, authors can now filter to show only metric views.
Custom calculations now support AGGREGATE OVER to enable window function-like behavior, such as computing moving averages and running totals. 📖Documentation
The maximum number of pages per dashboard has increased from ten to fifteen.
Support for parameterized schedules has been restored. 📖Documentation
🛡️Governance
Unity Catalog Python User-defined Table functions
You can now register Python UDTFs in Unity Catalog for centralized governance and reusable logic across SQL queries. 📖Documentation
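A small sketch of registering a Python UDTF as a governed function. The catalog and schema names are placeholders, and the HANDLER-class DDL form shown here should be verified against the linked documentation:

```python
# Sketch: register a Python UDTF in Unity Catalog (placeholder catalog/schema).
spark.sql("""
    CREATE OR REPLACE FUNCTION main.default.number_range(start INT, stop INT)
    RETURNS TABLE (num INT)
    LANGUAGE PYTHON
    HANDLER 'NumberRange'
    AS $$
    class NumberRange:
        def eval(self, start: int, stop: int):
            for n in range(start, stop + 1):
                yield (n,)
    $$
""")

# Call it from SQL like a table:
display(spark.sql("SELECT * FROM main.default.number_range(1, 5)"))
```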
Scalar Python UDFs support service credentials
Scalar Python UDFs can use Unity Catalog service credentials to securely access external cloud services. This is useful for integrating operations such as cloud-based tokenization, encryption, or secret management directly into your data transformations.
🔍 Data Warehousing
IDENTIFIER support is now available in Databricks SQL for catalog operations.
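For example, a sketch that targets a catalog dynamically via a named parameter (the catalog and schema names are placeholders):

```python
# Sketch: use IDENTIFIER with a named parameter to target a catalog dynamically.
catalog_name = "main"  # placeholder

spark.sql("USE CATALOG IDENTIFIER(:cat)", args={"cat": catalog_name})
spark.sql(
    "CREATE SCHEMA IF NOT EXISTS IDENTIFIER(:cat || '.release_demo')",
    args={"cat": catalog_name},
)
```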
CAN VIEW permission on SQL Warehouse is GA
The permission allows users to view a SQL warehouse, its query history, and its query profiles.
Expanded Spatial SQL expressions and Geometry and Geography Data types
Support for advanced spatial analytics simplifies complex location-based analytics, while schema- and catalog-level collation ensures consistency across all database objects, reducing friction in data warehousing projects. More than 80 new spatial SQL expressions are available, along with the GEOMETRY and GEOGRAPHY data types. 📖Documentation
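A small sketch using a couple of spatial expressions. The specific function names (st_geomfromtext, st_point, st_contains) are assumed to be among the new expressions, and the coordinates are arbitrary, so confirm against the linked documentation:

```python
# Sketch: test whether a point falls inside a polygon (coordinates are arbitrary).
display(spark.sql("""
    SELECT st_contains(
        st_geomfromtext('POLYGON((-123 36, -121 36, -121 39, -123 39, -123 36))'),
        st_point(-122.4194, 37.7749)
    ) AS point_in_polygon
"""))
```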
Support for Schema and Catalog level default collation
You can set a default collation for schemas and catalogs in Databricks Runtime 17.1. This allows you to define a collation that applies to all objects created within the schema or catalog, ensuring consistent collation behavior across your data.
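A minimal sketch, assuming a placeholder schema name and using UTF8_LCASE as the collation:

```python
# Sketch: create a schema whose objects default to case-insensitive string comparisons.
spark.sql("""
    CREATE SCHEMA IF NOT EXISTS main.reports
    DEFAULT COLLATION UTF8_LCASE
""")

# String columns in tables created under main.reports now compare case-insensitively
# unless a column-level collation overrides the schema default.
```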