Product Platform Updates
Stay informed about the latest updates and enhancements to the Databricks platform. Learn about new features, improvements, and best practices to optimize your data analytics workflow.
Sujitha
Databricks Employee

July 2025 Release Highlights 


🛠️Data Engineering

Moving tables between Lakeflow Declarative Pipelines is GA

Teams can seamlessly reorganize streaming tables and materialized views as needs change—split a pipeline, consolidate workloads, or migrate to improved refresh schedules—without disruption or rebuilding from scratch. This increases agility in managing and scaling data platforms, helping maintain velocity as business requirements evolve. 📖 Documentation

Dynamic Partition overwrite with INSERT REPLACE USING

You can now surgically replace rows in a table based on match conditions, improving performance and simplifying updates out of the box. 📖 Documentation
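A minimal sketch of the new clause, run from a notebook. The table and column names (`sales`, `sales_updates`, `region`, `sale_date`) are illustrative, and the exact shape of the REPLACE USING clause should be confirmed against the INSERT documentation linked above.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # already available as `spark` in Databricks notebooks

# Replace only the rows in `sales` whose (region, sale_date) values match rows
# produced by the SELECT; all other rows are left untouched.
spark.sql("""
  INSERT INTO sales REPLACE USING (region, sale_date)
  SELECT region, sale_date, amount
  FROM sales_updates
""")
```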

Real-time mode is now available in Structured Streaming

You can now use real-time mode, a trigger type for Structured Streaming that enables sub-second-latency data processing. It’s designed for operational workloads that require an immediate response to streaming data (typically Kafka to Kafka). 📖 Documentation
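A hedged Kafka-to-Kafka sketch. The broker address, topic names, and checkpoint path are placeholders, and the trigger keyword below is an assumption for illustration; the exact trigger API that enables real-time mode is in the documentation linked above.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

(spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
    .option("subscribe", "events_in")                    # placeholder source topic
    .load()
    .selectExpr("key", "value")
    .writeStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("topic", "events_out")                       # placeholder sink topic
    .option("checkpointLocation", "/Volumes/main/default/checkpoints/events")
    .trigger(realTime="5 minutes")                       # assumed keyword; see the real-time mode docs
    .start())
```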

🪄Serverless

New features are now available on Serverless Compute:

  • SQL procedure support

  • Set a default collation for SQL Functions

  • Recursive common table expressions (rCTE) support

  • PySpark and Spark Connect now support the DataFrame df.mergeInto API (see the sketch after this list)

  • Support ALL CATALOGS in SHOW SCHEMAS

  • Liquid clustering now compacts deletion vectors more efficiently

  • Allow non-deterministic expressions in UPDATE/INSERT column values for MERGE operations

  • Change Delta MERGE Python APIs to return DataFrame instead of Unit
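A brief sketch of two items from the list above: a recursive CTE and the DataFrame df.mergeInto API. The table names (`numbers_target`, `numbers_updates`) are illustrative, and the merge condition shown is one plausible form; check the DataFrame.mergeInto reference for the exact way to reference source and target columns.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import expr

spark = SparkSession.builder.getOrCreate()  # already available as `spark` in Databricks notebooks

# Recursive CTE: generate the integers 1 through 5.
spark.sql("""
  WITH RECURSIVE counter(n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM counter WHERE n < 5
  )
  SELECT n FROM counter
""").show()

# DataFrame merge: upsert rows from an updates table into a target Delta table by key.
updates = spark.table("numbers_updates").alias("u")
(updates
    .mergeInto("numbers_target", expr("numbers_target.id = u.id"))
    .whenMatched().updateAll()
    .whenNotMatched().insertAll()
    .merge())
```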

Lakebase

Synced tables are now automatically metered and billed

Synced tables are automatically tracked for usage and costs. This brings full cost transparency and control, which is critical for data teams scaling to many pipelines or sharing resources across business units. No more manual tracking when optimizing spend, forecasting budgets, or allocating costs between departments: usage is visible and billable out of the box. All usage is tracked and reported in the system.billing.usage table.
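A hedged example of inspecting that usage from a notebook. The sku_name filter is an assumption for illustration; the usage table reference documents the exact SKU and usage_metadata values that identify synced tables.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Daily synced-table usage by SKU (the filter value is an assumption; confirm in the docs).
spark.sql("""
  SELECT usage_date, sku_name, SUM(usage_quantity) AS usage_quantity
  FROM system.billing.usage
  WHERE sku_name ILIKE '%lakebase%'
  GROUP BY usage_date, sku_name
  ORDER BY usage_date
""").show()
```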

Databricks Apps support for Lakebase resources

Teams can now integrate Lakebase databases directly as resources for Databricks Apps, streamlining the workflow from analytics and operational data to application development. This enables building smarter, data-driven applications faster—without managing separate infrastructure or extra integrations—accelerating time to insight and innovation. 📖Documentation

Run job and For each tasks have a separate limit

Complex workflows that rely on “Run job” or “For each” tasks can now operate in parallel without hitting the overall task concurrency ceiling. This means increased throughput for orchestrating batch workloads or parallel pipelines, less bottlenecking, and more efficient use of compute—especially in large-scale environments where maximizing concurrency is critical to meeting SLAs and deadlines.

🔗Data sharing

Delta Sharing supports sharing tables and schemas secured by ABAC

Delta Sharing providers can add tables and schemas secured by attribute-based access control to a Delta share. The policy does not govern the recipient's access, so recipients have full access to the shared asset. Recipients can apply their own ABAC policies.

The name of the organization is required to enable Delta Sharing on the metastore

When enabling Delta Sharing on your metastore, you must specify an organization name if you share data with Databricks recipients outside your account. 📖 Documentation

Git support updates

  • Databricks Git folders can be used to track and manage changes to alerts. 📖 Documentation

  • You can use the UI to add and manage multiple Git credentials in the workspace from one or more Git providers. 📖 Documentation

Databricks Connector for Power BI supports the ADBC Driver

You can set the Databricks Connector for Power BI to use the Arrow Database Connectivity (ADBC) driver instead of the ODBC driver. 📖 Documentation

🖥️Platform

New compute policy form

Jobs and Pipelines list now includes DBSQL Pipelines

The Jobs & Pipelines list now includes pipelines for materialized views and streaming tables that were created with Databricks SQL.

System tables improvements

The usage_metadata.job_name value in the system.billing.usage table now contains the run names for runs triggered through the one-time run API.

New columns are now available in the query history system table, providing additional query insights (a sample query follows the list):

  • cache_origin_statement_id: For query results fetched from cache, this field contains the statement ID of the query that originally inserted the result into the cache.

  • query_parameters: A struct containing named and positional parameters used in parameterized queries.

  • written_rows: The number of rows of persistent data written to cloud object storage.

  • written_files: The number of files of persistent data written to cloud object storage.
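A sample query over the new columns, assuming access to the query history system table; statement_id and end_time are existing fields of that table.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Most recent statements that wrote persistent data, with the new columns included.
spark.sql("""
  SELECT statement_id,
         cache_origin_statement_id,
         query_parameters,
         written_rows,
         written_files
  FROM system.query.history
  WHERE written_rows > 0
  ORDER BY end_time DESC
  LIMIT 20
""").show(truncate=False)
```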

Disabling the DBFS root and mounts is now available

You can disable access to the DBFS root and mounts in existing Databricks workspaces. 📖 Documentation

 

Notebooks improvements

  • A split view lets you edit notebooks side by side. 📖 Documentation

  • Pressing Cmd + F (Mac) or Ctrl + F (Windows) in a notebook now opens the native Databricks find-and-replace tool. This allows you to quickly search and replace text throughout your entire notebook, including content outside the current viewport. 📖Documentation

Data exploration using an LLM

You can ask natural language questions about the sample data using Catalog Explorer. The Assistant generates the SQL based on metadata context and table usage patterns. After the query is generated, you can validate the query and then run it against the underlying table.

 

 

Restore Python Variables after idle termination in Serverless Notebooks

Databricks snapshots your notebook’s Python variables before terminating idle serverless compute. When you reconnect, your notebook is automatically restored from its snapshot, letting you continue your work seamlessly.

🤖GenAI & ML

Expanding region availability for ai_parse_document

ai_parse_document is now available in the following regions (AWS):
us-west-2, us-east-1, us-east-2, ap-northeast-1, ap-northeast-2, ap-south-1, ap-southeast-1, ap-southeast-2, ca-central-1, eu-central-1, eu-west-1, eu-west-2, sa-east-1
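A minimal sketch of calling the function over documents in a Unity Catalog volume. The volume path is a placeholder; read_files in binaryFile format supplies the `content` column that ai_parse_document consumes.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Parse every document in the volume and return the structured result per file.
spark.sql("""
  SELECT path, ai_parse_document(content) AS parsed
  FROM read_files('/Volumes/main/default/docs/', format => 'binaryFile')
""").show(truncate=False)
```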

Agent Bricks: Multi-Agent Supervisor is in Beta

Agent Bricks: Multi-Agent Supervisor supports designing a multi-agent AI system that combines Genie spaces and Knowledge Assistant agent endpoints to work together on complex tasks that require different specialized skills. 📖 Documentation

Google Gemma 3 12B is available on Mosaic AI Model Serving

Gemma 3 12B supports text inputs on Foundation Model APIs pay-per-token and provisioned throughput endpoints, as well as in AI Functions.
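A hedged example of calling the model with ai_query from SQL. The endpoint name below is an assumption; use the pay-per-token endpoint name shown in your workspace. The `release_notes` table and `note_text` column are also illustrative.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Batch inference over a table column (the endpoint name is an assumption).
spark.sql("""
  SELECT ai_query(
           'databricks-gemma-3-12b',
           'Summarize this note in one sentence: ' || note_text
         ) AS summary
  FROM release_notes
""").show(truncate=False)
```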

📝AIBI Genie

 
  • Caching enhancements speed up question answering when starting a new Genie session.

  • Genie automatically cancels query executions that run longer than 15 minutes.

  • Genie allows editable parameters on date and numeric fields, expanding beyond previous support for string fields.

  • You can edit Genie space descriptions directly in place.

📊AIBI Dashboard

  • Authors can now undo and redo actions on the canvas using keyboard shortcuts or the toolbar.

  • Within the Add data dialog, authors can now filter to show only metric views.

  • Custom calculations now support AGGREGATE OVER to enable window function-like behavior, such as computing moving averages and running totals. 📖Documentation

  • The maximum number of pages per dashboard has increased from ten to fifteen.

  • Support for parameterized schedules has been restored. 📖Documentation

🛡️Governance

Unity Catalog Python User-defined Table functions

You can now register Python UDTFs in Unity Catalog for centralized governance and reusable logic across SQL queries. 📖Documentation
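A sketch of registering and calling a Python UDTF in Unity Catalog, assuming the handler-class pattern from the UDTF documentation (a class whose eval method yields output rows). The catalog and schema (`main.default`) and the function body are illustrative.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Register a governed UDTF that splits a string into one row per word.
spark.sql("""
  CREATE OR REPLACE FUNCTION main.default.split_words(text STRING)
  RETURNS TABLE (word STRING)
  LANGUAGE PYTHON
  HANDLER 'SplitWords'
  AS $$
  class SplitWords:
      def eval(self, text: str):
          for w in (text or "").split():
              yield (w,)
  $$
""")

# Call it from SQL like any other table-valued function.
spark.sql("SELECT * FROM main.default.split_words('hello unity catalog')").show()
```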

Scalar Python UDFs support service credentials

Scalar Python UDFs can use Unity Catalog service credentials to securely access external cloud services. This is useful for integrating operations such as cloud-based tokenization, encryption, or secret management directly into your data transformations.

🔍 Data Warehousing

IDENTIFIER support is now available in Databricks SQL for catalog operations.
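A small sketch of IDENTIFIER in a catalog operation, passing the catalog name as a parameter; the catalog name `main` is illustrative.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Switch catalogs using a parameterized name rather than string interpolation.
spark.sql("USE CATALOG IDENTIFIER(:cat)", args={"cat": "main"})
spark.sql("SHOW SCHEMAS").show()
```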

CAN VIEW permission on SQL Warehouse is GA

The permission allows users to view a SQL warehouse, its query history, and query profiles.

Expanded spatial SQL expressions and GEOMETRY and GEOGRAPHY data types

More than 80 new spatial SQL expressions are available, along with GEOMETRY and GEOGRAPHY data types, enabling advanced spatial analytics and simplifying complex location-based workloads in data warehousing projects. 📖 Documentation
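A small spatial sketch. The function names follow the ST_ naming in the documentation, and the coordinates are illustrative; treat the exact function list and distance units as details to confirm in the docs.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Distance between two points (longitude/latitude pairs are illustrative).
spark.sql("""
  SELECT st_distance(
           st_point(-122.45, 37.77),
           st_point(-122.40, 37.79)
         ) AS dist
""").show()
```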

Support for Schema and Catalog level default collation

You can set a default collation for schemas and catalogs in Databricks Runtime 17.1. This allows you to define a collation that applies to all objects created within the schema or catalog, ensuring consistent collation behavior across your data.
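A short sketch of the DDL, assuming the DEFAULT COLLATION clause described in the documentation; the schema name is illustrative and UTF8_LCASE is one of the built-in collations.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# New objects created in this schema inherit the case-insensitive collation by default.
spark.sql("CREATE SCHEMA IF NOT EXISTS main.ci_text DEFAULT COLLATION UTF8_LCASE")
spark.sql("CREATE TABLE IF NOT EXISTS main.ci_text.names (name STRING)")
```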