Data Engineering
Migrating Lakeflow Declarative Pipelines from the legacy publishing mode is GA
Lakeflow Declarative Pipelines has a legacy publishing mode that only allows publishing to a single catalog and schema. The default publishing mode enables publishing to multiple catalogs and schemas. 📖 Documentation
Selectively and atomically replace data with INSERT REPLACE USING and INSERT REPLACE ON is GA
INSERT REPLACE USING replaces rows when the USING columns compare equal under equality. INSERT REPLACE ON replaces rows when they match a user-defined condition.
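A sketch of both forms, with illustrative table and column names (check the documentation for the exact syntax):

```sql
-- Replace target rows whose (region, sale_date) values match an incoming row
INSERT INTO sales REPLACE USING (region, sale_date)
SELECT * FROM sales_updates;

-- Replace target rows that satisfy a user-defined match condition
INSERT INTO sales REPLACE ON sales.region = sales_updates.region
SELECT * FROM sales_updates;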
Microsoft SQL Server and ServiceNow connectors are GA
Salesforce Data Cloud File Sharing connector is GA
New table property for Delta Lake compression
You can explicitly set the compression codec for a Delta table using the delta.parquet.compression.codec table property. 📖 Documentation
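For example, to switch a table's Parquet compression to ZSTD (the table name is illustrative):

```sql
ALTER TABLE main.default.events
SET TBLPROPERTIES ('delta.parquet.compression.codec' = 'zstd');
```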
Create external Delta tables from third-party clients
You can now create Unity Catalog external tables backed by Delta Lake from external clients and systems, such as Apache Spark. 📖 Documentation
Lakeflow Declarative Pipeline improvements
You can change the identity that a pipeline uses to run updates and the owner of tables published by the pipeline. This feature allows you to set a service principal as the run-as identity, which is safer and more reliable than using user accounts for automated workloads. 📖 Documentation
You can now easily create an ETL pipeline in a bundle in the workspace using the new Lakeflow Declarative Pipelines template project.
You can use automatic liquid clustering with CLUSTER BY AUTO, and Databricks intelligently chooses clustering keys to optimize query performance.
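For example (table names are illustrative):

```sql
-- New table with automatically selected clustering keys
CREATE TABLE main.default.sales CLUSTER BY AUTO
AS SELECT * FROM main.default.raw_sales;

-- Enable automatic liquid clustering on an existing table
ALTER TABLE main.default.sales CLUSTER BY AUTO;
```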
⚡Lakebase
OLTP Database tab renamed to Lakebase Postgres
Lakebase synced tables support syncing Apache Iceberg and foreign tables
You can create synced tables in Snapshot sync mode from Iceberg tables or foreign tables. 📖 Documentation
Data type mapping
For new synced tables: TIMESTAMP types in source tables are mapped to TIMESTAMP WITH TIME ZONE in synced tables.
Budget policy is supported by Lakebase
You can tag a database instance and a synced table with a budget policy to attribute billing usage to specific policies. Additionally, custom tags can be added to a database instance for more granular attribution of compute usage to teams, projects, or cost centers.
Lakebase is enabled by default
The Lakebase: Managed Postgres OLTP Database preview is now enabled by default.
🪄Serverless
Serverless compute for notebooks, workflows, and Lakeflow Declarative Pipelines is available in the Asia Pacific (Jakarta) region (ap-southeast-3).
Base environments are custom environment specifications for serverless notebooks that define a serverless environment version and a set of dependencies. 📖 Documentation
Serverless GPU compute supports hyperparameter sweeps, multi-node workloads, and scheduled jobs. 📖 Documentation
New features are available on Serverless Compute:
Databricks Connect upgraded to 17.0
Scalar Python UDFs support service credentials
PySpark and Spark Connect support the df.mergeInto API
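A minimal sketch of the DataFrame merge API (the table names, merge condition, and surrounding Spark session are assumptions; see the documentation for exact semantics):

```python
# Sketch only: assumes an active Spark session (Databricks Connect 17.0+)
# and existing tables main.sales.target / main.sales.updates.
from pyspark.sql.functions import expr

updates = spark.read.table("main.sales.updates")

(updates
    .mergeInto("main.sales.target", expr("target.id = source.id"))
    .whenMatched().updateAll()       # update rows that match the condition
    .whenNotMatched().insertAll()    # insert rows with no match in the target
    .merge())                        # execute the merge
```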
🖥️Platform
Notebooks Improvements
Edit Mode in Assistant does multi-cell code refactoring and more. 📖Documentation.
Use the cell execution minimap to track your notebook’s progress at a glance. The minimap appears in the right margin and shows each cell’s execution state. Hover to see cell details, or click to jump directly to a cell.
Notebook autocomplete supports enhanced suggestions for complex data types including structs, maps, and arrays in SQL cells.
Lakeflow job improvements
Jobs that are set to run in continuous mode have the option to retry individual tasks on task failure. 📖Documentation
Power BI Databricks connector supports M2M OAuth
You can authenticate into Power BI Desktop using M2M OAuth. Databricks recommends switching to the new client credentials authentication option. 📖Documentation
Account SCIM 2.0 updates
Databricks has updated the Account SCIM API for identity management as follows:
Calling GET with filter params filter=displayName eq value_without_quotes results in a syntax error. To prevent this error, wrap the value in quotation marks.
Calling GET /api/2.0/accounts/{account_id}/scim/v2/Groups no longer returns members. Instead, call the get group details endpoint for each group to get membership information.
Calling PATCH /api/2.0/accounts/{account_id}/scim/v2/Groups/{id} returns a 204 response instead of a 200 response.
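For the filter change, a minimal sketch of building the quoted, URL-encoded query string in Python (the group name is illustrative):

```python
from urllib.parse import urlencode

# The comparison value in the SCIM filter must be wrapped in double quotes.
display_name = "Data Engineers"
query = urlencode({"filter": f'displayName eq "{display_name}"'})

# Append to GET /api/2.0/accounts/{account_id}/scim/v2/Groups
print(query)  # filter=displayName+eq+%22Data+Engineers%22
```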
OAuth token federation is GA
Databricks assistant improvements
You can chat with Databricks Assistant on some compute pages. Use the Assistant chat panel to help you create new compute resources, pools, and policies.
You can tailor how Databricks Assistant responds by adding custom user instructions. Guide the Assistant with preferences, coding conventions, and response guidelines.
Disable legacy features for new workspaces
A new account console setting allows account admins to disable certain legacy features on new workspaces created in their account. 📖Documentation
Serverless Workspaces are available
🤖GenAI & ML
AI Playground is GA
OpenAI GPT OSS models are available on Mosaic AI Model Serving
Mosaic AI Model Serving supports OpenAI's GPT OSS 120B and GPT OSS 20B as Databricks-hosted foundation models. 📖Documentation
Batch inference with GPT OSS models
OpenAI GPT OSS 120B and GPT OSS 20B are optimized for AI Functions, which means you can perform batch inference using these models and AI Functions like ai_query()
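For example, a batch summarization pass over a table (the endpoint name and table schema are assumptions):

```sql
SELECT
  review_id,
  ai_query(
    'databricks-gpt-oss-20b',
    CONCAT('Summarize this review in one sentence: ', review_text)
  ) AS summary
FROM main.reviews.raw_reviews;
```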
External MCP servers are in Beta
You can connect Databricks to external MCP servers. 📖 Documentation
Mosaic AI Vector Search reranker is available
Mosaic AI Vector Search offers reranking to help improve retrieval quality. 📖 Documentation
Token-based rate limits are available on AI Gateway
You can configure token-based rate limits on your model serving endpoints. 📖 Documentation
OpenAI GPT OSS model releases
The Databricks-hosted foundation models OpenAI GPT OSS 120B and GPT OSS 20B support function and tool calling and provisioned throughput. 📖 Documentation
📝AIBI Genie
SQL expression validation for join relationships
You can define join relationships locally within a Genie space's knowledge store. This is useful when authors lack permissions to define primary and foreign keys on upstream tables, or when the join relationship has specific requirements, such as one-to-many or complex joins.
File uploads are accessible only to the users who uploaded them
Value dictionaries select the most frequent 1,024 values from the first 100,000 rows, instead of the first 1,024 values encountered.
📊AIBI Dashboard
Dashboard external embedding is in Public Preview, and new custom calculation functions are available
You can define up to 200 custom calculations per dashboard.
🛡️Governance
Governed tags are in Public Preview
You can create governed tags to enforce consistent tagging across data assets such as catalogs, schemas, and tables. Admins define the allowed keys and values and control which users and groups can assign them to objects. This helps standardize metadata for data classification, cost tracking, access control, and automation. 📖 Documentation
Single-node compute on standard access mode is GA
This configuration allows multiple users to share a single-node compute resource with full user isolation. Single-node compute is useful for small jobs or non-distributed workloads.
Column masks retained when replacing a table
If a column in the new table matches a column name from the original table, its existing column mask is retained, even if no mask is specified. This change prevents accidental removal of column-level security policies during table replacement.
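A sketch of the behavior (the mask function and table are illustrative):

```sql
-- Existing table with a mask already applied to ssn:
--   ALTER TABLE users ALTER COLUMN ssn SET MASK mask_ssn;

-- Replacing the table keeps the mask, because the column name matches
CREATE OR REPLACE TABLE users (
  id  BIGINT,
  ssn STRING  -- mask_ssn is retained even though no mask is specified here
);
```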
Access requests in Unity Catalog
You can enable self-service access requests in Unity Catalog by configuring access request destinations on securable objects.
Users can request access to Unity Catalog objects that they discover. These requests are sent to configured destinations, such as email, Slack, or Microsoft Teams channels, or they can be redirected to an internal access management system.
Path credential vending
You can use path credential vending to grant short-lived credentials for external locations in your Unity Catalog metastore. 📖 Documentation
🔍 Data Warehousing
Default warehouse setting is available in Beta
Support for timestamp without time zone syntax
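For example, the ANSI spelling can be used in DDL (assuming it maps to the TIMESTAMP_NTZ type; table name is illustrative):

```sql
CREATE TABLE main.default.events_local (
  id BIGINT,
  created_at TIMESTAMP WITHOUT TIME ZONE  -- equivalent to TIMESTAMP_NTZ
);
```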
Support for schema and catalog level default collation
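For example, a case-insensitive default collation at the schema level (a sketch; the schema name is illustrative and the exact syntax is per the documentation):

```sql
-- String columns created in this schema inherit UTF8_LCASE unless overridden
CREATE SCHEMA main.reporting DEFAULT COLLATION UTF8_LCASE;
```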
Expanded spatial SQL expressions and GEOMETRY and GEOGRAPHY data types