Predictive Optimisation is now GA. It uses AI to determine the maintenance operations your Unity Catalog tables require (e.g. based on data access patterns) and automatically runs optimisations on your data layouts to improve query performance. This removes the manual overhead of scheduling optimisation jobs and deciding on their frequency and type: tables are managed automatically.
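Predictive Optimisation can be enabled at the metastore, catalog, or schema level. A sketch of the DDL, assuming a catalog named `main` (the catalog name is a placeholder):

```sql
-- Enable Predictive Optimisation for managed tables in a catalog
ALTER CATALOG main ENABLE PREDICTIVE OPTIMIZATION;

-- Inspect the current setting (including whether it is inherited)
DESCRIBE CATALOG EXTENDED main;
```

Schemas and tables inherit the setting from their parent unless it is explicitly overridden.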
Cost Management Dashboards
This is in Public Preview. Account admins can now import dashboards to monitor costs at either the account level or the workspace level. The dashboard surfaces the metrics below and can be fully customised:
Usage breakdown by SKU name
Usage analysis based on custom tags
Analysis of the most expensive usage
Usage breakdown by billing origin product
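These dashboards sit on top of the billing system tables, so similar breakdowns can also be queried directly. A sketch of a usage-by-SKU query against `system.billing.usage` (verify column names against the documented schema in your workspace):

```sql
-- DBU usage by SKU over the last 30 days
SELECT
  sku_name,
  SUM(usage_quantity) AS total_dbus
FROM system.billing.usage
WHERE usage_date >= DATE_SUB(CURRENT_DATE(), 30)
GROUP BY sku_name
ORDER BY total_dbus DESC;
```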
System Table updates
There are various updates around system tables, Databricks' store of operational data for observability:
Databricks Assistant system tables are in Public Preview: track usage of Databricks Assistant through the system.access.assistant_events table, which records the workspace, timestamp, and email of the user initiating a message in Assistant.
Node timeline system tables are in Public Preview: the node timeline table provides node-level utilisation at minute granularity. Monitor metrics such as node type, CPU and memory utilisation, and network traffic sent in bytes.
Query history system tables are in Public Preview: the system.query.history table records every SQL statement run via SQL warehouses, with metrics such as the SQL statement text, warehouse ID, execution duration, and bytes read.
Billing system tables are enabled by default in all Unity Catalog workspaces. Billing tables give you an overview of usage by SKU, duration, etc.
Workflows system tables are in Public Preview: there are four tables in the system.workflow schema, which allow you to monitor:
jobs: tracks creation, deletion & basic information of all jobs
job_tasks: tracks creation, deletion & basic information of all job tasks
jobs_run_timeline: records the start, end and resulting state of job runs
job_task_run_timeline: records the start, end, and resulting state of job tasks
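These system tables can be queried like any other table. As an illustration, a sketch against system.query.history (column names follow the documented schema, so verify them in your workspace):

```sql
-- Ten slowest SQL statements run on SQL warehouses in the last 7 days
SELECT
  statement_text,
  executed_by,
  total_duration_ms,
  read_bytes
FROM system.query.history
WHERE start_time >= CURRENT_TIMESTAMP() - INTERVAL 7 DAYS
ORDER BY total_duration_ms DESC
LIMIT 10;
```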
Primary Key and Foreign Key constraints are GA and now enable faster queries
Primary keys (PK) and foreign keys (FK) can be defined on Unity Catalog tables for data modelling purposes. You can define them as constraints during table creation or by altering an existing table. Note that primary and foreign key constraints are currently not enforced: they mainly indicate data integrity relationships, and they give end users the ability to view the constraints in Unity Catalog via an Entity Relationship Diagram (ERD).
For valid primary keys, the RELY option enables constraint-based optimisations: Databricks factors the declared primary key's data integrity into query plans to optimise queries.
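A sketch of declaring these informational constraints, with RELY on the primary key (table and column names are illustrative):

```sql
-- Primary key declared with RELY: the optimiser may use it in planning.
-- PK columns must be NOT NULL.
CREATE TABLE customers (
  customer_id BIGINT NOT NULL,
  name        STRING,
  CONSTRAINT customers_pk PRIMARY KEY (customer_id) RELY
);

-- Foreign key added after creation (informational, not enforced)
ALTER TABLE orders
  ADD CONSTRAINT orders_customers_fk
  FOREIGN KEY (customer_id) REFERENCES customers (customer_id);
```

Only use RELY when you know the declared key actually holds: since the constraint is not enforced, the optimiser trusts your declaration.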
One optimisation RELY enables is eliminating unnecessary aggregates based on the primary key constraint. For example, if a DISTINCT operation is run over a table whose primary key is declared with RELY, the redundant DISTINCT is removed, which can speed up the query by 2x.
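For instance, with the illustrative `customers` table above, a primary key declared with RELY makes the DISTINCT below a no-op the planner can remove:

```sql
-- customer_id values are already unique per the RELY primary key,
-- so the DISTINCT adds no work and can be dropped from the plan
SELECT DISTINCT customer_id
FROM customers;
```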
Another optimisation from RELY is removing unnecessary joins. If a query joins to a table that is only referenced in the join condition, the primary key constraint indicates that the join will match at most one row, which helps the query optimiser identify cases where it can eliminate the join from the query entirely. In the blog example, this optimisation sped the query up from 1.5 minutes to 6 seconds!
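The join-elimination case looks like this sketch (illustrative names): `customers` appears only in the join condition, so with its RELY primary key, and a foreign key guaranteeing every order references an existing customer, the join cannot change the result:

```sql
SELECT
  o.order_id,
  o.amount
FROM orders AS o
JOIN customers AS c
  ON o.customer_id = c.customer_id;
-- With customers.customer_id declared PRIMARY KEY ... RELY and the
-- orders-to-customers foreign key in place, each order matches exactly
-- one customer row, so the planner can eliminate the join entirely
```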