Databricks Community

piotrsofts · ‎06-25-2025

Hello

When creating a new data ingestion from GA4, can we set up Liquid Clustering (either manual or automatic) on the destination table that will hold the data fetched from GA4?

mark_ott · ‎10-01-2025

Yes. In Databricks you can set up liquid clustering, either manual or automatic, on destination tables that store data ingested from Google Analytics 4 (GA4). Liquid clustering replaces traditional partitioning and ZORDER, which simplifies data layout decisions and generally improves query performance.

Manual Liquid Clustering Setup

Liquid clustering can be enabled during table creation using the CLUSTER BY clause in the SQL statement or corresponding DataFrame/DeltaTable APIs in Python/Scala.
Here’s an example for manual clustering:

CREATE TABLE ga4_events (...) CLUSTER BY (some_column);

This makes clustering flexible, allowing the clustering key to be redefined later without rewriting the historical data.
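
As a slightly fuller sketch of the statement above, the following creates a clustered table and later changes its key. The column names and types are illustrative assumptions, not the actual schema that the GA4 ingestion produces, and `ga4_events_demo` is a hypothetical table name.

```sql
-- Hypothetical table and columns, for illustration only.
CREATE TABLE ga4_events_demo (
  event_date     DATE,
  event_name     STRING,
  user_pseudo_id STRING
)
CLUSTER BY (event_date);

-- Redefine the clustering keys later; this statement does not rewrite existing files.
ALTER TABLE ga4_events_demo CLUSTER BY (event_date, event_name);
```

Subsequent writes and OPTIMIZE runs should use the new keys, while data already clustered under the old key stays as it is unless you force a full recluster.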

Automatic Liquid Clustering

In Databricks Runtime 15.4 LTS and above, automatic liquid clustering is generally available for Unity Catalog managed tables, a common setup when using Lakeflow or modern ingestion pipelines.
You can enable automatic clustering by using CLUSTER BY AUTO when creating the table.
Automatic liquid clustering chooses the clustering key(s) for you, based on actual query usage and data patterns.
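
A minimal sketch of the automatic variant, assuming a Unity Catalog managed table and, as far as I know from the documentation, predictive optimization enabled for it. The table and column names are hypothetical.

```sql
-- Let Databricks choose the clustering keys at creation time.
CREATE OR REPLACE TABLE ga4_events_auto (
  event_date DATE,
  event_name STRING
)
CLUSTER BY AUTO;

-- Or switch an existing table to automatic key selection.
ALTER TABLE ga4_events_auto CLUSTER BY AUTO;
```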

Key Considerations

Liquid clustering is not compatible with legacy partitioning or ZORDER, so do not combine either of them with it on the destination table.
It is available and recommended for new streaming or batch tables, including those used as GA4 ingestion destinations.
Make sure the compute you use runs a supported Databricks Runtime version (15.2+ for manual, 15.4 LTS+ for automatic liquid clustering).
For Unity Catalog managed tables, which are typical for ingestion use cases, both manual and automatic liquid clustering are supported and recommended.
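
To check these points on an existing table, something like the following should work. It reuses the `ga4_events` example table name from the manual setup section.

```sql
-- Shows table metadata; the clusteringColumns and partitionColumns fields
-- list the clustering keys and any partition columns.
DESCRIBE DETAIL ga4_events;

-- Trigger clustering of recently written data. No ZORDER BY clause is used
-- on a liquid-clustered table.
OPTIMIZE ga4_events;
```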

In summary, when ingesting data from GA4 into Databricks, you can enable liquid clustering, manual or automatic, on the destination table to optimize data layout and speed up queries.


LakeFlow Connect->GA4 - creation of Liquid Clustered stream table
