LakeFlow Connect->GA4 - creation of Liquid Clustered stream table

piotrsofts — Wed, 25 Jun 2025 15:09:30 GMT

Hello

While creating a new data ingestion from GA4, can we set up Liquid Clustering (either manual or automatic) on the destination table that will contain the data fetched from GA4?

Re: LakeFlow Connect->GA4 - creation of Liquid Clustered stream table

mark_ott — Wed, 01 Oct 2025 13:21:27 GMT

Yes, in Databricks it is possible to set up Liquid Clustering, either manual or automatic, on destination tables that store data ingested from Google Analytics 4 (GA4). Liquid clustering replaces traditional partitioning and ZORDER for data layout and can improve query performance when queries filter on the clustering keys.

Manual Liquid Clustering Setup

Liquid clustering can be enabled during table creation using the CLUSTER BY clause in the SQL statement or corresponding DataFrame/DeltaTable APIs in Python/Scala.
Here’s an example for manual clustering:

CREATE TABLE ga4_events (...) CLUSTER BY (some_column);

This makes clustering flexible, allowing the clustering key to be redefined later without rewriting the historical data.
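
As a rough sketch of what redefining the key can look like, the statements below change and then remove the clustering columns on the example table. The column names `event_date` and `event_name` are assumptions for illustration and do not come from this thread. Whether a destination table managed by a Lakeflow Connect pipeline accepts `ALTER TABLE` directly is also not confirmed here.

```sql
-- Hypothetical column names; substitute columns your queries actually filter on.
ALTER TABLE ga4_events CLUSTER BY (event_date, event_name);

-- Turn clustering off again without dropping the table.
ALTER TABLE ga4_events CLUSTER BY NONE;
```

Changing the key affects how data is laid out by later writes and OPTIMIZE runs; the ALTER statement itself does not rewrite existing files.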

Automatic Liquid Clustering

In Databricks Runtime 15.4 LTS and above, automatic liquid clustering is available for Unity Catalog managed tables, a common setup when using Lakeflow or modern ingestion pipelines. It relies on predictive optimization being enabled for the table.
You can enable automatic clustering by using CLUSTER BY AUTO when creating the table.
Automatic liquid clustering intelligently selects the best key(s) to optimize performance, based on actual query usage and data patterns.
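
A minimal sketch of both ways to request automatic key selection follows. The table name `ga4_events_auto` and its three columns are hypothetical and only there to make the statement complete; the real GA4 destination schema is defined by the connector, not by this example.

```sql
-- Hypothetical schema for illustration only.
CREATE TABLE ga4_events_auto (
  event_date DATE,
  event_name STRING,
  user_pseudo_id STRING
) CLUSTER BY AUTO;

-- Switch an existing Unity Catalog managed table to automatic key selection.
ALTER TABLE ga4_events CLUSTER BY AUTO;
```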

Key Considerations

Liquid clustering is not compatible with legacy partitioning or ZORDER, so these should be avoided when ingesting new data.
It is available and recommended for new streaming or batch tables, including those used as GA4 ingestion destinations.
Make sure the compute that writes to the table runs a suitable Databricks Runtime version: Databricks recommends 15.2 or above for tables with liquid clustering, and automatic liquid clustering requires 15.4 LTS or above.
For Unity Catalog–managed tables, which are typical for ingestion use cases, both manual and automatic liquid clustering are fully supported and recommended, ensuring optimal data management moving forward.
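
To check the points above on an existing table, something like the following could be used. It assumes the `ga4_events` table from the earlier example and is a sketch, not a verified procedure for Lakeflow Connect destinations.

```sql
-- The result includes partitionColumns and clusteringColumns,
-- which shows whether the table is partitioned or liquid clustered.
DESCRIBE DETAIL ga4_events;

-- On a liquid clustered table, run OPTIMIZE without a ZORDER BY clause.
OPTIMIZE ga4_events;
```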

In summary, when ingesting data from GA4 into Databricks, Liquid Clustering—manual or automatic—can be enabled on the destination table, providing robust data layout optimization and query acceleration.
