There is indeed a better and more integrated way to make Delta Lake tables accessible in Redshift without manually generating manifests and dynamically creating external tables or partitions. Some important points and options:
-
Databricks Delta Lake supports an automatic manifest generation mode by enabling the table property delta.compatibility.symlinkFormatManifest.enabled. This auto-updates the manifest files on each table update, so you don't have to manually generate them.
-
Redshift Spectrum can query external tables on S3 and supports partitioning, but requires manifest files for Delta Lake tables as a snapshot of the data. Using the symlink format manifest allows consistent snapshot querying.
-
Your current approach of recreating external tables and dynamically adding partitions can be simplified by making use of the automatic manifest generation and defining external tables only once to point to the manifest directory (e.g., <pathToDeltaTable>/_symlink_format_manifest). Then partitions can be automatically handled by Redshift Spectrum if Glue Catalog or other metadata stores are used properly.
-
Unity Catalog itself does not natively integrate with Redshift Spectrum to automatically sync tables or partitions without manual DDL operations, but you can use AWS Glue Catalog as a metadata store which integrates well with Redshift Spectrum to automate partition detection and table schema management.
-
AWS Glue Crawlers can automate the detection of partitions and schema updates for S3 data, including Delta Lake manifests, thus reducing or eliminating the need to manually alter tables in Redshift Spectrum.
Summary:
The best practice is to enable automatic manifest updates on the Delta table, register the Delta table location with Redshift Spectrum external tables once (pointing to the _symlink_format_manifest), and use AWS Glue Catalog with crawlers to automate partition and schema management. This avoids the need to dynamically generate DDL and add partitions manually. Full native integration between Unity Catalog and Redshift Spectrum for this purpose is not available yet.
If desired, you can also explore other approaches like partition projection in Redshift as a metadata-free alternative, but metadata-driven approach with Glue Catalog is more robust.