Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Querying Unity Managed Tables from Redshift

stevenayers-bge
Contributor

I built a script about 6 months ago to make our Delta tables accessible in Redshift for another team, but it's a bit nasty...

  1. Generate a Delta Lake symlink manifest each time the Databricks Delta table is updated.
  2. Recreate the Redshift external table (in case the schema has changed or the table was recreated with a different location) by dynamically generating the DDL pointing to '<pathToDeltaTable>/_symlink_format_manifest/'. [code 1]
  3. If the Delta table is partitioned, generate a list of the Delta partitions and register each partition in Redshift: ALTER TABLE <tblname> ADD PARTITION IF NOT EXISTS (<partitioncol>='<colvalue>') LOCATION '<pathToDeltaTable>/_symlink_format_manifest/<partitioncol>=<colvalue>'
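For reference, steps 2 and 3 above can be sketched as a couple of small helpers that build the DDL strings. All names (schema, table, columns, S3 path, partition values) are illustrative, and the SerDe/input-format clauses follow the common Hive symlink-manifest pattern for Parquet-backed Delta tables, so verify them against your environment:

```python
# Sketch of the dynamic DDL generation described in steps 2-3.
# Schema/table/path names are hypothetical examples.

def external_table_ddl(schema: str, table: str, columns: dict, delta_path: str) -> str:
    """Build a CREATE EXTERNAL TABLE statement pointing at the symlink manifest dir."""
    cols = ",\n  ".join(f"{name} {dtype}" for name, dtype in columns.items())
    return (
        f"CREATE EXTERNAL TABLE {schema}.{table} (\n  {cols}\n)\n"
        "ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'\n"
        "STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'\n"
        "OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'\n"
        f"LOCATION '{delta_path}/_symlink_format_manifest/'"
    )

def add_partition_ddl(schema: str, table: str, part_col: str,
                      part_val: str, delta_path: str) -> str:
    """Build the ALTER TABLE ... ADD PARTITION statement for one Delta partition."""
    return (
        f"ALTER TABLE {schema}.{table} ADD PARTITION IF NOT EXISTS "
        f"({part_col}='{part_val}') "
        f"LOCATION '{delta_path}/_symlink_format_manifest/{part_col}={part_val}'"
    )
```

Each generated statement would then be executed against Redshift after every table update, which is exactly the moving part the question is trying to eliminate.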

 

It's fine and it works, but is there a better way of doing this? Like an integration between Unity Catalog and Redshift Spectrum that wouldn't require dynamically generating DDL and adding partitions like this?

2 REPLIES

aashish122
New Contributor III

Still searching for a solution to the same pain point... maybe there will be something in the Marketplace to integrate Unity Catalog and Redshift.

mark_ott
Databricks Employee

There is indeed a better and more integrated way to make Delta Lake tables accessible in Redshift without manually generating manifests and dynamically creating external tables or partitions. Some important points and options:

  • Databricks Delta Lake supports an automatic manifest generation mode by enabling the table property delta.compatibility.symlinkFormatManifest.enabled. This auto-updates the manifest files on each table update, so you don't have to manually generate them.
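Concretely, that property is set once on the table, after which Databricks rewrites the manifest on every write. A minimal sketch, assuming a hypothetical table name and a Databricks notebook context where the string is passed to spark.sql():

```python
# One-time setup: enable automatic symlink-manifest regeneration on a Delta table.
# Table name is illustrative; run the statements via spark.sql() in Databricks.

enable_auto_manifest = """
ALTER TABLE my_catalog.my_schema.my_delta_table
SET TBLPROPERTIES (delta.compatibility.symlinkFormatManifest.enabled = true)
"""

# The first manifest still has to be generated once, e.g.:
initial_manifest = "GENERATE symlink_format_manifest FOR TABLE my_catalog.my_schema.my_delta_table"

# spark.sql(enable_auto_manifest)   # uncomment inside a Databricks notebook
# spark.sql(initial_manifest)
```

After this, step 1 of the original script becomes unnecessary: every write to the table refreshes `_symlink_format_manifest/` automatically.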

  • Redshift Spectrum can query external tables on S3 and supports partitioning, but for Delta Lake tables it requires manifest files that act as a snapshot of the data. Using the symlink format manifest allows consistent snapshot querying.

  • Your current approach of recreating external tables and dynamically adding partitions can be simplified by relying on automatic manifest generation and defining the external table only once, pointing at the manifest directory (e.g., <pathToDeltaTable>/_symlink_format_manifest). Partitions can then be handled automatically by Redshift Spectrum if the Glue Catalog or another metadata store is used properly.

  • Unity Catalog itself does not natively integrate with Redshift Spectrum to automatically sync tables or partitions without manual DDL operations, but you can use the AWS Glue Catalog as a metadata store, which integrates well with Redshift Spectrum to automate partition detection and table schema management.

  • AWS Glue Crawlers can automate the detection of partitions and schema updates for S3 data, including Delta Lake manifests, thus reducing or eliminating the need to manually alter tables in Redshift Spectrum.
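The crawler-based setup above could be scripted with boto3 roughly as follows. The crawler name, IAM role ARN, database, and S3 path are all assumptions, and the actual AWS calls are commented out so the sketch stays runnable offline:

```python
# Sketch: register a Glue crawler over the Delta symlink-manifest directory so
# Spectrum picks up partitions/schema changes without manual ALTER TABLE calls.
# All identifiers below (crawler name, role ARN, database, S3 path) are hypothetical.

crawler_config = {
    "Name": "delta-manifest-crawler",
    "Role": "arn:aws:iam::123456789012:role/GlueCrawlerRole",  # assumed IAM role
    "DatabaseName": "spectrum_db",
    "Targets": {
        "S3Targets": [{"Path": "s3://my-bucket/events/_symlink_format_manifest/"}]
    },
    "SchemaChangePolicy": {
        "UpdateBehavior": "UPDATE_IN_DATABASE",  # pick up new columns/partitions
        "DeleteBehavior": "LOG",
    },
}

def run_crawler(config: dict) -> None:
    """Create (or re-run) the crawler; requires AWS credentials and boto3."""
    import boto3  # deferred import so the sketch loads without AWS installed
    glue = boto3.client("glue")
    glue.create_crawler(**config)          # use update_crawler if it already exists
    glue.start_crawler(Name=config["Name"])

# run_crawler(crawler_config)  # uncomment with valid AWS credentials/permissions
```

The crawler can then run on a schedule or be triggered after each Databricks write, replacing the per-partition ALTER TABLE loop.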

Summary:

The best practice is to enable automatic manifest updates on the Delta table, register the Delta table location with Redshift Spectrum external tables once (pointing to the _symlink_format_manifest), and use AWS Glue Catalog with crawlers to automate partition and schema management. This avoids the need to dynamically generate DDL and add partitions manually. Full native integration between Unity Catalog and Redshift Spectrum for this purpose is not available yet.

If desired, you can also explore other approaches such as partition projection as a metadata-free alternative, but a metadata-driven approach with the Glue Catalog is more robust.
