I built a script about 6 months ago to make our Delta Tables accessible in Redshift for another team, but it's a bit nasty...
- Generate a Delta Lake symlink manifest each time the Databricks Delta table is updated (sketch 1 after this list)
- Recreate the Redshift external table (in case the schema has changed or the table has been recreated with a different location) by dynamically generating the DDL pointing to '<pathToDeltaTable>/_symlink_format_manifest/' (sketch 2 below). [code 1]
- If the Delta table is partitioned, generate a list of the Delta partitions and register each partition in Redshift (sketch 3 below): ALTER TABLE <tblname> ADD PARTITION IF NOT EXISTS (<partitioncol>='<colvalue>') LOCATION '<pathToDeltaTable>/_symlink_format_manifest/<partitioncol>=<colvalue>'
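
For context, here's roughly what each step boils down to. These are simplified sketches with placeholder paths and names, not the actual script. Step 1 just runs the Delta GENERATE command after each write (or sets the table property so the manifest regenerates automatically):

```python
# Sketch of step 1 (assumes a Databricks/PySpark session; the path is a placeholder).
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
delta_table_path = "s3://my-bucket/path/to/delta_table"  # placeholder

# One-off manifest generation after each write to the Delta table:
spark.sql(f"GENERATE symlink_format_manifest FOR TABLE delta.`{delta_table_path}`")

# Or let Delta keep the manifest up to date automatically on every write:
spark.sql(f"""
    ALTER TABLE delta.`{delta_table_path}`
    SET TBLPROPERTIES (delta.compatibility.symlinkFormatManifest.enabled = true)
""")
```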
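
Step 2 builds a CREATE EXTERNAL TABLE statement pointing at the manifest directory. The schema/table/column names below are placeholders, and the SERDE/INPUTFORMAT lines are just the generic symlink-manifest pattern rather than exactly what [code 1] emits:

```python
# Sketch of the DDL that step 2 generates (all names and types are placeholders).
def build_external_table_ddl(schema, table, columns, partition_cols, delta_table_path):
    """columns / partition_cols are lists of (name, redshift_type) tuples."""
    col_defs = ",\n    ".join(f"{name} {dtype}" for name, dtype in columns)
    ddl = f"CREATE EXTERNAL TABLE {schema}.{table} (\n    {col_defs}\n)\n"
    if partition_cols:
        part_defs = ", ".join(f"{name} {dtype}" for name, dtype in partition_cols)
        ddl += f"PARTITIONED BY ({part_defs})\n"
    ddl += (
        "ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'\n"
        "STORED AS\n"
        "  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'\n"
        "  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'\n"
        f"LOCATION '{delta_table_path}/_symlink_format_manifest/'"
    )
    return ddl
```

In the real script the "recreate" is basically a DROP TABLE IF EXISTS on the external table followed by this CREATE, executed over whatever Redshift connection is handy.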
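
Step 3 is basically a loop over the partition values. The sketch below assumes a single partition column and reads the distinct values straight from the Delta table:

```python
# Sketch of step 3: register each Delta partition in Redshift.
# Assumes one partition column; table names and paths are placeholders.
def build_partition_ddl(spark, delta_table_path, redshift_table, partition_col):
    rows = (
        spark.read.format("delta").load(delta_table_path)
        .select(partition_col).distinct().collect()
    )
    statements = []
    for row in rows:
        value = row[partition_col]
        statements.append(
            f"ALTER TABLE {redshift_table} "
            f"ADD PARTITION IF NOT EXISTS ({partition_col}='{value}') "
            f"LOCATION '{delta_table_path}/_symlink_format_manifest/{partition_col}={value}'"
        )
    return statements
```

Each of those statements then gets run against Redshift the same way as the DDL above.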
It works fine, but is there a better way of doing this? Something like an integration between Unity Catalog and Redshift Spectrum that wouldn't require dynamically generating DDL and registering partitions like this?