Hello,
I have a large on-prem SQL database (~15 TB) that heavily uses the sql_variant datatype. I would like to move it into a Databricks bronze layer and keep it synchronized as close to 'live' as possible.
What would be the recommended approach?
It seems like a very basic Databricks scenario, but somehow I couldn't find any example or explanation.
I tried two approaches, and neither worked:
SQL CDC -> ADF Pipeline -> Blob Storage -> Databricks
- it seems unnecessarily complex and fragile
- I couldn't create a Databricks DLT pipeline that would be initialized from a table 'snapshot' and then kept up to date by the CDC exports
Lakeflow Connect
- does not support sql_variant
- changing the SQL schema (to eliminate, replace, or convert sql_variant) is not an option for several reasons (size, performance, downtime)
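For the first approach, the core step any pipeline has to perform is replaying SQL Server CDC change rows on top of the initial snapshot. Below is a minimal plain-Python sketch of that logic. It assumes the exports keep the standard CDC metadata columns (__$start_lsn, __$seqval, __$operation) and that the table has a single-column primary key. The function name apply_cdc and the dict-based row layout are hypothetical, chosen only for illustration.

```python
from typing import Any, Dict, Iterable

# SQL Server CDC __$operation codes:
# 1 = delete, 2 = insert, 3 = update (before image), 4 = update (after image)
DELETE, INSERT, UPDATE_BEFORE, UPDATE_AFTER = 1, 2, 3, 4

Row = Dict[str, Any]


def apply_cdc(snapshot: Dict[Any, Row], changes: Iterable[Row], key: str) -> Dict[Any, Row]:
    """Replay CDC change rows onto a snapshot and return the new table state.

    Hypothetical layout: each change row is a dict holding the CDC metadata
    columns plus the table's own columns. The input snapshot is not mutated.
    """
    state = {k: dict(v) for k, v in snapshot.items()}
    # Commit order: LSN, then sequence within the transaction; the operation
    # code breaks ties so a before image (3) is applied before its after image (4).
    ordered = sorted(
        changes,
        key=lambda r: (r["__$start_lsn"], r["__$seqval"], r["__$operation"]),
    )
    for change in ordered:
        op = change["__$operation"]
        # Strip the CDC metadata columns, keeping only the table's data.
        row = {c: v for c, v in change.items() if not c.startswith("__$")}
        if op in (DELETE, UPDATE_BEFORE):
            # Removing the old image also handles updates that change the key.
            state.pop(row[key], None)
        elif op in (INSERT, UPDATE_AFTER):
            state[row[key]] = row
    return state
```

In a Databricks pipeline the same idea would typically map to a Delta MERGE or DLT's APPLY CHANGES, keyed on the primary key and sequenced by the LSN columns, with __$operation = 1 treated as a delete and before-image rows filtered out.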