My task is to sync 300 tables from an on-prem SQL Server into Delta Lake.
I load CDC data from the raw layer. The first step is to move the CDC data into bronze with Auto Loader. Then, streaming from the bronze Delta table, I pick up the changes, apply simple datatype conversions, and merge the dataset into silver.
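Roughly, per table, the flow looks like this (just a sketch; the paths, the CDC file format, the `ModifiedDate` column, and the `Id` key are placeholders, not my real config):

```python
# Runs in a Databricks notebook, where `spark` is the notebook's SparkSession.
from pyspark.sql import functions as F
from delta.tables import DeltaTable

raw_path = "/mnt/raw/cdc/dbo/Customer"       # hypothetical landing folder
bronze_path = "/mnt/bronze/dbo/Customer"
silver_table = "silver.customer"

# 1) Raw -> bronze with Auto Loader
(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "parquet")  # assuming CDC lands as parquet files
    .option("cloudFiles.schemaLocation", bronze_path + "/_schema")
    .load(raw_path)
    .writeStream
    .format("delta")
    .option("checkpointLocation", bronze_path + "/_checkpoint")
    .start(bronze_path))

# 2) Bronze -> silver: stream the bronze Delta table, cast a couple of columns,
#    and MERGE each micro-batch into the silver table
def upsert_to_silver(batch_df, batch_id):
    # simple datatype change, e.g. string -> timestamp (column name is made up)
    batch_df = batch_df.withColumn("ModifiedDate", F.col("ModifiedDate").cast("timestamp"))
    (DeltaTable.forName(spark, silver_table).alias("t")
        .merge(batch_df.alias("s"), "t.Id = s.Id")  # hypothetical key column
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

(spark.readStream
    .format("delta")
    .load(bronze_path)
    .writeStream
    .foreachBatch(upsert_to_silver)
    .option("checkpointLocation", "/mnt/silver/dbo/Customer/_checkpoint")
    .start())
```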
Previously I had only 20 tables, so I made 20 notebooks. But now, with 300 tables, that feels like far too many notebooks.
I was thinking of making one notebook that loops through all the tables and skips the datatype conversions, along the lines of the sketch below. That approach would not run in parallel in a Databricks job, right?
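Something like this, driven by a small metadata list (the table names and key columns here are made up for illustration, and `sync_table` stands in for the bronze-to-silver logic above):

```python
tables = [
    {"schema": "dbo", "name": "Customer", "key": "Id"},
    {"schema": "dbo", "name": "Order",    "key": "OrderId"},
    # ... ~300 entries, ideally read from a config table or JSON file
]

def sync_table(meta):
    # placeholder for the parameterized bronze -> silver logic
    print(f"syncing {meta['schema']}.{meta['name']} on key {meta['key']}")

# Plain driver-side loop: the tables are processed one after another.
for meta in tables:
    sync_table(meta)
```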
Is there any way to generate notebooks from metadata (table name, schema)?
And is it a good idea to run a job that contains 300 notebooks? The number of records coming from CDC will not be in the millions, more like thousands.
Thanks.