yesterday
I have an MS SQL server database that has several schemas we need to ingest data from. Call them "SCHEMA1" tables and "SCHEMA2" tables. Let's call the server S and the database D. In unity catalog I have a catalog called "staging" where the staging (bronze) data from all sources ends up in.
In staging I have a schema called "S_D", which is fine. However to distinguish the tables from the two different source schemas I end up calling them "SCHEMA1_TableName", "SCHEMA2_TableName", etc. That is, I use underscores to distinguish the two source schemas.
Is there a better way to handle this? Or is there a best practice? I would rather not have a separate catalog for each staging data source.
yesterday
Hello @dbernstein_tp ,
Generally, UC has a three-level naming convention: catalogue.schema.table, and currently, you are using something like staging.S_D.SCHEMA1_TableName
In my opinion, the right way to name it is to use staging.S_D_SCHEMA1.TableName, to represent the source schema instead of including it in the table name and also because it matches the medallion convention <env>_<zone>.<source_system>_<db>_<schema>.<table>
Therefore, as a best practice, I would suggest maintaining a single staging catalogue for bronze, but using one UC schema per source schema (or per source system). Avoid encoding the schema name into the table name when you can encode it into the UC schema level instead.
yesterday
Hello @dbernstein_tp ,
Generally, UC has a three-level naming convention: catalogue.schema.table, and currently, you are using something like staging.S_D.SCHEMA1_TableName
In my opinion, the right way to name it is to use staging.S_D_SCHEMA1.TableName, to represent the source schema instead of including it in the table name and also because it matches the medallion convention <env>_<zone>.<source_system>_<db>_<schema>.<table>
Therefore, as a best practice, I would suggest maintaining a single staging catalogue for bronze, but using one UC schema per source schema (or per source system). Avoid encoding the schema name into the table name when you can encode it into the UC schema level instead.
yesterday
Create indeed a different catalog for environment (DEV, QA, Staging, Prod), new schema/databases inside and then tables, views, volumes or functions following the three names convention <catalog_name>.<schema_name>.<table_name>. Remove, of course, the schema name from the tables or any other child data object. You can query system tables to list all objects per schema and catalog if needed.
23 hours ago
@dbernstein_tp - I would go with @K_Anudeep suggestion. This way you would always know upfront and explicit about source database mapping.
However, persoanlly, I will have one catalog per source system database.
12 hours ago
Thanks for the responses! @K_Anudeep suggestion makes sense in the context of our current lakehouse architecture so I think I will migrate to that.
Passionate about hosting events and connecting people? Help us grow a vibrant local community—sign up today to get started!
Sign Up Now