We're dealing with this issue on our project in the following way:
- we define a config JSON file (it could also be YAML - that doesn't matter)
- now let's say you have a parameter with a really long value - for the sake of example, let's consider parameters related to the tables we want to load

So we can define our JSON config:
[
    {
        "really_important_table": {
            "table_name": "some_table_name",
            "source_file_format": "json",
            "data_lake_target_folder_name": "sample_target",
            "data_source_path": "source_path",
            "transform_function_name": "function_name",
            "autoloader_options": {
                "cloudFiles.resourceGroup": "rg_name"
            },
            "clean_bronze": false
        }
    },
    {
        "table2": {
            "table_name": "some_table_name2",
            "source_file_format": "json",
            "data_lake_target_folder_name": "folder_name",
            "data_source_path": "src_path",
            "transform_function_name": "transform_function",
            "autoloader_options": {
                "cloudFiles.resourceGroup": "rg_2"
            },
            "clean_bronze": false
        }
    }
]

(Note that JSON uses lowercase false, not Python's False.)
Now you need to define a Python module that reads the content of this config file and returns the config entry for a provided key.
So, for example, say you need to process the really_important_table config. Then in your workflow you just pass the really_important_table key, and in your notebook/code use your module to fetch the values associated with that key, as sketched below.
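
A minimal sketch of such a module, assuming the file is saved as config.json - the file name and the helper name get_table_config are just illustrative, adapt them to your project:

    import json

    def get_table_config(key, config_path="config.json"):
        """Return the config dict stored under `key`, e.g. "really_important_table"."""
        with open(config_path) as f:
            config = json.load(f)
        # the config is a list of single-key objects, so scan for the one holding our key
        for entry in config:
            if key in entry:
                return entry[key]
        raise KeyError(f"No config entry found for table key: {key}")

Then in your notebook/code:

    table_config = get_table_config("really_important_table")
    print(table_config["data_source_path"])  # -> "source_path"

In a Databricks workflow the key would typically arrive as a job parameter, e.g. read via dbutils.widgets.get in the notebook, and then passed straight into the helper.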