Up front, I want to let you know that I'm not a veteran data jockey, so I apologize if this topic has been covered already or is simply too basic or narrow for this community. That said, I do need help, so please feel free to point me in another direction if that's appropriate.
In order to achieve the benefits of data science and data analytics, and to facilitate data quality, my company has decided to invest in building a data lake. Almost immediately, our application solution engineers observed that they could (and should) be able to access multi-domain and/or mastered single-domain data through data APIs built on top of the lake, rather than relying on multiple application APIs or consuming APIs built on unmastered/uncertified data in the source systems.

Assuming that one of the primary goals of the data lake is improving data quality, how can you introduce data quality rules at scale without creating a version control problem in your API catalog, one that your application owners ultimately can't keep up with and that just becomes tech debt? The promise of the lake can't simply be about science and analytics, can it?
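To make the question a bit more concrete, here's a rough sketch of what I have in mind when I say quality rules end up coupled to the data API catalog. None of these names come from a real library or our actual systems; they're purely illustrative:

```python
# Hypothetical sketch: versioned, declarative data quality rules that certify
# lake data before a data API serves it. QualityRule, DataContract, and the
# example rules are made-up names for illustration only.
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class QualityRule:
    name: str
    version: str  # rule version, tracked separately from the API contract version
    check: Callable[[dict[str, Any]], bool]


@dataclass
class DataContract:
    """One entry in the API catalog: a domain dataset plus the rules that certify it."""
    domain: str
    api_version: str
    rules: list[QualityRule] = field(default_factory=list)

    def certify(self, record: dict[str, Any]) -> bool:
        # The data API only serves records that pass every attached rule.
        return all(rule.check(record) for rule in self.rules)


# Example: a mastered 'customer' domain exposed as v2.0 of a data API.
customer_contract = DataContract(
    domain="customer",
    api_version="2.0",
    rules=[
        QualityRule("non_null_email", "1.1", lambda r: bool(r.get("email"))),
        QualityRule("valid_country", "1.0", lambda r: r.get("country") in {"US", "CA", "GB"}),
    ],
)

if __name__ == "__main__":
    record = {"customer_id": 42, "email": "a@example.com", "country": "US"}
    print(customer_contract.certify(record))  # True only if all quality rules pass
```

My worry is that every rule change like this bumps a version somewhere in the catalog, and the application owners consuming those data APIs have to keep chasing it.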