Hi,
I am struggling to truly understand how to work with external locations. As far as I can tell, you have:
1) Managed catalogs
2) Managed schemas
3) Managed tables/volumes etc.
4) External locations that contain external tables and/or volumes
5) External volumes that can reside inside managed catalogs/schemas
Most of the time, we want to write data inside of Databricks, so managed catalogs, schemas, and tables/volumes seem natural. However, there are times when we want to write data (that we need to access inside of Databricks) outside of Databricks. In those cases, I understand that the way to do so is to use external locations.
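For context, this is roughly how I register such a location today, as a sketch; the location name, storage credential, and URL below are just placeholders on my side:

```python
# `spark` is the SparkSession provided by the Databricks notebook/runtime.
# Register the cloud path as an external location, backed by an existing storage credential.
spark.sql("""
    CREATE EXTERNAL LOCATION IF NOT EXISTS landing_zone
    URL 'abfss://landing@mystorageaccount.dfs.core.windows.net/'
    WITH (STORAGE CREDENTIAL my_storage_credential)
    COMMENT 'Landing zone written to by systems outside Databricks'
""")
```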
However, I don't find working with external locations afterwards straightforward.
For volumes, I like how I can create an external volume inside of a catalog. Then I have my raw catalog with domain schemas, and the managed tables and external volumes that belong to them are organized within it. However, when working with tabular data, I find it harder to understand what you are supposed to do with it.
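This is what I mean for volumes; it works nicely (catalog, schema, volume, and path names are just examples from my setup):

```python
# Create an external volume inside my raw catalog, under the relevant domain schema.
spark.sql("""
    CREATE EXTERNAL VOLUME IF NOT EXISTS raw.sales.landing_files
    LOCATION 'abfss://landing@mystorageaccount.dfs.core.windows.net/sales/'
""")

# Users then access the files through the governed volume path instead of the raw cloud URL.
spark.read.format("json").load("/Volumes/raw/sales/landing_files/orders/").show(5)
```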
Databricks says: "Don't grant general READ FILES [...] permission on external locations to end users". Then how exactly should my users (I am a platform engineer; my users are data engineers, scientists, and analysts) access these files? I don't want to do the work of creating managed tables for every table in an external location; when new data appears, those tables must be refreshed with the new data. We have a lot of streaming use cases as well. Ideally, I want tables to be organized in my catalogs and schemas the same way you can do with external volumes.
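To make the question concrete, these are the two approaches I'm aware of, and neither feels quite right; all table names, formats, and paths below are made up for illustration:

```python
# Option A: register each dataset as an external table pointing at the external location
# (assuming the data already sits there as Delta). No data is copied and no refresh is
# needed, but it is still per-dataset registration work for the platform team.
spark.sql("""
    CREATE TABLE IF NOT EXISTS raw.sales.orders
    USING DELTA
    LOCATION 'abfss://landing@mystorageaccount.dfs.core.windows.net/sales/orders/'
""")

# Option B: stream the files into managed tables with Auto Loader. This copes with new
# data arriving, but means maintaining a pipeline per dataset.
(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/Volumes/raw/sales/landing_files/_schemas/orders")
    .load("/Volumes/raw/sales/landing_files/orders/")
    .writeStream
    .option("checkpointLocation", "/Volumes/raw/sales/landing_files/_checkpoints/orders")
    .trigger(availableNow=True)
    .toTable("raw.sales.orders_bronze"))
```

Is one of these the intended pattern, or is there a better way to expose data on external locations as tables organized by catalog and schema?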