Community Discussions

How to use external locations

pernilak
New Contributor III

Hi,

I am struggling to truly understand how to work with external locations. As far as I can tell, you have:

1) Managed catalogs
2) Managed schemas
3) Managed tables/volumes etc.
4) External locations that contain external tables and/or volumes
5) External volumes that can reside inside managed catalogs/schemas

Most of the time, we want to write data inside of Databricks, so managed catalogs, schemas, and tables/volumes seem natural. However, there are times when we want to write data (that we need to access inside Databricks) outside of Databricks. In those cases, I understand that the way to do so is using external locations.

However, I don't find working with external locations afterwards straightforward.

For volumes, I like how I can create an external volume inside a catalog. Then I have my raw catalog with domain schemas, and the associated managed tables and external volumes are organized within. However, when working with tabular data, I find it harder to understand what you are supposed to do with it.
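For reference, creating an external volume inside a managed catalog/schema looks like this (a sketch; the catalog, schema, volume name, and S3 path are placeholders, and the path must be covered by an existing external location):

    -- Create an external volume inside a managed catalog/schema,
    -- pointing at a path under an external location you have access to
    CREATE EXTERNAL VOLUME raw.sales.landing_files
    LOCATION 's3://depts/sales/landing';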

Databricks says: "Don't grant general READ FILES [...] permission on external locations to end users". Then how exactly should my users (I am a platform engineer; my users are data engineers, scientists, and analysts) access these files? I don't want to do the work of creating managed tables for every table in an external location - when new data appears, those tables must be refreshed with new data. We have a lot of streaming use cases as well. Ideally, I want tables to be organized in my catalogs and schemas the same way you can with external volumes.
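For context on the refresh concern: one common pattern for keeping a Unity Catalog table in sync with files arriving in an external location is a streaming table backed by Auto Loader / `read_files`. This is a sketch only; the catalog, schema, table name, path, and format are placeholder assumptions:

    -- Incrementally ingest new files from an external location into a
    -- streaming table; new files are picked up on each refresh
    CREATE OR REFRESH STREAMING TABLE raw.finance.invoices
    AS SELECT *
    FROM STREAM read_files(
      's3://depts/finance/invoices/',
      format => 'json'
    );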

 

1 REPLY

Kaniz
Community Manager

Hi @pernilak, please refer to the official Databricks documentation on external locations.

Let’s say you have an external location for financial data stored in an S3 bucket (s3://depts/finance).

Here’s how you can set it up:

    -- Grant the `finance` user permission to create an external location on the `my_aws_storage_cred` storage credential
    GRANT CREATE EXTERNAL LOCATION ON STORAGE CREDENTIAL `my_aws_storage_cred` TO `finance`;
    
    -- Create an external location on the specific path to which `my_aws_storage_cred` has access
    CREATE EXTERNAL LOCATION finance_loc URL 's3://depts/finance' WITH (CREDENTIAL my_aws_storage_cred) COMMENT 'finance';
    
    -- Grant read, write, and create table access to the finance location for `finance` user
    GRANT READ FILES, WRITE FILES, CREATE EXTERNAL TABLE ON EXTERNAL LOCATION `finance_loc` TO `finance`;
    
    -- `finance` can read from any storage path under 's3://depts/finance' but nowhere else
    SELECT count(1) FROM `delta`.`s3://depts/finance/forecast_delta_table`; -- Returns 100
    
    -- 's3://depts/hr/' is not under external location `finance_loc`, so `finance` cannot read it
    SELECT count(1) FROM `delta`.`s3://depts/hr/employees_delta_table`; -- Throws an error
    
    -- `finance` can create an external table over specific objects within the `finance_loc` location
    CREATE TABLE main.default.sec_filings LOCATION 's3://depts/finance/sec_filings';
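Once the external table exists, end users can be given access through Unity Catalog table grants rather than READ FILES on the location itself, which is what the guidance above is pointing at. A sketch, with the group name `data_analysts` as a placeholder:

    -- End users query the table through table-level permissions,
    -- not through direct file access on the external location
    GRANT SELECT ON TABLE main.default.sec_filings TO `data_analysts`;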