
How to use external locations

pernilak
New Contributor III

Hi,

I am struggling to truly understand how to work with external locations. As far as I can tell from the documentation, you have:

1) Managed catalogs
2) Managed schemas
3) Managed tables/volumes etc.
4) External locations that contain external tables and/or volumes
5) External volumes that can reside inside managed catalogs/schemas

Most of the time, we want to write data inside of Databricks, so managed catalogs, schemas, and tables/volumes seem natural. However, there are times when we want to write data (that we need to access inside of Databricks) outside of Databricks. In those cases, I understand that the way to do so is by using external locations.
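
To make sure I understand the split, here is roughly how I picture it (the catalog, schema, and bucket names below are made up for illustration):

-- Managed table: Databricks manages the underlying storage location
CREATE TABLE main.sales.orders (id INT, amount DOUBLE);

-- External table: the data stays at a path under one of our external locations
CREATE TABLE main.sales.orders_ext (id INT, amount DOUBLE)
LOCATION 's3://my-bucket/sales/orders';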

However, I don't find working with external locations afterwards straightforward.

For volumes, I like how I can create an external volume inside of a catalog. Then I have my raw catalog with domain schemas, and the associated managed tables and external volumes are organized within them. However, when working with tabular data, I find it harder to understand what you are supposed to do.
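
For reference, this is the volume pattern I like (names made up): the external volume sits inside my catalog/schema hierarchy even though the files live outside Databricks-managed storage.

-- External volume registered under a managed catalog and schema
CREATE EXTERNAL VOLUME raw.finance.landing_files
LOCATION 's3://my-bucket/finance/landing'
COMMENT 'Raw landing files for the finance domain';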

Databricks says: "Don't grant general READ FILES [...] permission on external locations to end users". Then how exactly should my users (I am a platform engineer; my users are data engineers, scientists, and analysts) access these files? I don't want to create managed tables for every table in an external location; when new data appears, those tables must be refreshed with the new data. We have a lot of streaming use cases as well. Ideally, I want tables to be organized in my catalogs and schemas the same way you can do with external volumes.
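
Sketching what I am picturing for tabular data (paths and names are made up): external tables registered in my existing catalog/schema layout, so users query them through Unity Catalog rather than reading files directly.

-- External table over an existing Delta path, organized like my volumes are
CREATE TABLE raw.finance.forecast
LOCATION 's3://my-bucket/finance/forecast_delta_table';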

 

1 REPLY

Kaniz_Fatma
Community Manager

Hi @pernilak, please refer to the official Databricks documentation on external locations.

Let’s say you have an external location for financial data stored in an S3 bucket (s3://depts/finance).

Here’s how you can set it up:

-- Grant the `finance` user permission to create an external location on the `my_aws_storage_cred` storage credential
GRANT CREATE EXTERNAL LOCATION ON STORAGE CREDENTIAL `my_aws_storage_cred` TO `finance`;

-- Create an external location on the specific path to which `my_aws_storage_cred` has access
CREATE EXTERNAL LOCATION finance_loc URL 's3://depts/finance' WITH (CREDENTIAL my_aws_storage_cred) COMMENT 'finance';

-- Grant read, write, and create-table access on the finance location to the `finance` user
GRANT READ FILES, WRITE FILES, CREATE EXTERNAL TABLE ON EXTERNAL LOCATION `finance_loc` TO `finance`;

-- `finance` can read from any storage path under 's3://depts/finance' but nowhere else
SELECT count(1) FROM `delta`.`s3://depts/finance/forecast_delta_table`; -- Returns 100

-- 's3://depts/hr/' is not under the external location `finance_loc`, so `finance` cannot read it
SELECT count(1) FROM `delta`.`s3://depts/hr/employees_delta_table`; -- Throws an error

-- `finance` can create an external table over specific objects within the `finance_loc` location
CREATE TABLE main.default.sec_filings LOCATION 's3://depts/finance/sec_filings';
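
Following on from the guidance about not granting broad READ FILES to end users, one possible pattern (the group name here is hypothetical) is to grant SELECT on the external tables you register, so end users query through Unity Catalog instead of touching the files directly:

-- Grant query access on the registered external table to an end-user group
GRANT SELECT ON TABLE main.default.sec_filings TO `data_analysts`;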