
How to use external locations

pernilak
New Contributor III

Hi,

I am struggling to truly understand how to work with external locations. As far as I can tell, you have:

1) Managed catalogs
2) Managed schemas
3) Managed tables/volumes etc.
4) External locations that contain external tables and/or volumes
5) External volumes that can reside inside managed catalogs/schemas

Most of the time, we want to write data inside of Databricks, so managed catalogs, schemas and tables/volumes seem natural. However, there are times when we want to write data (that we need to access inside of Databricks) outside of Databricks. In those cases, I understand that the way to do so is to use external locations.
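
For example, here is roughly how I picture the difference (all names here are hypothetical):

    -- Managed table: Unity Catalog manages the underlying storage for us
    CREATE TABLE raw.finance.transactions (id INT, amount DECIMAL(10,2));

    -- External table: the data lives at a cloud storage path we control ourselves
    CREATE TABLE raw.finance.transactions_ext (id INT, amount DECIMAL(10,2))
    LOCATION 's3://my-bucket/finance/transactions';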

However, I don't find working with external locations afterwards straightforward.

For volumes, I like how I can create an external volume inside of a catalog (see the sketch below). Then I have my raw catalog with domain schemas, and the associated managed tables and external volumes are organized within them. However, when working with tabular data, I find it harder to understand what you are supposed to do with it.
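
Here is roughly what I mean by that volume setup (hypothetical names):

    -- An external volume registered inside a managed catalog/schema,
    -- pointing at a path covered by an existing external location
    CREATE EXTERNAL VOLUME raw.finance.landing_files
    LOCATION 's3://my-bucket/finance/landing';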

Databricks says: "Don't grant general READ FILES [...] permission on external locations to end users". Then how exactly should my users (I am a platform engineer; my users are data engineers, data scientists, and analysts) access these files? I don't want to do the work of creating managed tables for every table in an external location: when new data appears, those tables must be refreshed with the new data. We have a lot of streaming use cases as well. Ideally, I want tables to be organized in my catalogs and schemas the same way you can do it with external volumes.

 

1 REPLY

Kaniz_Fatma
Community Manager

Hi @pernilak, please refer to the official Databricks documentation on external locations.

Let’s say you have an external location for financial data stored in an S3 bucket (s3://depts/finance).

Here's how you can set it up:

    -- Grant the `finance` user permission to create an external location
    -- using the `my_aws_storage_cred` storage credential
    GRANT CREATE EXTERNAL LOCATION ON STORAGE CREDENTIAL `my_aws_storage_cred` TO `finance`;

    -- Create an external location on the specific path to which `my_aws_storage_cred` has access
    CREATE EXTERNAL LOCATION finance_loc URL 's3://depts/finance'
    WITH (CREDENTIAL my_aws_storage_cred)
    COMMENT 'finance';

    -- Grant read, write, and create-external-table access on the finance location to `finance`
    GRANT READ FILES, WRITE FILES, CREATE EXTERNAL TABLE ON EXTERNAL LOCATION `finance_loc` TO `finance`;

    -- `finance` can read from any storage path under 's3://depts/finance', but nowhere else
    SELECT count(1) FROM `delta`.`s3://depts/finance/forecast_delta_table`; -- Returns 100

    -- 's3://depts/hr/' is not under external location `finance_loc`, so `finance` cannot read it
    SELECT count(1) FROM `delta`.`s3://depts/hr/employees_delta_table`; -- Throws an error

    -- `finance` can create an external table over specific objects within the `finance_loc` location
    CREATE TABLE main.default.sec_filings LOCATION 's3://depts/finance/sec_filings';
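
For your streaming use cases, you don't necessarily have to keep refreshing managed copies by hand. As one sketch (the table and path names below are hypothetical, and this assumes new files land under the `finance_loc` external location), a streaming table using Auto Loader's `read_files` can pick up new files incrementally:

    -- Sketch with hypothetical names: incrementally ingest files arriving
    -- under the external location into a governed streaming table
    CREATE OR REFRESH STREAMING TABLE main.default.forecast_events
    AS SELECT *
    FROM STREAM read_files(
      's3://depts/finance/landing/',
      format => 'json'
    );

Also note that an external table registered over a Delta path (like `sec_filings` above) reads the Delta log at that path, so new data written there shows up without a manual refresh.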
    
