Data Engineering

How to query deltatables stored in s3 through databricks SQL Endpoint?

athjain
New Contributor III

The delta tables produced by our ETL are stored in S3 as CSV or Parquet files. The question now is how to allow a Databricks SQL endpoint to run queries over these files saved in S3.

1 ACCEPTED SOLUTION


Hubert-Dudek
Esteemed Contributor III

First, mount the bucket:

%python
# Read the AWS credentials from a Databricks secret scope (placeholder scope/key names)
access_key = dbutils.secrets.get(scope="<scope-name>", key="<aws-access-key>")
secret_key = dbutils.secrets.get(scope="<scope-name>", key="<aws-secret-key>")
encoded_secret_key = secret_key.replace("/", "%2F")  # URL-encode any "/" in the secret
aws_bucket_name = "<aws-bucket-name>"
mount_name = "<mount-name>"

# Mount the bucket under /mnt/<mount-name>
dbutils.fs.mount("s3a://%s:%s@%s" % (access_key, encoded_secret_key, aws_bucket_name), "/mnt/%s" % mount_name)

then register the delta folder as a table:

%sql
CREATE TABLE `t`
USING delta
LOCATION '/mnt/<mount-name>/<delta-folder>'
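Once the table is registered in the metastore (and the endpoint's data access to the bucket is configured), the SQL endpoint can query it like any other table. A minimal check, using the table name `t` from the example above:

%sql
-- Run from the SQL endpoint (or a notebook) once the table is registered
SELECT COUNT(*) FROM `t`;

-- Inspect the table definition, including the location it points to
DESCRIBE TABLE EXTENDED `t`;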


5 REPLIES

-werners-
Esteemed Contributor III

You provide the data by using a Hive metastore. This basically puts a semantic view on top of your files so the data is served as a classic table (e.g. saveAsTable or CREATE TABLE).

A SQL endpoint can access this Hive metastore.

Do not forget to set up the data access (the SQL endpoint needs access to the data, e.g. with a service principal).
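For illustration, a rough sketch of that "semantic view", with hypothetical table names and paths (the real ones depend on where the ETL writes its output):

%sql
-- Register existing Delta files as a table: only metadata goes into the metastore
CREATE TABLE IF NOT EXISTS etl_output
USING delta
LOCATION '/mnt/<mount-name>/etl_output';

-- Or create a managed table from raw files (roughly what saveAsTable does from the DataFrame API)
CREATE TABLE etl_output_managed AS
SELECT * FROM parquet.`/mnt/<mount-name>/raw/etl_output`;

Either way, once the table exists in the metastore, the SQL endpoint can query it by name.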

athjain
New Contributor III

Thanks @Werner Stinckens for your response

But the question is: if in the future I would like to move away from Databricks, and the delta tables are stored in DBFS (backed by S3, using the Parquet file format),

how will I be able to retrieve them from the files in S3?

Instead of using DBFS to store the delta tables (saveAsTable stores them in DBFS), how can we use S3 directly and retrieve the tables via the SQL endpoint?

Or how do we connect the SQL endpoint so that it points directly at S3 for the tables?

-werners-
Esteemed Contributor III

Ok I get it.

In Databricks you have the option to use managed tables (the data is managed by the Databricks workspace and lives in DBFS) or unmanaged tables (the data resides in external storage like S3, ADLS etc.).

So in your case you would be looking at unmanaged tables.

The SQL endpoint of course needs a table interface, so you still have to define tables on top of your data.

But those are only metadata.

You could even opt for an external metastore.

For examples, see the Databricks documentation on external Hive metastores.
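To make that concrete for the S3 case: a minimal sketch of an unmanaged (external) table defined directly on an s3:// path, assuming the SQL endpoint already has data access to the bucket and using placeholder names:

%sql
-- Only the table definition goes into the metastore; the Delta files stay in your S3 bucket
CREATE TABLE IF NOT EXISTS my_delta_table
USING delta
LOCATION 's3://<aws-bucket-name>/path/to/delta-table';

Because the files stay in your own bucket in open Delta/Parquet format, they remain readable by other tools even if you later move away from Databricks, and dropping the table only removes the metadata, not the data.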


Anonymous
Not applicable

Hey @Athlestan Jain

How are you doing?

Thanks for posting your question. Do you think you were able to resolve the issue?

We'd love to hear from you.