
How to query Delta tables stored in S3 through a Databricks SQL endpoint?

athjain
New Contributor III

After the ETL, the Delta tables are stored in S3 (as CSV or Parquet files). The question is: how do we allow the Databricks SQL endpoint to run queries over those files saved in S3?


5 REPLIES

-werners-
Esteemed Contributor III

You expose the data by using a Hive metastore. This basically puts a semantic view on top of your files so the data is served as a classic table (e.g. saveAsTable or CREATE TABLE).

A SQL endpoint can access this Hive metastore.

Do not forget to set up data access (the SQL endpoint needs access to the data with a service principal).
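
To make this concrete, here is a minimal sketch of putting a table on top of files in S3 (assuming a Databricks notebook with a Spark session; the bucket path and table name are placeholders, and data access for the SQL endpoint must already be configured):

%python
# Read the ETL output that already sits in S3 (placeholder path).
df = spark.read.parquet("s3a://<your-bucket>/etl-output/")

# Option 1: write it out as an unmanaged Delta table; the data stays in S3,
# only the table definition goes into the Hive metastore.
df.write.format("delta").option("path", "s3a://<your-bucket>/delta/my_table").saveAsTable("my_table")

# Option 2: put a table definition on top of an existing Delta folder.
spark.sql("""
  CREATE TABLE IF NOT EXISTS my_table
  USING delta
  LOCATION 's3a://<your-bucket>/delta/my_table'
""")

The SQL endpoint can then query my_table like any other table.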

athjain
New Contributor III

Thanks @Werner Stinckens for your response.

But the question is: if in the future I would like to move away from Databricks, and the Delta tables are stored in DBFS (backed by S3 using the Parquet file format), how will I be able to retrieve them from the files in S3?

Instead of using DBFS (saveAsTable stores the Delta table in DBFS) to store the Delta tables, how can we use S3 directly and retrieve the tables via the SQL endpoint?

Or how do we connect the SQL endpoint to point directly at S3 for the tables?

-werners-
Esteemed Contributor III

Ok I get it.

In Databricks you have the option of managed tables (the data is managed by the Databricks workspace) or unmanaged tables (the data resides in external storage like S3, ADLS, etc.).

So in your case you would be looking at unmanaged tables.

The SQL endpoint of course needs a table interface, so you still have to define tables on top of your data.

But those are only metadata.

You could even opt for an external Hive metastore (the Databricks documentation covers how to set one up).
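
As a rough illustration of the managed vs. unmanaged difference (the table names and S3 path below are made up for the example, and a Databricks Spark session is assumed):

%python
# Managed table: Databricks controls the storage location; dropping the table
# also deletes the underlying files.
spark.range(10).write.format("delta").saveAsTable("events_managed")

# Unmanaged (external) table: the data lives at an S3 path you own; dropping the
# table only removes the metastore entry, and the Delta/Parquet files stay in S3,
# readable by any engine that understands Delta Lake.
spark.range(10).write.format("delta").option("path", "s3a://<your-bucket>/delta/events").saveAsTable("events_external")

# The files can still be read directly, without going through the metastore.
spark.read.format("delta").load("s3a://<your-bucket>/delta/events").show()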

Hubert-Dudek
Esteemed Contributor III (Accepted Solution)

First mount the bucket:

%python
# Credentials pulled from a secret scope; the scope and key names are placeholders.
access_key = dbutils.secrets.get(scope="<scope-name>", key="<aws-access-key>")
secret_key = dbutils.secrets.get(scope="<scope-name>", key="<aws-secret-key>")
encoded_secret_key = secret_key.replace("/", "%2F")  # URL-encode any '/' in the secret key

aws_bucket_name = "<aws-bucket-name>"
mount_name = "<mount-name>"

dbutils.fs.mount("s3a://%s:%s@%s" % (access_key, encoded_secret_key, aws_bucket_name), "/mnt/%s" % mount_name)

then register the Delta folder as a table:

%sql
CREATE TABLE `t`
USING delta
LOCATION '/mnt/<mount-name>/<path-to-delta-folder>'
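
After that, a quick check from a notebook that the table resolves against the mounted Delta folder (the SQL endpoint can then query the same table name; `t` is the placeholder table from the example above):

%python
spark.sql("SELECT * FROM t LIMIT 10").show()
spark.sql("DESCRIBE DETAIL t").show(truncate=False)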

Anonymous
Not applicable

Hey @Athlestan Jain

How are you doing?

Thanks for posting your question. Do you think you were able to resolve the issue?

We'd love to hear from you.
