03-07-2022 12:31 AM
The Delta tables produced by our ETL are stored in S3 in CSV or Parquet format. The question now is: how do we allow a Databricks SQL endpoint to run queries over those files saved in S3?
Labels: SQL
Accepted Solutions
03-07-2022 02:25 AM
First mount the bucket:
%python
# Credentials come from a Databricks secret scope (placeholder scope/key names)
access_key = dbutils.secrets.get(scope="<scope-name>", key="<aws-access-key>")
secret_key = dbutils.secrets.get(scope="<scope-name>", key="<aws-secret-key>")
encoded_secret_key = secret_key.replace("/", "%2F")  # URL-encode any "/" in the secret
aws_bucket_name = "<aws-bucket-name>"
mount_name = "<mount-name>"
dbutils.fs.mount("s3a://%s:%s@%s" % (access_key, encoded_secret_key, aws_bucket_name), "/mnt/%s" % mount_name)
then register the Delta folder as a table:
%sql
CREATE TABLE `t`
USING delta
LOCATION '/mnt/<mount-name>/<path-to-delta-folder>'
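Once the table is registered, the SQL endpoint can query it by name (assuming its data access to the bucket is configured), for example:
%sql
SELECT * FROM t LIMIT 10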
03-07-2022 12:56 AM
You expose the data through a Hive metastore. This basically puts a semantic view on top of your files so the data is served as a classic table (e.g. saveAsTable or CREATE TABLE; see the sketch after this post).
A SQL endpoint can access this Hive metastore.
Do not forget to set up data access: the SQL endpoint needs access to the data, e.g. with a service principal.
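For illustration, a minimal sketch of the saveAsTable route (the DataFrame and table name here are placeholders, not from this thread):
%python
# Register a toy DataFrame as a Delta table in the Hive metastore
df = spark.range(10)  # stands in for your ETL output
df.write.format("delta").mode("overwrite").saveAsTable("my_table")
Once registered this way, the table shows up in the metastore and the SQL endpoint can query it by name.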
03-07-2022 01:03 AM
Thanks @Werner Stinckens for your response.
But the question is: if in the future I would like to move away from Databricks, and the Delta tables are stored in DBFS (backed by S3 using the Parquet file format), how will I be able to retrieve them from the files in S3?
Instead of using DBFS to store the Delta tables (saveAsTable stores the Delta table in DBFS), how can we use S3 directly and retrieve the tables via the SQL endpoint?
Or how do we connect that SQL endpoint to point directly to S3 for the tables?
03-07-2022 01:44 AM
OK, I get it.
In Databricks you have the option to use managed tables (data is managed by the Databricks workspace) or unmanaged tables (data resides in external storage like S3, ADLS, etc.).
So in your case you would be looking at unmanaged tables; see the sketch after this post.
The SQL endpoint of course needs a table interface, so you still have to define tables on top of your data.
But those are only metadata.
You could even opt for an external metastore.
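As a sketch of the unmanaged-table option, this registers a table directly over an S3 path (bucket and folder names are placeholders; the endpoint still needs access to the bucket):
%sql
CREATE TABLE my_external_table
USING delta
LOCATION 's3a://<aws-bucket-name>/<path-to-delta-folder>'
Only the table metadata lives in the metastore; dropping the table leaves the files in S3 untouched.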
04-12-2022 09:37 AM
Hey @Athlestan Jain
How are you doing?
Thanks for posting your question. Do you think you were able to resolve the issue?
We'd love to hear from you.

