02-13-2022 08:43 PM
Accepted Solutions
02-13-2022 10:24 PM
You can write your ETL logic in notebooks, run the notebook on a cluster, and write the data to a location where your S3 bucket is mounted.
Next, register that data as a table in the Hive metastore and access the same table from Databricks SQL.
To see the table, go to the Data tab and select your schema/database to view the registered tables.
There are two ways to do this:
Option 1:
df.write.format("delta").option("path", "<s3-path-of-table>").saveAsTable("<table-name>")
Option 2:
%python
df.write.format("delta").save("<s3-path-of-table>")
%sql
CREATE TABLE <table-name>
USING DELTA
LOCATION '<s3-path-of-table>'
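Once the table is registered, you can verify it from a SQL cell (or the Data tab); a minimal check, where the schema and table names are placeholders:
%sql
SHOW TABLES IN <schema-name>;
DESCRIBE TABLE EXTENDED <schema-name>.<table-name>;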
02-14-2022 03:05 AM
@Aman Sehgal so basically you are saying to write the transformed data from Databricks PySpark into ADLS Gen2, and then use Databricks SQL analytics to do what you described below...
%sql
CREATE TABLE <table-name>
USING DELTA
LOCATION '<s3-path-of-table>'
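If the data lands in ADLS Gen2 rather than S3, the same pattern should apply with an abfss:// path; a rough sketch, where the container, storage account, path, and table name are all placeholders:
%python
df.write.format("delta").save("abfss://<container>@<storage-account>.dfs.core.windows.net/<path>")
%sql
CREATE TABLE <table-name>
USING DELTA
LOCATION 'abfss://<container>@<storage-account>.dfs.core.windows.net/<path>'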
02-14-2022 05:11 AM
Right. Databricks is a platform to perform transformations. Ideally you should mount either your S3 bucket or your ADLS Gen2 location in DBFS.
You can then read/write/update/delete your data there. To run SQL analytics from the SQL tab, you'll have to register a table and start a SQL endpoint.
You can also query the data via notebooks by using SQL in a cell. The only difference is that you'll have to spin up a cluster instead of an endpoint.
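For reference, mounting the storage under DBFS is usually a one-time step with dbutils.fs.mount; a minimal sketch assuming an S3 bucket reachable via the cluster's instance profile (bucket, mount, schema, and table names are placeholders):
%python
# Mount the bucket once; afterwards it is visible as a DBFS path
dbutils.fs.mount(source="s3a://<bucket-name>", mount_point="/mnt/<mount-name>")
# Write the transformed data to the mounted location and register it as a table
df.write.format("delta").option("path", "/mnt/<mount-name>/<table-dir>").saveAsTable("<schema-name>.<table-name>")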
02-15-2022 01:09 AM
@Aman Sehgal you are making me confused... we need to spin up a cluster even if we use a SQL endpoint, right?
And can we not use the magic command "%sql" within the same notebook to write the PySpark data to the SQL endpoint as a table?
02-15-2022 05:58 AM
When you're in the Data Engineering tab of the workspace, you need to spin up a cluster. After spinning up the cluster, you can create a notebook and use %sql to write SQL commands and query your table.
When you're in the SQL tab of the workspace, you need to spin up a SQL endpoint. After spinning up the endpoint, go to the Queries tab, where you can write a SQL query against your tables.
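Either way, it's the same registered table; for example, a hypothetical query that works both in a notebook cell (on a cluster) and in the Queries tab (on a SQL endpoint):
%sql
SELECT * FROM <schema-name>.<table-name> LIMIT 10;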
02-18-2022 08:03 AM
@Aman Sehgal Can we write data from the Data Engineering workspace to a SQL endpoint in Databricks?
02-19-2022 03:39 PM
You can write data to a table (e.g. default.my_table) and consume data from the same table using a SQL endpoint.
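As a rough end-to-end sketch (the DataFrame and write mode are illustrative): write from a Data Engineering notebook, then read the same table from a query running on the SQL endpoint.
%python
# in a Data Engineering notebook, on a cluster
df.write.format("delta").mode("overwrite").saveAsTable("default.my_table")
Then, from the Queries tab on a running SQL endpoint:
SELECT * FROM default.my_table;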

