Data Engineering

Hi Experts, I am new to Databricks. I want to know how to make PySpark data available in Databricks SQL Analytics.

BasavarajAngadi
Contributor

If we use two different clusters, one for PySpark transformation code and one for SQL analytics, how do we make the permanent tables derived from the PySpark code available for running queries in Databricks SQL Analytics?

1 ACCEPTED SOLUTION


AmanSehgal
Honored Contributor III

You can write your ETL logic in notebooks, run the notebook on a cluster, and write the data to a location where your S3 bucket is mounted.

Next, register that data as a table with the Hive metastore and access the same table in Databricks SQL.

To see the table, go to the Data tab and select your schema/database to see the registered tables.

Two ways to do this:

Option 1:

# write the data to the given path and register the table in one step
df.write.option("path", "<s3-path-of-table>").saveAsTable("<table-name>")

Option 2:

%python
# write Delta files to the mounted path
df.write.format("delta").save("<s3-path-of-table>")

%sql
-- register the existing Delta location as a table
CREATE TABLE <table-name>
USING DELTA
LOCATION '<s3-path-of-table>'
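
Putting the two options together, a fuller sketch (the mount point /mnt/datalake, the database analytics and both table names are hypothetical placeholders; the DataFrame stands in for the output of your PySpark transformations):

%python
# stand-in for your transformation output
df = spark.range(100).withColumnRenamed("id", "order_id")

spark.sql("CREATE DATABASE IF NOT EXISTS analytics")

# Option 1: write the Delta files and register the table in one step
df.write.format("delta") \
    .option("path", "/mnt/datalake/tables/sales_summary") \
    .saveAsTable("analytics.sales_summary")

# Option 2: write the files first ...
df.write.format("delta").mode("overwrite").save("/mnt/datalake/tables/sales_summary_v2")

%sql
-- ... then register the existing location as a table
CREATE TABLE analytics.sales_summary_v2
USING DELTA
LOCATION '/mnt/datalake/tables/sales_summary_v2'

Either way, the table then appears under the Data tab and can be queried from Databricks SQL.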



7 REPLIES


BasavarajAngadi
Contributor

@Aman Sehgal so basically you are saying to write the transformed data from Databricks PySpark into ADLS Gen2 and then use Databricks SQL Analytics to do what you described below:

%sql
CREATE TABLE <table-name>
USING DELTA
LOCATION '<s3-path-of-table>'
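
In other words, something like the following sketch (the abfss:// container, storage account and table name are hypothetical; the cluster is assumed to already have credentials configured for the storage account, or the path mounted in DBFS):

%python
# write the transformed DataFrame as Delta files into ADLS Gen2
adls_path = "abfss://datalake@mystorageaccount.dfs.core.windows.net/tables/orders_clean"
df.write.format("delta").mode("overwrite").save(adls_path)

%sql
-- register that location so it can be queried from Databricks SQL
CREATE TABLE default.orders_clean
USING DELTA
LOCATION 'abfss://datalake@mystorageaccount.dfs.core.windows.net/tables/orders_clean'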

AmanSehgal
Honored Contributor III

Right. Databricks is a platform to perform transformations. Ideally you should mount either an S3 bucket or an ADLS Gen2 location in DBFS.

Read/write/update/delete your data there. To run SQL analytics from the SQL tab, you'll have to register a table and start an endpoint.

You can also query the data via notebooks by using SQL in a cell. The only difference is that you'll have to spin up a cluster instead of an endpoint.
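
For reference, a minimal mount sketch for an S3 bucket (the bucket and mount names are placeholders; this assumes the cluster already has an instance profile with access to the bucket):

%python
# mount the bucket into DBFS so notebooks and registered tables can use /mnt paths
dbutils.fs.mount("s3a://<aws-bucket-name>", "/mnt/<mount-name>")

# verify the mount
display(dbutils.fs.ls("/mnt/<mount-name>"))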

BasavarajAngadi
Contributor

@Aman Sehgal you are making me confused. We need to spin up a cluster if we use a SQL endpoint, right?

And can we not use the magic command "%sql" within the same notebook to write the PySpark data to the SQL endpoint as a table?

AmanSehgal
Honored Contributor III

When you're in the Data Engineering tab in the workspace, you need to spin up a cluster. After spinning up the cluster, you can create a notebook and use %sql to write SQL commands and query your table.

When you're in the SQL tab in the workspace, you need to spin up a SQL endpoint. After spinning up the endpoint, go to the Queries tab, where you can write SQL queries against your tables.
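
For example, the hypothetical table from the sketch above can be queried either way, in a notebook cell on a cluster or from the query editor on a SQL endpoint:

%sql
-- notebook cell (Data Engineering) or Queries tab (SQL)
SELECT COUNT(*) FROM analytics.sales_summary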


BasavarajAngadi
Contributor

@Aman Sehgal Can we write data from the Data Engineering workspace to the SQL endpoint in Databricks?

AmanSehgal
Honored Contributor III

You can write data to a table (e.g. default.my_table) from the Data Engineering workspace and consume data from the same table using a SQL endpoint.
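
A minimal sketch of that handoff (default.my_table comes from the reply above; the DataFrame is a placeholder):

%python
# on a Data Engineering cluster: persist the transformed data as a table
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
df.write.format("delta").mode("overwrite").saveAsTable("default.my_table")

Then, from the Queries tab backed by a running SQL endpoint:

SELECT * FROM default.my_table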
