
How to run spark sql file through Azure Databricks

amama
New Contributor II

We have a process that writes Spark SQL to files; this process will generate thousands of Spark SQL files in the production environment.
These files will be created in an ADLS Gen2 directory.

Sample Spark file:

---
import org.apache.spark.sql.functions.col

// Identifiers cannot start with a digit, so the dataset is bound to df_2023_I here
val df_2023_I = spark.sql("select rm.* from reu_master rm where rm.year = 2023 and rm.system_part = 'I'")
val criteria1_r1 = df_2023_I.filter(col("field_id") === "nknk" || col("field_id") === "gei")
criteria1_r1.write.mode("overwrite").save(path_to_adls_dir)
---

We are exploring the best way to invoke these files from Azure Databricks. We would like to avoid reading each file into a Python variable and passing that variable to a spark.sql statement.

4 REPLIES

shan_chandra
Honored Contributor III

@amama - you can mount the ADLS storage location in Databricks. Since this is Scala code, you can use a Workflow and create tasks that execute the Scala code, passing the mount location as the input.
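A minimal sketch of mounting an ADLS Gen2 container with a service principal; the storage account, container, tenant id, mount point, and secret scope/key names below are placeholders, not values from this thread:

---
// Hypothetical values: replace <storage-account>, <container>, <tenant-id>, and the secret scope/key names
val configs = Map(
  "fs.azure.account.auth.type" -> "OAuth",
  "fs.azure.account.oauth.provider.type" -> "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
  "fs.azure.account.oauth2.client.id" -> dbutils.secrets.get("my-scope", "sp-client-id"),
  "fs.azure.account.oauth2.client.secret" -> dbutils.secrets.get("my-scope", "sp-client-secret"),
  "fs.azure.account.oauth2.client.endpoint" -> "https://login.microsoftonline.com/<tenant-id>/oauth2/token"
)

// Mount the container so the generated script files are visible under /mnt/sqlscripts
dbutils.fs.mount(
  source = "abfss://<container>@<storage-account>.dfs.core.windows.net/",
  mountPoint = "/mnt/sqlscripts",
  extraConfigs = configs
)
---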

amama
New Contributor II

@shan_chandra - The workflow is implemented in Azure Data Factory; the process (MapReduce) that we are planning to replace with a Databricks notebook will be invoked by ADF.

Essentially, we would like to call all these scripts (Pig-equivalent Spark scripts) through a notebook, and this notebook will be an activity in ADF; a rough sketch of such a driver notebook follows.
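For reference, a rough sketch of a driver notebook that loops over the generated files. It assumes the files contain plain Spark SQL statements separated by semicolons and that the directory is mounted at the hypothetical path /mnt/sqlscripts; neither assumption is confirmed above:

---
// Assumption: each generated file holds plain Spark SQL statements separated by semicolons,
// and the ADLS directory is mounted at the hypothetical path /mnt/sqlscripts
val scriptFiles = dbutils.fs.ls("/mnt/sqlscripts/").filter(_.name.endsWith(".sql"))

scriptFiles.foreach { f =>
  val sqlText = dbutils.fs.head(f.path, 1024 * 1024)                    // read up to 1 MB of the script
  sqlText.split(";").map(_.trim).filter(_.nonEmpty).foreach(spark.sql)  // run each statement
}
---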

shan_chandra
Honored Contributor III

@amama - using the Databricks Notebook activity in ADF, you can invoke each of these individual scripts as an individual notebook by specifying the notebook path and configuring the Databricks linked service in ADF.
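For the per-script variant, one hedged sketch of the notebook body; the parameter name script_path and the use of dbutils.widgets are illustrative, with ADF supplying the value through the Notebook activity's base parameters:

---
// Hypothetical parameter "script_path", supplied by the ADF Notebook activity's base parameters
dbutils.widgets.text("script_path", "")
val scriptPath = dbutils.widgets.get("script_path")

// Assumption: the file contains plain Spark SQL statements separated by semicolons
val sqlText = dbutils.fs.head(scriptPath, 1024 * 1024)
sqlText.split(";").map(_.trim).filter(_.nonEmpty).foreach(spark.sql)
---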

Kaniz
Community Manager

Thank you for posting your question in our community! We are happy to assist you.

To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?

This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance! 
 
