Re: How to run spark sql file through Azure Databricks

amama · ‎01-24-2024

We have a process that writes Spark SQL to files. In the production environment it will generate thousands of these files.
They will be created in an ADLS Gen2 directory.

Sample Spark file:

---
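import org.apache.spark.sql.functions.col
// Rows for year 2023, system part I (a Scala identifier cannot start with a digit)
val df_2023_I = spark.sql("select rm.* from reu_master rm where rm.year = 2023 and rm.system_part='I'")
// Keep rows whose field_id is "nknk" or "gei"
val criteria1_r1 = df_2023_I.filter(col("field_id") === "nknk" or col("field_id") === "gei")
criteria1_r1.write.mode("overwrite").save(path_to_adls_dir)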

--------

We are exploring the best way to invoke these files from Azure Databricks. We would like to avoid reading each file into a variable through Python and then using that variable in the Spark SQL statement.

shan_chandra · ‎01-24-2024

@amama - You can mount the ADLS storage location in Databricks. Since this is Scala code, you can create a workflow with tasks that execute the scripts, providing the mount location as the input.
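
A minimal sketch of the mount step, assuming a service principal with OAuth access to the storage account. The function name `adlsOauthConfigs`, the mount point, and every angle-bracket placeholder are hypothetical and not from this thread. `dbutils` exists only inside a Databricks notebook, so the mount call itself is shown as a comment.

```scala
// Builds the Hadoop settings for an OAuth (service principal) mount of ADLS Gen2.
// All three arguments are values you supply; none of them come from the thread.
def adlsOauthConfigs(clientId: String, clientSecret: String, tenantId: String): Map[String, String] =
  Map(
    "fs.azure.account.auth.type" -> "OAuth",
    "fs.azure.account.oauth.provider.type" ->
      "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id" -> clientId,
    "fs.azure.account.oauth2.client.secret" -> clientSecret,
    "fs.azure.account.oauth2.client.endpoint" ->
      s"https://login.microsoftonline.com/$tenantId/oauth2/token"
  )

// Inside a Databricks notebook (angle-bracket names are assumed placeholders):
// dbutils.fs.mount(
//   source = "abfss://<container>@<storage-account>.dfs.core.windows.net/",
//   mountPoint = "/mnt/<mount-name>",
//   extraConfigs = adlsOauthConfigs(
//     "<application-id>",
//     dbutils.secrets.get(scope = "<scope>", key = "<key>"),
//     "<directory-id>"))
```

Once mounted, the generated files would be visible under the chosen `/mnt/...` path to every task in the workflow.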

amama · ‎01-29-2024

@shan_chandra - The workflow is implemented in Azure Data Factory. The process (MapReduce) that we are planning to replace with a Databricks notebook will be invoked by ADF.

Essentially, we would like to call all these scripts (Spark equivalents of Pig scripts) from a notebook, and this notebook will be an activity in ADF.

shan_chandra · ‎01-29-2024

@amama - Using the Databricks Notebook activity in ADF, invoke each script as its own notebook by specifying the notebook path, and configure the Databricks linked service in ADF.
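
If a single ADF activity should trigger many script notebooks, one possible shape is a small driver notebook that loops over notebook paths. This is only a sketch: `runAll`, the `notebook_paths` parameter name, and the 3600-second timeout are assumptions, and the runner is passed in as a function so the loop can be tried without a cluster.

```scala
import scala.util.{Failure, Success, Try}

// Runs every notebook path with the supplied runner and separates
// the paths that succeeded from the ones that threw, with their messages.
def runAll(paths: Seq[String], run: String => String): (Seq[String], Seq[(String, String)]) = {
  val results = paths.map(p => p -> Try(run(p)))
  val succeeded = results.collect { case (p, Success(_)) => p }
  val failed = results.collect { case (p, Failure(e)) => p -> e.getMessage }
  (succeeded, failed)
}

// Inside the driver notebook that ADF calls (parameter name and timeout are assumed):
// val paths = dbutils.widgets.get("notebook_paths").split(",").map(_.trim).toSeq
// val (ok, failed) = runAll(paths, p => dbutils.notebook.run(p, 3600, Map.empty[String, String]))
// if (failed.nonEmpty) throw new RuntimeException(s"Failed notebooks: $failed")
```

Throwing at the end makes the notebook run fail, so the ADF activity is marked as failed when any script does not complete.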
