How to run Spark SQL files through Azure Databricks
01-24-2024 11:41 AM
We have a process that writes Spark SQL to files; in the production environment it will generate thousands of these files.
The files will be created in an ADLS Gen2 directory.
Sample Spark file:
---
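import org.apache.spark.sql.functions.col
// A Scala identifier cannot start with a digit, so 2023_I is renamed to df2023I
val df2023I = spark.sql("select rm.* from reu_master rm where rm.year = 2023 and rm.system_part = 'I'")
// Compare the column, not the string literal "field_id", against each value
val criteria1_r1 = df2023I.filter(col("field_id") === "nknk" || col("field_id") === "gei")
criteria1_r1.write.mode("overwrite").save(path_to_adls_dir)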
--------
We are exploring the best way to invoke these files from Azure Databricks. We would like to avoid reading each file into a Python variable and then passing that variable to a spark.sql statement.
01-24-2024 08:10 PM
@amama - You can mount the ADLS storage location in Databricks. Since this is Scala code, you can use a workflow and create tasks that execute the Scala code, providing the mount location as the input.
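As a rough sketch of the mount step, assuming a service principal with OAuth access to the storage account: the helper below only builds the Hadoop configuration map, and the commented call shows where it would be passed to `dbutils.fs.mount` inside a notebook. The function name `oauthMountConfigs` and every angle-bracket placeholder are hypothetical and would need to be replaced with values from your own tenant and secret scope.

```scala
// Builds the OAuth settings that an ABFS (ADLS Gen2) mount expects.
// All three arguments are supplied by the caller; none come from the original post.
def oauthMountConfigs(clientId: String, clientSecret: String, tenantId: String): Map[String, String] =
  Map(
    "fs.azure.account.auth.type" -> "OAuth",
    "fs.azure.account.oauth.provider.type" ->
      "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id" -> clientId,
    "fs.azure.account.oauth2.client.secret" -> clientSecret,
    "fs.azure.account.oauth2.client.endpoint" ->
      s"https://login.microsoftonline.com/$tenantId/oauth2/token"
  )

// Inside a Databricks notebook, where `dbutils` is predefined, the map would be used like this:
// dbutils.fs.mount(
//   source = "abfss://<container>@<storage-account>.dfs.core.windows.net/",
//   mountPoint = "/mnt/<mount-name>",
//   extraConfigs = oauthMountConfigs(
//     "<application-id>", dbutils.secrets.get("<scope>", "<key>"), "<directory-id>"))
```

Reading the client secret from a secret scope, as in the commented call, keeps it out of the notebook source.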
01-29-2024 10:43 AM
@shan_chandra - The workflow is implemented in Azure Data Factory. The process (MapReduce) that we plan to replace with a Databricks notebook will be invoked by ADF.
Essentially, we would like to call all these scripts (the Spark equivalents of our Pig scripts) through a notebook, and this notebook will be an activity in ADF.
01-29-2024 03:01 PM
@amama - Using the Databricks Notebook activity in ADF, invoke each script as an individual notebook by specifying its notebook path, and configure the Databricks linked service in ADF.
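If a single ADF Notebook activity should fan out to many script notebooks, one possible shape is a small driver notebook. This is only a sketch: it assumes each generated script has already been imported as its own notebook, and `runAll`, the example paths, and the `output_dir` parameter are hypothetical names. The runner is passed in as a function so the loop itself does not depend on Databricks; in a notebook it would typically wrap `dbutils.notebook.run`.

```scala
import scala.util.{Failure, Success, Try}

// Runs each notebook path through `run` and records the outcome per path,
// so one failing script does not prevent the remaining ones from running.
def runAll(paths: Seq[String], run: String => String): Seq[(String, Either[String, String])] =
  paths.map { path =>
    val outcome: Either[String, String] = Try(run(path)) match {
      case Success(result) => Right(result)                          // value returned by the notebook
      case Failure(error)  => Left(String.valueOf(error.getMessage)) // failure message only
    }
    path -> outcome
  }

// Inside a Databricks notebook (where `dbutils` is predefined), usage might look like:
// val outcomes = runAll(
//   Seq("/Shared/scripts/script_0001", "/Shared/scripts/script_0002"), // hypothetical paths
//   path => dbutils.notebook.run(path, 3600,
//     Map("output_dir" -> dbutils.widgets.get("output_dir"))))        // ADF base parameter read as a widget
// val failed = outcomes.collect { case (path, Left(msg)) => s"$path: $msg" }
// if (failed.nonEmpty) throw new RuntimeException(failed.mkString("\n")) // makes the ADF activity fail
```

Collecting failures and throwing once at the end means ADF sees a failed activity while every script still gets a chance to run. With thousands of scripts, a sequential loop like this may be slow, so batching or parallel activities in ADF may be worth considering.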