We have a process that writes Spark SQL to files; in production it will generate thousands of these files, all written to an ADLS Gen2 directory.
Sample Spark file (Scala):
---
import org.apache.spark.sql.functions.col

// Note: a Scala identifier cannot start with a digit, so the original `2023_I` is renamed here
val df2023I = spark.sql("select rm.* from reu_master rm where rm.year = 2023 and rm.system_part = 'I'")
val criteria1R1 = df2023I.filter(col("field_id") === "nknk" || col("field_id") === "gei")
criteria1R1.write.mode("overwrite").save(path_to_adls_dir)
--------
We are exploring the best way to invoke these files from Azure Databricks. We would like to avoid the pattern of reading each file's contents into a Python variable and then passing that variable to a spark.sql call.
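For clarity, this is a minimal sketch of the pattern we want to avoid (the function name and file path are hypothetical; on Databricks the file would live under an ADLS mount or abfss:// path):

```python
from pathlib import Path

def load_sql(path: str) -> str:
    """Read the contents of a SQL file into a Python variable.

    This is the avoided pattern: on Databricks this would be followed by
    df = spark.sql(load_sql(path)), with `path` pointing at a mounted
    ADLS Gen2 location such as /dbfs/mnt/... (hypothetical).
    """
    return Path(path).read_text()
```

With thousands of generated files, we would rather have Databricks execute each file directly than route every statement through string variables like this.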