<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>How to run spark sql file through Azure Databricks in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-run-spark-sql-file-through-azure-databricks/m-p/58360#M31110</link>
    <description>&lt;P&gt;We have a process that writes Spark SQL to files; in production it will generate thousands of these files.&lt;BR /&gt;The files are created in an ADLS Gen2 directory.&lt;/P&gt;&lt;P&gt;Sample Spark file:&lt;/P&gt;&lt;P&gt;---&lt;BR /&gt;import org.apache.spark.sql.functions.col&lt;BR /&gt;val df2023I = spark.sql("select rm.* from reu_master rm where rm.year = 2023 and rm.system_part='I'")&lt;BR /&gt;val criteria1_r1 = df2023I.filter(col("field_id") === "nknk" || col("field_id") === "gei")&lt;BR /&gt;criteria1_r1.write.mode("overwrite").save(path_to_adls_dir)&lt;/P&gt;&lt;P&gt;--------&lt;/P&gt;&lt;P&gt;We are exploring the best way to invoke these files from Azure Databricks. We would like to avoid reading each file into a variable through Python and passing that variable to the spark.sql statement.&lt;/P&gt;</description>
    <pubDate>Wed, 24 Jan 2024 19:41:57 GMT</pubDate>
    <dc:creator>amama</dc:creator>
    <dc:date>2024-01-24T19:41:57Z</dc:date>
    <item>
      <title>How to run spark sql file through Azure Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-run-spark-sql-file-through-azure-databricks/m-p/58360#M31110</link>
      <description>&lt;P&gt;We have a process that writes Spark SQL to files; in production it will generate thousands of these files.&lt;BR /&gt;The files are created in an ADLS Gen2 directory.&lt;/P&gt;&lt;P&gt;Sample Spark file:&lt;/P&gt;&lt;P&gt;---&lt;BR /&gt;import org.apache.spark.sql.functions.col&lt;BR /&gt;val df2023I = spark.sql("select rm.* from reu_master rm where rm.year = 2023 and rm.system_part='I'")&lt;BR /&gt;val criteria1_r1 = df2023I.filter(col("field_id") === "nknk" || col("field_id") === "gei")&lt;BR /&gt;criteria1_r1.write.mode("overwrite").save(path_to_adls_dir)&lt;/P&gt;&lt;P&gt;--------&lt;/P&gt;&lt;P&gt;We are exploring the best way to invoke these files from Azure Databricks. We would like to avoid reading each file into a variable through Python and passing that variable to the spark.sql statement.&lt;/P&gt;</description>
      <pubDate>Wed, 24 Jan 2024 19:41:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-run-spark-sql-file-through-azure-databricks/m-p/58360#M31110</guid>
      <dc:creator>amama</dc:creator>
      <dc:date>2024-01-24T19:41:57Z</dc:date>
    </item>
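    <!--
    The question above asks how to execute generated Spark SQL files from a notebook. As a point of reference, here is a minimal, hypothetical sketch of the read-and-execute baseline the poster would prefer to avoid, assuming the files contain plain SQL statements; the helper name and paths are illustrative, and spark/dbutils exist only inside a Databricks notebook.

    ```python
    def split_sql_statements(text: str) -> list[str]:
        """Split a SQL script on semicolons, ignoring semicolons that
        appear inside single-quoted string literals."""
        statements, current, in_quote = [], [], False
        for ch in text:
            if ch == "'":
                in_quote = not in_quote
                current.append(ch)
            elif ch == ";" and not in_quote:
                stmt = "".join(current).strip()
                if stmt:
                    statements.append(stmt)
                current = []
            else:
                current.append(ch)
        tail = "".join(current).strip()
        if tail:
            statements.append(tail)
        return statements

    # Inside a Databricks notebook one could then run, per file:
    # for stmt in split_sql_statements(dbutils.fs.head("/mnt/adls/scripts/job1.sql")):
    #     spark.sql(stmt)
    ```

    For files that embed Scala DataFrame code, as in the sample above, this baseline is not sufficient, which is why the replies point toward executing the scripts as notebook tasks instead.
    -->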
    <item>
      <title>Re: How to run spark sql file through Azure Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-run-spark-sql-file-through-azure-databricks/m-p/58380#M31119</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/98755"&gt;@amama&lt;/a&gt;&amp;nbsp;- you can mount the ADLS storage location in Databricks. Since these files contain Scala code, you can create a Workflow with tasks that execute the Scala scripts, passing the mount location as the input.&lt;/P&gt;</description>
      <pubDate>Thu, 25 Jan 2024 04:10:15 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-run-spark-sql-file-through-azure-databricks/m-p/58380#M31119</guid>
      <dc:creator>shan_chandra</dc:creator>
      <dc:date>2024-01-25T04:10:15Z</dc:date>
    </item>
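    <!--
    The reply above suggests mounting the ADLS storage location in Databricks. A hedged sketch of what that mount could look like from a notebook, using the standard ABFS OAuth configuration keys; the storage account, container, secret scope, and placeholder values are illustrative and not from the thread.

    ```python
    # OAuth (service principal) configuration for an ADLS Gen2 mount.
    # In a real notebook, pull the secrets from a secret scope, e.g.:
    #   dbutils.secrets.get(scope="my-scope", key="sp-client-id")
    configs = {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": "<service-principal-client-id>",
        "fs.azure.account.oauth2.client.secret": "<service-principal-secret>",
        "fs.azure.account.oauth2.client.endpoint":
            "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
    }

    # dbutils is only defined inside Databricks, so the mount call itself
    # is shown commented out:
    # dbutils.fs.mount(
    #     source="abfss://scripts@<storage-account>.dfs.core.windows.net/",
    #     mount_point="/mnt/adls",
    #     extra_configs=configs,
    # )
    ```

    Once mounted, the generated script files are visible to every cluster under /mnt/adls.
    -->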
    <item>
      <title>Re: How to run spark sql file through Azure Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-run-spark-sql-file-through-azure-databricks/m-p/58624#M31217</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/616"&gt;@shan_chandra&lt;/a&gt;&amp;nbsp;- The workflow is implemented in Azure Data Factory; the MapReduce process we plan to replace with a Databricks notebook will be invoked by ADF.&lt;/P&gt;&lt;P&gt;Essentially, we would like to call all of these scripts (Pig-equivalent Spark scripts) from a notebook, and that notebook will be an activity in ADF.&lt;/P&gt;</description>
      <pubDate>Mon, 29 Jan 2024 18:43:40 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-run-spark-sql-file-through-azure-databricks/m-p/58624#M31217</guid>
      <dc:creator>amama</dc:creator>
      <dc:date>2024-01-29T18:43:40Z</dc:date>
    </item>
    <item>
      <title>Re: How to run spark sql file through Azure Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-run-spark-sql-file-through-azure-databricks/m-p/58638#M31222</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/98755"&gt;@amama&lt;/a&gt;&amp;nbsp;- using the Databricks Notebook activity in ADF, invoke each script as an individual notebook by specifying its notebook path, and configure the Databricks linked service in ADF.&lt;/P&gt;</description>
      <pubDate>Mon, 29 Jan 2024 23:01:38 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-run-spark-sql-file-through-azure-databricks/m-p/58638#M31222</guid>
      <dc:creator>shan_chandra</dc:creator>
      <dc:date>2024-01-29T23:01:38Z</dc:date>
    </item>
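    <!--
    The reply above recommends invoking each script as its own notebook run. For comparison, a hedged sketch of a one-off run submitted directly through the Databricks Jobs API 2.1 runs/submit endpoint; the notebook path, cluster id, and "script_path" parameter name are illustrative assumptions, not from the thread.

    ```python
    # Hypothetical helper that builds a Jobs API 2.1 runs/submit payload to
    # run a driver notebook once per generated script. The driver notebook
    # would read its "script_path" parameter and execute that file.
    def build_run_payload(notebook_path: str, script_path: str, cluster_id: str) -> dict:
        return {
            "run_name": f"run {script_path}",
            "tasks": [
                {
                    "task_key": "run_script",
                    "existing_cluster_id": cluster_id,
                    "notebook_task": {
                        "notebook_path": notebook_path,
                        "base_parameters": {"script_path": script_path},
                    },
                }
            ],
        }

    # The payload would then be POSTed to
    # https://<workspace-url>/api/2.1/jobs/runs/submit with a bearer token.
    ```

    An ADF Notebook activity builds an equivalent request for you once the Databricks linked service is configured; the payload is shown here only to make the per-script invocation explicit.
    -->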
  </channel>
</rss>

