<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Use SQL Command LIST Volume for Alerts in Warehousing &amp; Analytics</title>
    <link>https://community.databricks.com/t5/warehousing-analytics/use-sql-command-list-volume-for-alerts/m-p/55125#M1098</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/96093"&gt;@gabsylvain&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;thanks for the reply! That could work for us.&amp;nbsp;&lt;/P&gt;&lt;P&gt;You don't even need to use dbutils, SQL in a Notebook works just fine:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;df = spark.sql(f"LIST '/Volumes/my-catalog/my-schema/my-volume'")
df.write.mode("overwrite").saveAsTable("my_catalog.my_schema.my_volume_metadata_table")&lt;/LI-CODE&gt;&lt;P&gt;We are using SparkConnect and VSCode for development, so using dbutils is always a bit painful in local development. And the Databricks WorkspaceClient() Class does not seem to support listing volumes right now.&lt;/P&gt;&lt;P&gt;Kind regards&lt;/P&gt;&lt;P&gt;Robin&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Tue, 12 Dec 2023 07:56:43 GMT</pubDate>
    <dc:creator>RobinK</dc:creator>
    <dc:date>2023-12-12T07:56:43Z</dc:date>
    <item>
      <title>Use SQL Command LIST Volume for Alerts</title>
      <link>https://community.databricks.com/t5/warehousing-analytics/use-sql-command-list-volume-for-alerts/m-p/54896#M1096</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;we have implemented a Databricks Workflow that saves an Excel Sheet to a Databricks Volume. Now we want to notify users with an Alert, when new data arrives in the volume.&lt;/P&gt;&lt;P&gt;In the docs I found the SQL command LIST which returns the columns path, name, size and modification_time for all files located in a volume.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;LIST '/Volumes/my-catalog/my-schema/my-volume'&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;But I cannot find a way to integrate the LIST command into a "normal" SQL Statement, which is required for Databricks Alerts. My approach (which returns an&amp;nbsp;&lt;SPAN&gt;[PARSE_SYNTAX_ERROR])&lt;/SPAN&gt;:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;WITH v AS (
  SELECT *
  FROM LIST '/Volumes/my-catalog/my-schema/my-volume'
)
SELECT *
FROM v&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 08 Dec 2023 07:15:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/warehousing-analytics/use-sql-command-list-volume-for-alerts/m-p/54896#M1096</guid>
      <dc:creator>RobinK</dc:creator>
      <dc:date>2023-12-08T07:15:45Z</dc:date>
    </item>
    <item>
      <title>Re: Use SQL Command LIST Volume for Alerts</title>
      <link>https://community.databricks.com/t5/warehousing-analytics/use-sql-command-list-volume-for-alerts/m-p/55073#M1097</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/66027"&gt;@RobinK&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P&gt;I've tested your code and I was able to reproduce the error. Unfortunately, I haven't found a pure SQL alternative to selecting the results of the LIST command as part of a subquery or CTE, and create an alert based on that.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Fortunately, the content of a Volume (with the info you need, i.e. path, modification time, etc.) can be obtained via other means.&lt;/P&gt;
&lt;P&gt;My assumption here is that not every Workflow run will ingest a new Excel sheet in the Volume? And that is why you need additional logic to know when to send the alert? If that is the case, then my proposition to you is the following:&lt;/P&gt;
&lt;P&gt;First, you can use Databricks Utils to list the content of the volume in a Python notebook:&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;dbutils.fs.ls("/Volumes/&amp;lt;catalog&amp;gt;/&amp;lt;schema&amp;gt;/&amp;lt;volume&amp;gt;/")&lt;/LI-CODE&gt;
&lt;P&gt;Then, you can iterate through the files and get each element's information into a JSON object:&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;data = [{"path": x.path, "name": x.name, "size": x.size, "modificationTime": x.modificationTime} for x in files_in_volume]&lt;/LI-CODE&gt;
&lt;P&gt;And finally, you can create a dataframe from this JSON object and save it as a table:&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;df = spark.createDataFrame(data)
df.write.mode("overwrite").saveAsTable("&amp;lt;catalog_name&amp;gt;.&amp;lt;schema_name&amp;gt;.&amp;lt;table_name&amp;gt;")&lt;/LI-CODE&gt;
&lt;P&gt;&lt;span class="lia-unicode-emoji" title=":counterclockwise_arrows_button:"&gt;🔄&lt;/span&gt; This could be ran as an additional step to your Databricks Workflow, so that every time the workflow runs, the data table containing the volume content is updated as well.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-unicode-emoji" title=":white_heavy_check_mark:"&gt;✅&lt;/span&gt;&amp;nbsp;Once the results are loaded in a table, you can &lt;STRONG&gt;run SELECT queries&lt;/STRONG&gt; on it and therefore you can &lt;STRONG&gt;set alerts&lt;/STRONG&gt; as well.&lt;/P&gt;
&lt;P&gt;If, on the contrary, a new Excel sheet is loaded in the Volume everytime the Workflow runs, then you could simply &lt;A href="https://docs.databricks.com/en/workflows/jobs/job-notifications.html#add-email-and-system-notifications-for-job-events" target="_self"&gt;set an alert on success directly within your workflow&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;Hope that can be useful to you.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks,&lt;/P&gt;
&lt;P&gt;Gab&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 11 Dec 2023 18:02:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/warehousing-analytics/use-sql-command-list-volume-for-alerts/m-p/55073#M1097</guid>
      <dc:creator>gabsylvain</dc:creator>
      <dc:date>2023-12-11T18:02:20Z</dc:date>
    </item>
    <item>
      <title>Re: Use SQL Command LIST Volume for Alerts</title>
      <link>https://community.databricks.com/t5/warehousing-analytics/use-sql-command-list-volume-for-alerts/m-p/55125#M1098</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/96093"&gt;@gabsylvain&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;thanks for the reply! That could work for us.&amp;nbsp;&lt;/P&gt;&lt;P&gt;You don't even need to use dbutils, SQL in a Notebook works just fine:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;df = spark.sql(f"LIST '/Volumes/my-catalog/my-schema/my-volume'")
df.write.mode("overwrite").saveAsTable("my_catalog.my_schema.my_volume_metadata_table")&lt;/LI-CODE&gt;&lt;P&gt;We are using SparkConnect and VSCode for development, so using dbutils is always a bit painful in local development. And the Databricks WorkspaceClient() Class does not seem to support listing volumes right now.&lt;/P&gt;&lt;P&gt;Kind regards&lt;/P&gt;&lt;P&gt;Robin&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 12 Dec 2023 07:56:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/warehousing-analytics/use-sql-command-list-volume-for-alerts/m-p/55125#M1098</guid>
      <dc:creator>RobinK</dc:creator>
      <dc:date>2023-12-12T07:56:43Z</dc:date>
    </item>
  </channel>
</rss>

