<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>DLT cloudfiles trigger interval not working (Data Engineering)</title>
    <link>https://community.databricks.com/t5/data-engineering/dlt-cloudfiles-trigger-interval-not-working/m-p/39289#M26910</link>
    <description>&lt;P&gt;I have the following streaming table definition using the cloudFiles format with the pipelines.trigger.interval setting to reduce file discovery costs, but the query triggers every 12 seconds instead of every 5 minutes.&lt;/P&gt;&lt;P&gt;Is there another configuration I am missing, or does DLT with cloudFiles not support that setting?&lt;/P&gt;&lt;LI-CODE lang="python"&gt;import dlt
from pyspark.sql.functions import input_file_name

@dlt.table(
    spark_conf={"pipelines.trigger.interval": "5 minutes"},
    table_properties={
        "quality": "bronze",
        "pipelines.reset.allowed": "false"  # preserves the data in the Delta table if you do a full refresh
    }
)
def s3_data():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("s3://my-bucket/")
        .withColumn("filePath", input_file_name())
    )&lt;/LI-CODE&gt;</description>
    <pubDate>Mon, 07 Aug 2023 17:58:00 GMT</pubDate>
    <dc:creator>elifa</dc:creator>
    <dc:date>2023-08-07T17:58:00Z</dc:date>
    <item>
      <title>DLT cloudfiles trigger interval not working</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-cloudfiles-trigger-interval-not-working/m-p/39289#M26910</link>
      <description>&lt;P&gt;I have the following streaming table definition using the cloudFiles format with the pipelines.trigger.interval setting to reduce file discovery costs, but the query triggers every 12 seconds instead of every 5 minutes.&lt;/P&gt;&lt;P&gt;Is there another configuration I am missing, or does DLT with cloudFiles not support that setting?&lt;/P&gt;&lt;LI-CODE lang="python"&gt;import dlt
from pyspark.sql.functions import input_file_name

@dlt.table(
    spark_conf={"pipelines.trigger.interval": "5 minutes"},
    table_properties={
        "quality": "bronze",
        "pipelines.reset.allowed": "false"  # preserves the data in the Delta table if you do a full refresh
    }
)
def s3_data():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("s3://my-bucket/")
        .withColumn("filePath", input_file_name())
    )&lt;/LI-CODE&gt;</description>
      <pubDate>Mon, 07 Aug 2023 17:58:00 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-cloudfiles-trigger-interval-not-working/m-p/39289#M26910</guid>
      <dc:creator>elifa</dc:creator>
      <dc:date>2023-08-07T17:58:00Z</dc:date>
    </item>
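A note on the snippet in the question above: spark_conf and table_properties are arguments to the @dlt.table decorator itself. If they are instead declared as parameters of the table function, the decorator never sees them and the trigger interval is silently ignored. A minimal sketch of that pitfall, using a toy fake_table decorator (a hypothetical stand-in, not the real dlt API):

```python
# fake_table is a toy decorator that mimics @dlt.table's calling
# convention: it captures keyword arguments passed to the decorator,
# but has no way to see defaults declared in the function signature.

def fake_table(func=None, *, spark_conf=None, table_properties=None):
    captured = {"spark_conf": spark_conf, "table_properties": table_properties}

    def register(f):
        # Record the flow's name, the captured config, and the builder.
        return {"name": f.__name__, **captured, "build": f}

    if func is not None:        # used bare: @fake_table
        return register(func)
    return register             # used with args: @fake_table(...)

# Wrong: config declared as a function default -- the decorator never reads it.
@fake_table
def wrong(spark_conf={"pipelines.trigger.interval": "5 minutes"}):
    return "df"

# Right: config passed to the decorator itself.
@fake_table(spark_conf={"pipelines.trigger.interval": "5 minutes"})
def right():
    return "df"

print(wrong["spark_conf"])   # None -- the setting was silently dropped
print(right["spark_conf"])   # {'pipelines.trigger.interval': '5 minutes'}
```

The same distinction applies to the real decorator: settings only take effect when passed as `@dlt.table(spark_conf=...)`.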
    <item>
      <title>Re: DLT cloudfiles trigger interval not working</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-cloudfiles-trigger-interval-not-working/m-p/39314#M26915</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/86044"&gt;@elifa&lt;/a&gt; wrote:&lt;BR /&gt;&lt;P&gt;I have the following streaming table definition using the cloudFiles format with the pipelines.trigger.interval setting to reduce file discovery costs, but the query triggers every 12 seconds instead of every 5 minutes.&lt;/P&gt;&lt;P&gt;Is there another configuration I am missing, or does DLT with cloudFiles not support that setting?&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;HR /&gt;&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;The pipelines.trigger.interval setting controls how often a flow is triggered; with cloudFiles as the streaming source it governs how often the input path is checked for new files. It should be honored when set in the table's spark_conf, but here it does not appear to be applied.&lt;/P&gt;&lt;P&gt;First, double-check the format of the value. You could try the shorthand form, in case the long form is not being parsed in your environment:&lt;/P&gt;&lt;P&gt;spark_conf={"pipelines.trigger.interval": "5m"}&lt;/P&gt;&lt;P&gt;If the issue persists, check the release notes for the DLT channel and runtime you are on, and search the Databricks documentation and forums for known issues with pipelines.trigger.interval being ignored for cloudFiles sources.&lt;/P&gt;&lt;P&gt;If you are still stuck, Databricks support can inspect your pipeline's event log and confirm whether the configured interval is actually being applied in your environment.&lt;/P&gt;</description>
      <pubDate>Tue, 08 Aug 2023 04:39:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-cloudfiles-trigger-interval-not-working/m-p/39314#M26915</guid>
      <dc:creator>Timothydickers</dc:creator>
      <dc:date>2023-08-08T04:39:51Z</dc:date>
    </item>
    <item>
      <title>Re: DLT cloudfiles trigger interval not working</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-cloudfiles-trigger-interval-not-working/m-p/39315#M26916</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/86044"&gt;@elifa&lt;/a&gt;&lt;/P&gt;&lt;P&gt;Could you check for this message in the log file?&lt;/P&gt;&lt;PRE&gt;INFO EnzymePlanner: Planning for flow: s3_data&lt;/PRE&gt;&lt;P&gt;With pipelines.trigger.interval set as configured, this planning should happen once every 5 minutes.&lt;/P&gt;</description>
      <pubDate>Tue, 08 Aug 2023 04:39:52 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-cloudfiles-trigger-interval-not-working/m-p/39315#M26916</guid>
      <dc:creator>Tharun-Kumar</dc:creator>
      <dc:date>2023-08-08T04:39:52Z</dc:date>
    </item>
    <item>
      <title>Re: DLT cloudfiles trigger interval not working</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-cloudfiles-trigger-interval-not-working/m-p/39356#M26932</link>
      <description>&lt;P&gt;The log below shows that it is running every 12 seconds. I use the same configuration on other tables that are not in cloudFiles format, and it works fine on them.&lt;/P&gt;&lt;LI-CODE lang="python"&gt;23/08/08 04:59:00 INFO MicroBatchExecution: Streaming query made progress: {
  "name" : "s3_data",
  "timestamp" : "2023-08-08T04:59:00.005Z",
  "numInputRows" : 0,
  "inputRowsPerSecond" : 0.0,
  "processedRowsPerSecond" : 0.0,
}

23/08/08 04:59:12 INFO MicroBatchExecution: Streaming query made progress: {
  "name" : "s3_data",
  "timestamp" : "2023-08-08T04:59:12.000Z",
  "numInputRows" : 0,
  "inputRowsPerSecond" : 0.0,
  "processedRowsPerSecond" : 0.0,
}

23/08/08 04:59:36 INFO MicroBatchExecution: Streaming query made progress: {
  "name" : "s3_data",
  "timestamp" : "2023-08-08T04:59:36.000Z",
  "numInputRows" : 0,
  "inputRowsPerSecond" : 0.0,
  "processedRowsPerSecond" : 0.0,
}

23/08/08 04:59:48 INFO MicroBatchExecution: Streaming query made progress: {
  "name" : "s3_data",
  "timestamp" : "2023-08-08T04:59:48.002Z",
  "numInputRows" : 0,
  "inputRowsPerSecond" : 0.0,
  "processedRowsPerSecond" : 0.0,
}&lt;/LI-CODE&gt;</description>
      <pubDate>Tue, 08 Aug 2023 12:44:01 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-cloudfiles-trigger-interval-not-working/m-p/39356#M26932</guid>
      <dc:creator>elifa</dc:creator>
      <dc:date>2023-08-08T12:44:01Z</dc:date>
    </item>
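The cadence in the log excerpt above can be confirmed directly from the MicroBatchExecution progress timestamps. A small sketch (trigger_gaps is a hypothetical helper, not part of any Databricks API) that computes the gap between consecutive progress events from the timestamps posted in the thread:

```python
# Compute the gaps, in seconds, between consecutive "Streaming query made
# progress" events to verify the effective trigger cadence.
from datetime import datetime

def trigger_gaps(timestamps):
    """Return gaps in seconds between consecutive ISO-8601 'Z' timestamps."""
    parsed = [datetime.strptime(t, "%Y-%m-%dT%H:%M:%S.%fZ") for t in timestamps]
    return [(b - a).total_seconds() for a, b in zip(parsed, parsed[1:])]

# Timestamps taken from the log excerpt in the last post above.
gaps = trigger_gaps([
    "2023-08-08T04:59:00.005Z",
    "2023-08-08T04:59:12.000Z",
    "2023-08-08T04:59:36.000Z",
    "2023-08-08T04:59:48.002Z",
])
print(gaps)  # [11.995, 24.0, 12.002] -- nowhere near the requested 300 s
```

Every gap is a multiple of roughly 12 seconds, which matches the poster's observation that the 5-minute interval is not being honored.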
  </channel>
</rss>

