<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: InconsistentReadException: The file might have been updated during query - CSV backed table in Get Started Discussions</title>
    <link>https://community.databricks.com/t5/get-started-discussions/inconsistentreadexception-the-file-might-have-been-updated/m-p/54095#M7396</link>
    <description>&lt;P&gt;One approach I'm testing (positive results so far, but still early).&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;%sql
-- Prep and cleanup
REFRESH TABLE masterdata.lookup_host;
DROP TABLE IF EXISTS t_hosts;

-- Forcibly cache the needed columns before using the data in another query.
CACHE TABLE t_hosts
OPTIONS ('storageLevel' 'DISK_ONLY')
SELECT identifier, category, domain
FROM masterdata.lookup_host lh
WHERE lh.category = 'workstation';

SELECT * FROM t_hosts;&lt;/LI-CODE&gt;</description>
    <pubDate>Tue, 28 Nov 2023 12:16:05 GMT</pubDate>
    <dc:creator>hukel</dc:creator>
    <dc:date>2023-11-28T12:16:05Z</dc:date>
    <item>
      <title>InconsistentReadException: The file might have been updated during query - CSV backed table</title>
      <link>https://community.databricks.com/t5/get-started-discussions/inconsistentreadexception-the-file-might-have-been-updated/m-p/54024#M7395</link>
      <description>&lt;P&gt;I have some CSV files that I upload to DBFS storage several times a day.&amp;nbsp;&amp;nbsp; From these CSVs,&amp;nbsp; I have created SQL tables:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;CREATE TABLE IF NOT EXISTS masterdata.lookup_host
USING CSV
OPTIONS (header "true", inferSchema "true")
LOCATION '/mnt/masterdata/assets-lookup.csv';&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This works well for short queries but occasionally a long-running query will be disrupted when the underlying CSV is updated.&amp;nbsp;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;23/11/27 21:09:57 ERROR Executor: Exception in task 2.0 in stage 339.0 (TID 1167)
com.databricks.sql.io.FileReadException: Error while reading file dbfs:/mnt/masterdata/assets-lookup.csv.
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anon$2.logFileNameAndThrow(FileScanRDD.scala:695)
.....
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: com.databricks.common.filesystem.InconsistentReadException: The file might have been updated during query execution. Ensure that no pipeline updates existing files during query execution and try again.&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This may be the result of this change:&lt;/P&gt;&lt;P class="lia-indent-padding-left-30px"&gt;&lt;A href="https://docs.databricks.com/en/release-notes/runtime/13.3lts.html#databricks-runtime-returns-an-error-if-a-file-is-modified-between-query-planning-and-invocation" target="_blank" rel="noopener"&gt;https://docs.databricks.com/en/release-notes/runtime/13.3lts.html#databricks-runtime-returns-an-error-if-a-file-is-modified-between-query-planning-and-invocation&lt;/A&gt;&lt;/P&gt;&lt;H3&gt;&lt;A href="https://docs.databricks.com/en/release-notes/runtime/13.3lts.html#id7" target="_blank" rel="noopener"&gt;Databricks Runtime returns an error if a file is modified between query planning and invocation&lt;/A&gt;&lt;/H3&gt;&lt;P class="lia-indent-padding-left-30px"&gt;Databricks Runtime queries now return an error if a file is updated between query planning and invocation. Before this change, Databricks Runtime would read a file between these stages, which occasionally led to unpredictable results.&lt;/P&gt;&lt;P&gt;Is there a way to accept/ignore this file change and continue query execution?&lt;BR /&gt;&lt;BR /&gt;Is there a better way to keep simple master/lookup data available for SQL joins and subqueries?&lt;/P&gt;</description>
      <pubDate>Mon, 27 Nov 2023 23:01:34 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/inconsistentreadexception-the-file-might-have-been-updated/m-p/54024#M7395</guid>
      <dc:creator>hukel</dc:creator>
      <dc:date>2023-11-27T23:01:34Z</dc:date>
    </item>
    <item>
      <title>Re: InconsistentReadException: The file might have been updated during query - CSV backed table</title>
      <link>https://community.databricks.com/t5/get-started-discussions/inconsistentreadexception-the-file-might-have-been-updated/m-p/54095#M7396</link>
      <description>&lt;P&gt;One approach I'm testing (positive results so far, but still early).&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;%sql
-- Prep and cleanup
REFRESH TABLE masterdata.lookup_host;
DROP TABLE IF EXISTS t_hosts;

-- Forcibly cache the needed columns before using the data in another query.
CACHE TABLE t_hosts
OPTIONS ('storageLevel' 'DISK_ONLY')
SELECT identifier, category, domain
FROM masterdata.lookup_host lh
WHERE lh.category = 'workstation';

SELECT * FROM t_hosts;&lt;/LI-CODE&gt;</description>
      <pubDate>Tue, 28 Nov 2023 12:16:05 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/inconsistentreadexception-the-file-might-have-been-updated/m-p/54095#M7396</guid>
      <dc:creator>hukel</dc:creator>
      <dc:date>2023-11-28T12:16:05Z</dc:date>
    </item>
    <item>
      <title>Re: InconsistentReadException: The file might have been updated during query - CSV backed table</title>
      <link>https://community.databricks.com/t5/get-started-discussions/inconsistentreadexception-the-file-might-have-been-updated/m-p/72008#M7397</link>
      <description>&lt;P&gt;Hello, I want to reopen this issue: I am facing the same error in our production environment, I have not been able to solve it, and I would like to ask for help.&lt;/P&gt;&lt;P&gt;Here is the error message I received:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;Error while reading file dbfs:/mnt/dynamics/model.json.
Caused by: FileReadException: Error while reading file dbfs:/mnt/dynamics/model.json.
Caused by: InconsistentReadException: The file might have been updated during query execution. Ensure that no pipeline updates existing files during query execution and try again.
Caused by: IOException: Operation failed: "The condition specified using HTTP conditional header(s) is not met.", 412, GET, https://dynamics.dfs.core.windows.net/dataverse/model.json?timeout=90, ConditionNotMet, "The condition specified using HTTP conditional header(s) is not met. RequestId:78fa238b-501f-............ Time:2024-06-06T21:05:33.5168118Z"
Caused by: AbfsRestOperationException: Operation failed: "The condition specified using HTTP conditional header(s) is not met.", 412, GET, https://dynamics.dfs.core.windows.net/dataverse/model.json?timeout=90, ConditionNotMet, "The condition specified using HTTP conditional header(s) is not met. RequestId:78fa238b-............ Time:2024-06-06T21:05:33.5168118Z"&lt;/LI-CODE&gt;&lt;P&gt;So apparently model.json changes while the table is being processed - it is changed in real time by the source, so this is expected.&lt;/P&gt;&lt;P&gt;What I want is to simply ignore the change and not throw an error - just take model.json as it is at the time of processing.&lt;/P&gt;&lt;P&gt;I tried to cache the dataframe as suggested above, but it doesn't work and still throws the same error.&lt;/P&gt;&lt;P&gt;This is what I added:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;import org.apache.spark.storage.StorageLevel

val abcd = spark.read.json("somePath/model.json")
abcd.persist(StorageLevel.DISK_ONLY)&lt;/LI-CODE&gt;&lt;P&gt;Please suggest how to deal with this error.&lt;/P&gt;&lt;P&gt;Thank you.&lt;/P&gt;</description>
      <pubDate>Fri, 07 Jun 2024 07:40:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/inconsistentreadexception-the-file-might-have-been-updated/m-p/72008#M7397</guid>
      <dc:creator>Retko</dc:creator>
      <dc:date>2024-06-07T07:40:02Z</dc:date>
    </item>
  </channel>
</rss>