<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Autoloader cleansource option does not take any effect in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/autoloader-cleansource-option-does-not-take-any-effect/m-p/136479#M50575</link>
    <description>&lt;P&gt;Any Solution found??&lt;/P&gt;</description>
    <pubDate>Tue, 28 Oct 2025 22:43:55 GMT</pubDate>
    <dc:creator>SanthoshU</dc:creator>
    <dc:date>2025-10-28T22:43:55Z</dc:date>
    <item>
      <title>Autoloader cleansource option does not take any effect</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-cleansource-option-does-not-take-any-effect/m-p/123436#M47008</link>
      <description>&lt;P&gt;Hello everyone,&lt;/P&gt;&lt;P&gt;I was very keen to try out Autoloader's new cleanSource option so that we can clean up our landing folder easily.&lt;/P&gt;&lt;P&gt;However, I found that it has no effect whatsoever. Since I cannot create a support case, I am creating this post.&lt;/P&gt;&lt;P&gt;A simple streaming job such as&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;df = (
    spark
    .readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "s3://my_bucket/data_schema")
    .option("cloudFiles.schemaEvolutionMode", "addNewColumns")
    .option("cloudFiles.inferColumnTypes", "true")
    .option("cloudFiles.cleanSource", "move")
    .option("cloudFiles.cleanSource.retentionDuration", "2 minutes")
    .option("cloudFiles.cleanSource.moveDestination", "s3://my_bucket/data_moved")
    .load("s3://my_bucket/data")
)


(
    df
    .writeStream
    .format("delta")
    .option("checkpointLocation", "s3://my_bucket/data_checkpoint")
    .option("mergeSchema", "true")
    .queryName("abcd")
    .outputMode("append")
    .trigger(processingTime="10 seconds")
    .table("dev.bronze.tmp_cloud_files_testing")
)&lt;/LI-CODE&gt;&lt;P&gt;&lt;BR /&gt;ingests the data put into s3://my_bucket/data without issues, but the data is never moved as specified, even after waiting for hours. I've tested with a few files as well as with thousands of files.&lt;/P&gt;&lt;P&gt;I've tested S3 locations, DBFS locations, and local locations. Autoloader has permission to write to these locations, since I could easily stream into Parquet files there.&lt;/P&gt;&lt;P&gt;What I find especially suspicious is that&lt;/P&gt;&lt;LI-CODE lang="python"&gt;SELECT * FROM cloud_files_state("s3://my_bucket/data_checkpoint")&lt;/LI-CODE&gt;&lt;P&gt;shows that no archive_mode is set for the data.&lt;/P&gt;&lt;P&gt;Setting cloudFiles.cleanSource to DELETE does not do anything either.&lt;/P&gt;&lt;P&gt;All of this was tested on clusters running DBR 16.4 or DBR 17.&lt;/P&gt;&lt;P&gt;Do we have to turn this feature on somewhere, or did I implement something incorrectly? I really do not know what to try next.&lt;/P&gt;&lt;P&gt;Thank you&lt;BR /&gt;Jan&lt;/P&gt;</description>
      <pubDate>Tue, 01 Jul 2025 12:27:11 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-cleansource-option-does-not-take-any-effect/m-p/123436#M47008</guid>
      <dc:creator>janm2</dc:creator>
      <dc:date>2025-07-01T12:27:11Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader cleansource option does not take any effect</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-cleansource-option-does-not-take-any-effect/m-p/123473#M47015</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/173085"&gt;@janm2&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;Could you try changing the value in your code to uppercase "MOVE" (or "DELETE", depending on which you want to use)?&lt;BR /&gt;I know it sounds silly, but I've run into several cases where case sensitivity bit me &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;&lt;P&gt;And since you wrote that&lt;SPAN&gt;&amp;nbsp;there is no archive_mode set for the data, I wonder whether this is the cause &lt;span class="lia-unicode-emoji" title=":grinning_face_with_smiling_eyes:"&gt;😄&lt;/span&gt;&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;.option("cloudFiles.cleanSource", "MOVE")&lt;/LI-CODE&gt;</description>
      <pubDate>Tue, 01 Jul 2025 14:19:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-cleansource-option-does-not-take-any-effect/m-p/123473#M47015</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2025-07-01T14:19:31Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader cleansource option does not take any effect</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-cleansource-option-does-not-take-any-effect/m-p/123488#M47021</link>
      <description>&lt;P&gt;Hello, thank you for your reply.&lt;BR /&gt;&lt;BR /&gt;I had already tried this, and unfortunately it made no difference.&lt;/P&gt;</description>
      <pubDate>Tue, 01 Jul 2025 15:37:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-cleansource-option-does-not-take-any-effect/m-p/123488#M47021</guid>
      <dc:creator>janm2</dc:creator>
      <dc:date>2025-07-01T15:37:30Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader cleansource option does not take any effect</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-cleansource-option-does-not-take-any-effect/m-p/123544#M47029</link>
      <description>&lt;P&gt;This may be a bug. The same issue is reported here:&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.databricks.com/t5/data-engineering/autoloader-move-file-to-archive-immediately-after-processing/td-p/120692" target="_blank"&gt;https://community.databricks.com/t5/data-engineering/autoloader-move-file-to-archive-immediately-after-processing/td-p/120692&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 01 Jul 2025 20:54:21 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-cleansource-option-does-not-take-any-effect/m-p/123544#M47029</guid>
      <dc:creator>Pat</dc:creator>
      <dc:date>2025-07-01T20:54:21Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader cleansource option does not take any effect</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-cleansource-option-does-not-take-any-effect/m-p/136479#M50575</link>
      <description>&lt;P&gt;Has a solution been found?&lt;/P&gt;</description>
      <pubDate>Tue, 28 Oct 2025 22:43:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-cleansource-option-does-not-take-any-effect/m-p/136479#M50575</guid>
      <dc:creator>SanthoshU</dc:creator>
      <dc:date>2025-10-28T22:43:55Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader cleansource option does not take any effect</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-cleansource-option-does-not-take-any-effect/m-p/136480#M50576</link>
      <description>&lt;P&gt;Any solution?&lt;/P&gt;</description>
      <pubDate>Tue, 28 Oct 2025 22:44:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-cleansource-option-does-not-take-any-effect/m-p/136480#M50576</guid>
      <dc:creator>SanthoshU</dc:creator>
      <dc:date>2025-10-28T22:44:44Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader cleansource option does not take any effect</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-cleansource-option-does-not-take-any-effect/m-p/149301#M53065</link>
      <description>&lt;P&gt;I had the same issue, which was caused by colons in the filenames. It failed silently in the application, but the log4j output contained warnings like this:&lt;/P&gt;&lt;P class="lia-indent-padding-left-30px"&gt;26/02/20 07:11:07 WARN CleanSourceFileMover: [queryId = f0e53] Unexpected exception when cleaning: /Volumes/prod/datalake/raw/source/entity/json_files/2026-02-19_07:12:03.924349.json&lt;BR /&gt;java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: 2026-02-19_07:12:03.924349.json&lt;/P&gt;&lt;P&gt;Replacing the : with - (or some other character) in the filenames resolved the problem.&lt;/P&gt;</description>
      <pubDate>Wed, 25 Feb 2026 16:02:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-cleansource-option-does-not-take-any-effect/m-p/149301#M53065</guid>
      <dc:creator>awhorton</dc:creator>
      <dc:date>2026-02-25T16:02:02Z</dc:date>
    </item>
  </channel>
</rss>

