<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: autoloader break on migration from community to trial premium with s3 mount in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/autoloader-break-on-migration-from-community-to-trial-premium/m-p/4916#M1491</link>
    <description>&lt;P&gt;Hi @Joe Gorse&lt;/P&gt;&lt;P&gt;Thank you for posting your question in our community! We are happy to assist you.&lt;/P&gt;&lt;P&gt;To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?&lt;/P&gt;&lt;P&gt;This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance!&lt;/P&gt;</description>
    <pubDate>Fri, 19 May 2023 06:34:52 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2023-05-19T06:34:52Z</dc:date>
    <item>
      <title>autoloader break on migration from community to trial premium with s3 mount</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-break-on-migration-from-community-to-trial-premium/m-p/4914#M1489</link>
      <description>&lt;P&gt;In Databricks Community Edition, Auto Loader works against the S3 mount. Here are the mount call and the Auto Loader stream:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;dbutils.fs.mount(f"s3a://{access_key}:{encoded_secret_key}@{aws_bucket_name}", f"/mnt/{mount_name}")&lt;/CODE&gt;&lt;/PRE&gt;&lt;PRE&gt;&lt;CODE&gt;from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col

source_directory = "dbfs:/mnt/s3-mnt/logs/$aws/things/device/data"
destination_directory = "dbfs:/mnt/s3-mnt/data/davis/delta/data"
checkpoint_path =       "dbfs:/mnt/s3-mnt/data/davis/delta/data_checkpoint"

# switched to data_schema2 at s3 timestamp object 1682389110770
# added ac.Timestamp grab
schema = data_schema2

streaming_query = (spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaEvolutionMode", "rescue")
    # .option("source", "s3://joe-open/")
    # .option("cloudFiles.schemaLocation", checkpoint_path)
    .schema(schema)
    .option("rescuedDataColumn", "_rescued_data")
    .load(source_directory)

    .writeStream
    .format("delta")
    .option("path", destination_directory)
    .option("checkpointLocation", checkpoint_path)
    .option("mergeSchema", "true")

    .trigger(availableNow=True)
    .start()
)

streaming_query.awaitTermination()&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;In the Premium trial workspace, it fails with:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;---------------------------------------------------------------------------
StreamingQueryException                   Traceback (most recent call last)
File &amp;lt;command-3092456776679220&amp;gt;:38
     15 schema = data_schema2
     17 streaming_query = (spark.readStream
     18     .format("cloudFiles")
     19     .option("cloudFiles.format", "json")
   (...)
     35     .start()
     36 )
---&amp;gt; 38 streaming_query.awaitTermination()

File /databricks/spark/python/pyspark/sql/streaming/query.py:201, in StreamingQuery.awaitTermination(self, timeout)
    199     return self._jsq.awaitTermination(int(timeout * 1000))
    200 else:
--&amp;gt; 201     return self._jsq.awaitTermination()

File /databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1322, in JavaMember.__call__(self, *args)
   1316 command = proto.CALL_COMMAND_NAME +\
   1317     self.command_header +\
   1318     args_command +\
   1319     proto.END_COMMAND_PART
   1321 answer = self.gateway_client.send_command(command)
-&amp;gt; 1322 return_value = get_return_value(
   1323     answer, self.gateway_client, self.target_id, self.name)
   1325 for temp_arg in temp_args:
   1326     if hasattr(temp_arg, "_detach"):

File /databricks/spark/python/pyspark/errors/exceptions/captured.py:168, in capture_sql_exception.&amp;lt;locals&amp;gt;.deco(*a, **kw)
    164 converted = convert_exception(e.java_exception)
    165 if not isinstance(converted, UnknownException):
    166     # Hide where the exception came from that shows a non-Pythonic
    167     # JVM exception message.
--&amp;gt; 168     raise converted from None
    169 else:
    170     raise

StreamingQueryException: [STREAM_FAILED] Query [id = ba24256e-c098-4c9c-9672-a96898104770, runId = b9037af2-98b8-4669-944f-7559adac1b57] terminated with exception: The bucket in the file event `{"backfill":{"bucket":"dbfsv1-files","key":"mnt/s3-mnt/logs/$aws/things/device/data/1682993996652","size":12304,"eventTime":1682993997000}}` is different from expected by the source: `[s3 bucket name]`.
...
NOTE: [s3 bucket name] is my redaction of the actual S3 bucket name.&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;What does this error mean? How do I resume Auto Loader ingestion after moving from Community Edition to a paid Databricks workspace?&lt;/P&gt;</description>
      <pubDate>Tue, 02 May 2023 13:12:12 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-break-on-migration-from-community-to-trial-premium/m-p/4914#M1489</guid>
      <dc:creator>jhgorse</dc:creator>
      <dc:date>2023-05-02T13:12:12Z</dc:date>
    </item>
    <item>
      <title>Re: autoloader break on migration from community to trial premium with s3 mount</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-break-on-migration-from-community-to-trial-premium/m-p/4915#M1490</link>
      <description>&lt;P&gt;Solved with:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;spark.conf.set("spark.databricks.cloudFiles.checkSourceChanged", False)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Nothing else seemed to work, including the Auto Loader path-rename option.&lt;/P&gt;</description>
      <pubDate>Tue, 02 May 2023 14:49:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-break-on-migration-from-community-to-trial-premium/m-p/4915#M1490</guid>
      <dc:creator>jhgorse</dc:creator>
      <dc:date>2023-05-02T14:49:24Z</dc:date>
    </item>
    <item>
      <title>Re: autoloader break on migration from community to trial premium with s3 mount</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-break-on-migration-from-community-to-trial-premium/m-p/4916#M1491</link>
      <description>&lt;P&gt;Hi @Joe Gorse&lt;/P&gt;&lt;P&gt;Thank you for posting your question in our community! We are happy to assist you.&lt;/P&gt;&lt;P&gt;To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?&lt;/P&gt;&lt;P&gt;This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance!&lt;/P&gt;</description>
      <pubDate>Fri, 19 May 2023 06:34:52 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-break-on-migration-from-community-to-trial-premium/m-p/4916#M1491</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2023-05-19T06:34:52Z</dc:date>
    </item>
  </channel>
</rss>

