<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: ModuleNotFoundError when using foreachBatch on runtime 14 with Unity in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/modulenotfounderror-when-using-foreachbatch-on-runtime-14-with/m-p/70711#M34133</link>
    <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/105689"&gt;@mjar&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;Okay, DBR version should not be an issue then.&lt;BR /&gt;Could you share a code snippet here?&lt;/P&gt;</description>
    <pubDate>Mon, 27 May 2024 11:42:19 GMT</pubDate>
    <dc:creator>daniel_sahal</dc:creator>
    <dc:date>2024-05-27T11:42:19Z</dc:date>
    <item>
      <title>ModuleNotFoundError when using foreachBatch on runtime 14 with Unity</title>
      <link>https://community.databricks.com/t5/data-engineering/modulenotfounderror-when-using-foreachbatch-on-runtime-14-with/m-p/70580#M34098</link>
      <description>&lt;P&gt;We recently ran into an issue with foreachBatch after upgrading our Databricks cluster on Azure to runtime version 14 (Spark 3.5) with Shared access mode and Unity Catalog.&lt;BR /&gt;A ModuleNotFoundError is thrown whenever the function passed to foreachBatch uses an object that is not declared within the scope of that function but in another module.&lt;/P&gt;&lt;PRE&gt;&lt;SPAN&gt;SparkConnectGrpcException: (org.apache.spark.api.python.StreamingPythonRunner$StreamingPythonRunnerInitializationException) &lt;BR /&gt;[STREAMING_PYTHON_RUNNER_INITIALIZATION_FAILURE] Streaming Runner initialization failed, returned -2. &lt;BR /&gt;Cause: Traceback (most recent call last): File "/databricks/spark/python/pyspark/serializers.py", line 193, &lt;BR /&gt;in _read_with_length return self.loads(obj) File "/databricks/spark/python/pyspark/serializers.py", line 571, &lt;BR /&gt;in loads return cloudpickle.loads(obj, encoding=encoding) ModuleNotFoundError: No module named 'foreach_batch_test'&lt;/SPAN&gt;&lt;/PRE&gt;&lt;P&gt;After banging my head against the wall for some time, I finally concluded that this could be a bug in Databricks.&lt;BR /&gt;Then, while I was compiling the report today, everything started to work again.&lt;BR /&gt;Can anyone provide some details about what happened?&lt;BR /&gt;Cheers, thanks&lt;/P&gt;</description>
      <pubDate>Fri, 24 May 2024 10:15:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/modulenotfounderror-when-using-foreachbatch-on-runtime-14-with/m-p/70580#M34098</guid>
      <dc:creator>mjar</dc:creator>
      <dc:date>2024-05-24T10:15:57Z</dc:date>
    </item>
    <item>
      <title>Re: ModuleNotFoundError when using foreachBatch on runtime 14 with Unity</title>
      <link>https://community.databricks.com/t5/data-engineering/modulenotfounderror-when-using-foreachbatch-on-runtime-14-with/m-p/70678#M34120</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/105689"&gt;@mjar&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;Which DBR version are you using, exactly?&lt;BR /&gt;To use foreachBatch on shared clusters you need at least 14.2.&lt;/P&gt;</description>
      <pubDate>Mon, 27 May 2024 07:53:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/modulenotfounderror-when-using-foreachbatch-on-runtime-14-with/m-p/70678#M34120</guid>
      <dc:creator>daniel_sahal</dc:creator>
      <dc:date>2024-05-27T07:53:37Z</dc:date>
    </item>
    <item>
      <title>Re: ModuleNotFoundError when using foreachBatch on runtime 14 with Unity</title>
      <link>https://community.databricks.com/t5/data-engineering/modulenotfounderror-when-using-foreachbatch-on-runtime-14-with/m-p/70682#M34124</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/79106"&gt;@daniel_sahal&lt;/a&gt;,&amp;nbsp;thanks for getting back.&lt;BR /&gt;We are using 14.3, Spark 3.5.0, Scala 2.12&lt;/P&gt;</description>
      <pubDate>Mon, 27 May 2024 08:09:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/modulenotfounderror-when-using-foreachbatch-on-runtime-14-with/m-p/70682#M34124</guid>
      <dc:creator>mjar</dc:creator>
      <dc:date>2024-05-27T08:09:04Z</dc:date>
    </item>
    <item>
      <title>Re: ModuleNotFoundError when using foreachBatch on runtime 14 with Unity</title>
      <link>https://community.databricks.com/t5/data-engineering/modulenotfounderror-when-using-foreachbatch-on-runtime-14-with/m-p/70711#M34133</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/105689"&gt;@mjar&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;Okay, DBR version should not be an issue then.&lt;BR /&gt;Could you share a code snippet here?&lt;/P&gt;</description>
      <pubDate>Mon, 27 May 2024 11:42:19 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/modulenotfounderror-when-using-foreachbatch-on-runtime-14-with/m-p/70711#M34133</guid>
      <dc:creator>daniel_sahal</dc:creator>
      <dc:date>2024-05-27T11:42:19Z</dc:date>
    </item>
    <item>
      <title>Re: ModuleNotFoundError when using foreachBatch on runtime 14 with Unity</title>
      <link>https://community.databricks.com/t5/data-engineering/modulenotfounderror-when-using-foreachbatch-on-runtime-14-with/m-p/70735#M34137</link>
      <description>&lt;P&gt;Below you can find the minimal code to reproduce the scenario that used to cause the error.&lt;BR /&gt;Keep in mind that it suddenly started to work as expected, while it used to fail before I posted this topic.&lt;BR /&gt;In any case, a few words on what we are doing.&lt;BR /&gt;We need a streaming query to be processed by the function provided to foreachBatch, and this function should be configurable (i.e. we need to pass it an object with some configuration arguments).&lt;/P&gt;&lt;P&gt;In the example below we simulate this with a higher-order function that takes an instance of SomeConfiguration.&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from pyspark.sql import SparkSession, DataFrame
from pyspark.sql.functions import col

class SomeConfiguration:
    def __init__(self, name: str):
        self.name = name

def process_batch(config: SomeConfiguration):
    # Higher-order function: the returned callback closes over `config`,
    # so the SomeConfiguration instance is serialized together with it.
    def say_hello_foreach_microbatch(micro_batch_df: DataFrame, micro_batch_id):
        print(f"Hello {config.name}!")
        print(
            f"The batch {micro_batch_id} has {micro_batch_df.count()} items.")

    return say_hello_foreach_microbatch


def main():
    spark = SparkSession.builder.getOrCreate()

    data_stream = (
        spark.readStream.format("delta")
        .option("readChangeFeed", "true")
        .option("ignoreChanges", "true")
        .table("SOME_DELTA_TABLE")
        .filter(col("status") == "Staged")
        .filter(col("_change_type") == "insert")
    )

    data_stream.writeStream \
        .option(
            "checkpointLocation",
            "SOME_CHECK_POINT_LOCATION",
        ) \
        .foreachBatch(process_batch(SomeConfiguration("Johnny"))) \
        .outputMode("append") \
        .trigger(availableNow=True) \
        .start()\
        .awaitTermination()


if __name__ == '__main__':
    main()&lt;/LI-CODE&gt;&lt;P&gt;The above code used to fail on the line that actually references the instance of SomeConfiguration, i.e. print(f"Hello {&lt;STRONG&gt;config.name&lt;/STRONG&gt;}!") inside the say_hello_foreach_microbatch function.&lt;/P&gt;&lt;P&gt;The same code started to work fine all of a sudden, even though there were no obvious changes to the cluster and definitely no changes to our code.&lt;/P&gt;&lt;P&gt;I was just curious whether anyone knew anything.&lt;/P&gt;&lt;P&gt;In this case it went from bad to better, but I am a bit concerned that a cluster's behaviour could also change from good to bad, outside our control and without any official release.&lt;/P&gt;</description>
      <pubDate>Mon, 27 May 2024 13:17:32 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/modulenotfounderror-when-using-foreachbatch-on-runtime-14-with/m-p/70735#M34137</guid>
      <dc:creator>mjar</dc:creator>
      <dc:date>2024-05-27T13:17:32Z</dc:date>
    </item>
    <item>
      <title>Re: ModuleNotFoundError when using foreachBatch on runtime 14 with Unity</title>
      <link>https://community.databricks.com/t5/data-engineering/modulenotfounderror-when-using-foreachbatch-on-runtime-14-with/m-p/75585#M34993</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/105689"&gt;@mjar&lt;/a&gt;&amp;nbsp;I have exactly the same issue... found any solution meanwhile?&lt;/P&gt;</description>
      <pubDate>Mon, 24 Jun 2024 12:03:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/modulenotfounderror-when-using-foreachbatch-on-runtime-14-with/m-p/75585#M34993</guid>
      <dc:creator>Nastia</dc:creator>
      <dc:date>2024-06-24T12:03:06Z</dc:date>
    </item>
    <item>
      <title>Re: ModuleNotFoundError when using foreachBatch on runtime 14 with Unity</title>
      <link>https://community.databricks.com/t5/data-engineering/modulenotfounderror-when-using-foreachbatch-on-runtime-14-with/m-p/75674#M35022</link>
      <description>&lt;P&gt;Hi,&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/103694"&gt;@Nastia&lt;/a&gt;&amp;nbsp;unfortunately I don't have any answers yet.&lt;/P&gt;&lt;P&gt;I do have a channel open with Databricks, but no news yet.&lt;BR /&gt;On the plus side (well, for us), the workflows have kept working as expected since the magic fix occurred in our environments.&lt;/P&gt;</description>
      <pubDate>Tue, 25 Jun 2024 08:35:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/modulenotfounderror-when-using-foreachbatch-on-runtime-14-with/m-p/75674#M35022</guid>
      <dc:creator>mjar</dc:creator>
      <dc:date>2024-06-25T08:35:59Z</dc:date>
    </item>
    <item>
      <title>Re: ModuleNotFoundError when using foreachBatch on runtime 14 with Unity</title>
      <link>https://community.databricks.com/t5/data-engineering/modulenotfounderror-when-using-foreachbatch-on-runtime-14-with/m-p/87397#M37432</link>
      <description>&lt;P&gt;I am facing this issue with Scala Spark streaming on a shared cluster with the 15.4 LTS runtime. Is there any fix or alternative for this? I can't use an assigned cluster because my table has masked columns, and my company hasn't enabled serverless in our workspaces yet.&lt;/P&gt;</description>
      <pubDate>Mon, 02 Sep 2024 19:35:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/modulenotfounderror-when-using-foreachbatch-on-runtime-14-with/m-p/87397#M37432</guid>
      <dc:creator>ananddanny</dc:creator>
      <dc:date>2024-09-02T19:35:56Z</dc:date>
    </item>
    <item>
      <title>Re: ModuleNotFoundError when using foreachBatch on runtime 14 with Unity</title>
      <link>https://community.databricks.com/t5/data-engineering/modulenotfounderror-when-using-foreachbatch-on-runtime-14-with/m-p/111771#M43995</link>
      <description>&lt;P&gt;Hi,&amp;nbsp;&lt;/P&gt;&lt;P&gt;Any news regarding this issue? I have the same one on a job cluster with 15.4 LTS when using asset bundles, with foreachBatch in a .py file that is called from a notebook. When the same code is located in the notebook, it works fine.&lt;/P&gt;&lt;DIV&gt;&lt;PRE&gt;(prep_silver_df(bronze_table_fqn_df)&lt;BR /&gt;    .writeStream&lt;BR /&gt;    .trigger(&lt;SPAN&gt;availableNow&lt;/SPAN&gt;=&lt;SPAN&gt;True&lt;/SPAN&gt;)&lt;BR /&gt;    .foreachBatch(&lt;SPAN&gt;lambda &lt;/SPAN&gt;df, batchId: upsertToDelta(df, batchId, silver_table_fqn))&lt;BR /&gt;    .option(&lt;SPAN&gt;"checkpointLocation"&lt;/SPAN&gt;, silver_checkpoint_path)&lt;BR /&gt;    .outputMode(&lt;SPAN&gt;"update"&lt;/SPAN&gt;)&lt;BR /&gt;    .start()&lt;BR /&gt;    .awaitTermination()&lt;BR /&gt;    )&lt;/PRE&gt;&lt;/DIV&gt;</description>
      <pubDate>Wed, 05 Mar 2025 00:09:33 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/modulenotfounderror-when-using-foreachbatch-on-runtime-14-with/m-p/111771#M43995</guid>
      <dc:creator>Abond</dc:creator>
      <dc:date>2025-03-05T00:09:33Z</dc:date>
    </item>
    <item>
      <title>Re: ModuleNotFoundError when using foreachBatch on runtime 14 with Unity</title>
      <link>https://community.databricks.com/t5/data-engineering/modulenotfounderror-when-using-foreachbatch-on-runtime-14-with/m-p/112127#M44118</link>
      <description>&lt;P&gt;No news here, although everything works fine on our clusters.&lt;/P&gt;</description>
      <pubDate>Mon, 10 Mar 2025 06:54:40 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/modulenotfounderror-when-using-foreachbatch-on-runtime-14-with/m-p/112127#M44118</guid>
      <dc:creator>mjar</dc:creator>
      <dc:date>2025-03-10T06:54:40Z</dc:date>
    </item>
    <item>
      <title>Re: ModuleNotFoundError when using foreachBatch on runtime 14 with Unity</title>
      <link>https://community.databricks.com/t5/data-engineering/modulenotfounderror-when-using-foreachbatch-on-runtime-14-with/m-p/119123#M45801</link>
      <description>&lt;P&gt;I am having the same issue using serverless compute. I think the issue comes from the limitations described in this documentation:&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/aws/en/structured-streaming/foreach#behavior-changes-for-foreachbatch-in-databricks-runtime-140" target="_blank" rel="noopener"&gt;https://docs.databricks.com/aws/en/structured-streaming/foreach#behavior-changes-for-foreachbatch-in-databricks-runtime-140&lt;/A&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 14 May 2025 06:43:41 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/modulenotfounderror-when-using-foreachbatch-on-runtime-14-with/m-p/119123#M45801</guid>
      <dc:creator>dataeng42io</dc:creator>
      <dc:date>2025-05-14T06:43:41Z</dc:date>
    </item>
  </channel>
</rss>

