<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Unit Testing with the new Databricks Connect in Python in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/unit-testing-with-the-new-databricks-connect-in-python/m-p/66399#M33101</link>
    <description>&lt;P&gt;I wouldn't use Databricks Connect/Spark Connect in that case.&lt;BR /&gt;Instead, run Spark locally. Of course, you won't have Databricks-specific tools (like dbutils).&lt;/P&gt;</description>
    <pubDate>Wed, 17 Apr 2024 07:28:40 GMT</pubDate>
    <dc:creator>-werners-</dc:creator>
    <dc:date>2024-04-17T07:28:40Z</dc:date>
    <item>
      <title>Unit Testing with the new Databricks Connect in Python</title>
      <link>https://community.databricks.com/t5/data-engineering/unit-testing-with-the-new-databricks-connect-in-python/m-p/66386#M33097</link>
      <description>&lt;P&gt;I would like to create a regular PySpark session in an isolated environment against which I can run my Spark-based tests. I don't see how that's possible with the new Databricks Connect. I'm going in circles here; is it even possible?&lt;/P&gt;&lt;P&gt;I don't want to connect to a cluster, or anywhere else really. I want to be able to run my tests as usual, without internet access.&lt;/P&gt;</description>
      <pubDate>Tue, 16 Apr 2024 20:48:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unit-testing-with-the-new-databricks-connect-in-python/m-p/66386#M33097</guid>
      <dc:creator>cosminsanda</dc:creator>
      <dc:date>2024-04-16T20:48:23Z</dc:date>
    </item>
    <item>
      <title>Re: Unit Testing with the new Databricks Connect in Python</title>
      <link>https://community.databricks.com/t5/data-engineering/unit-testing-with-the-new-databricks-connect-in-python/m-p/66399#M33101</link>
      <description>&lt;P&gt;I wouldn't use Databricks Connect/Spark Connect in that case.&lt;BR /&gt;Instead, run Spark locally. Of course, you won't have Databricks-specific tools (like dbutils).&lt;/P&gt;</description>
      <pubDate>Wed, 17 Apr 2024 07:28:40 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unit-testing-with-the-new-databricks-connect-in-python/m-p/66399#M33101</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2024-04-17T07:28:40Z</dc:date>
    </item>
    <item>
      <title>Re: Unit Testing with the new Databricks Connect in Python</title>
      <link>https://community.databricks.com/t5/data-engineering/unit-testing-with-the-new-databricks-connect-in-python/m-p/66416#M33108</link>
      <description>&lt;P&gt;The problem is that I don't see how you can have both native Spark and Databricks Connect (Spark Connect) in the same environment. The guidelines suggest one or the other, which is a bit of a pickle.&lt;/P&gt;</description>
      <pubDate>Wed, 17 Apr 2024 08:38:28 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unit-testing-with-the-new-databricks-connect-in-python/m-p/66416#M33108</guid>
      <dc:creator>cosminsanda</dc:creator>
      <dc:date>2024-04-17T08:38:28Z</dc:date>
    </item>
    <item>
      <title>Re: Unit Testing with the new Databricks Connect in Python</title>
      <link>https://community.databricks.com/t5/data-engineering/unit-testing-with-the-new-databricks-connect-in-python/m-p/66418#M33109</link>
      <description>&lt;P&gt;You could try to separate the environments, e.g. using containers or VMs.&lt;BR /&gt;There are probably other ways too, but these came to mind immediately.&lt;/P&gt;</description>
      <pubDate>Wed, 17 Apr 2024 10:18:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unit-testing-with-the-new-databricks-connect-in-python/m-p/66418#M33109</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2024-04-17T10:18:48Z</dc:date>
    </item>
    <item>
      <title>Re: Unit Testing with the new Databricks Connect in Python</title>
      <link>https://community.databricks.com/t5/data-engineering/unit-testing-with-the-new-databricks-connect-in-python/m-p/66555#M33150</link>
      <description>&lt;P&gt;OK, so the best solution as it stands today (for me personally, at least) is this:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Have &lt;STRONG&gt;pyspark&lt;/STRONG&gt; &lt;EM&gt;^3.4&lt;/EM&gt; installed with the &lt;STRONG&gt;connect&lt;/STRONG&gt; extra.&lt;/LI&gt;&lt;LI&gt;My unit tests then don't have to change at all, as they use a regular Spark session created on the fly.&lt;/LI&gt;&lt;LI&gt;To run the script locally while taking advantage of Databricks, I use open-source &lt;STRONG&gt;Spark Connect&lt;/STRONG&gt; and set &lt;STRONG&gt;SPARK_REMOTE=sc://${WORKSPACE_INSTANCE_NAME}:443/;token=${PERSONAL_ACCESS_TOKEN};x-databricks-cluster-id=${CLUSTER_ID}&lt;/STRONG&gt;&lt;/LI&gt;&lt;/UL&gt;</description>
      <pubDate>Thu, 18 Apr 2024 08:16:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unit-testing-with-the-new-databricks-connect-in-python/m-p/66555#M33150</guid>
      <dc:creator>cosminsanda</dc:creator>
      <dc:date>2024-04-18T08:16:42Z</dc:date>
    </item>
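The SPARK_REMOTE value in the post is a Spark Connect connection string. Here is a small sketch of assembling it, assuming the post's placeholders (WORKSPACE_INSTANCE_NAME, PERSONAL_ACCESS_TOKEN, CLUSTER_ID) are provided as environment variables. databricks_spark_remote and spark_remote_from_env are hypothetical helpers, not part of pyspark or any Databricks library.

```python
import os


def databricks_spark_remote(workspace: str, token: str, cluster_id: str) -> str:
    """Build a Spark Connect URL in the format given in the post."""
    # sc://HOST:443/;token=...;x-databricks-cluster-id=...
    return f"sc://{workspace}:443/;token={token};x-databricks-cluster-id={cluster_id}"


def spark_remote_from_env() -> str:
    # Hypothetical: reads the same placeholder names the post uses.
    return databricks_spark_remote(
        os.environ["WORKSPACE_INSTANCE_NAME"],
        os.environ["PERSONAL_ACCESS_TOKEN"],
        os.environ["CLUSTER_ID"],
    )
```

With pyspark 3.4+ and its connect extra installed, exporting this string as SPARK_REMOTE should make SparkSession.builder.getOrCreate() return a Spark Connect session against the cluster. Leaving it unset keeps the unit tests on a classic local session. The URL embeds the personal access token, so avoid logging it.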
    <item>
      <title>Re: Unit Testing with the new Databricks Connect in Python</title>
      <link>https://community.databricks.com/t5/data-engineering/unit-testing-with-the-new-databricks-connect-in-python/m-p/70849#M34168</link>
      <description>&lt;P&gt;Given that this doesn't work with serverless compute, aren't those tests very slow to complete because of the cluster startup time? I'm trying to steer away from Databricks Connect for unit testing for this reason. If it supported serverless, that would be a different story.&lt;/P&gt;</description>
      <pubDate>Tue, 28 May 2024 06:51:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unit-testing-with-the-new-databricks-connect-in-python/m-p/70849#M34168</guid>
      <dc:creator>thibault</dc:creator>
      <dc:date>2024-05-28T06:51:51Z</dc:date>
    </item>
  </channel>
</rss>

