<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Connect my spark code running in AWS ECS to databricks cluster in Get Started Discussions</title>
    <link>https://community.databricks.com/t5/get-started-discussions/connect-my-spark-code-running-in-aws-ecs-to-databricks-cluster/m-p/58797#M6432</link>
    <description>&lt;P&gt;I've replied here:&amp;nbsp;&lt;A href="https://community.databricks.com/t5/community-discussions/connect-my-spark-code-running-in-aws-ecs-to-databricks-cluster/m-p/58770/highlight/true#M3615" target="_blank"&gt;https://community.databricks.com/t5/community-discussions/connect-my-spark-code-running-in-aws-ecs-to-databricks-cluster/m-p/58770/highlight/true#M3615&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Wed, 31 Jan 2024 08:53:02 GMT</pubDate>
    <dc:creator>Surajv</dc:creator>
    <dc:date>2024-01-31T08:53:02Z</dc:date>
    <item>
      <title>Connect my spark code running in AWS ECS to databricks cluster</title>
      <link>https://community.databricks.com/t5/get-started-discussions/connect-my-spark-code-running-in-aws-ecs-to-databricks-cluster/m-p/58464#M6427</link>
      <description>&lt;P&gt;Hi team,&amp;nbsp;&lt;/P&gt;&lt;P&gt;I wanted to know if there is a way to connect a piece of my PySpark code running in ECS to a Databricks cluster and leverage the Databricks compute using Databricks Connect.&lt;/P&gt;&lt;P&gt;I see that Databricks Connect is meant for connecting local IDE code to a Databricks cluster, but is there a way to connect code running in ECS to Databricks?&lt;/P&gt;</description>
      <pubDate>Fri, 26 Jan 2024 09:47:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/connect-my-spark-code-running-in-aws-ecs-to-databricks-cluster/m-p/58464#M6427</guid>
      <dc:creator>Surajv</dc:creator>
      <dc:date>2024-01-26T09:47:13Z</dc:date>
    </item>
    <item>
      <title>Re: Connect my spark code running in AWS ECS to databricks cluster</title>
      <link>https://community.databricks.com/t5/get-started-discussions/connect-my-spark-code-running-in-aws-ecs-to-databricks-cluster/m-p/58613#M6429</link>
      <description>&lt;P&gt;In addition to the answer from&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;,&amp;nbsp;I would add that the result set coming back from a Databricks query may be too large to process in memory on your ECS container node. Spark often excels at asynchronous workloads, not immediate result sets.&lt;/P&gt;&lt;P&gt;If you could briefly explain your use case, it would help us make a better recommendation.&lt;/P&gt;</description>
      <pubDate>Mon, 29 Jan 2024 16:36:01 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/connect-my-spark-code-running-in-aws-ecs-to-databricks-cluster/m-p/58613#M6429</guid>
      <dc:creator>RonDeFreitas</dc:creator>
      <dc:date>2024-01-29T16:36:01Z</dc:date>
    </item>
    <item>
      <title>Re: Connect my spark code running in AWS ECS to databricks cluster</title>
      <link>https://community.databricks.com/t5/get-started-discussions/connect-my-spark-code-running-in-aws-ecs-to-databricks-cluster/m-p/58770#M6430</link>
      <description>&lt;P&gt;Noted&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/56586"&gt;@RonDeFreitas&lt;/a&gt;.&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am currently using &lt;STRONG&gt;Databricks runtime v12.2&lt;/STRONG&gt; (which is &amp;lt; v13.0). I followed this &lt;A href="https://docs.databricks.com/en/dev-tools/databricks-connect-legacy.html" target="_blank" rel="noopener"&gt;doc&lt;/A&gt;&amp;nbsp;(Databricks Connect for Databricks Runtime 12.2 LTS and below), &lt;STRONG&gt;connected my local terminal to the Databricks cluster&lt;/STRONG&gt;, and was able to execute sample Spark code on my cluster's compute from the terminal. In parallel, I was also able to execute code from a remote Jupyter notebook by following the docs.&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have two questions regarding this, though.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Current architecture of our system&lt;/STRONG&gt; for context:&amp;nbsp;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;I have Python scripts, in a service, triggered via Airflow jobs. These scripts run on ECS (wrapped in Airflow's ECSOperators). The primary job of these scripts is to import data from S3, do some processing, and dump it back to S3. Today a lot of this computation is done in numpy/pandas/dask, and we want to move it to PySpark by leveraging the Databricks cluster that we have. Roughly, our goal is to create a connector that creates a Spark session, and then rewrite the pandas/dask code in Spark. 
The underlying compute would be Databricks Spark compute.&amp;nbsp;&lt;/LI&gt;&lt;LI&gt;We are &lt;STRONG&gt;not inclined to go with the approach of using Databricks operators&lt;/STRONG&gt; for now, hence the goal is to use Databricks Connect and leverage the compute.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;Question&lt;/STRONG&gt;(s):&amp;nbsp;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;My current Databricks runtime version is 12.2. The docs do have some relevant info on how to use databricks-connect with a remote Spark cluster, but for Databricks runtime v13.0+.&lt;STRONG&gt; Is an upgrade necessary&lt;/STRONG&gt;? Just confirming.&amp;nbsp;&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Does the Airflow version matter in this regard&lt;/STRONG&gt;? While PoC'ing Databricks operators, I learned that they require Airflow v2.5+. Our &lt;STRONG&gt;current Airflow version is v2.4.2&lt;/STRONG&gt;. Since we are more inclined towards using databricks-connect rather than Databricks operators to use the compute, the Airflow version should not matter, should it?&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;Approach&lt;/STRONG&gt;(s):&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;As part of PoC'ing the approach, I set up the latest Airflow (v2.8.1) locally and followed the databricks-connect docs, though I faced issues and realized it's probably due to the Databricks runtime version 12.2 that we have.&lt;STRONG&gt; I will tweak my approach based on the clarification to this question&lt;/STRONG&gt;.&amp;nbsp;&lt;/LI&gt;&lt;/UL&gt;</description>
      <pubDate>Wed, 31 Jan 2024 03:10:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/connect-my-spark-code-running-in-aws-ecs-to-databricks-cluster/m-p/58770#M6430</guid>
      <dc:creator>Surajv</dc:creator>
      <dc:date>2024-01-31T03:10:02Z</dc:date>
    </item>
    <item>
      <title>Re: Connect my spark code running in AWS ECS to databricks cluster</title>
      <link>https://community.databricks.com/t5/get-started-discussions/connect-my-spark-code-running-in-aws-ecs-to-databricks-cluster/m-p/58796#M6431</link>
      <description>&lt;P&gt;Replied here:&amp;nbsp;&lt;A href="https://community.databricks.com/t5/community-discussions/connect-my-spark-code-running-in-aws-ecs-to-databricks-cluster/m-p/58770/highlight/true#M3615" target="_blank"&gt;https://community.databricks.com/t5/community-discussions/connect-my-spark-code-running-in-aws-ecs-to-databricks-cluster/m-p/58770/highlight/true#M3615&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 31 Jan 2024 08:52:11 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/connect-my-spark-code-running-in-aws-ecs-to-databricks-cluster/m-p/58796#M6431</guid>
      <dc:creator>Surajv</dc:creator>
      <dc:date>2024-01-31T08:52:11Z</dc:date>
    </item>
    <item>
      <title>Re: Connect my spark code running in AWS ECS to databricks cluster</title>
      <link>https://community.databricks.com/t5/get-started-discussions/connect-my-spark-code-running-in-aws-ecs-to-databricks-cluster/m-p/58797#M6432</link>
      <description>&lt;P&gt;I've replied here:&amp;nbsp;&lt;A href="https://community.databricks.com/t5/community-discussions/connect-my-spark-code-running-in-aws-ecs-to-databricks-cluster/m-p/58770/highlight/true#M3615" target="_blank"&gt;https://community.databricks.com/t5/community-discussions/connect-my-spark-code-running-in-aws-ecs-to-databricks-cluster/m-p/58770/highlight/true#M3615&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 31 Jan 2024 08:53:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/connect-my-spark-code-running-in-aws-ecs-to-databricks-cluster/m-p/58797#M6432</guid>
      <dc:creator>Surajv</dc:creator>
      <dc:date>2024-01-31T08:53:02Z</dc:date>
    </item>
  </channel>
</rss>

