01-26-2024 01:47 AM
Hi team,
I wanted to know if there is a way to connect a piece of my pyspark code running in ECS to Databricks cluster and leverage the databricks compute using Databricks connect?
I see Databricks connect is for connecting local ide code to databricks cluster, but do we have a way to connect code running in ecs with databricks?
01-29-2024 01:53 AM
Hi @Surajv, Yes, you can use Databricks Connect to connect PySpark code running in ECS to a Databricks cluster. Databricks Connect is not limited to local IDEs: it lets you run Spark commands against a Databricks cluster from a variety of environments, including IDEs, notebooks, and custom applications.
Your Python code executes on the client (in your case, the ECS container), while the DataFrame operations run on the cluster in the remote Databricks workspace. The results are then sent back to the client. To configure the connection, supply your <workspace-instance-name>, <access-token-value>, and <cluster-id>. The <access-token-value> is a personal access token used to authenticate with Databricks.
Before starting, verify that your Databricks Connect version is compatible with the Databricks Runtime version on your cluster. These instructions apply to Databricks Runtime 13.0 and above. Also keep in mind that, although Databricks Connect lets you write and run Spark code on your Databricks clusters, some limitations may apply depending on your setup and requirements.
Checking these points up front will help you get the most out of this functionality.
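As a rough illustration of the setup described above (Databricks Runtime 13.0 and above), here is a minimal sketch of how a containerized client might open a session. The environment variable names and both helper functions are assumptions made for this example, not something the thread specifies; in ECS the values would typically be injected through the task definition.

```python
import os

# Assumed environment variable names, chosen for this sketch.
REQUIRED_VARS = ("DATABRICKS_HOST", "DATABRICKS_TOKEN", "DATABRICKS_CLUSTER_ID")


def load_connect_config(env=None):
    """Collect the three connection values, failing fast if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "host": env["DATABRICKS_HOST"],              # https://<workspace-instance-name>
        "token": env["DATABRICKS_TOKEN"],            # <access-token-value>
        "cluster_id": env["DATABRICKS_CLUSTER_ID"],  # <cluster-id>
    }


def get_spark(env=None):
    """Open a remote Spark session on the Databricks cluster."""
    # Imported lazily so the config helper can be used (and tested)
    # on a machine where databricks-connect is not installed.
    from databricks.connect import DatabricksSession

    return DatabricksSession.builder.remote(**load_connect_config(env)).getOrCreate()
```

With the variables set, something like `get_spark().range(5).show()` should run the DataFrame work on the cluster and return only the displayed rows to the container.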
01-31-2024 12:53 AM
01-29-2024 08:36 AM
In addition to the answer from @Kaniz_Fatma, I would add that the result set returned by a Databricks query may be too large to process in memory on your ECS container node. Spark is usually better suited to asynchronous workloads than to returning immediate result sets.
If you could briefly explain your use case, it would help us make a better recommendation.
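To make the memory concern above concrete, here is a hedged sketch of two ways a client could avoid pulling an oversized result into the container. Both helper names are hypothetical; they rely only on the standard DataFrame methods `limit`, `collect`, and `write.mode(...).saveAsTable(...)`.

```python
def collect_bounded(df, max_rows):
    """Return at most max_rows rows, raising if the result is larger."""
    # Ask for one extra row so an oversized result can be detected
    # without fetching the whole thing into the container's memory.
    rows = df.limit(max_rows + 1).collect()
    if len(rows) > max_rows:
        raise ValueError(
            f"Result exceeds {max_rows} rows; aggregate further or write it to a table"
        )
    return rows


def save_on_cluster(df, table_name):
    """Persist a large result on the Databricks side instead of collecting it."""
    df.write.mode("overwrite").saveAsTable(table_name)
```

The first helper suits small lookups where the client needs the rows immediately; the second keeps the data on the cluster, which fits the asynchronous style of workload mentioned above.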
01-31-2024 12:52 AM
01-30-2024 07:07 PM - edited 01-30-2024 07:10 PM
Noted @Kaniz_Fatma @RonDeFreitas.
I am currently using Databricks Runtime v12.2 (which is < v13.0). I followed this doc (Databricks Connect for Databricks Runtime 12.2 LTS and below), connected my local terminal to the Databricks cluster, and was able to run sample Spark code from the terminal using my cluster's compute. In parallel, I was also able to execute code from a remote Jupyter notebook by following the docs.
I do have one question about this, though.
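For reference, a minimal sketch of what the same setup might look like inside a container with the legacy client (Databricks Runtime 12.2 LTS and below). It assumes the connection is supplied through the environment variables the legacy client reads instead of the interactive `databricks-connect configure` step; the helper names are hypothetical.

```python
import os

# Environment variables read by the legacy Databricks Connect client.
# DATABRICKS_ORG_ID and DATABRICKS_PORT also exist; they are treated as
# optional in this sketch.
REQUIRED_LEGACY_VARS = (
    "DATABRICKS_ADDRESS",     # workspace URL
    "DATABRICKS_API_TOKEN",   # personal access token
    "DATABRICKS_CLUSTER_ID",  # target cluster
)


def missing_legacy_vars(env=None):
    """List required variables that are unset (env is injectable for testing)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_LEGACY_VARS if not env.get(name)]


def get_legacy_spark():
    """Create the session; the legacy client picks up the settings itself."""
    missing = missing_legacy_vars()
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    # With the legacy databricks-connect package installed, the plain
    # SparkSession builder connects to the remote cluster.
    from pyspark.sql import SparkSession

    return SparkSession.builder.getOrCreate()
```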
Current architecture of our system for context:
Question(s):
Approach(s):