04-16-2024 01:48 PM
I would like to create a regular PySpark session in an isolated environment against which I can run my Spark-based tests. I don't see how that's possible with the new Databricks Connect. I'm going in circles here; is it even possible?
I don't want to connect to a cluster or anywhere else. I want to be able to run my tests as usual, without internet access.
04-17-2024 12:28 AM
I would not use Databricks Connect / Spark Connect in that case.
Instead, run Spark locally. Of course you will not have Databricks-specific tools (like dbutils, etc.).
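To make that concrete, here is a minimal sketch of what a local-only test setup could look like: plain pyspark, no cluster, no internet access. The fixture name, config values, and sample test are my own illustrative assumptions, not something from this thread.

```python
# conftest.py: a minimal local-only Spark fixture for pytest
# (names and config values are illustrative assumptions)
import pytest
from pyspark.sql import SparkSession


@pytest.fixture(scope="session")
def spark():
    # Plain local PySpark: no cluster and no internet access required.
    session = (
        SparkSession.builder
        .master("local[2]")
        .appName("unit-tests")
        # Keep shuffle partitions low so small test DataFrames stay fast.
        .config("spark.sql.shuffle.partitions", "2")
        .getOrCreate()
    )
    yield session
    session.stop()


def test_filter(spark):
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
    assert df.filter("id > 1").count() == 1
```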
04-17-2024 01:38 AM
The problem is that I don't see how you can have both native Spark (plain pyspark) and Databricks Connect (Spark Connect) installed in the same environment. The guidelines suggest one or the other, which is a bit of a pickle.
04-17-2024 03:18 AM
You could try to separate the environments, e.g. using containers/VMs.
There are probably other ways too, but these immediately came to mind.
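If it helps, here is a rough sketch of how one shared conftest.py could serve two isolated environments (say, one container with databricks-connect installed and one with plain pyspark). The environment variable name and the overall structure are assumptions for illustration, not a documented pattern from Databricks.

```python
# conftest.py shared by two isolated environments (e.g. two containers):
# one with databricks-connect installed, one with plain pyspark.
# The USE_DATABRICKS_CONNECT variable name is an assumption for illustration.
import os
import pytest


@pytest.fixture(scope="session")
def spark():
    if os.environ.get("USE_DATABRICKS_CONNECT") == "1":
        # Runs only in the environment that has databricks-connect installed;
        # relies on the usual Databricks Connect configuration (profile or env vars).
        from databricks.connect import DatabricksSession
        session = DatabricksSession.builder.getOrCreate()
    else:
        # Runs in the offline environment with plain pyspark, fully local.
        from pyspark.sql import SparkSession
        session = (
            SparkSession.builder
            .master("local[2]")
            .appName("unit-tests")
            .getOrCreate()
        )
    yield session
    session.stop()
```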
04-18-2024 01:16 AM
Ok, so the best solution as it stands today (for me personally at least) is this:
05-27-2024 11:51 PM
Given that this doesn't work on serverless compute, aren't those tests very slow to complete due to the compute startup time? I'm trying to steer away from Databricks Connect for unit testing for this reason. If it supported serverless, that would be a different story.