04-16-2024 01:48 PM
I would like to create a regular PySpark session in an isolated environment against which I can run my Spark-based tests. I don't see how that's possible with the new Databricks Connect. I'm going in circles here; is it even possible?
I don't want to connect to a cluster or anywhere else, really. I want to be able to run my tests as usual, without internet access.
04-17-2024 12:28 AM
I wouldn't use Databricks Connect / Spark Connect in that case.
Instead, run Spark locally. Of course, you won't have the Databricks-specific tools (like dbutils, etc.).
04-17-2024 01:38 AM
The problem is that I don't see how you can have both native Spark and Databricks Connect (Spark Connect) installed at the same time. The guidelines suggest one or the other, which is a bit of a pickle.
04-17-2024 03:18 AM
You could try to separate the environments, e.g. using containers/VMs.
There are probably other ways too, but these immediately came to mind.
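A lighter-weight way to separate the environments than containers/VMs is two virtualenvs, one per Spark flavour, since `databricks-connect` and plain `pyspark` can't share an environment. A hypothetical tox config sketch (env names and test paths are made up for illustration):

```ini
# tox.ini -- hypothetical sketch: one isolated virtualenv per Spark flavour
[tox]
envlist = local-spark, dbconnect

# offline unit tests against plain local PySpark
[testenv:local-spark]
deps =
    pytest
    pyspark
commands = pytest tests/unit

# integration tests against a Databricks cluster via Databricks Connect
[testenv:dbconnect]
deps =
    pytest
    databricks-connect
commands = pytest tests/integration
```

`tox run -e local-spark` then runs the offline suite without ever installing Databricks Connect into that environment.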
04-18-2024 01:16 AM
Ok, so the best solution as it stands today (for me personally at least) is this: