Databricks Community

Fed · ‎03-10-2023

This article rightly suggests installing `ray` with `%pip`, but it fails to mention that installing it as a cluster library won't work.

The reason, I think, is that `setup_ray_cluster` uses `sys.executable` (i.e. `/local_disk0/.ephemeral_nfs/envs/pythonEnv-{UUID}/bin/python`) to run `start_ray_node.py`, which in turn calls the `ray` executable.

If `ray` is installed with `%pip`, its executable is in the same folder as `sys.executable`, so everything works fine. If `ray` is installed as a cluster library (i.e. under `/local_disk0/.ephemeral_nfs/cluster_libraries/python`), the executable is not in that folder, so it isn't found.
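To check which of the two cases applies in a given session, something like the following could help. It is only a sketch: it assumes, as guessed above, that the `ray` command is looked up in the folder that contains `sys.executable`. The helper names `expected_cli_path` and `cli_is_next_to_interpreter` are made up for illustration and are not part of Ray or Databricks.

```python
import os
import sys


def expected_cli_path(name="ray", interpreter=sys.executable):
    """Path where a CLI would sit if it were installed beside the interpreter."""
    return os.path.join(os.path.dirname(interpreter), name)


def cli_is_next_to_interpreter(name="ray", interpreter=sys.executable):
    """True if an executable file called `name` exists beside `interpreter`."""
    candidate = expected_cli_path(name, interpreter)
    return os.path.isfile(candidate) and os.access(candidate, os.X_OK)


if __name__ == "__main__":
    # Assumption: this is the location the Ray-on-Spark startup code checks.
    print(expected_cli_path())
    print(cli_is_next_to_interpreter())
```

If the second line prints `False`, the session is probably in the cluster-library situation described above.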

I tried adding the directory to `sys.path`, but that didn't work:

import sys
 
sys.path.append("/local_disk0/.ephemeral_nfs/cluster_libraries/python/bin")

And some more debugging (in a new session):

import subprocess
import sys
import os
 
print("/local_disk0/.ephemeral_nfs/cluster_libraries/python/bin" in sys.path)  # False
print("/local_disk0/.ephemeral_nfs/cluster_libraries/python/bin" in os.environ["PATH"])  # True
print(subprocess.run(["ray", "--version"], capture_output=True).stdout.decode("utf-8"))  # ray, version 2.3.0
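The output above is a reminder that `sys.path` (used for module imports) and the `PATH` environment variable (used for executable lookup) are separate things. A small diagnostic along these lines shows where a command resolves on `PATH` and whether that location is the interpreter's own folder. `diagnose_cli` is a hypothetical helper, not a Ray API.

```python
import os
import shutil
import sys


def diagnose_cli(name="ray", interpreter=sys.executable, path=None):
    """Report where `name` resolves on PATH and whether it sits beside `interpreter`."""
    # shutil.which searches PATH (or the given search string), never sys.path.
    found = shutil.which(name, path=path)
    interpreter_dir = os.path.abspath(os.path.dirname(interpreter))
    beside = (
        found is not None
        and os.path.dirname(os.path.abspath(found)) == interpreter_dir
    )
    return {
        "on_path": found,
        "interpreter_dir": interpreter_dir,
        "beside_interpreter": beside,
    }


if __name__ == "__main__":
    print(diagnose_cli())
```

A result with `on_path` set but `beside_interpreter` false would match what the debugging session shows: the command is reachable through `PATH`, just not from the folder of `sys.executable`.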

Fed · ‎03-10-2023

Ugly, but this seems to work for now

import os
import shutil
import sys

from ray.util.spark import setup_ray_cluster, shutdown_ray_cluster

# Copy the cluster-library `ray` executable into the folder that holds
# `sys.executable`, so it sits next to the notebook's Python interpreter.
shutil.copy(
    "/local_disk0/.ephemeral_nfs/cluster_libraries/python/bin/ray",
    os.path.dirname(sys.executable),
)

setup_ray_cluster(
    num_worker_nodes=4,
    num_cpus_per_node=8,
    collect_log_to_path="/dbfs/ray/logs",
)
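A slightly more defensive version of the same workaround, as a sketch only: it fails early if the source executable is missing and does not overwrite a file that is already in place. `expose_cli` is a hypothetical helper name, the source path is the one used above, and this is untested across Databricks runtimes.

```python
import os
import shutil
import sys


def expose_cli(source, interpreter=sys.executable):
    """Copy `source` next to `interpreter` unless a file of that name already exists."""
    if not os.path.isfile(source):
        raise FileNotFoundError(f"CLI not found: {source}")
    target = os.path.join(os.path.dirname(interpreter), os.path.basename(source))
    if not os.path.exists(target):
        # shutil.copy copies the file contents and its permission bits.
        shutil.copy(source, target)
    return target


# Example call (path taken from the workaround above):
# expose_cli("/local_disk0/.ephemeral_nfs/cluster_libraries/python/bin/ray")
```

Calling it before `setup_ray_cluster` would have the same effect as the plain `shutil.copy`, while being safe to re-run in the same session.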



Ray as a cluster library instead of notebook-scoped library
