Administration & Architecture

Ray cannot detect GPU on the cluster

Awoke101
New Contributor III

I am trying to run Ray on Databricks for chunking and embedding tasks. The cluster I'm using is:

g4dn.xlarge
1-4 workers with 4-16 cores
1 GPU and 16GB memory

I have set spark.task.resource.gpu.amount to 0.5 currently.

This is how I have set up my Ray cluster:

setup_ray_cluster(
    min_worker_nodes=1,
    max_worker_nodes=3,
    num_gpus_head_node=1,
)
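As a sanity check (sketch, not from my actual notebook), something like this should list a GPU entry in the cluster resources if Ray registered one:

import ray

ray.init()  # connect to the Ray-on-Spark cluster created above
print(ray.cluster_resources())  # expect a "GPU" key if Ray sees the GPU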

And this is the chunking function:

@ray.remote(num_gpus=0.2)
def chunk_udf(row):
    # splitter is a text splitter defined earlier in the notebook
    texts = row["content"]
    data = row.copy()
    split_text = splitter.split_text(texts)
    split_text = [text.replace("\n", " ") for text in split_text]
    return list(zip(split_text, data))

When I run the flat_map function for chunking, it throws the following error:

chunked_ds = ds.flat_map(chunk_udf)
chunked_ds.show(5) 
At least one of the input arguments for this task could not be computed: ray.exceptions.RaySystemError: System error: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU. 

Is there something I need to change in my setup?
torch.cuda.is_available() returns True in the notebook.
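
For debugging, a minimal probe task along the lines of this sketch (the gpu_probe name is just for illustration) should show what a Ray task actually sees, as opposed to the driver notebook:

import ray
import torch

@ray.remote(num_gpus=0.2)
def gpu_probe():
    # Runs on a Ray worker, so this reflects what chunk_udf would see,
    # not what the driver notebook sees.
    return {
        "ray_gpu_ids": ray.get_gpu_ids(),
        "cuda_available": torch.cuda.is_available(),
    }

print(ray.get(gpu_probe.remote()))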

1 REPLY

Krishna_S
Databricks Employee

I have replicated all your steps and created the Ray cluster exactly as you did.

Also, I have set: spark.conf.set("spark.task.resource.gpu.amount", "0.5")

I then see a warning telling me not to reserve any GPU for Spark (it reports the effective value as 1.0), even though I set it to 0.5.

See the attached image and the warning below.

You configured 'spark.task.resource.gpu.amount' to 1.0, we recommend setting this value to 0 so that Spark jobs do not reserve GPU resources, preventing Ray-on-Spark workloads from having the maximum number of GPUs available.

What likely happened is that, since you set up the cluster to auto-scale, it did not scale as expected, so Spark reserved the only GPU on the node, which caused the issue you are seeing.
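
A configuration along these lines should free the GPU for Ray rather than Spark (a sketch only; whether num_gpus_worker_node is needed, and whether the Spark conf has to go in the cluster's Spark config rather than being set at runtime, depends on your Ray and Databricks Runtime versions):

# Stop Spark from reserving the GPU, as the warning recommends.
spark.conf.set("spark.task.resource.gpu.amount", "0")

from ray.util.spark import setup_ray_cluster, shutdown_ray_cluster

# shutdown_ray_cluster()  # uncomment if an old Ray cluster is still running

setup_ray_cluster(
    min_worker_nodes=1,
    max_worker_nodes=3,
    num_gpus_head_node=1,
    num_gpus_worker_node=1,  # assumption: one GPU per g4dn.xlarge worker node
)

With Spark no longer holding the GPU, the num_gpus=0.2 request in chunk_udf can be scheduled on the GPU nodes.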
