by
sanjay
• Valued Contributor II
- 38118 Views
- 2 replies
- 1 kudos
Hi, I am using the pynote/whisper large model and trying to process data with a Spark UDF, but I'm getting the following error: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 172.00 MiB (GPU 0; 14.76 GiB total capacity; 6.07 GiB already allocated...
Latest Reply
Try running this code:
    import torch
    torch.cuda.empty_cache()
Also make sure to find an optimal batch size; otherwise the error can occur again.
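The reply's two suggestions can be combined: after an out-of-memory error, release PyTorch's cached GPU memory and retry with a smaller batch. The helper below is a hypothetical, framework-agnostic sketch (not a PyTorch or Spark API); `run_batch` stands in for whatever function calls the Whisper model on a list of inputs. Keep in mind that `torch.cuda.empty_cache()` only returns cached blocks that no live tensor is using, so it helps after a failed allocation but cannot make a batch that is genuinely too large fit.

```python
def run_in_batches(items, run_batch, batch_size, oom_error=MemoryError,
                   on_oom=None, min_batch_size=1):
    """Process items in batches, halving the batch size after each OOM error.

    run_batch:  callable taking a list of items and returning a list of results
    oom_error:  exception type that signals out-of-memory
                (e.g. torch.cuda.OutOfMemoryError when using PyTorch)
    on_oom:     optional cleanup hook run after an OOM (e.g. torch.cuda.empty_cache)
    """
    results = []
    i = 0
    while i < len(items):
        batch = items[i:i + batch_size]
        try:
            results.extend(run_batch(batch))
        except oom_error:
            if on_oom is not None:
                on_oom()  # free cached memory before retrying
            if batch_size <= min_batch_size:
                raise  # even the smallest batch does not fit; give up
            batch_size = max(min_batch_size, batch_size // 2)
            continue  # retry the same starting index with a smaller batch
        i += len(batch)  # advance only after a successful batch
    return results
```

With PyTorch, a call might look like `run_in_batches(audio_paths, transcribe_batch, 16, oom_error=torch.cuda.OutOfMemoryError, on_oom=torch.cuda.empty_cache)`, where `audio_paths` and `transcribe_batch` are hypothetical. Inside a Spark UDF this loop runs separately in each task; if several tasks run concurrently on one GPU they share its memory, so the batch size that fits may be smaller than in a single-process test.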
by
ppang
• New Contributor III
- 1527 Views
- 1 replies
- 0 kudos
I have been trying to start a cluster using DCS with GPU containers (https://github.com/databricks/containers/tree/master/ubuntu/gpu), but was only successful with Databricks Runtime 10.4 LTS and lower. With Databricks Runtime 11.3 LTS and higher, I ...
Latest Reply
Hello @ppang!
Since you posted your question, the repository you shared has received an update, which includes the following warning:
"Using conda in DCS images is no longer supported starting Databricks Runtime 9.0. We highly recommend users to ext...
- 2213 Views
- 2 replies
- 0 kudos
I am training a Keras/TensorFlow deep learning model on a cluster of (for now) 2 workers and 1 driver (T4 GPU, 28GB, 4 cores) using the Databricks-provided HorovodRunner. It all seems to go well and the performance scales quite nicely over ...
Latest Reply
I suspect it's your callbacks. Can you remove all those state callbacks and see whether that resolves it?
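One way to test the callback hypothesis is to separate the callbacks Horovod needs on every worker from the ones that write state, and to run the latter only on rank 0 (the pattern Horovod's Keras guidance uses for checkpointing). The helper below is a hypothetical sketch, not part of HorovodRunner or Horovod; its `minimal` flag drops every optional callback so you can check whether the problem disappears.

```python
def select_callbacks(rank, shared, rank0_only, minimal=False):
    """Pick the Keras callbacks one Horovod worker should pass to model.fit.

    shared:     callbacks every worker must run, e.g. Horovod's
                BroadcastGlobalVariablesCallback(0) and MetricAverageCallback()
    rank0_only: callbacks that write state (checkpoints, logs) and should run
                on a single worker only
    minimal:    debugging switch that keeps only the shared callbacks
    """
    if minimal:
        return list(shared)
    extra = list(rank0_only) if rank == 0 else []
    return list(shared) + extra


# Sketch of use inside the training function passed to HorovodRunner.run
# (the checkpoint path is hypothetical):
#
#   import horovod.tensorflow.keras as hvd
#   import tensorflow as tf
#   hvd.init()
#   shared = [hvd.callbacks.BroadcastGlobalVariablesCallback(0),
#             hvd.callbacks.MetricAverageCallback()]
#   rank0_only = [tf.keras.callbacks.ModelCheckpoint("/dbfs/tmp/ckpt.h5")]
#   model.fit(dataset, epochs=5,
#             callbacks=select_callbacks(hvd.rank(), shared, rank0_only,
#                                        minimal=True))
```

Running once with `minimal=True` and once without gives a direct comparison of training with and without the stateful callbacks.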
- 3626 Views
- 2 replies
- 1 kudos
Hello Databricks community! We have a strong need to serve some public models, as well as some of our own private models, on GPU clusters, and we have several requirements: 1) We'd like to be able to start/stop the endpoints (ideally on a schedule) to avoid excess consum...
Latest Reply
Hi @Alisher Akh, does @Debayan Mukherjee's answer help? If so, would you be happy to mark it as the best answer so that other members can find the solution more quickly? If not, please let us know so we can help you further. Cheers!
by
zzy
• New Contributor III
- 2093 Views
- 2 replies
- 2 kudos
I have a dataset of about 5 million rows with 14 features and a binary target. I decided to train a PySpark random forest classifier on Databricks. The CPU cluster I created contains 2 c4.8xlarge workers (60GB, 36 cores) and 1 r4.xlarge (31GB, 4 cores) driv...
Latest Reply
In many cases, you need to adjust your code to make use of the GPU; adding GPUs to a cluster does not by itself make CPU-based code run on them.
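As one illustration of such an adjustment, a random forest can be trained on GPUs with XGBoost's random-forest mode: a single boosting round that grows many parallel trees with `learning_rate=1` and row/column subsampling. This swaps XGBoost in for Spark MLlib's `RandomForestClassifier` and is a sketch under stated assumptions (xgboost >= 2.0 with its `xgboost.spark` module installed, a GPU cluster, and a vector column named `features`); it is not the approach from the thread. XGBoost's `subsample` samples rows without replacement, so results will not match MLlib's bootstrap-based forest exactly.

```python
def rf_params_for_xgboost(num_trees=100, max_depth=8, row_rate=0.8,
                          feature_rate=0.8, use_gpu=True):
    """Translate random-forest settings into XGBoost's random-forest mode."""
    params = {
        "n_estimators": 1,                 # one boosting round ...
        "num_parallel_tree": num_trees,    # ... that grows num_trees trees
        "learning_rate": 1.0,              # no shrinkage, as in a plain forest
        "subsample": row_rate,             # row sampling per tree (no replacement)
        "colsample_bynode": feature_rate,  # feature sampling at each split
        "max_depth": max_depth,
        "tree_method": "hist",
    }
    if use_gpu:
        params["device"] = "cuda"          # GPU training flag in XGBoost >= 2.0
    return params


def build_gpu_forest(num_workers, features_col="features", label_col="label",
                     **rf_kwargs):
    """Create a Spark estimator that trains the forest on GPU workers."""
    # Imported lazily so rf_params_for_xgboost works without xgboost installed.
    from xgboost.spark import SparkXGBClassifier
    return SparkXGBClassifier(
        features_col=features_col,
        label_col=label_col,
        num_workers=num_workers,
        **rf_params_for_xgboost(**rf_kwargs),
    )
```

A possible usage, with a hypothetical `train_df` that already holds an assembled `features` vector column, is `model = build_gpu_forest(num_workers=2, num_trees=200).fit(train_df)`.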