Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Serhii
by Contributor
  • 1777 Views
  • 3 replies
  • 1 kudos

Could not launch jobs due to node_type_id (instance) unavailability

I am running an hourly job on a cluster using a p3.2xlarge GPU instance, but sometimes the cluster can't start due to instance unavailability. I wonder if there is any fallback mechanism to, for example, try a different instance type if one is not availabl...

Latest Reply
abagshaw
New Contributor III
  • 1 kudos

(AWS only) For anyone experiencing capacity-related cluster launch failures on non-GPU instance types, AWS Fleet instance types are now GA and available for clusters and instance pools. They help improve the chance of a successful cluster launch by allowi...

2 More Replies
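Since the reply above notes that Fleet instance types cover only non-GPU nodes, a do-it-yourself fallback for GPU jobs is to retry cluster creation with the next instance type in a preferred list. The Python sketch below illustrates that idea against the standard Clusters REST API; it is untested, and the workspace host/token environment variables, the instance-type list, and the spark_version string are assumptions for illustration rather than a built-in Databricks feature.

import os
import time
from typing import Optional

import requests

HOST = os.environ["DATABRICKS_HOST"]    # e.g. "https://<workspace>.cloud.databricks.com"
TOKEN = os.environ["DATABRICKS_TOKEN"]  # a personal access token
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

# Instance types to try, in order of preference (illustrative values).
FALLBACK_NODE_TYPES = ["p3.2xlarge", "g5.2xlarge", "g4dn.2xlarge"]

def try_launch(node_type_id: str) -> Optional[str]:
    """Create a cluster with the given node type; return its ID once RUNNING, or None on failure."""
    spec = {
        "cluster_name": "hourly-gpu-job-cluster",
        "spark_version": "13.3.x-gpu-ml-scala2.12",  # assumed GPU ML runtime string
        "node_type_id": node_type_id,
        "num_workers": 1,
        "autotermination_minutes": 60,
    }
    resp = requests.post(f"{HOST}/api/2.0/clusters/create", headers=HEADERS, json=spec)
    resp.raise_for_status()
    cluster_id = resp.json()["cluster_id"]
    while True:
        state = requests.get(
            f"{HOST}/api/2.0/clusters/get",
            headers=HEADERS,
            params={"cluster_id": cluster_id},
        ).json()["state"]
        if state == "RUNNING":
            return cluster_id
        if state in ("TERMINATED", "TERMINATING", "ERROR"):
            return None  # launch failed, e.g. due to instance unavailability
        time.sleep(30)

def launch_with_fallback() -> str:
    """Walk the fallback list until one instance type launches successfully."""
    for node_type in FALLBACK_NODE_TYPES:
        cluster_id = try_launch(node_type)
        if cluster_id is not None:
            print(f"Cluster {cluster_id} is running on {node_type}")
            return cluster_id
        print(f"{node_type} could not be launched, trying the next instance type...")
    raise RuntimeError("None of the fallback instance types could be launched")

The hourly job could then be submitted against the returned cluster_id, or the same spec could be embedded as the new_cluster block of a job definition with the retry handled by an external scheduler.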
darkraisisi
by New Contributor
  • 886 Views
  • 0 replies
  • 0 kudos

Is there a way to manually update the required CUDA files in the DB runtime?

Is there a way to manually update the required CUDA files in the DB runtime? There are some rather annoying bugs still in TF 2.11 that have been fixed in TF 2.12. Sadly, the latest DB runtime 13.1 (beta) only supports the older TF 2.11 even though 2.12 was ...

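No one has replied here, but a common first attempt (which does not change the CUDA/cuDNN libraries baked into the runtime image) is to install the newer TensorFlow wheel as a notebook-scoped library and check whether the runtime's existing CUDA stack satisfies it. A rough sketch, with the pinned version chosen only for illustration:

# Cell 1: install a newer TensorFlow on top of the runtime (notebook-scoped).
%pip install tensorflow==2.12.0

# Cell 2: restart Python so the new wheel is picked up.
dbutils.library.restartPython()

# Cell 3: verify the version and whether the GPU is still visible to TF.
import tensorflow as tf
print(tf.__version__)
print(tf.config.list_physical_devices("GPU"))  # an empty list suggests a CUDA/cuDNN mismatch

If the GPU list comes back empty, the bundled CUDA/cuDNN versions likely do not meet TF 2.12's requirements, and a custom Docker container via Databricks Container Services may be the more reliable route.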
zzy
by New Contributor III
  • 2050 Views
  • 3 replies
  • 2 kudos

Why is the PyTorch CUDA total memory not aligned with the memory size of the GPU cluster I created?

No matter what size of GPU cluster I create, the CUDA total capacity is always ~16 GB. Does anyone know what the issue is? The code I use to get the total capacity: torch.cuda.get_device_properties(0).total_memory

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Simon Zhang, hope everything is going great. Just wanted to check in to see if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so w...

2 More Replies
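A plausible explanation for the ~16 GB figure (offered as a guess, since the resolution above is truncated): p3-family instances use 16 GB V100 GPUs, and torch.cuda.get_device_properties(0).total_memory reports only device 0 on the node running the code, so larger p3 sizes add more 16 GB devices and more workers rather than one bigger device. A quick sketch to inspect every GPU visible to the current node:

import torch

# List every CUDA device on this node, not just device 0.
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GiB")

# Aggregate memory across the devices on this node.
total = sum(
    torch.cuda.get_device_properties(i).total_memory
    for i in range(torch.cuda.device_count())
)
print(f"Total across {torch.cuda.device_count()} device(s): {total / 1024**3:.1f} GiB")

On a multi-node cluster, each worker has its own GPUs, so the driver-side total will never reflect the whole cluster's GPU memory.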
jamesw
by New Contributor II
  • 2442 Views
  • 1 reply
  • 1 kudos

Ganglia not working with custom container services

Setup: custom Docker container starting from the "databricksruntime/gpu-conda:cuda11" base image layer; 10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12); multi-node, p3.8xlarge GPU compute. When I try to view Ganglia metrics I am met with "502 Bad Gatewa...

Latest Reply
Vivian_Wilfred
Databricks Employee
  • 1 kudos

Hi @James W, Ganglia is not available for custom Docker containers by default. This is a known limitation. However, you can try this experimental support for Ganglia in custom DCS: https://github.com/databricks/containers/tree/master/experimental/ub...

vishallakha
by New Contributor II
  • 1160 Views
  • 1 reply
  • 2 kudos

How to Enable Files in Repos in DBR 7.3 LTS ML?

We need a custom version of a GPU cluster with the following requirements for a certain project: Ubuntu 18.04; CUDA 10.1; Tesla T4 GPU; availability of the /Workspace/Repos folder. All of these requirements are available with DBR ML 7.3 LTS. But one critical compo...

Latest Reply
Debayan
Databricks Employee
  • 2 kudos

Hi, to work with non-notebook files in Databricks Repos, you must be running Databricks Runtime 8.4 or above: https://docs.databricks.com/files/workspace.html#configure-support-for-workspace-files

VictorP
by New Contributor
  • 1690 Views
  • 1 reply
  • 3 kudos

Resolved! Does Databricks run on GPU?

Does Databricks run on GPU?

Latest Reply
ron_defreitas
Contributor
  • 3 kudos

There is support for running on GPU, which will be beneficial to certain ML workloads. Clusters are configured to run on CPU by default, but you can choose GPU-based nodes during creation.

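To make the reply above a little more concrete, this is roughly what choosing GPU-based nodes looks like when a cluster is defined programmatically. The runtime string and instance types are illustrative assumptions (AWS naming); in the UI the equivalent choice is a GPU-enabled ML runtime plus GPU worker and driver types.

# Illustrative cluster definition with GPU nodes (example values, not recommendations).
gpu_cluster_spec = {
    "cluster_name": "gpu-ml-cluster",
    "spark_version": "13.3.x-gpu-ml-scala2.12",  # a GPU-enabled ML runtime (assumed version)
    "node_type_id": "g4dn.xlarge",               # GPU worker instance type (AWS example)
    "driver_node_type_id": "g4dn.xlarge",        # GPU driver instance type
    "num_workers": 2,
    "autotermination_minutes": 60,
}

# This dict can be POSTed to /api/2.0/clusters/create or used as the
# new_cluster block of a job definition.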
Anonymous
by Not applicable
  • 3028 Views
  • 4 replies
  • 2 kudos

Resolved! Anyone using RAPIDS and cuGraph on a current runtime?

We're in the process of migrating a large graph computation workload to NVIDIA RAPIDS + cuGraph for GPU acceleration. The package isn't part of the base runtime and is available through conda package management only, so it can't be installed via init sc...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Thanks @Prabakar Ammeappin, we're looking at this. Strangely, the last commit removed the RAPIDS libraries from the base cuda-images. We're adding them back in.

3 More Replies