How to utilize clustered gpu for large hf models
Hi,I am using clustered GPU(driver -1GPU and Worker-3GPU), and caching model data into unity catalog but while loading model checkpoint shards its always use driver memory and failed due insufficient memory.How to use complete cluster GPU while loadi...
- 296 Views
- 0 replies
- 0 kudos