
Unable to distribute the workload to different worker

BR_DatabricksAI
Contributor

Hello Team, 

I am unable to distribute the workload across different Databricks workers while using the Hugging Face GPT-2 LLM model. Jobs always run on a single node, even though the cluster is configured with both min and max workers set to 2.

I'd appreciate it if anyone could share any insight on this.

Thanks. 

3 REPLIES

Debayan
Esteemed Contributor III

Hi, you can check https://docs.databricks.com/en/compute/cluster-config-best-practices.html for cluster configuration best practices. About distributing workloads, could you please elaborate on how you expect this to work? In the general scenario, a single run is designed to use the whole cluster as long as it is healthy.

BR_DatabricksAI
Contributor

Hello, 

We have roughly 5K to 10K transcript files in ADLS Gen2, and we are using the Hugging Face GPT-2 model to train and serve. While serving the LLM, we expect the workload to be spread across the different cluster nodes; each file takes around 20-30 seconds to process and generate output.

We would like to run the load in batch mode so we can process the files concurrently and reduce the overall processing time, using T4 GPUs and scaling the workers from a minimum of 2 nodes to a maximum of 8.
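For illustration, the kind of distribution being asked about here is often done with a Spark pandas UDF, so that each worker loads the model and scores its own partition of transcripts instead of everything running on the driver. Below is a minimal sketch only, not a tested solution: the ADLS paths, column names, and partition count are placeholders, and `spark` is the SparkSession that Databricks notebooks provide.

```python
import pandas as pd
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import StringType

# Hypothetical input: one row per transcript read from ADLS Gen2
# (container/account names below are placeholders).
df = (
    spark.read.text("abfss://container@account.dfs.core.windows.net/transcripts/*.txt")
         .withColumnRenamed("value", "text")
)

@pandas_udf(StringType())
def generate(texts: pd.Series) -> pd.Series:
    # The model is loaded inside the UDF, so it is instantiated on the
    # executors (workers) rather than on the driver.
    from transformers import pipeline
    generator = pipeline("text-generation", model="gpt2", device=0)  # device=0 assumes one GPU per worker (e.g. T4)
    outputs = generator(texts.tolist(), max_new_tokens=50)
    return pd.Series([o[0]["generated_text"] for o in outputs])

# Repartition so there is at least one partition per worker; if all the data
# sits in a single partition, Spark will schedule the whole job on one node.
result = df.repartition(8).withColumn("generated", generate("text"))
result.write.mode("overwrite").parquet(
    "abfss://container@account.dfs.core.windows.net/transcripts_scored/"
)
```

The key point of this pattern is that the model is created inside the UDF (on the workers) and the data is split into multiple partitions, which is what lets the min/max worker setting actually take effect.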

 

Debayan
Esteemed Contributor III

Hi, it would be ideal to reach out to your Databricks Account Executive, so that the situation can be reviewed against your specific architecture and the best solution for this scenario can be provided.