Hello,
We have roughly 5K to 10K transcript files in ADLS Gen2, and we use the Hugging Face GPT-2 model for training and serving. We want to distribute the inference workload across different cluster nodes while serving the model, since it currently takes around 20-30 seconds to process a single file and generate the output.
We would like to run this workload in batch mode so we can process the files concurrently and reduce the overall processing time, using T4 GPU workers with autoscaling from a minimum of 2 nodes to a maximum of 8 nodes.
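For context, below is a minimal sketch of the kind of distributed batch inference we are aiming for, assuming a Spark cluster with T4 GPU workers (min 2 / max 8 autoscaling) and the transformers and torch libraries installed on the workers. The storage path, container names, and the "text" column are placeholders, not our actual values.

```python
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()

# Placeholder ADLS Gen2 path; replace with the real container/account.
transcripts_path = "abfss://container@account.dfs.core.windows.net/transcripts/"

# wholetext=True reads each transcript file as a single row.
df = spark.read.text(transcripts_path, wholetext=True) \
          .withColumnRenamed("value", "text")

# Cache the pipeline per Python worker process so the model is loaded
# onto each worker's GPU only once, not once per Arrow batch.
_generator = None

@pandas_udf(StringType())
def generate(texts: pd.Series) -> pd.Series:
    global _generator
    if _generator is None:
        from transformers import pipeline
        # device=0 targets the worker's T4 GPU.
        _generator = pipeline("text-generation", model="gpt2", device=0)
    outputs = _generator(texts.tolist(), max_new_tokens=128, truncation=True)
    return pd.Series([o[0]["generated_text"] for o in outputs])

# Repartition so the files spread across all available GPU workers,
# then run inference concurrently and persist the results.
results = df.repartition(8).withColumn("generated", generate("text"))
results.write.mode("overwrite").parquet(transcripts_path + "output/")
```

The idea is that each partition is processed independently on a worker, so adding nodes (up to the 8-worker maximum) should shorten the total wall-clock time roughly in proportion, even though each individual generation still takes 20-30 seconds.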