cancel
Showing results for 
Search instead for 
Did you mean: 
Machine Learning
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

DataBRObin
by New Contributor III
  • 1042 Views
  • 2 replies
  • 0 kudos

Running Keras model training with HorovodRunner works until the training function is exited ("The MPI_Query_thread() function was called after MPI_FINALIZE was invoked.")

I am running training of a Keras/Tensorflow deep learning model on a cluster of (for now) 2 workers and 1 driver (T4 GPU, 28GB, 4 core) using the Databricks provided HorovodRunner. It all seems to go well and the performance scales quite nicely over ...

  • 1042 Views
  • 2 replies
  • 0 kudos
Latest Reply
sean_owen
Honored Contributor II
  • 0 kudos

I personally suspect it's your callbacks. Can you remove all those state callbacks and see if that is it?

  • 0 kudos
1 More Replies
Tilo
by New Contributor
  • 1891 Views
  • 3 replies
  • 3 kudos

Resolved! MLFlow: How to load results from model and continue training

I'd like to continue / finetune training of an existing keras/tensorflow model. We use MLFlow to store the model. How can I load the wieght from an existing model to the model and continue "fit" preferable with a different learning rate.Just loading ...

  • 1891 Views
  • 3 replies
  • 3 kudos
Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Tilo Wünsche​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you.Thank...

  • 3 kudos
2 More Replies
Labels