How to save a model produced by distributed training?
I am trying to save a model after distributed training via the following code:
import sys
from spark_tensorflow_distributor import MirroredStrategyRunner
import mlflow.keras
mlflow.keras.autolog()
mlflow.log_param("learning_rate", 0.001)
import...
Latest Reply
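I think I finally worked this out. Here is the extra code to save the model only once, from the first node (partition 0):
import pyspark
import mlflow.keras

# Inside the training function run by MirroredStrategyRunner, after training finishes:
context = pyspark.BarrierTaskContext.get()
if context.partitionId() == 0:
    # Only the first barrier task logs the model, so it is saved exactly once.
    mlflow.keras.log_model(model, "mymodel")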
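A related way to pick the single saving worker, without touching the Spark task context, is the chief-worker rule from TensorFlow's multi-worker Keras guide: read the `TF_CONFIG` environment variable and let either the `chief` task or, when there is no chief, worker 0 do the one-off work. The sketch below assumes the runner sets `TF_CONFIG` on each worker (I believe spark_tensorflow_distributor does, with the task index matching the partition, but treat that as an assumption). The helper name `is_chief` is hypothetical, and the environment mapping is a parameter only so it can be checked without a cluster.

```python
import json
import os
from typing import Mapping, Optional


def is_chief(env: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical helper: True if this process should do one-off work such as logging the model.

    Assumes a TF_CONFIG-style cluster spec. If the cluster lists a "chief"
    task, only that task is chief; otherwise worker 0 is treated as chief.
    """
    env = os.environ if env is None else env
    raw = env.get("TF_CONFIG")
    if not raw:
        # No cluster spec at all: single-process training, so this process saves.
        return True
    config = json.loads(raw)
    task = config.get("task", {})
    task_type = task.get("type")
    task_index = task.get("index", 0)
    if task_type is None:
        # A spec without a task entry is treated as a single worker.
        return True
    if "chief" in config.get("cluster", {}):
        # An explicit chief exists, so worker 0 must NOT also save.
        return task_type == "chief"
    return task_type == "worker" and task_index == 0
```

Inside the training function you would then guard the save the same way as above, e.g. call `mlflow.keras.log_model(model, "mymodel")` only when `is_chief()` returns True. Checking for an explicit `chief` entry matters: without it, both the chief and worker 0 would log the model.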