I think I finally worked this out. Here is the extra code to save the model only once, from the first node:

import mlflow
from pyspark import BarrierTaskContext

context = BarrierTaskContext.get()
if context.partitionId() == 0:
    mlflow.keras.log_model(model, "mymodel")
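To illustrate why this works, here is a minimal, self-contained sketch of the "log only from one worker" pattern. In a real barrier-mode Spark job the partition id comes from pyspark.BarrierTaskContext.get().partitionId(); here it is passed as a plain argument (should_log_model is a hypothetical helper name) so the logic can be shown standalone.

```python
def should_log_model(partition_id: int) -> bool:
    # Only the first partition (the "1st node") persists the model,
    # so a run with N workers does not write N duplicate copies.
    return partition_id == 0

# Simulate four workers: only worker 0 would call mlflow.keras.log_model.
logged_by = [pid for pid in range(4) if should_log_model(pid)]
# logged_by contains only partition 0
```

Every worker executes the same training script, so without a guard like this each one would log its own copy of the model to the MLflow run.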
I guess spark_tensorflow_distributor is probably obsolete, since it has not been updated since 2020. Horovod (https://github.com/horovod) seems a better choice for using TensorFlow with Spark on Databricks.