06-03-2025 06:13 PM
This is a common issue with MLflow on Databricks, particularly when dealing with large experiments or numerous artifacts.
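The "filedescriptor out of range in select()" error is a ValueError that Python raises when select() is handed a file descriptor numbered at or above its fixed cap (FD_SETSIZE, typically 1024).
Here it typically points to resource exhaustion or connection pool issues with the Py4J gateway that bridges Python and Spark/JVM.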
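The most effective immediate solution is usually to log artifacts less often, so fewer descriptors are open at once, and to increase the file descriptor limits. Raising the limit on its own may not clear the error, since select() cannot use descriptor numbers past its fixed cap.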
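Before changing anything, it may help to check how many descriptors the driver process actually has open. The snippet below is a rough diagnostic sketch, not an official Databricks or MLflow tool: it assumes a Linux host (it reads /proc/self/fd), and fd_snapshot is a name made up for this example.

```python
import os
import resource


def fd_snapshot():
    """Return (open_count, highest_fd, soft_limit) for the current process.

    Hypothetical helper for illustration; Linux only.
    """
    # Each entry in /proc/self/fd is the number of an open descriptor.
    # The listing itself briefly opens one extra descriptor, so the
    # count can be off by one.
    fds = [int(name) for name in os.listdir("/proc/self/fd")]
    # Soft limit on open files for this process (what `ulimit -n` reports).
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    return len(fds), max(fds), soft_limit


open_count, highest_fd, soft_limit = fd_snapshot()
print(f"open={open_count} highest={highest_fd} soft_limit={soft_limit}")
```

If the highest descriptor number keeps climbing toward 1024 between epochs, something is probably leaking descriptors, and logging less often should slow that down.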
If the issue persists, try separating the training and logging phases entirely.
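One way to separate the two phases, sketched under some assumptions: the training loop writes its artifacts to a local directory and never calls MLflow, and a second step uploads the whole directory with a single mlflow.log_artifacts call afterwards. The function names train and log_phase, the JSON file layout, and the made-up loss value are placeholders for this example, not anything from your code.

```python
import json
import tempfile
from pathlib import Path


def train(num_epochs, out_dir=None):
    """Phase 1: train and write artifacts to local disk only (no MLflow calls)."""
    out_dir = Path(out_dir or tempfile.mkdtemp(prefix="artifacts_"))
    out_dir.mkdir(parents=True, exist_ok=True)
    for epoch in range(num_epochs):
        # Placeholder for a real training step; the loss value is made up.
        loss = 1.0 / (epoch + 1)
        # Open, write, and close each file right away so no descriptor lingers.
        with open(out_dir / f"epoch_{epoch:03d}.json", "w") as f:
            json.dump({"epoch": epoch, "loss": loss}, f)
    return out_dir


def log_phase(out_dir):
    """Phase 2: upload everything in one call once training has finished."""
    import mlflow  # imported here so phase 1 runs without touching MLflow

    with mlflow.start_run():
        # log_artifacts uploads every file under the given local directory.
        mlflow.log_artifacts(str(out_dir))
```

Calling log_phase(train(50)) would then run both phases back to back. Whether this fully avoids the error in your workspace is something you would need to verify, but it keeps the MLflow client idle while the training loop runs.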
Reduce Artifact Logging Frequency
Instead of logging artifacts at every epoch, log them at intervals:
# Log artifacts every 10 epochs instead of every epoch
if epoch % 10 == 0:
    mlflow.log_artifact(artifact_path)