Stable Baselines 3 (SB3) models can be tuned with Optuna for hyperparameter search, but parallelizing these searches with Joblib on a Spark backend (as in the classic scikit-learn example) commonly runs into trouble. The root problem is that SB3 reinforcement learning models are resource-intensive and not always "joblib-friendly" because of how they interact with underlying resources (e.g., GPUs, multiprocessing environments, or the PyTorch backend).
Optuna Parallelization with SB3
- Basic Optuna parallelization, such as using n_jobs with threads or processes, is supported for typical objective functions.
- Optuna generally works with SB3 for hyperparameter searches, as documented in several examples.
- Using joblib or joblib-spark with Optuna is feasible for standard tasks, but issues may arise with RL frameworks due to non-serializable objects and resource sharing (such as open GPU contexts or multiprocessing conflicts); a sketch of both patterns follows this list.
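As a concrete illustration, here is a minimal sketch of both patterns. The CartPole-v1 task, the PPO search space, and the trial/worker counts are illustrative assumptions, not part of any official recipe, and whether the joblib-spark block actually distributes trials depends on the Optuna version (newer releases dispatch trials through a thread pool rather than joblib).

```python
import optuna
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy


def objective(trial: optuna.Trial) -> float:
    # Hypothetical search space; adapt it to your own problem.
    lr = trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True)
    gamma = trial.suggest_float("gamma", 0.95, 0.9999)

    model = PPO("MlpPolicy", "CartPole-v1", learning_rate=lr, gamma=gamma, verbose=0)
    model.learn(total_timesteps=10_000)

    mean_reward, _ = evaluate_policy(model, model.get_env(), n_eval_episodes=5)
    return mean_reward


study = optuna.create_study(direction="maximize")

# (1) Built-in parallelism: worker threads inside one process. Fine for light
# objectives, but SB3 training is CPU/GPU heavy, so speedups are often modest.
study.optimize(objective, n_trials=20, n_jobs=2)

# (2) The joblib-spark pattern (the classic scikit-learn-style setup). Feasible in
# principle, but SB3 objectives frequently fail to pickle or silently fall back to
# sequential execution, which is the failure mode described above.
# from joblib import parallel_backend
# from joblibspark import register_spark
# register_spark()
# with parallel_backend("spark", n_jobs=4):
#     study.optimize(objective, n_trials=20, n_jobs=4)
```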
Special Considerations for SB3 and Spark
- Many users report that while joblib-based parallelization works for lightweight ML models, SB3's heavier dependencies and its use of multiprocessing for environments (e.g., SubprocVecEnv) can conflict with joblib's job management. With Spark or similar backends, this often results in trials running sequentially instead of in true parallel.
- For RL workloads, it is often necessary to run parallel jobs in isolated processes or separate Python environments (rather than threads or shared workers), because concurrent RL runs can contend for the same physical resources and fail or slow down.
SB3's Internal Parallelism
- SB3 has its own parallel environment system (SubprocVecEnv), which is the recommended way to parallelize experience collection (timesteps) within a single training run, not across multiple separate hyperparameter-optimization runs; see the first sketch after this list.
- To run many independent SB3 trainings in parallel (e.g., each with a different trial/hyperparameter set), use distributed or process-based orchestration: launch several independent Python scripts, or use cluster tools that isolate each Python process (Docker, Kubernetes, or separate Spark executors). A minimal launcher is sketched after this list.
- Using tools like Ray, or launching many processes directly and connecting the trials to a shared Optuna storage/database, is often more robust than joblib with Spark for RL, despite Ray's different API.
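For reference, a minimal sketch of SB3's own parallelism, assuming SB3 >= 2.0 (gymnasium-based), a CartPole-v1 environment, and four workers: SubprocVecEnv spreads environment copies across subprocesses to speed up experience collection for one training run, which is different from running several trials at once.

```python
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import SubprocVecEnv


def make_env():
    # Factory function: SubprocVecEnv expects callables, one per worker process.
    return gym.make("CartPole-v1")


if __name__ == "__main__":  # guard required by the subprocess start method on some platforms
    vec_env = SubprocVecEnv([make_env for _ in range(4)])  # 4 environment worker processes
    model = PPO("MlpPolicy", vec_env, verbose=0)
    model.learn(total_timesteps=50_000)  # one training run; only experience collection is parallel
    vec_env.close()
```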
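And an illustrative launcher for the process-based orchestration mentioned above: it spawns several fully isolated Python interpreters, each running a hypothetical worker.py (sketched at the end of this answer) that attaches to the same Optuna study.

```python
import subprocess
import sys

N_WORKERS = 4  # illustrative; size this to your CPU/GPU budget

# Each worker is a separate interpreter, so PyTorch state, GPU contexts, and any
# SubprocVecEnv subprocesses are never shared between concurrent trials.
procs = [subprocess.Popen([sys.executable, "worker.py"]) for _ in range(N_WORKERS)]
for p in procs:
    p.wait()
```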
Practical Recommendation
- While Spark/joblib parallelism works for simple tasks, it is not reliable or recommended for SB3 RL training runs because of resource conflicts and serialization limitations. Instead:
  - Use distributed trial runners (multiple processes/Python scripts).
  - Back Optuna with persistent storage (SQLite or MySQL) so all runners share information about completed trials; see the storage sketch after this list.
  - Consider Ray when you are ready for robust, production-friendly RL HPO, as its community support and feature set are more extensive.
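A minimal sketch of the shared-storage setup, with an assumed SQLite file and study name; it is created once and then loaded by every runner.

```python
import optuna

# Run once (or let the first worker create it): a database-backed study that all
# runners attach to. For many concurrent workers, MySQL/PostgreSQL usually behaves
# better than SQLite because of file-locking contention.
study = optuna.create_study(
    study_name="sb3_ppo_hpo",        # assumed name
    storage="sqlite:///sb3_hpo.db",  # assumed path; use mysql://... on a real cluster
    direction="maximize",
    load_if_exists=True,             # makes the call idempotent across workers
)
```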
In sum, for parallel SB3 hyperparameter search with Optuna, run individual worker jobs (Python scripts or subprocesses) connected to the same Optuna study in a central database, rather than relying directly on joblib's Spark backend.
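Putting it together, a hypothetical worker.py along these lines (started several times, as in the launcher sketch above) is the pattern being recommended; the study name, storage URL, and PPO search space are all assumptions.

```python
# worker.py -- one isolated trial runner; start several copies in parallel.
import optuna
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy


def objective(trial: optuna.Trial) -> float:
    lr = trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True)
    n_steps = trial.suggest_categorical("n_steps", [256, 512, 1024, 2048])

    model = PPO("MlpPolicy", "CartPole-v1", learning_rate=lr, n_steps=n_steps, verbose=0)
    model.learn(total_timesteps=20_000)

    mean_reward, _ = evaluate_policy(model, model.get_env(), n_eval_episodes=5)
    return mean_reward


if __name__ == "__main__":
    # Attach to the shared study; workers coordinate only through the database,
    # so no joblib/Spark backend is involved.
    study = optuna.create_study(
        study_name="sb3_ppo_hpo",
        storage="sqlite:///sb3_hpo.db",
        direction="maximize",
        load_if_exists=True,
    )
    study.optimize(objective, n_trials=10)  # each worker runs its share of trials
```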