05-01-2024 10:35 AM
I have been running into an issue when running a pymc-marketing model in a Databricks notebook. The cell that fits the model hangs and the progress bar stops moving; however, the code completes and dumps all the needed output into a folder. After the code completes I have to detach the notebook, since hitting Interrupt gets no response. I took a peek at the cluster logs and can confirm everything runs as expected (see screenshot!).
Any ideas what the issue is here, or have you run into the same problem?
05-09-2024 10:04 AM
Hi @Retired_mod,
Thank you for replying with some troubleshooting steps. Really appreciate it! I've added some more context below in red.
05-10-2024 08:02 AM
Hey @tim-mcwilliams,
I've got exactly, and I mean exactly, the same problem. Have you found any solution?
05-10-2024 11:03 AM
Hey @Piotrus321 ,
I have not found any solution as of yet. I've been messing with cluster configs, but it seems to be a bigger problem here than compute power.
05-13-2024 06:10 AM
Hey @tim-mcwilliams
I think I've found a solution that seems to work. It seems that the output displayed by pymc-marketing somehow crashed the Databricks cell. I disabled it by adding %%capture at the beginning of the cell and ; at the end of the cell.
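For anyone trying this: %%capture is an IPython cell magic, so it only takes effect when it is the very first line of the cell. A rough plain-Python counterpart, as a sketch only, is below. The helper name run_quietly is made up for illustration, and unlike %%capture it only catches text written through Python's sys.stdout and sys.stderr, not rich display output such as progress bar widgets.

```python
import io
from contextlib import redirect_stdout, redirect_stderr


def run_quietly(func, *args, **kwargs):
    """Call func while collecting Python-level stdout/stderr text.

    Hypothetical helper for illustration. It does not capture rich
    notebook display output or writes made directly to file descriptors.
    """
    out, err = io.StringIO(), io.StringIO()
    with redirect_stdout(out), redirect_stderr(err):
        result = func(*args, **kwargs)
    # Return the captured text so it can still be inspected afterwards.
    return result, out.getvalue(), err.getvalue()
```

Something like result, out, err = run_quietly(model.fit, X, y) would keep the text out of the cell output while still letting you print the tail of out if the fit fails (model, X and y are placeholders here).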
05-14-2024 09:37 AM
Hey @Piotrus321
Good find! I gave that a try but unfortunately I am getting the same behavior. I added %%capture to both the beginning and end of the cell that runs the model fitting code. The cell ran for about an hour and a half while I was doing some other work. I came back to it and canceled the cell, but it still hung on me.
My data isn't big: about 4 months' worth with about 6 variables. The same model runs in about 1.5 minutes on my local machine.
05-14-2024 10:15 AM - edited 05-14-2024 10:18 AM
This can be a frustrating situation where the notebook cell appears stuck, but the code execution actually finishes in the background. Here are some troubleshooting steps you can try to resolve this:
1. Restart vs Interrupt:
2. Check for Deadlocks:
3. Identify Long-Running Processes:
4. Resource Constraints:
5. Concurrency and Parallelism:
6. Logging and Debugging:
7. Update Libraries and Restart Kernel:
8. Consider Alternatives:
10-07-2024 02:59 AM
hi @tim-mcwilliams, Did you manage to fix the issue or identify the root cause?
It would be really helpful to know. Thanks a lot.
10-14-2024 11:27 AM
Did you manage to fix the issue or identify the root cause?
10-14-2024 10:54 PM - edited 10-14-2024 11:00 PM
Hi @tim-mcwilliams, @haseebasif, @Mickel, @g000gl, @Piotrus321,
I encountered a similar problem with a Prophet Markov chain Monte Carlo (MCMC) model that caused my browser to completely drain both the CPU and RAM. Even after the workflow was completed, attempting to open either a notebook or a script used to run the code resulted in the same issue.
I suspect the behaviour you are experiencing is similar to mine and is related to the large amount of log output generated during the Bayesian inference process.
In my case, this behaviour was caused by the cmdstanpy library, which is what runs the MCMC simulation, and it likely stems from the underlying compiled C++ code. Therefore, using the logging library to set different levels for the Python packages does not work.
Since the output is printed directly to the stdout and stderr streams by the underlying Stan processes, regardless of whether you are running the code from a notebook or a script file, you can redirect both streams while running the code with a custom context manager.
import os
import sys
from contextlib import contextmanager

@contextmanager
def suppress_stdout_stderr():
    with open(os.devnull, 'w') as devnull:
        old_stdout = sys.stdout
        old_stderr = sys.stderr
        try:
            sys.stdout = devnull
            sys.stderr = devnull
            yield
        finally:
            sys.stdout = old_stdout
            sys.stderr = old_stderr

with suppress_stdout_stderr():
    model = your_model_class()
    model.fit(your_data)
This way the context manager opens the null device for writing, points both streams at it, and everything written to them is discarded until the block exits.
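One caveat: reassigning sys.stdout and sys.stderr only affects text written through those Python objects. If the output really is written straight to the process's file descriptors by compiled code or child processes, a descriptor-level redirect may be needed. Below is a minimal sketch of that variant, assuming a POSIX-style system where descriptors 1 and 2 are stdout and stderr. The name suppress_fd_output is made up for illustration, and I have not verified that it fixes the hang on Databricks.

```python
import os
import sys
from contextlib import contextmanager


@contextmanager
def suppress_fd_output():
    """Point file descriptors 1 and 2 at the null device for the block.

    Hypothetical helper: assumes fds 1 and 2 are stdout and stderr.
    Child processes started inside the block inherit the redirect.
    """
    sys.stdout.flush()
    sys.stderr.flush()
    saved_out = os.dup(1)  # keep copies so the originals can be restored
    saved_err = os.dup(2)
    devnull = os.open(os.devnull, os.O_WRONLY)
    try:
        os.dup2(devnull, 1)
        os.dup2(devnull, 2)
        yield
    finally:
        # Flush anything still buffered before putting the originals back.
        sys.stdout.flush()
        sys.stderr.flush()
        os.dup2(saved_out, 1)
        os.dup2(saved_err, 2)
        for fd in (saved_out, saved_err, devnull):
            os.close(fd)
```

It is used the same way as the context manager above: wrap the model.fit(...) call in a with suppress_fd_output(): block.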
12-10-2024 10:16 AM
@tim-mcwilliams I'm not sure if you found a workaround or a fix for this issue.
We recently found another issue that I think is similar and related to the one described here: the integration between PyMC and the Databricks kernel does not work well, specifically the rendering logic of PyMC's progress bar. While we have yet to find the root cause, disabling the progress bar (progressbar=False) helped keep the notebook cell alive and responsive. You could give that a try.
Here is the workaround we used for the other use case involving PyMC:
trace_04 = pm.sample(nuts={'target_accept':0.8}, var_names=["alpha", "delta", "mu", "sig"], progressbar=False)