Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Increase stack size Databricks

tgen
New Contributor II

Hi everyone,

I'm currently running a shell script in a notebook and encountering a segmentation fault caused by the stack size limit. I'd like to increase the stack size with ulimit -s unlimited, but I can't get that setting to take effect in the notebook environment.

I am using:

2-12 Workers: 256-1,536 GB Memory, 64-384 Cores
1 Driver: 256 GB Memory, 64 Cores
Runtime: 15.2.x-scala2.12

Could anyone provide guidance on how to properly increase the stack size for my shell script using Notebooks in Databricks? Any tips or alternative solutions to avoid the segmentation fault would also be greatly appreciated.
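Roughly what I'm trying, sketched as a %sh cell (the script path here is just a placeholder for my actual script):

```shell
# Raise the soft stack limit as far as the hard limit allows, then run the
# script in the same cell so the new limit applies to its child processes.
ulimit -s unlimited 2>/dev/null || ulimit -s "$(ulimit -Hs)"
ulimit -s                # print the limit now in effect
# ./my_script.sh         # placeholder for the actual script
```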

2 REPLIES

Kaniz_Fatma
Community Manager

Hi @tgen, To increase the stack size for your shell script in Databricks Notebooks, follow these steps:

  1. Spark Configuration Property: With Databricks Runtime 12.2 LTS and above, you can set the spark.databricks.driver.maxReplOutputLength Spark configuration property. This property controls the maximum output length for the REPL (Read-Eval-Print Loop) in the notebook environment.

  2. Setting the Configuration: In your Databricks Notebook, navigate to the “Advanced Options” section. Add the following configuration:

    spark.databricks.driver.maxReplOutputLength <desired_value>
    

    Replace <desired_value> with the desired stack size limit (e.g., unlimited).

  3. Restart the Notebook: After making this change, restart your notebook to apply the new configuration.

If you encounter any issues or need further assistance, feel free to ask! 😊

 

tgen
New Contributor II

Hi @Kaniz_Fatma ,

Thanks for your response. I tried this and unfortunately I could not get it to work.

When I set spark.databricks.driver.maxReplOutputLength to unlimited in the cluster configuration, running the notebook produced this error: Failure starting repl. Try detaching and re-attaching the notebook. I tried detaching and re-attaching the cluster and kept getting the same message. Looking into it more, it appears the property must be set to an integer value. I also tried this in the web terminal and still got the segmentation fault.

Next, I tried setting spark.databricks.driver.maxReplOutputLength to a very high number (e.g. 500000000) and received the same segmentation fault error when running it in the Notebook and web terminal.
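For what it's worth, a quick check (sketched below, assuming a bash shell) of the soft versus hard stack limits shows whether ulimit -s unlimited can succeed at all; a non-root shell can't raise the soft limit past a finite hard limit:

```shell
# Compare soft vs. hard stack limits; if the hard limit is a finite number,
# `ulimit -s unlimited` fails with "Operation not permitted" for a
# non-root shell, which would match the behaviour described above.
echo "soft stack limit: $(ulimit -Ss)"
echo "hard stack limit: $(ulimit -Hs)"
```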

Do you have any other ideas of things I could try?
