07-04-2025 12:51 PM
Hi,
Is it possible to directly execute a Bash script or a binary executable from the operating system of a Databricks job compute node using a single node cluster?
I'm using Databricks Asset Bundles for job initialization and execution. When the job starts, I plan to clone a repository containing Bash scripts and binary executables so that these are available within the single node cluster environment.
The goal is to run one of these scripts or binaries directly from the compute node's operating system, passing parameters at runtime, and without intermediaries such as notebook %sh cells, Python subprocess calls, or init scripts.
07-05-2025 09:19 AM
Hi @jorperort, as far as I know there is no direct task type like bash_task for jobs that lets you run Bash scripts without using notebook %sh cells or Python's subprocess. Have you considered using init scripts for your cluster while setting up the job?
With an init script, you can execute Bash scripts to download files or even run commands on the cluster. You can try it and tell me how it goes.
2 weeks ago
Hello @jorperort, I did some research internally and have some tips/suggestions for you to consider:

| Mechanism | Can run Bash/binary directly on OS? | Notes |
|---|---|---|
| Init scripts | Yes | Only run at cluster startup, not at job runtime |
| Notebook %sh cell / subprocess | Indirect, via managed kernel | Not "raw OS", but runs in a shell subprocess |
| Databricks Asset Bundles | No | Can set up scripts, but must call them via supported interfaces |
| SSH/shell to node | No | Not available for job clusters |
| Custom job task (raw script) | No | Only notebook, SQL, dbt, JAR, or Python supported |
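
To illustrate the subprocess row in the table above, here is a minimal sketch of a helper that a Python script task could use to forward its parameters to a Bash script or binary. The name `run_tool` is hypothetical, and the sketch assumes the task's parameters reach the script as ordinary command-line arguments in `sys.argv`; it is one possible shape, not an official Databricks pattern.

```python
import subprocess
from typing import Sequence


def run_tool(executable: str, args: Sequence[str]) -> str:
    """Run an executable with the given arguments and return its stdout.

    Raises subprocess.CalledProcessError if the process exits non-zero,
    so a failing script or binary also fails the calling task.
    """
    completed = subprocess.run(
        [executable, *args],   # argument list, no shell string interpolation
        capture_output=True,   # collect stdout and stderr
        text=True,             # decode output as str
        check=True,            # raise on a non-zero exit code
    )
    return completed.stdout
```

In a task entry point this could be called as `run_tool(path_to_binary, sys.argv[1:])`, where `path_to_binary` is wherever the cloned repository placed the executable. The file needs its executable bit set for this to work.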
07-05-2025 11:17 AM
Good afternoon @eniwoke, and thank you very much for your comment.
Yes, I had already considered using init scripts, but I was hoping to find a different solution, since the idea is to pass parameters to these Bash scripts or compiled executables. I assume that if I approach it with init_scripts, I would have to retrieve those parameters through environment variables.
I'm not sure that would work, though, because I don't know whether the init script runs before the environment variables have been set.
I would prefer another solution. I'm not sure if spark-submit can be used for this: while it's possible to spark-submit a compiled JAR, I don't know whether another type of compiled file can be executed that way.
That's another point: if anyone has encountered this issue, it would be really helpful if they could share their experience. It would be greatly appreciated.
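
One way to sidestep the init-script timing question is to set the variables at job runtime, on the child process itself. The sketch below assumes the script is launched from a Python task rather than as an init script; `run_with_params` and the `PARAM_` prefix are made up for illustration and are not part of any Databricks API.

```python
import os
import subprocess
from typing import Mapping, Sequence


def run_with_params(command: Sequence[str], params: Mapping[str, str]) -> str:
    """Run a command with each parameter exposed as PARAM_<NAME> in its environment."""
    env = dict(os.environ)  # start from the inherited environment (PATH, etc.)
    for name, value in params.items():
        env[f"PARAM_{name.upper()}"] = value
    completed = subprocess.run(
        list(command),
        env=env,               # the child sees the extra variables
        capture_output=True,
        text=True,
        check=True,            # raise if the command exits non-zero
    )
    return completed.stdout
```

A Bash script started this way could then read `$PARAM_INPUT` for a parameter passed as `{"input": "..."}`, without depending on when cluster-level environment variables are applied.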
Monday
Thanks for the info and your response; everything is clear now.
In short, I wanted to know if it's possible to run a compiled Rust binary in Databricks. I tested locally and was able to process large datasets without Spark. Pandas wasn't enough, Polars in Python worked, but Rust with Polars gave much better performance.
I was hoping to use that binary in Databricks without a full cluster, as a single machine with Rust was enough. But from your explanation, I understand that's not currently possible.
So, if I want to leverage Rust, I should create bindings with Python (e.g., using PyO3) and call it from scripts or notebooks, combining Rust's performance with Python's ease of use.