Hey, I've hit almost this exact wall before and that Java thread dump in stderr is a very specific symptom — let me share what I've learned.
On your question about the JVM crash behavior
Yes, this is a known (if underdocumented) failure mode. What you're seeing isn't really a "crash" in the traditional sense. As far as I can tell, the Databricks cluster health monitor detects that a child process spawned during init is hanging or consuming resources unexpectedly, and it dumps JVM state as part of its diagnostic shutdown sequence. The misleading part is that the dump surfaces in the stderr of your init script log, making it look like your bash script caused it. In reality, the script likely started a process (apt-get or dpkg here) that was killed or blocked by something outside the script.
On the security hardening question
Almost certainly yes. If your workspace is deployed in a VNet with custom DNS or NSGs, or has an EDR agent running (SentinelOne and CrowdStrike are both known to do this), apt-get and dpkg can get killed mid-execution because they attempt to modify /etc or /lib, or run post-install hooks that the security agent flags. The frustrating thing is that the exit code gets swallowed or misattributed. You're not imagining it: the environment is locked down, and that's by design.
What actually works in this scenario
Rather than fighting the init script path, I'd step back and challenge the ODBC requirement entirely. If your goal is Python notebook → remote Databricks SQL Warehouse, you have options that don't require any OS-level driver installation:
- Fix the databricks-sql-connector timeout: this is almost certainly a network issue, not a library issue. The connector talks HTTPS on port 443 to the warehouse's HTTP path, same as your browser. If it's hanging, your cluster's egress is most likely being blocked at the NSG or firewall level for that specific destination. Check whether adb-<your-target-workspace>.azuredatabricks.net on port 443 is allowed outbound from your cluster's subnet. This is fixable without touching init scripts.
- If you must use ODBC: instead of installing via dpkg in an init script, try pre-building a custom Docker image with the ODBC driver baked in, and use Databricks Container Services to launch your cluster from that image. This sidesteps init script execution entirely and is generally tolerated even in hardened environments because you're not modifying the OS at runtime. Note that Container Services has to be enabled for the workspace by an admin.
- Spark remote execution: depending on what you're actually doing with the SQL Warehouse, spark.read with a JDBC URL pointed at the warehouse's built-in JDBC endpoint might be the cleanest path. Recent Databricks Runtime versions should already include the Databricks JDBC driver, so this shouldn't need a driver install either, but confirm that on your runtime version. It goes out over the same port 443 path, so it will hit the same block until the network side is sorted.
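For reference, here is a minimal sketch of the first option using databricks-sql-connector, with no OS-level driver involved. The environment variable names and the load_warehouse_config / query_warehouse helpers are my own placeholders, not anything from your setup, so adapt them to however you store the hostname, HTTP path, and token.

```python
import os
from typing import Mapping

# Hypothetical variable names; use whatever your secret scope or env provides.
REQUIRED_VARS = (
    "DATABRICKS_SERVER_HOSTNAME",  # e.g. adb-<your-target-workspace>.azuredatabricks.net
    "DATABRICKS_HTTP_PATH",        # the warehouse's HTTP path from its connection details
    "DATABRICKS_TOKEN",            # personal access token for the target workspace
)


def load_warehouse_config(env: Mapping[str, str] = os.environ) -> dict:
    """Collect connection settings and fail early if any are missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError(f"missing settings: {', '.join(missing)}")
    return {
        "server_hostname": env["DATABRICKS_SERVER_HOSTNAME"],
        "http_path": env["DATABRICKS_HTTP_PATH"],
        "access_token": env["DATABRICKS_TOKEN"],
    }


def query_warehouse(statement: str = "SELECT 1"):
    """Run one statement against the remote SQL Warehouse and return all rows."""
    # Imported here so a missing package shows up as a clear ImportError
    # (install with: pip install databricks-sql-connector).
    from databricks import sql

    with sql.connect(**load_warehouse_config()) as connection:
        with connection.cursor() as cursor:
            cursor.execute(statement)
            return cursor.fetchall()
```

Calling query_warehouse() with the default SELECT 1 is a cheap smoke test: if it hangs instead of returning or raising an auth error, that points back at the network path rather than the library.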
Bottom line
Your diagnosis is correct. The environment is locked down. But the right fix is probably unblocking the network path for databricks-sql-connector rather than trying to win a fight against the security layer with init scripts. Work with your infra/networking team to confirm outbound 443 is allowed to the target workspace hostname from your cluster subnet — that's usually the single thing standing between you and a working connection.
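If it helps that conversation, a stdlib-only check like the one below can be run from a notebook cell on the cluster to separate DNS failures from timed-out or refused TCP connections. It is only a sketch: check_outbound is a name I made up, and a successful TCP connect does not prove that a TLS-inspecting proxy or the warehouse itself will accept the request.

```python
import socket


def check_outbound(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return 'ok', or a short string describing why the connection failed."""
    # Step 1: resolve the name on its own, so custom-DNS problems are
    # reported separately from firewall or NSG problems.
    try:
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return f"dns_failure: {exc}"

    # Step 2: attempt a plain TCP connect. A silent drop at a firewall
    # usually shows up as a timeout; an active reject shows up as an OSError.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
```

Run it as check_outbound("adb-<your-target-workspace>.azuredatabricks.net") with your real target hostname filled in. A "timeout" result is the one that typically matches a dropped-egress rule, and it gives your networking team something concrete to look for.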
Hope this helps narrow it down.