Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Job fails on clusters only with library dependency

matmad
New Contributor III

Hello!

I have the following problem: all my job runs fail when the job uses a library. Even the most basic job (printing a string) with the most basic library package (no secondary dependencies; the script does not even import/use the library) fails with `Failed to reach the driver`:

[screenshot: matmad_0-1754491920584.png (the `Failed to reach the driver` error)]

* All my libraries are Python wheels.
* I use `spark_python_task` (but I also tested `python_wheel_task`, with the same error).
* If I use serverless compute (same script, same .whl), everything works fine.
* If I remove the package from the job's library section, everything works fine (as said: I don't even import/use the library).
* I also tried a different Python wheel package and created a wheel following https://docs.databricks.com/aws/en/jobs/how-to/use-python-wheels-in-workflows#step-6-run-the-job-and...
* It doesn't matter whether I configure the job via YAML in an asset bundle or "manually" in the UI.

The cluster logs don't really help me.

I really appreciate your ideas - thank you!

The script:

[screenshot: matmad_1-1754492300025.png (the script)]
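The actual script is only shown as a screenshot; based on the description ("print a string", no import of the wheel), a minimal `spark_python_task` script of this kind would look roughly like the following sketch (content assumed, not the poster's exact code):

```python
# Minimal job script (assumed sketch; the real script is in the screenshot above).
# It deliberately does NOT import the installed wheel -- the failure occurs anyway.

def main() -> str:
    message = "hello from the job"
    print(message)
    return message

if __name__ == "__main__":
    main()
```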

The job yml:

[screenshot: matmad_2-1754492381339.png (the job yml)]
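The job YAML is also only shown as a screenshot; a bundle job definition of the kind described (a `spark_python_task` plus a wheel in the task's `libraries` section) typically looks roughly like this. All names, paths, and cluster settings below are placeholders, not the poster's actual values:

```yaml
# Assumed sketch of the asset bundle job definition (placeholder values)
resources:
  jobs:
    my_job:
      tasks:
        - task_key: print_task
          spark_python_task:
            python_file: /Workspace/Shared/code/my_script.py
          existing_cluster_id: <cluster-id>  # the classic cluster on which the run fails
          libraries:
            - whl: /Workspace/Shared/code/my_package-0.1-py3-none-any.whl
```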


1 ACCEPTED SOLUTION


matmad
New Contributor III

I think I found a (the?) solution. The cluster was trying to connect to the legacy Hive metastore, so I

* set the default catalog for the workspace to the proper catalog
* disabled "Legacy access"
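To verify which catalog a cluster resolves by default (and confirm it is no longer the legacy `hive_metastore`), one can run, for example, this query in a notebook or the SQL editor:

```sql
-- Returns the session's default catalog; on an affected cluster this would
-- have been hive_metastore before the workspace default was changed.
SELECT current_catalog();
```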

These steps solved my `DriverError`. This log4j error message gave the hint: `Caused by: com.zaxxer.hikari.pool.HikariPool$PoolInitializationException: Failed to initialize pool: Could not connect to address=(host=consolidated-westeuropec2-prod-metastore-3.mysql.database.azure.com)(port=3306)(type=master) : Socket fail to connect to host:consolidated-westeuropec2-prod-metastore-3.mysql.database.azure.com, port:3306. Connect timed out`

I still don't know why installing the wheel caused these problems, but I now consider my problem solved.


3 REPLIES

matmad
New Contributor III

Maybe worth mentioning: if I install the library in a Python notebook using
`%pip install /Workspace/Shared/code/my_package-0.1-py3-none-any.whl`
everything works fine.

matmad
New Contributor III

My current workaround (I'm surprised that this works) is to install the library via "pypi" (actually an internal PyPI mirror using Artifactory) instead of via the .whl file. I would still be interested in the cause of, and a solution to, the problem, though.
Thanks!
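In job/bundle YAML terms, this workaround amounts to swapping the `whl` library entry for a `pypi` entry. The package name and index URL below are placeholders, not the poster's actual values:

```yaml
# Assumed sketch of the workaround (placeholder values)
libraries:
  - pypi:
      package: my-package==0.1
      # Optional index URL, e.g. the internal Artifactory mirror mentioned above
      repo: https://artifactory.example.com/artifactory/api/pypi/pypi-virtual/simple
```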
