
Using init scripts on UC enabled shared access mode clusters

ah0896
New Contributor III

I know that UC-enabled shared access mode clusters do not allow init scripts. I have tried multiple workarounds to get what my init script (pyodbc-install.sh) provides onto the cluster: installing the pyodbc package as a workspace library and attaching that library to the cluster, and using magic commands to install it directly in the notebooks. Both workarounds produced errors. Is there any other way to use init scripts on a UC-enabled shared access mode cluster?

Below is the error I receive when I try to install the pyodbc library. I am using an admin account as well.

Library installation attempted on the driver node of cluster 0516-171623-3i9on4it and failed. Please refer to the following error message to fix the library or contact Databricks support. Error Code: DRIVER_LIBRARY_INSTALLATION_FAILURE. Error Message: org.apache.spark.SparkException: Process List(/bin/su, libraries, -c, bash /local_disk0/.ephemeral_nfs/cluster_libraries/python/python_start_clusterwide.sh /local_disk0/.ephemeral_nfs/cluster_libraries/python/bin/pip install 'pyodbc 4.0.39' --disable-pip-version-check) exited with code 1. WARNING: The directory '/home/libraries/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you should use sudo's -H flag.
ERROR: Invalid requirement: 'pyodbc 4.0.39'
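
A note on the last line of this log: pip rejects `'pyodbc 4.0.39'` because a requirement string needs a version operator between the package name and the version, for example `pyodbc==4.0.39`. The sketch below is a small, hypothetical helper (`normalize_requirement` is not part of pip or Databricks) that rewrites the "name version" form into a pinned specifier. It only addresses the parse error; whether the install then succeeds on a shared access mode cluster is not established in this thread.

```python
import re

# Matches "name version" separated only by whitespace, e.g. "pyodbc 4.0.39".
_BARE_VERSION = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s+(\d[\w.]*)\s*$")


def normalize_requirement(spec: str) -> str:
    """Rewrite 'name version' as 'name==version'; return other input stripped."""
    match = _BARE_VERSION.match(spec)
    if match:
        name, version = match.groups()
        return f"{name}=={version}"
    return spec.strip()


# The string from the error log, then an already valid specifier.
print(normalize_requirement("pyodbc 4.0.39"))
print(normalize_requirement("pyodbc==4.0.39"))
```

When entering the package in the library UI, the equivalent is to type the pinned form (`pyodbc==4.0.39`) in the package field rather than the name followed by a space and the version.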

17 REPLIES

karthik_p
Esteemed Contributor

@Akarsh Hebbar Init scripts are supported only in single user access mode. Please switch to single user access mode and try again.

ah0896
New Contributor III

@karthik p Thank you for the answer, but I do understand that init scripts are not supported on UC shared clusters and are supported only on single user clusters. That's why I wanted to check whether there is a workaround to enable the ODBC driver on the shared UC cluster.

Anonymous
Not applicable

Hi @Akarsh Hebbar​ 

Thank you for posting your question in our community! We are happy to assist you.

To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?

This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance! 

karthik_p
Esteemed Contributor

@Akarsh Hebbar By any chance, are you customizing libraries? If not, try the workspace library configuration as shown in this article: https://docs.databricks.com/libraries/cluster-libraries.html

ah0896
New Contributor III

@karthik p No, I am not customizing anything. I first install the pyodbc library in the workspace and then try to install it on the cluster. It fails to install on the cluster and gives the error shown above.

rsenjins
New Contributor III

Hello @karthik_p, we are encountering the same problem. We have a custom library that we want to use in a convenient way on our multi-user UC cluster. Ideally, we want it installed by default on the cluster. However, it is not possible to fetch it from our private repository via the libraries interface, and init scripts are not available either. What is the best way to solve this? We'd rather not install it in every notebook session...

karthik_p
Esteemed Contributor

@aersen I have never tried this scenario, but can you create a global init script and see if it works? A global init script is available to all clusters, so it looks like it should support shared mode as well (DBR should be 13.1 or above, please). https://docs.databricks.com/clusters/init-scripts.html

rsenjins
New Contributor III

Thanks for the quick response. Unfortunately, the global init scripts did not work for the multi-user UC cluster. It seems quite a challenge to make a custom library available on such clusters. Do you know whether init script support for multi-user UC clusters is planned for Databricks versions after 13.1?

karthik_p
Esteemed Contributor

@rsenjins It looks like the response provided by @KKDataEngineer should help you. It looks like he used the workspace library configuration, and using a workspace library in the cluster should help.

@KKDataEngineer It looks like custom library configuration is not possible for workspace-level libraries. Did you make any custom changes during library configuration before using the library in shared access mode? As far as I can see, custom changes may not be supported.

@karthik_p We are still in the preliminary stages of this in our SaaS product. We have not tested thoroughly, but we have not seen any issues so far. We were able to consistently add libraries to all the clusters with the init script in a UC-enabled workspace.

No, we have not done any custom library configs. You could be right about the limitations.

karthik_p
Esteemed Contributor

@Akarsh Hebbar Can you go with a cluster-specific (cluster-installed) library and pick runtime 13.1 or above?
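
As a rough illustration of a cluster-installed library, the sketch below builds the kind of request body used for a cluster-scoped PyPI install. It assumes the Databricks Libraries REST API install endpoint (`POST /api/2.0/libraries/install`); the helper name `build_install_payload` is hypothetical, and the code only constructs and prints the payload without sending anything. The cluster ID and package version are the ones quoted in the error message earlier in this thread.

```python
import json


def build_install_payload(cluster_id: str, package: str, version: str) -> dict:
    """Build a request body for a cluster-scoped PyPI library install."""
    return {
        "cluster_id": cluster_id,
        # The package is pinned with "==", the form pip accepts.
        "libraries": [{"pypi": {"package": f"{package}=={version}"}}],
    }


payload = build_install_payload("0516-171623-3i9on4it", "pyodbc", "4.0.39")
print(json.dumps(payload, indent=2))
```

Sending this body would additionally need the workspace URL and an access token, which are left out here on purpose.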

ah0896
New Contributor III

@karthik p Yup, that's the only runtime version that supports library installs, so the cluster is on version 13.1.
