Databricks

wschoi · ‎03-07-2023

If possible, how can one go about installing a Python library with SDK dependencies like pyRFC? (https://github.com/SAP/PyRFC)

The SDK is OS-specific, and since we're running Databricks on AWS, I assume we would need the Linux build of the SDK.

And how would we point the cluster to the SDK files needed for library compilation? Do we have to keep the SDK files in S3 (outside of Databricks), or can we keep them in DBFS?

I am assuming environment variables pointing to the location of necessary files for the library to run can be declared within the cluster creation UI.

Sidenote: Team is working on building the infra to stage our Databricks, so hopefully I can get some test results in once that is done to beef up this question, and maybe even make it a post at some point.

Would appreciate insight! Thanks!

wschoi · ‎03-10-2023

To answer my own question:

It was actually a much simpler task than I had originally imagined.

Built a Python wheel (.whl) for the library that is compatible with the Ubuntu Linux VMs Databricks runs on in AWS.

Uploaded the dependency files (the SDK) to DBFS.

Set the cluster environment variables to point to the path of the dependency files.

Then installed the library on the cluster from the wheel.
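
For anyone wanting to script the steps above, here is a rough sketch of how the environment variables and the wheel could be expressed as a fragment of a job task definition. The variable names `SAPNWRFC_HOME` and `LD_LIBRARY_PATH` are what the PyRFC documentation describes for locating the SAP NW RFC SDK on Linux; the thread does not say exactly which variables were set, so treat them as an assumption. The helper name `job_task_fragment` and all paths are hypothetical, and the fragment is deliberately incomplete (no Spark version, node type, and so on).

```python
import json


def job_task_fragment(sdk_dir: str, wheel_uri: str) -> dict:
    """Build the SDK-related parts of a job task definition.

    sdk_dir   -- SDK directory as seen from the node's filesystem
                 (DBFS is usually reachable under /dbfs/...); hypothetical.
    wheel_uri -- location of the uploaded wheel, e.g. a dbfs:/ URI; hypothetical.
    """
    return {
        "new_cluster": {
            # Assumption: these are the variables PyRFC's docs describe;
            # the original post does not name the variables that were set.
            "spark_env_vars": {
                "SAPNWRFC_HOME": sdk_dir,
                "LD_LIBRARY_PATH": f"{sdk_dir}/lib:$LD_LIBRARY_PATH",
            },
        },
        # Wheel installed as a library attached to the task.
        "libraries": [{"whl": wheel_uri}],
    }


if __name__ == "__main__":
    fragment = job_task_fragment(
        sdk_dir="/dbfs/FileStore/sap/nwrfcsdk",            # hypothetical path
        wheel_uri="dbfs:/FileStore/wheels/<pyrfc-wheel>.whl",  # placeholder
    )
    print(json.dumps(fragment, indent=2))
```

The same `spark_env_vars` values can be typed into the environment variables box under the cluster's advanced options instead of being sent through the API.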

The same approach also worked when configuring job clusters.

It was actually easier than installing the library locally on an Arm64 chip.
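
If the import fails after installing the wheel, a quick check from a notebook cell can show whether the cluster actually sees the SDK. This is only a sketch: `check_sdk` is a hypothetical helper, and it assumes the SDK location is exposed through `SAPNWRFC_HOME` with the shared libraries in a `lib` subdirectory, as in the standard SAP NW RFC SDK layout.

```python
import os
from pathlib import Path
from typing import List, Mapping


def check_sdk(env: Mapping[str, str] = os.environ,
              var: str = "SAPNWRFC_HOME") -> List[str]:
    """Return a list of problems found; an empty list means the SDK looks reachable."""
    home = env.get(var)
    if not home:
        return [f"{var} is not set"]

    problems = []
    lib_dir = Path(home) / "lib"
    if not lib_dir.is_dir():
        problems.append(f"{lib_dir} does not exist")
    elif not any(lib_dir.glob("*.so*")):
        # The SDK ships shared objects such as libsapnwrfc.so in lib/.
        problems.append(f"no shared libraries (*.so) found in {lib_dir}")
    return problems


if __name__ == "__main__":
    issues = check_sdk()
    print("SDK looks reachable" if not issues else "\n".join(issues))
```

It only checks that the files are visible to the Python process; it does not prove that the dynamic loader can resolve them, so a successful `import pyrfc` is still the real test.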

Aviral-Bhardwaj · ‎03-07-2023

This should help: https://docs.databricks.com/clusters/init-scripts.html

wschoi · ‎03-08-2023

Will give this a try now that we have a workspace running. Thanks!

How can I cluster-install a c-Python library (pyRFC)?
