Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How can I install a C-based Python library (pyRFC) on a cluster?

wschoi
New Contributor III

If possible, how can one go about installing a Python library with SDK dependencies like pyRFC? (https://github.com/SAP/PyRFC)

The SDK dependencies vary by OS, and since we're running Databricks on AWS, I assume the SDK build would have to match the Linux OS of the cluster VMs.

And how would we point the cluster to look for the SDK files needed for library compilation? Do we have to keep the SDK files in an S3 storage (outside of Databricks), or can we keep these files within DBFS?

I'm assuming environment variables pointing to the files the library needs at runtime can be declared within the cluster creation UI.
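For reference, cluster-scoped environment variables are entered as plain KEY=value lines under the cluster UI's Advanced options. A hypothetical sketch, assuming the SDK files sit under a DBFS FileStore path (the paths are made up; `SAPNWRFC_HOME` is the variable the pyRFC docs use to locate the SAP NW RFC SDK):

```shell
# Cluster UI > Advanced options > Environment variables (hypothetical paths)
SAPNWRFC_HOME=/dbfs/FileStore/sap/nwrfcsdk
LD_LIBRARY_PATH=/dbfs/FileStore/sap/nwrfcsdk/lib
```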

Sidenote: Team is working on building the infra to stage our Databricks, so hopefully I can get some test results in once that is done to beef up this question, and maybe even make it a post at some point.

Would appreciate insight! Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions

wschoi
New Contributor III

To answer my own question: it was actually a much simpler task than I had originally imagined.

1. Created a Python wheel (.whl) package for the library, compatible with the Ubuntu Linux VMs Databricks uses on AWS.

2. Uploaded the dependency files (the SDK) onto DBFS storage.

3. Set the cluster environment variables to point to the dependency file path.

4. Installed the library on the cluster from the wheel.

This also worked when configuring job clusters, and was actually easier than installing the library locally on an Arm64 chip.
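The steps above can be sketched roughly as follows. All paths, the wheel name, and the SDK directory layout are assumptions for illustration; only the `export` lines are meant to run as-is (e.g. to test locally), while the build/upload commands are shown as comments since they need the Databricks CLI and a matching Linux build box:

```shell
# 1. Build a Linux-compatible wheel (on a machine matching the cluster OS):
#      pip wheel pyrfc --wheel-dir dist/
# 2. Upload the wheel and the SAP NW RFC SDK to DBFS via the Databricks CLI:
#      databricks fs cp dist/pyrfc-*.whl dbfs:/FileStore/wheels/
#      databricks fs cp -r nwrfcsdk/ dbfs:/FileStore/sap/nwrfcsdk/
# 3. Point the cluster at the SDK via environment variables
#    (set the same pair in the cluster UI):
export SAPNWRFC_HOME=/dbfs/FileStore/sap/nwrfcsdk
export LD_LIBRARY_PATH="$SAPNWRFC_HOME/lib"
# 4. Install the wheel on the cluster, e.g. from a notebook:
#      %pip install /dbfs/FileStore/wheels/pyrfc-*.whl
```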


4 REPLIES

Aviral-Bhardwaj
Esteemed Contributor III

This will help you: https://docs.databricks.com/clusters/init-scripts.html
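Following that doc, a cluster-scoped init script for this use case might look roughly like the sketch below. It is written out as a local file here for illustration; in practice it would be uploaded (e.g. to DBFS) and attached to the cluster. The DBFS paths, the wheel name, and the `/databricks/python/bin/pip` interpreter location are assumptions based on the init-script docs, not tested specifics:

```shell
# Generate a hypothetical init script that stages the SDK and installs the wheel.
cat > install-pyrfc.sh <<'EOF'
#!/bin/bash
# Copy the SAP NW RFC SDK from DBFS onto the node's local disk
cp -r /dbfs/FileStore/sap/nwrfcsdk /usr/local/sap/nwrfcsdk
# Make the SDK shared libraries visible to the dynamic linker
echo /usr/local/sap/nwrfcsdk/lib > /etc/ld.so.conf.d/nwrfcsdk.conf
ldconfig
# Install the pre-built wheel into the cluster's Python environment
/databricks/python/bin/pip install /dbfs/FileStore/wheels/pyrfc-*.whl
EOF
chmod +x install-pyrfc.sh
```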

AviralBhardwaj

Will give this a try now that we have a workspace running. Thanks!


Anonymous
Not applicable

Hi @Wonseok Choi,

Thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs.

Please help us select the best solution by clicking on "Select As Best" if it does.

Your feedback will help us ensure that we are providing the best possible service to you. Thank you!
