
How can I cluster-install a c-Python library (pyRFC)?

wschoi
New Contributor III

If possible, how can one go about installing a Python library with SDK dependencies like pyRFC? (https://github.com/SAP/PyRFC)

The SDK build depends on the OS, and since we're running Databricks on AWS, I assume we would need the Linux version of the SDK.

And how would we point the cluster to the SDK files needed to compile the library? Do we have to keep the SDK files in S3 (outside of Databricks), or can we keep them in DBFS?

I am assuming environment variables pointing to the location of necessary files for the library to run can be declared within the cluster creation UI.

Sidenote: Team is working on building the infra to stage our Databricks, so hopefully I can get some test results in once that is done to beef up this question, and maybe even make it a post at some point.

Would appreciate insight! Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions

wschoi
New Contributor III

To answer my own question:

It turned out to be a much simpler task than I had originally imagined.

I built a Python wheel (.whl) for the library that is compatible with the Ubuntu Linux VMs our clusters run on in AWS.

I uploaded the dependency files (the SDK) to DBFS.

I set the cluster environment variables to point to the path of the dependency files.

Then I installed the library on the cluster from the wheel.

The same approach also worked when configuring job clusters.

It was actually easier than installing the library locally on an Arm64 chip.
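For anyone following the same route, here is a minimal sketch of the environment-variable step. The post above does not say which variables were set; the PyRFC installation notes describe `SAPNWRFC_HOME` (the SDK folder) and, on Linux, the SDK's `lib` directory being visible to the dynamic loader, so those two names are assumed here. The DBFS location `/dbfs/FileStore/nwrfcsdk` and the helper names `sdk_env_vars` and `check_sdk_env` are made up for illustration.

```python
import os
from pathlib import Path


def sdk_env_vars(sdk_home: str) -> dict:
    """Build the variables to paste into the cluster's environment settings.

    Assumption: the SDK was unpacked so that its shared libraries sit in <sdk_home>/lib.
    """
    lib_dir = str(Path(sdk_home) / "lib")
    return {
        "SAPNWRFC_HOME": sdk_home,
        "LD_LIBRARY_PATH": lib_dir,
    }


def check_sdk_env(env=None) -> list:
    """Return a list of problems found; an empty list means the paths look consistent."""
    env = os.environ if env is None else env
    problems = []
    home = env.get("SAPNWRFC_HOME")
    if not home:
        problems.append("SAPNWRFC_HOME is not set")
    elif not (Path(home) / "lib").is_dir():
        problems.append(f"no lib/ directory under {home}")
    else:
        lib_dir = str(Path(home) / "lib")
        if lib_dir not in env.get("LD_LIBRARY_PATH", "").split(os.pathsep):
            problems.append(f"{lib_dir} is not on LD_LIBRARY_PATH")
    return problems


if __name__ == "__main__":
    # Hypothetical DBFS location, seen from the cluster nodes under /dbfs.
    for key, value in sdk_env_vars("/dbfs/FileStore/nwrfcsdk").items():
        print(f"{key}={value}")
    # Run this in a notebook cell on the cluster before importing pyrfc.
    print(check_sdk_env() or "SDK environment looks consistent")
```

The printed `KEY=value` lines are in the shape the cluster's environment variables field expects. If the cluster already relies on an existing `LD_LIBRARY_PATH`, you would probably want to append to it rather than replace it.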


4 REPLIES

Aviral-Bhardwaj
Esteemed Contributor III

Will give this a try now that we have a workspace running. Thanks!

Anonymous
Not applicable

Hi @Wonseok Choi

Thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs.

Please help us select the best solution by clicking on "Select As Best" if it does.

Your feedback will help us ensure that we are providing the best possible service to you. Thank you!
