Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How Install Pyrfc into AWS Databrick using Volumes

Miguel_Salas
New Contributor II

I'm trying to install PyRFC on a Databricks cluster (I've already tried r5.xlarge, m5.xlarge, and c6gd.xlarge instance types). I'm following this link:

https://community.databricks.com/t5/data-engineering/how-can-i-cluster-install-a-c-python-library-py...

But I am still having problems installing PyRFC.

I set the environment variables in the cluster settings and added an init script to the cluster.

2 REPLIES

Miguel_Salas
New Contributor II

More details about the error:

Library installation attempted on the driver node of cluster 0000-000000-00000 and failed. Please refer to the following error message to fix the library or contact Databricks support. Error code: DRIVER_LIBRARY_INSTALLATION_FAILURE. Error message: org.apache.spark.SparkException: Process List(/bin/su, libraries, -c, bash /local_disk0/.ephemeral_nfs/cluster_libraries/python/python_start_clusterwide.sh /local_disk0/.ephemeral_nfs/cluster_libraries/python/bin/pip install 'pyrfc==3.3.1' --disable-pip-version-check) exited with code 1.   Running command pip subprocess to install build dependencies
  Using pip 23.2.1 from /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/pip (python 3.11)
  Non-user install by explicit request
  Created build tracker: /tmp/pip-build-tracker-xk61ox_k
  Entered build tracker: /tmp/pip-build-tracker-xk61ox_k
  Created temporary directory: /tmp/pip-install-y24e5y8f
  Created temporary directory: /tmp/pip-ephem-wheel-cache-462pfql0
  Looking in indexes: https://pypi.org/simple, 

stbjelcevic
Databricks Employee

Thanks for the details. The PyRFC package is a Python binding around the SAP NetWeaver RFC SDK and requires the SAP NW RFC SDK to be present at build/run time; it does not work as a pure Python wheel on Linux without the SDK.

The project is archived and no longer maintained by SAP, so installation can be brittle, and your environment must match what the SDK supports.

On Linux, PyRFC builds from source and needs a C toolchain plus Cython; prebuilt wheels are only provided for some platforms (Windows/macOS and certain Ubuntu builds in the GitHub releases).

PyRFC 3.x added Python 3.11 support, so using DBR with Python 3.11 is fine from a version perspective; the failures you're seeing are almost certainly due to missing SDK headers/libs or environment variables not being visible to the install process.
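As a quick sanity check before re-running the install, you can inspect the driver environment from a notebook %sh cell. This is a sketch; the SDK path below is an assumption, so use whatever location your init script unpacks to:

```shell
# Hedged sketch: verify the driver environment before installing PyRFC.
python3 --version                          # PyRFC 3.x supports Python 3.11
which gcc || echo "gcc missing: install build-essential in the init script"
echo "SAPNWRFC_HOME=${SAPNWRFC_HOME:-<not set>}"
# Assumed SDK location; adjust to your init script's unpack path
ls "${SAPNWRFC_HOME:-/usr/local/sap/nwrfcsdk}/lib" 2>/dev/null \
  || echo "SAP NW RFC SDK libraries not found"
```

If SAPNWRFC_HOME prints as "not set" here but is set in the cluster's Spark environment variables, that is a strong hint the variable is not reaching the "libraries" user that runs pip, which is exactly what a system-wide export in an init script fixes.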

Step-by-step fix (Databricks cluster)
1) Acquire the SDK: Download SAP NW RFC SDK 7.50 PL12 from the SAP Support Portal and store the ZIP somewhere accessible (e.g., DBFS, S3). You need proper SAP credentials to access downloads.

2) Upload the SDK to DBFS: Put the ZIP at /dbfs/FileStore/nwrfcsdk/nwrfc750P_12.zip (adjust name as needed).

3) Create a global init script that:
- Installs build tooling and Cython (Linux).
- Unzips the SDK to a fixed path on every node.
- Exports SAPNWRFC_HOME and LD_LIBRARY_PATH so they're visible to all processes (including the "libraries" user).
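The steps above can be sketched as an init script like the following. This is a minimal sketch, assuming the SDK ZIP sits at /dbfs/FileStore/nwrfcsdk/nwrfc750P_12.zip and unpacks to a top-level nwrfcsdk/ directory; the target path /usr/local/sap/nwrfcsdk is also an assumption, so adjust both to your environment:

```shell
#!/bin/bash
# Hedged sketch of a cluster init script; paths and file names are
# assumptions based on the steps above - adjust to your environment.
set -euxo pipefail

# 1) Build tooling and Cython (PyRFC builds from source on Linux)
apt-get update -y
apt-get install -y build-essential unzip
/databricks/python/bin/pip install cython

# 2) Unzip the SAP NW RFC SDK to a fixed path on every node
SDK_ZIP=/dbfs/FileStore/nwrfcsdk/nwrfc750P_12.zip
SDK_HOME=/usr/local/sap/nwrfcsdk
mkdir -p /usr/local/sap
unzip -o "$SDK_ZIP" -d /usr/local/sap   # assumes ZIP contains nwrfcsdk/

# 3) Export SAPNWRFC_HOME and LD_LIBRARY_PATH system-wide so they are
#    visible to all processes, including the "libraries" install user
cat >/etc/profile.d/nwrfcsdk.sh <<EOF
export SAPNWRFC_HOME=$SDK_HOME
export LD_LIBRARY_PATH=$SDK_HOME/lib:\$LD_LIBRARY_PATH
EOF

# Also register the SDK libs with the dynamic linker
echo "$SDK_HOME/lib" > /etc/ld.so.conf.d/nwrfcsdk.conf
ldconfig
```

With this attached as a cluster-scoped (or global) init script and the cluster restarted, the cluster-level pip install of pyrfc should be able to find the SDK headers and libraries at build time.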