Hi @Kaviana, you can connect Databricks Enterprise to an on-premises Oracle database using the cx_Oracle Python module.
- To install the Oracle Client libraries, follow these steps:
  - Download the Oracle Instant Client Basic Light package.
  - Unzip the contents to a folder.
  - Upload the Instant Client folder to a cluster.
  - Copy the Instant Client folder to a system directory.
  - Set the environment variables LD_LIBRARY_PATH and ORACLE_HOME.
  - Install cx_Oracle from PyPI.
  - Restart the cluster.
- Automate the steps using an init script.
Here is a template:
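```python
%python
# Write the init script to DBFS; the final True overwrites any existing file.
dbutils.fs.put("dbfs:/databricks/<init-script-folder>/oracle_ctl.sh", """#!/bin/bash
# Download and unpack the Oracle Instant Client Basic Light package.
wget --quiet -O /tmp/instantclient-basiclite-linuxx64.zip https://download.oracle.com/otn_software/linux/instantclient/instantclient-basiclite-linuxx64.zip
unzip /tmp/instantclient-basiclite-linuxx64.zip -d /databricks/driver/oracle_ctl/
# If the archive unpacks into a versioned subfolder, point the paths below at that subfolder.
# Init scripts run as root, so sudo is not needed (and "sudo echo ... >> file" would not elevate the redirection).
echo 'export LD_LIBRARY_PATH="/databricks/driver/oracle_ctl/"' >> /databricks/spark/conf/spark-env.sh
echo 'export ORACLE_HOME="/databricks/driver/oracle_ctl/"' >> /databricks/spark/conf/spark-env.sh
""", True)
```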
- Configure the init script as a cluster-scoped init script.
- Install the cx_Oracle library as a cluster-installed library and restart your cluster.
- Permissions needed: "Can Attach To" to connect to a running cluster, and "Can Restart" to start the cluster if it is terminated.
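
Once the cluster has restarted, a quick check from a notebook can confirm the setup before you try a real query. The following is only a minimal sketch: the helper names (`check_oracle_env`, `easy_connect_dsn`, `fetch_one`) are hypothetical, the host, port, service name, and credentials are placeholders, and it assumes the variables exported in `spark-env.sh` are visible to the notebook's Python process.

```python
import os
from pathlib import Path


def check_oracle_env(env=None):
    """Return a list of problems with the Oracle client variables (empty if none)."""
    env = os.environ if env is None else env
    problems = []
    for name in ("LD_LIBRARY_PATH", "ORACLE_HOME"):
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not any(Path(p).is_dir() for p in value.split(":") if p):
            problems.append(f"{name} does not point to an existing directory: {value}")
    return problems


def easy_connect_dsn(host, port, service_name):
    """Build an Oracle Easy Connect string of the form host:port/service_name."""
    return f"{host}:{port}/{service_name}"


def fetch_one(user, password, dsn, sql="SELECT 1 FROM dual"):
    """Open a connection, run one query, and return its first row."""
    # Imported lazily so the helpers above also work where cx_Oracle is not installed.
    import cx_Oracle

    connection = cx_Oracle.connect(user=user, password=password, dsn=dsn)
    try:
        cursor = connection.cursor()
        cursor.execute(sql)
        return cursor.fetchone()
    finally:
        connection.close()


# Example usage in a notebook (all values are placeholders):
# print(check_oracle_env())
# dsn = easy_connect_dsn("<oracle-host>", 1521, "<service-name>")
# print(fetch_one("<user>", "<password>", dsn))
```

If `check_oracle_env()` reports a problem, the init script most likely did not run or the paths differ from the ones in the template. In practice you would also read the password from a secret scope rather than hard-coding it.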