Data Engineering

How to install SAP JDBC on job cluster via asset bundles

VicS
Contributor

I'm trying to use the SAP JDBC driver to read data in my Spark application, which I deploy via asset bundles with job compute.

I was able to install the SAP JDBC driver on a general-purpose cluster by adding the jar (com.sap.cloud.db.jdbc:ngdbc:2.25.9) in the UI via Libraries -> Install new -> Maven -> Maven coordinates.
Then I could execute the code without any problem.
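
For reference, the read itself is a standard Spark JDBC read along these lines (shown in PySpark for illustration; host, port, credentials, and table are placeholders):

df = (
    spark.read.format("jdbc")
    .option("driver", "com.sap.db.jdbc.Driver")  # the class that later goes missing on the job cluster
    .option("url", "jdbc:sap://<host>:<port>")   # placeholder SAP HANA connection details
    .option("user", "<user>")
    .option("password", "<password>")
    .option("dbtable", "<schema>.<table>")
    .load()
)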

But when I try to add the jar on a job cluster in a Databricks asset bundle via the Spark config, like this:

spark_conf:
      "spark.databricks.cluster.profile": "singleNode"
      "spark.master": "local[*]"
      "spark.sql.session.timeZone": "UTC"
      "spark.databricks.dataLineage.enabled": "true"
      "spark.databricks.delta.retentionDurationCheck.enabled": "false"
      "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension"
      "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog"
      "spark.jars.packages": "com.microsoft.azure:azure-storage:8.6.3,org.apache.hadoop:hadoop-azure:3.3.1,io.delta:delta-core_2.12:2.4.0,com.sap.cloud.db.jdbc:ngdbc:2.25.9"

I cannot get it to work: my application fails with the error 'java.lang.ClassNotFoundException: com.sap.db.jdbc.Driver', as if the driver is not being installed. The other jars are available, however.

In the Log4j output of the cluster, I can see the Spark config being set correctly:

spark.home=/databricks/spark
spark.jars.packages=com.microsoft.azure:azure-storage:8.6.3,org.apache.hadoop:hadoop-azure:3.3.1,io.delta:delta-core_2.12:2.4.0,com.sap.cloud.db.jdbc:ngdbc:2.25.9

I do not see the SAP jar referenced anywhere else in the log output.

What am I doing wrong, how can I fix it, and how can I further debug the problem?

Accepted Solution

szymon_dybczak
Esteemed Contributor III

Hi @VicS ,

To add a Maven package to a job task definition, specify a maven mapping in the task's libraries for each Maven package to be installed. For each mapping, specify the following:

resources:
  jobs:
    my_job:
      # ...
      tasks:
        - task_key: my_task
          # ...
          libraries:
            - maven:
                coordinates: com.databricks:databricks-sdk-java:0.8.1
            - maven:
                coordinates: com.databricks:databricks-dbutils-scala_2.13:0.1.4
                repo: https://mvnrepository.com/
                exclusions:
                  - org.scala-lang:scala-library:2.13.0-RC*

 

For more details, refer to the following documentation entry:

 Databricks Asset Bundles library dependencies | Databricks Documentation
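
Applied to your case, a minimal sketch could look like this (my_job, my_task, and my_cluster are placeholders for the names already in your bundle; the Maven coordinate is the one that worked for you in the UI):

resources:
  jobs:
    my_job:
      # ...
      tasks:
        - task_key: my_task
          job_cluster_key: my_cluster  # placeholder; must match your job cluster definition
          # ...
          libraries:
            - maven:
                coordinates: com.sap.cloud.db.jdbc:ngdbc:2.25.9

As far as I know, spark.jars.packages in the cluster Spark conf is not used to install libraries on job clusters: the conf value gets set (as your logs show), but the jar is never resolved onto the classpath. That is why the task-level libraries mapping is the way to go.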

