Context:
Hello, my team and I were running a periodic process as a workflow on a Job Compute cluster, but the libraries were not being installed (even though we had PIP_EXTRA_INDEX_URL defined in the cluster's environment variables). As a workaround, we created an all-purpose cluster and manually installed each library in the Libraries section of that cluster.
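For reference, this is roughly how we set the variable on the cluster (under Advanced options > Spark > Environment variables); the index URL below is a placeholder, not our real one:

```shell
# Cluster environment variable for pip to resolve packages
# from our private index (placeholder URL):
PIP_EXTRA_INDEX_URL=https://<our-private-index>/simple
```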
Problem:
Py4JJavaError: An error occurred while calling o1160.save.
: org.apache.spark.SparkClassNotFoundException: [DATA_SOURCE_NOT_FOUND] Failed to find data source: com.microsoft.sqlserver.jdbc.spark. Please find packages at `https://spark.apache.org/third-party-projects.html`. SQLSTATE: 42K02
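For context, the write that fails looks roughly like this (server, database, table, and credentials are placeholders, not our real values); the `.format()` name is the data source Spark says it cannot find, which suggests the spark-mssql-connector JAR is not on the cluster:

```python
# Sketch of the failing write; df is a Spark DataFrame produced earlier
# in the job. All connection values below are placeholders.
url = "jdbc:sqlserver://<server>:1433;databaseName=<database>"

def write_to_sql(df):
    (df.write
       .format("com.microsoft.sqlserver.jdbc.spark")  # data source from the error
       .mode("overwrite")
       .option("url", url)
       .option("dbtable", "<schema>.<table>")
       .option("user", "<user>")
       .option("password", "<password>")
       .save())  # raises DATA_SOURCE_NOT_FOUND on our cluster
```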
Also, when I check that website, there is nothing about this data source. What do you recommend?