Databricks Community

PabloCSD · ‎09-30-2024

Context:

Hello, I was using a workflow for a periodic process. My team and I were running it on a Job Compute, but the libraries were not working, even though we had PIP_EXTRA_INDEX_URL defined in the Environment Variables of the cluster. As a workaround, we now create a cluster and manually install each library in the cluster's Libraries section.
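An alternative to installing libraries by hand is to declare them on the job task itself. The sketch below assumes the Databricks Jobs API task format, where each PyPI library entry can carry its own `repo` (index URL) instead of relying on PIP_EXTRA_INDEX_URL. The task key, notebook path, package name and index URL are all hypothetical, and a complete task would also need a compute reference (e.g. a job cluster), which is omitted here.

```python
import json


def job_task_with_libraries(task_key, notebook_path, packages, index_url):
    """Build a (partial) Jobs API task spec that declares PyPI libraries
    on the task, each resolved from a private package index."""
    libraries = [
        # "repo" points this package at the private index instead of public PyPI
        {"pypi": {"package": pkg, "repo": index_url}}
        for pkg in packages
    ]
    return {
        "task_key": task_key,
        "notebook_task": {"notebook_path": notebook_path},
        "libraries": libraries,
    }


task = job_task_with_libraries(
    task_key="periodic_load",                          # hypothetical
    notebook_path="/Workspace/Shared/periodic_load",   # hypothetical
    packages=["my-internal-lib==1.0.0"],               # hypothetical
    index_url="https://pypi.example.com/simple",       # hypothetical
)
print(json.dumps(task, indent=2))
```

Declaring libraries per task keeps the job self-contained, so it does not depend on a long-lived cluster that someone configured manually.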

Problem:

Py4JJavaError: An error occurred while calling o1160.save.
: org.apache.spark.SparkClassNotFoundException: [DATA_SOURCE_NOT_FOUND] Failed to find data source: com.microsoft.sqlserver.jdbc.spark. Please find packages at `https://spark.apache.org/third-party-projects.html`. SQLSTATE: 42K02
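For context, this error is raised when a write such as `df.write.format("com.microsoft.sqlserver.jdbc.spark")...save()` runs on a cluster where that data source is not on the classpath. The sketch below only builds the connection options (host, database, table and credentials are hypothetical placeholders); the actual write is shown as a comment because it needs a running Spark session with the connector installed.

```python
MSSQL_FORMAT = "com.microsoft.sqlserver.jdbc.spark"


def mssql_write_options(host, database, table, user, password, port=1433):
    """Build options for a write through the SQL Server Spark connector."""
    url = f"jdbc:sqlserver://{host}:{port};databaseName={database};"
    return {"url": url, "dbtable": table, "user": user, "password": password}


opts = mssql_write_options("sql.example.com", "mydb", "dbo.target", "user", "secret")

# On a cluster where the connector jar is installed (not executed here):
# (df.write
#     .format(MSSQL_FORMAT)
#     .mode("append")
#     .options(**opts)
#     .save())   # raises DATA_SOURCE_NOT_FOUND if the connector is missing
```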

Also, when I check that website, I can't find anything relevant. What do you recommend?

PabloCSD · ‎09-30-2024

I installed this library on the cluster:

spark_mssql_connector_2_12_1_4_0_BETA.jar

A colleague gave me this .jar file. It appears to be available from the connector's releases page: https://github.com/microsoft/sql-spark-connector/releases.

With it installed, the task finishes successfully, so this is one way to fix the error.
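If you prefer to script the install rather than use the cluster UI, the Databricks Libraries API accepts a `jar` library entry. The sketch below assumes the `POST /api/2.0/libraries/install` endpoint; the workspace URL, token, cluster ID and Volumes path are hypothetical. Note that the `2_12` in the file name appears to indicate a build for Scala 2.12, which should match the cluster's Databricks Runtime.

```python
import json
import urllib.request


def library_install_payload(cluster_id, jar_path):
    """Request body asking the cluster to install a single JAR library."""
    return {"cluster_id": cluster_id, "libraries": [{"jar": jar_path}]}


def install_jar(host, token, cluster_id, jar_path):
    """Send the install request; needs a real workspace URL and token."""
    body = json.dumps(library_install_payload(cluster_id, jar_path)).encode("utf-8")
    req = urllib.request.Request(
        f"{host}/api/2.0/libraries/install",
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


# Hypothetical location where the jar was uploaded
JAR = "/Volumes/main/default/libs/spark_mssql_connector_2_12_1_4_0_BETA.jar"
payload = library_install_payload("0930-000000-example", JAR)
```

The same `{"jar": ...}` entry can also be placed in a job task's `libraries` list, so a Job Compute cluster gets the connector without a manual install.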

