Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Cannot install com.microsoft.azure.kusto:kusto-spark

blackcoffeeAR
Contributor

Hello,

I'm trying to install/update the library com.microsoft.azure.kusto:kusto-spark_3.0_2.12:3.1.x

I tried installing it from the Maven Central repository and with Terraform.

It worked previously, but now the installation always fails with this error:

│ Error: cannot create library: mvn:com.microsoft.azure.kusto:kusto-spark_3.0_2.12:3.1.7 failed: Library installation attempted on the driver node of cluster ...... and failed. Please refer to the following error message to fix the library or contact Databricks support. Error Code: DRIVER_LIBRARY_INSTALLATION_FAILURE. Error Message: java.util.concurrent.ExecutionException: java.io.FileNotFoundException: File file:/local_disk0/tmp/clusterWideResolutionDir/maven/ivy/jars/io.netty_netty-transport-native-kqueue-4.1.59.Final.jar does not exist

I'm not sure whether this is a Databricks, Maven, or library issue.

Installing other libraries, e.g. com.microsoft.azure:azure-eventhubs-spark_2.12:2.3.22, works.

Cluster configuration:

{
    "autoscale": {
        "min_workers": 1,
        "max_workers": 4
    },
    "cluster_name": "....",
    "spark_version": "11.3.x-scala2.12",
    "spark_conf": {
        "spark.locality.wait": "1800s"
    },
    "azure_attributes": {
        "first_on_demand": 1,
        "availability": "ON_DEMAND_AZURE",
        "spot_bid_max_price": -1
    },
    "node_type_id": "Standard_F8s",
    "driver_node_type_id": "Standard_F8s",
    "ssh_public_keys": [],
    "custom_tags": {},
    "cluster_log_conf": {
        "dbfs": {
            "destination": "dbfs:/cluster-logs"
        }
    },
    "spark_env_vars": {},
    "autotermination_minutes": 0,
    "enable_elastic_disk": true,
    "cluster_source": "UI",
    "init_scripts": [],
    "cluster_id": "....."
}

Thanks for helping

5 REPLIES

Hubert-Dudek
Esteemed Contributor III

Starting with 11.3, library installation has changed. Previously it ran in the control plane; now it runs in the data plane, so installation is done entirely on the cluster. Because of routing settings in your Azure environment, your cluster may not have access to https://repo.maven.apache.org/. Before, this was not a problem because the routing was handled on the Databricks side, but that architecture was not correct, which is why it was changed.
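One way to test the routing theory above is to run a quick reachability check from a notebook attached to the affected cluster. The sketch below is only an illustration: the helper names `artifact_dir_url` and `can_reach` are made up for this example, and it assumes the standard Maven repository layout (dots in the group ID become path segments). A `True` result only shows that the driver can reach the repository; it does not prove the library will install.

```python
import urllib.error
import urllib.request

MAVEN_CENTRAL = "https://repo.maven.apache.org/maven2"


def artifact_dir_url(coordinates, base=MAVEN_CENTRAL):
    """Map 'group:artifact:version' to its directory URL in the standard Maven layout."""
    group, artifact, version = coordinates.split(":")
    return f"{base}/{group.replace('.', '/')}/{artifact}/{version}/"


def can_reach(url, timeout=10, opener=urllib.request.urlopen):
    """Return True if the URL answers with an HTTP status below 400, else False.

    `opener` is injectable (hypothetical convenience) so the check can be
    exercised without real network access.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, OSError):
        # Covers DNS failures, blocked routes, timeouts, and HTTP 4xx/5xx errors.
        return False


# Example usage on the cluster (performs a real network request):
# can_reach(artifact_dir_url("io.netty:netty-transport-native-kqueue:4.1.59.Final"))
```

If this returns `True` on the cluster while the kusto-spark installation still fails, the routing explanation becomes less likely.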

But then I would not be able to install anything else, because other libraries are installed from the same repository (e.g. com.microsoft.azure:azure-eventhubs-spark_2.12:2.3.22)

phisolani
New Contributor II

I have the same problem with a slightly different version of the connector (only the minor version differs). I have a job that runs every hour, and this started happening on the 23rd of January. The error says the same thing:

Run result unavailable: job failed with error message
 Library installation failed for library due to user error for maven {
  coordinates: "com.microsoft.azure.kusto:kusto-spark_3.0_2.12:3.1.6"
}
. Error messages:
Library installation attempted on the driver node of cluster 0123-210011-798kk68l and failed. Please refer to the following error message to fix the library or contact Databricks support. Error Code: DRIVER_LIBRARY_INSTALLATION_FAILURE. Error Message: java.util.concurrent.ExecutionException: java.io.FileNotFoundException: File file:/local_disk0/tmp/clusterWideResolutionDir/maven/ivy/jars/io.netty_netty-transport-native-kqueue-4.1.59.Final.jar does not exist

I've tried installing "io.netty:netty-transport-native-kqueue:4.1.59.Final" from Maven and it worked. However, installing "com.microsoft.azure.kusto:kusto-spark_3.0_2.12:3.1.6" then fails with the following error, which points to a different jar file:

Library installation attempted on the driver node of cluster 0131-110547-yen1qo6z and failed. Please refer to the following error message to fix the library or contact Databricks support. Error Code: DRIVER_LIBRARY_INSTALLATION_FAILURE. Error Message: java.util.concurrent.ExecutionException: java.io.FileNotFoundException: File file:/local_disk0/tmp/clusterWideResolutionDir/maven/ivy/jars/io.netty_netty-resolver-dns-native-macos-4.1.59.Final.jar does not exist
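Since each failed attempt names a different missing jar, it can help to turn the file name in the error back into a Maven coordinate. The sketch below is a rough illustration: the pattern `<group>_<artifact>-<version>.jar` is inferred only from the two error messages in this thread, the function name `missing_coordinates` is made up, and it assumes the group ID contains no underscore and the version starts with a digit.

```python
import re

# Pattern inferred from the error messages in this thread (an assumption,
# not a documented format): .../ivy/jars/<group>_<artifact>-<version>.jar
MISSING_JAR = re.compile(
    r"ivy/jars/(?P<group>[^_/\s]+)_(?P<artifact>\S+?)-(?P<version>\d\S*?)\.jar does not exist"
)


def missing_coordinates(error_message):
    """Return 'group:artifact:version' for the jar named in the error, or None."""
    match = MISSING_JAR.search(error_message)
    if match is None:
        return None
    return "{group}:{artifact}:{version}".format(**match.groupdict())


message = (
    "java.io.FileNotFoundException: File file:/local_disk0/tmp/"
    "clusterWideResolutionDir/maven/ivy/jars/"
    "io.netty_netty-resolver-dns-native-macos-4.1.59.Final.jar does not exist"
)
print(missing_coordinates(message))
# prints io.netty:netty-resolver-dns-native-macos:4.1.59.Final
```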

Also, I can confirm that other libraries can be installed via maven, e.g., "com.databricks:databricks-jdbc:2.6.32" (see screenshot below)

image

Hello @Pedro Heleno Isolani,

My temporary workaround is to manually install the missing libraries/JARs:

io.netty:netty-transport-native-kqueue:4.1.59.Final

io.netty:netty-resolver-dns-native-macos:4.1.59.Final

io.netty:netty-transport-native-epoll:4.1.59.Final
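For anyone scripting this workaround rather than clicking through the UI, the three coordinates above can be expressed as a request body for the Databricks Libraries API (`POST /api/2.0/libraries/install`). This is only a sketch: the thread itself used the UI and Terraform, `install_payload` is a made-up helper, `<cluster-id>` is a placeholder, and listing the netty jars before the connector is a guess, since the thread does not say whether order matters.

```python
import json

# Connector version taken from the original error message in this thread.
KUSTO = "com.microsoft.azure.kusto:kusto-spark_3.0_2.12:3.1.7"

# The three jars listed in the workaround above.
NETTY_WORKAROUND = [
    "io.netty:netty-transport-native-kqueue:4.1.59.Final",
    "io.netty:netty-resolver-dns-native-macos:4.1.59.Final",
    "io.netty:netty-transport-native-epoll:4.1.59.Final",
]


def install_payload(cluster_id, coordinates):
    """Build the JSON body for a Maven library install request."""
    return {
        "cluster_id": cluster_id,
        "libraries": [{"maven": {"coordinates": c}} for c in coordinates],
    }


# "<cluster-id>" is a placeholder; substitute your own cluster ID.
payload = install_payload("<cluster-id>", NETTY_WORKAROUND + [KUSTO])
print(json.dumps(payload, indent=2))
```

The printed JSON can then be sent with whatever HTTP client and authentication you already use for the workspace.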

The main reason for the error is that, during installation of the kusto-spark library, the "netty*" jars are downloaded under different file names than the installer expects, for example:

netty-transport-native-kqueue-4.1.59.Final-osx-x86_64.jar

BR

Thanks @blackcoffee AR! The workaround did work for me!
