
Cannot install com.microsoft.azure.kusto:kusto-spark

blackcoffeeAR
Contributor

Hello,

I'm trying to install/update the library com.microsoft.azure.kusto:kusto-spark_3.0_2.12:3.1.x

I tried installing it from the Maven Central repository and via Terraform.

It worked previously, but now the installation always fails with this error:

│ Error: cannot create library: mvn:com.microsoft.azure.kusto:kusto-spark_3.0_2.12:3.1.7 failed: Library installation attempted on the driver node of cluster ...... and failed. Please refer to the following error message to fix the library or contact Databricks support. Error Code: DRIVER_LIBRARY_INSTALLATION_FAILURE. Error Message: java.util.concurrent.ExecutionException: java.io.FileNotFoundException: File file:/local_disk0/tmp/clusterWideResolutionDir/maven/ivy/jars/io.netty_netty-transport-native-kqueue-4.1.59.Final.jar does not exist

I'm not sure whether this is a Databricks, Maven, or library issue.

Installing other libraries, e.g. com.microsoft.azure:azure-eventhubs-spark_2.12:2.3.22, works.
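For reference, the same install can be attempted directly against the Databricks Libraries REST API (POST /api/2.0/libraries/install), which is what Terraform calls under the hood. A minimal sketch; the cluster id and workspace URL are placeholders, not values from this thread:

```python
import json
import urllib.request


def maven_install_payload(cluster_id, coordinates):
    """Build the request body for POST /api/2.0/libraries/install."""
    return {
        "cluster_id": cluster_id,
        "libraries": [{"maven": {"coordinates": c}} for c in coordinates],
    }


payload = maven_install_payload(
    "0123-456789-abcdefgh",  # placeholder cluster id
    ["com.microsoft.azure.kusto:kusto-spark_3.0_2.12:3.1.7"],
)
print(json.dumps(payload, indent=2))

# To actually send it (requires your workspace URL and a PAT token):
# req = urllib.request.Request(
#     "https://<workspace>.azuredatabricks.net/api/2.0/libraries/install",
#     data=json.dumps(payload).encode(),
#     headers={"Authorization": "Bearer <token>",
#              "Content-Type": "application/json"},
# )
# urllib.request.urlopen(req)
```

Submitting the install this way (or checking /api/2.0/libraries/cluster-status afterwards) can help narrow down whether the failure is specific to the Terraform provider or reproduces at the API level too.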

Clusters configuration:

{
    "autoscale": {
        "min_workers": 1,
        "max_workers": 4
    },
    "cluster_name": "....",
    "spark_version": "11.3.x-scala2.12",
    "spark_conf": {
        "spark.locality.wait": "1800s"
    },
    "azure_attributes": {
        "first_on_demand": 1,
        "availability": "ON_DEMAND_AZURE",
        "spot_bid_max_price": -1
    },
    "node_type_id": "Standard_F8s",
    "driver_node_type_id": "Standard_F8s",
    "ssh_public_keys": [],
    "custom_tags": {},
    "cluster_log_conf": {
        "dbfs": {
            "destination": "dbfs:/cluster-logs"
        }
    },
    "spark_env_vars": {},
    "autotermination_minutes": 0,
    "enable_elastic_disk": true,
    "cluster_source": "UI",
    "init_scripts": [],
    "cluster_id": "....."
}

Thanks for helping

5 REPLIES

Hubert-Dudek
Esteemed Contributor III

From DBR 11.3, library installation has changed: it used to run in the control plane, and now it runs in the data plane, so the installation is done entirely on the cluster. Maybe, because of routing settings in your Azure environment, your cluster doesn't have access to https://repo.maven.apache.org/ (before, routing was from the Databricks side, so there was no problem, but that architecture was not correct, which is why it was changed).

But then I would not be able to install anything else, since other libraries (e.g. com.microsoft.azure:azure-eventhubs-spark_2.12:2.3.22) install fine from the same repository.

phisolani
New Contributor II

I have the same problem with a slightly different patch version of the connector. I have a job that runs every hour, and this started happening from the 23rd of January onwards. The error is indeed the same:

Run result unavailable: job failed with error message
 Library installation failed for library due to user error for maven {
  coordinates: "com.microsoft.azure.kusto:kusto-spark_3.0_2.12:3.1.6"
}
. Error messages:
Library installation attempted on the driver node of cluster 0123-210011-798kk68l and failed. Please refer to the following error message to fix the library or contact Databricks support. Error Code: DRIVER_LIBRARY_INSTALLATION_FAILURE. Error Message: java.util.concurrent.ExecutionException: java.io.FileNotFoundException: File file:/local_disk0/tmp/clusterWideResolutionDir/maven/ivy/jars/io.netty_netty-transport-native-kqueue-4.1.59.Final.jar does not exist

I've tried installing "io.netty:netty-transport-native-kqueue:4.1.59.Final" from maven and it worked. However, when installing "com.microsoft.azure.kusto:kusto-spark_3.0_2.12:3.1.6" it shows the following error, pointing to a different jar file:

Library installation attempted on the driver node of cluster 0131-110547-yen1qo6z and failed. Please refer to the following error message to fix the library or contact Databricks support. Error Code: DRIVER_LIBRARY_INSTALLATION_FAILURE. Error Message: java.util.concurrent.ExecutionException: java.io.FileNotFoundException: File file:/local_disk0/tmp/clusterWideResolutionDir/maven/ivy/jars/io.netty_netty-resolver-dns-native-macos-4.1.59.Final.jar does not exist

Also, I can confirm that other libraries can be installed via maven, e.g., "com.databricks:databricks-jdbc:2.6.32" (see screenshot below)

(screenshot: databricks-jdbc library installed successfully)

Hello @Pedro Heleno Isolani​ ,

my temporary workaround is to manually install the missing libs/jars:

io.netty:netty-transport-native-kqueue:4.1.59.Final

io.netty:netty-resolver-dns-native-macos:4.1.59.Final

io.netty:netty-transport-native-epoll:4.1.59.Final.

The root cause of the error is that, during installation of the kusto-spark library, the "netty*" jars are downloaded under different file names (with platform classifiers), e.g.:

netty-transport-native-kqueue-4.1.59.Final-osx-x86_64.jar
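The workaround above can be scripted as one Libraries API install request that pins the three netty artifacts alongside the connector itself. A sketch, assuming the /api/2.0/libraries/install request shape; the cluster id is a placeholder:

```python
import json

# Workaround sketch: pin the missing netty artifacts explicitly so they are
# resolved under their plain file names, then install the Kusto connector.
WORKAROUND_COORDINATES = [
    "io.netty:netty-transport-native-kqueue:4.1.59.Final",
    "io.netty:netty-resolver-dns-native-macos:4.1.59.Final",
    "io.netty:netty-transport-native-epoll:4.1.59.Final",
    "com.microsoft.azure.kusto:kusto-spark_3.0_2.12:3.1.7",
]

payload = {
    "cluster_id": "0123-456789-abcdefgh",  # placeholder cluster id
    "libraries": [{"maven": {"coordinates": c}} for c in WORKAROUND_COORDINATES],
}
print(json.dumps(payload, indent=2))
```

The same four coordinates can equally be declared as separate databricks_library resources in Terraform; the point is only that the netty jars are requested explicitly rather than pulled in transitively.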

BR

Thanks @blackcoffee AR​! The workaround did work for me!
