Install Maven package on a serverless cluster

Livingstone
New Contributor II

My task is to export data from CSV/SQL into Excel format with minimal latency. To achieve this, I used a Serverless cluster.

Since PySpark cannot write XLSX natively, the Maven package spark-excel_2.12 is needed. However, Serverless clusters do not allow installing additional libraries the way regular clusters do, so I tried to install it through the REST API:

 

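import requests

# HOST, TOKEN, and CLUSTER_ID are defined elsewhere (workspace URL, access token, cluster ID).
headers = {
    'Authorization': f'Bearer {TOKEN}',
}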

data = {
  "cluster_id": CLUSTER_ID,
  "libraries": [
    {
      "maven": {
        "coordinates": "com.crealytics:spark-excel_2.13:3.4.1_0.19.0"
      }
    }
  ]
}


response = requests.post(f'{HOST}/api/2.0/libraries/install', headers=headers, json=data)
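
One way to confirm whether the install request above actually took effect is to query the library status afterwards. The following is only a minimal sketch: it assumes the `/api/2.0/libraries/cluster-status` endpoint applies to this cluster and returns a `library_statuses` list, and the helper names `fetch_library_statuses` and `summarize_library_statuses` are hypothetical. It uses only the standard library so it runs without extra packages.

```python
import json
import urllib.parse
import urllib.request


def fetch_library_statuses(host, token, cluster_id):
    """GET /api/2.0/libraries/cluster-status and return the parsed JSON body."""
    query = urllib.parse.urlencode({"cluster_id": cluster_id})
    request = urllib.request.Request(
        f"{host}/api/2.0/libraries/cluster-status?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)


def summarize_library_statuses(payload):
    """Return (coordinates, status) pairs for the Maven libraries in the payload.

    Assumes each entry looks like
    {"library": {"maven": {"coordinates": ...}}, "status": ...}.
    """
    summary = []
    for entry in payload.get("library_statuses", []):
        maven = entry.get("library", {}).get("maven")
        if maven:
            summary.append((maven.get("coordinates"), entry.get("status")))
    return summary
```

Calling `summarize_library_statuses(fetch_library_statuses(HOST, TOKEN, CLUSTER_ID))` would then list each Maven coordinate with its reported status, which helps distinguish a failed install from a package that installed but is not visible to the Spark session.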

 

However, when I try to save the file in Excel format, it fails with the following error:

 

[DATA_SOURCE_NOT_FOUND] Failed to find the data source: com.crealytics.spark.excel. Make sure the provider name is correct and the package is properly registered and compatible with your Spark version. SQLSTATE: 42K02

 

 

How can this issue be resolved? Are there any other ways to export an Excel file ASAP without waiting for the cluster to start up?