<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Cluster library installation fails in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/cluster-library-installation-fails/m-p/49540#M28578</link>
    <description>&lt;P&gt;Sure &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;, here are the details:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Summary&lt;/STRONG&gt;&lt;/P&gt;&lt;DIV&gt;1 Driver: 64&amp;nbsp;GB Memory, 8&amp;nbsp;Cores&lt;/DIV&gt;&lt;DIV&gt;Runtime: 11.3.x-scala2.12&lt;/DIV&gt;&lt;DIV&gt;Standard_L8s_v2&lt;/DIV&gt;&lt;DIV&gt;2&amp;nbsp;DBU/h&lt;/DIV&gt;&lt;P&gt;Databricks Runtime Version: 11.3 LTS (includes Apache Spark 3.3.0, Scala 2.12)&lt;/P&gt;&lt;P&gt;Node type: Standard_L8s_v2&lt;/P&gt;&lt;P&gt;Terminate after 10 minutes of inactivity.&lt;/P&gt;&lt;P&gt;Is that helpful?&lt;/P&gt;</description>
    <pubDate>Thu, 19 Oct 2023 15:19:27 GMT</pubDate>
    <dc:creator>jgen17</dc:creator>
    <dc:date>2023-10-19T15:19:27Z</dc:date>
    <item>
      <title>Cluster library installation fails</title>
      <link>https://community.databricks.com/t5/data-engineering/cluster-library-installation-fails/m-p/49515#M28572</link>
      <description>&lt;P&gt;Hello everyone,&lt;/P&gt;&lt;P&gt;I get a weird error when installing additional libraries on my cluster.&lt;/P&gt;&lt;P&gt;I have a predefined Databricks cluster (Standard_L8s_v2) as a Compute instance. I run pipelines on that cluster from Azure Data Factory (ADF). The pipeline consists of several tasks, each of which runs Python code.&lt;/P&gt;&lt;P&gt;I install my Python code from a prebuilt wheel. Additionally, I need to add four more libraries under Tasks - Settings - Additional Libraries and install them with pip. This step is necessary so that pytorch (one of the four libraries) is installed with GPU support, because the libraries and dependencies of the wheel are defined with poetry.&lt;/P&gt;&lt;P&gt;But the library installation fails regularly, and not in a consistent pattern. It does not always fail for the same task. Sometimes it fails for Task1 on one day and for Task2 on another. Sometimes all tasks succeed and sometimes all fail.&lt;/P&gt;&lt;P&gt;Here's the error message:&lt;/P&gt;&lt;P&gt;&lt;EM&gt;run failed with error message Library installation failed for library due to user error for pypi { package: "sentence-transformers==2.2.2" } Error messages: Library installation failed after PENDING for 10 minutes since cluster entered RUNNING state. Error Code: CHAUFFEUR_RPC_SERVER_UNAVAILABLE. Library request cannot reach driver node on cluster 0511-114900-l5r08j93. This could be caused by network connectivity to the driver node being temporarily down. If this doesn't self correct in a while, please check your network settings or contact Databricks Support.&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;I suspected this cluster setting:&amp;nbsp;&lt;EM&gt;Terminate after 10&amp;nbsp;minutes of inactivity&lt;/EM&gt;. My assumption was that the cluster is not in the RUNNING state while the libraries in the appended libraries section are being installed. Does that make sense?&lt;/P&gt;&lt;P&gt;I increased the timeout to 20 and then 30 minutes, but the installation still sometimes fails. It seems to be more stable at 40 minutes, but I have not really validated these results. It also fails more often when the pipeline is started by an automatic trigger than when I start it manually (I don't understand why).&lt;/P&gt;&lt;P&gt;Does anyone have an idea why the library installation fails? Let me know if you need further context!&lt;/P&gt;&lt;P&gt;Thanks for your help. Really appreciated!&lt;/P&gt;</description>
      <pubDate>Thu, 19 Oct 2023 08:06:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/cluster-library-installation-fails/m-p/49515#M28572</guid>
      <dc:creator>jgen17</dc:creator>
      <dc:date>2023-10-19T08:06:56Z</dc:date>
    </item>
    <item>
      <title>Re: Cluster library installation fails</title>
      <link>https://community.databricks.com/t5/data-engineering/cluster-library-installation-fails/m-p/49535#M28576</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;Thanks for your response. Below are the last 5 minutes of Log4J output, covering the library installation. Is that what you mean? The run fails almost exactly 10 minutes after the Log4J logs start. So could it be that the cluster does not count as active/running while the libraries are being installed, and is therefore shut down?&lt;/P&gt;&lt;P&gt;&lt;EM&gt;23/10/19 12:10:11 INFO PoolingHiveClient: Hive metastore connection pool implementation is HikariCP&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;23/10/19 12:10:11 INFO LocalHiveClientsPool: Create Hive Metastore client pool of size 1&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;23/10/19 12:10:11 INFO DriverCorral: DBFS health check ok&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;23/10/19 12:10:12 INFO HiveClientImpl: Warehouse location for Hive client (version 0.13.1) is dbfs:/user/hive/warehouse&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;23/10/19 12:10:12 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;23/10/19 12:10:12 INFO ObjectStore: ObjectStore, initialize called&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;23/10/19 12:10:12 INFO Persistence: Property datanucleus.fixedDatastore unknown - will be ignored&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;23/10/19 12:10:12 INFO Persistence: Property datanucleus.connectionPool.idleTimeout unknown - will be ignored&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;23/10/19 12:10:12 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;23/10/19 12:10:12 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;23/10/19 12:10:12 INFO HikariDataSource: HikariPool-1 - Started.&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;23/10/19 12:10:13 INFO HikariDataSource: HikariPool-2 - 
Started.&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;23/10/19 12:10:13 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;23/10/19 12:10:16 INFO ObjectStore: Initialized ObjectStore&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;23/10/19 12:10:16 INFO HiveMetaStore: Added admin role in metastore&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;23/10/19 12:10:16 INFO HiveMetaStore: Added public role in metastore&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;23/10/19 12:10:16 INFO HiveMetaStore: No user is added in admin role, since config is empty&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;23/10/19 12:10:16 INFO HiveMetaStore: 0: get_database: default&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;23/10/19 12:10:16 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_database: default &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;23/10/19 12:10:17 INFO HiveMetaStore: 0: get_database: default&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;23/10/19 12:10:17 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_database: default &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;23/10/19 12:10:17 INFO DriverCorral: Metastore health check ok&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;23/10/19 12:10:43 INFO SharedDriverContext: Successfully attached library dbfs:/mnt/cddm-DEV/application/MIR_task/cddm-0.x-py3-none-any.whl to Spark&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;23/10/19 12:10:43 INFO LibraryState: [Thread 132] Successfully attached library dbfs:/mnt/cddm-DEV/application/MIR_task/cddm-0.x-py3-none-any.whl&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;23/10/19 12:10:43 INFO SharedDriverContext: [Thread 132] attachLibrariesToSpark PythonPyPiPkgId(torch,Some(1.13.1),None,List())&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;23/10/19 12:10:43 INFO SharedDriverContext: Attaching Python lib: python-pypi;torch;;1.13.1; to clusterwide nfs path&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;23/10/19 12:10:43 INFO Utils: resolved command to be run: List(bash, 
/local_disk0/.ephemeral_nfs/cluster_libraries/python/python_start_clusterwide.sh, /local_disk0/.ephemeral_nfs/cluster_libraries/python/bin/pip, install, torch==1.13.1, --disable-pip-version-check)&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;23/10/19 12:13:57 INFO SharedDriverContext: Successfully attached library python-pypi;torch;;1.13.1; to Spark&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;23/10/19 12:13:57 INFO LibraryState: [Thread 132] Successfully attached library python-pypi;torch;;1.13.1;&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;23/10/19 12:13:57 INFO SharedDriverContext: [Thread 132] attachLibrariesToSpark PythonPyPiPkgId(lightning,Some(2.0.2),None,List())&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;23/10/19 12:13:57 INFO SharedDriverContext: Attaching Python lib: python-pypi;lightning;;2.0.2; to clusterwide nfs path&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;23/10/19 12:13:57 INFO Utils: resolved command to be run: List(bash, /local_disk0/.ephemeral_nfs/cluster_libraries/python/python_start_clusterwide.sh, /local_disk0/.ephemeral_nfs/cluster_libraries/python/bin/pip, install, lightning==2.0.2, --disable-pip-version-check)&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;23/10/19 12:14:56 INFO DataSourceFactory$: DataSource Jdbc URL: jdbc:mariadb://consolidated-westeuropec2-prod-metastore-3.mysql.database.azure.com:3306/organization257243788442763?useSSL=true&amp;amp;sslMode=VERIFY_CA&amp;amp;disableSslHostnameVerification=true&amp;amp;trustServerCertificate=false&amp;amp;serverSslCert=/databricks/common/mysql-ssl-ca-cert.crt&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;23/10/19 12:14:56 INFO HikariDataSource: metastore-monitor - Starting...&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;23/10/19 12:14:56 INFO HikariDataSource: metastore-monitor - Start completed.&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;23/10/19 12:14:56 INFO HikariDataSource: metastore-monitor - Shutdown initiated...&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;23/10/19 12:14:56 INFO HikariDataSource: metastore-monitor - Shutdown completed.&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;23/10/19 12:14:56 INFO 
MetastoreMonitor: Metastore healthcheck successful (connection duration = 194 milliseconds)&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;23/10/19 12:15:09 INFO SharedDriverContext: Successfully attached library python-pypi;lightning;;2.0.2; to Spark&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;23/10/19 12:15:09 INFO LibraryState: [Thread 132] Successfully attached library python-pypi;lightning;;2.0.2;&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;23/10/19 12:15:09 INFO SharedDriverContext: [Thread 132] attachLibrariesToSpark PythonPyPiPkgId(pytorch-lightning,Some(2.0.2),None,List())&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;23/10/19 12:15:09 INFO SharedDriverContext: Attaching Python lib: python-pypi;pytorch-lightning;;2.0.2; to clusterwide nfs path&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;23/10/19 12:15:09 INFO Utils: resolved command to be run: List(bash, /local_disk0/.ephemeral_nfs/cluster_libraries/python/python_start_clusterwide.sh, /local_disk0/.ephemeral_nfs/cluster_libraries/python/bin/pip, install, pytorch-lightning==2.0.2, --disable-pip-version-check)&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;23/10/19 12:15:11 INFO DriverCorral: DBFS health check ok&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;23/10/19 12:15:11 INFO HiveMetaStore: 0: get_database: default&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;23/10/19 12:15:11 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_database: default &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;23/10/19 12:15:11 INFO DriverCorral: Metastore health check ok&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;23/10/19 12:15:15 INFO SharedDriverContext: Successfully attached library python-pypi;pytorch-lightning;;2.0.2; to Spark&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;23/10/19 12:15:15 INFO LibraryState: [Thread 132] Successfully attached library python-pypi;pytorch-lightning;;2.0.2;&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;23/10/19 12:15:15 INFO SharedDriverContext: [Thread 132] attachLibrariesToSpark PythonPyPiPkgId(sentence-transformers,Some(2.2.2),None,List())&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;23/10/19 12:15:15 INFO SharedDriverContext: Attaching Python lib: 
python-pypi;sentence-transformers;;2.2.2; to clusterwide nfs path&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;23/10/19 12:15:15 INFO Utils: resolved command to be run: List(bash, /local_disk0/.ephemeral_nfs/cluster_libraries/python/python_start_clusterwide.sh, /local_disk0/.ephemeral_nfs/cluster_libraries/python/bin/pip, install, sentence-transformers==2.2.2, --disable-pip-version-check)&lt;/EM&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 19 Oct 2023 13:05:38 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/cluster-library-installation-fails/m-p/49535#M28576</guid>
      <dc:creator>jgen17</dc:creator>
      <dc:date>2023-10-19T13:05:38Z</dc:date>
    </item>
    <item>
      <title>Re: Cluster library installation fails</title>
      <link>https://community.databricks.com/t5/data-engineering/cluster-library-installation-fails/m-p/49540#M28578</link>
      <description>&lt;P&gt;Sure &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;, here are the details:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Summary&lt;/STRONG&gt;&lt;/P&gt;&lt;DIV&gt;1 Driver: 64&amp;nbsp;GB Memory, 8&amp;nbsp;Cores&lt;/DIV&gt;&lt;DIV&gt;Runtime: 11.3.x-scala2.12&lt;/DIV&gt;&lt;DIV&gt;Standard_L8s_v2&lt;/DIV&gt;&lt;DIV&gt;2&amp;nbsp;DBU/h&lt;/DIV&gt;&lt;P&gt;Databricks Runtime Version: 11.3 LTS (includes Apache Spark 3.3.0, Scala 2.12)&lt;/P&gt;&lt;P&gt;Node type: Standard_L8s_v2&lt;/P&gt;&lt;P&gt;Terminate after 10 minutes of inactivity.&lt;/P&gt;&lt;P&gt;Is that helpful?&lt;/P&gt;</description>
      <pubDate>Thu, 19 Oct 2023 15:19:27 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/cluster-library-installation-fails/m-p/49540#M28578</guid>
      <dc:creator>jgen17</dc:creator>
      <dc:date>2023-10-19T15:19:27Z</dc:date>
    </item>
  </channel>
</rss>

