Hi,
I need to ingest and transform historical climate data into a Delta table. The data is stored in NetCDF (.nc) format, which requires specific system-level C libraries as well as particular versions of Python libraries (e.g., numpy).
On my local machine, I resolved this using Anaconda, which installed the necessary libraries (xarray, netCDF4) and handled all dependencies seamlessly.
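For reference, the local environment was created with something along these lines (the Python version and package list are approximate, from memory):

```shell
# Create an isolated conda environment with the NetCDF stack.
# conda-forge ships the underlying C libraries (libnetcdf, HDF5)
# together with the Python bindings, which is why this "just works" locally.
conda create -n climate -c conda-forge python=3.10 xarray netCDF4 numpy
conda activate climate
```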
However, I'm encountering issues when trying to achieve the same on a Databricks cluster:
- Upgrading certain libraries (e.g., numpy) causes dependency conflicts, breaking the cluster's functionality.
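For example, a naive notebook-scoped install roughly like the following (exact package list is illustrative) is what triggered the conflicts for me:

```shell
# Run inside a Databricks notebook cell; %pip installs into the
# notebook-scoped environment on top of the preinstalled runtime packages,
# so upgrading numpy can clash with libraries pinned by the runtime.
%pip install --upgrade numpy xarray netCDF4
```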
I came across Databricks Container Services, which seems to allow full environment customization through custom Docker images:
https://docs.databricks.com/en/compute/custom-containers.html#enable
Is this the only way to install xarray and netCDF4 and to upgrade pre-installed libraries? Are there alternative approaches that won't compromise the cluster's stability?
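For context, the other option I was considering is a cluster-scoped init script, roughly like this (the apt package names and the pip path are my assumptions, not something I have verified on the runtime):

```shell
#!/bin/bash
# Hypothetical cluster-scoped init script: install the system-level
# NetCDF/HDF5 C libraries, then the Python bindings, on every node
# before Spark starts.
set -e
apt-get update && apt-get install -y libnetcdf-dev libhdf5-dev
/databricks/python/bin/pip install xarray netCDF4
```

I'm unsure whether this avoids the same numpy conflicts or just hits them at cluster start instead of in the notebook.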
Any help or guidance would be much appreciated!
Thanks!