Hi,
I need to ingest and transform historical climate data into a Delta table. The data is stored in NetCDF (.nc) format, which requires specific system-level C libraries as well as particular versions of Python libraries (e.g., numpy).
On my local machine, I resolved this using Anaconda, which installed the necessary libraries (xarray, netCDF4) and handled all dependencies seamlessly.
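For reference, the local environment was created with something along these lines (the Python version and package list are approximate, from memory):

```shell
# Create an isolated conda environment with the NetCDF stack.
# conda-forge ships the underlying C libraries (libnetcdf, HDF5)
# together with the Python bindings, which is why this "just works" locally.
conda create -n climate -c conda-forge python=3.10 xarray netCDF4 numpy
conda activate climate
```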
However, I'm encountering issues when trying to achieve the same on a Databricks cluster:
- Upgrading certain libraries (e.g., numpy) causes dependency conflicts, breaking the cluster's functionality.
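For example, a naive notebook-scoped install roughly like the following (exact package list is illustrative) is what triggered the conflicts for me:

```shell
# Run inside a Databricks notebook cell; %pip installs into the
# notebook-scoped environment on top of the preinstalled runtime packages,
# so upgrading numpy can clash with libraries pinned by the runtime.
%pip install --upgrade numpy xarray netCDF4
```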
I came across Databricks Container Services, which seems to allow full environment customization through custom Docker images:
https://docs.databricks.com/en/compute/custom-containers.html#enable
Is this the only way to install xarray and netCDF4 and to upgrade pre-installed libraries? Are there alternative approaches that won't compromise the cluster's stability?
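For context, the other option I was considering is a cluster-scoped init script, roughly like this (the apt package names and the pip path are my assumptions, not something I have verified on the runtime):

```shell
#!/bin/bash
# Hypothetical cluster-scoped init script: install the system-level
# NetCDF/HDF5 C libraries, then the Python bindings, on every node
# before Spark starts.
set -e
apt-get update && apt-get install -y libnetcdf-dev libhdf5-dev
/databricks/python/bin/pip install xarray netCDF4
```

I'm unsure whether this avoids the same numpy conflicts or just hits them at cluster start instead of in the notebook.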
Any help or guidance would be much appreciated!
Thanks!