Rasterio on shared/standard cluster has no access to proj.db

der
Valued Contributor

We are trying to use rasterio on a Databricks shared/standard cluster with DBR 17.1. Rasterio is installed directly on the cluster as a library.

Code:

import rasterio
rasterio.show_versions()

Output: 

rasterio info:
rasterio: 1.4.3
GDAL: 3.9.3
PROJ: 9.4.1
GEOS: 3.11.1
PROJ DATA: /databricks/native/proj-data
GDAL DATA: /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.12/site-packages/rasterio/gdal_data

System:
python: 3.12.3 (main, Aug 14 2025, 17:47:21) [GCC 13.3.0]
executable: /local_disk0/.ephemeral_nfs/envs/pythonEnv-b2347f39-219b-43b3-a5db-676ce38e43ca/bin/python
machine: Linux-5.15.0-1092-azure-x86_64-with-glibc2.39

Python deps:
affine: 2.4.0
attrs: 24.3.0
certifi: 2025.01.31
click: 8.1.7
cligj: 0.7.2
cython: 3.0.12
numpy: 2.1.3
click-plugins: None
setuptools: 74.0.0 

Test script:

import numpy as np
from rasterio.io import MemoryFile
from rasterio.transform import from_origin

meta = {
    "driver": "GTiff",
    "height": 1, "width": 1, "count": 1,
    "dtype": "uint8",
    "crs": "EPSG:2056", # <-- forces an EPSG lookup in proj.db
    "transform": from_origin(0, 1, 1, 1),
}

with MemoryFile() as mem:
    with mem.open(**meta) as ds:
        ds.write(np.zeros((1, 1, 1), dtype="uint8"))

Error:

CRSError: The EPSG code is unknown. PROJ: internal_proj_create_from_database: Cannot find proj.db

So the notebook's Python process apparently has no access to /databricks/native/proj-data, the directory that rasterio reports as PROJ DATA and where proj.db is stored.
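To separate "the file is missing" from "the file exists but this process may not read it", a small check like the following could be run in the same notebook. It is only a sketch: check_proj_db is a hypothetical helper name, and the directory is simply the one printed as PROJ DATA by rasterio.show_versions().

```python
import os


def check_proj_db(proj_dir):
    """Report whether proj.db under proj_dir is visible and readable to this process."""
    db = os.path.join(proj_dir, "proj.db")
    return {
        # os.path.isdir / isfile return False on any OS error, including denied access
        "dir_exists": os.path.isdir(proj_dir),
        "dir_listable": os.access(proj_dir, os.R_OK | os.X_OK),
        "db_exists": os.path.isfile(db),
        "db_readable": os.access(db, os.R_OK),
    }


# Path taken from the "PROJ DATA" line of the rasterio.show_versions() output.
print(check_proj_db("/databricks/native/proj-data"))
```

If every value comes back False, the directory is either absent or hidden from the user process. The check cannot distinguish between those two cases.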

I am aware that there are possible hacks and workarounds:

  1. putting proj.db somewhere users have access and then pointing PROJ at it with an environment variable such as PROJ_DATA
  2. writing the test script so that no EPSG lookup is needed
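For reference, workaround 1 might look roughly like the sketch below. Several things here are assumptions: use_proj_data_dir is a hypothetical helper, the /Volumes/... path is a made-up placeholder for wherever a readable copy of proj.db has been placed, and I have not verified that this behaves as intended on a shared/standard cluster. PROJ_DATA is the variable name used by PROJ 9.1 and later, and PROJ_LIB is the older name. The call should run before rasterio is imported for the first time in the session.

```python
import os


def use_proj_data_dir(candidates):
    """Point PROJ at the first candidate directory holding a readable proj.db.

    Returns the chosen directory, or None if no candidate qualifies.
    """
    for directory in candidates:
        db = os.path.join(directory, "proj.db")
        if os.path.isfile(db) and os.access(db, os.R_OK):
            os.environ["PROJ_DATA"] = directory  # name used by PROJ 9.1 and later
            os.environ["PROJ_LIB"] = directory   # older name, set for compatibility
            return directory
    return None


# Hypothetical location; replace it with a directory that really contains proj.db.
chosen = use_proj_data_dir(["/Volumes/my_catalog/my_schema/proj-data"])
print("PROJ data directory:", chosen)
```

When no candidate is readable, the function returns None and leaves the environment untouched, so the original error would still appear.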

However, these only work around the issue. Dear Databricks, could you please change these access rights, or explain why the current restriction is the intended behavior?