<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Rasterio on shared/standard cluster has no access to proj.db in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/rasterio-on-shared-standard-cluster-has-no-access-to-proj-db/m-p/135745#M50420</link>
    <description>&lt;P&gt;Exactly that part. In shared access mode they have specifically forbidden access to some paths. Of course you can complain about this, but they won't change this behavior just because your library doesn't work as expected, especially since other approaches, like using Volumes or another access mode, are available.&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Wed, 22 Oct 2025 16:22:38 GMT</pubDate>
    <dc:creator>szymon_dybczak</dc:creator>
    <dc:date>2025-10-22T16:22:38Z</dc:date>
    <item>
      <title>Rasterio on shared/standard cluster has no access to proj.db</title>
      <link>https://community.databricks.com/t5/data-engineering/rasterio-on-shared-standard-cluster-has-no-access-to-proj-db/m-p/135696#M50402</link>
      <description>&lt;P&gt;We are trying to use rasterio on a&amp;nbsp;Databricks shared/standard cluster with&amp;nbsp;DBR 17.1. Rasterio is installed directly on the cluster as a library.&amp;nbsp;&lt;/P&gt;&lt;P&gt;Code:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;import rasterio
rasterio.show_versions()&lt;/LI-CODE&gt;&lt;P&gt;Output:&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;rasterio info:&lt;BR /&gt;rasterio: 1.4.3&lt;BR /&gt;GDAL: 3.9.3&lt;BR /&gt;PROJ: 9.4.1&lt;BR /&gt;GEOS: 3.11.1&lt;BR /&gt;PROJ DATA: /databricks/native/proj-data&lt;BR /&gt;GDAL DATA: /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.12/site-packages/rasterio/gdal_data&lt;BR /&gt;&lt;BR /&gt;System:&lt;BR /&gt;python: 3.12.3 (main, Aug 14 2025, 17:47:21) [GCC 13.3.0]&lt;BR /&gt;executable: /local_disk0/.ephemeral_nfs/envs/pythonEnv-b2347f39-219b-43b3-a5db-676ce38e43ca/bin/python&lt;BR /&gt;machine: Linux-5.15.0-1092-azure-x86_64-with-glibc2.39&lt;BR /&gt;&lt;BR /&gt;Python deps:&lt;BR /&gt;affine: 2.4.0&lt;BR /&gt;attrs: 24.3.0&lt;BR /&gt;certifi: 2025.01.31&lt;BR /&gt;click: 8.1.7&lt;BR /&gt;cligj: 0.7.2&lt;BR /&gt;cython: 3.0.12&lt;BR /&gt;numpy: 2.1.3&lt;BR /&gt;click-plugins: None&lt;BR /&gt;setuptools: 74.0.0&amp;nbsp;&lt;/PRE&gt;&lt;P&gt;Test script:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;import numpy as np
from rasterio.io import MemoryFile
from rasterio.transform import from_origin

meta = {
    "driver": "GTiff",
    "height": 1, "width": 1, "count": 1,
    "dtype": "uint8",
    "crs": "EPSG:2056", # &amp;lt;-- forces an EPSG lookup in proj.db
    "transform": from_origin(0, 1, 1, 1),
}

with MemoryFile() as mem:
    with mem.open(**meta) as ds:
        ds.write(np.zeros((1, 1, 1), dtype="uint8"))&lt;/LI-CODE&gt;&lt;P&gt;&lt;SPAN class=""&gt;CRSError: &lt;/SPAN&gt;&lt;SPAN&gt;The EPSG code is unknown. PROJ: internal_proj_create_from_database: Cannot find proj.db&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;So we have no access to&amp;nbsp;&lt;STRONG&gt;/databricks/native/proj-data&lt;/STRONG&gt;, where &lt;STRONG&gt;proj.db&lt;/STRONG&gt; is stored.&lt;/P&gt;&lt;P&gt;I am aware that there are possible hacks and workarounds:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;putting proj.db somewhere users have access and then pointing an environment variable such as&amp;nbsp;&lt;SPAN&gt;PROJ_DATA&lt;/SPAN&gt; at it&lt;/LI&gt;&lt;LI&gt;rewriting the test script so that no lookup is needed&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;However, this just works around the issue.&amp;nbsp;Dear Databricks, could you please change these access rights or give me more insight into why the current setup is the right way?&lt;/P&gt;</description>
      <pubDate>Wed, 22 Oct 2025 13:01:00 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/rasterio-on-shared-standard-cluster-has-no-access-to-proj-db/m-p/135696#M50402</guid>
      <dc:creator>der</dc:creator>
      <dc:date>2025-10-22T13:01:00Z</dc:date>
    </item>
    <item>
      <title>Re: Rasterio on shared/standard cluster has no access to proj.db</title>
      <link>https://community.databricks.com/t5/data-engineering/rasterio-on-shared-standard-cluster-has-no-access-to-proj-db/m-p/135701#M50404</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/154226"&gt;@der&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;I guess this is related to a limitation of the standard/shared cluster access mode that you're using.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/aws/en/compute/standard-limitations#network-and-file-system-limitations" target="_blank"&gt;https://docs.databricks.com/aws/en/compute/standard-limitations#network-and-file-system-limitations&lt;/A&gt;&lt;/P&gt;&lt;P&gt;You can try dedicated access mode or, if that's not an option for you, point the path to proj.db at e.g. a UC Volume, which should be accessible from standard access mode.&lt;/P&gt;</description>
      <pubDate>Wed, 22 Oct 2025 13:29:41 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/rasterio-on-shared-standard-cluster-has-no-access-to-proj-db/m-p/135701#M50404</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2025-10-22T13:29:41Z</dc:date>
    </item>
    <item>
      <title>Re: Rasterio on shared/standard cluster has no access to proj.db</title>
      <link>https://community.databricks.com/t5/data-engineering/rasterio-on-shared-standard-cluster-has-no-access-to-proj-db/m-p/135719#M50414</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/110502"&gt;@szymon_dybczak&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Yes, on dedicated access mode it works fine, because we have access to&amp;nbsp;&lt;STRONG&gt;/databricks/native/proj-data&lt;/STRONG&gt;.&amp;nbsp;From a cost perspective, we want to stay on a&amp;nbsp;standard/shared cluster.&lt;/P&gt;&lt;P&gt;Pushing proj.db to a UC volume and changing the environment path is what I meant by hack/workaround 1.&amp;nbsp;&lt;/P&gt;&lt;P&gt;I still do not get why "proj-data" is in a path where we have no "read" access from a standard cluster.&lt;/P&gt;</description>
      <pubDate>Wed, 22 Oct 2025 14:57:52 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/rasterio-on-shared-standard-cluster-has-no-access-to-proj-db/m-p/135719#M50414</guid>
      <dc:creator>der</dc:creator>
      <dc:date>2025-10-22T14:57:52Z</dc:date>
    </item>
    <item>
      <title>Re: Rasterio on shared/standard cluster has no access to proj.db</title>
      <link>https://community.databricks.com/t5/data-engineering/rasterio-on-shared-standard-cluster-has-no-access-to-proj-db/m-p/135729#M50415</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/154226"&gt;@der&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;Can you try adding this to your test script?&lt;BR /&gt;&lt;BR /&gt;import os&lt;/P&gt;&lt;P&gt;os.environ["PROJ_LIB"]="/databricks/native/proj-data"&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Hopefully users have access to this path:&amp;nbsp;/databricks/native/proj-data&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 22 Oct 2025 15:16:07 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/rasterio-on-shared-standard-cluster-has-no-access-to-proj-db/m-p/135729#M50415</guid>
      <dc:creator>Chiran-Gajula</dc:creator>
      <dc:date>2025-10-22T15:16:07Z</dc:date>
    </item>
    <item>
      <title>Re: Rasterio on shared/standard cluster has no access to proj.db</title>
      <link>https://community.databricks.com/t5/data-engineering/rasterio-on-shared-standard-cluster-has-no-access-to-proj-db/m-p/135731#M50416</link>
      <description>&lt;P&gt;This is explained in the limitations section in my previous answer.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 22 Oct 2025 15:21:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/rasterio-on-shared-standard-cluster-has-no-access-to-proj-db/m-p/135731#M50416</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2025-10-22T15:21:56Z</dc:date>
    </item>
    <item>
      <title>Re: Rasterio on shared/standard cluster has no access to proj.db</title>
      <link>https://community.databricks.com/t5/data-engineering/rasterio-on-shared-standard-cluster-has-no-access-to-proj-db/m-p/135733#M50417</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/193103"&gt;@Chiran-Gajula&lt;/a&gt;,&amp;nbsp;&lt;/P&gt;&lt;P&gt;This is exactly the issue. On a standard cluster there is&amp;nbsp;&lt;SPAN&gt;no access to&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;/databricks/native/proj-data&lt;/STRONG&gt;&lt;SPAN&gt;,&amp;nbsp;where&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;proj.db&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;is stored.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;So your proposed solution won't work.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 22 Oct 2025 15:22:41 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/rasterio-on-shared-standard-cluster-has-no-access-to-proj-db/m-p/135733#M50417</guid>
      <dc:creator>der</dc:creator>
      <dc:date>2025-10-22T15:22:41Z</dc:date>
    </item>
    <item>
      <title>Re: Rasterio on shared/standard cluster has no access to proj.db</title>
      <link>https://community.databricks.com/t5/data-engineering/rasterio-on-shared-standard-cluster-has-no-access-to-proj-db/m-p/135742#M50418</link>
      <description>&lt;P&gt;To make it easier:&amp;nbsp;&lt;/P&gt;&lt;P&gt;"Standard compute runs commands as a low-privilege user forbidden from accessing sensitive parts of the filesystem.&lt;/P&gt;&lt;P&gt;POSIX-style paths (/) for DBFS are not supported."&lt;/P&gt;</description>
      <pubDate>Wed, 22 Oct 2025 15:54:00 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/rasterio-on-shared-standard-cluster-has-no-access-to-proj-db/m-p/135742#M50418</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2025-10-22T15:54:00Z</dc:date>
    </item>
    <item>
      <title>Re: Rasterio on shared/standard cluster has no access to proj.db</title>
      <link>https://community.databricks.com/t5/data-engineering/rasterio-on-shared-standard-cluster-has-no-access-to-proj-db/m-p/135743#M50419</link>
      <description>&lt;P&gt;Which limitations do you mean?&lt;/P&gt;&lt;P&gt;&lt;EM&gt;Standard compute runs commands as a low-privilege user forbidden from accessing sensitive parts of the filesystem.&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;I am not sure that proj-data is sensitive data.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="der_0-1761147625449.png" style="width: 588px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/20962i6F27EAA5E6399DD6/image-dimensions/588x97?v=v2" width="588" height="97" role="button" title="der_0-1761147625449.png" alt="der_0-1761147625449.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;So it would be simple for Databricks to change the group to "spark-users" and adjust the permissions.&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;sudo chown -R root:spark-users native&lt;BR /&gt;sudo chmod -R 750 native&lt;BR /&gt;sudo chmod g+s native&lt;/PRE&gt;&lt;P&gt;Then no workaround would be needed. They did the same for "licenses", "python3", ...&lt;/P&gt;</description>
      <pubDate>Wed, 22 Oct 2025 15:54:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/rasterio-on-shared-standard-cluster-has-no-access-to-proj-db/m-p/135743#M50419</guid>
      <dc:creator>der</dc:creator>
      <dc:date>2025-10-22T15:54:13Z</dc:date>
    </item>
    <item>
      <title>Re: Rasterio on shared/standard cluster has no access to proj.db</title>
      <link>https://community.databricks.com/t5/data-engineering/rasterio-on-shared-standard-cluster-has-no-access-to-proj-db/m-p/135745#M50420</link>
      <description>&lt;P&gt;Exactly that part. In shared access mode they have specifically forbidden access to some paths. Of course you can complain about this, but they won't change this behavior just because your library doesn't work as expected, especially since other approaches, like using Volumes or another access mode, are available.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 22 Oct 2025 16:22:38 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/rasterio-on-shared-standard-cluster-has-no-access-to-proj-db/m-p/135745#M50420</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2025-10-22T16:22:38Z</dc:date>
    </item>
    <item>
      <title>Re: Rasterio on shared/standard cluster has no access to proj.db</title>
      <link>https://community.databricks.com/t5/data-engineering/rasterio-on-shared-standard-cluster-has-no-access-to-proj-db/m-p/137921#M50832</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/110502"&gt;@szymon_dybczak&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;DIV&gt;Yes, we could replicate the content of /databricks/native/proj-data to a volume, cloud-native storage, DBFS (which we have deactivated, by the way) or somewhere else. But after that, we have to change the environment variable PROJ_DATA&amp;nbsp;to point to the new location. If we change the environment variable, we introduce new issues. Guess what: the native &lt;A href="https://learn.microsoft.com/en-gb/azure/databricks/sql/language-manual/sql-ref-st-geospatial-functions" target="_self"&gt;Databricks geospatial functions&lt;/A&gt; rely on the same environment variable and no longer work correctly!&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;And the library (rasterio) uses the PROJ system library, as it should.&amp;nbsp;&lt;SPAN&gt;Additional information about rasterio and &lt;/SPAN&gt;&lt;SPAN&gt;PROJ_DATA:&lt;/SPAN&gt;&lt;DIV&gt;&lt;A title="https://rasterio.readthedocs.io/en/stable/faq.html#why-can-t-rasterio-find-proj-db-rasterio-from-pypi-versions-1-2-0" href="https://rasterio.readthedocs.io/en/stable/faq.html#why-can-t-rasterio-find-proj-db-rasterio-from-pypi-versions-1-2-0" target="_blank" rel="noopener"&gt;https://rasterio.readthedocs.io/en/stable/faq.html#why-can-t-rasterio-find-proj-db-rasterio-from-pypi-versions-1-2-0&lt;/A&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Thu, 06 Nov 2025 10:15:10 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/rasterio-on-shared-standard-cluster-has-no-access-to-proj-db/m-p/137921#M50832</guid>
      <dc:creator>der</dc:creator>
      <dc:date>2025-11-06T10:15:10Z</dc:date>
    </item>
    <item>
      <title>Re: Rasterio on shared/standard cluster has no access to proj.db</title>
      <link>https://community.databricks.com/t5/data-engineering/rasterio-on-shared-standard-cluster-has-no-access-to-proj-db/m-p/137925#M50833</link>
      <description>&lt;P&gt;Current workaround:&lt;BR /&gt;If you select the "Photon" engine on a Standard/Shared cluster, they change the access rights of&amp;nbsp;&lt;STRONG&gt;/databricks/native/proj-data&amp;nbsp;&lt;/STRONG&gt;and rasterio works fine.&lt;/P&gt;&lt;P&gt;The downside:&lt;BR /&gt;You pay for "Photon" compute to use a Python library that does not use the Spark/Photon engine -&amp;gt; $$$&lt;/P&gt;&lt;P&gt;Future:&lt;BR /&gt;I talked to Databricks support, and they will check whether the engineering team can enable the same access for the Standard engine with Standard/Shared access mode, because the folder content is definitely not sensitive.&lt;/P&gt;</description>
      <pubDate>Thu, 06 Nov 2025 10:29:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/rasterio-on-shared-standard-cluster-has-no-access-to-proj-db/m-p/137925#M50833</guid>
      <dc:creator>der</dc:creator>
      <dc:date>2025-11-06T10:29:18Z</dc:date>
    </item>
  </channel>
</rss>

