Databricks Community

data_boy_2022 · ‎09-09-2022

I want to import data using the autoloader from a S3 bucket into a table which is managed inside a Unity Catalog.

Right now, I run the code on an interactive cluster inside a notebook. In the future the code should run in a job cluster.

The error I get is the following:

py4j.security.Py4JSecurityException: Method public org.apache.spark.sql.streaming.DataStreamReader org.apache.spark.sql.streaming.DataStreamReader.format(java.lang.String) is not whitelisted on class class org.apache.spark.sql.streaming.DataStreamReader

What have I tried so far:

Enabling credential passthrough on the cluster:

=> Doesn't work since Unity Catalog can't be used with this option

I also tried setup an external location as described here:

https://docs.databricks.com/data-governance/unity-catalog/manage-external-locations-and-credentials....

On top, I have found this article but the solution is not actionable for me:

https://kb.databricks.com/en_US/streaming/readstream-is-not-whitelisted

Can anybody help?

Tian · ‎09-19-2022

Hi!

Databricks recently released the documentation on using Unity Catalog with Structured Streaming: https://docs.databricks.com/structured-streaming/unity-catalog.html

Per document requirement, for both interactive notebooks and scheduled jobs, you must use single user clusters for Structured Streaming on Unity Catalog. Python and Scala are supported. Could you verify if the cluster access model is single user?

View solution in original post

Tian · ‎09-19-2022

Hi!

Databricks recently released the documentation on using Unity Catalog with Structured Streaming: https://docs.databricks.com/structured-streaming/unity-catalog.html

Per document requirement, for both interactive notebooks and scheduled jobs, you must use single user clusters for Structured Streaming on Unity Catalog. Python and Scala are supported. Could you verify if the cluster access model is single user?