I manage a large data lake of Iceberg tables stored on-premise in S3-compatible storage (MinIO), and I need a Spark cluster to run ETL jobs. I decided to try Databricks as there were no other good options. However, I'm unable to access my tables or even raw files: Databricks assumes the bucket lives on AWS and connects accordingly, so the failed request below goes to test.s3.us-west-2.amazonaws.com with anonymous credentials instead of to my MinIO endpoint with the keys I supplied. I explicitly configured the s3a filesystem with my endpoint, credentials, and path-style access, but the settings don't seem to be picked up and the file can't be fetched. I've pasted the code snippet and the error below. Any input on what I'm missing? Has anyone connected Databricks to non-AWS S3 storage before, or is it not possible at all?
BUCKET = "s3a://test/file1"
conf.setAll([
("spark.hadoop.fs.s3a.endpoint", AWS_ENDPOINT),
("spark.hadoop.fs.s3a.path.style.access", "true"),
("spark.hadoop.fs.s3a.access.key", AWS_ACCESS_KEY),
("spark.hadoop.fs.s3a.secret.key", AWS_SECRET_KEY),
("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem"),
("spark.hadoop.fs.s3a.impl.disable.cache", "false"),
("spark.hadoop.fs.s3a.connection.ssl.enabled", "true"),
("spark.hadoop.hadoop.rpc.protection", "privacy")
])
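For completeness, this is roughly how the snippet above is wired together in my notebook. It's a sketch rather than a verbatim copy: the imports, the placeholder values for AWS_ENDPOINT / AWS_ACCESS_KEY / AWS_SECRET_KEY, and the session-building step are filled in here for context, and s3_path is reconstructed from the traceback further down.

from pyspark import SparkConf
from pyspark.sql import SparkSession

# Placeholders -- the real values point at our on-prem MinIO deployment.
AWS_ENDPOINT = "https://minio.internal.example.com:9000"
AWS_ACCESS_KEY = "<minio-access-key>"
AWS_SECRET_KEY = "<minio-secret-key>"

BUCKET = "s3a://test/file1"

conf = SparkConf()
conf.setAll([
    ("spark.hadoop.fs.s3a.endpoint", AWS_ENDPOINT),
    ("spark.hadoop.fs.s3a.access.key", AWS_ACCESS_KEY),
    ("spark.hadoop.fs.s3a.secret.key", AWS_SECRET_KEY),
    # ... plus the path-style / impl / cache / ssl / rpc settings listed above ...
])

# On Databricks, getOrCreate() attaches to the already-running session.
spark = SparkSession.builder.config(conf=conf).getOrCreate()

# The read that fails; per the traceback, s3_path resolves to s3a://test/file1/fp.parquet.
s3_path = f"{BUCKET}/fp.parquet"
df = spark.read.parquet(s3_path)

That read immediately produces the traceback below.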
File , line 3
----> 3 df = spark.read.parquet(s3_path)

Py4JJavaError: An error occurred while calling o458.parquet.
: java.nio.file.AccessDeniedException: s3a://test/file1/fp.parquet: getFileStatus on s3a://test/file1/fp.parquet: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden; request: HEAD https://test.s3.us-west-2.amazonaws.com file1/fp.parquet {} Hadoop 3.3.6, aws-sdk-java/1.12.638 Linux/5.15.0-1078-azure OpenJDK_64-Bit_Server_VM/17.0.11+9-LTS java/17.0.11 scala/2.12.15 kotlin/1.9.10 vendor/Azul_Systems,_Inc. cfg/retry-mode/legacy com.amazonaws.services.s3.model.GetObjectMetadataRequest; credentials-provider: com.amazonaws.auth.AnonymousAWSCredentials credential-header: no-credential-header signature-present: false (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden;;
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden; request: HEAD