01-23-2023 09:02 PM
I am trying to create a table in a MinIO bucket using Databricks.
spark.sql("create database if not exists minio_db_1 managed location 's3a://my-bucket/minio_db_1'");
I am passing the S3 configuration through the Spark context:
access_key = 'XXXX'
secret_key = 'XXXXXXX'
sc._jsc.hadoopConfiguration().set("fs.s3a.access.key", access_key)
sc._jsc.hadoopConfiguration().set("fs.s3a.secret.key", secret_key)
sc._jsc.hadoopConfiguration().set("fs.s3a.endpoint", "http://my-ip:9000")
With this config I am able to write data to the bucket using:
df.write.format("parquet").save("s3a://my-bucket/file-path");
But it throws an exception when I try to create a table/database:
spark.sql("create database if not exists minio_db_1 managed location 's3a://my-bucket/minio_db_1'");
AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: java.nio.file.AccessDeniedException s3a://my-bucket/my-database: getFileStatus on s3a://test2/minio_db_1: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden; request: HEAD https://test2.s3.us-east-1.amazonaws.com minio_db_1 {} Hadoop 3.3.4, aws-sdk-java/1.12.189 Linux/5.4.0-1093-aws OpenJDK_64-Bit_Server_VM/25.345-b01 java/1.8.0_345 scala/2.12.14 vendor/Azul_Systems,_Inc. cfg/retry-mode/legacy com.amazonaws.services.s3.model.GetObjectMetadataRequest; Request ID: 6YBEAZY59EYGAEVB, Extended Request ID: o+h6YBGczQmWsnFMW8kLGi+llJ+v3ysqoz05fnNYTH901+ACgmi5x50dE2ekXbNrr3qQf81uOx8=, Cloud Provider: AWS, Instance ID: i-072d1969af3c17cb6 (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: 6YBEAZY59EYGAEVB; S3 Extended Request ID: o+h6YBGczQmWsnFMW8kLGi+llJ+v3ysqoz05fnNYTH901+ACgmi5x50dE2ekXbNrr3qQf81uOx8=; Proxy: null), S3 Extended Request ID: o+h6YBGczQmWsnFMW8kLGi+llJ+v3ysqoz05fnNYTH901+ACgmi5x50dE2ekXbNrr3qQf81uOx8=:403 Forbidden)
The request should be routed to the configured s3a endpoint, but it is going to the generic S3 endpoint. Somehow spark.sql is not honouring the Spark context configuration.
Can anyone please point out the configs lacking here for table creation?
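To make the mismatch explicit, the host that the failing request reached can be pulled out of the `request: HEAD <url>` fragment of the exception message and compared with the configured endpoint. This is only a minimal sketch: `request_host` and `hits_configured_endpoint` are hypothetical helper names, and the message is a shortened copy of the log above.

```python
import re
from urllib.parse import urlparse


def request_host(error_message):
    """Return the host of the first 'request: <VERB> <url>' fragment, or None."""
    match = re.search(r"request: [A-Z]+ (\S+)", error_message)
    return urlparse(match.group(1)).hostname if match else None


def hits_configured_endpoint(error_message, endpoint):
    """True when the failing request went to the host set in fs.s3a.endpoint."""
    return request_host(error_message) == urlparse(endpoint).hostname


# Shortened copy of the exception text from the post.
message = "Forbidden; request: HEAD https://test2.s3.us-east-1.amazonaws.com minio_db_1 {}"

print(request_host(message))
print(hits_configured_endpoint(message, "http://my-ip:9000"))
```

A `False` result here means the request never went to the MinIO host, which points at configuration not being picked up rather than at bucket permissions.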
01-23-2023 09:34 PM
Hi @Wasim Reza, I think there is a permission issue while creating the database:
(message:Got exception: java.nio.file.AccessDeniedException s3a://my-bucket/my-database: getFileStatus on s3a://test2/minio_db_1:)
01-23-2023 11:15 PM
Hi, what happens if you change s3a://<> to s3://<>?
01-24-2023 12:46 AM
Hi @Wasim Reza
Why is fs.s3a.endpoint pointing to http://my-ip:9000? Can you verify that this is the right AWS endpoint?
Is there an instance profile attached to the cluster? Using access/secret keys together with an instance profile can cause confusion.
Verify the permissions on the AWS side.
01-24-2023 01:30 AM
@Vivian Wilfred I am using MinIO as the S3 provider. MinIO's APIs are S3-compatible; it only has a different endpoint.
01-24-2023 02:51 AM
@Wasim Reza Can you try setting it to s3.amazonaws.com or https://s3.<region>.amazonaws.com ?
01-24-2023 03:46 AM
@Vivian Wilfred Right now it is pointing by default to https://test2.s3.us-east-1.amazonaws.com (from the logs). If we use s3.<region>, it will not find the bucket, because the bucket is in the MinIO cluster, not in an S3 region.
01-24-2023 10:45 AM
MANAGED LOCATION is for Unity Catalog. Please check whether you are under Unity Catalog and not under the Hive metastore. Additionally, with Unity Catalog you do not use sc._jsc.hadoopConfiguration() etc.; you register storage credentials and an external location in the metastore first.
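If the database lives in the Hive metastore instead, one thing that may be worth trying is moving the S3A options from the notebook session into the cluster's Spark config, using the `spark.hadoop.` prefix. The assumption here (not confirmed in this thread) is that the metastore client does not see values set through sc._jsc.hadoopConfiguration(). The sketch below only builds the entries; `minio_cluster_conf` is a hypothetical helper, and `fs.s3a.path.style.access` and `fs.s3a.connection.ssl.enabled` are standard Hadoop S3A options that were not in the original post.

```python
def minio_cluster_conf(endpoint, access_key, secret_key):
    """Build cluster-level Spark config entries for an S3-compatible store."""
    # Assumption: SSL is wanted only when the endpoint URL itself is https.
    use_ssl = endpoint.lower().startswith("https://")
    return {
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # Path-style requests (http://host:9000/bucket/key) avoid
        # bucket-name subdomains such as test2.s3.us-east-1.amazonaws.com.
        "spark.hadoop.fs.s3a.path.style.access": "true",
        "spark.hadoop.fs.s3a.connection.ssl.enabled": str(use_ssl).lower(),
    }


# Placeholder values copied from the original post.
conf = minio_cluster_conf("http://my-ip:9000", "XXXX", "XXXXXXX")
for key, value in conf.items():
    print(f"{key} {value}")
```

Each printed line is a space-separated key/value pair, which is the form the cluster's Spark config field expects. In practice the keys would be better kept in a secret scope than pasted in plain text.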