<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic benchmark tpc-ds from external parquet hive structure in S3 in Get Started Discussions</title>
    <link>https://community.databricks.com/t5/get-started-discussions/benchmark-tpc-ds-from-external-parquet-hive-structure-in-s/m-p/52740#M1811</link>
    <description>&lt;P&gt;Hi I am just getting started in databricks would appreciate some help here.&lt;/P&gt;&lt;P&gt;I have 10TB TPCDS in S3 in a hive partition structure.&lt;BR /&gt;My goal is to benchmark a Databricks cluster on this data.&lt;/P&gt;&lt;P&gt;after setting all IAM credentials according to this&amp;nbsp;&lt;SPAN&gt;&lt;SPAN class=""&gt;&lt;A title="https://docs.databricks.com/en/data-governance/unity-catalog/manage-external-locations-and-credentials.html" href="https://docs.databricks.com/en/data-governance/unity-catalog/manage-external-locations-and-credentials.html" target="_blank" rel="noreferrer noopener"&gt;https://docs.databricks.com/en/data-governance/unity-catalog/manage-external-locations-and-credentials.html&lt;/A&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;I set bucket as external location in catalog I am trying now to load data from S3 but I am getting this error&lt;/P&gt;&lt;DIV class=""&gt;&amp;gt; Error loading files.&lt;/DIV&gt;&lt;DIV class=""&gt;&amp;gt; parent external location for path `s3://326989250725-datasets/` does not exist.&lt;/DIV&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class=""&gt;what is the issue here? In general is this the correct approach? I would really like to use a hive like command to create external tables in s3 then execute on them in spark.&lt;/DIV&gt;</description>
    <pubDate>Fri, 17 Nov 2023 19:49:13 GMT</pubDate>
    <dc:creator>hillel1</dc:creator>
    <dc:date>2023-11-17T19:49:13Z</dc:date>
    <item>
      <title>benchmark tpc-ds from external parquet hive structure in S3</title>
      <link>https://community.databricks.com/t5/get-started-discussions/benchmark-tpc-ds-from-external-parquet-hive-structure-in-s/m-p/52740#M1811</link>
      <description>&lt;P&gt;Hi I am just getting started in databricks would appreciate some help here.&lt;/P&gt;&lt;P&gt;I have 10TB TPCDS in S3 in a hive partition structure.&lt;BR /&gt;My goal is to benchmark a Databricks cluster on this data.&lt;/P&gt;&lt;P&gt;after setting all IAM credentials according to this&amp;nbsp;&lt;SPAN&gt;&lt;SPAN class=""&gt;&lt;A title="https://docs.databricks.com/en/data-governance/unity-catalog/manage-external-locations-and-credentials.html" href="https://docs.databricks.com/en/data-governance/unity-catalog/manage-external-locations-and-credentials.html" target="_blank" rel="noreferrer noopener"&gt;https://docs.databricks.com/en/data-governance/unity-catalog/manage-external-locations-and-credentials.html&lt;/A&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;I set bucket as external location in catalog I am trying now to load data from S3 but I am getting this error&lt;/P&gt;&lt;DIV class=""&gt;&amp;gt; Error loading files.&lt;/DIV&gt;&lt;DIV class=""&gt;&amp;gt; parent external location for path `s3://326989250725-datasets/` does not exist.&lt;/DIV&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class=""&gt;what is the issue here? In general is this the correct approach? I would really like to use a hive like command to create external tables in s3 then execute on them in spark.&lt;/DIV&gt;</description>
      <pubDate>Fri, 17 Nov 2023 19:49:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/benchmark-tpc-ds-from-external-parquet-hive-structure-in-s/m-p/52740#M1811</guid>
      <dc:creator>hillel1</dc:creator>
      <dc:date>2023-11-17T19:49:13Z</dc:date>
    </item>
  </channel>
</rss>

