AnalysisException: [ErrorClass=INVALID_PARAMETER_VALUE] Missing cloud file system scheme

Madison
New Contributor II

I am trying to follow along Apache Spark Programming training module where the instructor creates events table from a parquet file like this:

%sql
CREATE TABLE IF NOT EXISTS events USING parquet OPTIONS (path "/mnt/training/ecommerce/events/events.parquet");

When I tried to run the above command, I got the following error message:

AnalysisException: [RequestId=... ErrorClass=INVALID_PARAMETER_VALUE] Missing cloud file system scheme
---------------------------------------------------------------------------
AnalysisException                         Traceback (most recent call last)
<command-644583705732552> in <cell line: 1>()
      5     display(df)
      6     return df
----> 7   _sqldf = ____databricks_percent_sql()
      8 finally:
      9   del ____databricks_percent_sql

<command-644583705732552> in ____databricks_percent_sql()
      2   def ____databricks_percent_sql():
      3     import base64
----> 4     df = spark.sql(base64.standard_b64decode("...=").decode())
      5     display(df)
      6     return df

/databricks/spark/python/pyspark/instrumentation_utils.py in wrapper(*args, **kwargs)
     46             start = time.perf_counter()
     47             try:
---> 48                 res = func(*args, **kwargs)
3 REPLIES

Kaniz
Community Manager

Hi @Madison, the error message means the path you're passing for the Parquet file is missing a cloud file system scheme. In Databricks, when reading data from cloud storage such as AWS S3, Azure Data Lake Storage, or Google Cloud Storage, you must include the URI scheme that corresponds to that storage service.
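As a sketch of what a fully qualified path looks like (the bucket, container, and account names below are hypothetical examples, not real locations), the scheme is the prefix before `://`. A quick standard-library check of which paths carry one:

```python
from urllib.parse import urlparse

# Hypothetical fully qualified paths; bucket/container/account names are examples only.
paths = [
    "s3://my-bucket/training/ecommerce/events/events.parquet",        # AWS S3
    "abfss://container@account.dfs.core.windows.net/events.parquet",  # Azure Data Lake Storage Gen2
    "gs://my-bucket/training/ecommerce/events/events.parquet",        # Google Cloud Storage
    "/mnt/training/ecommerce/events/events.parquet",                  # DBFS mount path -- no scheme
]

for p in paths:
    scheme = urlparse(p).scheme
    print(f"{p!r}: scheme={scheme or '(none)'}")
```

The last entry, the path from the question, parses with no scheme at all, which is consistent with the "Missing cloud file system scheme" error.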

Madison
New Contributor II

@Kaniz Thanks for your response. I didn't provide a cloud file system scheme in the path when I created the table using the DataFrame API, but I was still able to create it.

 

%python
# File location and type
file_location = "/mnt/training/ecommerce/users/users.parquet"
file_type = "parquet"

df = spark.read.format(file_type) \
  .load(file_location)

display(df)

temp_table_name = "test_catalog.test_schema.users"
df.createOrReplaceTempView(temp_table_name)

 

When I provided the scheme in SQL, I got the following error:

 

%sql
CREATE TABLE IF NOT EXISTS events USING parquet OPTIONS (path "s3://mnt/training/ecommerce/events/events.parquet");

AnalysisException: No parent external location found for path 's3://mnt/training/ecommerce/events/events.parquet'

 

 

Kaniz
Community Manager

Hi @Madison, the error message "AnalysisException: No parent external location found for path 's3://mnt/training/ecommerce/events/events.parquet'" indicates that the system cannot find the specified S3 path. This can happen for several reasons:

1. The path does not exist: check that the S3 path is correct and that the file 'events.parquet' actually exists at that location.

2. Incorrect permissions: you may not have the necessary permissions to access the file or directory. Verify that you have read access to the S3 bucket.

3. Network issues: there could be a connectivity problem between the Databricks workspace and the S3 bucket.
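A further point worth checking, given the exact path in the question: prefixing the original DBFS mount path with `s3://` makes the first path component (`mnt`) parse as the S3 bucket name, which is almost certainly not a real bucket. A small standard-library sketch showing how that URI is interpreted:

```python
from urllib.parse import urlparse

# The path from the failing CREATE TABLE statement.
p = urlparse("s3://mnt/training/ecommerce/events/events.parquet")

# In an s3:// URI the authority component is the bucket name, so this URI
# asks for a bucket literally named "mnt". The original /mnt/... path is a
# DBFS mount point, not an S3 bucket, so no external location matches it.
print("bucket:", p.netloc)  # -> mnt
print("key:   ", p.path)    # -> /training/ecommerce/events/events.parquet
```

If the training data is mounted at /mnt/training, keeping the original unprefixed /mnt/... path is likely the intended form, rather than constructing an s3:// URI from the mount path.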
