Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

AnalysisException: [ErrorClass=INVALID_PARAMETER_VALUE] Missing cloud file system scheme

Madison
New Contributor II

I am trying to follow along with the Apache Spark Programming training module, where the instructor creates an events table from a Parquet file like this:

%sql
CREATE TABLE IF NOT EXISTS events USING parquet OPTIONS (path "/mnt/training/ecommerce/events/events.parquet");

When I tried to run the above command, I got the following error message:

AnalysisException: [RequestId=... ErrorClass=INVALID_PARAMETER_VALUE] Missing cloud file system scheme
---------------------------------------------------------------------------
AnalysisException                         Traceback (most recent call last)
<command-644583705732552> in <cell line: 1>()
      5     display(df)
      6     return df
----> 7   _sqldf = ____databricks_percent_sql()
      8 finally:
      9   del ____databricks_percent_sql

<command-644583705732552> in ____databricks_percent_sql()
      2   def ____databricks_percent_sql():
      3     import base64
----> 4     df = spark.sql(base64.standard_b64decode("...=").decode())
      5     display(df)
      6     return df

/databricks/spark/python/pyspark/instrumentation_utils.py in wrapper(*args, **kwargs)
     46             start = time.perf_counter()
     47             try:
---> 48                 res = func(*args, **kwargs)
3 REPLIES

Kaniz_Fatma
Community Manager

Hi @Madison, the error message is due to a missing cloud file system scheme in the path you're providing to the Parquet file. In Databricks, when reading data from cloud storage such as AWS S3, Azure Data Lake Storage, or Google Cloud Storage, you must include the scheme that corresponds to that storage (for example, s3://, abfss://, or gs://).
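As a sketch of what scheme-qualified paths look like (the bucket, container, and account names below are placeholders, not the training dataset's real location):

```sql
-- /mnt/... is a DBFS mount point; address it with the explicit dbfs: scheme
CREATE TABLE IF NOT EXISTS events
USING parquet
OPTIONS (path "dbfs:/mnt/training/ecommerce/events/events.parquet");

-- Direct cloud storage paths use the provider's own scheme:
--   AWS S3:                  s3://my-bucket/path/to/events.parquet
--   Azure Data Lake Storage: abfss://my-container@my-account.dfs.core.windows.net/path/to/events.parquet
--   Google Cloud Storage:    gs://my-bucket/path/to/events.parquet
```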

Madison
New Contributor II

@Kaniz_Fatma Thanks for your response. I didn't provide a cloud file system scheme in the path while creating the table using the DataFrame API, but I was still able to create the table.

 

%python
# File location and type
file_location = "/mnt/training/ecommerce/users/users.parquet"
file_type = "parquet"

df = spark.read.format(file_type) \
  .load(file_location)

display(df)

temp_table_name = "test_catalog.test_schema.users"
df.createOrReplaceTempView(temp_table_name)

 

When I provided the scheme in SQL, however, I got the following error:

 

%sql
CREATE TABLE IF NOT EXISTS events USING parquet OPTIONS (path "s3://mnt/training/ecommerce/events/events.parquet");

AnalysisException: No parent external location found for path 's3://mnt/training/ecommerce/events/events.parquet'

 

 

Hi @Madison, the error message "AnalysisException: No parent external location found for path 's3://mnt/training/ecommerce/events/events.parquet'" indicates that the system cannot find a registered external location covering the specified S3 path. This could be due to several reasons:

1. The path does not exist: Check that the S3 path is correct and that the file 'events.parquet' exists in the specified location. Note that in 's3://mnt/training/...', "mnt" is interpreted as an S3 bucket name; if /mnt/training is a DBFS mount point, the path should use the dbfs: scheme instead of s3://.

2. Incorrect permissions: You may not have the necessary permissions to access the file or directory. Check your permissions and make sure you have read access to the S3 bucket.

3. Network issues: There could be a problem with the network connection between the Databricks workspace and the S3 bucket.
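In a Unity Catalog workspace, a direct s3:// path must also fall under an external location registered in the metastore. A hedged sketch of how to check and register one (the location name, bucket URL, and credential name below are placeholders):

```sql
-- List external locations already registered in the metastore
SHOW EXTERNAL LOCATIONS;

-- Register one if needed (name, URL, and credential are placeholders)
CREATE EXTERNAL LOCATION IF NOT EXISTS training_data
URL 's3://my-training-bucket/training/'
WITH (STORAGE CREDENTIAL my_storage_credential);
```

Once a parent external location exists (and you have the appropriate privileges on it), the CREATE TABLE statement with the full s3:// path should resolve.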
