Essential-PySpark-for-Scalable-Data-Analytics "wordcount-sql.ipynb"

ChristopherAlan — Mon, 20 Jan 2025 16:51:58 GMT

I'm working through the notebook at the following link, but I'm getting an error:

https://github.com/PacktPublishing/Essential-PySpark-for-Scalable-Data-Analytics/blob/main/Chapter01/wordcount-sql.ipynb

Code:
%sql
DROP TABLE IF EXISTS word_counts;
CREATE TABLE word_counts (word STRING)
USING csv
OPTIONS ("delimiter" = " ")
LOCATION "/databricks-datasets/README.md"

Error:
UnityCatalogServiceException: [RequestId=e8a962d8-82f1-48d0-9cb8-9758daefb92d ErrorClass=INVALID_PARAMETER_VALUE] GenerateTemporaryPathCredential uri /dbfs/databricks-datasets/README.md is not a valid URI. Error message: INVALID_PARAMETER_VALUE: Missing cloud file system scheme.

Details:
I'm using the Databricks Community Edition, with an Apache Spark cluster on AWS for compute. I can see that the file is an internal file, and it appears in the listing from "%fs ls /databricks-datasets/".

Questions:

Can anyone point me in the right direction on how to resolve this? I'd like to make sure I can properly work with SQL and the internal files provided with the community edition in order to complete all my learning objectives.

#community edition

Re: Essential-PySpark-for-Scalable-Data-Analytics "wordcount-sql.ipynb"

ChristopherAlan — Mon, 20 Jan 2025 16:58:29 GMT

Correction: The error message in the screenshot is from when I tried adding the dbfs: prefix to the path. The error message without that prefix is the following:

UnityCatalogServiceException: [RequestId=dbda5aee-b855-9ed9-abf8-3ee0e0dcc938 ErrorClass=INVALID_PARAMETER_VALUE] GenerateTemporaryPathCredential uri /databricks-datasets/README.md is not a valid URI. Error message: INVALID_PARAMETER_VALUE: Missing cloud file system scheme.

I also tried the s3: prefix, but I realize this is not an externally hosted file, as all the literature says that this is an internal file.
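
For anyone reading along, here is a minimal sketch of what "Missing cloud file system scheme" appears to refer to. Both URIs quoted in the error messages are bare paths with no scheme part (nothing like "s3" or "dbfs" before a colon), which the standard library's urlparse makes easy to check. The helper names has_scheme and register_word_counts are hypothetical, and the second function is an untested assumption rather than a confirmed fix: it sidesteps CREATE TABLE ... LOCATION by reading the file with the DataFrame reader and registering a temporary view. Whether the dbfs: path is readable on a given workspace is not something this thread establishes.

```python
from urllib.parse import urlparse


def has_scheme(uri: str) -> bool:
    """Return True when the URI carries a scheme such as 's3' or 'dbfs'."""
    return urlparse(uri).scheme != ""


# The two URIs quoted in the error messages: both are bare paths.
for uri in ["/databricks-datasets/README.md",
            "/dbfs/databricks-datasets/README.md"]:
    print(uri, "-> has scheme:", has_scheme(uri))


def register_word_counts(spark, path="dbfs:/databricks-datasets/README.md"):
    """Hypothetical workaround, not taken from the book or this thread.

    Reads the file through the DataFrame reader with the same delimiter and
    single-column schema as the original SQL, then exposes it to SQL cells
    as a temporary view instead of a table with an explicit LOCATION.
    `spark` is the SparkSession that a Databricks notebook provides.
    """
    df = (
        spark.read
        .option("delimiter", " ")
        .schema("word STRING")
        .csv(path)
    )
    df.createOrReplaceTempView("word_counts")
    return df
```

If the read succeeds, a later cell could query the view with "%sql SELECT * FROM word_counts", though that too is an assumption about the environment.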
