Databricks

johnb1 · ‎11-30-2022

I am doing the "Data Engineering with Databricks V2" learning path.

I cannot run "DE 4.2 - Providing Options for External Sources" because the first code cell does not run successfully:

%run ../Includes/Classroom-Setup-04.2

Screenshot 1:

Inside the setup notebook, the code crashes at the following command (see screenshot 2):

df = pd.read_parquet(path = datasource_path.replace("dbfs:/", '/dbfs/'))

The error message is:

FileNotFoundError: [Errno 2] No such file or directory: '/dbfs/mnt/dbacademy-datasets/data-engineering-with-databricks/v02/ecommerce/raw/users-historical'

Screenshot 2:

There seems to be an issue with the path, even though it actually exists:

Screenshot 3:

I played around a little with the path specification, but nothing helped:

Screenshot 4:
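For anyone reproducing this: the failing line maps a `dbfs:/` URI onto the local `/dbfs/` mount before handing it to pandas. Below is a minimal sketch of that mapping with an explicit check of what the driver's local filesystem sees. The helper names `to_local_dbfs_path` and `describe_path` are made up for illustration and are not part of the course code.

```python
import os


def to_local_dbfs_path(uri: str) -> str:
    """Map a 'dbfs:/...' URI to a local '/dbfs/...' path, like the setup notebook does."""
    return uri.replace("dbfs:/", "/dbfs/")


def describe_path(local_path: str) -> str:
    """Report whether a local path is a directory, a file, or missing."""
    if os.path.isdir(local_path):
        return "directory"
    if os.path.isfile(local_path):
        return "file"
    return "missing"


# Path taken from the error message above, written as a dbfs:/ URI.
datasource_path = "dbfs:/mnt/dbacademy-datasets/data-engineering-with-databricks/v02/ecommerce/raw/users-historical"
local_path = to_local_dbfs_path(datasource_path)
print(local_path, "->", describe_path(local_path))
```

If this prints "missing" while a `dbfs:/` listing shows the data, one possible explanation (an assumption, not confirmed in this thread) is that the local `/dbfs` mount is not available to Python file APIs on that cluster, in which case pandas cannot see the files even though Spark can.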

UmaMahesh1 · ‎11-30-2022

Hi @John B

Can you please try removing the dbfs prefix and starting the path with /mnt only?

Also, if this does not work, can you please upload that notebook's DBC archive so that I can check the details?

Cheers..

johnb1 · ‎12-16-2022

Hi @Uma Maheswara Rao Desula

Removing the dbfs prefix and starting with /mnt only does not help.

Br.

UmaMahesh1 · ‎11-30-2022

Also @John B

Assuming this is an old training course, try the same thing on a community cluster with a DBR version below 7. Some old training courses' mount points are disabled in DBR 7+.

Cheers...

UmaMahesh1 · ‎12-03-2022

@John B

Did your issue get resolved?

If you fixed it some other way than the methods above, please post the fix you used.

Cheers..

johnb1 · ‎12-16-2022

@Uma Maheswara Rao Desula I solved the issue using SS2's suggestion (see below). After reading the data into a Spark DataFrame, I converted it into a pandas DataFrame using the toPandas() method.
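A rough sketch of that fix, for readers who land here: read with Spark using the `dbfs:/` URI, then collect the result to pandas on the driver. The wrapper name `read_parquet_as_pandas` is hypothetical; `spark.read.parquet` and `toPandas()` are the calls mentioned in this thread.

```python
def read_parquet_as_pandas(spark, path):
    """Read Parquet with Spark, then collect the rows into a pandas DataFrame."""
    spark_df = spark.read.parquet(path)  # pyspark.sql.DataFrame
    return spark_df.toPandas()           # pandas.DataFrame, held in driver memory


# In a Databricks notebook the `spark` session is already defined, so usage
# would look like this (datasource_path is the variable from the setup notebook):
# df = read_parquet_as_pandas(spark, datasource_path)
```

Note that `toPandas()` pulls all rows into driver memory, so this is only suitable when the dataset is small enough to fit there.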

johnb1 · ‎12-16-2022

Hi!

I can only use Runtime 7.3, 9.1, ..., 12.0; the minimum is 7.3. I am using Databricks Community Edition.

Br.

SS2 · ‎12-03-2022

Can you try it like this: spark.read.parquet("dbfs:/mnt/.......")

johnb1 · ‎12-16-2022

Hi @S S

Reading in the file was successful. However, I got a pyspark.sql.dataframe.DataFrame object. This is not the same as a pandas DataFrame, right?

Br.

Aviral-Bhardwaj · ‎12-16-2022

Hey @S S ,

I understand your issue.

To solve this, import that DBC file. Besides the exercise notebooks, it contains a folder with all the solutions, so open the solution for the first exercise and it should work.

Please upvote if my answer gave you a hint.

Thanks

Aviral Bhardwaj

smkazim · ‎03-29-2023

Hello All,

I am getting the exact issue mentioned in the first post here. I have tried all the solutions listed:

Changing DBR to 7.3: This gave other errors related to libraries not present in that DBR version.
Using spark.read.parquet: This gives the error "AnalysisException: Unable to infer schema for Parquet. It must be specified manually." I have checked that the Parquet files exist in that location and that they are not empty.
Exploring the solutions folder: It gives the same errors.

Any ideas what else I can try, please?

Thanks.
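One thing that may be worth double-checking: the "Unable to infer schema" error often shows up when Spark finds no readable data files directly under the path it was given, for example when the path points one level too high or too low. Below is a small sketch that filters a directory listing down to non-empty Parquet data files. It assumes entries shaped like the objects returned by `dbutils.fs.ls` (with `name` and `size` attributes); the helper name `parquet_data_files` is made up for illustration.

```python
def parquet_data_files(entries):
    """Keep listing entries that look like non-empty Parquet data files.

    Names starting with '_' or '.' are skipped because Spark treats them as
    hidden or metadata files rather than data.
    """
    return [
        e for e in entries
        if e.name.endswith(".parquet")
        and not e.name.startswith(("_", "."))
        and e.size > 0
    ]


# In a Databricks notebook, usage would look like this (path elided as in the thread):
# entries = dbutils.fs.ls("dbfs:/mnt/.......")
# print(len(parquet_data_files(entries)), "data files found")
```

If the count is zero at the exact path passed to `spark.read.parquet`, the files may sit in a subdirectory or under a different mount point than expected.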