@sahil07, you could read the CSV file with spark.read.csv() because that call goes through the Spark native API, which can access DBFS directly. Reading the PDF failed because PyPDF2 does not use the Spark API, but python st...
@sahil07, it seems that with your current setup you can't read from DBFS using vanilla Python. I've run some tests and managed to reproduce the error, and I solved it by copying the file from DBFS to the local file system of the driver node using dbutils....
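The post above is cut off before the exact command, so here is only a rough sketch of what such a copy could look like. It assumes `dbutils.fs.cp` with a `file:` target (which points at the driver's local disk). The helper name `copy_dbfs_file_to_driver` and the `/tmp` target directory are made up for illustration, and `dbutils` is passed in explicitly so the function also works outside a notebook cell.

```python
import posixpath


def copy_dbfs_file_to_driver(dbutils, dbfs_uri, local_dir="/tmp"):
    """Copy one DBFS file to the driver's local disk and return the local path.

    Hypothetical helper: `dbutils` is the object Databricks provides in
    notebooks, and `local_dir` is an assumed scratch directory on the driver.
    """
    # Keep the original file name, e.g. "report.pdf" from "dbfs:/dir/report.pdf".
    local_path = posixpath.join(local_dir, posixpath.basename(dbfs_uri))
    # The "file:" scheme tells dbutils that the target is the driver's local
    # file system rather than DBFS.
    dbutils.fs.cp(dbfs_uri, "file:" + local_path)
    return local_path
```

In a notebook you would call it as `copy_dbfs_file_to_driver(dbutils, "dbfs:/FileStore/sahil_chowdhurry.pdf")` and then open the returned path with plain Python or PyPDF2.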
Hi @sahil07!
Since you are reading the file with PyPDF2, which does not use the Spark API to read data, you should use "/dbfs/FileStore/sahil_chowdhurry.pdf" instead of "dbfs:/FileStore/sahil_chowdhurry.pdf".
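To make the path difference concrete, here is a small sketch that rewrites a `dbfs:/` URI into the `/dbfs/` form that plain Python libraries expect. The helper name `to_local_dbfs_path` is made up for this example, and it assumes the `/dbfs` mount is available on your cluster.

```python
def to_local_dbfs_path(path):
    """Turn 'dbfs:/...' into '/dbfs/...'; return any other path unchanged."""
    prefix = "dbfs:/"
    if path.startswith(prefix):
        # Swap the Spark-style scheme for the local mount point.
        return "/dbfs/" + path[len(prefix):]
    return path


pdf_path = to_local_dbfs_path("dbfs:/FileStore/sahil_chowdhurry.pdf")
print(pdf_path)

# Usage with PyPDF2 (left commented out so the snippet runs without it installed):
# from PyPDF2 import PdfReader  # named PdfFileReader in older PyPDF2 releases
# with open(pdf_path, "rb") as fh:
#     reader = PdfReader(fh)
```

Spark readers such as spark.read.csv() keep using the original "dbfs:/..." form, so only convert the path when you hand it to a non-Spark library.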
As a general rule of thumb: If you are using reader...