I need to read a PDF file from Azure Data Lake blob storage into Databricks, where the connection is made through AD access.
Generating SAS tokens is restricted in our environment for security reasons.
The script below can list the PDF files in the folder:
pdf_path = "abfss://<container>@datalakename.dfs.core.windows.net/<folder path>"
pdf_df = spark.read.format("binaryFile").load(pdf_path).cache()
display(pdf_df)
However, after this step I am having difficulty passing the PDF file to the Form Recognizer function.
If anyone has implemented reading PDF files from Azure Data Lake into Databricks, please help me with a script or an approach.
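For reference, here is a minimal sketch of one possible direction, not a confirmed solution. It relies on the fact that the binaryFile source exposes each file's raw bytes in a `content` column next to `path`, so those bytes can be wrapped in an in-memory stream and handed to an analysis function. The names `PdfRow`, `analyze_pdf_rows`, and `fake_analyze` are hypothetical and used for illustration only; the commented notebook lines assume an already authenticated `DocumentAnalysisClient` from the `azure-ai-formrecognizer` package named `client`.

```python
import io
from collections import namedtuple

# Hypothetical stand-in for a Spark Row from the binaryFile source,
# which exposes `path` and `content` (raw file bytes) among its columns.
PdfRow = namedtuple("PdfRow", ["path", "content"])


def analyze_pdf_rows(rows, analyze):
    """Call `analyze` on each file's bytes and collect the results by path."""
    results = {}
    for row in rows:
        # Spark hands binary columns back as bytearray; wrap in a stream.
        stream = io.BytesIO(bytes(row.content))
        results[row.path] = analyze(stream)
    return results


# Hypothetical analyzer so the sketch runs without Azure credentials.
def fake_analyze(stream):
    return len(stream.read())


rows = [PdfRow("a.pdf", bytearray(b"%PDF-1.7 demo"))]
results = analyze_pdf_rows(rows, fake_analyze)

# In a notebook this might become (assumption, untested here):
#   rows = pdf_df.select("path", "content").collect()
#   results = analyze_pdf_rows(
#       rows,
#       lambda s: client.begin_analyze_document("prebuilt-document", s).result(),
#   )
```

Note that `collect()` pulls every file's bytes onto the driver, so this shape only suits a modest number of files.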
Many thanks in advance!
Best Regards,
Punith Raj