
Does Autoloader support loading PDF files?

yit
Contributor III

I need to process PDF files already ingested. Based on the documentation, Autoloader does not support PDFs - or am I missing something?

Also, I've found the sparkPDF library mentioned in other community discussions, but from what I can see it only supports batch processing, not streaming?


Accepted Solutions

szymon_dybczak
Esteemed Contributor III

Hi @yit ,

No, Autoloader does not support PDF as a file format. Here you can find the supported types:

[Screenshot: list of file formats supported by Autoloader]

 

yit
Contributor III

Any suggestions on how to handle PDFs? @szymon_dybczak

szymon_dybczak
Esteemed Contributor III

You can try treating the PDFs as a binary source format (binaryFile), as suggested in the article below.

Streaming Any File Type with Autoloader in Databricks: A Working Guide | by CanadianDataGuy.com | To...
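
A minimal sketch of that approach, not taken from the linked article: it assumes a Databricks environment where the `cloudFiles` source is available and `spark` is the notebook's session. The helper names `pdf_binary_options` and `read_pdfs_as_binary` are hypothetical, chosen only for illustration.

```python
# Sketch only: Auto Loader (the "cloudFiles" source) runs on Databricks, so
# `spark` is assumed to be the session that a Databricks notebook provides.
# Both function names below are hypothetical helpers, not a library API.


def pdf_binary_options(glob: str = "*.pdf") -> dict:
    """Reader options for ingesting PDFs as opaque binary files."""
    return {
        # Auto Loader has no "pdf" format; binaryFile loads each file as raw bytes.
        "cloudFiles.format": "binaryFile",
        # Only pick up files whose name matches the glob.
        "pathGlobFilter": glob,
    }


def read_pdfs_as_binary(spark, source_path: str):
    """Return a streaming DataFrame with one row per discovered PDF.

    binaryFile rows carry the columns path, modificationTime, length,
    and content (the raw file bytes).
    """
    return (
        spark.readStream.format("cloudFiles")
        .options(**pdf_binary_options())
        .load(source_path)
    )
```

This only gets the bytes into a stream. Turning the `content` column into text would still need a PDF parsing library of your choice applied downstream, for example in a UDF or inside `foreachBatch`.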
