10-15-2021 08:31 AM
If you are familiar with Scala, you can use Tika, which uses PDFBox under the hood to parse PDFs. If you want to use it in Databricks, I suggest going through this blog and Git repo. For Python-based code, you may want to use PyPDF2 inside a pandas UDF in Spark.
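For the Python route, here is a minimal sketch of what PyPDF2 inside a pandas UDF could look like. It assumes PyPDF2 2.x or later (for `PdfReader` and `extract_text`) and Spark 3.x with pandas and PyArrow available on the cluster. The names `pdf_bytes_to_text`, `make_pdf_text_udf` and the path in the usage comment are made up for illustration, so treat it as a starting point and not a tested pipeline.

```python
import io
from typing import Optional


def pdf_bytes_to_text(content: Optional[bytes]) -> Optional[str]:
    """Return the text of one PDF given its raw bytes, or None if unreadable."""
    # Null or empty binary values (e.g. zero-length files) have nothing to parse.
    if not content:
        return None
    # Imported here so the import happens on the worker that runs the UDF.
    # Assumption: PyPDF2 >= 2.0 is installed on the cluster.
    from PyPDF2 import PdfReader

    try:
        reader = PdfReader(io.BytesIO(content))
        # extract_text() can come back empty for scanned/image-only pages.
        return "\n".join((page.extract_text() or "") for page in reader.pages)
    except Exception:
        # One corrupt file should not fail the whole Spark job.
        return None


def make_pdf_text_udf():
    """Build a pandas UDF that maps a binary column of PDFs to a string column."""
    import pandas as pd
    from pyspark.sql.functions import pandas_udf

    @pandas_udf("string")
    def pdf_text(contents: pd.Series) -> pd.Series:
        # Each element of the batch holds the bytes of one PDF file.
        return contents.apply(pdf_bytes_to_text)

    return pdf_text


# Usage sketch (hypothetical path; needs an active SparkSession named `spark`):
#   pdf_text = make_pdf_text_udf()
#   df = spark.read.format("binaryFile").load("/mnt/pdfs/*.pdf")
#   df.select("path", pdf_text("content").alias("text")).show()
```

Reading with the `binaryFile` source gives you one row per file with the bytes in the `content` column, so the UDF only has to deal with parsing. Keep in mind that PyPDF2 only extracts embedded text; scanned PDFs would need OCR instead.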