How can I use non-Spark libraries like spaCy with Databricks and Spark?
I have an NLP application that I built on my local machine using spaCy and pandas, but now I would like to scale it to a large production dataset and take advantage of Spark's distributed compute. How do I import and utilize a librar...
Latest Reply
It depends on what you mean, but if you're just trying to (say) tokenize and process data with spaCy in parallel, then that's straightforward. Write a pandas UDF that expresses how you want to transform the data using spaCy, in terms of a pandas DataF...
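As a rough illustration of the pandas UDF approach described in the reply, here is a minimal sketch, not a definitive implementation. It assumes spaCy is already installed on the cluster (for example with `%pip install spacy` in a notebook), and the names `get_nlp`, `tokenize_batch`, `add_tokens`, and the `text`/`tokens` column names are hypothetical. It uses `spacy.blank("en")`, a tokenizer-only pipeline that needs no model download; a trained pipeline loaded with `spacy.load(...)` would need that model installed on the workers too.

```python
import pandas as pd

_NLP = None  # cache so the pipeline is not rebuilt for every batch


def get_nlp():
    """Create the spaCy pipeline on first use and reuse it afterwards."""
    global _NLP
    if _NLP is None:
        import spacy  # imported lazily, so it is resolved where the UDF runs

        # Assumption: a blank English pipeline (tokenizer only) is enough here.
        _NLP = spacy.blank("en")
    return _NLP


def tokenize_batch(texts: pd.Series) -> pd.Series:
    """Plain pandas logic: return one list of token strings per input text."""
    nlp = get_nlp()
    # Treat missing values as empty strings; nlp.pipe processes texts in bulk.
    docs = nlp.pipe(texts.fillna("").astype(str))
    return pd.Series([[tok.text for tok in doc] for doc in docs], index=texts.index)


def add_tokens(df, text_col: str = "text", out_col: str = "tokens"):
    """Wrap tokenize_batch as a pandas UDF and apply it to a Spark DataFrame."""
    from pyspark.sql.functions import col, pandas_udf
    from pyspark.sql.types import ArrayType, StringType

    # The pd.Series -> pd.Series type hints mark this as a Series-to-Series UDF.
    tokenize_udf = pandas_udf(tokenize_batch, returnType=ArrayType(StringType()))
    return df.withColumn(out_col, tokenize_udf(col(text_col)))
```

Because `tokenize_batch` is ordinary pandas code, it can be tried locally on a small `pd.Series` before being applied to a Spark DataFrame with something like `add_tokens(spark_df)`, where `spark_df` is assumed to have a string column named `text`.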