It depends on what you mean, but if you're just trying to (say) tokenize and process data with spaCy in parallel, that's straightforward. Write a 'pandas UDF': a function that expresses how you want to transform a chunk of data, handed to you as a pandas Series or DataFrame, using spaCy. Then apply that pandas UDF to your data with Spark; Spark automatically splits your data into pandas-sized chunks, applies your function to each one in parallel across the cluster, and reassembles the results.
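
Here's a minimal sketch of what that can look like, assuming Spark 3.x (type-hinted pandas UDFs) and that the `en_core_web_sm` model is installed on every worker; the names are just illustrative:

```python
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf

_nlp = None  # cached per Python worker process so the model loads only once


def _get_nlp():
    # Lazy-load spaCy on the worker; loading on the driver and shipping the
    # model over the wire would be slow and may not even serialize cleanly.
    global _nlp
    if _nlp is None:
        import spacy
        _nlp = spacy.load("en_core_web_sm", disable=["parser", "ner"])
    return _nlp


@pandas_udf("array<string>")
def tokenize(texts: pd.Series) -> pd.Series:
    nlp = _get_nlp()
    # nlp.pipe processes the whole chunk as one batch, which is much faster
    # than calling nlp() on each string individually.
    return pd.Series([[tok.text for tok in doc] for doc in nlp.pipe(texts)])


spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("Spark and spaCy play well together.",)], ["text"])
df.withColumn("tokens", tokenize("text")).show(truncate=False)
```

The lazy-load-into-a-global pattern matters here: each executor's Python worker gets its own copy of `_nlp`, so you pay the model-loading cost once per worker rather than once per chunk (or worse, once per row).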