<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How can I use non-Spark libraries like spaCy with Databricks and Spark in Machine Learning</title>
    <link>https://community.databricks.com/t5/machine-learning/how-can-i-use-non-spark-related-libraries-like-spacy-with/m-p/24925#M1395</link>
    <description>&lt;P&gt;It depends on what you mean, but if you just want to (say) tokenize and process text with spaCy in parallel, that's straightforward. Install spaCy on the cluster (for example as a cluster or notebook-scoped library), then write a pandas UDF that expresses your spaCy transformation in terms of a batch of input held as a pandas Series or DataFrame. Apply that pandas UDF to your Spark DataFrame; Spark will automatically split the data into pandas batches, run your function on each batch in parallel across the executors, and assemble the results.&lt;/P&gt;</description>
    <pubDate>Thu, 17 Jun 2021 23:23:53 GMT</pubDate>
    <dc:creator>sean_owen</dc:creator>
    <dc:date>2021-06-17T23:23:53Z</dc:date>
    <item>
      <title>How can I use non-Spark libraries like spaCy with Databricks and Spark</title>
      <link>https://community.databricks.com/t5/machine-learning/how-can-i-use-non-spark-related-libraries-like-spacy-with/m-p/24924#M1394</link>
      <description>&lt;P&gt;I have an NLP application that I built on my local machine using spaCy and pandas, and I would now like to scale it to a large production dataset and take advantage of Spark's distributed compute. How do I import and use a library like spaCy with Databricks/Spark?&lt;/P&gt;</description>
      <pubDate>Fri, 11 Jun 2021 18:55:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/how-can-i-use-non-spark-related-libraries-like-spacy-with/m-p/24924#M1394</guid>
      <dc:creator>User16752239203</dc:creator>
      <dc:date>2021-06-11T18:55:55Z</dc:date>
    </item>
    <item>
      <title>Re: How can I use non-Spark libraries like spaCy with Databricks and Spark</title>
      <link>https://community.databricks.com/t5/machine-learning/how-can-i-use-non-spark-related-libraries-like-spacy-with/m-p/24925#M1395</link>
      <description>&lt;P&gt;It depends on what you mean, but if you just want to (say) tokenize and process text with spaCy in parallel, that's straightforward. Install spaCy on the cluster (for example as a cluster or notebook-scoped library), then write a pandas UDF that expresses your spaCy transformation in terms of a batch of input held as a pandas Series or DataFrame. Apply that pandas UDF to your Spark DataFrame; Spark will automatically split the data into pandas batches, run your function on each batch in parallel across the executors, and assemble the results.&lt;/P&gt;</description>
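      <content:encoded><![CDATA[
A minimal sketch of the pandas UDF approach described in this reply, under a few assumptions: spaCy is already installed on the cluster, a blank English pipeline (tokenizer only) is enough, and the input is a Spark DataFrame with a string column. The names `tokenize_texts`, `make_tokenize_udf`, `df`, `text`, and `tokens` are hypothetical and chosen only for illustration. The iterator form of the pandas UDF is used so the pipeline is created once per task rather than once per batch.

```python
from typing import Callable, Iterable, List


def tokenize_texts(texts: Iterable, tokenizer: Callable[[str], Iterable]) -> List[List[str]]:
    """Tokenize each text with `tokenizer`.

    `tokenizer` is any callable returning token-like objects; for spaCy this is
    the `nlp` pipeline, whose tokens expose a `.text` attribute.
    """
    results = []
    for text in texts:
        if not isinstance(text, str):
            # Nulls (None / NaN) become an empty token list
            results.append([])
        else:
            results.append([getattr(tok, "text", str(tok)) for tok in tokenizer(text)])
    return results


def make_tokenize_udf():
    """Build the pandas UDF.

    Imports are local so this file can be loaded on a machine without
    Spark or spaCy; that is a choice made for this sketch, not a requirement.
    """
    from typing import Iterator

    import pandas as pd
    import spacy
    from pyspark.sql.functions import pandas_udf

    @pandas_udf("array<string>")
    def tokenize_udf(batches: Iterator[pd.Series]) -> Iterator[pd.Series]:
        # Created on the executor, once per task, then reused for every batch
        nlp = spacy.blank("en")
        for texts in batches:
            yield pd.Series(tokenize_texts(texts, nlp))

    return tokenize_udf
```

It might then be applied along the lines of `df.withColumn("tokens", make_tokenize_udf()(df["text"]))`. A full pretrained pipeline could be swapped in via `spacy.load(...)`, provided that model is also installed on every worker.
]]></content:encoded>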
      <pubDate>Thu, 17 Jun 2021 23:23:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/how-can-i-use-non-spark-related-libraries-like-spacy-with/m-p/24925#M1395</guid>
      <dc:creator>sean_owen</dc:creator>
      <dc:date>2021-06-17T23:23:53Z</dc:date>
    </item>
  </channel>
</rss>

