Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

Solution for "PythonException: ModuleNotFoundError: No module named 'spacy'"

Vicky1215
New Contributor II

I am trying to extract adjective and noun phrases from a text column in a Spark DataFrame. I have written a UDF for this and am applying it to the cleaned text column, but I am getting this error.

from pyspark.sql.functions import col, udf
from pyspark.sql.types import ArrayType, StringType
import spacy

# Load spacy model
nlp = spacy.load("en_core_web_sm")

# Define UDF to extract key phrases
def extract_adjective_noun_key_phrases(text):
  # Null rows would make nlp(text) fail, so return an empty list for them
  if text is None:
    return []
  doc = nlp(text)
  key_phrases = []
  # Skip the last token: token.nbor() raises IndexError when there is no next token
  for token in list(doc)[:-1]:
    nxt = token.nbor()
    if (token.pos_ == "ADJ" and nxt.pos_ == "NOUN") or (token.pos_ == "NOUN" and nxt.pos_ == "ADJ"):
      key_phrases.append(token.text + " " + nxt.text)
  return key_phrases

extract_adjective_noun_key_phrases_udf = udf(extract_adjective_noun_key_phrases, ArrayType(StringType()))

# Apply UDF to text column in DataFrame
pqms = pqms.withColumn("adjective_noun_key_phrases", extract_adjective_noun_key_phrases_udf(col("cleaned_text")))

# Print resulting DataFrame
display(pqms)

The expected output is a new column in the Spark DataFrame containing the extracted phrases. Any help or suggestions would be much appreciated.

Thanks,
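
For anyone who wants to check the pairing logic from the question without a cluster or spaCy installed, here is a minimal sketch. It assumes the tokens have already been tagged: the (text, pos) tuples are hand-written stand-ins for what spaCy would produce, and the function name adjacent_adj_noun_pairs is hypothetical, not part of the original post.

```python
def adjacent_adj_noun_pairs(tagged_tokens):
    """Return "word1 word2" for each adjacent ADJ+NOUN or NOUN+ADJ pair.

    tagged_tokens is a list of (text, pos) tuples, a hand-written stand-in
    for the tokens spaCy would produce (assumption for illustration).
    """
    phrases = []
    # zip pairs each token with its successor, so the last token is never
    # asked for a neighbour that does not exist
    for (text, pos), (next_text, next_pos) in zip(tagged_tokens, tagged_tokens[1:]):
        if {pos, next_pos} == {"ADJ", "NOUN"}:
            phrases.append(f"{text} {next_text}")
    return phrases


tagged = [("quick", "ADJ"), ("response", "NOUN"), ("is", "AUX"), ("helpful", "ADJ")]
print(adjacent_adj_noun_pairs(tagged))  # ['quick response']
```

Pairing each token with its successor through zip avoids looking past the end of the text, which is where a call to token.nbor() on the final token would raise an IndexError.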

7 REPLIES

LandanG
Honored Contributor

Hi @Aditya Singh​ ,

What cluster node types and DBR version are you using? Also, are you installing spaCy manually? Usually, a ModuleNotFoundError indicates that the library you are importing has not been installed, or was not installed correctly. You could try DBR 11.3 LTS ML, which comes with spaCy pre-installed.
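
One way to narrow this down is to ask the running interpreter directly whether it can find the module, and which Python executable is answering. The sketch below uses only the standard library; the helper name module_status is hypothetical. Since the error in the title is wrapped in a PythonException, it may be coming from the Python processes that run the UDF rather than from the notebook itself, so it could be worth running the same check both in a notebook cell and inside a UDF and comparing the results (that comparison is a suggestion, not something confirmed in this thread).

```python
import importlib.util
import sys


def module_status(name):
    """Report whether `name` is importable by the interpreter running this code."""
    spec = importlib.util.find_spec(name)  # None when the module cannot be found
    return {
        "python": sys.executable,                    # which interpreter is answering
        "importable": spec is not None,
        "location": getattr(spec, "origin", None),   # file the module would load from
    }


print(module_status("spacy"))
```

If "importable" is False, or "python" points at a different interpreter than the one pip installed into, that would be consistent with the ModuleNotFoundError.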

Vicky1215
New Contributor II

Hi LandanG, thanks for your quick response. I am using DBR 9.1 LTS (includes Apache Spark 3.1.2, Scala 2.12). I am not sure what "cluster node types" means. I am trying to install spaCy manually using:

import sys
!{sys.executable} -m pip install spacy

Is there any other way to install spaCy? I don't have access to install libraries directly on clusters from PyPI or a Maven repository.

Thanks,

LandanG
Honored Contributor

@Aditya Singh​ 

Could you try installing it like

%pip install spacy

instead? This will be a notebook-scoped library and you can run it in a notebook cell. Hopefully this works.

Thanks,

Vicky1215
New Contributor II

Thanks for your suggestion, LandanG. I am now able to install spaCy as a notebook-scoped library, and I can see it when I run %pip freeze. However, when I import it (import spacy), it throws a new error: ModuleNotFoundError: No module named 'spacy'.

sher
Valued Contributor II

@Aditya Singh​ 

Go to Compute, click the cluster you need, click the Libraries tab, and select PyPI.

Enter a PyPI package name. To install a specific version of a library, use the format <library>==<version>, for example spacy==3.4.4.

Aviral-Bhardwaj
Esteemed Contributor III

Only an init script will work here.
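
As a rough sketch of what such an init script might contain, the Python helper below renders the script text so it can be reviewed before being uploaded. The helper name render_init_script is hypothetical, and the /databricks/python/bin paths are an assumption about where the cluster's Python environment lives; please verify them against the documentation for your runtime. How the script is attached to the cluster is not shown and depends on your workspace permissions.

```python
def render_init_script(packages, spacy_models=()):
    """Build the text of a bash init script that installs Python packages.

    The /databricks/python/bin paths are an assumption about the cluster's
    Python environment; check them for your Databricks Runtime version.
    """
    lines = ["#!/bin/bash", "set -e"]  # stop at the first failing command
    if packages:
        lines.append("/databricks/python/bin/pip install " + " ".join(packages))
    for model in spacy_models:
        # 'python -m spacy download <model>' is spaCy's own model downloader
        lines.append(f"/databricks/python/bin/python -m spacy download {model}")
    return "\n".join(lines) + "\n"


script = render_init_script(["spacy==3.4.4"], ["en_core_web_sm"])
print(script)
```

The idea behind an init script is that it runs on every node when the cluster starts, so the library would be present wherever the UDF executes and not only where the notebook runs.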

Kaniz
Community Manager

Hi @Aditya Singh, we haven't heard from you since the last responses from @Aviral Bhardwaj and @sherbin w, and I was checking back to see if their suggestions helped you.

Otherwise, if you have a solution, please share it with the community, as it can be helpful to others.

Also, please don't forget to click the "Select As Best" button whenever the information provided helps resolve your question.
