
Solution for "PythonException: ModuleNotFoundError: No module named 'spacy'"

Vicky1215
New Contributor II

I am trying to extract adjective and noun phrases from a text column in a Spark DataFrame. I wrote a UDF for this and applied it to the cleaned text column, but I am getting the error in the title.

from pyspark.sql.functions import col, udf
from pyspark.sql.types import ArrayType, StringType
import spacy

# Load spacy model
nlp = spacy.load("en_core_web_sm")

# Define UDF to extract key phrases
def extract_adjective_noun_key_phrases(text):
  # Null rows would otherwise fail inside nlp()
  if text is None:
    return []
  doc = nlp(text)
  key_phrases = []
  # Skip the last token: token.nbor() raises IndexError when there is no next token
  for token in doc[:-1]:
    nxt = token.nbor()
    if (token.pos_ == "ADJ" and nxt.pos_ == "NOUN") or (token.pos_ == "NOUN" and nxt.pos_ == "ADJ"):
      key_phrases.append(token.text + " " + nxt.text)
  return key_phrases

extract_adjective_noun_key_phrases_udf = udf(extract_adjective_noun_key_phrases, ArrayType(StringType()))

# Apply UDF to text column in DataFrame
pqms = pqms.withColumn("adjective_noun_key_phrases", extract_adjective_noun_key_phrases_udf(col("cleaned_text")))

# Print resulting DataFrame
display(pqms)

The expected output is a new column in the Spark DataFrame that holds the extracted phrases. Any help or suggestions would be greatly appreciated.
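
The pairing rule in the UDF can be checked on its own, without a Spark cluster or a spaCy model. The following is a minimal sketch, assuming the tokens have already been tagged and are passed in as (text, pos) tuples. The helper name adjacent_adj_noun_pairs is hypothetical and is not part of spaCy or PySpark.

```python
from typing import List, Tuple


def adjacent_adj_noun_pairs(tagged: List[Tuple[str, str]]) -> List[str]:
    """Return "word next_word" for each adjacent ADJ/NOUN or NOUN/ADJ pair."""
    wanted = {("ADJ", "NOUN"), ("NOUN", "ADJ")}
    phrases = []
    # zip(tagged, tagged[1:]) pairs each token with its right neighbour and
    # stops before the last token, which has no neighbour.
    for (text, pos), (next_text, next_pos) in zip(tagged, tagged[1:]):
        if (pos, next_pos) in wanted:
            phrases.append(f"{text} {next_text}")
    return phrases


# Hand-tagged sample; in the real UDF the tags would come from spaCy.
sample = [("The", "DET"), ("quick", "ADJ"), ("fox", "NOUN"), ("jumps", "VERB")]
print(adjacent_adj_noun_pairs(sample))
```

Pairing each token with its right neighbour through zip avoids indexing past the end of the text, which is the case that needs a guard when calling token.nbor() on the final token.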

Thanks,

6 REPLIES

LandanG
Databricks Employee

Hi @Aditya Singh,

What cluster node types and DBR version are you using? Also, are you installing spaCy manually? A ModuleNotFoundError usually indicates that the library you are importing has not been installed, or was not installed correctly. You could try DBR 11.3 LTS ML, which comes with spaCy pre-installed.

Vicky1215
New Contributor II

Hi LandanG, thanks for your quick response. I am using DBR 9.1 LTS (includes Apache Spark 3.1.2, Scala 2.12). I am not sure what "cluster node types" means. I am trying to install spaCy manually using:

import sys
!{sys.executable} -m pip install spacy

Is there any other way to install spaCy? I don't have access to install libraries directly on clusters from PyPI or a Maven repository.

Thanks,

LandanG
Databricks Employee

@Aditya Singh

Could you try installing it like

%pip install spacy

instead? This will be a notebook-scoped library and you can run it in a notebook cell. Hopefully this works.

Thanks,

Vicky1215
New Contributor II

Thanks for your suggestion, LandanG. I can now install spaCy as a notebook-scoped library, and I can see it when I run %pip freeze. However, when I import it:

import spacy

it now throws this error: ModuleNotFoundError: No module named 'spacy'.
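
When a package shows up in a pip listing but the import still fails, one possible cause is that the install went into a different Python environment than the one running the notebook code. Whether that is what happened here is not confirmed in the thread. The following is a minimal diagnostic sketch using only the standard library; describe_module is a hypothetical helper name.

```python
import importlib.util
import sys
from importlib import metadata


def describe_module(name: str) -> dict:
    """Report which interpreter is running and where (if anywhere) `name` resolves."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        spec = None
    try:
        # Looks up the installed distribution called `name`; for spaCy the
        # distribution name and the import name are both "spacy".
        version = metadata.version(name)
    except metadata.PackageNotFoundError:
        version = None
    return {
        "executable": sys.executable,        # interpreter running this code
        "importable": spec is not None,      # would `import name` find it?
        "origin": getattr(spec, "origin", None),  # file it would load from
        "installed_version": version,
    }


print(describe_module("spacy"))
```

If "importable" is False here while a pip listing shows the package, comparing "executable" with the interpreter that pip used is a reasonable next step.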

sher
Valued Contributor II

@Aditya Singh

Go to Compute, click the cluster you need, open the Libraries tab, and select PyPI.

Enter a PyPI package name. To install a specific version of a library, use the format <library>==<version>, for example spacy==3.4.4.
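
The same <library>==<version> format is what pip accepts as a requirement on the command line. The following is a small sketch, assuming you want to build the install command for the interpreter that is currently running; pip_install_command is a hypothetical helper name, and the command is only printed here, not executed.

```python
import sys
from typing import List, Optional


def pip_install_command(library: str, version: Optional[str] = None) -> List[str]:
    """Build a pip install command bound to the running interpreter."""
    # Pin the version with "==" when one is given, otherwise install the latest.
    requirement = f"{library}=={version}" if version else library
    return [sys.executable, "-m", "pip", "install", requirement]


cmd = pip_install_command("spacy", "3.4.4")
print(" ".join(cmd))
# To actually run the install: subprocess.check_call(cmd)
```

Using sys.executable rather than a bare pip keeps the install tied to the interpreter that will later run the import.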

Aviral-Bhardwaj
Esteemed Contributor III

Only an init script will work here.
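
The reply does not show what such a script would contain. The following is a rough sketch that generates the text of a shell init script from Python. The pip path /databricks/python/bin/pip is an assumption about the cluster image and should be verified for your runtime, and build_init_script is a hypothetical helper name. Uploading the script and attaching it to a cluster is not covered here.

```python
from typing import Iterable


def build_init_script(
    requirements: Iterable[str],
    pip_path: str = "/databricks/python/bin/pip",  # assumed path, verify on your cluster
) -> str:
    """Return the text of a bash script that pip-installs each requirement."""
    lines = ["#!/bin/bash", "set -e"]  # stop at the first failed install
    for req in requirements:
        lines.append(f"{pip_path} install {req}")
    return "\n".join(lines) + "\n"


script = build_init_script(["spacy==3.4.4"])
print(script)
```

An init script runs on every node when the cluster starts, so the package would be present on the workers that execute the UDF as well as on the driver. The en_core_web_sm model is a separate package and would need to be installed in addition to spaCy itself.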

AviralBhardwaj
